NitroWare.net


 

Beyond statements and promises – our attempts at getting our Rangeley powered devices replaced

 

Our real-world attempt to get Rangeley-powered devices repaired or replaced focuses on Dell datacentre-grade switches, which were purchased over a period of two to three years.

The Seagate NAS with the Rangeley Atom CPU mentioned is not covered under Seagate’s retail warranty.

As a reminder, let’s look back at what Dell stated in March 2017 in its official KB note on the repair program, and break down that statement.

http://www.dell.com/support/article/au/en/aubsd1/qna44095/networking-clock-signal-qa?lang=en  


 

 

 

 “Beginning in July, Dell will proactively begin replacing impacted products that are under warranty or that are covered by a customer’s service contract.”

This fragment forms the beginning of the long journey. Our efforts to get at least our oldest Dell switch replaced started in March 2017 and ran through to December 2017. Over almost a year, we made no progress at all on replacing any of our Dell switches affected by this issue. It is purely a waiting game.

 

“Although the Dell EMC products with these components are currently performing normally and no failures have been reported to date (as of Aug 1, 2017), some product failures may occur over the years, beginning after the unit has been in operation for approximately 18 months.”

“Q: When does the 18 month windows for increased failure rate begin and what is the expected failure rate?

A: The 18 month timeframe for beginning to see increased failure rate is determined based off total system power-on runtime, not manufacturing or ship date. While Dell expects the failure rate to begin to increase at the 18 month mark, the anticipated failure rate for the first two years of operation is expected to be under extremely low.”
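Because the 18 month window is counted in power-on runtime rather than calendar age, a switch that has sat powered off in storage is "younger" than its ship date suggests. A minimal sketch of that check follows. The function names and the figure of 730 hours per month (8760 hours per year divided by 12) are my own assumptions for illustration, not anything Dell publishes, and how you obtain the power-on hours for a given unit is left to you.

```python
# Rough conversion of power-on hours to months of runtime.
# Assumption: 730 hours per month (8760 hours in a year / 12).
HOURS_PER_MONTH = 730


def power_on_months(power_on_hours: float) -> float:
    """Convert total power-on hours to approximate months of runtime."""
    return power_on_hours / HOURS_PER_MONTH


def past_risk_threshold(power_on_hours: float, threshold_months: float = 18) -> bool:
    """True once runtime reaches the 18 month mark quoted in Dell's KB note."""
    return power_on_months(power_on_hours) >= threshold_months


if __name__ == "__main__":
    # A unit that has run continuously for one year (8760 hours).
    print(past_risk_threshold(8760))
    # A unit that has run continuously for 18 months (13140 hours).
    print(past_risk_threshold(13140))
```

This only tells you when a unit enters the period Dell describes; it says nothing about whether that particular unit will actually fail.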

 

The metal layers that make up the silicon die of the Rangeley chip degrade over the stated 18 month period, after which the electrical integrity of the LPC (low pin count) bus, which connects the chipset to the flash memory containing the firmware, is no longer reliable. Since the system needs a stable clock signal to talk to its companion chips, it fails to boot after a power outage or power cycle.

 

 “Q: Do I need to contact Dell to get replacement product or be issued a Return Merchandise Authorization (RMA)?

A: No you do not need to proactively contact Dell to get replacement product. Dell will be proactively contacting all customers starting in the July-August 2017 timeframe to arrange for replacement product.”

 

This class of Dell networking product is sold on a B2B basis in some regions, such as Australia. Through this sales channel, Dell has a record of who purchased each device and can contact those customers for after-sales service without relying on warranty card registration, a concept depended on in the Americas but not in other countries such as Australia.

Dell’s intention was to build up enough buffer stock to cover this return program, stock which would not necessarily be available on a typical weekly basis. A timeframe of several months can be typical in the electronics industry for building up inventory.

 

“Q: What if my switch is no longer under warranty?

A: To be eligible for the Proactive Replacement Program, customers must have had a valid warranty contract as of 2/7/2017 or later. Customers with effected products that have a warranty expired prior to 2/7/2017 may choose to purchase a service contract to have affected products replaced.”

 

If your warranty expired before 7 February 2017 and your device dies, despite the ’18 month expected lifetime’ that has been described, you will not be able to have the device repaired for free, and a valid paid service contract is required. For these customers the repair program might as well not exist, as no special handling is offered for the issue; Dell (and the other affected OEMs) merely ‘recognise’ formally that their devices are faulty.

 

“Q: What if my switch is affected (was under warranty on Feb 7, 2017), but is no longer under warranty or service contract, and I experience a failure?

A: In this case the unit will be replaced as the program priority comes up. The replacement timing has no NBD (or other) timeline guarantees (in cases where there is no existing Support contract). To guarantee that units are replaced within the use case requirements, please keep the support contract current to ensure timely replacements.”

 

If you do not have a support contract, the repair program makes you eligible for replacement, but only at the lowest priority compared to customers with a current support contract. ‘Use case requirements’ will mean whatever Dell wants it to mean: a return request must meet their criteria, and a working switch that has not yet failed, despite the ’18 months life expectancy’, will be rejected, as happened in my case.

 

“Early warning detection can be enabled via a new patch release for the following affected models (S3048-ON, S6100-ON, Z9100-ON and C9010 platforms). This patch is available via dell.com/support.”

 

Since the design by Dell and its design/manufacturing partners does not allow for a fix via a BIOS update, a countdown timer for the ‘ticking time bomb’ was added. At time of publishing, however, I cannot say whether this is a timer, a flag that is displayed once particular conditions such as device lifetime are met, or code that actually monitors the performance of the clock signals for each domain in the chipset. The user can’t RMA ad hoc, so all the feature does is tell them when to be ready with a replacement device. The patch reads as ‘we have to do something, let’s add an alarm or countdown timer’. There is also the possibility that this early warning system gives a false reading; devices can still fail regardless. It is the nature of the silicon die itself.

In summary, we can’t send these devices away to be fixed on our own time and must wait for the signal from Dell for when they are ready to exchange.

In addition, they are running the program to replace the oldest manufactured devices first, regardless of any existing failures.

Devices must be under warranty as of February 2017 to be eligible for the repair program. If the warranty has since expired, then those devices will be the lowest priority ones to be replaced under the program which has no expiry date.

The replacement program is also dependent on regional stock. These switches are high-end items that are not typically kept in large quantities and are sometimes built to order when stocks are low.

These switches were expensive, and yet we can’t send them away to get fixed!

My understanding of the Cisco side of the problem is that Cisco allowed RMAs en masse for affected devices regardless of their state, provided there was a current service contract.

 

I have described the initial global communication from Dell, both to the public via their website and in a media-specific comment. This messaging, as well as direct communication with Dell tech support, repeated what Dell had said already: wait for the July-August timeframe onwards for more information and replacement.

Having received this update on the matter, and not hearing anything else from the other vendors I contacted or even from user feedback, I waited from March to July/August for progress.

July/August came and went with the same message relayed back to us, except with the timing updated to Quarter 4 2017. The reason given was the build-up of stock, as I have previously described.

There was not much else I could do other than wait and try again.

Does our story end in Quarter 4 2017? Nope.

In very late November, Dell Tech Support advised that the timeframe was now December. That in itself is OK, but the slip from ‘Quarter 4’ to December is a little more dubious. Again, I break down the commentary:

 

“I have received update from our Proactive Field Replacement team, they will start contact customer by regional team in December. The team will work with each affected customer to replace any affected Dell EMC network product. As so far the exact timeline for a specific customer contact is difficult to predict.”

 

The issue was first disclosed in March 2017, and the remedy was then delayed to July, August, November and finally December.

Timelines shouldn’t be difficult to predict. One would hope a manufacturer is able to plan its manufacturing and inventory to the day, especially since many use Just-In-Time logistics methods. One would think all customers, regardless of the manufacturer, are treated equally, but we see phrases like “specific customer” and “each affected customer”.

 

“Feedback from our product team, even though we have customer with over units running in excess of 2 years, we have observed no failures. In our labs where we have much older units without appeared the issue. Failures at 24 months is a mere 0.6%.  There is no epic failure imminent, and the issue may not be observed until a reboot or power cycle occurs.”

 

0.6% of 1 million units (my ballpark guesstimate of the installed base) is 6000 units at one particular moment in time. Given that this issue is one of deterioration and aging, common sense says this number will increase over time. Besides, a statement from a corporation can often be taken with a grain of salt, and any figure of this nature usually comes with many ifs and buts. In the electronics and PC hardware industry, up to 10% as a normal failure rate is a figure I have heard often.

The statement that there is “no epic failure imminent” sits awkwardly with the later admission that the issue “may not be observed until a reboot or power cycle occurs”. Well, you’ll be sure it’s #epicfail when your device no longer switches on! That’s my expert opinion anyway, as someone who has worked in the electronics assembly/repair and IT administration/repair sectors.
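For anyone who wants to plug in their own numbers, here is a minimal sketch of that arithmetic. The 1,000,000 unit installed base is purely my guesstimate and the function name is made up for this example; only the 0.6% figure comes from Dell’s statement.

```python
def expected_failures(installed_units: int, failure_rate_percent: float) -> int:
    """Estimate failed units for a given installed base and percentage failure rate."""
    return round(installed_units * failure_rate_percent / 100)


if __name__ == "__main__":
    units = 1_000_000   # assumption: ballpark guess, not a Dell figure
    rate_24_months = 0.6  # percent, as quoted by Dell for 24 months
    print(expected_failures(units, rate_24_months))
```

Swap in a different installed base or a higher rate for later years to see how quickly the absolute number of dead units grows.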

 

“Furthermore, in case of any unit failure occur, our ProSupport team here can help to troubleshooting and RMA issue hardware with your warranty entitlement.”

 

I tried in good faith to RMA our hardware under our warranty entitlement, and the request was knocked back because the device was not dead, despite its age, the announced replacement program and the claim that few to no failures had been observed.

The ‘rush’ on my end to exchange the device comes from a planning and administrative point of view: I have made physical accommodations to easily remove the old device. Those who work in data centres will be aware of how difficult it can be to remove certain devices, and naturally, for the operation of a large network, it is preferable to pre-emptively replace something before it’s too late.

As you read this story in February to March 2018, there have been no further updates on the matter since December 2017, and a full year has passed since the issue was disclosed. This is significant given the 18 month life expectancy of the chipset and the length of warranties and support contracts offered by vendors.

There is the risk that the situation may change once this story goes to press…