Mitigation and solving the problem for good

If this issue affected consumer products, an end user would simply be inconvenienced while the device was repaired or replaced and, financial implications aside, a replacement would be easy to source. Examples include the faulty 6-series chipset for Intel 2nd Gen Core CPUs, the dozens of known issues with Apple MacBooks, Macs and iPhones, such as bad solder on GPUs, faulty FireWire ports and ‘bendgate’ on the iPhone 6 Plus, and a multitude of other common faults such as bad capacitors across all brands.

For business and enterprise gear, especially equipment that runs services for third-party customers, things get more complicated. Service level agreements (SLAs) make a provider liable for outages to customer services, as well as for any loss of reputation or damage to equipment.

SLAs are often defined as a percentage in the high 90s. Examples:

  • 99.9% uptime allows roughly 43 minutes of outage per month.
  • 99.99% uptime allows roughly 4.3 minutes of outage per month.
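
The downtime figures above follow from simple arithmetic. A minimal sketch, assuming a 30 day month (the function name and the month length are illustrative choices, not taken from any particular SLA):

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Return the monthly downtime budget, in minutes, for a given uptime target.

    Assumes every minute of the month counts towards the SLA; real agreements
    often exclude scheduled maintenance windows.
    """
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30 day month
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 100.0):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime leaves {budget:.1f} minutes of outage per month")
```

Each extra nine cuts the budget by a factor of ten, which is why a single failed switch that takes hours to replace can consume many months of allowance in one go.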

Some providers advertise 100% uptime, which means 0 minutes of outage per month. This is impossible regardless of high availability technology or provider. Third-party services such as power often go out or become unstable, most often external to the datacentre. International telecommunication lines fail, for example in a storm or when a ship damages an undersea line, and commonly enough contractors damage cables while doing construction or digging. In extreme cases, acts of God and natural disasters such as floods or hurricanes/cyclones can take out entire facilities.

No cloud or network infrastructure provider can fully mitigate all these issues. The best possible approach is to ensure redundancy at both the service and geographic level, so that if a particular resource such as a router, power feed or data line is lost in one location, resources in another geography can be switched in.
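
To illustrate why redundancy helps but never reaches 100%, here is a rough sketch of the standard parallel-availability formula. It assumes every site has the same availability and that sites fail independently of each other, which is optimistic: a shared vendor defect such as a faulty chip in identical switches breaks that assumption. The function name and the figures are hypothetical.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of a service that stays up while at least one site is up.

    Assumes identical sites with independent failures (an idealisation).
    """
    if sites < 1:
        raise ValueError("at least one site is required")
    # The service is down only when every site is down at the same time.
    return 1 - (1 - site_availability) ** sites


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} site(s) at 99.9% each: {combined_availability(0.999, n):.7%}")
```

Under these assumptions each additional site shrinks the expected downtime dramatically, yet the result only approaches 100% and never equals it.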

Intel’s ‘Rangeley’ based Atom chips are and were used in hardware installed at such critical locations and sites, so loss of functionality becomes a paramount concern.

Users of such equipment need to plan not only for the possible future failure of any of these devices, but also for how a failed device will be replaced and how long the replacement will take.

The core definition of the relationship between a paying customer and a business of high repute should be the ease of obtaining support and service from that vendor, especially when there is a declared known issue relating to the device.

I can understand that a tier 1 OEM like Dell needs to manage its inventory and production levels so it can balance existing orders against buffer stock for repair programs, but given the price of these devices (>$10K AUD before options), a user should be able to request repair or replacement of affected devices whenever they please or need.

For an ordinary fault, the customer’s paid high-priority on-site warranty will cover any spot failure within a few hours, but Dell is refusing to allow any un-dead switches to be exchanged under this paid warranty program. Again, we must wait for the bat signal from Dell.

By then it will be too late; the damage will have been done.

Dell’s interim fix for the issue is to add “early warning detection” of imminent failure via a patch to their switch operating system. Vendors typically do not add telemetry to their systems and devices to indicate a future fault. More often, devices detect a fault after the fact and provide a remedy, or perform scheduled maintenance. If a company like Apple added a readout to the iPhone stating ‘your phone is dying’, the zombie horde would grab their torches and pitchforks in revolt!

For administrators, the only sure solution is to purchase a new switch at their own cost, schedule an outage, swap the old switch for the new one, and store the old, soon-to-be-faulty switch for future replacement. This requires capital investment that some may not be able or willing to commit to.

In our case, many of our switches have reached or exceeded the 18 month mark, so they are in the danger zone. If there is an interruption in power, a switch may not come back after the outage, leaving the network in disarray or in a suboptimal state. The fact that they are heavily operated 24x7 only further degrades the product. This could have been avoided if Dell had simply allowed blanket replacement of devices without any assigned priority.

It is unreasonable to expect a purchaser to replace a product at their own expense when the vendor has admitted and disclosed to the public that the item has known issues or faults.

Again, to use Apple as an analogue: when a repair program for an iPhone or Mac is announced, no devices are prioritised by age, and all devices falling within a specific series or time frame are equally eligible for repair regardless of age or mileage.

What should vendors do about the problem, then, or what should they do differently? All manufacturers had to do was allow end users to RMA their devices on an ad-hoc basis. Since production was fixed in February, the number of possibly affected units in the field is limited and users should not expect a fast turnaround.

All manufacturers concerned should also replace customer devices out of warranty. An active warranty plan should not be required to repair devices with declared issues.

Intel should also help end users who cannot get service for their device.

 
