Quote:
Originally Posted by mura_gadi
You can get 5+ teraflop phones now, the computing power of the best Super Computers in the world when Bill Clinton walked into office was about 10% of a teraflop. They say, you have more computing power in your hands today than was available to Bill Clinton as PotUS. (Around the late 90's for the 1st teraflop super computer.)
Hi Steve,
In 2016 I read an engineering article about how surprisingly low the
MTBF (mean time between failures) of state-of-the-art supercomputers was.
We are all familiar with the notoriously short MTBF numbers of early
vacuum tube computers like ENIAC, which were on the order of an hour.
And we are all familiar with PCs at home or work operating reliably
indefinitely. Left switched on continually, they may not suffer a
hardware-related crash for months or even years.
But if you want to be at the absolute cutting edge of high-performance
supercomputing in the petaflop and exaflop range, then when it comes to
MTBF figures, little has changed since the days of vacuum tube
computers.
Cosmic radiation is one big cause of failures. One machine at Los
Alamos, even after extra metal shielding was put on it, would run for
only about 6 hours before crashing. Another, at Virginia Tech's
Advanced Computing facility, consisted of 1,100 Apple Power Mac G5s
with no ECC on the RAM, and its failure rate from cosmic radiation was
so high that it was nearly impossible even to boot the whole system
before it crashed.
A Cray XT-5 at Oak Ridge had 360 terabytes of main memory with ECC
and it would log ECC errors at a rate of 350 per minute.
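To put that rate in perspective, here is a rough back-of-the-envelope conversion. It takes the two figures quoted above at face value and assumes decimal terabytes (1 TB = 1000 GB), which is my assumption and not something the article stated. The function name is just for illustration.

```python
# Back-of-the-envelope conversion of the logged ECC error rate.
# The two inputs are the figures quoted in the post.
# Assumption: decimal terabytes (1 TB = 1000 GB).
ECC_ERRORS_PER_MINUTE = 350
MEMORY_TB = 360


def ecc_rates(errors_per_minute, memory_tb):
    """Express an ECC error rate in a few more intuitive units."""
    errors_per_day = errors_per_minute * 60 * 24
    memory_gb = memory_tb * 1000
    return {
        "seconds_between_errors": 60 / errors_per_minute,
        "errors_per_day": errors_per_day,
        "errors_per_gb_per_day": errors_per_day / memory_gb,
    }


rates = ecc_rates(ECC_ERRORS_PER_MINUTE, MEMORY_TB)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

On those assumptions the machine logs about half a million ECC events a day, or roughly 1.4 per gigabyte per day. These are logged errors, which is the whole point of ECC, not crashes.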
The IBM Blue Gene/L system at Lawrence Livermore National Laboratory,
which was the largest computer in the world from 2004 to 2008, would
frequently crash, and the culprit was found to be radioactive lead in
the solder.
The Cray XK7 Titan was the top supercomputer system in the world for a
long time and consisted of 18,688 NVIDIA GPUs, but even it had an MTBF
of less than a day during some periods of its life.
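A rough sketch of why part count alone drags the MTBF down so far. It uses the textbook simplification that the machine fails whenever any one part fails and that parts fail independently at a constant rate, so system MTBF = part MTBF / number of parts. That model, and treating the GPUs as the only parts that fail, are my assumptions for illustration and not how the article derived its numbers.

```python
# Simplified series-system model (an assumption, not from the article):
# parts fail independently at a constant rate, and any single part
# failure takes the whole machine down.
HOURS_PER_YEAR = 24 * 365


def system_mtbf_hours(part_mtbf_hours, n_parts):
    """System MTBF when any one of n identical parts can bring it down."""
    return part_mtbf_hours / n_parts


def implied_part_mtbf_hours(system_mtbf, n_parts):
    """Per-part MTBF needed to reach a given system MTBF."""
    return system_mtbf * n_parts


N_GPUS = 18_688          # GPU count quoted in the post
TARGET_SYSTEM_MTBF = 24  # hours, i.e. "a day"

per_gpu = implied_part_mtbf_hours(TARGET_SYSTEM_MTBF, N_GPUS)
print(f"Per-GPU MTBF needed: {per_gpu:,.0f} hours "
      f"(about {per_gpu / HOURS_PER_YEAR:.1f} years)")
```

Under that model each GPU would have to average around 51 years between failures just for the whole machine to last a day, before counting memory, power supplies, interconnect or anything else.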
Anyway, I had only just read this article in 2016 when I happened to
meet someone who was involved in supercomputing at the Sandia
National Laboratories. I told him I had read that the MTBF could be less
than a day and he replied, "Oh no, much worse than that. Typically on the
order of only 30 minutes".