17.59 PFLOPS: AMD Opteron, Nvidia Tesla K20X Power World’s Fastest SuperComputer

Archivebot


Sixteen years ago, Intel’s ASCI Red became the first supercomputer to break the 1 TFLOPS barrier (1996). Four years ago, the world saw the debut of the first PFLOPS system. Dubbed Roadrunner, this IBM-built system combined 6,480 dual-core AMD Opterons and 12,960 IBM PowerXCell 8i processors (8-core Cell CPUs). Today, Top500.org released its 40th list of the top five hundred supercomputers, and once more a combination of AMD Opterons and a third-party processor sits on top of the list.

This time around, the world record belongs to the Titan supercomputer with a massive 17.59 PFLOPS, 16.88x faster than the Roadrunner setup. If this does not speak volumes about the progress of supercomputers, we don’t know what does.
Titan is located at Oak Ridge National Laboratory and consists of 2,336 nodes, each with eight AMD Opteron 6276 (16-core Interlagos) processors and eight Nvidia Tesla K20X (2,688 CUDA cores) GPUs, for a grand total of 18,688 Opterons and 18,688 Teslas. This is the world’s largest installation of AMD Opterons and the world’s largest installation of GPUs of any kind.

When we released the initial story on Oak Ridge’s Titan supercomputer, we were informed by Nvidia that Titan featured Tesla K20 GPUs, which consist of 13 SMX units (2,496 cores) and 5GB of GDDR5 memory. In fact, Oak Ridge was getting Tesla K20X GPUs, with 14 SMX units (2,688 cores) and 6GB of GDDR5 memory. The grand total of processing cores now stands at 299,008 x86 and 50,233,344 CUDA cores. Yes, you’ve read that correctly – Titan features over 50 million cores!
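The core totals quoted above follow directly from the per-node figures given in this article. A minimal Python sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
# Figures as quoted in the article; the variable names are illustrative only.
NODES = 2336
CPUS_PER_NODE = 8
GPUS_PER_NODE = 8
X86_CORES_PER_CPU = 16      # 16-core Interlagos Opteron
CUDA_CORES_PER_GPU = 2688   # Tesla K20X

total_cpus = NODES * CPUS_PER_NODE
total_gpus = NODES * GPUS_PER_NODE
total_x86_cores = total_cpus * X86_CORES_PER_CPU
total_cuda_cores = total_gpus * CUDA_CORES_PER_GPU

print(f"CPUs: {total_cpus:,}  GPUs: {total_gpus:,}")
print(f"x86 cores: {total_x86_cores:,}  CUDA cores: {total_cuda_cores:,}")
```

Multiplying the device counts by the cores per device reproduces the 299,008 x86 and 50,233,344 CUDA cores stated in the text.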

Overall system memory stands at 710 TB: each of the 16-core Opterons has 32GB of Registered ECC DDR3-1333 for a grand total of 584 TB, while the GPU section alone accounts for 109.5 TB of GDDR5 memory. As you probably noticed, CPU and GPU memory do not add up to the full total, as the system uses a significant amount of memory as cache as well. The aggregate GPU bandwidth is an astounding 4.45 PB/s, while the aggregate CPU bandwidth stands at 0.76 PB/s.
Bear in mind that all of the numbers listed above are calculated using base 1024 for Gigabytes and Terabytes. We find it sad that even scientific institutions use base 1000 to arrive at the highest numbers possible. 1000 GB/s is not 1TB/s, unless you’re a hard drive manufacturer, of course. 🙂
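To see how the base-1024 totals come about, here is a rough Python sketch. The device counts and per-device memory sizes are from the article. The per-device bandwidth figures are assumptions that the article does not state: four DDR3-1333 channels per Opteron and a 250 GB/s peak per Tesla K20X.

```python
STEP = 1024  # binary step used in the article: GB -> TB -> PB

CPUS = 18_688
GPUS = 18_688
CPU_MEM_GB = 32   # per Opteron, from the article
GPU_MEM_GB = 6    # per Tesla K20X, from the article

# ASSUMPTIONS (not stated in the article): peak bandwidth per device.
CPU_BW_GBS = 4 * 1333 * 8 / 1000   # four DDR3-1333 channels, 8 bytes wide each
GPU_BW_GBS = 250.0                 # assumed K20X peak memory bandwidth

cpu_mem_tb = CPUS * CPU_MEM_GB / STEP
gpu_mem_tb = GPUS * GPU_MEM_GB / STEP
cpu_bw_pbs = CPUS * CPU_BW_GBS / STEP**2
gpu_bw_pbs = GPUS * GPU_BW_GBS / STEP**2

# The same CPU memory expressed with base 1000, for comparison.
cpu_mem_tb_decimal = CPUS * CPU_MEM_GB / 1000

print(f"CPU memory: {cpu_mem_tb:.1f} TB (base 1024), {cpu_mem_tb_decimal:.1f} TB (base 1000)")
print(f"GPU memory: {gpu_mem_tb:.1f} TB (base 1024)")
print(f"CPU bandwidth: {cpu_bw_pbs:.3f} PB/s, GPU bandwidth: {gpu_bw_pbs:.3f} PB/s")
```

Under these assumptions the capacities match the 584 TB and 109.5 TB figures exactly, and the bandwidths land within rounding of the quoted 0.76 PB/s and 4.45 PB/s. The base-1000 figure for the same CPU memory is noticeably larger, which is the inflation the paragraph above complains about.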

Titan replaces Jaguar in its entirety. Jaguar was a 2.3 PFLOPS system powered exclusively by AMD Opteron processors. Today, practically the same power footprint delivers more than a 7x improvement in performance.
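The speedup figures can be checked with a couple of divisions. This is a small sketch using only numbers quoted in this article. The Roadrunner throughput is not stated directly, so it is derived here from the 16.88x ratio.

```python
# All three inputs are quoted in the article.
TITAN_PFLOPS = 17.59
JAGUAR_PFLOPS = 2.3
SPEEDUP_VS_ROADRUNNER = 16.88

speedup_vs_jaguar = TITAN_PFLOPS / JAGUAR_PFLOPS

# Derived value, not quoted in the article: Roadrunner's implied throughput.
implied_roadrunner_pflops = TITAN_PFLOPS / SPEEDUP_VS_ROADRUNNER

print(f"Titan vs Jaguar: {speedup_vs_jaguar:.2f}x")
print(f"Implied Roadrunner throughput: {implied_roadrunner_pflops:.3f} PFLOPS")
```

The Jaguar ratio comes out a little above 7.6x, consistent with "more than 7x", and the implied Roadrunner figure sits just over 1 PFLOPS, as expected for the first PFLOPS system.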

Yet this is not all. Over the next two years, we might get to see a 100 PFLOPS system combining 100,000 CPUs and 100,000 accelerator cards, as the path forward seems to lie in mixing CPUs and GPGPUs. Who will be the main vendor for that system? Only time will tell…

Original Author: Theo Valich

