With Cray EX Plus AMD Compute, HPE wins another European HPC center

Per aspera ad astra, an old Latin saying that means “through hardships to the stars,” is the root of the name for a hybrid HPC and AI supercomputer that the Grand Équipement National de Calcul Intensif (GENCI), in collaboration with the Centre Informatique National de l’Enseignement Supérieur (CINES), one of three national HPC centers in France, will build next year to give scientific applications 20 times more computing power.

The Adastra system, which goes into the CINES data center in Montpellier, a long-standing technology hub in France and thus also in Europe, is interesting in that Hewlett Packard Enterprise was selected as the prime contractor for the 70 petaflops machine, not the Bull division of French services giant Atos, which was the incumbent supplier of the two previous generations of petascale supercomputers at CINES, the Occigen and Occigen 2 systems. The two CINES systems before those, Jade and Jade 2, were built in 2008 and 2010 and packed hundreds of teraflops, as you can see below:

The Occigen 2 machine, which was installed in January 2017, is showing its age, which happens from time to time in HPC centers and which was a particular problem in the first year of the coronavirus pandemic, when getting people into facilities to install machines was difficult. The Occigen 2 machine also did not have GPU accelerators, which means it could not do AI training alongside HPC simulation and modeling on the same machine and in the same workflow. Occigen 2 had a total of 3,364 two-socket nodes based on Intel Xeon E5 processors of the “Haswell” and “Broadwell” generations, with a total of 85,824 cores across the machine. The nodes were connected by 56 Gb/sec FDR InfiniBand interconnects from Mellanox (now part of Nvidia) and fed by a 5 PB Lustre parallel file system.

The Adastra machine will be a huge performance upgrade and will consist of a CPU-only cluster, as CINES has used in the past, and a hybrid CPU-GPU cluster, which we presume will provide most of the aggregate compute capacity of the system. Essentially, the CPU-only partition can run the existing workloads at CINES.

The exact feeds and speeds of the two partitions have not been disclosed, but we strongly suspect that the CPU-only part of the machine will have a fairly large increase in core count and throughput, on the order of 5 petaflops to 6 petaflops at least. Maybe more. What we do know is that this CPU-only partition will be based on the future “Genoa” Epyc 7004 processors from AMD, which will be released in the middle of next year, and that its nodes will have 768 GB of main memory and one 200 Gb/sec Cray Slingshot 11 connection per node. If we were looking for TCO savings, as GENCI and CINES certainly are, we would choose densely packed single-socket nodes using something in the middle bin, as has been done in the past to improve TCO. If Genoa tops out at 96 cores, Adastra’s CPU-only partition may use 48-core processors in a single-socket node. But they say it will be based on Epyc “processors,” in the plural, so maybe it will be some low-bin parts such as a pair of 32-core chips that are very affordable and that have lots of memory slots, so that memory sticks with relatively low capacity can deliver the needed capacity and also a lot of bandwidth.

The second partition, which has GPU acceleration, sounds like a slightly updated variant of the nodes used in the Frontier supercomputer that is being installed at Oak Ridge National Laboratory. Each node in this second partition will have a custom “Milan” Epyc 7003 processor with 256 GB of main memory and four of the new “Aldebaran” Instinct MI250X GPU accelerators, each with 128 GB of HBM2E stacked memory, plus four 200 Gb/sec Slingshot 11 network interface cards that connect the GPUs directly to the Slingshot network (as in the Frontier supercomputer).
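To put those per-node figures together, here is a minimal sketch of the arithmetic. It uses only the numbers quoted above; the class and field names are ours, for illustration, and the bandwidth figure is a raw line rate that ignores protocol overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuNode:
    """Per-node figures for the accelerated partition, as quoted in the article."""
    host_memory_gb: int = 256
    gpus: int = 4
    hbm_per_gpu_gb: int = 128
    nics: int = 4
    nic_gbit_per_sec: int = 200

    @property
    def total_hbm_gb(self) -> int:
        # Aggregate HBM2E capacity across all GPUs in the node.
        return self.gpus * self.hbm_per_gpu_gb

    @property
    def injection_gbit_per_sec(self) -> int:
        # Aggregate raw network bandwidth across all Slingshot NICs.
        return self.nics * self.nic_gbit_per_sec

    @property
    def injection_gbyte_per_sec(self) -> float:
        # 8 bits per byte; ignores protocol overhead.
        return self.injection_gbit_per_sec / 8


node = GpuNode()
print(f"HBM2E per node: {node.total_hbm_gb} GB")
print(f"Network per node: {node.injection_gbit_per_sec} Gb/sec "
      f"({node.injection_gbyte_per_sec:.0f} GB/sec)")
print(f"HBM to host memory ratio: {node.total_hbm_gb / node.host_memory_gb:.0f}:1")
```

On these numbers, each node carries twice as much GPU memory as host memory, and one network interface per GPU.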

The first, CPU-only partition is to be installed in spring 2022, and the remaining CPU-GPU nodes will come in the fourth quarter of 2022. (The Genoa CPUs, however, will come later. . . .)

The Adastra system will have a hybrid file system based on Cray ClusterStor E1000 arrays running Lustre, including a 2 PB partition based on flash storage that delivers 1.3 TB/sec of throughput and a 24 PB partition based on hard drives that delivers 250 GB/sec of throughput. This hard drive-only Lustre file system has 2.5 times the throughput of the Lustre storage attached to the current Occigen 2 supercomputer and 4.8 times the capacity.
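Those ratios also imply what the Occigen 2 storage looked like. The following is a minimal sketch of that arithmetic using only the figures in this article; the variable names are ours, and 1 TB is taken as 1,000 GB.

```python
# Figures for the Adastra hard-drive Lustre partition, from the article.
adastra_hdd_capacity_pb = 24.0
adastra_hdd_throughput_gbs = 250.0

# Ratios quoted against the Lustre storage on Occigen 2.
capacity_ratio = 4.8
throughput_ratio = 2.5

# Dividing by the ratios recovers the implied Occigen 2 figures.
occigen_capacity_pb = adastra_hdd_capacity_pb / capacity_ratio
occigen_throughput_gbs = adastra_hdd_throughput_gbs / throughput_ratio

print(f"Implied Occigen 2 capacity: {occigen_capacity_pb:.1f} PB")
print(f"Implied Occigen 2 throughput: {occigen_throughput_gbs:.0f} GB/sec")

# The flash tier is far smaller but much faster per petabyte.
flash_capacity_pb = 2.0
flash_throughput_gbs = 1300.0  # 1.3 TB/sec, assuming 1 TB = 1,000 GB
flash_gbs_per_pb = flash_throughput_gbs / flash_capacity_pb
hdd_gbs_per_pb = adastra_hdd_throughput_gbs / adastra_hdd_capacity_pb

print(f"Flash tier: {flash_gbs_per_pb:.0f} GB/sec per PB")
print(f"Hard drive tier: {hdd_gbs_per_pb:.1f} GB/sec per PB")
```

The implied capacity lines up with the 5 PB file system cited for Occigen 2, which suggests the ratios were computed from the same figures.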

It is interesting that the Adastra system will have more than 20 times the theoretical peak performance of the Occigen 2 machine, but at 1.6 megawatts it will consume only about 60 percent more power than the Occigen 2 supercomputer. This is what five years of Moore’s Law, weakened though it may be, paired with a change in architecture to at least some GPU acceleration can bring about.
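As a back-of-the-envelope check, the sketch below works out the implied gain in performance per watt. It treats “more than 20 times” and “about 60 percent more” as exactly 20X and 1.6X, which is a simplifying assumption, and the function name is ours, for illustration only.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Return how much performance per watt improves between two systems.

    perf_ratio:  new peak performance divided by old peak performance
    power_ratio: new power draw divided by old power draw
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


# Assumed round numbers: 20X the peak performance at 1.6X the power.
gain = perf_per_watt_gain(perf_ratio=20.0, power_ratio=1.6)
print(f"Implied performance per watt gain: {gain:.1f}X")

# Peak efficiency of the new machine: 70 petaflops at 1.6 megawatts.
peak_flops = 70e15
power_watts = 1.6e6
gigaflops_per_watt = peak_flops / power_watts / 1e9
print(f"Peak efficiency: {gigaflops_per_watt:.2f} gigaflops per watt")
```

Since the performance multiple is quoted as a floor, the real efficiency gain would be somewhat higher than this estimate.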

As part of the Adastra deal, AMD is working with GENCI and CINES to port GPU-accelerated applications to the ROCm programming environment, which includes HIP, AMD’s clone of Nvidia’s CUDA, and OpenMP parallel threading for CPUs and GPUs.
