SPEC – the Standard Performance Evaluation Corporation – today introduced its latest benchmark suite, SPEChpc 2021, which is intended to measure "intense compute parallel performance across one or more nodes". SPEC was founded in 1988 and maintains many benchmarks; one of the best known is SPEC CPU 2017, used to benchmark CPU performance. With the advent of heterogeneous architectures with multiple accelerators, the addition of the SPEChpc benchmark is timely.
The new benchmark suite has been in the works for about four years, Mat Colgrove said in an HPCwire briefing. Colgrove heads the SPEC High Performance Group (HPG) and is also a developer and technical engineer at Nvidia.
"Our group's mission is to study high performance computing, and we have published various benchmarks over the years. We mainly focus on programming models. Our still-active benchmarks are SPEC MPI 2007 and SPEC OMP 2012," said Colgrove, who was involved in developing the SPEC ACCEL benchmark, which measures accelerator performance, host CPU performance, memory transfer between host and accelerator, and library support, and tests drivers and compilers.
The new SPEChpc suite targets performance when workloads are offloaded to accelerators.
"We combined all of those [earlier] elements so that we could use a single benchmark, and do so in a hybrid [fashion] with several models. That gives you a different perspective. Now you can look at different heterogeneous and homogeneous architectures with the same code base. The parallel [programming] model can change, but the algorithm cannot. The idea was to have portable models that could be used by many different vendors and allow a fair comparison between different architectures," said Colgrove.
The value of any benchmark lies in understanding its details. The components of SPEChpc 2021 are shown in the two following figures.
Colgrove emphasized that SPEChpc is a "strong scaling" benchmark: "So we have a fixed-size workload. The problem with scaling is that we want to range from one or two nodes up to hundreds or even thousands of nodes. You can't do that with a single workload – the memory footprint won't fit. We decided to split it up into four workloads."
As can be seen in the images above, SPEChpc comprises four suites: Tiny and Small each contain nine benchmarks, while Medium and Large each contain six. The three benchmarks omitted from the Medium and Large suites contain constructs such as MPI_Allreduce that were not conducive to benchmarking at large scale. According to SPEC, the smaller benchmark count also helps lower the cost of running those suites on larger systems.
"While the benchmark source code is the same across suites, the workload size and memory requirements differ for each suite. Each suite is designed for clusters with different numbers of nodes and cores, and the user must determine the appropriate suite for their system. While SPEChpc does not impose a minimum or maximum number of ranks per suite, scaling efficiency declines as more ranks are used, since SPEChpc is strong scaling with fixed-size workloads. In some cases a benchmark can fail if too many ranks are used. The description of each suite below gives the rank counts that SPEC/HPG tested using a pure MPI run. Scaling and rank counts can differ when an additional node-level parallel model (OpenACC or OpenMP) is used," says SPEC.
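A minimal sketch, not part of SPEChpc itself, of why efficiency declines under strong scaling: with a fixed-size workload, Amdahl's law bounds the achievable speedup by the workload's serial fraction, so parallel efficiency falls as ranks are added. The serial fraction used below is a made-up example value.

```python
# Illustrative only: strong-scaling efficiency under Amdahl's law.
# The 2% serial fraction is a hypothetical example, not a SPEChpc figure.

def amdahl_speedup(ranks: int, serial_fraction: float) -> float:
    """Ideal speedup of a fixed-size workload run on `ranks` ranks."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / ranks)

def efficiency(ranks: int, serial_fraction: float) -> float:
    """Parallel efficiency: speedup divided by rank count."""
    return amdahl_speedup(ranks, serial_fraction) / ranks

if __name__ == "__main__":
    s = 0.02  # hypothetical 2% serial fraction
    for n in (1, 16, 256, 4096):
        print(f"{n:5d} ranks: efficiency {efficiency(n, s):.2f}")
```

The same fixed-size workload that fills a few nodes thus yields ever-smaller returns at thousands of ranks, which is why SPEChpc provides four workload sizes instead of one.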
It will likely take some time for the new SPEChpc 2021 benchmark to become widespread. SPEC benchmarks are primarily tools for vendors to represent their system performance in a standardized way, but can also be run by users to evaluate their internal systems. Here is an excerpt from today’s announcement:
"With the SPEChpc 2021 benchmark suites, developers and researchers can evaluate different programming models to assess which model is best suited to their application or system configuration. Hardware and software providers can use them to stress-test their solutions. And they can help compiler vendors improve overall code performance and support for directive-based programming models. The new suites can also be used by data center operators and other end users to make procurement decisions.
"Building on our experience developing the SPEC MPI 2007, SPEC OMP 2012, and SPEC ACCEL benchmark suites, SPEC has developed a new set of benchmark suites that keeps pace with the rapidly evolving HPC market," said Ron Lieberman, chair of the SPEC High Performance Group (HPG). "The high portability of the SPEChpc 2021 benchmark suites, together with a strict result-review process and an extensive SPEC results repository, enables us to provide vendor-neutral performance comparisons for the evaluation and study of modern HPC platforms."
Broadly speaking, SPEC licenses its test suites for use by companies and organizations. Those that run "compliant" (unmodified) tests are then free to publish the results. SPEC encourages organizations that run benchmarks to post them to the lists maintained on its website, but this is not required.
To date, 76 SPEChpc results have been listed, distributed across the four suites (Tiny, Small, Medium, Large). The hope is that more will soon be added by vendors and users showcasing their systems.
Colgrove recommends taking care when reviewing the results. "We're distilling down to a single metric number, but what drives that number may not be immediately apparent. I [may have] posted a world-record number, but what makes up that [result]? For example, how many nodes were used? The top Tiny result here is from TACC (the Texas Advanced Computing Center, with three entries), and they came in at 78. That's an impressive number, but they also used 32 nodes, 64 ranks, and 20 OpenMP threads. It's a big system. I encourage people to look at the details and understand them well. What do I want to compare? Do I want to look at a smaller system? If I'm a buyer looking at this and I want to understand it, I have to scale it up or down," he said.
The actual SPEChpc score is the ratio of the tested system's performance to that of a reference system, the Taurus system at TU Dresden. The SPEChpc score is calculated as follows (from the SPEC website):
- For the chosen suite (Tiny, Small, Medium, or Large), the elapsed time in seconds is recorded for each of its benchmark runs.
- For each run, the ratio of the reference system's time (the Taurus system at TU Dresden) to the corresponding measured time is computed.
- The median (or, for an even number of runs, the lower median) of these ratios is taken per benchmark, separately for base and peak.
- The "base" metric is the geometric mean of the medians of the base ratios, and the "peak" metric is the geometric mean of the medians of the peak ratios.
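The steps above can be sketched in a few lines of Python. This is an illustration of the arithmetic, not SPEC's official tooling; the timings below are made-up example data (LBM and SOMA are benchmark names from the suite, but these numbers are invented).

```python
# Illustrative sketch of the SPEChpc metric arithmetic described above.
from math import prod
from statistics import median_low

def spechpc_metric(ref_times: dict, runs: dict) -> float:
    """Geometric mean over benchmarks of the per-benchmark lower median
    of ratios (reference time / measured time)."""
    per_bench_medians = []
    for bench, times in runs.items():
        ratios = [ref_times[bench] / t for t in times]
        per_bench_medians.append(median_low(ratios))
    return prod(per_bench_medians) ** (1.0 / len(per_bench_medians))

# Hypothetical reference (Taurus) times and measured times, in seconds.
ref = {"lbm": 2000.0, "soma": 1500.0}
measured = {"lbm": [1000.0, 990.0, 1010.0],
            "soma": [750.0, 760.0, 740.0]}
print(spechpc_metric(ref, measured))  # → 2.0: twice as fast as reference
```

A score of 2.0 here simply means the hypothetical system ran the workloads twice as fast as the reference system; the geometric mean keeps any single benchmark from dominating the composite.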
Results for a number of systems are currently listed, with Lenovo submitting the majority, but there are also results for TACC (Frontera) and Oak Ridge (Summit). It is best to look at the results directly, bearing in mind that this is the first set of SPEChpc scores published.
"We very much hope that it will have an impact. This is our first major attempt at a large-scale hybrid benchmark, so to speak. We want to develop in this direction in the future, get feedback, and attract more people," said Colgrove.
Link to the SPEChpc 2021 results: https://www.spec.org/hpc2021/results/hpc2021.html
Link to the SPEChpc 2021 overview: https://www.spec.org/hpc2021/Docs/overview.html#Q20
Link to the SPEChpc 2021 run rules: https://www.spec.org/hpc2021/Docs/runrules.html