Folks running Google Cloud can see the tides of HPC changing and know that, as we discussed just a few months ago, there’s a pretty good chance that more HPC workloads will shift to the cloud builders over time, as their sheer size increasingly dictates future chip designs, system designs, and processing economics.
Google also knows it needs to do more to grab market share from its larger rivals — top dog Amazon Web Services and silver medalist Microsoft Azure. That’s why the company has introduced a new open-source toolkit to help HPC shops create clusters for simulation and modeling that are repeatable yet flexible.
Dubbed the Cloud HPC Toolkit — a name that likely saved Google Cloud’s marketing department quite a bit of money — the system software features a modular design that allows users to create anything from simple to advanced clusters, including the ability to compose and decompose disaggregated resources as requirements change, a capability known as composability that is gaining importance in the HPC sector.
This is what the components of the Cloud HPC Toolkit look like:
Google Cloud believes that most users want to start with the toolkit’s various predefined blueprints for infrastructure and software configurations that come in handy for HPC environments. But for those who have their own configuration settings, these blueprints can be modified by changing a few lines of text in configuration files.
These blueprints support a variety of building blocks needed to create an HPC environment, from compute and storage to networking and schedulers. On the compute side, this includes all of Google Cloud’s virtual machines, as well as its GPU-based instances and its HPC VM image, which is based on the CentOS flavor of Red Hat Enterprise Linux. For storage, the toolkit supports Intel’s DAOS system and DDN’s Lustre-based EXAScaler system, as well as Filestore, local SSDs, and persistent disk storage on Google Cloud. Additionally, blueprints can be configured to run on a 100 Gbps network using Google Cloud’s placement policies to lower latency between VMs.
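To give a sense of how these building blocks fit together, here is a minimal sketch of what a blueprint might look like. The blueprint name, project ID, and module sources below are illustrative assumptions for this article, not an exact reproduction of the toolkit’s shipped examples or schema:

```yaml
# Hypothetical blueprint sketch: names and module sources are
# illustrative, not copied verbatim from the toolkit.
blueprint_name: hpc-cluster-demo
vars:
  project_id: my-gcp-project    # assumed project ID
  deployment_name: hpc-demo
  region: us-central1
  zone: us-central1-c

deployment_groups:
- group: primary
  modules:
  # Networking building block: a VPC for the cluster
  - id: network1
    source: modules/network/vpc
  # Storage building block: a Filestore share mounted at /home
  - id: homefs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home
```

Tailoring such a blueprint to a different environment really is a matter of changing a few lines of text — swapping the region, say, or the storage module and its mount point.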
However, only one scheduler is currently available in the toolkit: Slurm. Given that Google Cloud itself supports Altair’s PBS Pro and Grid Engine schedulers and IBM’s Spectrum LSF alongside Slurm, it seems reasonable that the Cloud HPC Toolkit will eventually add these as well.
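For readers who have not used Slurm, once a cluster is up, work is submitted to the scheduler as batch scripts. The sketch below is a generic Slurm job script, not anything specific to Google Cloud; the partition name and solver binary are made-up placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=cfd-sim        # job name shown in the queue
#SBATCH --partition=compute       # partition names are cluster-specific
#SBATCH --nodes=4                 # number of VMs to allocate
#SBATCH --ntasks-per-node=30      # MPI ranks per node
#SBATCH --time=02:00:00           # wall-clock limit

# Launch a (hypothetical) MPI solver across the allocated nodes.
srun ./my_solver input.dat
```

Because Slurm scripts like this are portable across on-premises and cloud clusters, supporting Slurm first gives HPC shops an easy on-ramp for existing workloads.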
Both Intel and AMD have thrown their support behind the Cloud HPC Toolkit, but the former — which is currently trying to catch up with the latter in making faster and better processors — is particularly keen to use Google Cloud’s latest HPC offering as a showcase for the chip maker’s growing investment in software, particularly on the HPC side.
Among the blueprints in Google Cloud’s new toolkit is a pre-defined configuration of hardware and software for simulation and modeling workloads from Intel itself, advertised under the Intel Select Solutions brand. Whatever happened behind the scenes between Google Cloud and Intel, the cloud builder made sure to promote Intel’s simulation and modeling blueprint as the only detailed example in its blog post announcing the toolkit.
A key part of Intel’s simulation and modeling plan is the company’s oneAPI toolkit, the cross-platform parallel programming model that aims to simplify development across a wide range of computing engines, including those of Intel’s competitors.
In a statement, Intel said access to oneAPI and its HPC-focused toolkit can help optimize performance for simulation and modeling workloads by improving compile times, accelerating results, and allowing users to target Intel and competitor chips through SYCL, the royalty-free, cross-architecture programming abstraction layer underlying oneAPI’s Data Parallel C++ language.
Intel and its competitors know that the real gold in the semiconductor industry lies with the cloud builders and hyperscalers, so we wouldn’t be surprised to see more and more HPC software announcements of this nature in the cloud world, with Intel pushing oneAPI, AMD pushing its open ROCm platform, and Nvidia finding new ways to extend its CUDA software.