HPE launches ML development system, swarm learning solution

In a pair of new AI announcements, Hewlett Packard Enterprise (HPE) today introduced its Machine Learning Development System (MLDS) and its Swarm Learning solution. Both aim to ease the burden of AI development in an environment that increasingly involves large amounts of proprietary data and specialized hardware.

MLDS from HPE

HPE presents the Machine Learning Development System as an end-to-end solution purpose-built for AI, spanning software to hardware. The origins of the MLDS date back almost a year to HPE’s acquisition of Determined AI, the developer of a software stack for training AI models faster and at scale. At the time, HPE promised to combine Determined AI’s software with its own AI and HPC offerings, and with the MLDS it is now doing just that.

The MLDS offers a complete software and services stack, including a training platform (the HPE Machine Learning Development Environment), container management (Docker), cluster management (HPE Cluster Manager), and Red Hat Enterprise Linux.
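The HPE Machine Learning Development Environment at the heart of that stack is built on Determined AI’s open-source training software. As a rough sketch of what that looks like from a developer’s perspective, the outline below shows a toy trial written against Determined’s open-source PyTorch API; the model, dataset and hyperparameter names are illustrative placeholders, not part of HPE’s announcement.

```python
# A minimal, illustrative trial for the open-source Determined AI software that
# the HPE Machine Learning Development Environment builds on. The model, random
# data, and hyperparameter names ("hidden_size", "lr") are placeholder
# assumptions; real values come from an experiment config that is not shown.
import torch
from torch import nn
from determined.pytorch import DataLoader, PyTorchTrial, PyTorchTrialContext


class ToyTrial(PyTorchTrial):
    def __init__(self, context: PyTorchTrialContext) -> None:
        self.context = context
        hidden = self.context.get_hparam("hidden_size")
        # The platform handles device placement and distributed wrapping.
        self.model = self.context.wrap_model(
            nn.Sequential(nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, 2))
        )
        self.optimizer = self.context.wrap_optimizer(
            torch.optim.Adam(self.model.parameters(), lr=self.context.get_hparam("lr"))
        )

    def build_training_data_loader(self) -> DataLoader:
        # Placeholder random data; a real trial would read from shared storage.
        ds = torch.utils.data.TensorDataset(
            torch.randn(1024, 32), torch.randint(0, 2, (1024,))
        )
        return DataLoader(ds, batch_size=self.context.get_per_slot_batch_size())

    def build_validation_data_loader(self) -> DataLoader:
        ds = torch.utils.data.TensorDataset(
            torch.randn(256, 32), torch.randint(0, 2, (256,))
        )
        return DataLoader(ds, batch_size=self.context.get_per_slot_batch_size())

    def train_batch(self, batch, epoch_idx: int, batch_idx: int):
        data, labels = batch
        loss = nn.functional.cross_entropy(self.model(data), labels)
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"train_loss": loss}

    def evaluate_batch(self, batch):
        data, labels = batch
        loss = nn.functional.cross_entropy(self.model(data), labels)
        return {"val_loss": loss}
```

In the open-source software, scaling such a trial from one GPU to many is primarily a matter of the experiment configuration (for example, the number of slots per trial) rather than a change to this code, which is the kind of infrastructure burden HPE says the MLDS is meant to absorb.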

“Then you have a set of hardware that’s based on InfiniBand, utilizing four Apollo 6500s, each with eight 80GB Nvidia A100 GPUs – that’s what’s releasing today,” said Justin Hotard, executive vice president and general manager of HPC and AI at HPE, in a pre-briefing for press and analysts. “It combines the GPUs with service nodes and the storage option in the form of the HPE Parallel File System Storage solution, and then connects to the enterprise through the Aruba 6300 switch.” This, he said, is the most basic configuration of the MLDS.

An overview of the MLDS. Image courtesy of HPE.

“This came to HPE as part of the Determined AI acquisition, and this product – the Machine Learning Development System – is truly an opportunity to combine the best of that software with the best hardware on the market and build it together as an appliance,” said Evan Sparks, vice president of HPC and AI at HPE. Sparks, the founder and CEO of Determined AI, added that it is an “all-in-one offering” that could not have been delivered as an independent software company.

“We’ve talked about this quite extensively since acquiring Determined AI last summer,” Hotard said. “Training these deep learning models is not only complex and time-consuming, but also very resource-intensive. What we’ve realized through acquiring Determined AI, and with many clients we’ve met before and after, is that many of the engineers really spend their time managing the infrastructure. They deal with a lot of the technical intricacies of the infrastructure instead of focusing on their models and optimizing them at scale.”

“We see that customers – similar to HPC – need specialized infrastructure,” he continued. “There’s been a massive explosion of companies building accelerators and looking for different layers of optimization, and that actually makes it harder for the individual data scientist and engineer, because it just adds complexity when they’re working on their solution.” Hotard said that these specific, rigid solutions are costly and often ineffective at scale. HPE, he said, wants to give developers flexibility in where they build, train, and deploy their models. With the MLDS, he argued, “they can focus their expertise on large-scale model development and training, and on business outcomes and business value.”

“What we’ve seen is that this platform actually delivers faster performance compared to other systems on the market today, especially when running an NLP workload, and even faster – and slightly more effectively – when running a computer vision workload,” Hotard detailed.

As an example, HPE cited a pilot customer: Aleph Alpha, a European natural language processing start-up that needed a solution enabling different levels of parallelism and scale. Aleph Alpha deployed 16x the base configuration of the MLDS, for a total of 64 HPE Apollo 6500 systems. HPE reported that the systems were set up in “just a few days,” that training started within two days, and that Aleph Alpha quickly achieved faster results.
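For a sense of scale, here is a quick back-of-the-envelope tally based only on the figures quoted above (four Apollo 6500 nodes with eight 80GB A100s each in the base configuration, and Aleph Alpha’s 16x deployment); shipping configurations may of course differ.

```python
# Back-of-the-envelope totals implied by the configurations quoted in this article.
GPUS_PER_NODE = 8          # NVIDIA A100 GPUs per Apollo 6500 in the quoted config
GPU_MEMORY_GB = 80         # memory per A100 in the quoted config
BASE_NODES = 4             # nodes in the most basic MLDS configuration
ALEPH_ALPHA_SCALE = 16     # Aleph Alpha deployed 16x the base configuration

for label, nodes in [("base MLDS", BASE_NODES),
                     ("Aleph Alpha", BASE_NODES * ALEPH_ALPHA_SCALE)]:
    gpus = nodes * GPUS_PER_NODE
    memory_tb = gpus * GPU_MEMORY_GB / 1024
    print(f"{label}: {nodes} nodes, {gpus} GPUs, {memory_tb:.1f} TB aggregate GPU memory")

# base MLDS: 4 nodes, 32 GPUs, 2.5 TB aggregate GPU memory
# Aleph Alpha: 64 nodes, 512 GPUs, 40.0 TB aggregate GPU memory
```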

“We are seeing amazing efficiency and performance of more than 150 teraflops using the HPE Machine Learning Development System,” said Jonas Andrulis, founder and CEO of Aleph Alpha. “The system was quick to set up and we started training our models in hours instead of weeks. When running these massive workloads combined with our ongoing research, having the confidence of an integrated solution for deployment and monitoring makes all the difference.”

The MLDS is now available worldwide.

Swarm learning from HPE

Hotard said that “the way things work today” tends to focus on collecting data and bringing model training and model development together in one central place. “Realistically, in many cases, this data is generated and collected at the edge,” he said. “In some cases, moving this data from the edge to the core has compliance implications, with GDPR and other regulations, and it’s not trivial to just move everything to one central location.”

This, he said, amplifies the cost and complexity of moving that data at the technical and infrastructure levels, and as data becomes increasingly federated across cloud services and geographies, the problems are only going to get worse. “What we’re trying to eliminate,” he said, “is the reliance on data centralization and consolidation.”

Enter Swarm Learning, HPE’s answer to edge ML that allows models to be trained locally across a swarm of edge or distributed systems that share their insights but not their data. Swarm learning, according to HPE, effectively consists of a set of APIs and is purely software-based – and can also be integrated with the MLDS.
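HPE has not detailed the internals of those APIs here, but the core idea it describes (train locally, share model learnings rather than data) can be illustrated with a toy parameter-averaging round. The sketch below is a conceptual illustration only, not HPE’s Swarm Learning API, and it leaves out the peer coordination, licensing and security machinery a production system would need.

```python
# Conceptual sketch of the swarm-learning idea described above: each node trains
# on its own local data, and only model parameters are exchanged and merged.
# This is NOT HPE's Swarm Learning API; it is a toy parameter-averaging round
# using NumPy to show that raw data never leaves a node.
import numpy as np

rng = np.random.default_rng(0)


def local_train(weights: np.ndarray, local_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One step of a toy least-squares update using only this node's data."""
    X, y = local_data[:, :-1], local_data[:, -1]
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad


def merge(weight_sets: list) -> np.ndarray:
    """Merge step: average the parameters shared by the peers (no data shared)."""
    return np.mean(weight_sets, axis=0)


# Three nodes (e.g. institutions in different jurisdictions) with private data.
private_datasets = [rng.normal(size=(100, 5)) for _ in range(3)]
weights = np.zeros(4)

for round_idx in range(10):
    # Each node updates a copy of the shared model on its own data...
    local_updates = [local_train(weights.copy(), data) for data in private_datasets]
    # ...and only the resulting parameters are pooled and averaged.
    weights = merge(local_updates)

print("merged weights after 10 swarm rounds:", weights)
```

In each round, only the weight vectors cross node boundaries; the per-node datasets, which stand in for the regulated data HPE mentions, never leave their owners.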

A basic diagram of HPE Swarm Learning. Image courtesy of HPE.

HPE identified a wide range of use cases for swarm learning, starting with healthcare. Given the severe limitations on sharing medical data between institutions, let alone between countries, HPE argued that this is a perfect sector for swarm learning, highlighting an example in which models could be trained on health data across different institutions and multiple countries “without violating GDPR, HIPAA or the Consumer Privacy Act.”

“We see a lot of applications in this space,” Hotard said. “We’re seeing applications in finance, we’re seeing applications in government and areas like climate and weather where there could be significant value in using swarm learning to accelerate insights without compromising the data itself.”

HPE Swarm Learning is also now available.
