GPUs are becoming a must-have in computing, so Nvidia is stepping up its work with standards bodies and open-source communities to bring into the mainstream technologies that were once largely exclusive to the company’s development tools.
In particular, a lot of work is being done around programming languages like C++ and Fortran, which lag behind in native support for running code across highly parallel systems.
The plan is to make generic computing environments and compilers more productive and accessible, Timothy Costa, group product manager for high performance computing and quantum computing at Nvidia, told The Register.
“Ultimately, our goal with the open source community and programming is to improve concurrency and parallelism for everyone. I say that because I mean CPUs and GPUs,” Costa said.
Many of the technologies being opened up and brought into the mainstream relate to Nvidia’s previous work on its CUDA framework for parallel programming, which combines open and proprietary libraries.
CUDA was introduced in 2007 as a set of programming tools and frameworks for programmers to write programs for GPUs. But Nvidia’s CUDA strategy changed as GPU usage expanded to more applications and sectors.
Nvidia is widely known for dominating the GPU market, but CUDA is at the heart of the company’s repositioning as a software and services provider targeting a $1 trillion market valuation.
The long-term goal is for Nvidia to become a full-stack vendor targeting specialized areas including autonomous driving, quantum computing, healthcare, robotics, and cybersecurity.
Nvidia has built specialized CUDA libraries for these areas and also provides the hardware and services that companies can use.
The full-stack strategy is best illustrated by the concept of an “AI factory” introduced by CEO Jensen Huang at the recent GPU Technology Conference. The concept is that customers can dump applications into Nvidia’s mega data centers, with the result being a customized AI model that meets specific industry or application needs.
Nvidia has two ways of monetizing concepts like the AI factory: through use of GPU capacity or through use of domain-specific CUDA libraries. Programmers can use open-source parallel programming frameworks such as OpenCL on Nvidia’s GPUs. But for those willing to invest, CUDA will provide that extra boost in the last mile, as it’s tuned to work closely with Nvidia’s GPUs.
Parallel for everyone
While parallel programming is widespread in HPC, Nvidia’s goal is to standardize it in mainstream computing. The company helps the community standardize best-in-class tools for writing parallel code that is portable across hardware platforms, regardless of brand, accelerator type, or parallel programming framework.
“Complexity can be measured as simply as lines of code. If you’re jumping between many different programming models, you’re going to have more lines of code,” Costa said.
For one, Nvidia participates in a C++ committee that is laying down the pipelines that orchestrate parallel execution of code that is portable across hardware. An execution context can be a CPU thread that primarily performs I/O, or a CPU or GPU thread that performs intensive computation. Nvidia is particularly active in providing the standard vocabulary and framework for asynchrony and concurrency that C++ programmers are asking for.
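To make the idea of separate contexts concrete, here is a minimal sketch of one I/O-style task and one compute-heavy task launched asynchronously and then combined. It uses C++11's std::async and std::future purely as a stand-in: this is not the committee framework described above, it runs on CPU threads only, and the function names count_lines, sum_of_squares, and run_both are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <sstream>
#include <string>

// Hypothetical I/O-style task: count the lines in a block of text.
// An in-memory stream stands in for a file or socket.
std::size_t count_lines(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
    }
    return n;
}

// Hypothetical compute-bound task: sum of i*i for i = 1..n.
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}

// Launch both tasks asynchronously, then wait for and combine the results.
// std::launch::async asks for each task to run on its own thread.
std::uint64_t run_both(const std::string& text, std::uint64_t n) {
    std::future<std::size_t> io = std::async(std::launch::async, count_lines, text);
    std::future<std::uint64_t> compute = std::async(std::launch::async, sum_of_squares, n);
    // get() blocks until the corresponding task has finished.
    return io.get() + compute.get();
}
```

The caller only says what should run and when the results are needed; where each task runs is left to the library, which is the kind of separation a standard framework for asynchrony would need to provide across CPUs and GPUs.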
“Every institution, every major player has a C++ and Fortran compiler, so it would be crazy not to. As the languages progress, we’re getting to a point where we have true open standards with performance portability across platforms,” Costa said.
“Of course, users can always optimize with a manufacturer-specific programming model that is tied to the hardware if they want to. I think we’ve arrived at a kind of mecca of productivity for end users and developers here,” Costa said.
Standardization at the language level will make parallel programming more accessible to programmers, which could eventually encourage the acceptance of open-source parallel programming frameworks like OpenCL, he said.
Of course, Nvidia’s own compiler will get the best performance and value out of its GPUs, but removing the hurdles to bring parallelism to language standards regardless of platform is important, Costa said.
“By focusing on the language standards, we ensure we have a real breadth of compilers and platform support for programming performance models,” he explained, adding that Nvidia has worked with the community for more than a decade on low-level language changes for concurrency.
Initial work revolved around the memory model that was included in C++11, which had to be extended as parallelism and concurrency began to gain traction. The memory model in C++11 focused on concurrent execution on multi-core chips, but lacked the hooks for parallel programming.
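For readers who have not used it, the C++11 memory model is what makes code like the following well defined. This is a minimal sketch using only the standard std::thread and std::atomic facilities; the helper names count_with_threads and publish_and_read are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one shared counter.
// The atomic read-modify-write guarantees that no increment is lost.
long count_with_threads(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering is enough here: only the final count matters.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // after join(), each worker's writes are visible to this thread
    }
    return counter.load();
}

// Hypothetical helper: one thread publishes plain data, another reads it.
// The release store paired with the acquire load orders the two accesses.
int publish_and_read(int value) {
    int payload = 0;  // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // write the data first
        ready.store(true, std::memory_order_release);  // then publish the flag
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // spin until the flag is published
        }
        seen = payload;  // guaranteed to observe the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Both helpers deal with a handful of CPU threads sharing memory. Nothing in them expresses "apply this operation to a million elements on whatever hardware is available", which is the kind of hook for parallel programming that C++11 lacked.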
The C++17 standard laid the foundation for higher-level parallelism features, but real portability comes in future standards. The current standard is C++20, and C++23 is coming soon.
“The great thing is that these pipelines have now been laid. If you look at the next iterations of the standard, you’ll see more and more user-centric and productive features pouring into these languages that are really powerful. Any hardware architecture in the CPU and GPU space will be able to take advantage of this,” Costa promised.
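As an illustration of what that C++17 foundation looks like in practice, the sketch below uses the standard parallel algorithms and execution policies from the `<execution>` header. The helper names dot_product and scale_in_place are hypothetical. Whether the work actually runs in parallel, and on which hardware, depends on the compiler and standard library in use, and some toolchains need an extra threading backend to be installed and linked.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: dot product of two equally sized vectors.
// std::execution::par permits the library to split the work across threads.
double dot_product(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());  // the algorithm reads b over the same range as a
    return std::transform_reduce(std::execution::par,
                                 a.begin(), a.end(), b.begin(), 0.0);
}

// Hypothetical helper: multiply every element by a factor, in place.
// par_unseq additionally permits vectorized execution within each thread.
void scale_in_place(std::vector<double>& v, double factor) {
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [factor](double& x) { x *= factor; });
}
```

The code names no threads, devices, or vendor APIs. The execution policy is only a permission, so the same source can be compiled by any conforming toolchain and mapped to whatever parallel hardware that toolchain supports.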