Supercomputing enables the analysis of huge atmospheric science data sets


February 9, 2022 — On March 1, 2002, European researchers gazed at the sky for the launch of the Envisat satellite mission. Envisat hosted nine instruments designed to gather information about Earth’s atmosphere in unprecedented detail. The satellite continued to send data back to Earth until 2012, when European Space Agency (ESA) staff lost contact and declared the official end of the mission.

One of the instruments on board Envisat was the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), which measured infrared radiation emitted by the Earth’s atmosphere. These measurements help scientists better understand the role of greenhouse gases in our atmosphere.

Because many gases found in the atmosphere have distinctive “fingerprints” in these high-resolution spectra, each set of spectra recorded at a given location allows researchers to obtain the vertical profiles of more than 30 key gases.

During each of the satellite’s 14 daily orbits, the instrument took measurements at up to 95 different locations, each consisting of 17 to 35 different spectra corresponding to altitudes of 5 to 70 km, but sometimes reaching as high as 170 km.

Monthly zonal mean temperature and volume mixing ratios (vmr) of several chemical species, calculated from spectra measured by the MIPAS instrument in September 2009. The abscissa gives the latitude in degrees, from -90 degrees (South Pole) to 90 degrees (North Pole), and the ordinate is the geometric height. vmr values are reported in parts per million by volume (ppmv) or parts per billion by volume (ppbv), while temperatures are reported in Kelvin. Image credit: Michael Kiefer/KIT

Taken together, a decade of compressed MIPAS spectral data fills approximately 10 terabytes. In order to analyze such a huge amount of data in a timely manner, researchers from the Karlsruhe Institute of Technology (KIT) and the Instituto de Astrofísica de Andalucía (IAA-CSIC) turned to the power of high-performance computing (HPC). The team worked with the High Performance Computing Center Stuttgart (HLRS) to securely store its large data set and used the center's supercomputing resources to model and analyze the MIPAS infrared spectra.
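The quoted figures allow a rough feel for the size of the archive. The short Python sketch below is only a back-of-envelope estimate built from the numbers mentioned above (14 orbits per day, up to 95 scan locations per orbit, 17 to 35 spectra per scan, and roughly 10 terabytes of compressed data over ten years); the derived per-spectrum size is simply what those figures imply, not a value reported by the MIPAS team.

```python
# Back-of-envelope estimate of the MIPAS data volume, using only the figures
# quoted in the article. Nothing here is an official MIPAS number beyond
# those inputs; the per-spectrum size is just what the inputs imply.
ORBITS_PER_DAY = 14
SCANS_PER_ORBIT = 95            # "up to 95 different locations"
SPECTRA_PER_SCAN = 35           # upper end of the 17-35 range
MISSION_YEARS = 10
ARCHIVE_TERABYTES = 10          # compressed archive size quoted above

spectra_per_day = ORBITS_PER_DAY * SCANS_PER_ORBIT * SPECTRA_PER_SCAN
total_spectra = spectra_per_day * 365 * MISSION_YEARS
bytes_per_spectrum = ARCHIVE_TERABYTES * 1e12 / total_spectra

print(f"Spectra per day (upper bound): {spectra_per_day:,}")
print(f"Spectra over the mission (upper bound): {total_spectra:,}")
print(f"Implied compressed size per spectrum: ~{bytes_per_spectrum / 1e3:.0f} kB")
```

Even this crude estimate lands in the range of a hundred million spectra, which is why storage and data handling, not just raw compute, shaped the collaboration with HLRS.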

“Using the HLRS supercomputer, we were able to quickly and thoroughly assemble a complete dataset for our 10-year time series,” said Dr. Michael Kiefer, researcher at KIT and project manager. “We could try to do this work on cluster computers, but it would take years of processing before we got a result. With HPC we can quickly view our results and extract a greater variety of chemical species from the measured spectra. This is not only a quantitative improvement, but also a qualitative one.”

Combing through the ever-growing flood of data

Scientific computing has entered a new phase of development in recent years. For decades, advances in computing were rooted in the idea of Moore’s Law – a prediction by Gordon Moore that the shrinking size of transistors would allow the number of transistors on a computer chip to double every two years. This, he proposed in 1965, would lead to a near-exponential increase in computing power over the coming decades. While Moore was right for several decades, the last 10 years have brought that trend to an end.

As it turns out, though, raw computing power is not what many researchers need most anyway. Today, solving scientific challenges is often limited not by processing speed but by the need to efficiently transfer, analyze, and store large data sets.

This applies to the MIPAS researchers, who must process and analyze a decade of data tracking temperature and 36 different chemical species. The team’s work is further complicated by the complex relationships that trace gases have with one another at higher altitudes – researchers need to map the interplay of temperature, radiation, and the concentrations of other chemical species. As a result, the team has to perform computationally intensive non-local thermodynamic equilibrium (NLTE) calculations for these species. Water vapor alone took about a million core hours for these calculations, and the team had to model nine species using NLTE methods.
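To see why the team describes these retrievals as computationally intensive, the sketch below scales the one-million-core-hour figure quoted for water vapor to all nine NLTE species and compares two machine sizes. The per-species cost and the core counts are illustrative assumptions, not measured values from the project or from HLRS.

```python
# Illustrative scaling of the NLTE retrieval cost. Only the one-million-
# core-hour figure for water vapor comes from the article; assuming the
# other NLTE species cost a similar amount is a simplification, and the
# core counts below are hypothetical machine sizes, not HLRS figures.
CORE_HOURS_PER_SPECIES = 1_000_000   # reported cost for water vapor
NLTE_SPECIES = 9

total_core_hours = CORE_HOURS_PER_SPECIES * NLTE_SPECIES

for label, cores in [("modest local cluster", 512), ("large HPC allocation", 50_000)]:
    wall_days = total_core_hours / cores / 24
    print(f"{label}: ~{wall_days:,.0f} days of wall time "
          f"on {cores:,} cores (assuming perfect scaling)")
```

Under these assumptions, the same workload that would occupy a small cluster for roughly two years fits into about a week on a large HPC allocation, which matches the researchers' description of the difference access to HLRS made.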

While these calculations could be performed individually on more modest computing resources, they would take far too long overall without access to HPC resources. “At the beginning of the project, we were working on a local cluster to process a single month of data from the upper atmosphere, where this complex NLTE treatment is required,” said Dr. Bernd Funke, senior researcher at the IAA and collaborator on the project. “To get results for a month of data, the calculation could take almost a month. Now we can do these things in two or three nights. From a scientific point of view, this quick access to the data is extremely valuable.”

Both Kiefer and Funke stated that HLRS’s computing resources — as well as the ability to store their data on HLRS’s fast and secure high-performance storage system — enabled the team to quickly analyze their data.

Next-generation experimental techniques are driving the need for next-generation computers

As the researchers finalize their analysis of the MIPAS dataset, they anticipate that there will be new mid-infrared space missions in the near future. Given the massive data sets they expect from these missions, HPC centers like HLRS will continue to play an important role in hosting, processing, and analyzing the data.

Future missions, such as the Earth Explorer candidate mission CAIRT, recently selected by ESA for pre-feasibility studies, will use imaging techniques that increase the number of measurements per orbit and add two extra dimensions to the data. Not only will this give researchers an even more detailed view of atmospheric composition and processes, but it will also significantly increase the complexity and volume of data analysis required. The researchers estimate that one of the planned instruments could lead to an up to 1,000-fold increase in the data points collected.
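For a sense of scale, the snippet below extrapolates the MIPAS archive size by the quoted factor. It assumes, purely for illustration, that data volume grows roughly in proportion to the number of data points; the article only quantifies the latter, so the resulting figure is a hypothetical order-of-magnitude estimate, not a mission specification.

```python
# Purely illustrative: scales the ~10 TB MIPAS archive by the up-to-1,000-fold
# increase in data points quoted for a future imaging instrument, assuming
# data volume grows roughly in proportion to the number of data points.
MIPAS_DECADE_TB = 10
DATA_POINT_FACTOR = 1_000

print(f"Hypothetical future archive: ~{MIPAS_DECADE_TB * DATA_POINT_FACTOR / 1_000:.0f} PB")
```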

The team also noted that HLRS was quick to embrace the team's relatively “non-traditional” need for supercomputing resources. The ongoing shift towards even more data-centric HPC applications in the sciences underscores the need for HPC centers to provide a range of tools for data storage and management.

The entire MIPAS dataset of chemical species in the atmosphere can be accessed at www.imk-asf.kit.edu/english/308.php

Funding for Hazel Hen was provided by the Ministry of Science, Research and the Arts Baden-Württemberg and the Federal Ministry of Education and Research via the Gauss Centre for Supercomputing (GCS).


Source: Eric Gedenk, High Performance Computing Center Stuttgart
