Data-centric cyber infrastructures for academic ultra-clean scientific laboratories

© Dmitrii Melnikov

Klara Nahrstedt, Professor and Director of the Coordinated Science Laboratory at the University of Illinois, Urbana-Champaign, and her team investigate how data-centric cyber infrastructures in academic ultra-clean scientific laboratories help accelerate next-generation inventions.

Academic ultra-clean scientific laboratories (cleanrooms) are highly complex environments composed of various scientific instruments. They (a) enable scientists to discover new materials, new semiconductor devices, and other advances that are crucial to society; and (b) perform a variety of roles in an academic setting, from educating students to conducting public relations to serving as a frontier for discovery.

Figure 1: The highest-resolution electron beam recorder in North America, housed in an ultra-clean scientific laboratory setting in the Holonyak Micro-and-Nanotechnology Laboratory

As Figure 1 shows, the scientific instruments are being digitized to a high degree, generating and collecting immense amounts of data locally. In these cleanrooms, however, there is often a lack of digital tools and cyber infrastructures that help with the provenance, search and management of that data.

In addition, academic cleanrooms lack the ability to provide scientists and laboratory managers with situational awareness of the space around instruments, which can lead to errors in experiments, insights, and ultimately discoveries [1]. Hence, it is of great importance to invest in the development and use of data-related tools and cyber infrastructures that accelerate scientific discoveries in these laboratories.

Achieve cost-effective data collection

At the University of Illinois, Urbana-Champaign (UIUC), we research, develop and implement data-centric cyber infrastructures to achieve cost-effective data collection, processing, tracking and management in the academic environment. Our main data-centric cyber infrastructure services, which are discussed below in terms of their features and insights, are 4CeeD, ProvLet and SENSELET.

A distributed data-centric cyber infrastructure (CI) enables the timely and trustworthy curation, coordination and storage of data generated by scientific instruments in cleanrooms, such as scanning electron microscopes (SEM) and transmission electron microscopes (TEM). We offer the 4CeeD service [2] for this purpose. The data is uploaded from microscopes to a remote private cloud, where it is indexed and stored in the MongoDB-based Clowder data management system [3] and prepared for future access, queries, analyses and visualizations.
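To make the idea of indexed instrument data more concrete, here is a minimal sketch of how one uploaded file could be described by a metadata document and grouped for later queries. It is an illustration only and not the 4CeeD or Clowder implementation: the field names, the `build_record` and `index_by` helpers, the sample file names and the in-memory index are all assumptions made for this example.

```python
import hashlib
from datetime import datetime, timezone


def build_record(filename, payload, instrument, operator, created_at):
    """Describe one uploaded instrument file as a JSON-like document.

    All field names are hypothetical; a real system defines its own schema.
    """
    return {
        "filename": filename,
        "instrument": instrument,            # e.g. "SEM" or "TEM"
        "operator": operator,
        "created_at": created_at.isoformat(),
        "size_bytes": len(payload),
        # A content hash lets later users check that the file is unchanged.
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def index_by(records, field):
    """Group records by one metadata field to support simple lookups."""
    index = {}
    for record in records:
        index.setdefault(record[field], []).append(record)
    return index


# Two made-up uploads with placeholder byte content.
records = [
    build_record("sample_a.tif", b"\x00\x01", "SEM", "alice",
                 datetime(2020, 9, 1, 10, 0, tzinfo=timezone.utc)),
    build_record("sample_b.tif", b"\x02\x03\x04", "TEM", "bob",
                 datetime(2020, 9, 1, 11, 0, tzinfo=timezone.utc)),
]
by_instrument = index_by(records, "instrument")
```

In this sketch, `by_instrument["SEM"]` returns every record that was produced on an SEM. A document database would keep such documents persistently and maintain its indexes itself.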

4CeeD then enables scientists to access the data and metadata of their instruments via a web interface. The video in Figure 2 shows the 4CeeD curator and coordinator interfaces during the real-time upload process at the microscope site and during the visualization of indexed, queried and analyzed data sets.

Through the use of 4CeeD in the Holonyak Micro-and-Nanotechnology Laboratory (HMNTL) and in the Materials Research Lab (MRL), we have gained insights into practical pain points and found that using open source software to build data-centric cyber infrastructures for cleanrooms enables sustainability. This is important because scientific instruments last 10-20 years, while software ages much faster. Therefore, cyber components need to be constantly updated to keep up with security and performance requirements.

Figure 2: Data-centric tools for ultra-clean science laboratories

Provenance tracking for 4CeeD is performed by ProvLet, which identifies and tracks who created cleanroom scientific data, when it was created, and where it was created. This is of great importance because provenance information can be used for validation of scientific experiments, verification of scientific results, and forensic analysis in disputes over scientific data and discoveries.

ProvLet monitors the generation, access, curation and manipulation of data within 4CeeD’s Clowder system in order to efficiently display provenance data using graph structures and to visualize results from auditing or forensic queries. The insights from ProvLet are to pay attention to the size of provenance logs and to provide a multi-level visualization function that displays only the necessary information (see video in Figure 2).
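The who, when and where of a data item, together with a graph of which items were derived from which, can be sketched in a few lines. The following is a hedged illustration and not ProvLet's actual design: the `ProvenanceLog` class, its method names, the event fields and the sample entries are hypothetical and chosen only to show how a provenance graph can answer a lineage query.

```python
from collections import defaultdict


class ProvenanceLog:
    """Toy provenance store: an event list plus a derivation graph."""

    def __init__(self):
        self.events = []                    # every recorded action
        self.parents = defaultdict(list)    # derived item -> source items

    def record(self, item, action, who, when, where, derived_from=()):
        """Log one action on an item and link it to its source items."""
        self.events.append(
            {"item": item, "action": action,
             "who": who, "when": when, "where": where}
        )
        for source in derived_from:
            self.parents[item].append(source)

    def lineage(self, item):
        """Return all ancestors of an item, nearest first (breadth-first)."""
        seen, order = set(), []
        queue = list(self.parents.get(item, []))
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            queue.extend(self.parents.get(current, []))
        return order

    def history(self, item):
        """Return the recorded events for a single item."""
        return [event for event in self.events if event["item"] == item]


# Made-up example: a raw image, a cropped copy and a figure built from it.
log = ProvenanceLog()
log.record("raw.tif", "create", "alice", "2020-09-01T10:00Z", "SEM-1")
log.record("cropped.tif", "derive", "bob", "2020-09-01T11:00Z",
           "workstation-2", derived_from=["raw.tif"])
log.record("figure.png", "derive", "bob", "2020-09-02T09:00Z",
           "workstation-2", derived_from=["cropped.tif"])
ancestors = log.lineage("figure.png")
```

Here `ancestors` lists the items that `figure.png` depends on, which is the kind of question an auditing or forensic query asks. Returning only the nearest ancestors first is one simple way to show a summary before the full log.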

A sensor network cyber infrastructure in cleanrooms enables external sensors, such as temperature and humidity sensors, to monitor the microclimate around scientific instruments. For scientists, this external sensor data is important for nanofabrication and, overall, helps ensure the accuracy of experiments in cleanrooms. Our SENSELET cyber infrastructure [4] collects sensor data and forwards it to wireless edge devices and the private campus cloud for further analysis, visualization and correlation with 4CeeD microscopy data, as shown in Figure 3.

The insights from SENSELET emphasize the correlation of environmental data with internal instrument data, which enables increased accuracy and precision of scientific experiments. They also emphasize the use of sensor data for the maintenance and safety of cleanrooms, which is especially important in academic scientific environments.
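One simple way to correlate environmental data with instrument data is to attach to each microscope image the sensor reading taken closest in time. The sketch below illustrates that idea under stated assumptions and is not SENSELET's pipeline: the function names, the `temperature_c` field, the use of plain seconds as timestamps, the 30-second tolerance and the sample values are all hypothetical.

```python
from bisect import bisect_left


def nearest_reading(readings, t, max_gap):
    """Return the (timestamp, value) pair closest to time t, or None.

    readings must be sorted by timestamp; None is returned when the
    closest reading is more than max_gap seconds away from t.
    """
    times = [ts for ts, _ in readings]
    i = bisect_left(times, t)
    # Only the neighbours on either side of t can be the closest reading.
    candidates = readings[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda reading: abs(reading[0] - t))
    return best if abs(best[0] - t) <= max_gap else None


def annotate_images(images, readings, max_gap=30):
    """Attach the nearest temperature reading to each (name, time) image."""
    annotated = []
    for name, t in images:
        match = nearest_reading(readings, t, max_gap)
        annotated.append(
            {"image": name,
             "temperature_c": match[1] if match else None}
        )
    return annotated


# Made-up data: one temperature reading per minute and two images.
readings = [(0, 21.0), (60, 21.4), (120, 22.1)]
images = [("img_001.tif", 70), ("img_002.tif", 500)]
annotated = annotate_images(images, readings)
```

In this example the first image is matched to the reading taken at 60 seconds, while the second image has no reading within the tolerance and is left without a temperature. Refusing to match across a large gap keeps stale environmental values from being attributed to an experiment.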

Figure 3: Data-centric cyber infrastructure framework

Overall, given the increasing challenges facing society, be it from climate change, pandemics or rapid technological change, scientific discoveries are needed to solve them. Data-centric cyber infrastructures in academic ultra-clean scientific laboratories contribute greatly to accelerating the next-generation inventions and innovations our society needs.

Acknowledgments: This work was supported by NSF grants ACI 1443013, ACI 1827126 and ACI 1835834. All results and opinions are our own and do not represent the views of the National Science Foundation.

Staff: B. Tian, H. Moeini, P. Su, R. Kaufman, S. Konstanty, T. Nicholson, Z. Yang, R. Jain, J. Dallesasse, M. McCollum, G. Pezzarossi, McHenry, T. Smith, P. Braun


[1] J. M. Dallesasse, N. El-Zein, N. Holonyak Jr., K. C. Hsieh, “Environmental Degradation of AlxGa1-xAs-GaAs Quantum-well Heterostructures”, Journal of Applied Physics 68, 2235, August 1990.

[2] P. Nguyen, S. Konstanty, T. Nicholson, T. O’Brien, A. Schwartz-Duval, T. Spila, K. Nahrstedt, R. H. Campbell, I. Gupta, M. Chan, K. McHenry and N. Paquin, “4CeeD: Real-time acquisition and analysis framework for material-related cyber-physical environments”, 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2017, DOI: 10.1109/CCGRID.2017.51

[3] L. Marini, I. Gutierrez-Polo, R. Kooper, S. P. Satheesan, Burnette, J. Lee, T. Nicholson, Y. Zhao and K. McHenry, “Clowder: Open Source Data Management for Long Tail Data”, ACM Practice and Experience in Advanced Research Computing (PEARC ’18). DOI:

[4] K. Nahrstedt, Z. Yang, T. Yu, P. Su, R. Kaufman, I. Shah, Konstanty, M. McCollum, J. Dallesasse, “Senselet: Distributed sensing infrastructure to improve process control and security in academic cleanroom environments”, ACM GetMobile, Vol. 24, No. 2, September 2020.

Please note: this is a commercial profile

© 2019. This work is licensed under a CC BY 4.0 license.
