Natural language processing is a branch of artificial intelligence that aims to enable computers to process, understand, interpret and manipulate human language. After decades of research, the current state-of-the-art NLP solutions are based on deep learning models built with different types of neural networks.
“Among the deep learning models, the transformer-based models implemented using a self-attention mechanism such as BERT and GPT are the current state-of-the-art solutions,” said Yonghui Wu, director of NLP at the Clinical and Translational Science Institute at Gainesville-based University of Florida Health and Assistant Professor in the Department of Health Outcomes and Biomedical Informatics at the University of Florida.
“The transformer-based NLP models break the training process into two phases: pre-training and fine-tuning,” he continued. “In the pre-training phase, transformer models adopt unsupervised learning strategies to train language models from large, unlabeled corpora (e.g., Wikipedia, PubMed articles and clinical notes).”
When fine-tuning, transformer models optimize the pre-trained models for specific downstream tasks using supervised learning.
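The masked-language-model objective behind this kind of unsupervised pre-training can be illustrated with a small sketch. This is a toy in plain Python, not the actual BERT implementation; the clinical-sounding token list and the masking rate are invented for illustration:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Produce (masked_input, labels) pairs for masked-language-model
    pre-training. labels[i] holds the original token where a mask was
    applied, else None, so loss is computed only at masked positions."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the loss
    return masked, labels

tokens = "patient denies chest pain and shortness of breath".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, rng=random.Random(42))
print(masked)
```

Because no human labels are needed, any large corpus of raw text can serve as training data, which is what makes the pre-training phase unsupervised.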
“The key step is pre-training, where transformer-based models learn task-independent linguistic knowledge from massive textual data that can be used to solve many downstream NLP tasks,” said Wu. “However, to be effective, transformer-based models are typically very large, with billions of parameters, and cannot fit in a single GPU’s memory or be trained on a single compute node with traditional training strategies.
“Training these large models requires tremendous computing power, efficient memory management, and advanced distributed training techniques such as data and/or model parallelism to reduce training time,” he added. “Therefore, although there are large transformer models in the general English domain, there are no comparable transformer models in the medical domain.”
For example, if a company were to train a BERT model with 345 million parameters on a single GPU, it would take months.
“Models like GPT-2, with billions of parameters, won’t even fit into a single GPU’s memory for training,” said Wu. “As a result, we were unable to take advantage of large transformer models for a long time, even though we have massive clinical text data at UF Health.”
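A rough back-of-envelope calculation makes the scale of the problem concrete. Every figure below is an assumption chosen for illustration – the commonly used ~6·N·D estimate of training FLOPs, a guessed corpus size and number of passes, and a guessed sustained single-GPU throughput – not data from UF Health:

```python
# Illustrative back-of-envelope only; all figures are assumptions.
params = 345e6             # BERT-large parameter count
tokens_seen = 3.3e9 * 40   # assumed ~3.3B-word corpus, ~40 passes
flops = 6 * params * tokens_seen   # rough ~6*N*D training-FLOPs estimate
sustained = 30e12          # assumed ~30 TFLOP/s sustained on one GPU
days = flops / sustained / 86400   # seconds per day
print(f"~{days:.0f} days on a single GPU")
```

Under these assumptions the estimate lands at roughly a hundred days – on the order of months, consistent with the point above – and it grows linearly with both model size and data volume.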
For software, Nvidia has developed the Megatron-LM package, which uses an efficient intra-layer model-parallel approach that can significantly reduce the communication overhead of distributed training while keeping the GPUs compute-bound, said Jiang Bian, associate director of the Biomedical Informatics Program at UF Health’s Clinical and Translational Science Institute and associate professor in the Department of Health Outcomes and Biomedical Informatics at the University of Florida.
“This model-parallel technique is orthogonal to data parallelism, which enables us to combine distributed training across both model parallelism and data parallelism,” explained Bian. “Nvidia also developed and deployed a conversational AI toolkit, NeMo, to apply these large language models to downstream tasks. These software packages have greatly simplified the steps involved in creating and using large transformer-based models like our GatorTron.
“For the hardware, Nvidia provided the HiPerGator AI NVIDIA DGX A100 SuperPod cluster, which was recently deployed at the University of Florida and includes 140 Nvidia DGX A100 nodes with 1120 Nvidia Ampere A100 GPUs,” he continued. “The software solved the bottleneck in distributed training algorithms and the hardware solved the bottleneck in computing power.”
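The intra-layer model parallelism that Megatron-LM uses can be sketched in miniature: split a layer’s weight matrix column-wise across devices, let each device compute its slice of the output independently, and gather the slices at the end. The toy below uses NumPy array slices to stand in for GPUs and is not Nvidia’s implementation; shapes and values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # a batch of activations
W = rng.standard_normal((8, 6))    # full weight matrix of one linear layer

# Intra-layer (tensor) model parallelism, simulated: each "GPU" holds
# one column shard of W and computes its slice of the output alone.
n_gpus = 2
shards = np.split(W, n_gpus, axis=1)          # each GPU stores W[:, slice]
partial = [x @ w for w in shards]             # no communication needed here
y_parallel = np.concatenate(partial, axis=1)  # one gather at the end

y_full = x @ W                                # single-device reference
print(np.allclose(y_parallel, y_full))
```

Because each shard’s matmul is independent, communication happens only at the gather step, which is what keeps the GPUs compute-bound; data parallelism can then be layered on top by replicating these shard groups across further GPUs.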
MEETING THE CHALLENGE
The team at UF Health developed GatorTron, the world’s largest transformer-based NLP model in the medical domain – with around 9 billion parameters – and trained it on more than 197 million notes containing more than three billion sentences and more than 82 billion words of clinical text from UF Health.
“GatorTron inherited the architecture from Megatron-LM – the software provided by Nvidia,” said Wu. “We trained GatorTron with the HiPerGator AI NVIDIA DGX A100 SuperPod cluster that was recently deployed at the University of Florida and includes 140 Nvidia DGX A100 nodes with 1120 Nvidia Ampere A100 GPUs. With the HiPerGator AI cluster, computing resources are no longer a bottleneck.
“We trained GatorTron on 70 HiPerGator nodes with 560 GPUs, using both data-parallel and model-parallel training strategies,” he added. “Without Nvidia’s Megatron-LM, we wouldn’t be able to train such a large transformer model in the clinical domain. We also took advantage of Nvidia’s NeMo toolkit, which gives us the flexibility to apply GatorTron to various downstream NLP tasks through easy-to-use application programming interfaces.”
GatorTron is currently being evaluated on downstream tasks such as named entity recognition, relation extraction, semantic textual similarity and question answering with electronic health record data in a research setting. The team is working on applying GatorTron to real-world health applications like identifying patient cohorts, de-identifying text, and extracting information.
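For a task like named entity recognition, the model tags each token and the tags are then grouped into entity spans. A minimal sketch of that decoding step, assuming the standard BIO tagging scheme (the example tokens and labels are invented, and the model itself is omitted):

```python
def decode_bio(tokens, tags):
    """Group BIO tags from a token-classification model into entity spans."""
    entities, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # start of a new entity
            if current:
                entities.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)        # continuation of the same entity
        else:                             # "O" tag or inconsistent "I-"
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]

tokens = ["Patient", "reports", "chest", "pain", "since", "Monday"]
tags   = ["O", "O", "B-PROBLEM", "I-PROBLEM", "O", "B-DATE"]
print(decode_bio(tokens, tags))
```

The same tag-then-decode pattern underlies cohort identification and de-identification, where the extracted spans are patient attributes or protected health information respectively.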
UF Health evaluated the GatorTron model on four key NLP tasks: clinical concept extraction, clinical relation extraction, medical natural language inference, and medical question answering.
“For clinical concept extraction, the GatorTron model achieved state-of-the-art performance on all three benchmarks – the publicly available 2010 i2b2, 2012 i2b2, and 2018 n2c2 datasets,” remarked Bian. “When it comes to relation extraction, GatorTron clearly outperformed other BERT models pre-trained on clinical or biomedical text, such as ClinicalBERT, BioBERT and BioMegatron.
“For medical natural language inference and medical question answering, GatorTron achieved new state-of-the-art performance on both benchmark datasets – MedNLI and emrQA,” he added.
ADVICE FOR OTHERS
There is increasing interest in the use of NLP models to extract patient information from clinical narratives, with state-of-the-art pre-trained language models being key components.
“A well-trained large language model can be fine-tuned for many downstream NLP tasks, such as medical chatbots, automated summarization, medical question answering, and clinical decision support systems,” advised Wu. “When developing large transformer-based NLP models, it is recommended to examine different model sizes – number of parameters – based on your local clinical data.
“When applying these large transformer-based NLP models, healthcare providers need to think about the real-world configurations,” he concluded. “For example, these large transformer-based NLP models are very powerful solutions for high-performance servers, but they cannot be deployed on PCs.”