July 20, 2022

HealTAC 2022: Data for Good, State-of-the-Art and Key Takeaways for Healthcare Text Analytics 

Advances in Machine Learning (ML) and Natural Language Processing (NLP) tools are meaningless unless these tools are put to good use. The Healthcare Text Analytics Conference (HealTAC) takes an interdisciplinary approach, linking solutions provided by ML and NLP to problems encountered in healthcare.

Data for good 

Academic and industry experts at the conference showed how healthcare text data can be used for good, presenting solutions for detecting suicidal ideation, identifying adverse childhood experiences and calculating the risk of people exhibiting psychosis, amongst many others. The data used for these tasks came from clinical notes, electronic health records and even social media sources. NLP tools have proven to be powerful solutions for automatically structuring and analysing such data.

Focus on data availability, quality, privacy 

HealTAC 2022 placed a major emphasis on data, the fundamental element of almost all research. Starting with the efforts of the National NLP Clinical Challenges, we were presented with empirical evidence that access to quality clinical data greatly benefits research and leads to more solutions.

Building a gold standard that ensures data quality starts with the challenge of selecting the right sample of data to annotate for the task at hand. Because annotation is a crucial part of data quality, solutions were discussed around Inter-Annotator Agreement (IAA): multiple humans annotating the same data face different challenges for different annotation tasks. One example is medical Named Entity Recognition (NER) annotation, where multiple rounds of annotation, together with the appropriate use of strict or lenient evaluation, help ensure annotators are more aligned in their annotations.
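To make the strict versus lenient distinction concrete, here is a minimal sketch of pairwise agreement between two annotators on NER spans. It is an illustration only, not a method presented at the conference: the span format, the function names and the example annotations are all assumptions made for this post. Strict matching requires identical boundaries and label, while lenient matching accepts any overlap with the same label.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset exclusive, entity label)
Span = Tuple[int, int, str]


def spans_match(a: Span, b: Span, lenient: bool) -> bool:
    """Return True if two spans agree under strict or lenient matching."""
    if a[2] != b[2]:
        return False  # labels must agree in both modes
    if lenient:
        # Lenient: any character overlap counts as agreement
        return a[0] < b[1] and b[0] < a[1]
    # Strict: boundaries must be identical
    return a[0] == b[0] and a[1] == b[1]


def pairwise_f1(ann_a: List[Span], ann_b: List[Span], lenient: bool = False) -> float:
    """F1 agreement, treating annotator A as reference and B as prediction."""
    if not ann_a and not ann_b:
        return 1.0  # both annotators found nothing: full agreement
    matched_a = sum(any(spans_match(a, b, lenient) for b in ann_b) for a in ann_a)
    matched_b = sum(any(spans_match(b, a, lenient) for a in ann_a) for b in ann_b)
    recall = matched_a / len(ann_a) if ann_a else 0.0
    precision = matched_b / len(ann_b) if ann_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: the annotators disagree on where the second span starts
annotator_a = [(0, 9, "DRUG"), (24, 32, "SYMPTOM")]
annotator_b = [(0, 9, "DRUG"), (20, 32, "SYMPTOM")]

strict_score = pairwise_f1(annotator_a, annotator_b, lenient=False)
lenient_score = pairwise_f1(annotator_a, annotator_b, lenient=True)
print(strict_score, lenient_score)
```

A large gap between the two scores usually points to boundary disagreements rather than disagreements about what counts as an entity, which is the kind of issue a further annotation round can address.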

A top priority throughout the conference was privacy, and specifically the de-identification of patient data. Using tools such as NER, Personally Identifiable Information (PII) was automatically detected and removed from data.
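As a rough sketch of the removal step, the snippet below replaces detected PII spans with placeholder tags. It assumes an NER model has already produced the spans; the detection itself is not shown, and the helper names, labels and the example note are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset exclusive, PII label)
Span = Tuple[int, int, str]


def span_of(text: str, phrase: str, label: str) -> Span:
    """Stand-in for NER output: locate a phrase and label it as an entity."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def redact(text: str, entities: List[Span]) -> str:
    """Replace each detected span with a placeholder such as [NAME]."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            raise ValueError("overlapping entity spans are not supported")
        pieces.append(text[cursor:start])  # keep the text before the entity
        pieces.append(f"[{label}]")        # substitute the entity itself
        cursor = end
    pieces.append(text[cursor:])           # keep the remaining text
    return "".join(pieces)


# Entirely made-up clinical note
note = "Jane Doe was seen on 12/03/2021 in Leeds."
detected = [
    span_of(note, "Jane Doe", "NAME"),
    span_of(note, "12/03/2021", "DATE"),
    span_of(note, "Leeds", "LOCATION"),
]
redacted = redact(note, detected)
print(redacted)
```

Using typed placeholders rather than deleting the text outright keeps the sentence readable for downstream annotation, although real de-identification pipelines need far more care than this sketch suggests.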

State-of-the-art technologies 

Today’s NLP is dominated by transformer models. They are proving superior to earlier Recurrent Neural Network (RNN) architectures, including Long Short-Term Memory (LSTM) networks, when it comes to modelling language, thanks to their exceptional ability to capture the contextual information of words.

This is apparent from their extensive use in the solutions demonstrated at HealTAC. For tasks such as text classification, knowledge representation, semantic similarity and many more, mainly masked and auto-regressive language models were utilised. BERT (Bidirectional Encoder Representations from Transformers) and BERT variants such as RoBERTa (Robustly optimized BERT pretraining approach) were used most often, offering a strong balance between reliability and peak performance.

Key takeaways from HealTAC 2022 

In HealTAC’s own words, the conference is about “Bringing together the academic, clinical, industrial and patient communities together to discuss the current state of the art in processing healthcare free text and share experience, results and challenges”, which precisely captures its ethos. HealTAC’s contribution to the data and technology needed to improve outcomes for patients is impressive and certainly aligns with Talking Medicines’ values.

A big thank you to Bea Alex for inviting us along and to William Clackett for guiding a wonderful discussion.  


