HealTAC 2022: Data for Good, State-of-the-Art and Key Takeaways for Healthcare Text Analytics


Advances in Machine Learning (ML) and Natural Language Processing (NLP) are meaningless unless the resulting tools are put to good use. The Healthcare Text Analytics Conference (HealTAC) takes an interdisciplinary approach, linking the solutions that ML and NLP provide to the problems encountered in healthcare.

Data for good

Academic and industry experts at the conference demonstrated how healthcare text data can be used for good, presenting solutions for detecting suicide ideation, identifying adverse childhood experiences and calculating the risk of people exhibiting psychosis, amongst many others. The data used for these tasks came from clinical notes, electronic health records and even social media sources. NLP tools have proven to be powerful solutions for automatically structuring and analysing such data.

Focus on data availability, quality, privacy

HealTAC 2022 placed a major emphasis on data, the fundamental element of almost all research. Starting with the efforts of the National NLP Clinical Challenges, we were presented with empirical evidence that access to, and availability of, quality clinical data greatly benefits research and leads to more solutions.

The gold standards set to ensure data quality start with the challenge of selecting the right sample of data to annotate for the task at hand. Because annotation is a crucial part of data quality, solutions were discussed around Inter-Annotator Agreement (IAA): multiple humans annotating the same data face different challenges for different annotation tasks. One example is medical Named Entity Recognition (NER) annotation, where multiple rounds of annotation, together with appropriate use of strict or lenient evaluation, help ensure annotators are more closely aligned.
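The difference between strict and lenient evaluation can be illustrated with a minimal Python sketch. It is not a method presented at the conference: the span format, the function names and the example annotations are all assumptions made for illustration. Strict matching requires identical boundaries and label, while lenient matching accepts any overlap between spans with the same label. Agreement is reported as a pairwise F1 score, one common way of measuring IAA for span annotations.

```python
from typing import Callable, List, Tuple

# Hypothetical span format: (start offset, end offset exclusive, entity label)
Span = Tuple[int, int, str]


def strict_match(a: Span, b: Span) -> bool:
    """Strict: boundaries and label must be identical."""
    return a == b


def lenient_match(a: Span, b: Span) -> bool:
    """Lenient: same label and overlapping character ranges."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def pairwise_f1(ann_a: List[Span], ann_b: List[Span],
                match: Callable[[Span, Span], bool]) -> float:
    """Agreement between two annotators, treating annotator A as the reference."""
    if not ann_a and not ann_b:
        return 1.0  # both annotated nothing: full agreement
    if not ann_a or not ann_b:
        return 0.0
    matched_a = sum(any(match(a, b) for b in ann_b) for a in ann_a)
    matched_b = sum(any(match(b, a) for a in ann_a) for b in ann_b)
    recall = matched_a / len(ann_a)
    precision = matched_b / len(ann_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example text: "Patient started metformin 500 mg for type 2 diabetes."
annotator_a = [(16, 25, "DRUG"), (37, 52, "DISEASE")]  # "metformin"
annotator_b = [(16, 32, "DRUG"), (37, 52, "DISEASE")]  # "metformin 500 mg"

print(pairwise_f1(annotator_a, annotator_b, strict_match))   # 0.5
print(pairwise_f1(annotator_a, annotator_b, lenient_match))  # 1.0
```

In this invented example the two annotators disagree only on whether the dose belongs to the drug mention, so the strict score drops to 0.5 while the lenient score stays at 1.0. A gap like this suggests that the guidelines need to clarify boundaries rather than entity types.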

A top priority throughout the conference was privacy, and specifically the de-identification of patients. Tools such as NER were used to automatically detect and remove Personally Identifiable Information (PII) from the data.
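As a rough sketch of the removal step, the Python below replaces detected PII spans with placeholder tags. The spans are hand-written stand-ins for the output of an NER model, and the `redact` function, the labels and the example note are assumptions made for illustration. It does not represent any specific tool shown at the conference.

```python
from typing import List, Tuple

# Hypothetical NER output format: (start offset, end offset exclusive, label)
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a placeholder such as [NAME]."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start >= cursor:
            pieces.append(text[cursor:start])
            pieces.append(f"[{label}]")
        # An overlapping span is absorbed into the previous placeholder,
        # so none of its characters are copied to the output.
        cursor = max(cursor, end)
    pieces.append(text[cursor:])
    return "".join(pieces)


# Invented note; the offsets below stand in for what an NER model might return.
note = "Jane Doe was seen on 12 March 2021 in Glasgow."
detected = [(0, 8, "NAME"), (21, 34, "DATE"), (38, 45, "LOCATION")]

print(redact(note, detected))
# [NAME] was seen on [DATE] in [LOCATION].
```

The quality of de-identification here depends entirely on the recall of the detection step: any PII the model misses stays in the text.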

State-of-the-art technologies

Today’s NLP is dominated by transformer models. They are proving superior to earlier Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) approaches to modelling language, thanks to their exceptional ability to capture the contextual information of words.

This is apparent from their extensive use in the solutions demonstrated at HealTAC. For text classification, knowledge representation, semantic similarity and many other tasks, mainly masked and auto-regressive language models were utilised. BERT (Bidirectional Encoder Representations from Transformers) and BERT variants such as RoBERTa (Robustly optimized BERT pretraining approach) are most often used, providing an optimal balance between reliability and peak performance.
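The distinction between masked and auto-regressive language models comes down to which context the model may use when predicting a token. The toy Python sketch below shows only that difference. The function names and the example sentence are assumptions made for illustration, and real models such as BERT operate on subword tokens and mask a random subset of positions during pretraining.

```python
from typing import List


def masked_lm_context(tokens: List[str], position: int) -> List[str]:
    """Masked LM (BERT-style): sees both sides, with the target hidden."""
    return tokens[:position] + ["[MASK]"] + tokens[position + 1:]


def autoregressive_context(tokens: List[str], position: int) -> List[str]:
    """Auto-regressive LM: sees only the tokens to the left of the target."""
    return tokens[:position]


# Invented example sentence; the target token is "nausea" at index 4.
tokens = ["the", "patient", "reported", "severe", "nausea", "after", "treatment"]

print(masked_lm_context(tokens, 4))
# ['the', 'patient', 'reported', 'severe', '[MASK]', 'after', 'treatment']
print(autoregressive_context(tokens, 4))
# ['the', 'patient', 'reported', 'severe']
```

Because the masked model can use "after treatment" as well as the preceding words, it suits tasks such as classification and entity recognition, where the whole text is available up front.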

Key takeaways from HealTAC 2022

In HealTAC’s own words, the conference is about “Bringing together the academic, clinical, industrial and patient communities together to discuss the current state of the art in processing healthcare free text and share experience, results and challenges”, which precisely captures its ethos. HealTAC’s contribution to the data and technological means needed to improve outcomes for patients is impressive and certainly aligns with Talking Medicines’ values.

A big thank you to Bea Alex for inviting us along and to William Clackett for guiding a wonderful discussion.

References

https://healtac2022.github.io/

Shutterstock Image ID: 1597219936
