August 14, 2020

Why text mining in social media could be a game changer for personalised medicine

Better tools for accessing information about diseases and treatments, together with the growth of social media, are producing more informed patients who want to be involved in their care. With an increasing number of patients with chronic conditions sharing health information online, social media sites present an opportunity to harness medical information that could be used to advance personalised medicine. Given the volume of information on these sites, machine learning and AI have been deployed to extract value from this otherwise unstructured data.

In the infancy of AI in healthcare, machines were primarily focussed on retrieving information from existing health and medical records, alongside efforts to develop supercomputers such as IBM’s Watson, which combines artificial intelligence (AI) and sophisticated analytical software to answer questions. Though these sources provide a plentiful supply of quality health information, the raw text of electronic health records is difficult to access unless you are embedded in the NHS. Companies that are not affiliated with the health service can expect a rigorous, multi-stage application process to gain access. Even after access has been obtained, there are still significant issues with maintaining anonymity, and the data needs to be processed in a safe haven.8

Because of these issues, research has moved to social media. However, readily accessible raw text is not the only benefit of using these sources:

1) Sites such as Twitter, Reddit, HealthUnlocked, PatientsLikeMe, Facebook and others contain vast amounts of first-hand patient accounts in real time.1 Facebook in particular has groups and communities of individuals who readily share detailed information about their health and medicines. This type of detail could not be obtained from patient health records alone.

2) Because posts on these sites are arranged on a timeline, it is possible to track how factors change over time.2

3) Some studies even suggest that the same level of granular information that can be extracted via a survey can be collected from these sites at a faster rate and a lower cost.3
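As a rough illustration of the second point, the sketch below counts how often a symptom term appears in dated posts, month by month. The posts, the term and the `monthly_mentions` helper are all invented for this example; they are not taken from any of the cited studies, and real work would need proper data collection, consent and anonymisation.

```python
from collections import Counter
from datetime import date

# Hypothetical, made-up posts: (date posted, text). Not real patient data.
posts = [
    (date(2020, 1, 5), "Started the new inhaler, feeling wheezy at night"),
    (date(2020, 1, 20), "Still wheezy most mornings"),
    (date(2020, 2, 11), "Less wheezy this week"),
    (date(2020, 3, 2), "Breathing well, no symptoms to report"),
]


def monthly_mentions(posts, term):
    """Count posts per calendar month whose text mentions `term`."""
    counts = Counter()
    for posted_on, text in posts:
        if term.lower() in text.lower():
            counts[posted_on.strftime("%Y-%m")] += 1
    # Sort by month so the result reads as a timeline.
    return dict(sorted(counts.items()))


wheezy_by_month = monthly_mentions(posts, "wheezy")
print(wheezy_by_month)
```

A falling count over successive months is only a crude signal, but it shows how the timestamps attached to posts make this kind of tracking possible.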

What data is being collected from social media?

With many AI applications in healthcare, our focus is on using machine learning to aid early diagnosis, disease tracking and management. Some work has been done in this space already, with several studies focussing on entity recognition. In this type of analysis, single words and phrases are pinpointed and tagged to build up a wealth of information about each category, for example the side effects experienced as a result of taking a medicine.3,4,5 This data can be used for monitoring or for adverse event reporting to regulatory bodies such as the FDA in the United States, the EMA in the European Union and the MHRA in the United Kingdom.1
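To make the idea of tagging words and phrases concrete, here is a minimal sketch of dictionary-based entity tagging. It is an assumption-laden toy: the `LEXICON`, the category names and the `tag_entities` function are hypothetical, and the studies cited above may well use trained models rather than a fixed word list.

```python
import re

# Hypothetical mini-lexicon mapping phrases to categories. A real system
# would use a curated medical vocabulary or a trained model instead.
LEXICON = {
    "headache": "SIDE_EFFECT",
    "dry mouth": "SIDE_EFFECT",
    "insomnia": "SIDE_EFFECT",
    "methylphenidate": "MEDICINE",
}


def tag_entities(text):
    """Return (phrase, category, start offset) for each lexicon match, in text order."""
    found = []
    for phrase, category in LEXICON.items():
        # Word boundaries stop "headache" matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.group(0).lower(), category, match.start()))
    return sorted(found, key=lambda item: item[2])


post = "Two weeks on methylphenidate and I get a headache and dry mouth most days."
entities = tag_entities(post)
for phrase, category, start in entities:
    print(f"{category}: {phrase} (at character {start})")
```

Run over many posts, tags like these could be counted per medicine to build up the per-category picture described above, although a fixed word list will miss misspellings and slang that are common on social media.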

More widely, across many different industries, there has been a focus on identifying public attitudes towards products, known as sentiment analysis. This type of analysis is contextual and, in its simplest form, weighs positive words against negative words within a given sentence or phrase.2,3 Recently there has been a shift away from this binary approach. Over-simplifying sentiment data can lead to key details in a patient’s experience with a medicine being overlooked. Analysts are now taking a more nuanced approach, augmenting this data with other key pieces of information found in patient accounts. Such information could include stance detection (extracting one person’s reaction to a claim made by another), emotion detection (recognising the types of feelings expressed in a text), or risk factor and symptom data.6,7
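The word-weighting approach described above can be sketched in a few lines. The word lists, `sentiment_score` and `label` below are hypothetical and deliberately tiny; they stand in for the much larger lexicons or trained classifiers a real analysis would use.

```python
import re

# Hypothetical word lists for illustration only; real lexicons are far larger.
POSITIVE = {"better", "great", "improved", "relief"}
NEGATIVE = {"worse", "awful", "nausea", "tired"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(text):
    """Collapse the score into a single positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


mixed = "The pain is better but the nausea is awful"
print(sentiment_score(mixed), label(mixed))
```

The mixed example is labelled negative, and the fact that the pain improved disappears from the result. That loss of detail is the kind of over-simplification that stance, emotion and symptom data are meant to address.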

By integrating these extra nuggets of information, such as feelings data from online platforms, patient stories can be brought to life. A piece of text that was once solely a positive or negative account can be expanded into a full story of how an individual interacted with a treatment or health condition. Modern tools such as AI and machine learning will allow a vast number of accounts to be amassed and structured into meaningful insight, bringing personalised medicine closer to reality rather than leaving it a pipe dream.


[1]  J. Mandrola and P. Futyma. The role of social media in cardiology. Trends in Cardiovascular Medicine, 30(1):32–35, 2020.


[2]  X. Lu, L. Chen, J. Yuan, J. Luo, J. Luo, Z. Xie, and D. Li. User perceptions of different electronic cigarette flavors on social media: Observational study. Journal of Medical Internet Research, 22(6):e17280, 2020.


[3]  M. G. Kim, J. Kim, S. C. Kim, and J. Jeong. Twitter analysis of the nonmedical use and side effects of methylphenidate: Machine learning study. Journal of Medical Internet Research, 22(2):e16466, 2020.


[4]  O. M. Doyle, N. Leavitt, and J. A. Rigg. Finding undiagnosed patients with hepatitis C infection: An application of artificial intelligence to patient claims data. Scientific Reports, 10(1):1–10, 2020.


[5]  L. Sinnenberg, C. L. DiSilvestro, C. Mancheno, K. Dailey, C. Tufts, A. M. Buttenheim, F. Barg, L. Ungar, Schwartz, D. Brown, et al. Twitter as a potential data source for cardiovascular disease research. JAMA Cardiology, 1(9):1032–1036, 2016.