Why text mining in social media could be a game changer for personalised medicine
August 14, 2020

Improved tools for accessing information about diseases and treatments, together with the growth of social media, are producing better-informed patients who want to be involved in their care. With an increasing number of patients with chronic conditions sharing health information online, social media sites present an opportunity to harness medical information that could be used to advance personalised medicine. Given the volume of information on these sites, machine learning and AI have been deployed to extract value from this otherwise unstructured data.

In the infancy of AI in healthcare, machines were primarily focussed on retrieving information from existing health and medical records, alongside efforts to develop supercomputers such as IBM’s Watson, which combines artificial intelligence (AI) and sophisticated analytical software to answer questions. Though these sources provide a plentiful supply of quality health information, the raw text of electronic health records is difficult to access unless you are embedded in the NHS. Companies not affiliated with the health service can expect a rigorous process involving many separate applications to seek access. Even after access has been obtained, there are still significant issues with maintaining anonymity, and the data needs to be processed in a safe haven.[8]

Because of these issues, research has moved to social media. However, readily accessible raw text is not the only benefit of using these sources:

1) Sites such as Twitter, Reddit, HealthUnlocked, PatientsLikeMe and Facebook contain vast amounts of first-hand patient accounts in real time.[1] Facebook in particular has groups and communities whose members readily share detailed information about their health and medicines. This level of detail could not be obtained from patient health records alone.

2) Because posts on these sites are arranged on a timeline, it is possible to track how factors change over time.[2]

3) Some studies even suggest that the same granular information that can be gathered via a survey can be collected from these sites faster and at a lower cost.[3]
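
As a rough illustration of the longitudinal tracking in point 2, the sketch below counts how often a term is mentioned per month. It is a minimal sketch only: the posts, their dates and the `monthly_mentions` helper are invented for this example and are not taken from any of the cited studies.

```python
from collections import Counter
from datetime import date

# Hypothetical timestamped posts; dates and text are invented for illustration.
posts = [
    (date(2020, 1, 5), "Started the new tablets, headache all day"),
    (date(2020, 1, 20), "Headache again after my morning dose"),
    (date(2020, 2, 11), "Feeling fine this week"),
    (date(2020, 3, 2), "Mild headache but much better than before"),
]


def monthly_mentions(posts, term):
    """Count posts per (year, month) whose text mentions `term` (case-insensitive)."""
    counts = Counter()
    for posted_on, text in posts:
        if term.lower() in text.lower():
            counts[(posted_on.year, posted_on.month)] += 1
    # Sort by (year, month) so the result reads as a timeline.
    return dict(sorted(counts.items()))


trend = monthly_mentions(posts, "headache")
print(trend)
```

A real pipeline would need far more than substring matching (spelling variants, negation, deduplication of accounts), but the timeline structure is what makes this kind of aggregation possible at all.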

What data is being collected from social media?

With many AI applications in healthcare, our focus is on using machine learning to aid early diagnosis, disease tracking and management. Some work has been done in this space already, with several studies focussing on entity recognition. In this type of analysis, single words and phrases are pinpointed and tagged by category, for example side effects experienced as a result of taking a medicine, to build up a wealth of information about each category.[3,4,5] This data can be used for monitoring or for adverse event reporting to regulatory bodies such as the FDA in America and the EMA and MHRA in Europe.[1]
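
To make the idea of tagging words and phrases by category concrete, here is a minimal sketch. It uses a simple dictionary lookup as a stand-in for the trained models used in published work, and the lexicon, category names, example post and `tag_entities` helper are all assumptions made for this illustration.

```python
import re

# Hypothetical lexicon mapping each category to phrases of interest.
LEXICON = {
    "MEDICINE": ["methylphenidate"],
    "SIDE_EFFECT": ["headache", "nausea", "dry mouth"],
}


def tag_entities(text):
    """Return (category, phrase, start offset) tuples in order of appearance."""
    found = []
    for category, phrases in LEXICON.items():
        for phrase in phrases:
            # Word boundaries stop a phrase matching inside a longer word.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((category, match.group(0).lower(), match.start()))
    return sorted(found, key=lambda item: item[2])


# Invented example post, not real patient data.
post = "Methylphenidate helped me focus but gave me a headache and dry mouth."
entities = tag_entities(post)
for category, phrase, start in entities:
    print(category, phrase, start)
```

A fixed lexicon misses misspellings, slang and new drug names, which is one reason machine-learned taggers are preferred for social media text.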

More widely, across many different industries, there has been a focus on identifying public attitudes towards products. This type of analysis (sentiment analysis) is contextual and is effectively a weighting of positive words against negative words within a given sentence or phrase.[2,3] Recently there has been a shift away from this binary approach. Over-simplifying sentiment data can lead to key detail in a patient’s experience with a medicine being overlooked. Analysts are now taking a more nuanced approach, augmenting this data with other key pieces of information found in patient accounts. Such information could include stance detection (extracting a subject’s reaction to a claim made by a primary actor), emotion detection (detecting and recognising the types of feelings expressed in text), or risk factor and symptom data.[6,7]
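
The weighting of positive against negative words can be sketched in a few lines. This is an illustrative toy, not a production sentiment model: the word lists, the scoring formula and the `sentiment_score` helper are assumptions chosen for this example.

```python
import re

# Hypothetical word lists; real sentiment lexicons are far larger.
POSITIVE = {"better", "helped", "great", "relief"}
NEGATIVE = {"worse", "awful", "pain", "tired"}


def sentiment_score(text):
    """Positive-word count minus negative-word count, scaled to [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    # Text with no lexicon words is treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


score_a = sentiment_score("The new dose helped and I feel better")
score_b = sentiment_score("It helped the pain but I am tired and feel worse")
print(score_a, score_b)
```

The second example shows the over-simplification problem: the post says the medicine helped with pain, yet the word "pain" counts against it and the overall score comes out negative. That lost detail is what stance, emotion and symptom data are meant to recover.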

By integrating these extra nuggets of information, such as feelings data from online platforms, patient stories can be brought to life. A piece of text that was once solely a positive or negative account can be expanded into a full story of how an individual interacted with a treatment or health condition. Using modern tools such as AI and machine learning in this way will allow vast numbers of accounts to be amassed and structured into meaningful insight, bringing personalised medicine closer to reality rather than leaving it a pipe dream.

[1] J. Mandrola and P. Futyma. The role of social media in cardiology. Trends in Cardiovascular Medicine, 30(1):32–35, 2020.

[2] X. Lu, L. Chen, J. Yuan, J. Luo, J. Luo, Z. Xie, and D. Li. User perceptions of different electronic cigarette flavors on social media: Observational study. Journal of Medical Internet Research, 22(6):e17280, 2020.

[3] M. G. Kim, J. Kim, S. C. Kim, and J. Jeong. Twitter analysis of the nonmedical use and side effects of methylphenidate: Machine learning study. Journal of Medical Internet Research, 22(2):e16466, 2020.

[4] O. M. Doyle, N. Leavitt, and J. A. Rigg. Finding undiagnosed patients with hepatitis C infection: An application of artificial intelligence to patient claims data. Scientific Reports, 10(1):1–10, 2020.

[5] L. Sinnenberg, C. L. DiSilvestro, C. Mancheno, K. Dailey, C. Tufts, A. M. Buttenheim, F. Barg, L. Ungar, Schwartz, D. Brown, et al. Twitter as a potential data source for cardiovascular disease research. JAMA Cardiology, 1(9):1032–1036, 2016.

[6] https://paperswithcode.com/task/stance-detection

[7] https://devblogs.microsoft.com/cse/2015/11/29/emotion-detection-and-recognition-from-text-using-deep-learning/

[8] https://www.nhsresearchscotland.org.uk/research-in-scotland/data/safe-havens