‘Explosion’ of health data fuels pharma advances: Linguamatics

By Jenni Spinner


(NatalyaBurova/iStock via Getty Images Plus)

Related tags IQVIA Artificial intelligence Data management data analysis preclinical Clinical data

A leader from the IQVIA company explains how the boom in the volume and quality of data, and advanced analysis, can speed and streamline drug development.

Outsourcing-Pharma (OSP) spoke with Jane Reed (JR), director of life sciences with Linguamatics, about the increasing amount of life-sciences data available to pharma firms and research partners, and how state-of-the-art analytical tools can lead to myriad benefits.

OSP: Could you please tell us about Linguamatics?

JR: Linguamatics, an IQVIA company, delivers a market-leading natural language processing-based AI platform for high-value knowledge discovery and decision support from text. Our NLP platform empowers clients to speed up drug development and improve patient outcomes by breaking down data silos, boosting innovation, enhancing quality, and reducing risk and complexity.

When applied to clinical trial records, full text literature or voice of consumer transcriptions, NLP can extract clean, structured data to advance drug discovery and research. Critical details that improve understanding around gene-disease associations, pathways and systems are often buried in unstructured text, both in public databases and internal sources.

Using NLP, life sciences companies and their partners can keep on top of this information by transforming text into actionable data that can be quickly visualized and analyzed at every stage in drug discovery. NLP can also help companies seeking repurposing opportunities for existing drugs, as well as identify and connect with appropriate contract research firms.

Linguamatics provides expert domain knowledge on healthcare and the life sciences, as we are focused on medical and scientific understanding. Linguamatics customers include 19 of the top 20 global pharmaceutical companies; the US Food and Drug Administration (FDA); and leading cancer institutes, hospitals and academic research centers. Linguamatics NLP has been deployed by organizations in pharmaceuticals, biotechnology, healthcare, chemicals and agrochemicals, government and academia.

OSP: Could you please provide an overview of the evolution of use of health data in recent years?

JR: There has been an explosion in the volume, variety and velocity of health data in recent years. The data sources that are valuable to healthcare organizations (pharma, hospitals, payers, etc.) are wide-ranging, including scientific literature, clinical trial records, safety reports, social media, call center verbatims and other stakeholder insights, websites, news, and patents, as well as electronic health records. These data sources can provide insights into drug discovery, development, and the delivery of therapeutics in the clinic.

The widespread adoption of electronic health record (EHR) systems has resulted in the digitization of massive amounts of healthcare data in recent years. However, information is often heavily siloed and stored across disparate clinical departments.

While EHRs are a rich source of data, insights are often trapped in narrative-style surgical and clinical notes, or in pathology and radiology reports, making them difficult for clinicians and researchers to access. Further, different providers often use different terminology to document the same diagnoses and treatments, resulting in a lack of standardized terminology across patient records.

To gain actionable insights from unstructured EHR data, many provider organizations turn to manual chart reviews, in which groups of often highly paid clinicians scrutinize sometimes thousands and thousands of patient records to extract specific pieces of information. While manual chart reviews may sometimes (but not always) perform well in terms of accuracy, they are extremely resource-intensive, requiring a significant investment of time and money.

Similarly, accessing the critical information buried in other health-related textual data sources can be manual and highly resource-intensive.

OSP: What benefits does the data explosion provide?

JR: More effective use of data enables more effective decision making. And the best decisions are made with the full landscape of relevant information. The explosion of healthcare-related data offers numerous benefits, including richer patient histories that give providers a full 360-degree view of a patient’s health and help them create more personalized care plans. In addition, aggregated health data enables deeper insight into the health of populations, as well as into specific diseases for the development of new therapies or to understand and evaluate the efficacy of existing therapies.

Access to the information buried in other sources of unstructured health-related text can also bring value, for example:

  • Information from clinical trial records can improve clinical trial design and efficiencies
  • Capturing the landscape of gene-disease associations, natural history of disease, safety issues for a specific drug, or health economics data can assist with decision points in drug discovery
  • Understanding the patient voice from social media, or key opinion leader sentiment for a new brand, can be critical for market access and brand lifecycle

OSP: Similarly, what challenges and problems does the data boom create (especially regarding patient safety and regulatory compliance)?

JR: The vast volume of data can make it difficult for researchers to find the right data when they need it. When a researcher must search through a variety of data sources (EHRs, voice of consumer transcripts, social media, etc.), they may miss critical signals that might provide insight into a drug’s side effects or efficacy. As more and more data becomes available, regulatory agencies want additional knowledge, which requires researchers to sift through an abundance of noise to find relevant signals.

OSP: Could you please explain NLP, and how it can help pharma firms and their partners?

Jane Reed, director of life sciences, Linguamatics

JR: NLP is an AI technology that in essence “reads” text (or another input such as speech) by simulating the human ability to understand a natural language such as English, Spanish or Chinese. NLP systems can analyze unlimited amounts of text-based data without fatigue and in a consistent, unbiased manner. They can understand concepts within complex contexts and decipher ambiguities of language to extract key facts and relationships or provide summaries.

Given the huge quantity of unstructured data that is produced every day, from EHRs to social media posts, this form of automation has become critical to analyzing text-based data efficiently.

Linguamatics NLP is designed to recognize linguistic entities and extract relationships utilizing semantic and linguistic processing to understand and detect the mention of a concept, no matter how it’s expressed in the text. It provides normalized output which enables the easy grouping and visualization of data sets, the loading of results into data warehouses or data lakes, and the use of data to drive machine learning models.
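As a rough illustration of what "normalized output" means in practice, the sketch below maps different wordings of the same clinical concept to one label. It is a toy dictionary lookup, not the Linguamatics platform or its API; the synonym table, the function name `extract_concepts`, and the sample notes are all assumptions made for the example.

```python
import re

# Hypothetical synonym table: surface form -> normalized concept label.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "mi": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
    "htn": "hypertension",
}

# Longest surface forms come first so multi-word phrases win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_concepts(text):
    """Return (matched text, normalized concept) pairs in order of appearance."""
    return [
        (m.group(0), SYNONYMS[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


# Two made-up notes that describe the same conditions in different words.
notes = [
    "Pt has HTN; prior heart attack in 2015.",
    "History of myocardial infarction and high blood pressure.",
]

# After normalization, both notes yield the same set of concepts,
# so they can be grouped or counted together.
normalized = [sorted({concept for _, concept in extract_concepts(n)}) for n in notes]
print(normalized)
```

A production NLP system goes well beyond string matching (context, negation, ambiguity), but the principle of collapsing varied expressions into one groupable value is the same.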

OSP: What other technologies can aid in digital transformation?

JR: There are several technologies that spring to mind. One core technology is optical character recognition. OCR is a technology that enables the conversion of different types of documents, such as scanned paper documents, PDF files or images into editable and searchable data. This is a key enabler of other technologies, such as NLP, particularly for legacy documents, or reports with embedded tables.

Another huge growth technology that aids in digital transformation is machine learning, which is increasingly being used to help solve complex issues by analyzing data from textual sources. Once an NLP solution processes a mass of unstructured data and transforms it to well-structured data, the data can be used to drive a predictive modeling machine learning engine.

Privacy technologies that anonymize patient data are also critical as data is shared across providers or for research purposes. AI-based technologies that facilitate the management of real-world data and help improve understanding of patient populations are also aiding in the digital transformation.

The ever-increasing growth in integrated cloud technologies is critical for digital transformation. Organizations need ecosystems of tools and technologies that connect various applications, systems, repositories, and IT environments for the real-time exchange of data and processes, which makes open architectures, API services, and other integration technologies essential.

OSP: Please tell us about some of the obstacles organizations may face in stepping up their data game. Are there common mistakes, commonly overlooked things, etc.?

JR: One of the biggest obstacles that organizations face when stepping up their data game is building the right teams. It’s important to create a team environment where the technology domain experts (NLP, AI/ML, integration, etc.) can collaborate and communicate effectively with the business experts: the scientists, researchers and clinicians who need the right data. Projects without this close communication can falter.

Another common headache is ensuring data is clean and accurate. A researcher working with noisy data for predictive modeling may ask the right questions but receive the wrong answers if the data is limited or contains duplicate information. Integrating data from a wide range of sources, both structured and unstructured, needs good normalization and mapping rules to clean and deduplicate information.
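The normalization and deduplication step described above might look something like the following minimal sketch. The mapping table `DRUG_MAP`, the record fields, and the sample records are hypothetical and stand in for whatever mapping rules an organization actually maintains.

```python
# Hypothetical mapping rules: brand or variant name -> canonical drug name.
DRUG_MAP = {"tylenol": "acetaminophen", "paracetamol": "acetaminophen"}


def canonical_key(record):
    """Build a (drug, event) key after trimming, lowercasing and mapping names."""
    drug = " ".join(record["drug"].lower().split())
    event = " ".join(record["event"].lower().split())
    return DRUG_MAP.get(drug, drug), event


def deduplicate(records):
    """Merge records that share a canonical key, keeping track of their sources."""
    merged = {}
    for record in records:
        entry = merged.setdefault(canonical_key(record), {"sources": []})
        if record["source"] not in entry["sources"]:
            entry["sources"].append(record["source"])
    return [
        {"drug": drug, "event": event, "sources": entry["sources"]}
        for (drug, event), entry in merged.items()
    ]


# Made-up records from three different source types.
records = [
    {"drug": "Tylenol", "event": "Nausea", "source": "call center"},
    {"drug": "acetaminophen ", "event": "nausea", "source": "EHR"},
    {"drug": "Paracetamol", "event": "rash", "source": "literature"},
]

unique = deduplicate(records)
for row in unique:
    print(row)
```

Here the first two records collapse into one entry with two sources, so a downstream model does not count the same drug-event pair twice.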

It’s also always worth experimenting with new technology through a small, manageable project or proof-of-concept. Many organizations can be tempted to dive in and tackle a large, exciting challenge, but it’s better to walk before you run: test the new innovations in a small, contained project, with a roadmap that can grow following successful outcomes.

OSP: If you can gaze into your crystal ball, can you share your perspective on how you think the industry’s use of data and these technologies might evolve in the near future?

JR: Two trends stand out:

  • Increased use of real-world data: An increasing number of new drugs are entering the market. This, combined with ever-increasing pricing and market access pressures, means pharmaceutical companies will see more pressure on funding for medicines. That leads to value-based assessments that will increase reliance on real-world data (RWD) to support patient safety, drug access, product brands, etc. Real-world data are playing an increasing role in health care decisions. FDA uses RWD and real-world evidence (RWE) to monitor post-market safety and adverse events and to make regulatory decisions. And of course, much RWD is unstructured; hence the need for NLP and similar technologies.
  • Technologies to help remote workers share insights: The COVID-19 pandemic will continue to have a major impact. Over this past year we have seen short-term impacts of the pandemic such as regulation revisions, research and development process changes, and a shift towards telecommunication and remote working. Some of these will evolve into long-term impacts. For example, organizations need to utilize technologies that enhance teamwork (data sharing, knowledge sharing, better communication, visual analytics) to help in part replace the lost “water cooler” moments that an office environment can engender.
