6 Uses for Natural Language Processing in Healthcare



Diagnosing conditions. Developing treatment plans. Optimizing the patient experience.

These are just a few of the many possible applications for natural language processing (NLP) in the healthcare industry. As a result, a growing number of healthcare providers and practitioners are adopting NLP to make sense of the massive quantities of unstructured data contained in electronic health records (EHRs) and to offer patients more comprehensive care. According to a recent report, the global market for NLP in healthcare and life sciences is expected to reach $3.7 billion by 2025, growing at a compound annual growth rate (CAGR) of 20.5%.

In this blog post, we’ll take a closer look at NLP in the healthcare industry — what it is, how it works, and how healthcare providers can benefit from this truly remarkable technology.

What is NLP & How Does It Work?

Natural language processing is a specialized branch of artificial intelligence that enables computers to understand and interpret human language, both written text and speech.

Here’s how it works: an NLP system first pre-processes the data by “cleaning” it. This essentially means organizing the data into a more consistent format, for example by breaking text down into smaller units such as words, or “tokens,” in a process known as tokenization. Pre-processing makes the dataset easier for the NLP system to interpret.
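To make tokenization concrete, here is a minimal Python sketch of a cleaning and tokenizing step. The `preprocess` function and its regular expression are illustrative assumptions, not part of any particular NLP library; production clinical tokenizers handle abbreviations, units, and drug names far more carefully.

```python
import re


def preprocess(text: str) -> list[str]:
    """Lowercase the text, then split it into word and number tokens."""
    cleaned = text.lower()
    # \d+(?:\.\d+)? keeps decimals such as "33.5" together, and
    # [a-z]+(?:-[a-z]+)* keeps hyphenated words such as "htg-induced" whole.
    # Punctuation is dropped because neither alternative matches it.
    return re.findall(r"\d+(?:\.\d+)?|[a-z]+(?:-[a-z]+)*", cleaned)


print(preprocess("Patient reports headache; BMI 33.5, HTG-induced pancreatitis."))
```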

From there, the system applies algorithms to the text to interpret it. The two primary approaches used in NLP are rule-based systems, which interpret text using predefined grammatical rules, and machine learning models, which use statistical methods and “learn” over time from the training data they are fed.
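The difference between the two approaches can be sketched in a few lines of Python. In this hypothetical example, a hand-written regular expression plays the role of the rule-based system, while a simple perceptron stands in for a machine learning model by learning one weight per word from a handful of made-up labeled sentences. Both the rule and the training data are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Rule-based approach: a hand-written pattern decides the label.
FEVER_RULE = re.compile(r"\b(fever|febrile|pyrexia)\b", re.IGNORECASE)


def rule_based_has_fever(sentence: str) -> bool:
    return bool(FEVER_RULE.search(sentence))


# Machine learning approach: a perceptron learns word weights from examples.
EXAMPLES = [  # hypothetical training data: (sentence, 1 = mentions fever)
    ("patient is febrile overnight", 1),
    ("temperature elevated and febrile", 1),
    ("no acute distress overnight", 0),
    ("temperature normal today", 0),
]


def train_perceptron(examples, epochs=10):
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = text.lower().split()
            predicted = 1 if bias + sum(weights[w] for w in words) > 0 else 0
            error = label - predicted  # -1, 0, or +1
            if error:
                # Nudge each word's weight toward the correct answer.
                for w in words:
                    weights[w] += error
                bias += error
    return weights, bias


def predict(weights, bias, text):
    score = bias + sum(weights.get(w, 0.0) for w in text.lower().split())
    return 1 if score > 0 else 0


weights, bias = train_perceptron(EXAMPLES)
print(rule_based_has_fever("Patient afebrile"), predict(weights, bias, "patient afebrile"))
```

The rule correctly rejects “afebrile,” while the tiny perceptron, which never saw that word, labels “patient afebrile” as a fever mention because of the weight it learned for “patient.” Learned models only become reliable with far more training data.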

Despite being a major technological advancement, one that stands at the crossroads of computer science and linguistics, NLP is more commonplace than you might realize. Any time you talk to a virtual assistant such as Siri or Alexa, or explain a customer service issue to a chatbot, that’s NLP in action. That said, NLP also has more sophisticated applications, especially in the healthcare industry, which we’ll explore in this article.

5 NLP Techniques You Should Know

Before we can talk about the ways in which you can use NLP in healthcare, we must first define a few key NLP techniques:

Optical Character Recognition (OCR):
OCR, or text recognition, is the method by which a computer “reads” handwritten or printed text and converts it into machine-readable digital text, for example by scanning a physical document and turning it into a searchable PDF. OCR is also used to scan unstructured data, such as images or scanned files, extract text and tables from it, and present them in a digestible format. Once this data has been formatted, it can be fed into an NLP pipeline for further analysis. In the healthcare industry, OCR is commonly used to digitize clinical notes, medical history records, patient intake forms, discharge summaries, medical test results, and so on.

Named Entity Recognition (NER):
NER is an information extraction technique that locates named entities (that is, real-world subjects such as a person, location, organization, or product) in text and classifies them into predefined categories. NER is also known as entity chunking, entity extraction, or entity identification. We’ll explore some healthcare-specific NER applications further down the page.
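A very simple way to approximate NER is a dictionary (or “gazetteer”) lookup that tags the longest known phrase at each position. The sketch below takes that approach; the entries in `GAZETTEER` are hypothetical, and real NER systems typically use trained statistical models that can recognize entities they have never seen before.

```python
# Hypothetical dictionary of known entities (lowercased phrase -> category).
GAZETTEER = {
    "jane doe": "PERSON",
    "springfield general": "ORGANIZATION",
    "boston": "LOCATION",
    "advil": "PRODUCT",
}


def gazetteer_ner(tokens, gazetteer, max_len=3):
    """Tag entities by longest-match dictionary lookup.

    Returns (text, category, start_token, end_token) tuples.
    """
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            category = gazetteer.get(phrase.lower())
            if category:
                entities.append((phrase, category, i, i + n))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


tokens = "Jane Doe took Advil before visiting Springfield General in Boston".split()
for entity in gazetteer_ner(tokens, GAZETTEER):
    print(entity)
```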

Sentiment Analysis:
Sentiment analysis applies a combination of NLP, text analysis, computational linguistics, and biometrics to a text in order to determine its underlying sentiment. It is also commonly referred to as sentiment detection or opinion mining.

An illustrative example, and perhaps the most common use case, is businesses applying sentiment analysis to social media. In doing so, they’re able to better understand how the public perceives their products, services, or brand as a whole. A healthcare provider could do the same by analyzing patients’ social media comments about its facility to get a clearer picture of the patient experience.
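Here is a minimal, lexicon-based sketch of how patient comments might be scored. The word list, weights, and single-word negation rule are assumptions chosen for illustration; production sentiment analysis usually relies on much larger lexicons or trained models.

```python
import re

# Hypothetical lexicon: positive words score above zero, negative below.
LEXICON = {"friendly": 1, "helpful": 1, "clean": 1, "great": 2,
            "rude": -2, "dirty": -1, "slow": -1}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(comment: str) -> int:
    words = re.findall(r"[a-z']+", comment.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        # Flip the polarity of a word directly preceded by a negation,
        # so "not helpful" counts as negative.
        if i > 0 and words[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


def sentiment_label(comment: str) -> str:
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in ["The nurses were friendly and helpful!",
                "Staff was rude and check-in was slow.",
                "The front desk was not helpful."]:
    print(sentiment_label(comment), "-", comment)
```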

Text Classification:
Also known as text categorization, this NLP technique is used to analyze text data and assign tags or labels to semantic units, such as documents, sentences, or clauses, based on predefined categories. For example, a healthcare provider might use text classification to identify at-risk patients based on certain keywords or phrases in their medical records.
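As a sketch of how text classification can learn from labeled examples rather than a fixed keyword list, the following Python implements a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The training notes and the `at_risk`/`routine` labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled notes.
TRAINING_NOTES = [
    ("recent fall with dizziness and confusion", "at_risk"),
    ("chest pain and shortness of breath", "at_risk"),
    ("routine checkup no complaints", "routine"),
    ("annual physical patient doing well", "routine"),
]


def train_nb(docs):
    """Count documents per label and words per label."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def classify(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior plus log likelihood of each word, with add-one smoothing
        # so unseen words do not zero out the probability.
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAINING_NOTES)
print(classify(model, "dizziness and confusion after a fall"))
```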

Topic Modeling:
Topic modeling is a form of statistical modeling used in NLP to organize collections of documents, that is, to group them based on common words or phrases in order to uncover semantic structures, or “topics.” The most common form of topic modeling, latent Dirichlet allocation (LDA), is a probabilistic model that represents each document as a mixture of topics and each topic as a distribution over words, so that words which tend to appear together are grouped into the same topic.
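For readers curious about what LDA does under the hood, here is a compact sketch of collapsed Gibbs sampling, one common way to fit an LDA model. The hyperparameters (`alpha`, `beta`, the iteration count) and the toy documents are assumptions; libraries such as scikit-learn and gensim provide tuned LDA implementations.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)  # remove this token
                # p(topic t | all other assignments), up to a constant
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                update(d, w, k, +1)
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic (ties broken alphabetically)."""
    return [sorted(c, key=lambda w: (-c[w], w))[:n] for c in topic_word]


docs = [
    "insulin glucose diabetes insulin glucose".split(),
    "glucose diabetes insulin metformin".split(),
    "heart chest pain cardiac heart".split(),
    "cardiac chest heart ecg".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

On a corpus this small the exact topics depend on the random seed; LDA is designed for collections with many documents.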

Of the five NLP techniques described here, OCR and NER are the most common in the healthcare industry.

How Can NLP Support the Healthcare Industry?

NLP can support the healthcare industry in many ways, but let’s look at three primary use cases:

  • Improving Clinical Documentation: Rather than having physicians spend valuable time manually reviewing complex EHRs, NLP can use speech-to-text dictation and structured data entry to extract critical data from EHRs at the point of care. This not only lets physicians focus on providing patients with the care they need, but also helps keep clinical documentation accurate and up to date.
  • Accelerating Clinical Trial Matching: Using NLP, healthcare providers can automatically review massive quantities of unstructured clinical and patient data and identify eligible candidates for clinical trials. This not only gives patients access to experimental care that could dramatically improve their condition, and their lives, but also supports innovation in the medical field.
  • Supporting Clinical Decisions: NLP makes it fast, easy, and efficient for physicians to access health-related information exactly when they need it, enabling them to make more informed decisions at the point of care.
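Once NLP has turned unstructured notes into structured entities, trial matching can largely reduce to filtering. The sketch below assumes each patient’s conditions and medications have already been extracted (for example, by NER); the `Patient` and `TrialCriteria` classes and the sample criteria are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    age: int
    conditions: set[str] = field(default_factory=set)   # e.g. NER PROBLEM entities
    medications: set[str] = field(default_factory=set)  # e.g. NER DRUG entities


@dataclass
class TrialCriteria:
    min_age: int
    max_age: int
    required_conditions: set[str]
    excluded_medications: set[str]


def is_eligible(patient: Patient, criteria: TrialCriteria) -> bool:
    return (
        criteria.min_age <= patient.age <= criteria.max_age
        # The patient must have every required condition...
        and criteria.required_conditions <= patient.conditions
        # ...and must not be taking any excluded medication.
        and not (criteria.excluded_medications & patient.medications)
    )


def match_candidates(patients, criteria):
    return [p.patient_id for p in patients if is_eligible(p, criteria)]


# Hypothetical patients and trial criteria.
patients = [
    Patient("P1", 45, {"type 2 diabetes mellitus"}, {"metformin"}),
    Patient("P2", 52, {"type 2 diabetes mellitus"}, {"dapagliflozin"}),
    Patient("P3", 70, {"hypertension"}),
]
criteria = TrialCriteria(18, 65, {"type 2 diabetes mellitus"}, {"dapagliflozin"})
print(match_candidates(patients, criteria))
```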

6 Healthcare-Specific NLP Applications

Now that we’ve covered the basics, let’s discuss NLP applications in a healthcare-specific setting. Before you can use NLP on any text, all paperwork — be it clinical notes, patient records, medical forms, or anything in between — must be converted into a digital format using OCR.

From there, you can apply any of the following:

Clinical Assertion Model:
Clinical assertion modeling enables healthcare providers to analyze clinical notes, identify the problems a patient reports or exhibits, and determine whether each problem is present, absent, or conditional. For this reason, clinical assertion models are often used to help diagnose and treat patients.

For example, a patient might tell her doctor that she’s experienced a headache for the past two weeks and feels anxious when she walks fast. After examining the patient, the doctor might note that she has no symptoms of alopecia and that she doesn’t appear to be in any pain.

The doctor could later use a combination of NER and text classification to analyze their clinical notes from that appointment and flag “headache,” “anxious,” “alopecia,” and “pain” as PROBLEM entities. From there, the doctor could further categorize those problems by asserting whether each one is present, conditional, or absent. In this case, the headache would be present, the anxiousness would be conditional, and alopecia and pain would be absent.

As this example shows, this application of NLP helps physicians optimize patient care by identifying which problems are actually present and most pressing, so they can prioritize treatment accordingly.
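The assertion logic in the headache example can be approximated with simple NegEx-style rules: split the note into clauses, mark a problem as absent if a negation cue precedes it in its clause, and mark it as conditional if the clause contains a cue such as “when.” The cue lists and clause splitting below are assumptions for illustration; trained assertion models handle far more phrasing than these few words.

```python
import re

# Hypothetical cue lists; NegEx-style systems use much longer ones.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without)\b")
CONDITIONAL_CUES = re.compile(r"\b(when|if|unless)\b")


def assertion_status(note: str, problem: str) -> str:
    """Classify a PROBLEM mention as 'present', 'absent', or 'conditional'."""
    # Split into rough clauses on punctuation and the word "and".
    clauses = re.split(r"[.;,]|\band\b", note.lower())
    target = re.compile(rf"\b{re.escape(problem.lower())}\b")
    for clause in clauses:
        match = target.search(clause)
        if not match:
            continue
        # A negation cue before the problem in the same clause -> absent.
        if NEGATION_CUES.search(clause[:match.start()]):
            return "absent"
        # A conditional cue anywhere in the clause -> conditional.
        if CONDITIONAL_CUES.search(clause):
            return "conditional"
        return "present"
    raise ValueError(f"{problem!r} is not mentioned in the note")


note = ("Patient has had a headache for the past two weeks and feels anxious "
        "when she walks fast. No symptoms of alopecia. "
        "She does not appear to be in any pain.")
for problem in ["headache", "anxious", "alopecia", "pain"]:
    print(problem, "->", assertion_status(note, problem))
```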

Clinical Deidentification Model:
Under the Health Insurance Portability and Accountability Act (HIPAA), healthcare providers, health plans, and other covered entities are required to “protect sensitive patient health information from being disclosed without the patient’s consent or knowledge.”

The exception to this rule is data that has been deidentified, that is, data from which specified individual identifiers, such as name, address, telephone number, and so on, have been removed. Deidentified data is no longer considered Protected Health Information (PHI) because it no longer contains information that can reasonably be used to identify the patient.

Healthcare providers can use NLP to pinpoint content that may contain PHI and then deidentify or obfuscate it by replacing the PHI with semantic tags. In doing so, healthcare organizations can reduce the risk of HIPAA non-compliance.

Image shows an example of PHI that has been deidentified using NLP.
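A rule-based sketch of the tag-replacement idea follows. The regular expressions cover only a few hypothetical PHI formats (U.S.-style dates, phone numbers, and “MRN”-prefixed record numbers), and names are supplied as a list rather than detected; real deidentification pipelines combine trained NER models with rules and must cover every identifier HIPAA specifies.

```python
import re

# Hypothetical patterns for a few PHI formats; not a complete HIPAA list.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE)),
]


def deidentify(text: str, known_names=()) -> str:
    """Replace PHI with semantic tags such as <NAME> or <DATE>."""
    # Names are passed in here; in practice an NER model would detect them.
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "<NAME>", text)
    for tag, pattern in PHI_PATTERNS:
        text = pattern.sub(f"<{tag}>", text)
    return text


note = "John Smith (MRN: 483920) was seen on 03/14/2023; call 617-555-0123."
print(deidentify(note, known_names=["John Smith"]))
```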

Clinical Entity Resolver:
Using natural language processing, healthcare providers can extract information about different conditions and diagnoses from patient records and assign an ICD-10 Clinical Modification (ICD-10-CM) code to them.

ICD-10-CM is the standardized code set used in the United States to classify diagnoses and symptoms, which makes it a valuable resource for cross-referencing a patient’s symptoms and diagnoses. By assigning the appropriate ICD-10-CM code, healthcare providers can track healthcare statistics, quality outcomes, mortality statistics, and more for that particular condition. This, in turn, helps them better understand medical complications, design treatment, and evaluate the outcomes of care.

Image shows an example of different health conditions, or "problems" (in this case, "gestational diabetes mellitus," "type two diabetes mellitus," "T2DM," "prior episode of HTG-induced pancreatitis," and "associated with an acute hepatitis"), and their corresponding ICD-10-CM codes.
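Conceptually, entity resolution maps many surface forms of a condition (“T2DM,” “type two diabetes mellitus”) to a single code. The sketch below does this with a normalized synonym lookup, which is a simplification: the synonym table is hypothetical, and while the codes shown are real ICD-10-CM codes, any production mapping should be validated against the official code set. Many production resolvers instead use embedding similarity so they can match terms they have not seen verbatim.

```python
import re

# Hypothetical synonym table keyed by ICD-10-CM code.
ICD10CM = {
    "E11.9": ("Type 2 diabetes mellitus without complications",
              ["type 2 diabetes mellitus", "type two diabetes mellitus", "t2dm"]),
    "E78.1": ("Pure hyperglyceridemia", ["hypertriglyceridemia", "htg"]),
    "E66.9": ("Obesity, unspecified", ["obesity"]),
    "R63.1": ("Polydipsia", ["polydipsia"]),
}


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so surface variants match."""
    return re.sub(r"\s+", " ", term.lower()).strip()


# Reverse index: normalized synonym -> code.
SYNONYM_INDEX = {
    normalize(synonym): code
    for code, (_, synonyms) in ICD10CM.items()
    for synonym in synonyms
}


def resolve(entity: str):
    """Return (code, description) for an extracted PROBLEM entity, or None."""
    code = SYNONYM_INDEX.get(normalize(entity))
    return (code, ICD10CM[code][0]) if code else None


for entity in ["T2DM", "Type two  diabetes mellitus", "HTG", "migraine"]:
    print(entity, "->", resolve(entity))
```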

Clinical Named Entity Recognition General Model:
Like the Clinical Assertion Model, this version of NER lets healthcare providers analyze clinical notes, extract keywords, and assign them to specific entity types, such as PROBLEM, TEST, or TREATMENT.

For example, given a note stating that the patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL within 24 hours, “insulin drip” and “reduction” would be flagged as TREATMENTs, “euDKA” and “HTG” as PROBLEMs, and “the anion gap” and “triglycerides” as TESTs.

Image shows an example of a Clinical Named Entity Recognition Model, which analyzes the following text: "A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa; significantly, her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dL, bicarbonate 18 mmol/L, anion gap 20, creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, glycated hemoglobin (HbA1c) 10%, and venous pH 7.27. Serum lipase was normal at 43 U/L. Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia. The patient was initially admitted for starvation ketosis, as she reported poor oral intake for three days prior to admission. However, serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL, the anion gap was still elevated at 21, serum bicarbonate was 16 mmol/L, triglyceride level peaked at 2050 mg/dL, and lipase was 52 U/L. The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L; the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again. The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL, within 24 hours. Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use. The patient was seen by the endocrinology service and she was discharged with 40 units of insulin glargine at night, 12 units of insulin lispro with meals, and metformin 1000 mg two times a day. It was determined that all SGLT2 inhibitors should be discontinued indefinitely. She had close follow-up with endocrinology post discharge." Within that text, certain words such as "gestational diabetes mellitus," "HTG-induced pancreatitis," and "polyuria" are flagged as PROBLEMs in purple. Words such as "amoxicillin," "metformin," and "dapagliflozin" are flagged in grey as TREATMENTs, and words such as "physical examination," "serum glucose," and "anion gap" are flagged in blue as TESTs.
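NER models commonly label each token with a BIO tag (B- marks the beginning of an entity, I- its continuation, and O a token outside any entity), and those tags are then grouped into entities. The sketch below decodes hypothetical model output for the insulin drip example into PROBLEM, TEST, and TREATMENT entities; the tags are made up to match that example, not produced by a real model.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities, current, label = [], [], None

    def flush():
        # Emit the entity collected so far, if any.
        if current:
            entities.append((" ".join(current), label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            flush()  # a new entity begins
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current entity
        else:  # "O": outside any entity
            flush()
            current, label = [], None
    flush()
    return entities


tokens = ("treated with an insulin drip for euDKA and HTG with a reduction "
          "in the anion gap and triglycerides").split()
# Hypothetical per-token model output.
tags = ["O", "O", "O", "B-TREATMENT", "I-TREATMENT", "O", "B-PROBLEM", "O",
        "B-PROBLEM", "O", "O", "B-TREATMENT", "O", "B-TEST", "I-TEST", "I-TEST",
        "O", "B-TEST"]
for entity in bio_to_entities(tokens, tags):
    print(entity)
```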

Clinical Named Entity Recognition Posology, shown in the image below, is a more specialized version of the Clinical NER General Model that focuses on medications and how they are taken. Both versions of this application can help identify candidates for clinical trials by filtering patients on drugs and dosages.

Image shows an example of a Clinical Named Entity Recognition Posology, which analyzes the following text: "The patient was prescribed 1 capsule of Advil for 5 days. She was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night, 12 units of insulin lispro with meals, and metformin 1000 mg two times a day. It was determined that all SGLT2 inhibitors should be discontinued indefinitely for 3 months." In this text, "1," "40 units," and "12 units" are flagged in green as DOSAGEs; "capsule" is flagged in red as a FORM; "Advil," "insulin glargine," "insulin lispro," "metformin," and "SGLT2 inhibitors" are flagged in purple as DRUGs; "for 5 days" is flagged in turquoise as the DURATION; and "at night," "with meals," and "two times a day" are flagged in blue as FREQUENCYs.
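A rough approximation of posology extraction can be written with regular expressions. The patterns and drug list below are assumptions tuned to an abridged version of the sample text in the caption above, and the sketch also adds a STRENGTH label for “1000 mg,” which the caption does not mention; a trained posology model generalizes far beyond these few phrasings.

```python
import re

# Hypothetical drug list and patterns tuned to the sample text.
DRUGS = ["insulin glargine", "insulin lispro", "metformin", "advil"]
PATTERNS = {
    "DOSAGE": r"\b\d+(?= capsules?\b)|\b\d+ units?\b",
    "STRENGTH": r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g)\b",
    "FORM": r"\b(?:capsules?|tablets?)\b",
    "FREQUENCY": r"\b(?:at night|with meals|(?:once|twice|\w+ times) a day)\b",
    "DURATION": r"\bfor \d+ (?:days?|weeks?|months?)\b",
}


def extract_posology(text: str):
    """Return (label, span) pairs in the order they appear in the text."""
    lowered = text.lower()  # match case-insensitively, report original casing
    found = []
    for drug in DRUGS:
        for m in re.finditer(rf"\b{re.escape(drug)}\b", lowered):
            found.append((m.start(), "DRUG", text[m.start():m.end()]))
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, lowered):
            found.append((m.start(), label, text[m.start():m.end()]))
    return [(label, span) for _, label, span in sorted(found)]


TEXT = ("The patient was prescribed 1 capsule of Advil for 5 days. She was "
        "discharged on 40 units of insulin glargine at night, 12 units of "
        "insulin lispro with meals, and metformin 1000 mg two times a day.")
for label, span in extract_posology(TEXT):
    print(f"{label:<10}{span}")
```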

Clinical Relation Extraction Model:
Healthcare providers can use NLP to identify the strength, frequency, form, and duration associated with a particular drug. Known as the Clinical Relation Extraction Model, this application works by drawing connections between different entities detected by NLP algorithms. It supports clinical documentation by identifying pertinent data based on the relationships between key words and phrases.

Image shows an example of a Clinical Relation Extraction Model in the form of a chart identifying the different relationships between key words and phrases.
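Relation extraction can be roughly approximated with a proximity heuristic: link each dosage, strength, or frequency to the nearest drug mention. The sketch below uses that heuristic, which is a swapped-in simplification rather than how a trained relation extraction model works (such models typically score candidate entity pairs using context). The entity spans are hand-labeled here as a stand-in for NER output.

```python
def span(text, phrase, label):
    """Locate the first occurrence of phrase: (start, end, label, phrase)."""
    start = text.index(phrase)
    return (start, start + len(phrase), label, phrase)


def gap(a, b):
    """Number of characters between two non-overlapping spans."""
    return b[0] - a[1] if b[0] >= a[1] else a[0] - b[1]


def link_to_drugs(entities):
    """Attach every non-DRUG entity to the closest DRUG mention."""
    drugs = [e for e in entities if e[2] == "DRUG"]
    return [
        (min(drugs, key=lambda d: gap(e, d))[3], e[2], e[3])
        for e in entities
        if e[2] != "DRUG"
    ]


TEXT = ("She was discharged on 40 units of insulin glargine at night, "
        "12 units of insulin lispro with meals, and metformin 1000 mg two times a day.")
entities = [  # hypothetical NER output, in text order
    span(TEXT, "40 units", "DOSAGE"),
    span(TEXT, "insulin glargine", "DRUG"),
    span(TEXT, "at night", "FREQUENCY"),
    span(TEXT, "12 units", "DOSAGE"),
    span(TEXT, "insulin lispro", "DRUG"),
    span(TEXT, "with meals", "FREQUENCY"),
    span(TEXT, "metformin", "DRUG"),
    span(TEXT, "1000 mg", "STRENGTH"),
    span(TEXT, "two times a day", "FREQUENCY"),
]
for drug, relation, value in link_to_drugs(entities):
    print(f"{drug} -[{relation}]-> {value}")
```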

Financial Contract Named Entity Recognition:
This NLP application works in much the same way as the NER examples above, except that it’s applied to financial documents to identify organizations, individuals, monetary sums, dates, and so on. Financial Contract NER enables health insurance providers to automate the financial contract review process and flag potential errors or fraudulent information for review.
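As a small illustration of the financial variant, the sketch below extracts dollar amounts and dates with regular expressions and flags any date that does not exist on the calendar as a potential error. The patterns, the U.S. MM/DD/YYYY format, and the sample clause are assumptions; detecting organizations, individuals, or fraud would require trained models and business rules beyond this sketch.

```python
import re
from datetime import datetime

# Hypothetical patterns: U.S. dollar amounts and MM/DD/YYYY dates.
MONEY = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def extract_contract_entities(text: str):
    """Return (entities, issues) for a contract clause."""
    entities = [("MONEY", m.group()) for m in MONEY.finditer(text)]
    issues = []
    for m in DATE.finditer(text):
        entities.append(("DATE", m.group()))
        try:
            datetime.strptime(m.group(), "%m/%d/%Y")
        except ValueError:
            # e.g. February 30th: well-formed, but not a real date.
            issues.append(f"invalid date: {m.group()}")
    return entities, issues


clause = ("The insurer will reimburse $12,500.00 for services rendered between "
          "01/15/2023 and 02/30/2023, capped at $2,000 per visit.")
entities, issues = extract_contract_entities(clause)
print(entities)
print(issues)
```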

Take Patient Care to the Next Level with Hitachi Solutions

When it comes to providing your patients with exceptional and, in some cases, life-saving care, you can’t afford to let anything stand in your way — especially not unstructured data.

Here at Hitachi Solutions, we’re committed to helping organizations within the healthcare and health insurance industries do more with their data using innovative solutions and services, including natural language processing. All of our offerings come backed by decades of proven data science expertise, and we have the resources to help your organization go further, faster, and at scale.

Are you ready to take patient care to the next level using NLP? There’s no time like the present to get started — contact us today to learn more.

Models referenced in this post were created with Spark NLP. Learn more about it at John Snow Labs (NLP & AI in Healthcare).