May 1, 2021 Author: Matthew Renze

How do we use AI to extract useful information from text?

Text is how we communicate information to one another via written language. It is the primary type of data we encounter in books, articles, and emails. As a result, it is one of the most valuable forms of unstructured data that exist in our world.

In the last two articles of this multi-part series on The AI Developer’s Toolkit, I introduced you to the top AI tools for table analysis and table synthesis. In this article, I’ll introduce you to the three most popular AI tools for text analysis.

Text Classification

Text classification allows us to assign a body of text to one of two or more categories. It answers the question “what kind of text is this?” or “what group does this text belong to?”

For example, we could classify news articles by the industry they pertain to. We provide the text-classification model with a news article as input. Then the model produces a prediction of which industry the article pertains to as output.

Text classification is useful anytime you have a collection of documents and you need to organize them into two or more categories. For example:

  • organizing legal documents based on the type of case they involve
  • categorizing support tickets by the type of issue they pertain to
  • classifying emails as spam or not spam
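
To make the input and output concrete, here is a minimal sketch of a text classifier in plain Python. It uses a naive Bayes model with add-one smoothing, which is only one of many possible techniques and is not necessarily what a production tool uses. The class name, the sample headlines, and the industry labels are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: a few headlines labeled by industry.
train_texts = [
    "Bank raises interest rates as inflation climbs",
    "Investors rally after strong quarterly earnings",
    "New smartphone chip doubles battery life",
    "Startup releases open source software for cloud servers",
    "Hospital trials new vaccine for seasonal flu",
    "Doctors report progress in cancer treatment",
]
train_labels = [
    "finance", "finance",
    "technology", "technology",
    "healthcare", "healthcare",
]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = classifier.predict("Cloud software startup unveils faster chip")
print(prediction)
```

With only six training examples, this sketch is far too small to be useful in practice. It is meant to show the shape of the task: labeled text goes in during training, and a predicted category comes out for new text.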

Sentiment Analysis

Sentiment analysis allows us to determine the emotional sentiment of a body of text. It answers the question “is this text positive or negative?”

For example, we could analyze product reviews to determine if they are favorable or unfavorable. We provide the sentiment-analysis model with a product review as input. Then, the model produces a sentiment score as output. These scores often range from 0 (very negative) to 1 (very positive).

Sentiment analysis is useful anytime you need to determine the emotional sentiment of a body of text. For example:

  • detecting high-priority customer support emails based on the customer’s tone
  • filtering out social-media posts with overly negative sentiment
  • writing emails with a more positive and constructive tone
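
As a rough illustration of a sentiment score on the 0-to-1 scale described above, here is a minimal lexicon-based sketch in plain Python. The word lists and the scoring rule are assumptions made for this example; real sentiment-analysis models are typically trained on labeled data and handle context such as negation, which this sketch does not.

```python
import re

# Hypothetical, tiny sentiment lexicons for illustration only.
POSITIVE_WORDS = {"great", "love", "excellent", "good", "happy", "fast", "recommend"}
NEGATIVE_WORDS = {"bad", "terrible", "broken", "slow", "hate", "poor", "refund"}


def sentiment_score(text):
    """Return a score from 0.0 (very negative) to 1.0 (very positive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive + negative == 0:
        return 0.5  # No sentiment-bearing words found: treat as neutral.
    # Share of sentiment-bearing words that are positive.
    return positive / (positive + negative)


# Hypothetical product reviews.
reviews = [
    "I love this keyboard, the build quality is excellent",
    "Terrible battery and the screen arrived broken",
    "Good sound but slow shipping",
]
for review in reviews:
    print(f"{sentiment_score(review):.2f}  {review}")
```

A simple word count like this would score "not good" as positive, which is one reason trained models are preferred for real workloads.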

Entity Recognition

Entity recognition extracts named entities from a body of text. It answers the question: “what person, place, or thing do these words refer to?”

For example, we can use entity recognition to discover named entities in news articles. We provide the entity-recognition model with the text of each article as input. Then the model produces a list of the named entities and their locations in the text as output.

The words “Microsoft” and “Google” in a business article would clearly refer to their respective companies. However, the word “Amazon” could refer to either the company or the river in South America. So the model needs to use the surrounding context to determine which words correspond to which entities.

Entity recognition is useful anytime you want to determine what entities are contained in a body of text beyond simple word-matching. For example:

  • improving document search using searchable entities
  • adding hyperlinks to entities in articles on your website
  • extracting medical codes from a patient’s diagnosis in their medical records
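
The “Amazon” example can be sketched in a few lines of plain Python. This is a hand-built illustration, not how a trained entity-recognition model works internally: the gazetteer, the context cue words, the entity type names, and the 40-character window are all assumptions made for this example. It does show the same input and output, though: text goes in, and a list of entities with their types and character positions comes out.

```python
import re

# Hypothetical gazetteer: surface form -> candidate entity types.
GAZETTEER = {
    "Microsoft": ["COMPANY"],
    "Google": ["COMPANY"],
    "Amazon": ["COMPANY", "RIVER"],
}

# Hypothetical context cues used to choose among candidate types.
CONTEXT_CUES = {
    "COMPANY": {"shares", "earnings", "ceo", "customers", "cloud", "retailer"},
    "RIVER": {"river", "rainforest", "basin", "brazil", "water", "flows"},
}


def recognize_entities(text, window=40):
    """Return (name, type, start, end) tuples for known names in the text."""
    entities = []
    pattern = r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b"
    for match in re.finditer(pattern, text):
        name = match.group(1)
        candidates = GAZETTEER[name]
        # Collect the words within `window` characters on either side.
        start = max(0, match.start() - window)
        nearby = text[start:match.end() + window].lower()
        context = set(re.findall(r"[a-z]+", nearby))
        # Choose the candidate type with the most matching context cues;
        # ties fall back to the first candidate listed in the gazetteer.
        best = max(candidates, key=lambda t: len(context & CONTEXT_CUES[t]))
        entities.append((name, best, match.start(), match.end()))
    return entities


business = "Amazon shares rose after the retailer reported strong earnings."
geography = "The Amazon flows through the rainforest in Brazil."
print(recognize_entities(business))
print(recognize_entities(geography))
```

The same word is tagged as a company in the first sentence and as a river in the second, because the nearby words differ. A trained model learns these context signals from data rather than from a hand-written cue list.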

Other Tools

Beyond these three common tools, there are a variety of other text-analysis tools. For example:

  • Tone analysis – like sentiment analysis, but for other types of emotions
  • Language recognition – answers the question “what language is this text?”
  • Syntax analysis – parses words into a hierarchical tree representing their syntactical relationships
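
To give a feel for one of these, here is a minimal sketch of language recognition in plain Python. It guesses the language by counting common stop words, which is a simplification chosen for this example; the stop-word lists and the three supported languages are hypothetical, and real tools use much richer statistics and cover many more languages.

```python
import re

# Hypothetical, very small stop-word lists for illustration only.
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "this"},
    "spanish": {"el", "la", "y", "es", "de", "en", "este"},
    "german": {"der", "die", "und", "ist", "von", "in", "das"},
}


def recognize_language(text):
    """Guess the language by counting stop-word matches per language."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        language: sum(token in words for token in tokens)
        for language, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no stop word matched at all, do not guess.
    return best if scores[best] > 0 else "unknown"


print(recognize_language("This is the text of the article"))
print(recognize_language("Este es el texto de la carta"))
print(recognize_language("Das ist der Text von dem Brief"))
```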

As we can see, text-analysis tools allow us to extract useful information from bodies of text.

If you’d like to learn how to use all of the tools listed above, please watch my online course: The AI Developer’s Toolkit.

The future belongs to those who invest in AI today. Don’t get left behind!

