July 1, 2021 Author: Matthew Renze

How do we use AI to extract useful information from images?

Our world is highly visual. We derive most of our information about the world through our eyes.

The spaces that we navigate, the faces we interact with, and the documents we read are all processed visually. As a result, digital images are one of the most valuable types of unstructured data for modern AI.

In the last two articles of this multi-part series on The AI Developer’s Toolkit, I introduced you to the top AI tools for audio analysis and audio synthesis. In this article, I’ll introduce you to the three most popular AI tools for image analysis.

Image Classification

Image classification allows us to assign an image to one of two or more labeled categories. It answers the question, “what is contained in this image?”

For example, we can use image classification to tag the content contained in an image. We provide the image-classification model with an image as input. Then the model produces a category label and a confidence score as output.

Image classification is useful anytime you are trying to assign a categorical label (or multiple tags) to a collection of images. For example:

  • auto-tagging images on social-media posts
  • detecting product defects via visual inspection
  • diagnosing medical conditions, such as certain types of skin cancer
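
For instance, here's a minimal sketch of how you might call an off-the-shelf image classifier in Python using the Hugging Face transformers library. The file name cat.jpg is just a placeholder, and the pipeline downloads a default pre-trained model the first time you run it.

```python
# A minimal sketch of image classification with a pre-trained model.
# "cat.jpg" is a placeholder path for your own image file.
from transformers import pipeline

# Load a general-purpose image-classification pipeline
# (downloads a default pre-trained model on first use).
classifier = pipeline("image-classification")

# The model returns a list of candidate labels with confidence scores.
predictions = classifier("cat.jpg")

for prediction in predictions:
    print(f"{prediction['label']}: {prediction['score']:.2f}")
```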

Object Detection

Object detection allows us to identify the location of various objects in an image. It answers the question, “where are the objects located in this image?”

For example, we can use object detection to identify various items contained in an image. We provide the object-detection model with an image as input. Then the model produces the coordinates of a bounding box (along with a category label and confidence score) for each object it detects as output.

Object detection is useful anytime you have images with multiple objects that need to be located. For example:

  • counting the number of objects in a photo
  • detecting people in surveillance videos
  • detecting obstacles around a self-driving car
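
For instance, here's a minimal sketch of object detection in Python, again using the Hugging Face transformers library. The file name street.jpg is a placeholder, and the pipeline downloads a default pre-trained detection model on first use.

```python
# A minimal sketch of object detection with a pre-trained model.
# "street.jpg" is a placeholder path for your own image file.
from transformers import pipeline

# Load a general-purpose object-detection pipeline.
detector = pipeline("object-detection")

# Each detection includes a label, a confidence score,
# and a bounding box in pixel coordinates.
detections = detector("street.jpg")

for detection in detections:
    box = detection["box"]
    print(f"{detection['label']} ({detection['score']:.2f}): "
          f"top-left ({box['xmin']}, {box['ymin']}), "
          f"bottom-right ({box['xmax']}, {box['ymax']})")
```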

Face Recognition

Face recognition allows us to identify a person contained in an image by their facial features. It answers the question, “who is in this image?”

For example, we can use face recognition to determine who is contained in our photos. We provide the face-recognition model with an image as input. Then the model produces the identity of the person contained in the image as output.

Face recognition is useful anytime you need to identify people in images. For example:

  • identifying customers as they enter your store
  • recognizing the occupants of your office building
  • tagging your friends in photos on social media
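
For instance, here's a minimal sketch of face recognition in Python using the open-source face_recognition library. The file names and the person's name (Alice) are placeholders for your own reference photos.

```python
# A minimal sketch of face recognition with the face_recognition library.
# "alice.jpg" and "group_photo.jpg" are placeholder image paths.
import face_recognition

# Encode the face from a reference photo of a known person.
known_image = face_recognition.load_image_file("alice.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Encode every face found in a new, unlabeled photo.
unknown_image = face_recognition.load_image_file("group_photo.jpg")
unknown_encodings = face_recognition.face_encodings(unknown_image)

# Compare each detected face against the known person's encoding.
for encoding in unknown_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    print("Found Alice!" if match else "Not a match.")
```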

Other Tools

Beyond the three examples that we’ve seen so far, there are also a variety of other image-analysis tasks. For example:

  • Reverse image search – which allows us to find images that are visually similar to a source image
  • Image captioning – which generates a text description of what is contained in an image
  • Image segmentation – which is like object detection but assigns a type of object to every pixel in an image
  • Face-analysis tools – which allow us to detect faces, compare faces, detect facial landmarks, determine gender and age, detect facial features, and classify emotions
  • Body-analysis tools – which allow us to estimate pose, recognize gestures, count fingers, and detect adult or racy content
  • Document-analysis tools – which allow us to extract printed text, handwritten text, form data, and tables from documents (see the sketch after this list)
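
For instance, here's a minimal sketch of the document-analysis case: extracting printed text from a document image in Python, using the Tesseract OCR engine through the pytesseract wrapper. The file name invoice.png is a placeholder, and Tesseract itself must be installed on your machine.

```python
# A minimal sketch of extracting printed text from a document image
# using the Tesseract OCR engine via the pytesseract wrapper.
# "invoice.png" is a placeholder path for your own scanned document.
from PIL import Image
import pytesseract

# Run OCR on the document image and print the extracted text.
document = Image.open("invoice.png")
text = pytesseract.image_to_string(document)
print(text)
```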



As we can see, image-analysis tools allow us to extract useful information from digital images.

If you’d like to learn how to use all of the tools listed above, please watch my online course: The AI Developer’s Toolkit.

The future belongs to those who invest in AI today. Don’t get left behind!

Start Now!
