August 1, 2021 Author: Matthew Renze

How do we use AI to extract useful information from videos?

The real world isn’t composed of static images and audio snippets. Instead, we perceive the world as a rich multimedia experience.

When we combine a sequence of images over time, synced with audio, we get video. Video allows AI to perceive its world as a continuous and fluid audio-visual experience.

In the previous two articles of this multi-part series on The AI Developer’s Toolkit, I introduced you to the top AI tools for image analysis and image synthesis. In this article, I’ll introduce you to the three most popular AI tools for video analysis.

Motion Detection

Motion detection allows us to identify movement in a video over time. It answers the question, “is anything moving in this video?”

For example, we can determine if anything is moving within a masked region of a security video. We provide the motion-detection model with a video and a polygon mask for the detection region as input. Then the model produces a motion label and confidence score for each frame as output.
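
As a rough illustration, here is a minimal sketch of this idea using classical background subtraction in OpenCV rather than a managed video-analysis service; the video path and polygon coordinates are placeholders, and the score threshold is arbitrary.

```python
import cv2
import numpy as np

# Minimal motion-detection sketch: flags frames with movement inside a polygon mask.
# The video path and polygon coordinates below are placeholders.
video = cv2.VideoCapture("security_camera.mp4")
polygon = np.array([[100, 100], [500, 100], [500, 400], [100, 400]], dtype=np.int32)

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
mask = None
frame_index = 0

while True:
    ok, frame = video.read()
    if not ok:
        break

    # Build the detection-region mask once, matching the frame size
    if mask is None:
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        cv2.fillPoly(mask, [polygon], 255)

    # Foreground pixels indicate change relative to the learned background
    foreground = subtractor.apply(frame)
    foreground = cv2.bitwise_and(foreground, foreground, mask=mask)

    # Treat the fraction of changed pixels inside the region as a rough confidence score
    confidence = cv2.countNonZero(foreground) / cv2.countNonZero(mask)
    label = "motion" if confidence > 0.01 else "no motion"
    print(f"frame {frame_index}: {label} (score={confidence:.3f})")
    frame_index += 1

video.release()
```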

Motion detection is useful anytime you need to determine if something is moving within a region of a video. For example:

  • detecting people moving in surveillance videos
  • monitoring the movement of wildlife on a game preserve
  • helping collaborative robots detect humans moving within their workspace

Object Tracking

Object tracking allows us to track the movement of an object over time. It answers the question, “how are these objects moving?”

For example, we can use object tracking to track the position, velocity, and acceleration of objects moving in a video. We provide the object-tracking model with a video as input. Then the model produces a sequence of bounding boxes and a corresponding object ID for each object being tracked as output.
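
As a rough illustration, here is a minimal tracking-by-detection sketch in Python. The SimpleTracker class and its IoU matching are simplified stand-ins for what production trackers do (they typically add motion models such as Kalman filters), and the per-frame boxes are assumed to come from any object-detection model.

```python
from itertools import count

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

class SimpleTracker:
    """Assigns a persistent ID to each detected box by matching it
    to the best-overlapping box from the previous frame."""
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}          # object ID -> last known box
        self.next_id = count(1)

    def update(self, detections):
        assigned = {}
        unmatched = dict(self.tracks)
        for box in detections:
            # Find the existing track that best overlaps this detection
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev_box in unmatched.items():
                overlap = iou(box, prev_box)
                if overlap > best_iou:
                    best_id, best_iou = track_id, overlap
            if best_id is None:
                best_id = next(self.next_id)   # a new object entered the scene
            else:
                unmatched.pop(best_id)
            assigned[best_id] = box
        self.tracks = assigned
        return assigned  # object ID -> bounding box for this frame

# Tiny demo: two boxes that drift slightly between frames keep their IDs
tracker = SimpleTracker()
print(tracker.update([(10, 10, 50, 50), (200, 80, 260, 160)]))
print(tracker.update([(12, 11, 52, 51), (205, 82, 265, 162)]))
```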

Object tracking is useful anytime you need to know how an object moves in a video over time. For example:

  • tracking people walking in a surveillance video
  • following the faces of moving participants during a video call
  • avoiding pedestrians in a self-driving car

Action Recognition

Action recognition allows us to classify various actions occurring in a video. It answers the question, “what’s happening in this video?”

For example, we can use action recognition to understand human activities occurring in a webcam. We provide the action-recognition model with a video containing various human activities as input. Then the model produces an activity label and a confidence score as output.
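
As a rough illustration, here is a minimal sketch using a pretrained video classifier from torchvision (assuming a recent version with the weights API). The random clip tensor is a placeholder for real frames sampled from a webcam and normalized for the model.

```python
import torch
from torchvision.models.video import r3d_18

# Minimal action-recognition sketch using torchvision's pretrained R3D-18,
# which classifies short clips into the 400 Kinetics action categories.
model = r3d_18(weights="DEFAULT")
model.eval()

# A clip is a tensor of shape (batch, channels, frames, height, width).
# This random tensor is a placeholder; a real clip would be sampled from a
# webcam or video file and normalized with the Kinetics mean and std.
clip = torch.rand(1, 3, 16, 112, 112)

with torch.no_grad():
    logits = model(clip)
    probs = torch.softmax(logits, dim=1)

confidence, class_index = probs.max(dim=1)
print(f"predicted class index: {class_index.item()} "
      f"(confidence={confidence.item():.3f})")
```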

Action recognition is useful anytime you need to detect what’s happening in a video. For example:

  • detecting unlawful activities in a surveillance video
  • analyzing activities occurring in sports videos
  • coordinating collaboration between humans and robots

Other Tools

Beyond the three video-analysis tools that we’ve seen so far, there are a variety of other tools as well. For example:

  • Optical flow – which allows us to estimate the apparent motion of pixels from one frame to the next, tracing the paths of moving objects over time (see the sketch after this list)
  • Gait recognition – which allows us to identify a person based on the way they walk
  • Lip reading – which allows us to convert the lip movements of a speaker into the text of their spoken words
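
As a rough illustration of the first of these, here is a minimal dense optical-flow sketch using OpenCV’s Farneback method; the video path is a placeholder.

```python
import cv2

# Dense optical-flow sketch (Farneback method): estimates a per-pixel motion
# vector between consecutive frames. The video path is a placeholder.
video = cv2.VideoCapture("traffic.mp4")

ok, previous = video.read()
if not ok:
    raise SystemExit("could not read video")
previous_gray = cv2.cvtColor(previous, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = video.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # flow[y, x] holds the (dx, dy) displacement of that pixel between frames
    flow = cv2.calcOpticalFlowFarneback(
        previous_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # Summarize how much motion occurred in this frame
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print(f"mean motion magnitude: {magnitude.mean():.2f} pixels")

    previous_gray = gray

video.release()
```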


As we can see, video-analysis tools allow us to extract useful information from digital video.

If you’d like to learn how to use all of the tools listed above, please watch my online course: The AI Developer’s Toolkit.

The future belongs to those who invest in AI today. Don’t get left behind!

Start Now!
