Author: Matthew Renze
Published: 2025-04-01

What problems are we currently solving on the road to Artificial General Intelligence?

In the previous article in this series on Artificial General Intelligence (AGI), we learned about the problems in AI research that have been solved over the last few decades. We covered foundational AI topics like neural networks, machine learning, and deep learning.

In this article, we’ll learn about the AI research problems that are currently being solved on the road to AGI. This includes topics like attention, understanding, and agency.

Attention

Attention helps a deep learning model focus on the most relevant parts of the input data. The discovery of self-attention led to a new architecture for Deep Neural Networks (DNNs) called the “Transformer”. Transformers are the foundation for Large Language Models (LLMs) like ChatGPT.

Attention measures how important every word in a sentence is to every other word. For example, the words “small” and “fluffy” are strongly associated with (or “attend” to) “cat”. So, in addition to the words themselves, the model also uses their relative importance to one another when making a prediction.

This allows us to predict the next word in a sentence. For example, consider the sentence “The small fluffy cat sat by the big brown ___”. The transformer might predict the word “dog” because “big” and “brown” strongly attend to words like “dog”, “couch”, and “box”.
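To make this concrete, here is a minimal sketch of self-attention using NumPy. The word vectors below are made up for illustration (real models learn them from data), and this simplified version uses the same vectors as queries, keys, and values:

```python
import numpy as np

# Toy word embeddings (4-dimensional, invented for illustration)
embeddings = {
    "small":  np.array([0.9, 0.1, 0.0, 0.2]),
    "fluffy": np.array([0.8, 0.2, 0.1, 0.1]),
    "cat":    np.array([0.7, 0.3, 0.9, 0.0]),
    "sat":    np.array([0.0, 0.9, 0.1, 0.5]),
}

words = list(embeddings)
X = np.stack([embeddings[w] for w in words])  # one row per word

# Scaled dot-product self-attention (queries = keys = values = X)
scores = X @ X.T / np.sqrt(X.shape[1])        # similarity of every word pair
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
output = weights @ X                          # context-aware word vectors

# How strongly "cat" attends to each word in the sentence
for w, a in zip(words, weights[words.index("cat")]):
    print(f"cat -> {w}: {a:.2f}")
```

Each row of `weights` sums to 1, so every word’s new vector is a weighted blend of all the words it attends to. This is the core computation that transformers repeat across many layers.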

Understanding

When we walk outside, we have a general understanding of the world around us. We understand spatial relationships, temporal cause-and-effect, and basic physics. This natural understanding of our environment comes from many years of observing how our world behaves.

However, AI systems don’t naturally have this ability. They don’t experience the world in the same way that we do. So, they don’t understand concepts like spatial relationships, temporal cause-and-effect, or real-world physics.

To solve this problem, we need to create something inside the AI known as a “world model”. A world model is an internal representation of the external world that allows an AI to understand, predict, and reason about its environment.

It captures the relationships between objects and events, enabling the AI to simulate and predict outcomes based on its internal model of the external world. World models are an important step towards AI systems that can navigate, predict, and understand the world around them.
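A toy example may help. The hand-coded “world model” below predicts how a ball falls under simple physics, letting an AI imagine outcomes without acting in the real world. The physics and numbers here are simplified assumptions for illustration, not a real learned model:

```python
# A minimal "world model": given the current state and an action,
# it predicts the next state without touching the real environment.
# The state is a ball's (position, velocity) under simplified physics.

GRAVITY = -9.8  # m/s^2, acting downward
DT = 0.1        # simulation time step in seconds

def world_model(state, action):
    """Predict the next (position, velocity) after applying an upward push."""
    position, velocity = state
    velocity += (GRAVITY + action) * DT            # action: upward force per unit mass
    position = max(0.0, position + velocity * DT)  # the floor stops the ball
    return (position, velocity)

# The AI can "imagine" the future by rolling the model forward in time
state = (10.0, 0.0)   # ball at 10 m, at rest
for _ in range(5):
    state = world_model(state, action=0.0)  # no push: the ball falls
print(f"predicted position after 0.5 s: {state[0]:.2f} m")
```

Real world models are learned from experience rather than hand-written, but the idea is the same: an internal simulation that answers “what would happen if…?” before the AI acts.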

Agency

Agency is the ability to take actions (autonomously) to achieve a goal of some kind. An agent is anything that has this ability. This can be a person, an animal, or (potentially) a machine — like a self-driving car. In AI, we use a technique called Reinforcement Learning (RL) to create AI agents.

RL involves training an agent to make decisions by rewarding it for desirable actions. An RL agent exists within an environment, like the physical world or a computer simulation. The agent makes observations about the state of the environment. Then, it chooses an action, which it believes will lead it closer to its goal.

Executing an action changes the state of the environment. With the environment now changed, the agent observes the new state and starts the process over again. If the agent achieves its goal, the environment produces a reward. The agent uses this reward as a signal to help it improve over time.
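The loop described above can be sketched in code. Here is a tiny example using Q-learning (one common RL algorithm, chosen here for simplicity): the agent starts at position 0 on a line and learns to walk right to reach a goal. The environment, rewards, and hyperparameters are all invented for this demo:

```python
import random

# A tiny environment: the agent starts at position 0 on a line
# and earns a reward when it reaches the goal at position 4.
GOAL = 4
ACTIONS = [-1, +1]  # step left or step right

# Q-table: estimated future reward for each (state, action) pair
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    state = 0
    while state != GOAL:
        # Choose an action: usually the best known one, sometimes a random one
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        # Executing the action changes the state of the environment
        next_state = min(max(state + action, 0), GOAL)
        reward = 1.0 if next_state == GOAL else 0.0
        # The agent uses the reward signal to improve its estimates
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the agent should prefer stepping right at every position
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])
```

After a few hundred episodes, the reward signal has propagated backward through the Q-table, and the agent reliably chooses “step right” from every starting position.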

Although RL agents have existed for many years, recent advances have led to the creation of AI agents built on top of LLMs like GPT-4. These new agents can observe, reason, and act autonomously in iterative loops. They can also use tools like calculators, web browsers, APIs, and code interpreters to solve more complex problems.
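Here is a minimal sketch of that observe-reason-act loop with one tool (a calculator). The `call_llm` function below is a stand-in: a real agent would query a model such as GPT-4 at that point, but we fake its reasoning with a simple rule so the example is runnable:

```python
# A minimal sketch of an LLM-agent loop with one tool (a calculator).
# call_llm is a hypothetical stand-in for querying a real language model.

def calculator(expression: str) -> str:
    """A 'tool' the agent can invoke to do exact arithmetic."""
    return str(eval(expression, {"__builtins__": {}}))  # demo only; unsafe in production

def call_llm(task: str, observations: list[str]) -> dict:
    """Stand-in for a real LLM: decide the next action from what it has seen."""
    if not observations:                       # nothing computed yet -> use the tool
        return {"action": "calculator", "input": "17 * 24"}
    return {"action": "finish", "input": observations[-1]}  # done -> report result

task = "What is 17 * 24?"
observations = []
while True:  # the observe -> reason -> act loop
    decision = call_llm(task, observations)    # reason: pick the next step
    if decision["action"] == "finish":
        print("Answer:", decision["input"])
        break
    result = calculator(decision["input"])     # act: invoke the tool
    observations.append(result)                # observe: record the tool's output
```

Real agent frameworks add many refinements (memory, multiple tools, error handling), but the essential structure is this same loop: the model reasons about the task, acts through a tool, observes the result, and repeats until it decides it is done.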


Attention, understanding, and agency are important building blocks for modern AI. However, these advances alone are not sufficient for achieving AGI. To do so, we’ll need to solve several other problems that currently exist. We’ll cover these in the next article in our series: AI of Tomorrow.