What problems will we need to solve on the road to Artificial General Intelligence?
In the previous article in this series on Artificial General Intelligence (AGI), we learned about the problems in AI research that we’re currently solving. We covered topics of modern AI, such as attention, world models, and agency.
In this article, we’ll explore the AI research problems we will need to solve over the next few years on the road to AGI. This includes topics like generality, reasoning, and embodiment.
We currently live in the era of Artificial Narrow Intelligence (ANI) – AI designed to solve a narrow set of specific problems like face detection, text-to-speech, and gesture recognition. Unlike humans, who can transfer skills across tasks, ANI needs to be retrained from scratch for each new task. For example, an AI trained to ride a bicycle cannot switch to a motorcycle without retraining the model.
Most narrow AI systems handle a single type of input and output, such as image classifiers, which take an image as input and produce a class label as output. However, multi-modal systems use data from various sources, like images, text, audio, and video, to create a richer understanding of the world. This allows them to generalize knowledge across different types of data.
A multi-modal world model combines different types of data to form a more comprehensive understanding of concepts. For example, the word “cat,” a picture of a cat, and the sound of a cat meowing all point to the same general concept within the model. This integration of various data types helps the AI to understand and generalize better, making it capable of reasoning by analogy.
For instance, if we provide a multi-modal model with the sound of crumpling paper, a picture of a trashcan, and the equation for a parabolic arc, it understands that someone is playing a game of trashcan basketball. By developing a more comprehensive world model, multi-modal models can generalize, make better predictions, and understand complex concepts through analogy.
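One way to picture this is a shared embedding space: every modality-specific encoder maps its input to a vector, and inputs that refer to the same concept land near each other. The sketch below is a toy illustration with hypothetical, hand-written 3-D vectors standing in for real encoder outputs; in an actual multi-modal model, the embeddings would come from trained text, image, and audio encoders and have hundreds of dimensions.

```python
import math

# Hypothetical pre-computed embeddings in a shared space (toy 3-D vectors).
# In a real system these would come from modality-specific encoders
# (a text encoder, an image encoder, an audio encoder).
embeddings = {
    "text:cat":   [0.90, 0.10, 0.00],
    "image:cat":  [0.80, 0.20, 0.10],
    "audio:meow": [0.85, 0.15, 0.05],
    "text:car":   [0.10, 0.90, 0.20],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# All three "cat" inputs cluster together, far from "car".
cat_vs_cat = cosine(embeddings["text:cat"], embeddings["image:cat"])
cat_vs_car = cosine(embeddings["text:cat"], embeddings["text:car"])
print(cat_vs_cat > cat_vs_car)  # True
```

Because the word, the picture, and the sound all resolve to nearby points, the model can treat them as one concept and reason across modalities by analogy.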
As humans, we reason through problems using a combination of logic, experience, and intuition. We work step-by-step through a problem to reach a solution. We use our intuition to choose a path to the solution, and if we encounter a dead-end, we can backtrack and try another path.
Unfortunately, narrow AI systems lack these capabilities. They aren’t good at solving complex, multi-step, long-horizon problems, so AI researchers are currently working to overcome this limitation. One way to address this issue is to use process supervision instead of outcome supervision.
With outcome supervision, we only check if the final answer is correct when training the LLM. However, with process supervision, we check each step of its thought process to see if it’s correct. Providing rewards for each correct step in the process, rather than just the outcome, teaches the LLM how to reason step-by-step on various math, logic, and reasoning tasks.
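The difference between the two reward schemes can be made concrete with a toy scoring function. This is a hypothetical illustration, not any particular lab's implementation: outcome supervision emits one reward for the whole chain of thought, while process supervision emits partial credit for every correct step.

```python
# Toy illustration (hypothetical scoring functions).
def outcome_reward(final_correct: bool) -> float:
    # One reward signal for the entire chain of thought.
    return 1.0 if final_correct else 0.0

def process_reward(step_correctness: list[bool]) -> float:
    # One reward per step: the model gets credit for each correct step,
    # even when a later step goes wrong.
    return sum(1.0 for ok in step_correctness if ok) / len(step_correctness)

# A four-step solution whose last step is wrong:
steps_ok = [True, True, True, False]
print(outcome_reward(final_correct=False))  # 0.0
print(process_reward(steps_ok))             # 0.75
```

Under outcome supervision the model learns nothing from the three correct steps; under process supervision it is still rewarded for them, which is what teaches step-by-step reasoning.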
We can also combine process supervision with tree search to generate multiple branches for each step in a chain of thought. Then we see which path leads to the correct solution. Once we have the correct chain of thought, we can use it to create synthetic training data to train a new model using the old model’s correct chains of thought. This new model can reason better than the old one.
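The search step above can be sketched in a few lines. In this toy version (a hypothetical setup, with a trivial arithmetic checker standing in for a learned verifier), each reasoning step has a few candidate continuations; we enumerate every branch, keep only chains where every step checks out, and save those chains as synthetic training data.

```python
import itertools

# Candidate continuations for each step of a chain of thought
# for the problem "(2 + 3) * 4" (toy example).
candidates_per_step = [
    ["2 + 3 = 5", "2 + 3 = 6"],    # step 1 branches
    ["5 * 4 = 20", "5 * 4 = 24"],  # step 2 branches
]

def step_is_correct(step: str) -> bool:
    # Toy arithmetic verifier; a real system would use a trained
    # reward model or an external checker instead of eval().
    lhs, rhs = step.split("=")
    return eval(lhs) == int(rhs)

# Enumerate every branch of the tree and keep fully correct chains.
correct_chains = [
    chain
    for chain in itertools.product(*candidates_per_step)
    if all(step_is_correct(step) for step in chain)
]

# Package the surviving chains as synthetic training examples.
synthetic_data = [
    {"question": "(2 + 3) * 4", "steps": list(chain)}
    for chain in correct_chains
]
print(synthetic_data)
# [{'question': '(2 + 3) * 4', 'steps': ['2 + 3 = 5', '5 * 4 = 20']}]
```

Note that verifying every step, not just the final answer, is what filters out the chain that reaches 20 through a wrong intermediate step; fine-tuning a new model on `synthetic_data` is the distillation step described above.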
We typically think of the body and mind as separate entities. However, the theory of embodied cognition suggests the mind not only influences the body, but the body also influences the mind. On this view, a mind cannot exist without a body; it needs physical embodiment to function.
In AI, embodied cognition implies that achieving AGI might require a physical body. AI needs to be embedded in a physical body or a virtual avatar to interact with the world and gain sensorimotor experiences. This helps the AI develop a comprehensive world model through direct interactions and feedback from its environment.
To create AGI, researchers are exploring embodied AI systems (i.e., robots with AI). We start by training simple virtual organisms in simulations, evolving them to learn basic world models. Next, we give the agents humanoid bodies, then more realistic robotic bodies, and eventually finer sensors and actuators so they can learn precise motor control and manual dexterity.
Finally, these agents are placed in physical robot bodies, transferring their skills from the simulation to the real world and then refining their skills based on real-world physics. The goal is to create an AGI in humanoid robot form that can outperform humans in mental and physical tasks. However, to get to AGI, we will still need to overcome challenges like Moravec’s paradox.
Generality, reasoning, and embodiment are all important building blocks for AGI. However, these advances alone may not be sufficient to achieve AGI. So, we might need to solve other problems in AI research to reach our goal. We’ll cover these in our next article in the series: AI of the Future [coming June 1].
To learn more, please check out the next article in this series and watch my video on Artificial General Intelligence: The Road to Human-Level AI.