Author: Matthew Renze
Published: 2024-11-01

What are AI agents?

This is a question I’ve been hearing frequently from my clients.

However, it remains a source of great confusion and misunderstanding.

Unfortunately, the terminology and concepts in this area are still quite fuzzy.

In addition, most people don’t have a clear roadmap for where AI agents are heading.

So, to clarify, here are the main types of agents I’m currently seeing in industry and research.

AI Agent

In simple terms, an agent is anything that can perform actions. More specifically, an agent perceives its environment and chooses actions that maximize its likelihood of achieving some goal. So, an AI agent is a machine that chooses the best actions given what it currently knows.

However, agents can have varying degrees of autonomy in their actions. For example, a chatbot has a low degree of autonomy – it acts step-by-step with a human user. On the other hand, a multi-step, tool-using agent has a much higher degree of autonomy – it acts largely independently of its human user.

Today, when most people say “AI agent” or “agentic AI system” they mean a semi-autonomous software system that uses a large language model (LLM) as its core engine of computation – we’ll discuss these below. However, there are other AI agents that do not use LLMs, such as Reinforcement Learning (RL) agents.
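The definition above can be sketched in a few lines of code. This is a toy illustration, not a real system: the environment, the set of actions, and the scoring heuristic are all hypothetical stand-ins for whatever an actual agent would perceive and optimize.

```python
# A minimal sketch of a rational agent: perceive the current state,
# then choose the action with the highest expected value toward a goal.

def expected_value(state, action):
    """Toy heuristic: prefer actions that move the state toward the goal of 10."""
    return -abs((state + action) - 10)

def choose_action(state, actions):
    """Pick the action that maximizes expected value given what the agent knows."""
    return max(actions, key=lambda a: expected_value(state, a))

state = 3
actions = [-1, 0, 1, 2]
best = choose_action(state, actions)
print(best)  # 2 (moves the state from 3 to 5, closest to the goal of 10)
```

An LLM-based agent replaces the toy scoring function with a language model's judgment, but the underlying perceive-then-choose structure is the same.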

 

Interface Agent

An interface agent is a natural language interface (NLI) for a specific computer subsystem. Here, “natural language” means that you can communicate with the agent in plain English, and it will respond in kind. The term subsystem can refer to an operating system, a file system, a database, an API, etc.

There are two main types of interface agents: query agents and action agents. A query agent answers questions using data contained in a subsystem. For example, a query agent might be a customer support chatbot that uses retrieval augmented generation (RAG) on an FAQ, or one that writes and executes SQL queries against a relational database.
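A RAG-style query agent can be sketched as follows. This is a deliberately simplified illustration: the retrieval step here is a naive word-overlap match, where a real system would use an embedding model and a vector store, and the final answer would come from an LLM rather than a template.

```python
# A minimal sketch of a query agent: retrieve the most relevant FAQ
# entry, then use it as context for the answer.

FAQ = [
    "You can reset your password from the account settings page.",
    "Refunds are processed within 5 business days.",
]

def retrieve(question, docs):
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def query_agent(question):
    context = retrieve(question, FAQ)
    # A real agent would pass `context` plus the question to an LLM here.
    return f"Based on our FAQ: {context}"

print(query_agent("How do I reset my password?"))
```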

Action agents can query a subsystem but can also execute commands. For example, an action agent might add events to a calendar, update customer information in a CRM, or send tailored sales emails to customers. Action agents also include agents that interact with web browsers or desktop applications.
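The key difference for an action agent is that it executes commands against the subsystem rather than only reading from it. The sketch below uses keyword routing and an in-memory calendar purely for illustration; a real agent would use an LLM for intent detection and call an actual calendar or CRM API.

```python
# A minimal sketch of an action agent that routes a natural-language
# request to a command handler that changes state in a subsystem.

calendar = []

def add_event(title):
    calendar.append(title)
    return f"Added '{title}' to the calendar."

def route(request):
    """Naive intent routing; real agents use an LLM for this step."""
    if "calendar" in request.lower() or "schedule" in request.lower():
        # A real agent would extract structured event details with an LLM.
        return add_event(request)
    return "Sorry, I can't handle that request."

print(route("Schedule a demo call for Friday"))
print(len(calendar))  # 1
```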

These agents typically handle simple, single-step tasks involving a single repository of data or subsystem. However, some RAG agents may span multiple text repositories. Interface agents are typically exposed via text-based user interfaces for humans and/or REST APIs using JSON for developers or other agents.

 

Workflow Agent

A workflow agent (or agentic workflow) is a semi-autonomous system that executes a sequence of pre-defined steps, using LLMs for some of the steps in the workflow. Workflow agents automate simple, repetitive, well-defined, multi-step tasks that a human would typically perform.

Workflows are triggered by an event (e.g., inbound email, a schedule, or a manual trigger). Each step in the workflow is then executed using a combination of traditional programming logic and LLMs. The steps that use LLMs may also call interface agents (from above) to answer queries or delegate tasks.
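The trigger-then-steps pattern can be sketched as below. The email-triage scenario, the `call_llm` stub, and the step functions are all illustrative assumptions; the point is the mix of traditional logic (classification, branching) with LLM-backed steps.

```python
# A minimal sketch of a workflow agent: an event triggers a fixed
# sequence of steps, some handled by plain logic and some by an LLM.

def call_llm(prompt):
    """Hypothetical stand-in for a real LLM API call."""
    return f"[LLM draft for: {prompt}]"

def classify(email):            # step 1: traditional programming logic
    return "refund" if "refund" in email.lower() else "general"

def draft_reply(email, topic):  # step 2: LLM-backed step
    return call_llm(f"Reply to a {topic} email: {email}")

def run_workflow(email):
    """Triggered by an inbound-email event."""
    topic = classify(email)
    reply = draft_reply(email, topic)
    if topic == "refund":       # step 3: conditional branch
        return ("escalate", reply)
    return ("auto-send", reply)

action, reply = run_workflow("I would like a refund, please.")
print(action)  # escalate
```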

These agents are ideal for tasks that span multiple subsystems and involve conditional branching (i.e., multiple paths). Workflow agents naturally constrain the agent’s potential actions to just the steps contained in the workflow. However, they still require human oversight to handle errors and edge cases.

 

Autonomous Agent

An autonomous agent continuously performs a three-step loop of observation, reasoning, and action. First, the agent observes its environment. Next, it reasons via a chain of thought (CoT). Then, it executes actions that change the environment. Finally, it uses feedback from the environment as input to the next iteration of the loop.
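The observe-reason-act loop can be sketched as follows. The thermostat-style environment and rule-based reasoning step are toy stand-ins: a real autonomous agent would reason with an LLM chain of thought and act through tools rather than a single method call.

```python
# A minimal sketch of the observe-reason-act loop of an autonomous agent.

class Environment:
    def __init__(self):
        self.temperature = 25

    def observe(self):
        return self.temperature

    def apply(self, action):
        self.temperature += -1 if action == "cool" else 0

def reason(observation, goal=20):
    """Toy reasoning step; a real agent would use an LLM here."""
    return "cool" if observation > goal else "hold"

env = Environment()
for _ in range(10):               # the continuous loop (bounded here)
    obs = env.observe()           # 1. observe the environment
    action = reason(obs)          # 2. reason about what to do
    env.apply(action)             # 3. act; feedback flows into the next loop

print(env.temperature)  # 20
```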

Autonomous agents typically incorporate offline memory, enabling them to learn from past interactions. They often have access to third-party tools like search engines, web browsers, calculators, and code interpreters. They can also delegate tasks to interface and workflow agents (described above).

The scope of these agents typically aligns with well-defined roles in an organization, such as a customer support agent, an HR agent, or a market research agent. Aligning them to bounded contexts in an organization keeps them focused, uses consistent terminology, and avoids responsibility overload.

These stand-alone agents can range from simple, repetitive, and scalable roles (like customer support agents) to more complex and higher-risk roles. However, you should always constrain their actions for safety, ensure human oversight, and require human approval for any steps involving uncertainty or risk.

 

Self-Improving Agent

A self-improving agent represents a more advanced stage of AI. It combines autonomous decision-making with self-improvement capabilities. Unlike an autonomous agent (above), a self-improving agent can augment its own capabilities by modifying its own knowledge and behavior.

When a self-improving agent needs to interact with a new database, API, or subsystem, it reads the documentation and stores what it learned as text in its offline memory. Later, when it works with that subsystem, it retrieves the specific instructions it needs to interface with it effectively.

When encountering a new problem, the agent writes code to solve the problem. Then, it tests the code to verify the correct behavior. Once the code passes all the tests, it gets added to the agent’s skill library. Then, when the agent needs that skill again, the skill is retrieved from the skill library and executed.
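The write-test-store cycle for a skill library can be sketched as below. The "generated" code is hard-coded here for illustration; in a real self-improving agent, an LLM would write it, and the library would persist across sessions.

```python
# A minimal sketch of a skill library: generated code is tested, and
# only stored if its verification tests pass.

skill_library = {}

def add_skill(name, code, tests):
    """Compile the generated skill, run its tests, and store it if they pass."""
    namespace = {}
    exec(code, namespace)              # compile the generated code
    skill = namespace[name]
    if all(skill(x) == y for x, y in tests):
        skill_library[name] = skill    # passed: add to the library
        return True
    return False                       # failed: discard the skill

# Hypothetical LLM-generated skill, with its verification tests.
generated = "def double(x):\n    return x * 2\n"
added = add_skill("double", generated, tests=[(2, 4), (5, 10)])

print(added)                         # True
print(skill_library["double"](21))   # 42
```

Running untrusted generated code with `exec` is unsafe outside a sandbox; a production system would execute skills in an isolated interpreter.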

These agents can also modify their system prompts, tune their hyperparameters, fine-tune their LLM, train new ML models, and create new worker agents. However, they don't need to delegate work to interface or workflow agents (described above) because they learn each interface and workflow independently.

Self-improving agents (a.k.a. autonomous AI engineers, researchers, or scientists) are currently an active area of research. As of November 2024, I haven't seen them implemented successfully in industry yet. However, self-improving agents are a natural progression in the evolution of autonomous AI agents.

 

Autonomous Agency

An autonomous agency is a multi-agent system that functions as a self-operating organization. It’s essentially a (mostly) automated workforce. The agents perform the operational tasks in the organization, while humans oversee processes and assist the agents with outliers and edge cases.

The line-level roles in the organization are filled by the four types of agents (discussed above). However, humans continue to play key roles in these AI organizations. Technicians maintain agents, managers oversee teams of agents, a board of directors guides the organization, and shareholders vote on key decisions.

Fully autonomous agencies do not currently exist and are likely still a few years away. However, I'm already seeing the building blocks of these futuristic organizations beginning to emerge. For example, research projects like Generative Agents and ChatDev, as well as multi-agent frameworks like AutoGen and CrewAI, are early steps in this direction.

A future economy filled with autonomous AI agencies would certainly create significant societal challenges. As a result, we may face new risks from technological unemployment, consolidation of wealth, and socio-economic disenfranchisement. So, we need to start preparing now for what is rapidly approaching.

To learn how to prepare your organization for AI, be sure to check out my presentation on Developing Your AI Strategy.
