What is the best way to get started with AI agents?
Recently, I’ve had a recurring conversation with my clients.
Everyone wants to build AI agents, but they don’t know where to start.
So, to help you begin your journey, here are the exercises I recommend to get started with AI agents.
When you’re first getting started, the best place to begin is with concepts and terminology. I covered the definition of an agent and explained the various types of agents in my last article in this series. So, I recommend you start there for the basics. This article will build upon those concepts.
Next, begin working side-by-side with a large language model (LLM). You and the LLM will work together to solve day-to-day problems in your job. Learn the LLM’s capabilities and limitations, learn which prompting techniques work and which don’t, and learn how to prevent model hallucinations.
As an exercise, use ChatGPT to co-author a document, use GitHub Copilot as a coding assistant, or have Claude Opus help you find and fix a bug in your code. The goal is to learn how to communicate effectively with the LLM to solve problems together. The technical term for this is prompt engineering.
Next, create an agent that performs retrieval augmented generation (RAG). RAG is a technique where we query text from a database and use that text to ground the model with factually correct information. This prevents the model from hallucinating imaginary facts when it doesn’t know the correct answer.
As an exercise, build a quick customer support agent based on an FAQ. Process the text in the FAQ and store it in a vector database (like Pinecone). Then, when a user asks the agent questions, the agent retrieves relevant information from the vector database and generates answers grounded in the text of the FAQ.
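The retrieval half of this exercise can be sketched without any external services. The code below is a minimal, self-contained illustration: the “embedding” is a toy bag-of-words vector and the LLM call is omitted, so it shows the shape of RAG rather than a production build (which would use a real embedding model and a vector database like Pinecone).

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. A real build would call an
    # embedding model and store the vectors in a vector database.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

FAQ_CHUNKS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
    "Shipping is free on orders over 50 dollars.",
]
INDEX = [(chunk, embed(chunk)) for chunk in FAQ_CHUNKS]

def retrieve(question):
    # Return the FAQ chunk most similar to the question.
    q = embed(question)
    return max(INDEX, key=lambda item: cosine(q, item[1]))[0]

def grounded_prompt(question):
    # The retrieved FAQ text grounds the model so it answers from facts,
    # not imagination; this string is what you would send to the LLM.
    return (f"Answer using ONLY this context:\n{retrieve(question)}\n\n"
            f"Question: {question}")

print(retrieve("When are refunds issued?"))
```

In a real agent, `grounded_prompt` would be sent to the model, and the FAQ text in the context is what keeps the answer factual.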
RAG works well when the data are text documents. However, much of an organization’s knowledge is contained in structured or semi-structured data. So, the next step is to build an agent that can query a database. This can be a relational database, a no-SQL database, a graph database, a spreadsheet, etc.
As an exercise, create an agent that can query a relational database by writing and executing SQL queries. The agent’s system prompt will contain information about the tables, the relationships, the fields, and their data types. You may also need to provide example SQL queries to act as a guide.
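Here is one way this exercise can look in miniature, using an in-memory SQLite database. The model call is stubbed with a canned query; in a real agent you would send `SYSTEM_PROMPT` plus the user’s question to an LLM and parse the SQL out of its reply.

```python
import sqlite3

# System prompt describing the schema, as suggested above.
SYSTEM_PROMPT = """You translate questions into SQLite queries.
Table customers(id INTEGER PRIMARY KEY, name TEXT, country TEXT)
Example: "How many customers?" -> SELECT COUNT(*) FROM customers;"""

def llm_write_sql(question):
    # Stub standing in for a real model call.
    return "SELECT COUNT(*) FROM customers WHERE country = 'US';"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers(name, country) VALUES (?, ?)",
                 [("Ada", "US"), ("Bo", "UK"), ("Cy", "US")])

sql = llm_write_sql("How many US customers do we have?")
assert sql.lstrip().upper().startswith("SELECT")  # guardrail: read-only queries
count = conn.execute(sql).fetchone()[0]
print(count)
```

Note the simple guardrail: rejecting anything that isn’t a `SELECT` keeps a query-writing agent from modifying data.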
Querying data is a useful skill. However, the end goal is to create agents that can independently choose actions. So, the next step is to create a simple agent that can execute actions that change the state of a system. This can include updates to a database, calls to an API, clicking a button in a web browser, etc.
As an exercise, create an agent that adds events to a calendar, updates customer information in a CRM, or sends custom sales emails to clients. Start with simple actions first before tackling multi-step actions. The goal is to get the agent to execute actions in a way that is safe, reliable, and effective.
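A safe pattern for action-taking agents is to have the model emit a structured tool call and have your code validate it against an allowlist before executing. The sketch below uses a plain Python list as a stand-in for a real calendar API, and the model call is stubbed with a canned JSON response.

```python
import json

CALENDAR = []  # stand-in for a real calendar API

def add_event(title, date):
    CALENDAR.append({"title": title, "date": date})
    return f"Added '{title}' on {date}"

# Guardrail: the agent may only invoke actions on this allowlist.
ALLOWED_ACTIONS = {"add_event": add_event}

def llm_choose_action(request):
    # Stub for a real model call that would return a JSON tool invocation.
    return json.dumps({"action": "add_event",
                       "args": {"title": "Dentist", "date": "2025-07-01"}})

def execute(request):
    call = json.loads(llm_choose_action(request))
    action = ALLOWED_ACTIONS.get(call["action"])
    if action is None:
        return "Refused: action not on the allowlist."
    return action(**call["args"])

print(execute("Put my dentist appointment on the calendar for July 1st."))
```

The allowlist plus structured JSON output is what makes the action safe and reliable: the model proposes, but only vetted code executes.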
A workflow agent is a sequence of steps that complete a task using a combination of traditional programming logic and LLMs. Workflows are triggered by an event (e.g., an inbound email). Then, each step in the workflow is completed. The workflow concludes when you reach a terminal state.
As an exercise, automate a simple multi-step task that you perform regularly. Decompose the task into a sequence of steps. If you can write code to complete the step, then use code. If not, use an LLM with a system prompt. Low-code/no-code frameworks like n8n and VectorShift are a great place to start.
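Before reaching for a framework, it can help to see the skeleton in plain code. This sketch handles an inbound email: the classification step stands in for an LLM call with a system prompt, while the extraction step is ordinary deterministic code, exactly the split described above.

```python
def classify(email):
    # Stub for an LLM step: label the email "invoice" or "other".
    return "invoice" if "invoice" in email["subject"].lower() else "other"

def extract_amount(email):
    # Deterministic step: plain code, no LLM needed.
    for word in email["body"].split():
        if word.startswith("$"):
            return float(word.strip("$.,"))
    return None

def run_workflow(email):
    # Trigger (inbound email) -> classify -> extract -> terminal state.
    if classify(email) != "invoice":
        return {"status": "ignored"}
    amount = extract_amount(email)
    return {"status": "recorded", "amount": amount}

result = run_workflow({"subject": "Invoice #42",
                       "body": "Please pay $150.00 by Friday."})
print(result)
```

Tools like n8n wire up these same steps visually; the underlying structure of trigger, steps, and terminal state is identical.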
Executing pre-defined actions is useful, but what if we need an agent to choose an action from a set of multiple possible actions? The solution is to create a reasoning agent: an LLM that uses a chain of thought (CoT) to think step-by-step through a problem. Here, the solution it reasons its way to is the correct action.
As an exercise, create a simple problem-solving agent. Create a prompt that instructs the LLM to think step-by-step through a problem. Then, give the agent a set of multiple-choice questions that require logical reasoning to solve. If the agent is reasoning correctly, it will choose the right answer.
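The essential pieces of this exercise are the CoT prompt and the grading logic. In the sketch below, the model reply is stubbed with a plausible reasoning trace; the useful part to study is how the prompt asks for a parseable final line and how the code grades only that line, ignoring the reasoning text.

```python
COT_PROMPT = """Think step-by-step, then give your final choice on the
last line in the form 'Answer: <letter>'.

Question: {question}
Options: {options}"""

def llm_reason(prompt):
    # Stub for a real model call; a capable LLM would produce reasoning
    # like this, ending with a parseable final line.
    return ("All squares have four equal sides. A rectangle only needs "
            "opposite sides equal. So every square is a rectangle.\n"
            "Answer: B")

def solve(question, options):
    reply = llm_reason(COT_PROMPT.format(question=question, options=options))
    # Grade only the final 'Answer:' line, not the reasoning text.
    for line in reversed(reply.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.split(":", 1)[1].strip()
    return None

print(solve("Is every square a rectangle?", "A) No  B) Yes"))  # B
```

Asking for a fixed answer format is what makes it possible to score the agent automatically across a whole question set.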
An LLM’s memory is limited to the text contained in the current conversation – called in-context memory. An out-of-the-box LLM can’t recall anything it’s seen from one conversation to the next. So, it can’t learn anything new. The solution is to give the LLM an offline (i.e., out-of-context) memory.
As an exercise, create a chatbot that remembers information from previous conversations. At the end of each conversation, have the agent chunk, embed, and store the transcript in a vector database. Then, during future conversations, use RAG to retrieve relevant information and add it to the context window.
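The memory loop can be sketched in a few lines. Here a plain list stands in for the vector database, and word overlap stands in for embedding similarity; the structure — store past conversations, recall the most relevant one, prepend it to the context — is the part that carries over to a real build.

```python
import re

MEMORY = []  # stand-in for a vector database of past-conversation chunks

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def store_conversation(text):
    # Real build: chunk, embed, and upsert into a vector database.
    MEMORY.append(text)

def recall(message, k=1):
    # Toy similarity: word overlap stands in for embedding distance.
    return sorted(MEMORY, key=lambda m: len(tokens(m) & tokens(message)),
                  reverse=True)[:k]

def build_context(message):
    # Inject recalled memories into the context window before the LLM call.
    memories = "\n".join(recall(message))
    return f"Relevant past conversations:\n{memories}\n\nUser: {message}"

store_conversation("User said their dog is named Biscuit.")
store_conversation("User prefers vegetarian recipes.")
print(build_context("What was my dog's name again?"))
```

This is just RAG applied to the agent’s own history: the model still has no weights-level memory, but the retrieved text gives it effective recall.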
The ability to use complex tools is one of the most important abilities that separates humans from other animals. Tools make us more capable. Coincidentally, giving agents access to tools also significantly increases their capabilities. So, the next step is to create an agent that can work with various tools.
As an exercise, create an agent that uses a search engine to search the web. Next, create an agent that uses a web browser to navigate websites. Then, create an agent that uses a calculator to perform more complex math. There are hundreds of external tools that you can provide to an LLM.
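Under the hood, tool use usually means a registry of named tools plus a dispatch step. In this sketch the model’s tool choice is stubbed and the search tool is a placeholder; real frameworks expose tools to the LLM through function-calling schemas, but the dispatch logic looks much the same.

```python
TOOLS = {
    # eval with empty builtins is for this demo only; use a proper
    # math parser for anything beyond a toy.
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
    "search": lambda query: f"(stub) top result for '{query}'",
}

def llm_pick_tool(request):
    # Stub for the model deciding which tool to call and with what input.
    return {"tool": "calculator", "input": "12 * 7"}

def run(request):
    choice = llm_pick_tool(request)
    tool = TOOLS[choice["tool"]]
    return tool(choice["input"])

print(run("What is twelve times seven?"))  # 84
```

Adding a new capability to the agent is then just a matter of registering another entry in `TOOLS` and describing it to the model.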
Reasoning step-by-step is useful for short multi-step problems. However, longer-horizon problems often require an upfront plan. Plans allow the agent to see the bigger picture as it moves toward a goal. Plans are not static, though: the agent should periodically inspect and update its plan based on feedback.
As an exercise, create a planning agent that works through a long-horizon problem. For example, take a task that you perform that involves many steps. Have the agent develop a plan of action before beginning the task. Then, after each step, have the agent update its plan based on new information.
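The plan-then-revise loop is the core of this exercise. In the sketch below, the planner, the step executor, and the reviser are all stubs for LLM calls; what matters is the structure: plan up front, act, then update the remaining plan after each step’s feedback.

```python
def make_plan(goal):
    # Stub for an LLM drafting an upfront plan for the goal.
    return ["gather requirements", "draft report", "review report", "send report"]

def execute_step(step):
    # Stub: pretend the review step finds that the draft needs another pass.
    return "needs revision" if step == "review report" else "done"

def revise_plan(plan, step, feedback):
    # Stub for the LLM updating its remaining plan based on new information.
    if feedback == "needs revision":
        return ["revise draft"] + plan
    return plan

def run(goal):
    plan = make_plan(goal)
    log = []
    while plan:
        step = plan.pop(0)
        feedback = execute_step(step)
        log.append((step, feedback))
        plan = revise_plan(plan, step, feedback)
    return log

for step, feedback in run("send the quarterly report"):
    print(step, "->", feedback)
```

Notice that the “revise draft” step never appeared in the original plan; it was inserted mid-flight in response to feedback, which is exactly what distinguishes a planning agent from a fixed workflow.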
LLMs are pretty decent programmers. They can write code to solve problems and execute that code using a code interpreter (like a Python interpreter). They can also write and execute tests to verify the functionality of the code they write. Beyond this, they can also refactor, optimize, and fix bugs in their code.
As an exercise, create an agent that can solve simple puzzles by writing and executing Python code. There are several ways to parse code from the LLM’s output, feed it into an interpreter, and append the results to the LLM’s input. Just be sure to put guardrails on what code the agent can and cannot execute.
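Here is one way to wire up that loop. The model’s reply is stubbed; the real work shown is parsing the fenced code block out of the reply and applying a crude keyword guardrail before `exec()`. A production agent should run model-written code in a proper sandbox, not behind a blocklist like this.

```python
import re

TICKS = "`" * 3  # the ```python fence delimiter, built up to keep this sketch tidy
FORBIDDEN = ("import", "open(", "exec", "eval", "__")

def llm_solve(puzzle):
    # Stub for a model reply containing a fenced Python block.
    return ("I'll sum the first 100 integers.\n"
            f"{TICKS}python\nresult = sum(range(1, 101))\n{TICKS}")

def run_agent(puzzle):
    reply = llm_solve(puzzle)
    match = re.search(TICKS + r"python\n(.*?)" + TICKS, reply, re.DOTALL)
    code = match.group(1)
    if any(token in code for token in FORBIDDEN):
        return "Refused: code failed the guardrail check."
    # Expose only the builtins the puzzle needs.
    namespace = {"__builtins__": {"sum": sum, "range": range}}
    exec(code, namespace)  # never do this with untrusted code outside a sandbox
    return namespace.get("result")

print(run_agent("What is 1 + 2 + ... + 100?"))  # 5050
```

In a full loop, the returned value would be appended to the conversation so the model can check its own answer or try again.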
When we encounter a new problem, we figure out how to solve it. Then, when we encounter a similar problem again, we use our previously learned skills to solve the new problem. Agents are the same. We can allow them to create new skills, store them in a skill library, and retrieve the skills when needed.
As an exercise, create an agent that writes functions in Python to solve new problems it encounters – like we did above. This time, however, store that code in a skill library using a vector database and RAG. When the agent encounters a similar problem, it will retrieve the corresponding skill and execute the code.
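A skill library boils down to storing (description, code) pairs and retrieving by similarity. In this sketch a list stands in for the vector database and word overlap stands in for embedding similarity, and the guardrails from the previous exercise are omitted for brevity.

```python
import re

SKILLS = []  # stand-in for a vector database of (description, code) pairs

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def learn_skill(description, code):
    # Real build: embed the description and upsert it into the vector DB.
    SKILLS.append((description, code))

def retrieve_skill(problem):
    # Toy retrieval: word overlap stands in for embedding similarity.
    return max(SKILLS, key=lambda s: len(tokens(s[0]) & tokens(problem)))

def solve(problem, **inputs):
    description, code = retrieve_skill(problem)
    namespace = dict(inputs)
    exec(code, namespace)  # sandbox/guardrails omitted for brevity
    return namespace["result"]

learn_skill("reverse a string", "result = text[::-1]")
learn_skill("sum a list of numbers", "result = sum(numbers)")

print(solve("please reverse a string for me", text="agent"))  # tnega
```

Once a new problem is solved, the agent calls `learn_skill` on its own working code, so the library grows with experience.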
An autonomous agent performs a continuous three-step loop of observation, reasoning, and action. It typically also incorporates offline memory, tool use, a skill library, and so on. So, this exercise is your opportunity to put everything you’ve learned together into a single agent.
First, the agent observes its environment. The agent’s environment can include the operating system, local file system, databases, APIs, the internet, and possibly even the physical world – in the case of robots. The state of the environment is provided to the agent via text descriptions or JSON.
Next, the agent reasons via a chain of thought (CoT). The agent needs to get from the current step to the next step in the task. So, it thinks step-by-step through the problem to choose the best possible next action. The agent may also retrieve knowledge from its offline memory at this point.
Then, the agent executes an action. Actions are typically performed using tools that the agent has access to. They can also include writing and executing code. Or, it may involve executing an existing function in its skill library. Executing an action changes the state of the environment in some way.
Finally, the agent uses feedback from the environment. The feedback helps the agent decide what to do next to get another step closer to its goal. The agent continuously repeats this three-step loop of observation, reasoning, and action until it reaches a goal state or gets stuck and asks for assistance.
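The loop described above can be sketched in a toy one-dimensional world. The “reasoning” step here is a stub for a CoT prompt to an LLM, and the environment is just a dictionary, but the skeleton — observe the state, choose an action, act, use the feedback, stop at the goal or give up — is the same one a full autonomous agent runs.

```python
def observe(env):
    # State reaches the agent as structured data (text or JSON in practice).
    return {"position": env["position"], "goal": env["goal"]}

def reason(observation):
    # Stub for CoT reasoning (and offline-memory lookups) choosing an action.
    if observation["position"] < observation["goal"]:
        return "move_right"
    if observation["position"] > observation["goal"]:
        return "move_left"
    return "stop"

def act(env, action):
    # Executing an action changes the state of the environment.
    if action == "move_right":
        env["position"] += 1
    elif action == "move_left":
        env["position"] -= 1
    return env  # feedback: the new state

def run_agent(env, max_steps=20):
    for _ in range(max_steps):  # guardrail against looping forever
        action = reason(observe(env))
        if action == "stop":
            return "reached goal"
        act(env, action)
    return "stuck: asking a human for help"

print(run_agent({"position": 0, "goal": 3}))  # reached goal
```

The `max_steps` cap implements the “gets stuck and asks for assistance” exit: an autonomous agent should always have a bounded loop and an escape hatch to a human.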
Each of these exercises can be completed in less than a day using modern agent frameworks. Consider using developer frameworks like AutoGen and LangChain or low-code/no-code frameworks like n8n and Copilot Studio. Once you’ve completed these exercises, you’ll be ready to start building AI agents of your own!
To learn more about AI agents, be sure to check out my previous article on AI agents and all of my articles, videos, and online courses on AI.