The most surprising thing about building multi-agent systems with Claude’s SDK is how much of the "intelligence" is actually just clever prompt engineering.

Let’s see it in action. Imagine we want two agents, a Researcher and a Writer, to collaborate on a short blog post about the benefits of sourdough baking.

First, we define our agents with their roles and instructions:

from anthropic import Anthropic
from anthropic_agent.agent import Agent, AgentTools
from anthropic_agent.tools import tool

client = Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

class ResearchTools(AgentTools):
    @tool
    def search(self, query: str) -> str:
        """Searches the web for information and returns a summary."""
        # In a real scenario, this would call a search API
        print(f"Researcher searching for: {query}")
        if "benefits of sourdough" in query:
            return "Sourdough baking uses wild yeast and bacteria, leading to a distinct tangy flavor. It can also be easier to digest for some people due to the fermentation process breaking down gluten and phytic acid."
        return "No specific information found."

class WritingTools(AgentTools):
    @tool
    def write_section(self, title: str, content: str) -> str:
        """Writes a section of a blog post."""
        print(f"Writer creating section: {title}")
        return f"## {title}\n\n{content}\n\n"

researcher = Agent(
    name="Researcher",
    system_message="You are a helpful research assistant. Your goal is to find accurate information to answer user queries.",
    tools=ResearchTools(),
    client=client
)

writer = Agent(
    name="Writer",
    system_message="You are a skilled blog post writer. Your goal is to produce engaging and informative content based on provided research.",
    tools=WritingTools(),
    client=client
)

Now, we orchestrate their interaction. The Researcher will find information, and the Writer will use it.

async def main():
    # Researcher finds information
    research_result = await researcher.run(
        "Find the main benefits of sourdough baking."
    )
    print(f"Research Output: {research_result}\n")

    # Writer uses the research to create a section
    blog_intro = await writer.run(
        f"Write an introductory section for a blog post about sourdough baking, incorporating the following benefits: {research_result}"
    )
    print(f"Writer Output:\n{blog_intro}")

    # Let's say we want another section
    research_result_digestibility = await researcher.run(
        "Explain the digestibility aspects of sourdough."
    )
    print(f"Research Output (Digestibility): {research_result_digestibility}\n")

    blog_digestibility = await writer.run(
        f"Write a section about the digestibility of sourdough, using this information: {research_result_digestibility}"
    )
    print(f"Writer Output (Digestibility):\n{blog_digestibility}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

When you run this, you’ll see the agents "thinking" and "acting" through their tool calls. The Researcher’s search tool is called, and its output is fed directly into the Writer’s prompt for write_section. This handoff is managed by the SDK, allowing agents to delegate tasks and use information from each other.

The core problem this solves is breaking down complex tasks into smaller, manageable sub-tasks that can be handled by specialized agents. Instead of one monolithic LLM call, you have a conversation between agents, each with a defined persona and capabilities. The system_message for each agent is crucial here; it’s not just a description, but the LLM’s guiding principles for how it should behave and what its goals are.

Internally, the SDK works by:

  1. Parsing the system_message: This sets the LLM’s persona and high-level instructions.
  2. Presenting available tools: The LLM is informed about the functions it can call.
  3. LLM chooses a tool (or responds directly): Based on the user’s input and its system message, the LLM decides whether to call a tool or generate a direct text response. If it calls a tool, it outputs the tool’s name and arguments in a structured format.
  4. SDK executes the tool: The Python function corresponding to the chosen tool is called with the provided arguments.
  5. Tool output is fed back: The result of the tool execution is then sent back to the LLM as part of the conversation history.
  6. LLM generates final response: The LLM, now with the tool’s output, generates the final text response or decides to call another tool.

The AgentTools class is a simple wrapper that lets the SDK discover methods decorated with @tool. The client is your standard Anthropic API client. The run method initiates the agent’s process.

The real magic is in how the LLM interprets the system_message and available tools to decide its next action. It’s not just executing code; it’s reasoning about what code to execute to achieve its objective. For instance, the Researcher doesn’t just know the benefits; its prompt instructs it to find them using its search tool. The Writer doesn’t just magically know how to format; its prompt guides it to use write_section to create structured output.

The one thing most people don’t realize is that the "state" of the multi-agent system is primarily managed within the LLM’s context window as a conversation history. When an agent calls a tool and its output is returned, it’s appended to the conversation. The LLM then uses this entire history to decide its next step. This means complex, multi-turn interactions are effectively just one long, intelligently managed dialogue where specific functions are called to achieve sub-goals. You can even pass tool outputs from one agent to another by including them in the prompt when you call the second agent.

The next concept you’ll likely explore is more sophisticated agent communication patterns, like broadcasting messages or implementing explicit handoff mechanisms beyond simple sequential tool use.

Want structured learning?

Take the full Claude-api course →