Claude Messages API Guide: Structure, Roles, and Parameters (2026)

Claude’s message-based API lets you have a conversation with the model, not just send a single prompt.

Let’s see it in action. Imagine you’re building a chatbot that helps users plan a trip.

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_ANTHROPIC_API_KEY",
)

# Start a new conversation
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {
            "role": "user",
            "content": "I want to plan a trip to Japan. I'm interested in Tokyo and Kyoto.",
        }
    ]
)

# response.content is a list of content blocks; the reply text is on the first block
print(response.content[0].text)
# Output might look like:
# "That sounds like a fantastic trip! Tokyo and Kyoto offer very different but equally amazing experiences. To help me tailor your itinerary, could you tell me a bit more about your interests? For example, are you looking for historical sites, modern attractions, nature, food experiences, or something else?"

# Continue the conversation
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {
            "role": "user",
            "content": "I want to plan a trip to Japan. I'm interested in Tokyo and Kyoto.",
        },
        {
            "role": "assistant",
            "content": "That sounds like a fantastic trip! Tokyo and Kyoto offer very different but equally amazing experiences. To help me tailor your itinerary, could you tell me a bit more about your interests? For example, are you looking for historical sites, modern attractions, nature, food experiences, or something else?",
        },
        {
            "role": "user",
            "content": "I love history and food. I'm also interested in seeing some traditional gardens.",
        }
    ]
)

# response.content is a list of content blocks; the reply text is on the first block
print(response.content[0].text)
# Output might look like:
# "Great! For history and traditional gardens, Kyoto is an absolute must. You'll want to visit Kinkaku-ji (the Golden Pavilion), Fushimi Inari Shrine with its thousands of red gates, and Arashiyama Bamboo Grove. For food, Nishiki Market is a culinary paradise. In Tokyo, you can explore the historic Asakusa district and Senso-ji Temple, and for a taste of traditional gardens, the Imperial Palace East Garden is beautiful. What dates are you considering for your trip?"

The core problem this solves is state management in conversational AI. The API itself is stateless: each request is independent, and the model sees only what that request contains. The Messages API makes the history an explicit part of the request, so context is carried by the messages you send rather than by a server-side session.

The messages parameter is a list of message objects. Each object has a role and content. The roles are user (what the human says) and assistant (what Claude says). You send the entire conversation history in each messages list to maintain context. Claude then uses this history to generate its next response.
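
The shape described above can be checked before sending a request. The following is a minimal sketch, not part of the SDK: check_messages is a hypothetical helper, and the rule that a conversation should open with a user message is stated here as an assumption about what the API expects.

```python
ALLOWED_ROLES = {"user", "assistant"}


def check_messages(messages):
    """Return a list of problems found in a messages list (empty if none)."""
    problems = []
    for index, message in enumerate(messages):
        # Each message needs one of the two conversational roles.
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index}: unexpected role {message.get('role')!r}")
        # Each message needs non-empty content.
        if not message.get("content"):
            problems.append(f"message {index}: missing content")
    # Assumption: the conversation opens with a user turn.
    if messages and messages[0].get("role") != "user":
        problems.append("conversation should start with a user message")
    return problems


good = [
    {"role": "user", "content": "I want to plan a trip to Japan."},
    {"role": "assistant", "content": "Which cities interest you?"},
    {"role": "user", "content": "Tokyo and Kyoto."},
]
bad = [{"role": "system", "content": "Be brief."}]

print(check_messages(good))  # []
print(check_messages(bad))
```

The second list fails on two counts: "system" is not a message role, and the conversation does not start with a user turn.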

The model parameter specifies which Claude model you want to use (e.g., claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307). The max_tokens parameter is required and caps the length of Claude’s response, preventing overly long or expensive outputs. It is a hard cutoff, not a target: a response that reaches the cap is truncated mid-thought.

You can also set a system prompt: special instructions that guide Claude’s behavior throughout the conversation. It is not a message in the messages list. The messages list only accepts the user and assistant roles, and the system prompt is passed through the separate top-level system parameter.

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {
            "role": "user",
            "content": "What's the capital of France?",
        },
        {
            "role": "assistant",
            "content": "The capital of France is Paris.",
        },
        {
            "role": "user",
            "content": "And what is its population?",
        }
    ],
    system="You are a helpful assistant that only answers factual questions. Do not engage in small talk or offer opinions."
)

# response.content is a list of content blocks; the reply text is on the first block
print(response.content[0].text)
# Output might look like:
# "As of my last update, the population of Paris is estimated to be around 2.1 million people within the city limits, and over 11 million in the greater metropolitan area."

The system prompt is never added to the messages list. Because each request is independent, pass the same system value on every call where the instruction should apply; that is what makes it persistent across the interaction.
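
Here is a minimal sketch of carrying the system prompt across turns, assuming it is sent through the top-level system parameter on each request, as in the example above. build_request is a hypothetical helper, not an SDK function; its return value is meant to be unpacked into client.messages.create(**kwargs).

```python
SYSTEM_PROMPT = "You are a helpful assistant that only answers factual questions."


def build_request(messages, system_prompt, model="claude-3-opus-20240229", max_tokens=1000):
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,  # top-level parameter, not a message
        "messages": messages,
    }


first_turn = build_request(
    [{"role": "user", "content": "What's the capital of France?"}],
    SYSTEM_PROMPT,
)
second_turn = build_request(
    [
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "And what is its population?"},
    ],
    SYSTEM_PROMPT,
)

# The same system value travels with both requests.
print(first_turn["system"] == second_turn["system"])  # True
```

Keeping the prompt in one constant and routing every call through one builder makes it hard to drop the instruction by accident on a later turn.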

When constructing your messages list for a new turn, you take the previous messages list, append Claude’s last reply as an assistant message, append the latest user message, and then send the whole thing. Claude processes this entire list to understand the conversational flow and generate a relevant assistant response.
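
The turn-building step can be sketched as a small helper. This is an illustration, not part of the SDK: extend_history is a hypothetical name, and it assumes the assistant reply has already been extracted as plain text (for example from response.content[0].text).

```python
def extend_history(history, assistant_text, next_user_text):
    """Return a new messages list with the last reply and the next user turn appended."""
    # Copy so the caller's list is not mutated.
    updated = list(history)
    updated.append({"role": "assistant", "content": assistant_text})
    updated.append({"role": "user", "content": next_user_text})
    return updated


history = [{"role": "user", "content": "What's the capital of France?"}]
history = extend_history(
    history,
    assistant_text="The capital of France is Paris.",
    next_user_text="And what is its population?",
)

print([message["role"] for message in history])  # ['user', 'assistant', 'user']
```

The resulting list is what you would pass as messages on the next call.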

The response object contains more than just the text. Alongside the content list, it includes metadata like id, type, role (always 'assistant'), model, stop_reason, stop_sequence, and usage (token counts). The usage field is critical for cost management, showing input_tokens and output_tokens.
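
As a rough illustration of reading the usage field, the sketch below totals the token counts and estimates a cost. summarize_usage is a hypothetical helper, the response is a stand-in object so the code runs without an API call, and the prices are made-up placeholders, not real rates.

```python
from types import SimpleNamespace


def summarize_usage(response, input_price_per_mtok, output_price_per_mtok):
    """Return token counts and an estimated cost for one response."""
    usage = response.usage
    cost = (
        usage.input_tokens * input_price_per_mtok
        + usage.output_tokens * output_price_per_mtok
    ) / 1_000_000
    return {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.input_tokens + usage.output_tokens,
        "estimated_cost": cost,
    }


# Stand-in for a real response object (only the fields used here).
fake_response = SimpleNamespace(
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=25, output_tokens=120),
)

# Placeholder prices per million tokens, invented for illustration.
summary = summarize_usage(fake_response, input_price_per_mtok=1.0, output_price_per_mtok=2.0)
print(summary["total_tokens"])  # 145
```

With a real response from client.messages.create, the same attribute access (response.usage.input_tokens, response.usage.output_tokens) applies; substitute the current prices for your model.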

The stop_reason tells you why Claude stopped generating text. Common reasons are end_turn (it naturally finished its thought) or max_tokens (it hit your token limit). If you consistently get max_tokens, you’ll need to either increase max_tokens or prompt Claude to be more concise.
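
One possible way to act on stop_reason is to decide whether a retry with a larger limit makes sense. This is a sketch under stated assumptions: next_max_tokens is a hypothetical helper, the doubling strategy is just one choice, and the default ceiling of 4096 is an assumption that should be replaced with the output limit of the model you use.

```python
def next_max_tokens(stop_reason, current_max_tokens, ceiling=4096):
    """Suggest a max_tokens value for a retry, or None if no retry is suggested."""
    if stop_reason != "max_tokens":
        # For example "end_turn": the reply finished on its own.
        return None
    if current_max_tokens >= ceiling:
        # Already at the cap; prompting for a more concise answer is the other option.
        return None
    return min(current_max_tokens * 2, ceiling)


print(next_max_tokens("end_turn", 1000))    # None
print(next_max_tokens("max_tokens", 1000))  # 2000
print(next_max_tokens("max_tokens", 3000))  # 4096
```

In practice you would pass response.stop_reason as the first argument and resend the same request with the suggested value.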

One often overlooked aspect is how long contexts behave. Every message you send counts toward input_tokens on every request, so cost and latency grow with the history. Claude is designed to work with long histories, so you usually don’t need to truncate or summarize the conversation yourself unless it approaches the model’s context window (200K tokens for the Claude 3 models). The API does not trim the history for you, so past that point the pruning is your job.
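
For conversations that do approach the limit, one simple option is to drop the oldest turns. The sketch below is illustrative only: trim_history and estimate_tokens are hypothetical helpers, message content is assumed to be a plain string, and the four-characters-per-token estimate is a crude heuristic, not the model's real tokenizer.

```python
def estimate_tokens(text):
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(messages, budget):
    """Drop the oldest messages until the estimated total fits the budget."""
    trimmed = list(messages)

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Remove from the front, but always keep the latest message.
    while len(trimmed) > 1 and total(trimmed) > budget:
        trimmed.pop(0)
    # Assumption: the conversation should still open with a user turn.
    while len(trimmed) > 1 and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed


conversation = [
    {"role": "user", "content": "a" * 400},       # ~100 estimated tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 estimated tokens
    {"role": "user", "content": "c" * 40},        # ~10 estimated tokens
]

print(len(trim_history(conversation, budget=1000)))  # 3
print(len(trim_history(conversation, budget=120)))   # 1
```

Dropping turns loses information, so summarizing older turns into a single message is a common alternative; which one fits depends on how much the early context matters.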

The next step is understanding how to use tool_use to integrate Claude with external functions.