The Claude API doesn’t have to send you a single, complete answer. With streaming enabled, it sends the response as a stream of text, piece by piece, much like a person typing.

Let’s see it in action. Imagine you’re building a chatbot. You send a prompt to Claude, and instead of waiting for the whole response, your UI updates as the text arrives.

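```python
import anthropic

# The client reads your key from the ANTHROPIC_API_KEY environment variable.
# You can pass api_key="..." explicitly, but avoid hard-coding secrets.
client = anthropic.Anthropic()

stream = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {"role": "user", "content": "Explain the concept of quantum entanglement in simple terms, as if to a curious 10-year-old."}
    ],
    stream=True,  # receive Server-Sent Events instead of one final message
)

for chunk in stream:
    # Only content_block_delta events carry a delta with text; others
    # (message_start, content_block_stop, message_stop, ...) do not.
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="", flush=True)
```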

When you run this code, Claude’s response appears on your console in small bursts as it is generated, rather than all at once. The key is the stream=True parameter: it tells the API to keep the connection open and send a sequence of events while the response is being produced. The events that carry new text hold it in a delta object, an incremental piece that you append to what you have already received.
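Because each text delta holds only the newest fragment, anything that needs the complete reply has to concatenate the fragments itself. The sketch below shows this with a hypothetical helper, stream_and_collect (not part of the anthropic SDK), run against a hand-written, simplified list of events shaped like the API’s JSON. Using plain dicts instead of SDK objects lets it run without a network call or API key.

```python
import io
import sys
from typing import Iterable, TextIO


def stream_and_collect(events: Iterable[dict], out: TextIO = sys.stdout) -> str:
    """Write each text delta as it arrives and return the assembled reply.

    Hypothetical helper for illustration; `events` are plain dicts shaped
    like the API's streaming events, not anthropic SDK objects.
    """
    pieces = []
    for event in events:
        delta = event.get("delta", {})
        if event["type"] == "content_block_delta" and delta.get("type") == "text_delta":
            out.write(delta["text"])  # show the fragment immediately
            out.flush()
            pieces.append(delta["text"])  # keep it for the full reply
    return "".join(pieces)


# Simulated, simplified events (illustrative values, not captured API output).
fake_events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Imagine two "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "magic coins..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]

buffer = io.StringIO()
full_text = stream_and_collect(fake_events, out=buffer)
print(full_text)  # Imagine two magic coins...
```

Non-text events simply pass through the filter, so the same loop works no matter how many lifecycle events surround the text.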

The fundamental problem this solves is perceived latency. Streaming does not make the model generate the full response any faster, but it lets you show the first words as soon as they exist instead of only after the last one. If a complete response takes several seconds, a UI that waits for it feels sluggish or unresponsive; a UI that displays text as it is generated feels dynamic and engaging, as if the AI were thinking and answering in real time.
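To make the distinction concrete, the sketch below simulates a model with a hypothetical generator, fake_model, that produces one piece every 50 ms. It then measures when a streaming UI could show its first words versus when a non-streaming UI could show anything at all. The timings come from the simulation, not from the Claude API.

```python
import time
from typing import Iterator, List


def fake_model(pieces: List[str], delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one piece every `delay` seconds."""
    for piece in pieces:
        time.sleep(delay)
        yield piece


pieces = ["Quantum ", "entanglement ", "is ", "like ", "a ", "pair ", "of ", "magic ", "coins."]

start = time.perf_counter()
first_piece_at = None
for piece in fake_model(pieces, delay=0.05):
    if first_piece_at is None:
        # The earliest moment a streaming UI can show text.
        first_piece_at = time.perf_counter() - start
    print(piece, end="", flush=True)
# The earliest moment a non-streaming UI can show text.
total = time.perf_counter() - start
print()
print(f"first text after {first_piece_at:.2f}s, full reply after {total:.2f}s")
```

Both versions finish at the same time; only the wait before the first visible text changes.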

Under the hood, the Claude API uses Server-Sent Events (SSE) when streaming is enabled. The server (Anthropic’s API) sends data to the client (your application) over a single, long-lived HTTP connection, and the SDK turns each SSE event into one chunk object in your loop. Some of these events, such as content_block_delta, carry a delta object with an incremental update. For text, that delta has type text_delta, and its text field holds the next piece of the response. Your code iterates through the chunks, extracts the text, and appends it to your UI.
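To see what the SDK does for you, here is a minimal sketch that parses a raw SSE body into (event name, JSON payload) pairs and pulls out the text. The sample body is abbreviated and hand-written in the shape of Anthropic’s streaming format, and parse_sse is a hypothetical helper that handles only the event: and data: fields. It ignores id:, retry:, and comment lines, so it is a simplification of the full SSE specification.

```python
import json
from typing import Iterator, Tuple

# Abbreviated, illustrative SSE body (made-up values; real events carry more fields).
RAW_SSE = '''\
event: message_start
data: {"type": "message_start", "message": {"role": "assistant", "content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_stop
data: {"type": "message_stop"}
'''


def _field_value(line: str, field: str) -> str:
    """Return the text after "field:", minus one optional leading space."""
    value = line[len(field) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(body: str) -> Iterator[Tuple[str, dict]]:
    """Yield (event name, parsed JSON data); a blank line ends each event."""
    event_name, data_lines = None, []
    for line in body.splitlines() + [""]:  # trailing "" flushes the last event
        if line == "":
            if data_lines:
                yield event_name or "message", json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = _field_value(line, "event")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data"))
        # id:, retry:, and ":" comment lines are ignored in this sketch


text = ""
for name, payload in parse_sse(RAW_SSE):
    if name == "content_block_delta" and payload["delta"]["type"] == "text_delta":
        text += payload["delta"]["text"]
print(text)  # Hello, world
```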

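The exact structure of the stream matters. A typical response arrives as a message_start event. Each content block then produces a content_block_start, one or more content_block_delta events, and a content_block_stop. The stream finishes with a message_delta, which carries the stop_reason and output-token usage, and finally a message_stop. The stream may also include ping events. Not every delta is text: when the model calls a tool, the content_block_delta events for that block carry input_json_delta objects whose partial_json strings concatenate into the tool’s input as JSON. You need to handle these different types gracefully, though for simple text generation you’ll only act on text_delta. The flush=True in the print statement is crucial for real-time display. When stdout is a terminal it is usually line-buffered, and these pieces rarely end in a newline, so without flush=True, print would hold them back until a newline arrives or the buffer fills. If you don’t need the individual events, the Python SDK also provides a higher-level client.messages.stream() helper whose text_stream iterator yields just the text pieces.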
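A sketch of handling these types side by side, assuming plain-dict events with illustrative values: text fragments and tool-input fragments are collected per content block index, and the tool input is parsed only at content_block_stop, because partial_json fragments are not valid JSON on their own. The tool name get_weather and the id toolu_example are hypothetical.

```python
import json
from collections import defaultdict

# Simulated events (illustrative values) for a reply with a text block
# followed by a tool_use block.
events = [
    {"type": "message_start", "message": {"role": "assistant"}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Let me check."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "ping"},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": '{"city": "Par'}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": 'is"}'}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}, "usage": {"output_tokens": 42}},
    {"type": "message_stop"},
]

text_parts = defaultdict(list)  # block index -> text fragments
json_parts = defaultdict(list)  # block index -> partial JSON fragments
tool_inputs = {}                # block index -> parsed tool input
stop_reason = None

for event in events:
    kind = event["type"]
    if kind == "content_block_delta":
        delta = event["delta"]
        if delta["type"] == "text_delta":
            text_parts[event["index"]].append(delta["text"])
        elif delta["type"] == "input_json_delta":
            json_parts[event["index"]].append(delta["partial_json"])
        # other delta types are ignored in this sketch
    elif kind == "content_block_stop" and event["index"] in json_parts:
        # The fragments only form valid JSON once the block is complete.
        tool_inputs[event["index"]] = json.loads("".join(json_parts[event["index"]]))
    elif kind == "message_delta":
        stop_reason = event["delta"].get("stop_reason")
    # message_start, content_block_start, ping, message_stop: nothing to do here

print("".join(text_parts[0]))  # Let me check.
print(tool_inputs[1])          # {'city': 'Paris'}
print(stop_reason)             # tool_use
```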

Beyond text, the chunk objects carry metadata about the response’s lifecycle. For example, a content_block_stop event marks the end of a content block, and a content_block_start event whose content_block has type tool_use signals that the model is about to call a tool. Your application needs to parse these events to follow the whole response, not just its text. This enables richer interactions, such as showing a "typing" indicator while waiting for the next piece of text, or placeholder elements for structured output that is still arriving.
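A sketch of that idea, again assuming plain-dict events: the hypothetical ui_actions function maps lifecycle events to UI steps. It turns a typing indicator on at message_start and off at message_stop, and shows a placeholder when a tool_use block starts. The tool name lookup_fact and the action strings are invented for illustration; a real app would update widgets instead of returning a list.

```python
from typing import Iterable, List


def ui_actions(events: Iterable[dict]) -> List[str]:
    """Map stream lifecycle events to hypothetical UI actions."""
    actions = []
    for event in events:
        kind = event["type"]
        if kind == "message_start":
            actions.append("show typing indicator")
        elif kind == "content_block_start":
            block = event["content_block"]
            if block["type"] == "tool_use":
                actions.append(f"show placeholder for tool {block['name']}")
        elif kind == "content_block_delta" and event["delta"]["type"] == "text_delta":
            actions.append(f"append text {event['delta']['text']!r}")
        elif kind == "content_block_stop":
            actions.append(f"finalize block {event['index']}")
        elif kind == "message_stop":
            actions.append("hide typing indicator")
    return actions


# Simulated events (illustrative values, not captured API output).
sample_events = [
    {"type": "message_start", "message": {"role": "assistant"}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Sure"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example", "name": "lookup_fact", "input": {}}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "{}"}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_stop"},
]

actions = ui_actions(sample_events)
for action in actions:
    print(action)
```

Keeping the mapping in one function makes it easy to test the UI logic against recorded or simulated events without calling the API.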

The next step after mastering real-time text streaming is handling structured output and tool use within the same stream.
