Claude’s API can feel a bit like a black box, but the most surprising thing is how much control you actually have over its "thinking" process, even in simple requests.
Let’s see it in action. Imagine you’re building a customer service chatbot. You want Claude to summarize a user’s query before responding.
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_ANTHROPIC_API_KEY",  # Replace with your actual API key
)

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    temperature=0.7,
    system="You are a helpful assistant that summarizes user requests concisely.",
    messages=[
        {
            "role": "user",
            "content": "I'm having trouble with my recent order, number 12345. The package arrived, but the main item, the 'Quantum Widget', is missing. I also noticed the box looked a bit damaged. Can you help me figure out what to do next and maybe get a replacement?"
        }
    ]
)

print(response.content[0].text)
This snippet creates a client, selects a model (claude-3-opus-20240229, the most capable model in the Claude 3 family), and sends a single user message. The system prompt is crucial here; it carries Claude's standing instructions and shapes both its persona and its task. The temperature parameter, set to 0.7, controls how much randomness goes into sampling: lower values make responses more predictable, higher values more varied. The response body is a list of content blocks, which is why the reply text is read from response.content[0].text.
The core problem this solves is translating natural language into structured, actionable information. Claude doesn't just understand your query; it can process it according to your explicit instructions. The messages array is where the conversation history goes. Each item has a role (user or assistant) and content. The Messages API has no separate tool role; tool results travel back inside a user message as tool_result content blocks. For simple requests, the array holds just the user's input. The API is stateless, so for follow-ups you resend the earlier turns, appending the assistant's previous response and then the new user query.
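To make the follow-up pattern concrete, here is a minimal sketch of how the messages list might grow between calls. The helper name add_follow_up and the sample texts are assumptions made for illustration, and the assistant reply is written by hand rather than taken from a real API response.

```python
def add_follow_up(history, assistant_text, new_user_text):
    """Return a new messages list with the previous reply and the next question appended."""
    return history + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": new_user_text},
    ]


# First turn: only the user's input.
history = [
    {"role": "user", "content": "My order 12345 arrived without the Quantum Widget."},
]

# Hand-written stand-in for Claude's first reply (not real model output).
previous_reply = "Summary: order 12345 arrived with the Quantum Widget missing."

messages = add_follow_up(
    history,
    previous_reply,
    "Can I get a replacement instead of a refund?",
)

for message in messages:
    print(message["role"], "->", message["content"])

# The full list is what you would send on the next call, for example:
# client.messages.create(model="claude-3-opus-20240229", max_tokens=1000, messages=messages)
```

The helper returns a new list instead of mutating the old one, which keeps earlier snapshots of the conversation intact if you want to log or replay them.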
The model parameter is your primary lever for capability. Within the Claude 3 family, claude-3-opus is the most powerful, followed by claude-3-sonnet (balanced) and claude-3-haiku (fastest and cheapest). max_tokens is required and caps the number of tokens Claude may generate, preventing runaway generation; when the cap is hit, the output is cut off and the response's stop_reason is "max_tokens". temperature is another key control and ranges from 0.0 to 1.0. Near 0.0, Claude almost always picks the most likely next token, resulting in predictable, often repetitive, output, although even at 0.0 the results are not guaranteed to be identical between calls. Increasing it allows for more variation and creativity.
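These three controls can be gathered in one place, as in the rough sketch below. The tier labels, build_request, and was_truncated are names invented for this example; the dated Sonnet and Haiku model IDs are the Claude 3 identifiers as I understand them and should be checked against the current model list. The response object here is a stand-in, not a real API response.

```python
from types import SimpleNamespace

# Claude 3 model IDs keyed by tier. The tier labels are made up for this sketch.
MODELS = {
    "most_capable": "claude-3-opus-20240229",
    "balanced": "claude-3-sonnet-20240229",
    "fastest": "claude-3-haiku-20240307",
}


def build_request(tier, user_text, max_tokens=1000, temperature=0.7):
    """Build keyword arguments for client.messages.create after basic validation."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    return {
        "model": MODELS[tier],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{"role": "user", "content": user_text}],
    }


def was_truncated(response):
    """Return True when generation stopped because it reached the max_tokens cap."""
    return response.stop_reason == "max_tokens"


params = build_request(
    "fastest",
    "Summarize this ticket in one line.",
    max_tokens=200,
    temperature=0.0,
)
print(params["model"])

# Stand-in object for illustration; a real response would come from
# client.messages.create(**params).
fake_response = SimpleNamespace(stop_reason="max_tokens")
print(was_truncated(fake_response))
```

Checking stop_reason after each call is a cheap way to notice when a reply was cut short, so you can raise max_tokens or ask for a shorter answer.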
You might be surprised to learn that the system prompt isn't just for setting a persona; it's also where you can embed specific instructions for how Claude should handle the user's input. For example, you could instruct it to extract specific entities, reformat information, or perform a preliminary classification, and to do so before it writes its reply to the user. Claude does this work within a single response rather than in a separate hidden stage, so if your application needs the intermediate results, ask for them explicitly, for instance as a summary followed by the reply. These instructions are usually invisible to the end user, yet they are the real engine behind sophisticated LLM applications.
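One way to put this into practice, sketched under assumptions, is to have the system prompt request a fixed JSON shape and then validate it in your own code. The key names, the category values, SYSTEM_PROMPT, and parse_structured_reply are all hypothetical choices for this example, and the sample text is hand-written, not real model output. Claude is not guaranteed to return valid JSON, which is why the parser checks before trusting the result.

```python
import json

# Hypothetical system prompt that embeds processing instructions, not just a persona.
SYSTEM_PROMPT = (
    "You are a customer service assistant. Before replying, analyze the user's message. "
    "Respond with a single JSON object and nothing else, using exactly these keys: "
    '"summary" (one sentence), "order_number" (string or null), '
    '"category" (one of "missing_item", "damaged", "billing", "other"), '
    'and "reply" (the message to show the customer).'
)

REQUIRED_KEYS = {"summary", "order_number", "category", "reply"}


def parse_structured_reply(raw_text):
    """Parse the model's text as JSON and check that every expected key is present."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hand-written sample standing in for response.content[0].text.
sample_text = json.dumps({
    "summary": "Order 12345 arrived without the Quantum Widget and the box was damaged.",
    "order_number": "12345",
    "category": "missing_item",
    "reply": "Sorry about that. I can arrange a replacement Quantum Widget for you.",
})

parsed = parse_structured_reply(sample_text)
print(parsed["category"])
print(parsed["reply"])
```

In a real request you would pass SYSTEM_PROMPT as the system argument and show only parsed["reply"] to the customer, keeping the summary and category for routing or logging.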
The next concept to explore is tool use, where Claude can interact with external APIs to fetch real-time data or perform actions.