The most surprising thing about build tools and function calling is that they aren’t about building code at all, but about building intent for AI models.
Let’s see Claude 3 Opus use a hypothetical "weather lookup" tool. Imagine we’ve defined this tool for Claude:
{
"name": "get_weather",
"description": "Gets the current weather for a given location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state or country for the weather lookup."
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to return."
}
},
"required": ["location"]
}
}
Now, we ask Claude: "What’s the weather like in San Francisco, and please use Celsius?"
Claude’s response won’t be a direct answer. Instead, it will be a structured request to use our tool:
{
"tool_name": "get_weather",
"parameters": {
"location": "San Francisco, CA",
"unit": "celsius"
}
}
This JSON blob is the "build tool" output. It’s not code being compiled; it’s a structured representation of what Claude wants to do next. Our application code then intercepts this, executes the get_weather function (perhaps by calling an actual weather API), and gets a result like:
{
"temperature": 15,
"unit": "celsius",
"description": "Cloudy"
}
This result is then fed back to Claude in a subsequent turn. Claude, now armed with the actual weather data, can formulate a natural language response: "The weather in San Francisco is currently 15 degrees Celsius and cloudy."
This workflow builds the user’s intent into a callable function. The "build tool" part is the process of taking a natural language request and translating it into a precise, machine-executable function call, complete with the correct arguments. The model isn’t generating code to solve a problem; it’s generating a plan for how your existing code should solve the problem.
The core problem this solves is bridging the gap between natural language understanding and programmatic execution. AI models are great at understanding nuance and intent, but they can’t directly interact with external systems or perform complex computations without a defined interface. Function calling provides that interface. It allows us to extend the AI’s capabilities beyond its training data by giving it access to real-world data and tools.
Internally, when Claude receives the function definition, it’s essentially performing a complex pattern matching and reasoning task. It identifies keywords, understands relationships between entities (like "San Francisco" and "location"), and maps these to the defined parameters of the available tools. The enum constraint on unit is a critical piece of information that prevents Claude from hallucinating a unit like "kelvin" if it’s not an allowed option. The required field ensures that essential information isn’t omitted.
When you define your tools, the description field is paramount. It’s not just for human readability; it’s the primary source of information Claude uses to decide if and how to call a tool. A vague description like "gets information" will lead to poor tool usage. Be specific. Describe what the tool does, what inputs it needs, and what it returns. For example, if you had a calculate_discount tool, you’d want to describe its parameters like {"item_price": {"type": "number", "description": "The original price of the item."}, "discount_percentage": {"type": "number", "description": "The percentage to discount, e.g., 10 for 10%."}}.
The most powerful aspect of this, and something many overlook, is that the tool definitions themselves can be dynamically generated or modified based on context. You’re not just hardcoding a fixed set of tools. Imagine an e-commerce assistant. If the user is browsing a specific product page, the available tools might change to include add_to_cart, check_stock, or get_product_reviews for that specific product. The LLM doesn’t need to know the product ID beforehand; it just needs to know that it has the capability to interact with product-specific functions, and your application will provide the context when it calls the tool.
The next hurdle is managing multi-turn tool use, where a single user request might require several sequential tool calls, potentially with intermediate reasoning steps from the model.