CodeAgent vs ToolCallingAgent: Two Ways to Let an Agent Act
The difference between an agent that writes Python and one that emits JSON, with strengths, limits, use cases, and when to pick each
Every agent has to answer one question on every turn: how do I say what I want to do? There are two answers in wide use, and smolagents is unusual in giving you a separate class for each. A ToolCallingAgent expresses its action as structured JSON. A CodeAgent expresses its action as a snippet of Python that gets executed. That sounds like an implementation detail. It is not. It changes what your agent can do, how reliable it is, and how much you have to worry about security. This post is a deep look at both: how each works, what each is good and bad at, real use cases, and a clear rule for choosing.
If you want the groundwork on what an agent even is, start with what agents are and how agents use tools. Here I assume you know the agent loop and want to understand this one fork in the road.
The one difference that matters
Both agents run the same loop: the model looks at the task, decides on an action, the action runs, the result comes back, and it loops until it is done. The only thing that differs is the format of the action.
A ToolCallingAgent emits a structured tool call in the JSON style popularised by the OpenAI API (the shape below is simplified):
```json
{
  "tool_call": {
    "name": "search_docs",
    "arguments": { "query": "What is the capital of France?" }
  }
}
```

Your framework parses that JSON, validates the arguments against the tool's schema, runs the matching function, and hands the result back.
A CodeAgent emits the same intent as executable Python:
```python
result = search_docs("What is the capital of France?")
print(result)
```

Your framework runs that code, and your tools are exposed to it as ordinary Python functions. Everything downstream flows from this split: JSON is data you validate, code is a program you execute.
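To make the split concrete, here is a minimal sketch in plain Python of what a framework does with each kind of action. It is not smolagents' implementation. `search_docs` is a hypothetical stub tool, and `run_json_action` and `run_code_action` are illustrative names. The JSON shape matches the simplified one above.

```python
import json

# Hypothetical stub tool standing in for a real document search.
def search_docs(query: str) -> str:
    return f"Top result for {query!r}"

TOOLS = {"search_docs": search_docs}

def run_json_action(raw: str) -> str:
    """JSON path: parse the action as data, look up the named tool, call it."""
    call = json.loads(raw)["tool_call"]
    tool = TOOLS[call["name"]]           # only registered tools can run at all
    return tool(**call["arguments"])

def run_code_action(code: str) -> dict:
    """Code path: execute the action as a program with tools bound as functions."""
    namespace = dict(TOOLS)              # tools are ordinary callables here
    exec(code, namespace)                # NOT a sandbox: illustration only
    return namespace

json_action = '{"tool_call": {"name": "search_docs", "arguments": {"query": "What is the capital of France?"}}}'
code_action = 'result = search_docs("What is the capital of France?")'

print(run_json_action(json_action))
print(run_code_action(code_action)["result"])
```

Both calls reach the same tool. The JSON path can only ever look up a name in `TOOLS`. The code path runs whatever program it was given, which is why the rest of this post keeps returning to sandboxing.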
ToolCallingAgent: acting in JSON
This is the format most people mean when they say "tool calling." The model is shown each tool as a JSON schema (name, description, parameter types), and it responds by naming a tool and filling in arguments. Nothing runs except the exact function it named, with arguments checked against the schema first.
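The schema-then-validate step can be sketched in plain Python. The sketch derives a JSON-schema-style description from a function's type hints and docstring, then rejects bad arguments before the function runs. The schema layout, the `to_schema` and `validate` helpers, and the `check_order_status` tool are all hypothetical illustrations, not smolagents' internal format.

```python
import inspect

# Hypothetical tool: its signature and docstring become what the model sees.
def check_order_status(order_id: str, include_history: bool = False) -> str:
    """Look up the current status of an order."""
    return f"Order {order_id}: shipped" + (" (history attached)" if include_history else "")

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}
JSON_TO_PY = {v: k for k, v in PY_TO_JSON.items()}

def to_schema(fn) -> dict:
    """Build a JSON-schema-style description from type hints (every parameter must be annotated)."""
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            "type": "object",
            "properties": {n: {"type": PY_TO_JSON[p.annotation]} for n, p in params.items()},
            "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
        },
    }

def validate(schema: dict, args: dict) -> None:
    """Reject missing, unknown, or wrongly typed arguments before anything runs."""
    props = schema["parameters"]["properties"]
    missing = [n for n in schema["parameters"]["required"] if n not in args]
    unknown = [n for n in args if n not in props]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    for name, value in args.items():
        # Deliberately strict: exact type match, no coercion.
        if type(value) is not JSON_TO_PY[props[name]["type"]]:
            raise TypeError(f"{name} must be {props[name]['type']}")

schema = to_schema(check_order_status)
args = {"order_id": "A123"}            # arguments as the model filled them in
validate(schema, args)                 # raises here if the arguments are bad
print(check_order_status(**args))      # Order A123: shipped
```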
In smolagents it is a few lines:
```python
from smolagents import ToolCallingAgent, WebSearchTool, InferenceClientModel

model = InferenceClientModel()
agent = ToolCallingAgent(tools=[WebSearchTool()], model=model)
agent.run("Get me the title of the page at https://huggingface.co/blog")
```

The agent picks web_search, fills in the query argument, the tool runs, and the loop continues.
Strengths. It is reliable, because the output is constrained and validated. There is no arbitrary code, so it is safe by default; the worst the model can do is call a tool you gave it with arguments that pass the schema. And it is interoperable: this exact shape maps cleanly onto external APIs and services, which is why every major provider speaks it.
Limitations. It is not very expressive. The model cannot easily combine two tools, transform one tool's output before feeding it to the next, or add a loop or a conditional. Each action is one call, in isolation. It is also inflexible: you have to define every possible action in advance as a tool, so the agent is limited to exactly the capabilities you predefined, with no way to compose new behaviour on the fly.
CodeAgent: acting in Python
A CodeAgent writes Python instead. Your tools are bound as functions in the execution environment, and the model calls them the way it would call any function, in a real program. That unlocks control flow: it can call a tool, do math on the result, loop over a list, branch on a condition, and call another tool with the transformed value, all in a single action.
```python
from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel()
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run("Give me the 118th number in the Fibonacci sequence.")
```

Here the agent can just write a loop to compute Fibonacci rather than needing a dedicated tool for it. That is the whole point: code is a general way to express what a computer should do, so the agent is not boxed in by a fixed menu of actions.
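For illustration, the action the model writes for that task could be as small as the snippet below. It is a hypothetical model output, not a recorded one. It assumes the common convention that the first two Fibonacci numbers are both 1, because the task wording leaves the indexing open.

```python
# A plausible code action for "the 118th Fibonacci number": no tool, just a loop.
# Assumes F(1) = F(2) = 1 (and F(0) = 0).
def fibonacci(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

result = fibonacci(118)
print(result)
```

Python integers have arbitrary precision, so the large result is exact. No tool author had to anticipate this question.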
The obvious question is security, and smolagents takes it seriously. By default the generated Python runs in a restricted local interpreter: only your tools and a small safe set of built-ins are callable, and imports are blocked outside an allowlist. You widen that deliberately:
```python
agent = CodeAgent(
    tools=[],
    model=model,
    additional_authorized_imports=["requests", "bs4"],
)
agent.run("Get me the title of the page at https://huggingface.co/blog")
```

For anything untrusted, do not run code on your host at all. smolagents can execute in a sandbox instead: pass executor_type="e2b", "docker", or "blaxel", which isolates the code from your machine.
A CodeAgent generates arbitrary code that then runs. The restricted interpreter and allowlist are real protection, but the moment you widen imports or handle untrusted input, run it in a sandbox, not on your host. Treat "which imports do I authorize" as a security decision.
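Mechanically, an import allowlist means inspecting the generated code before it runs. Below is a toy version that walks the syntax tree and rejects any import whose top-level module is not authorized. `check_imports` is a hypothetical helper. It is far weaker than smolagents' own restricted interpreter and is not a security boundary: a name-based check like this is easy to sidestep, for example via `__import__`, which is exactly why a sandbox should back it up.

```python
import ast

def check_imports(code: str, allowed: set[str]) -> None:
    """Raise ImportError if the code imports a top-level module outside `allowed`."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]      # relative imports have no module name
        else:
            continue
        for module in modules:
            if module.split(".")[0] not in allowed:
                raise ImportError(f"import of {module!r} is not authorized")

ALLOWED = {"requests", "bs4"}  # same names as the additional_authorized_imports example above
check_imports("import requests\nfrom bs4 import BeautifulSoup", ALLOWED)  # passes
try:
    check_imports("import os\nos.remove('x')", ALLOWED)  # parsed only, never executed
except ImportError as err:
    print(err)  # import of 'os' is not authorized
```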
Strengths. It is highly expressive: complex logic, control flow, and combining tools all come for free because it is just code. It is flexible: the agent can compose new behaviour instead of being limited to predefined actions. And it suits emergent, multi-step reasoning, where the path is not known in advance.
Limitations. Code can error, so you have to handle syntax errors and exceptions in the loop. It is less predictable, and more prone to unexpected or unsafe output. And it genuinely requires a secure execution environment, which is operational work that JSON tool calling simply does not need.
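Handling errors in the loop usually means catching them and returning the error text to the model as the observation for its next step, instead of crashing the run. The sketch below shows that step. It assumes a plain `exec`-based runner that reports captured stdout. `execute_step` is a hypothetical helper, not smolagents' executor, and it provides no sandboxing.

```python
import contextlib
import io
import traceback

def execute_step(code: str, namespace: dict) -> str:
    """Run one code action and return what the model should see next."""
    try:
        compiled = compile(code, "<action>", "exec")  # syntax errors surface before anything runs
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}: {err.msg}"
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(compiled, namespace)
    except Exception:
        # Keep the final traceback line, e.g. "ZeroDivisionError: division by zero",
        # so the model can read what went wrong and try a corrected action.
        return "Error: " + traceback.format_exc().strip().splitlines()[-1]
    return buffer.getvalue()

ns = {}
print(execute_step("print(10 / 2)", ns))  # 5.0
print(execute_step("print(10 / 0)", ns))  # Error: ZeroDivisionError: division by zero
print(execute_step("print(10 /", ns))
```

The agent loop keeps going either way. A failed action simply becomes the next observation.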
Why code actions often win
There is real evidence behind the code approach, not just vibes. The smolagents team points to several papers. The central one is Executable Code Actions Elicit Better LLM Agents (the "CodeAct" paper, ICML 2024), which found that letting the model act in code rather than JSON raised task success rates by up to 20% on their benchmarks, often while taking fewer steps.
The reasons are intuitive once you see them:
- Composability. You can nest function calls, reuse a function, and chain results in code. You cannot really nest or reuse JSON actions.
- Object management. Code has a natural home for a tool's output: a variable. In JSON, where does the result of generate_image go so the next action can use it?
- Generality. Code is built to express anything a computer can do. JSON tool calls express only the fixed set of tools you defined.
- Training data. Models have seen enormous amounts of real code, so writing actions as code plays to what they already do well.
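Object management is easiest to see when the executor keeps one namespace across steps. A tool's output lands in a variable in one action and is referenced by name in the next. The sketch assumes such a persistent namespace. generate_image comes from the question above, but both tools here are hypothetical stubs, and this is not smolagents' code.

```python
# Hypothetical stub tools; a real generate_image would return an image object.
def generate_image(prompt: str) -> dict:
    return {"prompt": prompt, "pixels": [[0, 0], [0, 0]]}

def caption_image(image: dict) -> str:
    return f"An image of {image['prompt']}"

# One namespace, assumed to persist across agent steps.
namespace = {"generate_image": generate_image, "caption_image": caption_image}

# Step 1: the tool's output is stored in an ordinary variable.
exec('img = generate_image("a lighthouse at dusk")', namespace)

# Step 2, a later action: the object is still there, referenced by name.
exec("caption = caption_image(img)", namespace)

print(namespace["caption"])  # An image of a lighthouse at dusk
```

A JSON action has no equivalent of `img`. The framework would have to serialise the object or hand back some reference the model can quote in its next call.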
That said, "often wins" is not "always wins." For a dispatcher that just needs to call one of a few well-defined APIs, the extra power of code is wasted, and the reliability and safety of validated JSON is exactly what you want. The research argues code is the better default for capable agents, not that JSON has no place.
Use cases, with examples
Reach for ToolCallingAgent when the job is atomic dispatch.
- A support router that reads a message and calls one of create_ticket, check_order_status, or escalate_to_human. One clean call per turn, arguments validated, nothing to execute.
- A "front door" to your APIs, where each tool wraps one endpoint and you want strict argument validation and easy mapping to the service.
- Anywhere untrusted users drive the agent and you want the hard guarantee that no arbitrary code can run.
Reach for CodeAgent when the job is problem-solving.
- A data question like "pull these three metrics, compute the week-over-week change, and flag anything that moved more than 10%." That is fetch, transform, compare, and filter in one action, which is natural in code and painful in JSON.
- A research task that loops: search, read each result, keep the relevant ones, and synthesise. The loop and the filtering live in the code.
- Anything where the agent should combine tools it was given into new behaviour you did not spell out, for example parsing a document, doing math on the numbers, and querying an API with the result.
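The first of those tasks could be a single code action like the one below. `get_metric` is a hypothetical tool, stubbed with made-up numbers so the sketch runs. The point is that fetching, transforming, comparing, and filtering all happen in one action. Under the limitations described earlier, a ToolCallingAgent gets one call per action and has nowhere to put the loop or the arithmetic.

```python
# Hypothetical tool, stubbed with illustrative numbers: (last_week, this_week).
def get_metric(name: str) -> tuple[float, float]:
    data = {"signups": (1200.0, 1380.0), "churn": (50.0, 52.0), "revenue": (9800.0, 8600.0)}
    return data[name]

# The action itself: fetch each metric, compute week-over-week change, keep big movers.
flagged = {}
for name in ["signups", "churn", "revenue"]:
    last_week, this_week = get_metric(name)
    change = (this_week - last_week) / last_week * 100
    if abs(change) > 10:
        flagged[name] = round(change, 1)

print(flagged)  # {'signups': 15.0, 'revenue': -12.2}
```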
A useful mental image: a ToolCallingAgent is a dispatcher or controller, and a CodeAgent is a programmer or problem solver.
When to use which
Use CodeAgent when you need reasoning, chaining, or dynamic composition; when your tools are functions meant to be combined (parsing plus math plus querying); and when the agent is essentially a problem solver. Accept that you owe it a secure execution environment.
Use ToolCallingAgent when your tools are simple and atomic (call an API, fetch a document); when you want high reliability and clear argument validation; and when the agent is a dispatcher or controller rather than a reasoner.
| | CodeAgent | ToolCallingAgent |
|---|---|---|
| Action format | Python code | Structured JSON |
| How it runs | Executed in an interpreter or sandbox | Parsed and validated, no code run |
| Expressivity | High: logic, loops, composition | Low: one predefined call per turn |
| Reliability | Can error, less predictable | Validated, predictable |
| Safety | Needs a secure sandbox | Safe by default |
| Flexibility | Composes new behaviour | Limited to predefined tools |
| Best as | Problem solver / programmer | Dispatcher / controller |
| Interop with external APIs | Indirect | Direct and natural |
The honest default: if your agent mostly routes to well-defined tools, start with ToolCallingAgent for the reliability and safety. If it needs to think, combine, and adapt, use CodeAgent and invest in the sandbox. And remember it is a per-agent choice, not a religion. Many real systems use a ToolCallingAgent at the edges for clean API dispatch and a CodeAgent where the actual reasoning happens.
Wrapping up
CodeAgent and ToolCallingAgent are the same agent loop with one thing swapped: how the action is expressed. JSON gives you validated, safe, interoperable calls but boxes the agent into predefined actions. Code gives the agent the full expressive power of a programming language, with the research to back it up, at the cost of an execution environment you have to secure. Pick JSON when the agent dispatches, pick code when the agent reasons, and do not be afraid to use both in one system where each fits.

Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.