Deep Agents by LangChain: A Deep Dive, with a Production Customer Support Agent
Why Deep Agents exists, what it gives you out of the box, and a full support agent you could actually ship
If you have built an agent with the basic loop, you have probably hit the wall. It works on the demo, then a real task arrives that needs ten steps, three tool results too big for the context window, and one action you would never let it take without a human signing off. You end up rebuilding the same scaffolding every time: a planner, a way to offload big results, a way to delegate, an approval gate. Deep Agents is LangChain's answer to that. It bakes all of that scaffolding into the agent loop so you stop reinventing it.
This post is a proper deep dive. I will cover why Deep Agents exists, walk through every main feature, and then build a customer support agent with it that is close to something you could actually ship. If you have followed this series, you already know what an agent is, how to give it tools, and how to evaluate it. Deep Agents is where all of that comes together.
Why Deep Agents exists
A plain agent is a model in a loop with tools. That is enough for simple jobs, and LangChain's create_agent gives it to you cleanly. The trouble starts on hard tasks. The agent has no memory of its plan, so it wanders. Tool results pile up until the context window is full of noise. Everything happens in one context, so a single long task drowns out everything else. And there is no natural place to pause for a human before a risky action.
The teams building the most capable agents, the deep research tools and the coding agents, all independently solved these same problems the same way: give the agent an explicit todo list, a scratchpad it can write to and read from, the ability to spin up focused sub-workers, and approval gates on the dangerous bits. Deep Agents is that proven pattern packaged up. LangChain calls it an "agent harness." It is the same core tool-calling loop, with the reliability scaffolding built in.
The mental model I find useful: create_agent is the engine, Deep Agents is the car built around it. Same engine, but now it has a steering wheel, brakes, and seatbelts.
The main features
Here is what you get out of the box, and why each one matters.
Planning with a todo list. Deep agents ship with a built-in write_todos tool. The agent breaks a complex request into discrete steps, tracks progress, and adapts the plan as it learns. This is the difference between an agent that charges off in one direction and one that thinks before it acts and course-corrects.
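To make the idea concrete, here is a minimal sketch of the kind of state a planning tool maintains. The `Todo` and `TodoList` names and the three status values are my own illustrative assumptions, not the harness's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical status values; the real write_todos schema may differ.
STATUSES = ("pending", "in_progress", "completed")


@dataclass
class Todo:
    content: str
    status: str = "pending"


@dataclass
class TodoList:
    items: List[Todo] = field(default_factory=list)

    def write(self, todos: List[Todo]) -> None:
        """Replace the whole plan, the way a planner rewrites it as it learns."""
        for todo in todos:
            if todo.status not in STATUSES:
                raise ValueError(f"unknown status: {todo.status}")
        self.items = list(todos)

    def next_pending(self) -> Optional[Todo]:
        """Return the first step that has not been started yet."""
        return next((t for t in self.items if t.status == "pending"), None)


plan = TodoList()
plan.write([
    Todo("Look up order A-1042", status="completed"),
    Todo("Check refund policy"),
    Todo("Propose refund"),
])
print(plan.next_pending().content)
```

Rewriting the whole list on each update keeps the plan in one place, so the agent always sees a single current version instead of a trail of patches.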
A virtual filesystem for context. The agent can write to and read from files, which sounds mundane but is the key to long tasks. Instead of carrying a giant search result in the conversation, it writes the result to a file and keeps only a reference. The context stays lean, and the agent stays sharp across a long run. The filesystem is pluggable: in-memory state, local disk, or a custom backend.
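The offloading idea can be sketched in plain Python. This is a conceptual illustration, not how the harness implements it: `VirtualFS`, `offload_if_large`, the 500-character threshold, and the `/tool_results/` path are all hypothetical.

```python
class VirtualFS:
    """In-memory stand-in for an agent filesystem (illustrative only)."""

    def __init__(self) -> None:
        self._files = {}

    def write_file(self, path: str, content: str) -> None:
        self._files[path] = content

    def read_file(self, path: str, offset: int = 0, limit: int = 2000) -> str:
        """Read a slice so a large file can be paged through in small pieces."""
        return self._files[path][offset:offset + limit]


def offload_if_large(fs: VirtualFS, name: str, result: str, max_chars: int = 500) -> str:
    """Keep small results inline; store large ones and return a short pointer."""
    if len(result) <= max_chars:
        return result
    path = f"/tool_results/{name}.txt"
    fs.write_file(path, result)
    return f"[{len(result)} chars saved to {path}; read it in slices when needed]"


fs = VirtualFS()
big_result = "snippet " * 1000  # 8000 characters of pretend search output
message = offload_if_large(fs, "search_1", big_result)
print(message)
```

Only the short pointer enters the conversation; the full text stays on the filesystem until the agent decides it needs a slice of it.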
Context compression and summarization. On top of the filesystem, the harness offloads large tool inputs and outputs and summarizes older messages automatically, so the agent does not fall apart halfway through a long session when the window fills.
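As a rough sketch of what summarizing older messages means in practice, the function below folds everything except the most recent turns into one summary message. `compact_history` and `placeholder_summary` are hypothetical names, and the summarizer is a stand-in for what would be a model call.

```python
def placeholder_summary(older):
    """Stand-in summarizer; a real one would ask a model to condense the text."""
    return f"Summary of {len(older)} earlier messages."


def compact_history(messages, keep_recent=4, summarize=placeholder_summary):
    """Replace all but the last `keep_recent` messages with a single summary."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    split = max(len(messages) - keep_recent, 0)
    if split == 0:
        return list(messages)  # nothing old enough to summarize
    older, recent = messages[:split], messages[split:]
    summary = {"role": "system", "content": summarize(older)}
    return [summary] + list(recent)


history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
compacted = compact_history(history, keep_recent=4)
print(len(history), "->", len(compacted))
```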
Subagents for delegation and context quarantine. A built-in task tool lets the agent spawn subagents, each running in its own isolated context window. This is the big one. When a subtask would flood the main agent with intermediate noise, the main agent hands it to a subagent, which does all the messy work and returns only the clean result. You can also define specialized subagents with their own prompts, tools, and even models. A general-purpose subagent is available by default.
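Context quarantine is easier to see with a toy version. In this sketch, which assumes nothing about the real task tool, `run_subagent` and `noisy_search` are hypothetical and the final "answer" is a placeholder for the subagent's last model call.

```python
def noisy_search(query):
    """Pretend tool that returns a large, messy result."""
    return f"raw results for {query}: " + "x" * 5000


def run_subagent(task, tools):
    """Run a subtask in its own message list and return only the final answer."""
    scratch = [{"role": "user", "content": task}]  # the isolated context
    for tool in tools:
        # Intermediate tool output lands in the subagent's context only.
        scratch.append({"role": "tool", "content": tool(task)})
    # Stand-in for the model call that condenses the findings.
    return f"Answer to '{task}' based on {len(scratch) - 1} tool results."


question = "How do I rotate my API key?"
main_context = [{"role": "user", "content": question}]
answer = run_subagent(question, [noisy_search, noisy_search])
main_context.append({"role": "tool", "content": answer})
print(main_context[-1]["content"])
```

The 10,000 or so characters of raw search output never reach `main_context`; it grows by exactly one short message.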
Human-in-the-loop. You can mark specific tools as requiring approval. When the agent tries to call one, it pauses and hands control back to you to approve, edit, or reject before anything happens. This is what makes it safe to give an agent real powers like issuing refunds or sending emails.
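Stripped of the framework, the control flow of an approval gate looks something like the sketch below. Everything here is hypothetical: the `call_tool` helper, the stub tools, and the decision dictionaries, whose `edited_action` key is an assumption about how an edit might be expressed.

```python
def get_order_status(order_id):
    return f"Order {order_id}: delivered."


def issue_refund(order_id, amount):
    return f"Refunded ${amount:.2f} on order {order_id}."


TOOLS = {"get_order_status": get_order_status, "issue_refund": issue_refund}
GATED = {"issue_refund"}  # only the risky tool needs sign-off


def call_tool(name, args, reviewer):
    """Run a tool call, consulting `reviewer` first when the tool is gated."""
    if name in GATED:
        decision = reviewer(name, args)
        if decision["type"] == "reject":
            return f"{name} was rejected by the reviewer."
        if decision["type"] == "edit":
            args = decision["edited_action"]["args"]  # reviewer's corrected arguments
        elif decision["type"] != "approve":
            raise ValueError(f"unknown decision type: {decision['type']}")
    return TOOLS[name](**args)


def approve_all(name, args):
    return {"type": "approve"}


def reject_all(name, args):
    return {"type": "reject"}


print(call_tool("issue_refund", {"order_id": "A-1042", "amount": 29.0}, approve_all))
print(call_tool("issue_refund", {"order_id": "A-1042", "amount": 29.0}, reject_all))
```

In the real harness the pause spans process boundaries, which is why it needs persisted state; here the reviewer is just a function call.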
Long-term memory. Using LangGraph's memory store, an agent can remember across threads and conversations, so it is not starting from zero every session.
Skills. You can extend an agent with reusable skills: bundles of specialized instructions, domain knowledge, and workflows it loads when relevant.
Pluggable backends and sandboxes. The filesystem backend is swappable, and shell-capable backends add an execute tool for builds, tests, and system tasks. For anything risky, you run it in a sandbox isolated from your host.
Smart default prompts. It ships with opinionated system prompts that teach the model to plan first, verify its work, and manage its own context. You can customize or replace them, but the defaults are a good starting point.
None of these are new ideas. What Deep Agents does is give you all of them at once, wired together correctly, behind one create_deep_agent call. The value is not in any single feature; it is in not having to build and integrate them yourself.
When to use it (and when not to)
Reach for Deep Agents when the task is genuinely complex: multi-step, long-running, needs delegation, or involves actions that require human oversight. That is its sweet spot.
For a simple, single-purpose agent with a few tools and a short loop, plain create_agent is lighter and clearer. Do not pull in the whole harness to answer a FAQ. As always in this series, start simple and reach for more structure when you can feel the need.
Building a production customer support agent
Let us build something real: a customer support agent for a SaaS product. It answers how-to questions from the help center, looks up a customer's account and orders, can issue a refund (but only with human approval), and escalates to a human when it is out of its depth. We will use planning, a subagent for context quarantine, runtime context to scope everything to the current customer, and human-in-the-loop on the one action that touches money.
Setup
pip install deepagents langchain-anthropic
export ANTHROPIC_API_KEY="your-key"

Step 1: runtime context
Every request happens on behalf of a specific customer. Rather than trusting the model to pass a customer ID around, we inject it as runtime context, so our tools always act on the right account no matter what the model does. This is both cleaner and safer.
from dataclasses import dataclass


@dataclass
class SupportContext:
    customer_id: str
    tier: str  # "free", "pro", or "enterprise"

Step 2: the tools
These are the real powers of the agent. In production each would call your actual systems; here they are stubbed so the example runs. Note how each tool reads the customer from context instead of taking it as an argument.
from langchain.tools import tool, ToolRuntime


@tool
def search_help_center(query: str) -> str:
    """Search the public help center articles for a query. Use for how-to and
    product questions. Returns the most relevant article snippets."""
    # Replace with a real search over your docs / vector store.
    return f"Top help-center results for '{query}': [article snippets...]"


@tool
def get_account(runtime: ToolRuntime[SupportContext]) -> str:
    """Get the current customer's account details: plan, status, and usage."""
    ctx = runtime.context
    return f"Customer {ctx.customer_id} is on the {ctx.tier} plan, status active."


@tool
def get_order_status(order_id: str, runtime: ToolRuntime[SupportContext]) -> str:
    """Look up the status of one of the current customer's orders by order ID."""
    ctx = runtime.context
    return f"Order {order_id} for {ctx.customer_id}: delivered on 2026-06-10."


@tool
def issue_refund(order_id: str, amount: float, reason: str,
                 runtime: ToolRuntime[SupportContext]) -> str:
    """Issue a refund for an order. SENSITIVE: this moves money and requires
    human approval before it runs."""
    ctx = runtime.context
    return f"Refunded ${amount:.2f} on order {order_id} for {ctx.customer_id} ({reason})."


@tool
def escalate_to_human(summary: str, runtime: ToolRuntime[SupportContext]) -> str:
    """Hand the conversation to a human support agent with a short summary.
    Use when the issue is out of policy, legal, or the customer is upset."""
    return f"Escalated to a human agent. Summary logged: {summary}"

Step 3: a knowledge-base subagent for context quarantine
How-to questions can mean several searches and a lot of article text. If the main agent does that itself, its context fills with snippets and it loses the thread of the actual conversation. So we delegate research to a subagent. It runs in an isolated context, does all the searching, and hands back a short answer. The main agent never sees the mess.
kb_researcher = {
    "name": "kb-researcher",
    "description": (
        "Researches how-to and product questions using the help center. "
        "Use for any question about how the product works or how to do something."
    ),
    "system_prompt": (
        "You answer product how-to questions. Use search_help_center as many "
        "times as needed, then write a clear, friendly answer in under 150 words. "
        "Cite the article titles you used. Do not include raw search dumps."
    ),
    "tools": [search_help_center],
}

Step 4: human approval on the one risky action
Refunds move money, so the agent should never run one on its own. We mark issue_refund as requiring approval. A human can approve it, edit the arguments (say, correct the amount), or reject it. Everything else runs freely. Human-in-the-loop needs a checkpointer so the agent can pause and resume.
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()

interrupt_on = {
    "issue_refund": {"allowed_decisions": ["approve", "edit", "reject"]},
}

Step 5: assemble the agent
Now we put it together. The system prompt encodes the support policy: answer from the help center, escalate when out of policy, and treat refunds carefully.
from deepagents import create_deep_agent

SUPPORT_POLICY = """You are a customer support agent for a SaaS product. Be warm,
concise, and accurate.

How to work:
- For how-to or product questions, delegate to the kb-researcher subagent rather
  than guessing. Never invent product behaviour.
- For account or order questions, use get_account and get_order_status.
- You may issue a refund with issue_refund, but it requires human approval, so
  only propose one when the customer's request and the order clearly justify it.
- Escalate to a human for anything legal, abusive, out of policy, or when the
  customer is clearly frustrated and you cannot resolve it.

Always confirm what you did in plain language at the end."""

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[get_account, get_order_status, issue_refund, escalate_to_human],
    system_prompt=SUPPORT_POLICY,
    subagents=[kb_researcher],
    interrupt_on=interrupt_on,
    checkpointer=checkpointer,
    context_schema=SupportContext,
)

That is the whole agent. Notice what we did not write: no planner, no context-offloading logic, no delegation plumbing, no pause-and-resume machinery. The harness gives us all of it.
Step 6: run a normal question
config = {"configurable": {"thread_id": "ticket-5512"}}
ctx = SupportContext(customer_id="cus_8841", tier="pro")

result = agent.invoke(
    {"messages": [{"role": "user", "content": "How do I rotate my API key?"}]},
    config=config,
    context=ctx,
    version="v2",
)
print(result["messages"][-1].content)

Here the agent recognises a how-to question and delegates to kb-researcher, which searches the help center in its own context and returns a tidy answer. The customer gets a clean response, and the main agent's context stays focused on the conversation.
Step 7: run a refund, and approve it
Now the interesting path. The customer asks for a refund, which trips the approval gate.
from langgraph.types import Command

result = agent.invoke(
    {"messages": [{"role": "user",
                   "content": "I was double charged on order A-1042. Please refund $29."}]},
    config=config,
    context=ctx,
    version="v2",
)

# The agent looked up the order, then tried to issue a refund, which pauses.
if result.interrupts:
    action = result.interrupts[0].value["action_requests"][0]
    print("Approval needed:", action["name"], action["args"])
    # For example: {"order_id": "A-1042", "amount": 29.0, "reason": "double charge"}

    # A human approves. They could also edit the args or reject.
    result = agent.invoke(
        Command(resume={"decisions": [{"type": "approve"}]}),
        config=config,  # same thread_id, so it resumes where it paused
        context=ctx,    # pass the runtime context again so issue_refund can read it on resume
        version="v2",
    )

print(result["messages"][-1].content)

The agent plans the task, gathers the order details, and proposes the refund, then stops and waits. Nothing touches money until a human approves. If the human had wanted to change the amount, they would resume with an edit decision and corrected arguments instead. This is the pattern that makes an autonomous agent safe to put in front of real customers.
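As a sketch of that alternative path, the helpers below build edit and reject decisions as plain dictionaries. The `edited_action` and `message` keys reflect my understanding of the human-in-the-loop decision format and should be checked against the version you have installed; `edit_decision` and `reject_decision` are hypothetical helper names, and the corrected amount is made up.

```python
def edit_decision(tool_name: str, **args) -> dict:
    """Build an 'edit' decision that replaces the proposed tool arguments."""
    return {"type": "edit", "edited_action": {"name": tool_name, "args": args}}


def reject_decision(reason: str) -> dict:
    """Build a 'reject' decision with an explanation the agent can act on."""
    return {"type": "reject", "message": reason}


# The reviewer corrects the refund amount before letting it through.
resume_payload = {
    "decisions": [
        edit_decision(
            "issue_refund",
            order_id="A-1042",
            amount=19.0,
            reason="double charge, partial refund",
        ),
    ]
}
print(resume_payload)
```

You would pass this as `Command(resume=resume_payload)` with the same `config` used for the paused call, in place of the approve decision.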
Scope your approval gates to the actions that actually carry risk. Reading an order is harmless and should run freely. Moving money, sending external emails, or deleting data are where you want a human in the loop. Gating everything just trains your team to rubber-stamp, which defeats the point.
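One way to keep that scoping honest is to derive the gate configuration from an explicit risk classification instead of hand-editing it. The risk labels and the `TOOL_RISK` table below are hypothetical; the resulting dictionary has the same shape as the `interrupt_on` mapping used earlier.

```python
# Hypothetical risk classification for the tools in this post.
TOOL_RISK = {
    "search_help_center": "read",
    "get_account": "read",
    "get_order_status": "read",
    "escalate_to_human": "low",
    "issue_refund": "money",
}

# Only these categories pause for a human.
GATED_RISKS = {"money", "external_email", "delete"}

interrupt_on = {
    name: {"allowed_decisions": ["approve", "edit", "reject"]}
    for name, risk in TOOL_RISK.items()
    if risk in GATED_RISKS
}
print(interrupt_on)
```

Adding a new tool then forces a decision about its risk category, which is a better default than remembering to gate it later.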
Taking it to production
The agent above is structurally production-shaped, but a few things turn it from a working example into something you would actually deploy.
Swap the checkpointer. MemorySaver is for development; it forgets everything when the process dies. In production use a durable checkpointer (such as the Postgres one) so paused approvals and conversation state survive restarts.
Turn on tracing. Everything from the evaluation post applies here. Set your LangSmith keys and you get a full trace of every plan, subagent call, and tool call, which is the only sane way to debug an agent that delegates. Build a dataset of real tickets and evaluate changes before you ship them.
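A minimal sketch of the environment setup, assuming the standard LangSmith variable names (`LANGSMITH_TRACING`, `LANGSMITH_PROJECT`, `LANGSMITH_API_KEY`). The project name is a placeholder, and the variables should be set before the agent is created.

```python
import os

# Enable tracing without overriding values already set by the deployment.
os.environ.setdefault("LANGSMITH_TRACING", "true")
os.environ.setdefault("LANGSMITH_PROJECT", "support-agent-dev")  # placeholder name

# The API key should come from your secret manager, not from source code.
tracing_ready = "LANGSMITH_API_KEY" in os.environ
print("LangSmith API key present:", tracing_ready)
```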
Add memory. With a memory store, the agent can remember a returning customer's history across conversations instead of treating every ticket as the first contact.
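The shape of per-customer memory can be sketched with a tiny in-process store. `CustomerMemory` is a hypothetical stand-in, not LangGraph's store; the point is the namespacing by customer ID, which keeps one customer's history from leaking into another's.

```python
class CustomerMemory:
    """Tiny in-process stand-in for a namespaced long-term memory store."""

    def __init__(self):
        self._data = {}

    def put(self, namespace, key, value):
        self._data.setdefault(namespace, {})[key] = value

    def get(self, namespace, key, default=None):
        return self._data.get(namespace, {}).get(key, default)


memory = CustomerMemory()
namespace = ("customers", "cus_8841")

# End of one conversation: record what happened.
memory.put(namespace, "last_ticket", {"topic": "double charge", "resolved": True})

# Start of a later conversation: recall it, scoped to the same customer.
previous = memory.get(namespace, "last_ticket")
other = memory.get(("customers", "cus_0001"), "last_ticket")
print(previous, other)
```

In production the store would be durable and the namespace would come from the runtime context, so the model never chooses whose memory it reads.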
Mind your tools and permissions. Each tool should hit your real systems with the minimum access it needs, and you can use filesystem permission rules to constrain what the agent can touch. The runtime context pattern we used keeps every action scoped to the right customer.
When you are ready to serve it, Deep Agents runs on the LangGraph runtime, so you get durable execution, streaming, and the standard deployment paths without extra work.
Wrapping up
Deep Agents is the agent loop with the hard-won reliability patterns built in: planning, a filesystem for context, subagents for delegation and quarantine, human approval for risky actions, memory, and skills. You reach for it when a task is too complex for a plain agent, and you skip it when it is not. The customer support agent we built shows the shape of a real one: scoped to a customer with runtime context, delegating research to a subagent to stay focused, and gating the one money-moving action behind a human. That is maybe sixty lines for something that plans, researches, acts, and knows when to ask permission.
This rounds out the practical arc of the series. We started with what an agent is and have now built one substantial enough to trust with real work. If you build a deep agent of your own, I would love to see it.
Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.