Context Engineering in DeepAgents, From the Inside
What goes into a deep agent's context, what the framework manages for you, and the knobs you actually control
I wrote a more general piece on context engineering for agents a while back: the four moves of compress, offload, retrieve, and isolate. This post is narrower and more hands-on. It is about how one framework, LangChain's DeepAgents, actually does those moves for you in TypeScript, and where you still hold the controls.
The reason I like DeepAgents as a teaching tool is that it makes context a first-class thing. Most of what kills a long-running agent is not the model. It is the context: stale tool outputs, redundant file dumps, half-finished plans, all piling up until the model gets confused and you are paying for every junk token. DeepAgents has opinions about all of that, and it helps to know what they are before you fight them.
This is the first post in a short series on building with DeepAgents. The later parts go deep on subagents, memory, skills, and human-in-the-loop. Here I want to give you the map.
The five kinds of context
DeepAgents splits context into five categories. Two you set up front, three the framework manages while the agent runs.
| Type | What you control | Scope |
|---|---|---|
| Input context | System prompt, memory, skills, tool prompts | Static, applied each run |
| Runtime context | Config passed at invoke time (user IDs, API keys) | Per run, propagates to subagents |
| Compression | Offloading and summarization | Automatic, near the limit |
| Isolation | Quarantine heavy work in subagents | Per delegated task |
| Long-term memory | Persistence across threads via the filesystem | Across conversations |
The useful mental model: input and runtime context are what you hand the agent. Compression and isolation are how the framework keeps that context from exploding. Long-term memory is how anything survives past a single thread.
Input context: the system prompt you do not fully write
When you set systemPrompt, you are not writing the whole prompt. You are prepending to a built-in one. The final system message DeepAgents sends the model is assembled from these parts, in order:
- Your custom systemPrompt, if provided
- The base agent prompt
- The to-do / planning prompt
- The memory prompt (only when memory is set)
- The skills prompt (only when skills is set)
- The virtual filesystem prompt
- The subagent (task tool) prompt
- Any custom middleware prompts
- The human-in-the-loop prompt (only when interruptOn is set)
So the moment you turn on memory, skills, or interrupts, the system prompt grows to teach the model how to use them. That is mostly good. It also means your "short" system prompt is never the whole story, which matters when you are debugging why the agent behaves a certain way.
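To make the assembly order concrete, here is a minimal sketch of it. This is not DeepAgents' actual implementation: the `PromptOptions` shape, the `middlewarePrompts` field, and the section labels are placeholders I made up for illustration. Only the ordering and the "only when set" conditions come from the list above.

```typescript
// Illustrative only: PromptOptions and the section labels are assumptions
// for this sketch, not DeepAgents internals.
interface PromptOptions {
  systemPrompt?: string;
  memory?: string[];
  skills?: string[];
  middlewarePrompts?: string[];
  interruptOn?: Record<string, boolean>;
}

// Returns the labels of the sections that would make up the final system
// message, in the order described above.
function assembleSystemPrompt(opts: PromptOptions): string[] {
  const sections: string[] = [];
  if (opts.systemPrompt) sections.push("custom systemPrompt");
  sections.push("base agent prompt");
  sections.push("to-do / planning prompt");
  if (opts.memory && opts.memory.length > 0) sections.push("memory prompt");
  if (opts.skills && opts.skills.length > 0) sections.push("skills prompt");
  sections.push("virtual filesystem prompt");
  sections.push("subagent (task tool) prompt");
  for (const label of opts.middlewarePrompts ?? []) sections.push(`middleware: ${label}`);
  if (opts.interruptOn) sections.push("human-in-the-loop prompt");
  return sections;
}

// A "short" prompt still ships with four built-in sections around it.
const minimalSections = assembleSystemPrompt({
  systemPrompt: "You are a research assistant.",
});

// Turning on memory, skills, and interrupts adds three more.
const fullSections = assembleSystemPrompt({
  systemPrompt: "You are a research assistant.",
  memory: ["/project/AGENTS.md"],
  skills: ["/skills/research/"],
  interruptOn: { write_file: true },
});
```

Comparing the two arrays is a quick way to remind yourself what the model is reading beyond the text you wrote.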
The basic setup looks like this:
import { createDeepAgent } from "deepagents";
const agent = await createDeepAgent({
  model: "claude-sonnet-4-6",
  systemPrompt: `You are a research assistant specializing in scientific literature.
Always cite sources. Use subagents for parallel research on different topics.`,
});
systemPrompt is static. It does not change per invocation. If you need it to vary (say "you have admin access" versus "read-only", or to inject a user preference pulled from memory), reach for dynamicSystemPromptMiddleware, which can read request.runtime.context and request.runtime.store.
One thing the docs are firm about, and I agree with: you do not need middleware just because a tool reads context. Tools already receive config.context and runtime.store directly. Add dynamic-prompt middleware only when the system prompt text itself must change per request. It is easy to over-engineer this.
Memory files versus skills
Two of the input-context layers are easy to confuse, so be deliberate about which you use.
Memory (AGENTS.md files) is always loaded into the system prompt. No conditions. Use it only for things that are relevant every single turn: project conventions, a few hard guidelines, a user preference or two.
const agent = await createDeepAgent({
  model: "claude-sonnet-4-6",
  memory: ["/project/AGENTS.md", "~/.deepagents/preferences.md"],
});
Skills are loaded on demand. The agent reads only the frontmatter of each SKILL.md at startup, then pulls in the full content only when a task matches.
const agent = await createDeepAgent({
  model: "claude-sonnet-4-6",
  skills: ["/skills/research/", "/skills/web-search/"],
});
The rule I use: if it has to be true on every turn, it is memory. If it is a detailed workflow you only sometimes need, it is a skill. Putting a big workflow in memory is the most common way people quietly blow their token budget. There is a whole post in this series on skills if you want to go deeper.
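A rough way to see the difference in cost is to compare what gets loaded up front in each case. The sketch below is my own illustration, not how DeepAgents parses skills: the skill file is hypothetical, the frontmatter extraction is deliberately naive, and the four-characters-per-token estimate is a crude assumption that is only good enough for comparisons.

```typescript
// Assumption: roughly 4 characters per token. Fine for relative comparisons only.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Returns the text between the opening and closing `---` lines, or "" if the
// file does not start with a frontmatter block.
function extractFrontmatter(markdown: string): string {
  const lines = markdown.split("\n");
  if ((lines[0] ?? "").trim() !== "---") return "";
  const end = lines.findIndex((line, index) => index > 0 && line.trim() === "---");
  return end === -1 ? "" : lines.slice(1, end).join("\n");
}

// Hypothetical SKILL.md: a short header plus a long workflow body.
const skillFile = [
  "---",
  "name: web-research",
  "description: Plan and run multi-source web research",
  "---",
  "# Workflow",
  "Detailed step-by-step instructions. ".repeat(200),
].join("\n");

// Loaded as memory: the whole file is in the prompt on every turn.
const asMemoryCost = estimateTokens(skillFile);

// Loaded as a skill: only the frontmatter is read until a task matches.
const asSkillCost = estimateTokens(extractFrontmatter(skillFile));
```

The exact numbers do not matter. The gap between `asMemoryCost` and `asSkillCost` is what you pay on every turn for putting a workflow in the wrong place.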
Tool prompts are part of context too
This one gets overlooked. The description and schema of every tool you pass are context the model reads on every turn. A vague tool description costs tokens and leads to worse tool choices. Spend the effort here:
import { tool } from "@langchain/core/tools";
import { z } from "zod";
const searchOrders = tool(
  async ({ userId, status, limit }) => { /* ... */ },
  {
    name: "search_orders",
    description: `Search for user orders by status.
Use this when the user asks about order history or wants to check
order status. Always filter by the provided status.`,
    schema: z.object({
      userId: z.string().describe("Unique identifier for the user"),
      status: z.enum(["pending", "shipped", "delivered"]).describe("Order status to filter by"),
      limit: z.number().default(10).describe("Maximum number of results to return"),
    }),
  }
);
Put the when in the description, not just the what. The model uses it to decide whether to call the tool at all.
Runtime context: per-run config that is not in the prompt
Runtime context is the stuff you pass at invoke time: user IDs, API keys, database handles, feature flags. It is not automatically added to the prompt. The model only sees it if a tool or middleware reads it and surfaces it. Tools get it through their config argument.
import { tool } from "@langchain/core/tools";
import { createDeepAgent } from "deepagents";
import { z } from "zod";
// The tool reads runtime context from its second argument, not from the prompt.
const fetchUserData = tool(
  (input, config) => {
    const userId = config.context?.userId;
    return `Data for user ${userId}: ${input.query}`;
  },
  {
    name: "fetch_user_data",
    description: "Fetch data for the current user",
    schema: z.object({ query: z.string() }),
  }
);
// Shape of the per-run context passed at invoke time.
const contextSchema = z.object({
  userId: z.string(),
  apiKey: z.string(),
});
const agent = await createDeepAgent({
  model: "claude-sonnet-4-6",
  tools: [fetchUserData],
  contextSchema,
});
const result = await agent.invoke(
  { messages: [{ role: "user", content: "Get my recent activity" }] },
  { context: { userId: "user-123", apiKey: "sk-..." } },
);
The detail worth remembering: runtime context propagates to all subagents. When the main agent delegates, the subagent gets the same config, including context. So you set the user once and every delegated task can act as that user.
Compression: what happens as the window fills
This is where DeepAgents earns its keep, because it does two things automatically that people otherwise hand-roll badly.
Offloading
When a tool input or result crosses a token threshold (default 20,000), DeepAgents moves the content to the filesystem and leaves a reference behind.
For large tool inputs, like a big write_file call, the full content is already on disk, so once the session crosses about 85% of the model's window, older tool calls get truncated and replaced with a pointer to the file. For large tool results, the response is offloaded to the backend and replaced with a file path plus a preview of the first 10 lines. The agent can re-read or grep the file when it actually needs the detail.
The effect: a single 50,000-token search result stops costing you 50,000 tokens on every subsequent turn. You carry a path and a preview instead.
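Here is a minimal sketch of the result-offloading idea, to show the shape of the trade. It is an illustration under stated assumptions, not the framework's code: the function name, the `/large_tool_results/` path, the in-memory `Map` standing in for the filesystem, and the four-characters-per-token estimate are all mine. Only the 20,000-token threshold and the 10-line preview come from the description above.

```typescript
const OFFLOAD_THRESHOLD_TOKENS = 20_000; // default quoted above
const PREVIEW_LINES = 10;

// Assumption: roughly 4 characters per token.
const roughTokenCount = (text: string): number => Math.ceil(text.length / 4);

// Stand-in for the agent's virtual filesystem.
const virtualFiles = new Map<string, string>();

// Small results pass through untouched. Large ones are written to the
// "filesystem" and replaced with a path plus a short preview.
function offloadIfLarge(toolCallId: string, result: string): string {
  if (roughTokenCount(result) <= OFFLOAD_THRESHOLD_TOKENS) return result;
  const path = `/large_tool_results/${toolCallId}.txt`; // hypothetical path scheme
  virtualFiles.set(path, result);
  const preview = result.split("\n").slice(0, PREVIEW_LINES).join("\n");
  return `Result saved to ${path}. First ${PREVIEW_LINES} lines:\n${preview}`;
}

// A large fake search result: 5,000 rows of filler.
const bigResult = Array.from(
  { length: 5000 },
  (_, i) => `row ${i}: ${"x".repeat(40)}`,
).join("\n");

// What stays in the message history is the pointer and preview, not the rows.
const inContext = offloadIfLarge("call_1", bigResult);
```

The full content is still one `read_file` or `grep` away. The message history just stops carrying it.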
Summarization
When context crosses the summarization threshold (around 85% of the model's max_input_tokens) and there is nothing left worth offloading, DeepAgents summarizes the message history. Two things happen at once:
- An in-context summary replaces the full history in working memory: session intent, artifacts created, next steps.
- The complete original messages are written to the filesystem as a canonical record.
So the agent keeps its sense of the goal while preserving the ability to dig up specifics later. A few defaults worth knowing:
- Triggers at 85% of the model's max_input_tokens (from its model profile)
- Keeps about 10% of tokens as recent context
- Falls back to a 170,000-token trigger and keeping 6 messages if the profile is missing
- If a model call ever raises a context-overflow error, DeepAgents falls back to summarization and retries
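The arithmetic behind those defaults is simple enough to write down. This is a sketch of the numbers only, not the framework's summarization logic: the `ModelProfile` and `SummarizationPlan` types, the camelCase `maxInputTokens` field, and the `>=` comparison are assumptions of mine.

```typescript
// Hypothetical shapes for this sketch.
interface ModelProfile {
  maxInputTokens?: number;
}

type KeepPolicy =
  | { kind: "tokens"; value: number }
  | { kind: "messages"; value: number };

interface SummarizationPlan {
  triggerTokens: number;
  keep: KeepPolicy;
}

// Applies the defaults listed above: 85% trigger and 10% kept when the
// profile is known, otherwise a 170,000-token trigger and 6 kept messages.
function planSummarization(profile: ModelProfile): SummarizationPlan {
  const max = profile.maxInputTokens;
  if (max === undefined) {
    return { triggerTokens: 170_000, keep: { kind: "messages", value: 6 } };
  }
  return {
    // Integer arithmetic avoids floating-point surprises.
    triggerTokens: Math.floor((max * 85) / 100),
    keep: { kind: "tokens", value: Math.floor((max * 10) / 100) },
  };
}

// Assumption: "crosses" is treated as greater-than-or-equal.
const shouldSummarize = (currentTokens: number, plan: SummarizationPlan): boolean =>
  currentTokens >= plan.triggerTokens;

// A 200,000-token window triggers at 170,000 tokens and keeps 20,000.
const planFor200k = planSummarization({ maxInputTokens: 200_000 });

// No profile: the fixed fallback applies.
const fallbackPlan = planSummarization({});
```

Note that for a 200,000-token window the 85% trigger works out to the same 170,000 tokens as the fallback.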
Both offloading and summarization lean on the virtual filesystem. The pattern across DeepAgents is the same: the filesystem is the durable store, and the context window is just a working set. Once that clicks, a lot of the design makes sense.
Isolation: subagents as context firewalls
Subagents solve context bloat by quarantine. When a task needs a dozen noisy tool calls, you delegate it. The subagent runs with its own fresh context, does the messy work, and returns one clean result. The main agent never sees the dozen intermediate calls.
// webSearch is assumed to be a tool defined elsewhere.
const researchSubagent = {
  name: "researcher",
  description: "Conducts research on a topic",
  systemPrompt: `You are a research assistant.
IMPORTANT: Return only the essential summary (under 500 words).
Do NOT include raw search results or detailed tool outputs.`,
  tools: [webSearch],
};
The thing that makes this work is instructing the subagent to return a summary, not raw data. A subagent that dumps everything it found back into the parent defeats the entire point. I cover the full mechanics, the general-purpose subagent, and when delegation backfires in the subagents post. If you are weighing this against other multi-agent shapes, my piece on handoffs versus the supervisor pattern is the companion read.
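If the firewall idea feels abstract, a toy model helps. The sketch below involves no real model or DeepAgents API: `runIsolatedSubagent` is a hypothetical stand-in that fakes twelve noisy tool results in its own scratch list and hands back a single summary string.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical stand-in for a subagent run. All the noisy intermediate
// messages live in `scratch`, which never leaves this function.
function runIsolatedSubagent(taskDescription: string): {
  summary: string;
  internalMessages: number;
} {
  const scratch: ChatMessage[] = [{ role: "user", content: taskDescription }];
  for (let i = 1; i <= 12; i++) {
    scratch.push({
      role: "tool",
      content: `raw search result ${i}: ${"lorem ".repeat(500)}`,
    });
  }
  const summary = `Summary for "${taskDescription}": 12 sources reviewed.`;
  return { summary, internalMessages: scratch.length };
}

// The main agent's history before delegation.
const mainThread: ChatMessage[] = [
  { role: "user", content: "Compare vector databases" },
];

const delegated = runIsolatedSubagent("Research vector database benchmarks");

// Only the summary crosses back into the main agent's context.
mainThread.push({ role: "tool", content: delegated.summary });
```

The subagent handled thirteen messages. The main thread grew by one.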
Long-term memory: surviving past one thread
By default the agent's filesystem lives in agent state, which only persists within a single thread. To remember things across conversations, you route specific paths (by convention /memories/) to a LangGraph Store using a CompositeBackend.
import { createDeepAgent, CompositeBackend, StateBackend, StoreBackend } from "deepagents";
import { InMemoryStore } from "@langchain/langgraph-checkpoint";
const agent = await createDeepAgent({
  model: "claude-sonnet-4-6",
  store: new InMemoryStore(),
  backend: (config) => new CompositeBackend(
    new StateBackend(config),
    { "/memories/": new StoreBackend(config) },
  ),
  systemPrompt: `When users tell you their preferences, save them to /memories/user_preferences.txt so you remember them in future conversations.`,
});
You do not pre-populate /memories/. You give the agent the backend, the store, and instructions about what to save and where. It creates files on demand with its normal filesystem tools. The whole topic, including per-user versus per-agent scoping and background consolidation, gets its own memory post.
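The routing idea itself is easy to picture with a toy version. To be clear, this is not how CompositeBackend is implemented: `PrefixRouter`, `FileBackend`, and the longest-prefix-wins rule are assumptions I chose for the sketch. The point is only that a path prefix decides which store a file lands in.

```typescript
// Hypothetical minimal backend: a label and an in-memory file map.
interface FileBackend {
  label: string;
  files: Map<string, string>;
}

const makeBackend = (label: string): FileBackend => ({ label, files: new Map() });

class PrefixRouter {
  private readonly fallback: FileBackend;
  private readonly routes: Record<string, FileBackend>;

  constructor(fallback: FileBackend, routes: Record<string, FileBackend>) {
    this.fallback = fallback;
    this.routes = routes;
  }

  // Assumption: the longest matching prefix wins. Unmatched paths go to the fallback.
  resolve(path: string): FileBackend {
    let bestPrefix = "";
    let chosen = this.fallback;
    for (const prefix of Object.keys(this.routes)) {
      const backend = this.routes[prefix];
      if (backend && path.startsWith(prefix) && prefix.length > bestPrefix.length) {
        bestPrefix = prefix;
        chosen = backend;
      }
    }
    return chosen;
  }

  write(path: string, content: string): void {
    this.resolve(path).files.set(path, content);
  }
}

// Thread-scoped state by default, a cross-thread store for /memories/.
const threadState = makeBackend("thread state");
const crossThreadStore = makeBackend("store");
const router = new PrefixRouter(threadState, { "/memories/": crossThreadStore });

router.write("/memories/user_preferences.txt", "Prefers concise answers");
router.write("/scratch/notes.txt", "temporary working notes");
```

Everything under `/memories/` ends up in the store that outlives the thread. Everything else stays in per-thread state and disappears with it.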
The defaults I would actually change
After mapping all this, here is where I think most teams should spend their tuning time, in order:
- Keep memory tiny. Every token in AGENTS.md is paid on every turn. Be ruthless. Move workflows to skills.
- Write real tool descriptions. The model decides what to call based on these. Vague descriptions are the cheapest bug to fix.
- Cap subagent output in the prompt. If you notice a subagent returning long results while debugging, add an explicit length limit to its system prompt.
- Use the filesystem on purpose. Persist large outputs to files and let the agent pull fragments with read_file and grep, instead of keeping everything live.
- Document your memory layout. Tell the agent what lives in /memories/ and how to use it, or it will not.
The framework handles the hard parts (offloading, summarization, the assembled prompt) so you can spend your attention on these. That is the right division of labor.
Next in the series: subagents, the single biggest lever you have for keeping a long task from drowning in its own context.

Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.