Inside Vercel's Eve: A Deep Technical Walkthrough, with a Research Agent
What Eve is, every capability that matters, a full TypeScript research agent, and how it compares to LangChain's Deep Agents
Vercel shipped a framework for building agents, and the first thing you notice is that it does not look like one. There is no big Agent object you configure in code. Your agent is a directory. You drop files into an agent/ folder, and Eve reads that folder, runs the model loop, persists every session so it survives crashes, and serves the whole thing over HTTP and platform channels like Slack. It is one of the more interesting takes on agent infrastructure to land in a while.
This is a long one, because Eve is dense. I will explain what it is and why Vercel built it this way, tour every capability that matters with real code, build a complete research agent you could extend into something real, and finish with an honest comparison to LangChain's Deep Agents, the closest thing in spirit. If you read my Deep Agents deep dive, this is the TypeScript counterpart.
Eve is in public beta as of writing, under Vercel's beta terms. APIs and behaviour can change before general availability, so treat exact identifiers as a moving target and check the docs when you build.
What Eve is
Eve is a TypeScript framework for building durable agents, and two words carry most of the weight.
Filesystem-first. A file's location decides its role, and its path gives it a name. A file at agent/tools/get_weather.ts becomes a tool the model sees as get_weather. A directory at agent/subagents/researcher/ becomes a subagent named researcher. No central registry, no array of tools to wire up. Add a file and Eve discovers it; move or rename it and its identity moves with it. The directory tells you what the agent can do before it runs.
Durable. Every session is persisted on the open-source Workflow SDK. A conversation can run for days, pause to wait for a human, survive a crash or a redeploy mid-task, and resume exactly where it left off. That is the substrate everything else sits on.
Eve calls itself an "agent harness." It is the same core tool-calling loop every framework has, but with the reliability scaffolding (planning, context management, durability, serving, human approval, sandboxing) built in rather than left to you. It ships as the eve npm package.
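To make the path-to-name rule concrete, here is a minimal sketch of how a loader could derive a capability name from a file path. It is not Eve's implementation: `discover` and the `Discovered` type are hypothetical names, and the sketch only covers the two cases mentioned above (tools and subagents).

```typescript
// Hypothetical sketch: derive a capability name from a file path.
// Not Eve's actual loader; it only illustrates "the path gives it a name".
type Discovered = { kind: "tool" | "subagent"; name: string };

function discover(path: string): Discovered | null {
  const parts = path.split("/");
  // Expect paths relative to the project root, e.g. "agent/tools/get_weather.ts".
  if (parts[0] !== "agent" || parts.length < 3) return null;

  if (parts[1] === "tools" && parts.length === 3) {
    // Strip the extension: "get_weather.ts" -> "get_weather".
    const name = parts[2].replace(/\.[^.]+$/, "");
    return { kind: "tool", name };
  }
  if (parts[1] === "subagents") {
    // The folder name is the subagent id: "agent/subagents/researcher/agent.ts".
    return { kind: "subagent", name: parts[2] };
  }
  // Anything else (for example agent/lib/) is not mounted in this sketch.
  return null;
}

console.log(discover("agent/tools/get_weather.ts"));
console.log(discover("agent/subagents/researcher/agent.ts"));
```

The design consequence is the one the section describes: renaming the file is the rename, so there is no second place where the name can drift out of sync.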
Why Vercel built it this way
Think about how much of a production agent is not the clever part. The loop. Retries. Streaming. Persisting state between turns. Exposing it over HTTP. Adding auth. Running model-driven code somewhere safe. Deploying it so it scales. Most agent code is this plumbing, and every team rebuilds it slightly differently.
Eve's bet is that this should be framework-owned, the way Next.js owns routing and rendering so you do not hand-roll them. Make the filesystem the source of truth, take the runtime off your hands, and you are left writing only the parts that are actually your agent: its instructions, its tools, its specialists. And because it is Vercel, the deploy target is obvious and the durable runtime is already wired for it. It is aimed at the gap between "I got an agent working in a script" and "I have a stateful, authenticated, deployable agent service."
The mental model: your agent is a folder
Everything clicks once you see the directory. Each subfolder under agent/ is a slot with a fixed meaning, and the path derives the name. This doubles as a map of the framework:
my-agent/
├── package.json
├── agent/
│ ├── agent.ts # runtime config: model, compaction, build flags
│ ├── instructions.md # the always-on system prompt
│ ├── instrumentation.ts # OpenTelemetry tracing (root-only)
│ ├── tools/ # typed actions; filename = tool name
│ ├── skills/ # load-on-demand procedures the model pulls in
│ ├── connections/ # external MCP and OpenAPI servers
│ ├── channels/ # how users reach the agent: HTTP, Slack... (root-only)
│ ├── hooks/ # subscribe to lifecycle and stream events
│ ├── sandbox/ # the agent's isolated shell environment + seeded files
│ ├── schedules/ # cron-driven runs (root-only)
│ ├── subagents/ # specialist child agents, each its own folder
│ └── lib/ # shared code, import-only, never mounted
└── evals/ # scored regression checks (sibling of agent/)
A minimal agent is two files: instructions.md and agent.ts. Everything else you add when you need it. Identity always comes from the path; you never write a name or id field. That single idea is what makes the framework feel coherent rather than like a pile of config.
A tour of every capability that matters
This is where Eve earns the "harness" label, because a lot of what you would normally build is just there.
Instructions. agent/instructions.md is the system prompt, prepended to every model call. Markdown is enough for most agents. Switch to agent/instructions.ts with defineInstructions to build the prompt from code at build time, or wrap it in defineDynamic to resolve a per-caller persona at runtime (tenant, plan, channel).
Tools. A tool is a file under agent/tools/. Filename is the tool name, the Zod schema is the contract, the function is the implementation:
// agent/tools/get_weather.ts
import { defineTool } from "eve/tools";
import { z } from "zod";

export default defineTool({
  description: "Get the current weather for a city.",
  inputSchema: z.object({ city: z.string().min(1) }),
  async execute({ city }, ctx) {
    return { city, condition: "Sunny", temperatureF: 72 };
  },
});
Tools run in your app runtime with full process.env. outputSchema types the return and toModelOutput can project a summary to the model while a channel gets the rich data. ctx carries the runtime (ctx.session, ctx.getSandbox(), ctx.getSkill()). Eve replays completed steps rather than re-running them, so finished work never fires twice.
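The replay behaviour can be pictured as a journal of completed steps. The sketch below illustrates that general technique under simplifying assumptions (synchronous steps, an in-memory journal); it is not Eve's or the Workflow SDK's implementation, and `Journal`, `runStep`, and `turn` are hypothetical names.

```typescript
// Hypothetical sketch of step replay: completed steps are recorded in a journal,
// and a re-run returns the recorded result instead of executing the step again.
type Journal = Map<string, unknown>;

function runStep<T>(journal: Journal, stepId: string, fn: () => T): T {
  if (journal.has(stepId)) {
    // Already completed in an earlier attempt: replay the stored result.
    return journal.get(stepId) as T;
  }
  const result = fn();
  journal.set(stepId, result); // checkpoint before moving on
  return result;
}

// Simulate a turn that runs once and is then resumed against the same journal.
const journal: Journal = new Map();
let sideEffects = 0;

function turn(j: Journal): number {
  const fetched = runStep(j, "fetch", () => {
    sideEffects += 1; // stands in for a non-idempotent call
    return 2;
  });
  return runStep(j, "double", () => fetched * 2);
}

const first = turn(journal);
const resumed = turn(journal); // both steps replay; the side effect does not fire again
console.log(first, resumed, sideEffects);
```

A real durable runtime would persist the journal outside the process and handle async steps, but the invariant is the same: a step with a recorded result is never executed a second time.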
The default harness. Every agent ships with built-in tools, no imports: bash, read_file, write_file, glob, grep (shell and files in a sandbox), web_fetch, web_search, todo (a durable task list, the planning capability), ask_question (pause and ask the user), agent (delegate to a copy of itself), plus load_skill and connection__search when you declare skills or connections. Override any built-in by authoring a tool at the same slug, or disable one with a disableTool() sentinel. Want web access but no shell? Drop a disableTool() at agent/tools/bash.ts.
The sandbox. Each agent has one isolated bash environment at /workspace; the shell and file tools proxy into it. Backends are pluggable: vercel() runs a hardware-isolated microVM, with local docker(), microsandbox(), and pure-JS justbash(), and a defaultBackend() that picks the best available. The standout is credential brokering: secrets never enter the sandbox. To let the model git clone a private repo, you inject the auth header at the sandbox's network firewall for that domain. The secret stays in your app runtime; the sandbox only sees the response.
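Credential brokering is easier to reason about with a small model of the egress layer. The following is an illustrative sketch only, not Eve's sandbox API: `brokerRequest`, `BrokerRule`, the host, and the placeholder token are all assumptions. It shows a trusted layer attaching an auth header for one domain while the sandbox's own request carries no secret.

```typescript
// Hypothetical sketch of credential brokering: the sandbox issues a plain request,
// and a trusted egress layer adds the auth header only for a matching host.
type OutboundRequest = { url: string; headers: Record<string, string> };
type BrokerRule = { host: string; header: string; value: () => string };

function brokerRequest(req: OutboundRequest, rules: BrokerRule[]): OutboundRequest {
  const host = new URL(req.url).hostname;
  const rule = rules.find((r) => r.host === host);
  if (!rule) return req; // no secret is attached for other domains
  return { ...req, headers: { ...req.headers, [rule.header]: rule.value() } };
}

// The secret is resolved in the trusted runtime; the sandbox never holds it.
const rules: BrokerRule[] = [
  { host: "github.com", header: "Authorization", value: () => "Bearer <token-from-app-runtime>" },
];

const cloned = brokerRequest({ url: "https://github.com/acme/private.git", headers: {} }, rules);
const other = brokerRequest({ url: "https://example.com/", headers: {} }, rules);
console.log(cloned.headers, other.headers);
```

Because the rule is keyed on the destination host, a prompt-injected command inside the sandbox cannot redirect the credential to a different domain in this model.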
Subagents. The built-in agent tool spins up a copy of itself on a focused task, sharing the sandbox but starting with fresh history. Declared subagents live under agent/subagents/<id>/, each its own agent:
// agent/subagents/deep-diver/agent.ts
import { defineAgent } from "eve";

export default defineAgent({
  description: "Reads one source in depth and returns a tight, sourced summary.",
  model: "anthropic/claude-opus-4.8",
});
The point is context quarantine: a subagent does its messy work in an isolated context window and returns only the clean result. The parent sees it as a tool named exactly deep-diver (no prefix), and the description is how it decides to delegate.
Skills. Markdown procedures the model pulls in only when relevant via load_skill, rather than bloating every prompt. Both the flat form (agent/skills/changelog.md) and the packaged form (agent/skills/research/SKILL.md with supporting files) work, and both follow the Agent Skills standard. Skills add instructions, never execution surface: tools act, skills inform.
State. defineState gives a typed, durable per-session memory slot with no external store:
// agent/lib/budget.ts
import { defineState } from "eve/context";

export const budget = defineState("research.budget", () => ({ used: 0, cap: 20 }));
Read with get(), write with update(fn); the value survives step boundaries, crashes, and long sessions. It is conversation-scoped working memory. Anything that must outlive the session belongs in an external store. State is never shared with subagents.
Dynamic capabilities and workflows. defineDynamic resolves tools, skills, or instructions at runtime from a session event, keyed on the caller (think one query tool per table a tenant can see). Further out, an experimental Workflow tool lets the model orchestrate its own subagents by writing JavaScript as one durable step. It runs in a QuickJS sandbox with an allowlist, not a denylist: the program reaches the agent functions bridged in as tools.<name> and language built-ins, and nothing else. No process, no filesystem, no network, because they are simply not present.
Human-in-the-loop. Any tool can require sign-off with needsApproval:
// agent/tools/publish_report.ts
import { defineTool } from "eve/tools";
import { always } from "eve/tools/approval";
import { z } from "zod";

export default defineTool({
  description: "Publish the finished research brief to the team feed.",
  inputSchema: z.object({ title: z.string(), markdown: z.string() }),
  needsApproval: always(), // or once(), never(), or a predicate over the input
  async execute({ title, markdown }) {
    // publish() stands in for your own publishing helper; it is not defined here.
    return publish(title, markdown);
  },
});
Helpers are never(), once() (first call in a session), always(), or a predicate like ({ toolInput }) => (toolInput?.amount ?? 0) > 1000. The built-in ask_question lets the model ask mid-turn. Both park the run durably, holding no compute, and resume when the answer arrives, even days later. Because the pause is durable, gating a non-idempotent side effect on approval also makes it replay-safe.
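The predicate form is worth seeing in isolation. This sketch reuses the `{ toolInput }` argument shape from the example above; the `ApprovalPredicate` type, the `gate` function, and the `"park" | "run"` result are hypothetical and only model the decision, not Eve's runtime.

```typescript
// Hypothetical sketch: an approval predicate decides, per call, whether the run
// should park for human sign-off or proceed straight to execute().
type TransferInput = { amount?: number };
type ApprovalPredicate<I> = (ctx: { toolInput?: I }) => boolean;

// Only transfers above 1000 need a human.
const largeTransfer: ApprovalPredicate<TransferInput> = ({ toolInput }) =>
  (toolInput?.amount ?? 0) > 1000;

function gate<I>(policy: ApprovalPredicate<I>, toolInput?: I): "park" | "run" {
  return policy({ toolInput }) ? "park" : "run";
}

console.log(gate(largeTransfer, { amount: 5000 }));
console.log(gate(largeTransfer, { amount: 50 }));
console.log(gate(largeTransfer)); // missing input falls back to 0
```

Defaulting a missing amount to 0 means malformed input skips approval in this sketch; for a high-risk tool you may prefer the opposite default and park whenever the input is incomplete.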
Connections (MCP and OpenAPI). A file under agent/connections/ wires the agent into an external server. MCP uses defineMcpClientConnection, any OpenAPI 3.x API uses defineOpenAPIConnection, both sharing one auth and approval model:
// agent/connections/linear.ts
import { defineMcpClientConnection } from "eve/connections";
import { once } from "eve/tools/approval";

export default defineMcpClientConnection({
  url: "https://mcp.linear.app/mcp",
  description: "Linear workspace: issues, projects, cycles, and comments.",
  auth: { getToken: async () => ({ token: process.env.LINEAR_API_TOKEN! }) },
  approval: once(),
});
The model never sees the URL or the token. It discovers tools via the built-in connection__search and calls them by qualified name like connection__linear__list_issues. Interactive OAuth via Vercel Connect parks the turn durably until the user authorizes.
Channels. The edge adapter between a platform and your agent. The HTTP channel is always on. Beyond it, Eve ships first-class channels for Slack, Discord, Microsoft Teams, Telegram, Twilio (SMS and speech-transcribed calls), GitHub (PR review with repo checkout), and Linear (issue delegation with Agent Sessions), plus defineChannel for anything else. The same agent behaviour is portable across all of them, because tools never know which platform a message came from. Multi-channel reach being built in is one of Eve's quiet superpowers.
Sessions, runs, and durability. Every app speaks the same stable HTTP API. POST /eve/v1/session returns a continuationToken (resume handle) and an x-eve-session-id header (stream handle); GET /eve/v1/session/:id/stream is a newline-delimited JSON event feed (session.started, actions.requested, action.result, input.requested, subagent.called, message.completed, session.waiting, and more, with message.appended deltas for live rendering); POST /eve/v1/session/:id continues it. A session contains turns, a turn contains durable steps, and Eve checkpoints at each step. Crash or redeploy mid-turn, and it resumes from the last completed step. Work that waits parks durably and holds no compute. You never write workflow code.
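If you consume the stream without the typed client, the main detail to get right is that network chunks do not align with event boundaries. Here is a minimal sketch of a newline-delimited JSON parser that handles that. `NdjsonParser` is a hypothetical helper, and the assumption that each event carries a `type` field is mine; check the docs for the real event shape.

```typescript
// Sketch of consuming a newline-delimited JSON feed: buffer chunks, split on "\n",
// and parse each complete line. The `type` field on events is an assumption.
type StreamEvent = { type: string; [key: string]: unknown };

class NdjsonParser {
  private buffer = "";

  // Feed one chunk of text; returns every event completed by this chunk.
  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as StreamEvent);
  }
}

// Chunks can split an event anywhere, as network reads do.
const parser = new NdjsonParser();
const events = [
  ...parser.push('{"type":"session.started"}\n{"type":"mess'),
  ...parser.push('age.completed"}\n'),
];
console.log(events.map((e) => e.type));
```

In a real client you would feed `push` with decoded text chunks from the response body of GET /eve/v1/session/:id/stream and dispatch on the event type.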
Security. The app runtime is trusted and holds your secrets; the sandbox is isolated and never sees one. Routes reject unauthenticated traffic by default, anonymous access needs an explicit opt-in, and the scaffold ships a placeholderAuth() that keeps a half-configured app closed in production. Platform channels verify provider signatures in constant time and derive identity from the signature, never the body. It is a framework that assumes you will deploy it and tries hard not to let you footgun.
Schedules and evals. A file under agent/schedules/ with a cron expression runs the agent on its own clock; on Vercel each becomes a Vercel Cron Job. And Eve has a real eval framework built in: files under evals/, each an async test(t) that drives the agent over the same HTTP surface users hit, with deterministic assertions (t.completed(), t.calledTool("web_search"), t.check(t.reply, includes("..."))) and LLM-as-judge grading (t.judge.autoevals.factuality(...)), proper CI exit codes, and a --url flag to run the same files against a live deployment.
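Deterministic assertions like these boil down to checks over the recorded event stream. The sketch below shows one way such checks could be computed. It is not the eval runner's implementation: the event name `actions.requested` comes from the stream described above, but the `tools` payload field and the helper names are assumptions.

```typescript
// Hypothetical sketch: deterministic checks computed from a recorded event list.
// "actions.requested" is an event name from the stream; the `tools` field is assumed.
type RecordedEvent = { type: string; tools?: string[] };

function toolCalls(events: RecordedEvent[]): string[] {
  return events
    .filter((e) => e.type === "actions.requested")
    .flatMap((e) => e.tools ?? []);
}

const calledTool = (events: RecordedEvent[], name: string): boolean =>
  toolCalls(events).includes(name);

const withinToolBudget = (events: RecordedEvent[], max: number): boolean =>
  toolCalls(events).length <= max;

const transcript: RecordedEvent[] = [
  { type: "session.started" },
  { type: "actions.requested", tools: ["web_search"] },
  { type: "actions.requested", tools: ["web_fetch", "web_fetch"] },
  { type: "message.completed" },
];

console.log(calledTool(transcript, "web_search"), withinToolBudget(transcript, 40));
```

Checks of this kind are cheap and stable, which is why it makes sense to run them before spending tokens on an LLM judge.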
Deployment, client, and frontend. eve build compiles the agent and eve start serves it on a Nitro host, so the same agent runs locally, on Vercel (vercel deploy), or on your own Node host with identical routes. From TypeScript you call it with eve/client (new Client({ host }), client.session(), session.send(...), await response.result()), and in a browser the useEveAgent hook (React, Vue, Svelte) projects the event stream into AI SDK UIMessage-shaped state with optimistic updates, resumable sessions, and built-in human-in-the-loop rendering.
Building a research agent
Enough touring. The goal: a research analyst agent that plans, searches the web, reads sources, delegates deep dives to a subagent to keep its context clean, tracks a source budget so it does not spiral, writes a sourced brief to disk, and gates publishing behind human approval. It exercises the harness, subagents, state, human-in-the-loop, and a custom tool.
Scaffold
npx eve@latest init research-analyst
cd research-analyst
That writes the starter agent/ files, installs dependencies, and starts the dev server. Stop it with Ctrl+C before editing. You need Node 24+ and a model credential (an AI_GATEWAY_API_KEY, or a direct key like ANTHROPIC_API_KEY).
The persona
agent/instructions.md sets the identity and workflow. It leans on built-in tools (todo, web_search, web_fetch) and names our custom pieces.
You are a rigorous research analyst. You answer a question by gathering evidence
from the web and writing a clear, sourced brief.
How you work:
- Start by writing a short plan with the todo tool, then work through it.
- Use web_search to find sources and web_fetch to read the promising ones.
- For any source that needs careful reading, delegate to the deep-diver subagent
so your own context stays focused. Give it one clear question at a time.
- Respect the source budget. If you hit it, stop searching and write up what you have.
- Prefer primary and recent sources. Corroborate important claims across two.
- When the brief is ready, save it with save_report. Only call publish_report if
the user explicitly asks to publish.
The brief: a two or three sentence answer up front, key findings as short points
each with its source URL, and an "open questions" section for anything unverified.
Keep it tight. Never invent a source.
The model
// agent/agent.ts
import { defineAgent } from "eve";

export default defineAgent({
  model: "anthropic/claude-opus-4.8",
});
A source budget in durable state
A research agent left unsupervised will search forever. We give it a budget that persists across the session.
// agent/lib/budget.ts
import { defineState } from "eve/context";

export const sourceBudget = defineState("research.budget", () => ({ used: 0, cap: 20 }));
We enforce it by overriding the built-in web_fetch, keeping the original behaviour by spreading the default:
// agent/tools/web_fetch.ts
import { defineTool } from "eve/tools";
import { webFetch } from "eve/tools/defaults";
import { sourceBudget } from "../lib/budget.js";

export default defineTool({
  ...webFetch, // keep the default description, schema, and fetch behaviour
  async execute(input, ctx) {
    const { used, cap } = sourceBudget.get();
    if (used >= cap) {
      return "Source budget reached. Stop fetching and write the brief with what you have.";
    }
    // Count the fetch before delegating, so a failed fetch still uses budget.
    sourceBudget.update((b) => ({ ...b, used: b.used + 1 }));
    return webFetch.execute(input, ctx);
  },
});
That is the whole budget: durable state plus a small override of a built-in. To reset it for each new question, add a hook:
// agent/hooks/reset-budget.ts
import { defineHook } from "eve/hooks";
import { sourceBudget } from "../lib/budget.js";

export default defineHook({
  events: {
    async "turn.started"() {
      sourceBudget.update(() => ({ used: 0, cap: 20 }));
    },
  },
});
A deep-dive subagent for context quarantine
When a source needs careful reading, we do not want dozens of fetched paragraphs in the main agent's context. We delegate to a subagent that reads in its own isolated context and returns a tight summary. It gets the default harness (including the web tools) automatically.
// agent/subagents/deep-diver/agent.ts
import { defineAgent } from "eve";

export default defineAgent({
  description:
    "Reads one source or investigates one focused question in depth and returns a " +
    "short, sourced summary. Use when a source needs careful reading.",
  model: "anthropic/claude-opus-4.8",
});
<!-- agent/subagents/deep-diver/instructions.md -->
You investigate one focused question. Read the relevant sources, then return a
summary under 200 words with the key facts and their source URLs. No raw page
dumps. If the sources disagree, say so.
A tool to save the brief, and a gated one to publish
The built-in write_file writes into the sandbox; we want the brief on our own disk, so save_report runs in the app runtime. And publish_report is gated behind approval, because publishing is the one action with real-world consequences.
// agent/tools/save_report.ts
import { defineTool } from "eve/tools";
import { z } from "zod";
import { writeFile, mkdir } from "node:fs/promises";
import { join } from "node:path";

export default defineTool({
  description: "Save the finished research brief to disk as Markdown. Call once, at the end.",
  inputSchema: z.object({
    slug: z.string().min(1).describe("kebab-case filename, no extension"),
    markdown: z.string().min(1).describe("the full brief in Markdown"),
  }),
  outputSchema: z.object({ path: z.string() }),
  async execute({ slug, markdown }) {
    const dir = join(process.cwd(), "reports");
    await mkdir(dir, { recursive: true });
    const path = join(dir, `${slug}.md`);
    await writeFile(path, markdown, "utf8");
    return { path };
  },
});
// agent/tools/publish_report.ts
import { defineTool } from "eve/tools";
import { always } from "eve/tools/approval";
import { z } from "zod";

export default defineTool({
  description: "Publish the brief to the team feed. Requires human approval.",
  inputSchema: z.object({ title: z.string(), markdown: z.string() }),
  needsApproval: always(),
  async execute({ title }) {
    // Replace with a real publish call (Slack, a CMS, an email).
    return { published: true, title };
  },
});
Run it
npm run dev
In the dev TUI, ask a real question: "What changed in the EU AI Act timeline for general-purpose AI models in 2025?" Watch the loop: it writes a todo plan, searches, fetches a few sources (each counted against the budget), delegates a careful read to deep-diver, then calls save_report and replies with the path and a summary. Ask it to publish and the run pauses at the approval gate until you approve, edit, or reject.
Drive it from TypeScript
The TUI is for development. In production you use the HTTP API or the typed client:
import { Client } from "eve/client";

const client = new Client({ host: "http://127.0.0.1:3000" });
const session = client.session();
const response = await session.send("Research WebGPU support across browsers in 2026.");
const result = await response.result();
console.log(result.status, result.message);

// Continue the same durable session; it keeps its plan and history.
// With the reset-budget hook in place, the source budget starts fresh on each turn.
const followUp = await session.send("Now focus on Safari specifically.");
console.log((await followUp.result()).message);
Add an eval
Because evals are built in, locking the behaviour down is a file, not a separate tool:
// evals/research.eval.ts
import { defineEval } from "eve/evals";
import { includes } from "eve/evals/expect";

export default defineEval({
  description: "The analyst searches, stays in budget, and produces a sourced brief.",
  async test(t) {
    await t.send("Give me a sourced brief on the rise of small language models in 2025.");
    t.completed();
    t.calledTool("web_search");
    t.maxToolCalls(40);
    t.check(t.reply, includes("http"));
    t.judge.autoevals.closedQA("the answer cites at least two distinct sources").atLeast(0.6);
  },
});
Run it with eve eval, wire eve eval --strict --junit .eve/junit.xml into CI, and you have regression coverage on an agent, graded partly by another model, in a few lines.
That is a genuinely capable research agent: planning, web research, context-quarantined deep dives, a durable budget, a human-gated publish step, a TypeScript interface, and an eval. The framework did the loop, the persistence, the serving, and the sandbox. We wrote the parts that are actually ours.
Eve vs Deep Agents
The closest comparison is LangChain's Deep Agents, the same idea (an agent harness with the reliability patterns built in) from the Python side. They rhyme more than they differ, so the question is which fits your situation.
The biggest split is language and philosophy. Deep Agents is a Python library: you call create_deep_agent(...) and configure it in code, running on the LangGraph runtime. Eve is filesystem-first TypeScript: your agent is a directory Eve discovers, running on the Workflow SDK with a built-in HTTP host. If you are a Python team in the LangChain ecosystem, pick Deep Agents. If you are a TypeScript shop, especially on Vercel, Eve will feel native in a way nothing in Python will.
On core capabilities they line up closely: planning (Eve's todo, Deep Agents' write_todos), subagents with context isolation, context management, human-in-the-loop (needsApproval and ask_question vs interrupt_on), and skills. If you have used one, the other's feature list will feel familiar.
Where Eve reaches further is everything around the agent. Durability is the substrate, not an option. Serving is built in: a stable HTTP API, streaming protocol, typed client, and frontend hook. Multi-channel reach is first-class. The sandbox is a real isolated environment with credential brokering. And evaluation is in the box. With Deep Agents you assemble several of these yourself: LangGraph for deployment and persistence, LangSmith for tracing and evals, your own channel integrations.
Where Deep Agents reaches further is ecosystem and maturity. Python is where most ML and data tooling lives, LangChain and LangGraph are battle-tested with a deep integration catalogue, and LangSmith is a more mature evaluation product than a built-in runner. Deep Agents is also past beta. Eve is new and explicitly in beta, so some sharp edges and API churn come with the territory.
| Dimension | Eve | Deep Agents (LangChain) |
|---|---|---|
| Language | TypeScript | Python |
| Definition style | Filesystem-first (a folder) | Code (create_deep_agent) |
| Runtime | Workflow SDK + Nitro HTTP | LangGraph runtime |
| Planning | todo built-in | write_todos built-in |
| Subagents | agent tool + declared subagents | task tool / subagents |
| Human-in-the-loop | needsApproval, ask_question | interrupt_on |
| Durability | Built-in, default | Via LangGraph |
| Serving / HTTP | Built-in (sessions, streaming, client, hook) | Via LangGraph deployment |
| Multi-channel | First-class (Slack, Discord, Linear, ...) | Build your own |
| Sandbox | First-class (Vercel Sandbox, credential brokering) | Pluggable backends |
| External tools | Connections (MCP + OpenAPI) | Tools + MCP adapters |
| Evals | Built into the framework | Pair with LangSmith |
| Deploy target | Vercel (or any Node host) | Any LangGraph host |
| Maturity | Beta | More established |
The honest summary: pick Deep Agents if you are in Python and want the most mature, ecosystem-rich path. Pick Eve if you are in TypeScript, want durability and serving and channels and evals handled by one framework, and the Vercel deploy story fits. They are converging on the same idea from two language communities, which is good for everyone building agents.
Wrapping up
Eve's filesystem-first design is the bet worth watching: your agent is a folder, and the framework owns the durable runtime, the serving, the sandbox, the channels, and the evals around it. For a research agent that pays off fast, because the built-in harness already gives you planning, web tools, files, and subagents, so a useful agent is a good prompt, a little state, one custom tool, and an approval gate. From there the same app is a durable, streamable, authenticated HTTP service you can call from TypeScript, put behind a browser UI, reach from Slack, schedule on a cron, and regression-test with built-in evals.
It is beta, so expect change, and the right move is to match the tool to the job: reach for Eve when you want a durable, deployable, multi-channel agent and you are happy on TypeScript and Vercel's rails. If you build something with it, I would love to see it.

Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.