How to Sandbox an AI Agent: Filesystem & Network Isolation Patterns

An AI agent without isolation is a process that reads attacker-influenced data, writes new code at runtime based on what it read, and then runs that code with whatever access its environment happens to grant. The classic sandboxing toolkit on Linux — the kernel features that limit which files and system calls a process can use — was designed to contain known programs whose source code was reviewed before it ran. Agents break that assumption: the code is written while it runs, often under the influence of an adversary who planted a malicious instruction in a web page, a file the agent will read, or the output of a tool the agent will call.

This post is a tour of how the leading platforms isolate agents along two axes — filesystem (where the agent's runtime ends and your real machine begins) and network (which hosts the agent can reach and which credentials it carries) — and what each technique meaningfully prevents. It's the reference we wish existed when we started building Agyn.

What is AI agent isolation?

AI agent isolation is the practice of constraining what an agent's processes can read, talk to, and execute, so that an agent compromised by adversarial input cannot pivot to anything outside its assigned task. It is not the same as classic application sandboxing, which protects against bugs in code written by people you trust. Agents need isolation precisely because the code they run was not reviewed before it executed.

The threat model shifts from "protect against bugs" to "protect against arbitrary adversarial code." That single change is why every meaningful design choice in this post follows the same three rules: deny by default, contain the damage when something gets through, and keep the agent off the credentials.

The two axes

Every isolation design — whether you assemble it from primitives or buy it as a platform — picks a posture on two axes.

Axis	Question it answers	What an attacker gets if this layer fails
Filesystem	Where does the agent's runtime end and your real filesystem begin? Which files (and processes, sockets, kernel surface) can the agent's process reach?	Secrets, source code, the ability to plant backdoors or corrupt other agents' work, in the worst case a breakout from the agent's process to the host itself
Network	Which hosts can the agent reach, with which credentials?	Data exfiltration, calls to internal APIs, hopping from one internal system to another, access to another customer's data

There is no single technique that covers both. A production design picks one strong pattern per axis and verifies they don't undercut each other.

Why agent threats are different

A few numbers to anchor the urgency:

In the 2026 OWASP Top 10 for Agents, Excessive Agency (LLM08) is the highest-risk category — agents granted more permission than the task requires, with no narrowing at the tool boundary.
Microsoft Security Research's "When prompts become shells" (May 2026) documents end-to-end paths from a prompt-injection string in an agent's input to remote code execution (the attacker running arbitrary commands on the agent's host), exploiting default configurations in mainstream agent frameworks.
Independent red-team studies in 2026 found a majority of agentic AI systems vulnerable to prompt-injection-driven actions, including widely deployed coding assistants. Google researchers monitoring the web reported a ~32% increase in malicious prompt-injection payloads embedded in scraped content between late 2025 and early 2026.
The UK AI Safety Institute's SandboxEscapeBench is a benchmark that measures whether frontier models can break out of a container they were given — recognition that container escape is no longer a purely theoretical concern.

The mechanism is simple: an agent reads attacker-controlled data (a web page, a file, the output of a tool, a code comment), the LLM treats the data as instructions, and the agent issues tool calls that fulfil the attacker's intent. If those tool calls reach the filesystem or network of a process that holds secrets or has internal reach, the agent has become the exploit.

Isolation is what turns "the agent has been hijacked" from a breach into a contained incident.

Filesystem isolation

The filesystem axis is a spectrum, ordered roughly by what an attacker would need to defeat to escape:

Shared host (git worktree, plain process). Cheap, no isolation beyond directory layout. Secrets in ~/.aws, /tmp, and environment variables are all reachable. Fine for sequential dev on a trusted host; not a sandbox.
Container (runc, the standard Linux container runtime). Each agent gets its own view of the filesystem and process list, but every container shares one Linux kernel with the host. Defeats casual filesystem snooping; doesn't defeat kernel exploits.
gVisor. Google's runtime sits between the agent and the host kernel and re-implements most of the Linux interface in user space, so the agent's processes never make direct kernel calls. Used by Modal and inside Anthropic's Claude Code sandbox.
Kata Containers. A container runtime that runs each container inside its own small virtual machine with a dedicated guest kernel. The VMM backend is pluggable — Firecracker, QEMU, or Cloud Hypervisor. Kubernetes drives Kata like any other container runtime, so workloads need no changes. This is what Agyn runs on (Kata Containers with the Firecracker backend).

Where each pattern fits:

Boundary	Kernel	Use case
Container (runc / crun)	Shared with host	Trusted workloads
gVisor (`runsc`)	User-space re-implementation	Syscall-level isolation without booting a kernel
Kata Containers	Per-container VM (Firecracker / QEMU / Cloud Hypervisor backend)	Hostile workloads on Kubernetes
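On Kubernetes, the choice between these boundaries usually comes down to one field on the Pod, `runtimeClassName`. The sketch below builds a minimal Pod manifest from a trust level. The trust labels and the RuntimeClass names `gvisor` and `kata-fc` are assumptions for illustration; real names are whatever the cluster operator registered.

```python
import json

# Hypothetical RuntimeClass names: placeholders for whatever the cluster defines.
RUNTIME_CLASS_BY_TRUST = {
    "trusted": None,        # default runtime (runc / crun), shared host kernel
    "untrusted": "gvisor",  # user-space kernel via runsc
    "hostile": "kata-fc",   # per-pod microVM (Kata with a Firecracker backend)
}


def agent_pod(name: str, image: str, trust: str) -> dict:
    """Build a minimal Pod manifest whose runtime matches the trust level."""
    if trust not in RUNTIME_CLASS_BY_TRUST:
        # Unknown trust levels are rejected rather than silently run on runc.
        raise ValueError(f"unknown trust level: {trust!r}")
    spec = {"containers": [{"name": "agent", "image": image}]}
    runtime_class = RUNTIME_CLASS_BY_TRUST[trust]
    if runtime_class is not None:
        spec["runtimeClassName"] = runtime_class
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # The image reference is a placeholder.
    print(json.dumps(agent_pod("agent-1", "example.com/agent:latest", "hostile"), indent=2))
```

Rejecting unknown trust levels keeps the failure mode closed: a typo in the trust label stops the launch instead of quietly putting a hostile workload on the shared kernel.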

Real case — EchoLeak (CVE-2025-32711, Aim Security, June 2025). A single crafted email sent to a Microsoft 365 user caused Copilot to read internal files across OneDrive, SharePoint, and Teams during routine summarisation and exfiltrate their contents — zero clicks, no user interaction. The agent had legitimate read access to all those sources; broad filesystem reach across data domains is what made the cross-source exfil possible (Sentra writeup, arXiv 2509.10540). Narrowing filesystem reach to one task at a time is the structural fix.
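To make "one task at a time" concrete at the tool layer, here is a minimal sketch of a path guard that a file-reading tool could call before touching disk. The function name and the per-task workspace layout are assumptions. This is an in-process check that complements a kernel- or VM-level boundary and does not replace one; among other limits, it cannot stop a symlink from being swapped in between the check and the read.

```python
from pathlib import Path


def resolve_inside(workspace: Path, requested: str) -> Path:
    """Return the real path for `requested`, or raise if it leaves `workspace`."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below runs against the real target, not the string the agent supplied.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"{requested!r} is outside the task workspace")
    return candidate
```

An absolute path such as `/etc/passwd` replaces the workspace prefix when joined, so it fails the same check as a `../` traversal.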

Network isolation

Network isolation has to answer two questions independently:

Which hosts can the agent reach?
Which credentials does the agent hold while reaching them?

Get the first wrong and prompt injection becomes exfiltration. Get the second wrong and a compromised agent is a compromised production system. Most platforms do one well and the other badly.

Egress allowlists

Every egress allowlist has the same goal (deny by default, allow named destinations), but the mechanism varies:

Hostname filtering at the TLS layer (Vercel Sandbox). When an HTTPS connection opens, the client's first TLS message (the ClientHello) carries the destination hostname in plaintext in the SNI field, unless Encrypted Client Hello is in use. The sandbox firewall reads that hostname, allows only configured hostnames, and blocks the rest. Static IP-range blocks layer on top. The allowlist is editable at runtime.
Authenticated forwarding proxy (Anthropic Claude Code). All outbound traffic routes through a proxy. The agent authenticates with a signed token (a JWT) that enumerates exactly which hostnames it is allowed to reach. Three independent controls layer on top of each other — the agent's proxy settings, no direct DNS resolution inside the container, and a network-level firewall — so a bypass has to defeat all three.

Not every platform uses an allowlist. Agyn takes a different approach — covered in credential brokering below.

A worked example of the honesty this design demands: Anthropic publicly documents that the signed token automatically includes six additional Anthropic-controlled hosts (including sentry.io for telemetry) beyond what the user configures. That's a defensible product decision, but only if you tell users. The lesson is that "allowlist" only means what the platform documents it to mean.

Real case — IDEsaster Cursor exfil (Ari Marzouk, late 2025). Researchers documented 24+ CVEs across Cursor, GitHub Copilot, Windsurf, Zed and other AI IDEs sharing one exfil pattern: the agent writes a JSON file with a JSON-Schema URL hosted on the attacker's domain, and the editor's schema validator then fetches that URL — leaking the file's contents in the HTTP request to the attacker. The agent never explicitly "sent" the data; the validator did, as part of normal operation (The Hacker News, byteiota). An egress allowlist that didn't include the attacker's domain would have dropped the validation request.

Credential brokering

This is the most important network pattern of 2026, and the most overlooked.

The problem. If an agent's process holds GITHUB_TOKEN, STRIPE_API_KEY, or any other long-lived secret in its environment, prompt injection can exfiltrate it as easily as the agent can use it. Northflank's 2026 sandbox roundup names environment-variable leakage as the biggest blind spot in agent sandboxing.

The pattern. The agent never holds the credential. A proxy outside the sandbox intercepts every outbound call and attaches the right token at the network layer. The agent uses a synthetic credential that is meaningful only to the proxy.
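A minimal sketch of the swap, assuming a broker process that sits outside the sandbox and sees each outbound request's destination host and headers. The class names, the placeholder string, and the bearer-token scheme are all illustrative and are not any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    host: str        # the only destination this credential may be sent to
    real_token: str  # held by the broker, never by the agent


class CredentialBroker:
    """Runs outside the sandbox and swaps placeholders for real tokens."""

    def __init__(self, grants: dict[str, Grant]):
        self._grants = grants  # placeholder -> Grant

    def rewrite(self, host: str, headers: dict[str, str]) -> dict[str, str]:
        # Simplification: assumes the header key is spelled "Authorization";
        # a real proxy would match header names case-insensitively.
        scheme, _, placeholder = headers.get("Authorization", "").partition(" ")
        grant = self._grants.get(placeholder)
        if scheme != "Bearer" or grant is None or grant.host != host:
            # Unknown placeholder or wrong destination: refuse, never forward.
            raise PermissionError(f"no credential grant for {host}")
        outgoing = dict(headers)
        outgoing["Authorization"] = f"Bearer {grant.real_token}"
        return outgoing
```

Binding each placeholder to a single host matters as much as hiding the token: an injected instruction that sends the placeholder to an attacker's server gets a refusal from the broker, and the placeholder itself is worthless anywhere else.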

Implementations to know:

Anthropic git proxy. The agent runs git push origin main with a scoped synthetic credential. A custom proxy outside the sandbox attaches the real GitHub OAuth token before forwarding. The model never sees the real token.
Cloudflare Outbound Workers for Sandboxes (April 2026). Each sandbox gets a temporary certificate that lets a Cloudflare-controlled proxy decrypt and re-sign its outbound HTTPS traffic on the fly, attaching the right credentials before forwarding. Command-line tools inside the sandbox never hold real tokens.
Agyn LLM Proxy and Egress Gateway. Two proxies that sit outside the agent's sandbox — one for LLM calls, one for everything else. The agent makes ordinary HTTP calls; each proxy attaches the real token at the network layer and forwards. The agent never holds the credential. Authentication is per-pod network identity, not an API key the agent could leak.
AWS Bedrock AgentCore Identity (a different shape). A managed secrets vault holds long-lived OAuth credentials and mints short-lived access tokens at call time — but those access tokens still get handed to the agent during the call. Strong protection of the long-lived secret, weaker isolation at the moment of use.

The pattern reframes the question. Instead of asking "how do we keep secrets out of the agent's memory once they're loaded?" you arrange for them to never be loaded.

Real case — GitHub Copilot Codespaces GITHUB_TOKEN exfil (CVE-2026-21516). Researchers used prompt injection inside repository content to make GitHub Copilot for JetBrains read the GITHUB_TOKEN from its own environment and emit it back through the model's output channel. The token sat in the agent's process environment because that's where Codespaces put it (paperclipped writeup). If the agent had used a synthetic placeholder and a proxy outside the sandbox had attached the real token to outgoing GitHub API calls, there would have been nothing in the agent's environment to leak.
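The cheapest partial mitigation is to stop child processes from inheriting the parent's environment. The sketch below starts a tool with an allowlisted environment. The variable list and function names are assumptions, and this only removes secrets from the environment; it does nothing about secrets readable from disk or the network.

```python
import subprocess
from typing import Mapping, Optional

# Hypothetical allowlist: variables a tool needs to start, none of them secrets.
SAFE_ENV_VARS = ("PATH", "HOME", "LANG", "TMPDIR")


def scrubbed_env(
    parent_env: Mapping[str, str],
    extra: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Copy only allowlisted variables, then add explicit non-secret extras."""
    env = {name: parent_env[name] for name in SAFE_ENV_VARS if name in parent_env}
    env.update(extra or {})
    return env


def run_tool(argv: list[str], parent_env: Mapping[str, str]) -> subprocess.CompletedProcess:
    """Start a tool with the scrubbed environment instead of inheriting ours."""
    return subprocess.run(
        argv,
        env=scrubbed_env(parent_env),  # env= replaces the inherited environment
        capture_output=True,
        text=True,
        check=False,
    )
```

An allowlist is the safer direction here: a denylist of known secret names misses whichever token gets added next.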

Per-MCP-server isolation

MCP (the Model Context Protocol) is the de facto extension protocol for agent tools. By default an MCP server runs as a subprocess of the agent, with the same process tree, the same environment, and the same network reach. A prompt injection in MCP server A's output can read MCP server B's credentials through nothing more than shared environment variables.

The fix is to make each MCP server its own container with credentials scoped to it. Two implementations:

Cloudflare McpAgent. Each MCP server is a Durable Object with its own state and bindings — strong isolation, but only if you rewrite the server in TypeScript using Cloudflare's SDK.
Agyn. Each MCP server is a Kubernetes sidecar in the agent's pod, accessed by the agent over localhost. Per-server credentials come from per-server Kubernetes Secrets. Existing MCP servers in any language run unchanged.

This matters more as agents accumulate tools. A coding agent with twelve MCP servers has twelve credential domains; if they share an environment, the security boundary is the union of every server's vulnerabilities.
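As a rough illustration of the sidecar shape, the sketch below builds a Pod manifest in which each MCP server is its own container that reads only its own Kubernetes Secret, while the agent container receives no secrets at all. The naming scheme (`mcp-<server>`, `mcp-<server>-credentials`) and the function names are assumptions, not Agyn's actual manifests. Containers in one Pod share a network namespace, which is why the agent can reach each server over localhost.

```python
def mcp_sidecar(server: str, image: str) -> dict:
    """One container per MCP server, reading only its own Secret."""
    return {
        "name": f"mcp-{server}",
        "image": image,
        # envFrom with a secretRef injects just this server's Secret.
        # The Secret name follows a hypothetical per-server convention.
        "envFrom": [{"secretRef": {"name": f"mcp-{server}-credentials"}}],
    }


def agent_pod_with_tools(name: str, agent_image: str, servers: dict[str, str]) -> dict:
    """Pod with one agent container plus one sidecar per MCP server."""
    # The agent container has no envFrom entry, so it holds no tool credentials.
    containers = [{"name": "agent", "image": agent_image}]
    containers += [mcp_sidecar(server, image) for server, image in sorted(servers.items())]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": containers},
    }
```

With this layout, a compromised server can read only the environment of its own container, so the credential domains stay separate even as the tool count grows.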

Real case — the MCP supply-chain wave (January–April 2026, OX Security). Researchers disclosed 40+ CVEs across MCP implementations in Python, TypeScript, Java, and Rust — a supply chain spanning 150M+ downloads and 7,000+ publicly accessible servers. CVE-2025-6514 (mcp-remote) was the first documented full-system compromise through MCP, affecting deployments at Cloudflare, Hugging Face, and Auth0. BlueRock's April 2026 survey found 36.7% of 7,000 public MCP servers had server-side request-forgery flaws; Microsoft's MarkItDown MCP enabled full AWS account takeover through the cloud metadata service (OX Security, The Hacker News, Docker MCP horror stories). When every MCP server is a separate container with its own credentials and its own egress identity, a compromise of any one server stays inside that one container.

Composing the axes

A worked example. Here's how Agyn composes the axes on Kubernetes:

Axis	Mechanism	What it prevents
Filesystem	Kata Containers + Firecracker microVM per workload. Dedicated disk per thread; skills mounted read-only; each MCP server owns its disk. Kubernetes "restricted" profile (no root, no capabilities, no privilege escalation, read-only root FS), inside a namespace-scoped identity.	Cross-thread state collisions; shared-`/tmp` leaks; one agent corrupting another's workspace; root escalation or sideways access to platform services.
Network	Per-pod network identity. Public internet open by default; cluster-internal blocked. Two credential proxies outside the sandbox — one for LLM calls, one for other outbound traffic — so the agent never holds real keys.	Credentials leaking to the model; one customer's agent reaching another customer's data; long-lived tokens in environment variables that prompt injection can exfiltrate.
MCP (cross-cutting)	Each MCP server runs as its own container alongside the agent; credentials come from per-server secrets; the agent reaches it over localhost.	A compromised tool reaching another tool's filesystem, credentials, or network identity.
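One common way to express "public internet open, cluster-internal blocked" on Kubernetes is a NetworkPolicy whose egress rule allows `0.0.0.0/0` except private ranges. The sketch below builds such a policy and evaluates it for a single destination so the effect is easy to check. It is an illustration under stated assumptions, not Agyn's configuration: the policy name and CIDR list are ours, enforcement depends on a network plugin that implements NetworkPolicy, and a real policy would also need a rule that lets pods reach cluster DNS.

```python
import ipaddress

# Ranges treated as "internal" in this sketch: RFC 1918 private space plus the
# link-local block that contains the cloud metadata address 169.254.169.254.
INTERNAL_CIDRS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16"]


def egress_policy(namespace: str, pod_labels: dict[str, str]) -> dict:
    """NetworkPolicy allowing public IPv4 egress while excluding internal ranges."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress", "namespace": namespace},  # hypothetical name
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            "egress": [
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": INTERNAL_CIDRS}}]}
            ],
        },
    }


def would_allow(policy: dict, destination_ip: str) -> bool:
    """Evaluate the ipBlock rules of `policy` for one destination address."""
    address = ipaddress.ip_address(destination_ip)
    for rule in policy["spec"]["egress"]:
        for peer in rule["to"]:
            block = peer["ipBlock"]
            in_cidr = address in ipaddress.ip_network(block["cidr"])
            excluded = any(
                address in ipaddress.ip_network(cidr) for cidr in block.get("except", [])
            )
            if in_cidr and not excluded:
                return True
    return False  # nothing matched: egress policies deny by default
```

Excluding the link-local block is what keeps a server-side request-forgery bug in a tool from reaching the cloud metadata service.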

If you want the full architecture, the Agyn paper and github.com/agynio/platform have the details. For a vendor-by-vendor scorecard on these same capabilities, see How to select an AI agent runtime for production.

Takeaway

AI agent isolation is two problems, not one. Filesystem isolation decides where the agent's runtime ends and your real machine begins — who can read your secrets, who can stomp on your work, whether a successful exploit stays inside the sandbox. Network isolation decides where your data can flow and which credentials reach the model.

The two patterns that pay back the most in 2026: deny-by-default egress with credential brokering, so prompt injection cannot reach an exfiltration endpoint or steal a long-lived token; and per-MCP-server containerization, so a compromised tool cannot reach another tool's secrets. Get those two right and most of the rest is hygiene.

References

How we contain Claude across products — Anthropic engineering
Making Claude Code more secure and autonomous with sandboxing — Anthropic engineering
When prompts become shells: RCE vulnerabilities in AI agent frameworks — Microsoft Security Blog, May 2026
Can AI agents escape their sandboxes? A benchmark for safely measuring container breakout capabilities — UK AI Safety Institute
Running Agents on Kubernetes with Agent Sandbox — Kubernetes blog
Comparing Sandboxing Approaches for AI Agents — Docker
OWASP Top 10 for Agents 2026
Prompt Injection Attacks on Agentic Coding Assistants (arXiv 2601.17548)
EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit (arXiv 2509.10540) — CVE-2025-32711 analysis
Researchers Uncover 30+ Flaws in AI Coding Tools — IDEsaster disclosure
MCP Supply Chain Advisory: RCE Vulnerabilities Across the AI Ecosystem — OX Security
MCP Horror Stories: The Supply Chain Attack — Docker
Why isolated sandboxes are a hard requirement for AI agents — Agyn blog
How to Select an AI Agent Runtime for Production — Agyn blog
