
We Read 12 Agent-Security Papers from CAIS 2026. Here Are the 3 You Actually Need.
We read 12 agent-security papers from CAIS 2026 — ACM's first conference on AI and agentic systems. Here are the three that should be on every agent team's desk this quarter.

Zero-trust overlay networks for AI agent isolation
Default cluster networking lets any pod dial every database, internal API, and other pod in the same VPC, which is the exfiltration path AI agents turn into incidents. A zero-trust overlay makes every dial an identity decision instead. We cover the SSRF exploit pattern and how Agyn wires in the alternative.

How to scope credentials for an AI coding agent (so it can't read your .env)
An AI coding agent has shell access and reads attacker-influenced data, which breaks the .env model. Two patterns actually contain the leak: short-lived credentials and credential brokering. We also take a short tour of the registered CVEs that explain why.

AI agent sandboxing: filesystem and network isolation patterns
Two axes decide whether an AI agent can be trusted with real tools: how its runtime is isolated from your real filesystem, and what it can reach on the network. We walk through the patterns leading platforms use on each axis and what each technique actually buys you.

CLAUDE.md and AGENTS.md compatibility: what Claude Code and Codex actually read
Claude Code reads CLAUDE.md. Codex reads AGENTS.md. Neither falls back to the other by default. Here's what the docs say in both directions, how to make one file work for both tools, and why the file is only half the problem.

How to Select an AI Agent Runtime for Production
We scored seven AI agent platforms on the capabilities that decide production readiness — self-hosting, MCP isolation, credential isolation, and zero-trust networking. No competitor scores above 3.75 out of 7.

Introducing Agyn: open-source Kubernetes runtime for AI agents
We're shipping the new Agyn: a Kubernetes-native runtime for AI agents with isolation, observability, and access controls built in. It is the control plane enterprises need to run thousands of different agents safely inside their own infrastructure.

AI self-improvement in 2026: what the research actually shows
Frontier coding agents reach 23% of human performance on autonomous post-training and reward-hack their way there. Recursion exists, but the moat moved to the harness.

Context-Activated Memory for Claude Code Agents
Claude Code’s built-in memory resets every session and doesn’t scale well. We built a context-activated retrieval layer instead. It uses a dedicated LLM to surface stored notes only when they’re relevant, not upfront. Under the hood, it runs a map-reduce process over memory chunks with automatic hook injection.

Why isolated sandboxes are a hard requirement for AI agents
Running AI agents on real codebases without proper isolation leads to file collisions, secret leakage, and non-reproducible failures. Isolation isn't an optimization — it's a prerequisite.

We tested how an AI team improves issue resolution on SWE-bench Verified
We evaluated a team-based approach on SWE-bench Verified; it achieved top performance among systems using GPT-5-class models.

gh pr-review: LLM-friendly PR review workflows in your CLI
A GitHub CLI extension that returns compact, deterministic JSON for PR reviews. Single-command aggregation with filters, replies, resolutions, and submissions cuts token overhead and avoids error-prone tool chains.

Autonomous Software Engineer (A‑SWE): Scaling Beyond the Demo
A‑SWE reaches production when approvals, reproducible workspaces, and replayable timelines are in place—so leaders can trust outcomes, audit decisions, and scale.

How we built a small Pexels CLI (and the aarch64 cross-build trap we escaped)
A tiny Rust CLI that speaks the Pexels API, and the practical fix for aarch64 cross-builds on GitHub Actions.

What 2,800+ Claude Code issues reveal about AI dev tools that teams actually use
We analyzed 2,800+ Claude Code issues. Here are four themes that separate demos from durable AI dev tools—plus concrete wins teams can ship now.

Multi‑Agent Orchestration: Patterns That Actually Work
Reliable multi‑agent systems use roles, handoffs, SLAs, and approvals—turning planner/executor/reviewer patterns into predictable missions teams can operate.

Agentic AI: From Demos to Durable Engineering
Agentic AI creates durable value when it moves beyond demos into an org-first control plane with orchestration, governance, and observability that teams can operate.

What 1,000+ Codex CLI issues reveal about AI dev tools that teams actually use
We analyzed 1,000+ Codex CLI issues. Here are 10 product themes that separate hobby projects from production-ready AI dev tools—plus concrete wins to deliver now.