72.2% issue resolution on SWE-bench Verified — #1 among GPT-5–based systems.

We Read 12 Agent-Security Papers from CAIS 2026. Here Are the 3 You Actually Need.

We read 12 agent-security papers from CAIS 2026 — ACM's first conference on AI and agentic systems. Here are the three that should be on every agent team's desk this quarter.

Jun 10, 2026 · 5 min read

Zero-trust overlay networks for AI agent isolation

Default cluster networking lets any pod dial every database, internal API, and other pod in the same VPC, which is the exfiltration path that AI agents turn into incidents. A zero-trust overlay makes every dial an identity decision instead. This post covers the SSRF exploit pattern and how Agyn wires in the alternative.

Jun 6, 2026 · 5 min read

How to scope credentials for an AI coding agent (so it can't read your .env)

An AI coding agent has shell access and reads attacker-influenced data. That breaks the .env model. Two patterns actually contain the leak — short-lived credentials and credential brokering — and a short tour of the registered CVEs that explain why.

Jun 5, 2026 · 7 min read

AI agent sandboxing: filesystem and network isolation patterns

Two axes decide whether an AI agent can be trusted with real tools: where its runtime is isolated from your real filesystem, and what it can reach on the network. We walk through the patterns leading platforms use on each axis and what each technique actually buys you.

Jun 4, 2026 · 11 min read

CLAUDE.md and AGENTS.md compatibility: what Claude Code and Codex actually read

Claude Code reads CLAUDE.md. Codex reads AGENTS.md. Neither falls back to the other by default. Here's what the docs say in both directions, how to make one file work for both tools, and why the file is only half the problem.

Jun 3, 2026 · 6 min read

How to Select an AI Agent Runtime for Production

We scored seven AI agent platforms on the capabilities that decide production readiness — self-hosting, MCP isolation, credential isolation, and zero-trust networking. No competitor scores above 3.75 out of 7.

May 27, 2026 · 18 min read

Introducing Agyn: open-source Kubernetes runtime for AI agents

Shipping the new Agyn: a Kubernetes-native runtime for AI agents, with isolation, observability, and access controls built in. The control plane enterprises need to safely run thousands of different agents inside their own infrastructure.

May 20, 2026 · 7 min read

AI self-improvement in 2026: what the research actually shows

Frontier coding agents reach 23% of human performance on autonomous post-training and reward-hack their way there. Recursion exists, but the moat moved to the harness.

May 13, 2026 · 6 min read

Context-Activated Memory for Claude Code Agents

Claude Code’s built-in memory resets every session and doesn’t scale well. We built a context-activated retrieval layer instead. It uses a dedicated LLM to surface stored notes only when they’re relevant, not upfront. Under the hood, it runs a map-reduce process over memory chunks with automatic hook injection.

Apr 1, 2026 · 10 min read

Why isolated sandboxes are a hard requirement for AI agents

Running AI agents on real codebases without proper isolation leads to file collisions, secret leakage, and non-reproducible failures. Isolation isn't an optimization — it's a prerequisite.

Feb 21, 2026 · 6 min read

We tested how an AI team improves issue resolution on SWE-bench Verified

We evaluated a team-based approach on SWE-bench Verified, showing top performance among systems using GPT-5–class models.

Feb 12, 2026 · 5 min read

gh pr-review: LLM-friendly PR review workflows in your CLI

A GitHub CLI extension that returns compact, deterministic JSON for PR reviews: single-command aggregation with filters, replies, resolutions, and submissions, reducing token overhead and error-prone tool chains.

Dec 3, 2025 · 10 min read

Autonomous Software Engineer (A‑SWE): Scaling Beyond the Demo

A‑SWE reaches production when approvals, reproducible workspaces, and replayable timelines are in place—so leaders can trust outcomes, audit decisions, and scale.

Oct 23, 2025 · 11 min read

How we built a small Pexels CLI (and the aarch64 cross-build trap we escaped)

A tiny Rust CLI that speaks the Pexels API, and the practical fix for aarch64 cross-builds on GitHub Actions.

Oct 23, 2025 · 4 min read

What 2,800+ Claude Code issues reveal about AI dev tools teams actually use

We analyzed 2,800+ Claude Code issues. Here are four themes that separate demos from durable AI dev tools—plus concrete wins teams can ship now.

Oct 22, 2025 · 14 min read

Multi‑Agent Orchestration: Patterns That Actually Work

Reliable multi‑agent systems use roles, handoffs, SLAs, and approvals—turning planner/executor/reviewer patterns into predictable missions teams can operate.

Oct 21, 2025 · 12 min read

Agentic AI: From Demos to Durable Engineering

Agentic AI creates durable value when it moves beyond demos into an org-first control plane with orchestration, governance, and observability that teams can operate.

Oct 19, 2025 · 11 min read

What 1,000+ Codex CLI issues reveal about AI dev tools that teams actually use

We analyzed 1,000+ Codex CLI issues. Here are 10 product themes that separate hobby projects from production-ready AI dev tools—plus concrete wins to deliver now.

Oct 17, 2025 · 13 min read