Agent Design
How you actually build AI agents that work. Architectures, tool use, memory patterns, and the frameworks worth paying attention to.
Key Guides
Most Multi-Agent Systems Aren't Cooperating. They're Colliding.
A new benchmark from Tsinghua and Microsoft tests 16 multi-agent frameworks on tasks requiring genuine coordination. The median system spends 74% of its inter-agent messages on redundant state synchronization, and adding a third agent makes most pipelines slower, not faster.
Your Agent's System Prompt Is Fighting Itself
A framework called Arbiter treats agent system prompts as auditable code. Applied to Claude Code, Codex CLI, and Gemini CLI, it found 152 interference patterns — including critical contradictions and a structural data loss bug — for a total cost of $0.27.
Agent Benchmarks Won't Sit Still
Static agent benchmarks assume frozen environments. ProEvolve evolved one environment into 200 with 3,000 task sandboxes. Every frontier model failed in structurally different ways when familiar tools disappeared.
Most AI Agents Don't Know When They're Wrong
A 4B-parameter model just matched GPT-4o on tool-use tasks by learning to verify its own actions. The CoVe paper shows verification-first training beats the retry-and-pray approach plaguing production...
From Clawdbot to OpenAI in 90 Days
OpenClaw hit 100,000 GitHub stars in 48 hours and survived three name changes, a supply chain attack, and three critical CVEs. Then its creator Peter Steinberger joined OpenAI.
Hierarchical Agents Don't Know Who They're Talking To
Roughly 70% of Earth science datasets hosted in large repositories like PANGAEA go uncited after publication. The data exists. The agents can access it....
When Your Agent Stops Using Tools
Reinforcement learning was supposed to teach agents to use tools fluently. Instead, researchers are watching a consistent failure mode: models trained...
The Protocol Wars Are Ending. Here's What Actually Happened.
Anthropic's MCP and Google's A2A joined the Linux Foundation. IBM killed its own protocol to back A2A. 146 organizations signed on. The wars are ending.
Multi-Agent Reasoning's Memory Problem
Reasoning language models score in the top percentile on math olympiad benchmarks, yet a new study from Stanford found they fail to correctly recall their...
Nobody Knows If Deployed AI Agents Are Safe
The 2025 AI Agent Index just cataloged over 100 deployed agentic AI systems, and the finding that should alarm everyone isn't about capability. It's about...