Agent Design
Key Guides
Agent Benchmarks Won't Sit Still
Static agent benchmarks assume frozen environments. ProEvolve evolved one environment into 200 with 3,000 task sandboxes. Every frontier model failed in structurally different ways when familiar tools disappeared.
Most AI Agents Don't Know When They're Wrong
A 4B parameter model just matched GPT-4o on tool-use tasks by learning to verify its own actions. The CoVe paper shows verification-first training beats the retry-and-pray approach plaguing production
From Clawdbot to OpenAI in 90 Days
OpenClaw hit 100,000 GitHub stars in 48 hours, survived three name changes, a supply chain attack, and three critical CVEs. Then its creator Peter Steinberger joined OpenAI.
Hierarchical Agents Don't Know Who They're Talking To
Roughly 70% of Earth science datasets hosted in large repositories like PANGAEA go uncited after publication. The data exists. The agents can access it....
When Your Agent Stops Using Tools
Reinforcement learning was supposed to teach agents to use tools fluently. Instead, researchers are watching a consistent failure mode: models trained...
The Protocol Wars Are Ending. Here's What Actually Happened.
Anthropic's MCP and Google's A2A joined the Linux Foundation. IBM killed its own protocol to back A2A. 146 organizations signed on. The wars are ending.
Multi-Agent Reasoning's Memory Problem
Reasoning language models score in the top percentile on math olympiad benchmarks, yet a new study from Stanford found they fail to correctly recall their...
Nobody Knows If Deployed AI Agents Are Safe
The 2025 AI Agent Index just cataloged over 100 deployed agentic AI systems, and the finding that should alarm everyone isn't about capability. It's about...
Small Models Just Learned When to Ask for Help
SWE-bench has been the graveyard of small language models. While GPT-4 class systems resolve over 40% of real-world GitHub issues, models under 10 billion...
The Control Interface Problem in Physical AI
NVIDIA just released a video foundation model that can simulate physical worlds with startling accuracy. A team at Oak Ridge National Laboratory built an...