Agent Design

How you actually build AI agents that work. Architectures, tool use, memory patterns, and the frameworks worth paying attention to.

Agent Benchmarks Won't Sit Still
signals

Static agent benchmarks assume frozen environments. ProEvolve evolved one environment into 200 with 3,000 task sandboxes. Every frontier model failed in structurally different ways when familiar tools disappeared.

3 min read
Most AI Agents Don't Know When They're Wrong
signals

A 4B-parameter model just matched GPT-4o on tool-use tasks by learning to verify its own actions. The CoVe paper shows verification-first training beats the retry-and-pray approach plaguing production...

6 min read
From Clawdbot to OpenAI in 90 Days
signals

OpenClaw hit 100,000 GitHub stars in 48 hours, survived three name changes, a supply chain attack, and three critical CVEs. Then its creator Peter Steinberger joined OpenAI.

6 min read
Hierarchical Agents Don't Know Who They're Talking To
signals

Roughly 70% of Earth science datasets hosted in large repositories like PANGAEA go uncited after publication. The data exists. The agents can access it....

7 min read
When Your Agent Stops Using Tools
signals

Reinforcement learning was supposed to teach agents to use tools fluently. Instead, researchers are watching a consistent failure mode: models trained...

7 min read
The Protocol Wars Are Ending. Here's What Actually Happened.
signals

Anthropic's MCP and Google's A2A joined the Linux Foundation. IBM killed its own protocol to back A2A. 146 organizations signed on. The wars are ending.

5 min read
Multi-Agent Reasoning's Memory Problem
signals

Reasoning language models score in the top percentile on math olympiad benchmarks, yet a new study from Stanford found they fail to correctly recall their...

8 min read
Nobody Knows If Deployed AI Agents Are Safe
signals

The 2025 AI Agent Index just cataloged over 100 deployed agentic AI systems, and the finding that should alarm everyone isn't about capability. It's about...

7 min read
Small Models Just Learned When to Ask for Help
signals

SWE-bench has been the graveyard of small language models. While GPT-4 class systems resolve over 40% of real-world GitHub issues, models under 10 billion...

7 min read
The Control Interface Problem in Physical AI
Guides

NVIDIA just released a video foundation model that can simulate physical worlds with startling accuracy. A team at Oak Ridge National Laboratory built an...

12 min read
Swarm Signal