Agent Design

Architectures, tool use, and frameworks for building AI agents. From single-agent patterns to production deployment strategies.

Why Agent Builders Are Betting on 7B Models Over GPT-4
guides

Why Agent Builders Are Betting on 7B Models Over GPT-4

Gemma 2 9B just scored 71.3% on GSM8K. Phi-3-mini hit 68.8% on MMLU using 3.8 billion parameters. Mistral 7B matched GPT-3.5 performance six months ago....

14 min read
Reward Models Are Learning to Lie
signals

Reward Models Are Learning to Lie

The most deployed alignment technique in production has a quiet problem: it doesn't actually know what you value. RLHF trains models to maximize a reward...

8 min read
When Your Judge Can't Read the Room
guides

When Your Judge Can't Read the Room

Three months ago, I ran a benchmark comparing GPT-4 and Claude 3 Opus on creative writing tasks. GPT-4 won by a comfortable margin according to my...

15 min read
Most Agent Benchmarks Test the Wrong Thing
signals

Most Agent Benchmarks Test the Wrong Thing

The SciAgentGym team ran 1,780 domain-specific scientific tools through current agent frameworks. Success rate on multi-step tool orchestration: 23%. Same...

6 min read
signals

When Multi-Agent Systems Break: The Coordination Tax Nobody Warns You About

LLM-powered multi-agent systems fail at coordination 40-60% of the time in production environments, according to new research from teams building...

6 min read
Types of AI Agents: Reactive, Deliberative, Hybrid, and What Comes Next
guides

Types of AI Agents: Reactive, Deliberative, Hybrid, and What Comes Next

SWE-bench accuracy went from 1.96% in 2023 to 69.1% in 2025. Understanding the types of AI agents behind this progress (reactive, deliberative, hybrid, and autonomous) is the difference between building tools that work and tools that impress.

15 min read
Your AI Agent Can Reason, Plan, and Code. It Still Can't See the Web.
signals

Your AI Agent Can Reason, Plan, and Code. It Still Can't See the Web.

AI agents can reason, plan, and code. But they still can't reliably see the live web. The observation layer is the real bottleneck for production agents.

6 min read
How to Test and Debug AI Agents
guides

How to Test and Debug AI Agents

Agents that call APIs, write to databases, and send emails can't be tested like chatbots. A complete guide to failure taxonomies, debugging tools, and evaluation pipelines.

12 min read
The Protocol Wars Nobody's Winning
signals

The Protocol Wars Nobody's Winning

Ten competing agent protocols and counting. MCP won the tool layer but shipped without authentication. The alphabet soup is a coordination failure.

7 min read
Abstract spiral pattern with glowing lights creating recursive loops in a dark background
signals

Agents That Rewrite Themselves: The Self-Modifying Stack Is Here

Three independent papers demonstrate agents rewriting their own training code, generating their own knowledge structures, and refining their reasoning at test time. Self-improvement has moved from theory to working engineering.

7 min read
Swarm Signal
0:00
0:00
Up Next

Queue is empty. Click "+ Queue" on any article to add it.