Agent Design

How you actually build AI agents that work. Architectures, tool use, memory patterns, and the frameworks worth paying attention to.

Deep Dives and Frameworks

Implementation playbooks, operator patterns, and durable analysis.

Signals, Maps, and Watch Lists

Production-oriented analysis, benchmarks, and market/system intelligence.

External tools

Execution tooling is separate

Swarm Signal keeps the analysis layer. Use BoredTools for reusable production templates and trackers.

Open BoredTools Open Budget Tracker

Signal Failure Briefs Evidence-first framing

AI Agents Are Security's Newest Nightmare

I've spent the last month reading prompt injection papers, and the thing that keeps me up isn't the attack success rates. It's how many production systems...

Signal Failure Briefs Evidence-first framing

When AI Agents Have Tools, They Lie More

Tool-using agents hallucinate 34% more often than chatbots answering the same questions. The culprit isn't bad models or missing context. It's that giving...

Signal Market Maps Evidence-first framing

Why Agent Builders Are Betting on 7B Models Over GPT-4

Gemma 2 9B just scored 71.3% on GSM8K. Phi-3-mini hit 68.8% on MMLU using 3.8 billion parameters. Mistral 7B matched GPT-3.5 performance six months ago....

Signal Failure Briefs Evidence-first framing

Reward Models Are Learning to Lie

The most deployed alignment technique in production has a quiet problem: it doesn't actually know what you value. RLHF trains models to maximize a reward...

Signal Benchmark Watch Evidence-first framing

When Your Judge Can't Read the Room

Three months ago, I ran a benchmark comparing GPT-4 and Claude 3 Opus on creative writing tasks. GPT-4 won by a comfortable margin according to my...

Signal Benchmark Watch Evidence-first framing

Most Agent Benchmarks Test the Wrong Thing

The SciAgentGym team ran 1,780 domain-specific scientific tools through current agent frameworks. Success rate on multi-step tool orchestration: 23%. Same...

Signal Failure Briefs Evidence-first framing

When Multi-Agent Systems Break: The Coordination Tax Nobody Warns You About

LLM-powered multi-agent systems fail at coordination 40-60% of the time in production environments, according to new research from teams building...

Signal Primers Evidence-first framing

Types of AI Agents: Reactive, Deliberative, Hybrid, and What Comes Next

SWE-bench accuracy went from 1.96% in 2023 to 69.1% in 2025. Understanding the types of AI agents behind this progress (reactive, deliberative, hybrid, and autonomous) is the difference between building tools that work and tools that impress.

Signal Signals Evidence-first framing

Your AI Agent Can Reason, Plan, and Code. It Still Can't See the Web.

AI agents can reason, plan, and code. But they still can't reliably see the live web. The observation layer is the real bottleneck for production agents.

Signal Benchmark Watch Evidence-first framing

How to Test and Debug AI Agents

Agents that call APIs, write to databases, and send emails can't be tested like chatbots. A complete guide to failure taxonomies, debugging tools, and evaluation pipelines.