Agent Design

How to build AI agents that actually work: architectures, tool use, memory patterns, and the frameworks worth paying attention to.

Deep Dives and Frameworks

Implementation playbooks, operator patterns, and durable analysis.

Signals, Maps, and Watch Lists

Production-oriented analysis, benchmarks, and market/system intelligence.

External tools

Execution tooling is separate

Swarm Signal keeps the analysis layer. Use BoredTools for reusable production templates and trackers.


Signal Benchmark Watch Evidence-first framing

Tool-Use Agents Need Failure Labels, Not Pass Rates

Tool-use agents can fail in ways a final accuracy score hides, because the same wrong answer can come from skipped tools, ignored outputs, fabricated...

Signal Benchmark Watch Evidence-first framing

Computer-Use Agents Fail Long Workflows, Not Mouse Clicks

Computer-use agents are clearing more short benchmark tasks, but the new failure line is workflow length. A June 2026 benchmark called OSWorld 2.0 tests...

Signal Signals Evidence-first framing

Agent Observability Needs Provenance, Not More Logs

Agent observability is drifting toward a familiar trap: capture every trace, then ask an engineer to work out why the agent did the wrong thing. A June...

Signal Signals Evidence-first framing

Agent Messages Need State, Not Chat

Multi-agent systems fail not only because the agents are weak, but also because every agent is allowed to narrate too much. A June 2026 paper...

Signal Benchmark Watch Evidence-first framing

Agent Leaderboards Can Be Cheaper Without Being Safer

A March 2026 paper on efficient agent benchmarking found that mid-difficulty task subsets can remove large parts of an agent benchmark while preserving...

Signal Benchmark Watch Evidence-first framing

TerminalWorld Makes Agent Benchmarks Harder to Fake

TerminalWorld turns public terminal recordings into validated agent tasks. The signal is not a higher leaderboard score. It is a harder benchmark supply chain.

Signal Signals Evidence-first framing

Agent Tool Menus Are a Safety Surface

New agent benchmarks suggest the visible tool menu is not a neutral implementation detail. It changes success, cost, wrong-tool calls, and risk exposure.

Signal Signals Evidence-first framing

Agent Benchmarking Doesn't Need Every Task

Efficient agent benchmarking points to a cheaper way to compare agents: run the tasks that still separate systems, not every task in the suite.
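As a rough illustration of "run the tasks that still separate systems", the sketch below keeps only tasks whose pass rate across agents falls inside a middle band. The function name, the result layout, and the 0.2 to 0.8 thresholds are assumptions made for this example; they are not taken from the paper the article discusses.

```python
from typing import Dict, List


def mid_difficulty_tasks(
    results: Dict[str, Dict[str, bool]],
    low: float = 0.2,   # assumed lower bound on pass rate
    high: float = 0.8,  # assumed upper bound on pass rate
) -> List[str]:
    """Return task ids whose pass rate across agents lies in [low, high].

    `results` maps agent name -> {task id -> passed}. Tasks that every agent
    passes or every agent fails cannot separate systems, so they are dropped.
    """
    agents = list(results)
    if not agents:
        return []

    # Only compare tasks that every agent actually attempted.
    shared = set(results[agents[0]])
    for agent in agents[1:]:
        shared &= set(results[agent])

    kept = []
    for task in sorted(shared):
        pass_rate = sum(results[agent][task] for agent in agents) / len(agents)
        if low <= pass_rate <= high:
            kept.append(task)
    return kept


# Hypothetical results for three agents on four tasks.
example = {
    "agent_a": {"t1": True, "t2": True, "t3": False, "t4": False},
    "agent_b": {"t1": True, "t2": False, "t3": False, "t4": True},
    "agent_c": {"t1": True, "t2": True, "t3": False, "t4": False},
}
subset = mid_difficulty_tasks(example)
print(subset)
```

Here t1 (passed by everyone) and t3 (failed by everyone) are dropped, while t2 and t4 remain. A cheaper subset chosen this way preserves ranking signal only; it says nothing about safety coverage.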

Signal Signals Evidence-first framing

Self-Improving Agents Need Hard Boundaries

Self-improving agents can rewrite code, prompts, and memory. Production teams need rollback, approval gates, and evaluator change control.
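One way to picture an approval gate with rollback is a small versioned store in which an agent may propose a change but only an approved change becomes active. This is a minimal sketch under assumed names (`GatedConfig`, `propose`, `approve`, `rollback`); it does not describe any specific framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GatedConfig:
    """Versioned prompt/config store with an approval gate (illustrative)."""

    history: List[str]                                 # approved versions, newest last
    pending: List[str] = field(default_factory=list)   # proposed, not yet approved

    @property
    def active(self) -> str:
        return self.history[-1]

    def propose(self, new_version: str) -> None:
        # The agent can queue a change but cannot activate it.
        self.pending.append(new_version)

    def approve(self, new_version: str, approver: str) -> None:
        if not approver:
            raise PermissionError("a named approver is required")
        self.pending.remove(new_version)  # raises ValueError if never proposed
        self.history.append(new_version)

    def rollback(self) -> str:
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.active


cfg = GatedConfig(history=["prompt-v1"])
cfg.propose("prompt-v2")
before_approval = cfg.active          # still the old version
cfg.approve("prompt-v2", approver="alice")
after_approval = cfg.active
after_rollback = cfg.rollback()
```

The same pattern could apply to evaluator definitions, so that a self-improving agent cannot quietly change the test it is graded on.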

Signal Signals Evidence-first framing

Agent Observability Is Escaping the Dashboard

Agent observability is moving from vendor dashboards into trace contracts that make every model call, tool call, handoff, guardrail, and evaluator step inspectable.
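A trace contract can be thought of as a shared record type plus checks that every emitter must satisfy. The sketch below is an assumption-laden illustration: the five step kinds come from the sentence above, while the field names (`run_id`, `step_id`, `parent_id`, `payload`) and the `contract_violations` check are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class StepKind(Enum):
    MODEL_CALL = "model_call"
    TOOL_CALL = "tool_call"
    HANDOFF = "handoff"
    GUARDRAIL = "guardrail"
    EVALUATOR = "evaluator"


@dataclass(frozen=True)
class TraceStep:
    run_id: str
    step_id: str
    kind: StepKind
    parent_id: Optional[str] = None                 # step that caused this one
    payload: Dict[str, Any] = field(default_factory=dict)


def contract_violations(steps: List[TraceStep]) -> List[str]:
    """Check that step ids are unique and parents appear before children."""
    problems: List[str] = []
    seen = set()
    for step in steps:
        if step.step_id in seen:
            problems.append(f"duplicate step_id: {step.step_id}")
        if step.parent_id is not None and step.parent_id not in seen:
            problems.append(f"{step.step_id}: unknown parent {step.parent_id}")
        seen.add(step.step_id)
    return problems


trace = [
    TraceStep("r1", "s1", StepKind.MODEL_CALL),
    TraceStep("r1", "s2", StepKind.TOOL_CALL, parent_id="s1"),
    TraceStep("r1", "s3", StepKind.EVALUATOR, parent_id="s9"),  # broken link
]
violations = contract_violations(trace)
print(violations)
```

Because the contract lives in code rather than in a vendor dashboard, any component that emits steps can be validated the same way.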