signals
Key Guides
Computer-Use Agents Can't Stop Breaking Things
Five research teams just published papers on the same problem: AI agents that can click, type, and control real software keep doing catastrophically...
Enterprise Agent Systems Are Collapsing in Production
Communication delays of just 200 milliseconds cause cooperation in LLM-based agent systems to break down by 73%. Not network latency from poor...
Reward Models Are Learning to Lie
The most widely deployed alignment technique in production has a quiet problem: it doesn't actually know what you value. RLHF trains models to maximize a reward...
Most Agent Benchmarks Test the Wrong Thing
The SciAgentGym team ran 1,780 domain-specific scientific tools through current agent frameworks. Success rate on multi-step tool orchestration: 23%. Same...
The Inference Budget Just Got Interesting
OpenAI's o1 made headlines for "thinking harder" during inference. But the real story isn't that a model can spend more tokens on reasoning: it's that...
When Multi-Agent Systems Break: The Coordination Tax Nobody Warns You About
LLM-powered multi-agent systems fail at coordination 40–60% of the time in production environments, according to new research from teams building...
Inference-Time Compute Is Escaping the LLM Bubble
Explore how inference-time compute scaling lets AI models think longer and reason deeper, boosting accuracy without retraining.
Your AI Agent Can Reason, Plan, and Code. It Still Can't See the Web.
AI agents can reason, plan, and code. But they still can't reliably see the live web. The observation layer is the real bottleneck for production agents.
The Protocol Wars Nobody's Winning
Ten competing agent protocols and counting. MCP won the tool layer but shipped without authentication. The alphabet soup is a coordination failure.
Fourteen Papers, Three Ways to Break: ICLR 2026's Multi-Agent Failure Playbook
ICLR 2026 produced a failure playbook for multi-agent systems. 70% of agent communication is redundant. Single agents still match swarms on most benchmarks.