LLMs Can't Find What's Already In Their Heads
Knowledge graphs have a well-documented lookup problem. When you ask an LLM to traverse a KG and reason over multi-hop paths, it doesn't search the graph...
Multi-Agent Reasoning's Memory Problem
Reasoning language models score in the top percentile on math olympiad benchmarks, yet a new study from Stanford found they fail to correctly recall their...
Small Models Just Got Smarter About When to Think
Reasoning tokens aren't free. Every chain-of-thought step an LLM generates costs inference budget, and most of the time that thinking is wasted on tasks...
Nobody Knows If Deployed AI Agents Are Safe
The 2025 AI Agent Index just cataloged over 100 deployed agentic AI systems, and the finding that should alarm everyone isn't about capability. It's about...
Small Models Just Learned When to Ask for Help
SWE-bench has been the graveyard of small language models. While GPT-4 class systems resolve over 40% of real-world GitHub issues, models under 10 billion...
MoE's Dirty Secret Is Load Balancing
Every frontier lab now ships a sparse Mixture-of-Experts model. Google's Switch Transformer started the trend. DeepSeek-V3 proved it could scale....
When Single Agents Beat Swarms: The Case Against Multi-Agent Systems
Stanford researchers found that LLM teams underperform their own expert agents by up to 37.6%, and that independent multi-agent systems amplify errors 17.2-fold. The evidence for single agents over swarms is stronger than the industry admits.
The Control Interface Problem in Physical AI
NVIDIA just released a video foundation model that can simulate physical worlds with startling accuracy. A team at Oak Ridge National Laboratory built an...
Knowledge Graphs Just Made RAG Worth the Complexity
Retrieval-augmented generation was supposed to solve the hallucination problem. It didn't. Most RAG systems still return the wrong chunk, miss the...
The Accountability Gap When AI Agents Act
When an AI agent causes harm, who pays? Current law can't answer that clearly.