Tyler

Signal Field Guides Evidence-first framing

Context Window Management: When 1M Tokens Isn't Enough

Claude Opus 4.6 scores 76% on MRCR v2 at 1 million tokens. Gemini 3 Pro drops to 26.3%. Bigger windows don't solve the context problem; they change it. Research-backed strategies for chunking, compression, and retrieval.

Signal Failure Briefs Evidence-first framing

The Accountability Gap When AI Agents Act

When an AI agent causes harm, who pays? Current law can't answer that clearly.

Signal Signals Evidence-first framing

More Context Doesn't Kill RAG. It Just Changes the Fight.

Long-context LLMs now hit a million tokens, but a persistent 10% accuracy gap and punishing costs keep RAG very much in the fight.

Signal Benchmark Watch Evidence-first framing

The 12-to-72 Problem: Computer-Use Agents Hit Human Scores but Miss the Point

Computer-use agents jumped from 12% to 72% on OSWorld in 18 months. The scores look like progress. The latency and efficiency numbers tell a different story.

Signal Signals Evidence-first framing

Models Training Models: The Promise and Peril of Synthetic Data

Microsoft's Phi-4 was trained on more than 50% synthetic data and beat GPT-4o on graduate-level science benchmarks. The old rules about training data are changing fast.

Briefing Briefings Evidence-first framing

The Agent Project That Should Have Been One LLM Call

Some enterprise agent projects fail because autonomy was added where a bounded single-call LLM design would have delivered cleaner behavior and lower operational risk.

Briefing Briefings Evidence-first framing

Open Source AI Impact: Who Wins When Models Get Cheap

Open-source AI used to be the cheaper substitute. In 2026, that framing is too small.

Signal Benchmark Watch Evidence-first framing

Why Multi-Agent Papers Don't Replicate in Production

A paper from Tran and Kiela tested 28 multi-agent configurations across four architectures: Sequential, Parallel, Debate, and Ensemble. Every single one...

Signal Primers Evidence-first framing

Types of AI Agents: The 2026 Classification That Actually Helps

The reactive/deliberative/hybrid taxonomy is broken. A more useful 2026 classification: coding agents, research agents, computer-use agents, task agents, multi-agent orchestrators, and self-improving agents.