Agents That Rewrite Themselves: The Self-Modifying Stack Is Here
Sakana AI's [Darwin Godel Machine](https://sakana.ai/dgm/) improved its SWE-bench score from 20.0% to 50.0% last May by letting an agent rewrite its own...
Your AI Inherited Your Biases: When Agents Think Like Humans (And That's Not a Compliment)
By early 2017, Amazon quietly disbanded a team that had spent years building an AI hiring tool. The algorithm worked exactly as designed. It learned from...
The NHS Bet on AI Triage Is Bigger Than Anyone Admits
A single GP surgery in Surrey cut patient waiting times by 73% in four months. Not by hiring more doctors. Not by extending hours. By letting an AI decide...
The Benchmark Trap: When High Scores Hide Low Readiness
GPT-5 solves 65% of single-issue bug fixes on SWE-Bench Verified. The same model achieves just 21% on [SWE-EVO](https://arxiv.org/abs/2512.18470), where...
The Budget Problem: Why AI Agents Are Learning to Be Cheap
In January 2026, researchers at the University of Arkansas at Little Rock discovered something unsettling: their dialogue agents were using 41% more...
Chain-of-Thought Prompting Doesn't Always Work. Here's the Evidence.
Think step by step. It's the most common prompt engineering advice in circulation, repeated in tutorials, baked into system prompts, and treated as a...
Interpretability as Infrastructure: Why Understanding AI Matters More Than Controlling It
Approximately 100 neurons control subject-verb agreement in large language models. Not thousands. Not millions. One hundred MLP neurons in an 8-billion...
We Built the Agent Internet Before Its Firewalls
In January 2026, a security startup called Cyata published three CVEs against Anthropic's official Git MCP server. Not a third-party wrapper. Not a...
The Red Team That Never Sleeps: When Small Models Attack Large Ones
A 1.5-billion-parameter model just learned to jailbreak GPT-5 Nano, Claude 3.5 Sonnet, and Gemini 2.5 Flash. It didn't need human creativity or domain...
Robots With Reasoning: When Language Models Meet the Physical World
A robot arm completing 84.9% of manipulation tasks without a single demonstration. Not through months of reinforcement learning or massive datasets of...