Safety & Governance
The hard problems: red teaming, bias, interpretability, alignment, and the governance frameworks that might actually matter. No hand-waving.
Key Guides
Latest Signals
- Interpretability as Infrastructure: Why Understanding AI Matters More Than Controlling It
- The Red Team That Never Sleeps: When Small Models Attack Large Ones
- Open Weights, Closed Minds: The Paradox of 'Open' AI
- Red Teams Found Agents Leak More Than Models
- Alignment Works in English. In Japanese, It Backfires.
From the team behind Swarm Signal
Track Your Finances While You Build AI
BoredTools makes the boring stuff easy — budget dashboards, freelance trackers, and business planners. Download free or grab the full collection.
The Red Team That Never Sleeps: When Small Models Attack Large Ones
Automated adversarial tools are emerging where small, cheap models systematically find vulnerabilities in frontier models. The safety landscape is shifting from pre-deployment testing to continuous monitoring.
Your AI Inherited Your Biases: When Agents Think Like Humans (And That's Not a Compliment)
New research shows AI agents don't just learn human capabilities; they systematically inherit human cognitive biases. The implications for deploying agents as objective decision-makers are uncomfortable.
The Benchmark Trap: When High Scores Hide Low Readiness
AI benchmarks measure performance in sanitized environments that bear little resemblance to conditions where these systems will actually operate.
Open Weights, Closed Minds: The Paradox of 'Open' AI
Models you can download but can't verify, use but can't fully trust, deploy but can't completely understand. The paradox of 'open' AI.
Interpretability as Infrastructure: Why Understanding AI Matters More Than Controlling It
Mechanistic interpretability has moved from describing what models do to engineering how they work. If you can identify the neurons responsible for a specific behavior, you don't need to control the entire system.