Safety & Governance

The hard problems: red teaming, bias, interpretability, alignment, and the governance frameworks that might actually matter. No hand-waving.

Key Guides

Latest Signals

From the team behind Swarm Signal

Track Your Finances While You Build AI

BoredTools makes the boring stuff easy — budget dashboards, freelance trackers, and business planners. Download free or grab the full collection.

Browse All Templates Budget Dashboard 2026

signals

Interpretability as Infrastructure: Why Understanding AI Matters More Than Controlling It

Approximately 100 neurons control subject-verb agreement in large language models. Not thousands. Not millions. One hundred MLP neurons in a 8-billion...

signals

The Red Team That Never Sleeps: When Small Models Attack Large Ones

A 1.5-billion parameter model just learned to jailbreak GPT-5 Nano, Claude 3.5 Sonnet, and Gemini 2.5 Flash. It didn't need human creativity or domain...

signals

Open Weights, Closed Minds: The Paradox of 'Open' AI

When researchers [examined 100+ language models](https://arxiv.org/abs/2502.18505) marketed as "open-source," they found a systematic pattern of omission....

Guides

AI Safety Compliance for Startups: The Minimum Viable Checklist

The EU AI Act went live. Colorado enforces algorithmic fairness. Enterprise buyers demand AI governance documentation. Here's the minimum viable compliance stack that satisfies current regulations without draining your runway.

signals

Red Teams Found Agents Leak More Than Models

Red teams found agents are far more vulnerable than standalone models. Mixed attack strategies hit 84.3% success rates. Memory poisoning persists across sessions. Every tool is a potential exfiltration path.

Guides

Red Teaming AI Agents: A Practitioner's Guide

Red teaming AI agents is fundamentally different from red teaming standalone models. Agents have tools, memory, and credentials — each a new attack surface. This guide covers the OWASP agentic framework and a structured testing methodology.

Guides

AI Safety Frameworks for Regulated Industries: Healthcare, Finance, and Government

Regulated industries face roughly three times the compliance burden of unregulated AI deployments. This guide maps the actual frameworks, enforcement timelines, and compliance costs for AI safety across healthcare, finance, and government in 2026.

Guides

Best AI Red-Teaming and Safety Testing Tools 2026

Your AI system will get attacked. The question is whether you find the vulnerabilities first or your users do. 8 red-teaming tools tested and compared.

signals

Alignment Works in English. In Japanese, It Backfires.

A new study shows the same alignment intervention that produces strong safety effects in English reverses direction in Japanese, increasing harmful outputs. Tested across 1,584 simulations, 16 languages, and three model families.

signals

One Fake Source Broke Every Agent

A single misinformation article injected into search rankings crashed GPT-5's accuracy from 65.1% to 18.2%. The agents had unlimited access to truthful sources and couldn't be bothered to look.