Safety & Governance

The hard problems: red teaming, bias, interpretability, alignment, and the governance frameworks that might actually matter. No hand-waving.

Alignment Works in English. In Japanese, It Backfires.
signals

A new study shows the same alignment intervention that produces strong safety effects in English reverses direction in Japanese, increasing harmful outputs. Tested across 1,584 simulations, 16 languages, and three model families.

3 min read
One Fake Source Broke Every Agent
signals

A single misinformation article injected into search rankings crashed GPT-5's accuracy from 65.1% to 18.2%. The agents had unlimited access to truthful sources and couldn't be bothered to look.

3 min read
Washington's $42 Billion AI Shakedown
signals

The Trump administration is using $42 billion in broadband funding to pressure states into repealing AI laws. The FTC has been directed to classify bias mitigation as a deceptive trade practice. Meanwhile, the EU enforces the opposite.

5 min read
We Built the Agent Internet Before Its Firewalls
signals

Three CVEs in Anthropic's own MCP reference server. Over 8,000 production servers exposed to the internet. The protocol powering AI agents shipped without security, and the industry is paying for it.

7 min read
The EU AI Act Hits Full Force in August 2026. Here's What Changes.
guides

On August 2, 2026, the EU AI Act becomes fully enforceable for high-risk AI systems. 40% of enterprises can't even determine whether their AI systems qualify as high-risk. Here's what changes.

12 min read
AI Agent Security in 2026: Prompt Injection, Memory Poisoning, and the OWASP Top 10
guides

AI agents don't just have a security problem. They have a fundamentally different security problem than the systems they're replacing. Five attack surfaces and the defense patterns that actually work.

11 min read
The Swarm That Fakes Consensus
signals

Twenty-two researchers across four continents show how agent swarms fabricate consensus, infiltrate communities, and poison the training data of future AI models.

6 min read
The Accountability Gap When AI Agents Act
signals

When an AI agent causes harm, who pays? Current law has no clear answer.

3 min read
AI Guardrails for Agents: How to Build Safe, Validated LLM Systems
guides

A Chevrolet chatbot sold a Tahoe for $1. Now AI agents can execute code, call APIs, and trigger real-world actions. Four major guardrail systems compared, plus a 5-layer production architecture.

11 min read
The International AI Safety Report 2026: What 12 Companies Actually Agreed On
signals

The International AI Safety Report 2026, led by a Turing Award winner, is the most comprehensive global AI safety assessment ever assembled. It was released last week.

7 min read