Reasoning & Memory

How models think, remember, and retrieve information. Reasoning tokens, RAG pipelines, context engineering, and the memory architectures that make agents useful.

Deep Dives and Frameworks

Implementation playbooks, operator patterns, and durable analysis.

Signals, Maps, and Watch Lists

Production-oriented analysis, benchmarks, and market/system intelligence.

External tools

Execution tooling is separate

Swarm Signal keeps the analysis layer. Use BoredTools for reusable production templates and trackers.

Open BoredTools Open Budget Tracker

Signal Failure Briefs Evidence-first framing

Agent Memory Needs Quarantine, Not Recall

Persistent memory is moving from chat convenience into personal-agent infrastructure. The failure mode is not just forgetting. It is remembering the wrong...

Signal Failure Briefs Evidence-first framing

RAG Cost Attacks Turn Retrieval Into a Budget Risk

A June 2026 paper on retrieval-augmented inference cost attacks reports a failure mode that many RAG teams are not testing: poisoned external documents...

Signal Benchmark Watch Evidence-first framing

Multimodal Memory Tests Expose the Personal-Agent Gap

Product teams are turning memory into the selling point for personal agents. The hard question is no longer whether they can remember a preference; it is...

Signal Signals Evidence-first framing

Agent Memory Fails on Relationships, Not Recall

New June 2026 memory benchmarks show why long-running agents fail when facts conflict, evolve, or depend on hidden relationships.

Signal Signals Evidence-first framing

Evaluation-Aware Memory: How Agents Should Remember What They Can Prove

Agent memory should promote facts only after evals prove they improve task outcomes, not just because retrieval found them.

Signal Signals Evidence-first framing

RAG Maintenance After Deployment: The Failure Mode Nobody Budgets For

RAG maintenance after deployment is the hidden operating cost: stale indexes, drifting corpora, weak evals, and silent retrieval failure.

Signal Benchmark Watch Evidence-first framing

Million-Token Context Still Fails the Workload Test

Anthropic reported on February 5, 2026 that Claude Opus 4.6 scored 76% on the 8-needle 1M-token MRCR v2 test while Claude Sonnet 4.5 scored 18.5% on the...

Signal Field Guides Evidence-first framing

Context Window Management: When 1M Tokens Isn't Enough

Claude Opus 4.6 scores 76% on MRCR v2 at 1 million tokens. Gemini 3 Pro drops to 26.3%. Bigger windows don't solve the context problem — they change it. Research-backed strategies for chunking, compression, and retrieval.

Signal Signals Evidence-first framing

More Context Doesn't Kill RAG. It Just Changes the Fight.

Long-context LLMs now hit a million tokens, but a persistent 10% accuracy gap and punishing costs keep RAG very much in the fight.

Signal Field Guides Evidence-first framing

Knowledge Graphs for AI Agents: Beyond Vector Search

Vector databases power most retrieval-augmented generation systems in production today. They're fast, simple, and good enough for single-hop lookups...