LISTEN TO THIS ARTICLE
AI Agent Security Checklist
Review scope: data, credentials, tools, memory, and outbound channels.
Review Questions
Prompt Injection
Review:
- Can direct user input override the system policy?
- Can retrieved text steer a tool call, memory write, or outbound message?
- Can the agent keep task content separate from instructions?
- Is untrusted content labeled before it reaches the model context?
Memory
Review:
- Is memory disabled when the task does not need it?
- Are memory writes scoped to one user, tenant, or workflow?
- Can a reviewer inspect, delete, and quarantine stored memories?
- Do retrieval rules prefer recent trusted records over unknown records?
Tools
Review:
- Does the agent have an allow-list of tools for this workflow?
- Are tool arguments checked before execution?
- Are dangerous actions routed through human approval?
- Can the agent call only the systems needed for the current task?
Supply Chain
Review:
- Are third-party components pinned and reviewed?
- Are tool descriptions short, explicit, and free of hidden instructions?
- Are prompt templates versioned with code review?
- Can runtime behavior be compared with declared capabilities?
Exfiltration
Review:
- What private data can the agent read?
- What outbound channels can the agent use?
- Are large exports, unusual destinations, and sensitive fields blocked or reviewed?
- Is there a redaction step before responses, files, or tool outputs leave the workflow?
Baseline Controls
Baseline: small permissions first.
Credentials: short-lived, task-scoped, and separate by agent. Approvals: required for high-impact actions. Logs: prompts, retrieved context, tool calls, approvals, and final actions.
Multi-agent review: authenticated messages, signed payloads where appropriate, logged handoffs. See multi-agent systems.
RAG review: ingestion, retrieval, record access, and source visibility. See RAG architectures.
Further detail: AI guardrails guide and agent accountability framework.
Related: The Agent Project That Should Have Been One
Sources
Research Papers:
- Memory Poisoning Attack and Defense on Memory Based LLM-Agents — Devarangadi Sunil et al. (2026)
- MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval — (2025)
- Indirect Prompt Injection in the Wild for LLM Systems — (2026)
- EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System — (2025)
- LlamaFirewall: An Open Source Guardrail System for Building Secure AI Agents — Sheng et al. (2025)
Industry / Standards:
- OWASP Top 10 for LLM Applications 2025 — OWASP
- OWASP Top 10 for Agentic Applications 2026 — OWASP
- AI Agent Security Cheat Sheet — OWASP
- CVE-2025-32711 Detail — NVD
- Breaking down EchoLeak, the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot — Cato / Aim Labs
- AI Tool Poisoning: How Hidden Instructions Threaten AI Agents — CrowdStrike
- Manipulating AI Memory for Profit: The Rise of AI Recommendation Poisoning — Microsoft Security
- AI Agents Are Here. So Are the Threats. — Palo Alto Networks Unit 42
- When AI Remembers Too Much: Persistent Behaviors in Agents' Memory — Palo Alto Networks Unit 42
- Unveiling AI Agent Vulnerabilities Part III: Data Exfiltration — Trend Micro
Commentary:
- OpenAI Admits Prompt Injection Is Here to Stay as Enterprises Lag on Defenses — VentureBeat
- AI Agent Attacks in Q4 2025 Signal New Risks for 2026 — eSecurity Planet
- Inside CVE-2025-32711 (EchoLeak): Prompt Injection Meets AI Exfiltration — Hack The Box
Related Swarm Signal Coverage: