
Agent accountability is leaving the policy binder. The sharper signal is technical: identity, delegated authority, trace logs, approval events, and incident reconstruction are becoming part of the agent runtime. A June 2026 arXiv paper argues that ordinary audit logs and traces can be identical under different delegation assignments, so investigators cannot reliably reconstruct which authority context caused which action without extra runtime fields Observability for Delegated Execution in Agentic AI Systems.

Evidence base: arXiv delegation research, CoSAI agentic IAM guidance, OpenAI and OpenTelemetry tracing docs, and EU AI Act logging and oversight text.


Key takeaways

  • Agent accountability is becoming an execution property, not just a governance statement.
  • Identity has to cover agents, users, delegated scope, model or code claims, and revocation.
  • Logs without authority context leave incident teams guessing after overlapping agent runs.
  • Human approvals need to be captured as events, not remembered as chat history.

Agent accountability gets an execution graph

The legal framing in The Accountability Gap When AI Agents Act was: who pays when an agent harms someone? The runtime question is operational: can you prove what the agent was authorised to do, which human approved it, which tools it touched, and what changed after the approval?

That is why the delegated-observability paper matters. Its authors compare trace-based reconstruction with a delegation-aware common information model. The incident-response result is concrete: baseline forensic queries require 6 to 14 correlation and normalisation operations, while the proposed delegation model reduces the same query classes to 1 to 3 predicates over stable fields Observability for Delegated Execution in Agentic AI Systems.
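To make "stable fields" concrete, here is a minimal Python sketch. It is not the paper's common information model: the `ActionRecord` type and its field names (`delegation_id`, `principal` and so on) are hypothetical. It only illustrates the idea that when every action carries its authority context directly, a forensic question becomes a couple of predicates rather than a chain of correlations across nested traces.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record shape: each action carries its authority context as
# stable, first-class fields instead of leaving it implicit in span nesting.
@dataclass(frozen=True)
class ActionRecord:
    action_id: str
    agent_id: str        # which agent instance acted
    delegation_id: str   # which grant of authority it acted under
    principal: str       # the human or service that issued that grant
    tool: str
    timestamp: float


def actions_under(records: Iterable[ActionRecord], delegation_id: str,
                  tool: str) -> List[ActionRecord]:
    """Answer 'what did this tool do under this grant?' with two predicates."""
    return [r for r in records if r.delegation_id == delegation_id and r.tool == tool]


records = [
    ActionRecord("a1", "agent-7", "dlg-alice-42", "alice", "send_email", 100.0),
    ActionRecord("a2", "agent-7", "dlg-bob-9", "bob", "send_email", 101.0),
    ActionRecord("a3", "agent-7", "dlg-alice-42", "alice", "read_calendar", 102.0),
]

# Same agent, same tool, overlapping runs: only delegation_id tells them apart.
hits = actions_under(records, "dlg-alice-42", "send_email")
print([r.action_id for r in hits])  # ['a1']
```

Without the `delegation_id` field, records a1 and a2 would look the same to a query that only sees the agent and the tool.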

The tooling is already moving in that direction. OpenAI's Agents SDK says tracing is built in and records LLM generations, tool calls, handoffs, guardrails, and custom events during an agent run OpenAI Agents SDK tracing. OpenTelemetry's GenAI agent span draft includes operations for creating agents, invoking agents, planning, tool execution, retrieval, and memory operations OpenTelemetry GenAI agent spans.
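Those operation names describe what an agent did. They do not say under whose authority it acted. One way to close that gap is to attach delegation context to every GenAI span, as in the sketch below, which rests on stated assumptions. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions as we understand them. Those conventions are still marked as in development, so check the current spec. The `app.delegation.*` keys and the `span_attributes` helper are hypothetical and application-defined.

```python
from typing import Dict, List, Optional, Union

# Keys from the OpenTelemetry GenAI semantic conventions (development status;
# verify against the current spec before depending on them).
OP = "gen_ai.operation.name"
AGENT_ID = "gen_ai.agent.id"
TOOL_NAME = "gen_ai.tool.name"

# Hypothetical application-defined keys: the GenAI conventions describe the
# operation, not the authority behind it, so that context is added here.
DELEGATION_ID = "app.delegation.id"
DELEGATION_SCOPE = "app.delegation.scope"

AttrValue = Union[str, List[str]]


def span_attributes(operation: str, agent_id: str, delegation_id: str,
                    scope: List[str], tool: Optional[str] = None) -> Dict[str, AttrValue]:
    """Build span attributes that pair the operation with its authority context."""
    attrs: Dict[str, AttrValue] = {
        OP: operation,
        AGENT_ID: agent_id,
        DELEGATION_ID: delegation_id,
        DELEGATION_SCOPE: sorted(scope),  # a homogeneous string array
    }
    if tool is not None:
        attrs[TOOL_NAME] = tool
    return attrs


scope = ["email:send", "calendar:read"]
invoke = span_attributes("invoke_agent", "agent-7", "dlg-alice-42", scope)
tool_call = span_attributes("execute_tool", "agent-7", "dlg-alice-42", scope, tool="send_email")
print(tool_call[OP], tool_call[TOOL_NAME], tool_call[DELEGATION_ID])
# execute_tool send_email dlg-alice-42
```

If the OpenTelemetry Python API is installed, a dict like this could be passed to `span.set_attributes(...)` on the active span. The sketch sticks to the standard library so it runs on its own.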


Agent accountability needs authority, not just traces

Runtime identity is the missing layer. South, Marro, Hardjono and co-authors propose authenticated delegation that extends OAuth 2.0 and OpenID Connect with agent-specific credentials, delegated permissions, contextual scope restrictions, and auditable receipts Authenticated Delegation and Authorized AI Agents. CoSAI's 2026 agentic IAM paper lands in the same place from a security angle: give each agent a short-lived unique identity, bind it to verifiable claims such as code and model signatures, and validate those claims whenever the agent performs a critical operation Agentic Identity and Access Management.
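The CoSAI pattern has three parts: a short-lived identity, bound claims, and validation at the critical operation. A deliberately small Python sketch can illustrate it. This is not the authenticated-delegation protocol, which builds on OAuth 2.0 and OpenID Connect, and it is not a CoSAI reference design. It uses a toy HMAC signature, a hypothetical `model_digest` claim that stands in for code and model signatures, and hypothetical function names. `now` is passed in explicitly so that the expiry check is deterministic.

```python
import hashlib
import hmac
import json
from typing import List, Set

# Hypothetical issuer secret for this toy scheme only.
ISSUER_KEY = b"demo-issuer-secret"


def _sign(claims: dict) -> str:
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()


def issue_agent_credential(agent_id: str, delegated_by: str, scope: List[str],
                           model_digest: str, ttl_s: float, now: float) -> dict:
    claims = {
        "agent_id": agent_id,          # unique per agent instance
        "delegated_by": delegated_by,  # whose authority is being delegated
        "scope": sorted(scope),        # what the agent may do
        "model_digest": model_digest,  # stand-in for a code/model signature claim
        "exp": now + ttl_s,            # short-lived by construction
    }
    return {"claims": claims, "sig": _sign(claims)}


def authorize_critical_op(cred: dict, action: str, running_model_digest: str,
                          revoked: Set[str], now: float) -> bool:
    """Re-check every claim at the moment of the critical operation."""
    claims = cred["claims"]
    if not hmac.compare_digest(cred["sig"], _sign(claims)):
        return False  # credential was altered after issue
    if now >= claims["exp"]:
        return False  # expired
    if claims["agent_id"] in revoked:
        return False  # revoked
    if claims["model_digest"] != running_model_digest:
        return False  # running model/code no longer matches the bound claim
    return action in claims["scope"]


t0 = 1_000.0
cred = issue_agent_credential("agent-7", "alice", ["email:send"], "sha256:abc", ttl_s=300, now=t0)
print(authorize_critical_op(cred, "email:send", "sha256:abc", set(), now=t0 + 10))   # True
print(authorize_critical_op(cred, "email:send", "sha256:abc", set(), now=t0 + 301))  # False
print(authorize_critical_op(cred, "email:send", "sha256:xyz", set(), now=t0 + 10))   # False
```

A production system would use asymmetric keys and a standard token format, so that verifiers never hold the signing secret. The part of this shape worth keeping is that every check runs again at the moment of the critical call, not once at login.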

This is the bridge between AI guardrails for agents and agent evals that catch production failures. Guardrails decide whether an action should proceed. Evals test whether behaviour was acceptable. Accountability infrastructure records who delegated authority, what scope was granted, which approval expanded that scope, and how to reconstruct the run after something fails.
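That last sentence describes a small data model. A minimal sketch with hypothetical `Delegation` and `ApprovalEvent` types stores each approval as an append-only event that expands scope. The record can then answer later which approval, and which human, made a given action permissible.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class ApprovalEvent:
    approval_id: str
    delegation_id: str
    approver: str                # the human who approved
    added_scope: FrozenSet[str]  # permissions this approval adds
    timestamp: float


@dataclass
class Delegation:
    delegation_id: str
    principal: str               # who granted the original authority
    granted_scope: FrozenSet[str]
    approvals: List[ApprovalEvent] = field(default_factory=list)

    def approve(self, event: ApprovalEvent) -> None:
        if event.delegation_id != self.delegation_id:
            raise ValueError("approval belongs to a different delegation")
        self.approvals.append(event)  # append-only: past approvals are never edited

    def authority_for(self, permission: str) -> Optional[Tuple[str, ...]]:
        """Say what made a permission valid: the original grant, an approval, or nothing."""
        if permission in self.granted_scope:
            return ("grant", self.principal)
        for ev in self.approvals:
            if permission in ev.added_scope:
                return ("approval", ev.approval_id, ev.approver)
        return None


d = Delegation("dlg-alice-42", "alice", frozenset({"calendar:read"}))
print(d.authority_for("payments:refund"))  # None
d.approve(ApprovalEvent("apr-1", "dlg-alice-42", "carol", frozenset({"payments:refund"}), 205.0))
print(d.authority_for("payments:refund"))  # ('approval', 'apr-1', 'carol')
print(d.authority_for("calendar:read"))    # ('grant', 'alice')
```

The approval is kept as its own record instead of being merged into `granted_scope`. That separation is what lets the system answer "which approval expanded that scope" after the fact.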

Regulation is pushing the same shape. The EU AI Act requires high-risk AI systems to technically allow for the automatic recording of events (logs) over their lifetime, and says that logging should support traceability, post-market monitoring, and monitoring of the system's operation EU AI Act Article 12. Article 14 requires high-risk systems to be designed for effective human oversight, including the ability to monitor, override, reverse, intervene in, or stop the system where appropriate EU AI Act Article 14. For agent builders, that belongs in the design, not the appendix.
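In code, the logging and stop expectations come down to two habits: record every step as a timestamped event, and check for a human stop before each step. The `OverseenRun` class below is a hypothetical illustration of that shape only. It covers monitoring and stopping but not override or reversal, and it does not by itself establish compliance with either article.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class OverseenRun:
    """Hypothetical wrapper: every step is logged, and a human can halt the run between steps."""
    run_id: str
    log: List[dict] = field(default_factory=list)
    stopped_by: Optional[str] = None

    def _record(self, kind: str, **detail) -> None:
        # Append-only event log; every entry carries the run ID and a timestamp.
        self.log.append({"run_id": self.run_id, "kind": kind, "ts": time.time(), **detail})

    def stop(self, operator: str, reason: str) -> None:
        self.stopped_by = operator
        self._record("human_stop", operator=operator, reason=reason)

    def run(self, steps: List[Tuple[str, Callable[[], None]]]) -> bool:
        for name, action in steps:
            if self.stopped_by is not None:  # checked before every step
                self._record("step_skipped", step=name)
                return False
            self._record("step_start", step=name)
            action()
            self._record("step_done", step=name)
        return True


session = OverseenRun("run-1")
ok = session.run([
    ("draft_email", lambda: None),
    # Simulates an operator pressing stop while this step is in progress.
    ("operator_review", lambda: session.stop("carol", "recipient looks wrong")),
    ("send_email", lambda: None),
])
print(ok)  # False
print([e["kind"] for e in session.log])
# ['step_start', 'step_done', 'step_start', 'human_stop', 'step_done', 'step_skipped']
```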

The counterargument is real. More logging can become a privacy breach with nicer dashboards. OpenAI's tracing docs warn that generation spans and function spans may store inputs and outputs, and that sensitive capture is enabled by default unless configured otherwise OpenAI Agents SDK tracing. Accountability data needs redaction, retention rules, and access control. Otherwise the audit trail becomes the incident.
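The following minimal Python sketch covers the redaction and retention part of that advice. Access control is not shown. The sensitive-key list, the 30-day window, and `REDACTION_KEY` are assumptions chosen for illustration, not recommendations. Payloads are replaced with a keyed digest, so an investigator can still tell whether two records carried the same content without the log storing that content. The delegation fields that reconstruction depends on are left intact.

```python
import hashlib
import hmac
from typing import List

# Assumed policy values, for illustration only.
SENSITIVE_KEYS = {"input", "output"}       # payload fields to redact
RETENTION_S = 30 * 24 * 3600               # assumed 30-day retention window
REDACTION_KEY = b"demo-redaction-key"      # hypothetical; keep it out of the log store


def redact(event: dict) -> dict:
    """Replace sensitive payloads with a keyed digest; keep accountability fields as-is."""
    clean = {}
    for key, value in event.items():
        if key in SENSITIVE_KEYS:
            # Keyed digest: anyone without the key cannot recover short or guessable
            # values by hashing candidates, yet equal payloads still produce equal tags.
            digest = hmac.new(REDACTION_KEY, str(value).encode(), hashlib.sha256).hexdigest()
            clean[key] = f"[redacted {digest[:12]}]"
        else:
            clean[key] = value
    return clean


def apply_retention(events: List[dict], now: float) -> List[dict]:
    """Keep only events younger than the retention window."""
    return [e for e in events if now - e["ts"] < RETENTION_S]


event = {"delegation_id": "dlg-alice-42", "tool": "send_email",
         "input": "Hi Bob, the refund code is 4471", "ts": 0.0}
stored = redact(event)
print(stored["delegation_id"], stored["input"].startswith("[redacted"))  # dlg-alice-42 True
print(len(apply_retention([stored], now=RETENTION_S + 1)))               # 0
```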

Operator takeaway

Treat agent accountability as a runtime contract. Before an agent reaches production, require a delegated authority ID, an agent identity, a human approval event for high-impact actions, tool-call traces, redaction policy, and a reconstruction query that can answer: what happened, under whose authority, with which permissions, and who could have stopped it? If the answer is a Slack thread, the control does not exist.
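The reconstruction query can be written down and tested before launch. In the sketch below, the field names and the `reconstruct` helper are hypothetical. When an authority or approval field is missing, the function raises an error instead of returning a partial answer. That failure is the code-level equivalent of "the control does not exist".

```python
from typing import Dict, List

# Hypothetical contract: fields every action record must carry.
BASE_FIELDS = ("action_id", "agent_id", "delegation_id", "principal", "scope", "tool")
APPROVAL_FIELDS = ("approval_id", "approver")  # also required for high-impact actions


def reconstruct(events: List[dict], action_id: str) -> Dict[str, object]:
    """Answer the four incident questions for one action, or fail loudly."""
    matches = [e for e in events if e.get("action_id") == action_id]
    if len(matches) != 1:
        raise LookupError(f"expected one record for {action_id}, found {len(matches)}")
    e = matches[0]
    required = BASE_FIELDS + (APPROVAL_FIELDS if e.get("high_impact") else ())
    missing = [f for f in required if f not in e]
    if missing:
        raise ValueError(f"contract violated: missing {missing}")
    return {
        "what_happened": e["tool"],
        "under_whose_authority": (e["principal"], e["delegation_id"]),
        "with_which_permissions": e["scope"],
        "who_could_have_stopped_it": e.get("approver"),
    }


base = {"agent_id": "agent-7", "delegation_id": "dlg-alice-42", "principal": "alice",
        "scope": ["payments:refund"], "tool": "issue_refund", "high_impact": True}
log = [
    dict(base, action_id="a1", approval_id="apr-1", approver="carol"),
    dict(base, action_id="a2"),  # high-impact, but the approval lives in a chat thread
]
print(reconstruct(log, "a1")["who_could_have_stopped_it"])  # carol
try:
    reconstruct(log, "a2")
except ValueError as err:
    print(err)  # contract violated: missing ['approval_id', 'approver']
```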

Related: Consent and Delegation Boundaries for AI Agents.
