Multi-Agent Communication Protocols: How Agents Actually Talk to Each Other
We diagnosed the problem in our earlier signal: agents can connect, but they still can't communicate. MCP handles tool calls. A2A handles task handoffs. Neither handles meaning. That piece asked the question. This guide tries to answer it.
What follows is a technical walkthrough of how multi-agent systems actually exchange information today: the protocol specs, the communication architectures, and the failure modes nobody warns you about until your five-agent pipeline starts hallucinating in circles.
The Protocol Stack: What Exists Right Now
Three protocols have real adoption as of early 2026. They solve different problems, and understanding where each one stops is more useful than arguing about which one wins.
MCP (Model Context Protocol) is Anthropic's open standard for connecting agents to external tools, databases, and APIs. It's vertical: agent-to-resource. MCP uses JSON-RPC 2.0 and has thousands of pre-built server integrations. It doesn't handle agent-to-agent communication at all. That's by design. We covered the full protocol comparison separately.
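To make the JSON-RPC framing concrete, here's the shape of a request an MCP client sends to invoke a tool. The `tools/call` method and params layout follow the MCP spec; the tool name and arguments are invented for illustration.

```python
import json

# A JSON-RPC 2.0 request as an MCP client would frame it. The method
# "tools/call" comes from the MCP spec; the tool itself is hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",          # hypothetical tool name
        "arguments": {"sql": "SELECT 1"},  # tool-specific arguments
    },
}

wire = json.dumps(request)  # what actually travels over stdio or HTTP
```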
A2A (Agent-to-Agent Protocol) is Google's answer to horizontal coordination. Every A2A-compliant agent publishes an Agent Card, a JSON metadata document describing its capabilities, supported modalities, and authentication requirements. Agents discover each other by querying these cards at well-known URLs. Tasks progress through defined lifecycle states: submitted, working, input-required, completed, failed, canceled, rejected. It's clean. It's also limited to task delegation. There's no mechanism for negotiation, disagreement, or reasoning about another agent's intent.
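A sketch of what that looks like in practice: an Agent Card plus the lifecycle states listed above. The card's field names approximate the A2A spec, but the agent details and endpoint are invented.

```python
# Illustrative Agent Card; field names approximate the A2A spec,
# the agent itself is made up.
agent_card = {
    "name": "report-writer",
    "description": "Drafts summaries from structured data",
    "url": "https://agents.example.com/report-writer",  # hypothetical endpoint
    "capabilities": {"streaming": True},
    "skills": [{"id": "summarize", "description": "Summarize tabular data"}],
}

# The task lifecycle states from the A2A spec.
TASK_STATES = {
    "submitted", "working", "input-required",
    "completed", "failed", "canceled", "rejected",
}

# Four of those states are terminal: the task accepts no further work.
TERMINAL = {"completed", "failed", "canceled", "rejected"}

def is_terminal(state: str) -> bool:
    return state in TERMINAL
```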
ACP (Agent Communication Protocol) was IBM Research's attempt to fill the peer-to-peer gap. Launched in March 2025, ACP let agents interact as equals rather than through an intermediary. Then in August 2025, IBM merged ACP into A2A under the Linux Foundation, with IBM's team joining the A2A Technical Steering Committee alongside Google, Microsoft, AWS, Cisco, Salesforce, ServiceNow, and SAP. The merger concentrated the industry's standardization effort but left the semantic layer unaddressed.
All three protocols handle transport and task coordination. None handle what actually matters for multi-agent systems at scale: shared meaning.
The FIPA Lesson: Why Formal Speech Acts Failed
This isn't the first attempt. The Foundation for Intelligent Physical Agents (FIPA) tried to solve inter-agent communication in the late 1990s with a formal Agent Communication Language defining 22 speech acts: inform, request, propose, reject, confirm, and so on. Platforms like JADE implemented it. Researchers studied it for a decade.
Practitioners mostly ignored it. Three reasons killed adoption.
First, ontology management. FIPA required agents to share formal ontologies for every domain. Maintaining those ontologies became impossible as interaction scope expanded. When ontologies got complex or stale, agents misinterpreted messages or missed context entirely.
Second, rigidity. Real-world agent interactions don't map cleanly to 22 predefined performatives. What speech act covers "I'm 70% confident this answer is correct but I'd like you to verify"? FIPA had no good answer.
Third, the web happened. Service-oriented architectures replaced agent-based systems. REST APIs won. FIPA's formal structure lost to the pragmatic sloppiness of HTTP calls.
The irony: we've now swung to the opposite extreme. LLM agents throw natural language prompts at each other with zero formal structure. FIPA was too rigid. Current systems are too loose. Neither works when you need agents to reliably coordinate across organizational boundaries.
How Frameworks Actually Pass Messages
Protocol specs describe the wire format. Frameworks determine how agents actually communicate in practice. The three dominant ones each picked a different architecture.
AutoGen: Conversation as Protocol
Microsoft Research's AutoGen models agents as participants in a group chat. An assistant agent generates responses, a user proxy executes code, specialist agents contribute domain knowledge. Messages pass in a loop. Each agent responds, reflects, or calls tools based on its internal logic.
This is the simplest mental model: agents talk to each other the way humans talk in a Slack channel. It works for small teams (two to four agents) solving bounded problems. It falls apart when you need deterministic execution order or when agents start talking past each other, which happens more than you'd expect.
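The pattern is easy to see stripped of the framework. Here's a framework-agnostic sketch of the group-chat loop: agents take turns reading the shared transcript and appending replies. The agent names, reply logic, and "DONE" termination convention are all invented for illustration; AutoGen's real speaker selection and termination checks are richer.

```python
from typing import Callable

# An agent is a name plus a reply function that sees the full transcript.
Agent = tuple[str, Callable[[list[dict]], str]]

def group_chat(agents: list[Agent], opening: str, max_turns: int = 6) -> list[dict]:
    transcript = [{"sender": "user", "content": opening}]
    for turn in range(max_turns):
        name, reply_fn = agents[turn % len(agents)]  # round-robin speaker selection
        reply = reply_fn(transcript)                 # agent reads the whole history
        transcript.append({"sender": name, "content": reply})
        if "DONE" in reply:                          # crude termination signal
            break
    return transcript

# Toy agents: one proposes, one verifies and terminates.
proposer = ("assistant", lambda t: "Proposal: use a cache")
verifier = ("critic", lambda t: "Looks fine. DONE")
log = group_chat([proposer, verifier], "How do we speed up lookups?")
```

The `max_turns` cap is the only thing standing between this loop and the "agents talking past each other" failure mode described above.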
CrewAI: Roles and Delegation
CrewAI structures communication around organizational roles. Each agent has a defined role, backstory, and goal. A "crew" assembles agents with a set of tasks. The framework handles delegation and state management.
The communication pattern is hierarchical: a manager agent assigns tasks, workers report back. This maps well to real organizational workflows. The constraint is that it assumes a top-down structure. Peer-to-peer negotiation between agents isn't native to the model.
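A minimal sketch of that hierarchical pattern, independent of CrewAI's actual API: a manager routes each task to a role-named worker and collects the reports. The routing rule and roles are invented; in CrewAI the manager's decision would be made by an LLM, not string matching.

```python
from typing import Callable

def pick_worker(task: str, workers: dict[str, Callable]) -> str:
    # Naive routing stand-in: first role mentioned in the task text,
    # else the first worker. A real manager agent reasons about this.
    for role in workers:
        if role in task:
            return role
    return next(iter(workers))

def run_crew(tasks: list[str], workers: dict[str, Callable]) -> dict[str, str]:
    results = {}
    for task in tasks:
        role = pick_worker(task, workers)    # manager assigns the task
        results[task] = workers[role](task)  # worker reports back
    return results

workers = {
    "researcher": lambda t: f"notes on: {t}",
    "writer": lambda t: f"draft of: {t}",
}
out = run_crew(["researcher: find sources", "writer: draft intro"], workers)
```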
LangGraph: State Machines Over Chat
LangGraph takes the most structured approach. Communication flows through a directed graph. Nodes are agents or functions. Edges define transitions. Every agent reads from and writes to a central state object. Reducer logic merges concurrent updates.
You don't hope agents talk to each other; you draw the exact path they must take. This eliminates a whole class of communication failures but makes the system brittle to unexpected inputs. If the graph doesn't have an edge for a particular situation, the system either fails or loops.
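The idea can be sketched without the library: nodes are functions over a shared state dict, edges are a routing table, and a reducer merges each node's partial update back into the state. The reducer here is a simple last-write-wins merge; LangGraph lets you define richer ones per state key.

```python
# Minimal state-machine sketch of the LangGraph pattern. Node and edge
# logic are invented; "END" is a sentinel for the terminal node.
def run_graph(nodes, edges, state, entry, max_steps=10):
    current = entry
    for _ in range(max_steps):
        update = nodes[current](state)   # node reads state, returns a patch
        state = {**state, **update}      # reducer: last-write-wins merge
        current = edges[current](state)  # edge logic picks the next node
        if current == "END":
            return state
    raise RuntimeError("no path to END")  # the brittleness described above

nodes = {
    "draft": lambda s: {"text": f"draft v{s['version'] + 1}",
                        "version": s["version"] + 1},
    "review": lambda s: {"approved": s["version"] >= 2},
}
edges = {
    "draft": lambda s: "review",
    "review": lambda s: "END" if s["approved"] else "draft",
}
final = run_graph(nodes, edges, {"version": 0, "approved": False}, "draft")
```

Note the failure branch: if no edge ever routes to `END`, the sketch raises after `max_steps`, which is exactly the fail-or-loop behavior the paragraph above describes.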
The coordination overhead varies dramatically across these patterns. AutoGen's flexibility creates ambiguity. CrewAI's hierarchy creates bottlenecks. LangGraph's rigidity creates blind spots. Pick your failure mode.
Blackboard Architecture: The Shared Memory Alternative
There's a fourth pattern that's been gaining traction: blackboard systems. Instead of agents passing messages directly to each other, a central shared memory space acts as a coordination layer. Agents post information to the blackboard. Other agents monitor it, decide whether they have relevant expertise, and contribute when they can.
Research on LLM-based blackboard systems shows this architecture outperforming direct message passing by 13% to 57% in end-to-end task success across data discovery tasks, with up to a 9% improvement in F1 score. The key advantage: decision-making shifts from a single coordinator to a distributed model where agents autonomously determine their own participation.
The tradeoff is real though. A single shared repository creates bottlenecks at scale. And there's a security problem: one malicious or faulty contribution can mislead every agent reading from the blackboard. Shared memory is powerful. It's also a single point of corruption.
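The core mechanic fits in a few lines. In this sketch, an agent watches the shared store and contributes only when its trigger condition matches what's posted; the agent logic and entry schema are invented for illustration.

```python
# Minimal blackboard: agents poll the shared store and decide for
# themselves whether to participate.
class Blackboard:
    def __init__(self):
        self.entries = []

    def post(self, author, kind, content):
        self.entries.append({"author": author, "kind": kind, "content": content})

    def latest(self, kind):
        for entry in reversed(self.entries):
            if entry["kind"] == kind:
                return entry
        return None

def sql_expert(board):
    # Contributes only if an unanswered question is in its domain.
    q = board.latest("question")
    if q and board.latest("answer") is None and "sql" in q["content"].lower():
        board.post("sql_expert", "answer", "Add an index on user_id")

board = Blackboard()
board.post("user", "question", "Why is this SQL query slow?")
for agent in (sql_expert,):  # each agent decides whether to act
    agent(board)
```

Notice there's no validation step between `post` and the readers: any agent that writes a bad entry poisons every agent that reads it, which is the single-point-of-corruption problem.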
Recent work frames multi-agent memory as a computer architecture problem, proposing a three-layer memory hierarchy with cache sharing across agents and structured access control. The analogy is apt: CPUs solved coherence problems decades ago with protocols like MESI. Multi-agent systems haven't solved them yet.
The Missing Layers: Why Current Protocols Aren't Enough
Fleming et al. proposed the most complete answer to the protocol gap in their Internet of Agents architecture (November 2025). Their argument: the entire communication stack is missing two layers.
Layer 8 (Agent Communication Layer) would standardize message envelopes and speech-act performatives like REQUEST and INFORM, plus interaction patterns like request-reply and publish-subscribe. Think of it as FIPA done right, with the flexibility that FIPA lacked.
Layer 9 (Agent Semantic Layer) would handle semantic grounding: binding terms to shared definitions, disambiguating incoming prompts, and providing primitives for consensus and coordination. This is the layer that lets agents reason about each other's intent rather than just parsing each other's JSON.
The IETF agrees the gap is real. Rosenberg's draft framework for AI agent protocols identifies agent discovery, credential management, and multimodal negotiation as areas needing standardization above MCP and A2A. This is the body that standardized HTTP and TCP/IP; the fact that it's now circling AI agent communication tells you how fundamental the problem is.
But proposals aren't shipping software. Layer 8 and Layer 9 exist on paper. The agents you're deploying today still communicate through string concatenation and prayer.
Failure Modes: What Actually Goes Wrong
The MAST study (Cemri et al., March 2025) provides the most rigorous failure analysis available. The researchers analyzed 1,642 execution traces across seven open-source multi-agent frameworks. The numbers are bad.
Failure rates ranged from 41% to 86.7% across frameworks. The MAST taxonomy identifies 14 failure modes clustered into three categories: system design issues, inter-agent misalignment, and task verification failures. Coordination breakdowns accounted for 36.9% of all failures.
The Google DeepMind and MIT scaling study (December 2025) quantified this further. After testing 180 agent configurations across GPT, Gemini, and Claude model families, their conclusion was blunt: more agents do not equal better performance. Adding agents degraded performance by up to 70% on some tasks. Coordination overhead scaled with interaction depth. Agents operated on progressively divergent world states. Errors cascaded through execution chains rather than being corrected.
The specific failure patterns:
- Error propagation: Independent agents amplified errors up to 17x when mistakes went unchecked. Centralized coordination limited propagation to roughly 4.4x because the orchestrator validated outputs.
- Coordination failures: Rates varied by architecture: independent agents showed 0%, centralized 1.8%, decentralized 3.2%, and hybrid architectures 12.4%.
- Performance beyond four agents: Accuracy gains saturate or fluctuate past the four-agent threshold without structured topology.
The practical takeaway from both studies: always use a manager node. Peer-to-peer voting systems are prone to error cascades. You need a central coordinator to review outputs, even if it creates a bottleneck.
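The manager-node pattern both studies point to can be sketched as follows: the coordinator validates each worker's output before it feeds the next stage, so a bad result gets retried instead of silently propagating. The validation predicate here is a stand-in for whatever check your domain actually allows.

```python
# Coordinator sketch: validate-then-forward, with bounded retries.
# The pipeline stages and validator are toy examples.
def coordinate(workers, task, validate, max_retries=2):
    results = []
    for worker in workers:
        output = worker(task)
        attempts = 0
        while not validate(output):
            attempts += 1
            if attempts > max_retries:
                raise ValueError(f"worker output failed validation: {output!r}")
            output = worker(task)  # retry rather than pass a bad output onward
        results.append(output)
        task = output              # the next worker builds on a checked result
    return results

# Toy pipeline: each stage appends its step; validation rejects empty output.
workers = [lambda t: t + ["extracted"], lambda t: t + ["summarized"]]
checked = coordinate(workers, [], validate=lambda o: len(o) > 0)
```

This is the mechanism behind the 17x-versus-4.4x error-propagation gap: the check sits between stages, so an error costs a retry instead of a cascade.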
The AgenticPay negotiation benchmark exposed a different class of failure. In over 110 multi-round buyer-seller tasks, even frontier models couldn't handle basic negotiation: over 40% of failures occurred when the price gap was within 5 units, meaning agents couldn't converge even when agreement was close. Performance gaps between buyer and seller roles revealed systematic asymmetries in how models handle negotiation.
Topology Matters More Than Protocol
MultiAgentBench (ACL 2025) evaluated four communication topologies: star, chain, tree, and graph. The results challenge conventional wisdom.
Graph-mesh topology, where all agents exchange messages pairwise, yielded the best task scores and planning efficiency at moderate token consumption. It outperformed both hierarchical and chain structures. But it only worked at small scale. The communication cost of all-to-all messaging grows quadratically with agent count.
Star topology (one coordinator, many workers) scaled better but created coordinator bottlenecks. Chain topology (sequential pipeline) eliminated coordination overhead but made error propagation unidirectional: upstream mistakes compounded downstream with no recovery path.
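The scaling difference is easy to quantify. Counting pairwise links per round for each topology:

```python
# Communication links per round for n agents under each topology.
def mesh_messages(n):  return n * (n - 1) // 2  # all-to-all: quadratic
def star_messages(n):  return n - 1             # worker-to-coordinator: linear
def chain_messages(n): return n - 1             # successor-only: linear, one-way

# At 5 agents the mesh costs 10 links per round; at 20 agents it costs
# 190, a 10x multiple of the star's 19.
costs = {n: (mesh_messages(n), star_messages(n)) for n in (5, 20)}
```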
The right topology depends on your constraints: how many agents, how much inter-dependence between tasks, and how much you can afford to spend on coordination tokens. There's no universal best answer. Anyone telling you otherwise is selling a framework.
For most production systems today, the practical sweet spot is a single well-prompted agent for tasks that don't require genuine parallelism, and a star topology with a strong coordinator when they do. Skip the graph mesh unless you're running fewer than five agents on tasks that genuinely require cross-pollination of reasoning.
What to Actually Use Today
The protocol space will consolidate. The ACP-A2A merger signals the direction. MCP will keep handling tool integration. A2A (now with ACP's peer-to-peer DNA) will handle task coordination. The semantic layer that Fleming proposed will take years to standardize, if it ever ships.
In the meantime, here's what works:
For tool integration: MCP. It's won. Thousands of integrations, backed by every major IDE and AI platform. Don't fight it.
For multi-agent task coordination: A2A if you need cross-vendor interoperability. Your framework's native patterns (AutoGen conversations, CrewAI delegation, LangGraph state machines) if you're staying within a single stack.
For shared context: Build your own blackboard. A shared database or document store that agents read from and write to. It's crude. It works better than passing increasingly long message chains between agents whose context windows are already struggling.
For production reliability: Keep agent count low. Use a coordinator. Monitor for error cascading. Accept that the communication protocols will improve and design your architecture so you can swap them out when they do.
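One way to make that swap cheap, sketched with invented names: agents call a small transport interface, and the protocol binding (A2A today, whatever wins later) lives behind it. The in-process binding here exists only so the sketch runs; a real one would speak A2A over HTTP.

```python
from typing import Protocol

class AgentTransport(Protocol):
    def send(self, receiver: str, payload: dict) -> dict: ...

class InProcessTransport:
    """Stand-in binding; a production one would hold the wire protocol."""
    def __init__(self, handlers: dict):
        self.handlers = handlers

    def send(self, receiver: str, payload: dict) -> dict:
        return self.handlers[receiver](payload)

def run_pipeline(transport: AgentTransport, doc: str) -> dict:
    # Application code never mentions the wire protocol.
    extracted = transport.send("extractor", {"doc": doc})
    return transport.send("summarizer", extracted)

handlers = {
    "extractor": lambda p: {"facts": [p["doc"].upper()]},
    "summarizer": lambda p: {"summary": " ".join(p["facts"])},
}
result = run_pipeline(InProcessTransport(handlers), "hello")
```

Swapping protocols then means writing one new `AgentTransport` implementation, not touching every agent.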
The honest assessment: we're in the dial-up era of agent communication. The protocols handle connections. They don't handle conversations. FIPA tried formal semantics too early. The IETF is trying again now. Somewhere between FIPA's rigidity and today's freeform chaos, there's a protocol layer that will let agents actually understand each other. We're not there yet.
Sources
- Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems - Yan et al. (2025)
- LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science - Blackboard architecture benchmarks
- Multi-Agent Memory from a Computer Architecture Perspective - Memory hierarchy proposal
- A Layered Protocol Architecture for the Internet of Agents - Fleming et al. (2025), Layer 8/9 proposal
- Why Do Multi-Agent LLM Systems Fail? - MAST failure taxonomy, 1,642 traces analyzed
- Towards a Science of Scaling Agent Systems - Google DeepMind and MIT scaling study
- MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents - ACL 2025, topology comparison
- AgenticPay Negotiation Benchmark - Multi-round negotiation task failures
- A2A Protocol Specification - Agent Cards and task lifecycle
- ACP Joins Forces with A2A Under the Linux Foundation - ACP-A2A merger
- A Survey of Agent Interoperability Protocols - MCP, ACP, A2A, ANP comparison
- Framework for AI Agent Protocols (IETF) - Rosenberg's agent protocol requirements
- IBM Agent Communication Protocol - ACP technical overview