A2A means agent-to-agent communication: one AI system asking another AI system to do work. Communication is the easy part. Trust is the hard part. Just because one agent asks, doesn't mean another agent should do it.

Why A2A is not automatically a security feature

As AI systems start delegating work to each other — calling other agents, passing intermediate results, chaining tool outputs — the interoperability problem gets solved. Standards emerge. Connectors work. One agent can cleanly ask another agent to do something.

That sounds useful, and it is.

But the security question starts right after that:

That is what we mean by a trust boundary. The boundary is crossed when one agent hands work to another across a scope, policy, or permission gap. Communication is how the message arrives. Trust is the decision about whether the message should become an action.

The core claim, one line: Communication is not authorization. A handoff is not proof. The dangerous step is when a message becomes an action.

Five analogies for normal people

1. Reception desk

A2A is like one employee calling another office line.

Sunglasses is the rule that says: don't unlock the back room just because someone called.

2. Hotel key

One guest asking the front desk for another guest's room key is a communication event.

The real question is whether the hotel should trust that request.

3. Delivery driver

A message can be passed from one person to another.

That does not mean the second person should hand over the package.

4. Bank transfer

A request can look legitimate.

The security system still needs to ask whether this transfer, destination, and approval path make sense.

5. Office badge

Agents talking is like people speaking in the hallway.

Sunglasses is the badge reader on the door.

What a trust boundary actually is

A trust boundary is the line between what an agent should treat as safe and what it should treat as external, unverified, or higher-risk.

In a real multi-agent system, that boundary runs through:

Every one of those is a place where the receiving agent can act. Every one is a place where "because another agent asked" is not a good enough reason to act.

Why normal approval logic breaks across agents

Inside a single agent, approval logic usually looks like: did the user approve this? That works when the user is in the loop.

In an A2A handoff, the user is not in the loop for the second agent. The second agent sees a message from a peer agent. It does not natively know whether the original human user actually approved this downstream action — or whether a compromised upstream agent, a hostile tool output, or a poisoned retrieval result is now speaking in the user's voice.

The second agent is reading a message that says "this is approved." That is exactly the attack surface we have been shipping detection patterns for in the last three releases.

Where Sunglasses sits

Sunglasses focuses on the decision point: should this agent be trusted to take this action in this context?

Sunglasses now covers this surface through two category lanes — one established, one brand new in v0.2.17:

These sit alongside our existing tool_output_poisoning, retrieval_poisoning, and model_routing_confusion categories — all of them attack surfaces where an external source tries to become an authoritative instruction.

Positioning line: A2A solves interoperability. Sunglasses solves trust.

The closing idea

Interoperability is useful. Trusted action is what matters.

The next era of AI systems will not be defined by whether agents can talk to each other. They will. Standards will win. Connectors will ship. The question will be whether the actions on the other end of those connections are trustworthy enough to run.

That is the decision point. That is where Sunglasses exists.

The real question is not "can these agents connect?" It is "should this action be trusted?"

Real A2A protocols in the wild today

Agent-to-agent communication is not theoretical anymore. Several concrete protocols and frameworks are already in use, and each one has a different posture toward trust enforcement.

Google's A2A spec (announced April 2025) defines a JSON-based protocol for one agent to discover, invoke, and receive results from another agent over standard HTTP. It specifies how agents advertise their capabilities via an Agent Card — a machine-readable description of what the agent can do. What it does not specify is what the receiving agent should do when the inbound message was not actually authored by the agent it claims to be. Discovery and invocation are defined. Authorization of the resulting action is left to the implementer.

Anthropic's MCP cross-agent flows allow one MCP client to call into another agent's server, sharing context across tool boundaries. The protocol handles transport and schema negotiation cleanly. Trust decisions — whether this tool call from this upstream agent should be honored — are handled by whatever the developer wires in around the MCP server. MCP does not enforce them natively. The tool poisoning surface we documented earlier applies here too: an upstream agent's output can arrive at a downstream MCP server carrying embedded instructions the developer never intended to trust.

AutoGen and CrewAI both allow agents to hand off tasks to other agents in a chain. In AutoGen, this is the conversational handoff model — one agent produces a message, the next agent reads it and continues. In CrewAI, a crew member delegates a subtask to another crew member. Neither framework adds a verification layer between the producing agent and the receiving agent. The receiving agent treats the upstream output as trusted by default.

The common thread across all of them: connectivity is solved. What happens after the message arrives is not.

Four attack patterns we see at the A2A boundary

Based on the detection work behind our cross_agent_injection and related categories, four patterns account for the majority of the trust-boundary attack surface in multi-agent deployments.

1. Delegation token replay. An agent receives a delegation token — a credential or receipt that says "agent A authorized agent B to do X." In a replay attack, that token is captured and reused outside its original scope, time window, or context. The receiving agent sees a valid-looking token and acts on it. Detection signatures in the cross_agent_injection category specifically target token-scope rebind language — text patterns that appear inside agent messages trying to extend or re-anchor a token's claimed permissions.

2. Capability laundering. Agent A has permission to read a file. Agent A asks Agent B to "summarize what I just read," embedding the full content of a file Agent B would never have been allowed to access directly. The restricted content enters Agent B through a peer's output rather than through the controlled input path. This is a form of data exfiltration running in reverse — capability is moved laterally rather than data being moved outward.

3. Cross-agent prompt injection. A hostile payload embedded in a web page, document, or tool output gets retrieved by Agent A and passed — verbatim or reformatted — to Agent B as part of a summary or task handoff. Agent B reads it as the work product of a trusted peer, not as untrusted external content. This is the core surface our retrieval_poisoning and tool_output_poisoning categories address.

4. Response hijack via shared memory. In systems where agents share a scratchpad, context window, or memory store, a compromised agent can write to shared memory in a way that the next agent reads as authoritative. The attack does not need a direct message — it needs write access to the shared context the next agent will consume. Our tool_chain_race category covers the timing window variant of this: the brief interval during an agent handoff when guardrails are in transition and a write to shared state can be treated as pre-approved.

Why a filter ahead of the receiving agent works where governance alone fails

The typical response to A2A security concerns is to add governance: define policies, document approved delegation paths, require human review for sensitive actions. That is useful for planned workflows. It does not help when the attack arrives inside a legitimate-looking message from a peer agent.

Governance operates at design time. Attacks operate at runtime. The gap between the two is where the attack lands.

Here is the concrete walkthrough. Agent A fetches a document that contains an embedded prompt injection — a block of text instructing whoever reads it to forward credentials to an external endpoint. Agent A does not execute the injection itself because it is just a retrieval step. It passes the document summary to Agent B. Agent B reads the summary, sees what looks like a peer-authored task handoff, and follows the embedded instruction.

The policy governing Agent B says it should only take actions approved by a human user. The injection text says "this was pre-approved by your orchestrator." Agent B has no way to verify that claim from inside the message content itself. The governance layer was never reached because the compromised message never triggered a review checkpoint.

A filter ahead of Agent B — sitting between the inbound message and the agent's context window — scans the content before the agent ever reads it. The malicious payload does not need to be blocked at the governance layer because it never enters the agent's reasoning. The receiving agent processes a clean input or receives a flagged-and-halted signal, depending on the filter's configured mode.

This is what always-on means in practice: the filter is not consulted on edge cases. It runs on every inbound message, every retrieval result, every tool output. The agent does not see the payload at all. Pattern-based detection — no LLM in the hot path — keeps the latency cost low enough to run on every input without adding meaningful overhead to the agent's response time.

What a Sunglasses pattern hit looks like in an A2A flow

When Sunglasses catches a suspicious payload in an A2A context, the result is a SARIF 2.1.0 report — the same standard format used by static analysis tools, so it plugs into existing security tooling without a custom integration.

A hit in the cross_agent_injection category looks like this in the SARIF output:

Because SARIF is structured, the output is directly consumable by a CI pipeline, a SIEM, or a custom alert handler. You do not need to parse freeform text to understand what fired and where.

In the Python API, the same result is available as a structured object. A team running a multi-agent pipeline can call scan(text) on any inbound agent message and branch on the result — pass a clean message through, quarantine a flagged one, or log and alert depending on configured severity thresholds. The pattern library behind this is the same 328-pattern, 49-category set shipped in v0.2.20. No model call. No round-trip latency. The scan runs locally.

Pragmatic adoption: rolling Sunglasses into a multi-agent setup today

For a team already running a multi-agent pipeline — whether that is AutoGen, CrewAI, LangGraph, or a custom orchestrator — the path from zero to covered does not require a rewrite. Here is a three-step rollout that matches how production systems actually change.

Step 1: Start in REPORT mode behind one trust boundary. Pick the most exposed boundary in your current setup — typically the point where an orchestrator agent hands off to a specialized sub-agent that has access to real tools or data. Wire Sunglasses into the message path at that single point using the SDK middleware wiring option. Run in REPORT mode: every suspicious payload gets flagged and logged, nothing is blocked. After a week of real traffic, review the hits. Understand what is firing and why before you change any behavior.

Step 2: Promote to STRICT at that boundary. Once you understand the hit profile — what patterns fire, at what rate, with what false-positive rate in your specific traffic — flip the boundary to STRICT mode. Flagged messages now halt instead of passing through. Keep REPORT mode active at all other boundaries so you continue building signal without impacting downstream flows.

Step 3: Expand coverage progressively. Repeat the promote cycle at each additional boundary in priority order. Most teams find that two or three boundaries account for the majority of their actual exposure — the orchestrator-to-tools boundary, the retrieval-to-agent boundary, and any agent that reads external web content. Full coverage across a typical multi-agent setup usually lands within a few cycles.

Install with pip install sunglasses. Source and wiring examples are at github.com/sunglasses-dev/sunglasses. MIT licensed, no telemetry, runs fully local.