
research(a2a): MMA2A — multimodal-native routing in A2A networks (arXiv:2604.12213) #3326

@bug-ops

Description

MMA2A extends A2A with modality-native routing: instead of converting all inter-agent communication to text, the layer inspects Agent Card capability declarations and routes voice, image, and text parts in their native formats.
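As a rough sketch of what capability inspection looks like, the router reads each Agent Card's declared input MIME types and selects agents that can consume a part natively. The field name `default_input_modes` follows the A2A AgentCard convention of MIME-type input mode lists; the exact schema here is illustrative:

```python
# Sketch of modality-native candidate selection from Agent Card
# declarations. The card shape is illustrative, not a real SDK type.
from dataclasses import dataclass

@dataclass
class AgentCard:
    name: str
    # MIME types the agent accepts natively, per its capability declaration
    default_input_modes: list[str]

def native_candidates(part_mime: str, cards: list[AgentCard]) -> list[str]:
    """Return agents that can consume this part without text conversion."""
    return [c.name for c in cards if part_mime in c.default_input_modes]

cards = [
    AgentCard("triage", ["text/plain"]),
    AgentCard("vision-qa", ["text/plain", "image/png", "image/jpeg"]),
    AgentCard("voice-support", ["text/plain", "audio/wav"]),
]

print(native_candidates("image/png", cards))   # ['vision-qa']
print(native_candidates("text/plain", cards))  # all three agents
```

A text-only network degenerates to the case where every card declares only `text/plain`, which is effectively the baseline the paper compares against.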

Results

  • 52% task completion vs 32% text-only baseline (+20 pp) on CrossModal-CS benchmark
  • Vision-dependent tasks: +38.5 pp (defect reports), +16.7 pp (visual troubleshooting)
  • Trade-off: 1.8× latency from native multimodal processing
  • Key finding: routing alone is insufficient — downstream agents need capable reasoning to realize the gains

Relevance to Zeph

Zeph's zeph-a2a currently serializes all inter-agent messages as text/JSON. MMA2A demonstrates that capability-aware routing driven by AgentCard declarations lets non-text parts (image, audio) pass between agents in their native formats.

Immediate actionable improvements:

  1. Add capabilities field to Zeph's AgentCard / agent discovery (already partially supported)
  2. When routing a task, inspect calling agent's capability declarations and preserve binary payloads
  3. No need for full MMA2A — even step 1 (declaring image/audio capability) enables future routing
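Step 2 could look roughly like the sketch below: preserve the binary payload when the target agent declares the part's MIME type, otherwise fall back to today's serialize-to-text path. The part/card shapes and `describe_as_text` are illustrative placeholders, not Zeph's actual API:

```python
# Sketch of capability-aware part routing (step 2 above). Shapes and
# helper names are hypothetical placeholders, not Zeph's real API.

def describe_as_text(part: dict) -> str:
    # Placeholder for Zeph's current lossy serialize-to-text behavior.
    return f"[{part['mime_type']} part, {len(part['data'])} bytes]"

def route_part(part: dict, target_input_modes: list[str]) -> dict:
    """Pass a part through natively if the target declares its MIME type;
    otherwise degrade to the text path Zeph uses today."""
    if part["mime_type"] in target_input_modes:
        return part  # binary payload preserved byte-for-byte
    return {"mime_type": "text/plain", "data": describe_as_text(part)}

png = {"mime_type": "image/png", "data": b"\x89PNG..."}
print(route_part(png, ["text/plain", "image/png"])["mime_type"])  # image/png
print(route_part(png, ["text/plain"])["mime_type"])               # text/plain
```

Note the fallback keeps the system working with text-only agents, so the capability field can land first (step 1) with no behavior change until routing starts honoring it.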

Architecture Sketch

Source

Metadata

Labels

P3 (Research — medium-high complexity)
research (Research-driven improvement)
