Description
MMA2A extends A2A with modality-native routing: instead of converting all inter-agent communication to text, the layer inspects Agent Card capability declarations and routes voice, image, and text parts in their native formats.
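A minimal sketch of the routing idea, under stated assumptions: the `AgentCard`, `Part`, and `input_modes` names below are illustrative stand-ins, not the paper's or the A2A spec's actual schema. A part is forwarded natively only when the target agent declares its MIME type; otherwise it degrades to the text-only fallback that a conventional A2A layer always uses.

```python
from dataclasses import dataclass, field

# Illustrative stand-ins for Agent Card and message-part structures; the
# real A2A schemas carry more fields, and these names are assumptions.
@dataclass
class AgentCard:
    name: str
    input_modes: set[str] = field(default_factory=set)  # MIME types accepted natively

@dataclass
class Part:
    mime_type: str  # e.g. "text/plain", "image/png", "audio/wav"
    data: bytes

def route_part(part: Part, target: AgentCard) -> Part:
    """Forward the part natively if the target declares support for its
    MIME type; otherwise degrade to a text placeholder (the text-only
    baseline behaviour)."""
    if part.mime_type in target.input_modes:
        return part  # modality-native routing
    return Part("text/plain", f"[{part.mime_type} payload omitted]".encode())

# Example: a vision-capable agent receives the image natively, while the
# undeclared audio part falls back to a text placeholder.
vision_agent = AgentCard("defect-inspector", {"text/plain", "image/png"})
parts = [Part("image/png", b"\x89PNG..."), Part("audio/wav", b"RIFF...")]
routed = [route_part(p, vision_agent) for p in parts]
```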
Results
- 52% task completion vs 32% text-only baseline (+20 pp) on CrossModal-CS benchmark
- Vision-dependent tasks: +38.5 pp (defect reports), +16.7 pp (visual troubleshooting)
- Trade-off: 1.8× latency from native multimodal processing
- Key finding: routing alone is insufficient — downstream agents need capable reasoning to realize the gains
Relevance to Zeph
Zeph's `zeph-a2a` currently serializes all inter-agent messages as text/JSON. MMA2A demonstrates that adding capability-aware routing to AgentCard declarations enables non-text modalities (image, audio) to pass natively between agents.
Immediate actionable improvements:
- Add a `capabilities` field to Zeph's AgentCard / agent discovery (already partially supported)
- When routing a task, inspect the calling agent's capability declarations and preserve binary payloads (a sketch follows this list)
- No need for full MMA2A: even the first step (declaring image/audio capability) enables future routing
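A hedged sketch of how these steps could look in `zeph-a2a`'s serialization path, assuming a hypothetical `capabilities` shape and a `serialize_part` helper; Zeph's actual AgentCard/discovery schema and wire format may differ. The point is only that binary payloads survive when the peer declares the MIME type, while everything else keeps the existing text/JSON behaviour.

```python
import base64
import json
from typing import Any

# Hypothetical AgentCard entry with a capabilities block; zeph-a2a's real
# discovery schema may name these fields differently.
EXAMPLE_CARD: dict[str, Any] = {
    "name": "zeph-vision",
    "capabilities": {
        "input_modes": ["text/plain", "image/png", "image/jpeg"],
        "output_modes": ["text/plain"],
    },
}

def serialize_part(part: dict[str, Any], peer_card: dict[str, Any]) -> dict[str, Any]:
    """Serialize one message part for the wire.

    Text parts keep the current text/JSON path; binary parts are carried
    as base64 only when the peer declares the MIME type, otherwise they
    fall back to the existing text-only behaviour.
    """
    mime = part["mime_type"]
    supported = peer_card.get("capabilities", {}).get("input_modes", [])
    if mime.startswith("text/") or mime not in supported:
        return {"mime_type": "text/plain",
                "text": part.get("text", f"[{mime} payload omitted]")}
    return {"mime_type": mime,
            "data_b64": base64.b64encode(part["data"]).decode("ascii")}

# Example: an image part sent to a vision-capable peer keeps its bytes.
image_part = {"mime_type": "image/png", "data": b"\x89PNG..."}
print(json.dumps(serialize_part(image_part, EXAMPLE_CARD), indent=2))
```

Gating on the peer's declared `input_modes` keeps the change backward compatible: agents without a capabilities declaration continue to receive text/JSON exactly as today.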
Architecture Sketch
Source