
research(a2a): MMA2A — multimodal-native routing in A2A networks (arXiv:2604.12213) #3326

@bug-ops

Description

MMA2A extends A2A with modality-native routing: instead of converting all inter-agent communication to text, the layer inspects Agent Card capability declarations and routes voice, image, and text parts in their native formats.
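As a rough sketch of what capability inspection looks like, the router reads each Agent Card's declared input MIME types and selects agents that can consume a part natively. The field name `default_input_modes` follows the A2A AgentCard convention of MIME-type input mode lists; the exact schema here is illustrative:

```python
# Sketch of modality-native candidate selection from Agent Card
# declarations. The card shape is illustrative, not a real SDK type.
from dataclasses import dataclass

@dataclass
class AgentCard:
    name: str
    # MIME types the agent accepts natively, per its capability declaration
    default_input_modes: list[str]

def native_candidates(part_mime: str, cards: list[AgentCard]) -> list[str]:
    """Return agents that can consume this part without text conversion."""
    return [c.name for c in cards if part_mime in c.default_input_modes]

cards = [
    AgentCard("triage", ["text/plain"]),
    AgentCard("vision-qa", ["text/plain", "image/png", "image/jpeg"]),
    AgentCard("voice-support", ["text/plain", "audio/wav"]),
]

print(native_candidates("image/png", cards))   # ['vision-qa']
print(native_candidates("text/plain", cards))  # all three agents
```

A text-only network degenerates to the case where every card declares only `text/plain`, which is effectively the baseline the paper compares against.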

Results

  • 52% task completion vs 32% text-only baseline (+20 pp) on CrossModal-CS benchmark
  • Vision-dependent tasks: +38.5 pp (defect reports), +16.7 pp (visual troubleshooting)
  • Trade-off: 1.8× latency from native multimodal processing
  • Key finding: routing alone is insufficient — downstream agents need capable reasoning to realize the gains

Relevance to Zeph

Zeph's zeph-a2a currently serializes all inter-agent messages as text/JSON. MMA2A demonstrates that capability-aware routing driven by AgentCard declarations lets non-text parts (image, audio) pass between agents in their native formats.

Immediate actionable improvements:

  1. Add capabilities field to Zeph's AgentCard / agent discovery (already partially supported)
  2. When routing a task, inspect calling agent's capability declarations and preserve binary payloads
  3. No need for full MMA2A — even step 1 (declaring image/audio capability) enables future routing
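Step 2 could look roughly like the sketch below: preserve the binary payload when the target agent declares the part's MIME type, otherwise fall back to today's serialize-to-text path. The part/card shapes and `describe_as_text` are illustrative placeholders, not Zeph's actual API:

```python
# Sketch of capability-aware part routing (step 2 above). Shapes and
# helper names are hypothetical placeholders, not Zeph's real API.

def describe_as_text(part: dict) -> str:
    # Placeholder for Zeph's current lossy serialize-to-text behavior.
    return f"[{part['mime_type']} part, {len(part['data'])} bytes]"

def route_part(part: dict, target_input_modes: list[str]) -> dict:
    """Pass a part through natively if the target declares its MIME type;
    otherwise degrade to the text path Zeph uses today."""
    if part["mime_type"] in target_input_modes:
        return part  # binary payload preserved byte-for-byte
    return {"mime_type": "text/plain", "data": describe_as_text(part)}

png = {"mime_type": "image/png", "data": b"\x89PNG..."}
print(route_part(png, ["text/plain", "image/png"])["mime_type"])  # image/png
print(route_part(png, ["text/plain"])["mime_type"])               # text/plain
```

Note the fallback keeps the system working with text-only agents, so the capability field can land first (step 1) with no behavior change until routing starts honoring it.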

Architecture Sketch

Source

Metadata

Labels

P3 (Research — medium-high complexity)
research (Research-driven improvement)
