research(orchestration): error cascades in multi-agent DAGs — genealogy-graph defense raises success rate 0.32→0.89 (arXiv:2603.04474) #2407

@bug-ops

Description

Source

arXiv:2603.04474 — "From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration" (submitted March 4, 2026)

Key Contribution

Models multi-agent collaboration as a directed dependency graph and identifies three cascade vulnerability classes: amplification, topological sensitivity, and consensus inertia. Proposes a genealogy-graph-based defense layer implemented as a message-level plugin, so no architectural changes are required. Raises the defense success rate from a 0.32 baseline to 0.89+.

Relevance to Zeph

zeph-a2a / orchestration: Zeph's A2A and DagScheduler chain agent outputs as inputs to downstream agents. A failed or hallucinated intermediate result can corrupt the entire DAG execution. The genealogy-graph plugin could wrap AgentRouter to track error provenance and abort cascades early.
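To make the provenance idea concrete, here is a minimal sketch of a genealogy graph that records which messages each message was derived from and traces a flagged output back to its earliest flagged ancestor. It is an illustration only, not the paper's mechanism and not Zeph code: `GenealogyGraph`, its methods, and the message IDs are all hypothetical names, and Python is used purely for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class GenealogyGraph:
    """Hypothetical message-lineage store; not an existing Zeph type."""
    parents: Dict[str, List[str]] = field(default_factory=dict)
    flagged: Set[str] = field(default_factory=set)

    def record(self, msg_id: str, derived_from: Optional[List[str]] = None) -> None:
        # Store the direct ancestors of a message as it passes between agents.
        self.parents[msg_id] = list(derived_from or [])

    def flag(self, msg_id: str) -> None:
        # Mark a message as failed or suspected of being hallucinated.
        self.flagged.add(msg_id)

    def ancestors(self, msg_id: str) -> Set[str]:
        # Iterative depth-first walk over parent edges.
        seen: Set[str] = set()
        stack = list(self.parents.get(msg_id, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, []))
        return seen

    def root_causes(self, msg_id: str) -> Set[str]:
        # Flagged messages in the lineage that have no flagged ancestor themselves.
        lineage = self.ancestors(msg_id) | {msg_id}
        candidates = lineage & self.flagged
        return {m for m in candidates if not (self.ancestors(m) & self.flagged)}


graph = GenealogyGraph()
graph.record("plan")
graph.record("search", ["plan"])
graph.record("summary", ["search"])
graph.record("report", ["summary", "plan"])
graph.flag("search")
graph.flag("summary")
roots = graph.root_causes("report")
print(roots)  # {'search'}
```

Keeping lineage at the message level means a wrapper only needs to call `record` on each handoff, which matches the "no architectural changes" framing above. How messages get flagged is left open here.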

Implementation Sketch

  • HandoffContext already carries task provenance; extend it with error lineage tracking
  • Add a cascade abort condition: if N consecutive nodes in a dependency chain exceed an error threshold, abort the DAG and surface the root failure
  • Log cascade paths in the orchestration audit log for post-mortem analysis
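The abort condition in the list above could look roughly like the sketch below. It assumes each node has a single upstream parent and a numeric error score in [0, 1]; a real DAG node can have several parents, and how the score is produced is not specified in this issue. `NodeResult`, `find_cascade`, and the parameter names are hypothetical and not part of Zeph's API. The check is meant to run after each node completes, so it only inspects the run of failing nodes that ends at the newest node.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeResult:
    node_id: str
    parent_id: Optional[str]  # upstream node in the chain, None for the first node
    error_score: float        # hypothetical per-node error estimate in [0, 1]


def find_cascade(results: Dict[str, NodeResult], leaf_id: str,
                 threshold: float, max_consecutive: int) -> Optional[List[str]]:
    """Return the failing chain ending at leaf_id (root failure first), or None."""
    path: List[str] = []
    current = results.get(leaf_id)
    # Walk upstream while each node exceeds the error threshold.
    while current is not None and current.error_score > threshold:
        path.append(current.node_id)
        current = results.get(current.parent_id) if current.parent_id else None
    if len(path) >= max_consecutive:
        return list(reversed(path))
    return None


results = {
    "fetch": NodeResult("fetch", None, 0.1),
    "parse": NodeResult("parse", "fetch", 0.7),
    "plan": NodeResult("plan", "parse", 0.8),
    "act": NodeResult("act", "plan", 0.9),
}
cascade = find_cascade(results, "act", threshold=0.5, max_consecutive=3)
if cascade is not None:
    # The first entry is the earliest failing node; the full path is what
    # would be written to the audit log.
    print("abort DAG, root failure:", cascade[0], "path:", " -> ".join(cascade))
```

Returning the whole path rather than a boolean lets the same call feed both the abort decision and the post-mortem log entry.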

Priority Assessment

P3 (research) — Relevant as orchestration DAGs grow in depth. Implement when multi-agent cascade failures are observed in production.

Metadata

Labels

P3 (Research, medium-high complexity), research (Research-driven improvement)
