feat: structured failure diagnosis in RecoveryResult #706

@Aureliolo

Description

Context

A deep dive on Hive (aden-hive) revealed that its self-healing mechanism starts with structured failure diagnosis. It records not just "task failed" but which specific node failed, which criteria that node fell short on, and the full decision log.

SynthOrg's RecoveryResult currently lacks this structure, so checkpoint recovery and task reassignment routing have less information to act on.

Action Items

  • Add failure_category enum to RecoveryResult (e.g., TOOL_FAILURE, STAGNATION, BUDGET_EXCEEDED, QUALITY_GATE_FAILED, TIMEOUT, DELEGATION_FAILED)
  • Add criteria_failed: list[str], listing which specific acceptance criteria were not met
  • Add stagnation_evidence: StagnationEvidence | None, linking stagnation detection data when applicable
  • Add failure_context: dict[str, Any], a structured bag for domain-specific failure data (tool error messages, provider errors, etc.)
  • Use the enriched diagnosis in checkpoint reconciliation messages
  • Use the enriched diagnosis for smarter task reassignment routing (e.g., tool failure -> reassign to an agent with different tool access)
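A minimal sketch of the proposed extension, under stated assumptions. It uses stdlib dataclasses to stay self-contained; the real model in engine/recovery.py may be built differently (e.g. on Pydantic). The FailureCategory members come from the action items. Everything else is hypothetical and chosen only for illustration: task_id and error_message stand in for RecoveryResult's existing fields, the StagnationEvidence fields and the "tool_name" key in failure_context are invented, and so are the reconciliation_message and pick_reassignment helpers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class FailureCategory(str, Enum):
    """Coarse reason a task failed (members taken from the action items)."""

    TOOL_FAILURE = "tool_failure"
    STAGNATION = "stagnation"
    BUDGET_EXCEEDED = "budget_exceeded"
    QUALITY_GATE_FAILED = "quality_gate_failed"
    TIMEOUT = "timeout"
    DELEGATION_FAILED = "delegation_failed"


@dataclass(frozen=True)
class StagnationEvidence:
    """Hypothetical stand-in for the real stagnation detector's output."""

    repeated_actions: int
    turns_without_progress: int


@dataclass(frozen=True)
class RecoveryResult:
    # Hypothetical pre-existing fields; the real model's fields may differ.
    task_id: str
    error_message: str
    # Proposed diagnosis fields. Defaults keep existing call sites valid.
    failure_category: FailureCategory | None = None
    criteria_failed: list[str] = field(default_factory=list)
    stagnation_evidence: StagnationEvidence | None = None
    failure_context: dict[str, Any] = field(default_factory=dict)


def reconciliation_message(result: RecoveryResult) -> str:
    """Hypothetical helper: build a checkpoint reconciliation message."""
    category = result.failure_category.value if result.failure_category else "unknown"
    parts = [f"Task {result.task_id} failed ({category}): {result.error_message}"]
    if result.criteria_failed:
        parts.append("Unmet criteria: " + ", ".join(result.criteria_failed))
    if result.stagnation_evidence is not None:
        ev = result.stagnation_evidence
        parts.append(
            f"Stagnation: {ev.repeated_actions} repeated actions over "
            f"{ev.turns_without_progress} turns without progress"
        )
    return "\n".join(parts)


def pick_reassignment(result: RecoveryResult, agents: dict[str, set[str]]) -> str | None:
    """Hypothetical routing policy for TOOL_FAILURE.

    Picks the first agent (by name, for determinism) whose tool set does not
    include the tool that failed. Other categories return None here.
    """
    if result.failure_category is not FailureCategory.TOOL_FAILURE:
        return None
    failed_tool = result.failure_context.get("tool_name")
    if failed_tool is None:
        return None  # no structured tool info, so nothing to route on
    for name, tools in sorted(agents.items()):
        if failed_tool not in tools:
            return name
    return None


# Example: a tool failure with one unmet acceptance criterion.
result = RecoveryResult(
    task_id="task-42",
    error_message="web_search returned HTTP 503",
    failure_category=FailureCategory.TOOL_FAILURE,
    criteria_failed=["cites at least two sources"],
    failure_context={"tool_name": "web_search", "status": 503},
)
print(reconciliation_message(result))
print(pick_reassignment(result, {"alice": {"web_search"}, "bob": {"file_read"}}))
```

Giving every new field a default means existing RecoveryResult(...) constructions keep working, which fits the "model extension, not a new module" scope. The routing rule is just one possible policy; categories such as BUDGET_EXCEEDED or TIMEOUT would likely need their own.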

Design Notes

This is a model extension, not a new module: it extends the existing RecoveryResult in engine/recovery.py. Low effort, with an immediate improvement to recovery quality.

References

Metadata

Assignees

No one assigned

    Labels

    prio:high (Important, should be prioritized)
    scope:small (Less than 1 day of work)
    spec:task-workflow (DESIGN_SPEC Section 6 - Task & Workflow Engine)
    type:feature (New feature implementation)
    v0.7 (Minor version v0.7)
    v0.7.0 (Patch release v0.7.0)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests