feat: two-layer health monitoring pipeline (judge + triage) for stagnation events #707

@Aureliolo

Context

A deep dive into Hive revealed a two-layer health monitoring design that prevents alert fatigue:

  1. Health Judge (sensitive): Timer-driven; reads tool logs, tracks steps_since_last_accept, and detects stalls and doom loops. Emits a structured EscalationTicket.
  2. Queen triage (conservative): Filters tickets before operator notification. Dismiss: low severity + transient. Intervene: high/critical + doom loop + stall > threshold.
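
The judge layer described above could be sketched roughly as follows. This is a minimal illustration, not Hive's or SynthOrg's actual code: the class name `HealthJudge`, the `record_step`/`check` methods, both thresholds, and the "identical consecutive tool calls" definition of a doom loop are all assumptions. Only `steps_since_last_accept` comes from the description.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HealthJudge:
    """Hypothetical sensitive first layer: flags anything that looks stuck."""

    stall_threshold: int = 5  # assumed: rejected steps in a row before flagging a stall
    loop_window: int = 3      # assumed: identical consecutive calls that count as a doom loop
    steps_since_last_accept: int = 0
    _recent_calls: deque = field(default_factory=deque)

    def record_step(self, tool_call: str, accepted: bool) -> None:
        """Update counters from one tool-log entry."""
        if accepted:
            self.steps_since_last_accept = 0
        else:
            self.steps_since_last_accept += 1
        self._recent_calls.append(tool_call)
        # Keep only the most recent `loop_window` calls.
        while len(self._recent_calls) > self.loop_window:
            self._recent_calls.popleft()

    def check(self) -> list[str]:
        """Meant to be called on a timer; returns detected causes (possibly none)."""
        causes = []
        if self.steps_since_last_accept >= self.stall_threshold:
            causes.append("stall")
        window_full = len(self._recent_calls) == self.loop_window
        if window_full and len(set(self._recent_calls)) == 1:
            causes.append("doom_loop")
        return causes
```

Because this layer is meant to be sensitive, it reports every cause it sees and leaves the decision about notifying anyone to the triage layer.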

SynthOrg's stagnation detector handles layer 1 but has no post-termination escalation pipeline or alert filtering.

Action Items

  • After a STAGNATION termination or repeated FAILED terminations, emit a structured health event to NotificationSink
  • Design the EscalationTicket model: severity, cause, evidence, steps_since_last_progress, stall_duration
  • Implement the triage filter: dismiss low-severity transient issues, escalate high/critical issues with evidence
  • Wire to the NotificationSink protocol (ntfy.sh research, 2026-03-14)
  • Consider: should triage be a lightweight rule-based filter or an LLM agent?
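
As a starting point for the rule-based option, the ticket model and triage filter might look like the sketch below. The field names follow the list above; everything else is an assumption made for illustration: the `Severity` levels, the `transient` flag, the `Decision` enum, the 300-second default threshold, and reading "doom loop + stall > threshold" as "either condition". Tickets that match neither rule are dismissed here, which is one conservative choice among several.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Decision(Enum):
    DISMISS = "dismiss"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class EscalationTicket:
    severity: Severity
    cause: str                     # e.g. "stall" or "doom_loop" (assumed vocabulary)
    evidence: tuple[str, ...]      # e.g. excerpts from tool logs
    steps_since_last_progress: int
    stall_duration: float          # seconds (assumed unit)
    transient: bool = False        # hypothetical flag, not in the field list above


def triage(ticket: EscalationTicket, stall_threshold_s: float = 300.0) -> Decision:
    """Conservative filter: escalate only high/critical tickets that carry evidence."""
    if ticket.severity <= Severity.LOW and ticket.transient:
        return Decision.DISMISS
    serious = ticket.severity >= Severity.HIGH
    stuck = ticket.cause == "doom_loop" or ticket.stall_duration > stall_threshold_s
    if serious and stuck and ticket.evidence:
        return Decision.ESCALATE
    # Assumption: anything not clearly serious is dropped rather than sent on.
    return Decision.DISMISS
```

A pure function like this is cheap to unit-test and to audit, which is the main argument for rules over an LLM agent; an agent could later replace `triage` behind the same signature if the rules prove too coarse.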

Design Notes

This maps directly to the NotificationSink protocol recommended in the ntfy.sh research (research log entry #33, 2026-03-14). The two-layer design prevents operator alert fatigue: the judge is sensitive, and the triage filter is conservative.
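
To make the wiring concrete, here is a hedged sketch of what a sink interface could look like. The shape of SynthOrg's real NotificationSink protocol is not given in this issue, so the `send` signature, `InMemorySink`, and `build_ntfy_request` are all hypothetical. The ntfy.sh part relies only on its documented publish mechanism: an HTTP POST to the topic URL with the message as the body and optional `Title` and `Priority` headers. The request is built but not sent.

```python
import urllib.request
from typing import Protocol


class NotificationSink(Protocol):
    """Assumed interface; the project's actual protocol may differ."""

    def send(self, title: str, body: str, priority: str) -> None: ...


class InMemorySink:
    """Test double that records notifications instead of delivering them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str, str]] = []

    def send(self, title: str, body: str, priority: str) -> None:
        self.sent.append((title, body, priority))


def build_ntfy_request(topic: str, title: str, body: str,
                       priority: str = "default") -> urllib.request.Request:
    """Build (but do not send) a publish request for an ntfy.sh topic."""
    return urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=body.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def notify_escalation(sink: NotificationSink, cause: str, evidence: str) -> None:
    """Forward one escalated health event to whichever sink is configured."""
    sink.send(title=f"Agent health: {cause}", body=evidence, priority="high")
```

Keeping the sink behind a protocol means the triage layer can be tested against `InMemorySink` while a production sink passes the built request to `urllib.request.urlopen`.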

Labels

  • prio:high (Important, should be prioritized)
  • scope:medium (1-3 days of work)
  • spec:task-workflow (DESIGN_SPEC Section 6 - Task & Workflow Engine)
  • type:feature (New feature implementation)
  • v0.7 (Minor version v0.7)
  • v0.7.8 (Patch release v0.7.8)
