kanban dispatcher: add circuit-breaker for repeated worker bails with identical block reason #29320

# Feature request: circuit-breaker on repeated worker bails with identical block reason

## Summary

When a kanban worker bails on an external blocker (e.g. saturated CI runners, a third-party API outage, an upstream dependency PR that has not merged), the dispatcher re-claims the task on its next tick. If the blocker persists, the worker bails again with the same block reason. This can loop for hundreds of cycles before the human in the loop notices, burning provider quota and flooding the kanban event history with identical "still blocked, unchanged" rows.

## Repro

1. Set up a kanban task whose work depends on an external condition the worker cannot fix (e.g. a PR waiting on CI checks that stay queued indefinitely because runners are saturated).
2. The worker claims the task, observes the unchanged external condition, and runs `kanban_block` with a reason like "CI queued, 0 progress, unchanged".
3. On its next tick the dispatcher re-claims the task; the worker observes the same condition and blocks again with the same reason.
4. The loop continues until something external changes or a human intervenes.

## Expected

After N consecutive bails (suggested default: N=5) with substantially identical block reasons, the dispatcher should:
- Auto-pause the task (status=`blocked`, no auto re-claim)
- Force an orchestrator handoff (or escalate if a handoff already exists and is past its SLA)
- Surface in `hermes kanban list` with a distinct diagnostic flag (e.g. `circuit_open`)

This is distinct from `max_retries` (which counts run failures, not voluntary bails) and from the triage-watcher handoff escalation (which triggers at 60 minutes but does not stop the re-claim loop).

## Actual

Observed in production: ~230 worker spawn cycles across 4 tasks over a ~7 hour window during a CI runner saturation event. Each spawn was a full agent boot + context load + situation re-discovery, all bailing in 24-30 seconds on the same unchanged external condition. The triage watcher *did* correctly escalate orchestrator handoffs at the 60-minute mark, but the dispatcher kept re-claiming the source tasks because nothing stopped it.

Sample event sequence (one task, abbreviated):
```
#107 blocked → "98th consecutive run — PR still queued, 0 progress, unchanged"
#108 blocked → "99th consecutive run — PR still queued, 0 progress, unchanged"
#109 blocked → "100th run — same CI infra wall, unchanged"
... (continues)
```
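The sample rows differ only in their run counter for the first two entries, while the third is worded differently. A minimal sketch of how reasons could be normalized and compared is below; `normalize_reason`, `similarity`, `same_blocker` and the 0.9 threshold are hypothetical names and values chosen for illustration, not part of the existing kanban code.

```python
import re
from difflib import SequenceMatcher

# Matches bare numbers and ordinals such as "98th" or "0" (hypothetical rule).
_COUNTER = re.compile(r"\b\d+(?:st|nd|rd|th)?\b")


def normalize_reason(reason: str) -> str:
    """Lowercase, drop run counters, and collapse everything that is not a letter."""
    text = _COUNTER.sub(" ", reason.lower())
    return re.sub(r"[^a-z]+", " ", text).strip()


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized block reasons."""
    return SequenceMatcher(None, normalize_reason(a), normalize_reason(b)).ratio()


def same_blocker(a: str, b: str, threshold: float = 0.9) -> bool:
    """Assumed rule: reasons count as identical at or above the threshold."""
    return similarity(a, b) >= threshold


r107 = "98th consecutive run — PR still queued, 0 progress, unchanged"
r108 = "99th consecutive run — PR still queued, 0 progress, unchanged"
r109 = "100th run — same CI infra wall, unchanged"

print(same_blocker(r107, r108))          # True: identical once counters are dropped
print(round(similarity(r108, r109), 2))  # lower score: the wording changed
```

Whether a reworded reason like #109 should still count toward the same streak depends on the threshold, which is why the matcher probably needs to be tunable.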

## Suspected cause

`kanban_block` records the reason but doesn't track how many consecutive blocks share the same reason on the task. The dispatcher's claim selection only considers `status=ready|blocked-with-unblock-time-passed` and doesn't penalize tasks that have repeatedly bailed on the same external condition.
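To make the suspected gap concrete, here is a rough sketch of the eligibility check as described above, next to a guarded variant. The `Task` fields and both function names are assumptions for illustration; the real dispatcher's data model may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Task:
    # Hypothetical fields; only `status` values come from the issue text.
    status: str
    unblock_at: Optional[datetime] = None
    consecutive_identical_bails: int = 0


def claimable_as_suspected(task: Task, now: datetime) -> bool:
    """Suspected current rule: ready, or blocked with the unblock time passed."""
    if task.status == "ready":
        return True
    return (
        task.status == "blocked"
        and task.unblock_at is not None
        and task.unblock_at <= now
    )


def claimable_with_breaker(task: Task, now: datetime, threshold: int = 5) -> bool:
    """Same rule, but skip tasks whose bail streak has reached the threshold."""
    return (
        claimable_as_suspected(task, now)
        and task.consecutive_identical_bails < threshold
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stuck = Task("blocked", unblock_at=now - timedelta(minutes=1),
             consecutive_identical_bails=5)
print(claimable_as_suspected(stuck, now))  # True: nothing stops the re-claim
print(claimable_with_breaker(stuck, now))  # False: circuit is open
```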

## Suggested implementation sketch

- Track `consecutive_identical_bails` counter on the task, incremented when a new block event's reason is fuzzy-matched to the prior one (or simply substring-matched on a normalized form)
- Reset counter when the block reason changes substantively, when a comment is added by a non-worker (human/orchestrator intervention), or when the task transitions through `done`
- At `consecutive_identical_bails >= N` (default 5, configurable), refuse to re-claim and emit a `circuit_open` diagnostic + force a triage-watcher handoff if one doesn't exist
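The three bullets above could be sketched roughly as follows. `BailTracker`, `reason_key`, `record_block` and `reset` are hypothetical names, and the exact-match comparison on a normalized key stands in for whatever fuzzy or substring matcher is eventually chosen.

```python
import re
from dataclasses import dataclass
from typing import Optional


def reason_key(reason: str) -> str:
    """Normalized comparison key: lowercase letters only, run counters dropped."""
    text = re.sub(r"\b\d+(?:st|nd|rd|th)?\b", " ", reason.lower())
    return re.sub(r"[^a-z]+", " ", text).strip()


@dataclass
class BailTracker:
    threshold: int = 5  # N from the proposal, assumed configurable
    consecutive_identical_bails: int = 0
    last_key: Optional[str] = None

    def record_block(self, reason: str) -> bool:
        """Record one block event; return True if the circuit is now open."""
        key = reason_key(reason)
        if key == self.last_key:
            self.consecutive_identical_bails += 1
        else:
            # The reason changed substantively: start a new streak.
            self.last_key = key
            self.consecutive_identical_bails = 1
        return self.circuit_open

    def reset(self) -> None:
        """Call on a non-worker comment or when the task passes through `done`."""
        self.consecutive_identical_bails = 0
        self.last_key = None

    @property
    def circuit_open(self) -> bool:
        return self.consecutive_identical_bails >= self.threshold


tracker = BailTracker(threshold=3)
for n in (4, 5, 6):
    opened = tracker.record_block(f"{n}th consecutive run, PR still queued")
print(opened)                # True after the third matching bail
tracker.reset()              # e.g. a human commented on the task
print(tracker.circuit_open)  # False
```

Once `circuit_open` is true, the dispatcher would skip the task and emit the `circuit_open` diagnostic; how that flag is persisted and surfaced is left open here.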

## Workaround

Operator-side: the orchestrator must manually monitor blocked tasks and intervene before the loop runs for hundreds of cycles. This is encoded as a checklist in the `kanban-orchestrator` skill, but it is easy to miss when the orchestrator is on a long side-quest.

## Related

- Triage watcher orchestrator-handoff SLA (60min) — works correctly but is downstream of the loop, not the loop itself
- `max_retries` on tasks — counts failures, not voluntary bails, so doesn't trip on this pattern

