feat: adaptive retry with model escalation for kanban dispatcher #30608

Closed
rodaddy wants to merge 1 commit into NousResearch:main from rodaddy:skippy/kanban-adaptive-retry

@rodaddy commented May 22, 2026

Summary

Implements adaptive model escalation for the kanban dispatcher when tasks fail repeatedly, plus fixes the crash-loop stickiness bug (#30417).

What changed

1. Config key: kanban.retry_model_escalation (hermes_cli/config.py)

  • New dict key, empty default -- fully backward compatible
  • Maps model names to escalation targets; applies when consecutive_failures > 0
  • Example:
    retry_model_escalation:
      sonnet4.6-off: sonnet4.6-low
      sonnet4.6-low: opus4.6-high
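
To make the example map concrete, here is a minimal sketch of the chain a task would walk through across retries. `escalation_chain` and the `RETRY_MODEL_ESCALATION` constant are hypothetical names used only for illustration, and the cycle guard is an assumption: the PR description does not say how a cyclic map is handled.

```python
from typing import Dict, List

# Hypothetical in-memory form of the retry_model_escalation example above.
RETRY_MODEL_ESCALATION: Dict[str, str] = {
    "sonnet4.6-off": "sonnet4.6-low",
    "sonnet4.6-low": "opus4.6-high",
}


def escalation_chain(start: str, mapping: Dict[str, str]) -> List[str]:
    """Return the models a task would step through, beginning with `start`."""
    chain = [start]
    seen = {start}
    current = start
    while current in mapping:
        current = mapping[current]
        if current in seen:
            # Assumption: stop on a cycle instead of looping forever.
            break
        chain.append(current)
        seen.add(current)
    return chain
```

With the example map, a task that starts on `sonnet4.6-off` would move to `sonnet4.6-low` after its first failure and to `opus4.6-high` after the next; `opus4.6-high` has no entry, so it is the top of the chain.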

2. Dispatch logic (hermes_cli/kanban_db.py)

  • dispatch_once() gains a model_escalation: Optional[dict] parameter
  • Before spawning, resolves the task's current model (model_override, or an empty string for the profile default), looks it up in the map, and persists the escalated model to the task row so that each retry uses the upgraded model
  • No-op on first spawn (consecutive_failures == 0), empty map, or when already at the top of the chain
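
The per-dispatch step described above can be sketched roughly as follows. This is not the code from hermes_cli/kanban_db.py: the `Task` dataclass and `apply_escalation` are hypothetical stand-ins, and mutating the dataclass stands in for the write back to the task row.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a kanban task row."""
    model_override: Optional[str] = None
    consecutive_failures: int = 0


def apply_escalation(task: Task, model_escalation: Optional[Dict[str, str]]) -> bool:
    """Escalate task.model_override in place; return True if it changed."""
    # No-op on an empty or missing map, and on the first spawn.
    if not model_escalation or task.consecutive_failures == 0:
        return False
    # An unset override is looked up as "" (the profile default).
    current = task.model_override or ""
    target = model_escalation.get(current)
    # No entry means the task is already at the top of the chain.
    if target is None or target == current:
        return False
    task.model_override = target  # stands in for persisting to the task row
    return True
```

Because the escalated model is stored on the task, the next failure looks up the already-upgraded model, so each retry moves one step further along the chain.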

3. Crash-loop stickiness fix (#30417) (hermes_cli/kanban_db.py)

  • Root cause: recompute_ready promotes blocked tasks when all parents are done; for parentless tasks all([]) == True (vacuous truth), so a task blocked by the circuit breaker was re-promoted on every tick
  • Fix: in recompute_ready, when a parentless blocked task has a gave_up event with no subsequent unblocked event, skip promotion -- requires explicit hermes kanban unblock to re-queue
  • Tasks WITH parents still auto-recover when their parents complete (original design preserved)
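
The vacuous-truth problem and the guard can be illustrated with a small sketch. The function names, the status string "done", and the representation of events as a chronological list of kind strings are assumptions for illustration; the real recompute_ready works against the database.

```python
from typing import Sequence


def parents_done(parent_statuses: Sequence[str]) -> bool:
    """True when every parent is done; also True for an empty parent list."""
    return all(status == "done" for status in parent_statuses)


def gave_up_is_sticky(event_kinds: Sequence[str]) -> bool:
    """True if a gave_up event has no later unblocked event.

    `event_kinds` is assumed to be in chronological order.
    """
    sticky = False
    for kind in event_kinds:
        if kind == "gave_up":
            sticky = True
        elif kind == "unblocked":
            sticky = False
    return sticky


def should_promote(parent_statuses: Sequence[str], event_kinds: Sequence[str]) -> bool:
    """Decide whether a blocked task may be promoted back to ready."""
    if not parents_done(parent_statuses):
        return False
    # The fix: a parentless task that gave up stays blocked until unblocked.
    if not parent_statuses and gave_up_is_sticky(event_kinds):
        return False
    return True
```

Without the second check, `parents_done([])` alone would return True for a parentless task on every tick, which is the re-promotion loop the fix removes.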

4. Gateway wiring (gateway/run.py)

  • Reads kanban.retry_model_escalation from config and passes it to dispatch_once on every tick
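
A defensive way to read the key before handing it to dispatch_once might look like the sketch below. `read_escalation_map` is a hypothetical helper, and treating the loaded config as a nested mapping is an assumption; the PR does not show the actual code in gateway/run.py.

```python
from typing import Any, Dict, Mapping


def read_escalation_map(config: Mapping[str, Any]) -> Dict[str, str]:
    """Return kanban.retry_model_escalation as a str-to-str dict, or {}."""
    kanban = config.get("kanban")
    if not isinstance(kanban, Mapping):
        return {}
    raw = kanban.get("retry_model_escalation")
    if not isinstance(raw, Mapping):
        # Missing key or wrong type: fall back to the empty default.
        return {}
    return {str(source): str(target) for source, target in raw.items()}
```

Returning an empty dict when the key is absent matches the empty default described above, so existing configs keep their current behavior.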

Tests

8 new tests in tests/hermes_cli/test_kanban_db.py:

  • Model escalation: empty map, first-spawn no-op, already-at-top-of-chain, escalation from None override, escalation from set override
  • Crash loop: parentless gave_up stays blocked, explicit unblock re-queues, gave_up with done parents still promotes

All 332 existing tests continue to pass.

How to verify

source venv/bin/activate
python -m pytest tests/hermes_cli/test_kanban_db.py -k 'escalat or gave_up' -v
python -m pytest tests/hermes_cli/test_kanban_db.py tests/hermes_cli/test_kanban_core_functionality.py -q

Closes #30587
Fixes #30417

- Add kanban.retry_model_escalation config key (dict, empty default,
  fully backward compatible). Maps model names to escalation targets
  so consecutive failures prompt a model upgrade on the next retry.

- Add model_escalation param to dispatch_once(); when consecutive_failures
  > 0, look up the task's current model_override (or empty string for
  profile default) and set the escalated target before spawning.

- Fix crash loop stickiness (NousResearch#30417): parentless tasks blocked by the
  circuit breaker (gave_up event) were re-promoted every tick because
  all([]) == True (vacuous truth on empty parent list). Now recompute_ready
  skips promotion for parentless blocked tasks when the latest event is
  gave_up with no subsequent unblocked event. Tasks WITH parents still
  auto-recover when their parents complete (original design preserved).

- Wire model_escalation from config in gateway/run.py dispatcher tick.

- 8 new tests in test_kanban_db.py covering escalation cases and crash
  loop fix. All 332 existing tests continue to pass.

Ref: NousResearch#30587, NousResearch#30417
@alt-glitch added labels type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/cli (CLI entry point, hermes_cli/, setup wizard), and comp/cron (Cron scheduler and job management) on May 22, 2026
@rodaddy closed this on May 22, 2026
@rodaddy deleted the skippy/kanban-adaptive-retry branch on May 22, 2026 21:45