feat: adaptive retry with model escalation for kanban dispatcher #30608
Closed
rodaddy wants to merge 1 commit into
- Add kanban.retry_model_escalation config key (dict, empty default, fully backward compatible). Maps model names to escalation targets so consecutive failures prompt a model upgrade on the next retry.
- Add model_escalation param to dispatch_once(); when consecutive_failures > 0, look up the task's current model_override (or empty string for profile default) and set the escalated target before spawning.
- Fix crash loop stickiness (NousResearch#30417): parentless tasks blocked by the circuit breaker (gave_up event) were re-promoted every tick because all([]) == True (vacuous truth on empty parent list). Now recompute_ready skips promotion for parentless blocked tasks when the latest event is gave_up with no subsequent unblocked event. Tasks WITH parents still auto-recover when their parents complete (original design preserved).
- Wire model_escalation from config in gateway/run.py dispatcher tick.
- 8 new tests in test_kanban_db.py covering escalation cases and crash loop fix. All 332 existing tests continue to pass.

Ref: NousResearch#30587, NousResearch#30417
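The escalation lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the helper name `next_model`, the model names in `ESCALATION`, and the "no entry means top of the chain" rule are all hypothetical stand-ins for whatever `dispatch_once()` does internally.

```python
from typing import Dict, Optional


def next_model(
    current: Optional[str],
    consecutive_failures: int,
    escalation: Optional[Dict[str, str]],
) -> Optional[str]:
    """Return the escalated model, or None when no escalation applies.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # First attempt, or no map configured: nothing to escalate.
    if not escalation or consecutive_failures <= 0:
        return None
    # An empty string stands for the profile default model.
    key = current or ""
    target = escalation.get(key)
    # Assumption: a missing entry (or a self-mapping) means the task is
    # already at the top of the chain.
    if not target or target == key:
        return None
    return target


# Hypothetical map: profile default -> small-model -> large-model.
ESCALATION = {"": "small-model", "small-model": "large-model"}
```

With this map, a task on the profile default that has failed once would move to `small-model`, a second failure would move it to `large-model`, and further failures would leave the model unchanged.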
Summary
Implements adaptive model escalation for the kanban dispatcher when tasks fail repeatedly, plus fixes the crash-loop stickiness bug (#30417).
What changed
1. Config key: kanban.retry_model_escalation (hermes_cli/config.py)
- Dict with an empty default (fully backward compatible) that maps model names to escalation targets; consulted when consecutive_failures > 0
2. Dispatch logic (hermes_cli/kanban_db.py)
- dispatch_once() gains a model_escalation: Optional[dict] parameter
- When consecutive_failures > 0, takes the task's current model (model_override, or empty string for profile default), looks it up in the map, and persists the escalated model back to the task row so each retry uses the upgraded model
- No escalation on the first attempt (consecutive_failures == 0), with an empty map, or when already at the top of the chain
3. Crash-loop stickiness fix (#30417) (hermes_cli/kanban_db.py)
- recompute_ready promotes blocked tasks when all parents are done; for parentless tasks all([]) == True (vacuous truth), so a circuit-breaker-blocked task was re-promoted every tick
- In recompute_ready, when a parentless blocked task has a gave_up event with no subsequent unblocked event, skip promotion; an explicit hermes kanban unblock is required to re-queue
4. Gateway wiring (gateway/run.py)
- Reads kanban.retry_model_escalation from config and passes it to dispatch_once on every tick
Tests
8 new tests in tests/hermes_cli/test_kanban_db.py. All 332 existing tests continue to pass.
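The vacuous-truth problem behind #30417 can be reproduced in isolation. The sketch below is illustrative only: `should_promote`, the status string "done", and the flat list of event kinds are assumptions, not the actual recompute_ready code or schema.

```python
from typing import List


def should_promote(parent_statuses: List[str], event_kinds: List[str]) -> bool:
    """Decide whether a blocked task may be promoted back to ready.

    Hypothetical helper; event_kinds is assumed to be ordered oldest
    to newest.
    """
    if parent_statuses:
        # Tasks with parents recover once every parent is done.
        return all(status == "done" for status in parent_statuses)
    # Parentless task: all([]) is True, so the parent check alone would
    # always promote. Look at the most recent gave_up/unblocked event.
    for kind in reversed(event_kinds):
        if kind == "unblocked":
            return True
        if kind == "gave_up":
            return False
    # Assumption: with neither event present, keep the earlier behavior.
    return True


# The root cause in one line: an empty parent list passes the check.
assert all([]) is True
```

Under these assumptions, a parentless task whose latest relevant event is gave_up stays blocked until an unblocked event is recorded, while tasks with parents keep the original auto-recovery.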
How to verify
Closes #30587
Fixes #30417