fix(kanban): call kanban_block on iteration-budget exhaustion to prevent protocol violation #23228
Closed
liuhao1024 wants to merge 1 commit into
…ent protocol violation
When a kanban worker subprocess hits the iteration budget, the agent loop strips tools and asks the model for a summary. The model cannot call kanban_block itself at that point, so the process exits rc=0 without calling kanban_complete or kanban_block. The dispatcher detects this protocol violation as a fatal error, gives up after 1 failure, and strands downstream tasks.
Fix: after _handle_max_iterations() returns, check HERMES_KANBAN_TASK and call kanban_block with a reason describing the exhaustion. The dispatcher then sees a clean block transition instead of a protocol violation, and the task can be retried or escalated by a human.
Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget exhaustion without calling kanban_complete or kanban_block NousResearch#23216
f2f08ec to d19521b
Collaborator
Merged via salvage PR #23791. Your commits were cherry-picked onto current main with your authorship preserved in git log. Thanks for the fix!
Summary
When a kanban worker subprocess hits the iteration budget, the agent loop strips tools and asks the model for a summary via _handle_max_iterations(). The model cannot call kanban_block at that point (tools are gone), so the process exits with rc=0 without ever calling kanban_complete or kanban_block. The dispatcher correctly detects this as a protocol violation but treats it as fatal, giving up after 1 failure with effective_limit: 1 and stranding all downstream tasks.
Root cause
The iteration-exhaustion path in run_agent.py (line ~14944) calls _handle_max_iterations(), which makes a toolless API call for a summary. After the summary returns, the agent loop exits normally. There is no hook to notify the kanban dispatcher that the worker could not complete its task.
The kanban worker contract (call kanban_complete or kanban_block before exiting) lives in the kanban-worker SKILL prompt, but the iteration-exhaustion path bypasses the skill entirely: the model receives the summary directive from the agent loop, not from the skill.
Fix
After _handle_max_iterations() returns, check if HERMES_KANBAN_TASK is set (indicating the agent is running as a kanban worker). If so, call kanban_block via handle_function_call with a reason describing the exhaustion. The dispatcher then sees a clean block transition instead of a protocol violation, and the task can be retried or escalated by a human.
The kanban_block call is wrapped in a try/except so that a failure cannot crash the agent loop: if the block call fails, we log a warning and continue with the normal exit path.
Regression coverage
Two new tests in tests/run_agent/test_run_agent.py:
- test_kanban_block_called_on_iteration_exhaustion: sets HERMES_KANBAN_TASK, exhausts the iteration budget, and asserts that handle_function_call is called exactly once with kanban_block and the correct task_id/reason.
- test_no_kanban_block_when_not_in_kanban_mode: exhausts the iteration budget without HERMES_KANBAN_TASK set and asserts that kanban_block is never called (no spurious side effects).
Testing
tests/run_agent/test_run_agent.py pass (including the 2 new ones).
Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget exhaustion without calling kanban_complete or kanban_block — protocol violation strands downstream tasks #23216
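
For readers who want the shape of the guard described under Fix, here is a minimal standalone sketch. It is not the code from this PR: the function name block_kanban_task_on_exhaustion, the injected dispatch_tool callable (standing in for handle_function_call, whose real signature is not shown here), the env parameter, and the {"task_id", "reason"} argument shape are all assumptions made for illustration. Only the HERMES_KANBAN_TASK variable, the kanban_block tool name, and the try/except-and-warn behavior come from the description above.

```python
import logging
import os
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger(__name__)

# Environment variable named in the PR; set when the agent runs as a kanban worker.
KANBAN_TASK_ENV = "HERMES_KANBAN_TASK"


def block_kanban_task_on_exhaustion(
    dispatch_tool: Callable[[str, dict], Any],
    max_iterations: int,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Block the current kanban task after the iteration budget runs out.

    dispatch_tool is a hypothetical stand-in for handle_function_call.
    Returns True only if a kanban_block call was issued without raising.
    """
    env = os.environ if env is None else env
    task_id = env.get(KANBAN_TASK_ENV)
    if not task_id:
        # Not running as a kanban worker: do nothing, no side effects.
        return False

    reason = (
        f"Iteration budget exhausted after {max_iterations} iterations "
        "before the task was completed or blocked."
    )
    try:
        # Assumed argument shape; the real kanban_block schema may differ.
        dispatch_tool("kanban_block", {"task_id": task_id, "reason": reason})
    except Exception as exc:
        # A failed block call must not crash the agent loop: warn and fall
        # through to the normal exit path.
        logger.warning("kanban_block failed after iteration exhaustion: %s", exc)
        return False
    return True
```

In this sketch the call would sit right after the summary step returns, for example block_kanban_task_on_exhaustion(my_dispatch, max_iterations=90), where my_dispatch and the value 90 are placeholders. Passing the dispatcher and environment in as parameters is a choice made here to keep the example testable without patching globals; the PR's own tests instead assert on handle_function_call directly.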