fix(kanban): checkpoint reminder before worker exits with text response (closes #27178)#34193
fix(kanban): checkpoint reminder before worker exits with text response (closes #27178)#34193teknium1 wants to merge 1 commit into
Conversation
🔎 Lint report:
|
| Rule | Count |
|---|---|
unresolved-import |
1 |
First entries
tests/run_agent/test_kanban_checkpoint_reminder.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
✅ Fixed issues: none
Unchanged: 5045 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
… response (#27178) When a dispatcher-spawned kanban worker exits via finish_reason=stop without having called kanban_complete or kanban_block, the dispatcher flags it as protocol_violation and immediately trips the failure breaker. Per reporter @Jonnybooyy's repro: agent reaches the end of its turn with a 668-char text response after 28 tool turns of substantive work, but never checkpoints. From the agent's perspective the turn ended normally; from the dispatcher's perspective the task crashed and gets hard-blocked after the third occurrence. This is the canonical 'agent forgot to checkpoint' pattern. Fix gives the agent ONE more turn with an injected reminder before letting the worker exit: 1. At the natural-exit point (assistant text response, no tool_calls, finish_reason=stop), check HERMES_KANBAN_TASK env to scope the nudge to dispatcher-spawned workers. 2. Scan the conversation for any prior kanban_complete or kanban_block tool call. If found, exit normally. 3. Otherwise, append a synthetic user-role message reminding the agent to call kanban_complete or kanban_block (with the actual task_id pre-filled), set _kanban_checkpoint_reminder_sent=True, and continue the loop instead of breaking. 4. The flag caps the nudge at one per turn. If the agent ignores the reminder on the followup turn, the natural-exit fires again, the flag prevents a second nudge, and the existing protocol_violation path on the dispatcher side handles the task as before. Normal chat sessions never see this — HERMES_KANBAN_TASK is only set by the kanban dispatcher when spawning workers. The injection is structurally identical to the precedent in _handle_max_iterations (chat_completion_helpers.py:1289) which already appends a synthetic user message when iteration budget is exhausted. Tests: - tests/run_agent/test_kanban_checkpoint_reminder.py (2 cases): source-level invariant catching refactors that drop the nudge, and a unit test of the message-scan detection logic across 6 message-shape scenarios. - 1947/1947 run_agent + kanban tests pass. Co-authored-by: Jonnybooyy <jonlopez101@gmail.com>
7bcefa3 to
9efc07f
Compare
…34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR #34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown.
|
Closing without merging — decided this PR's value-to-LOC ratio isn't quite right despite the implementation being correct and CI being green. Summary of the thinking: What this PR was going to do: When a dispatcher-spawned kanban worker exits via Why we're not shipping it:
What landed from the triage anyway: During CI investigation for this PR, surfaced an unrelated test-infrastructure flake in If this becomes a real problem on current main, please reopen with: the session log showing the worker's Credit
Closing PR + leaving the issue open at P3 — if anyone retries the exact failure pattern on current main and it still bites, we'll revisit. The architecture for re-orientation across block→comment→resume already exists; this PR was about closing the "agent forgot, dispatcher punished" gap specifically. |
…ousResearch#34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR NousResearch#34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown. #AI commit#
…ousResearch#34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR NousResearch#34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown.
Fixes #27178 — kanban workers that finish with a text response (
finish_reason=stop) without callingkanban_complete/kanban_blockare flagged asprotocol_violationby the dispatcher, even when the agent did substantive work and just forgot to checkpoint.The bug
Reporter @Jonnybooyy's repro shape: agent worked through 28 tool turns / 29 API calls over ~3 minutes (
search_files,read_file,terminal, recovery from one tool error), then ended with:The agent never called
kanban_completeorkanban_block. From the agent's perspective the turn ended normally. From the dispatcher's perspective: the task is stillrunningin the DB, the worker exited rc=0, so this trips_classify_worker_exit() == "clean_exit"→protocol_violation→failure_limit=1(immediate breaker trip) → task →blocked. The agent's 28 turns of work were thrown away.This is the canonical "agent forgot to checkpoint" pattern. It is NOT the model being broken; it's the prompt-following gap between a kanban worker's job (always end with a terminal tool call) and a normal chat session's job (it's fine to just stop).
The fix
When the worker is about to exit naturally via text response, give the agent ONE more turn with a synthetic user-role reminder:
The injection is structurally identical to the existing
_handle_max_iterationspattern (chat_completion_helpers.py:1289) which already appends a synthetic user message when iteration budget is exhausted. Same role-alternation properties, same "after final assistant response" position.Invariants preserved
HERMES_KANBAN_TASK— only set by the kanban dispatcher when spawning workersbreak. Same position as user typing a follow-up message in interactive chatassistant (text response) → user (reminder) → assistant (next turn). Standard forward-flow alternation_kanban_checkpoint_reminder_sentflag scoped to one turn caps at exactly 1 nudge. If the agent ignores the reminder, the natural exit fires again, the flag prevents re-injection, and the existing protocol_violation path handles itkanban_complete/kanban_blocktool call in messages; if found, no reminder_handle_max_iterations) already usesReporter's other suggestions (deferred)
The bug report contained 4 suggested fixes. This PR implements #1 (checkpoint reminder on
finish_reason=stop). The others are worth separate consideration:failure_limit=1behavior still applies. That trip is now informative (agent ignored 2 chances) rather than punitive (one strike on first miss).Validation
tests/run_agent/test_kanban_checkpoint_reminder.py— 2 new tests:tests/run_agent/full suite + kanban + signal-handler suites: 1947/1947 pass (one order-dependent flake intest_provider_parity.pyconfirmed unrelated — passes in isolation, same flake exists onorigin/main).Credit
protocol_violationstring, which made root-cause diagnosis significantly slower than necessary). Proposed fix Terminal tool #1 ("Inject a checkpoint reminder onfinish_reason=stopwithout prior checkpoint") was the design that shipped.Co-authored-by:trailer on the commit; AUTHOR_MAP entry added.protocol_violationcluster.Infographic