fix(kanban): checkpoint reminder before worker exits with text response (closes #27178) by teknium1 · Pull Request #34193 · NousResearch/hermes-agent

teknium1 · 2026-05-28T23:59:46Z

Fixes #27178 — kanban workers that finish with a text response (finish_reason=stop) without calling kanban_complete / kanban_block are flagged as protocol_violation by the dispatcher, even when the agent did substantive work and just forgot to checkpoint.

The bug

Reporter @Jonnybooyy's repro shape: agent worked through 28 tool turns / 29 API calls over ~3 minutes (search_files, read_file, terminal, recovery from one tool error), then ended with:

Turn ended: reason=text_response(finish_reason=stop)
  model=gpt-5.3-codex api_calls=29/60 budget=29/60 tool_turns=28
  last_msg_role=assistant response_len=668

The agent never called kanban_complete or kanban_block. From the agent's perspective the turn ended normally. From the dispatcher's perspective: the task is still running in the DB, the worker exited rc=0, so this trips _classify_worker_exit() == "clean_exit" → protocol_violation → failure_limit=1 (immediate breaker trip) → task → blocked. The agent's 28 turns of work were thrown away.

This is the canonical "agent forgot to checkpoint" pattern. It is NOT the model being broken; it's the prompt-following gap between a kanban worker's job (always end with a terminal tool call) and a normal chat session's job (it's fine to just stop).

The fix

When the worker is about to exit naturally via text response, give the agent ONE more turn with a synthetic user-role reminder:

You ended your turn without calling `kanban_complete` or `kanban_block`
for task `t_a1b2c3d4`. Every kanban worker run MUST end with one of
these tool calls so the dispatcher knows the task outcome.

If your work is finished:
  kanban_complete(task_id="t_a1b2c3d4", summary="...", result="...")

If you cannot finish without more input:
  kanban_block(task_id="t_a1b2c3d4", reason="...")

Please make that call now.

The injection is structurally identical to the existing _handle_max_iterations pattern (chat_completion_helpers.py:1289) which already appends a synthetic user message when iteration budget is exhausted. Same role-alternation properties, same "after final assistant response" position.

Invariants preserved

Property	Check
Normal chat sessions unaffected	Env-gated on `HERMES_KANBAN_TASK` — only set by the kanban dispatcher when spawning workers
No mid-loop synthetic injection	Reminder fires AFTER the assistant's final text response is appended, BEFORE the loop's `break`. Same position as user typing a follow-up message in interactive chat
Role alternation correctness	Sequence is `assistant (text response) → user (reminder) → assistant (next turn)`. Standard forward-flow alternation
No reminder loop	`_kanban_checkpoint_reminder_sent` flag scoped to one turn caps at exactly 1 nudge. If the agent ignores the reminder, the natural exit fires again, the flag prevents re-injection, and the existing protocol_violation path handles it
Idempotent on already-checkpointed tasks	Pre-injection scan checks for any prior `kanban_complete` / `kanban_block` tool call in messages; if found, no reminder
AGENTS.md "no synthetic user mid-loop" rule	That rule specifically prohibits injection BETWEEN tool_call and tool_result. This injection is at the post-final-assistant-response position which the precedent (`_handle_max_iterations`) already uses

Reporter's other suggestions (deferred)

The bug report contained 4 suggested fixes. This PR implements #1 (checkpoint reminder on finish_reason=stop). The others are worth separate consideration:

Support passing morph snapshot id #2 ("Differentiate the dispatcher error message" — split protocol_violation into clean_exit_no_checkpoint vs true_crash vs model_auth_error): legitimate triage improvement, separate PR. Not blocking the agent-side fix here.
Architecture planning #3 ("Render the checkpoint requirement at every turn"): would help with by-turn-28-it's-out-of-the-recency-window forgetting. Conflicts with prompt-caching invariants (every-turn-append breaks the cache). The single-reminder approach in this PR is a less invasive alternative that gets most of the benefit.
Fix terminal interactivity #4 ("Don't auto-block on the first checkpoint-omission failure"): with this PR, the agent gets a checkpoint nudge BEFORE the dispatcher ever sees the clean exit. If the agent ignores the nudge, the existing failure_limit=1 behavior still applies. That trip is now informative (agent ignored 2 chances) rather than punitive (one strike on first miss).

Validation

tests/run_agent/test_kanban_checkpoint_reminder.py — 2 new tests:
- Source-level invariant catching refactors that drop the env-gated nudge
- Message-scan detection logic across 6 message-shape scenarios (kanban_complete alone, kanban_block alone, non-terminal kanban_show/heartbeat only, mixed kanban_show + kanban_complete, empty, malformed tool_calls)
tests/run_agent/ full suite + kanban + signal-handler suites: 1947/1947 pass (one order-dependent flake in test_provider_parity.py confirmed unrelated — passes in isolation, same flake exists on origin/main).

Credit

@Jonnybooyy — root-cause diagnosis (three repro shapes across three runs, all surfacing as the same protocol_violation string, which made root-cause diagnosis significantly slower than necessary). Proposed fix Terminal tool #1 ("Inject a checkpoint reminder on finish_reason=stop without prior checkpoint") was the design that shipped. Co-authored-by: trailer on the commit; AUTHOR_MAP entry added.
@cardtest15-coder, @alt-glitch — triage cross-references that helped scope this as distinct from the existing protocol_violation cluster.

Infographic

github-actions · 2026-05-29T00:00:30Z

🔎 Lint report: `hermes/hermes-4f9b6288` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9578 on HEAD, 9577 on base (🆕 +1)

🆕 New issues (1):

Rule	Count
`unresolved-import`	1

First entries

tests/run_agent/test_kanban_checkpoint_reminder.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 5045 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@Jonnybooyy

… response (#27178) When a dispatcher-spawned kanban worker exits via finish_reason=stop without having called kanban_complete or kanban_block, the dispatcher flags it as protocol_violation and immediately trips the failure breaker. Per reporter @Jonnybooyy's repro: agent reaches the end of its turn with a 668-char text response after 28 tool turns of substantive work, but never checkpoints. From the agent's perspective the turn ended normally; from the dispatcher's perspective the task crashed and gets hard-blocked after the third occurrence. This is the canonical 'agent forgot to checkpoint' pattern. Fix gives the agent ONE more turn with an injected reminder before letting the worker exit: 1. At the natural-exit point (assistant text response, no tool_calls, finish_reason=stop), check HERMES_KANBAN_TASK env to scope the nudge to dispatcher-spawned workers. 2. Scan the conversation for any prior kanban_complete or kanban_block tool call. If found, exit normally. 3. Otherwise, append a synthetic user-role message reminding the agent to call kanban_complete or kanban_block (with the actual task_id pre-filled), set _kanban_checkpoint_reminder_sent=True, and continue the loop instead of breaking. 4. The flag caps the nudge at one per turn. If the agent ignores the reminder on the followup turn, the natural-exit fires again, the flag prevents a second nudge, and the existing protocol_violation path on the dispatcher side handles the task as before. Normal chat sessions never see this — HERMES_KANBAN_TASK is only set by the kanban dispatcher when spawning workers. The injection is structurally identical to the precedent in _handle_max_iterations (chat_completion_helpers.py:1289) which already appends a synthetic user message when iteration budget is exhausted. Tests: - tests/run_agent/test_kanban_checkpoint_reminder.py (2 cases): source-level invariant catching refactors that drop the nudge, and a unit test of the message-scan detection logic across 6 message-shape scenarios. - 1947/1947 run_agent + kanban tests pass. Co-authored-by: Jonnybooyy <jonlopez101@gmail.com>

…34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR #34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown.

teknium1 · 2026-05-29T01:17:34Z

Closing without merging — decided this PR's value-to-LOC ratio isn't quite right despite the implementation being correct and CI being green.

Summary of the thinking:

What this PR was going to do: When a dispatcher-spawned kanban worker exits via finish_reason=stop without having called kanban_complete / kanban_block, inject a synthetic user-role reminder and give the agent one more turn to checkpoint properly. Implementation: +85 LOC in agent/conversation_loop.py + regression test file.

Why we're not shipping it:

No confirmed in-the-wild incidence on current main. The reporter's bug is from v0.13.0 (May 16). Lots of kanban polish has landed since — KANBAN_GUIDANCE injection works, kanban-worker skill auto-loads, prior-attempts + comment-thread context is surfaced via kanban_show. The specific failure pattern (28 turns of work, agent forgets the final checkpoint call) is a model-instruction-following gap, not a missing affordance.
One-line workaround already exists. hermes kanban unblock <task_id> recovers the task and re-dispatches with the new context (prior summary + comments visible to the next worker).
The fix is a prompt-engineering bet. If the agent ignored "MUST end with kanban_complete or kanban_block" in its system prompt for 28 turns, a synthetic reminder might just produce "yes I'm done" again — burning two more turns and ending in the same protocol_violation.
+85 LOC in agent/conversation_loop.py for a polish fix carries ongoing maintenance cost in a 4600-line file that's part of the hot path. Not worth it for an unreported-on-current-main failure mode.

What landed from the triage anyway: During CI investigation for this PR, surfaced an unrelated test-infrastructure flake in tests/tui_gateway/ (atexit hooks duplicated by importlib.reload(mod) in fixture teardown → race at interpreter shutdown → "all tests pass but slice exits non-zero"). Split that into its own PR (#34217) and merged — fixes a real CI flake that was hitting unrelated PRs too.

If this becomes a real problem on current main, please reopen with: the session log showing the worker's messages array at exit, the tools array the worker was given, and the dispatcher's tool_turns count from the failed run. If we see the pattern empirically on post-v0.15 main, the implementation in this PR is a starting point.

Credit

@Jonnybooyy — sharp three-shape diagnostic (workspace contention / model auth / text-response-without-checkpoint, all surfacing as the same protocol_violation string) saved real triage time even though the resulting PR isn't shipping. Proposed fix Terminal tool #1 (reminder injection) was a sound design; not a wasted contribution. AUTHOR_MAP entry stays in place from this triage cycle in case a future PR draws on the same investigation.
@cardtest15-coder, @alt-glitch — triage cross-references that helped scope this as distinct from the existing protocol_violation cluster.

Closing PR + leaving the issue open at P3 — if anyone retries the exact failure pattern on current main and it still bites, we'll revisit. The architecture for re-orientation across block→comment→resume already exists; this PR was about closing the "agent forgot, dispatcher punished" gap specifically.

…ousResearch#34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR NousResearch#34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown. #AI commit#

…ousResearch#34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR NousResearch#34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown.

teknium1 force-pushed the hermes/hermes-4f9b6288 branch from 7bcefa3 to 9efc07f Compare May 29, 2026 00:47

teknium1 mentioned this pull request May 29, 2026

test(tui_gateway): stop reloading server module in fixture teardown #34217

Merged

teknium1 closed this May 29, 2026

teknium1 mentioned this pull request May 29, 2026

Kanban worker reports 'protocol_violation' when agent ends turn with text response instead of calling kanban_complete/kanban_block #27178

Open

BrewTestBot mentioned this pull request May 29, 2026

hermes-agent 2026.5.29 Homebrew/homebrew-core#285204

Merged

1 task

github-actions Bot mentioned this pull request May 29, 2026

chore: bump NousResearch/hermes-agent version from v2026.5.16 to v2026.5.29 Docker-Hub-sirmark/docker-hermes-agent#7

Merged

weilai-zte mentioned this pull request Jun 12, 2026

Kanban dispatcher misclassifies rc=0 worker exits as protocol_violation (new in-the-wild data, 270 events / 160 blocked tasks) #44812

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(kanban): checkpoint reminder before worker exits with text response (closes #27178)#34193

fix(kanban): checkpoint reminder before worker exits with text response (closes #27178)#34193
teknium1 wants to merge 1 commit into
mainfrom
hermes/hermes-4f9b6288

teknium1 commented May 28, 2026

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

teknium1 commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented May 28, 2026

The bug

The fix

Invariants preserved

Reporter's other suggestions (deferred)

Validation

Credit

Infographic

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔎 Lint report: hermes/hermes-4f9b6288 vs origin/main

ruff

ty (type checker)

Uh oh!

teknium1 commented May 29, 2026

Credit

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 29, 2026 •

edited

Loading

🔎 Lint report: `hermes/hermes-4f9b6288` vs `origin/main`