Skip to content

fix(kanban): checkpoint reminder before worker exits with text response (closes #27178)#34193

Closed
teknium1 wants to merge 1 commit into
mainfrom
hermes/hermes-4f9b6288
Closed

fix(kanban): checkpoint reminder before worker exits with text response (closes #27178)#34193
teknium1 wants to merge 1 commit into
mainfrom
hermes/hermes-4f9b6288

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Fixes #27178 — kanban workers that finish with a text response (finish_reason=stop) without calling kanban_complete / kanban_block are flagged as protocol_violation by the dispatcher, even when the agent did substantive work and just forgot to checkpoint.

The bug

Reporter @Jonnybooyy's repro shape: agent worked through 28 tool turns / 29 API calls over ~3 minutes (search_files, read_file, terminal, recovery from one tool error), then ended with:

Turn ended: reason=text_response(finish_reason=stop)
  model=gpt-5.3-codex api_calls=29/60 budget=29/60 tool_turns=28
  last_msg_role=assistant response_len=668

The agent never called kanban_complete or kanban_block. From the agent's perspective the turn ended normally. From the dispatcher's perspective: the task is still running in the DB, the worker exited rc=0, so this trips _classify_worker_exit() == "clean_exit"protocol_violationfailure_limit=1 (immediate breaker trip) → task → blocked. The agent's 28 turns of work were thrown away.

This is the canonical "agent forgot to checkpoint" pattern. It is NOT the model being broken; it's the prompt-following gap between a kanban worker's job (always end with a terminal tool call) and a normal chat session's job (it's fine to just stop).

The fix

When the worker is about to exit naturally via text response, give the agent ONE more turn with a synthetic user-role reminder:

You ended your turn without calling `kanban_complete` or `kanban_block`
for task `t_a1b2c3d4`. Every kanban worker run MUST end with one of
these tool calls so the dispatcher knows the task outcome.

If your work is finished:
  kanban_complete(task_id="t_a1b2c3d4", summary="...", result="...")

If you cannot finish without more input:
  kanban_block(task_id="t_a1b2c3d4", reason="...")

Please make that call now.

The injection is structurally identical to the existing _handle_max_iterations pattern (chat_completion_helpers.py:1289) which already appends a synthetic user message when iteration budget is exhausted. Same role-alternation properties, same "after final assistant response" position.

Invariants preserved

Property Check
Normal chat sessions unaffected Env-gated on HERMES_KANBAN_TASK — only set by the kanban dispatcher when spawning workers
No mid-loop synthetic injection Reminder fires AFTER the assistant's final text response is appended, BEFORE the loop's break. Same position as user typing a follow-up message in interactive chat
Role alternation correctness Sequence is assistant (text response) → user (reminder) → assistant (next turn). Standard forward-flow alternation
No reminder loop _kanban_checkpoint_reminder_sent flag scoped to one turn caps at exactly 1 nudge. If the agent ignores the reminder, the natural exit fires again, the flag prevents re-injection, and the existing protocol_violation path handles it
Idempotent on already-checkpointed tasks Pre-injection scan checks for any prior kanban_complete / kanban_block tool call in messages; if found, no reminder
AGENTS.md "no synthetic user mid-loop" rule That rule specifically prohibits injection BETWEEN tool_call and tool_result. This injection is at the post-final-assistant-response position which the precedent (_handle_max_iterations) already uses

Reporter's other suggestions (deferred)

The bug report contained 4 suggested fixes. This PR implements #1 (checkpoint reminder on finish_reason=stop). The others are worth separate consideration:

  • Support passing morph snapshot id #2 ("Differentiate the dispatcher error message" — split protocol_violation into clean_exit_no_checkpoint vs true_crash vs model_auth_error): legitimate triage improvement, separate PR. Not blocking the agent-side fix here.
  • Architecture planning #3 ("Render the checkpoint requirement at every turn"): would help with by-turn-28-it's-out-of-the-recency-window forgetting. Conflicts with prompt-caching invariants (every-turn-append breaks the cache). The single-reminder approach in this PR is a less invasive alternative that gets most of the benefit.
  • Fix terminal interactivity #4 ("Don't auto-block on the first checkpoint-omission failure"): with this PR, the agent gets a checkpoint nudge BEFORE the dispatcher ever sees the clean exit. If the agent ignores the nudge, the existing failure_limit=1 behavior still applies. That trip is now informative (agent ignored 2 chances) rather than punitive (one strike on first miss).

Validation

  • tests/run_agent/test_kanban_checkpoint_reminder.py — 2 new tests:
    • Source-level invariant catching refactors that drop the env-gated nudge
    • Message-scan detection logic across 6 message-shape scenarios (kanban_complete alone, kanban_block alone, non-terminal kanban_show/heartbeat only, mixed kanban_show + kanban_complete, empty, malformed tool_calls)
  • tests/run_agent/ full suite + kanban + signal-handler suites: 1947/1947 pass (one order-dependent flake in test_provider_parity.py confirmed unrelated — passes in isolation, same flake exists on origin/main).

Credit

  • @Jonnybooyy — root-cause diagnosis (three repro shapes across three runs, all surfacing as the same protocol_violation string, which made root-cause diagnosis significantly slower than necessary). Proposed fix Terminal tool #1 ("Inject a checkpoint reminder on finish_reason=stop without prior checkpoint") was the design that shipped. Co-authored-by: trailer on the commit; AUTHOR_MAP entry added.
  • @cardtest15-coder, @alt-glitch — triage cross-references that helped scope this as distinct from the existing protocol_violation cluster.

Infographic

kanban-checkpoint-reminder

@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-4f9b6288 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9578 on HEAD, 9577 on base (🆕 +1)

🆕 New issues (1):

Rule Count
unresolved-import 1
First entries
tests/run_agent/test_kanban_checkpoint_reminder.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 5045 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

… response (#27178)

When a dispatcher-spawned kanban worker exits via finish_reason=stop
without having called kanban_complete or kanban_block, the dispatcher
flags it as protocol_violation and immediately trips the failure
breaker. Per reporter @Jonnybooyy's repro: agent reaches the end of
its turn with a 668-char text response after 28 tool turns of
substantive work, but never checkpoints. From the agent's perspective
the turn ended normally; from the dispatcher's perspective the task
crashed and gets hard-blocked after the third occurrence.

This is the canonical 'agent forgot to checkpoint' pattern. Fix gives
the agent ONE more turn with an injected reminder before letting the
worker exit:

1. At the natural-exit point (assistant text response, no tool_calls,
   finish_reason=stop), check HERMES_KANBAN_TASK env to scope the
   nudge to dispatcher-spawned workers.
2. Scan the conversation for any prior kanban_complete or kanban_block
   tool call. If found, exit normally.
3. Otherwise, append a synthetic user-role message reminding the
   agent to call kanban_complete or kanban_block (with the actual
   task_id pre-filled), set _kanban_checkpoint_reminder_sent=True,
   and continue the loop instead of breaking.
4. The flag caps the nudge at one per turn. If the agent ignores the
   reminder on the followup turn, the natural-exit fires again, the
   flag prevents a second nudge, and the existing protocol_violation
   path on the dispatcher side handles the task as before.

Normal chat sessions never see this — HERMES_KANBAN_TASK is only set
by the kanban dispatcher when spawning workers.

The injection is structurally identical to the precedent in
_handle_max_iterations (chat_completion_helpers.py:1289) which already
appends a synthetic user message when iteration budget is exhausted.

Tests:
- tests/run_agent/test_kanban_checkpoint_reminder.py (2 cases):
  source-level invariant catching refactors that drop the nudge,
  and a unit test of the message-scan detection logic across 6
  message-shape scenarios.
- 1947/1947 run_agent + kanban tests pass.

Co-authored-by: Jonnybooyy <jonlopez101@gmail.com>
@teknium1 teknium1 force-pushed the hermes/hermes-4f9b6288 branch from 7bcefa3 to 9efc07f Compare May 29, 2026 00:47
teknium1 added a commit that referenced this pull request May 29, 2026
…34217)

tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR #34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
@teknium1

Copy link
Copy Markdown
Contributor Author

Closing without merging — decided this PR's value-to-LOC ratio isn't quite right despite the implementation being correct and CI being green.

Summary of the thinking:

What this PR was going to do: When a dispatcher-spawned kanban worker exits via finish_reason=stop without having called kanban_complete / kanban_block, inject a synthetic user-role reminder and give the agent one more turn to checkpoint properly. Implementation: +85 LOC in agent/conversation_loop.py + regression test file.

Why we're not shipping it:

  1. No confirmed in-the-wild incidence on current main. The reporter's bug is from v0.13.0 (May 16). Lots of kanban polish has landed since — KANBAN_GUIDANCE injection works, kanban-worker skill auto-loads, prior-attempts + comment-thread context is surfaced via kanban_show. The specific failure pattern (28 turns of work, agent forgets the final checkpoint call) is a model-instruction-following gap, not a missing affordance.
  2. One-line workaround already exists. hermes kanban unblock <task_id> recovers the task and re-dispatches with the new context (prior summary + comments visible to the next worker).
  3. The fix is a prompt-engineering bet. If the agent ignored "MUST end with kanban_complete or kanban_block" in its system prompt for 28 turns, a synthetic reminder might just produce "yes I'm done" again — burning two more turns and ending in the same protocol_violation.
  4. +85 LOC in agent/conversation_loop.py for a polish fix carries ongoing maintenance cost in a 4600-line file that's part of the hot path. Not worth it for an unreported-on-current-main failure mode.

What landed from the triage anyway: During CI investigation for this PR, surfaced an unrelated test-infrastructure flake in tests/tui_gateway/ (atexit hooks duplicated by importlib.reload(mod) in fixture teardown → race at interpreter shutdown → "all tests pass but slice exits non-zero"). Split that into its own PR (#34217) and merged — fixes a real CI flake that was hitting unrelated PRs too.

If this becomes a real problem on current main, please reopen with: the session log showing the worker's messages array at exit, the tools array the worker was given, and the dispatcher's tool_turns count from the failed run. If we see the pattern empirically on post-v0.15 main, the implementation in this PR is a starting point.

Credit

  • @Jonnybooyy — sharp three-shape diagnostic (workspace contention / model auth / text-response-without-checkpoint, all surfacing as the same protocol_violation string) saved real triage time even though the resulting PR isn't shipping. Proposed fix Terminal tool #1 (reminder injection) was a sound design; not a wasted contribution. AUTHOR_MAP entry stays in place from this triage cycle in case a future PR draws on the same investigation.
  • @cardtest15-coder, @alt-glitch — triage cross-references that helped scope this as distinct from the existing protocol_violation cluster.

Closing PR + leaving the issue open at P3 — if anyone retries the exact failure pattern on current main and it still bites, we'll revisit. The architecture for re-orientation across block→comment→resume already exists; this PR was about closing the "agent forgot, dispatcher punished" gap specifically.

@teknium1 teknium1 closed this May 29, 2026
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…ousResearch#34217)

tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR NousResearch#34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
#AI commit#
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
…ousResearch#34217)

tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR NousResearch#34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Kanban worker reports 'protocol_violation' when agent ends turn with text response instead of calling kanban_complete/kanban_block

1 participant