
fix(goal): stop the /goal loop when the agent says it is done (#29090) #29318

Open
xxxigm wants to merge 4 commits into NousResearch:main from xxxigm:fix/29090-goal-autostop-windows


xxxigm (Contributor) commented May 20, 2026

What does this PR do?

Closes the "[Bug]: /goal keeps triggering after the goal is completed" loop reported in #29090. The reporter typed /goal lsdjflasjdf;ljasdlfja;sldjfalsdjf on Windows and watched ↻ Continuing toward goal (N/20) fire for the full turn budget despite the agent itself stating the task was complete.

Root cause is not Windows-specific; the reporter just hit it there first. For a gibberish or otherwise unverifiable goal, the agent's reply is almost always "I don't understand / can't proceed", which a strict judge, per its system prompt, is supposed to treat as DONE-with-reason-blocked. Weak judge models (such as the cheap, fast goal_judge overrides that users wire up) routinely break that rule and hedge with continue, so the loop queues a continuation prompt on every turn until the 20-turn budget runs out. This is the same weak-judge class tracked in #27585.

Fix is a deterministic, model-independent stop path — the agent itself attests "I'm done" via a sentinel and the loop halts before the judge runs. Three pieces wired together:

  1. Continuation prompt teaches the contract. CONTINUATION_PROMPT_TEMPLATE and CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE now share a single _STOP_INSTRUCTION_LINE block teaching the agent two terminal sentinels:

    <<HERMES_GOAL_DONE: one-sentence reason>>
    <<HERMES_GOAL_BLOCKED: one-sentence reason>>
    

    With explicit semantics — DONE for "complete or trivially unachievable (typos, gibberish, contradictions)", BLOCKED for "I genuinely need user input to make progress".

  2. Detector. _detect_goal_stop_sentinel(response) returns (kind, reason) when the sentinel is the entire final non-blank line of the reply (using re.fullmatch after stripping surrounding whitespace). Anchoring on the final-line, full-match shape keeps prompt-echoing models from triggering a false stop when they quote the instruction back to themselves mid-reasoning, and it matches the prompt contract "the sentinel must be the LAST non-blank line in your response".

  3. evaluate_after_turn short-circuit. It checks the sentinel before calling judge_goal, persists status="done", resets consecutive_parse_failures (sentinel emission means we got a real reply, so stale judge-parse failures must not auto-pause the next /goal resume), and surfaces a clear user-visible message:

    • ✓ Goal achieved: <reason> for DONE
    • ✓ Goal stopped (agent blocked): <reason> for BLOCKED (so operators know the agent wants input rather than misreading it as success).

Reproduction shape, post-fix: turn 1 the weak judge still hedges with continue, so the continuation prompt fires; turn 2 the agent — now taught the sentinel by that very prompt — emits <<HERMES_GOAL_BLOCKED: …>> on its last line, the loop halts at 2 turns used instead of 20, and the judge never runs on turn 2. The reporter's exact scenario is locked in by TestRepro29090GibberishGoal.
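The two-turn reproduction can be sketched with a simplified evaluate_after_turn. Everything here is hypothetical: GoalState, a judge callable returning a (verdict, reason) pair, where turns_used is incremented, and the budget-pause message are stand-ins, not the signatures in hermes_cli/goals.py. Only the ✓ and ↻ message wording follows what this PR quotes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical sentinel regex (same shape the PR describes, not the real one).
_SENTINEL_RE = re.compile(
    r"<<\s*HERMES_GOAL_(DONE|BLOCKED)\s*(?::\s*(.*?))?\s*>>", re.IGNORECASE
)


def detect_sentinel(response: str) -> Optional[Tuple[str, str]]:
    """(kind, reason) if the last non-blank line is exactly a sentinel."""
    lines = [line for line in response.splitlines() if line.strip()]
    match = _SENTINEL_RE.fullmatch(lines[-1].strip()) if lines else None
    if match is None:
        return None
    return match.group(1).lower(), (match.group(2) or "").strip() or "no reason given"


@dataclass
class GoalState:
    """Hypothetical stand-in for the persisted goal state."""
    status: str = "active"
    turns_used: int = 0
    max_turns: int = 20
    consecutive_parse_failures: int = 0


Judge = Callable[[str], Tuple[str, str]]  # assumed: returns (verdict, reason)


def evaluate_after_turn(state: GoalState, response: str, judge: Judge) -> str:
    """Simplified sketch: update state and return the user-visible message."""
    if state.status != "active":
        return ""  # early return: sentinels are inert on inactive goals
    state.turns_used += 1

    stop = detect_sentinel(response)
    if stop is not None:
        # Short-circuit before the judge: the agent attested it is finished.
        kind, reason = stop
        state.status = "done"
        state.consecutive_parse_failures = 0  # a real reply arrived
        if kind == "done":
            return f"✓ Goal achieved: {reason}"
        return f"✓ Goal stopped (agent blocked): {reason}"

    verdict, reason = judge(response)  # existing judge path, unchanged
    if verdict == "done":
        state.status = "done"
        return f"✓ Goal achieved: {reason}"
    if state.turns_used >= state.max_turns:
        state.status = "paused"  # turn-budget backstop
        return "Goal paused: turn budget exhausted"
    return f"↻ Continuing toward goal ({state.turns_used}/{state.max_turns}): {reason}"
```

Driving it with a judge that always answers ("continue", "goal text is unclear") mirrors the transcript below: the first reply produces "↻ Continuing toward goal (1/20): goal text is unclear", the second reply ending in <<HERMES_GOAL_BLOCKED: goal text is unintelligible, please re-send>> produces the "✓ Goal stopped (agent blocked)" message with turns_used == 2, and the judge is called only once.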

Judge fail-open semantics, the parse-failure auto-pause, the turn-budget backstop, and the /goal pause / resume / clear controls are all untouched — sentinel handling is purely additive. Works identically on CLI and every gateway platform because goals.py is shared.

Related Issue

Fixes #29090

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • hermes_cli/goals.py — new GOAL_DONE_SENTINEL / GOAL_BLOCKED_SENTINEL constants, _STOP_INSTRUCTION_LINE shared between both continuation-prompt templates, _STOP_SENTINEL_RE (case-insensitive, optional reason group), _detect_goal_stop_sentinel(response) helper anchored on the final non-blank line with fullmatch after whitespace strip, and the short-circuit branch at the top of evaluate_after_turn that lands status="done" and resets the parse-failure counter.
  • tests/hermes_cli/test_goals.py: 23 new tests across three classes:
    • TestStopSentinelDetection — unit-level happy paths (DONE/BLOCKED with and without reason, mixed case, whitespace, trailing newlines) and rejections (None/empty, prose-only, mid-line echoes, inline echoes on the final line), plus the continuation prompt carries the sentinel teaching in both the plain and /subgoal paths.
    • TestEvaluateAfterTurnSentinel — end-to-end through GoalManager.evaluate_after_turn: DONE short-circuits the judge, BLOCKED surfaces the right message, consecutive_parse_failures resets, non-sentinel responses still call the judge (regression guard), inline sentinels on a one-line response do NOT short-circuit (prompt-echo guard end-to-end), works alongside /subgoal criteria, inert on inactive goals.
    • TestRepro29090GibberishGoal — full two-turn reproduction of the reporter's exact scenario (/goal lsdjflasjdf;ljasdlfja;sldjfalsdjf): turn 1 weak judge says continue and queues the continuation, turn 2 agent emits <<HERMES_GOAL_BLOCKED: …>> and the loop halts at 2 turns used (turns_used < max_turns) without consulting the judge.
  • website/docs/user-guide/features/goals.md — new "Agent self-attestation (stop sentinels)" subsection explaining when and why the sentinels exist, the exact wire format, and how the surface message differs between DONE and BLOCKED; "When the judge gets it wrong" now points at the sentinel as the first line of defense against false-negative judges (turn budget still backs it up).

Backwards compatible: _state schema unchanged, no new config keys, no system-prompt mutation, prompt-cache invariants preserved (the sentinel teaching lives in the user-role continuation prompt, not the system prompt). All existing goal tests across CLI / TUI / gateway continue to pass unchanged.

How to Test

# New + existing goals.py coverage (50 existing + 23 new = 73 tests)
./scripts/run_tests.sh tests/hermes_cli/test_goals.py

# Full regression across every test module that touches the goal loop
./scripts/run_tests.sh \
    tests/hermes_cli/test_goals.py \
    tests/gateway/test_goal_verdict_send.py \
    tests/gateway/test_goal_max_turns_config.py \
    tests/tui_gateway/test_goal_command.py \
    tests/cli/test_cli_goal_interrupt.py
# expected: 97 passed

Manual / behavioural verification with the reporter's input:

You: /goal lsdjflasjdf;ljasdlfja;sldjfalsdjf

  ⊙ Goal set (20-turn budget): lsdjflasjdf;ljasdlfja;sldjfalsdjf

Hermes: I don't recognize this as a coherent goal — could you clarify what
        you'd like me to do?

  ↻ Continuing toward goal (1/20): goal text is unclear

Hermes: [Continuing toward your standing goal]
        The goal text appears to be random characters with no actionable
        intent. I'll stop here so you can re-send.

        <<HERMES_GOAL_BLOCKED: goal text is unintelligible, please re-send>>

  ✓ Goal stopped (agent blocked): goal text is unintelligible, please re-send

You: _

Two turns used, loop halted cleanly, no /goal clear needed. Same input pre-fix burned all 20 turns before auto-pausing on budget.

Checklist

  • Conventional Commits (fix(goal):, test(goal):, docs(goal):)
  • 4 focused commits, single author
  • 23 new tests pass; 74 existing goal-touching tests pass across hermes_cli / gateway / tui_gateway / cli (97 total)
  • Tested on macOS 15.6 (darwin 24.6.0)
  • Updated website/docs/user-guide/features/goals.md
  • No new config keys, no system-prompt mutation, no schema change, prompt-cache invariants preserved

xxxigm added 4 commits May 20, 2026 20:19
…h#29090)

The reproducer is ``/goal lsdjflasjdf;ljasdlfja;sldjfalsdjf``: a goal
with no verifiable success criterion.  The agent's first reply almost
always says "I don't understand this — please clarify" (in prose),
which a *strict* judge per the system prompt is supposed to treat as
DONE-with-reason-blocked.  Weak judge models routinely fail that rule
and hedge with ``continue``, so the loop spam-fires continuation
prompts at every turn until the 20-turn budget runs out — the exact
"keeps triggering until the maximum limit is reached" symptom on
Windows reported here (and the broader weak-judge class tracked in
NousResearch#27585).  Nothing about the failure mode is Windows-specific; the
reporter just happened to hit it there first.

The fix is a deterministic, model-independent stop path: the
continuation prompt now teaches the agent two terminal sentinels —

  <<HERMES_GOAL_DONE: one-sentence reason>>
  <<HERMES_GOAL_BLOCKED: one-sentence reason>>

— and ``evaluate_after_turn`` checks the assistant's last response
for one *before* calling the judge.  When the sentinel is on the
final non-blank line, the loop halts immediately with the agent's
own reason surfaced to the user (``✓ Goal achieved: …`` or ``✓ Goal
stopped (agent blocked): …``).  Anchoring on the last line keeps
prompt-echoing models from false-positive-stopping when they quote
the instruction back to themselves mid-reasoning.

Three small pieces wired together:

* ``GOAL_DONE_SENTINEL`` / ``GOAL_BLOCKED_SENTINEL`` constants +
  ``_STOP_SENTINEL_RE`` (case-insensitive, optional reason group so
  bare ``<<HERMES_GOAL_DONE>>`` still trips with a fallback reason).
* ``_detect_goal_stop_sentinel(response)`` returns ``(kind, reason)``
  only when the sentinel is the last non-blank line.
* ``evaluate_after_turn`` short-circuits to a "done" verdict on
  detection, resets the consecutive-parse-failures counter (sentinel
  emission means we got a real reply, so stale judge-parse failures
  must not auto-pause the next turn), and persists status="done".

The ``CONTINUATION_PROMPT_TEMPLATE`` and
``CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE`` both share a single
``_STOP_INSTRUCTION_LINE`` so the subgoals path stays in lock-step with
the plain path.  The judge
fail-open semantics, parse-failure auto-pause, turn-budget backstop,
and ``/goal pause`` / ``resume`` / ``clear`` controls are all
untouched — sentinel handling is purely additive.
NousResearch#29090)

Tightens ``_detect_goal_stop_sentinel`` from ``.search(last_line)`` to
``.fullmatch(stripped)``.  Without this guard, a one-line response that
quotes the sentinel inline while continuing to work — e.g.

  "Per the instructions, I should emit <<HERMES_GOAL_DONE: example>>
  when I finish.  Right now I'm still working — running tests next."

— would short-circuit the loop even though the agent explicitly said
it is still working.  The prompt contract is "the sentinel must be
the LAST non-blank line in your response", and ``fullmatch`` is what
actually enforces that contract: the entire final non-blank line,
after stripping surrounding whitespace, must be the sentinel and
nothing else.  Multi-line responses ending on a clean sentinel line
still trip — that's the supported shape — and bare
``<<HERMES_GOAL_DONE>>`` / ``<<HERMES_GOAL_BLOCKED>>`` (no reason
group) still match because the regex makes the reason group optional.
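A minimal illustration of why ``.search`` was too loose, using a
hypothetical regex of the same shape (the real ``_STOP_SENTINEL_RE``
may differ) and a lightly reworded version of the echo example above:

```python
import re

# Hypothetical regex; the point is search() versus fullmatch().
SENTINEL_RE = re.compile(
    r"<<\s*HERMES_GOAL_(DONE|BLOCKED)\s*(?::\s*(.*?))?\s*>>", re.IGNORECASE
)

# One-line reply that quotes the sentinel while still working.
ECHO = ("Per the instructions, I should emit <<HERMES_GOAL_DONE: example>> "
        "when I finish. Right now I'm still working, running tests next.")
# Supported shape: prose, then the bare sentinel alone on the last line.
CLEAN = "All tests pass.\n\n<<HERMES_GOAL_DONE>>\n"


def last_non_blank(text: str) -> str:
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def old_detect(text: str) -> bool:
    # Before: a sentinel anywhere on the last line counts.
    return SENTINEL_RE.search(last_non_blank(text)) is not None


def new_detect(text: str) -> bool:
    # After: the whole stripped last line must be the sentinel.
    return SENTINEL_RE.fullmatch(last_non_blank(text).strip()) is not None


print(old_detect(ECHO), new_detect(ECHO))    # True False
print(old_detect(CLEAN), new_detect(CLEAN))  # True True
```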
…top path

Two new test classes plus an end-to-end repro of the exact scenario
from the bug report.

* ``TestStopSentinelDetection`` — unit-level coverage of
  ``_detect_goal_stop_sentinel``:
  - happy paths for DONE and BLOCKED, with and without a reason
    suffix, mixed case, extra whitespace, trailing newlines;
  - rejection of empty/``None`` input, of sentinel-shaped strings in
    prose context, and (the prompt-echo guard) of sentinels embedded
    inline with surrounding text on the final line;
  - ``next_continuation_prompt`` carries the sentinel teaching in
    both the plain and ``/subgoal`` paths so the agent actually
    knows the contract on every turn.

* ``TestEvaluateAfterTurnSentinel`` — end-to-end behaviour through
  ``GoalManager.evaluate_after_turn``:
  - DONE sentinel short-circuits the judge and lands ``status="done"``;
  - BLOCKED sentinel does the same with the user-visible message
    flagging "blocked" so the operator knows the agent wants input
    instead of misreading it as success;
  - the sentinel branch resets ``consecutive_parse_failures`` so a
    flaky judge that built up failures before the sentinel landed
    doesn't auto-pause the next ``/goal resume``;
  - non-sentinel responses still call the judge (regression guard
    for the existing path);
  - inline sentinels on a final line that also carries other text are
    NOT treated as stops; the judge still runs (the prompt-echo guard,
    end-to-end);
  - sentinels work alongside ``/subgoal`` criteria;
  - sentinels on an inactive goal are inert (the early-return on
    ``status != 'active'`` wins).

* ``TestRepro29090GibberishGoal`` — full two-turn reproduction of the
  reporter's exact scenario (``/goal lsdjflasjdf;ljasdlfja;sldjfalsdjf``):
  turn 1 sees a weak judge return "continue" so the continuation
  prompt fires; turn 2 sees the agent emit
  ``<<HERMES_GOAL_BLOCKED: …>>`` and the loop halts at 2 turns used
  instead of 20.  Asserts the continuation prompt carries the
  sentinel teaching (otherwise the agent couldn't emit it) and that
  the judge is never called on turn 2.
…ch#29090)

Adds an "Agent self-attestation (stop sentinels)" subsection to
``goals.md`` covering ``<<HERMES_GOAL_DONE: …>>`` and
``<<HERMES_GOAL_BLOCKED: …>>``: why they exist (gibberish or
otherwise unverifiable goals where weak judges hedge with
``continue`` and burn the turn budget), how to use them (must be
the entire final non-blank line of the reply), and what the user
sees on the surface (``✓ Goal achieved`` vs ``✓ Goal stopped
(agent blocked)``).  Updates the "When the judge gets it wrong"
section to point at the sentinel as the first line of defense
against false-negative judges, with the turn budget still backing
it up.

MKI13 commented May 20, 2026


Autofix update: I reproduced the Supply Chain Audit failure. The workflow scans base..head; this fork branch was stale, so that diff included unrelated install-hook files from main history such as setup.py and hermes_cli/setup.py.

I could not push directly to xxxigm:fix/29090-goal-autostop-windows with the available token, and the update-branch endpoint was not available to this account. I pushed the verified merge fix to MKI13:autofix/29318-merge-main and opened a PR into the contributor branch:

xxxigm#2

Verification on that branch:

  • Supply-chain diff after merge only includes hermes_cli/goals.py, tests/hermes_cli/test_goals.py, and website/docs/user-guide/features/goals.md.
  • No .pth/install-hook/base64+exec/obfuscated subprocess hits in the reproduced scan.
  • pytest tests/hermes_cli/test_goals.py: 73 passed.

alt-glitch added the P2 (Medium — degraded but workaround exists), type/bug (Something isn't working), and comp/cli (CLI entry point, hermes_cli/, setup wizard) labels on May 20, 2026