fix(goal): stop the /goal loop when the agent says it is done (#29090)#29318
Open
xxxigm wants to merge 4 commits into
Open
fix(goal): stop the /goal loop when the agent says it is done (#29090)#29318xxxigm wants to merge 4 commits into
xxxigm wants to merge 4 commits into
Conversation
…h#29090) The reproducer is ``/goal lsdjflasjdf;ljasdlfja;sldjfalsdjf``: a goal with no verifiable success criterion. The agent's first reply almost always says "I don't understand this — please clarify" (in prose), which a *strict* judge per the system prompt is supposed to treat as DONE-with-reason-blocked. Weak judge models routinely fail that rule and hedge with ``continue``, so the loop spam-fires continuation prompts at every turn until the 20-turn budget runs out — the exact "keeps triggering until the maximum limit is reached" symptom on Windows reported here (and the broader weak-judge class tracked in NousResearch#27585). Nothing about the failure mode is Windows-specific; the reporter just happened to hit it there first. The fix is a deterministic, model-independent stop path: the continuation prompt now teaches the agent two terminal sentinels — <<HERMES_GOAL_DONE: one-sentence reason>> <<HERMES_GOAL_BLOCKED: one-sentence reason>> — and ``evaluate_after_turn`` checks the assistant's last response for one *before* calling the judge. When the sentinel is on the final non-blank line, the loop halts immediately with the agent's own reason surfaced to the user (``✓ Goal achieved: …`` or ``✓ Goal stopped (agent blocked): …``). Anchoring on the last line keeps prompt-echoing models from false-positive-stopping when they quote the instruction back to themselves mid-reasoning. Three small pieces wired together: * ``GOAL_DONE_SENTINEL`` / ``GOAL_BLOCKED_SENTINEL`` constants + ``_STOP_SENTINEL_RE`` (case-insensitive, optional reason group so bare ``<<HERMES_GOAL_DONE>>`` still trips with a fallback reason). * ``_detect_goal_stop_sentinel(response)`` returns ``(kind, reason)`` only when the sentinel is the last non-blank line. * ``evaluate_after_turn`` short-circuits to a "done" verdict on detection, resets the consecutive-parse-failures counter (sentinel emission means we got a real reply, so stale judge-parse failures must not auto-pause the next turn), and persists status="done". The CONTINUATION_PROMPT_TEMPLATE and CONTINUATION_PROMPT_WITH_ SUBGOALS_TEMPLATE both share a single ``_STOP_INSTRUCTION_LINE`` so the subgoals path stays in lock-step with the plain path. The judge fail-open semantics, parse-failure auto-pause, turn-budget backstop, and ``/goal pause`` / ``resume`` / ``clear`` controls are all untouched — sentinel handling is purely additive.
NousResearch#29090) Tightens ``_detect_goal_stop_sentinel`` from ``.search(last_line)`` to ``.fullmatch(stripped)``. Without this guard, a one-line response that quotes the sentinel inline while continuing to work — e.g. "Per the instructions, I should emit <<HERMES_GOAL_DONE: example>> when I finish. Right now I'm still working — running tests next." — would short-circuit the loop even though the agent explicitly said they're still working. The prompt contract is "the sentinel must be the LAST non-blank line in your response", and ``fullmatch`` is what actually enforces that contract: the entire final non-blank line, after stripping surrounding whitespace, must be the sentinel and nothing else. Multi-line responses ending on a clean sentinel line still trip — that's the supported shape — and bare ``<<HERMES_GOAL_DONE>>`` / ``<<HERMES_GOAL_BLOCKED>>`` (no reason group) still match because the regex makes the reason group optional.
…top path
Two new test classes plus an end-to-end repro of the exact scenario
from the bug report.
* ``TestStopSentinelDetection`` — unit-level coverage of
``_detect_goal_stop_sentinel``:
- happy paths for DONE and BLOCKED, with and without a reason
suffix, mixed case, extra whitespace, trailing newlines;
- rejection of empty/``None`` input, of sentinel-shaped strings in
prose context, and (the prompt-echo guard) of sentinels embedded
inline with surrounding text on the final line;
- ``next_continuation_prompt`` carries the sentinel teaching in
both the plain and ``/subgoal`` paths so the agent actually
knows the contract on every turn.
* ``TestEvaluateAfterTurnSentinel`` — end-to-end behaviour through
``GoalManager.evaluate_after_turn``:
- DONE sentinel short-circuits the judge and lands ``status="done"``;
- BLOCKED sentinel does the same with the user-visible message
flagging "blocked" so the operator knows the agent wants input
instead of misreading it as success;
- the sentinel branch resets ``consecutive_parse_failures`` so a
flaky judge that built up failures before the sentinel landed
doesn't auto-pause the next ``/goal resume``;
- non-sentinel responses still call the judge (regression guard
for the existing path);
- inline sentinels on a multi-text final line are NOT treated as
stops — the judge still runs (the prompt-echo guard,
end-to-end);
- sentinels work alongside ``/subgoal`` criteria;
- sentinels on an inactive goal are inert (the early-return on
``status != 'active'`` wins).
* ``TestRepro29090GibberishGoal`` — full two-turn reproduction of the
reporter's exact scenario (``/goal lsdjflasjdf;ljasdlfja;sldjfalsdjf``):
turn 1 sees a weak judge return "continue" so the continuation
prompt fires; turn 2 sees the agent emit
``<<HERMES_GOAL_BLOCKED: …>>`` and the loop halts at 2 turns used
instead of 20. Asserts the continuation prompt carries the
sentinel teaching (otherwise the agent couldn't emit it) and that
the judge is never called on turn 2.
…ch#29090) Adds an "Agent self-attestation (stop sentinels)" subsection to ``goals.md`` covering ``<<HERMES_GOAL_DONE: …>>`` and ``<<HERMES_GOAL_BLOCKED: …>>``: why they exist (gibberish or otherwise unverifiable goals where weak judges hedge with ``continue`` and burn the turn budget), how to use them (must be the entire final non-blank line of the reply), and what the user sees on the surface (``✓ Goal achieved`` vs ``✓ Goal stopped (agent blocked)``). Updates the "When the judge gets it wrong" section to point at the sentinel as the first line of defense against false-negative judges, with the turn budget still backing it up.
|
Autofix update: I reproduced the Supply Chain Audit failure. The workflow scans I could not push directly to Verification on that branch:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Closes the "[Bug]: /goal keeps triggering after the goal is completed" loop reported in #29090. The reporter typed
/goal lsdjflasjdf;ljasdlfja;sldjfalsdjfon Windows and watched↻ Continuing toward goal (N/20)fire for the full turn budget despite the agent itself stating the task was complete.Root cause is not Windows-specific — the reporter just hit it there first. For a gibberish or otherwise unverifiable goal, the agent's reply is almost always "I don't understand / can't proceed", which a strict judge per its system prompt is supposed to treat as
DONE-with-reason-blocked. Weak judge models (the cheap fastgoal_judgeoverrides users wire up) routinely fail that rule and hedge withcontinue, so the loop spam-queues continuation prompts at every turn until the 20-turn budget runs out. This is the same weak-judge class tracked in #27585.Fix is a deterministic, model-independent stop path — the agent itself attests "I'm done" via a sentinel and the loop halts before the judge runs. Three pieces wired together:
Continuation prompt teaches the contract —
CONTINUATION_PROMPT_TEMPLATEandCONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATEnow share a single_STOP_INSTRUCTION_LINEblock teaching the agent two terminal sentinels:With explicit semantics —
DONEfor "complete or trivially unachievable (typos, gibberish, contradictions)",BLOCKEDfor "I genuinely need user input to make progress".Detector —
_detect_goal_stop_sentinel(response)returns(kind, reason)when the sentinel is the entire final non-blank line of the reply (usingre.fullmatchafter stripping surrounding whitespace). Anchoring on the final-line + full-match shape keeps prompt-echoing models from false-positive-stopping when they quote the instruction back to themselves mid-reasoning, and it matches the prompt contract "the sentinel must be the LAST non-blank line in your response".evaluate_after_turnshort-circuit — checks the sentinel before callingjudge_goal, persistsstatus="done", resetsconsecutive_parse_failures(sentinel emission means we got a real reply, so stale judge-parse failures must not auto-pause the next/goal resume), and surfaces a clear user-visible message:✓ Goal achieved: <reason>forDONE✓ Goal stopped (agent blocked): <reason>forBLOCKED(so operators know the agent wants input rather than misreading it as success).Reproduction shape, post-fix: turn 1 the weak judge still hedges with
continue, so the continuation prompt fires; turn 2 the agent — now taught the sentinel by that very prompt — emits<<HERMES_GOAL_BLOCKED: …>>on its last line, the loop halts at 2 turns used instead of 20, and the judge never runs on turn 2. The reporter's exact scenario is locked in byTestRepro29090GibberishGoal.Judge fail-open semantics, the parse-failure auto-pause, the turn-budget backstop, and the
/goal pause/resume/clearcontrols are all untouched — sentinel handling is purely additive. Works identically on CLI and every gateway platform becausegoals.pyis shared.Related Issue
Fixes #29090
Type of Change
Changes Made
hermes_cli/goals.py— newGOAL_DONE_SENTINEL/GOAL_BLOCKED_SENTINELconstants,_STOP_INSTRUCTION_LINEshared between both continuation-prompt templates,_STOP_SENTINEL_RE(case-insensitive, optional reason group),_detect_goal_stop_sentinel(response)helper anchored on the final non-blank line withfullmatchafter whitespace strip, and the short-circuit branch at the top ofevaluate_after_turnthat landsstatus="done"and resets the parse-failure counter.tests/hermes_cli/test_goals.py— 23 new tests across three classes:TestStopSentinelDetection— unit-level happy paths (DONE/BLOCKED with and without reason, mixed case, whitespace, trailing newlines) and rejections (None/empty, prose-only, mid-line echoes, inline echoes on the final line), plus the continuation prompt carries the sentinel teaching in both the plain and/subgoalpaths.TestEvaluateAfterTurnSentinel— end-to-end throughGoalManager.evaluate_after_turn: DONE short-circuits the judge, BLOCKED surfaces the right message,consecutive_parse_failuresresets, non-sentinel responses still call the judge (regression guard), inline sentinels on a one-line response do NOT short-circuit (prompt-echo guard end-to-end), works alongside/subgoalcriteria, inert on inactive goals.TestRepro29090GibberishGoal— full two-turn reproduction of the reporter's exact scenario (/goal lsdjflasjdf;ljasdlfja;sldjfalsdjf): turn 1 weak judge sayscontinueand queues the continuation, turn 2 agent emits<<HERMES_GOAL_BLOCKED: …>>and the loop halts at 2 turns used (turns_used < max_turns) without consulting the judge.website/docs/user-guide/features/goals.md— new "Agent self-attestation (stop sentinels)" subsection explaining when and why the sentinels exist, the exact wire format, and how the surface message differs betweenDONEandBLOCKED; "When the judge gets it wrong" now points at the sentinel as the first line of defense against false-negative judges (turn budget still backs it up).Backwards compatible:
_stateschema unchanged, no new config keys, no system-prompt mutation, prompt-cache invariants preserved (the sentinel teaching lives in the user-role continuation prompt, not the system prompt). All existing goal tests across CLI / TUI / gateway continue to pass unchanged.How to Test
Manual / behavioural verification with the reporter's input:
Two turns used, loop halted cleanly, no
/goal clearneeded. Same input pre-fix burned all 20 turns before auto-pausing on budget.Checklist
fix(goal):,test(goal):,docs(goal):)hermes_cli/gateway/tui_gateway/cli(97 total)website/docs/user-guide/features/goals.md