fix(goals): stop /goal over-continuation on exploratory + scope-narrow goals (#34196, #34197) by Bartok9 · Pull Request #34343 · NousResearch/hermes-agent

Bartok9 · 2026-05-29T04:03:54Z

Problem

Two related /goal bugs reported in the last 6 hours:

#34196 — Goal judge over-continues exploratory goals (review/reflect/suggest) unless the assistant uses a magic phrase like 'goal complete'. The synthetic continuation loop escalates reflection into producing concrete artifacts that the goal only listed as examples of possible help.

#34197 — Goal judge infers 'goal incomplete' from git status: untracked even when staging/commit/push wasn't requested. This races with preflight compression and survives session split, turning a scoped 'done' answer into out-of-scope artifact production.

Both bugs converge on the goal-judge prompt machinery in hermes_cli/goals.py.

Fix (layered + minimal)

1. Tighten JUDGE_SYSTEM_PROMPT with explicit guardrails:
   - EXPLORATORY goals (review/reflect/suggest/analyze) are completable by a substantive synthesis; do NOT require additional artifacts.
   - Do NOT infer incompletion from untracked / unstaged / uncommitted files unless staging/commit/push was explicitly required.
   - Do NOT require a magic completion phrase.
   - Treat 'for example' / 'maybe' / 'you could' items as illustrative, NOT required deliverables.
   - Scope-narrow goals (one file, one section, one specific change) are DONE when that exact scope is confirmed done; do not expand.
2. Add a transparent keyword classifier `_classify_goal_shape(goal)` that returns exploratory, illustrative, or concrete. It is cheap, reviewer-friendly substring detection: the LLM judge still makes the final call, but it now sees what kind of goal it is.
3. Append a corresponding goal-shape hint to the user-prompt template when the goal is exploratory or illustrative. Concrete goals get the original strict template unchanged.
4. The with-subgoals template skips the shape hint: the user's explicit /subgoal criteria take precedence over goal-shape heuristics.
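
For readers who want a feel for the classifier and hint wiring described in the list above, here is a minimal sketch. It is not the code from `hermes_cli/goals.py`: the function names `classify_goal_shape_sketch` and `build_judge_user_prompt_sketch`, the keyword lists, the hint wording, the prompt layout, and the choice to check exploratory keywords before illustrative markers are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical keyword lists, taken from the examples in the PR description.
# The lists actually used by the PR may differ.
EXPLORATORY_KEYWORDS = ("review", "reflect", "suggest", "analyze")
ILLUSTRATIVE_MARKERS = ("for example", "maybe", "you could")

# Hypothetical hint wording; concrete goals deliberately have no entry.
SHAPE_HINTS = {
    "exploratory": "Goal shape: exploratory. A substantive synthesis is the deliverable.",
    "illustrative": "Goal shape: illustrative. Example items are not required deliverables.",
}


def classify_goal_shape_sketch(goal: str) -> str:
    """Classify a goal by plain, case-insensitive substring detection."""
    text = goal.lower()
    # Assumption: exploratory wins when both kinds of keyword are present.
    if any(keyword in text for keyword in EXPLORATORY_KEYWORDS):
        return "exploratory"
    if any(marker in text for marker in ILLUSTRATIVE_MARKERS):
        return "illustrative"
    return "concrete"


def build_judge_user_prompt_sketch(
    goal: str,
    response: str,
    subgoals: Optional[Sequence[str]] = None,
) -> str:
    """Build a judge user prompt, appending a shape hint only when allowed."""
    prompt = f"Goal:\n{goal}\n\nLatest assistant response:\n{response}\n"
    if subgoals:
        # Explicit /subgoal criteria take precedence: no shape hint is added.
        criteria = "\n".join(f"- {item}" for item in subgoals)
        return prompt + f"\nSubgoal criteria:\n{criteria}\n"
    hint = SHAPE_HINTS.get(classify_goal_shape_sketch(goal))
    if hint is not None:
        prompt += f"\n{hint}\n"
    return prompt


if __name__ == "__main__":
    print(classify_goal_shape_sketch("Review the README and suggest improvements"))
    # prints: exploratory
    print(classify_goal_shape_sketch("Rename foo to bar in utils.py"))
    # prints: concrete
```

Substring matching is easy to audit but coarse: a goal containing "preview" would also match "review" in this sketch. That is acceptable only because the classifier feeds a hint to the judge and does not decide DONE/CONTINUE itself.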

Why prompt-level vs new state

This intentionally avoids new GoalState fields, new gateway plumbing, or new compression-lifecycle coupling. The judge is the single point where 'should we continue?' is decided; teaching it to read goal shape is the smallest change that addresses both issues' root cause.

#34197's lifecycle concerns (#2-#6 in the proposed fixes) are real and documented but belong in a separate gateway-side change. This PR fixes the bad 'continue' verdict that triggers the lifecycle problem in the first place — without the bad verdict, the race window in #34197 has nothing to race over.

Tests

17 new tests, all 67 in test_goals.py pass:

- TestClassifyGoalShape (9 tests): exploratory/illustrative/concrete classification, including the sanitized #34196 repro goal text.
- TestJudgeSystemPromptGuardrails (4 tests): the system prompt mentions exploratory goals, warns about untracked files, warns about magic phrases, and warns about illustrative examples.
- TestJudgePromptIncludesGoalShapeHint (4 tests): the user prompt gets the shape hint for exploratory/illustrative goals, NOT for concrete goals, and the with-subgoals template skips the hint to preserve its strict per-criterion evidence rule.

$ python -m pytest tests/hermes_cli/test_goals.py -v
=== 67 passed in 0.77s ===

Files changed

| File | Change |
| --- | --- |
| `hermes_cli/goals.py` | +130 / -3 (system prompt + classifier + hint wiring) |
| `tests/hermes_cli/test_goals.py` | +220 / -0 (3 new test classes, 17 new tests) |

🎻 Co-authored-by: Cursor cursoragent@cursor.com

…w goals (NousResearch#34196, NousResearch#34197)

Two related /goal bugs:

- NousResearch#34196: the goal judge over-continues exploratory goals (review/reflect/suggest/analyze) unless the assistant uses a magic phrase like 'goal complete'. The synthetic continuation loop escalates reflection into producing concrete artifacts that the goal only listed as *examples* of possible help.
- NousResearch#34197: the goal judge infers 'goal incomplete' from `git status: untracked` even when the user did not ask for staging/commit/push. This races with preflight compression and survives session split, turning a scoped 'done' answer into out-of-scope artifact production.

Both bugs converge on the goal-judge prompt machinery in `hermes_cli/goals.py`.

The fix is layered, minimal, and reviewable:

1. Tighten JUDGE_SYSTEM_PROMPT with new explicit guardrails:
   - EXPLORATORY goals (review/reflect/suggest/analyze) are completable by a substantive synthesis; do NOT require additional artifacts.
   - Do NOT infer incompletion from untracked / unstaged / uncommitted files unless the goal explicitly required staging/commit/push.
   - Do NOT require a magic phrase like 'goal complete'.
   - Treat 'for example' / 'maybe' / 'you could' items as illustrative, NOT as required deliverables.
   - Scope-narrow goals (one file, one section, one specific change) are DONE when that exact scope is confirmed done; do not expand.
2. Add a transparent keyword classifier `_classify_goal_shape(goal)` that returns 'exploratory', 'illustrative', or 'concrete'. It is cheap, reviewer-friendly substring detection: the LLM judge still makes the final DONE/CONTINUE call, but it now sees what kind of goal it is. Kept intentionally simple so behaviour is easy to audit and tune from issue feedback.
3. Append a corresponding goal-shape hint to the user-prompt template when the goal is exploratory or illustrative. The hint reminds the judge that for those shapes, a high-quality synthesis IS the deliverable. Concrete goals get the original strict template unchanged.
4. The with-subgoals template (which already enforces strict per-criterion evidence) deliberately does NOT receive the shape hint: the user's explicit /subgoal criteria take precedence over goal-shape heuristics.

Why prompt-level vs adding new state:

This intentionally avoids adding new GoalState fields, new gateway plumbing, or new compression-lifecycle coupling. The goal judge is the single point where 'should we continue?' is decided; teaching it to read goal shape correctly is the smallest change that addresses both issues' root cause without touching the compression race window described in NousResearch#34197 #2-#5.

Those lifecycle concerns are real and documented in the issue's 'Proposed fixes #3-#6'; they belong in a separate gateway-side change. This PR fixes the judge's bad 'continue' verdict that triggers the lifecycle problem in the first place. Without the bad verdict, the race window in NousResearch#34197 has nothing to race over.

Tests (17 new, all 67 in test_goals.py pass):

- TestClassifyGoalShape (9 tests): exploratory/illustrative/concrete classification including the sanitized NousResearch#34196 repro goal text.
- TestJudgeSystemPromptGuardrails (4 tests): system prompt mentions exploratory goals, warns about untracked files, warns about requiring magic phrases, warns about illustrative examples.
- TestJudgePromptIncludesGoalShapeHint (4 tests): the user prompt receives the shape hint for exploratory/illustrative goals, does NOT for concrete goals, and the with-subgoals template skips the hint to preserve its strict per-criterion evidence rule.

Refs: NousResearch#34196 NousResearch#34197
Closes: NousResearch#34196 NousResearch#34197
Co-authored-by: Cursor <cursoragent@cursor.com>

Bartok9 · 2026-06-06T07:29:44Z

Rebased onto current main to clear the conflict.

Conflict & resolution:

hermes_cli/goals.py — single conflict in the __all__ export list. main added KANBAN_GOAL_CONTINUATION_TEMPLATE / KANBAN_GOAL_FINALIZE_TEMPLATE (kanban goal-loop work); this PR added JUDGE_SYSTEM_PROMPT. Kept all three — verified each symbol is actually defined in the module so no export dangles.
tests/hermes_cli/test_goals.py auto-merged cleanly (this PR's 210 lines of exploratory/scope-narrow judge tests appended without overlap).

Verification:

pytest tests/hermes_cli/test_goals.py — 67 passed (full file, incl. this PR's new tests + main's kanban tests).
ruff check on both files — clean.
python -c 'import ast; ast.parse(...)' syntax check passed.
Now MERGEABLE.
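
The "no export dangles" check mentioned in the conflict resolution can be approximated with the standard `ast` module. The sketch below is one way to do it, not necessarily how it was done for this rebase. The helper name `dangling_exports` is hypothetical, and it only understands a literal `__all__ = [...]` assignment plus top-level definitions, assignments, and imports.

```python
import ast


def dangling_exports(source: str) -> list[str]:
    """Return names listed in __all__ that are never bound at module top level."""
    tree = ast.parse(source)  # raises SyntaxError if the source is invalid
    exported: list[str] = []
    defined: set[str] = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, (ast.Assign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                if not isinstance(target, ast.Name):
                    continue  # tuple unpacking and attributes are ignored here
                if target.id == "__all__" and node.value is not None:
                    # Only a literal list/tuple of strings is supported.
                    exported = list(ast.literal_eval(node.value))
                else:
                    defined.add(target.id)
    return [name for name in exported if name not in defined]


if __name__ == "__main__":
    sample = (
        'JUDGE_SYSTEM_PROMPT = "..."\n'
        'KANBAN_GOAL_CONTINUATION_TEMPLATE = "..."\n'
        '__all__ = [\n'
        '    "JUDGE_SYSTEM_PROMPT",\n'
        '    "KANBAN_GOAL_CONTINUATION_TEMPLATE",\n'
        '    "KANBAN_GOAL_FINALIZE_TEMPLATE",\n'
        ']\n'
    )
    print(dangling_exports(sample))
    # prints: ['KANBAN_GOAL_FINALIZE_TEMPLATE']
```

Because the helper calls `ast.parse` first, it also doubles as a syntax check: invalid source raises `SyntaxError` before any export is inspected.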

alt-glitch added labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), and comp/cli (CLI entry point, hermes_cli/, setup wizard) on May 29, 2026

Bartok9 force-pushed the fix/34196-34197-goal-judge-exploratory-and-untracked branch from bba108a to d93a019 on June 6, 2026 07:29

Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/34196-34197-goal-judge-exploratory-and-untracked
