
fix(goals): stop /goal over-continuation on exploratory + scope-narrow goals (#34196, #34197) #34343

Open
Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/34196-34197-goal-judge-exploratory-and-untracked



Bartok9 commented May 29, 2026

Contributor

Closes #34196
Closes #34197

Problem

Two related /goal bugs reported in the last 6 hours:

#34196 — Goal judge over-continues exploratory goals (review/reflect/suggest) unless the assistant uses a magic phrase like 'goal complete'. The synthetic continuation loop escalates reflection into producing concrete artifacts that the goal only listed as examples of possible help.

#34197 — Goal judge infers 'goal incomplete' from untracked files in git status output, even when staging/commit/push wasn't requested. This races with preflight compression and survives session split, turning a scoped 'done' answer into out-of-scope artifact production.

Both bugs converge on the goal-judge prompt machinery in hermes_cli/goals.py.

Fix (layered + minimal)

  1. Tighten JUDGE_SYSTEM_PROMPT with explicit guardrails:

    • EXPLORATORY goals (review/reflect/suggest/analyze) are completable by a substantive synthesis — do NOT require additional artifacts.
    • Do NOT infer incompletion from untracked / unstaged / uncommitted files unless staging/commit/push was explicitly required.
    • Do NOT require a magic completion phrase.
    • Treat 'for example' / 'maybe' / 'you could' items as illustrative, NOT required deliverables.
    • Scope-narrow goals (one file, one section, one specific change) are DONE when that exact scope is confirmed done — do not expand.
  2. Add a transparent keyword classifier, _classify_goal_shape(goal), that returns exploratory, illustrative, or concrete. It is cheap, reviewer-friendly substring detection: the LLM judge still makes the final call, but it now sees what kind of goal it is.

  3. Append a corresponding goal-shape hint to the user-prompt template when the goal is exploratory or illustrative. Concrete goals get the original strict template unchanged.

  4. The with-subgoals template skips the shape hint: the user's explicit /subgoal criteria take precedence over goal-shape heuristics.
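The classifier in step 2 can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the function name classify_goal_shape, the keyword tuples, and the rule that an exploratory match wins over an illustrative one are all assumptions chosen for illustration.

```python
# Hypothetical sketch of a keyword-based goal-shape classifier.
# The keyword lists and precedence are assumptions, not the PR's actual values.

# Verbs suggesting the goal is satisfied by a written synthesis.
EXPLORATORY_KEYWORDS = ("review", "reflect", "suggest", "analyze", "analyse")

# Hedging phrases that mark listed items as examples, not deliverables.
ILLUSTRATIVE_KEYWORDS = ("for example", "e.g.", "maybe", "you could")


def classify_goal_shape(goal: str) -> str:
    """Return 'exploratory', 'illustrative', or 'concrete' for a goal string.

    Case-insensitive substring matching: cheap and easy to audit.
    The LLM judge still makes the final DONE/CONTINUE decision.
    """
    text = goal.lower()
    if any(word in text for word in EXPLORATORY_KEYWORDS):
        return "exploratory"
    if any(phrase in text for phrase in ILLUSTRATIVE_KEYWORDS):
        return "illustrative"
    return "concrete"


print(classify_goal_shape("Review the auth module and suggest improvements"))
print(classify_goal_shape("Add a --verbose flag; for example, print each step"))
print(classify_goal_shape("Fix the typo in README.md"))
```

Plain substring matching has obvious false positives ("preview" contains "review"). That is tolerable in this design because the shape only adds a hint to the prompt and the LLM judge still decides.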

Why prompt-level vs new state

This intentionally avoids new GoalState fields, new gateway plumbing, or new compression-lifecycle coupling. The judge is the single point where 'should we continue?' is decided; teaching it to read goal shape is the smallest change that addresses both issues' root cause.

#34197's lifecycle concerns (#2-#6 in the proposed fixes) are real and documented but belong in a separate gateway-side change. This PR fixes the bad 'continue' verdict that triggers the lifecycle problem in the first place — without the bad verdict, the race window in #34197 has nothing to race over.

Tests

17 new tests, all 67 in test_goals.py pass:

  • TestClassifyGoalShape (9 tests) — exploratory/illustrative/concrete classification including the sanitized #34196 repro goal text.
  • TestJudgeSystemPromptGuardrails (4 tests) — system prompt mentions exploratory goals, warns about untracked files, warns about magic phrases, warns about illustrative examples.
  • TestJudgePromptIncludesGoalShapeHint (4 tests) — the user prompt gets the shape hint for exploratory/illustrative goals, NOT for concrete goals, and the with-subgoals template skips the hint to preserve its strict per-criterion evidence rule.
$ python -m pytest tests/hermes_cli/test_goals.py -v
=== 67 passed in 0.77s ===

Files changed

| File | Change |
|---|---|
| hermes_cli/goals.py | +130 / -3 (system prompt + classifier + hint wiring) |
| tests/hermes_cli/test_goals.py | +220 / 0 (3 new test classes, 17 new tests) |

🎻 Co-authored-by: Cursor cursoragent@cursor.com

alt-glitch added the labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), and comp/cli (CLI entry point, hermes_cli/, setup wizard) on May 29, 2026
…w goals (NousResearch#34196, NousResearch#34197)

Two related /goal bugs:

- NousResearch#34196: the goal judge over-continues exploratory goals
  (review/reflect/suggest/analyze) unless the assistant uses a magic
  phrase like 'goal complete'. The synthetic continuation loop escalates
  reflection into producing concrete artifacts that the goal only listed
  as *examples* of possible help.

- NousResearch#34197: the goal judge infers 'goal incomplete' from `git status:
  untracked` even when the user did not ask for staging/commit/push.
  This races with preflight compression and survives session split,
  turning a scoped 'done' answer into out-of-scope artifact production.

Both bugs converge on the goal-judge prompt machinery in
`hermes_cli/goals.py`. The fix is layered, minimal, and reviewable:

1. Tighten JUDGE_SYSTEM_PROMPT with new explicit guardrails:
   - EXPLORATORY goals (review/reflect/suggest/analyze) are completable
     by a substantive synthesis — do NOT require additional artifacts.
   - Do NOT infer incompletion from untracked / unstaged / uncommitted
     files unless the goal explicitly required staging/commit/push.
   - Do NOT require a magic phrase like 'goal complete'.
   - Treat 'for example' / 'maybe' / 'you could' items as illustrative,
     NOT as required deliverables.
   - Scope-narrow goals (one file, one section, one specific change)
     are DONE when that exact scope is confirmed done — do not expand.

2. Add a transparent keyword classifier `_classify_goal_shape(goal)`
   that returns 'exploratory', 'illustrative', or 'concrete'. Cheap,
   reviewer-friendly substring detection — the LLM judge still makes
   the final DONE/CONTINUE call, but it now sees what kind of goal it
   is. Kept intentionally simple so behaviour is easy to audit and
   tune from issue feedback.

3. Append a corresponding goal-shape hint to the user-prompt template
   when the goal is exploratory or illustrative. The hint reminds the
   judge that for those shapes, a high-quality synthesis IS the
   deliverable. Concrete goals get the original strict template
   unchanged.

4. The with-subgoals template (already enforces strict per-criterion
   evidence) deliberately does NOT receive the shape hint — the
   user's explicit /subgoal criteria take precedence over goal-shape
   heuristics.
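Steps 3 and 4 amount to a small branch in prompt assembly. The sketch below is hypothetical: the builder name build_judge_user_prompt, the SHAPE_HINTS table, and all template wording are assumptions for illustration and do not come from hermes_cli/goals.py. It shows the intended precedence: subgoals first, then a shape hint only for exploratory or illustrative goals.

```python
from typing import Optional, Sequence

# Hypothetical hint wording; the PR's real text may differ.
SHAPE_HINTS = {
    "exploratory": (
        "Goal shape: EXPLORATORY. A substantive synthesis IS the deliverable; "
        "do not require extra artifacts."
    ),
    "illustrative": (
        "Goal shape: ILLUSTRATIVE. Items introduced with 'for example', 'maybe', "
        "or 'you could' are suggestions, not required deliverables."
    ),
}


def build_judge_user_prompt(
    goal: str,
    last_response: str,
    shape: str,
    subgoals: Optional[Sequence[str]] = None,
) -> str:
    """Assemble the judge's user prompt (hypothetical template)."""
    if subgoals:
        # Explicit /subgoal criteria win: strict per-criterion evidence, no hint.
        criteria = "\n".join(f"- {c}" for c in subgoals)
        return (
            f"Goal: {goal}\nCriteria:\n{criteria}\n\n"
            f"Latest response:\n{last_response}\n\n"
            "Answer DONE only if every criterion has explicit evidence."
        )
    prompt = (
        f"Goal: {goal}\n\nLatest response:\n{last_response}\n\n"
        "Answer DONE or CONTINUE."
    )
    hint = SHAPE_HINTS.get(shape)  # concrete goals have no entry
    if hint is not None:
        prompt += "\n\n" + hint
    return prompt
```

Keeping the concrete path byte-for-byte identical to the strict template means existing judge behaviour for concrete goals does not change; only the two softer shapes gain extra text.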

Why prompt-level vs adding new state:

This intentionally avoids adding new GoalState fields, new gateway
plumbing, or new compression-lifecycle coupling. The goal judge is the
single point where 'should we continue?' is decided; teaching it to
read goal shape correctly is the smallest change that addresses both
issues' root cause without touching the compression race window
described in NousResearch#34197 #2-#5. Those lifecycle concerns are real and
documented in the issue's 'Proposed fixes #3-#6' — they belong in a
separate gateway-side change. This PR fixes the judge's bad
'continue' verdict that triggers the lifecycle problem in the first
place. Without the bad verdict, the race window in NousResearch#34197 has nothing
to race over.

Tests (17 new, all 67 in test_goals.py pass):
- TestClassifyGoalShape (9 tests): exploratory/illustrative/concrete
  classification including the sanitized NousResearch#34196 repro goal text.
- TestJudgeSystemPromptGuardrails (4 tests): system prompt mentions
  exploratory goals, warns about untracked files, warns about
  requiring magic phrases, warns about illustrative examples.
- TestJudgePromptIncludesGoalShapeHint (4 tests): the user prompt
  receives the shape hint for exploratory/illustrative goals, does
  NOT for concrete goals, and the with-subgoals template skips the
  hint to preserve its strict per-criterion evidence rule.

Refs: NousResearch#34196 NousResearch#34197
Closes: NousResearch#34196 NousResearch#34197

Co-authored-by: Cursor <cursoragent@cursor.com>
Bartok9 force-pushed the fix/34196-34197-goal-judge-exploratory-and-untracked branch from bba108a to d93a019 on June 6, 2026 07:29

Bartok9 commented Jun 6, 2026

Contributor Author

Rebased onto current main to clear the conflict.

Conflict & resolution:

  • hermes_cli/goals.py — single conflict in the __all__ export list. main added KANBAN_GOAL_CONTINUATION_TEMPLATE / KANBAN_GOAL_FINALIZE_TEMPLATE (kanban goal-loop work); this PR added JUDGE_SYSTEM_PROMPT. Kept all three — verified each symbol is actually defined in the module so no export dangles.
  • tests/hermes_cli/test_goals.py auto-merged cleanly (this PR's 210 lines of exploratory/scope-narrow judge tests appended without overlap).
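One way to repeat the "no export dangles" check from the conflict resolution is a static scan with the standard-library ast module. This is an assumed approach, not necessarily how the check was done here; dangling_exports is a hypothetical helper, and it only sees top-level bindings.

```python
import ast


def dangling_exports(source: str) -> list[str]:
    """Hypothetical helper: names in a module's __all__ that it never binds.

    Only top-level defs, classes, imports, and assignments are considered;
    names created dynamically (e.g. via globals()) are reported as missing.
    """
    tree = ast.parse(source)
    defined: set[str] = set()
    exported: list[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "from x import y as z" binds "z".
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    defined.add(target.id)
                    if target.id == "__all__":
                        # Assumes __all__ is a literal list or tuple of strings.
                        exported = list(ast.literal_eval(node.value))
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            defined.add(node.target.id)
    return [name for name in exported if name not in defined]
```

For example, a module whose __all__ lists JUDGE_SYSTEM_PROMPT and KANBAN_GOAL_FINALIZE_TEMPLATE but only assigns the first would return ["KANBAN_GOAL_FINALIZE_TEMPLATE"]. Parsing instead of importing avoids executing module side effects during the check.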

Verification:

  • pytest tests/hermes_cli/test_goals.py — 67 passed (full file, incl. this PR's new tests + main's kanban tests).
  • ruff check on both files — clean.
  • python -c 'import ast; ast.parse(...)' syntax check passed.
  • Now MERGEABLE.

