/goal judge over-continues exploratory goals unless the assistant explicitly says the goal is complete

## Summary

A Discord `/goal` with an exploratory objective produced a valid synthesis/proposal answer, but the goal judge repeatedly returned `continue` because the assistant did not explicitly state that the goal was complete.

The synthetic continuation loop escalated the task from “review context and reflect on ways to help” into producing multiple concrete artifacts that the user had only mentioned as examples of possible help.

The issue is that `/goal` appears too dependent on explicit completion phrasing. For exploratory goals such as “review / reflect / suggest options”, a high-quality synthesis can satisfy the goal even without a magic phrase like “goal complete”.

## Expected behavior

For exploratory goals, `/goal` should stop when the assistant has reasonably completed the review and produced actionable recommendations, unless the original goal clearly requires concrete follow-up artifacts.

A response should not be judged incomplete solely because it lacks explicit wording such as “the goal is complete”.
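
The rule above can be sketched as a small decision function. This is a minimal illustration only: `JudgeSignals`, its fields, and `verdict` are hypothetical names, not part of Hermes, and a real judge would presumably derive these signals with a model rather than receive them as booleans. The verdict strings `done` and `continue` mirror the ones in the logs below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeSignals:
    """Hypothetical features a judge might extract from the goal and the latest response."""

    goal_requires_artifacts: bool         # the goal clearly demands concrete deliverables
    review_completed: bool                # the requested context was reasonably covered
    has_actionable_recommendations: bool  # the response lists usable next steps
    artifacts_delivered: bool             # required deliverables were actually produced
    says_goal_complete: bool              # explicit "goal complete" wording (unused on purpose)


def verdict(s: JudgeSignals) -> str:
    """Return "done" or "continue" without consulting explicit completion wording."""
    if s.goal_requires_artifacts:
        # Only goals that clearly require deliverables wait for them.
        return "done" if s.artifacts_delivered else "continue"
    # Exploratory goals: a finished review plus recommendations is enough.
    if s.review_completed and s.has_actionable_recommendations:
        return "done"
    return "continue"


# A sufficient synthesis with no completion phrase is still judged done.
synthesis = JudgeSignals(
    goal_requires_artifacts=False,
    review_completed=True,
    has_actionable_recommendations=True,
    artifacts_delivered=False,
    says_goal_complete=False,
)
print(verdict(synthesis))
```

The point of the sketch is that `says_goal_complete` never enters the decision, so the presence or absence of a completion phrase cannot change the outcome on its own.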

## Actual behavior

1. User set a broad exploratory goal:
   - review recent tasks from two work domains;
   - include overdue/today/tomorrow items and task notes;
   - review related chat context;
   - review the user's same-day notes/journal context;
   - reflect on ways the assistant could help;
   - examples included writing tickets, preparing messages, preparing an investigation, or maybe creating Kanban activities.
2. The assistant reviewed the context and produced a large synthesis with a menu of possible ways to help.
3. The goal judge returned `continue` because the answer did not explicitly confirm that the review was complete.
4. Hermes injected repeated synthetic messages:

```text
[Continuing toward your standing goal]
Goal: <original exploratory goal>

Continue working toward this goal. Take the next concrete step. If you believe the goal is complete, state so explicitly and stop.
```

5. The assistant escalated from reflection/proposal into producing multiple concrete deliverables:
   - ticket drafts;
   - dependency/blocker wording;
   - handoff text;
   - messages to third parties;
   - technical brief material.
6. The loop only stopped once the assistant explicitly wrote that the goal was complete.

## Sanitized evidence

Initial user goal shape:

```text
18:28:51 inbound message: msg='observe tasks from two work domains ... overdue/today/tomorrow tasks ... notes ... chat context ... same-day notes/journal context ... reflect on ways you can help, for example writing tickets, preparing a message, or preparing an investigation ... maybe create kanban activities ... or maybe resolve differently.'
```

First sufficiently complete answer:

```text
18:35:21 Turn ended: reason=text_response(finish_reason=stop)
         response_len=10022
```

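Goal judge then continued. The judge's reason is in Portuguese and truncated in the log; the visible part translates roughly to "The response shows analysis and proposals, but does not explicitly confirm that the full sweep was completed":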

```text
18:35:22 hermes_cli.goals: goal judge: verdict=continue reason=A resposta mostra análise e propostas, mas não confirma explicitamente que a varredura completa foi concluída nem entreg… [truncated]
18:35:28 gateway.run: inbound message: msg='[Continuing toward your standing goal] Goal: observe tasks ...'
```

Repeated continuation pattern:

```text
18:36:16 goal judge: verdict=continue ...
18:36:17 inbound synthetic continuation ...

18:36:44 goal judge: verdict=continue ...
18:36:44 inbound synthetic continuation ...

18:37:30 goal judge: verdict=continue ...
18:37:32 inbound synthetic continuation ...

18:38:17 goal judge: verdict=continue ...
18:38:23 inbound synthetic continuation ...

18:39:22 goal judge: verdict=continue ...
18:39:29 inbound synthetic continuation ...
```

Loop stops only after explicit completion wording:

```text
18:39:38 Turn ended: reason=text_response(finish_reason=stop) response_len=822
18:39:39 hermes_cli.goals: goal judge: verdict=done reason=The agent explicitly states the goal is complete and lists the deliverables produced, so the goal is satisfied.
```

## Why this matters

This can turn planning/reflection goals into unwanted execution. The user may ask the assistant to inspect context and suggest possible help, but `/goal` can keep pushing until the assistant manufactures additional deliverables.

That is especially risky when the “possible help” examples include sending messages, editing files, creating tickets, or performing investigations.

## Suspected cause

The goal judge appears to weight explicit completion language more heavily than the semantic sufficiency of the response.

This creates a bad incentive: unless the assistant says “the goal is complete”, the loop may keep running even when the useful answer has already been delivered.

## Proposed fixes / invariants

1. Treat exploratory/review/proposal goals as completable by a sufficient synthesis, even without explicit “goal complete” wording.
2. Distinguish examples of possible next actions from required deliverables.
3. Add goal-judge guidance such as:
   - if the user asked to “review/reflect/suggest”, a concrete recommendation list can satisfy the goal;
   - do not require producing every artifact mentioned as an example;
   - do not continue merely to force an explicit completion phrase.
4. Consider making the continuation prompt safer for exploratory goals:
   - “If the previous answer substantially satisfied the review/proposal request, mark complete instead of producing extra artifacts.”
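
Fixes 3 and 4 could be wired in roughly as follows. This is a sketch under stated assumptions: the keyword heuristic in `is_exploratory` is a stand-in (a real classifier would more likely be a model call), and `build_continuation_prompt` and `build_judge_guidance` are hypothetical names rather than existing Hermes functions. The base continuation text is copied from the synthetic message quoted earlier in this issue.

```python
import re

# Hypothetical keyword heuristic for spotting exploratory goals.
_EXPLORATORY_RE = re.compile(
    r"\b(?:review|reflect|suggest|observ|propos|brainstorm)\w*", re.IGNORECASE
)

# Continuation text as it appears in the synthetic message quoted in this issue.
BASE_CONTINUATION = (
    "Continue working toward this goal. Take the next concrete step. "
    "If you believe the goal is complete, state so explicitly and stop."
)

# Extra sentence proposed in fix 4.
EXPLORATORY_ADDENDUM = (
    "If the previous answer substantially satisfied the review/proposal request, "
    "mark complete instead of producing extra artifacts."
)

# Judge guidance proposed in fix 3.
JUDGE_GUIDANCE = (
    "If the user asked to review/reflect/suggest, a concrete recommendation list can satisfy the goal.",
    "Do not require producing every artifact mentioned as an example.",
    "Do not continue merely to force an explicit completion phrase.",
)


def is_exploratory(goal: str) -> bool:
    """Guess whether a goal asks for review/reflection rather than execution."""
    return bool(_EXPLORATORY_RE.search(goal))


def build_continuation_prompt(goal: str) -> str:
    """Build the synthetic continuation message, softened for exploratory goals."""
    lines = ["[Continuing toward your standing goal]", f"Goal: {goal}", "", BASE_CONTINUATION]
    if is_exploratory(goal):
        lines.append(EXPLORATORY_ADDENDUM)
    return "\n".join(lines)


def build_judge_guidance(goal: str) -> list[str]:
    """Return extra judge instructions that apply only to exploratory goals."""
    return list(JUDGE_GUIDANCE) if is_exploratory(goal) else []


print(build_continuation_prompt("review recent tasks and reflect on ways to help"))
```

Keeping the addendum conditional leaves the existing prompt unchanged for execution-style goals, where pushing toward the next concrete step is the desired behavior.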

## Related issues

- #26986 — keep persistent goals active when the response explicitly reports incomplete work. This issue is the inverse: do not keep goals active when the response functionally completed an exploratory goal.
- #27585 — `/goal` can spam repeated completion messages when judge errors fail-open to continue. Related because both involve continuation after terminal-ish answers.
- #28649 — gateway `/goal` continuation loop behavior on Telegram/Discord.
- #18467 / #33618 — `/goal` state and session-id/compression lifecycle. Related but not required to reproduce this case.

