/goal auto-continuation can be amplified by preflight compression/session split and resurrect stale task state #34197

## Summary

A Discord `/goal` task completed a narrowly scoped file-editing request and produced a final answer stating that the requested scope was done. Immediately afterward, the goal judge returned `continue`. The resulting synthetic continuation raced with automatic skill-library maintenance and with preflight context compression/session split.

The agent then resumed work beyond the explicit scope and edited additional content, and a later compression preserved a stale active task list for that extra work. The user had to interrupt and ask why the agent was still working.

This appears to be a lifecycle/compression amplification of a `/goal` judge false continuation: the judge made a bad `continue` decision, and compression/session split made the stale continuation state durable.

## Expected behavior

1. If the assistant has completed the scoped user request and the todo list is complete, `/goal` should not synthesize a new continuation unless there is an explicit remaining requirement.
2. A file being `untracked` in git should not imply the goal is incomplete unless the user asked for staging, a commit, a push, or a clean working tree.
3. Preflight compression/session split should not preserve a synthetic continuation or active task list that contradicts the latest completed final answer.
4. Background maintenance turns should not race with goal continuation/compression against the same session lineage.

## Actual behavior

1. User asked the agent to:
   - modify a documentation file in the repo;
   - add a new empty column to story tables;
   - inspect and annotate only one specified section;
   - not modify the outbox copy;
   - not fix implementation, only annotate.
2. The agent completed exactly that scope and sent a final answer:
   - repo file modified;
   - outbox not modified;
   - only the requested section annotated;
   - checks passed;
   - todo list fully completed.
3. Immediately after the final response, Hermes started an automatic background skill-library maintenance turn.
4. That maintenance turn triggered preflight compression.
5. The goal judge decided `continue`, apparently because the repo file was still `untracked`, although commit/staging was not part of the user request.
6. Gateway injected a synthetic `[Continuing toward your standing goal]` message.
7. A second preflight compression started for this continuation while the previous compression was already in flight.
8. The compressed handoff summary said the active task was `None`, but the synthetic continuation still caused the agent to keep working.
9. The agent annotated additional sections beyond the requested one.
10. Later compression/session split preserved an active task list for this extra out-of-scope work.
11. The user interrupted and the assistant acknowledged it had continued beyond scope.

## Sanitized timeline

### 1. Scoped request completed

```text
20:31:57 [S1] Turn ended: reason=text_response(finish_reason=stop)
         api_calls=18/60 budget=14/60 tool_turns=29
         last_msg_role=assistant response_len=1504 session=S1
```

Final answer shape:

```text
Done in the repo file.
The outbox file was not changed.

What was modified:
- Added the new column to the story tables.
- Kept the column empty in other sections.
- Filled/annotated only the requested section.

Checks:
- All rows have the expected columns.
- Only the requested rows have annotations.
- Remaining rows are empty.
- git diff --check passed.
- Git status: file is untracked.
```

Todo state immediately before final answer:

```json
{
  "todos": [
    {"id": "edit-columns", "status": "completed"},
    {"id": "inspect-requested-section", "status": "completed"},
    {"id": "annotate-requested-section", "status": "completed"},
    {"id": "verify", "status": "completed"}
  ],
  "summary": {"pending": 0, "in_progress": 0, "completed": 4}
}
```
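The todo summary above already carries enough information for a cheap completion gate. The following is a minimal sketch, not Hermes code: `TodoSummary`, `todos_all_done`, `allow_continuation`, and the `explicit_remaining` list are hypothetical names introduced only for illustration. It shows how a `continue` verdict could be ignored when nothing is pending or in progress and no explicit requirement remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TodoSummary:
    """Mirror of the "summary" object in the todo state (hypothetical type)."""
    pending: int
    in_progress: int
    completed: int


def todos_all_done(summary: TodoSummary) -> bool:
    """True when nothing is pending or in progress and something was completed."""
    return summary.pending == 0 and summary.in_progress == 0 and summary.completed > 0


def allow_continuation(verdict: str, summary: TodoSummary, explicit_remaining: list) -> bool:
    """Hypothetical gate: honor "continue" only if work demonstrably remains."""
    if verdict != "continue":
        return False
    if todos_all_done(summary) and not explicit_remaining:
        # The judge wants to continue, but the todo state says everything is done
        # and no explicit user requirement is outstanding: do not queue a turn.
        return False
    return True
```

With the state from this report, `allow_continuation("continue", TodoSummary(0, 0, 4), [])` returns `False`, so no synthetic continuation would be queued.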

### 2. Background maintenance + compression begins

```text
20:31:57 [S1] conversation turn: msg='Review the conversation above and update the skill library. Be ACTIVE ...'
20:31:57 [S1] Preflight compression: ~128,217 tokens >= 122,400 threshold
20:31:57 [S1] context compression started: session=S1 messages=73 tokens=~128,217 model=gpt-5.5 focus=None
20:31:57 [S1] Auxiliary compression: using anthropic (claude-haiku-4-5-20251001)
```

### 3. Goal judge incorrectly continues

```text
20:31:58 hermes_cli.goals: goal judge: verdict=continue reason=The response indicates that the review of the requested section was done, but the repo file itself is still untracked, so there is no… [truncated; translated from Portuguese]
```

The `untracked` reasoning is wrong for this task: the user did not ask for `git add`, commit, or push.
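One way to express this rule is to discard VCS-only signals before they reach the judge unless the goal text asks for a VCS outcome. The sketch below is an assumption-heavy illustration: the keyword pattern, the signal names (`untracked`, `modified`, `unstaged`), and both function names are hypothetical, and keyword matching is only a crude stand-in for whatever intent detection the real judge uses.

```python
import re

# Assumption: phrases that suggest the user asked for a version-control outcome.
_VCS_INTENT = re.compile(
    r"\b(git add|stag(?:e|ed|ing)|commit|push|clean working tree)\b",
    re.IGNORECASE,
)

# Assumption: labels for signals that describe only VCS state, not task progress.
_VCS_ONLY_SIGNALS = {"untracked", "modified", "unstaged"}


def vcs_state_is_relevant(goal_text: str) -> bool:
    """True if the goal text mentions staging, committing, pushing, or a clean tree."""
    return bool(_VCS_INTENT.search(goal_text))


def filter_incompletion_signals(goal_text: str, signals: list) -> list:
    """Drop VCS-only signals unless the goal explicitly asked for a VCS outcome."""
    if vcs_state_is_relevant(goal_text):
        return list(signals)
    return [s for s in signals if s not in _VCS_ONLY_SIGNALS]
```

For a goal such as "annotate only the requested section", the `untracked` signal is removed and cannot be used as a reason to continue.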

### 4. Synthetic continuation races with compression

```text
20:31:58 gateway.run: inbound message: platform=discord user=<user> chat=<redacted> msg='[Continuing toward your standing goal] Goal: ...'
20:31:58 [S1] conversation turn: msg='[User] [Continuing toward your standing goal] Goal: ... Continue working toward this goal. Take the next concrete step...'
20:31:58 [S1] Preflight compression: ~125,517 tokens >= 122,400 threshold
20:31:58 [S1] context compression started: session=S1 messages=73 tokens=~125,517 model=gpt-5.5 focus=None
```
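The two compressions above start within a second of each other on the same parent history. A minimal sketch of serializing turns per session lineage with `asyncio.Lock` follows. The names `run_turn`, `_lineage_locks`, and `demo` are hypothetical, and the sketch assumes the gateway runs turns as coroutines on one event loop, which this report does not confirm.

```python
import asyncio

# Hypothetical registry: one lock per session lineage (S1 and its compressed children).
_lineage_locks = {}


async def run_turn(lineage_id: str, name: str, events: list, work_seconds: float = 0.01) -> None:
    """Run one turn (including its preflight compression) under the lineage lock."""
    lock = _lineage_locks.setdefault(lineage_id, asyncio.Lock())
    async with lock:
        events.append(f"{name}:start")
        await asyncio.sleep(work_seconds)  # stands in for compression plus the model call
        events.append(f"{name}:end")


async def demo() -> list:
    """Start a maintenance turn and a goal continuation against the same lineage."""
    events = []
    await asyncio.gather(
        run_turn("S1", "maintenance", events),
        run_turn("S1", "goal-continuation", events),
    )
    return events
```

Running `asyncio.run(demo())` shows the second turn starting only after the first has ended, so both cannot compress the same 73-message history concurrently.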

### 5. Compression creates contradictory state

```text
20:32:24 [S1] context compression done: session=S1 messages=73->8 tokens=~31,451
20:32:30 [S1] context compression done: session=S2 messages=73->8 tokens=~29,634
```

Compressed handoff summary later visible in the new session:

```text
[CONTEXT COMPACTION — REFERENCE ONLY]
## Active Task
None. User completed the requested section validation and paused; no new task has been assigned.

## In Progress
None.
```

However, the synthetic continuation message was also present immediately after that summary, contradicting it.
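A reconciliation step at session-split time could catch this contradiction. The sketch below is illustrative only: `HandoffState`, `keep_continuation`, and their fields are hypothetical, and it assumes the handoff summary's `Active Task` field and the completion status of the latest final answer are available in structured form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HandoffState:
    """Hypothetical structured view of the compressed handoff."""
    active_task: Optional[str]        # None when the summary says "Active Task: None"
    last_answer_reports_done: bool    # latest final answer claims the scope is complete


def keep_continuation(state: HandoffState, is_synthetic: bool, explicit_remaining: list) -> bool:
    """Decide whether a queued continuation message should survive the split."""
    if not is_synthetic:
        return True  # real user messages are never dropped by this check
    contradicts = state.active_task is None and state.last_answer_reports_done
    # A contradicted synthetic continuation survives only with an explicit remaining requirement.
    return (not contradicts) or bool(explicit_remaining)
```

For the state in this report (no active task, completed final answer, no explicit remaining requirement), the synthetic continuation would be dropped instead of carried into the child session.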

### 6. Agent resumes out-of-scope work

Instead of stopping, the agent continued into other sections and replaced the todo list with new out-of-scope work:

```json
{
  "todos": [
    {"id": "inspect-later-sections", "status": "in_progress"},
    {"id": "annotate-later-sections", "status": "pending"},
    {"id": "verify-later-sections", "status": "pending"}
  ]
}
```

### 7. Later compression preserves the wrong active task list

```text
20:35:37 [S2] context compression started: session=S2 messages=76 tokens=~122,917 model=gpt-5.5 focus=None
20:36:04 [S2] context compression done: session=S3 messages=76->77 tokens=~75,345
```

New session contained:

```text
[Your active task list was preserved across context compression]
- [>] inspect-later-sections ... (in_progress)
- [ ] annotate-later-sections ... (pending)
- [ ] verify-later-sections ... (pending)
```
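If todo items recorded where they came from, compression could decline to carry forward items that exist only because of an invalid synthetic continuation. This is a sketch under that assumption: the `origin` field, its values, and `todos_to_preserve` are hypothetical and do not appear in the todo state shown in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Todo:
    id: str
    status: str   # "pending", "in_progress", or "completed"
    origin: str   # hypothetical provenance tag: "user" or "synthetic_continuation"


def todos_to_preserve(todos: list, continuation_was_valid: bool) -> list:
    """Select unfinished todos to carry across compression.

    Items created by a synthetic continuation are kept only if that
    continuation was later judged valid.
    """
    keep = []
    for todo in todos:
        if todo.status not in ("pending", "in_progress"):
            continue  # completed items are not part of the active task list
        if todo.origin == "synthetic_continuation" and not continuation_was_valid:
            continue  # premise conflicts with the completed final answer
        keep.append(todo)
    return keep
```

Applied to the three `*-later-sections` items above with `continuation_was_valid=False`, the preserved list is empty.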

### 8. User interruption

```text
20:36:26 [S3] Turn ended: reason=interrupted_during_api_call
20:36:26 gateway.run: Session split detected: S1 → S3 (compression)
```

The assistant then acknowledged:

```text
The original request was already completed.
I continued beyond the scope and started inspecting/annotating later sections. That was my error.
```

## Impact

- Assistant continued after a completed final response.
- Extra file edits were made outside the requested scope.
- A synthetic continuation contradicted the compressed summary saying there was no active task.
- Compression/session split preserved a wrong active task list.
- User had to interrupt to stop out-of-scope work.

## Suspected contributing causes

1. Goal judge treated `git status: untracked` as evidence of incompletion even though staging/commit was not required.
2. Goal continuation and background skill-library maintenance can both start turns against the same session lineage.
3. Preflight compression can serialize synthetic continuation state while the latest assistant answer says the task is complete.
4. Active todo state created after an invalid synthetic continuation can be preserved across compression.
5. There may be missing consumed/ack state for synthetic continuation/recovery messages.

## Proposed fixes / invariants

1. Do not infer “goal incomplete” from untracked/modified files unless the goal explicitly required staging, a commit, a push, or a clean working tree.
2. If the latest assistant answer reports completion and todo state has no pending/in-progress items, require a stronger reason before queuing `/goal` continuation.
3. Serialize/lock session turns during preflight compression so background maintenance and goal continuation cannot both compress/use the same parent history concurrently.
4. When compression creates a child session, reconcile synthetic `/goal` continuations against the compressed `Active Task` summary and latest final answer.
5. Do not preserve active todo state created by a synthetic continuation if the premise conflicts with a completed final answer.
6. Add consumed-state markers for synthetic continuation/recovery messages so stale inferred work is not replayed as fresh work.
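As an illustration of invariant 6, a consumed-state marker can be as simple as a ledger keyed by lineage, goal text, and the turn the judge evaluated. Everything in this sketch is hypothetical (`ContinuationLedger`, the key layout, in-memory storage); a real implementation would need to persist the ledger and share it across the parent and child sessions of a compression split.

```python
import hashlib


class ContinuationLedger:
    """Hypothetical record of synthetic continuations that were already acted on."""

    def __init__(self) -> None:
        self._consumed = set()

    @staticmethod
    def key(lineage_id: str, goal_text: str, judged_turn: int) -> str:
        """Stable key; uses the lineage id so S1, S2, and S3 share one namespace."""
        raw = f"{lineage_id}\x00{goal_text}\x00{judged_turn}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def consume(self, key: str) -> bool:
        """Return True the first time a key is seen, False on any replay."""
        if key in self._consumed:
            return False
        self._consumed.add(key)
        return True
```

A continuation replayed after a session split produces the same key and is rejected by `consume`, while a continuation judged against a later turn produces a new key and is accepted.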

## Related issues

- #25242 — gateway auto-continue note can be persisted/amplified by interrupt-triggered preflight compression. Closest lifecycle analogue: synthetic note/state becomes durable session poison.
- #25921 — gateway can reuse parent-sized history after compression split, causing repeated preflight compression. Related compression/session publication problem.
- #20293 — context compaction + session split injects compressed summary as valid history. Related state/handoff boundary issue.
- #9631 — iterative compaction keeps completed topics alive and overrides current topic. Related stale-completed-work resurrection.
- #20250 — prompt lifecycle can remain in-flight after repeated compression timeout. Related long-session/compression lifecycle failure.
- #18467 / #33618 — `/goal` state and session-id migration across compression. Related `/goal` + compression lifecycle.
- #27585 — `/goal` can continue after terminal-ish answers when judge handling goes wrong.

