fix(agent): close complete_step cross-turn evidence + loop gaps#4014

Merged
esengine merged 1 commit into main-v2 from fix/complete-step-cross-turn-loop
Jun 11, 2026

@esengine

Owner

Why

complete_step's enforcement is bound to a per-turn evidence ledger (a.evidence.Reset() runs at the top of every Run). Two consequences users have hit:

  1. Cross-turn evidence is rejected. Command evidence already falls back to scanning the full session (#3587, "fix(agent): reduce evidence verification false signals"), but diff/files (path) evidence did not, so citing a file you edited in an earlier turn (or before compaction) failed with "no matching writer receipt in this turn".
  2. The loop doesn't close across turns/compaction. The final-answer gate reads only this turn's todo_write, so once the structured list is gone (a later turn, or compacted into a summary) a premature "all done" slips through. This is the unclosed loop in #2917 ("[Bug]: After todo_write / complete_step fails, the Agent still allows a final answer of 'all done'").

What

  • diff/files session fallback (mirrors #3587, "fix(agent): reduce evidence verification false signals"). PathsProvenInSession resolves a path against any successful (non-errored) write/read tool call in the transcript; verifyStepEvidence consults it when the per-turn ledger misses. Fabricated paths are still rejected.
  • Host-side canonical todo list. Agent.todoState is the latest todo_write with completions applied, rebuilt from the session on load/rewind (latest todo_write + replayed complete_steps). It never rides in the prompt, so it survives turn boundaries and compaction. The final gate falls back to it when a turn did work without a todo_write, so a premature "all done" is blocked until the plan is genuinely finished — bounded by the existing maxFinalReadinessBlocks.
  • complete_step advances the list. A successful sign-off marks the matching step completed, promotes the next, records a synthetic todo_write for the in-turn gate, and emits an event so the panel updates. The model is told it no longer needs a todo_write to mark completions, which removes the manual batch-completion step that #3909 tripped over ("[Bug] Agent batch-submitting complete_step causes todo_write validation to fail, violating serial work mode"). planApprovedMessage, the executor handoff, and the tool copy are updated to match.
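
A minimal sketch of the diff/files session fallback described in the first bullet, written only to illustrate the lookup order (per-turn ledger first, whole transcript second). Everything here is an assumption made for the example: the `ToolCall` and `Ledger` types, the tool names `write_file` and `read_file`, and the lower-case `pathsProvenInSession` and `verifyPaths` helpers are hypothetical stand-ins and not the real signatures of PathsProvenInSession or verifyStepEvidence.

```go
package main

import "path/filepath"

// ToolCall is a hypothetical transcript record; the real session type is not shown in this PR.
type ToolCall struct {
	Tool string // assumed tool names: "write_file", "read_file"
	Path string
	Err  bool // true when the tool call failed
}

// Ledger is a hypothetical per-turn evidence ledger keyed by cleaned path.
type Ledger struct{ paths map[string]bool }

func NewLedger() *Ledger            { return &Ledger{paths: map[string]bool{}} }
func (l *Ledger) Record(p string)   { l.paths[filepath.Clean(p)] = true }
func (l *Ledger) Reset()            { l.paths = map[string]bool{} }
func (l *Ledger) Has(p string) bool { return l.paths[filepath.Clean(p)] }

// pathsProvenInSession reports which cited paths are backed by a successful
// (non-errored) write or read call anywhere in the transcript.
func pathsProvenInSession(session []ToolCall, cited []string) map[string]bool {
	seen := map[string]bool{}
	for _, c := range session {
		if c.Err || (c.Tool != "write_file" && c.Tool != "read_file") {
			continue
		}
		seen[filepath.Clean(c.Path)] = true
	}
	proven := map[string]bool{}
	for _, p := range cited {
		if seen[filepath.Clean(p)] {
			proven[p] = true
		}
	}
	return proven
}

// verifyPaths returns the cited paths that neither the per-turn ledger nor
// the session transcript can back; an empty result means the evidence holds.
func verifyPaths(l *Ledger, session []ToolCall, cited []string) []string {
	var unresolved []string
	for _, p := range cited {
		if !l.Has(p) {
			unresolved = append(unresolved, p)
		}
	}
	if len(unresolved) == 0 {
		return nil
	}
	proven := pathsProvenInSession(session, unresolved)
	var missing []string
	for _, p := range unresolved {
		if !proven[p] {
			missing = append(missing, p)
		}
	}
	return missing
}
```

In this sketch a path written in an earlier turn still resolves after `Reset()`, while a path that only appears in a failed call, or in no call at all, comes back in the missing list.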

All host-side, zero added tokens.
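
The rebuild and advance behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Todo` and `Event` types, the status strings, matching steps by an `ID` field, and the rule that "promote the next" means "mark the first pending step in_progress" are all hypothetical choices made for the example.

```go
package main

// Todo is a hypothetical plan item; matching by ID is an assumption.
type Todo struct {
	ID     string
	Status string // assumed values: "pending", "in_progress", "completed"
}

// Event is a hypothetical session record for the two tools involved.
type Event struct {
	Kind   string // "todo_write" or "complete_step"
	Todos  []Todo // payload of a todo_write
	StepID string // target of a complete_step
	Failed bool   // true when the tool call was rejected
}

// advance marks the matching step completed and, if nothing else is active,
// promotes the first pending step. It reports whether the step was found.
func advance(todos []Todo, stepID string) bool {
	found := false
	for i := range todos {
		if todos[i].ID == stepID && todos[i].Status != "completed" {
			todos[i].Status = "completed"
			found = true
			break
		}
	}
	if !found {
		return false
	}
	for i := range todos {
		if todos[i].Status == "in_progress" {
			return true // another step is already active
		}
	}
	for i := range todos {
		if todos[i].Status == "pending" {
			todos[i].Status = "in_progress"
			break
		}
	}
	return true
}

// rebuild recovers the canonical list from a session: take the latest
// successful todo_write, then replay later successful complete_steps.
func rebuild(events []Event) []Todo {
	last := -1
	for i, e := range events {
		if e.Kind == "todo_write" && !e.Failed {
			last = i
		}
	}
	if last < 0 {
		return nil
	}
	todos := append([]Todo(nil), events[last].Todos...)
	for _, e := range events[last+1:] {
		if e.Kind == "complete_step" && !e.Failed {
			advance(todos, e.StepID)
		}
	}
	return todos
}

// allDone reports whether every step in the list is completed.
func allDone(todos []Todo) bool {
	for _, t := range todos {
		if t.Status != "completed" {
			return false
		}
	}
	return true
}
```

Because the list in this sketch is derived purely from recorded tool calls, it can be recomputed after a load or rewind without anything being carried in the prompt, which is the property the description relies on.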

Tests

Unit: cross-turn diff/files fallback, canonical gate fallback, advance + promote, rebuild (incl. skipping failed complete_steps).

End-to-end (real complete_step/todo_write builtins driven through Agent.Run across turns):

  • serial plan, host auto-advances, no batch todo_write, final answer allowed
  • cd-prefixed command drift accepted in-turn
  • cross-turn diff evidence accepted via session fallback (and an unbacked path still rejected — asserted at the tool-result level, not just Run exit)
  • cross-turn canonical gate blocks a premature "all done", then clears once the steps are actually signed off
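
The block-then-clear behaviour in the last case can be pictured with a small gate sketch. It is hypothetical throughout: the `Step` and `Gate` types and the `AllowFinal` method are invented for the example, `MaxBlocks` merely stands in for maxFinalReadinessBlocks (whose actual value is not given here), and letting the answer through once the bound is reached is an assumption about what "bounded" means.

```go
package main

// Step is a hypothetical plan item for this sketch.
type Step struct {
	Title string
	Done  bool
}

// Gate is a hypothetical final-answer gate. MaxBlocks stands in for
// maxFinalReadinessBlocks; the real value is not shown in this PR.
type Gate struct {
	MaxBlocks int
	blocks    int
}

// unfinished counts the steps that are not yet signed off.
func unfinished(plan []Step) int {
	n := 0
	for _, s := range plan {
		if !s.Done {
			n++
		}
	}
	return n
}

// AllowFinal decides whether a final answer may pass. turnTodos is this
// turn's todo_write (nil when the turn had none), canonical is the host-side
// list, and didWork says whether the turn performed any work.
func (g *Gate) AllowFinal(turnTodos, canonical []Step, didWork bool) bool {
	plan := turnTodos
	if plan == nil && didWork {
		plan = canonical // fall back to the list that survives turn boundaries
	}
	if unfinished(plan) == 0 {
		return true
	}
	if g.blocks >= g.MaxBlocks {
		return true // assumed behaviour: stop blocking once the bound is hit
	}
	g.blocks++
	return false
}
```

In this sketch a turn that did work without a todo_write is checked against the canonical list, so an early "all done" is blocked until the remaining steps are signed off or the block budget runs out.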

go test ./... green; gofmt/go vet clean.

Closes #2917

The evidence ledger resets every turn, so a complete_step citing work from
an earlier turn (or after compaction) was rejected, and the final-answer
gate — reading only this turn's todo_write — let a stale plan slip past
with an "all done".

- diff/files evidence now falls back to the full session like command
  evidence already did (#3587): a cross-turn citation of a written or read
  file is honored, fabrication still rejected.
- the host keeps a canonical todo list (latest todo_write + replayed
  complete_steps, rebuilt on session load/rewind) that survives turn
  boundaries and compaction; the final gate consults it when the turn did
  work without a todo_write, so a premature "all done" is blocked until the
  plan is actually finished.
- a successful complete_step advances that list (marks the step done,
  promotes the next, records it for the in-turn gate) and the model is told
  it no longer needs a todo_write to mark completions — removing the
  batch-completion step #3909 hit.

Adds unit + end-to-end coverage (serial host-advance, command drift,
cross-turn diff via session, cross-turn gate block-then-clear, rejection).

Closes #2917
@esengine esengine requested a review from SivanCola as a code owner June 11, 2026 10:09
@github-actions github-actions Bot added the v2 (Go rewrite (1.x); main-v2 branch, active development), skills (Skill system (internal/skill, internal/tool)), and agent (Core agent loop (internal/agent, internal/control)) labels Jun 11, 2026
@esengine esengine merged commit 6dee96c into main-v2 Jun 11, 2026
14 checks passed
@esengine esengine deleted the fix/complete-step-cross-turn-loop branch June 11, 2026 10:27
Labels

agent: Core agent loop (internal/agent, internal/control)
skills: Skill system (internal/skill, internal/tool)
v2: Go rewrite (1.x); main-v2 branch, active development

Development

Successfully merging this pull request may close these issues.

[Bug]: After todo_write / complete_step fails, the Agent still allows a final answer of "all done"
