fix(agent): close complete_step cross-turn evidence + loop gaps #4014
Merged
The evidence ledger resets every turn, so a complete_step citing work from an earlier turn (or after compaction) was rejected, and the final-answer gate, reading only this turn's todo_write, let a stale plan slip past with an "all done".

- diff/files evidence now falls back to the full session like command evidence already did (#3587): a cross-turn citation of a written or read file is honored, fabrication still rejected.
- the host keeps a canonical todo list (latest todo_write + replayed complete_steps, rebuilt on session load/rewind) that survives turn boundaries and compaction; the final gate consults it when the turn did work without a todo_write, so a premature "all done" is blocked until the plan is actually finished.
- a successful complete_step advances that list (marks the step done, promotes the next, records it for the in-turn gate) and the model is told it no longer needs a todo_write to mark completions, removing the batch-completion step #3909 hit.

Adds unit + end-to-end coverage (serial host-advance, command drift, cross-turn diff via session, cross-turn gate block-then-clear, rejection).

Closes #2917
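As a rough illustration of the session fallback for diff/files evidence, the sketch below checks the per-turn ledger first and only then scans the transcript for a successful write or read of the cited path. The `ToolCall` type, the tool names `write_file`/`read_file`, and the function names are hypothetical stand-ins chosen for this example; the real session and ledger types are not shown in this PR text.

```go
package main

import "fmt"

// ToolCall is a hypothetical, simplified transcript entry (assumption:
// the real session record carries more fields than this).
type ToolCall struct {
	Tool string // assumed tool names: "write_file" or "read_file"
	Path string
	Err  bool // true when the tool call failed
}

// pathProvenInSession reports whether any successful write/read call
// anywhere in the session touched the given path.
func pathProvenInSession(session []ToolCall, path string) bool {
	for _, c := range session {
		if c.Err {
			continue // errored calls never count as evidence
		}
		if (c.Tool == "write_file" || c.Tool == "read_file") && c.Path == path {
			return true
		}
	}
	return false
}

// verifyPathEvidence consults the per-turn ledger first and falls back
// to the whole session only when the ledger misses.
func verifyPathEvidence(turnLedger map[string]bool, session []ToolCall, path string) bool {
	if turnLedger[path] {
		return true
	}
	return pathProvenInSession(session, path)
}

func main() {
	session := []ToolCall{
		{Tool: "write_file", Path: "agent/gate.go"},
		{Tool: "read_file", Path: "agent/missing.go", Err: true},
	}
	ledger := map[string]bool{} // empty: the ledger was reset at the top of this turn

	fmt.Println(verifyPathEvidence(ledger, session, "agent/gate.go"))     // true: written in an earlier turn
	fmt.Println(verifyPathEvidence(ledger, session, "agent/missing.go"))  // false: the only call errored
	fmt.Println(verifyPathEvidence(ledger, session, "agent/invented.go")) // false: fabricated path
}
```

Checking the ledger first keeps the common in-turn case cheap; the transcript scan only runs for citations the current turn cannot account for.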
This was referenced Jun 12, 2026
Why
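`complete_step`'s enforcement is bound to a per-turn evidence ledger (`a.evidence.Reset()` runs at the top of every `Run`). Two consequences users have hit:

- A `complete_step` that cites work from an earlier turn (or from before compaction) is rejected, because the reset ledger no longer holds that evidence.
- The final-answer gate reads only this turn's `todo_write`, so once the structured list is gone (a later turn, or compacted into a summary) a premature "all done" slips through. This is the unclosed loop in [Bug]: After todo_write / complete_step fails, the Agent still allows a final answer of "all done" #2917.

What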
- `PathsProvenInSession` resolves a path against any successful (non-errored) write/read tool call in the transcript; `verifyStepEvidence` consults it when the per-turn ledger misses. Fabricated paths are still rejected.
- `Agent.todoState` is the latest `todo_write` with completions applied, rebuilt from the session on load/rewind (latest `todo_write` + replayed `complete_step`s). It never rides in the prompt, so it survives turn boundaries and compaction. The final gate falls back to it when a turn did work without a `todo_write`, so a premature "all done" is blocked until the plan is genuinely finished, bounded by the existing `maxFinalReadinessBlocks`.
- `complete_step` advances the list. A successful sign-off marks the matching step completed, promotes the next, records a synthetic `todo_write` for the in-turn gate, and emits an event so the panel updates. The model is told it no longer needs a `todo_write` to mark completions, which removes the manual batch-completion step that [Bug] Agent batch-submitting complete_step causes todo_write validation to fail, violating serial work mode #3909 tripped over (`planApprovedMessage` / executor handoff / tool copy updated to match).

All host-side, zero added tokens.
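The list advance and the gate fallback described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Todo` type, the status strings, and the `advance`/`allDone` helpers are hypothetical and are not the PR's actual `Agent.todoState` implementation. Event emission and the synthetic `todo_write` record are left out.

```go
package main

import "fmt"

// Todo is a hypothetical entry in the host-side canonical list.
type Todo struct {
	Title  string
	Status string // assumed values: "pending", "in_progress", "completed"
}

// advance marks the first unfinished step matching title as completed,
// then promotes the first pending step to in_progress. It returns false
// when no unfinished step matches, leaving the list untouched.
func advance(todos []Todo, title string) bool {
	matched := false
	for i := range todos {
		if todos[i].Title == title && todos[i].Status != "completed" {
			todos[i].Status = "completed"
			matched = true
			break
		}
	}
	if !matched {
		return false
	}
	for i := range todos {
		if todos[i].Status == "pending" {
			todos[i].Status = "in_progress"
			break
		}
	}
	return true
}

// allDone is the check a final-answer gate could run against the
// canonical list when the turn produced no todo_write of its own.
func allDone(todos []Todo) bool {
	for _, t := range todos {
		if t.Status != "completed" {
			return false
		}
	}
	return true
}

func main() {
	todos := []Todo{
		{Title: "write gate", Status: "in_progress"},
		{Title: "add tests", Status: "pending"},
	}

	advance(todos, "write gate")
	fmt.Println(todos[1].Status, allDone(todos)) // in_progress false: "all done" would be blocked

	advance(todos, "add tests")
	fmt.Println(allDone(todos)) // true: the final answer can go through
}
```

Because the list lives on the host rather than in the prompt, a check like `allDone` keeps working after compaction has removed the original `todo_write` from the model's context.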
Tests
Unit: cross-turn diff/files fallback, canonical gate fallback, advance + promote, rebuild (incl. skipping failed `complete_step`s).

End-to-end (real `complete_step`/`todo_write` builtins driven through `Agent.Run` across turns):

- … `todo_write`, final answer allowed
- … `Run` exit)

`go test ./...` green; `gofmt`/`go vet` clean.

Closes #2917