fix(evidence): tolerate citation drift when matching complete_step to todos #4006
Merged
The todo-step matcher demanded byte-exact (case-folded) equality between
complete_step.step and a todo's text, so a fullwidth/halfwidth colon or
whitespace drift ("Phase 5:…" cited as "Phase 5: …") could never match
and the model looped on "no matching todo_write item" retries, burning
tokens (discussion #3970). Same disease #3982 cured for command
citations, different limb.
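The failure is easy to reproduce with a case-insensitive comparison. The sketch below assumes the old matcher behaved like Go's `strings.EqualFold`. `exactMatch` is a hypothetical name, and the todo and citation strings are illustrative values built from the reported step, not the real `matchTodoStep` or a captured transcript.

```go
package main

import (
	"fmt"
	"strings"
)

// exactMatch approximates the old comparison (assumption): the cited step
// matches a todo only if the two strings are equal up to case.
func exactMatch(cited, todo string) bool {
	return strings.EqualFold(cited, todo)
}

func main() {
	todo := "Phase 5：脚本编辑与执行代码"  // hypothetical todo text with a fullwidth colon
	cited := "Phase 5: 脚本编辑与执行代码" // same step cited with a halfwidth colon plus a space

	fmt.Println(exactMatch("PHASE 5：脚本编辑与执行代码", todo)) // true: case is the only drift forgiven
	fmt.Println(exactMatch(cited, todo))                    // false: every retry fails the same way
}
```

Because the citation never changes between retries, the comparison keeps returning false, which is the retry loop described in #3970.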
Normalize both sides (fullwidth ASCII → halfwidth, whitespace dropped,
case-folded) before comparing, fall back to unique substring containment
(≥6 runes; ambiguous citations stay unmatched), and list this turn's
todos in the rejection so the model can self-correct by verbatim content
or index instead of guessing.
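A minimal Go sketch of the normalize-then-match approach described above. The names `normalize`, `matchTodo` and `minContainRunes` are hypothetical. Lowercasing stands in for full Unicode case folding. The containment check assumes the citation is the shorter side, that is, a wording subset of the todo. The PR's actual `matchTodoStep` may differ on any of these points.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// minContainRunes is the ≥6-rune floor for substring fallback.
const minContainRunes = 6

// normalize maps fullwidth ASCII (U+FF01..U+FF5E) to halfwidth, drops all
// whitespace and lowercases letters (a stand-in for full case folding).
func normalize(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0 // e.g. '：' (U+FF1A) becomes ':' (U+003A)
		}
		if unicode.IsSpace(r) { // also drops U+3000 ideographic space
			continue
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

// matchTodo returns the index of the todo the citation refers to, or -1.
// Hypothetical helper, not the PR's matchTodoStep signature.
func matchTodo(cited string, todos []string) int {
	c := normalize(cited)
	if c == "" {
		return -1
	}
	// 1. Equality after normalization.
	for i, t := range todos {
		if normalize(t) == c {
			return i
		}
	}
	// 2. Fallback: the citation is contained in exactly one todo.
	if utf8.RuneCountInString(c) < minContainRunes {
		return -1 // too short to trust a substring hit
	}
	match := -1
	for i, t := range todos {
		if strings.Contains(normalize(t), c) {
			if match != -1 {
				return -1 // ambiguous: leave unmatched
			}
			match = i
		}
	}
	return match
}

func main() {
	// Hypothetical todo ledger for one turn.
	todos := []string{
		"Phase 5：脚本编辑与执行代码",
		"Phase 6: write report",
		"Phase 7: write summary",
	}
	fmt.Println(matchTodo("Phase 5: 脚本编辑与执行代码", todos)) // 0: colon and space drift normalized away
	fmt.Println(matchTodo("脚本编辑与执行", todos))             // 0: unique substring, 7 runes
	fmt.Println(matchTodo(": write", todos))                // -1: ":write" occurs in two todos
	fmt.Println(matchTodo("5: 脚本", todos))                  // -1: only 4 runes after normalizing
}
```

Returning -1 on ambiguity, rather than taking the first hit, follows the "ambiguous citations stay unmatched" rule. A wrong match would presumably close the wrong todo, whereas a rejection can point the model at the candidates.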
Problem
Discussion #3970: the plan completes but the todo list never updates. In the worst case the model loops on `complete_step` retries, each one rejected with `step "Phase 5:脚本编辑与执行代码" has no matching todo_write item in this turn` (the step text means "Phase 5: script editing and code execution"), quietly burning tokens.

Root cause is exactly what the reporter guessed:
`matchTodoStep` required byte-exact (case-folded) equality between the cited step and a todo's text. Any drift (a fullwidth `：` cited back as halfwidth `:`, an extra space, a wording subset) could never match, and nothing in the error told the model how to fix its citation, so it retried the same string forever. #3982 fixed this disease for command citations; the todo-text limb still had it.

Fix
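- Normalize both sides (fullwidth ASCII → halfwidth, whitespace dropped, case-folded) before comparing, so the reported case (`Phase 5：` vs `Phase 5:`) now matches.
- When nothing matches after normalization, fall back to unique substring containment (≥6 runes); ambiguous citations stay unmatched.
- The rejection now lists this turn's todos (`… cite a todo verbatim or by number: 1) "…", 2) "…"`), the same receipts-in-error pattern that let models self-correct within 1–2 turns in #3982's A/B run.

Tests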
- `todoInventory` lists todos and degrades cleanly on an empty ledger.
- `go test ./internal/evidence/... ./internal/tool/builtin/...` passes.

Addresses the remaining half of discussion #3970; #3982 covered the command-citation half.