
fix(evidence): tolerate citation drift when matching complete_step to todos #4006

Merged: esengine merged 2 commits into main-v2 from fix/todo-step-matching on Jun 11, 2026

@esengine (Owner)

Problem

Discussion #3970: the plan completes but the todo list never updates, and in the worst case the model loops on complete_step retries, each one rejected with step "Phase 5:脚本编辑与执行代码" has no matching todo_write item in this turn (the Chinese text reads "script editing and code execution"), quietly burning tokens.

Root cause is exactly what the reporter guessed: matchTodoStep required byte-exact (case-folded) equality between the cited step and a todo's text. Any drift (a fullwidth : cited back as a halfwidth :, an extra space, a wording subset) could never match, and nothing in the error told the model how to fix its citation, so it retried the same string forever. #3982 fixed this disease for command citations; the todo-text limb still had it.
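As a minimal illustration of why strict equality cannot absorb this kind of drift, the sketch below assumes the old check behaved like Go's strings.EqualFold. That is an assumption made for illustration: the original matchTodoStep body is not shown in this PR text, and exactMatch is a hypothetical stand-in name.

```go
package main

import "strings"

// exactMatch is a hypothetical stand-in for the pre-fix comparison:
// case-insensitive, but otherwise character-for-character.
func exactMatch(step, todo string) bool {
	return strings.EqualFold(step, todo)
}

// driftExamples pairs a todo with a citation that has drifted from it.
// \uFF1A is the fullwidth colon; the cited side uses the ASCII colon.
var driftExamples = [][2]string{
	{"Phase 5\uFF1A脚本编辑与执行代码", "Phase 5:脚本编辑与执行代码"}, // colon width
	{"Phase 5:build", "Phase 5: build"},                 // extra space
	{"Phase 5: build the parser", "build the parser"},   // wording subset
}
```

Under this assumption, exactMatch returns false for every pair in driftExamples, while a pure case difference such as "phase 5: RUN" against "Phase 5: run" still matches.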

Fix

  • Normalize before comparing: fullwidth ASCII forms → halfwidth, whitespace dropped, case-folded. The reported case (Phase 5: vs Phase 5:) now matches.
  • Unique-containment fallback: a citation that is a substring of exactly one todo (or vice versa, shorter side ≥ 6 runes) resolves to it; if it would match two todos, it stays unmatched rather than guessing.
  • Actionable rejection: the error now lists this turn's todos (cite a todo verbatim or by number: 1) "…", 2) "…"), the same receipts-in-error pattern that let models self-correct within 1–2 turns in #3982's A/B run.
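The first two bullets can be sketched roughly as follows. This is a reconstruction from the description above, not the merged code: the helper name normalize, the constant minContainRunes, the return signature, the use of unicode.ToLower as the case fold, and the order of the three checks (exact, then index, then containment) are all assumptions.

```go
package main

import (
	"strconv"
	"strings"
	"unicode"
	"unicode/utf8"
)

// minContainRunes is the shortest normalized side allowed to match by
// containment (the "≥ 6 runes" rule from the PR description).
const minContainRunes = 6

// normalize maps fullwidth ASCII forms (U+FF01..U+FF5E) to halfwidth,
// drops all whitespace (including U+3000), and lowercases the rest.
func normalize(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.IsSpace(r) {
			continue
		}
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0 // fixed offset between fullwidth and ASCII blocks
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

// matchTodoStep resolves a cited step to the index of one todo, or
// reports false when nothing matches or the citation is ambiguous.
func matchTodoStep(step string, todos []string) (int, bool) {
	want := normalize(step)
	if want == "" {
		return -1, false
	}
	norm := make([]string, len(todos))
	for i, t := range todos {
		norm[i] = normalize(t)
	}
	// 1. Exact match after normalization.
	for i, have := range norm {
		if have == want {
			return i, true
		}
	}
	// 2. Citation by 1-based number (assumed behavior; fullwidth digits
	// were already folded to ASCII by normalize).
	if k, err := strconv.Atoi(want); err == nil && k >= 1 && k <= len(todos) {
		return k - 1, true
	}
	// 3. Unique containment in either direction.
	hit := -1
	for i, have := range norm {
		short, long := want, have
		if utf8.RuneCountInString(have) < utf8.RuneCountInString(want) {
			short, long = have, want
		}
		if utf8.RuneCountInString(short) < minContainRunes || !strings.Contains(long, short) {
			continue
		}
		if hit != -1 {
			return -1, false // two candidates: refuse to guess
		}
		hit = i
	}
	return hit, hit != -1
}
```

With todos "Phase 5\uFF1A脚本编辑与执行代码" and "Phase 6\uFF1A发布", the halfwidth citation "Phase 5:脚本编辑与执行代码" resolves to index 0 in this sketch. A citation like "Phase 5: build" against two todos that both start with that text stays unmatched, which trades a possible retry for never marking the wrong todo complete.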

Tests

  • Verbatim shape from #3970 (titled, in translation, "The plan has completed, but the todo list has not updated"): fullwidth-colon todo cited with halfwidth colon, plus case/whitespace/U+3000/subset/fullwidth-digit-index variants.
  • Negative cases: unrelated text, too-short containment, ambiguous shared-prefix citations stay unmatched while exact ones still resolve.
  • todoInventory lists todos and degrades cleanly on an empty ledger.
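For the last item, a helper along these lines would produce the inventory quoted in the rejection message. Only the name todoInventory and the "cite a todo verbatim or by number" wording come from this PR; the signature, the empty-ledger text, and the rejection wrapper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// todoInventory renders this turn's todos as a numbered, quoted list.
// The empty-ledger wording is hypothetical; the PR only states that the
// helper "degrades cleanly" in that case.
func todoInventory(todos []string) string {
	if len(todos) == 0 {
		return "no todo_write items were recorded in this turn"
	}
	parts := make([]string, len(todos))
	for i, t := range todos {
		parts[i] = fmt.Sprintf("%d) %q", i+1, t)
	}
	return "cite a todo verbatim or by number: " + strings.Join(parts, ", ")
}

// rejection is a hypothetical wrapper showing how the inventory could be
// appended to the existing "no matching todo_write item" error.
func rejection(step string, todos []string) string {
	return fmt.Sprintf("step %q has no matching todo_write item in this turn; %s",
		step, todoInventory(todos))
}
```

Putting the valid choices in the error gives the model something concrete to copy on its next attempt instead of resubmitting the same string.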

go test ./internal/evidence/... ./internal/tool/builtin/... passes.

Addresses the remaining half of discussion #3970; #3982 covered the command-citation half.

… todos

The todo-step matcher demanded byte-exact (case-folded) equality between
complete_step.step and a todo's text, so a fullwidth/halfwidth colon or
whitespace drift ("Phase 5:…" cited as "Phase 5: …") could never match
and the model looped on "no matching todo_write item" retries, burning
tokens (discussion #3970). Same disease #3982 cured for command
citations, different limb.

Normalize both sides (fullwidth ASCII → halfwidth, whitespace dropped,
case-folded) before comparing, fall back to unique substring containment
(≥6 runes; ambiguous citations stay unmatched), and list this turn's
todos in the rejection so the model can self-correct by verbatim content
or index instead of guessing.
@esengine requested a review from SivanCola as a code owner June 11, 2026 08:50
@github-actions (bot) added the labels v2 (Go rewrite (1.x), main-v2 branch, active development) and skills (Skill system: internal/skill, internal/tool) Jun 11, 2026
@esengine merged commit 0250cbc into main-v2 Jun 11, 2026
13 checks passed
@esengine deleted the fix/todo-step-matching branch June 11, 2026 08:57