fix(evidence): tolerate citation drift when matching complete_step to todos by esengine · Pull Request #4006 · esengine/DeepSeek-Reasonix

esengine · 2026-06-11T08:50:29Z

Problem

Discussion #3970: the plan completes but the todo list never updates. In the worst case the model loops on complete_step retries, each one rejected with the error: step "Phase 5：脚本编辑与执行代码" has no matching todo_write item in this turn (the step text means "Phase 5: script editing and code execution"). Every retry quietly burns tokens.

Root cause is exactly what the reporter guessed: matchTodoStep required byte-exact (case-folded) equality between the cited step and a todo's text. Any drift — fullwidth ： cited back as halfwidth : , an extra space, a wording subset — could never match, and nothing in the error told the model how to fix its citation, so it retried the same string forever. #3982 fixed this disease for command citations; the todo-text limb still had it.
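To make the failure concrete, here is a minimal sketch of a comparison that only case-folds. The function name strictMatch and the use of strings.EqualFold are assumptions for illustration; the actual matchTodoStep code is not shown in this PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// strictMatch is a hypothetical stand-in for the old behavior:
// case-folded equality and nothing else.
func strictMatch(cited, todo string) bool {
	return strings.EqualFold(cited, todo)
}

func main() {
	todo := "Phase 5：脚本编辑与执行代码" // fullwidth colon, U+FF1A

	// Halfwidth colon plus a space: one drifted character is enough to fail.
	fmt.Println(strictMatch("Phase 5: 脚本编辑与执行代码", todo)) // false

	// Case drift alone was already tolerated.
	fmt.Println(strictMatch("phase 5：脚本编辑与执行代码", todo)) // true
}
```

Because the rejection carried no hint about the expected text, a model that produced the first citation had no way to discover the second.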

Fix

Normalize before comparing: fullwidth ASCII forms → halfwidth, whitespace dropped, case-folded. The reported case (Phase 5： vs Phase 5: ) now matches.
Unique-containment fallback: a citation that is a substring of exactly one todo (or vice versa, shorter side ≥ 6 runes) resolves to it; if it would match two todos, it stays unmatched rather than guessing.
Actionable rejection: the error now lists this turn's todos (cite a todo verbatim or by number: 1) "…", 2) "…"), the same receipts-in-error pattern that let models self-correct within 1–2 turns in the A/B run of #3982 (fix(evidence): stop rejecting real complete_step verifications over command-string drift).
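The first two points of the fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: normalize, matchTodo, and minContainRunes are hypothetical names, the 6-rune floor is assumed to be measured after normalization, unicode.ToLower stands in for whatever case folding the real matcher uses, and citing a todo by number is left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

const minContainRunes = 6 // assumed floor for the shorter side of a containment match

// normalize maps fullwidth ASCII forms (U+FF01..U+FF5E) to halfwidth,
// drops all whitespace (including U+3000), and lowercases the rest.
func normalize(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.IsSpace(r) {
			continue
		}
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0 // fullwidth form to its ASCII counterpart
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

// matchTodo returns the index of the matched todo, or -1 when nothing
// matches or when containment would match more than one todo.
func matchTodo(cited string, todos []string) int {
	c := normalize(cited)
	if c == "" {
		return -1
	}
	// Exact match after normalization takes priority.
	for i, t := range todos {
		if normalize(t) == c {
			return i
		}
	}
	hit := -1
	for i, t := range todos {
		shorter, longer := c, normalize(t)
		if utf8.RuneCountInString(longer) < utf8.RuneCountInString(shorter) {
			shorter, longer = longer, shorter
		}
		if utf8.RuneCountInString(shorter) < minContainRunes || !strings.Contains(longer, shorter) {
			continue
		}
		if hit != -1 {
			return -1 // second candidate: ambiguous, refuse to guess
		}
		hit = i
	}
	return hit
}

func main() {
	todos := []string{"Phase 5：脚本编辑与执行代码", "Phase 5：脚本编辑与回归测试"}
	fmt.Println(matchTodo("Phase 5: 脚本编辑与执行代码", todos)) // 0 (colon and space drift)
	fmt.Println(matchTodo("脚本编辑与执行代码", todos))          // 0 (unique substring)
	fmt.Println(matchTodo("Phase 5: 脚本编辑", todos))      // -1 (shared prefix, ambiguous)
	fmt.Println(matchTodo("执行代码", todos))               // -1 (shorter than 6 runes)
}
```

Running the exact pass before the containment pass is what lets an exact citation still resolve when it is also a substring of a longer todo.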

Tests

Verbatim #3970 shape (discussion title: "The plan has completed, but the todo list was not updated"): fullwidth-colon todo cited with halfwidth colon, plus case/whitespace/U+3000/subset/fullwidth-digit-index variants.
Negative cases: unrelated text, too-short containment, ambiguous shared-prefix citations stay unmatched while exact ones still resolve.
todoInventory lists todos and degrades cleanly on an empty ledger.
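For readers who have not opened the diff, the inventory listing that the last test covers could look something like the sketch below. The signature of the real todoInventory is not given here, so inventorySketch is a hypothetical name, and returning an empty string for an empty ledger is only one assumed reading of "degrades cleanly".

```go
package main

import (
	"fmt"
	"strings"
)

// inventorySketch renders this turn's todos as a numbered, quoted list
// suitable for appending to a rejection message. Hypothetical helper.
func inventorySketch(todos []string) string {
	if len(todos) == 0 {
		return "" // assumed behavior on an empty ledger: add nothing
	}
	parts := make([]string, len(todos))
	for i, t := range todos {
		parts[i] = fmt.Sprintf("%d) %q", i+1, t)
	}
	return "cite a todo verbatim or by number: " + strings.Join(parts, ", ")
}

func main() {
	fmt.Println(inventorySketch([]string{"Write tests", "Ship"}))
	// cite a todo verbatim or by number: 1) "Write tests", 2) "Ship"
}
```

Quoting each entry keeps leading or trailing whitespace and width variants visible to the model, which is the point of putting the receipts in the error.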

go test ./internal/evidence/... ./internal/tool/builtin/... passes.

Addresses the remaining half of discussion #3970; #3982 covered the command-citation half.

… todos

The todo-step matcher demanded byte-exact (case-folded) equality between complete_step.step and a todo's text, so a fullwidth/halfwidth colon or whitespace drift ("Phase 5：…" cited as "Phase 5: …") could never match and the model looped on "no matching todo_write item" retries, burning tokens (discussion #3970). Same disease #3982 cured for command citations, different limb.

Normalize both sides (fullwidth ASCII → halfwidth, whitespace dropped, case-folded) before comparing, fall back to unique substring containment (≥6 runes; ambiguous citations stay unmatched), and list this turn's todos in the rejection so the model can self-correct by verbatim content or index instead of guessing.

esengine requested a review from SivanCola as a code owner June 11, 2026 08:50

github-actions Bot added labels v2 (Go rewrite (1.x); main-v2 branch, active development) and skills (Skill system: internal/skill, internal/tool) Jun 11, 2026

style: gofmt evidence_test (CJK-width map alignment)

082a08f

esengine merged commit 0250cbc into main-v2 Jun 11, 2026
13 checks passed

esengine deleted the fix/todo-step-matching branch June 11, 2026 08:57

JesonChou mentioned this pull request Jun 11, 2026

fix(evidence): allow rephrased todos to pass complete_step verification #4013

Merged

esengine mentioned this pull request Jun 11, 2026

refactor(evidence): reuse stepTextContains for todo overlap, hoist bounds guard #4027

Merged

esengine merged 2 commits into main-v2 from fix/todo-step-matching
