fix(evidence): allow rephrased todos to pass complete_step verification by JesonChou · Pull Request #4013 · esengine/DeepSeek-Reasonix

JesonChou · 2026-06-11T09:51:25Z

Problem

When the model rephrases a todo item's content between successive todo_write calls (e.g., "Analyze the bug" → "Analyze the bug and understand root cause"), hasSuccessfulCompleteStepForTodo rejects the completion. The sameTodoMatch exact-text guard fails, and the bare continue on L592 prevents the index-based fallback (matchTodoStep) from ever running. The todo_write call is rejected by verifyTodoCompletionTransitions, so the frontend never sees the completed status — checkmarks don't appear (#3992).

Fix

Replace the continue with a content-overlap guard (todoContentRelates / textOverlaps):

When old and new text share a recognisable substring (case-folded, normalized, ≥6 runes on the shorter side, the same discipline as stepTextContains from #4006, "fix(evidence): tolerate citation drift when matching complete_step to todos"), the receipt falls through to the index-based fallback.
When they are unrelated ("Add parser" → "Ship parser"), the original block is preserved (continue still fires).

textOverlaps uses normalizeStepText (introduced in #4006) for consistency with the matching layer.
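The overlap rule described above can be sketched as follows. This is a minimal illustration, not the merged code: `normalize` is a hypothetical stand-in for `normalizeStepText` (whose exact behaviour is not shown in this PR), and the check assumes "overlap" means the shorter normalized text is contained in the longer one.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// minOverlapRunes mirrors the >=6-rune minimum stated in the PR description.
const minOverlapRunes = 6

// normalize is a hypothetical stand-in for normalizeStepText: it lower-cases
// the text and collapses whitespace runs. The real helper may do more.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// textOverlaps reports whether the shorter normalized text has at least
// minOverlapRunes runes and occurs as a substring of the longer one.
func textOverlaps(a, b string) bool {
	a, b = normalize(a), normalize(b)
	if utf8.RuneCountInString(a) > utf8.RuneCountInString(b) {
		a, b = b, a // make a the shorter side
	}
	if utf8.RuneCountInString(a) < minOverlapRunes {
		return false // too short to be a meaningful match
	}
	return strings.Contains(b, a)
}

func main() {
	// Rephrased todo from the PR description.
	fmt.Println(textOverlaps("Analyze the bug", "Analyze the bug and understand root cause"))
	// Replaced todo from the PR description.
	fmt.Println(textOverlaps("Add parser", "Ship parser"))
}
```

The rune minimum keeps very short fragments (for example a three-letter word) from linking two otherwise unrelated todos.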

Two new helpers:

todoContentRelates — gates the fallback on related task identity
textOverlaps — normalized substring overlap with ≥6-rune minimum
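How the guard changes the control flow can be sketched roughly like this. The `receipt` type, the `related` helper, and the `completed` function are hypothetical simplifications for illustration; the real `hasSuccessfulCompleteStepForTodo` and `matchTodoStep` operate on the project's own types and are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// receipt is a hypothetical, trimmed-down complete_step record.
type receipt struct {
	Step     int    // 1-based step index reported by the model
	Snapshot string // todo text captured at completion time; "" if absent
}

// related is a simplified stand-in for todoContentRelates.
func related(old, cur string) bool {
	old = strings.ToLower(strings.TrimSpace(old))
	cur = strings.ToLower(strings.TrimSpace(cur))
	if len(old) > len(cur) {
		old, cur = cur, old // a substring can never be longer in bytes
	}
	return utf8.RuneCountInString(old) >= 6 && strings.Contains(cur, old)
}

// completed reports whether any receipt authorizes marking todos[index-1] done.
func completed(receipts []receipt, todos []string, index int) bool {
	if index < 1 || index > len(todos) {
		return false
	}
	cur := todos[index-1]
	for _, r := range receipts {
		if r.Snapshot != "" && r.Snapshot != cur {
			if !related(r.Snapshot, cur) {
				continue // different task: the old receipt must not be reused
			}
			// Rephrased task: fall through to the index-based fallback.
		}
		if r.Step == index {
			return true
		}
	}
	return false
}

func main() {
	rephrased := []receipt{{Step: 1, Snapshot: "Analyze the bug"}}
	fmt.Println(completed(rephrased, []string{"Analyze the bug and understand root cause"}, 1))

	replaced := []receipt{{Step: 1, Snapshot: "Add parser"}}
	fmt.Println(completed(replaced, []string{"Ship parser"}, 1))
}
```

Before the fix, the equivalent of the inner `if !related(...)` condition was absent, so every snapshot mismatch hit `continue` and the index comparison never ran.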

Test

TestLedgerNumericCompleteStepAuthorizesRephrasedTodo — covers the happy path now enabled (rephrased-but-same-task authorized)
All 11 existing evidence tests pass unchanged
All 33 builtin todo/complete_step tests pass unchanged
TestLedgerNumericCompleteStepDoesNotAuthorizeReplacedTodo still blocks genuinely different tasks

Files

| File | Δ |
| --- | --- |
| `internal/evidence/evidence.go` | +35/−1 |
| `internal/evidence/evidence_test.go` | +39 |

In `hasSuccessfulCompleteStepForTodo`, when a `complete_step` receipt carries a `TodoStep` snapshot whose content no longer exactly matches the current todo text (the model rephrases between successive `todo_write` calls), the bare `continue` unconditionally blocked the index-based fallback (`matchTodoStep`). This caused legitimate rephrased todos to be flagged as missing a completion, rejecting the `todo_write` call, and preventing the frontend from showing checkmarks (closes esengine#3992). Replace the `continue` with a content-overlap guard (`todoContentRelates`/`textOverlaps`): when old and new text share a substring (case-insensitive), the receipt falls through to the index/text fallback; otherwise the original block is preserved for genuinely different tasks. Add `TestLedgerNumericCompleteStepAuthorizesRephrasedTodo` to cover the rephrased-but-same-task happy path. All 11 evidence tests and 33 builtin tests pass with zero regressions.

esengine · 2026-06-11T11:12:18Z

Thanks @JesonChou — clean diagnosis of the bare continue swallowing the index fallback, and the overlap guard keeps replaced todos from reusing an old receipt. Merged. I'll fold textOverlaps into the existing stepTextContains and tidy the bounds guard in a small follow-up.

…unds guard (#4027)

textOverlaps duplicated stepTextContains verbatim; fold it into a normalize-then-contains one-liner. Hoist the index bounds check so both current[index-1] accesses share one guard instead of one inline and one bare. Follow-up to #4013; behavior unchanged (evidence tests pass).

Co-authored-by: reasonix <reasonix@deepseek.com>

JesonChou requested review from SivanCola and esengine as code owners June 11, 2026 09:51

github-actions Bot added the v2 Go rewrite (1.x) — main-v2 branch, active development label Jun 11, 2026

esengine merged commit 0869750 into esengine:main-v2 Jun 11, 2026
14 checks passed

esengine mentioned this pull request Jun 11, 2026

refactor(evidence): reuse stepTextContains for todo overlap, hoist bounds guard #4027

Merged

esengine merged 1 commit into esengine:main-v2 from JesonChou:fix/3992-todo-completion-checkmark
