fix(evidence): allow rephrased todos to pass complete_step verification#4013

Merged
esengine merged 1 commit into
esengine:main-v2from
JesonChou:fix/3992-todo-completion-checkmark
Jun 11, 2026

@JesonChou

Contributor

Fixes #3992

Problem

When the model rephrases a todo item's content between successive todo_write calls (e.g., "Analyze the bug" → "Analyze the bug and understand root cause"), hasSuccessfulCompleteStepForTodo rejects the completion. The sameTodoMatch exact-text guard fails, and the bare continue on L592 prevents the index-based fallback (matchTodoStep) from ever running. The todo_write call is rejected by verifyTodoCompletionTransitions, so the frontend never sees the completed status — checkmarks don't appear (#3992).
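To make the failure concrete, here is a minimal sketch of the pre-fix control flow, assuming a simplified receipt shape. `Receipt`, its fields, `matchTodoStep`'s signature and `hasCompleteStepOld` are hypothetical stand-ins, not the project's real types. The point is only that a non-matching snapshot hits the bare `continue` before the index fallback can run.

```go
package main

import "fmt"

// Receipt is a hypothetical stand-in for a successful complete_step receipt.
type Receipt struct {
	Index    int    // 1-based todo index the receipt refers to
	TodoStep string // todo text snapshotted at complete_step time ("" if none)
}

// matchTodoStep is a hypothetical index-based fallback.
func matchTodoStep(r Receipt, index int) bool { return r.Index == index }

// hasCompleteStepOld sketches the pre-fix control flow described above.
func hasCompleteStepOld(receipts []Receipt, current []string, index int) bool {
	if index < 1 || index > len(current) {
		return false
	}
	for _, r := range receipts {
		if r.TodoStep != "" {
			if r.TodoStep == current[index-1] { // exact-text sameTodoMatch
				return true
			}
			continue // bare continue: matchTodoStep below is never reached
		}
		if matchTodoStep(r, index) {
			return true
		}
	}
	return false
}

func main() {
	current := []string{"Analyze the bug and understand root cause"}
	receipts := []Receipt{{Index: 1, TodoStep: "Analyze the bug"}}
	fmt.Println(hasCompleteStepOld(receipts, current, 1)) // false: the rephrase is rejected
}
```

Receipts without a snapshot still reach the fallback; only snapshotted receipts whose text drifted are cut off.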

Fix

Replace the continue with a content-overlap guard (todoContentRelates / textOverlaps):

  • When the old and new text share a recognisable substring, the receipt falls through to the index-based fallback. Matching is case-folded and normalized and requires at least 6 runes on the shorter side, the same discipline as stepTextContains from #4006 (fix(evidence): tolerate citation drift when matching complete_step to todos).
  • When they are unrelated ("Add parser" → "Ship parser"), the original block is preserved (continue still fires).

textOverlaps uses normalizeStepText (introduced in #4006) for consistency with the matching layer.

Two new helpers:

  • todoContentRelates — gates the fallback on related task identity
  • textOverlaps — normalized substring overlap with ≥6-rune minimum
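The guard and both helpers can be sketched together as follows. This is an illustration under stated assumptions, not the merged code: `normalize` only lower-cases and collapses whitespace as a stand-in for `normalizeStepText` (whose exact rules are not shown in this PR), and `Receipt`, `matchTodoStep` and `hasCompleteStep` are hypothetical simplifications of the real evidence types.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Receipt is a hypothetical stand-in for a complete_step receipt.
type Receipt struct {
	Index    int    // 1-based todo index
	TodoStep string // snapshotted todo text ("" if none)
}

// normalize is an assumed approximation of normalizeStepText:
// lower-case and collapse runs of whitespace.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// textOverlaps reports whether the shorter normalized text (at least
// 6 runes) occurs inside the longer one.
func textOverlaps(a, b string) bool {
	na, nb := normalize(a), normalize(b)
	if utf8.RuneCountInString(na) > utf8.RuneCountInString(nb) {
		na, nb = nb, na // na is now the shorter side
	}
	return utf8.RuneCountInString(na) >= 6 && strings.Contains(nb, na)
}

// todoContentRelates gates the index fallback on related task identity.
func todoContentRelates(snapshot, current string) bool {
	return textOverlaps(snapshot, current)
}

// matchTodoStep is a hypothetical index-based fallback.
func matchTodoStep(r Receipt, index int) bool { return r.Index == index }

func hasCompleteStep(receipts []Receipt, current []string, index int) bool {
	if index < 1 || index > len(current) {
		return false
	}
	text := current[index-1]
	for _, r := range receipts {
		if r.TodoStep != "" {
			if r.TodoStep == text {
				return true
			}
			if !todoContentRelates(r.TodoStep, text) {
				continue // unrelated task: the original block still fires
			}
			// related rephrase: fall through to the index fallback
		}
		if matchTodoStep(r, index) {
			return true
		}
	}
	return false
}

func main() {
	rephrased := []string{"Analyze the bug and understand root cause"}
	fmt.Println(hasCompleteStep([]Receipt{{Index: 1, TodoStep: "Analyze the bug"}}, rephrased, 1)) // true
	replaced := []string{"Ship parser"}
	fmt.Println(hasCompleteStep([]Receipt{{Index: 1, TodoStep: "Add parser"}}, replaced, 1)) // false
}
```

The 6-rune floor keeps very short snapshots (for example "fix") from matching almost any todo, which is what lets "Add parser" → "Ship parser" stay blocked while "Analyze the bug" → "Analyze the bug and understand root cause" is authorized.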

Test

  • TestLedgerNumericCompleteStepAuthorizesRephrasedTodo — covers the happy path now enabled (rephrased-but-same-task authorized)
  • All 11 existing evidence tests pass unchanged
  • All 33 builtin todo/complete_step tests pass unchanged
  • TestLedgerNumericCompleteStepDoesNotAuthorizeReplacedTodo still blocks genuinely different tasks

Files

File changes (Δ):
  • internal/evidence/evidence.go: +35/−1
  • internal/evidence/evidence_test.go: +39

In `hasSuccessfulCompleteStepForTodo`, when a `complete_step` receipt carries
a `TodoStep` snapshot whose content no longer exactly matches the current todo
text (the model rephrases between successive `todo_write` calls), the bare
`continue` unconditionally skipped the index-based fallback
(`matchTodoStep`). As a result, legitimate rephrased todos were flagged as
missing a completion, the `todo_write` call was rejected, and the frontend
never showed checkmarks (closes esengine#3992).

Replace the `continue` with a content-overlap guard
(`todoContentRelates`/`textOverlaps`): when the old and new text share a
normalized, case-folded substring of at least 6 runes, the receipt falls
through to the index/text fallback; otherwise the original block is preserved
for genuinely different tasks.

Add `TestLedgerNumericCompleteStepAuthorizesRephrasedTodo` to cover the
rephrased-but-same-task happy path.

All 11 evidence tests and 33 builtin tests pass with zero regressions.
@github-actions (bot) added the v2 label on Jun 11, 2026
@esengine merged commit 0869750 into esengine:main-v2 on Jun 11, 2026
14 checks passed
@esengine

Owner

Thanks @JesonChou — clean diagnosis of the bare continue swallowing the index fallback, and the overlap guard keeps replaced todos from reusing an old receipt. Merged. I'll fold textOverlaps into the existing stepTextContains and tidy the bounds guard in a small follow-up.

esengine added a commit that referenced this pull request Jun 11, 2026
…unds guard (#4027)

textOverlaps duplicated stepTextContains verbatim — fold it into a normalize-then-contains one-liner. Hoist the index bounds check so both current[index-1] accesses share one guard instead of one inline and one bare. Follow-up to #4013; behavior unchanged (evidence tests pass).

Co-authored-by: reasonix <reasonix@deepseek.com>
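The follow-up can be pictured with a small sketch, assuming `stepTextContains(haystack, needle)` checks a ≥6-rune needle against already-normalized text. That signature, `normalize`, `minOverlapRunes` and `snapshotRelates` are all hypothetical illustrations, not the code in #4027. It shows `textOverlaps` reduced to a normalize-then-contains one-liner and a single hoisted bounds check protecting the `current[index-1]` read.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// minOverlapRunes mirrors the 6-rune floor described in #4013.
const minOverlapRunes = 6

// normalize is an assumed stand-in for normalizeStepText.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// stepTextContains (hypothetical signature): needle must be at least
// minOverlapRunes long and occur inside haystack.
func stepTextContains(haystack, needle string) bool {
	return utf8.RuneCountInString(needle) >= minOverlapRunes && strings.Contains(haystack, needle)
}

// textOverlaps folded into a normalize-then-contains one-liner.
func textOverlaps(a, b string) bool {
	return stepTextContains(normalize(a), normalize(b)) || stepTextContains(normalize(b), normalize(a))
}

// snapshotRelates (hypothetical) checks the bounds once, up front, so every
// later use of current[index-1] is covered by the same guard.
func snapshotRelates(snapshot string, current []string, index int) bool {
	if index < 1 || index > len(current) {
		return false
	}
	text := current[index-1]
	return snapshot == text || textOverlaps(snapshot, text)
}

func main() {
	todos := []string{"Analyze the bug and understand root cause"}
	fmt.Println(snapshotRelates("Analyze the bug", todos, 1)) // true
	fmt.Println(snapshotRelates("Analyze the bug", todos, 2)) // false: out of range, no panic
}
```

Checking containment in both directions is equivalent to testing the shorter side against the longer one, which is why the fold leaves behavior unchanged.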

Labels

v2 Go rewrite (1.x) — main-v2 branch, active development

Development

Successfully merging this pull request may close these issues.

[Bug]: Generated todos are not checked off as they are completed

2 participants