fix(evidence): allow rephrased todos to pass complete_step verification #4013
Merged
esengine merged 1 commit on Jun 11, 2026
In `hasSuccessfulCompleteStepForTodo`, when a `complete_step` receipt carries a `TodoStep` snapshot whose content no longer exactly matches the current todo text (the model rephrases between successive `todo_write` calls), the bare `continue` unconditionally blocked the index-based fallback (`matchTodoStep`). This caused legitimate rephrased todos to be flagged as missing a completion, rejecting the `todo_write` call, and preventing the frontend from showing checkmarks (closes esengine#3992). Replace the `continue` with a content-overlap guard (`todoContentRelates`/`textOverlaps`): when old and new text share a substring (case-insensitive), the receipt falls through to the index/text fallback; otherwise the original block is preserved for genuinely different tasks. Add `TestLedgerNumericCompleteStepAuthorizesRephrasedTodo` to cover the rephrased-but-same-task happy path. All 11 evidence tests and 33 builtin tests pass with zero regressions.
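The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `internal/evidence/evidence.go`: the `Todo` and `Receipt` types, the `related` helper, and `hasCompletion` are hypothetical stand-ins for the real snapshot type, `todoContentRelates`, and `hasSuccessfulCompleteStepForTodo`, and the index comparison only approximates what `matchTodoStep` might do.

```go
package main

import (
	"fmt"
	"strings"
)

// Todo and Receipt are simplified, hypothetical stand-ins for the real types.
type Todo struct{ Content string }

type Receipt struct {
	Index    int    // 1-based step index recorded by complete_step
	Snapshot string // todo text captured when the step was completed ("" if none)
}

// related reports whether one text contains the other, ignoring case.
// It stands in for the content-overlap guard; the real helper may differ.
func related(oldText, newText string) bool {
	a := strings.ToLower(strings.TrimSpace(oldText))
	b := strings.ToLower(strings.TrimSpace(newText))
	if a == "" || b == "" {
		return false
	}
	return strings.Contains(a, b) || strings.Contains(b, a)
}

// hasCompletion checks whether any receipt authorizes completing todos[index-1].
func hasCompletion(receipts []Receipt, todos []Todo, index int) bool {
	if index < 1 || index > len(todos) {
		return false
	}
	current := todos[index-1].Content
	for _, r := range receipts {
		if r.Snapshot != "" {
			if r.Snapshot == current {
				return true // exact snapshot match
			}
			// Before the fix this was an unconditional `continue`.
			if !related(r.Snapshot, current) {
				continue // genuinely different task: keep blocking
			}
		}
		if r.Index == index { // index-based fallback
			return true
		}
	}
	return false
}

func main() {
	receipts := []Receipt{{Index: 1, Snapshot: "Analyze the bug"}}
	rephrased := []Todo{{Content: "Analyze the bug and understand root cause"}}
	replaced := []Todo{{Content: "Write release notes"}}
	fmt.Println(hasCompletion(receipts, rephrased, 1)) // true
	fmt.Println(hasCompletion(receipts, replaced, 1))  // false
}
```

The point of the sketch is that the mismatch branch no longer short-circuits unconditionally: a related snapshot reaches the fallback, while an unrelated one is still skipped.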
Owner
|
Thanks @JesonChou — clean diagnosis of the bare |
esengine added a commit that referenced this pull request on Jun 11, 2026:
…unds guard (#4027)
`textOverlaps` duplicated `stepTextContains` verbatim; fold it into a normalize-then-contains one-liner. Hoist the index bounds check so both `current[index-1]` accesses share one guard instead of one inline and one bare. Follow-up to #4013; behavior unchanged (evidence tests pass).
Co-authored-by: reasonix <reasonix@deepseek.com>
Fixes #3992
Problem
When the model rephrases a todo item's `content` between successive `todo_write` calls (e.g., "Analyze the bug" → "Analyze the bug and understand root cause"), `hasSuccessfulCompleteStepForTodo` rejects the completion. The `sameTodoMatch` exact-text guard fails, and the bare `continue` on L592 prevents the index-based fallback (`matchTodoStep`) from ever running. The `todo_write` call is then rejected by `verifyTodoCompletionTransitions`, so the frontend never sees the completed status and checkmarks don't appear (#3992).
Fix
Replace the `continue` with a content-overlap guard (`todoContentRelates`/`textOverlaps`):
- When the old and new content overlap (the same normalized substring check as `stepTextContains` from "fix(evidence): tolerate citation drift when matching complete_step to todos" #4006), the receipt falls through to the index-based fallback.
- Otherwise the original block is preserved (the `continue` still fires).
- `textOverlaps` uses `normalizeStepText` (introduced in #4006) for consistency with the matching layer.
Two new helpers:
- `todoContentRelates`: gates the fallback on related task identity
- `textOverlaps`: normalized substring overlap with a ≥6-rune minimum
Test
- `TestLedgerNumericCompleteStepAuthorizesRephrasedTodo` covers the happy path now enabled (rephrased-but-same-task authorized)
- `evidence` tests pass unchanged
- `builtin` todo/complete_step tests pass unchanged
- `TestLedgerNumericCompleteStepDoesNotAuthorizeReplacedTodo` still blocks genuinely different tasks
Files
- `internal/evidence/evidence.go`
- `internal/evidence/evidence_test.go`
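
For readers without the diff at hand, the "normalized substring overlap with a ≥6-rune minimum" behind `textOverlaps` might look something like the sketch below. It rests on assumptions: `normalize` is a hypothetical stand-in for `normalizeStepText` (here it only lowercases and collapses whitespace), `overlaps` is an illustrative name, and applying the minimum to the shorter of the two texts is a guess rather than something the description states.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// minOverlapRunes is taken from the PR description ("≥6-rune minimum").
const minOverlapRunes = 6

// normalize is a hypothetical stand-in for normalizeStepText:
// lowercase and collapse runs of whitespace. The real function may do more.
func normalize(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// overlaps reports whether the shorter normalized text is a substring of the
// longer one, provided the shorter text has at least minOverlapRunes runes.
func overlaps(a, b string) bool {
	a, b = normalize(a), normalize(b)
	if utf8.RuneCountInString(a) > utf8.RuneCountInString(b) {
		a, b = b, a // make a the shorter text
	}
	if utf8.RuneCountInString(a) < minOverlapRunes {
		return false // too short to count as evidence of the same task
	}
	return strings.Contains(b, a)
}

func main() {
	fmt.Println(overlaps("Analyze the bug", "Analyze the bug and understand root cause")) // true
	fmt.Println(overlaps("Fix", "Fix the parser"))                                        // false
	fmt.Println(overlaps("Analyze the bug", "Write release notes"))                       // false
}
```

Counting runes rather than bytes keeps the threshold meaningful for non-ASCII todo text, where a single character can occupy several bytes.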