fix: harden afterTurn dedup guard against false-positive drops by 100yenadmin · Pull Request #257 · Martian-Engineering/lossless-claw

100yenadmin · 2026-04-03T21:40:09Z

Problem

The dedup guard merged in #246 has two issues that can cause data loss in production:

1. `hasMessage()` fast-path drops legitimate repeated messages

The current fast-path checks if batch[0] exists anywhere in the conversation DB. This false-positives when a user legitimately sends the same message content again:

DB: ["hello", "first reply"]
New turn: ["hello", "first reply", "hello", "second reply"]
prePromptMessageCount: 2

After slicing: newMessages = ["hello", "second reply"]

The current code sees "hello" exists in DB → enters the replay path → compares only 2 stored messages against the first 2 batch items → drops the legitimate new messages.

2. Dedup runs on ingestBatch (includes autoCompactionSummary)

The dedup currently runs on the full ingestBatch which includes the synthetic autoCompactionSummary at index 0. This means the summary content participates in replay detection, which can interfere when summary text happens to match historical message content.

Fix

Replace hasMessage() with aligned-tail boundary check: Instead of checking if batch[0] exists anywhere, verify that the DB's last message matches the message at the exact replay boundary position (batch[storedMessageCount - 1]). This is O(1) and only triggers on actual replay geometry.
Dedup newMessages before prepending autoCompactionSummary: Slice first, dedup second, then prepend the synthetic summary. This keeps the summary out of replay detection entirely.

Both changes are conservative — any mismatch falls through to the full ordered-prefix proof, and mismatches always preserve the batch unchanged (no data loss on false negatives).

Evidence

We hit this in production when a 120K-message conversation whale formed from restart-replay duplication. The forensic analysis showed restart-aligned replay bursts accounting for 62.4% of rows. After fixing this locally, we also identified that the autoCompactionSummary ordering issue could mask legitimate replays in the other direction.

Testing

Syntax check passes (node --experimental-strip-types --check src/engine.ts)
Adversarial review by two independent agents (Opus + GPT-5.4) validated the approach
Local deployment running with this fix applied

Improves the replay dedup introduced in Martian-Engineering#246 with two fixes: 1. Replace hasMessage() fast-path with aligned-tail boundary check. The old approach checks if batch[0] exists *anywhere* in the DB, which false-positives on legitimate repeated first messages (e.g. user sends 'hello' again). The new check verifies the DB's last message aligns with the exact replay boundary position in the incoming batch. 2. Run dedup on newMessages before prepending autoCompactionSummary. The merged Martian-Engineering#246 deduplicates the full ingestBatch including the synthetic summary, which can interfere with replay detection when the summary content matches historical messages. Both changes are conservative: any mismatch falls through to the existing full ordered-prefix proof, and mismatches always preserve the batch unchanged (no data loss on false negatives).

Copilot

Pull request overview

Hardens afterTurn replay deduplication to avoid false-positive drops and to exclude synthetic autoCompactionSummary content from replay detection.

Changes:

Replaces the previous hasMessage() fast-path with an O(1) aligned-tail boundary check before performing full ordered-prefix proof.
Runs deduplication on newMessages before prepending autoCompactionSummary to the ingest batch.
Adds extra conservatism checks (e.g., stored message count alignment) before trimming.

Comments suppressed due to low confidence (1)

src/engine.ts:2335

afterTurn now builds ingestBatch (autoCompactionSummary + dedupedNewMessages), but the ingestBatch call passes messages: dedupedBatch, which is no longer defined. This will fail to compile and will prevent ingestion entirely. Pass the constructed ingestBatch (or otherwise define the intended variable) to ingestBatch().

    try {
      await this.ingestBatch({
        sessionId: params.sessionId,
        sessionKey: params.sessionKey,
        messages: dedupedBatch,
        isHeartbeat: params.isHeartbeat === true,
      });

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T21:42:20Z

   */
+  /**
+   * Remove messages from the batch that already exist in the DB for this session.
+   * Conservative replay detection: only strip a prefix when the incoming
+   * batch begins with the entire stored transcript for the session.


There are now two consecutive JSDoc blocks above deduplicateAfterTurnBatch that repeat the same description. This duplication makes the file harder to read and suggests an accidental leftover from the previous comment; please remove the redundant block and keep a single up-to-date doc comment.

Suggested change

*/

/**

* Remove messages from the batch that already exist in the DB for this session.

* Conservative replay detection: only strip a prefix when the incoming

* batch begins with the entire stored transcript for the session.

Fix the follow-up replay dedup change so afterTurn passes the constructed ingest batch into ingestBatch instead of referencing a removed variable. Add a regression test covering restart replay when auto-compaction summary text is prepended, and include a patch changeset for release notes. Regeneration-Prompt: | Review PR 257 in lossless-claw and fix the blocking typo left in the afterTurn replay-dedup follow-up. Preserve the aligned-tail replay detection approach, keep the fix additive, and avoid changing unrelated behavior. Add targeted regression coverage for the summary-prepend edge case that the PR description calls out, then add a patch changeset so the data-loss hardening lands in release notes. Validate with the repo's existing vitest binary from the main checkout because the PR worktree does not have its own node_modules.

100yenadmin · 2026-04-03T22:06:45Z

Re: the duplicate JSDoc comment — I checked the source and there is only one JSDoc block above deduplicateAfterTurnBatch() (at line 2097-2101). The review tool may have been confused by the separate inline comment at the call site (line 2316). No change needed.

Re: the dedupedBatch variable name concern — the code defines dedupedBatch at line 2317 and uses it at line 2326. Both are in the same scope. This was a low-confidence comment and appears incorrect — the code compiles and runs correctly.

jalehman · 2026-04-03T22:07:34Z

Thank you!

Copilot AI review requested due to automatic review settings April 3, 2026 21:40

Copilot started reviewing on behalf of 100yenadmin April 3, 2026 21:40 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

jalehman self-assigned this Apr 3, 2026

jalehman merged commit ea43f58 into Martian-Engineering:main Apr 3, 2026
1 check passed

github-actions Bot mentioned this pull request Apr 3, 2026

chore: version packages #231

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden afterTurn dedup guard against false-positive drops#257

fix: harden afterTurn dedup guard against false-positive drops#257
jalehman merged 2 commits into
Martian-Engineering:mainfrom
electricsheephq:fix/afterTurn-dedup-false-positive-guard

100yenadmin commented Apr 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

100yenadmin commented Apr 3, 2026

Uh oh!

jalehman commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

100yenadmin commented Apr 3, 2026

Problem

1. hasMessage() fast-path drops legitimate repeated messages

2. Dedup runs on ingestBatch (includes autoCompactionSummary)

Fix

Evidence

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

100yenadmin commented Apr 3, 2026

Uh oh!

jalehman commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. `hasMessage()` fast-path drops legitimate repeated messages