fix: add afterTurn dedup guard to prevent duplicate ingestion on gateway restart#246

Merged
jalehman merged 2 commits into Martian-Engineering:main from liu51115:fix/afterTurn-dedup on Apr 3, 2026


liu51115 commented Apr 3, 2026


Fixes #245

Problem

After a gateway restart, afterTurn receives the full session history (rebuilt from .jsonl) but treats all messages as new. ingestBatch inserts them as duplicates — we observed 2,331 duplicates in 4 seconds in production, inflating the session to 780k tokens and making it unresponsive.

reconcileSessionTail (bootstrap path) has dedup logic, but afterTurn bypasses it entirely.
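A minimal sketch of the failure mode, under clearly stated assumptions: the `Message` shape, the `NaiveStore` class, and `naiveAfterTurn` are hypothetical stand-ins, not code from src/engine.ts. It models an ingestBatch with no dedup check and an afterTurn that treats everything after prePromptMessageCount as new, which is how the first commit message describes the pre-fix behavior.

```typescript
// Hypothetical message shape; the real engine's types are not shown in this PR.
interface Message {
  role: string;
  content: string;
}

// Hypothetical stand-in for the DB: ingestBatch appends with no dedup check.
class NaiveStore {
  rows: Message[] = [];
  ingestBatch(batch: Message[]): void {
    this.rows.push(...batch);
  }
}

// Pre-fix afterTurn model: everything after prePromptMessageCount is treated as new.
function naiveAfterTurn(
  store: NaiveStore,
  messages: Message[],
  prePromptMessageCount: number,
): void {
  store.ingestBatch(messages.slice(prePromptMessageCount));
}

const system: Message = { role: "system", content: "prompt" };
const turn1: Message[] = [
  { role: "user", content: "q1" },
  { role: "assistant", content: "a1" },
];
const turn2: Message[] = [
  { role: "user", content: "q2" },
  { role: "assistant", content: "a2" },
];

const store = new NaiveStore();
// Turn 1: only the system prompt precedes the new messages.
naiveAfterTurn(store, [system, ...turn1], 1);
// Turn 2 after a restart: the history is rebuilt from .jsonl, but
// prePromptMessageCount still covers only the system prompt.
naiveAfterTurn(store, [system, ...turn1, ...turn2], 1);

console.log(store.rows.length); // 6, although only 4 distinct messages exist
```

In this model, passing a prePromptMessageCount of 3 (system prompt plus turn 1) on the second call would ingest only q2 and a2; once the count covers just the system prompt, the rebuilt history is stored a second time.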

Fix

Add deduplicateAfterTurnBatch() to afterTurn, called before ingestBatch:

  • Fast path (normal operation): checks whether the last DB message identity appears in the incoming batch. If not → all new, return unchanged. One getLastMessage call + O(n) scan, so negligible overhead for normal turns.
  • Slow path (post-restart replay): walks backward through the batch using occurrence counting to find the exact anchor point where DB history ends. Same approach as reconcileSessionTail. Returns only genuinely new messages after the anchor.
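The two paths can be sketched as a standalone function. This is a simplified model, not the merged implementation: `Message`, `identity` (role plus content), and `sketchDeduplicateAfterTurnBatch` are hypothetical, and the stored history is passed in as an array instead of being read through getLastMessage. The slow path compares how often the last stored identity occurs in the DB versus the batch and treats the surplus occurrences at the end of the batch as new.

```typescript
// Hypothetical message shape and identity; the PR does not show how the real
// engine derives a message identity, so role + content is an assumption here.
interface Message {
  role: string;
  content: string;
}

const identity = (m: Message): string => `${m.role}\u0000${m.content}`;

// `stored` stands in for the session's DB history (oldest first);
// `batch` is what afterTurn would otherwise hand to ingestBatch.
function sketchDeduplicateAfterTurnBatch(stored: Message[], batch: Message[]): Message[] {
  if (stored.length === 0) return batch; // new session: nothing to dedup against

  const lastId = identity(stored[stored.length - 1]);
  const countIn = (msgs: Message[]): number =>
    msgs.filter((m) => identity(m) === lastId).length;

  // Fast path: the last stored identity never appears, so every message is new.
  const inBatch = countIn(batch);
  if (inBatch === 0) return batch;

  // Slow path: occurrences beyond those already stored must be new, so skip that
  // many matches while walking backward; the next match is the anchor.
  let toSkip = inBatch - countIn(stored);
  if (toSkip < 0) return batch; // batch cannot contain the full stored history
  for (let i = batch.length - 1; i >= 0; i--) {
    if (identity(batch[i]) !== lastId) continue;
    if (toSkip > 0) {
      toSkip--;
      continue;
    }
    return batch.slice(i + 1); // only messages after the anchor are new
  }
  return batch; // not reached when toSkip >= 0; keeps the return total
}

const u = (content: string): Message => ({ role: "user", content });
const a = (content: string): Message => ({ role: "assistant", content });
const t = (content: string): Message => ({ role: "tool", content });

const stored = [u("q1"), a("a1")];
console.log(sketchDeduplicateAfterTurnBatch(stored, [u("q2"), a("a2")]).length); // 2 (fast path)
console.log(sketchDeduplicateAfterTurnBatch(stored, [u("q1"), a("a1")]).length); // 0 (full replay)
const mixed = sketchDeduplicateAfterTurnBatch(stored, [u("q1"), a("a1"), u("q2")]);
console.log(mixed.map((m) => m.content).join(",")); // q2

// Repeated identical content: two empty tool results stored, three replayed.
const toolStored = [u("run"), t(""), t("")];
console.log(sketchDeduplicateAfterTurnBatch(toolStored, [u("run"), t(""), t(""), t("")]).length); // 1
```

One caveat, which the follow-up commit in this PR addresses: a tail heuristic of this kind can misfire on a normal turn that happens to repeat the stored tail. In this sketch, a genuinely new batch `[u("q1"), a("a1")]` against the same stored history is trimmed to nothing.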

False-positive protection: if a user legitimately sends the same content twice, the occurrence counting distinguishes the new message from the historical one.

Tests

8 test cases covering:

  • New session (no prior conversation)
  • Normal afterTurn (no restart, genuinely new messages)
  • Full restart replay (all duplicates → empty ingest)
  • Mixed old + new after restart (dedup old, ingest new)
  • Empty batch after slicing
  • Repeated identical content (empty tool results) with occurrence counting
  • Single genuinely new message
  • False-positive protection (user sends same content twice)

Changes

  • src/engine.ts: +85 lines (new deduplicateAfterTurnBatch method + call site in afterTurn)
  • test/engine.test.ts: +234 lines (8 test cases)

All 474 tests pass.

Claw Liu and others added 2 commits April 3, 2026 09:28
…way restart

When the gateway restarts, OpenClaw rebuilds the session from the .jsonl
file and passes the full message history to afterTurn. Since
prePromptMessageCount only covers the system prompt, LCM treats all
historical messages as new. ingestBatch has no dedup check and blindly
inserts duplicates.

This caused a production incident: 2,331 duplicate messages ingested in
4 seconds after two gateway restarts, inflating the session to 780k
tokens and triggering cascading compaction that made the session
unresponsive.

Fix: Add deduplicateAfterTurnBatch() to afterTurn, called before
ingestBatch. Uses a two-path approach:
- Fast path (normal operation): O(n) scan checks if the last DB message
  identity appears in the batch. If not, all messages are new.
- Slow path (post-restart replay): walks backward through the batch
  using occurrence counting to find the exact anchor point where DB
  history ends, returns only genuinely new messages.

Includes 8 test cases covering: new sessions, normal afterTurn, full
restart replay, mixed old+new after restart, empty batches, repeated
identical content with occurrence counting, single new messages, and
false-positive protection (user legitimately sending the same content
twice).

Use conservative replay detection in afterTurn so we only trim an incoming
batch when it begins with the exact stored transcript for the session.
This preserves legitimate repeated first messages in normal turns while
still deduplicating the full-history replay that happens after gateway
restart.

Add a regression test covering a repeated first new message, and keep the
existing restart replay cases green.

Regeneration-Prompt: |
  Update the afterTurn dedup guard added for gateway restart replay so it
  does not silently drop a legitimate first new message when that message
  repeats earlier content. Preserve the restart-replay protection, keep the
  change isolated to afterTurn dedup behavior, and add regression coverage
  for the repeated-first-message case alongside the existing replay tests.
  Prefer a conservative detection rule over a heuristic that can corrupt the
  stored transcript during normal operation.
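The conservative rule from the second commit can be sketched as an ordered-prefix check. `Message`, `identity`, and `trimReplayedPrefix` are hypothetical names for illustration; the sketch trims only when the batch starts with the entire stored transcript in order and otherwise returns the batch untouched.

```typescript
// Hypothetical message shape and identity (role + content); the PR does not
// show how the engine compares messages.
interface Message {
  role: string;
  content: string;
}

const identity = (m: Message): string => `${m.role}\u0000${m.content}`;

// Treat the batch as a restart replay only if it begins with the exact stored
// transcript, in order. Anything else is ingested unchanged.
function trimReplayedPrefix(stored: Message[], batch: Message[]): Message[] {
  if (stored.length === 0 || batch.length < stored.length) return batch;
  for (let i = 0; i < stored.length; i++) {
    if (identity(batch[i]) !== identity(stored[i])) return batch; // not a replay
  }
  return batch.slice(stored.length); // drop the replayed history, keep the rest
}

const u = (content: string): Message => ({ role: "user", content });
const a = (content: string): Message => ({ role: "assistant", content });

const stored = [u("hello"), a("hi")];
// Normal turn whose first new message repeats earlier content: kept in full.
const normal = trimReplayedPrefix(stored, [u("hello"), a("hi again")]);
// Restart replay: the stored transcript comes first, followed by the new turn.
const replay = trimReplayedPrefix(stored, [u("hello"), a("hi"), u("hello"), a("hi again")]);

console.log(normal.length); // 2
console.log(replay.length); // 2
```

The cost is a comparison against the whole stored transcript, but a false negative only means the batch is ingested as-is, which matches the stated preference for a conservative rule over a heuristic that can corrupt the stored transcript.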

jalehman commented Apr 3, 2026


Thank you!

jalehman merged commit 33ad257 into Martian-Engineering:main Apr 3, 2026
1 check passed
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 3, 2026
Improves the replay dedup introduced in Martian-Engineering#246 with two fixes:

1. Replace hasMessage() fast-path with aligned-tail boundary check.
   The old approach checks if batch[0] exists *anywhere* in the DB,
   which false-positives on legitimate repeated first messages (e.g.
   user sends 'hello' again). The new check verifies the DB's last
   message aligns with the exact replay boundary position in the
   incoming batch.

2. Run dedup on newMessages before prepending autoCompactionSummary.
   The merged Martian-Engineering#246 deduplicates the full ingestBatch including the
   synthetic summary, which can interfere with replay detection when
   the summary content matches historical messages.

Both changes are conservative: any mismatch falls through to the
existing full ordered-prefix proof, and mismatches always preserve
the batch unchanged (no data loss on false negatives).
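The difference between the two fast-path checks described in this commit can be illustrated with a small sketch. `firstMessageSeenAnywhere` models the old hasMessage()-style check (is batch[0] anywhere in the DB?) and `tailAligned` models the aligned-tail boundary check; both, along with `Message` and `identity`, are hypothetical simplifications.

```typescript
// Hypothetical message shape and identity (role + content).
interface Message {
  role: string;
  content: string;
}

const identity = (m: Message): string => `${m.role}\u0000${m.content}`;

// Old-style fast path as described: does batch[0] appear anywhere in the DB?
function firstMessageSeenAnywhere(stored: Message[], batch: Message[]): boolean {
  if (batch.length === 0) return false;
  const first = identity(batch[0]);
  return stored.some((m) => identity(m) === first);
}

// Aligned-tail check: in a true replay, the stored last message must sit at the
// replay boundary, i.e. at index stored.length - 1 of the incoming batch.
function tailAligned(stored: Message[], batch: Message[]): boolean {
  const n = stored.length;
  if (n === 0 || batch.length < n) return false;
  return identity(batch[n - 1]) === identity(stored[n - 1]);
}

const u = (content: string): Message => ({ role: "user", content });
const a = (content: string): Message => ({ role: "assistant", content });

const stored = [u("hello"), a("hi"), u("thanks"), a("welcome")];
// A normal turn in which the user says "hello" again.
const normalTurn = [u("hello"), a("hi, again")];
// A restart replay: full stored history followed by the same new turn.
const replay = [...stored, ...normalTurn];

console.log(firstMessageSeenAnywhere(stored, normalTurn)); // true (false positive)
console.log(tailAligned(stored, normalTurn)); // false
console.log(tailAligned(stored, replay)); // true
```

Per the commit message, this check does not trim anything by itself: any mismatch falls through to the existing full ordered-prefix proof, and mismatches leave the batch unchanged.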
jalehman added a commit that referenced this pull request Apr 3, 2026
* fix: harden afterTurn dedup guard against false-positive drops

Improves the replay dedup introduced in #246 with two fixes:

1. Replace hasMessage() fast-path with aligned-tail boundary check.
   The old approach checks if batch[0] exists *anywhere* in the DB,
   which false-positives on legitimate repeated first messages (e.g.
   user sends 'hello' again). The new check verifies the DB's last
   message aligns with the exact replay boundary position in the
   incoming batch.

2. Run dedup on newMessages before prepending autoCompactionSummary.
   The merged #246 deduplicates the full ingestBatch including the
   synthetic summary, which can interfere with replay detection when
   the summary content matches historical messages.

Both changes are conservative: any mismatch falls through to the
existing full ordered-prefix proof, and mismatches always preserve
the batch unchanged (no data loss on false negatives).

* fix: repair afterTurn dedup ingest batch

Fix the follow-up replay dedup change so afterTurn passes the constructed ingest batch into ingestBatch instead of referencing a removed variable. Add a regression test covering restart replay when auto-compaction summary text is prepended, and include a patch changeset for release notes.

Regeneration-Prompt: |
  Review PR 257 in lossless-claw and fix the blocking typo left in the
  afterTurn replay-dedup follow-up. Preserve the aligned-tail replay
  detection approach, keep the fix additive, and avoid changing unrelated
  behavior. Add targeted regression coverage for the summary-prepend edge
  case that the PR description calls out, then add a patch changeset so the
  data-loss hardening lands in release notes. Validate with the repo's
  existing vitest binary from the main checkout because the PR worktree does
  not have its own node_modules.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
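The ordering fix can be sketched as follows, assuming a prefix-style replay check and a summary represented as an ordinary message; `buildIngestBatch`, `trimReplayedPrefix`, and the `Message` shape are hypothetical. Dedup runs on newMessages first, the synthetic summary is prepended afterwards, and the resulting array is what gets handed to ingestBatch. In this simplified model, prepending first shifts the replayed history by one position, so the prefix check fails and the duplicates survive; the commit describes a related interference when the summary content matches historical messages.

```typescript
// Hypothetical message shape and identity (role + content).
interface Message {
  role: string;
  content: string;
}

const identity = (m: Message): string => `${m.role}\u0000${m.content}`;

// Minimal ordered-prefix replay check (hypothetical stand-in for the real proof).
function trimReplayedPrefix(stored: Message[], batch: Message[]): Message[] {
  const isReplay =
    stored.length > 0 &&
    batch.length >= stored.length &&
    stored.every((m, i) => identity(m) === identity(batch[i]));
  return isReplay ? batch.slice(stored.length) : batch;
}

// Dedup the real new messages first, then prepend the synthetic summary.
// The returned array is the one that should be passed to ingestBatch.
function buildIngestBatch(
  stored: Message[],
  newMessages: Message[],
  autoCompactionSummary?: Message,
): Message[] {
  const deduped = trimReplayedPrefix(stored, newMessages);
  return autoCompactionSummary ? [autoCompactionSummary, ...deduped] : deduped;
}

const u = (content: string): Message => ({ role: "user", content });
const a = (content: string): Message => ({ role: "assistant", content });

const stored = [u("q1"), a("a1")];
const replay = [u("q1"), a("a1"), u("q2")];
const summary: Message = { role: "system", content: "summary of earlier turns" };

const batchToIngest = buildIngestBatch(stored, replay, summary);
// Prepending first hides the replayed prefix behind the summary: nothing is trimmed.
const wrongOrder = trimReplayedPrefix(stored, [summary, ...replay]);

console.log(batchToIngest.map((m) => m.content).join(" | ")); // summary of earlier turns | q2
console.log(wrongOrder.length); // 4
```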
Development

Successfully merging this pull request may close these issues.

[Bug] afterTurn re-ingests full session history as duplicates after gateway restart