Skip to content

fix(memory): memoryFlush fires every compaction cycle instead of every other#51421

Merged
obviyus merged 2 commits into
openclaw:mainfrom
Kaspre:fix/memoryflush-skip-every-other
May 8, 2026
Merged

fix(memory): memoryFlush fires every compaction cycle instead of every other#51421
obviyus merged 2 commits into
openclaw:mainfrom
Kaspre:fix/memoryflush-skip-every-other

Conversation

@Kaspre

@Kaspre Kaspre commented Mar 21, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fixes memoryFlush only firing on alternating compaction cycles
  • One-line behavioral change: stop reassigning memoryFlushCompactionCount to the post-increment value

Root cause

After a memoryFlush during compaction, memoryFlushCompactionCount is reassigned to the post-increment compactionCount (line 536 of agent-runner-memory.ts). This locks both counters to the same value. On the next cycle, the dedup gate at reply-state.ts sees memoryFlushCompactionCount === compactionCount and skips the flush. Only after a regular compaction (without flush) increments compactionCount alone do the values differ again, allowing the next flush.

The result: flush, skip, flush, skip — memoryFlush fires on every other compaction instead of every one.

Fix

Remove the reassignment. memoryFlushCompactionCount stays at the pre-increment value (N). After incrementCompactionCount(), compactionCount becomes N+1. Next cycle, N !== N+1, flush fires. The counter is then updated through the normal session store update path.

-      const nextCount = await incrementCompactionCount({
+      await incrementCompactionCount({
         sessionEntry: activeSessionEntry,
         sessionStore: activeSessionStore,
         sessionKey: params.sessionKey,
         storePath: params.storePath,
       });
-      if (typeof nextCount === "number") {
-        memoryFlushCompactionCount = nextCount;
-      }
+      // Do NOT reassign memoryFlushCompactionCount to the post-increment value.
+      // Keeping it at the pre-increment value ensures the next compaction cycle
+      // sees different counters, allowing memoryFlush to fire every cycle
+      // instead of every other. See #12590.

Prior art

This issue has been reported and patch-attempted multiple times:

We've been running this fix as a local dist patch since early March 2026 with no issues.

Fixes #12590.

Real behavior proof

  • Behavior or issue addressed: memoryFlush silently skips every other auto-compaction ([Bug]: memoryFlush does not fire reliably #12590). After flush+compaction, memoryFlushCompactionCount is reassigned to the post-increment compactionCount, which makes the next cycle's dedup check memoryFlushCompactionCount === compactionCount true → flush is skipped → on the cycle after, the counter has drifted again → flush fires. Result: ~50% miss rate on durable note writes.

  • Real environment tested: Local OpenClaw v2026.5.6 install on Linux 6.6.87.2 / WSL2 / Node v25.8.2, gateway running as systemd user service (pid 270001, active). Patch has been carried across v2026.3.8 → v2026.4.15 → v2026.4.24 → v2026.5.6 since 2026-03-08 with no regressions in any version.

  • Exact steps or command run after this patch: applied this PR's one-line change (drop the if (typeof nextCount === "number") { memoryFlushCompactionCount = nextCount; } block in dist/agent-runner.runtime-Ew7ojxcl.js, line 2065 of the v2026.5.6 bundle), restarted the gateway (systemctl --user restart openclaw-gateway), and verified by inspecting both the source-level fault and the patched bundle.

  • Evidence after fix:

    Before patch — bug still live in v2026.5.6 source (src/auto-reply/reply/agent-runner-memory.ts:979-981):

    $ git show upstream/main:src/auto-reply/reply/agent-runner-memory.ts | sed -n '979,981p'
          if (typeof nextCount === "number") {
            memoryFlushCompactionCount = nextCount;
          }

    After patch — this PR's change applied to the v2026.5.6 bundle on my install:

    $ grep -B 1 -A 4 "FIX #12590" \
        ~/.nvm/versions/node/v25.8.2/lib/node_modules/openclaw/dist/agent-runner.runtime-Ew7ojxcl.js
            });
          }
          // FIX #12590: Do NOT reassign memoryFlushCompactionCount to post-increment value.
          // incrementCompactionCount returns the NEW count after increment (compactionCount+1),
          // which equals memoryFlushCompactionCount after this flush succeeds. Reassigning to
          // nextCount makes the NEXT flush check see (memoryFlushCompactionCount === compactionCount)
          // and skip, causing skip-every-other behavior.
    
    $ openclaw status | grep -E "Update|Gateway service"
    │ Update               │ pnpm · up to date · npm latest 2026.5.6                                  │
    │ Gateway service      │ systemd user installed · enabled · running (pid 270001, state active)    │
  • Observed result after fix: With the reassignment removed, memoryFlushCompactionCount stays at its pre-increment value while incrementCompactionCount advances compactionCount to N+1. The next compaction cycle's dedup check sees N !== N+1 and the flush fires instead of being skipped. The gateway has run continuously across four version upgrades with this patch carried forward, including session reloads, manual /restart cycles, and multi-day-long sessions; no regressions or new errors in the gateway log have surfaced that trace back to this change.

  • What was not tested: I have not constructed an end-to-end repro that forces ≥ 3 consecutive auto-compactions in one synthetic test session and asserts memoryFlush firing on every cycle (rather than alternating). The compaction trigger is token-threshold-based and would require either a very large prompt loop or a test seam that lets you bypass the threshold. The pre-existing reply-state.test.ts coverage of memoryFlushCompactionCount is left to vouch for that behavior at the unit level.

Test plan

  • Local patch running in production since 2026-03-08, no regressions
  • Existing reply-state.test.ts tests for memoryFlushCompactionCount logic pass
  • Change is minimal — removes a reassignment, adds a comment
  • Real behavior proof attached above (v2026.5.6 bundle, unpatched and patched)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

@greptile-apps

greptile-apps Bot commented Mar 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a subtle off-by-one bug where memoryFlush was only triggering on every other compaction cycle instead of every cycle. The one-line behavioral change is correct and well-reasoned.

How the bug worked: After a successful flush + compaction, incrementCompactionCount advanced compactionCount from N to N+1, and the old code immediately reassigned the local memoryFlushCompactionCount to N+1 as well. When the flush metadata (memoryFlushCompactionCount = N+1) was then persisted to the session store, hasAlreadyFlushedForCurrentCompaction would see N+1 === N+1 on the very next cycle and skip the flush — producing the flush/skip/flush/skip pattern described in the PR.

Why the fix is correct: By not reassigning memoryFlushCompactionCount, it retains the pre-increment value (N). After incrementCompactionCount, the session store holds compactionCount = N+1, while the persisted memoryFlushCompactionCount = N. On every subsequent cycle N ≠ N+1, so hasAlreadyFlushedForCurrentCompaction returns false and the flush fires as intended. The counter advances correctly on each cycle:

  • Flush at N: persists memoryFlushCompactionCount = N, store has compactionCount = N+1
  • Flush at N+1: persists memoryFlushCompactionCount = N+1, store has compactionCount = N+2
  • And so on — flush fires every cycle ✓

Key observations:

  • The explanatory comment added alongside the removal is clear and references the original issue ([Bug]: memoryFlush does not fire reliably #12590), which will help future maintainers.
  • The incrementCompactionCount return value (Promise<number | undefined>) is now implicitly discarded, which is intentional and valid TypeScript — no lint/compile issue.
  • No other callers of incrementCompactionCount are affected by this change.
  • The existing hasAlreadyFlushedForCurrentCompaction dedup guard in memory-flush.ts still correctly prevents double-flushes within the same cycle (e.g. if called more than once before a compaction completes).

Confidence Score: 5/5

  • This PR is safe to merge — it makes a minimal, well-understood one-line behavioral fix with a clear root-cause analysis and no risk of regression.
  • The change is a single removal of a post-increment reassignment. The root cause is accurately described, the fix is logically verified against hasAlreadyFlushedForCurrentCompaction in memory-flush.ts and incrementCompactionCount in session-updates.ts, existing tests pass, and it has been running in production since 2026-03-08 without regressions. No side effects were found on other callers.
  • No files require special attention.

Last reviewed commit: "fix(memory): memoryF..."

@Kaspre

Kaspre commented Mar 21, 2026

Copy link
Copy Markdown
Contributor Author

Updated with improvements from prior PRs:

Thanks to @lailoo, @Br1an67, and @dial481 for keeping this issue alive and contributing fixes. This PR combines the cleanest elements from the prior attempts.

@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch 2 times, most recently from 56d0810 to d19afe3 Compare March 23, 2026 21:16
@Kaspre

Kaspre commented Mar 23, 2026

Copy link
Copy Markdown
Contributor Author

Rebased on main (v2026.3.22). Conflict in agent-runner-memory.ts resolved: kept upstream's new session refresh + queue logic while preserving the fix (don't reassign memoryFlushCompactionCount to post-increment value). The linter also upgraded letconst since the variable is no longer reassigned.

@Kaspre

Kaspre commented Mar 25, 2026

Copy link
Copy Markdown
Contributor Author

Branch rebased onto latest main, resolving merge conflicts in agent-runner-memory.ts. The core fix is preserved: memoryFlushCompactionCount is declared as const (not reassigned after increment), ensuring memoryFlush fires every compaction cycle instead of every other.

Requesting re-review.

@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from 3aeb747 to 0782639 Compare March 25, 2026 21:31
@Kaspre

Kaspre commented Mar 25, 2026

Copy link
Copy Markdown
Contributor Author

Local test note: src/auto-reply/reply/reply-state.test.ts has one failing test (incrementCompactionCount > falls back to the derived transcript path when rewritten absolute sessionFile is unsafe), but the same failure reproduces on origin/main — it is not introduced by this PR. Our PR-specific test (triggers on every compaction cycle when flush records pre-increment count) passes.

@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch 2 times, most recently from 3504628 to c7c1d3c Compare March 30, 2026 04:43
@Kaspre

Kaspre commented Mar 30, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current main (69793db). Clean rebase, no conflicts.

No code changes — same 2 commits, same 2 files. CI was red before due to stale base (1,597 commits behind); builds are now passing.

Note: pnpm tsgo currently fails on main (#57196, fix in #57198) — this may cause some CI checks to fail; unrelated to this PR.

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Kaspre

Kaspre commented Mar 30, 2026

Copy link
Copy Markdown
Contributor Author

Ready for review. Summary:

  • Root cause: memoryFlush fires every other compaction cycle instead of every cycle because memoryFlushCompactionCount is reassigned to the post-increment value, making it equal to compactionCount on the next check.
  • Fix: Keep memoryFlushCompactionCount at the pre-increment value (change letconst, remove reassignment). One-line behavioral fix.
  • Test: Regression test verifies flush triggers on 3 consecutive compaction cycles.
  • Reviews: Greptile 5/5 ("safe to merge"), Codex clean ("no major issues").
  • CI: All Linux/macOS checks pass. 3 Windows test shard failures appear to be flaky (unrelated to this 2-file XS change).

Closes #12590.

@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from c7c1d3c to 80742fb Compare April 2, 2026 14:53
@Kaspre

Kaspre commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current main (b6debb4), squashed to single commit 80742fb664.

No conflicts — the buggy reassignment at agent-runner-memory.ts:755 is unchanged on main. Verified the content-hash dedup added in #34222 was subsequently removed during the memory-flush-to-plugin refactor (e0dfc77), so the compactionCount gate is the only dedup mechanism and the bug has no secondary safety net.

Also commented on #12590 with current-state findings.

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from 80742fb to 9732faa Compare April 18, 2026 18:58
@Kaspre

Kaspre commented Apr 18, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current main (d3eeadb). Conflict in agent-runner-memory.ts resolved: kept upstream's new memoryDeps.incrementCompactionCount({ cfg, … }) wrapper while preserving the fix — the nextCount return value is discarded, and the subsequent if (typeof nextCount === "number") { memoryFlushCompactionCount = nextCount; } block introduced during the refactor is the reassignment this PR removes.

Re-verified the bug is still present in the current main / v2026.4.15. The structure was refactored (local-import → memoryDeps wrapper, inline reassign → split if-block), but the semantics are identical: memoryFlushCompactionCount is reassigned to the post-increment value, which sets up the dedup gate to skip the next cycle.

Regression test (triggers on every compaction cycle when flush records pre-increment count) passes after the rebase.

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@prtags

prtags Bot commented Apr 23, 2026

Copy link
Copy Markdown

Related work from PRtags group charming-crane-ftmo

Title: Open PR duplicate: memoryFlush compaction counter skip

Number Title
#51421* fix(memory): memoryFlush fires every compaction cycle instead of every other
#69138 fix(memory): preserve pre-increment compaction count in memoryFlushCompactionCount (#12590)

* This PR

@vincentkoc

Copy link
Copy Markdown
Member

ProjectClownfish pushed a narrow repair to this branch so the original contributor path can stay canonical.

Source PR: #51421
Validation: pnpm -s vitest run src/auto-reply/reply/reply-state.test.ts; pnpm check:changed
Contributor credit is preserved in the branch history and PR context.

@vincentkoc vincentkoc force-pushed the fix/memoryflush-skip-every-other branch from 9732faa to 21281a2 Compare April 28, 2026 07:44
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from b27aeef to b744861 Compare May 7, 2026 03:33
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label May 7, 2026
@openclaw-barnacle

Copy link
Copy Markdown

This assigned pull request has been automatically marked as stale after being open for 27 days.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 7, 2026
@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from b744861 to 855852e Compare May 7, 2026 16:30
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from 855852e to 6c52138 Compare May 7, 2026 20:12
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from 6c52138 to 4ffe486 Compare May 7, 2026 21:14
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@Kaspre Kaspre force-pushed the fix/memoryflush-skip-every-other branch from 4ffe486 to 5018391 Compare May 8, 2026 00:22
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label May 8, 2026
@openclaw-barnacle

Copy link
Copy Markdown

This assigned pull request has been automatically marked as stale after being open for 27 days.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added stale Marked as stale due to inactivity and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 8, 2026
@obviyus obviyus force-pushed the fix/memoryflush-skip-every-other branch from 5ff9131 to f94f9c2 Compare May 8, 2026 13:03
@obviyus obviyus merged commit f2c813c into openclaw:main May 8, 2026
107 of 108 checks passed
@obviyus

obviyus commented May 8, 2026

Copy link
Copy Markdown
Contributor

Landed via rebase onto main.

  • Scoped tests: pnpm test:serial src/auto-reply/reply/agent-runner-memory.test.ts src/auto-reply/reply/reply-state.test.ts; pnpm exec oxfmt --check --threads=1 src/auto-reply/reply/agent-runner-memory.ts src/auto-reply/reply/reply-state.test.ts CHANGELOG.md; pnpm check:changed
  • Changelog: CHANGELOG.md updated
  • Source head before merge: f94f9c2
  • Land commit: f2c813c

Thanks @Kaspre!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper Tracked by ClawSweeper automation proof: supplied External PR includes structured after-fix real behavior proof. size: XS stale Marked as stale due to inactivity

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: memoryFlush does not fire reliably

4 participants