Summary
memory-core's `sessions` source has fundamentally weaker change detection than the `memory` source. When the gateway restarts, in-process `sessionsDirty` / `sessionsDirtyFiles` state is lost and there is no startup catch-up scan to compare on-disk session files against existing SQLite rows. Any session JSONL written across a restart boundary (without a follow-up `onSessionTranscriptUpdate` event in the new gateway process) stays unindexed indefinitely. `memory_search` silently misses it.
PR #39732 partially addresses this for the config-drift path (when `needsFullReindex = true`), but the clean-restart case (gateway up → restart → up again with no config changes) still falls into the gap.
Repro
- Run an agent for a few sessions; let memory-core index them. `openclaw memory status --agent <id>` reports `sessions · X/X files`.
- Append to a session transcript (a normal turn that writes to JSONL).
- Restart the gateway (watchdog, OOM, `systemctl restart openclaw`, upgrade: anything that does not change provider/model/scope/chunking/tokenizer).
- Start a new session for that agent. Observe `openclaw memory status --agent <id>`: the `sessions` count is now `X-1/X` (or worse) and `Dirty: no`.
- `openclaw memory index --agent <id>` (no `--force`) does not catch it.
- Only `openclaw memory index --agent <id> --force` recovers it.
We observed this hit 5 different agents simultaneously after 3 watchdog-driven gateway restarts in 4 hours on one server (OpenClaw 2026.3.28). Stale counts ranged from `5/13` to `18/67`.
Root cause
In `extensions/memory-core/src/memory/manager-session-reindex.ts` (current HEAD, post-#39732):
```ts
export function shouldSyncSessionsForReindex(params): boolean {
  if (!params.hasSessionSource) return false;
  if (params.sync?.sessionFiles?.some(sf => sf.trim().length > 0)) return true; // targeted
  if (params.sync?.force) return true; // --force
  if (params.needsFullReindex) return true; // config drift
  const reason = params.sync?.reason;
  if (reason === "session-start" || reason === "watch") return false; // ★
  return params.sessionsDirty && params.dirtySessionFileCount > 0;
}
```
After a clean gateway restart with no config drift:
- `sessionsDirty = false` (in-process state lost)
- `sessionsDirtyFiles = ∅` (in-process state lost)
- `needsFullReindex = false` (no provider/scope/chunking/tokenizer change)
- `warmSession()` calls `sync({ reason: "session-start" })` → hits the ★ exclusion, returns `false`
- The fallthrough `sessionsDirty && dirtySessionFileCount > 0` is also `false`
→ session sync is skipped indefinitely. The only recovery paths are `--force` or an unrelated config change that happens to trigger `needsFullReindex`.
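The walkthrough above can be checked with a small standalone sketch. It copies the branch order of the excerpt, but the `ReindexGateParams` type and the `"cli"` reason string are assumptions made for illustration; the real parameter type and reason values in memory-core may differ.

```ts
// Hypothetical parameter shape; the real type in memory-core may differ.
interface ReindexGateParams {
  hasSessionSource: boolean;
  needsFullReindex: boolean;
  sessionsDirty: boolean;
  dirtySessionFileCount: number;
  sync?: { reason?: string; force?: boolean; sessionFiles?: string[] };
}

// Same branch order as the excerpt from manager-session-reindex.ts.
function shouldSyncSessionsForReindex(params: ReindexGateParams): boolean {
  if (!params.hasSessionSource) return false;
  if (params.sync?.sessionFiles?.some((sf) => sf.trim().length > 0)) return true; // targeted
  if (params.sync?.force) return true; // --force
  if (params.needsFullReindex) return true; // config drift
  const reason = params.sync?.reason;
  if (reason === "session-start" || reason === "watch") return false; // ★
  return params.sessionsDirty && params.dirtySessionFileCount > 0;
}

// State right after a clean restart: no in-process dirty state, no config drift.
const afterCleanRestart: ReindexGateParams = {
  hasSessionSource: true,
  needsFullReindex: false,
  sessionsDirty: false,
  dirtySessionFileCount: 0,
};

// warmSession() path: stopped by the ★ exclusion.
const onSessionStart = shouldSyncSessionsForReindex({
  ...afterCleanRestart,
  sync: { reason: "session-start" },
});
// Non-forced index run ("cli" is a made-up reason): falls through to the dirty check.
const onPlainIndex = shouldSyncSessionsForReindex({
  ...afterCleanRestart,
  sync: { reason: "cli" },
});
// Forced run: the only branch that still returns true.
const onForce = shouldSyncSessionsForReindex({
  ...afterCleanRestart,
  sync: { force: true },
});

console.log({ onSessionStart, onPlainIndex, onForce });
```

Under these assumptions only the forced call returns `true`, which matches the repro: a plain `memory index` does nothing and `--force` recovers.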
The `memory` source doesn't have this problem because it uses chokidar fs-watching that fires on restart and re-marks files dirty via the durable `this.dirty` flag. The `sessions` source uses an in-process subscription to `onSessionTranscriptUpdate` (see `ensureSessionListener` in `manager-sync-ops.ts`) — events only fire for in-flight turns in the current gateway process.
Why PR #39732 doesn't fully fix this
#39732 reordered the gate so `needsFullReindex` is checked before the `session-start`/`watch` exclusion. That fixes the case where the gateway restart coincides with a config drift that already triggers `needsFullReindex = true`. It does not help clean restarts (no config drift), which is the more common case in production (watchdog, OOM, planned restarts, package upgrades that don't bump the indexer config).
Proposed fix
Add a startup catch-up scan in `MemoryIndexManager`'s non-status-only initialization branch in `manager.ts` (around the spot where `ensureWatcher()` / `ensureSessionListener()` / `ensureIntervalSync()` get wired up):
- List `sessions/` files via `listSessionFilesForAgent(...)`.
- Compare against existing SQLite rows from `loadMemorySourceFileState({ source: "sessions" })` — the manager already loads this state.
- For any file that's missing from the index OR has a newer mtime / different size than its SQLite row, mark it dirty (`sessionsDirtyFiles.add(file)` + `sessionsDirty = true`).
- Schedule a debounced sync (or let the next `session-start` pick it up since the state is now durable in-process).
This restores the same robustness the `memory` source already has via fs-watching, without requiring `--force` and without changing `session-start`/`watch` exclusion semantics. The embedding cache keeps the cost minimal — unchanged chunks aren't re-paid for.
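A minimal sketch of the comparison step, to make the proposal concrete. The `DiskFileState` and `IndexedFileState` shapes and the `findStaleSessionFiles` helper are hypothetical; the actual return types of `listSessionFilesForAgent(...)` and `loadMemorySourceFileState(...)` are not shown in this issue and may look different.

```ts
// Hypothetical shapes; the real file-listing and SQLite row types may differ.
interface DiskFileState {
  path: string;
  mtimeMs: number;
  size: number;
}
interface IndexedFileState {
  mtimeMs: number;
  size: number;
}

// Returns paths that are missing from the index, or whose on-disk mtime is
// newer or whose size differs from the indexed row.
function findStaleSessionFiles(
  onDisk: DiskFileState[],
  indexed: Map<string, IndexedFileState>,
): string[] {
  const stale: string[] = [];
  for (const file of onDisk) {
    const row = indexed.get(file.path);
    if (!row || file.mtimeMs > row.mtimeMs || file.size !== row.size) {
      stale.push(file.path);
    }
  }
  return stale;
}

// Illustrative data: b.jsonl grew across the restart, c.jsonl was never indexed.
const indexed = new Map<string, IndexedFileState>([
  ["a.jsonl", { mtimeMs: 1000, size: 200 }],
  ["b.jsonl", { mtimeMs: 1000, size: 300 }],
]);
const onDisk: DiskFileState[] = [
  { path: "a.jsonl", mtimeMs: 1000, size: 200 },
  { path: "b.jsonl", mtimeMs: 2000, size: 450 },
  { path: "c.jsonl", mtimeMs: 2100, size: 120 },
];

// Stand-ins for the manager's in-process dirty state.
const sessionsDirtyFiles = new Set<string>();
let sessionsDirty = false;

const stale = findStaleSessionFiles(onDisk, indexed);
for (const path of stale) {
  sessionsDirtyFiles.add(path);
  sessionsDirty = true;
}

console.log(stale, sessionsDirty);
```

Once the files are marked dirty this way, the existing fallthrough check (`sessionsDirty && dirtySessionFileCount > 0`) can pick them up on a non-forced sync without touching the ★ exclusion.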
Bonus issues observed while diagnosing
Could be split into separate issues if preferred:
- `scanSessionFiles` filename filter inconsistency. `cli.runtime.ts`'s `scanSessionFiles` only matches `.jsonl`, but `isUsageCountedSessionTranscriptFileName` (in `src/config/sessions/artifacts.ts`) also matches `.jsonl.deleted.Z` and `*.jsonl.reset.Z` archives. `openclaw memory status` therefore under-counts session files relative to what the indexer actually processes, so you can see e.g. `sessions · 41/40 files` after recovery.
- `Dirty:` field semantic is misleading. It only reflects in-process pending events, not "index out of sync with disk." After a restart with missed events you can (and routinely do) have `Dirty: no` and `25/41 files`. Status output should distinguish those two conditions.
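One way the status line could separate the two conditions is sketched below. The `SessionIndexStatus` shape and the `In sync:` field are hypothetical and not part of the current CLI output. Comparing counts is only a rough proxy, and a real check would more likely compare per-file mtime/size.

```ts
// Hypothetical status model; not the actual memory-core status type.
interface SessionIndexStatus {
  pendingEvents: number; // in-process dirty events since this gateway started
  indexedFiles: number;  // session files with rows in SQLite
  filesOnDisk: number;   // session files currently on disk
}

// Renders "Dirty" (pending in-process events) separately from
// "In sync" (index matches disk), so the two cannot be confused.
function describeSessionIndex(s: SessionIndexStatus): string {
  const dirty = s.pendingEvents > 0 ? "yes" : "no";
  const inSync = s.indexedFiles === s.filesOnDisk ? "yes" : "no";
  return `sessions · ${s.indexedFiles}/${s.filesOnDisk} files · Dirty: ${dirty} · In sync: ${inSync}`;
}

// The post-restart situation described above: no pending events, stale index.
const statusLine = describeSessionIndex({
  pendingEvents: 0,
  indexedFiles: 25,
  filesOnDisk: 41,
});
console.log(statusLine);
```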
Environment
- OpenClaw version: 2026.3.28 (gateway). Source inspected up through 2026.4.5 + the unreleased branch — the gap is still present.
- Linux x86_64, single gateway, root user, 5 agents (`main` plus 4 client agents).
- memory-core sources: `["memory", "sessions"]`.