fix(memory): EPERM on Windows persists after 64187 retry — needs copyFile/unlink fallback (was in closed PR 71611) #78640

@MilleniumGenAI

Summary

The memory index atomic reindex (openclaw memory index --force) consistently fails with EPERM on Windows 11, even after two prior fixes:

  • Issue 64187 — added renameWithRetry with EBUSY/EPERM/EACCES retry logic
  • Issue 77785 — added copyFileSync + unlinkSync fallback for exec-approvals (different module, different file)

The retry logic from 64187 is insufficient on Windows. fs.rename() fails while any open handle on the file was opened without FILE_SHARE_DELETE, and the retry loop (6 attempts, 25ms backoff) cannot help when that lock outlives the retry window. The failure occurs even when the gateway is stopped.

PR 71611 by @jujitao had the correct fix — copyFile + unlink fallback inside renameWithRetry — but it was closed without merge on May 2.

Reproduction

openclaw memory index --force

Output (tested on v2026.5.5 and v2026.5.6, gateway stopped — no external process holding handles):

Memory index failed (main): EPERM: operation not permitted, rename 'C:\Users\...\.openclaw\memory\main.sqlite' -> 'C:\Users\...\.openclaw\memory\main.sqlite.backup-...'
Memory index failed (coder): EPERM: operation not permitted, rename '...\coder.sqlite' -> '...\coder.sqlite.backup-...'
Memory index failed (deep-researcher): EPERM: operation not permitted, rename '...\deep-researcher.sqlite' -> '...\deep-researcher.sqlite.backup-...'

Reproducible both with gateway running AND with gateway stopped. Deleting the stale .sqlite files before reindex does not help — the reindex process itself creates new files and then fails to rename them.

Root cause

In extensions/memory-core/src/memory/manager-atomic-reindex.ts, renameWithRetry() calls options.fileOps.rename(source, target), which uses fs.promises.rename(). On Windows this maps to MoveFileExW(MOVEFILE_REPLACE_EXISTING), which requires DELETE access to the source file; that request consistently fails in the Windows environments tested.

The existing retry loop treats EPERM as transient (transientRenameErrorCodes = ["EBUSY", "EPERM", "EACCES"]) and retries 6 times, but on Windows this failure is not transient: the rename is rejected on every attempt, so the loop exhausts its retries and throws.
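
To make the failure mode concrete, here is a minimal sketch of a retry-only loop fed by a rename that always fails with EPERM. It is not the project's actual source: `renameWithRetryOnly`, `RenameFn`, and `alwaysEperm` are hypothetical names, and the fixed 25ms delay between attempts is an assumption based on the "6 attempts, 25ms backoff" figures above.

```typescript
// Hypothetical stand-in for the retry loop; not the project's actual source.
type RenameFn = (source: string, target: string) => Promise<void>;

const transientRenameErrorCodes = ["EBUSY", "EPERM", "EACCES"];

async function renameWithRetryOnly(
  rename: RenameFn,
  source: string,
  target: string,
  attempts = 6, // figure quoted in this issue
  backoffMs = 25, // assumed to be a fixed delay between attempts
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await rename(source, target);
      return;
    } catch (e) {
      const code = (e as { code?: string }).code ?? "";
      const transient = transientRenameErrorCodes.includes(code);
      // Give up on non-transient errors, or once every attempt is used.
      if (!transient || attempt >= attempts) throw e;
      await new Promise((resolve) => setTimeout(resolve, backoffMs));
    }
  }
}

// Simulated rename that fails on every call, as reported in this issue.
let calls = 0;
const alwaysEperm: RenameFn = async () => {
  calls++;
  throw Object.assign(new Error("EPERM: operation not permitted, rename"), {
    code: "EPERM",
  });
};
```

Calling `renameWithRetryOnly(alwaysEperm, source, target)` rejects with the original EPERM after all six attempts. Retrying only adds latency when the error persists across every attempt.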

Fix

Add copyFile + unlink fallback inside renameWithRetry when err.code === "EPERM", matching the pattern already used in src/infra/json-file.ts and applied to exec-approvals in issue 77785:

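try {
    await options.fileOps.rename(source, target);
} catch (e) {
    // Anything other than EPERM is not handled here; rethrow it unchanged.
    if ((e as NodeJS.ErrnoException).code !== "EPERM") throw e;
    // Fallback: copy the file to the target, then remove the source.
    await fs.promises.copyFile(source, target);
    // Runs only if the copy succeeded, so the source is never lost.
    await fs.promises.unlink(source);
}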

This is exactly what PR 71611 proposed and implemented.
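
For illustration, the fallback could be combined with the existing retry loop roughly as follows. This is a sketch under stated assumptions, not the code from PR 71611: the `FileOps` interface, `nodeFileOps`, and `renameWithCopyFallback` are hypothetical names, the real `fileOps` shape in memory-core may differ, and falling back only after the retries are exhausted is one possible design choice (the snippet above falls back immediately).

```typescript
import { promises as fs } from "node:fs";

// Hypothetical shape of the injected file operations; the real fileOps
// interface in memory-core may differ.
interface FileOps {
  rename(source: string, target: string): Promise<void>;
  copyFile(source: string, target: string): Promise<void>;
  unlink(path: string): Promise<void>;
}

// Default implementation backed by Node's fs.promises API.
const nodeFileOps: FileOps = {
  rename: (s, t) => fs.rename(s, t),
  copyFile: (s, t) => fs.copyFile(s, t),
  unlink: (p) => fs.unlink(p),
};

async function renameWithCopyFallback(
  source: string,
  target: string,
  fileOps: FileOps = nodeFileOps,
  attempts = 6, // figure quoted in this issue
  backoffMs = 25, // assumed to be a fixed delay between attempts
): Promise<"renamed" | "copied"> {
  const retryable = ["EBUSY", "EPERM", "EACCES"];
  for (let attempt = 1; ; attempt++) {
    try {
      await fileOps.rename(source, target);
      return "renamed";
    } catch (e) {
      const code = (e as { code?: string }).code ?? "";
      if (!retryable.includes(code)) throw e;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, backoffMs));
        continue;
      }
      // Retries exhausted. Only EPERM takes the copy path, as in the snippet above.
      if (code !== "EPERM") throw e;
      await fileOps.copyFile(source, target);
      // The source is removed only after the copy has succeeded.
      await fileOps.unlink(source);
      return "copied";
    }
  }
}
```

Injecting `fileOps` keeps the fallback testable without a Windows machine, since a test can pass a `rename` that always throws EPERM. Note that copy followed by unlink is not atomic, so a crash between the two steps can leave both files on disk.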

Related

  • Issue 64187 — Added retry but not copy fallback (closed as fixed)
  • Issue 77785 — Fixed same bug class in exec-approvals module
  • PR 71611 — Contained the correct fix but was closed without merging
  • OpenClaw versions tested: v2026.5.5, v2026.5.6
  • OS: Windows 11 10.0.26200 (x64)

Temporary workaround

Manually patch the compiled dist/manager-*.js: wrap await options.fileOps.rename(source, target) in a try/catch that falls back to copyFile + unlink on EPERM, the same pattern as the exec-approvals fix from issue 77785. A full gateway restart (stop/start, not SIGUSR1) is required to reload the cached modules.
