fix(config): serialize concurrent writeConfigFile calls with a file lock by BingqingLyu · Pull Request #612 · BingqingLyu/openclaw

BingqingLyu · 2026-04-27T09:23:19Z

Summary

Concurrent openclaw agents add invocations race on the global openclaw.json file. Each process reads the config independently, computes its own updated version, then writes via atomic rename. The rename is safe against torn writes but not against lost-update races: whichever process wins the rename last silently overwrites every change made by the others.

This is the root cause of the config-overwrite symptom reported in openclaw#43367:

Config overwrite: /home/user/.openclaw/openclaw.json (... -> ..., backup=...)

After parallel agents add commands, openclaw agents list shows only a subset of the agents.

Fix

Wrap the entire read-snapshot → compute → write sequence inside withFileLock(configPath, ...) using the existing file-lock primitive already used by the pairing store, auth-profile store, and other subsystems. Concurrent writes now queue on an advisory .lock file and each process re-reads the latest on-disk state before writing, so no agent entry is silently dropped.

Lock options: 15 retries, 1.5× back-off, 50 ms–5 s window, randomised jitter, 30 s stale timeout — tuned to handle typical parallel agent-add bursts without introducing noticeable latency for single-writer cases.

Testing

Run 4+ concurrent openclaw agents add commands targeting the same config file and verify all agents appear in openclaw agents list afterwards. Without this fix the test fails non-deterministically; with it, all agents persist.

Concurrent `openclaw agents add` invocations race on the global openclaw.json file: each process reads the config independently, computes its own updated version, then writes it back via an atomic rename. The rename is safe against torn writes but not against lost-update races: whichever process wins the rename last silently overwrites every change made by the others. This is the root cause of the config-overwrite symptom reported in openclaw#43367. Fix: wrap the entire read-snapshot → compute → write sequence inside withFileLock(configPath, ...) using the existing file-lock primitive already used by the pairing store, auth-profile store, and others. Concurrent writes now queue on an advisory .lock file and each process re-reads the latest on-disk state before writing, so no agent entry is silently dropped. Lock options: 15 retries, 1.5× back-off, 50 ms–5 s window, randomised jitter, 30 s stale timeout — tuned to handle typical parallel agent-add bursts without introducing noticeable latency for single-writer cases. Fixes: openclaw#43367

@greptile-apps

…it in agents add Addresses the lost-update race on agents.list identified by @greptile-apps in the review of this PR. The previous commit serialised concurrent writes via withFileLock, which prevents torn-rename races (two processes writing simultaneously). However, cfg was pre-computed by callers *before* entering the lock. createMergePatch treats arrays atomically, so when two processes both derive their configs from the same stale base and add different agents, the second writer's patch replaces agents.list wholesale, silently dropping the first writer's entry. This commit introduces updateConfigFile(transform, options?) — a new sibling to writeConfigFile that accepts a transformation function instead of a pre-computed config. The transform receives the freshest on-disk config (re-read inside the advisory lock) and returns the desired next config. Because the read, transform, and write all happen under the same lock, every concurrent writer sees the fully-committed result of all preceding writes before computing its own mutation. agentsAddCommand (non-interactive path) is updated to use updateConfigFile. The agent-creation and binding logic is moved inside the transform so it always operates on the current agents.list, making concurrent 'openclaw agents add' commands fully safe with respect to array-valued config fields. writeConfigFile is kept as-is for callers that write complete, self-contained configs (gateway bootstrap, model config, etc.) where the pre-computed pattern is intentional and correct.

…timePatch, add snapshot invalidation Two bugs in the module-level updateConfigFile wrapper: 1. The runtimePatch block computed createMergePatch(runtimeConfigSnapshot, transformed) where was derived from the fresh disk state (not from runtimeConfigSnapshot). In writeConfigFile this pattern is valid because cfg comes from runtimeConfigSnapshot. Here the bases differ so the patch is semantically wrong in gateway mode. 2. After writing, the runtime snapshot was never invalidated, leaving in-memory state stale for subsequent loadConfig() callers in gateway mode. Fix: remove the incorrect runtimePatch overlay (env-var ref restoration is already handled inside writeConfigFileUnderLock via restoreEnvVarRefs), and add clearRuntimeConfigSnapshot() when a snapshot was active so subsequent reads go back to disk. This only affects gateway mode — the CLI agents add use case runs without a runtime snapshot and was already correct.

withFileLock calls fs.mkdir without an explicit mode, which creates the directory with default permissions (0755, world-readable). The subsequent mkdir in writeConfigFileUnderLock passes mode:0700 but is a no-op on an already-existing directory, so the first write to a new config location would leave the config directory accessible to other local users until the process was restarted — a potential secret-exposure regression. Fix: add an explicit mkdir(configDir, { recursive:true, mode:0o700 }) in both writeConfigFile and updateConfigFile before the withFileLock call so we always win the race to create the directory with hardened permissions.

waliddafif added 4 commits March 13, 2026 11:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(config): serialize concurrent writeConfigFile calls with a file lock#612

fix(config): serialize concurrent writeConfigFile calls with a file lock#612
BingqingLyu wants to merge 4 commits into
mainfrom
fork-pr-44997-fix-config-write-lock

BingqingLyu commented Apr 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

BingqingLyu commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Testing

Related

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

BingqingLyu commented Apr 27, 2026 •

edited

Loading