Summary
When two OpenClaw processes write to ~/.openclaw/openclaw.json around the same time, the resulting file can contain two concatenated top-level JSON objects rather than a merged one. The gateway then fails to load config, and the UI surfaces this to the user as "agent failed before reply" on the next message. The built-in auto-restore from openclaw.json.last-good recovers correctly, but the failure window can drop at least one user turn.
Environment
- OpenClaw 2026.4.21 (f788c88)
- Node v22.22.2
- Linux 6.17.0-22-generic
- Gateway mode: local
What happened
On 2026-04-23, the config.observe auditor flagged the live config as invalid:
suspicious: ["missing-meta-vs-last-good", "gateway-mode-missing-vs-last-good"]
suspicious: ["reload-invalid-config"]
Both quarantined copies (openclaw.json.clobbered.2026-04-23T04-45-31-{238,252}Z) contain two separate top-level JSON objects, not one:
{
"gateway": { "mode": "local", "auth": { … } },
"wizard": { "lastRunCommand": "doctor", … },
"meta": { "lastTouchedAt": "2026-04-23T04:27:20.367Z" }
}
{
"agents": { "defaults": { "model": { "primary": "claude-cli/claude-opus-4-6" } } }
}
That's what doctor wrote (first object) followed by an unmerged agents.defaults.model.primary patch (second object) — the writer appears to have appended instead of merging, producing invalid JSON. The gateway's auto-restore kicked in from openclaw.json.last-good and the system recovered.
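For triage, a clobbered file of this shape can be told apart from other kinds of corruption by counting top-level values. The sketch below is only an illustration: `countTopLevelValues` is a hypothetical helper, not part of OpenClaw, and the sample string is a shortened stand-in for the quarantined file.

```typescript
// Hypothetical helper (not an OpenClaw API): count top-level JSON objects or
// arrays in a text by tracking bracket depth. Characters inside string
// literals are skipped, so braces in string values do not affect the count.
function countTopLevelValues(text: string): number {
  let depth = 0;
  let count = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      if (depth === 0) count++; // a new top-level value starts here
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
    }
  }
  return count;
}

// Shortened stand-in for the clobbered file: two objects back to back.
const clobbered =
  '{ "gateway": { "mode": "local" } }\n' +
  '{ "agents": { "defaults": { "model": { "primary": "claude-cli/claude-opus-4-6" } } } }';

let parsedOk = true;
try {
  JSON.parse(clobbered);
} catch {
  parsedOk = false; // JSON.parse rejects trailing content after the first object
}

console.log(parsedOk, countTopLevelValues(clobbered)); // false 2
```

A count above 1 points at an append-style write rather than truncation or a partial write.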
Suspected cause
Two writers racing without shared locking or an atomic read-modify-write. Likely culprits: openclaw doctor and whichever code path sets agents.defaults.model.primary (possibly the gateway itself on startup, or a second CLI invocation). The config-audit log shows that the invalid reads came from a gateway PID reading a config that had been written out-of-band.
Expected
All writers perform atomic merge (read → merge → write-via-tempfile-then-rename) under an advisory lock, so concurrent writes linearize instead of clobbering.
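A minimal sketch of what that could look like, assuming a lock file created with the exclusive `wx` flag serves as the advisory lock. `patchConfigAtomically`, `deepMerge`, and the `.lock` / `.tmp` suffixes are hypothetical names for illustration, not OpenClaw's actual code, and stale-lock cleanup is left out.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Json = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge `patch` into `base`; non-object values in `patch` win.
function deepMerge(base: Json, patch: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    const prev = out[key];
    out[key] = isPlainObject(prev) && isPlainObject(value) ? deepMerge(prev, value) : value;
  }
  return out;
}

// Hypothetical helper: read -> merge -> write tempfile -> rename, all while
// holding an exclusive lock file next to the config.
function patchConfigAtomically(configPath: string, patch: Json, retries = 50): void {
  const lockPath = `${configPath}.lock`;
  let lockFd = -1;
  for (let attempt = 0; ; attempt++) {
    try {
      lockFd = fs.openSync(lockPath, "wx"); // throws EEXIST while another writer holds the lock
      break;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST" || attempt >= retries) throw err;
      const until = Date.now() + 20; // crude 20 ms back-off to keep the sketch synchronous
      while (Date.now() < until) { /* spin */ }
    }
  }
  try {
    const current: Json = fs.existsSync(configPath)
      ? JSON.parse(fs.readFileSync(configPath, "utf8"))
      : {};
    const merged = deepMerge(current, patch);
    const tmpPath = `${configPath}.${process.pid}.tmp`;
    fs.writeFileSync(tmpPath, JSON.stringify(merged, null, 2) + "\n");
    fs.renameSync(tmpPath, configPath); // atomic replace when both paths share a filesystem
  } finally {
    fs.closeSync(lockFd);
    fs.unlinkSync(lockPath);
  }
}

// Demo in a throwaway directory: the two writes from the incident, applied in turn.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "cfg-demo-"));
const demoPath = path.join(demoDir, "openclaw.json");
patchConfigAtomically(demoPath, { gateway: { mode: "local" }, wizard: { lastRunCommand: "doctor" } });
patchConfigAtomically(demoPath, { agents: { defaults: { model: { primary: "claude-cli/claude-opus-4-6" } } } });
const result = JSON.parse(fs.readFileSync(demoPath, "utf8"));
console.log(Object.keys(result)); // [ 'gateway', 'wizard', 'agents' ]
```

Because the read happens inside the lock, the second writer sees the first writer's output and merges into it. The rename means a reader such as the gateway sees either the old file or the new one, never a partial or concatenated one.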
Impact
- User sees opaque "agent failed before reply" in the web UI until the next gateway reload.
- Recovery is automatic via last-good, but the current turn is lost.
- Two openclaw.json.clobbered.* files accumulate per incident (minor).
Repro (suspected, not yet confirmed)
Run a config-touching command (openclaw doctor, openclaw configure, openclaw models auth login …) in parallel with another config-touching command, or while the gateway is writing agents.defaults. The race window is small but likely reproducible under load.
Attachments
Happy to share sanitized copies of the two openclaw.json.clobbered.* files and the relevant config-audit.jsonl lines on request.