Concurrent writes to openclaw.json produce invalid (concatenated) JSON; gateway refuses config until auto-restore #70643

@hexorcist404

Summary

When two OpenClaw processes write to ~/.openclaw/openclaw.json around the same time, the resulting file can contain two concatenated top-level JSON objects rather than a merged one. The gateway then fails to load config, and the UI surfaces this to the user as "agent failed before reply" on the next message. The built-in auto-restore from openclaw.json.last-good recovers correctly, but the failure window can drop at least one user turn.

Environment

  • OpenClaw 2026.4.21 (f788c88)
  • Node v22.22.2
  • Linux 6.17.0-22-generic
  • Gateway mode: local

What happened

On 2026-04-23, the config.observe auditor flagged the live config as invalid:

suspicious: ["missing-meta-vs-last-good", "gateway-mode-missing-vs-last-good"]
suspicious: ["reload-invalid-config"]

Both quarantined copies (openclaw.json.clobbered.2026-04-23T04-45-31-{238,252}Z) contain two separate top-level JSON objects, not one:

{
  "gateway": { "mode": "local", "auth": { } },
  "wizard":  { "lastRunCommand": "doctor" },
  "meta":    { "lastTouchedAt": "2026-04-23T04:27:20.367Z" }
}
{
  "agents": { "defaults": { "model": { "primary": "claude-cli/claude-opus-4-6" } } }
}

The first object is what doctor wrote; the second is an unmerged agents.defaults.model.primary patch. The writer appears to have appended the patch instead of merging it, producing invalid JSON. The gateway's auto-restore then kicked in from openclaw.json.last-good and the system recovered.
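For anyone triaging a similar file, a quick way to confirm the "two top-level objects" shape is to count how many JSON values the file contains. This is a minimal sketch using only the Python standard library; the function name count_top_level_values and the sample string are illustrative and not part of OpenClaw.

```python
import json


def count_top_level_values(text: str) -> int:
    """Count consecutive top-level JSON values in `text`.

    Raises json.JSONDecodeError (a ValueError) if any value is malformed.
    """
    decoder = json.JSONDecoder()
    idx, count, n = 0, 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < n and text[idx].isspace():
            idx += 1
        if idx >= n:
            return count
        _, idx = decoder.raw_decode(text, idx)
        count += 1


# Illustrative sample shaped like the quarantined files (not the real content).
clobbered = '{"gateway": {"mode": "local"}}\n{"agents": {"defaults": {}}}\n'
print(count_top_level_values(clobbered))
```

A healthy config should yield 1; anything greater suggests an appended write rather than a merge.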

Suspected cause

Two writers racing without shared locking or an atomic read-modify-write. Likely culprits: openclaw doctor and whichever code path sets agents.defaults.model.primary (possibly the gateway itself on startup, or a second CLI invocation). The config-audit log shows that the bad reads came from a gateway PID reading a config that had been produced out-of-band.

Expected

All writers perform atomic merge (read → merge → write-via-tempfile-then-rename) under an advisory lock, so concurrent writes linearize instead of clobbering.
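The expected behavior above can be sketched as follows. This is an illustration in Python, not OpenClaw's actual (Node) writer: the names deep_merge and update_config, the ".lock" sidecar file, and the temp-file prefix are all assumptions, and fcntl.flock is POSIX-only.

```python
import fcntl
import json
import os
import tempfile


def deep_merge(base: dict, patch: dict) -> dict:
    """Return a new dict with `patch` merged recursively into `base`."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def update_config(path: str, patch: dict) -> dict:
    """Read, merge, and atomically rewrite `path` under an exclusive advisory lock."""
    directory = os.path.dirname(os.path.abspath(path))
    lock_path = path + ".lock"  # hypothetical sidecar lock file
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            try:
                with open(path, encoding="utf-8") as f:
                    current = json.load(f)
            except FileNotFoundError:
                current = {}
            merged = deep_merge(current, patch)
            # Write to a temp file in the same directory so the rename is atomic.
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".openclaw-", suffix=".tmp")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    json.dump(merged, tmp, indent=2)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_path, path)  # readers see the old or the new file, never a mix
            except BaseException:
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)
                raise
            return merged
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock is taken on a sidecar file rather than on the config itself because the rename replaces the config's inode, so a lock held on the old inode would not block a writer that opens the new one.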

Impact

  • User sees opaque "agent failed before reply" in the web UI until the next gateway reload.
  • Recovery is automatic via last-good, but the current turn is lost.
  • Two openclaw.json.clobbered.* files accumulate per incident (minor).

Repro (suspected, not yet confirmed)

Run a config-touching command (openclaw doctor, openclaw configure, openclaw models auth login …) in parallel with another config-touching command, or while the gateway is writing agents.defaults. The race window is small but likely reproducible under load.
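A rough harness for hunting the race could look like the sketch below. It is untested against OpenClaw itself: the helper names are hypothetical, and it assumes the chosen commands can run non-interactively (stdin is closed), which may not hold for every command listed above.

```python
import json
import subprocess
from typing import List, Optional


def race_once(commands: List[List[str]]) -> List[int]:
    """Start all commands as close together as possible; return their exit codes."""
    procs = [
        subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for cmd in commands
    ]
    return [p.wait() for p in procs]


def config_is_valid(path: str) -> bool:
    """True when `path` holds exactly one parseable JSON document."""
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)  # raises on trailing data such as a second object
        return True
    except (OSError, ValueError):
        return False


def hunt_for_race(commands: List[List[str]], config_path: str, attempts: int = 50) -> Optional[int]:
    """Return the 1-based attempt on which the config became invalid, or None."""
    for attempt in range(1, attempts + 1):
        race_once(commands)
        if not config_is_valid(config_path):
            return attempt
    return None


# Example invocation (not run here; assumes both commands work without a TTY):
# hunt_for_race([["openclaw", "doctor"], ["openclaw", "configure"]],
#               os.path.expanduser("~/.openclaw/openclaw.json"))
```

If a gateway is running it may quarantine and restore the file before the check runs, so watching for new openclaw.json.clobbered.* files is probably the more reliable signal.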

Attachments

Happy to share sanitized copies of the two openclaw.json.clobbered.* files and the relevant config-audit.jsonl lines on request.
