Config write has no validation — invalid keys silently accepted, only caught on gateway restart #31148

@moeedahmed

Summary

OpenClaw does not validate config keys at write time. Any process (including agents themselves) can write an unrecognised or invalid key to openclaw.json, and it will be silently accepted. The gateway only validates the config schema on startup, so the invalid key sits undetected until the next restart — at which point the gateway crashes with a generic error that does not identify the offending key. In a systemd-managed deployment, this triggers a restart loop that can run for minutes, with all connected services (Telegram bots, etc.) going completely unresponsive and no obvious cause surfaced to the user.

Steps to Reproduce

  1. Start the OpenClaw gateway normally — confirm it's running and healthy.
  2. Add an unrecognised key to openclaw.json by any method (direct file edit, jq, or an agent writing via a script):
{
  "agents": {
    "defaults": {
      "suppressToolErrorWarnings": true
    }
  }
}
  3. Observe: no error is returned, no warning is logged. The gateway continues running with the old (valid) config.
  4. Restart the gateway (openclaw gateway restart or systemctl --user restart openclaw-gateway).
  5. The gateway fails to start. The error message is generic and does not name the invalid key.
  6. If running under systemd with Restart=on-failure, the service enters a crash loop.

Expected Behaviour

  • Writing an invalid or unrecognised config key should be rejected immediately with a clear error message naming the offending key and its location in the config tree.
  • At minimum, openclaw config set should validate against the current schema before persisting changes.
  • The gateway startup error should identify exactly which key failed validation, not just report a generic schema error.
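To make the "name the offending key and its location" expectation concrete, here is a minimal sketch of an unknown-key check that reports dotted key paths. It is not OpenClaw's validator: the `ALLOWED` schema, the `model` key, and the `find_unknown_keys` helper are hypothetical stand-ins, since the real schema is not shown in this issue.

```python
from __future__ import annotations

# Hypothetical allow-list standing in for the real OpenClaw schema (assumption).
ALLOWED = {
    "agents": {
        "defaults": {
            "model": str,  # placeholder key, purely illustrative
        }
    }
}


def find_unknown_keys(config: dict, schema: dict, prefix: str = "") -> list[str]:
    """Return the dotted path of every key in config that the schema does not list."""
    unknown = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in schema:
            unknown.append(path)
        elif isinstance(schema[key], dict) and isinstance(value, dict):
            # Known section: descend and keep building the path.
            unknown.extend(find_unknown_keys(value, schema[key], path))
    return unknown


# The key from the reproduction steps above.
candidate = {"agents": {"defaults": {"suppressToolErrorWarnings": True}}}
print(find_unknown_keys(candidate, ALLOWED))
# → ['agents.defaults.suppressToolErrorWarnings']
```

An error built from these paths would tell a user or an agent exactly which entry to remove, both at write time and in the gateway startup message.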

Actual Behaviour

  • The invalid key is silently written to openclaw.json with no error or warning.
  • The gateway continues running on the previously loaded (valid) config — no indication anything is wrong.
  • On next restart, the gateway crashes during config validation.
  • The error message does not identify which key is invalid.
  • Under systemd, the crash triggers automatic restarts, creating a crash loop with sustained CPU usage and complete service unavailability.

Impact

Operational: In one incident (Mar 2 2026), the gateway entered a crash loop of 28 restart cycles over ~14 minutes. CPU overheated from rapid restart cycling; all five Telegram bots were completely unresponsive. Fix was removing a single unrecognised key — gateway started cleanly on the first attempt.

Agentic use (a safety issue): In agentic deployments where agents modify gateway config programmatically, this bug is significantly worse. An agent writes what it believes is correct, receives no error, and has no way to know the write was invalid until the gateway crashes, potentially hours later. The agent may retry the same invalid write, compounding the problem. Four incidents occurred in six days across one deployment; three of them are listed below:

| Date | Key written | Result |
|---|---|---|
| Feb 24 | allowBots: true (unrecognised) | Gateway crash on restart |
| Feb 25 | allowBots: true (agent retry, unaware first attempt failed) | Second crash |
| Mar 2 | agents.defaults.suppressToolErrorWarnings: true (unrecognised in strict schema) | 28-cycle crash loop, ~14 min downtime |

Documentation saying 'don't write config directly' is insufficient when the config writer is an autonomous agent.

Proposed Fix

Option 1 — Minimal: Document openclaw config set as the only supported config write path. Add a warning when openclaw.json is modified externally.

Option 2 — Better: Add openclaw config validate or openclaw doctor --check — validates current config against the active schema without starting the gateway. Agents can run this as a post-write check and roll back if validation fails.
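A rough sketch of how an agent might use such a check as a post-write step with rollback. The `validate` callable is an assumption: it stands in for the proposed `openclaw config validate` command, which does not exist yet, and `write_with_rollback` and the toy validator are hypothetical names.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def write_with_rollback(path: Path, new_config: dict,
                        validate: Callable[[Path], list]) -> list:
    """Write new_config, run validate on the file, restore the old file on errors.

    validate returns a list of error strings (empty when the file is valid).
    """
    backup = path.with_name(path.name + ".bak")
    had_original = path.exists()
    if had_original:
        shutil.copy2(path, backup)
    path.write_text(json.dumps(new_config, indent=2))
    errors = validate(path)
    if errors:
        if had_original:
            shutil.copy2(backup, path)  # roll back to the last known-good file
        else:
            path.unlink()
    return errors


def reject_unknown_top_level(p: Path) -> list:
    # Toy validator (assumption): only "agents" is accepted at the top level.
    data = json.loads(p.read_text())
    return [f"unrecognised key: {k}" for k in data if k != "agents"]


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "openclaw.json"
    cfg.write_text(json.dumps({"agents": {}}))
    errors = write_with_rollback(cfg, {"allowBots": True}, reject_unknown_top_level)
    restored = json.loads(cfg.read_text())

print(errors)    # → ['unrecognised key: allowBots']
print(restored)  # → {'agents': {}}
```

Because the invalid file exists on disk briefly between the write and the rollback, a gateway restart in that window would still fail; this approach narrows the problem but does not close it.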

Option 3 — Best: Atomic config updates with validation before write. If validation fails, reject the write, return the exact failing key path, leave existing config untouched. Keep .openclaw.json.bak for auto-rollback on bad startup config.
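The validate-before-write flow could look roughly like the sketch below. It is illustrative only: `atomic_config_write`, `ConfigRejected`, and the toy validator are hypothetical, and the real schema check would replace the `validate` callable. The write goes through a temporary file and `os.replace`, so readers see either the old file or the new one, never a partial write.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


class ConfigRejected(ValueError):
    """Raised when the candidate config fails validation; nothing is written."""


def atomic_config_write(path: Path, new_config: dict,
                        validate: Callable[[dict], list]) -> None:
    errors = validate(new_config)
    if errors:
        raise ConfigRejected("; ".join(errors))  # existing file left untouched
    if path.exists():
        # Keep the last known-good config, e.g. .openclaw.json.bak
        shutil.copy2(path, path.with_name("." + path.name + ".bak"))
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(new_config, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def no_unknown_top_level(cfg: dict) -> list:
    # Toy rule (assumption): only "agents" is a recognised top-level key.
    return [f"unrecognised key: {k}" for k in cfg if k != "agents"]


with tempfile.TemporaryDirectory() as d:
    cfg_path = Path(d) / "openclaw.json"
    cfg_path.write_text('{"agents": {}}')
    try:
        atomic_config_write(cfg_path, {"allowBots": True}, no_unknown_top_level)
        rejected = None
    except ConfigRejected as exc:
        rejected = str(exc)
    unchanged = json.loads(cfg_path.read_text())
    atomic_config_write(cfg_path, {"agents": {"defaults": {}}}, no_unknown_top_level)
    updated = json.loads(cfg_path.read_text())
    backup_exists = (Path(d) / ".openclaw.json.bak").exists()

print(rejected)  # → unrecognised key: allowBots
```

This only protects writes that go through the function; a direct file edit would still bypass it, which is why the startup error should also name the failing key.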

Version

  • OpenClaw v2026.2.25 and v2026.2.26 (commit bc50708)
  • Issue present in both versions

Environment

  • WSL2 Ubuntu (Windows Subsystem for Linux)
  • systemd user service (openclaw-gateway.service) with Restart=on-failure
  • Telegram channel provider
  • Multi-agent deployment (5 agents, shared gateway config)
