[Feature] Automatic config rollback on gateway failure #65814

@smonett

Problem

When config.apply or config.patch writes a configuration change that breaks the gateway (invalid field combination, bad secret reference, incompatible plugin config), the gateway fails to start and stays down. There is no automatic recovery — the user must manually identify and fix the config or restore from a backup they may not have.

In hybrid reload mode, the gateway auto-restarts for critical config changes. If that restart fails due to a bad config, there's no fallback — KeepAlive just crash-loops the same broken config.

Evidence

  • Multiple incidents where config changes broke the gateway, requiring manual terminal intervention.
  • No backup is created before config writes. If the user doesn't have a manual backup, recovery requires guessing what changed.

Workaround

I built oc-config-safe, a ~120-line bash script that enforces:

  1. Create timestamped backup of current config
  2. Stop gateway
  3. Validate new config (JSON parse + schema check)
  4. Apply change
  5. Start gateway
  6. Verify health within timeout
  7. If health check fails → automatically restore backup and restart

This has prevented data loss on multiple occasions.
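
The seven steps above can be sketched roughly as follows. This is not the oc-config-safe script itself (which is bash and not shown here), only an illustration of the same ordering in Python. The function name `safe_apply` and the `stop`, `start`, and `healthy` callbacks are hypothetical stand-ins for however the gateway is actually controlled and probed. The schema check is left out, and the health timeout is assumed to live inside `healthy`.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Callable


def safe_apply(
    config_path: Path,
    new_text: str,
    stop: Callable[[], None],     # hypothetical: stops the gateway
    start: Callable[[], None],    # hypothetical: starts the gateway
    healthy: Callable[[], bool],  # hypothetical: True if gateway is up within the timeout
) -> bool:
    """Return True if the new config was kept, False if rejected or rolled back."""
    # 1. Timestamped backup of the current config, next to the original file.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    shutil.copy2(config_path, backup)

    # 2. Stop the gateway.
    stop()

    # 3. Validate the new config (JSON parse only; a schema check would go here).
    try:
        json.loads(new_text)
    except json.JSONDecodeError:
        start()  # config on disk is untouched, so bring the gateway back up
        return False

    # 4. Apply the change, 5. start the gateway, 6. verify health.
    config_path.write_text(new_text)
    start()
    if healthy():
        return True

    # 7. Health check failed: restore the backup and restart.
    stop()
    shutil.copy2(backup, config_path)
    start()
    return False
```

Passing the gateway controls in as callbacks keeps the sketch independent of any particular CLI or service manager.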

Proposed Solution

Add a backup-and-rollback mechanism to the native config pipeline:

  1. Before any config write (config.apply, config.patch, doctor --fix), save the current config as openclaw.json.last-good
  2. After writing, verify gateway health within a configurable timeout
  3. If health fails, automatically restore openclaw.json.last-good and restart
  4. Expose config.rollback as a gateway tool action for manual recovery

Suggested config:

{
  "gateway": {
    "configSafety": {
      "backupBeforeWrite": true,
      "rollbackOnFailure": true,
      "healthCheckTimeoutMs": 30000
    }
  }
}
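
To make the intended semantics of these keys concrete, here is a minimal Python sketch of how they might be read and how `healthCheckTimeoutMs` could bound a health poll. It is an illustration only, not OpenClaw code. `ConfigSafety`, `load_config_safety`, `wait_for_health`, the `probe` callback, and the 500 ms poll interval are all hypothetical, and using the values shown above as defaults when a key is missing is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConfigSafety:
    # Defaults mirror the example values above (an assumption, not a spec).
    backup_before_write: bool = True
    rollback_on_failure: bool = True
    health_check_timeout_ms: int = 30000


def load_config_safety(config: dict) -> ConfigSafety:
    """Read gateway.configSafety from a parsed config, tolerating missing keys."""
    raw = config.get("gateway", {}).get("configSafety", {})
    return ConfigSafety(
        backup_before_write=bool(raw.get("backupBeforeWrite", True)),
        rollback_on_failure=bool(raw.get("rollbackOnFailure", True)),
        health_check_timeout_ms=int(raw.get("healthCheckTimeoutMs", 30000)),
    )


def wait_for_health(
    probe: Callable[[], bool],  # hypothetical: one health check attempt
    timeout_ms: int,
    interval_ms: int = 500,     # hypothetical poll interval
) -> bool:
    """Poll probe() until it succeeds or timeout_ms elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if probe():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_ms / 1000, remaining))
```

With `rollbackOnFailure` enabled, a `False` result from `wait_for_health` would be the trigger for restoring openclaw.json.last-good and restarting.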

Impact

Medium-high. Config changes are a common failure point, especially for users customizing agent configs, adding plugins, or adjusting auth profiles. Automatic rollback would prevent extended downtime from bad configs.

Environment

  • OpenClaw 2026.4.10 (npm, macOS)
