Problem
When config.apply or config.patch writes a configuration change that breaks the gateway (invalid field combination, bad secret reference, incompatible plugin config), the gateway fails to start and stays down. There is no automatic recovery — the user must manually identify and fix the config or restore from a backup they may not have.
In hybrid reload mode, the gateway auto-restarts for critical config changes. If that restart fails because of a bad config, there is no fallback: KeepAlive just restarts the gateway against the same broken config, so it crash-loops.
Evidence
- Multiple incidents where config changes broke the gateway, requiring manual terminal intervention.
- No backup is created before config writes. If the user doesn't have a manual backup, recovery requires guessing what changed.
Workaround
I built oc-config-safe, a ~120-line bash script that enforces:
- Create timestamped backup of current config
- Stop gateway
- Validate new config (JSON parse + schema check)
- Apply change
- Start gateway
- Verify health within timeout
- If health check fails → automatically restore backup and restart
This has prevented data loss on multiple occasions.
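The sequence above can be sketched roughly as follows. This is a minimal Python illustration, not the actual oc-config-safe bash script. The names safe_apply and wait_healthy are hypothetical, and stopping, starting, and health-checking the gateway are passed in as callables because the exact commands are not given here. The schema check is likewise a hypothetical hook. One deliberate deviation from the listed order: validation runs before the gateway is stopped, so a malformed file never causes downtime.

```python
import json
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


def wait_healthy(is_healthy: Callable[[], bool], timeout_s: float, poll_s: float) -> bool:
    """Poll is_healthy() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def safe_apply(
    config_path: Path,
    new_text: str,
    stop_gateway: Callable[[], None],
    start_gateway: Callable[[], None],
    is_healthy: Callable[[], bool],
    schema_check: Optional[Callable[[dict], None]] = None,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> bool:
    """Apply new_text to config_path, rolling back if the gateway stays unhealthy.

    Returns True if the new config is live, False if the backup was restored.
    Raises ValueError before touching anything if new_text fails validation.
    """
    # Validate first so a malformed config never stops the gateway.
    parsed = json.loads(new_text)  # JSONDecodeError is a ValueError subclass
    if schema_check is not None:
        schema_check(parsed)  # hypothetical hook; expected to raise ValueError

    # Timestamped backup next to the live file, e.g. openclaw.json.bak.<UTC stamp>
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    backup = config_path.with_name(f"{config_path.name}.bak.{stamp}")
    shutil.copy2(config_path, backup)

    # Stop, apply, start, then wait for a passing health check.
    stop_gateway()
    config_path.write_text(new_text, encoding="utf-8")
    start_gateway()
    if wait_healthy(is_healthy, timeout_s, poll_s):
        return True

    # Health check failed within the timeout: restore the backup and restart.
    stop_gateway()
    shutil.copy2(backup, config_path)
    start_gateway()
    return False
```

Injecting the gateway controls keeps the rollback logic independent of how the gateway is supervised (a service manager, a CLI command, or a test double).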
Proposed Solution
Add a backup-and-rollback mechanism to the native config pipeline:
- Before any config write (config.apply, config.patch, doctor --fix), save the current config as openclaw.json.last-good
- After writing, verify gateway health within a configurable timeout
- If health fails, automatically restore openclaw.json.last-good and restart
- Expose config.rollback as a gateway tool action for manual recovery
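The file-handling side of the first, third, and fourth points might look something like this minimal Python sketch. It assumes the gateway reads a single openclaw.json and that the snapshot sits next to it. write_config, rollback, last_good_path, and atomic_write are hypothetical names, not OpenClaw APIs; rollback approximates what a config.rollback action would need to do on disk. Writing to a temp file and then calling os.replace is an addition not in the proposal, so that a crash mid-write cannot leave a truncated config.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path


def last_good_path(config_path: Path) -> Path:
    # openclaw.json -> openclaw.json.last-good
    return config_path.with_name(config_path.name + ".last-good")


def atomic_write(path: Path, text: str) -> None:
    # Write to a temp file in the same directory, then rename it over the
    # target, so readers see either the old file or the new one, never half.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def write_config(config_path: Path, new_text: str) -> None:
    """Snapshot the current config as .last-good, then write the new one."""
    if config_path.exists():
        shutil.copy2(config_path, last_good_path(config_path))
    atomic_write(config_path, new_text)


def rollback(config_path: Path) -> None:
    """Restore the .last-good snapshot over the live config."""
    snapshot = last_good_path(config_path)
    if not snapshot.exists():
        raise FileNotFoundError(f"no snapshot at {snapshot}")
    atomic_write(config_path, snapshot.read_text(encoding="utf-8"))
```

One consequence of snapshotting unconditionally before every write: if the live config is already broken (for example, rollback was disabled or the file was edited by hand), it becomes the new last-good. Copying to .last-good only after a health check passes would avoid that.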
```json
{
  "gateway": {
    "configSafety": {
      "backupBeforeWrite": true,
      "rollbackOnFailure": true,
      "healthCheckTimeoutMs": 30000
    }
  }
}
```
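Reading this block could look roughly like the following Python sketch. It assumes that omitted keys fall back to the example values shown (true, true, 30000). It also assumes, since rollback needs a backup to restore, that enabling rollbackOnFailure while disabling backupBeforeWrite is treated as a configuration error. ConfigSafety and load_config_safety are hypothetical names, not part of OpenClaw.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ConfigSafety:
    # Assumed defaults: the example values from the proposed config block.
    backup_before_write: bool = True
    rollback_on_failure: bool = True
    health_check_timeout_ms: int = 30_000


def load_config_safety(config: Mapping[str, Any]) -> ConfigSafety:
    """Read gateway.configSafety, filling missing keys with defaults."""
    raw = config.get("gateway", {}).get("configSafety", {})
    defaults = ConfigSafety()
    backup = raw.get("backupBeforeWrite", defaults.backup_before_write)
    rollback = raw.get("rollbackOnFailure", defaults.rollback_on_failure)
    timeout = raw.get("healthCheckTimeoutMs", defaults.health_check_timeout_ms)

    if not isinstance(backup, bool) or not isinstance(rollback, bool):
        raise ValueError("backupBeforeWrite and rollbackOnFailure must be booleans")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        raise ValueError("healthCheckTimeoutMs must be a positive integer")
    # Assumption: rolling back is impossible without a backup to restore.
    if rollback and not backup:
        raise ValueError("rollbackOnFailure requires backupBeforeWrite")
    return ConfigSafety(backup, rollback, timeout)
```

Rejecting the contradictory combination at load time means a misconfigured safety net fails loudly, rather than silently leaving the gateway without a way to recover.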
Impact
Medium-high. Config changes are a common failure point, especially for users customizing agent configs, adding plugins, or adjusting auth profiles. Automatic rollback would prevent extended downtime from bad configs.
Environment
- OpenClaw 2026.4.10 (npm, macOS)