Description
When a config schema change causes startup validation to fail, the gateway exits with code 1 and systemd restarts it indefinitely — no backoff, no circuit breaker, no user notification.
Real-World Incident
After upgrading from 2026.3.28 → 2026.4.1, the tools.web.search config schema changed. The old key path was now invalid, causing a hard config validation failure on every startup. systemd restarted the gateway 6,198 times over 12.5 hours (20:20 Apr 2 → 08:55 Apr 3) before the user manually intervened.
Actual Log (journalctl)
Apr 02 20:20:44 node[1768617]: [gateway] signal SIGTERM received
Apr 02 20:20:44 node[1768617]: [gateway] received SIGTERM; shutting down
Apr 02 20:20:46 node[2645786]: Config invalid
Apr 02 20:20:46 node[2645786]: - tools.web.search: Unrecognized key: "brave"
Apr 02 20:20:46 systemd[332]: openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 02 20:20:46 systemd[332]: openclaw-gateway.service: Failed with result 'exit-code'.
Apr 02 20:20:52 systemd[332]: openclaw-gateway.service: Scheduled restart job, restart counter is at 1.
Apr 02 20:20:53 node[2645810]: Config invalid
Apr 02 20:20:53 node[2645810]: - tools.web.search: Unrecognized key: "brave"
Apr 02 20:20:54 systemd[332]: openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 02 20:20:54 systemd[332]: openclaw-gateway.service: Failed with result 'exit-code'.
Apr 02 20:20:59 systemd[332]: openclaw-gateway.service: Scheduled restart job, restart counter is at 2.
[... same pattern every ~5-7 seconds ...]
Apr 03 08:49:30 systemd[332]: openclaw-gateway.service: Scheduled restart job, restart counter is at 6197.
Apr 03 08:49:38 systemd[332]: openclaw-gateway.service: Scheduled restart job, restart counter is at 6198.
Duration: ~12.5 hours. Restarts: 6,198. Interval: ~7 seconds each.
Steps to Reproduce
- Upgrade openclaw from 2026.3.28 → 2026.4.1
- Have a tools.web.search.brave key in openclaw.json (old schema path)
- Restart gateway — immediately enters crash loop
Expected Behavior
After 3 consecutive startup failures within 60 seconds:
- Stop auto-restarting
- Write a clear message to logs:
"Gateway failed to start 3 times in 60s — possible config error. Check ~/.openclaw/openclaw.json. Run 'openclaw doctor' to diagnose."
- Exit cleanly (let the user fix and manually restart)
Suggested Fix
Option A (application-level): On startup, check a restart-sentinel file (already exists: src/infra/restart-sentinel.ts). If 3+ restarts within 60s, write error and exit with a distinct code (e.g. 78 = EX_CONFIG) that systemd's RestartPreventExitStatus can catch.
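A minimal sketch of how Option A could work, assuming the sentinel file simply stores a JSON array of recent failure timestamps. Every name here (recordFailure, shouldTrip, exitAfterStartupFailure, the sentinel path argument) is hypothetical; the actual contents of src/infra/restart-sentinel.ts are not shown in this report and may differ.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical constants; the thresholds come from "Expected Behavior" above.
const EX_CONFIG = 78; // sysexits.h code for a configuration error
const MAX_FAILURES = 3;
const WINDOW_MS = 60_000;

/** Drops failure timestamps older than the window and appends the current one. */
function recordFailure(previous: number[], now: number, windowMs = WINDOW_MS): number[] {
  return [...previous.filter((t) => now - t < windowMs), now];
}

/** True once the number of recent failures reaches the limit. */
function shouldTrip(failures: number[], max = MAX_FAILURES): boolean {
  return failures.length >= max;
}

/** Reads timestamps from the sentinel file; a missing or corrupt file counts as no history. */
function readFailures(file: string): number[] {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed)
      ? parsed.filter((t): t is number => typeof t === "number")
      : [];
  } catch {
    return [];
  }
}

/** Intended for the config-validation failure path, in place of a plain exit(1). */
function exitAfterStartupFailure(sentinelFile: string, now = Date.now()): never {
  const failures = recordFailure(readFailures(sentinelFile), now);
  fs.mkdirSync(path.dirname(sentinelFile), { recursive: true });
  fs.writeFileSync(sentinelFile, JSON.stringify(failures));
  if (shouldTrip(failures)) {
    // Message adapted from the one proposed under "Expected Behavior".
    console.error(
      "Gateway failed to start 3 times in 60s (possible config error). " +
        "Check ~/.openclaw/openclaw.json. Run 'openclaw doctor' to diagnose."
    );
    process.exit(EX_CONFIG); // distinct code so the supervisor can stop restarting
  }
  process.exit(1); // ordinary failure: the supervisor may retry
}
```

In this sketch a successful startup would also need to delete or reset the sentinel file, otherwise unrelated failures spread over a long period could never trip the breaker but stale entries would still be re-read on every start.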
Option B (systemd unit): Set StartLimitBurst=3 and StartLimitIntervalSec=60 in the [Unit] section of the generated systemd unit file (src/daemon/node-service.ts), and document them as the recommended defaults.
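As a rough illustration of Option B, the generator could emit the rate-limit directives alongside the restart settings. The helper below is hypothetical and is not the existing code in src/daemon/node-service.ts. Restart=on-failure and RestartSec=5 are assumptions inferred from the roughly 5 second gap between failure and restart in the log, and RestartPreventExitStatus=78 assumes Option A's exit code.

```typescript
// Hypothetical shape for the restart policy; field names are illustrative only.
interface RestartPolicy {
  burst: number; // StartLimitBurst: starts allowed per interval
  intervalSec: number; // StartLimitIntervalSec: length of the rate-limit window
  restartSec: number; // RestartSec: delay before each restart
  preventExitStatus: number; // RestartPreventExitStatus: exit code that disables restart
}

/** Renders the directives for the [Unit] and [Service] sections as separate strings. */
function renderRestartPolicy(p: RestartPolicy): { unit: string; service: string } {
  // The StartLimit* directives belong in [Unit] on current systemd versions.
  const unit = [
    `StartLimitBurst=${p.burst}`,
    `StartLimitIntervalSec=${p.intervalSec}`,
  ].join("\n");
  const service = [
    "Restart=on-failure", // assumption: the existing unit's Restart= value is not shown
    `RestartSec=${p.restartSec}`,
    `RestartPreventExitStatus=${p.preventExitStatus}`,
  ].join("\n");
  return { unit, service };
}

const policy = renderRestartPolicy({
  burst: 3,
  intervalSec: 60,
  restartSec: 5,
  preventExitStatus: 78,
});
console.log(`[Unit]\n${policy.unit}\n\n[Service]\n${policy.service}`);
```

With a restart cycle of about 7 seconds, three starts fit comfortably inside the 60 second window, so the limit would be reached after the third attempt. If RestartSec were raised to 30 seconds or more, the window would need to grow with it or the limit would never trigger.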
Both options should be implemented — A for user-visible feedback, B as a safety net.
Environment
- OpenClaw: 2026.4.1
- OS: Linux (WSL2, systemd user session)
- Supervisor: systemd