
Feature Request: Self-healing gateway with restart countdown, config backup & crash recovery #31480

@Mb29661


Problem

When the OpenClaw gateway crashes or enters a restart loop due to a bad config change, there is currently no built-in recovery mechanism. The user has to:

  • Notice the gateway is down
  • Manually diagnose the cause
  • Restore a previous config by hand
  • Restart manually

This is especially painful on headless servers where the user is interacting via mobile (Telegram/WhatsApp).

Proposed solution

Three small, composable features:

1. Restart countdown notifications

Before any gateway restart, send a channel notification with a countdown:

🔄 Gateway restart — T-60s
🔄 Gateway restart — T-30s
🔄 Gateway restart — T-10s
🔄 Gateway restart — T-0 🚀
✅ Gateway up — agents: Research · CRM · Site Seller · System · General
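The countdown above could be driven by a small scheduler along these lines. This is only a sketch: `countdown_messages`, `run_countdown`, and the `notify` callback are hypothetical names, not part of OpenClaw, and the message wording is simplified. The channel sender and the sleep function are injected so that the logic can be exercised without a real channel or real waiting.

```python
import time
from typing import Callable, Iterable, List, Tuple


def countdown_messages(marks: Iterable[int] = (60, 30, 10, 0)) -> List[Tuple[int, str]]:
    """Return (seconds_before_restart, message) pairs, largest mark first."""
    out = []
    for t in sorted(set(marks), reverse=True):
        suffix = " 🚀" if t == 0 else "s"
        out.append((t, f"🔄 Gateway restart: T-{t}{suffix}"))
    return out


def run_countdown(
    notify: Callable[[str], None],
    marks: Iterable[int] = (60, 30, 10, 0),
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send each countdown message, sleeping for the gap between marks.

    `notify` is a hypothetical callback that posts to the user's channel.
    """
    schedule = countdown_messages(marks)
    for i, (t, msg) in enumerate(schedule):
        notify(msg)
        if i + 1 < len(schedule):
            # Wait until the next mark is due.
            sleep(t - schedule[i + 1][0])
```

With the default marks, the caller would trigger the actual restart once `run_countdown` returns, and send the "Gateway up" message after the gateway reports healthy.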

2. Automatic config backup

Before every restart, snapshot openclaw.json to a rotating backup directory (keeping the last N snapshots), naming each one good-config-<timestamp>.json. On crash-loop detection, automatically roll back to the last known good config.
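A minimal sketch of the rotating snapshot, assuming a plain file copy is enough. The function names, the `keep` default, and the timestamp format are assumptions for illustration; only the `good-config-<timestamp>.json` naming comes from the proposal. Rotation relies on the timestamps sorting lexicographically.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def snapshot_config(
    config_path: Path,
    backup_dir: Path,
    keep: int = 5,
    stamp: Optional[str] = None,
) -> Path:
    """Copy the config into backup_dir and prune all but the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Assumed timestamp format: nanoseconds since the epoch, which sorts as text.
    stamp = stamp if stamp is not None else str(time.time_ns())
    target = backup_dir / f"good-config-{stamp}.json"
    shutil.copy2(config_path, target)
    backups = sorted(backup_dir.glob("good-config-*.json"))
    for old in backups[:-keep]:
        old.unlink()  # drop the oldest snapshots beyond the retention limit
    return target


def latest_backup(backup_dir: Path) -> Optional[Path]:
    """Return the newest snapshot, or None if there are no snapshots yet."""
    backups = sorted(backup_dir.glob("good-config-*.json"))
    return backups[-1] if backups else None
```

One question the sketch does not settle is when a snapshot really counts as "good". A proper implementation might only promote a snapshot after the gateway has stayed up past the watchdog window.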

3. Crash-loop watchdog

A short-lived watchdog that runs for ~90s after each restart, then exits. If ≥3 restarts occur within that window, it should:

  • Save the bad config for diagnostics
  • Restore last known good config automatically
  • Restart the gateway
  • Notify the user what happened and which config was restored
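The detection rule and the four recovery steps above might look roughly like this. It is a sketch under stated assumptions: how restart timestamps are recorded is not specified in the proposal, so they are passed in as a plain list, and `restart` and `notify` are hypothetical callbacks standing in for the real gateway restart and channel message.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Sequence


def is_crash_loop(
    restart_times: Sequence[float],
    now: float,
    window_s: float = 90.0,
    threshold: int = 3,
) -> bool:
    """True if at least `threshold` restarts fall within the last `window_s` seconds."""
    recent = [t for t in restart_times if 0 <= now - t <= window_s]
    return len(recent) >= threshold


def recover(
    config_path: Path,
    good_config: Path,
    diagnostics_dir: Path,
    restart: Callable[[], None],
    notify: Callable[[str], None],
) -> Path:
    """Run the four recovery steps and return the path of the saved bad config."""
    diagnostics_dir.mkdir(parents=True, exist_ok=True)
    bad_copy = diagnostics_dir / f"bad-config-{int(time.time())}.json"
    shutil.copy2(config_path, bad_copy)      # 1. save the bad config for diagnostics
    shutil.copy2(good_config, config_path)   # 2. restore the last known good config
    restart()                                # 3. restart the gateway
    notify(                                  # 4. tell the user what happened
        f"Crash loop detected. Restored {good_config.name}; "
        f"bad config saved as {bad_copy.name}"
    )
    return bad_copy
```

The watchdog itself would poll `is_crash_loop` for the ~90s window, call `recover` at most once, and then exit.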

Reference implementation

We built this as shell scripts, and they work well in production on a headless Hetzner VPS (Ubuntu 24.04). Key insight: the watchdog needs to be short-lived (not a daemon), and the restart needs to be scheduled with a small delay (via the at command) so that the exec session can return before the kill happens. Otherwise you kill the session that is running the restart command.
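To illustrate the delayed-restart idea without reproducing the actual scripts, here is a sketch that swaps `at` for a detached child process. This is a different technique from the one described above, chosen only because it is self-contained. `build_delayed_command` and `schedule_detached` are hypothetical names, and the real restart command is left to the caller. The sketch assumes a POSIX system with `sh` and `sleep` available.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_delayed_command(cmd: Sequence[str], delay_s: int) -> List[str]:
    """Wrap `cmd` so that a shell sleeps first and then replaces itself with it."""
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")
    return ["sh", "-c", f"sleep {int(delay_s)}; exec {shlex.join(cmd)}"]


def schedule_detached(cmd: Sequence[str], delay_s: int = 5) -> subprocess.Popen:
    """Start the delayed command in its own session and return immediately.

    start_new_session=True detaches the child from the caller's session and
    process group, so the caller can return before the restart fires.
    """
    return subprocess.Popen(
        build_delayed_command(cmd, delay_s),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

A caveat on this substitution: if the gateway runs as a systemd service with the default kill mode, stopping the unit also kills child processes in its control group, detached or not. Scheduling through `at` avoids that because the job is run by the atd daemon rather than by the calling session, which may be why it proved reliable here.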

Happy to share the scripts as a starting point for a proper implementation.

Why it matters

For always-on, mobile-first setups (the core OpenClaw use case) this is table stakes. The gateway should be self-healing — not something the user has to babysit from their phone.

Metadata

Labels: stale (marked as stale due to inactivity)
