Feature Request
After a gateway restart (planned or crash recovery), all in-flight work is lost:
- Active agent sessions lose their conversation context
- Running cron jobs are silently dropped
- There's no mechanism to resume or retry interrupted work
Current Behaviour
- Gateway crashes or restarts
- Watchdog detects and restarts gateway (~5 min)
- All in-flight sessions and cron runs are gone
- Agent wakes up fresh with no knowledge of what was in progress
- Cron jobs that were mid-execution are not retried until their next scheduled time
Desired Behaviour
Session recovery:
- On restart, the gateway should detect sessions that were active at shutdown
- Inject a system event into recovered sessions indicating the restart (e.g.
[System] Gateway restarted. Previous session context may be incomplete.)
- Optionally: persist session state to disk so context survives restarts
Cron run recovery:
- Track in-flight cron runs in a durable store (e.g. SQLite or file)
- On restart, check for interrupted runs
- Re-queue interrupted runs with a flag indicating they're retries
- Respect a configurable retry policy (e.g. max retries, backoff)
Wake mechanism:
- After successful restart, automatically send a wake event to all agents that had active sessions
- This ensures agents can check for in-progress work rather than waiting for their next heartbeat
Workarounds Currently in Use
- External watchdog script (
watchdog.sh) handles restart detection
- Manual
cron wake event after restart to kick agents
- Daily memory files for manual context recovery
openclaw-safe-restart scripts for planned restarts
Impact
This is especially important for:
- Long-running cron jobs (e.g. email cleanup that takes 9+ minutes)
- Multi-step agent workflows that get interrupted
- Users with longer heartbeat intervals (1-2h) who won't notice the gap
Environment
- macOS, LaunchAgent-based gateway
- Watchdog runs every 5 minutes
- Multiple agents (kit, cron-bot, etc.)
Feature Request
After a gateway restart (planned or crash recovery), all in-flight work is lost:
Current Behaviour
Desired Behaviour
Session recovery:
[System] Gateway restarted. Previous session context may be incomplete.)Cron run recovery:
Wake mechanism:
Workarounds Currently in Use
watchdog.sh) handles restart detectioncron wakeevent after restart to kick agentsopenclaw-safe-restartscripts for planned restartsImpact
This is especially important for:
Environment