Problem
After a gateway restart (SIGUSR1, config change, update, or manual restart), all active agent sessions are interrupted. If an agent was mid-conversation in a Signal group (or any channel), the session dies and the agent never follows up. The user has to re-send their message or poke the agent with "?" to get a response.
This is especially painful when:
- A config change triggers an automatic restart
- The gateway restarts during an active conversation
- Multiple agents across multiple Signal groups are affected simultaneously
The user experience is terrible — messages appear "read" (Signal read receipts) but never get a response. It looks like the agent is ignoring you.
Current Workarounds
1. historyLimit (partial fix)
Setting channels.signal.historyLimit: 15 means agents see recent group messages when a new session starts. But this only helps if someone sends a new message — the agent still sits idle until poked.
2. BOOT.md + scan script (workaround)
We built a BOOT.md that runs on gateway:startup via the boot-md hook. It scans all agent session transcripts for Signal groups where the last message was from a user (unanswered), then sends sessions_send nudges to those agents.
This works but is fragile:
- It costs a full agent turn on the main agent every restart
- The scan script reads raw JSONL transcripts (implementation detail that could change)
- It cannot detect messages lost during the drain window
- It is capped at 5 nudges per boot to avoid token storms
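For reference, the scan step of this workaround can be sketched roughly as follows. This is a minimal sketch, not the actual script: it assumes each transcript is a JSONL file whose entries carry a "role" field with values such as "user" or "assistant", and the names last_role, find_unanswered, and MAX_NUDGES_PER_BOOT are hypothetical. The real transcript schema is an implementation detail and may differ.

```python
import json
from pathlib import Path
from typing import List, Optional

MAX_NUDGES_PER_BOOT = 5  # the per-boot cap used by the workaround


def last_role(transcript: Path) -> Optional[str]:
    """Return the role of the last parseable entry in a JSONL transcript.

    Assumes (hypothetically) that every entry is a JSON object with a "role" key.
    """
    role = None
    with transcript.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written final line
            if isinstance(entry, dict) and "role" in entry:
                role = entry["role"]
    return role


def find_unanswered(transcript_dir: Path,
                    limit: int = MAX_NUDGES_PER_BOOT) -> List[Path]:
    """List transcripts whose last entry is a user message, up to `limit`."""
    unanswered = [
        path
        for path in sorted(transcript_dir.glob("*.jsonl"))
        if last_role(path) == "user"
    ]
    return unanswered[:limit]
```

Each returned path would then map to one sessions_send nudge. Note that anything past the cap is dropped silently, and a message that never reached the transcript (for example, one lost during the drain window) is invisible to this scan.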
Proposed Solution
Native session resumption after restart
When the gateway comes back up after a SIGUSR1 restart:
- Detect interrupted sessions — sessions that had an active turn aborted by drain, or sessions where the last transcript entry is a user message with no assistant response
- Auto-resume those sessions — inject a system event like: "The gateway restarted. Review conversation context and respond to any unanswered messages." or simply re-process the last user message
- Scope it to channel sessions only — skip heartbeat, subagent, and boot sessions
- Rate limit — cap at N concurrent resumptions to avoid API storms
- Configurable: add a config key such as session.resumeAfterRestart (true/false, default: true)
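The steps above could fit together roughly like this. It is only a sketch under stated assumptions, not OpenClaw's actual internals: SessionInfo, needs_resume, resume_interrupted, the inject callback, and the max_concurrent default are all hypothetical names and values chosen for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterable, List

RESUME_PROMPT = (
    "The gateway restarted. Review conversation context and "
    "respond to any unanswered messages."
)


@dataclass
class SessionInfo:
    """Hypothetical summary of a session as seen at startup."""
    session_id: str
    kind: str           # e.g. "channel", "heartbeat", "subagent", "boot"
    turn_aborted: bool  # an active turn was aborted by drain
    last_role: str      # role of the last transcript entry


def needs_resume(session: SessionInfo) -> bool:
    """Channel sessions only; interrupted mid-turn or ending on a user message."""
    if session.kind != "channel":
        return False
    return session.turn_aborted or session.last_role == "user"


async def resume_interrupted(
    sessions: Iterable[SessionInfo],
    inject: Callable[[str, str], Awaitable[None]],
    enabled: bool = True,       # stands in for session.resumeAfterRestart
    max_concurrent: int = 3,    # the "N" in the rate-limit step (assumed value)
) -> List[str]:
    """Inject the resume prompt into each interrupted session, N at a time."""
    if not enabled:
        return []
    gate = asyncio.Semaphore(max_concurrent)

    async def resume_one(session: SessionInfo) -> str:
        async with gate:  # at most max_concurrent injections in flight
            await inject(session.session_id, RESUME_PROMPT)
        return session.session_id

    targets = [s for s in sessions if needs_resume(s)]
    return list(await asyncio.gather(*(resume_one(s) for s in targets)))
```

The semaphore bounds how many resumptions run at once rather than how many run in total, so unlike the 5-nudge cap in the BOOT.md workaround, no interrupted session is skipped.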
Bonus: Drain-aware message queuing
Instead of rejecting messages with GatewayDrainingError while the gateway is draining, it should queue them silently (the code already has resetAllLanes() for this, but it does not always work). Messages received during drain should be replayed after restart, not rejected.
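The intended behavior can be illustrated with a small hold-and-replay gate. This is a sketch of the idea only: InboundGate and its methods are hypothetical, and it says nothing about how GatewayDrainingError or resetAllLanes() behave today.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Tuple


@dataclass
class InboundGate:
    """Hypothetical gate that holds inbound messages while draining."""
    deliver: Callable[[str, str], None]
    draining: bool = False
    _held: Deque[Tuple[str, str]] = field(default_factory=deque, init=False)

    def begin_drain(self) -> None:
        self.draining = True

    def receive(self, session_id: str, text: str) -> None:
        if self.draining:
            # Queue silently instead of raising a draining error.
            self._held.append((session_id, text))
            return
        self.deliver(session_id, text)

    def finish_restart(self) -> int:
        """Stop draining and replay held messages in arrival order."""
        self.draining = False
        replayed = 0
        while self._held:
            self.deliver(*self._held.popleft())
            replayed += 1
        return replayed
```

An in-memory queue like this only survives an in-process restart. If the gateway process exits and is relaunched, the held messages would need to be persisted (for example, to disk) before shutdown and reloaded on startup.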
Environment
- OpenClaw 2026.3.13
- Signal channel with ~27 bound agents across Signal groups
- Frequent restarts due to config changes, updates, and development
Impact
This affects every multi-agent Signal setup. Any restart = broken conversations across all active groups. The user has to manually re-engage every agent that was mid-conversation.