Summary
When the gateway is restarted (via systemctl restart or openclaw gateway restart), any in-flight LLM requests are silently dropped. The user who sent a message via Feishu receives no reply and no notification that a restart occurred. From the user's perspective, the bot simply stops responding.
Environment
- OpenClaw version: 2026.3.2
- Channel: Feishu (飞书) via WebSocket long connection
- Deployment: systemd user service (openclaw-gateway.service)
- Agents affected: All Feishu-bound agents (8 agents, 8 Feishu bot accounts)
Steps to Reproduce
- Send a message to any Feishu-bound agent (e.g., xiaotang)
- While the agent is processing (LLM request in-flight), restart the gateway:
systemctl --user restart openclaw-gateway
- Observe: the user never receives a reply or any indication that the request was lost
Expected Behavior
After a gateway restart, users with interrupted sessions should receive a notification (e.g., "Service restarted, please resend your last message") through the same Feishu channel.
Analysis
Looking at the gateway source code, I found:
- SIGUSR1 triggers graceful restart with drain (DRAIN_TIMEOUT_MS = 30s), but SIGTERM (used by systemctl restart) does NOT drain; it proceeds to shutdown immediately.
- abortedLastRun flag exists in sessions.json entries but is only set to false on new runs; there's no post-restart logic to scan for abortedLastRun=true sessions and notify users.
- Restart Sentinel mechanism exists (writeRestartSentinel / consumeRestartSentinel in gateway-cli-vk3t7zJU.js) but only notifies a single session specified at shutdown time, not all affected sessions.
- server.close() sends restartExpectedMs: 1500 to WebSocket clients, but Feishu users connecting via bot → WebSocket bridge don't see this.
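To make the difference between the two signal paths concrete, here is a minimal sketch of what "drain" means: wait a bounded time for in-flight runs, then record which ones had to be cut off. It is written in Python for brevity and is not the gateway's code (which is JavaScript); the `drain` function, the session keys, and the short timeout are all hypothetical.

```python
import asyncio


async def drain(in_flight: dict[str, asyncio.Task], timeout_s: float) -> list[str]:
    """Wait up to timeout_s for in-flight runs to finish.

    Returns the keys of the runs that did not finish in time. Those are the
    sessions a post-restart step would need to notify.
    """
    if not in_flight:
        return []
    _done, pending = await asyncio.wait(list(in_flight.values()), timeout=timeout_s)
    aborted = sorted(key for key, task in in_flight.items() if task in pending)
    # Cancel whatever is still running and wait for the cancellations to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return aborted


async def _demo() -> list[str]:
    # Two stand-in "LLM requests": one finishes quickly, one would take 10 s.
    in_flight = {
        "fast": asyncio.create_task(asyncio.sleep(0.01)),
        "slow": asyncio.create_task(asyncio.sleep(10)),
    }
    return await drain(in_flight, timeout_s=0.2)


aborted_keys = asyncio.run(_demo())
print(aborted_keys)
```

With a drain step, only runs that outlast the timeout are lost, and the list of aborted keys is known at shutdown. Without one (the SIGTERM path described above), every in-flight run is lost and nothing records which sessions were affected.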
Relevant code locations
| Component | File | Lines |
|---|---|---|
| Drain logic (SIGUSR1 only) | gateway-cli-vk3t7zJU.js | 22991-22999 |
| SIGTERM handler (no drain) | gateway-cli-vk3t7zJU.js | 23015-23017 |
| Restart Sentinel (single session) | gateway-cli-vk3t7zJU.js | 20568-20635 |
| abortedLastRun field | sessions-XdimqNx2.js | 9089-9091 |
| Session write lock cleanup | sessions-XdimqNx2.js | 23-164 |
Proposed Enhancement
Option A: Built-in post-restart notification (preferred)
On gateway startup, scan all agent sessions.json for sessions where:
abortedLastRun == true, OR
updatedAt is within the last N minutes AND the session has a valid deliveryContext
Then automatically send a notification via the original channel (Feishu, Telegram, etc.) informing the user that a restart occurred.
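The selection rule above could look roughly like the following. This is a sketch under assumptions, not the gateway's implementation: it assumes sessions.json parses to a mapping of session key to entry, that updatedAt is an epoch timestamp in milliseconds, and that any non-empty deliveryContext counts as valid. The function name and the sample data are hypothetical.

```python
from typing import Any


def find_interrupted_sessions(
    sessions: dict[str, dict[str, Any]],
    now_ms: int,
    window_minutes: int = 5,
) -> list[str]:
    """Return the keys of sessions that should receive a restart notice."""
    window_ms = window_minutes * 60 * 1000
    hits = []
    for key, entry in sessions.items():
        aborted = entry.get("abortedLastRun") is True
        updated_at = entry.get("updatedAt")
        # Assumption: updatedAt is epoch milliseconds.
        recent = (
            isinstance(updated_at, (int, float))
            and 0 <= now_ms - updated_at <= window_ms
        )
        # Assumption: any non-empty deliveryContext is "valid".
        has_target = bool(entry.get("deliveryContext"))
        if aborted or (recent and has_target):
            hits.append(key)
    return hits


# Hypothetical entries; real sessions.json fields may differ.
sessions = {
    "a": {"abortedLastRun": True, "updatedAt": 0},
    "b": {"abortedLastRun": False, "updatedAt": 1_000_000,
          "deliveryContext": {"channel": "feishu"}},
    "c": {"abortedLastRun": False, "updatedAt": 0,
          "deliveryContext": {"channel": "feishu"}},
}
print(find_interrupted_sessions(sessions, now_ms=1_060_000))
```

The sketch follows the rule as written, so session "a" is selected even though it has no deliveryContext; a real implementation would likely skip sessions it has no way to reach.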
Option B: Lifecycle hooks
Provide pre-stop / post-start hooks in the gateway configuration:
gateway:
  hooks:
    pre-stop: "script-to-collect-active-sessions.sh"
    post-start: "script-to-notify-users.sh"
Current Workaround
We created a wrapper script (openclaw-restart.sh) that:
- Collects recently active Feishu sessions from sessions.json before restart
- Uses openclaw gateway restart (SIGUSR1-based, with drain) instead of systemctl restart
- After the new process starts, sends notifications via openclaw agent --deliver to each affected user
This works but is fragile and shouldn't be necessary — the gateway should handle this natively.
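For reference, the control flow of the wrapper is roughly the following. This is a Python sketch of the three steps, not the actual openclaw-restart.sh. The collect and notify steps are passed in as callables because their details (how sessions are selected, the full arguments to openclaw agent --deliver) are not shown here; `restart_and_notify` and `restart_gateway` are hypothetical names.

```python
import subprocess
from typing import Callable, Iterable

RESTART_MESSAGE = "Service restarted, please resend your last message"


def restart_gateway() -> None:
    # Command as given in this issue; it blocks until the CLI returns.
    subprocess.run(["openclaw", "gateway", "restart"], check=True)


def restart_and_notify(
    collect: Callable[[], Iterable[str]],
    restart: Callable[[], None],
    notify: Callable[[str, str], None],
    message: str = RESTART_MESSAGE,
) -> int:
    """Snapshot affected users, restart, then notify each one.

    Returns the number of notifications that were sent without error.
    """
    # Snapshot BEFORE the restart, while the session state is still intact.
    targets = list(collect())
    restart()
    sent = 0
    for target in targets:
        try:
            notify(target, message)
            sent += 1
        except Exception as exc:  # one failed delivery should not stop the rest
            print(f"notify failed for {target}: {exc}")
    return sent


# Dry run with stand-in callables instead of the real CLI.
events: list = []
sent = restart_and_notify(
    collect=lambda: ["user-1", "user-2"],
    restart=lambda: events.append("restart"),
    notify=lambda target, msg: events.append(("notify", target)),
)
print(sent, events)
```

The fragility is visible in the structure: the list of targets exists only in the wrapper's memory, so any restart that does not go through the wrapper (a crash, a plain systemctl restart) notifies nobody.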