Problem
When the gateway restarts (e.g., after config.patch), any in-flight agent run that triggers a new command during the drain window gets GatewayDrainingError. This falls through to the generic error handler in agent-runner.runtime and surfaces to the user as:
⚠️ Agent failed before reply: Gateway is draining for restart; new tasks are not accepted.
Logs: openclaw logs --follow
This is a transient error — the gateway comes back up seconds later. But the user sees an error and thinks something is broken.
Expected behavior
GatewayDrainingError should be treated like isTransientHttp errors: wait for the restart to complete (or for a short delay), then retry the run automatically. Because the condition clears on its own once the gateway is back up, the error should not surface to the user.
Current behavior
In agent-runner.runtime, the error handling chain checks for billing, context overflow, role ordering, session corruption, and transient HTTP errors. GatewayDrainingError is not checked, so it falls through to the generic "Agent failed before reply" message.
Suggested fix
Add a check before the generic error handler:
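if (error?.name === 'GatewayDrainingError' || message.includes('Gateway is draining')) {
  // Transient: the gateway is restarting and will accept new tasks again shortly.
  // Wait for the restart to complete (poll gateway health, or use a fixed delay as here).
  await new Promise((resolve) => setTimeout(resolve, 15000));
  continue; // retry the run; the retry count should be capped so this cannot loop forever
}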
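A fixed delay can be either too long or too short depending on how quickly the gateway restarts. As a rough sketch of the "poll gateway health" alternative, the retry could look something like the following. All names here (`isGatewayDraining`, `waitForGateway`, `retryWhileDraining`, the `checkHealth` callback, and the option fields) are hypothetical and not taken from the OpenClaw codebase; how the runtime actually probes gateway health is an assumption left to the caller.

```typescript
// Sketch only: every name below is illustrative, not an existing OpenClaw API.

interface DrainRetryOptions {
  maxAttempts: number;    // total attempts of the run, including the first
  pollIntervalMs: number; // delay between health probes
  maxWaitMs: number;      // upper bound on waiting for a single restart
}

// Matches the error by name, falling back to the message text from the report.
function isGatewayDraining(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { name, message } = error as { name?: unknown; message?: unknown };
  return (
    name === "GatewayDrainingError" ||
    (typeof message === "string" && message.includes("Gateway is draining"))
  );
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls the caller-supplied health probe until it reports healthy or the
// deadline passes. A probe that throws is treated as "not healthy yet".
async function waitForGateway(
  checkHealth: () => Promise<boolean>,
  opts: DrainRetryOptions,
): Promise<boolean> {
  const deadline = Date.now() + opts.maxWaitMs;
  while (Date.now() < deadline) {
    if (await checkHealth().catch(() => false)) return true;
    await sleep(opts.pollIntervalMs);
  }
  return false;
}

// Re-runs `run` while it fails with a draining error, up to maxAttempts.
// Any other error, or a gateway that never comes back, is rethrown as-is.
async function retryWhileDraining<T>(
  run: () => Promise<T>,
  checkHealth: () => Promise<boolean>,
  opts: DrainRetryOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (error) {
      const retryable = isGatewayDraining(error) && attempt < opts.maxAttempts;
      if (!retryable || !(await waitForGateway(checkHealth, opts))) throw error;
    }
  }
}
```

Capping the attempts and the wait time means a gateway that fails to come back still produces a visible error instead of a silent hang, while the common case (a restart that completes within seconds) stays invisible to the user.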
Environment
- OpenClaw 2026.3.24
- macOS, local gateway, config.patch triggered restart
- Happens every time a restart occurs while agents are active