
[Bug]: 2026.5.16-beta.4 on Raspberry Pi: cron forced-run closes gateway and triggers event-loop starvation #83456

@h-mascot

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

On a Raspberry Pi canary upgraded to 2026.5.16-beta.4, openclaw cron run --wait reproducibly fails with GatewayTransportError: gateway closed (1000 normal closure) and leaves the gateway health degraded with event-loop delay/utilization/CPU warnings.

Steps to reproduce

  1. Start from OpenClaw 2026.5.12 on Raspberry Pi / Linux arm64 with a systemd user gateway.
  2. Run openclaw update --channel beta to upgrade to 2026.5.16-beta.4.
  3. Verify the gateway is active and reachable with openclaw gateway status --deep.
  4. Create a temporary one-shot cron job with an agent-turn payload, for example "Reply exactly SCOTTY_CRON_RESTART_REPRO_OK".
  5. Run the cron job manually with openclaw cron run --wait <job-id>.
  6. Observe the gateway close error and degraded event-loop health.
  7. Manually restart the gateway with systemctl --user restart openclaw-gateway.service, confirm health is initially clean, then repeat steps 4-6.
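
Steps 5 and 6 can be scripted so that each run is judged the same way. The following is a minimal sketch, not part of OpenClaw. It assumes the openclaw CLI is on PATH and that a failing run prints the error text quoted in this report. The helper names classify_cron_output and run_forced_cron are hypothetical, and the 300-second timeout is an arbitrary guard.

```python
import subprocess

# Failure marker copied from the error text observed in this report.
GATEWAY_CLOSED_MARKER = "gateway closed (1000"


def classify_cron_output(returncode: int, output: str) -> str:
    """Classify one forced cron run as 'gateway_closed', 'failed', or 'ok'."""
    if GATEWAY_CLOSED_MARKER in output:
        return "gateway_closed"
    if returncode != 0:
        return "failed"
    return "ok"


def run_forced_cron(job_id: str, timeout_s: float = 300.0) -> str:
    """Run `openclaw cron run --wait <job-id>` and classify the result.

    Assumption: the openclaw CLI is installed; timeout_s is an arbitrary guard.
    """
    proc = subprocess.run(
        ["openclaw", "cron", "run", "--wait", job_id],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return classify_cron_output(proc.returncode, proc.stdout + proc.stderr)
```

Calling run_forced_cron with the temporary job id once after the update and once after the manual restart would cover both observations in this report.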

Expected behavior

openclaw cron run --wait <job-id> should complete the forced cron run, return the job result, or fail gracefully without closing the gateway connection or leaving the event loop degraded.

Actual behavior

The forced cron run failed twice on the beta canary: first during the normal post-update canary run, then again after a clean manual gateway restart.

Observed error after restart retry:

gateway connect failed: Error: gateway closed (1000):
GatewayTransportError: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/jamify/.openclaw/openclaw.json
Bind: auto

Post-cron health after the restart retry:

Gateway event loop: degraded reasons=event_loop_delay,event_loop_utilization,cpu max=8514ms p99=8514ms util=0.988 cpu=1.054
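
To compare health samples across runs, the key=value fields of a health line like the one above can be pulled into a dictionary. This is a rough sketch that assumes the line format is exactly what appears in this report. parse_health_line is a hypothetical helper, not an OpenClaw API.

```python
def parse_health_line(line: str) -> dict:
    """Extract key=value fields from a gateway health line.

    Millisecond values become ints under '<name>_ms'; a combined key such as
    'max/p99=3284ms' is stored under both names. Other values become floats.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.rstrip(",").partition("=")
        if not sep:
            continue  # plain words such as 'degraded' carry no key
        if key == "reasons":
            fields["reasons"] = value.split(",")
        elif value.endswith("ms"):
            for name in key.split("/"):
                fields[name + "_ms"] = int(value[:-2])
        else:
            fields[key] = float(value)
    return fields


sample = (
    "Gateway event loop: degraded "
    "reasons=event_loop_delay,event_loop_utilization,cpu "
    "max=8514ms p99=8514ms util=0.988 cpu=1.054"
)
health = parse_health_line(sample)
print(health["reasons"], health["max_ms"], health["util"])
```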

The initial update also completed the package and plugin updates but exited non-zero at the final health gate:

Gateway did not become healthy after restart

OpenClaw version

2026.5.16-beta.4 (38c3a8d)

Previous version before canary update: 2026.5.12 (f066dd2)

Operating system

Raspberry Pi / Linux arm64. Exact distro version: NOT_ENOUGH_INFO.

Install method

npm global install, upgraded with:

openclaw update --channel beta

Gateway managed by systemd user service:

openclaw-gateway.service - OpenClaw Gateway (v2026.5.16-beta.4)

Model

The failing path is the gateway/cron forced-run path, not a specific model completion.

A separate beta canary model-route smoke passed through the gateway using litellm/gpt-5.5 and returned SCOTTY_INFER_MODEL_OK.

Provider / routing chain

For the failing path:

openclaw CLI -> local gateway WebSocket ws://127.0.0.1:18789 -> cron forced-run -> isolated agent turn

For the separate passing model smoke:

openclaw CLI -> local gateway -> litellm/gpt-5.5

Additional provider/model setup details

No API keys, tokens, or passwords included. The issue reproduced on local loopback gateway transport, before any useful model result was returned from the forced cron run.

Logs, screenshots, and evidence

# Target
Host: Scotty / castlemascot-r1 / Raspberry Pi / Linux arm64
Before: OpenClaw 2026.5.12 (f066dd2)
After: OpenClaw 2026.5.16-beta.4 (38c3a8d)

# Update result
openclaw update --channel beta
- global update: OK
- global install swap: OK
- doctor during update: OK
- plugins updated: @openclaw/codex@beta, @openclaw/discord@beta
- service restarted on beta pid 621865
- final updater error: Gateway did not become healthy after restart

# Post-update diagnostics
openclaw gateway status --deep
- connectivity probe: OK
- admin-capable: yes
- CLI version: 2026.5.16-beta.4
- Gateway version: 2026.5.16-beta.4

openclaw status --all
- gateway reachable: 181ms
- Telegram: OK
- Discord: OK

# Passing P0 smoke checks
Runtime agent turn: SCOTTY_BETA_RUNTIME_OK — OpenClaw version 2026.5.16-beta.4
Model route via gateway: SCOTTY_INFER_MODEL_OK
Discord visible outbound delivery: message id 1505581524295356547
plugins list: works, 16/93 enabled
sessions list: works
cron scheduler status: enabled, 23 jobs

# Initial cron forced-run failure
GatewayTransportError: gateway closed (1000 normal closure)
Gateway health: degraded reasons=event_loop_delay,event_loop_utilization,cpu max/p99=3284ms util=0.965 cpu=1.074

# Logs during initial cron smoke
fetch timeout after 10000ms (elapsed 29468ms) timer delayed 19468ms, likely event-loop starvation operation=fetchWithTimeout url=https://discord.com/api/v10/users/@me
ws handshake timeout
[discord] gateway: Gateway websocket closed: 1000
closed before connect ... code=1000

# Restart retry before filing
Pre-restart health: event loop ok, max=69ms, p99=21ms
Manual restart command: systemctl --user restart openclaw-gateway.service
New gateway pid: 648275
openclaw gateway status --deep: connectivity probe OK, admin-capable, CLI and gateway both 2026.5.16-beta.4
Immediate post-restart health: degraded reasons=event_loop_utilization,cpu max/p99=446ms util=1 cpu=1.225

# Forced cron retry after clean restart
Temporary cron job: 2d61443a-74b5-41c8-a288-e23a8c1399d5
Payload: Reply exactly SCOTTY_CRON_RESTART_REPRO_OK
Command: openclaw cron run --wait 2d61443a-74b5-41c8-a288-e23a8c1399d5
Result:
gateway connect failed: Error: gateway closed (1000):
GatewayTransportError: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/jamify/.openclaw/openclaw.json
Bind: auto

# Post-cron retry health
Gateway event loop: degraded reasons=event_loop_delay,event_loop_utilization,cpu max=8514ms p99=8514ms util=0.988 cpu=1.054

# Cleanup
Temporary cron job removed successfully.
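
The fetch timeout line logged during the initial cron smoke shows how late the timer fired: a 10000 ms timeout that took 29468 ms to fire is 19468 ms late. The sketch below recomputes that delay and checks it against the value in the log. The regular expression is inferred from that single log line, so the format is an assumption, and timer_delay_ms is a hypothetical helper.

```python
import re

# Pattern inferred from the one fetch-timeout line quoted in this report.
FETCH_TIMEOUT_RE = re.compile(
    r"fetch timeout after (?P<timeout>\d+)ms \(elapsed (?P<elapsed>\d+)ms\)"
    r" timer delayed (?P<delayed>\d+)ms"
)


def timer_delay_ms(line: str) -> int:
    """Return elapsed - timeout, verifying it matches the logged delay."""
    match = FETCH_TIMEOUT_RE.search(line)
    if match is None:
        raise ValueError("not a fetch timeout line")
    timeout = int(match["timeout"])
    elapsed = int(match["elapsed"])
    delayed = int(match["delayed"])
    computed = elapsed - timeout
    if computed != delayed:
        raise ValueError(f"logged delay {delayed}ms != computed {computed}ms")
    return computed


log_line = (
    "fetch timeout after 10000ms (elapsed 29468ms) timer delayed 19468ms, "
    "likely event-loop starvation operation=fetchWithTimeout "
    "url=https://discord.com/api/v10/users/@me"
)
print(timer_delay_ms(log_line))  # 19468
```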

Impact and severity

Affected: OpenClaw beta canary on Raspberry Pi / Linux arm64 using systemd user gateway and cron forced-run.

Severity: High for beta validation. This blocks expanding the beta to other agents in our fleet because cron forced-run is a P0 canary check and the gateway enters degraded event-loop health after the failure.

Frequency: Reproduced twice in the same canary, including once after a clean manual gateway restart. Broader frequency across other hosts: NOT_ENOUGH_INFO.

Consequence: The forced cron run does not produce the expected result, the CLI sees the gateway close with code 1000, and gateway health degrades with event-loop delay, utilization, and CPU warnings.

Additional information

This may be related to existing gateway close/update health reports, but this canary adds a beta.4 Raspberry Pi cron forced-run repro with event-loop starvation evidence.

Related issue/PR found while checking for duplicates:

This report does not claim that beta.4 is generally broken. Other P0 smokes passed on the same canary: version, service active, status, doctor, gateway status, model route, runtime agent turn, Discord outbound delivery, plugins list, sessions list, and cron scheduler status.

The canary decision is fail-report / hold expansion: keep the beta isolated to Scotty until this is triaged or a workaround is known.

Labels

- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
- issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
