[Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) — regression from 4.23 to 4.25+ #79380
Closed
Labels
P1 (High-priority user-facing bug, regression, or broken workflow.)
bug (Something isn't working)
clawsweeper:linked-pr-open (ClawSweeper found an open linked pull request for this issue.)
clawsweeper:needs-live-repro (ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.)
clawsweeper:no-new-fix-pr (ClawSweeper does not recommend queueing a new automated fix PR for this issue.)
impact:crash-loop (Crash, hang, restart loop, or process-level availability failure.)
impact:message-loss (Channel message delivery can be lost, duplicated, or misrouted.)
impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt.)
issue-rating: 🐚 platinum hermit (Good issue quality with a plausible reproduction path needing some confirmation.)
regression (Behavior that previously worked and now fails)
stale (Marked as stale due to inactivity)
Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
After upgrading from 2026.4.23 to any version ≥2026.4.25 (tested 4.25, 4.29, 5.7), the gateway process pins CPU at 100%+ on Raspberry Pi 4 ARM64, starving the Node.js event loop; Telegram polling stalls permanently and never recovers. Rolling back to 4.23 resolves it immediately.
Steps to reproduce
- Upgrade from 2026.4.23 to a version ≥2026.4.25 (docker compose pull && docker compose up -d).
- Watch docker stats: CPU pins at 100–123% and never drops.
Expected behavior
On 2026.4.23, the gateway idles at <5% CPU after startup, Telegram polling completes normally, and the bot replies to messages within seconds.
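The idle-versus-pinned distinction can be checked mechanically from docker stats output. The following is a minimal sketch, not part of OpenClaw: the function names and the container name passed to sample_container are hypothetical, and the two thresholds are simply the figures quoted in this report (<5% on 4.23, 100%+ on ≥4.25).

```python
import subprocess

# Thresholds taken from the figures in this report (assumptions, not OpenClaw settings).
IDLE_THRESHOLD = 5.0      # percent: 2026.4.23 idles below this after startup
PINNED_THRESHOLD = 100.0  # percent: versions >= 2026.4.25 stay at or above this


def parse_cpu_percent(field: str) -> float:
    """Convert a docker stats CPUPerc field such as '123.45%' to a float."""
    return float(field.strip().rstrip("%"))


def classify(samples: list[float]) -> str:
    """Label a run of CPU samples as 'idle', 'pinned', or 'indeterminate'."""
    if not samples:
        return "indeterminate"
    if all(s < IDLE_THRESHOLD for s in samples):
        return "idle"
    if all(s >= PINNED_THRESHOLD for s in samples):
        return "pinned"
    return "indeterminate"


def sample_container(name: str) -> float:
    """Take one CPU reading for a container through the docker CLI.

    `name` is whatever the gateway container is called locally (hypothetical here).
    """
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.CPUPerc}}", name],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_cpu_percent(out.splitlines()[0])
```

Collecting a handful of sample_container readings a few seconds apart after startup and passing them to classify should give "idle" on a healthy 4.23 gateway and "pinned" on an affected version; anything in between is reported as "indeterminate" rather than guessed.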
Actual behavior
Gateway process consumes 100–123% CPU immediately after startup and never stabilizes. Telegram polling enters a permanent stall loop:
- getUpdates hangs for ~210s, and the transport rebuilds every ~3 minutes.
- All sendMessage and sendChatAction calls fail with "Network request failed".
- Session locks are held for 216,653ms (max is 15,000ms).
- Zombie processes (git, MainThread) accumulate.
- Model pricing fetches time out at 60s.
- The doctor command hangs indefinitely: it completes analysis but never exits.
- On 4.29 specifically: ~150s overhead per turn, 76s in model-resolution, 38s in auth.
Manual curl to api.telegram.org from inside the container succeeds on all versions, confirming this is not a network issue.
OpenClaw version
2026.5.7 (also tested 2026.4.25 and 2026.4.29 — same behavior)
Operating system
Raspberry Pi OS (Debian Bookworm), ARM64, kernel 6.x
Install method
docker
Model
openai-codex/gpt-5.5 (default), xai/grok-4-1-fast (fallback)
Provider / routing chain
openclaw -> anthropic / openai-codex / xai (direct API)
Additional provider/model setup details
Not model-specific — the CPU spin occurs before any model request is made. The gateway never stabilizes enough to process messages. Providers configured: Anthropic (Claude Sonnet/Opus), OpenAI Codex (OAuth), xAI (Grok), Mistral. Dual search: Tavily + web search. NODE_COMPILE_CACHE and OPENCLAW_NO_RESPAWN=1 already set.
Logs, screenshots, and evidence
Impact and severity
Affected: All ARM64 Docker users (confirmed Raspberry Pi 4)
Severity: Critical — gateway is completely unusable, no messages processed
Frequency: 100% reproducible on every boot with versions ≥4.25
Consequence: Forces permanent rollback to 4.23, blocking access to all security patches and features since April 2026
Additional information
Last known good version: 2026.4.23
First known bad version: 2026.4.25
Workaround: pin Docker image to ghcr.io/openclaw/openclaw:2026.4.23
Attempted fixes that did NOT help:
The 5.x runtime scoping and plugin memoization changes (designed to address this class of issue) do not resolve it on ARM64.
Related: #72338