[Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) — regression from 4.23 to 4.25+ #79380

@jorgemarmor

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading from 2026.4.23 to any version ≥2026.4.25 (tested 4.25, 4.29, 5.7), the gateway process pins CPU at 100%+ on Raspberry Pi 4 ARM64, starving the Node.js event loop; Telegram polling stalls permanently and never recovers. Rolling back to 4.23 resolves it immediately.

Steps to reproduce

  1. Run OpenClaw via Docker on Raspberry Pi 4 (4GB RAM, ARM64) with version 2026.4.23 — gateway idles at <5% CPU, Telegram works normally.
  2. Update image to any version ≥2026.4.25 (docker compose pull && docker compose up -d).
  3. Run docker stats — CPU pins at 100–123% and never drops.
  4. Send a message to the bot on Telegram — no reply, gateway logs show polling stall loop.
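
Step 3 can be made repeatable with a small helper. This is a minimal sketch, not part of OpenClaw. It assumes readings come from `docker stats --no-stream --format "{{.CPUPerc}}"`, and both the 90% threshold and any container name passed in are illustrative assumptions.

```python
import subprocess


def parse_cpu_percent(text: str) -> float:
    """Convert a docker stats CPU field such as '102.98%' to a float."""
    return float(text.strip().rstrip("%"))


def is_pinned(samples: list[float], threshold: float = 90.0) -> bool:
    """Return True only if every sample is at or above the threshold.

    Requiring all samples separates a sustained spin from a startup spike.
    The 90% default is an assumption, not a value from the report.
    """
    return bool(samples) and all(s >= threshold for s in samples)


def sample_cpu(container: str) -> float:
    """Take one CPU reading for the given container via docker stats."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.CPUPerc}}", container],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_cpu_percent(out)
```

Calling `sample_cpu` a few times with a pause between calls and passing the readings to `is_pinned` would distinguish the 4.23 behavior (<5% at idle) from the 100–123% readings reported here.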

Expected behavior

On 2026.4.23, the gateway idles at <5% CPU after startup, Telegram polling completes normally, and the bot replies to messages within seconds.

Actual behavior

Gateway process consumes 100–123% CPU immediately after startup and never stabilizes. Observed symptoms:

  • Telegram polling enters a permanent stall loop: getUpdates hangs for ~210s, the transport rebuilds every ~3 minutes, and all sendMessage and sendChatAction calls fail with "Network request failed".
  • Session locks are held for 216,653ms (max is 15,000ms).
  • Zombie processes (git, MainThread) accumulate.
  • Model pricing fetches time out at 60s.
  • The doctor command hangs indefinitely: it completes its analysis but never exits.
  • On 4.29 specifically: ~150s overhead per turn, 76s in model-resolution, 38s in auth.

Manual curl to api.telegram.org from inside the container succeeds on all versions, which indicates the failures are not caused by container networking.

OpenClaw version

2026.5.7 (also tested 2026.4.25 and 2026.4.29 — same behavior)

Operating system

Raspberry Pi OS (Debian Bookworm), ARM64, kernel 6.x

Install method

docker

Model

openai-codex/gpt-5.5 (default), xai/grok-4-1-fast (fallback)

Provider / routing chain

openclaw -> anthropic / openai-codex / xai (direct API)

Additional provider/model setup details

Not model-specific — the CPU spin occurs before any model request is made. The gateway never stabilizes enough to process messages. Providers configured: Anthropic (Claude Sonnet/Opus), OpenAI Codex (OAuth), xAI (Grok), Mistral. Dual search: Tavily + web search. NODE_COMPILE_CACHE and OPENCLAW_NO_RESPAWN=1 already set.

Logs, screenshots, and evidence

[telegram] Polling stall detected (active getUpdates stuck for 207.3s); forcing restart.
[telegram] [diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error error=Network request for 'getUpdates' failed!
[telegram] polling runner stopped (polling stall detected); restarting in 30s.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
[session-write-lock] releasing lock held for 216653ms (max=15000ms)
[diagnostic] stuck session: sessionId=699a966d state=processing age=232s queueDepth=1
[model-pricing] OpenRouter pricing fetch failed (timeout 60s): TimeoutError
[model-pricing] LiteLLM pricing fetch failed (timeout 60s): TimeoutError
[tools] read failed: ENOENT: no such file or directory, access '/home/node/.openclaw/workspace/memory/2026-04-27.md'
[telegram] sendMessage failed: Network request for 'sendMessage' failed!
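
The log lines above can be triaged mechanically when comparing versions. This is a minimal sketch that assumes the message formats match the excerpt exactly. The regular expressions and the `summarize` helper are hypothetical, not part of OpenClaw.

```python
import re

# Patterns are written against the exact log lines quoted in this report.
STALL_RE = re.compile(
    r"\[telegram\] Polling stall detected \(active getUpdates stuck for ([\d.]+)s\)"
)
LOCK_RE = re.compile(
    r"\[session-write-lock\] releasing lock held for (\d+)ms \(max=(\d+)ms\)"
)


def summarize(log_text: str) -> dict:
    """Collect polling stall durations and session locks held past their max."""
    stalls = [float(m.group(1)) for m in STALL_RE.finditer(log_text)]
    overruns = [
        (int(m.group(1)), int(m.group(2)))
        for m in LOCK_RE.finditer(log_text)
        if int(m.group(1)) > int(m.group(2))
    ]
    return {"stalls_s": stalls, "lock_overruns_ms": overruns}


SAMPLE = """\
[telegram] Polling stall detected (active getUpdates stuck for 207.3s); forcing restart.
[session-write-lock] releasing lock held for 216653ms (max=15000ms)
"""

summary = summarize(SAMPLE)
```

For the excerpt, the lock was held for 216653ms against a 15000ms limit, roughly 14 times the maximum.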


docker stats output on affected version:

CPU %: 102.98%   MEM: 0B / 0B   PIDS: 14


docker stats output on 2026.4.23:

CPU %: 2.1%   MEM: 485MiB / 3.7GiB   PIDS: 12

Impact and severity

Affected: Likely all ARM64 Docker users (confirmed only on Raspberry Pi 4)
Severity: Critical — gateway is completely unusable, no messages processed
Frequency: 100% reproducible on every boot with versions ≥4.25
Consequence: Forces permanent rollback to 4.23, blocking access to all security patches and features since April 2026

Additional information

Last known good version: 2026.4.23
First known bad version: 2026.4.25
Workaround: pin Docker image to ghcr.io/openclaw/openclaw:2026.4.23

Attempted fixes that did NOT help:

  • NODE_COMPILE_CACHE=/var/tmp/openclaw-compile-cache
  • OPENCLAW_NO_RESPAWN=1
  • Disabling Active Memory plugin
  • Cleaning orphan transcripts via doctor
  • Session cleanup (--enforce --fix-missing)
  • Full docker compose down && docker compose up -d

The 5.x runtime scoping and plugin memoization changes (designed to address this class of issue) do not resolve it on ARM64.

Related: #72338

Metadata

Assignees

No one assigned

    Labels

    • P1: High-priority user-facing bug, regression, or broken workflow.
    • bug: Something isn't working
    • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
    • clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
    • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
    • impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
    • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
    • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
    • issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
    • regression: Behavior that previously worked and now fails
    • stale: Marked as stale due to inactivity