[Bug]: Local model provider calls thread block gateway event loop on Windows beta; trivial infer run takes ~4 minutes #86599
Closed
Labels
P1, bug, bug:behavior, clawsweeper:needs-info, clawsweeper:needs-maintainer-review, clawsweeper:no-new-fix-pr, impact:auth-provider, impact:crash-loop, impact:session-state, issue-rating: 🦐 gold shrimp
Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
Yes
Summary
On Windows with OpenClaw 2026.5.24-beta.1, local model calls appear to block or starve the Gateway event loop. Even a trivial prompt in a fresh chat (for example "hi, how are you") or a minimal CLI run such as:
openclaw infer model run --model llamacpp/qwen3.5-9b-instruct-Q5_K_M.gguf --prompt "hi" --json
takes around 3 minutes.
The underlying llama.cpp backend can generate quickly in isolation, but when invoked through OpenClaw the Gateway shows repeated event-loop starvation warnings, slow WebSocket RPCs, Telegram fetch timer delays, and stalled sessions with activeWorkKind=model_call.
This reproduces with both llama.cpp and Ollama backends, so it does not look specific to one local server implementation.
Steps to reproduce
Fresh chat with a trivial prompt takes many minutes.
openclaw infer model run --prompt "hi" also takes ~3 minutes.
Gateway/control RPCs become very slow during the run.
Telegram health/fetch timers are delayed and report likely event-loop starvation.
Logs show model calls stuck with no progress.
Expected behavior
A trivial local model prompt should not starve the Gateway event loop. Even if the local backend/model is slow, Gateway timers, health checks, WebSocket RPCs, and channel polling should remain responsive or degrade gracefully.
Actual behavior
During local model calls, the Gateway event loop appears saturated:
eventLoopDelayP99Ms=20-29s
eventLoopUtilization=1
cpuCoreRatio≈0.98
activeWorkKind=model_call
This makes unrelated Gateway operations appear broken or delayed.
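For readers unfamiliar with these metrics, an event-loop delay figure is typically produced by a timer-drift probe. The following is a minimal sketch, assuming the Gateway runs on a single-threaded JavaScript event loop (the npm install and the metric names suggest this, but this report does not confirm it). `createLagProbe` and its methods are hypothetical names for illustration, not OpenClaw APIs.

```typescript
// Hypothetical timer-drift probe for illustration; not OpenClaw's implementation.
interface LagProbe {
  record(elapsedSinceLastTickMs: number): void;
  percentile(p: number): number;
  start(): void;
  stop(): void;
}

function createLagProbe(intervalMs: number = 100): LagProbe {
  const delays: number[] = [];
  let timer: ReturnType<typeof setInterval> | undefined;
  let lastTick = 0;

  // Delay is how much later than `intervalMs` a tick actually ran.
  function record(elapsedSinceLastTickMs: number): void {
    delays.push(Math.max(0, elapsedSinceLastTickMs - intervalMs));
  }

  // Nearest-rank percentile over the recorded delays, with p in (0, 100].
  function percentile(p: number): number {
    if (delays.length === 0) return 0;
    const sorted = delays.slice().sort((a, b) => a - b);
    const rank = Math.min(
      sorted.length,
      Math.max(1, Math.ceil((p / 100) * sorted.length)),
    );
    return sorted[rank - 1];
  }

  // While synchronous work occupies the loop, this callback cannot run,
  // so the next tick arrives late and the lateness is recorded.
  function start(): void {
    lastTick = Date.now();
    timer = setInterval(() => {
      const now = Date.now();
      record(now - lastTick);
      lastTick = now;
    }, intervalMs);
  }

  function stop(): void {
    if (timer !== undefined) clearInterval(timer);
    timer = undefined;
  }

  return { record, percentile, start, stop };
}
```

With a probe like this, a P99 delay of 20-29s would mean the loop went that long without being able to run a due timer callback, which is consistent with the slow RPCs and delayed Telegram timers reported here. Node.js also ships `perf_hooks.monitorEventLoopDelay()` and `performance.eventLoopUtilization()` for the same purpose.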
OpenClaw version
2026.5.24-beta.1
Operating system
Windows 11
Install method
npm
Model
Qwen 3.5 9B
Provider / routing chain
openclaw -> llama.cpp -> qwen
Additional provider/model setup details
openclaw-diagnostics-2026-05-25T18-08-09-809Z-6904.zip
Configs/backends tried
llama.cpp via OpenAI-compatible endpoint
Ollama backend
OpenAI Responses-style config
OpenAI Chat Completions-style config
Tool support enabled/disabled attempts
Fresh/simple prompts and fresh chats
The issue persists across local backend choices.
Diagnostics
I have an openclaw gateway diagnostics export zip generated while reproducing this. The export includes sanitized logs, gateway status, health, config shape, and stability data. I can attach it to this issue.
Logs, screenshots, and evidence
Relevant log excerpts:
[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=49s eventLoopDelayP99Ms=29813.1 eventLoopDelayMaxMs=29813.1 eventLoopUtilization=1 cpuCoreRatio=0.987 active=1 waiting=0 queued=0 work=[active=agent:main:main(processing/embedded_run,q=1,age=56s last=embedded_run:started)]
[fetch-timeout] fetch timeout after 9999ms (elapsed 18183ms) timer delayed 8184ms, likely event-loop starvation operation=fetchWithTimeout url=https://api.telegram.org/.../getMe
[agent/embedded] [trace:embedded-run] prep stages: runId=270498cf-d78a-4f58-ae81-f271e9ee4738 sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 phase=stream-ready totalMs=11071 stages=workspace-sandbox:2ms@2ms,skills:1ms@3ms,core-plugin-tools:2096ms@2099ms,bootstrap-context:18ms@2117ms,bundle-tools:338ms@2455ms,system-prompt:5976ms@8431ms,session-resource-loader:2604ms@11035ms,agent-session:5ms@11040ms,stream-setup:30ms@11070ms
[diagnostic] long-running session: sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 sessionKey=agent:main:main state=processing age=135s queueDepth=1 reason=queued_behind_active_work classification=long_running activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=87s recovery=none
[diagnostic] stalled session: sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 sessionKey=agent:main:main state=processing age=140s queueDepth=0 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started
Slow RPC examples from the same window:
sessions.list 29482ms
chat.history 30201ms
sessions.list 20701ms
sessions.list 25379ms
models.list 29399ms
Impact and severity
Local model use is effectively unusable for even trivial prompts on this setup, despite the backend itself being capable of high token/sec throughput outside OpenClaw.
Additional information
The model provider invocation path for local providers on Windows may be doing CPU-heavy synchronous work or otherwise failing to isolate the local model request/stream processing from the Gateway event loop. The expensive pre-run prep is also visible (~11s), but the main failure appears after model_call:started, where the Gateway starts reporting starvation and stalled agent runs.
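To make the hypothesis above concrete, here is a minimal sketch of one way CPU-heavy per-chunk work can be kept from monopolizing a single-threaded event loop: process in small batches and yield between them. This is only an illustration of the isolation principle. Whether OpenClaw's local provider path actually does synchronous work is unverified, and `sliceWork`, `yieldToEventLoop`, and `processCooperatively` are hypothetical names, not OpenClaw functions.

```typescript
// Hypothetical helpers for illustration; none of these names come from OpenClaw.

// Split a large unit of work into fixed-size batches.
function sliceWork<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Resolve on a later event-loop turn so due timers and I/O callbacks can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => resolve(), 0);
  });
}

// Run `handle` on every item, handing control back to the loop between batches.
async function processCooperatively<T>(
  items: T[],
  handle: (item: T) => void,
  batchSize: number = 64,
): Promise<number> {
  let processed = 0;
  for (const batch of sliceWork(items, batchSize)) {
    for (const item of batch) {
      handle(item);
      processed += 1;
    }
    await yieldToEventLoop();
  }
  return processed;
}
```

Yielding only helps when each batch is short. If a single step is itself expensive, moving it off the main thread (for example with Node's `worker_threads` module) would be the more robust option.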
Edit:
Possibly related: sessions.list stalls while local model call is active:
While the local model call is stalled, repeated Gateway WS RPCs also become very slow:
18:53:58 [ws] ⇄ res ✓ sessions.list 20736ms
18:54:19 [ws] ⇄ res ✓ sessions.list 20652ms
18:55:19 [ws] ⇄ res ✓ sessions.list 21084ms
18:56:06 [ws] ⇄ res ✓ sessions.list 25379ms
19:18:54 [ws] ⇄ res ✓ sessions.list 20005ms
19:19:14 [ws] ⇄ res ✓ sessions.list 20414ms
These occur near event-loop starvation / stalled model-call logs:
fetch timeout ... timer delayed ... likely event-loop starvation
stalled session ... activeWorkKind=model_call lastProgress=model_call:started
Expected: local status/session RPCs should remain responsive even if a model backend is slow.
Actual: simple local RPCs take ~20-25s while the model call is active.
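For anyone comparing runs, the `[ws]` response lines quoted above can be summarized with a small parser. This is a sketch that assumes the line layout shown in this issue (time, `[ws]`, direction and status markers, method name, duration in ms). `parseWsRpcLine`, `summarizeWsRpcs`, and the slow threshold are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical parser for the "[ws] ... res" lines quoted in this issue.
interface WsRpcSample {
  time: string;
  method: string;
  durationMs: number;
}

interface WsRpcSummary {
  count: number;
  slow: number;
  maxMs: number;
  meanMs: number;
}

// Returns null for lines that do not match the assumed layout.
function parseWsRpcLine(line: string): WsRpcSample | null {
  const m = line.trim().match(/^(\d{2}:\d{2}:\d{2}) \[ws\] .* (\S+) (\d+)ms$/);
  if (m === null) return null;
  return { time: m[1], method: m[2], durationMs: Number(m[3]) };
}

// Count samples, count those at or above the threshold, and report max and rounded mean.
function summarizeWsRpcs(lines: string[], slowThresholdMs: number): WsRpcSummary {
  let count = 0;
  let slow = 0;
  let maxMs = 0;
  let totalMs = 0;
  for (const line of lines) {
    const sample = parseWsRpcLine(line);
    if (sample === null) continue;
    count += 1;
    totalMs += sample.durationMs;
    if (sample.durationMs >= slowThresholdMs) slow += 1;
    if (sample.durationMs > maxMs) maxMs = sample.durationMs;
  }
  const meanMs = count === 0 ? 0 : Math.round(totalMs / count);
  return { count, slow, maxMs, meanMs };
}
```

Feeding it the log lines from a run with and without an active model call would show directly whether `sessions.list` latency tracks the model call.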