[Bug]: Local model provider calls block the Gateway event loop on Windows beta; trivial infer run takes ~4 minutes

### Bug type

Behavior bug (incorrect output/state without crash)

### Beta release blocker

Yes

### Summary

On Windows with OpenClaw 2026.5.24-beta.1, local model calls appear to block or starve the Gateway event loop. Even a trivial fresh prompt such as "hi, how are you", or the following command, takes around 3 minutes:

`openclaw infer model run --model llamacpp/qwen3.5-9b-instruct-Q5_K_M.gguf --prompt "hi" --json`

The underlying llama.cpp backend can generate quickly in isolation, but when invoked through OpenClaw the Gateway shows repeated event-loop starvation warnings, slow WebSocket RPCs, Telegram fetch timer delays, and stalled sessions with activeWorkKind=model_call.

This reproduces with both llama.cpp and Ollama backends, so it does not look specific to one local server implementation.

### Steps to reproduce

1. Start a fresh chat and send a trivial prompt. The reply takes many minutes.
2. Run `openclaw infer model run --prompt "hi"`. This also takes ~3 minutes.
3. Issue Gateway/control RPCs during the run. They become very slow.
4. Watch the Telegram health/fetch timers. They are delayed and report likely event-loop starvation.
5. Check the logs. They show model calls stuck with no progress.


### Expected behavior

A trivial local model prompt should not starve the Gateway event loop. Even if the local backend/model is slow, Gateway timers, health checks, WebSocket RPCs, and channel polling should remain responsive or degrade gracefully.

### Actual behavior

During local model calls, the Gateway event loop appears saturated:

- `eventLoopDelayP99Ms`: 20-29 s
- `eventLoopUtilization=1`
- `cpuCoreRatio≈0.98`
- `activeWorkKind=model_call`


This makes unrelated Gateway operations appear broken or delayed.
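To illustrate how a saturated loop produces delays like these, here is a minimal sketch. It assumes the Gateway runs on a single Node.js-style event loop, which is an inference from the npm install and the event-loop metrics, not something confirmed in the source. `measureTimerDelay` is a hypothetical helper, not OpenClaw code. It arms a timer, blocks the thread synchronously, and reports how late the timer fired, using the same elapsed-minus-timeout arithmetic as the `timer delayed` figure in the fetch-timeout log (18183 ms - 9999 ms = 8184 ms).

```typescript
// Hypothetical helper for illustration only; not part of OpenClaw.
// Arms a timer for `timeoutMs`, then blocks the thread for `blockMs`
// with a synchronous busy loop. Resolves with how late the timer fired.
function measureTimerDelay(timeoutMs: number, blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => {
      const elapsedMs = Date.now() - start;
      // Same arithmetic as the log line: elapsed - timeout = timer delay.
      resolve(elapsedMs - timeoutMs);
    }, timeoutMs);

    // Synchronous CPU work: no timer, I/O callback, or RPC handler
    // can run on this thread until the loop below returns.
    const blockUntil = Date.now() + blockMs;
    while (Date.now() < blockUntil) {
      // busy wait
    }
  });
}

measureTimerDelay(10, 200).then((delayMs) => {
  console.log(`10ms timer fired ~${delayMs}ms late after 200ms of synchronous work`);
});
```

If something similar happens during a model call, every timer and RPC handler queued behind the synchronous work would be delayed by roughly its duration, which would be consistent with the reported symptoms.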

### OpenClaw version

2026.5.24-beta.1

### Operating system

Windows 11

### Install method

npm

### Model

Qwen 3.5 9B

### Provider / routing chain

openclaw -> llama.cpp -> qwen

### Additional provider/model setup details

[openclaw-diagnostics-2026-05-25T18-08-09-809Z-6904.zip](https://github.com/user-attachments/files/28231559/openclaw-diagnostics-2026-05-25T18-08-09-809Z-6904.zip)

Configs/backends tried:

- llama.cpp via OpenAI-compatible endpoint
- Ollama backend
- OpenAI Responses-style config
- OpenAI Chat Completions-style config
- Tool support enabled/disabled attempts
- Fresh/simple prompts and fresh chats

The issue persists across local backend choices.

Diagnostics:

The OpenClaw Gateway diagnostics export zip linked above was generated while reproducing this. It includes sanitized logs, gateway status, health, config shape, and stability data.

### Logs, screenshots, and evidence

```shell
Relevant log excerpts:

[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=49s eventLoopDelayP99Ms=29813.1 eventLoopDelayMaxMs=29813.1 eventLoopUtilization=1 cpuCoreRatio=0.987 active=1 waiting=0 queued=0 work=[active=agent:main:main(processing/embedded_run,q=1,age=56s last=embedded_run:started)]

[fetch-timeout] fetch timeout after 9999ms (elapsed 18183ms) timer delayed 8184ms, likely event-loop starvation operation=fetchWithTimeout url=https://api.telegram.org/.../getMe

[agent/embedded] [trace:embedded-run] prep stages: runId=270498cf-d78a-4f58-ae81-f271e9ee4738 sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 phase=stream-ready totalMs=11071 stages=workspace-sandbox:2ms@2ms,skills:1ms@3ms,core-plugin-tools:2096ms@2099ms,bootstrap-context:18ms@2117ms,bundle-tools:338ms@2455ms,system-prompt:5976ms@8431ms,session-resource-loader:2604ms@11035ms,agent-session:5ms@11040ms,stream-setup:30ms@11070ms

[diagnostic] long-running session: sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 sessionKey=agent:main:main state=processing age=135s queueDepth=1 reason=queued_behind_active_work classification=long_running activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=87s recovery=none

[diagnostic] stalled session: sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 sessionKey=agent:main:main state=processing age=140s queueDepth=0 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started

Slow RPC examples from the same window:

sessions.list 29482ms
chat.history 30201ms
sessions.list 20701ms
sessions.list 25379ms
models.list 29399ms
```
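For triage, the `stages=` field of the prep-stages trace line can be broken down to see where the ~11 s of pre-run time goes. The sketch below is a hypothetical parser, not an OpenClaw utility. It assumes each entry has the form `name:<duration>ms@<offset>ms`, where the offset appears to be the cumulative time at which the stage finished.

```typescript
// Hypothetical parser for illustration; the entry format is inferred
// from the single trace line in this report.
interface PrepStage {
  name: string;
  durationMs: number;
  endOffsetMs: number; // appears to be the cumulative end time of the stage
}

function parsePrepStages(stagesField: string): PrepStage[] {
  return stagesField.split(",").map((entry) => {
    const match = /^(.+):(\d+)ms@(\d+)ms$/.exec(entry.trim());
    if (!match) {
      throw new Error(`unrecognized stage entry: ${entry}`);
    }
    return {
      name: match[1],
      durationMs: Number(match[2]),
      endOffsetMs: Number(match[3]),
    };
  });
}

// The stages= value copied from the trace line above.
const stagesField =
  "workspace-sandbox:2ms@2ms,skills:1ms@3ms,core-plugin-tools:2096ms@2099ms," +
  "bootstrap-context:18ms@2117ms,bundle-tools:338ms@2455ms,system-prompt:5976ms@8431ms," +
  "session-resource-loader:2604ms@11035ms,agent-session:5ms@11040ms,stream-setup:30ms@11070ms";

const prepStages = parsePrepStages(stagesField);
const prepTotalMs = prepStages.reduce((sum, s) => sum + s.durationMs, 0);
// Copy before sorting so the original stage order is preserved.
const slowestStages = [...prepStages]
  .sort((a, b) => b.durationMs - a.durationMs)
  .slice(0, 3);

for (const stage of slowestStages) {
  console.log(`${stage.name}: ${stage.durationMs}ms of ${prepTotalMs}ms`);
}
```

On this trace the three slowest stages are `system-prompt` (5976 ms), `session-resource-loader` (2604 ms), and `core-plugin-tools` (2096 ms), which together account for 10676 of the 11070 ms.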

### Impact and severity

Local model use is effectively unusable for even trivial prompts on this setup, despite the backend itself being capable of high token/sec throughput outside OpenClaw.


### Additional information

The model provider invocation path for local providers on Windows may be doing CPU-heavy synchronous work or otherwise failing to isolate the local model request/stream processing from the Gateway event loop. The expensive pre-run prep is also visible (~11s), but the main failure appears after model_call:started, where the Gateway starts reporting starvation and stalled agent runs.
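As a sketch of what isolating such work from the event loop could look like, the example below processes items in small batches and yields between batches so that timers and RPC handlers get a turn. This is a generic technique, not a confirmed fix and not OpenClaw code. `yieldToEventLoop`, `processInBatches`, and the batch size are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical illustration; not OpenClaw code and not a confirmed fix.
// Yields to the event loop so pending timers, I/O callbacks, and RPC
// handlers can run. In Node.js, setImmediate is a common alternative.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(resolve, 0);
  });
}

// Applies `handle` to every item in small batches, yielding after each
// batch so one long CPU-bound pass cannot monopolize the thread.
async function processInBatches<T, R>(
  items: T[],
  handle: (item: T) => R,
  batchSize = 64,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    for (const item of items.slice(i, i + batchSize)) {
      results.push(handle(item));
    }
    await yieldToEventLoop();
  }
  return results;
}

// Usage: stand-in "chunks" in place of real stream data.
processInBatches(["a", "bb", "ccc"], (chunk) => chunk.length, 2).then((lengths) => {
  console.log(lengths);
});
```

Yielding only helps when the work can be split. For a single long synchronous call, moving it to a worker thread or a separate process would be the usual alternative.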

Edit:

### Possibly related: sessions.list stalls while local model call is active

While the local model call is stalled, repeated Gateway WS RPCs also become very slow:

- `18:53:58 [ws] ⇄ res ✓ sessions.list 20736ms`
- `18:54:19 [ws] ⇄ res ✓ sessions.list 20652ms`
- `18:55:19 [ws] ⇄ res ✓ sessions.list 21084ms`
- `18:56:06 [ws] ⇄ res ✓ sessions.list 25379ms`
- `19:18:54 [ws] ⇄ res ✓ sessions.list 20005ms`
- `19:19:14 [ws] ⇄ res ✓ sessions.list 20414ms`

These occur near event-loop starvation / stalled model-call logs:

- `fetch timeout ... timer delayed ... likely event-loop starvation`
- `stalled session ... activeWorkKind=model_call lastProgress=model_call:started`

Expected: local status/session RPCs should remain responsive even if a model backend is slow.

Actual: simple local RPCs take ~20-25s while the model call is active.
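To compare runs, the `[ws]` response lines can be summarized with a small script. The sketch below is a hypothetical helper, not an OpenClaw tool. It assumes each line ends with the RPC method name followed by a duration in milliseconds, as in the excerpts above.

```typescript
// Hypothetical helper for illustration; the line format is inferred
// from the [ws] excerpts in this report.
function parseWsLatencies(lines: string[], method: string): number[] {
  const latencies: number[] = [];
  for (const line of lines) {
    // Captures the last two tokens: the method name and "<n>ms".
    const match = /\[ws\].*\s(\S+)\s+(\d+)ms\s*$/.exec(line);
    if (match && match[1] === method) {
      latencies.push(Number(match[2]));
    }
  }
  return latencies;
}

const wsLines = [
  "18:53:58 [ws] ⇄ res ✓ sessions.list 20736ms",
  "18:54:19 [ws] ⇄ res ✓ sessions.list 20652ms",
  "18:55:19 [ws] ⇄ res ✓ sessions.list 21084ms",
  "18:56:06 [ws] ⇄ res ✓ sessions.list 25379ms",
  "19:18:54 [ws] ⇄ res ✓ sessions.list 20005ms",
  "19:19:14 [ws] ⇄ res ✓ sessions.list 20414ms",
];

const sessionsListMs = parseWsLatencies(wsLines, "sessions.list");
const minMs = Math.min(...sessionsListMs);
const maxMs = Math.max(...sessionsListMs);
const meanMs = Math.round(
  sessionsListMs.reduce((sum, ms) => sum + ms, 0) / sessionsListMs.length,
);

console.log(`sessions.list: n=${sessionsListMs.length} min=${minMs}ms mean=${meanMs}ms max=${maxMs}ms`);
```

For the six lines above this gives a minimum of 20005 ms, a mean of about 21378 ms, and a maximum of 25379 ms.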

