Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
After upgrading from 2026.4.24 to 2026.4.29, WebUI chat responses became extremely slow or timed out, while the gateway stayed healthy and direct Codex CLI inside the same Docker container replied quickly. Rolling back to 2026.4.24 made WebUI chat usable again.
Steps to reproduce
- Run OpenClaw 2026.4.29 in Docker on Ubuntu 24.04 with OpenAI Codex OAuth and model openai-codex/gpt-5.4.
- Open the Control WebUI.
- Start a new chat session or use /new.
- Send a trivial prompt: "Antworte exakt mit OK. Keine Tools nutzen. Keine Erklärung." ("Reply with exactly OK. Do not use tools. No explanation.")
- Observe that chat.send itself is fast, but the embedded agent startup/prep takes 90 seconds or more, or times out.
- Compare with direct Codex CLI inside the same container, which replies in about 7–11 seconds.
- Roll back to 2026.4.24 with the same setup; WebUI chat becomes usable again, around 10–20 seconds.
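The CLI-vs-WebUI timing comparison in the steps above can be sketched with a small wrapper. `time_prompt` is a hypothetical helper name; the commented-out `codex` invocation is the one used in this report and assumes the same container environment and credentials.

```shell
# Hypothetical helper: time a single prompt round-trip in whole seconds.
time_prompt() {
  local start end
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo "elapsed: $((end - start))s"
}

# Example, using the codex invocation from this report:
# time_prompt codex exec --cd /home/node/.openclaw/workspace \
#   --skip-git-repo-check --sandbox read-only -m gpt-5.4 \
#   "Antworte exakt mit OK. Keine Erklärung."
```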
Expected behavior
A trivial WebUI chat prompt should not spend around 90 seconds in embedded-agent startup/prep stages when direct Codex CLI inside the same container replies in about 7–11 seconds. The WebUI should respond in a reasonable time and should not trigger repeated long sessions.list/node.list/device.pair.list delays or stuck-session diagnostics.
Actual behavior
On 2026.4.29, the WebUI remains reachable and the gateway reports healthy. WebUI chat submission itself is fast, for example chat.send around 100–200ms, but the embedded agent takes 90 seconds or more before producing a reply, or fails with a timeout.
Observed slow stages for a trivial "reply OK" prompt:
startup stages totalMs=39182:
- model-resolution: 22485ms
- auth: 9317ms
- attempt-dispatch: 7373ms
prep stages totalMs=56699:
- core-plugin-tools: 30886ms
- system-prompt: 9299ms
- stream-setup: 10735ms
Other observed slow operations included:
- sessions.list: 55985ms / 61307ms
- node.list: 63236ms
- device.pair.list: 63278ms
- models.authStatus: 27357ms
- sessions.usage: 27867ms
Also observed:
- [fetch-timeout] fetch timeout reached; aborting operation
- stuck session diagnostics
- session-write-lock held for more than 15s
- agent cleanup timed out at pi-trajectory-flush
After rollback to 2026.4.24, cleaning stale UI/session state, and letting the gateway settle, chat response time returned to about 10–20 seconds.
OpenClaw version
Bad: 2026.4.29
Good/workaround: 2026.4.24
Operating system
Ubuntu 24.04 VPS
Install method
Docker / Docker Compose
OpenClaw runs as openclaw-gateway in a custom Docker image based on ghcr.io/openclaw/openclaw:. The image additionally includes Codex CLI, jq, ripgrep, ffmpeg, GitHub CLI, and Python tooling.
Model
openai-codex/gpt-5.4
Provider / routing chain
OpenClaw -> OpenAI Codex OAuth -> gpt-5.4
No direct OpenAI API key is used.
Additional provider/model setup details
The default/main model is configured as openai-codex/gpt-5.4.
Direct Codex CLI inside the same openclaw-gateway container works quickly:
codex exec --cd /home/node/.openclaw/workspace --skip-git-repo-check --sandbox read-only -m gpt-5.4 "Antworte exakt mit OK. Keine Erklärung."
Observed result:
- Codex CLI 0.128.0 replied "OK"
- real time: 0m7.684s
Earlier with Codex CLI 0.125.0:
- replied "OK"
- real time: 0m11.443s
This suggests Docker networking, Codex OAuth, and the model provider are fundamentally working.
Logs, screenshots, and evidence
Example from 2026.4.29 for a trivial WebUI "reply OK" prompt:
[ws] ⇄ res ✓ chat.send 102ms runId=fe1e8869-5713-4ec5-b935-a8930f7b0259
[agent/embedded] [trace:embedded-run] startup stages:
runId=fe1e8869-5713-4ec5-b935-a8930f7b0259
sessionId=c7938d77-8971-4d9b-92ba-4db3125c123a
phase=attempt-dispatch
totalMs=39182
stages=workspace:2ms@2ms,
runtime-plugins:3ms@5ms,
hooks:0ms@5ms,
model-resolution:22485ms@22490ms,
auth:9317ms@31807ms,
context-engine:2ms@31809ms,
attempt-dispatch:7373ms@39182ms
[agent/embedded] [trace:embedded-run] prep stages:
runId=fe1e8869-5713-4ec5-b935-a8930f7b0259
sessionId=c7938d77-8971-4d9b-92ba-4db3125c123a
phase=stream-ready
totalMs=56699
stages=workspace-sandbox:141ms@141ms,
skills:2ms@143ms,
core-plugin-tools:30886ms@31029ms,
bootstrap-context:563ms@31592ms,
bundle-tools:2656ms@34248ms,
system-prompt:9299ms@43547ms,
session-resource-loader:2409ms@45956ms,
agent-session:8ms@45964ms,
stream-setup:10735ms@56699ms
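To see which stages dominate, the `stages=` field of a trace line can be ranked by per-stage duration with standard tools. The stages string below is copied from the prep trace above; the pipeline itself is a generic sketch, not an OpenClaw tool.

```shell
# Rank embedded-run stages by duration; the stages string is copied
# from the prep-stages trace in this report.
stages='workspace-sandbox:141ms@141ms,skills:2ms@143ms,core-plugin-tools:30886ms@31029ms,bootstrap-context:563ms@31592ms,bundle-tools:2656ms@34248ms,system-prompt:9299ms@43547ms,session-resource-loader:2409ms@45956ms,agent-session:8ms@45964ms,stream-setup:10735ms@56699ms'

# Split entries, keep "duration name", sort numerically, show the top 3.
top3=$(printf '%s\n' "$stages" | tr ',' '\n' \
  | awk -F'[:@]' '{ sub(/ms$/, "", $2); print $2, $1 }' \
  | sort -rn | head -3)
printf '%s\n' "$top3"
```

For this trace the top entries are core-plugin-tools, stream-setup, and system-prompt, matching the summary earlier in the report.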
Other observed slow calls:
[ws] ⇄ res ✓ sessions.list 55985ms
[ws] ⇄ res ✓ sessions.list 61307ms
[ws] ⇄ res ✓ node.list 63236ms
[ws] ⇄ res ✓ device.pair.list 63278ms
[ws] ⇄ res ✓ models.authStatus 27357ms
[ws] ⇄ res ✓ sessions.usage 27867ms
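Slow calls like these can be pulled out of a gateway log by scanning for fields that end in "ms" and filtering on a threshold. The sample lines are taken from this report; the real log source is an assumption and is replaced here by a here-doc.

```shell
# Flag ws responses slower than 5 s; the field layout matches the log
# excerpts above. In a real setup, feed the gateway log file instead of
# the here-doc (the log path varies by install).
slow=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^[0-9]+ms$/ && $i + 0 > 5000)
      print $(i-1), $i
}' <<'EOF'
[ws] ⇄ res ✓ chat.send 102ms runId=fe1e8869-5713-4ec5-b935-a8930f7b0259
[ws] ⇄ res ✓ sessions.list 55985ms
[ws] ⇄ res ✓ node.list 63236ms
EOF
)
printf '%s\n' "$slow"
```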
Observed diagnostics:
[fetch-timeout] fetch timeout reached; aborting operation
[diagnostic] stuck session:
sessionId=unknown
sessionKey=agent:main:main
state=processing
age=123s
queueDepth=1
reason=processing_with_queued_work
[session-write-lock] releasing lock held for 62016ms (max=15000ms):
/home/node/.openclaw/agents/main/sessions/sessions.json.lock
[agent/embedded] agent cleanup timed out:
step=pi-trajectory-flush
timeoutMs=10000
Session state observations:
- /home/node/.openclaw/agents/main/sessions/sessions.json was about 8.9M
- A temporary stale-looking .jsonl.lock file was observed and later disappeared
- After rollback, WebUI cache hard reload, and session state settling, 2026.4.24 became usable again
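The session-state observations above can be spot-checked with a small helper. `check_session_state` is a hypothetical name; the default path is the one from this report, and "stale" is approximated as a lock file older than five minutes.

```shell
# Hypothetical helper: report the sessions.json size and any lock files
# that look stale (older than 5 minutes). Default path is from this report.
check_session_state() {
  local dir="${1:-/home/node/.openclaw/agents/main/sessions}"
  [ -d "$dir" ] || { echo "no such dir: $dir"; return 1; }
  du -h "$dir"/sessions.json 2>/dev/null
  find "$dir" -maxdepth 1 -name '*.lock' -mmin +5 -print
}

# Usage: check_session_state            # default install path
#        check_session_state /some/dir  # custom location
```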
A rate limit was observed during debugging but was later ruled out as the main cause because direct Codex CLI calls inside the same container worked quickly after quota recovered.
Impact and severity
Affected: WebUI chat on my Docker-based OpenClaw setup using OpenAI Codex OAuth.
Severity: High for this setup because WebUI chat becomes effectively unusable on 2026.4.29.
Frequency: Reproduced repeatedly after upgrading to 2026.4.29.
Consequence: Simple prompts can take 90+ seconds or time out, while the same model via direct Codex CLI responds in about 7–11 seconds.
Workaround: Roll back to 2026.4.24.
Additional information
I also tested whether too many main-agent skills caused the issue. Reducing the main-agent skill list did not materially improve the core-plugin-tools latency:
Before reducing skills:
core-plugin-tools: 30886ms
After reducing main-agent skills:
core-plugin-tools: 29555ms
memory-core dreaming was disabled during debugging.
After rolling back from 2026.4.29 to 2026.4.24, I initially saw a WebUI/client cache mismatch:
models.list error:
invalid models.list params: at root: unexpected property 'view'
After hard-reloading the WebUI / using a fresh session, models.list became fast again.
I can provide additional logs or run specific debug commands if helpful.