Summary
On OpenClaw 2026.4.29, every openai/* model embedded run hangs for the full timeout window with zero tokens emitted and ends with FailoverError: LLM request timed out. Direct curl from the same host to api.openai.com using the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The gateway-internal OpenAI call never produces output.
This affects openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano (the last triggered by the active-memory plugin). Other providers (anthropic/claude-opus-4-7, google/gemini-3-flash) work fine in the same gateway. Only openai/* (direct API-key route, not codex OAuth) is broken.
Environment
- OpenClaw
2026.4.29 (a448042)
- macOS 26.4.1 arm64 (Mac Studio M3 Ultra)
- Node
v25.9.0
- Auth: direct OpenAI Platform API key (
OPENAI_API_KEY env), no Codex OAuth
- Provider config:
models.providers.openai.timeoutSeconds: 600, baseUrl: https://api.openai.com/v1
Reproduction
- Configure
agents.defaults.model.primary (or any sub-agent model) to openai/gpt-5.5 (also tested openai/gpt-5.4).
- Trigger any embedded run (sub-agent spawn,
/model gpt5.5 from chat, or active-memory pre-reply).
- Run hangs through full timeout window. Zero input/output tokens. No HTTP response observed.
Tried with and without:
agents.defaults.models["openai/gpt-5.5"].streaming: false
agents.defaults.timeoutSeconds: 900
plugins.entries.active-memory.enabled: false
None of these affect the outcome.
Proof OpenAI itself is healthy from the same host
$ curl -sS -w "HTTP: %{http_code}, time: %{time_total}s\n" \
https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'
{ ... valid response with content ... }
HTTP: 200, time: 2.161660s
DNS/network confirmed clean: api.openai.com resolves to 172.66.0.243 / 162.159.140.245, IPv4 + IPv6 reachable in < 100 ms, no proxy on user shell or gateway process, openclaw doctor clean (0 plugin errors, no warnings about OpenAI auth, runtime, or model config).
Gateway log evidence
Embedded run trace shows prep stages all complete normally and stream-ready fires, then nothing comes back from OpenAI until the embedded-run timeout aborts:
[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892
stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms,
bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms,
session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms
[agent/embedded] embedded run timeout: runId=<id> timeoutMs=120000
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=openai/gpt-5.4 profile=-
[model-fallback/decision] decision=candidate_failed requested=openai/gpt-5.4 candidate=openai/gpt-5.4 reason=timeout next=none detail=LLM request timed out.
This pattern repeats across openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano.
The active-memory plugin also fails in the same shape (its own internal gpt-5.4-nano calls time out), but disabling active-memory does NOT fix the user-visible embedded runs — the assistant-stage OpenAI call hangs independently.
Stats reported by the runtime for hung sub-agent runs: runtime 1m10s–6m46s • tokens 0 (in 0 / out 0) — i.e. no bytes ever come back from the OpenAI request before the watchdog fires.
What I'm asking for
- Any pointer to a known issue / fix on this OpenClaw build for the OpenAI direct-API-key path
- Or a diagnostics flag / DEBUG knob that would log whether the OpenAI HTTP request is actually being dispatched (and what response, if any, the SDK is receiving) — so we can localize whether this is in the OpenAI SDK, an undici/fetch pool issue, or further upstream
Happy to grab additional logs with whatever flag you suggest.
Summary
On OpenClaw 2026.4.29, every
openai/*model embedded run hangs for the full timeout window with zero tokens emitted and ends withFailoverError: LLM request timed out.Directcurlfrom the same host toapi.openai.comusing the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The gateway-internal OpenAI call never produces output.This affects
openai/gpt-5.5,openai/gpt-5.4, andopenai/gpt-5.4-nano(the last triggered by theactive-memoryplugin). Other providers (anthropic/claude-opus-4-7,google/gemini-3-flash) work fine in the same gateway. Onlyopenai/*(direct API-key route, not codex OAuth) is broken.Environment
2026.4.29 (a448042)v25.9.0OPENAI_API_KEYenv), no Codex OAuthmodels.providers.openai.timeoutSeconds: 600,baseUrl: https://api.openai.com/v1Reproduction
agents.defaults.model.primary(or any sub-agent model) toopenai/gpt-5.5(also testedopenai/gpt-5.4)./model gpt5.5from chat, or active-memory pre-reply).Tried with and without:
agents.defaults.models["openai/gpt-5.5"].streaming: falseagents.defaults.timeoutSeconds: 900plugins.entries.active-memory.enabled: falseNone of these affect the outcome.
Proof OpenAI itself is healthy from the same host
DNS/network confirmed clean:
api.openai.comresolves to172.66.0.243 / 162.159.140.245, IPv4 + IPv6 reachable in < 100 ms, no proxy on user shell or gateway process,openclaw doctorclean (0 plugin errors, no warnings about OpenAI auth, runtime, or model config).Gateway log evidence
Embedded run trace shows prep stages all complete normally and
stream-readyfires, then nothing comes back from OpenAI until the embedded-run timeout aborts:This pattern repeats across
openai/gpt-5.5,openai/gpt-5.4, andopenai/gpt-5.4-nano.The active-memory plugin also fails in the same shape (its own internal
gpt-5.4-nanocalls time out), but disabling active-memory does NOT fix the user-visible embedded runs — the assistant-stage OpenAI call hangs independently.Stats reported by the runtime for hung sub-agent runs:
runtime 1m10s–6m46s • tokens 0 (in 0 / out 0)— i.e. no bytes ever come back from the OpenAI request before the watchdog fires.What I'm asking for
Happy to grab additional logs with whatever flag you suggest.