[Bug]: 2026.4.29 — all openai/* embedded runs hang with zero tokens until timeout; direct curl to OpenAI works in 2s #76174

@pamopsdev

Summary

On OpenClaw 2026.4.29, every embedded run on an openai/* model hangs for the full timeout window with zero tokens emitted, then ends with FailoverError: LLM request timed out. A direct curl from the same host to api.openai.com with the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The OpenAI call made inside the gateway never produces output.

This affects openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano (the last triggered by the active-memory plugin). Other providers (anthropic/claude-opus-4-7, google/gemini-3-flash) work fine in the same gateway. Only openai/* (direct API-key route, not codex OAuth) is broken.

Environment

  • OpenClaw 2026.4.29 (a448042)
  • macOS 26.4.1 arm64 (Mac Studio M3 Ultra)
  • Node v25.9.0
  • Auth: direct OpenAI Platform API key (OPENAI_API_KEY env), no Codex OAuth
  • Provider config: models.providers.openai.timeoutSeconds: 600, baseUrl: https://api.openai.com/v1

Reproduction

  1. Configure agents.defaults.model.primary (or any sub-agent model) to openai/gpt-5.5 (also tested openai/gpt-5.4).
  2. Trigger any embedded run (sub-agent spawn, /model gpt5.5 from chat, or active-memory pre-reply).
  3. The run hangs for the full timeout window, with zero input/output tokens and no HTTP response observed.

Tried with and without:

  • agents.defaults.models["openai/gpt-5.5"].streaming: false
  • agents.defaults.timeoutSeconds: 900
  • plugins.entries.active-memory.enabled: false

None of these affect the outcome.

Proof OpenAI itself is healthy from the same host

$ curl -sS -w "HTTP: %{http_code}, time: %{time_total}s\n" \
    https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'

{ ... valid response with content ... }
HTTP: 200, time: 2.161660s

DNS/network confirmed clean: api.openai.com resolves to 172.66.0.243 / 162.159.140.245, IPv4 + IPv6 reachable in < 100 ms, no proxy on user shell or gateway process, openclaw doctor clean (0 plugin errors, no warnings about OpenAI auth, runtime, or model config).
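The curl check above exercises curl's HTTP stack, not Node's. As a rough complement, the sketch below sends a request through Node's built-in fetch (which is backed by undici) and records the time to the first body byte, with a hard abort timeout. It is a standalone probe, not OpenClaw's actual request path; probeFirstByte, FetchLike, and ProbeResult are hypothetical names introduced here, and the injectable fetchImpl parameter exists only so the function can be exercised without network access.

```typescript
// Hypothetical helper: measure time-to-first-byte through Node's fetch.
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

interface ProbeResult {
  status: number;
  firstByteMs: number | null; // null when the body is empty
  bytes: number;
}

async function probeFirstByte(
  url: string,
  init: RequestInit,
  timeoutMs: number,
  fetchImpl: FetchLike = fetch,
): Promise<ProbeResult> {
  const controller = new AbortController();
  // Abort the request if nothing completes within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { ...init, signal: controller.signal });
    let firstByteMs: number | null = null;
    let bytes = 0;
    if (res.body) {
      const reader = res.body.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        if (firstByteMs === null) firstByteMs = Date.now() - started;
        if (value) bytes += value.byteLength;
      }
    }
    return { status: res.status, firstByteMs, bytes };
  } finally {
    clearTimeout(timer);
  }
}

// Example call mirroring the curl request (not executed here):
// probeFirstByte("https://api.openai.com/v1/chat/completions", {
//   method: "POST",
//   headers: {
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify({ model: "gpt-5.5", messages: [{ role: "user", content: "hi" }] }),
// }, 30000).then(console.log, console.error);
```

If this probe returns quickly under the same Node version and user as the gateway, Node's fetch on this host is probably not the culprit and the problem is more likely inside the gateway's own client setup. If it also hangs until the abort fires, that would point toward the Node networking layer.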

Gateway log evidence

The embedded-run trace shows all prep stages completing normally and stream-ready firing; after that, nothing comes back from OpenAI until the embedded-run timeout aborts the run:

[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892
  stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms,
  bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms,
  session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms

[agent/embedded] embedded run timeout: runId=<id> timeoutMs=120000
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=openai/gpt-5.4 profile=-
[model-fallback/decision] decision=candidate_failed requested=openai/gpt-5.4 candidate=openai/gpt-5.4 reason=timeout next=none detail=LLM request timed out.

This pattern repeats across openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano.
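The prep-stage trace above can be checked mechanically. The sketch below parses the trace line quoted in this report and sums the per-stage durations, to confirm that the listed stages account for the whole totalMs and that no prep time is unaccounted for before stream-ready. parsePrepTrace and PrepTrace are hypothetical names, and the regular expressions assume the log format is exactly as quoted here.

```typescript
interface PrepTrace {
  totalMs: number;
  stages: { name: string; ms: number }[];
}

// Hypothetical parser for the "[trace:embedded-run] prep stages" line.
function parsePrepTrace(trace: string): PrepTrace {
  const total = /totalMs=(\d+)/.exec(trace);
  if (!total) throw new Error("no totalMs field in trace");
  // Only look at the text after "stages=" so other fields are ignored.
  const start = trace.indexOf("stages=");
  const stageText = start >= 0 ? trace.slice(start + "stages=".length) : "";
  const stages: { name: string; ms: number }[] = [];
  const re = /([a-z][a-z-]*):(\d+)ms/g; // e.g. "core-plugin-tools:5216ms"
  let m: RegExpExecArray | null;
  while ((m = re.exec(stageText)) !== null) {
    stages.push({ name: m[1], ms: Number(m[2]) });
  }
  return { totalMs: Number(total[1]), stages };
}

// Trace line copied from the gateway log in this report.
const trace =
  "[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892 " +
  "stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms, " +
  "bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms, " +
  "session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms";

const parsed = parsePrepTrace(trace);
const stageSum = parsed.stages.reduce((acc, s) => acc + s.ms, 0);
console.log(`stages=${parsed.stages.length} sum=${stageSum}ms total=${parsed.totalMs}ms`);
// prints "stages=9 sum=12892ms total=12892ms"
```

The nine stages sum to exactly the reported 12892 ms, so prep takes about 13 s of the 120 s embedded-run timeout and the remaining wait falls after stream-ready.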

The active-memory plugin also fails the same way (its own internal gpt-5.4-nano calls time out), but disabling active-memory does NOT fix the user-visible embedded runs: the assistant-stage OpenAI call hangs independently.

Stats reported by the runtime for hung sub-agent runs: runtime 1m10s–6m46s • tokens 0 (in 0 / out 0). In other words, no bytes ever come back from the OpenAI request before the watchdog fires.

What I'm asking for

  • Any pointer to a known issue / fix on this OpenClaw build for the OpenAI direct-API-key path
  • Or a diagnostics flag / DEBUG knob that would log whether the OpenAI HTTP request is actually being dispatched (and what response, if any, the SDK is receiving), so we can localize whether this is in the OpenAI SDK, an undici/fetch pool issue, or further upstream

Happy to grab additional logs with whatever flag you suggest.
