Summary
This looks like a more specific root-cause report for the broader openai/* embedded run hangs until timeout symptom reported in #76174 (and possibly related to #75824).
On OpenClaw 2026.4.29 (a448042), embedded OpenAI Responses streaming can stall because the SSE sanitization layer appears to starve or corrupt the stream before the OpenAI SDK parser sees the downstream events.
A clean repro on macOS succeeds immediately when sanitizeOpenAISdkSseResponse() is bypassed for one run, which strongly isolates the timeout to that sanitizer path rather than general OpenAI connectivity, auth, or the embedded runner itself.
Environment
- OpenClaw
2026.4.29 (a448042)
- macOS arm64
- Node
24.13.1
- Provider: OpenAI direct API key path
- Models observed failing in embedded runs before the bypass:
openai/gpt-5.4, openai/gpt-5.3-codex, openai/gpt-4o-mini
What we observed before the bypass
For embedded runs using the OpenAI Responses path:
- request is dispatched successfully
- OpenAI returns
200 OK
- content type is
text/event-stream; charset=utf-8
- response body is present
- the embedded runner reaches
stream-ready
- a first event of type
start is observed
- then no usable downstream reply events arrive and the run eventually hits the idle watchdog / embedded timeout
In other words, this is not a simple fetch never returned failure.
Critical isolation result
I patched the installed bundle temporarily so sanitizeOpenAISdkSseResponse() can be bypassed with an env var for a single run:
OPENCLAW_BYPASS_OPENAI_SDK_SSE_SANITIZE=1
Then I ran a fresh unique-session repro:
OPENCLAW_DISABLE_BONJOUR=1 \
OPENCLAW_BYPASS_OPENAI_SDK_SSE_SANITIZE=1 \
openclaw agent --local --agent main \
--session-id diag-bypass-sse-1777763671 \
--model openai/gpt-4o-mini \
--message "Reply with exactly: hello" \
--json --timeout 90
That run completed successfully and returned:
Total duration was ~30s, no fallback used.
Why this points at the sanitizer
With the bypass enabled, the downstream SSE/event flow looks normal again. The run emits/observes events like:
response.created
response.in_progress
response.output_item.added
response.content_part.added
response.output_text.delta
response.output_text.done
response.completed
Before the bypass, the stream reached the initial startup boundary but stalled before those downstream messages materialized.
Relevant diagnostic evidence
Before bypass:
[diag:fetch-sse] guarded-fetch-response provider=openai model=gpt-4o-mini status=200 contentType=text/event-stream; charset=utf-8 bodyPresent=true url=https://api.openai.com/v1/responses
[diag:fetch-sse] sanitize-start status=200 contentType=text/event-stream; charset=utf-8 url=https://api.openai.com/v1/responses
[diag:fetch-sse] sanitize-pull index=1 done=false bytes=492
[diag:openai-sdk] parse-response-stream status=200 contentType=text/event-stream; charset=utf-8 bodyPresent=true
[diag:embedded-stream] first-event provider=openai model=gpt-4o-mini api=openai-responses elapsedMs=1642 type=start
[diag:fetch-sse] sanitize-pull index=2 done=false bytes=1369
...
embedded run timeout
After bypass:
[diag:fetch-sse] sanitize-bypassed status=200 contentType=text/event-stream; charset=utf-8 url=https://api.openai.com/v1/responses
[diag:openai-sdk] sse-message index=5 event=response.output_text.delta dataPrefix={"type":"response.output_text.delta",..."delta":"hello"...}
[openai-transport] [diag:openai-responses] text-delta provider=openai model=gpt-4o-mini api=openai-responses contentIndex=0 deltaLen=5 totalLen=5
[openai-transport] [diag:openai-responses] response-completed provider=openai model=gpt-4o-mini api=openai-responses status=completed stopReason=stop contentBlocks=1
Suspect area
The problematic layer appears to be the SSE sanitation/filtering path around the function:
sanitizeOpenAISdkSseResponse(response)
That layer sits between the raw text/event-stream HTTP response and the OpenAI SDK SSE parser.
Expected behavior
OpenClaw should preserve valid OpenAI Responses SSE frames so embedded OpenAI runs can stream normally without requiring a bypass.
Actual behavior
The sanitizer path appears to interfere with or starve the SSE stream, causing embedded runs to hang until timeout.
Workaround
A temporary bundle-local env-gated bypass of the sanitizer restored correct behavior immediately:
OPENCLAW_BYPASS_OPENAI_SDK_SSE_SANITIZE=1
Request
Could maintainers take a look at the SSE sanitizer path for the OpenAI Responses stream on 2026.4.29?
This seems like a concrete root-cause lead for the broader openai/* embedded runs hang until timeout reports such as #76174.
Summary
This looks like a more specific root-cause report for the broader
openai/* embedded run hangs until timeoutsymptom reported in #76174 (and possibly related to #75824).On OpenClaw
2026.4.29 (a448042), embedded OpenAI Responses streaming can stall because the SSE sanitization layer appears to starve or corrupt the stream before the OpenAI SDK parser sees the downstream events.A clean repro on macOS succeeds immediately when
sanitizeOpenAISdkSseResponse()is bypassed for one run, which strongly isolates the timeout to that sanitizer path rather than general OpenAI connectivity, auth, or the embedded runner itself.Environment
2026.4.29 (a448042)24.13.1openai/gpt-5.4,openai/gpt-5.3-codex,openai/gpt-4o-miniWhat we observed before the bypass
For embedded runs using the OpenAI Responses path:
200 OKtext/event-stream; charset=utf-8stream-readystartis observedIn other words, this is not a simple
fetch never returnedfailure.Critical isolation result
I patched the installed bundle temporarily so
sanitizeOpenAISdkSseResponse()can be bypassed with an env var for a single run:Then I ran a fresh unique-session repro:
OPENCLAW_DISABLE_BONJOUR=1 \ OPENCLAW_BYPASS_OPENAI_SDK_SSE_SANITIZE=1 \ openclaw agent --local --agent main \ --session-id diag-bypass-sse-1777763671 \ --model openai/gpt-4o-mini \ --message "Reply with exactly: hello" \ --json --timeout 90That run completed successfully and returned:
Total duration was ~30s, no fallback used.
Why this points at the sanitizer
With the bypass enabled, the downstream SSE/event flow looks normal again. The run emits/observes events like:
response.createdresponse.in_progressresponse.output_item.addedresponse.content_part.addedresponse.output_text.deltaresponse.output_text.doneresponse.completedBefore the bypass, the stream reached the initial startup boundary but stalled before those downstream messages materialized.
Relevant diagnostic evidence
Before bypass:
After bypass:
Suspect area
The problematic layer appears to be the SSE sanitation/filtering path around the function:
sanitizeOpenAISdkSseResponse(response)That layer sits between the raw
text/event-streamHTTP response and the OpenAI SDK SSE parser.Expected behavior
OpenClaw should preserve valid OpenAI Responses SSE frames so embedded OpenAI runs can stream normally without requiring a bypass.
Actual behavior
The sanitizer path appears to interfere with or starve the SSE stream, causing embedded runs to hang until timeout.
Workaround
A temporary bundle-local env-gated bypass of the sanitizer restored correct behavior immediately:
Request
Could maintainers take a look at the SSE sanitizer path for the OpenAI Responses stream on
2026.4.29?This seems like a concrete root-cause lead for the broader
openai/* embedded runs hang until timeoutreports such as #76174.