fix(agent): recover Codex Responses streams with null output#32890

Closed
carltonawong wants to merge 1 commit into NousResearch:main from carltonawong:fix/codex-null-output-stream-fallback

@carltonawong carltonawong commented May 27, 2026

Contributor

Summary

  • recover Codex/Responses streams when the OpenAI SDK raises TypeError: 'NoneType' object is not iterable while parsing a terminal response.completed with response.output = null
  • preserve already-streamed response.output_item.done items or streamed text deltas instead of surfacing a bare NoneType failure
  • apply the same protection to auxiliary Codex calls used by compression/title/vision-style paths

Why a new PR

This is related to the existing report in #11179 and overlaps with #11182.

I opened a separate PR because #11182 is currently conflicted/stale against main, and its tests exercise the case where Hermes successfully receives a terminal event / reaches final-response parsing. The failure shape I reproduced can occur earlier: the SDK raises during stream iteration while parsing response.completed.response.output = null, before Hermes reaches the post-loop get_final_response() backfill path.

This PR keeps the same intent as #11182, but targets the iterator-time parser failure directly and adds regression coverage for both the main agent Codex stream and the auxiliary Codex adapter.

Fixes #11179.
Related to #11182; this PR ports the fix shape to current main and adds iterator-time parser regression coverage.

Changes

  • agent/codex_runtime.py
    • detect the SDK null-output parser TypeError
    • recover from collected response.output_item.done events / text deltas
    • backfill both None and empty-list outputs
    • fall back to raw create(stream=True) only when no recoverable events were collected
  • agent/auxiliary_client.py
    • apply the same null-output recovery for auxiliary Codex Responses streams
    • backfill None as well as empty-list final outputs
  • tests
    • main stream regression where __iter__ raises before get_final_response()
    • auxiliary adapter regression for the same iterator-time failure
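
The backfill rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in the diff: `backfill_output`, its parameters, and the `SimpleNamespace` message shape are hypothetical stand-ins for whatever `agent/codex_runtime.py` actually uses.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def backfill_output(
    final_output: Optional[List[Any]],
    done_items: List[Any],
    text_deltas: List[str],
) -> Optional[List[Any]]:
    """Pick an output list for a response whose terminal output may be None or [].

    All names and object shapes here are assumptions made for illustration.
    """
    if final_output:
        # Non-empty output from the SDK: nothing to repair.
        return final_output
    if done_items:
        # Prefer complete items already seen via response.output_item.done.
        return list(done_items)
    text = "".join(text_deltas)
    if text:
        # Synthesize one assistant message from the streamed text deltas.
        part = SimpleNamespace(type="output_text", text=text)
        return [SimpleNamespace(type="message", role="assistant", content=[part])]
    # Nothing recoverable: the caller would fall back to create(stream=True).
    return None
```

Treating `None` and `[]` the same way (`if final_output:`) is what extends the existing empty-list backfill to the null-output case.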

Verification

  • python -m pytest -o addopts='' tests/run_agent/test_run_agent_codex_responses.py::test_run_codex_stream_falls_back_when_stream_iteration_parses_null_output tests/agent/test_auxiliary_client.py::TestCodexAuxiliaryAdapterNullOutputRecovery::test_recovers_output_item_when_sdk_raises_during_iteration -q
  • python -m pytest -o addopts='' tests/run_agent/test_run_agent_codex_responses.py tests/agent/test_auxiliary_client.py -q — 243 passed
  • git diff --check
  • python -m py_compile agent/codex_runtime.py agent/auxiliary_client.py

@carltonawong carltonawong changed the title from "fix(agent): recover Codex streams with null output" to "fix(agent): recover Codex Responses streams with null output" on May 27, 2026
@JeremyDev87


Review — static analysis at head db62b9fc763a6936a69e9936ef204c35fe5d4d17

Scope: static review of the 4-file diff plus full context of run_codex_stream, run_codex_create_stream_fallback, and the auxiliary Codex adapter. I did not execute the suite locally (the author reports 243 passed); CI shows no checks on the branch.

Summary

The fix is well-targeted: the SDK can raise TypeError: 'NoneType' object is not iterable during stream iteration (parsing response.completed.response.output = null), which happens before get_final_response(), so the existing post-loop backfill never runs. Guarding the iteration and recovering from already-streamed output_item.done / text deltas is the right shape. Extending the backfill condition from "empty list" to "None or empty list" is also correct, and the two iterator-time regression tests are nicely scoped.

A few things worth addressing before merge.


Medium

1. The except TypeError in run_codex_stream is broader than the failure it targets.
In agent/codex_runtime.py, the handler wraps the entire with active_client.responses.stream(...) block — including Hermes's own event handling and callbacks (_fire_stream_delta, _fire_reasoning_delta). If any of those raise 'NoneType' object is not iterable (a genuine bug in callback/handler code), _responses_null_output_iterable_error will misclassify it as the SDK null-output case and silently "recover," returning a partial response as success and masking the real bug.

The auxiliary adapter does this correctly: its inner try/except TypeError wraps only the SDK iteration + get_final_response(), not the surrounding logic. Consider narrowing the main path to match — wrap only the for event in stream + get_final_response() region and let callback exceptions propagate.

2. Test coverage gaps for the new branches.
Both new tests exercise the output_item.done recovery path only. Untested:

  • text-delta-only synthesis recovery (no output_item.done collected) — main and aux,
  • the recovered is None → _run_codex_create_stream_fallback(...) branch in run_codex_stream,
  • the new _out is None (vs empty-list) backfill branch,
  • the new has_tool_calls suppression added to run_codex_create_stream_fallback.

The has_tool_calls suppression is a behavior change (the fallback previously synthesized text unconditionally) and currently has no regression test.
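
To make the narrowing suggested in Medium 1 concrete, here is a rough sketch in which only the iteration step is guarded, so a `TypeError` raised by a callback still propagates. `consume_stream` and `on_event` are hypothetical names; the real `run_codex_stream` has more moving parts (including `get_final_response()`), so treat this as an outline of the scope, not a drop-in patch.

```python
from typing import Any, Callable, Iterable, Iterator, List


def consume_stream(stream: Iterable[Any], on_event: Callable[[Any], None]) -> List[Any]:
    """Collect events, guarding only the iteration step against the null-output TypeError."""
    collected: List[Any] = []
    iterator: Iterator[Any] = iter(stream)
    while True:
        try:
            # Only the SDK-side step (fetching the next parsed event) is guarded.
            event = next(iterator)
        except StopIteration:
            break
        except TypeError as exc:
            message = str(exc)
            if "NoneType" in message and "not iterable" in message:
                # Assumed SDK null-output parse failure: keep what was collected.
                break
            raise  # any other TypeError is not ours to swallow
        collected.append(event)
        # Callback runs outside the guarded region, so its errors surface as-is.
        on_event(event)
    return collected
```

With this shape, a `'NoneType' object is not iterable` error raised inside `on_event` is never classified as the SDK case, which is the misclassification Medium 1 describes.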


Low

3. String-matching detection is fragile. _responses_null_output_iterable_error keys off "NoneType" in str(exc) and "not iterable" in str(exc). If the SDK wraps the error or a future CPython changes the message wording, recovery silently stops and the bare error resurfaces. It degrades to current behavior (not dangerous), but a comment pinning the expected SDK version / message would make the assumption explicit.

4. Recovered responses carry usage=None. Both backfill helpers set usage=None, so token/cost accounting is lost for any recovered turn. Acceptable given the SDK failed, but telemetry/billing will have a blind spot on these turns — worth a note.

5. Type annotations. model: str = None should be Optional[str] = None in both helpers (Optional is already imported in auxiliary_client.py).

6. Duplicated helpers. _responses_null_output_iterable_error and the backfill helper are now defined near-identically in both modules. A shared util would prevent the two copies drifting if the detection string ever changes in only one place.
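
For items 3 and 6 together, a single shared detector could look roughly like the sketch below. The function name and its location are hypothetical, and the two substrings are taken from the message quoted in this PR, not from any documented SDK contract.

```python
def is_null_output_iterable_error(exc: BaseException) -> bool:
    """Return True if exc looks like "'NoneType' object is not iterable".

    Assumption: the message wording matches what current CPython emits when
    iterating over None. If the SDK wraps the error or the wording changes,
    this returns False and the original exception surfaces unchanged.
    """
    if not isinstance(exc, TypeError):
        return False
    message = str(exc)
    return "NoneType" in message and "not iterable" in message
```

Both `agent/codex_runtime.py` and `agent/auxiliary_client.py` could import one such helper, so the detection strings live in exactly one place.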


Looks good

  • Non-matching TypeErrors are correctly re-raised.
  • Unrecoverable null-output correctly falls back to create(stream=True) (main path) or raises (aux), rather than fabricating output.
  • The has_tool_calls suppression makes the fallback path consistent with the documented intent in the main path ("a function_call response with incidental text should not be collapsed into a plain-text message").

Conclusion: Sound fix for a real iterator-time failure. No correctness blocker found in static review, but I'd recommend narrowing the main-path except TypeError scope (Medium 1) and adding the missing branch tests (Medium 2) before merge. This is a static review only — not a formal approval, and I did not run the suite.

@alt-glitch alt-glitch added labels on May 27, 2026: type/bug (Something isn't working), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/openai (OpenAI / Codex Responses API), P2 (Medium: degraded but workaround exists)
@alt-glitch

Collaborator

Duplicate of #32884 (same root cause: Codex Responses stream null output, #11179). Also competing with #32888 and #32891. All four PRs fix codex_runtime.py for the same SDK TypeError: 'NoneType' object is not iterable; #32890 and #32888 also fix auxiliary_client.py.

@alt-glitch alt-glitch added the duplicate (This issue or pull request already exists) label on May 27, 2026
@carltonawong

Contributor Author

Thanks — makes sense to consolidate around #32884.

One gap I wanted to flag from the live failure I hit: this same SDK response.output = null parser failure also showed up through the auxiliary Codex adapter path, not only the main codex_runtime.py path. In my case it affected compression/title-style calls as well as normal Discord turns.

This PR includes a small auxiliary-path regression in tests/agent/test_auxiliary_client.py plus the matching recovery in agent/auxiliary_client.py. If #32884 is the canonical fix, feel free to cherry-pick or port that auxiliary coverage there; happy to close this once that path is covered.

@bananohands


Nice

@konstantinreed


Thanks, this helped me recover my server quickly.

I cherry-picked the commit onto v2026.5.16 / v0.14.0 as a temporary hotfix. I chose this PR because it also covers the auxiliary summary/title/compression paths.

@teknium1

Contributor

Salvaged onto current main via #32963 (merged as 43a3f11). Your authorship is preserved in git log on the fix commit. Thanks Carlton — clean diff, helpers were well-factored, and the iterator-time regression tests were exactly what we needed. Closes #11179.

Development

Successfully merging this pull request may close these issues.

[Bug]: Responses stream crashes when terminal response.output is null

6 participants