fix(agent): recover Responses streams with null output #11182
richard950825-sys wants to merge 4 commits into
Review — independent read of current head
The fix is well-targeted and the issue write-up is clear. A few things worth addressing: High (please verify) — main stream path now bypasses
The chatgpt.com/backend-api/codex backend can emit a terminal event (response.completed/failed/incomplete) whose `output` is null. The openai SDK (2.24.0) then crashes with "'NoneType' object is not iterable" at lib/_parsing/_responses.py:61 (`for output in response.output`).

A captured traceback proved the crash happens INSIDE the event loop (accumulate_event() -> parse_response() during `for event in stream`), i.e. before get_final_response() is ever reached. run_codex_stream only caught httpx/RuntimeError, so the TypeError escaped, was classified upstream as a non-retryable local error, and the raw "'NoneType' object is not iterable" was surfaced to the user (e.g. the Telegram gateway).

Fix (defense in depth, both Codex streaming code paths):

- run_codex_stream (agent/codex_runtime.py): wrap the stream-event loop with an `except TypeError` that recovers from the output items already yielded via response.output_item.done (or from streamed text deltas), or returns an empty-output response so validate_response() routes to retry/fallback. Also guard get_final_response() and extend the backfill to treat a null `output`.
- _CodexCompletionsAdapter (agent/auxiliary_client.py): the auxiliary client has a parallel Codex stream loop used for summaries/titles/iteration-limit recaps with the same vulnerability; mirror the same recovery there.

Adds regression tests for both files covering the iteration-crash path, the get_final_response crash path, and the nothing-collected case.

Verified in production: under a degraded codex backend returning null output on every call, the gateway recovered 9 consecutive crashes (4-8 output items each) with zero raw errors surfaced, instead of dying on every turn.

Upstream: NousResearch#11179 (PR NousResearch#11182).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
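The recovery described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `agent/codex_runtime.py`: `FakeStream`, `collect_with_recovery`, the `recovered` flag, and the dict-shaped items are hypothetical stand-ins, and real SDK stream objects are richer than this.

```python
from types import SimpleNamespace


class FakeStream:
    """Hypothetical stand-in for an SDK stream that crashes on the terminal event."""

    def __init__(self, events):
        self._events = events

    def __iter__(self):
        for event in self._events:
            if event.type == "response.completed" and event.response.output is None:
                # Mimics the SDK iterating a null `output` while parsing the event.
                raise TypeError("'NoneType' object is not iterable")
            yield event


def collect_with_recovery(stream):
    """Iterate a stream; on TypeError, rebuild a response from what was already seen."""
    items, text_parts = [], []
    try:
        for event in stream:
            if event.type == "response.output_item.done":
                items.append(event.item)
            elif event.type == "response.output_text.delta":
                text_parts.append(event.delta)
            elif event.type == "response.completed":
                return event.response
    except TypeError:
        pass  # fall through to the recovery logic below
    if items:
        return SimpleNamespace(output=items, recovered=True)
    if text_parts:
        message = {"type": "message", "text": "".join(text_parts)}
        return SimpleNamespace(output=[message], recovered=True)
    # Nothing collected: an empty output lets the caller's validation retry or fall back.
    return SimpleNamespace(output=[], recovered=True)
```

In this sketch, a stream that yields one `response.output_item.done` event and then fails on the terminal event still produces a response whose `output` holds that item, while a stream that fails immediately produces an empty `output` for the caller to treat as retryable.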
Superseded by #32963 (cherry-picked from @carltonawong's PR #32890, which ported your fix shape onto current main and added iterator-time regression coverage for the failure shape that broke the Codex backend today). Thanks for the original fix @richard950825-sys; your work is what the salvage was based on. Closes #11179.
What does this PR do?
Fixes a Responses streaming compatibility crash where an OpenAI-compatible provider streams valid `response.output_item.done` events, then sends a terminal response whose `response.output` is `null`.

Existing recovery handled `response.output == []`, but `output=None` can make the OpenAI SDK raise inside `stream.get_final_response()` before Hermes reaches that backfill logic. This PR recovers from the already-streamed output events in both the main agent stream path and the auxiliary Codex/Responses adapter.

Related Issue
Fixes #11179
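To make the failure mode from the PR summary concrete: a check written for an empty list does not match `None`, and iterating `None` raises the `TypeError` quoted in the issue. The snippet below is a standalone illustration in plain Python; `normalize_output` and `iterate_unsafely` are hypothetical helpers, not functions from Hermes or the OpenAI SDK.

```python
def normalize_output(output):
    """Treat a null `output` the same as an empty list (hypothetical helper)."""
    return [] if output is None else list(output)


def iterate_unsafely(output):
    """Mirrors the shape of `for output in response.output` with no null guard."""
    return [item for item in output]


# An `== []` comparison is False for None, so null output slips past that check.
matches_empty_check = None == []

try:
    iterate_unsafely(None)
    error_message = ""
except TypeError as exc:
    # Iterating None is what produces the message surfaced to users.
    error_message = str(exc)
```

Normalizing before iterating is one way to cover both shapes with a single code path; the PR itself recovers from already-streamed events rather than relying only on a guard like this.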
Type of Change
Changes Made
- `run_agent.py`
  - `response.completed` / `response.incomplete` / `response.failed` objects from stream events before falling back to SDK final parsing.
  - `TypeError("'NoneType' object is not iterable")` final-parse failure only when streamed output items or text deltas are available.
  - `responses.create(stream=True)` fallback events.
- `agent/auxiliary_client.py`
- `tests/run_agent/test_run_agent_codex_responses.py`
  - `response.completed.response.output` is `None` but a prior `response.output_item.done` exists.
- `tests/agent/test_auxiliary_client.py`

How to Test
Run targeted regressions:
Run the related test files:
Optional syntax check:
Checklist
Code
- `fix(scope):`, `feat(scope):`, etc.)
- `pytest tests/ -q` and all tests pass

Documentation & Housekeeping
- `docs/`, docstrings), or N/A
- `cli-config.yaml.example` if I added/changed config keys, or N/A
- `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows, or N/A

For New Skills
N/A
Screenshots / Logs
Targeted regressions:
Related test files: