Skip to content

fix(agent): recover Responses streams with null output#11182

Closed
richard950825-sys wants to merge 4 commits into
NousResearch:mainfrom
richard950825-sys:fix/responses-null-output-backfill
Closed

fix(agent): recover Responses streams with null output#11182
richard950825-sys wants to merge 4 commits into
NousResearch:mainfrom
richard950825-sys:fix/responses-null-output-backfill

Conversation

@richard950825-sys

Copy link
Copy Markdown

What does this PR do?

Fixes a Responses streaming compatibility crash where an OpenAI-compatible provider streams valid response.output_item.done events, then sends a terminal response whose response.output is null.

Existing recovery handled response.output == [], but output=None can make the OpenAI SDK raise inside stream.get_final_response() before Hermes reaches that backfill logic. This PR recovers from the already-streamed output events in both the main agent stream path and the auxiliary Codex/Responses adapter.

Related Issue

Fixes #11179

Type of Change

  • ?? Bug fix (non-breaking change that fixes an issue)
  • ? New feature (non-breaking change that adds functionality)
  • ?? Security fix
  • ?? Documentation update
  • ? Tests (adding or improving test coverage)
  • ?? Refactor (no behavior change)
  • ?? New skill (bundled or hub)

Changes Made

  • run_agent.py
    • Adds a shared Responses stream backfill helper for missing/null/empty terminal output.
    • Captures terminal response.completed / response.incomplete / response.failed objects from stream events before falling back to SDK final parsing.
    • Recovers from the specific SDK TypeError("'NoneType' object is not iterable") final-parse failure only when streamed output items or text deltas are available.
    • Applies the same backfill behavior to responses.create(stream=True) fallback events.
  • agent/auxiliary_client.py
    • Adds equivalent null-output recovery for the Codex auxiliary chat-completions shim used by compression, memory flushes, vision, session search, and web extraction paths.
  • tests/run_agent/test_run_agent_codex_responses.py
    • Adds a fake-stream regression where response.completed.response.output is None but a prior response.output_item.done exists.
  • tests/agent/test_auxiliary_client.py
    • Adds the same regression for the auxiliary client path.

How to Test

  1. Run targeted regressions:

    uv run --extra dev pytest tests/run_agent/test_run_agent_codex_responses.py::test_run_codex_stream_completed_output_none_backfills_output_items tests/agent/test_auxiliary_client.py::TestCodexAuxiliaryClientStreamRecovery::test_stream_completed_output_none_backfills_output_items -q
  2. Run the related test files:

    uv run --extra dev pytest tests/run_agent/test_run_agent_codex_responses.py tests/agent/test_auxiliary_client.py -q
  3. Optional syntax check:

    uv run --extra dev python -m py_compile run_agent.py agent/auxiliary_client.py tests/run_agent/test_run_agent_codex_responses.py tests/agent/test_auxiliary_client.py

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Alibaba Cloud Linux 3 (OpenAnolis Edition)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) ? or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys ? or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows ? or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide ? or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior ? or N/A

For New Skills

N/A

Screenshots / Logs

Targeted regressions:

..                                                                       [100%]
2 passed, 2 warnings in 10.54s

Related test files:

........................................................................ [ 50%]
........................................................................ [100%]
144 passed, 142 warnings in 13.72s

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 25, 2026
@JeremyDev87

Copy link
Copy Markdown

Review — independent read of current head

  • Head reviewed: e4cdf2e4a01a79c92f4ba72c4ff1c631dfd6bb9d
  • Scope: static review of the diff (4 files, +305/−99) against issue [Bug]: Responses stream crashes when terminal response.output is null #11179. I did not check out the branch or run the test suite locally; test results below are from the PR description, not re-executed.
  • Mergeability: branch currently reports CONFLICTING / DIRTY against main — a rebase/merge of main is needed before this can land.
  • CI: no checks reported on the branch.

The fix is well-targeted and the issue write-up is clear. A few things worth addressing:

High (please verify) — main stream path now bypasses get_final_response() for all providers

Previously _run_codex_stream only handled response.incomplete/response.failed in-loop and always fell through to stream.get_final_response(). It now returns inside the loop on the first response.completed carrying a non-None response:

elif event_type in {"response.completed", "response.incomplete", "response.failed"}:
    resp_obj = getattr(event, "response", None)
    ...
    if resp_obj is not None:
        return self._backfill_responses_stream_output(resp_obj, ...)

This removes the SDK's get_final_response() / parse_response() post-processing from the success path, not just the null-output case — so the blast radius is every request through this path. Could you confirm nothing downstream relied on SDK final-parse behavior the raw terminal-event response doesn't already provide (e.g. structured-output/text_format parsing, typed-object normalization, usage accounting)? The auxiliary and create_stream_fallback paths already returned the event response directly, so there's precedent, but this is new behavior for the main path.

Medium — the get_final_response() TypeError-recovery branch isn't covered by the new tests

Both new regressions emit a response.completed event whose response is a non-None object with output=None. That takes the in-loop return path and never calls get_final_response() — the final_error=TypeError(...) on the fake stream only acts as a guard ("terminal event should be used before final parsing"). So the central try/except recovery:

except TypeError as exc:
    if "NoneType" not in err_text or "not iterable" not in err_text:
        raise
    ...
    final_response = SimpleNamespace(status="completed", model=self.model, output=[], usage=None)

has no test that actually triggers it. Suggest adding a case where the terminal event's response is None (or absent) and get_final_response() raises the TypeError, so the recovery path is exercised end-to-end.

Low

  • Brittle error matching: recovery keys off "NoneType" ... "not iterable" substrings in the exception message, which can shift across Python/SDK versions. Reasonable as a narrowing guard, but a comment noting the dependency would help future maintainers.
  • Duplicate helpers: _backfill_responses_output (auxiliary_client.py) and _backfill_responses_stream_output (run_agent.py) are nearly identical; only the latter handles model/usage. Fine across modules for now, but they'll drift.
  • Type hint: model: str = None should be Optional[str] = None.

Nice

Adding has_tool_calls detection in _run_codex_create_stream_fallback is a real correctness improvement — the old fallback could collapse a function_call response into a synthesized plain-text message, and the shared helper now guards that with elif collected_text_deltas and not has_tool_calls.

Conclusion: not a formal approval — merge conflict must be resolved, and I'd like the High item confirmed and the Medium test gap closed. None of these are hard blockers to the core fix's correctness for the documented null-output scenario.

alessandropcostabr added a commit to alessandropcostabr/hermes-agent that referenced this pull request May 27, 2026
The chatgpt.com/backend-api/codex backend can emit a terminal event
(response.completed/failed/incomplete) whose `output` is null. The openai
SDK (2.24.0) then crashes with "'NoneType' object is not iterable" at
lib/_parsing/_responses.py:61 (`for output in response.output`).

A captured traceback proved the crash happens INSIDE the event loop —
accumulate_event() -> parse_response() during `for event in stream` — i.e.
before get_final_response() is ever reached. run_codex_stream only caught
httpx/RuntimeError, so the TypeError escaped, was classified upstream as a
non-retryable local error, and the raw "'NoneType' object is not iterable"
was surfaced to the user (e.g. the Telegram gateway).

Fix (defense in depth, both Codex streaming code paths):
- run_codex_stream (agent/codex_runtime.py): wrap the stream-event loop with
  an `except TypeError` that recovers from the output items already yielded
  via response.output_item.done (or from streamed text deltas), or returns an
  empty-output response so validate_response() routes to retry/fallback. Also
  guard get_final_response() and extend the backfill to treat a null `output`.
- _CodexCompletionsAdapter (agent/auxiliary_client.py): the auxiliary client
  has a parallel Codex stream loop used for summaries/titles/iteration-limit
  recaps with the same vulnerability — mirror the same recovery there.

Adds regression tests for both files covering the iteration-crash path, the
get_final_response crash path, and the nothing-collected case.

Verified in production: under a degraded codex backend returning null output
on every call, the gateway recovered 9 consecutive crashes (4-8 output items
each) with zero raw errors surfaced, instead of dying on every turn.

Upstream: NousResearch#11179 (PR NousResearch#11182).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1

Copy link
Copy Markdown
Contributor

Superseded by #32963 (cherry-picked from @carltonawong's PR #32890, which ported your fix shape onto current main and added iterator-time regression coverage for the failure shape that broke the Codex backend today). Thanks for the original fix @richard950825-sys — your work is what the salvage was based on. Closes #11179.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Responses stream crashes when terminal response.output is null

4 participants