Hermes Bug Report: GPT-5.4 via OpenAI Codex Stream Backfill Failure
Summary
GPT-5.4 model via openai-codex provider fails in normal chat mode with "Empty/malformed response" fallback, but succeeds in verbose mode (-v flag). The issue is in Codex Responses API stream backfill logic.
Environment
Steps to Reproduce
- Set model to gpt-5.4 with openai-codex provider
- Run:
hermes chat -q "hello"
- Observe: "⚠️ Empty/malformed response — switching to fallback..."
- Falls back to claude-haiku-4-5-20251001
Expected Behavior
gpt-5.4 should respond normally, as it does in verbose mode.
Actual Behavior
- Normal mode: Triggers fallback with empty response
- Verbose mode (
hermes chat -q "hello" -v): Works perfectly, returns 29 output tokens, displays "Received. I'm here."
Root Cause
File: ~/.hermes/hermes-agent/run_agent.py, lines 7475-7503
The Codex response validation logic detects that response.output is an empty list. The code attempts fallback to response.output_text, but that field is also absent or empty. This causes the response to be marked response_invalid = True, triggering the provider fallback chain.
The underlying issue is in the stream backfill logic (line 7476-7478):
_run_codex_stream's backfill from output_item.done events and text-delta
synthesis both failed to populate output.
The Codex Responses API stream is returning events but they're not being properly backfilled into response.output. This suggests either:
- The stream event parser is not correctly converting Codex stream events to output items
- The Codex backend changed its stream response format
- There's a timing issue in stream collection before response object creation
Logs (Verbose Mode - Working)
11:58:19 - run_agent - DEBUG - Codex stream: backfilled 2 output items from stream events
11:58:19 - root - DEBUG - API Response received - Model: gpt-5.4, Usage: ResponseUsage(...)
11:58:19 - run_agent - INFO - API call #1: model=gpt-5.4 provider=openai-codex in=12641 out=29 total=12670 latency=5.3s
🤖 Assistant: Received. I'm here.
In verbose mode, "Codex stream: backfilled 2 output items" succeeds. In normal mode, this likely fails silently.
Code Location
File: ~/.hermes/hermes-agent/run_agent.py
Lines: 7463-7503 (validation logic)
Related: Lines 3443-3462 (_normalize_codex_response fallback handling)
Stream handler: _run_codex_stream() method (needs investigation)
Affected Users
Any user attempting to use gpt-5.4 or other Codex Responses API models via openai-codex provider in normal (non-verbose) chat mode.
Workaround
Use verbose mode:
Or switch to a different provider/model combination (e.g., openai with gpt-4o, or anthropic with claude-opus).
Logs Attached
- Verbose mode output: Shows successful API call with 29 tokens
- Config:
model.default: gpt-5.4, provider: openai-codex
- Stream backfill fails in normal mode but succeeds in verbose mode
Severity
Medium - Feature works in verbose mode; only affects interactive terminal UX.
Additional Context
The fix applied in run_agent.py at lines 7463-7503 correctly identifies that this is a Codex response and checks for output_text fallback. However, the underlying stream backfill mechanism still needs investigation. The root cause is likely in how _run_codex_stream() processes Codex Responses API events before creating the response object.
Hermes Bug Report: GPT-5.4 via OpenAI Codex Stream Backfill Failure
Summary
GPT-5.4 model via openai-codex provider fails in normal chat mode with "Empty/malformed response" fallback, but succeeds in verbose mode (
-vflag). The issue is in Codex Responses API stream backfill logic.Environment
Steps to Reproduce
hermes chat -q "hello"Expected Behavior
gpt-5.4 should respond normally, as it does in verbose mode.
Actual Behavior
hermes chat -q "hello" -v): Works perfectly, returns 29 output tokens, displays "Received. I'm here."Root Cause
File:
~/.hermes/hermes-agent/run_agent.py, lines 7475-7503The Codex response validation logic detects that
response.outputis an empty list. The code attempts fallback toresponse.output_text, but that field is also absent or empty. This causes the response to be markedresponse_invalid = True, triggering the provider fallback chain.The underlying issue is in the stream backfill logic (line 7476-7478):
The Codex Responses API stream is returning events but they're not being properly backfilled into
response.output. This suggests either:Logs (Verbose Mode - Working)
In verbose mode, "Codex stream: backfilled 2 output items" succeeds. In normal mode, this likely fails silently.
Code Location
File:
~/.hermes/hermes-agent/run_agent.pyLines: 7463-7503 (validation logic)
Related: Lines 3443-3462 (
_normalize_codex_responsefallback handling)Stream handler:
_run_codex_stream()method (needs investigation)Affected Users
Any user attempting to use gpt-5.4 or other Codex Responses API models via openai-codex provider in normal (non-verbose) chat mode.
Workaround
Use verbose mode:
Or switch to a different provider/model combination (e.g., openai with gpt-4o, or anthropic with claude-opus).
Logs Attached
model.default: gpt-5.4,provider: openai-codexSeverity
Medium - Feature works in verbose mode; only affects interactive terminal UX.
Additional Context
The fix applied in run_agent.py at lines 7463-7503 correctly identifies that this is a Codex response and checks for
output_textfallback. However, the underlying stream backfill mechanism still needs investigation. The root cause is likely in how_run_codex_stream()processes Codex Responses API events before creating the response object.