Skip to content

GPT-5.4 via openai-codex fails in normal mode with 'Empty/malformed response' fallback #5883

@BenedictExec

Description

@BenedictExec

Hermes Bug Report: GPT-5.4 via OpenAI Codex Stream Backfill Failure

Summary

GPT-5.4 model via openai-codex provider fails in normal chat mode with "Empty/malformed response" fallback, but succeeds in verbose mode (-v flag). The issue is in Codex Responses API stream backfill logic.

Environment

Steps to Reproduce

  1. Set model to gpt-5.4 with openai-codex provider
  2. Run: hermes chat -q "hello"
  3. Observe: "⚠️ Empty/malformed response — switching to fallback..."
  4. Falls back to claude-haiku-4-5-20251001

Expected Behavior

gpt-5.4 should respond normally, as it does in verbose mode.

Actual Behavior

  • Normal mode: Triggers fallback with empty response
  • Verbose mode (hermes chat -q "hello" -v): Works perfectly, returns 29 output tokens, displays "Received. I'm here."

Root Cause

File: ~/.hermes/hermes-agent/run_agent.py, lines 7475-7503

The Codex response validation logic detects that response.output is an empty list. The code attempts fallback to response.output_text, but that field is also absent or empty. This causes the response to be marked response_invalid = True, triggering the provider fallback chain.

The underlying issue is in the stream backfill logic (line 7476-7478):

_run_codex_stream's backfill from output_item.done events and text-delta 
synthesis both failed to populate output.

The Codex Responses API stream is returning events but they're not being properly backfilled into response.output. This suggests either:

  1. The stream event parser is not correctly converting Codex stream events to output items
  2. The Codex backend changed its stream response format
  3. There's a timing issue in stream collection before response object creation

Logs (Verbose Mode - Working)

11:58:19 - run_agent - DEBUG - Codex stream: backfilled 2 output items from stream events
11:58:19 - root - DEBUG - API Response received - Model: gpt-5.4, Usage: ResponseUsage(...)
11:58:19 - run_agent - INFO - API call #1: model=gpt-5.4 provider=openai-codex in=12641 out=29 total=12670 latency=5.3s
🤖 Assistant: Received. I'm here.

In verbose mode, "Codex stream: backfilled 2 output items" succeeds. In normal mode, this likely fails silently.

Code Location

File: ~/.hermes/hermes-agent/run_agent.py
Lines: 7463-7503 (validation logic)
Related: Lines 3443-3462 (_normalize_codex_response fallback handling)
Stream handler: _run_codex_stream() method (needs investigation)

Affected Users

Any user attempting to use gpt-5.4 or other Codex Responses API models via openai-codex provider in normal (non-verbose) chat mode.

Workaround

Use verbose mode:

hermes chat -v

Or switch to a different provider/model combination (e.g., openai with gpt-4o, or anthropic with claude-opus).

Logs Attached

  • Verbose mode output: Shows successful API call with 29 tokens
  • Config: model.default: gpt-5.4, provider: openai-codex
  • Stream backfill fails in normal mode but succeeds in verbose mode

Severity

Medium - Feature works in verbose mode; only affects interactive terminal UX.

Additional Context

The fix applied in run_agent.py at lines 7463-7503 correctly identifies that this is a Codex response and checks for output_text fallback. However, the underlying stream backfill mechanism still needs investigation. The root cause is likely in how _run_codex_stream() processes Codex Responses API events before creating the response object.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt builderprovider/openaiOpenAI / Codex Responses APItype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions