[Bug] openai-codex provider returns empty response.output from agent loop on multiple gpt-5.x models (works in isolated direct call) #5736

@eve-coda

Type: 🐛 Bug
Severity: Medium. Non-fatal (fallback recovers), but degrades any Hermes setup whose primary model is openai-codex/*. Affects all tested gpt-5.x models on the openai-codex provider.
Maintainer note: filed by an outside contributor with no triage permissions to apply the bug label directly; please retag if appropriate.


Summary

Hermes' agent loop reliably trips the "Invalid API response: response.output is empty" validation at run_agent.py:7022-7024 when the primary model is set to any tested gpt-5.x model on the openai-codex provider (both gpt-5.4 and gpt-5.3-codex). The retry loop then exhausts its retries and falls back to the configured fallback model (minimax/minimax-m2.7 via openrouter in our case), which delivers a normal response. The user-facing symptom on a chat platform (Telegram, in our case) is:

⚠️ Empty/malformed response — switching to fallback...
🔄 Primary model failed — switching to fallback: minimax/minimax-m2.7 via openrouter

The codex token itself is healthy (hermes auth list shows openai-codex/codex as default, OAuth device-code, no errors), and a direct minimal call through Hermes' own agent.auxiliary_client.resolve_provider_client('openai-codex', model='gpt-5.3-codex', raw_codex=True) then client.responses.stream(...) produces a valid streamed response with text deltas:

from agent.auxiliary_client import resolve_provider_client

client, model = resolve_provider_client('openai-codex', model='gpt-5.3-codex', raw_codex=True)
with client.responses.stream(
    model='gpt-5.3-codex',
    instructions='You are a helpful assistant.',
    input=[{'role': 'user', 'content': 'Say hello in exactly 3 words.'}],
    store=False,
) as stream:
    # Yields ResponseTextDeltaEvent → "Hello to you!" with finish_reason=stop
    ...

So the underlying credential, model, base URL, and stream pipeline all work in isolation. Something the agent loop adds to the request makes the codex endpoint return response.output = [].

Symptom in logs

WARNING root: Invalid API response (retry 1/3): response.output is empty | Provider: model=gpt-5.4
WARNING root: Invalid API response (retry 2/3): response.output is empty | Provider: model=gpt-5.4
ERROR root: Invalid API response after 3 retries.

Also reproduced with model: gpt-5.3-codex / provider: openai-codex after editing ~/.hermes/config.yaml and restarting the gateway. Same retry-then-fallback path.
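For readers who want the control flow spelled out, here is a minimal sketch of a retry-then-fallback loop of the kind the logs suggest. It is not Hermes' actual implementation: call_with_fallback, the providers list, and the is_valid callback are hypothetical names used only for illustration, and the log wording is only modeled on the lines above.

```python
import logging
from typing import Callable, Sequence, Tuple

log = logging.getLogger(__name__)


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[], object]]],
    is_valid: Callable[[object], bool],
    max_retries: int = 3,
) -> Tuple[str, object]:
    """Try each provider in order; retry invalid responses up to max_retries times.

    Hypothetical helper: `providers` is a list of (name, zero-argument callable).
    """
    for name, call in providers:
        for attempt in range(1, max_retries + 1):
            response = call()
            if is_valid(response):
                return name, response
            log.warning(
                "Invalid API response (retry %d/%d) | Provider: %s",
                attempt, max_retries, name,
            )
        log.error("Invalid API response after %d retries.", max_retries)
    raise RuntimeError("all providers returned invalid responses")
```

With a primary that always returns an empty output and a healthy fallback, this sketch calls the primary three times and then returns the fallback's response, which matches the symptom described in this report.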

Versions / environment

  • Hermes commit: local rollup, but the failing code paths in run_agent.py (_preflight_codex_api_kwargs, _run_codex_stream, the empty-output validation block) are identical to current origin/main (e651e041).
  • Python: 3.11.15
  • OS: Arch Linux (kernel 6.18.7-arch1-1)
  • openai SDK: bundled in ~/.hermes/hermes-agent/venv
  • Auth: openai-codex via hermes auth add openai-codex --label codex --set-default (OAuth device-code), confirmed active in hermes auth list

Config that reproduces

# ~/.hermes/config.yaml
model:
  default: gpt-5.3-codex   # also reproduces with gpt-5.4
  provider: openai-codex
fallback_providers:
- provider: openrouter
  model: minimax/minimax-m2.7

Restart gateway, send any inbound message via a configured platform (Telegram, in our case). Empty-output validation trips, fallback activates.

What I checked

  1. Codex-side request shape: my direct test confirms the codex Responses endpoint requires instructions, input (as a list), store=False, and stream=True. _preflight_codex_api_kwargs (run_agent.py:3013) correctly enforces all of these in the kwargs Hermes sends. So the kwargs should be valid.
  2. Model name is in _PROVIDER_MODELS["openai-codex"]: gpt-5.3-codex is the latest entry; gpt-5.4 is not listed (it's a copilot/opencode-zen model), but the failure is the same with both, so model-validation isn't the discriminator.
  3. Auth round-trip: direct minimal call with the exact same client returned by resolve_provider_client works. So the OAuth token, base URL, and streaming pipeline are healthy.
  4. No 4xx exception: the agent loop reaches the empty-output validation branch, which only fires when getattr(response, 'output', None) is a list of length 0. That means the codex endpoint returned a 200 OK with an empty output array, not a 400 error.
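The condition in point 4 can be sketched as a small classifier. This is an illustration of the check as described above, not the code in run_agent.py: classify_output is a hypothetical name, and the SimpleNamespace objects only stand in for real SDK responses.

```python
from types import SimpleNamespace


def classify_output(response: object) -> str:
    """Classify a Responses-style object by its `output` attribute."""
    output = getattr(response, "output", None)
    if output is None:
        return "missing"   # attribute absent or explicitly None
    if isinstance(output, list) and len(output) == 0:
        return "empty"     # the case this issue is about: 200 OK, output == []
    return "ok"


# Stand-in objects for illustration only.
print(classify_output(SimpleNamespace(output=[])))        # empty
print(classify_output(SimpleNamespace(output=["item"])))  # ok
print(classify_output(SimpleNamespace()))                 # missing
```

Distinguishing "empty" from "missing" matters here because only the former indicates that the endpoint answered successfully with nothing in it.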

Hypotheses (in order of likelihood)

  1. Tools or an oversized prompt sent by the agent loop trip a silent codex degradation. The minimal direct test sends one tiny user message and zero tools. The agent loop sends the full system prompt, conversation history, and 50+ tool definitions. Codex may be returning empty output as a degraded-mode response when something in the kwargs is too large or malformed.
  2. Hermes' codex adapter is mis-handling a reasoning config or other passthrough kwarg that the bare client doesn't include.
  3. gpt-5.3-codex is restricted on this account's codex entitlement and codex returns empty output rather than a clean 4xx for entitlement violations. Less likely because direct call works, but possible if entitlement is per-route.
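One way to separate hypotheses 1 and 2 would be to replay the failing request with pieces removed and see which variant first returns a non-empty output. The sketch below is a hedged outline of that bisection. The function names are hypothetical, the "tools" and "reasoning" keys are assumed to be how those options appear in the kwargs, and send is a caller-supplied function that performs the request and returns the output list.

```python
from typing import Any, Callable, Dict, List, Tuple

Kwargs = Dict[str, Any]


def build_variants(full: Kwargs) -> List[Tuple[str, Kwargs]]:
    """Return labelled request variants, from the full payload to reduced ones."""
    def without(*keys: str) -> Kwargs:
        return {k: v for k, v in full.items() if k not in keys}

    variants: List[Tuple[str, Kwargs]] = [("full", dict(full))]
    if "tools" in full:
        variants.append(("no_tools", without("tools")))
    if "reasoning" in full:
        variants.append(("no_reasoning", without("reasoning")))
    if isinstance(full.get("input"), list) and len(full["input"]) > 1:
        trimmed = dict(full)
        trimmed["input"] = full["input"][-1:]  # keep only the latest message
        variants.append(("last_message_only", trimmed))
    return variants


def probe_variants(full: Kwargs, send: Callable[[Kwargs], list]) -> Dict[str, bool]:
    """Map each variant label to True when `send` returned a non-empty output."""
    return {label: bool(send(kwargs)) for label, kwargs in build_variants(full)}
```

If only the no_tools variant succeeds, that points at hypothesis 1; if only no_reasoning succeeds, hypothesis 2 becomes more plausible. If every variant fails, the entitlement hypothesis deserves a closer look.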

What would help diagnose

Happy to attach a HERMES_DUMP_REQUESTS=1 payload (sanitized) showing the exact api_kwargs Hermes is sending in the failing case, plus any structured response metadata visible to the validation block. Will follow up with that if the maintainers want it β€” let me know.
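As a rough idea of what "sanitized" could mean for such a payload, here is a minimal sketch that redacts credential-like keys, collapses long strings to a length marker, and reports a few sizes relevant to the oversized-payload hypothesis. It assumes the dump can be loaded as a plain dict. The key names in SENSITIVE_KEYS and the 200-character threshold are arbitrary choices made for this example, not anything Hermes defines.

```python
import json
from typing import Any

# Assumed set of credential-like keys; extend as needed for a real dump.
SENSITIVE_KEYS = {"authorization", "api_key", "access_token", "refresh_token"}


def sanitize(value: Any, key: str = "") -> Any:
    """Recursively redact sensitive values and shorten long strings."""
    if str(key).lower() in SENSITIVE_KEYS:
        return "<redacted>"
    if isinstance(value, dict):
        return {k: sanitize(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    if isinstance(value, str) and len(value) > 200:
        return f"<str len={len(value)}>"
    return value


def summarize(api_kwargs: dict) -> dict:
    """Report sizes that bear on the oversized-payload hypothesis."""
    return {
        "tool_count": len(api_kwargs.get("tools") or []),
        "input_items": len(api_kwargs.get("input") or []),
        "approx_bytes": len(json.dumps(api_kwargs, default=str)),
    }
```

Running summarize on the unsanitized kwargs and attaching its result alongside the sanitize output would keep the size information even though long strings are collapsed.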

Workaround

Switching the primary model to a non-codex provider (we ended up on qwen/qwen3.6-plus:free via openrouter) avoids the issue entirely. So for users hitting this, the path of least resistance is:

model:
  default: openai/gpt-5.4   # or anthropic/claude-opus-4-6, etc.
  provider: openrouter

…which routes around the codex-specific code path.

Metadata

Assignees

No one assigned

Labels

P2 (Medium: degraded but workaround exists); comp/agent (Core agent loop, run_agent.py, prompt builder); provider/openai (OpenAI / Codex Responses API); type/bug (Something isn't working)
