[Bug] openai-codex provider returns empty response.output from agent loop on multiple gpt-5.x models (works in isolated direct call)

> **Type:** 🐛 Bug
> **Severity:** Medium — non-fatal (fallback recovers), but degrades any Hermes setup whose primary model is `openai-codex/*`. Affects all tested gpt-5.x codex models.
> **Maintainer note:** filed by an outside contributor (no triage permissions to apply the `bug` label directly — please retag if appropriate).

---

## Summary

Hermes' agent loop reliably trips the *"Invalid API response: response.output is empty"* validation at `run_agent.py:7022-7024` when the primary model is set to **any tested gpt-5.x model on the `openai-codex` provider** — both `gpt-5.4` and `gpt-5.3-codex`. The retry loop then exhausts max retries and falls back to the configured fallback model (`minimax/minimax-m2.7` via `openrouter` in our case), which delivers a normal response. The user-facing symptom on a chat platform (Telegram, in our case) is:

> ⚠️ Empty/malformed response — switching to fallback...
> 🔄 Primary model failed — switching to fallback: minimax/minimax-m2.7 via openrouter

The codex token itself is healthy (`hermes auth list` shows `openai-codex/codex` as default, OAuth device-code, no errors). A **direct minimal call** that obtains a client from Hermes' own `agent.auxiliary_client.resolve_provider_client('openai-codex', model='gpt-5.3-codex', raw_codex=True)` and then calls `client.responses.stream(...)` produces a valid streamed response with text deltas:

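```python
from agent.auxiliary_client import resolve_provider_client

# Same client factory the agent loop uses, but with a minimal request.
client, model = resolve_provider_client('openai-codex', model='gpt-5.3-codex', raw_codex=True)
with client.responses.stream(
    model='gpt-5.3-codex',
    instructions='You are a helpful assistant.',
    input=[{'role': 'user', 'content': 'Say hello in exactly 3 words.'}],
    store=False,
) as stream:
    # Observed: ResponseTextDeltaEvent items → "Hello to you!" with finish_reason=stop
    for event in stream:
        if event.type == 'response.output_text.delta':
            print(event.delta, end='', flush=True)
```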

So the underlying credential, model, base URL, and stream pipeline all work in isolation. Something the agent loop adds to the request makes the codex endpoint return `response.output = []`.

## Symptom in logs

```
WARNING root: Invalid API response (retry 1/3): response.output is empty | Provider: model=gpt-5.4
WARNING root: Invalid API response (retry 2/3): response.output is empty | Provider: model=gpt-5.4
ERROR root: Invalid API response after 3 retries.
```

Also reproduced with `model: gpt-5.3-codex / provider: openai-codex` after editing `~/.hermes/config.yaml` and restarting the gateway. Same retry-then-fallback path.
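For readers unfamiliar with the flow, the retry-then-fallback path described above can be sketched roughly as follows. This is an illustration only, not the code in `run_agent.py`: `call_with_fallback`, `EmptyOutputError`, and the idea of modelling each provider as a zero-argument callable are all hypothetical names chosen for the sketch.

```python
from typing import Callable, List, Sequence, Tuple


class EmptyOutputError(Exception):
    """Hypothetical error: every provider kept returning an empty output list."""


def call_with_fallback(
    providers: Sequence[Callable[[], List]],
    max_retries: int = 3,
) -> Tuple[int, List]:
    """Try each provider up to max_retries times, in order.

    Returns (index of the provider that answered, its output list).
    """
    for index, call in enumerate(providers):
        for _attempt in range(max_retries):
            output = call()
            if output:  # a non-empty output list counts as a valid response
                return index, output
        # Retries exhausted for this provider: move on to the next one.
    raise EmptyOutputError("all providers returned empty output")
```

With a primary that always returns `[]` and a healthy fallback, the primary is called `max_retries` times before the fallback answers, which matches the sequence seen in the logs.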

## Versions / environment

- Hermes commit: local rollup, but the failing code paths in `run_agent.py` (`_preflight_codex_api_kwargs`, `_run_codex_stream`, the empty-output validation block) are identical to current `origin/main` (`e651e041`).
- Python: 3.11.15
- OS: Arch Linux (kernel 6.18.7-arch1-1)
- `openai` SDK: bundled in `~/.hermes/hermes-agent/venv`
- Auth: `openai-codex` via `hermes auth add openai-codex --label codex --set-default` (OAuth device-code), confirmed active in `hermes auth list`

## Config that reproduces

```yaml
# ~/.hermes/config.yaml
model:
  default: gpt-5.3-codex   # also reproduces with gpt-5.4
  provider: openai-codex
fallback_providers:
- provider: openrouter
  model: minimax/minimax-m2.7
```

Restart the gateway, then send any inbound message through a configured platform (Telegram, in our case). The empty-output validation trips and the fallback activates.

## What I checked

1. **Codex-side request shape**: my direct test confirms the codex Responses endpoint requires `instructions`, `input` (as a list), `store=False`, and `stream=True`. `_preflight_codex_api_kwargs` (`run_agent.py:3013`) correctly enforces all of these in the kwargs Hermes sends. So the kwargs *should* be valid.
2. **Model name is in `_PROVIDER_MODELS["openai-codex"]`**: `gpt-5.3-codex` is the latest entry; `gpt-5.4` is *not* listed (it's a `copilot`/`opencode-zen` model), but the failure is the same with both, so model-validation isn't the discriminator.
3. **Auth round-trip**: direct minimal call with the exact same client returned by `resolve_provider_client` works. So the OAuth token, base URL, and streaming pipeline are healthy.
4. **No 4xx exception**: the agent loop is reaching the empty-output validation branch, which only fires when `getattr(response, 'output', None)` is a list of length 0 — meaning the codex endpoint returned a 200 OK with an empty output array, not a 400 error.
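The condition in item 4 can be restated as a small predicate. This is a sketch of the check as I understand it from reading the validation block, not a copy of the Hermes source; `output_is_empty` is a hypothetical name.

```python
from types import SimpleNamespace


def output_is_empty(response) -> bool:
    """True only when response.output exists and is a list of length 0."""
    output = getattr(response, "output", None)
    return isinstance(output, list) and len(output) == 0


# Stand-in response objects for illustration.
empty = SimpleNamespace(output=[])
missing = SimpleNamespace()
normal = SimpleNamespace(output=[{"type": "message"}])
print(output_is_empty(empty), output_is_empty(missing), output_is_empty(normal))
```

The point is that a missing `output` attribute and an empty `output` list are different cases, and only the second one matches what the logs report.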

## Hypotheses (in order of likelihood)

1. **Tools or oversized prompt being sent in the agent loop trip a silent codex degradation.** The minimal direct test sends one tiny user message and zero tools. The agent loop sends the full system prompt, conversation history, and ~50+ tool definitions. Codex may be returning empty `output` as a degraded-mode response when something in the kwargs is too large or malformed.
2. **Hermes' codex adapter is mis-handling a `reasoning` config or other passthrough kwarg** that the bare client doesn't include.
3. **`gpt-5.3-codex` is restricted on this account's codex entitlement** and codex returns empty output rather than a clean 4xx for entitlement violations. Less likely because direct call works, but possible if entitlement is per-route.
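One way to separate hypotheses 1 and 2 would be to start from the minimal kwargs that are known to work and add the agent loop's kwargs back one key at a time. The sketch below assumes a hypothetical `send(kwargs)` callable that performs the request and returns an object with an `output` attribute; `find_triggering_keys` is likewise a name invented for this example.

```python
from typing import Any, Callable, Dict, List


def find_triggering_keys(
    minimal: Dict[str, Any],
    full: Dict[str, Any],
    send: Callable[[Dict[str, Any]], Any],
) -> List[str]:
    """Return the keys from `full` that, added alone to `minimal`, yield empty output."""
    culprits = []
    for key, value in full.items():
        if key in minimal and minimal[key] == value:
            continue  # identical to the known-good request, nothing to test
        trial = {**minimal, key: value}
        response = send(trial)
        if getattr(response, "output", None) == []:
            culprits.append(key)
    return culprits
```

If no single key reproduces the empty output, the trigger is probably a combination (or total payload size), and the same loop could be extended to add keys cumulatively instead.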

## What would help diagnose

Happy to attach a `HERMES_DUMP_REQUESTS=1` payload (sanitized) showing the exact `api_kwargs` Hermes is sending in the failing case, plus any structured response metadata visible to the validation block. Will follow up with that if the maintainers want it — let me know.
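Before attaching a dump, credentials would need to be stripped. A minimal sketch of how I would sanitize it, assuming the dumped payload can be loaded as nested dicts and lists; the key names in `SENSITIVE_KEYS` are guesses and would need adjusting to whatever the dump actually contains.

```python
from typing import Any

# Assumed key names; matched exactly (case-insensitive) so that fields such as
# "max_output_tokens" are left untouched.
SENSITIVE_KEYS = {"authorization", "api_key", "access_token", "refresh_token", "cookie"}


def sanitize(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys with a placeholder."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value
```

This only redacts by key name, so message content and tool definitions stay readable, which is what matters for diagnosing the request shape.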

## Workaround

Switching the primary model to a non-codex provider avoids the issue entirely (we ended up on `qwen/qwen3.6-plus:free` via `openrouter`). For users hitting this, the path of least resistance is a config along these lines:

```yaml
model:
  default: openai/gpt-5.4   # or anthropic/claude-opus-4-6, etc.
  provider: openrouter
```

…which routes around the codex-specific code path.


