Type: Bug
Severity: Medium. Non-fatal (the fallback recovers), but it degrades any Hermes setup whose primary model is openai-codex/*. Affects every gpt-5.x codex model tested.
Maintainer note: filed by an outside contributor (no triage permissions to apply the bug label directly; please retag if appropriate).
Summary
Hermes' agent loop reliably trips the "Invalid API response: response.output is empty" validation at run_agent.py:7022-7024 whenever the primary model is any tested gpt-5.x model on the openai-codex provider (both gpt-5.4 and gpt-5.3-codex). The retry loop exhausts its maximum retries and then falls back to the configured fallback model (minimax/minimax-m2.7 via openrouter in our case), which returns a normal response. On a chat platform (Telegram, in our case), the user sees:
```
⚠️ Empty/malformed response — switching to fallback...
Primary model failed — switching to fallback: minimax/minimax-m2.7 via openrouter
```
The codex token itself is healthy (hermes auth list shows openai-codex/codex as the default, OAuth device-code, no errors), and a direct minimal call through Hermes' own agent.auxiliary_client.resolve_provider_client('openai-codex', model='gpt-5.3-codex', raw_codex=True) followed by client.responses.stream(...) produces a valid streamed response with text deltas:
```python
from agent.auxiliary_client import resolve_provider_client

client, model = resolve_provider_client('openai-codex', model='gpt-5.3-codex', raw_codex=True)
with client.responses.stream(
    model='gpt-5.3-codex',
    instructions='You are a helpful assistant.',
    input=[{'role': 'user', 'content': 'Say hello in exactly 3 words.'}],
    store=False,
) as stream:
    # Yields ResponseTextDeltaEvent → "Hello to you!" with finish_reason=stop
    for event in stream:
        if event.type == 'response.output_text.delta':
            print(event.delta, end='', flush=True)
```
So the underlying credential, model, base URL, and stream pipeline all work in isolation. Something the agent loop adds to the request makes the codex endpoint return response.output = [].
Symptom in logs
```
WARNING root: Invalid API response (retry 1/3): response.output is empty | Provider: model=gpt-5.4
WARNING root: Invalid API response (retry 2/3): response.output is empty | Provider: model=gpt-5.4
ERROR root: Invalid API response after 3 retries.
```
Also reproduced with model: gpt-5.3-codex / provider: openai-codex after editing ~/.hermes/config.yaml and restarting the gateway. Same retry-then-fallback path.
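For readers less familiar with this path, the retry-then-fallback behaviour visible in the log lines can be modelled roughly as below. This is a simplified sketch inferred from the logs, not the code in run_agent.py; `call_with_fallback`, `has_empty_output`, and the `primary`/`fallback` callables are hypothetical names.

```python
from types import SimpleNamespace
from typing import Callable, List, Tuple

MAX_RETRIES = 3  # matches the "retry 1/3 ... after 3 retries" log lines


def has_empty_output(response) -> bool:
    # Hypothetical predicate: fires only when `output` is a list of length 0,
    # as described for the validation block in this report.
    output = getattr(response, "output", None)
    return isinstance(output, list) and len(output) == 0


def call_with_fallback(primary: Callable[[], object],
                       fallback: Callable[[], object],
                       max_retries: int = MAX_RETRIES) -> Tuple[str, object, List[str]]:
    """Hypothetical model of the loop: retry the primary, then use the fallback."""
    log: List[str] = []
    for attempt in range(1, max_retries + 1):
        response = primary()
        if not has_empty_output(response):
            return "primary", response, log
        if attempt < max_retries:
            log.append(f"Invalid API response (retry {attempt}/{max_retries}): "
                       "response.output is empty")
    log.append(f"Invalid API response after {max_retries} retries.")
    return "fallback", fallback(), log


# A primary that always returns an empty output array reproduces the symptom.
source, resp, log = call_with_fallback(lambda: SimpleNamespace(output=[]),
                                       lambda: SimpleNamespace(output=["ok"]))
```

In this model, a 200 OK with `output=[]` is indistinguishable from a transient glitch, which is why the request is retried unchanged three times before the fallback takes over.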
Versions / environment
- Hermes commit: local rollup, but the failing code paths in run_agent.py (_preflight_codex_api_kwargs, _run_codex_stream, the empty-output validation block) are identical to current origin/main (e651e041).
- Python: 3.11.15
- OS: Arch Linux (kernel 6.18.7-arch1-1)
- openai SDK: bundled in ~/.hermes/hermes-agent/venv
- Auth: openai-codex via hermes auth add openai-codex --label codex --set-default (OAuth device-code), confirmed active in hermes auth list
Config that reproduces
```yaml
# ~/.hermes/config.yaml
model:
  default: gpt-5.3-codex  # also reproduces with gpt-5.4
  provider: openai-codex
fallback_providers:
  - provider: openrouter
    model: minimax/minimax-m2.7
```
Restart the gateway and send any inbound message via a configured platform (Telegram, in our case). The empty-output validation trips and the fallback activates.
What I checked
- Codex-side request shape: my direct test confirms that the codex Responses endpoint requires instructions, input (as a list), store=False, and stream=True. _preflight_codex_api_kwargs (run_agent.py:3013) enforces all of these on the kwargs Hermes sends, so the kwargs should be valid.
- Model registry: gpt-5.3-codex is the latest entry in _PROVIDER_MODELS["openai-codex"]; gpt-5.4 is not listed (it's a copilot/opencode-zen model). The failure is identical with both, so model validation isn't the discriminator.
- Auth round-trip: a direct minimal call with the exact client returned by resolve_provider_client works, so the OAuth token, base URL, and streaming pipeline are healthy.
- No 4xx exception: the agent loop reaches the empty-output validation branch, which only fires when getattr(response, 'output', None) is a list of length 0. That means the codex endpoint returned a 200 OK with an empty output array, not a 400 error.
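The request-shape requirements from the first check can be written as a small predicate over an api_kwargs dict, which is handy for comparing a dumped agent-loop payload against the minimal call. This is a hypothetical helper (`codex_shape_problems` is not a Hermes function) and only checks the four fields listed above; it is not a copy of _preflight_codex_api_kwargs.

```python
from typing import Any, Dict, List


def codex_shape_problems(kwargs: Dict[str, Any]) -> List[str]:
    """Return the shape problems found; an empty list means the four checks pass."""
    problems: List[str] = []
    instructions = kwargs.get("instructions")
    if not isinstance(instructions, str) or not instructions.strip():
        problems.append("instructions missing or empty")
    if not isinstance(kwargs.get("input"), list):
        problems.append("input is not a list")
    if kwargs.get("store") is not False:  # must be explicitly False, not just falsy
        problems.append("store must be False")
    if kwargs.get("stream") is not True:
        problems.append("stream must be True")
    return problems
```

Since the failing requests pass these checks yet still come back with an empty output, the cause is more likely something outside this minimal shape.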
Hypotheses (in order of likelihood)
- Tools or an oversized prompt in the agent loop trip a silent codex degradation. The minimal direct test sends one tiny user message and zero tools; the agent loop sends the full system prompt, conversation history, and ~50+ tool definitions. Codex may be returning an empty output as a degraded-mode response when something in the kwargs is too large or malformed.
- Hermes' codex adapter mishandles a reasoning config or another passthrough kwarg that the bare client doesn't include.
- gpt-5.3-codex is restricted on this account's codex entitlement, and codex returns an empty output rather than a clean 4xx for entitlement violations. Less likely because the direct call works, but possible if entitlement is enforced per route.
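One way to separate the first two hypotheses is to replay the failing request with one suspect removed at a time and see which variant gets a non-empty output back. A minimal sketch of building those variants, assuming the failing kwargs use the Responses-style keys tools, tool_choice, reasoning, input, and instructions; `ablation_variants` and the variant names are hypothetical, and the replay itself (through the same client as the direct test) is left out.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Kwargs = Dict[str, Any]


def ablation_variants(failing: Kwargs, keep_last_inputs: int = 1) -> List[Tuple[str, Kwargs]]:
    """Build copies of a failing request, each with one suspect removed or shrunk."""
    keep = max(1, keep_last_inputs)  # input[-0:] would keep everything
    variants: List[Tuple[str, Kwargs]] = []

    def variant(name: str, mutate: Callable[[Kwargs], None]) -> None:
        kw = copy.deepcopy(failing)  # never mutate the original payload
        mutate(kw)
        variants.append((name, kw))

    def drop_tools(kw: Kwargs) -> None:
        kw.pop("tools", None)
        kw.pop("tool_choice", None)  # a dangling tool_choice would be inconsistent

    def drop_reasoning(kw: Kwargs) -> None:
        kw.pop("reasoning", None)

    def shorten_history(kw: Kwargs) -> None:
        kw["input"] = kw["input"][-keep:]

    def shorten_instructions(kw: Kwargs) -> None:
        kw["instructions"] = "You are a helpful assistant."  # same as the direct test

    if "tools" in failing:
        variant("no_tools", drop_tools)
    if "reasoning" in failing:
        variant("no_reasoning", drop_reasoning)
    if isinstance(failing.get("input"), list) and len(failing["input"]) > keep:
        variant("short_history", shorten_history)
    variant("short_instructions", shorten_instructions)
    return variants
```

If only `no_tools` or `short_history` recovers, that points at the size hypothesis; if only `no_reasoning` does, it points at the adapter hypothesis.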
What would help diagnose
Happy to attach a HERMES_DUMP_REQUESTS=1 payload (sanitized) showing the exact api_kwargs Hermes sends in the failing case, plus any structured response metadata visible to the validation block. I'll follow up with that if the maintainers want it; just let me know.
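Before attaching such a dump, it may help to redact credentials and reduce it to the numbers that matter for the first hypothesis. A minimal sketch, assuming the dump can be loaded as a JSON object containing the api_kwargs (the actual HERMES_DUMP_REQUESTS file format isn't described here); `redact`, `summarize_kwargs`, and the secret key names are hypothetical choices.

```python
import json
from typing import Any, Dict

# Keys sent by the minimal direct call (stream is implied by responses.stream).
MINIMAL_KEYS = {"model", "instructions", "input", "store", "stream"}
# Assumed names of secret-bearing keys; matched exactly so that fields such as
# max_output_tokens are not masked by accident.
SECRET_KEYS = {"authorization", "api_key", "access_token", "refresh_token", "cookie"}


def redact(obj: Any) -> Any:
    """Recursively mask values stored under secret-looking keys."""
    if isinstance(obj, dict):
        return {k: "<redacted>" if k.lower() in SECRET_KEYS else redact(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj


def summarize_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Compare a failing payload's size and extra keys against the minimal call."""
    return {
        "extra_keys": sorted(set(kwargs) - MINIMAL_KEYS),
        "tool_count": len(kwargs.get("tools") or []),
        "input_items": len(kwargs.get("input") or []),
        "instructions_chars": len(kwargs.get("instructions") or ""),
        "payload_bytes": len(json.dumps(kwargs, default=str).encode("utf-8")),
    }
```

Posting the summary alongside the redacted payload would make it easy to see at a glance how far the agent-loop request is from the one that works.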
Workaround
Switching the primary model to a non-codex provider (we ended up on qwen/qwen3.6-plus:free via openrouter) avoids the issue entirely. So for users hitting this, the path of least resistance is:
```yaml
model:
  default: openai/gpt-5.4  # or anthropic/claude-opus-4-6, etc.
  provider: openrouter
```
…which routes around the codex-specific code path.