Problem
hermes_cli/runtime_provider.py supports an api_mode flag in config.yaml (chat_completions / codex_responses / anthropic_messages) and also has URL-based auto-detection (_detect_api_mode_for_url at line 36) that maps api.openai.com → codex_responses.
But agent/auxiliary_client.py — which handles auxiliary.*, compression.*, and delegation resolution — is a completely parallel chain that ignores api_mode entirely. It calls client.chat.completions.create(**kwargs) directly, with no way to route through /v1/responses.
This breaks using codex-family models for auxiliary tasks on a standard OPENAI_API_KEY. If you set:
    auxiliary:
      compression:
        model: gpt-5.3-codex
        base_url: https://api.openai.com/v1
        api_mode: codex_responses  # ← silently ignored
the call still hits /v1/chat/completions and 404s with "This is not a chat model and thus not supported in the v1/chat/completions endpoint".
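To make the mismatch concrete, here is a minimal Python sketch of the two resolution paths as I understand them. The function names, the endpoint table, and the rule that an explicit api_mode wins over URL detection are my assumptions for illustration, not the actual Hermes code.

```python
from urllib.parse import urlparse

# Endpoint each api_mode is expected to reach (anthropic_messages omitted for brevity).
ENDPOINT_FOR_MODE = {
    "chat_completions": "/v1/chat/completions",
    "codex_responses": "/v1/responses",
}


def detect_api_mode_for_url(base_url: str) -> str:
    """Hypothetical stand-in for _detect_api_mode_for_url."""
    host = urlparse(base_url).hostname or ""
    return "codex_responses" if host == "api.openai.com" else "chat_completions"


def main_path_endpoint(cfg: dict) -> str:
    """Main-chat path: explicit api_mode first (assumed), else URL detection."""
    mode = cfg.get("api_mode") or detect_api_mode_for_url(cfg["base_url"])
    return ENDPOINT_FOR_MODE[mode]


def auxiliary_path_endpoint(cfg: dict) -> str:
    """Auxiliary path as described above: api_mode is never consulted."""
    return ENDPOINT_FOR_MODE["chat_completions"]


cfg = {
    "model": "gpt-5.3-codex",
    "base_url": "https://api.openai.com/v1",
    "api_mode": "codex_responses",
}
print(main_path_endpoint(cfg))       # main chat would use the Responses endpoint
print(auxiliary_path_endpoint(cfg))  # auxiliary still uses chat completions
```

With the same config, the two paths pick different endpoints, which is the gap this issue is about.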
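Related
- #6209 ([Feature]: Custom endpoint setup should let users choose API protocol (api_mode)): api_mode in the CLI wizard. Same underlying family of gap, but #6209 is about the runtime_provider.py path (main chat). This issue is about the auxiliary_client.py path being a separate, parallel system that also needs the flag.
- The _try_codex() function in auxiliary_client.py (around line 900) does route through the Responses API via CodexAuxiliaryClient, but only for ChatGPT OAuth auth, not for a standard OPENAI_API_KEY + api.openai.com.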
Proposed Solution
Two options:
Option A — Honor api_mode in auxiliary_client.py. When resolving a custom endpoint, check the api_mode field and use CodexAuxiliaryClient (the existing Responses API adapter) instead of a raw chat-completions client. The adapter already handles content translation via _convert_content_for_responses, so wiring it up for the non-OAuth API-key case should be localized.
Option B — Unify the two resolution systems. Collapse auxiliary_client.py into runtime_provider.py so there's one api_mode-aware path for everything. Bigger refactor but eliminates the confusing split.
I'd lean toward Option A as the minimal fix.
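A rough sketch of what Option A could look like. ChatCompletionsClient and ResponsesAdapterClient are hypothetical stand-ins (the latter for CodexAuxiliaryClient, whose real constructor may differ), and resolve_auxiliary_client is an illustrative name, not an existing function.

```python
from dataclasses import dataclass


@dataclass
class ChatCompletionsClient:
    """Stand-in for the raw chat-completions client used today."""
    base_url: str
    api_key: str


@dataclass
class ResponsesAdapterClient:
    """Stand-in for CodexAuxiliaryClient (assumed constructor shape)."""
    base_url: str
    api_key: str


def resolve_auxiliary_client(cfg: dict, api_key: str):
    """Pick the client for a custom auxiliary endpoint, honoring api_mode."""
    # A missing api_mode keeps today's behavior: plain chat completions.
    mode = cfg.get("api_mode") or "chat_completions"
    if mode == "codex_responses":
        return ResponsesAdapterClient(cfg["base_url"], api_key)
    return ChatCompletionsClient(cfg["base_url"], api_key)


codex_cfg = {
    "model": "gpt-5.3-codex",
    "base_url": "https://api.openai.com/v1",
    "api_mode": "codex_responses",
}
legacy_cfg = {"model": "some-chat-model", "base_url": "https://example.com/v1"}

print(type(resolve_auxiliary_client(codex_cfg, "sk-test")).__name__)
print(type(resolve_auxiliary_client(legacy_cfg, "sk-test")).__name__)
```

Configs without api_mode resolve exactly as before, so the change would be opt-in and should not affect existing auxiliary setups.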
Use Case
Discovered during live debugging on a ChatGPT Team plan. When the weekly Codex quota ran out, I wanted all Hermes-side tasks (not just main chat) to fall through to gpt-5.3-codex on my pay-per-token API key instead of degrading to gpt-5.4-mini. The fallback_model path works (via api_mode: codex_responses per #6209's underlying functionality); the auxiliary path is the remaining gap.
Scope
- agent/auxiliary_client.py: resolution chain, approximately lines 900-1000
- No changes to runtime_provider.py or the CLI wizard
- Add a note to website/docs/developer-guide/provider-runtime.md about both resolution systems honoring api_mode
Happy to submit a PR for Option A if maintainers agree on the approach.