[Feature]: auxiliary_client.py should honor api_mode flag (parallel to runtime_provider.py) #6800

@mecampbellsoup

Problem

hermes_cli/runtime_provider.py supports an api_mode flag in config.yaml (chat_completions / codex_responses / anthropic_messages) and also has URL-based auto-detection (_detect_api_mode_for_url at line 36) that maps api.openai.com → codex_responses.
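For readers unfamiliar with the detection step, here is a rough sketch of what host-based mode detection could look like. It is not the actual body of _detect_api_mode_for_url: the function name, the lookup table, and the chat_completions default are assumptions made for illustration, and only the api.openai.com entry comes from the description above.

```python
from urllib.parse import urlparse

# Hypothetical lookup table. Only the api.openai.com mapping is taken from
# the issue text; a real table may contain more hosts.
HOST_TO_API_MODE = {
    "api.openai.com": "codex_responses",
}

# Assumed default for hosts that are not in the table.
DEFAULT_API_MODE = "chat_completions"


def sketch_detect_api_mode(base_url: str) -> str:
    """Guess an api_mode from the host part of a base URL."""
    host = (urlparse(base_url).hostname or "").lower()
    return HOST_TO_API_MODE.get(host, DEFAULT_API_MODE)
```

With this sketch, sketch_detect_api_mode("https://api.openai.com/v1") yields codex_responses, while an unknown host falls back to the assumed default.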

But agent/auxiliary_client.py — which handles auxiliary.*, compression.*, and delegation resolution — is a completely parallel chain that ignores api_mode entirely. It calls client.chat.completions.create(**kwargs) directly, with no way to route through /v1/responses.

This makes it impossible to use codex-family models for auxiliary tasks with a standard OPENAI_API_KEY. If you set:

auxiliary:
  compression:
    model: gpt-5.3-codex
    base_url: https://api.openai.com/v1
    api_mode: codex_responses   # ← silently ignored

the call still hits /v1/chat/completions and 404s with "This is not a chat model and thus not supported in the v1/chat/completions endpoint".
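To make the mismatch concrete, the snippet below shows which URL a request would target when api_mode is ignored versus honored. The helper name and the suffix table are hypothetical, and the sketch assumes the request path is simply appended to base_url; the paths themselves are the public OpenAI and Anthropic endpoint paths.

```python
# Hypothetical mapping from api_mode to the path appended to base_url.
ENDPOINT_SUFFIX = {
    "chat_completions": "/chat/completions",
    "codex_responses": "/responses",
    "anthropic_messages": "/messages",
}


def request_url(base_url: str, api_mode: str = "chat_completions") -> str:
    """Build the full endpoint URL for a given base URL and api_mode."""
    return base_url.rstrip("/") + ENDPOINT_SUFFIX[api_mode]


# The auxiliary.compression config from the example, as a plain dict.
cfg = {
    "model": "gpt-5.3-codex",
    "base_url": "https://api.openai.com/v1",
    "api_mode": "codex_responses",
}

# Current behavior as described: api_mode is dropped, so the default applies.
ignored = request_url(cfg["base_url"])
# Desired behavior: api_mode is read from the config.
honored = request_url(cfg["base_url"], cfg["api_mode"])
```

Here ignored ends in /v1/chat/completions, which is the endpoint that rejects the codex model, and honored ends in /v1/responses.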

Related

Proposed Solution

Two options:

Option A — Honor api_mode in auxiliary_client.py. When resolving a custom endpoint, check the api_mode field and use CodexAuxiliaryClient (the existing Responses API adapter) instead of a raw chat-completions client. The adapter already handles content translation via _convert_content_for_responses, so wiring it up for the non-OAuth API-key case should be localized.
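A minimal sketch of what Option A might look like, to show the shape of the change rather than a finished patch. The two stub classes stand in for the raw chat-completions client and for CodexAuxiliaryClient, whose real constructor I am not assuming here; build_auxiliary_client and VALID_API_MODES are hypothetical names, and falling back to chat_completions when api_mode is absent is an assumption.

```python
from dataclasses import dataclass

# The three modes named in the issue for runtime_provider.py.
VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}


@dataclass
class ChatCompletionsStub:
    """Stand-in for the raw chat-completions client used today."""
    base_url: str
    model: str


@dataclass
class ResponsesAdapterStub:
    """Stand-in for CodexAuxiliaryClient (real signature not assumed)."""
    base_url: str
    model: str


def build_auxiliary_client(task_cfg: dict):
    """Pick a client type from the task's api_mode instead of ignoring it."""
    # Assumption: a missing api_mode keeps the current behavior.
    api_mode = task_cfg.get("api_mode") or "chat_completions"
    if api_mode not in VALID_API_MODES:
        raise ValueError(f"unknown api_mode: {api_mode!r}")
    if api_mode == "codex_responses":
        return ResponsesAdapterStub(task_cfg["base_url"], task_cfg["model"])
    if api_mode == "anthropic_messages":
        # Out of scope for this sketch; fail loudly rather than misroute.
        raise NotImplementedError("anthropic_messages is not sketched here")
    return ChatCompletionsStub(task_cfg["base_url"], task_cfg["model"])
```

Failing loudly on an unknown api_mode is a deliberate choice in the sketch: the current problem is that the flag is silently ignored, so a typo should not fall back quietly either.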

Option B — Unify the two resolution systems. Collapse auxiliary_client.py into runtime_provider.py so there's one api_mode-aware path for everything. Bigger refactor but eliminates the confusing split.

I'd lean toward Option A as the minimal fix.

Use Case

Discovered during live debugging on a ChatGPT Team plan. When the weekly Codex quota ran out, I wanted all Hermes-side tasks (not just main chat) to fall back to gpt-5.3-codex on my pay-per-token API key instead of degrading to gpt-5.4-mini. The fallback_model path works (via api_mode: codex_responses per #6209's underlying functionality); the auxiliary path is the remaining gap.

Scope

  • agent/auxiliary_client.py — resolution chain, approximately lines 900-1000
  • No changes to runtime_provider.py or the CLI wizard
  • Add a note to website/docs/developer-guide/provider-runtime.md about both resolution systems honoring api_mode

Happy to submit a PR for Option A if maintainers agree on the approach.
