[Feature]: auxiliary_client.py should honor api_mode flag (parallel to runtime_provider.py)

## Problem

`hermes_cli/runtime_provider.py` supports an `api_mode` flag in `config.yaml` (`chat_completions` / `codex_responses` / `anthropic_messages`) and also has URL-based auto-detection (`_detect_api_mode_for_url` at line 36) that maps `api.openai.com` → `codex_responses`.
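For readers unfamiliar with the main-chat path, the behavior described above can be sketched roughly as follows. This is not the actual `runtime_provider.py` code: the function names, the `chat_completions` default, and the rule that an explicit `api_mode` wins over URL detection are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# The three modes named in this issue.
VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}


def detect_api_mode_for_url(base_url: str) -> Optional[str]:
    """Hypothetical stand-in for `_detect_api_mode_for_url`."""
    host = (urlparse(base_url).hostname or "").lower()
    if host == "api.openai.com":
        return "codex_responses"
    return None  # unknown host: no opinion


def resolve_api_mode(configured: Optional[str], base_url: str) -> str:
    """Assumed precedence: explicit config, then URL detection, then a default."""
    if configured:
        if configured not in VALID_API_MODES:
            raise ValueError(f"unknown api_mode: {configured!r}")
        return configured
    return detect_api_mode_for_url(base_url) or "chat_completions"
```

The point of the sketch is that the main-chat path has two ways to arrive at `codex_responses`, while the auxiliary path described next has neither.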

But `agent/auxiliary_client.py` — which handles `auxiliary.*`, `compression.*`, and `delegation` resolution — is a completely parallel chain that **ignores `api_mode` entirely**. It calls `client.chat.completions.create(**kwargs)` directly, with no way to route through `/v1/responses`.

This breaks codex-family models for auxiliary tasks when authenticating with a standard `OPENAI_API_KEY`. If you set:

```yaml
auxiliary:
  compression:
    model: gpt-5.3-codex
    base_url: https://api.openai.com/v1
    api_mode: codex_responses   # ← silently ignored
```

the call still hits `/v1/chat/completions` and 404s with *"This is not a chat model and thus not supported in the v1/chat/completions endpoint"*.

## Related

- #6209: custom endpoint setup should expose `api_mode` in the CLI wizard. It belongs to the same family of gaps, but #6209 covers the `runtime_provider.py` path (main chat), whereas this issue covers the `auxiliary_client.py` path, a separate, parallel system that also needs the flag.
- The existing `_try_codex()` function in `auxiliary_client.py` (around line 900) does route through the Responses API via `CodexAuxiliaryClient`, but only for ChatGPT OAuth auth — not for standard `OPENAI_API_KEY` + `api.openai.com`.

## Proposed Solution

Two options:

**Option A — Honor `api_mode` in `auxiliary_client.py`.** When resolving a custom endpoint, check the `api_mode` field and use `CodexAuxiliaryClient` (the existing Responses API adapter) instead of a raw chat-completions client. The adapter already handles content translation via `_convert_content_for_responses`, so wiring it up for the non-OAuth API-key case should be localized.
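A minimal sketch of the dispatch Option A implies, under stated assumptions: `AuxEndpoint`, `build_aux_client`, and the two factory callables are hypothetical names, not existing code in `auxiliary_client.py`. In a real patch the Responses factory would presumably wrap `CodexAuxiliaryClient`, whose constructor signature is not shown in this issue and is therefore left abstract here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AuxEndpoint:
    """Hypothetical view of one resolved `auxiliary.*` config entry."""
    model: str
    base_url: str
    api_key: str
    api_mode: Optional[str] = None  # the field that is currently ignored


def build_aux_client(
    endpoint: AuxEndpoint,
    make_chat_client: Callable[[AuxEndpoint], Any],
    make_responses_client: Callable[[AuxEndpoint], Any],
) -> Any:
    """Pick a client based on api_mode instead of always using chat completions."""
    # Assumption: a missing api_mode keeps today's chat-completions behavior.
    mode = endpoint.api_mode or "chat_completions"
    if mode == "codex_responses":
        return make_responses_client(endpoint)
    if mode == "chat_completions":
        return make_chat_client(endpoint)
    # Assumption: other modes are out of scope for this sketch and fail loudly.
    raise ValueError(f"api_mode {mode!r} is not supported for auxiliary tasks")
```

Passing the factories in as arguments keeps the sketch independent of the real client classes and makes the routing decision easy to unit test.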

**Option B — Unify the two resolution systems.** Collapse `auxiliary_client.py` into `runtime_provider.py` so there's one `api_mode`-aware path for everything. Bigger refactor but eliminates the confusing split.

I'd lean toward Option A as the minimal fix.

## Use Case

Discovered during live debugging on a ChatGPT Team plan. When the weekly Codex quota ran out, I wanted all Hermes-side tasks (not just main chat) to fall through to `gpt-5.3-codex` on my pay-per-token API key instead of degrading to `gpt-5.4-mini`. The `fallback_model` path already works with `api_mode: codex_responses` (the functionality underlying #6209); the auxiliary path is the remaining gap.

## Scope

- `agent/auxiliary_client.py` — resolution chain, approximately lines 900-1000
- No changes to `runtime_provider.py` or the CLI wizard
- Add a note to `website/docs/developer-guide/provider-runtime.md` about both resolution systems honoring `api_mode`

Happy to submit a PR for Option A if maintainers agree on the approach.

[Feature]: auxiliary_client.py should honor api_mode flag (parallel to runtime_provider.py) #6800
