fix: force Copilot models to use chat completions#9232
Conversation
There was a problem hiding this comment.
Pull request overview
This PR changes Hermes’ Copilot model routing policy to prevent automatic switching to the Responses API, forcing Copilot-visible models to use Chat Completions for more consistent behavior across the model catalog.
Changes:
- Hard-disable
_should_use_copilot_responses_api()(alwaysFalse) to prevent Responses API routing for Copilot models. - Force
copilot_model_api_mode()to always return"chat_completions". - Update a Copilot auth test to reflect the new “always chat” policy for GPT‑5 model IDs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
hermes_cli/models.py |
Forces Copilot API mode routing to chat_completions, disabling prior Responses API selection logic. |
tests/hermes_cli/test_copilot_auth.py |
Updates unit tests to assert Copilot GPT‑5 model IDs no longer trigger Responses API routing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1508,34 +1502,9 @@ def copilot_model_api_mode( | |||
| ) -> str: | |||
| """Determine the API mode for a Copilot model. | |||
|
|
|||
| Uses the model ID pattern (matching opencode's approach) as the | |||
| primary signal. Falls back to the catalog's ``supported_endpoints`` | |||
| only for models not covered by the pattern check. | |||
| Hermes intentionally forces all Copilot-visible models through | |||
| Chat Completions to avoid silent model-dependent API switching. | |||
| """ | |||
| normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key) | |||
| if not normalized: | |||
| return "chat_completions" | |||
|
|
|||
| # Primary: model ID pattern (matches opencode's shouldUseCopilotResponsesApi) | |||
| if _should_use_copilot_responses_api(normalized): | |||
| return "codex_responses" | |||
|
|
|||
| # Secondary: check catalog for non-GPT-5 models (Claude via /v1/messages, etc.) | |||
| if catalog is None and api_key: | |||
| catalog = fetch_github_model_catalog(api_key=api_key) | |||
|
|
|||
| if catalog: | |||
| catalog_entry = next((item for item in catalog if item.get("id") == normalized), None) | |||
| if isinstance(catalog_entry, dict): | |||
| supported_endpoints = { | |||
| str(endpoint).strip() | |||
| for endpoint in (catalog_entry.get("supported_endpoints") or []) | |||
| if str(endpoint).strip() | |||
| } | |||
| # For non-GPT-5 models, check if they only support messages API | |||
| if "/v1/messages" in supported_endpoints and "/chat/completions" not in supported_endpoints: | |||
| return "anthropic_messages" | |||
|
|
|||
| return "chat_completions" | |||
There was a problem hiding this comment.
copilot_model_api_mode() is now hard-coded to chat_completions, but there are existing tests and catalog mocks in the repo that assume Copilot GPT‑5.* models route to codex_responses when supported_endpoints only includes /responses (e.g. tests/hermes_cli/test_model_validation.py and tests/hermes_cli/test_api_key_providers.py). With this change, those tests will fail and the implementation will ignore catalog endpoint capabilities. Update the remaining tests/mocks (and any downstream logic that relies on codex_responses for Copilot) to reflect the new policy, or reintroduce capability-based routing if the catalog is still authoritative.
| def _should_use_copilot_responses_api(model_id: str) -> bool: | ||
| """Decide whether a Copilot model should use the Responses API. | ||
|
|
||
| Replicates opencode's ``shouldUseCopilotResponsesApi`` logic: | ||
| GPT-5+ models use Responses API, except ``gpt-5-mini`` which uses | ||
| Chat Completions. All non-GPT models (Claude, Gemini, etc.) use | ||
| Chat Completions. | ||
| Hermes now forces Copilot models through Chat Completions for | ||
| consistency across the model catalog. This intentionally disables | ||
| automatic Responses API routing that would otherwise affect GPT-5+ | ||
| models. | ||
| """ | ||
| import re | ||
|
|
||
| match = re.match(r"^gpt-(\d+)", model_id) | ||
| if not match: | ||
| return False | ||
| major = int(match.group(1)) | ||
| return major >= 5 and not model_id.startswith("gpt-5-mini") | ||
| return False |
There was a problem hiding this comment.
_should_use_copilot_responses_api() now always returns False, but its name and usage imply a real decision function (it is still imported/used by agent/auxiliary_client.py to decide whether to wrap Copilot clients for Responses API). Consider either removing/renaming this helper to reflect the new “always chat_completions” policy, or updating the remaining call sites/comments that assume GPT‑5 Copilot models require Responses API so the codebase doesn’t drift into contradictory routing logic.
There was a problem hiding this comment.
updated the auxiliary client path and tests so they match the new chat_completions-only Copilot policy, and marked the old helper as deprecated compatibility-only.
f84ef30 to
271ca88
Compare
Prevent Hermes from auto-switching Copilot-visible GPT-5 models to the Responses API. This keeps gpt-5-mini and related models on chat_completions unless a provider explicitly requires otherwise.
271ca88 to
6b4c24a
Compare
|
Thanks for the submission @jergensturdley. Closing without merging — this is a policy change, not a bug fix. Current Hermes (via opencode's That may be the right choice, but it needs a product decision from @teknium1 rather than landing silently as a bugfix. If the Responses API route is broken for specific model IDs, a narrower fix (disable Responses for the problematic models only) would be preferable. Feel free to open an issue making the case, or resubmit as a gated config option (e.g. |
Summary
Prevent Hermes from auto-switching Copilot-visible GPT-5 models to Responses API
Force Copilot model routing to chat_completions for consistency
Update tests to reflect the new behavior
Motivation
This avoids silent API-mode switching that made gpt-5-mini behavior confusing and hard to debug.
Verification
pytest tests/hermes_cli/test_copilot_auth.py -q
pytest tests/cli/test_cli_provider_resolution.py tests/run_agent/test_strict_api_validation.py tests/run_agent/test_provider_parity.py -q
Notes
This is a deliberate policy change in Hermes rather than a config-only fix.