Skip to content

fix: force Copilot models to use chat completions#9232

Closed
jergensturdley wants to merge 1 commit into
NousResearch:mainfrom
jergensturdley:fix/copilot-chat-completions
Closed

fix: force Copilot models to use chat completions#9232
jergensturdley wants to merge 1 commit into
NousResearch:mainfrom
jergensturdley:fix/copilot-chat-completions

Conversation

@jergensturdley

@jergensturdley jergensturdley commented Apr 13, 2026

Copy link
Copy Markdown

Summary

Prevent Hermes from auto-switching Copilot-visible GPT-5 models to Responses API
Force Copilot model routing to chat_completions for consistency
Update tests to reflect the new behavior

Motivation

This avoids silent API-mode switching that made gpt-5-mini behavior confusing and hard to debug.

Verification

pytest tests/hermes_cli/test_copilot_auth.py -q
pytest tests/cli/test_cli_provider_resolution.py tests/run_agent/test_strict_api_validation.py tests/run_agent/test_provider_parity.py -q

Notes

This is a deliberate policy change in Hermes rather than a config-only fix.

Copilot AI review requested due to automatic review settings April 13, 2026 21:29

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes Hermes’ Copilot model routing policy to prevent automatic switching to the Responses API, forcing Copilot-visible models to use Chat Completions for more consistent behavior across the model catalog.

Changes:

  • Hard-disable _should_use_copilot_responses_api() (always False) to prevent Responses API routing for Copilot models.
  • Force copilot_model_api_mode() to always return "chat_completions".
  • Update a Copilot auth test to reflect the new “always chat” policy for GPT‑5 model IDs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
hermes_cli/models.py Forces Copilot API mode routing to chat_completions, disabling prior Responses API selection logic.
tests/hermes_cli/test_copilot_auth.py Updates unit tests to assert Copilot GPT‑5 model IDs no longer trigger Responses API routing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hermes_cli/models.py
Comment on lines 1497 to 1508
@@ -1508,34 +1502,9 @@ def copilot_model_api_mode(
) -> str:
"""Determine the API mode for a Copilot model.

Uses the model ID pattern (matching opencode's approach) as the
primary signal. Falls back to the catalog's ``supported_endpoints``
only for models not covered by the pattern check.
Hermes intentionally forces all Copilot-visible models through
Chat Completions to avoid silent model-dependent API switching.
"""
normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
if not normalized:
return "chat_completions"

# Primary: model ID pattern (matches opencode's shouldUseCopilotResponsesApi)
if _should_use_copilot_responses_api(normalized):
return "codex_responses"

# Secondary: check catalog for non-GPT-5 models (Claude via /v1/messages, etc.)
if catalog is None and api_key:
catalog = fetch_github_model_catalog(api_key=api_key)

if catalog:
catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
if isinstance(catalog_entry, dict):
supported_endpoints = {
str(endpoint).strip()
for endpoint in (catalog_entry.get("supported_endpoints") or [])
if str(endpoint).strip()
}
# For non-GPT-5 models, check if they only support messages API
if "/v1/messages" in supported_endpoints and "/chat/completions" not in supported_endpoints:
return "anthropic_messages"

return "chat_completions"

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copilot_model_api_mode() is now hard-coded to chat_completions, but there are existing tests and catalog mocks in the repo that assume Copilot GPT‑5.* models route to codex_responses when supported_endpoints only includes /responses (e.g. tests/hermes_cli/test_model_validation.py and tests/hermes_cli/test_api_key_providers.py). With this change, those tests will fail and the implementation will ignore catalog endpoint capabilities. Update the remaining tests/mocks (and any downstream logic that relies on codex_responses for Copilot) to reflect the new policy, or reintroduce capability-based routing if the catalog is still authoritative.

Copilot uses AI. Check for mistakes.
Comment thread hermes_cli/models.py
Comment on lines 1486 to +1494
def _should_use_copilot_responses_api(model_id: str) -> bool:
"""Decide whether a Copilot model should use the Responses API.

Replicates opencode's ``shouldUseCopilotResponsesApi`` logic:
GPT-5+ models use Responses API, except ``gpt-5-mini`` which uses
Chat Completions. All non-GPT models (Claude, Gemini, etc.) use
Chat Completions.
Hermes now forces Copilot models through Chat Completions for
consistency across the model catalog. This intentionally disables
automatic Responses API routing that would otherwise affect GPT-5+
models.
"""
import re

match = re.match(r"^gpt-(\d+)", model_id)
if not match:
return False
major = int(match.group(1))
return major >= 5 and not model_id.startswith("gpt-5-mini")
return False

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_should_use_copilot_responses_api() now always returns False, but its name and usage imply a real decision function (it is still imported/used by agent/auxiliary_client.py to decide whether to wrap Copilot clients for Responses API). Consider either removing/renaming this helper to reflect the new “always chat_completions” policy, or updating the remaining call sites/comments that assume GPT‑5 Copilot models require Responses API so the codebase doesn’t drift into contradictory routing logic.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the auxiliary client path and tests so they match the new chat_completions-only Copilot policy, and marked the old helper as deprecated compatibility-only.

@jergensturdley jergensturdley force-pushed the fix/copilot-chat-completions branch from f84ef30 to 271ca88 Compare April 13, 2026 21:33
@jergensturdley jergensturdley marked this pull request as draft April 13, 2026 21:40
Prevent Hermes from auto-switching Copilot-visible GPT-5 models to the Responses API. This keeps gpt-5-mini and related models on chat_completions unless a provider explicitly requires otherwise.
@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the submission @jergensturdley. Closing without merging — this is a policy change, not a bug fix.

Current Hermes (via opencode's shouldUseCopilotResponsesApi logic) routes GPT-5+ Copilot models through the Responses API and everything else through Chat Completions. Your PR forces ALL Copilot models through Chat Completions unconditionally.

That may be the right choice, but it needs a product decision from @teknium1 rather than landing silently as a bugfix. If the Responses API route is broken for specific model IDs, a narrower fix (disable Responses for the problematic models only) would be preferable.

Feel free to open an issue making the case, or resubmit as a gated config option (e.g. copilot.api_mode: chat_completions_only).

@teknium1 teknium1 closed this Apr 24, 2026
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists provider/copilot GitHub Copilot (ACP + Chat) comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/copilot GitHub Copilot (ACP + Chat) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants