fix(models): accept Gemini + Anthropic in gateway /model picker (#12532) #12585
briandevans wants to merge 2 commits
…Research#12532)

The gateway ``/model`` picker calls ``validate_requested_model``, which probes the provider's ``/models`` endpoint. Two distinct failures drop Gemini and Anthropic models from that flow:

* **Gemini**: the OpenAI-compat endpoint at ``generativelanguage.googleapis.com/v1beta/openai/models`` returns IDs prefixed with ``models/`` (e.g. ``models/gemini-2.5-flash``), the native Gemini-API convention. Our curated list and user input use the bare ID, so the set-membership check drops every known Gemini model.
* **Anthropic**: the generic ``probe_api_models`` helper sends ``Authorization: Bearer`` without the ``anthropic-version`` header, so Anthropic's native ``/v1/models`` returns 4xx and ``fetch_api_models`` yields ``None``. The request lands in the generic "could not reach API" hard-reject, even though ``_fetch_anthropic_models`` (with the correct ``x-api-key`` + ``anthropic-version`` headers) works elsewhere in the codebase.

Both paths cause the gateway picker to fail while ``hermes model`` (which skips validation) works fine with the same credentials. Reporter: NousResearch#12532.

Fix
---

Two surgical additions to ``hermes_cli.models.validate_requested_model``:

1. Strip the ``models/`` prefix from the probed listing when ``normalized == "gemini"``, before the set-membership check. The rest of the strict branch (auto-correction, suggestions, reject path) reuses the normalized list, so suggestions surface bare IDs the user can actually type.
2. When ``api_models is None`` and ``normalized == "anthropic"``, fall back to ``provider_model_ids("anthropic")``. That helper internally uses ``_fetch_anthropic_models`` with the correct headers and falls back to the curated static list when the live fetch also fails. This is the same pattern as the existing Bedrock (``#bedrock``) and Alibaba (NousResearch#12272 / PR NousResearch#12287) fall-throughs.

Narrow scope: explicitly not changed
------------------------------------

* **``probe_api_models`` auth headers.** Still ``Bearer``-only. Adding Anthropic-specific headers to the generic probe is out of scope; the catalog fall-through is the less invasive fix and keeps the generic probe provider-agnostic.
* **Other providers whose /models returns prefixed IDs.** The strip is gated on ``normalized == "gemini"`` so no other provider's behaviour changes. Pinned by a ``custom`` canary test.
* **Other providers' hard-reject on unreachable API.** Still reject. Pinned by a ``zai`` canary test.
* **Reporter's Option A** (skip validation entirely when the model was chosen from a curated picker). That's a gateway-side refactor (``_on_model_selected`` → ``switch_model``); this PR keeps the fix at the validator layer, which also covers CLI direct invocations and future callers.

Regression coverage
-------------------

``tests/hermes_cli/test_model_validation.py`` gets two new classes:

* ``TestValidateGeminiModelsPrefix`` (4 cases): bare ID acceptance, all curated Gemini IDs resolve, unknown IDs surface suggestions that don't leak the ``models/`` prefix, and a canary pinning that the strip is gated on gemini.
* ``TestValidateAnthropicNoModelsEndpoint`` (7 cases): curated Claude model accepted, all three tiers (opus/sonnet/haiku) resolve, unknown models accepted with warning, empty-catalog + exception fall through to the original generic reject, close-match suggestions surface on typos, and a ``zai`` canary preserving the generic reject for other providers.

6 of the 11 fail on clean ``origin/main`` (``6fb69229``) with ``assert False is True`` on ``result["accepted"]``, the exact reporter symptom. The 5 remaining tests pin preserved behaviour (canaries + defensive fall-throughs).

Validation
----------

``source venv/bin/activate && python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateGeminiModelsPrefix tests/hermes_cli/test_model_validation.py::TestValidateAnthropicNoModelsEndpoint -q`` → **11 passed**. Broader model-switch / normalize suites (6 files) → **159 passed, 0 failures**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
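
The Anthropic failure described in the commit message comes down to which headers the probe sends. A minimal sketch of the two header shapes follows; the function names are hypothetical and are not the actual `probe_api_models` / `_fetch_anthropic_models` code, and the version string is the one Anthropic documents for its API, assumed here rather than taken from this PR.

```python
# Sketch only: header shapes for a model-listing probe.
# All function names below are hypothetical illustrations.
ANTHROPIC_VERSION = "2023-06-01"  # assumed; not stated in this PR


def generic_probe_headers(api_key: str) -> dict[str, str]:
    """Headers a provider-agnostic probe might send (Bearer only)."""
    return {"Authorization": f"Bearer {api_key}"}


def anthropic_probe_headers(api_key: str) -> dict[str, str]:
    """Headers for Anthropic's native /v1/models when using an API key."""
    return {"x-api-key": api_key, "anthropic-version": ANTHROPIC_VERSION}


def missing_anthropic_headers(headers: dict[str, str]) -> list[str]:
    """List the required Anthropic headers absent from a header dict."""
    required = ("anthropic-version",)
    present = {name.lower() for name in headers}
    return [name for name in required if name not in present]
```

Running `missing_anthropic_headers(generic_probe_headers("key"))` reports the missing version header, which is the gap that makes the generic probe return 4xx against Anthropic.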
Pull request overview
Fixes gateway /model picker validation for Gemini and Anthropic by aligning model-ID normalization with provider responses and adding provider-specific fallback validation when the generic /models probe is structurally incompatible.
Changes:
- Normalize Gemini `/models` results by stripping the `models/` prefix before membership checks and suggestions.
- Add an Anthropic-specific “catalog fall-through” path when live `/models` probing returns `None`.
- Add regression tests covering Gemini prefix normalization and Anthropic fallback behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `hermes_cli/models.py` | Updates `validate_requested_model()` to normalize Gemini IDs and to fall back to the Anthropic catalog when `/models` probing fails. |
| `tests/hermes_cli/test_model_validation.py` | Adds regression tests for the Gemini `models/` prefix mismatch and Anthropic probe failure fallback. |
    if normalized == "anthropic":
        try:
            catalog = provider_model_ids("anthropic")
        except Exception:
            catalog = []
        if catalog:

    def test_unknown_gemini_id_surfaces_suggestions_not_generic_reject(self):
        """The prefix-strip fix must not accidentally route unknown Gemini
        models into the generic "couldn't reach API" branch. The post-
        strip list must be used for suggestions too."""
        result = self._validate("gemini-hypothetical")
        # Strict branch rejects unknown IDs (even after strip), same as
        # any other provider whose /models endpoint responded.
        assert result["accepted"] is False
        # Suggestions must reference the bare (post-strip) IDs, not the
        # raw ``models/…`` strings; otherwise the UI would surface
        # "Similar models: models/gemini-2.5-flash" which is useless
        # advice because that literal can't be typed into the picker.
        if "Similar models" in (result["message"] or ""):
            assert "models/" not in result["message"]

        # Either accepted-as-recognized or accepted-with-suggestions is fine;
        # the point is that we proceed + offer context.
        assert result["accepted"] is True
… review (follow-up to NousResearch#12532)

Addresses all 3 Copilot inline comments on NousResearch#12585:

1. **Second live network call on the failure path** (line 2216). ``provider_model_ids("anthropic")`` internally calls ``_fetch_anthropic_models`` with a 5s timeout. Calling it from the ``validate_requested_model`` fallback, which is already a failure path the user is waiting on, could stack a second 5s hang after the probe's 5s. Switched to reading ``_PROVIDER_MODELS["anthropic"]`` directly. The static list is the source of truth the picker populates from, so it's guaranteed to contain any ID a user could have selected. No env-based credential discovery, no network call.
2. **Gemini suggestions test was a no-op on the empty branch** (test line 631). The original ``if "Similar models" in …`` guard meant the assertion silently passed whenever suggestions weren't generated. Renamed to ``test_unknown_gemini_id_surfaces_bare_id_suggestions`` and switched to a deliberately close input (``gemini-2.5-flash-nano``) that reliably fires suggestions at cutoff=0.5 without hitting the auto-correct at cutoff=0.9; the ratio matrix was pre-computed to pick this value. Now asserts (1) ``accepted is False``, (2) "Similar models" present, (3) no ``models/`` leak, (4) a known bare ID is in the list.
3. **Anthropic suggestion test didn't actually check suggestions** (test line 762). The original only asserted ``accepted``. Changed input to ``claude-opus-4-7-preview`` (close to ``claude-opus-4-7`` but not exact) and now asserts (1) accepted, (2) ``recognized is False``, (3) "Similar models" in message, (4) the exact closest match surfaces.

Also collapsed the redundant ``test_missing_catalog_falls_through_to_generic_reject``: ``test_empty_catalog_falls_through_to_generic_reject`` already covers the defensive fall-through, and after the switch to direct ``_PROVIDER_MODELS`` access there's no longer a ``provider_model_ids``-raising path to cover separately.
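
The two cutoffs mentioned in item 2 can be illustrated with the standard library's `difflib`. This is a sketch under the assumption that the validator uses something comparable to `get_close_matches`; the `classify` helper, the constant names, and the `KNOWN` list are hypothetical and not taken from `hermes_cli/models.py`.

```python
from difflib import get_close_matches

# Illustrative bare IDs only; the real curated list may differ.
KNOWN = ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite"]

AUTO_CORRECT_CUTOFF = 0.9  # near-exact input: treated as a typo and corrected
SUGGEST_CUTOFF = 0.5       # close input: rejected, but with suggestions


def classify(requested: str, known: list[str]) -> tuple[str, list[str]]:
    """Return (kind, candidates), kind being one of
    "exact", "auto_correct", "suggest", or "unknown"."""
    if requested in known:
        return "exact", [requested]
    auto = get_close_matches(requested, known, n=1, cutoff=AUTO_CORRECT_CUTOFF)
    if auto:
        return "auto_correct", auto
    close = get_close_matches(requested, known, n=3, cutoff=SUGGEST_CUTOFF)
    return ("suggest", close) if close else ("unknown", [])
```

With this stand-in list, `gemini-2.5-flash-nano` scores below 0.9 against every entry but above 0.5 against `gemini-2.5-flash`, so it lands in the "suggest" branch, which is the property the renamed test relies on.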

Test plumbing: the Anthropic fixture now patches ``_PROVIDER_MODELS`` via ``patch.dict`` instead of patching ``provider_model_ids``, matching the new code path.

Validation
----------

``source venv/bin/activate && python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateGeminiModelsPrefix tests/hermes_cli/test_model_validation.py::TestValidateAnthropicNoModelsEndpoint -q`` → **10 passed** (was 11; consolidated the redundant catalog test). Broader model-switch / normalize suites (6 files) → **158 passed, 0 failures**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
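Thanks for the catches @copilot-pull-request-reviewer. All 3 are addressed in the follow-up commit:

1. Second 5s network call on the failure path. You're right: the fallback is already a failure path, and stacking another 5s Anthropic probe on top would double the worst-case latency. Switched from `provider_model_ids("anthropic")` to reading `_PROVIDER_MODELS["anthropic"]` directly.
2. Gemini suggestions test was a no-op. Right: the `if "Similar models" in …` guard let the assertion pass silently whenever no suggestions were generated. The test now uses a deliberately close input and asserts the suggestions unconditionally.
3. Anthropic close-match suggestion test only checked `accepted`. It now also asserts `recognized is False`, "Similar models" in the message, and the exact closest match.

Also collapsed the now-redundant `test_missing_catalog_falls_through_to_generic_reject`. The test fixture now patches `_PROVIDER_MODELS`. 10 passed on branch; broader model-switch suites (6 files) → 158 passed, 0 regressions.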
Salvage of the Gemini-specific piece from PR #12585 by @briandevans. Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs prefixed with 'models/' (native Gemini-API convention), so set-membership against curated bare IDs drops every model. Strip the prefix before comparison. The Anthropic static-catalog piece of #12585 was subsumed by #12618's _fetch_anthropic_models() branch landing earlier in the same salvage PR. Full branch cherry-pick was skipped because it also carried unrelated catalog-version regressions.
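
The prefix strip described in this commit message can be sketched as below. This is an illustration of the technique only: the function names are hypothetical, and `str.removeprefix` (Python 3.9+) is one possible way to do the strip, not necessarily what the landed commit uses.

```python
def normalize_probed_ids(provider: str, api_models: list[str]) -> list[str]:
    """Strip the native-Gemini ``models/`` prefix, for Gemini only."""
    if provider != "gemini":
        # Gated on the provider so no other provider's IDs are altered.
        return list(api_models)
    return [model_id.removeprefix("models/") for model_id in api_models]


def is_listed(provider: str, requested: str, api_models: list[str]) -> bool:
    """Set-membership check against the normalized listing."""
    return requested in set(normalize_probed_ids(provider, api_models))
```

Because the same normalized list can feed both the membership check and any suggestion logic, suggestions would show bare IDs rather than `models/...` literals.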
Thanks @briandevans. The Gemini-prefix piece from your PR landed in #15136 (commit 7f26cea) with your authorship preserved. The branch itself had unrelated catalog-version regressions (old OpenRouter snapshot, missing Xiaomi mimo-v2.5 entries, etc.), so I couldn't cherry-pick it cleanly; the Gemini fix was reapplied directly.
Fixes #12532.
TL;DR
Gateway `/model` picker rejects both Gemini and Anthropic models while CLI `hermes model` works with the same credentials. Two distinct bugs in `validate_requested_model`:

- Gemini: `/models` returns IDs like `models/gemini-2.5-flash`; the curated list and user input use bare `gemini-2.5-flash` → set membership drops every model.
- Anthropic: the generic probe sends `Authorization: Bearer` without `anthropic-version` → Anthropic's native `/v1/models` returns 4xx → `fetch_api_models` yields `None` → lands in the generic "couldn't reach API" hard-reject.

Both fixes mirror existing patterns already in this function (Bedrock / Alibaba catalog fall-through for Anthropic; symmetric normalization for the Gemini prefix drift).
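
The catalog fall-through for Anthropic can be pictured with a small stand-alone sketch. Everything here is hypothetical: `validate`, `STATIC_CATALOG`, and the result messages are illustrative stand-ins for `validate_requested_model` and `_PROVIDER_MODELS`, and the catalog contains only the one model ID that appears in this PR.

```python
from typing import Optional

# Hypothetical static catalog; real entries live in the project's curated list.
STATIC_CATALOG = {"anthropic": ["claude-opus-4-7"]}


def validate(
    provider: str,
    requested: str,
    api_models: Optional[list[str]],
    catalog: dict[str, list[str]] = STATIC_CATALOG,
) -> dict:
    """api_models is None when the live /models probe failed."""
    if api_models is not None:
        ok = requested in set(api_models)
        message = None if ok else f"Unknown model: {requested}"
        return {"accepted": ok, "recognized": ok, "message": message}
    if provider == "anthropic":
        known = catalog.get("anthropic") or []
        if known:
            # Probe failed for structural reasons, so trust the catalog:
            # known IDs are recognized, unknown IDs pass with a warning.
            recognized = requested in known
            message = None if recognized else (
                f"'{requested}' is not in the curated catalog; proceeding anyway."
            )
            return {"accepted": True, "recognized": recognized, "message": message}
    # Empty catalog or any other provider: keep the generic hard-reject.
    return {"accepted": False, "recognized": False,
            "message": "Could not reach the provider API."}
```

The fall-through is limited to one provider, so an unreachable API for any other provider still ends in the generic reject.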
Root cause details
Gemini
Gemini's OpenAI-compat endpoint at `generativelanguage.googleapis.com/v1beta/openai/models` returns IDs in the native Gemini format (`models/gemini-2.5-flash`). The curated `_PROVIDER_MODELS["gemini"]` list and the picker UI both use the bare ID. `requested_for_lookup in set(api_models)` therefore fails for every known Gemini model.

Anthropic

`probe_api_models` constructs its headers at `hermes_cli/models.py:1776-1780` and sends `Authorization: Bearer` only. Anthropic's API requires `x-api-key` (for API keys) or `Authorization: Bearer` with an `anthropic-version` header, never `Bearer` alone without a version. So the probe silently 4xxs, returning `None`, and the request lands in the "couldn't reach API" hard-reject branch. `_fetch_anthropic_models` (line 1338) already does the right thing: it's called during `provider_model_ids("anthropic")` for picker population, but never for per-request validation.

Fix
Two surgical additions to `validate_requested_model`:

1. Gemini prefix strip, before the set-membership check. The normalized list also drives auto-correct + suggestions, so "Similar models: gemini-2.5-flash" surfaces bare IDs the user can actually type, not the useless `models/...` literals.
2. Anthropic catalog fall-through, after the existing Bedrock block, same pattern as the #12287 Alibaba fix. `provider_model_ids("anthropic")` already handles live-fetch (via `_fetch_anthropic_models` with the correct `x-api-key` + `anthropic-version` headers) and falls back to the curated static list on failure.

Behaviour matrix
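| Scenario | Before | After |
|---|---|---|
| `/model gemini:gemini-2.5-flash` via picker | Rejected: bare ID not in `{models/gemini-2.5-flash, …}` | Accepted |
| `/model anthropic:claude-opus-4-7` via picker | Rejected: "couldn't reach API" | Accepted |
| Unknown Gemini ID (`gemini-hypothetical`) | `models/…`-leaked suggestions | Rejected with bare-ID suggestions |
| Other provider (`zai`) with unreachable API | Hard reject | Hard reject (unchanged) |
| `custom` provider with `models/`-prefixed API response | Prefix not stripped | Prefix not stripped (unchanged) |

Narrow scope: explicitly not changed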
- `probe_api_models` auth headers. Still `Bearer`-only. Teaching the generic probe about Anthropic's header requirements would entangle provider-specific details in the transport layer; the catalog fall-through is less invasive.
- Other providers with `models/`-prefixed IDs. The strip is gated on `normalized == "gemini"`. Pinned by a `custom` canary.
- Reporter's Option A (skip validation when the model was chosen from a curated picker). That is a gateway-side refactor of `_on_model_selected` / `switch_model`; this PR keeps the fix at the validator layer so it benefits CLI, gateway, and any future caller uniformly.

Regression coverage
Two new test classes, 11 cases:

`TestValidateGeminiModelsPrefix` (4 cases):
- `test_bare_id_accepted_despite_models_prefix_from_api`: reporter's repro
- `test_all_curated_gemini_ids_resolve`: 4 IDs across pro/flash/flash-lite/preview
- `test_unknown_gemini_id_surfaces_suggestions_not_generic_reject`: pins that suggestions don't leak the `models/` prefix
- `test_prefix_strip_limited_to_gemini_provider`: `custom` provider canary

`TestValidateAnthropicNoModelsEndpoint` (7 cases):
- `test_curated_claude_model_accepted`: reporter's repro
- `test_all_tiers_resolve`: opus/sonnet/haiku
- `test_unknown_model_accepted_with_warning`
- `test_empty_catalog_falls_through_to_generic_reject`
- `test_catalog_lookup_exception_falls_through`
- `test_unknown_model_includes_close_match_suggestion`
- `test_other_providers_still_hard_reject_when_api_unreachable`: `zai` canary

6 of 11 fail on clean `origin/main` (`6fb69229`) with `assert False is True` on `result["accepted"]`. The 5 passing tests pin preserved behaviour (canaries + defensive fall-throughs).

Validation
Broader model-switch / normalize suites (6 files) → 159 passed, 0 failures.
Relation to prior PRs
This PR uses the exact same fall-through pattern as #12287 (Alibaba DashScope coding endpoint). That pattern in turn mirrors the pre-existing Bedrock branch. Each provider whose `/models` endpoint is structurally inaccessible gets its own small catalog fall-through, with no broad widening of the validator's trust surface.

Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.