fix(models): validate Alibaba coding-endpoint switches via catalog when /models 404s (#12272) #12287
…en /models 404s (NousResearch#12272)

``validate_requested_model`` probes the provider's ``/models`` endpoint to verify that a requested model exists. The Alibaba / DashScope **coding** endpoint (``https://coding.dashscope.aliyuncs.com/v1``) does not expose ``/models``: the request returns **HTTP 404** and ``fetch_api_models`` yields ``None``.

Before this change, the "couldn't reach API" branch at ``hermes_cli/models.py:2195`` hard-rejected every ``/model`` switch for DashScope coding users (``qwen3-coder-plus``, ``kimi-k2.5``, ``glm-5``, ``MiniMax-M2.5``, …), even for IDs Hermes statically knows about via ``_PROVIDER_MODELS["alibaba"]``.

Reporter: NousResearch#12272.

Regression window: the "couldn't reach → hard reject" branch was introduced by ``aeb53131f`` (Apr 13, 2026); pre-``aeb53131f`` the fallback silently accepted and persisted. The new fix reintroduces switch-ability for DashScope coding without reverting ``aeb53131f``'s "fake model" guard for other providers.

Fix
---

Mirror the existing Bedrock pattern (``models.py:2161``): when ``api_models is None`` AND ``normalized == "alibaba"``, fall back to the curated ``provider_model_ids("alibaba")`` catalog.

* Catalog hit → ``{accepted: True, recognized: True}`` (no warning).
* Catalog miss → ``{accepted: True, recognized: False}`` with a clear warning naming the coding endpoint + close-match suggestions.
* Empty catalog (``provider_model_ids`` returned ``[]``) or catalog lookup exception → fall through to the existing generic reject so we don't silently accept every unknown model.

Narrow scope: explicitly not changed
------------------------------------

* **Classic DashScope endpoint** (``dashscope-intl.aliyuncs.com/compatible-mode/v1``): still validates against the live ``/models`` listing, unchanged. Pinned by a canary test.
* **Other providers**: the alibaba-only ``if`` means no other provider's hard-reject behaviour changes. Pinned by a ``zai`` canary test.
* **The ``aeb53131f`` "fake models" guard**: intact for every path that doesn't match ``alibaba``.
* **``provider: custom`` with a DashScope coding base URL**: out of scope; custom provider has its own validation path via ``TestCustomProviderBaseUrlSuggestion``.

Regression coverage
-------------------

``tests/hermes_cli/test_model_validation.py`` gets a new ``TestValidateAlibabaCodingEndpoint`` class with 8 cases:

* 4 new-behaviour tests (fail on main):
  - ``test_known_qwen_model_accepted`` (reporter's repro)
  - ``test_known_third_party_model_accepted`` (glm-5 / kimi-k2.5 / MiniMax-M2.5 all flow through)
  - ``test_unknown_model_accepted_with_warning``
  - ``test_unknown_model_includes_suggestions``
* 2 defensive-fallback pins:
  - ``test_empty_catalog_falls_through_to_generic_reject``
  - ``test_catalog_lookup_exception_falls_through``
* 2 preserved-behaviour canaries (pass on main and branch):
  - ``test_live_api_path_unchanged_when_endpoint_supports_models``
  - ``test_other_providers_still_hard_reject_when_api_unreachable``

4 of the 8 fail on clean ``origin/main`` (``6fb69229``) with ``assert False is True`` on ``result["accepted"]``, the exact reporter symptom. The remaining 4 pin preserved behaviour.

Validation
----------

``source venv/bin/activate && python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint -q`` → **8 passed**.

Broader model-switch / normalize suites (``test_model_validation.py``, ``test_model_switch_custom_providers.py``, ``test_model_switch_variant_tags.py``, ``test_user_providers_model_switch.py``, ``test_model_normalize.py``) → **147 passed, 0 failures.**

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Fixes model switching for Alibaba/DashScope “coding” base URLs where /models returns 404 by falling back to Hermes’ curated Alibaba model catalog when live model probing is unavailable.
Changes:
- Add an alibaba-specific fallback in `validate_requested_model()` when `/models` cannot be fetched, using `provider_model_ids("alibaba")`.
- Accept known catalog models silently; accept unknown models with a warning + “Similar models” suggestions; retain generic hard-reject when the catalog is unavailable/empty.
- Add a dedicated regression test suite covering the coding-endpoint behavior and canary tests to ensure other providers remain unchanged.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `hermes_cli/models.py` | Adds Alibaba-specific catalog fallback when /models probing yields None, matching the intended behavior for DashScope coding endpoints. |
| `tests/hermes_cli/test_model_validation.py` | Adds regression and canary tests validating the new Alibaba fallback behavior and preserving existing behavior for other providers/endpoints. |
CI audit — all failures are pre-existing baselines:
Standing infra issue (container UID vs mounted volume ownership). Reproduces on
Zero in Green: Focused suite on branch: |
Automated review check: BLOCKED. Required CI is failing: build-and-push and test both failed. The remaining checks passed, but this is still a merge blocker.
@Artem151193 Thanks for the sweep — the two red checks are pre-existing baselines on
Focused suite on this PR's added class → 8 passed. Broader model/switch/normalize suite (5 files) → 147 passed, 0 failures. Happy to open a separate PR addressing the most tractable baseline test failures (e.g. the |
…Research#12532)

The gateway ``/model`` picker calls ``validate_requested_model``, which probes the provider's ``/models`` endpoint. Two distinct failures drop Gemini and Anthropic models from that flow:

* **Gemini**: the OpenAI-compat endpoint at ``generativelanguage.googleapis.com/v1beta/openai/models`` returns IDs prefixed with ``models/`` (e.g. ``models/gemini-2.5-flash``), the native Gemini-API convention. Our curated list and user input use the bare ID, so the set-membership check drops every known Gemini model.
* **Anthropic**: the generic ``probe_api_models`` helper sends ``Authorization: Bearer`` without the ``anthropic-version`` header, so Anthropic's native ``/v1/models`` returns 4xx and ``fetch_api_models`` yields ``None``. The request lands in the generic "could not reach API" hard-reject, even though ``_fetch_anthropic_models`` (with the correct ``x-api-key`` + ``anthropic-version`` headers) works elsewhere in the codebase.

Both paths cause the gateway picker to fail while ``hermes model`` (which skips validation) works fine with the same credentials.

Reporter: NousResearch#12532.

Fix
---

Two surgical additions to ``hermes_cli.models.validate_requested_model``:

1. Strip the ``models/`` prefix from the probed listing when ``normalized == "gemini"``, before the set-membership check. The rest of the strict branch (auto-correction, suggestions, reject path) reuses the normalized list, so suggestions surface bare IDs the user can actually type.
2. When ``api_models is None`` and ``normalized == "anthropic"``, fall back to ``provider_model_ids("anthropic")``. That helper internally uses ``_fetch_anthropic_models`` with the correct headers and falls back to the curated static list when the live fetch also fails. This is the identical pattern to the existing Bedrock (``#bedrock``) and Alibaba (NousResearch#12272 / PR NousResearch#12287) fall-throughs.

Narrow scope: explicitly not changed
------------------------------------

* **``probe_api_models`` auth headers.** Still ``Bearer``-only. Adding Anthropic-specific headers to the generic probe is out of scope; the catalog fall-through is the less invasive fix and keeps the generic probe provider-agnostic.
* **Other providers whose /models returns prefixed IDs.** The strip is gated on ``normalized == "gemini"`` so no other provider's behaviour changes. Pinned by a ``custom`` canary test.
* **Other providers' hard-reject on unreachable API.** Still reject. Pinned by a ``zai`` canary test.
* **Reporter's Option A** (skip validation entirely when the model was chosen from a curated picker). That's a gateway-side refactor (``_on_model_selected`` → ``switch_model``); this PR keeps the fix at the validator layer, which also covers CLI direct invocations and future callers.

Regression coverage
-------------------

``tests/hermes_cli/test_model_validation.py`` gets two new classes:

* ``TestValidateGeminiModelsPrefix`` (4 cases): bare ID acceptance, all curated Gemini IDs resolve, unknown IDs surface suggestions that don't leak the ``models/`` prefix, and a canary pinning that the strip is gated on gemini.
* ``TestValidateAnthropicNoModelsEndpoint`` (7 cases): curated Claude model accepted, all three tiers (opus/sonnet/haiku) resolve, unknown models accepted with warning, empty-catalog + exception fall through to the original generic reject, close-match suggestions surface on typos, and a ``zai`` canary preserving the generic reject for other providers.

6 of the 11 fail on clean ``origin/main`` (``6fb69229``) with ``assert False is True`` on ``result["accepted"]``, the exact reporter symptom. The 5 remaining tests pin preserved behaviour (canaries + defensive fall-throughs).

Validation
----------

``source venv/bin/activate && python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateGeminiModelsPrefix tests/hermes_cli/test_model_validation.py::TestValidateAnthropicNoModelsEndpoint -q`` → **11 passed**.

Broader model-switch / normalize suites (6 files) → **159 passed, 0 failures**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
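The Gemini half of that fix can be illustrated with a minimal sketch. It assumes the probed listing is a plain list of ID strings; ``normalize_probed_ids`` is a hypothetical helper name chosen for illustration and is not taken from ``hermes_cli/models.py``.

```python
from typing import List, Optional

GEMINI_PREFIX = "models/"  # prefix the commit message says the Gemini listing carries


def normalize_probed_ids(
    normalized: str, api_models: Optional[List[str]]
) -> Optional[List[str]]:
    """Hypothetical helper: strip the ``models/`` prefix for Gemini only.

    Any other provider, and a failed probe (``None``), pass through untouched,
    which mirrors the "gated on gemini" scope described above.
    """
    if api_models is None or normalized != "gemini":
        return api_models
    return [
        model_id[len(GEMINI_PREFIX):] if model_id.startswith(GEMINI_PREFIX) else model_id
        for model_id in api_models
    ]


# Bare IDs now match what a user would type at the picker.
listing = normalize_probed_ids("gemini", ["models/gemini-2.5-flash"])
print("gemini-2.5-flash" in listing)
```

Because the strip happens before the membership check, any suggestions built from the same list would also be bare IDs.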
Thanks for the thorough write-up and test coverage, @briandevans — the root cause analysis and behaviour matrix were spot on. This is an automated hermes-sweeper review. The underlying bug (#12272) has been independently fixed on
Closing as implemented on main. The test structure and edge-case analysis you contributed here (especially the defensive fallback pins and the behaviour matrix) visibly informed how the broader fix was framed.
Fixes #12272.
TL;DR
`validate_requested_model` probes the provider's `/models` endpoint to confirm a requested model exists. The DashScope coding endpoint (`https://coding.dashscope.aliyuncs.com/v1`) does not expose `/models`: it returns HTTP 404, so `fetch_api_models` yields `None`.
Before this fix, the "couldn't reach API" branch hard-rejected every `/model` switch for DashScope coding users, including models Hermes statically knows about (`qwen3-coder-plus`, `kimi-k2.5`, `glm-5`, `MiniMax-M2.5`).
Fix: mirror the existing Bedrock pattern. When `api_models is None` and `normalized == "alibaba"`, fall back to the curated `provider_model_ids("alibaba")` catalog. Accept known models quietly; accept unknown models with a warning; fall through to the generic reject if the catalog itself is unavailable.
Regression window
The "couldn't reach → hard reject" branch was introduced by commit `aeb53131f` (Apr 13, 2026) to prevent "fake model" typos. This fix re-enables switch-ability for the DashScope coding endpoint without reverting `aeb53131f`'s guard for other providers.
Root cause
`hermes_cli/models.py:2155` (pre-fix):
Fix
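A minimal sketch of the fallback described in the TL;DR, written as a standalone function so it can run on its own. The function name, the `catalog_lookup` parameter (standing in for `provider_model_ids`), the message wording, and the `difflib` cutoff are assumptions for illustration; this is not the code in `hermes_cli/models.py`.

```python
import difflib
from typing import Callable, List, Optional


def validate_with_catalog_fallback(
    requested: str,
    normalized: str,
    api_models: Optional[List[str]],
    catalog_lookup: Callable[[str], List[str]],
) -> dict:
    """Hypothetical stand-in for the alibaba-only catalog fallback."""
    if api_models is not None:
        # Live listing available: simplified membership check.
        known = requested in api_models
        return {"accepted": known, "recognized": known, "message": None}

    if normalized == "alibaba":
        try:
            catalog = catalog_lookup("alibaba")
        except Exception:
            catalog = []  # lookup failure is treated like an empty catalog
        if catalog:
            if requested in catalog:
                return {"accepted": True, "recognized": True, "message": None}
            # Suggest only; no auto-correction on the fallback path.
            close = difflib.get_close_matches(requested, catalog, n=3, cutoff=0.5)
            message = f"{requested!r} not found in catalog for the coding endpoint."
            if close:
                message += " Similar models: " + ", ".join(close)
            return {"accepted": True, "recognized": False, "message": message}

    # Every other provider, and an empty or failed catalog, keeps the hard reject.
    return {"accepted": False, "recognized": False, "message": "could not reach API"}


catalog = ["qwen3-coder-plus", "kimi-k2.5", "glm-5", "MiniMax-M2.5"]
result = validate_with_catalog_fallback(
    "qwen3-coder-plu", "alibaba", None, lambda provider: catalog
)
print(result["accepted"], result["message"])
```

Keeping the reject as the final fall-through means an empty catalog never turns into "accept anything", which is the property the defensive-fallback tests pin.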
Behaviour matrix
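The matrix is keyed on `api_models`. On the coding endpoint, `/models` is unavailable and `api_models` is `None` for `qwen3-coder-plus`, for `glm-5` (third-party via DashScope), and for an unknown model, which gets `Similar models:` suggestions. Where `/models` is served, `api_models` is a list (`["qwen3-…", …]`). Two further rows also have `api_models` of `None`.
Narrow scope: explicitly not changed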
- Classic DashScope endpoint (`dashscope-intl.aliyuncs.com/compatible-mode/v1`). Still probes the live `/models` endpoint; pinned by `test_live_api_path_unchanged_when_endpoint_supports_models`.
- Other providers. `if normalized == "alibaba"` is deliberately narrow. Pinned by `test_other_providers_still_hard_reject_when_api_unreachable` (zai canary).
- `aeb53131f` "fake models" guard. Intact for every path that doesn't match `alibaba`.
- `provider: custom` with a DashScope coding base URL. Custom provider has its own validation path via `TestCustomProviderBaseUrlSuggestion`; out of scope.
Regression coverage
`tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint`, 8 cases:
- `test_known_qwen_model_accepted` (reporter's exact repro)
- `test_known_third_party_model_accepted` (`glm-5`, `kimi-k2.5`, `MiniMax-M2.5`)
- `test_unknown_model_accepted_with_warning`
- `test_unknown_model_includes_suggestions`
- `test_empty_catalog_falls_through_to_generic_reject`
- `test_catalog_lookup_exception_falls_through`
- `test_live_api_path_unchanged_when_endpoint_supports_models`
- `test_other_providers_still_hard_reject_when_api_unreachable`
4 of 8 fail on clean `origin/main` (`6fb69229`) with `assert False is True` on `result["accepted"]`, the exact reporter symptom.
Validation
Broader model-switch / normalize suites:
`python -m pytest tests/hermes_cli/test_model_validation.py tests/hermes_cli/test_model_switch_custom_providers.py tests/hermes_cli/test_model_switch_variant_tags.py tests/hermes_cli/test_user_providers_model_switch.py tests/hermes_cli/test_model_normalize.py -q` → 147 passed, 0 failures
Pre-empted review questions
Q. Why not always fall back to the catalog for every provider?
Because `aeb53131f` was deliberate: the "fake model" guard was added to catch user typos that would otherwise silently burn API quota on nonexistent models. The fix is narrowed to `alibaba` because the coding endpoint structurally doesn't expose `/models` (it's not a transient network error), so the catalog is the only validation signal available. Other providers should still fail loudly when their API is genuinely unreachable.
Q. What about users on the classic DashScope endpoint?
They follow the live `/models` path at `models.py:2114`, which runs before my fallback. No change. Pinned by a dedicated test.
Q. What if the catalog drifts out of sync with what DashScope actually serves?
That's already the tradeoff of using any static catalog, the same as for `bedrock`, `openai-codex`, `nous`, `copilot`. Unknown models are still accepted (with a warning) so users who want to use newer models than the catalog knows about still get the switch. They just see "not found in catalog" as a heads-up.
Q. Auto-correction for typos (`qwen3-coder-plu` → `qwen3-coder-plus`)?
Not in this PR; this matches Bedrock's current behaviour (suggest-only, don't auto-correct on the fallback path). Auto-correction is only done when `/models` returns a list. Scope creep otherwise.
Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.