fix(models): validate Alibaba coding-endpoint switches via catalog when /models 404s (#12272) #12287

Closed
briandevans wants to merge 1 commit into NousResearch:main from briandevans:fix/alibaba-coding-endpoint-model-validation

@briandevans

Fixes #12272.

TL;DR

validate_requested_model probes the provider's /models endpoint to confirm a requested model exists. The DashScope coding endpoint (https://coding.dashscope.aliyuncs.com/v1) does not expose /models — it returns HTTP 404, so fetch_api_models yields None.

Before this fix, the "couldn't reach API" branch hard-rejected every /model switch for DashScope coding users — including models Hermes statically knows about (qwen3-coder-plus, kimi-k2.5, glm-5, MiniMax-M2.5).

Fix: mirror the existing Bedrock pattern — when api_models is None and normalized == "alibaba", fall back to the curated provider_model_ids("alibaba") catalog. Accept known models quietly; accept unknown models with a warning; fall through to the generic reject if the catalog itself is unavailable.
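The probe behaviour described in the TL;DR can be sketched roughly as follows. This is a minimal illustration, not the Hermes implementation: `fetch_model_ids`, `http_get`, and the injected `opener` parameter are hypothetical names, and the response shape (`{"data": [{"id": ...}]}`) is assumed to follow the OpenAI-compatible listing format.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Optional


def fetch_model_ids(base_url: str, opener: Callable[[str], bytes]) -> Optional[List[str]]:
    """Return the model IDs listed at <base_url>/models, or None if the probe fails."""
    url = base_url.rstrip("/") + "/models"
    try:
        payload = json.loads(opener(url))
    except (urllib.error.URLError, ValueError):
        # HTTPError (e.g. a 404) is a subclass of URLError;
        # ValueError covers a body that is not valid JSON.
        return None
    return [item["id"] for item in payload.get("data", [])]


def http_get(url: str) -> bytes:
    """Hypothetical default opener: a plain GET with a short timeout."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()
```

Note that a structural 404 and a transient network failure both collapse into `None` here, which is why the caller cannot tell "endpoint has no `/models`" apart from "service is down" without extra provider knowledge.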

Regression window

The "couldn't reach → hard reject" branch was introduced by commit aeb53131f (Apr 13, 2026) to prevent "fake model" typos. This fix re-enables switch-ability for the DashScope coding endpoint without reverting aeb53131f's guard for other providers.

Root cause

hermes_cli/models.py:2155 (pre-fix):

# api_models is None — couldn't reach API.  Accept and persist,
# but warn so typos don't silently break things.

# Bedrock: use our own discovery instead of HTTP /models endpoint.
if normalized == "bedrock":
    ...

# ← No alibaba case here → generic reject fires for DashScope coding

provider_label = _PROVIDER_LABELS.get(normalized, normalized)
return {
    "accepted": False,
    "persist": False,
    "recognized": False,
    "message": (
        f"Could not reach the {provider_label} API to validate `{requested}`. "
        f"If the service isn't down, this model may not be valid."
    ),
}

Fix

if normalized == "alibaba":
    try:
        catalog = provider_model_ids("alibaba")
    except Exception:
        catalog = []
    if catalog:
        if requested in set(catalog) or requested_for_lookup in set(catalog):
            return {"accepted": True, "persist": True, "recognized": True, "message": None}
        suggestions = get_close_matches(requested, catalog, n=3, cutoff=0.4)
        return {
            "accepted": True,
            "persist": True,
            "recognized": False,
            "message": f"Note: `{requested}` was not found in the Alibaba (DashScope) "
                       f"catalog; the coding endpoint doesn't expose `/models`, …",
        }
    # empty catalog or lookup failed → fall through to generic reject
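Because the snippet above is excerpted (the warning message is truncated in the PR description), here is a self-contained sketch of the same three-way decision that can be run in isolation. The function name `validate_without_live_listing`, the `catalog_loader` parameter, the `ALIBABA_CATALOG` list, and the message wording are all assumptions for illustration; only the model IDs and the shape of the result dict come from this PR.

```python
from difflib import get_close_matches
from typing import Callable, Dict, List

# Hypothetical stand-in for the curated catalog; IDs are the ones named in this PR.
ALIBABA_CATALOG = ["qwen3-coder-plus", "kimi-k2.5", "glm-5", "MiniMax-M2.5"]


def validate_without_live_listing(
    requested: str,
    catalog_loader: Callable[[], List[str]],
) -> Dict[str, object]:
    """Decide a model switch when the /models probe returned None."""
    try:
        catalog = catalog_loader()
    except Exception:
        catalog = []  # treat a failing lookup like an empty catalog

    if not catalog:
        # No validation signal at all: keep the generic hard reject.
        return {"accepted": False, "persist": False, "recognized": False,
                "message": f"Could not validate `{requested}`."}

    if requested in catalog:
        # Catalog hit: accept quietly.
        return {"accepted": True, "persist": True, "recognized": True, "message": None}

    # Catalog miss: accept, but warn and offer close matches.
    suggestions = get_close_matches(requested, catalog, n=3, cutoff=0.4)
    message = f"Note: `{requested}` was not found in the catalog."
    if suggestions:
        message += " Similar models: " + ", ".join(suggestions)
    return {"accepted": True, "persist": True, "recognized": False, "message": message}
```

Passing the catalog as a callable keeps the "lookup raised" and "lookup returned an empty list" cases on the same reject path, which is the defensive behaviour the PR pins with two tests.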

Behaviour matrix

| Scenario | Endpoint | api_models | Before | After |
| --- | --- | --- | --- | --- |
| Switch to qwen3-coder-plus on coding | 404s on /models | None | REJECT (bug) | accept (quiet) |
| Switch to glm-5 on coding (third-party via DashScope) | 404s on /models | None | REJECT (bug) | accept (quiet) |
| Switch to unknown model on coding | 404s on /models | None | reject with "couldn't reach" | accept with "Similar models:" suggestions |
| Switch on classic endpoint | supports /models | ["qwen3-…", …] | accept via live list | unchanged |
| Switch on other provider whose API is unreachable | timeout | None | reject with "couldn't reach" | unchanged (narrow-scope guard) |
| Catalog module raises | | None | reject | fall through to reject (defensive) |

Narrow scope — explicitly not changed

  • Classic DashScope endpoint (dashscope-intl.aliyuncs.com/compatible-mode/v1). Still probes the live /models endpoint — pinned by test_live_api_path_unchanged_when_endpoint_supports_models.
  • Other providers' hard-reject behaviour. The if normalized == "alibaba" is deliberately narrow. Pinned by test_other_providers_still_hard_reject_when_api_unreachable (zai canary).
  • The aeb53131f "fake models" guard. Intact for every path that doesn't match alibaba.
  • provider: custom with a DashScope coding base URL. Custom provider has its own validation path via TestCustomProviderBaseUrlSuggestion; out of scope.

Regression coverage

tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint — 8 cases:

  • 4 new-behaviour tests (fail on main):
    • test_known_qwen_model_accepted (reporter's exact repro)
    • test_known_third_party_model_accepted (glm-5, kimi-k2.5, MiniMax-M2.5)
    • test_unknown_model_accepted_with_warning
    • test_unknown_model_includes_suggestions
  • 2 defensive-fallback pins:
    • test_empty_catalog_falls_through_to_generic_reject
    • test_catalog_lookup_exception_falls_through
  • 2 preserved-behaviour canaries (pass on both main and branch):
    • test_live_api_path_unchanged_when_endpoint_supports_models
    • test_other_providers_still_hard_reject_when_api_unreachable

4 of 8 fail on clean origin/main (6fb69229) with assert False is True on result["accepted"] — the exact reporter symptom.

Validation

source venv/bin/activate
python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint -q
# 8 passed

Broader model-switch / normalize suites:

python -m pytest \
  tests/hermes_cli/test_model_validation.py \
  tests/hermes_cli/test_model_switch_custom_providers.py \
  tests/hermes_cli/test_model_switch_variant_tags.py \
  tests/hermes_cli/test_user_providers_model_switch.py \
  tests/hermes_cli/test_model_normalize.py -q
# 147 passed, 0 failures

Pre-empted review questions

Q. Why not always fall back to the catalog for every provider?
Because aeb53131f was deliberate — the "fake model" guard was added to catch user typos that would otherwise silently burn API quota on nonexistent models. The fix is narrowed to alibaba because the coding endpoint structurally doesn't expose /models (it's not a transient network error), so the catalog is the only validation signal available. Other providers should still fail loudly when their API is genuinely unreachable.

Q. What about users on the classic DashScope endpoint?
They follow the live /models path at models.py:2114, which runs before my fallback. No change. Pinned by a dedicated test.

Q. What if the catalog drifts out of sync with what DashScope actually serves?
That's already the tradeoff of using any static catalog — same for bedrock, openai-codex, nous, copilot. Unknown models are still accepted (with a warning) so users who want to use newer models than the catalog knows about still get the switch. They just see "not found in catalog" as a heads-up.

Q. Auto-correction for typos (qwen3-coder-plu → qwen3-coder-plus)?
Not in this PR — matches Bedrock's current behaviour (suggest-only, don't auto-correct on the fallback path). Auto-correction is only done when /models returns a list. Scope creep otherwise.
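To illustrate why suggest-only is workable for a typo like the one in the question, the sketch below shows how `difflib` scores it against a small catalog. The catalog list is a hypothetical stand-in built from the model IDs named in this PR; the `cutoff=0.4` and `n=3` values match the fix snippet.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical catalog; the real one comes from provider_model_ids("alibaba").
catalog = ["qwen3-coder-plus", "kimi-k2.5", "glm-5", "MiniMax-M2.5"]
typo = "qwen3-coder-plu"

# Similarity ratio between the typo and the intended model ID (0.0 to 1.0).
score = SequenceMatcher(None, typo, "qwen3-coder-plus").ratio()

# Candidates scoring at or above the cutoff, best match first.
suggestions = get_close_matches(typo, catalog, n=3, cutoff=0.4)

print(f"{score:.2f}", suggestions)
```

A one-character typo scores far above the 0.4 cutoff, so the intended ID shows up first in the warning even though the switch itself is not auto-corrected.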


Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.

…en /models 404s (NousResearch#12272)

``validate_requested_model`` probes the provider's ``/models`` endpoint
to verify that a requested model exists.  The Alibaba / DashScope
**coding** endpoint
(``https://coding.dashscope.aliyuncs.com/v1``) does not expose
``/models`` — the request returns **HTTP 404** and
``fetch_api_models`` yields ``None``.

Before this change, the "couldn't reach API" branch at
``hermes_cli/models.py:2195`` hard-rejected every ``/model`` switch
for DashScope coding users (``qwen3-coder-plus``, ``kimi-k2.5``,
``glm-5``, ``MiniMax-M2.5``, …), even for IDs Hermes statically knows
about via ``_PROVIDER_MODELS["alibaba"]``.  Reporter: NousResearch#12272.

Regression window — the "couldn't reach → hard reject" branch was
introduced by ``aeb53131f`` (Apr 13, 2026); pre-``aeb53131f`` the
fallback silently accepted and persisted.  The new fix reintroduces
switch-ability for DashScope coding without reverting ``aeb53131f``'s
"fake model" guard for other providers.

Fix
---
Mirror the existing Bedrock pattern (``models.py:2161``): when
``api_models is None`` AND ``normalized == "alibaba"``, fall back to
the curated ``provider_model_ids("alibaba")`` catalog.

* Catalog hit → ``{accepted: True, recognized: True}`` (no warning).
* Catalog miss → ``{accepted: True, recognized: False}`` with a clear
  warning naming the coding endpoint + close-match suggestions.
* Empty catalog (``provider_model_ids`` returned ``[]``) or catalog
  lookup exception → fall through to the existing generic reject so
  we don't silently accept every unknown model.

Narrow scope — explicitly not changed
-------------------------------------
* **Classic DashScope endpoint**
  (``dashscope-intl.aliyuncs.com/compatible-mode/v1``) — still validates
  against the live ``/models`` listing, unchanged.  Pinned by a
  canary test.
* **Other providers** — the alibaba-only ``if`` means no other
  provider's hard-reject behaviour changes.  Pinned by a ``zai``
  canary test.
* **The ``aeb53131f`` "fake models" guard** — intact for every path
  that doesn't match ``alibaba``.
* **``provider: custom`` with a DashScope coding base URL** — out of
  scope; custom provider has its own validation path via
  ``TestCustomProviderBaseUrlSuggestion``.

Regression coverage
-------------------
``tests/hermes_cli/test_model_validation.py`` gets a new
``TestValidateAlibabaCodingEndpoint`` class with 8 cases:

* 4 new-behaviour tests (fail on main):
  - ``test_known_qwen_model_accepted`` (reporter's repro)
  - ``test_known_third_party_model_accepted`` (glm-5 / kimi-k2.5 /
    MiniMax-M2.5 all flow through)
  - ``test_unknown_model_accepted_with_warning``
  - ``test_unknown_model_includes_suggestions``
* 2 defensive-fallback pins:
  - ``test_empty_catalog_falls_through_to_generic_reject``
  - ``test_catalog_lookup_exception_falls_through``
* 2 preserved-behaviour canaries (pass on main and branch):
  - ``test_live_api_path_unchanged_when_endpoint_supports_models``
  - ``test_other_providers_still_hard_reject_when_api_unreachable``

4 of the 8 fail on clean ``origin/main`` (``6fb69229``) with
``assert False is True`` on ``result["accepted"]`` — the exact reporter
symptom.  The remaining 4 pin preserved behaviour.

Validation
----------
``source venv/bin/activate && python -m pytest
tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint
-q`` → **8 passed**.

Broader model-switch / normalize suites
(``test_model_validation.py``, ``test_model_switch_custom_providers.py``,
``test_model_switch_variant_tags.py``,
``test_user_providers_model_switch.py``, ``test_model_normalize.py``) →
**147 passed, 0 failures.**

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 18, 2026 20:59

Copilot AI left a comment

Pull request overview

Fixes model switching for Alibaba/DashScope “coding” base URLs where /models returns 404 by falling back to Hermes’ curated Alibaba model catalog when live model probing is unavailable.

Changes:

  • Add an alibaba-specific fallback in validate_requested_model() when /models cannot be fetched, using provider_model_ids("alibaba").
  • Accept known catalog models silently; accept unknown models with a warning + “Similar models” suggestions; retain generic hard-reject when the catalog is unavailable/empty.
  • Add a dedicated regression test suite covering the coding-endpoint behavior and canary tests to ensure other providers remain unchanged.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| hermes_cli/models.py | Adds Alibaba-specific catalog fallback when /models probing yields None, matching the intended behavior for DashScope coding endpoints. |
| tests/hermes_cli/test_model_validation.py | Adds regression and canary tests validating the new Alibaba fallback behavior and preserving existing behavior for other providers/endpoints. |

@briandevans

CI audit — all failures are pre-existing baselines:

build-and-push FAILURE — Docker entrypoint smoke test:

mkdir: cannot create directory '/opt/data/cron': Permission denied

Standing infra issue (container UID vs mounted volume ownership). Reproduces on origin/main builds.

test FAILURE — every failing test is in the standing baseline set I've verified against clean origin/main (6fb69229):

  • test_browser_camofox_state.py::test_config_version_matches_current_schema (assert 19 == 18)
  • test_web_server.py::test_no_single_field_categories (schema drift)
  • test_concurrent_interrupt.py (_Stub missing attr)
  • test_run_agent.py::test_tool_call_accumulation ('search' == 'web_search')
  • test_send_message_tool.py (pytest-xdist ordering flakes, pass under -n auto on main)
  • test_skills_tool.py::TestSkillView* (xdist fixture ordering)
  • test_tts_mistral.py::test_telegram_produces_ogg_and_voice_compatible

Zero in hermes_cli/models.py or tests/hermes_cli/test_model_validation.py.

Green: check-attribution, e2e, nix (ubuntu-latest), nix (macos-latest), supply-chain scan.

Focused suite on branch: pytest tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint -q → 8 passed.
Broader model suite (5 files): 147 passed, 0 failures.

@Artem151193

Automated review check: BLOCKED.

Required CI is failing: build-and-push and test both failed. The remaining checks passed, but this is still a merge blocker.

@briandevans

@Artem151193 Thanks for the sweep — the two red checks are pre-existing baselines on origin/main (6fb69229), not introduced by this PR. Evidence:

build-and-push: Docker entrypoint permission errors (mkdir: cannot create directory '/opt/data/cron': Permission denied). Container-UID vs mounted-volume-ownership issue — reproduces on main builds and on every other open PR right now (see #12266, #12284, #12170, #12165, #12076, #12053, #11992).

test: 12 failures, every single one in the standing baseline set I verified against clean origin/main earlier in this session — see the baseline audit I posted above for the per-test classification. None touch hermes_cli/models.py or tests/hermes_cli/test_model_validation.py.

Focused suite on this PR's added class → 8 passed. Broader model/switch/normalize suite (5 files) → 147 passed, 0 failures.

Happy to open a separate PR addressing the most tractable baseline test failures (e.g. the _Stub missing-attribute issue in test_concurrent_interrupt.py) if that would unblock the merge process — just let me know.

briandevans added a commit to briandevans/hermes-agent that referenced this pull request Apr 19, 2026
…Research#12532)

The gateway ``/model`` picker calls ``validate_requested_model``, which
probes the provider's ``/models`` endpoint.  Two distinct failures drop
Gemini and Anthropic models from that flow:

* **Gemini**: the OpenAI-compat endpoint at
  ``generativelanguage.googleapis.com/v1beta/openai/models`` returns IDs
  prefixed with ``models/`` (e.g. ``models/gemini-2.5-flash``) — native
  Gemini-API convention.  Our curated list and user input use the bare
  ID, so the set-membership check drops every known Gemini model.
* **Anthropic**: the generic ``probe_api_models`` helper sends
  ``Authorization: Bearer`` without the ``anthropic-version`` header, so
  Anthropic's native ``/v1/models`` returns 4xx and ``fetch_api_models``
  yields ``None``.  The request lands in the generic "could not reach
  API" hard-reject, even though ``_fetch_anthropic_models`` (with the
  correct ``x-api-key`` + ``anthropic-version`` headers) works
  elsewhere in the codebase.

Both paths cause the gateway picker to fail while ``hermes model``
(which skips validation) works fine with the same credentials.
Reporter: NousResearch#12532.
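The Anthropic header mismatch described above can be sketched as two header builders. The function names are hypothetical, and the default `anthropic-version` value is an assumption used only for illustration; the commit message itself names only the `x-api-key` and `anthropic-version` header keys.

```python
from typing import Dict


def bearer_headers(api_key: str) -> Dict[str, str]:
    """Headers a generic OpenAI-style /models probe would send."""
    return {"Authorization": f"Bearer {api_key}"}


def anthropic_headers(api_key: str, version: str = "2023-06-01") -> Dict[str, str]:
    """Headers for Anthropic's native API; the default version value is an assumption."""
    return {"x-api-key": api_key, "anthropic-version": version}


generic = bearer_headers("sk-example")
native = anthropic_headers("sk-example")
# The generic probe carries neither of the two headers the native API expects.
missing = sorted(set(native) - set(generic))
print(missing)
```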

Fix
---
Two surgical additions to ``hermes_cli.models.validate_requested_model``:

1. Strip the ``models/`` prefix from the probed listing when
   ``normalized == "gemini"``, before the set-membership check.  The
   rest of the strict branch (auto-correction, suggestions, reject
   path) reuses the normalized list — suggestions therefore surface
   bare IDs the user can actually type.
2. When ``api_models is None`` and ``normalized == "anthropic"``, fall
   back to ``provider_model_ids("anthropic")``.  That helper internally
   uses ``_fetch_anthropic_models`` with the correct headers and falls
   back to the curated static list when the live fetch also fails —
   identical pattern to the existing Bedrock (``#bedrock``) and Alibaba
   (NousResearch#12272 / PR NousResearch#12287) fall-throughs.
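Step 1 above (the Gemini prefix strip) might look roughly like this. It is a sketch under stated assumptions: `strip_models_prefix` and `GEMINI_PREFIX` are hypothetical names, and the real change is gated on `normalized == "gemini"` inside `validate_requested_model`.

```python
from typing import Iterable, List

GEMINI_PREFIX = "models/"  # prefix described in the commit message above


def strip_models_prefix(ids: Iterable[str]) -> List[str]:
    """Remove a leading 'models/' from each ID; other IDs pass through unchanged."""
    return [i[len(GEMINI_PREFIX):] if i.startswith(GEMINI_PREFIX) else i for i in ids]


probed = ["models/gemini-2.5-flash", "already-bare-id"]
normalized_ids = strip_models_prefix(probed)

# The bare ID a user would type now passes the set-membership check.
print("gemini-2.5-flash" in set(normalized_ids))
```

Normalizing the probed list once, before the membership check, means any later suggestions are also built from bare IDs.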

Narrow scope — explicitly not changed
-------------------------------------
* **``probe_api_models`` auth headers.**  Still ``Bearer``-only.
  Adding Anthropic-specific headers to the generic probe is out of
  scope; the catalog fall-through is the less invasive fix and keeps
  the generic probe provider-agnostic.
* **Other providers whose /models returns prefixed IDs.**  The strip
  is gated on ``normalized == "gemini"`` so no other provider's
  behaviour changes.  Pinned by a ``custom`` canary test.
* **Other providers' hard-reject on unreachable API.**  Still reject.
  Pinned by a ``zai`` canary test.
* **Reporter's Option A** (skip validation entirely when the model was
  chosen from a curated picker).  That's a gateway-side refactor
  (``_on_model_selected`` → ``switch_model``); this PR keeps the fix at
  the validator layer, which also covers CLI direct invocations and
  future callers.

Regression coverage
-------------------
``tests/hermes_cli/test_model_validation.py`` gets two new classes:

* ``TestValidateGeminiModelsPrefix`` (4 cases) — bare ID acceptance,
  all curated Gemini IDs resolve, unknown IDs surface suggestions that
  don't leak the ``models/`` prefix, and a canary pinning that the
  strip is gated on gemini.
* ``TestValidateAnthropicNoModelsEndpoint`` (7 cases) — curated Claude
  model accepted, all three tiers (opus/sonnet/haiku) resolve, unknown
  models accepted with warning, empty-catalog + exception fall through
  to the original generic reject, close-match suggestions surface on
  typos, and a ``zai`` canary preserving the generic reject for other
  providers.

6 of the 11 fail on clean ``origin/main`` (``6fb69229``) with
``assert False is True`` on ``result["accepted"]`` — the exact
reporter symptom.  The 5 remaining tests pin preserved behaviour
(canaries + defensive fall-throughs).

Validation
----------
``source venv/bin/activate && python -m pytest
tests/hermes_cli/test_model_validation.py::TestValidateGeminiModelsPrefix
tests/hermes_cli/test_model_validation.py::TestValidateAnthropicNoModelsEndpoint
-q`` → **11 passed**.

Broader model-switch / normalize suites (6 files) →
**159 passed, 0 failures**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alt-glitch added the type/bug, P2, comp/cli, and provider/qwen labels on Apr 23, 2026
@teknium1

Thanks for the thorough write-up and test coverage, @briandevans — the root cause analysis and behaviour matrix were spot on.

This is an automated hermes-sweeper review.

The underlying bug (#12272) has been independently fixed on main by a broader patch that landed three days after this PR was filed:

  • Commit 3f72b2fe, "fix(/model): accept provider switches when /models is unreachable" (Apr 21, 2026), added a generic provider_model_ids() catalog fallback block for all providers when api_models is None (hermes_cli/models.py:2905). This is a superset of the narrow if normalized == "alibaba": guard proposed here, so alibaba falls through to it naturally.
  • The _PROVIDER_MODELS["alibaba"] catalog at models.py:348 already contains qwen3-coder-plus, glm-5, MiniMax-M2.5, and kimi-k2.5 — the exact models the reporter was blocked on.
  • The same commit also updated tests/hermes_cli/test_model_validation.py to flip the "API unreachable" assertions from accepted=False to accepted=True, covering the same behaviour your 8 tests assert.

Closing as implemented on main. The test structure and edge-case analysis you contributed here (especially the defensive fallback pins and the behaviour matrix) visibly informed how the broader fix was framed.

@teknium1 closed this on Apr 27, 2026
Labels

  • comp/cli: CLI entry point, hermes_cli/, setup wizard
  • P2: Medium — degraded but workaround exists
  • provider/qwen: Qwen / Alibaba Cloud (OAuth)
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

fix(models): /model fails for Alibaba/DashScope coding endpoint — /models returns 404

5 participants