fix(models): validate Alibaba coding-endpoint switches via catalog when /models 404s (#12272) by briandevans · Pull Request #12287 · NousResearch/hermes-agent

briandevans · 2026-04-18T20:59:39Z

TL;DR

validate_requested_model probes the provider's /models endpoint to confirm a requested model exists. The DashScope coding endpoint (https://coding.dashscope.aliyuncs.com/v1) does not expose /models — it returns HTTP 404, so fetch_api_models yields None.

Before this fix, the "couldn't reach API" branch hard-rejected every /model switch for DashScope coding users — including models Hermes statically knows about (qwen3-coder-plus, kimi-k2.5, glm-5, MiniMax-M2.5).

Fix: mirror the existing Bedrock pattern — when api_models is None and normalized == "alibaba", fall back to the curated provider_model_ids("alibaba") catalog. Accept known models quietly; accept unknown models with a warning; fall through to the generic reject if the catalog itself is unavailable.

Regression window

The "couldn't reach → hard reject" branch was introduced by commit aeb53131f (Apr 13, 2026) to prevent "fake model" typos. This fix re-enables switch-ability for the DashScope coding endpoint without reverting aeb53131f's guard for other providers.

Root cause

hermes_cli/models.py:2155 (pre-fix):

# api_models is None — couldn't reach API.  Accept and persist,
# but warn so typos don't silently break things.

# Bedrock: use our own discovery instead of HTTP /models endpoint.
if normalized == "bedrock":
    ...

# ← No alibaba case here → generic reject fires for DashScope coding

provider_label = _PROVIDER_LABELS.get(normalized, normalized)
return {
    "accepted": False,
    "persist": False,
    "recognized": False,
    "message": (
        f"Could not reach the {provider_label} API to validate `{requested}`. "
        f"If the service isn't down, this model may not be valid."
    ),
}
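Why a 404 lands in the same branch as an outage: if a probe swallows every HTTP and network error and returns `None`, the validator cannot tell "endpoint not exposed" apart from "service down". Below is a minimal sketch under that assumption. `fetch_model_ids` and its injectable `opener` parameter are hypothetical; this is not Hermes's `fetch_api_models`. The `{"data": [{"id": ...}]}` response shape is the OpenAI-compatible model-list format.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Optional


def fetch_model_ids(
    base_url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 5.0,
) -> Optional[List[str]]:
    """Hypothetical probe: GET {base_url}/models and return the model IDs, or None.

    Any failure maps to None, so "endpoint not exposed" (404) and
    "service unreachable" (timeout, DNS error) look identical to the caller.
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/models")
    try:
        with opener(request, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTPError (e.g. a 404) subclasses URLError; ValueError covers bad JSON.
        return None
    # OpenAI-compatible listing: {"data": [{"id": "..."}, ...]}
    return [item["id"] for item in payload.get("data", [])]
```

The `opener` parameter exists only so the behaviour can be exercised offline with a fake that raises a 404 or a timeout.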

Fix

if normalized == "alibaba":
    try:
        catalog = provider_model_ids("alibaba")
    except Exception:
        catalog = []
    if catalog:
        if requested in set(catalog) or requested_for_lookup in set(catalog):
            return {"accepted": True, "persist": True, "recognized": True, "message": None}
        suggestions = get_close_matches(requested, catalog, n=3, cutoff=0.4)
        return {
            "accepted": True,
            "persist": True,
            "recognized": False,
            "message": f"Note: `{requested}` was not found in the Alibaba (DashScope) "
                       f"catalog; the coding endpoint doesn't expose `/models`, …",
        }
    # empty catalog or lookup failed → fall through to generic reject
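The branch above can also be exercised in isolation with a stand-in catalog. This is a minimal sketch, assuming the catalog lookup is injected as a callable. `validate_offline`, `ALIBABA_CATALOG` and the `catalog_lookup` parameter are hypothetical, and the catalog holds only the four models this PR names. The `requested_for_lookup` alias check is omitted because its derivation isn't shown here. A return value of `None` stands for "fall through to the generic reject".

```python
from difflib import get_close_matches
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for provider_model_ids("alibaba"); illustrative only.
ALIBABA_CATALOG = ["qwen3-coder-plus", "kimi-k2.5", "glm-5", "MiniMax-M2.5"]


def validate_offline(
    requested: str,
    normalized: str,
    catalog_lookup: Callable[[str], List[str]],
) -> Optional[Dict[str, object]]:
    """Catalog fallback for when /models could not be fetched.

    Returns a result dict, or None meaning "fall through to the generic reject".
    """
    if normalized != "alibaba":
        return None  # narrow scope: other providers keep the hard reject
    try:
        catalog = catalog_lookup("alibaba")
    except Exception:
        catalog = []  # a failing lookup is treated like an empty catalog
    if not catalog:
        return None
    if requested in set(catalog):
        # Known model: accept quietly.
        return {"accepted": True, "persist": True, "recognized": True, "message": None}
    # Unknown model: accept, but warn and offer close matches.
    suggestions = get_close_matches(requested, catalog, n=3, cutoff=0.4)
    message = f"Note: `{requested}` was not found in the Alibaba (DashScope) catalog."
    if suggestions:
        message += " Similar models: " + ", ".join(suggestions)
    return {"accepted": True, "persist": True, "recognized": False, "message": message}
```

Returning `None` instead of building a reject dict keeps the sketch within the PR's scope: the generic reject, and the `aeb53131f` guard behind it, remain the caller's responsibility.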

Behaviour matrix

Scenario	Endpoint	`api_models`	Before	After
Switch to `qwen3-coder-plus` on coding	404s on `/models`	`None`	REJECT (bug)	accept (quiet)
Switch to `glm-5` on coding (third-party via DashScope)	404s on `/models`	`None`	REJECT (bug)	accept (quiet)
Switch to unknown model on coding	404s on `/models`	`None`	reject with "couldn't reach"	accept with `Similar models:` suggestions
Switch on classic endpoint	supports `/models`	`["qwen3-…", …]`	accept via live list	unchanged
Switch on other provider whose API is unreachable	timeout	`None`	reject with "couldn't reach"	unchanged (narrow-scope guard)
Catalog module raises	—	`None`	reject	fall through to reject (defensive)

Narrow scope — explicitly not changed

- Classic DashScope endpoint (dashscope-intl.aliyuncs.com/compatible-mode/v1). Still probes the live /models endpoint; pinned by test_live_api_path_unchanged_when_endpoint_supports_models.
- Other providers' hard-reject behaviour. The if normalized == "alibaba" guard is deliberately narrow; pinned by test_other_providers_still_hard_reject_when_api_unreachable (zai canary).
- The aeb53131f "fake models" guard. Intact for every path that doesn't match alibaba.
- provider: custom with a DashScope coding base URL. Out of scope: the custom provider has its own validation path, covered by TestCustomProviderBaseUrlSuggestion.

Regression coverage

tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint — 8 cases:

4 new-behaviour tests (fail on main):
- test_known_qwen_model_accepted (reporter's exact repro)
- test_known_third_party_model_accepted (glm-5, kimi-k2.5, MiniMax-M2.5)
- test_unknown_model_accepted_with_warning
- test_unknown_model_includes_suggestions
2 defensive-fallback pins:
- test_empty_catalog_falls_through_to_generic_reject
- test_catalog_lookup_exception_falls_through
2 preserved-behaviour canaries (pass on both main and branch):
- test_live_api_path_unchanged_when_endpoint_supports_models
- test_other_providers_still_hard_reject_when_api_unreachable

4 of 8 fail on clean origin/main (6fb69229) with assert False is True on result["accepted"] — the exact reporter symptom.

Validation

source venv/bin/activate
python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint -q
# 8 passed

Broader model-switch / normalize suites:

python -m pytest \
  tests/hermes_cli/test_model_validation.py \
  tests/hermes_cli/test_model_switch_custom_providers.py \
  tests/hermes_cli/test_model_switch_variant_tags.py \
  tests/hermes_cli/test_user_providers_model_switch.py \
  tests/hermes_cli/test_model_normalize.py -q
# 147 passed, 0 failures

Pre-empted review questions

Q. Why not always fall back to the catalog for every provider?
Because aeb53131f was deliberate — the "fake model" guard was added to catch user typos that would otherwise silently burn API quota on nonexistent models. The fix is narrowed to alibaba because the coding endpoint structurally doesn't expose /models (it's not a transient network error), so the catalog is the only validation signal available. Other providers should still fail loudly when their API is genuinely unreachable.

Q. What about users on the classic DashScope endpoint?
They follow the live /models path at models.py:2114, which runs before my fallback. No change. Pinned by a dedicated test.

Q. What if the catalog drifts out of sync with what DashScope actually serves?
That's already the tradeoff of using any static catalog — same for bedrock, openai-codex, nous, copilot. Unknown models are still accepted (with a warning) so users who want to use newer models than the catalog knows about still get the switch. They just see "not found in catalog" as a heads-up.

Q. Auto-correction for typos (qwen3-coder-plu → qwen3-coder-plus)?
Not in this PR — matches Bedrock's current behaviour (suggest-only, don't auto-correct on the fallback path). Auto-correction only happens when /models returns a list; adding it to the fallback path would be scope creep.
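To make the suggest-only vs auto-correct distinction concrete: `difflib.get_close_matches` (already used by the fix) ranks candidates by similarity ratio, so a one-character typo surfaces the intended ID first. A hypothetical auto-correct policy would silently substitute that top match, while the fallback path stops at suggesting it. The function names, the catalog contents, and the 0.8 auto-correct cutoff below are illustrative assumptions, not Hermes code.

```python
from difflib import get_close_matches
from typing import List, Tuple

# Illustrative catalog only; not Hermes's real _PROVIDER_MODELS["alibaba"] list.
CATALOG = ["qwen3-coder-plus", "kimi-k2.5", "glm-5", "MiniMax-M2.5"]


def suggest_only(requested: str, catalog: List[str]) -> Tuple[str, List[str]]:
    """Fallback-path behaviour: keep the ID as typed, return close matches for the warning."""
    return requested, get_close_matches(requested, catalog, n=3, cutoff=0.4)


def auto_correct(requested: str, catalog: List[str], cutoff: float = 0.8) -> str:
    """Hypothetical alternative (not in this PR): silently swap in the best match."""
    matches = get_close_matches(requested, catalog, n=1, cutoff=cutoff)
    return matches[0] if matches else requested


# A one-character typo: suggest_only keeps it, auto_correct would replace it.
kept, hints = suggest_only("qwen3-coder-plu", CATALOG)
corrected = auto_correct("qwen3-coder-plu", CATALOG)
```

Suggest-only leaves the final choice with the user, which is the conservative option when the catalog itself may lag behind what the endpoint actually serves.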

Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.

…en /models 404s (NousResearch#12272)

``validate_requested_model`` probes the provider's ``/models`` endpoint to verify that a requested model exists. The Alibaba / DashScope **coding** endpoint (``https://coding.dashscope.aliyuncs.com/v1``) does not expose ``/models`` — the request returns **HTTP 404** and ``fetch_api_models`` yields ``None``.

Before this change, the "couldn't reach API" branch at ``hermes_cli/models.py:2195`` hard-rejected every ``/model`` switch for DashScope coding users (``qwen3-coder-plus``, ``kimi-k2.5``, ``glm-5``, ``MiniMax-M2.5``, …), even for IDs Hermes statically knows about via ``_PROVIDER_MODELS["alibaba"]``.

Reporter: NousResearch#12272.

Regression window — the "couldn't reach → hard reject" branch was introduced by ``aeb53131f`` (Apr 13, 2026); pre-``aeb53131f`` the fallback silently accepted and persisted. The new fix reintroduces switch-ability for DashScope coding without reverting ``aeb53131f``'s "fake model" guard for other providers.

Fix
---

Mirror the existing Bedrock pattern (``models.py:2161``): when ``api_models is None`` AND ``normalized == "alibaba"``, fall back to the curated ``provider_model_ids("alibaba")`` catalog.

* Catalog hit → ``{accepted: True, recognized: True}`` (no warning).
* Catalog miss → ``{accepted: True, recognized: False}`` with a clear warning naming the coding endpoint + close-match suggestions.
* Empty catalog (``provider_model_ids`` returned ``[]``) or catalog lookup exception → fall through to the existing generic reject so we don't silently accept every unknown model.

Narrow scope — explicitly not changed
-------------------------------------

* **Classic DashScope endpoint** (``dashscope-intl.aliyuncs.com/compatible-mode/v1``) — still validates against the live ``/models`` listing, unchanged. Pinned by a canary test.
* **Other providers** — the alibaba-only ``if`` means no other provider's hard-reject behaviour changes. Pinned by a ``zai`` canary test.
* **The ``aeb53131f`` "fake models" guard** — intact for every path that doesn't match ``alibaba``.
* **``provider: custom`` with a DashScope coding base URL** — out of scope; custom provider has its own validation path via ``TestCustomProviderBaseUrlSuggestion``.

Regression coverage
-------------------

``tests/hermes_cli/test_model_validation.py`` gets a new ``TestValidateAlibabaCodingEndpoint`` class with 8 cases:

* 4 new-behaviour tests (fail on main):
  - ``test_known_qwen_model_accepted`` (reporter's repro)
  - ``test_known_third_party_model_accepted`` (glm-5 / kimi-k2.5 / MiniMax-M2.5 all flow through)
  - ``test_unknown_model_accepted_with_warning``
  - ``test_unknown_model_includes_suggestions``
* 2 defensive-fallback pins:
  - ``test_empty_catalog_falls_through_to_generic_reject``
  - ``test_catalog_lookup_exception_falls_through``
* 2 preserved-behaviour canaries (pass on main and branch):
  - ``test_live_api_path_unchanged_when_endpoint_supports_models``
  - ``test_other_providers_still_hard_reject_when_api_unreachable``

4 of the 8 fail on clean ``origin/main`` (``6fb69229``) with ``assert False is True`` on ``result["accepted"]`` — the exact reporter symptom. The remaining 4 pin preserved behaviour.

Validation
----------

``source venv/bin/activate && python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint -q`` → **8 passed**.

Broader model-switch / normalize suites (``test_model_validation.py``, ``test_model_switch_custom_providers.py``, ``test_model_switch_variant_tags.py``, ``test_user_providers_model_switch.py``, ``test_model_normalize.py``) → **147 passed, 0 failures.**

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Fixes model switching for Alibaba/DashScope “coding” base URLs where /models returns 404 by falling back to Hermes’ curated Alibaba model catalog when live model probing is unavailable.

Changes:

Add an alibaba-specific fallback in validate_requested_model() when /models cannot be fetched, using provider_model_ids("alibaba").
Accept known catalog models silently; accept unknown models with a warning + “Similar models” suggestions; retain generic hard-reject when the catalog is unavailable/empty.
Add a dedicated regression test suite covering the coding-endpoint behavior and canary tests to ensure other providers remain unchanged.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`hermes_cli/models.py`	Adds Alibaba-specific catalog fallback when `/models` probing yields `None`, matching the intended behavior for DashScope coding endpoints.
`tests/hermes_cli/test_model_validation.py`	Adds regression and canary tests validating the new Alibaba fallback behavior and preserving existing behavior for other providers/endpoints.


briandevans · 2026-04-18T21:18:11Z

CI audit — all failures are pre-existing baselines:

build-and-push FAILURE — Docker entrypoint smoke test:

mkdir: cannot create directory '/opt/data/cron': Permission denied

Standing infra issue (container UID vs mounted volume ownership). Reproduces on origin/main builds.

test FAILURE — every failing test is in the standing baseline set I've verified against clean origin/main (6fb69229):

test_browser_camofox_state.py::test_config_version_matches_current_schema (assert 19 == 18)
test_web_server.py::test_no_single_field_categories (schema drift)
2× test_concurrent_interrupt.py (_Stub missing attr)
test_run_agent.py::test_tool_call_accumulation ('search' == 'web_search')
4× test_send_message_tool.py (pytest-xdist ordering flakes, pass under -n auto on main)
2× test_skills_tool.py::TestSkillView* (xdist fixture ordering)
test_tts_mistral.py::test_telegram_produces_ogg_and_voice_compatible

None of the failures are in hermes_cli/models.py or tests/hermes_cli/test_model_validation.py.

Green: check-attribution, e2e, nix (ubuntu-latest), nix (macos-latest), supply-chain scan.

Focused suite on branch: pytest tests/hermes_cli/test_model_validation.py::TestValidateAlibabaCodingEndpoint -q → 8 passed.
Broader model suite (5 files): 147 passed, 0 failures.

Artem151193 · 2026-04-19T08:45:57Z

Automated review check: BLOCKED.

Required CI is failing: build-and-push and test both failed. The remaining checks passed, but this is still a merge blocker.

briandevans · 2026-04-19T13:28:14Z

@Artem151193 Thanks for the sweep — the two red checks are pre-existing baselines on origin/main (6fb69229), not introduced by this PR. Evidence:

build-and-push: Docker entrypoint permission errors (mkdir: cannot create directory '/opt/data/cron': Permission denied). Container-UID vs mounted-volume-ownership issue — reproduces on main builds and on every other open PR right now (see #12266, #12284, #12170, #12165, #12076, #12053, #11992).

test: 12 failures, every single one in the standing baseline set I verified against clean origin/main earlier in this session — see the baseline audit I posted above for the per-test classification. None touch hermes_cli/models.py or tests/hermes_cli/test_model_validation.py.

Focused suite on this PR's added class → 8 passed. Broader model/switch/normalize suite (5 files) → 147 passed, 0 failures.

Happy to open a separate PR addressing the most tractable baseline test failures (e.g. the _Stub missing-attribute issue in test_concurrent_interrupt.py) if that would unblock the merge process — just let me know.

…Research#12532)

The gateway ``/model`` picker calls ``validate_requested_model``, which probes the provider's ``/models`` endpoint. Two distinct failures drop Gemini and Anthropic models from that flow:

* **Gemini**: the OpenAI-compat endpoint at ``generativelanguage.googleapis.com/v1beta/openai/models`` returns IDs prefixed with ``models/`` (e.g. ``models/gemini-2.5-flash``) — native Gemini-API convention. Our curated list and user input use the bare ID, so the set-membership check drops every known Gemini model.
* **Anthropic**: the generic ``probe_api_models`` helper sends ``Authorization: Bearer`` without the ``anthropic-version`` header, so Anthropic's native ``/v1/models`` returns 4xx and ``fetch_api_models`` yields ``None``. The request lands in the generic "could not reach API" hard-reject, even though ``_fetch_anthropic_models`` (with the correct ``x-api-key`` + ``anthropic-version`` headers) works elsewhere in the codebase.

Both paths cause the gateway picker to fail while ``hermes model`` (which skips validation) works fine with the same credentials.

Reporter: NousResearch#12532.

Fix
---

Two surgical additions to ``hermes_cli.models.validate_requested_model``:

1. Strip the ``models/`` prefix from the probed listing when ``normalized == "gemini"``, before the set-membership check. The rest of the strict branch (auto-correction, suggestions, reject path) reuses the normalized list — suggestions therefore surface bare IDs the user can actually type.
2. When ``api_models is None`` and ``normalized == "anthropic"``, fall back to ``provider_model_ids("anthropic")``. That helper internally uses ``_fetch_anthropic_models`` with the correct headers and falls back to the curated static list when the live fetch also fails — identical pattern to the existing Bedrock (``#bedrock``) and Alibaba (NousResearch#12272 / PR NousResearch#12287) fall-throughs.
Narrow scope — explicitly not changed
-------------------------------------

* **``probe_api_models`` auth headers.** Still ``Bearer``-only. Adding Anthropic-specific headers to the generic probe is out of scope; the catalog fall-through is the less invasive fix and keeps the generic probe provider-agnostic.
* **Other providers whose /models returns prefixed IDs.** The strip is gated on ``normalized == "gemini"`` so no other provider's behaviour changes. Pinned by a ``custom`` canary test.
* **Other providers' hard-reject on unreachable API.** Still reject. Pinned by a ``zai`` canary test.
* **Reporter's Option A** (skip validation entirely when the model was chosen from a curated picker). That's a gateway-side refactor (``_on_model_selected`` → ``switch_model``); this PR keeps the fix at the validator layer, which also covers CLI direct invocations and future callers.

Regression coverage
-------------------

``tests/hermes_cli/test_model_validation.py`` gets two new classes:

* ``TestValidateGeminiModelsPrefix`` (4 cases) — bare ID acceptance, all curated Gemini IDs resolve, unknown IDs surface suggestions that don't leak the ``models/`` prefix, and a canary pinning that the strip is gated on gemini.
* ``TestValidateAnthropicNoModelsEndpoint`` (7 cases) — curated Claude model accepted, all three tiers (opus/sonnet/haiku) resolve, unknown models accepted with warning, empty-catalog + exception fall through to the original generic reject, close-match suggestions surface on typos, and a ``zai`` canary preserving the generic reject for other providers.

6 of the 11 fail on clean ``origin/main`` (``6fb69229``) with ``assert False is True`` on ``result["accepted"]`` — the exact reporter symptom. The 5 remaining tests pin preserved behaviour (canaries + defensive fall-throughs).
Validation
----------

``source venv/bin/activate && python -m pytest tests/hermes_cli/test_model_validation.py::TestValidateGeminiModelsPrefix tests/hermes_cli/test_model_validation.py::TestValidateAnthropicNoModelsEndpoint -q`` → **11 passed**.

Broader model-switch / normalize suites (6 files) → **159 passed, 0 failures**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
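The two mechanics in this commit can be sketched in a few lines, assuming the probed listing is a plain list of ID strings. `strip_models_prefix`, `generic_probe_headers` and `anthropic_list_headers` are hypothetical helper names, not functions from `hermes_cli.models`. The `x-api-key` and `anthropic-version` header names come from the commit message. The `2023-06-01` version string is the value Anthropic's API docs use and is an assumption here, not taken from Hermes. `str.removeprefix` needs Python 3.9+.

```python
from typing import Dict, List


def strip_models_prefix(ids: List[str], normalized: str) -> List[str]:
    """Drop the native-Gemini `models/` prefix, gated on the gemini provider only."""
    if normalized != "gemini":
        return ids  # other providers' listings are left untouched
    return [model_id.removeprefix("models/") for model_id in ids]


def generic_probe_headers(api_key: str) -> Dict[str, str]:
    """Bearer-only auth, as the commit describes the generic probe sending it."""
    return {"Authorization": f"Bearer {api_key}"}


def anthropic_list_headers(api_key: str) -> Dict[str, str]:
    """Headers Anthropic's native /v1/models expects instead of Bearer auth.

    The version string is an assumed value, not read from Hermes.
    """
    return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
```

Gating the strip on `normalized == "gemini"` mirrors the commit's narrow-scope choice: a `custom` provider that legitimately returns `models/...` IDs is left alone.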

teknium1 · 2026-04-27T04:39:30Z

Thanks for the thorough write-up and test coverage, @briandevans — the root cause analysis and behaviour matrix were spot on.

This is an automated hermes-sweeper review.

The underlying bug (#12272) has been independently fixed on main by a broader patch that landed three days after this PR was filed:

Commit 3f72b2fe — fix(/model): accept provider switches when /models is unreachable (Apr 21, 2026) added a generic provider_model_ids() catalog fallback block for all providers when api_models is None (hermes_cli/models.py:2905). This is a superset of the narrow if normalized == "alibaba": guard proposed here — alibaba falls through to it naturally.
The _PROVIDER_MODELS["alibaba"] catalog at models.py:348 already contains qwen3-coder-plus, glm-5, MiniMax-M2.5, and kimi-k2.5 — the exact models the reporter was blocked on.
The same commit also updated tests/hermes_cli/test_model_validation.py to flip the "API unreachable" assertions from accepted=False to accepted=True, covering the same behaviour your 8 tests assert.

Closing as implemented on main. The test structure and edge-case analysis you contributed here (especially the defensive fallback pins and the behaviour matrix) visibly informed how the broader fix was framed.

Copilot AI review requested due to automatic review settings April 18, 2026 20:59

Copilot started reviewing on behalf of briandevans April 18, 2026 21:00

Copilot AI reviewed Apr 18, 2026


This was referenced Apr 19, 2026

fix(skill-config): expand ~ against subprocess HOME, not Python process HOME (#12260) #12284

Closed

test(concurrent-interrupt): add missing _apply_pending_steer_to_tool_results no-op to _Stub #12574

Closed

briandevans mentioned this pull request Apr 20, 2026

fix(run_agent): preserve dotted Bedrock inference-profile model IDs (#11976) #11992

Closed


alt-glitch added the labels type/bug (Something isn't working), P2 (Medium — degraded but workaround exists), comp/cli (CLI entry point, hermes_cli/, setup wizard) and provider/qwen (Qwen / Alibaba Cloud (OAuth)) on Apr 23, 2026

This was referenced Apr 23, 2026

fix(models): alibaba coding plan /model validation + provider_label scope bug #12279

Closed

fix(models): accept known DashScope models offline #11967

Closed

Bug: hermes model command fails to validate DashScope provider models #11954

Closed

teknium1 closed this Apr 27, 2026

briandevans wants to merge 1 commit into NousResearch:main from briandevans:fix/alibaba-coding-endpoint-model-validation

5 participants
