fix(/model): respect per-model context_length from custom_providers config#11438
pdscomp wants to merge 3 commits into
Pull request overview
This PR fixes /model switching so per-model context_length overrides defined under custom_providers[].models[].context_length are respected (rather than falling back to probing /models, which can be incomplete for some OpenAI-compatible servers), and propagates the resolved context length into user-facing confirmations.
Changes:
- Resolve per-model `context_length` from `custom_providers` during model switch and propagate it via `ModelSwitchResult.context_length`.
- Reset cached context-length override behavior in `AIAgent.switch_model()` to re-resolve overrides on provider changes.
- Prefer `result.context_length` in CLI and gateway confirmations; discard dispatched slash-commands when restoring modal input snapshot.
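The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names, the `name` keys used to match providers and models, and the `262144` value standing in for "256K" are all assumptions; only the `custom_providers[].models[].context_length` path comes from the PR text.

```python
from typing import Any, Callable, Dict, List, Optional

Provider = Dict[str, Any]


def lookup_per_model_context_length(
    custom_providers: Optional[List[Provider]],
    provider_name: str,
    model_name: str,
) -> Optional[int]:
    """Return custom_providers[].models[].context_length if configured.

    The "name" keys used for matching are hypothetical; the real schema may differ.
    """
    for provider in custom_providers or []:
        if provider.get("name") != provider_name:
            continue
        for model in provider.get("models") or []:
            if model.get("name") != model_name:
                continue
            value = model.get("context_length")
            # Accept only positive integers (bool is excluded explicitly).
            if isinstance(value, int) and not isinstance(value, bool) and value > 0:
                return value
    return None


def resolve_context_length(
    custom_providers: Optional[List[Provider]],
    provider_name: str,
    model_name: str,
    probe: Callable[[str], int],
) -> int:
    """Prefer the per-model override; fall back to probing only when it is absent."""
    override = lookup_per_model_context_length(custom_providers, provider_name, model_name)
    if override is not None:
        return override
    return probe(model_name)


# Hypothetical config mirroring the legion / Qwen3.6 example from the PR.
custom_providers = [
    {"name": "legion", "models": [{"name": "Qwen3.6", "context_length": 262144}]}
]
```

Passing the probe in as a callable keeps the override lookup independent of any network call, which is the property the fix relies on: a configured value is returned before the `/models` endpoint is ever consulted.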
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| run_agent.py | Clears cached context-length override on provider change; attempts per-model override lookup before probing context length. |
| hermes_cli/model_switch.py | Adds context_length to ModelSwitchResult and resolves it from custom_providers config during switch. |
| cli.py | Updates /model confirmation output to prefer result.context_length; avoids restoring dispatched slash-commands into the input buffer. |
| gateway/run.py | Updates Telegram/webhook confirmation output to prefer result.context_length before models.dev/probing fallback. |
…onfig

## Summary
- fix: resolve context_length from per-model custom_providers[].models[].context_length before falling back to the generic get_model_context_length probe chain
- fix: AIAgent.switch_model() clears _config_context_length on provider change so per-model overrides are re-resolved on every /model switch
- fix: ModelSwitchResult carries resolved context_length to all confirmation paths (CLI, gateway Telegram, gateway webhook) with priority over model_info.context_window
- fix: _restore_modal_input_snapshot() discards slash-commands that were dispatched (prevents /model re-appearing in input bar after send)
- fix: run_agent._restore_modal_input_snapshot no longer restores dispatched slash commands

## Root cause
When switching to a custom provider model (e.g. legion / Qwen3.6), the context_length was resolved via get_model_context_length(), which probes the provider's OpenAI-compatible /models endpoint. The R523/R528 llama.cpp server at custom base URLs does not surface the context window in its /models response, so the probe falls back to a default (128k). The config correctly specifies context_length under custom_providers[].models[], but that value was never consulted during /model switching, only at startup.

Additionally, AIAgent._config_context_length was cached on the agent object and never cleared on provider change, so even the startup lookup was stale when switching between custom providers.

## Validation
- /model switch to legion+Qwen3.6 reports correct 256K context in Telegram confirmation
- python -m py_compile on all 4 modified files passes
- Gateway and CLI confirmations both show correct context length for per-model overrides
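To make the root cause concrete, here is a small sketch of how a probe can silently land on the default when a `/models` response omits the context window. The `data` list with `id` entries follows the OpenAI-compatible list shape; the `context_length` field name, the function name, and the exact default of `128_000` are assumptions for illustration and are not taken from `get_model_context_length()`.

```python
from typing import Any, Dict

# The PR mentions "a default (128k)"; the exact integer is assumed here.
DEFAULT_CONTEXT_LENGTH = 128_000


def probe_context_length(
    models_payload: Dict[str, Any],
    model_id: str,
    default: int = DEFAULT_CONTEXT_LENGTH,
) -> int:
    """Read a context window from a /models-style payload, else return the default."""
    for entry in models_payload.get("data", []):
        if entry.get("id") != model_id:
            continue
        value = entry.get("context_length")  # hypothetical field name
        if isinstance(value, int) and value > 0:
            return value
    # The server lists the model but says nothing about its context window.
    return default


# A payload that lists the model without any context information.
payload_without_context = {
    "object": "list",
    "data": [{"id": "Qwen3.6", "object": "model"}],
}
```

With a payload like this the probe has no way to know the real window, so any value set in the config has to be consulted first; otherwise the default wins.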
9b95d14 to d9e68f6
Render max_output, cost, and capabilities even when ModelSwitchResult.context_length is present, using context_length only for the context line override in CLI and gateway confirmations.

Add regression tests for both CLI and gateway confirmation paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
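A rough sketch of the rendering rule in this commit: the resolved `context_length` replaces only the context line, while the other metadata is still printed. The dataclasses below are hypothetical reduced stand-ins (only `context_length`, `context_window`, and `max_output` are named in the PR), and the output wording is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelInfoSketch:
    """Hypothetical subset of the model metadata."""
    context_window: Optional[int] = None
    max_output: Optional[int] = None


@dataclass
class ModelSwitchResultSketch:
    """Hypothetical subset of ModelSwitchResult."""
    model: str
    context_length: Optional[int] = None
    model_info: Optional[ModelInfoSketch] = None


def confirmation_lines(result: ModelSwitchResultSketch) -> List[str]:
    lines = [f"Model: {result.model}"]
    info = result.model_info

    # The per-model override takes priority, but only for the context line.
    context = result.context_length
    if context is None and info is not None:
        context = info.context_window
    if context is not None:
        lines.append(f"Context: {context:,} tokens")

    # Remaining metadata is rendered regardless of where the context came from.
    if info is not None and info.max_output is not None:
        lines.append(f"Max output: {info.max_output:,} tokens")
    return lines
```

Keeping the override and the metadata rendering in separate branches avoids the earlier behavior where the presence of `context_length` suppressed the rest of the confirmation.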
Keep global model.context_length override intact during provider changes and resolve per-model custom provider context_length from switch_model(custom_providers=...) before any config reload fallback.

Add focused regressions for provider-change override persistence and custom_providers argument priority.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
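The behavior this commit describes might look something like the sketch below. Everything here is a simplified stand-in: `AgentSketch`, `reload_config`, and `probe` are hypothetical names, and the ordering of per-model override ahead of the global override is an assumption, since the commit message states only that the `custom_providers` argument is consulted before a config reload and that the global override survives provider changes.

```python
from typing import Any, Callable, Dict, List, Optional

Providers = List[Dict[str, Any]]


def _per_model_override(providers: Optional[Providers], provider: str, model: str) -> Optional[int]:
    # Matching on "name" keys is an assumed schema.
    for entry in providers or []:
        if entry.get("name") != provider:
            continue
        for item in entry.get("models") or []:
            if item.get("name") == model:
                return item.get("context_length")
    return None


class AgentSketch:
    def __init__(self, global_context_length: Optional[int] = None) -> None:
        # Stands in for the global model.context_length override; never cleared here.
        self._config_context_length = global_context_length
        self.context_length = global_context_length

    def switch_model(
        self,
        provider: str,
        model: str,
        custom_providers: Optional[Providers] = None,
        reload_config: Optional[Callable[[], Providers]] = None,
        probe: Callable[[str], int] = lambda _model: 128_000,
    ) -> int:
        # The explicit argument wins; the config is reloaded only when it is missing.
        providers = custom_providers
        if providers is None and reload_config is not None:
            providers = reload_config()

        per_model = _per_model_override(providers, provider, model)
        if per_model is not None:
            self.context_length = per_model
        elif self._config_context_length is not None:
            self.context_length = self._config_context_length
        else:
            self.context_length = probe(model)
        return self.context_length
```

Because `_config_context_length` is read but never reset, switching to a provider without a per-model entry falls back to the global value rather than to the probe.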
Thanks for the thorough investigation and fix, @pdscomp! This has been superseded by a maintainer-authored fix that landed on

This is an automated hermes-sweeper review.

Why closing: PR #15844 (merged 2026-04-26 by @teknium1) resolves the same root cause …
Commit:

Related: #15779 (upstream issue, now closed), #13052 (overlapping fix PR), #15787 (another fix PR for the same issue).