
fix(/model): respect per-model context_length from custom_providers config#11438

Closed
pdscomp wants to merge 3 commits into NousResearch:main from pdscomp:fix/per-model-context-length

pdscomp commented Apr 17, 2026

Summary

  • fix: resolve context_length from per-model custom_providers[].models[].context_length before falling back to the generic get_model_context_length probe chain
  • fix: AIAgent.switch_model() clears _config_context_length on provider change so per-model overrides are re-resolved on every /model switch
  • fix: ModelSwitchResult carries resolved context_length to all confirmation paths (CLI, gateway Telegram, gateway webhook) with priority over model_info.context_window
  • fix: _restore_modal_input_snapshot() discards dispatched slash-commands (prevents /model re-appearing in input bar after send)
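
The last bullet can be pictured with a minimal sketch. The function name and its `dispatched` parameter are hypothetical (this is not the real `_restore_modal_input_snapshot()` signature); it assumes the snapshot is the raw input-bar text and that the caller knows which slash command was just sent.

```python
from typing import Optional


def restore_modal_input_snapshot(snapshot: str, dispatched: Optional[str]) -> str:
    """Return the text to put back into the input bar after a modal closes.

    Hypothetical sketch: if the saved text is exactly the slash command that
    was just dispatched (e.g. "/model"), restoring it would make the command
    reappear after send, so an empty buffer is returned instead.
    """
    text = snapshot.strip()
    if dispatched is not None and text.startswith("/") and text == dispatched.strip():
        return ""  # already dispatched: discard it
    return snapshot  # ordinary draft text is restored unchanged


print(repr(restore_modal_input_snapshot("/model", "/model")))         # ''
print(repr(restore_modal_input_snapshot("draft message", "/model")))  # 'draft message'
```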

Root cause

When switching to a custom provider model (e.g. legion / Qwen3.6), the context_length was resolved via get_model_context_length(), which probes the provider's OpenAI-compatible /models endpoint. The R523/R528 llama.cpp server at custom base URLs does not report the context window in its /models response, so the probe falls back to a default (128k). The config correctly specifies context_length under custom_providers[].models[], but that value was consulted only at startup, never during /model switching.
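
The intended resolution order (per-model config first, generic probe second) can be sketched as below. The config shape is an assumption: each `custom_providers` entry is taken to be a dict with `name` and a `models` list of dicts carrying `name` and an optional `context_length`. `probe` is a hypothetical stand-in for `get_model_context_length()`, and the token counts are illustrative.

```python
from typing import Any, Callable, Mapping, Optional, Sequence


def custom_model_context_length(
    custom_providers: Sequence[Mapping[str, Any]],
    provider_name: str,
    model: str,
) -> Optional[int]:
    """Return custom_providers[].models[].context_length for one model, if set.

    Assumed (hypothetical) shape: providers have "name" and "models"; each
    model entry has "name" and an optional "context_length".
    """
    for provider in custom_providers:
        if provider.get("name") != provider_name:
            continue
        for entry in provider.get("models") or []:
            if entry.get("name") == model and entry.get("context_length"):
                return int(entry["context_length"])
    return None


def resolve_context_length(
    custom_providers: Sequence[Mapping[str, Any]],
    provider_name: str,
    model: str,
    probe: Callable[[str], int],
) -> int:
    """Prefer the per-model config value; probe /models only when it is absent."""
    override = custom_model_context_length(custom_providers, provider_name, model)
    if override is not None:
        return override
    return probe(model)  # generic probe chain, which may fall back to a default


def default_probe(model: str) -> int:
    # Simulates a /models response that carries no context window.
    return 131072


config = [{"name": "legion", "models": [{"name": "Qwen3.6", "context_length": 262144}]}]
print(resolve_context_length(config, "legion", "Qwen3.6", default_probe))      # 262144
print(resolve_context_length(config, "legion", "other-model", default_probe))  # 131072
```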

Additionally, AIAgent._config_context_length was cached on the agent object and never cleared on provider change, so even the startup lookup was stale when switching between custom providers.
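
A minimal sketch of why re-resolving on every switch removes the stale value. `AgentSketch` and its `resolve` callback are hypothetical stand-ins for `AIAgent` and the per-model config lookup; only the `_config_context_length` attribute name comes from this PR. A later commit on this branch keeps the global model.context_length override intact across provider changes; this sketch covers only the per-model value.

```python
from typing import Callable, Dict, Optional, Tuple

Resolver = Callable[[str, str], Optional[int]]


class AgentSketch:
    """Hypothetical stand-in for AIAgent, reduced to context-length caching."""

    def __init__(self, provider: str, model: str, resolve: Resolver) -> None:
        self._resolve = resolve
        self.switch_model(provider, model)

    def switch_model(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model
        # Recompute the per-model override on every switch instead of keeping
        # the value cached for the previous provider (the stale-cache bug).
        self._config_context_length = self._resolve(provider, model)


overrides: Dict[Tuple[str, str], int] = {("legion", "Qwen3.6"): 262144}


def lookup(provider: str, model: str) -> Optional[int]:
    return overrides.get((provider, model))


agent = AgentSketch("other-provider", "other-model", lookup)
print(agent._config_context_length)  # None
agent.switch_model("legion", "Qwen3.6")
print(agent._config_context_length)  # 262144
agent.switch_model("other-provider", "other-model")
print(agent._config_context_length)  # None
```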

Files changed

  • hermes_cli/model_switch.py: Added a context_length field to ModelSwitchResult; look up per-model context_length from custom_providers config before falling back
  • run_agent.py: Clear _config_context_length on provider change; pass per-model override to get_model_context_length
  • cli.py: Prefer result.context_length in CLI confirmation output
  • gateway/run.py: Prefer result.context_length in Telegram and webhook confirmation output
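
The precedence used by the confirmation paths can be sketched as follows. Both dataclasses are trimmed-down, hypothetical versions; only the `context_length` and `context_window` field names come from this PR, and `confirmation_context_length` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelInfo:
    """Hypothetical catalog metadata (e.g. looked up from models.dev)."""
    context_window: Optional[int] = None


@dataclass
class ModelSwitchResult:
    """Sketch with only the fields needed here; the real class has more."""
    provider: str
    model: str
    context_length: Optional[int] = None  # per-model override resolved at switch time


def confirmation_context_length(result: ModelSwitchResult,
                                model_info: Optional[ModelInfo]) -> Optional[int]:
    """Prefer result.context_length, then model_info.context_window."""
    if result.context_length is not None:
        return result.context_length
    if model_info is not None:
        return model_info.context_window
    return None


switched = ModelSwitchResult("legion", "Qwen3.6", context_length=262144)
print(confirmation_context_length(switched, ModelInfo(context_window=131072)))  # 262144
```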

Validation

  • /model switch to legion+Qwen3.6 reports correct 256K context in Telegram confirmation
  • python -m py_compile on all 4 modified files passes
  • Gateway and CLI confirmations both show correct context length for per-model overrides

Copilot AI review requested due to automatic review settings April 17, 2026 06:03

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes /model switching so per-model context_length overrides defined under custom_providers[].models[].context_length are respected (rather than falling back to probing /models, which can be incomplete for some OpenAI-compatible servers), and propagates the resolved context length into user-facing confirmations.

Changes:

  • Resolve per-model context_length from custom_providers during model switch and propagate it via ModelSwitchResult.context_length.
  • Reset cached context-length override behavior in AIAgent.switch_model() to re-resolve overrides on provider changes.
  • Prefer result.context_length in CLI and gateway confirmations; discard dispatched slash-commands when restoring modal input snapshot.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

  • run_agent.py: Clears cached context-length override on provider change; attempts per-model override lookup before probing context length.
  • hermes_cli/model_switch.py: Adds context_length to ModelSwitchResult and resolves it from custom_providers config during switch.
  • cli.py: Updates /model confirmation output to prefer result.context_length; avoids restoring dispatched slash-commands into the input buffer.
  • gateway/run.py: Updates Telegram/webhook confirmation output to prefer result.context_length before models.dev/probing fallback.

Comment thread cli.py Outdated
Comment thread cli.py Outdated
Comment thread run_agent.py Outdated
Comment thread hermes_cli/model_switch.py Outdated
Comment thread hermes_cli/model_switch.py Outdated
Comment thread gateway/run.py Outdated
…onfig

## Summary
- fix: resolve context_length from per-model custom_providers[].models[].context_length
  before falling back to the generic get_model_context_length probe chain
- fix: AIAgent.switch_model() clears _config_context_length on provider change so
  per-model overrides are re-resolved on every /model switch
- fix: ModelSwitchResult carries resolved context_length to all confirmation paths
  (CLI, gateway Telegram, gateway webhook) with priority over model_info.context_window
- fix: _restore_modal_input_snapshot() discards slash-commands that were dispatched
  (prevents /model re-appearing in input bar after send)
- fix: run_agent._restore_modal_input_snapshot no longer restores dispatched slash commands

## Root cause
When switching to a custom provider model (e.g. legion / Qwen3.6), the context_length
was resolved via get_model_context_length() which probes the provider's OpenAI-compatible
/models endpoint. The R523/R528 llama.cpp server at custom base URLs does not surface
context window in its /models response, so the probe falls back to a default (128k).
The config correctly specifies context_length under custom_providers[].models[], but
that value was never consulted during /model switching — only at startup.

Additionally, AIAgent._config_context_length was cached on the agent object and never
cleared on provider change, so even the startup lookup was stale when switching between
custom providers.

## Validation
- /model switch to legion+Qwen3.6 reports correct 256K context in Telegram confirmation
- python -m py_compile on all 4 modified files passes
- Gateway and CLI confirmations both show correct context length for per-model overrides
pdscomp force-pushed the fix/per-model-context-length branch from 9b95d14 to d9e68f6 on April 17, 2026 06:40
pdscomp added 2 commits April 17, 2026 02:45

Render max_output, cost, and capabilities even when ModelSwitchResult.context_length is present, using context_length only for the context line override in CLI and gateway confirmations.

Add regression tests for both CLI and gateway confirmation paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Keep global model.context_length override intact during provider changes and resolve per-model custom provider context_length from switch_model(custom_providers=...) before any config reload fallback.

Add focused regressions for provider-change override persistence and custom_providers argument priority.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
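
The first of these commits changes how the confirmation block is assembled: the override replaces only the context line, and the remaining metadata lines are still rendered. A minimal sketch, assuming model metadata arrives as a plain dict; `format_model_info` and its dict keys are hypothetical, and the numbers are illustrative.

```python
from typing import Any, Dict, List, Optional


def format_model_info(info: Dict[str, Any],
                      context_override: Optional[int] = None) -> List[str]:
    """Build confirmation lines; the override affects only the context line."""
    lines: List[str] = []
    context = context_override if context_override is not None else info.get("context_window")
    if context:
        lines.append(f"Context: {context:,} tokens")
    # These lines are rendered whether or not an override was supplied.
    if info.get("max_output"):
        lines.append(f"Max output: {info['max_output']:,} tokens")
    if info.get("cost"):
        lines.append(f"Cost: {info['cost']}")
    if info.get("capabilities"):
        lines.append("Capabilities: " + ", ".join(info["capabilities"]))
    return lines


info = {"context_window": 131072, "max_output": 8192, "capabilities": ["tools", "vision"]}
for line in format_model_info(info, context_override=262144):
    print(line)
```

Keeping the override as a single substituted value, rather than a separate early-return branch, is what prevents the override from hiding the other lines.
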
alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/cli (CLI entry point, hermes_cli/, setup wizard), comp/gateway (Gateway runner, session dispatch, delivery), and comp/agent (Core agent loop, run_agent.py, prompt builder) on Apr 24, 2026
alt-glitch (Collaborator)

Overlaps significantly with #13052 — both fix custom_providers per-model context_length being ignored during /model switch. Also related to #12316 and #12380.

teknium1 (Contributor)

Thanks for the thorough investigation and fix, @pdscomp! This has been superseded by a maintainer-authored fix that landed on main.


This is an automated hermes-sweeper review.

Why closing: PR #15844 (merged 2026-04-26 by @teknium1) resolves the same root cause — custom_providers per-model context_length ignored on /model switch — and covers every fix surface this PR addresses:

  • hermes_cli/config.py line 2245: new get_custom_provider_context_length() helper (single source of truth for the per-model lookup)
  • run_agent.py line 1798: switch_model() re-reads custom_providers from live config and passes overrides to get_model_context_length on every /model switch
  • hermes_cli/model_switch.py: resolve_display_context_length() gains a custom_providers= kwarg wired through the display path

Commit: 125de02056eab84362fc91f57bd7041a19860b22

Related: #15779 (upstream issue, now closed), #13052 (overlapping fix PR), #15787 (another fix PR for the same issue).


4 participants