fix(gateway): honor custom provider context length #5096

Closed
dlkakbs wants to merge 1 commit into NousResearch:main from dlkakbs:fix/gateway-custom-provider-context-length-clean

@dlkakbs dlkakbs commented Apr 4, 2026

Contributor

Fixes: #5089

Summary:

This fixes gateway context-length resolution for models configured under custom_providers. When model.context_length is not set at the top level, the gateway status banner was incorrectly falling back to the default 128K context window instead of using the per-model context_length defined on the matching custom provider.

Related: #4085 fixed the gateway hygiene compression path for custom_providers context length resolution. This PR addresses the remaining gateway status/session info path and also makes the gateway-side lookup handle list-shaped custom_providers.models entries from the current repro.

Root cause:

_format_session_info() only read model.context_length from the top-level config and never resolved per-model context_length from custom_providers. The gateway hygiene path also assumed custom_providers[].models was always dict-shaped, so list-shaped model entries were skipped.

What changed:

  • Added a small resolver in gateway/run.py to read per-model context_length from the matching custom_providers entry.
  • Reused that resolver in both gateway status info formatting and gateway session hygiene context resolution.
  • Supported both config shapes for custom_providers[].models:
    • dict-based model mappings
    • list-based model entries
  • Added regression tests covering both shapes in tests/gateway/test_session_info.py.
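
For readers following along, the dual-shape lookup described above could look roughly like the sketch below. It is not the code from this PR: the function name `resolve_custom_context_length` is hypothetical, and the list shape (a list of dicts keyed by `name`) is an assumption, since the thread does not show the exact layout of list-shaped `models` entries.

```python
from typing import Any, Optional


def resolve_custom_context_length(
    custom_providers: Any, model_name: str
) -> Optional[int]:
    """Return the per-model context_length from custom_providers, or None.

    Hypothetical helper; names and the list-entry layout are assumptions.
    """
    if not isinstance(custom_providers, list):
        return None
    for provider in custom_providers:
        if not isinstance(provider, dict):
            continue
        models = provider.get("models")
        if isinstance(models, dict):
            # dict shape: {"my-model": {"context_length": 32768}}
            entry = models.get(model_name)
        elif isinstance(models, list):
            # list shape (assumed): [{"name": "my-model", "context_length": 32768}]
            entry = next(
                (
                    m
                    for m in models
                    if isinstance(m, dict) and m.get("name") == model_name
                ),
                None,
            )
        else:
            entry = None
        if isinstance(entry, dict):
            value = entry.get("context_length")
            # Reject bools, zero, and negatives so a bad value falls through.
            if isinstance(value, int) and not isinstance(value, bool) and value > 0:
                return value
    return None
```

Returning `None` rather than a default leaves the fallback decision to the caller, so the status banner and the hygiene path can share the same lookup.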

Testing:

  • pytest tests/gateway/test_session_info.py

@chevyphillip chevyphillip left a comment

LGTM

pinch-claw added a commit to pinch-claw/hermes-agent that referenced this pull request Apr 23, 2026
…nner

The session-reset / info banner in `_format_session_info` resolved the
context window only from the top-level `model.context_length` key. When
users configured context_length under the new `providers.<name>.models.<model>`
dict schema (or the legacy `custom_providers[].models.<model>` list
schema), the banner fell through to `get_model_context_length()`'s probe
chain. Remote OpenAI-compatible proxies frequently omit `context_length`
from `/models` responses, so the probe failed silently and the banner
displayed `128K tokens (default — set model.context_length in config to
override)` — even though the runtime `ContextCompressor` was already
budgeting with the correct value resolved by `AIAgent.__init__`.

After the top-level lookup, walk `get_compatible_custom_providers(cfg)`,
match the entry by `base_url` against the resolved runtime base_url, and
read `entry["models"][model]["context_length"]`. This mirrors the exact
resolution `AIAgent.__init__` performs at `run_agent.py:1608-1643` so the
banner's displayed value matches what the compressor actually uses.
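
A minimal sketch of the `base_url`-matched lookup described above, assuming the provider entries have already been normalized to dict-shaped `models`. The function names are hypothetical, and the trailing-slash and case normalization is an assumption of this sketch rather than something the commit message states.

```python
from typing import Any, Dict, Iterable, Optional


def _normalize_url(url: Optional[str]) -> str:
    # Assumption: compare URLs ignoring case, whitespace, and trailing slashes.
    return (url or "").strip().rstrip("/").lower()


def context_length_for_endpoint(
    entries: Iterable[Dict[str, Any]], runtime_base_url: str, model: str
) -> Optional[int]:
    """Find the entry whose base_url matches, then read its per-model value."""
    target = _normalize_url(runtime_base_url)
    if not target:
        return None
    for entry in entries:
        if _normalize_url(entry.get("base_url")) != target:
            continue
        models = entry.get("models") or {}
        model_cfg = models.get(model) if isinstance(models, dict) else None
        if isinstance(model_cfg, dict):
            value = model_cfg.get("context_length")
            if isinstance(value, int) and value > 0:
                return value
    return None
```

Matching on `base_url` rather than on the provider name ties the lookup to the endpoint the runtime actually resolved, which is what keeps the displayed value aligned with the compressor's budget.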

Fixes NousResearch#5089.

Relationship to existing open PRs (NousResearch/hermes-agent):
- NousResearch#5096, NousResearch#8240, NousResearch#10690: each hand-roll traversal of the legacy
  `custom_providers:` list only. They do not cover the newer
  `providers:` dict schema users land on via `hermes model`.
- NousResearch#12380: touches `/model` display + gateway fallback paths; overlaps
  in spirit but still traverses lists manually.

Routing through `get_compatible_custom_providers()` — the existing
compat shim at `hermes_cli/config.py:2078` — gives a single lookup that
covers both schemas and stays consistent with every other runtime
caller, eliminating future drift between display and compressor.
@teknium1

Contributor

Thanks for this fix, @dlkakbs! The issue you identified is real, but it has since been addressed on main by a broader refactor in PR #15844 (commit 125de0205).

What landed on main:

  • A centralized get_custom_provider_context_length() helper was added to hermes_cli/config.py (line 2253) as the single source of truth for per-model context overrides from custom_providers.
  • gateway/run.py's _format_session_info() now passes custom_providers= through to get_model_context_length(), which calls this helper before any network probe (step 0b).
  • _normalize_custom_provider_entry() in hermes_cli/config.py (line 2169) converts list-shaped models entries to dict shape, handling the list-vs-dict config shape issue you identified.
  • The fix is wired through five call sites: agent startup, /model switch, display resolution, and both gateway session info paths.
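
The ordering described above (config overrides consulted before any network probe) can be illustrated with the sketch below. It is not the code on main: `pick_context_length` and its parameters are hypothetical, the precedence of the top-level value over the custom-provider value is inferred from this thread, and `128_000` is an assumed numeric value for the "128K" default.

```python
from typing import Callable, Optional

# Assumed numeric value of the "128K" default mentioned in the banner.
DEFAULT_CONTEXT_LENGTH = 128_000


def pick_context_length(
    explicit: Optional[int],
    custom_override: Optional[int],
    probe: Callable[[], Optional[int]],
    default: int = DEFAULT_CONTEXT_LENGTH,
) -> int:
    """Resolve a context length, touching the network only as a last resort."""
    if explicit:
        # Top-level model.context_length wins when set.
        return explicit
    if custom_override:
        # Per-model value from custom_providers, checked before probing.
        return custom_override
    # The probe may return None when the endpoint omits context_length.
    return probe() or default
```

Passing the probe as a callable means it is never invoked when a config value is available, so a proxy that omits `context_length` from `/models` cannot push the banner back to the default.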

This is an automated hermes-sweeper review. The cross-references to PRs #15668 and #15669 in the timeline also point to overlapping work in this area that has since landed.

@teknium1 teknium1 closed this Apr 27, 2026
@alt-glitch alt-glitch added labels on May 1, 2026: type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), area/config (Config system, migrations, profiles), P2 (Medium: degraded but workaround exists)
@alt-glitch

Collaborator

Related to #14382 and #8240 — all address custom_providers context_length resolution for #5089. This PR covers both gateway status and hygiene paths.

@alt-glitch

Collaborator

Related to #14382 and #8240.

Development

Successfully merging this pull request may close these issues.

Bug: Status banner fails to detect context_length from custom_providers

4 participants