
fix(context): pass custom_providers in aux compression and fallback paths #18877

Closed
rkt2spc wants to merge 1 commit into NousResearch:main from rkt2spc:fix/aux-compression-custom-providers-context

@rkt2spc commented May 2, 2026

fix(context): pass custom_providers in aux compression and fallback paths

Problem

Users who pin context_length on a model in custom_providers see the main agent honor that override but the auxiliary compression model fall back to the default 256K — even when it's literally the same model. This produces a confusing startup warning and silently lowers the compression threshold every session.

What goes wrong (user perspective)

Take a custom_providers entry with a per-model context override (typical for the 1M-context Opus variant against a custom proxy):

model:
  default: claude-opus-4-7[1m]
  provider: my-vertex-proxy

custom_providers:
- name: my-vertex-proxy
  base_url: http://127.0.0.1:8788
  api_mode: anthropic_messages
  models:
    claude-opus-4-7[1m]:
      context_length: 1000000

compression:
  threshold: 0.5     # default — compress at 50% of context

Expected: compression triggers at 500K tokens (50% of 1M).

Actual: at session start, this warning appears:

⚠ Compression model claude-opus-4-7[1m] (127.0.0.1) context is 256,000 tokens,
  but the main model claude-opus-4-7[1m] (custom)'s compression threshold was
  500,000 tokens. Auto-lowered this session's threshold to 256,000 tokens so
  compression can run.

Both labels in that message are the same model, claude-opus-4-7[1m], against the same custom provider. Yet Hermes resolves it to two different context windows: 1M for the main agent (correct) and 256K for the compression model (wrong). The session quietly runs at roughly half the configured threshold (256K instead of 500K), and the user has to either ignore the warning every session or manually drop compression.threshold to 0.25.
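The arithmetic behind the warning can be sketched as below. This is an illustration only, not Hermes code: `effective_threshold_tokens` is a hypothetical helper, and the clamp-to-compression-model rule is inferred from the warning text above.

```python
def effective_threshold_tokens(main_context: int, threshold: float, aux_context: int) -> int:
    """Hypothetical helper: the configured threshold in tokens, lowered so it
    never exceeds what the compression model can read."""
    configured = int(main_context * threshold)
    return min(configured, aux_context)


# Main model resolved to 1M, compression model wrongly resolved to 256K.
print(effective_threshold_tokens(1_000_000, 0.5, 256_000))    # 256000
# Both resolved to 1M, as the config intends.
print(effective_threshold_tokens(1_000_000, 0.5, 1_000_000))  # 500000
# The manual workaround: 25% of 1M fits under the 256K default.
print(effective_threshold_tokens(1_000_000, 0.25, 256_000))   # 250000
```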

Root cause

get_model_context_length() (agent/model_metadata.py:1229) accepts a custom_providers keyword. When it is supplied, step 0b in the resolution order checks custom_providers[].models[].context_length for an explicit override. When it is omitted, that step is skipped and resolution eventually falls through to DEFAULT_FALLBACK_CONTEXT (256K) for unknown custom endpoints.
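A heavily simplified sketch of that lookup, assuming custom_providers has the same shape as the YAML above. `resolve_context_length` is a hypothetical stand-in, not the real get_model_context_length(); the real function has more resolution steps (and may match on more than the model name), all omitted here.

```python
from typing import Optional

DEFAULT_FALLBACK_CONTEXT = 256_000  # value taken from the warning text above


def resolve_context_length(model: str, custom_providers: Optional[list] = None) -> int:
    """Simplified stand-in: per-model override first, default last."""
    for provider in custom_providers or []:
        models = provider.get("models") or {}
        entry = models.get(model) or {}
        override = entry.get("context_length")
        if override:
            return int(override)
    # No custom_providers passed, or no override for this model.
    return DEFAULT_FALLBACK_CONTEXT


providers = [{
    "name": "my-vertex-proxy",
    "models": {"claude-opus-4-7[1m]": {"context_length": 1000000}},
}]

# With the keyword, the override wins; without it, the step is skipped.
print(resolve_context_length("claude-opus-4-7[1m]", custom_providers=providers))  # 1000000
print(resolve_context_length("claude-opus-4-7[1m]"))                              # 256000
```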

There are four call sites in run_agent.py. Two pass custom_providers; two don't:

| Line | What it computes | Passes custom_providers? |
|------|------------------|--------------------------|
| 1995 | Initial context length when a plugin context engine is loaded | Yes |
| 2392 | New context length after a /model switch (closes #15779) | Yes |
| 2655 | Auxiliary compression model context (the symptom above) | No |
| 7635 | Fallback model context after primary failover | No |
Site 2655 is what produces the warning. Site 7635 has the same bug but is latent: it triggers only when a user has both a custom-provider main model and a fallback_model, in which case the fallback's compressor would mis-size its threshold.

Fix

Mirror the existing pattern already used at run_agent.py:2382-2389 — load custom_providers from current config (defensively wrapped in try/except so a load failure doesn't break the path) and pass via the existing custom_providers= keyword:

_aux_custom_providers = None
try:
    from hermes_cli.config import (
        load_config,
        get_compatible_custom_providers,
    )
    _aux_custom_providers = get_compatible_custom_providers(load_config())
except Exception:
    _aux_custom_providers = None

aux_context = get_model_context_length(
    aux_model,
    base_url=aux_base_url,
    api_key=aux_api_key,
    config_context_length=getattr(self, "_aux_compression_context_length_config", None),
    provider=getattr(self, "provider", ""),
    custom_providers=_aux_custom_providers,   # ← new
)

Same shape applied at site 7635 for the fallback path.

No new code paths. No behavior change for users not using custom_providers.
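Because both sites repeat the same load-and-guard step, the pattern can be shown in isolation. The sketch below is an assumption for illustration, not part of this PR: `load_custom_providers_safely` is a hypothetical helper, and the loader is injected so the example runs without hermes_cli installed.

```python
from typing import Callable, Optional


def load_custom_providers_safely(loader: Callable[[], list]) -> Optional[list]:
    """Return the loader's result, or None if loading fails for any reason."""
    try:
        return loader()
    except Exception:
        # A broken or missing config must not break context sizing;
        # None simply means the override step is skipped.
        return None


def broken_loader() -> list:
    raise RuntimeError("config file unreadable")


print(load_custom_providers_safely(lambda: [{"name": "my-vertex-proxy"}]))
print(load_custom_providers_safely(broken_loader))  # None
```

Returning None on failure keeps the pre-fix behavior as the worst case, which is consistent with the claim that users without custom_providers see no change.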

Test plan

  • uv run pytest (existing suite passes)
  • Repro case: configure custom_providers[].models[].context_length: 1000000 and compression.threshold: 0.5 against a custom Anthropic-compatible endpoint; start a session → warning does not appear, context_compressor.threshold_tokens is 500,000.
  • Fallback case: configure as above plus a fallback_model pointing at the same custom provider; force the primary to fail (e.g. wrong port); confirm fallback's context_compressor.context_length reflects the per-model override (not the 256K default).
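The manual repro steps above could be complemented by a unit-level regression check. The following is a sketch under stated assumptions: `size_aux_context` is a hypothetical stand-in for the call site, and the resolver is a mock rather than the real get_model_context_length().

```python
from unittest import mock


def size_aux_context(resolver, aux_model, custom_providers):
    """Hypothetical call site: forwards custom_providers to the resolver."""
    return resolver(aux_model, custom_providers=custom_providers)


def test_aux_sizing_forwards_custom_providers():
    providers = [{
        "name": "my-vertex-proxy",
        "models": {"claude-opus-4-7[1m]": {"context_length": 1000000}},
    }]
    resolver = mock.Mock(return_value=1000000)

    result = size_aux_context(resolver, "claude-opus-4-7[1m]", providers)

    # The regression this PR targets is the keyword being dropped.
    resolver.assert_called_once_with("claude-opus-4-7[1m]", custom_providers=providers)
    assert result == 1000000


test_aux_sizing_forwards_custom_providers()
```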

…aths

get_model_context_length() honors per-model context_length overrides
defined in custom_providers[].models[].context_length only when called
with the custom_providers= keyword. Two of the four call sites in
run_agent.py already do this; two do not — the auxiliary compression
context probe (~:2655) and the fallback-model activation path (~:7635).

Without the fix, a user who pins context_length on a model exposed by
a custom provider sees the main agent correctly use the override but
the compression model fall back to the 256K default, producing the
"Compression model ... context is 256,000 tokens" warning and an
auto-lowered session threshold even though the model can in fact
handle the configured window.

The fix mirrors the existing pattern from the switch_model() path
at ~:2382: load_config() + get_compatible_custom_providers(),
defensively wrapped in try/except, then pass via custom_providers=.
No behavior change for users not using custom_providers.

@alt-glitch added the type/bug, comp/agent, and P2 labels on May 2, 2026
@alt-glitch

Collaborator

Likely duplicate of #13540 — same root cause: custom_providers context_length not propagated to auxiliary compression model. Also overlaps with #14152.

@alt-glitch

Collaborator

Likely duplicate of #13540 — same root cause.

@teknium1

Contributor

This appears to be implemented on current main. Automated hermes-sweeper review found both paths from this PR covered now, though the code has since been refactored out of run_agent.py.

Evidence:

  • Auxiliary compression context sizing now passes custom_providers=agent._custom_providers into get_model_context_length() at agent/conversation_compression.py:136-146. This came from 7becb19ea00c13bdff6f78b71aa3ddfb0bdb5378.
  • Fallback activation context sizing now passes custom_providers=getattr(agent, "_custom_providers", None) into get_model_context_length() at agent/chat_completion_helpers.py:1263-1268. The fallback fix landed via 21078ebcea6dd870835080fdc76a40284418c921 and was preserved/re-applied after refactor in b5bcffe1674fa9ab3ba7a754c07ab77bedde83a8 / 563b4d9e51a46cc421e327b351cb7efe1ccb151b.
  • agent/model_metadata.py:1565-1578 confirms that passing custom_providers is the path that checks per-model context_length overrides before falling through to probes/defaults.
  • The earliest release tag I found containing both the aux and fallback pieces is v2026.5.28.

Thanks for the clear writeup and repro details; they match the behavior now present on main.

@teknium1 closed this on Jun 11, 2026
@teknium1 added the sweeper:implemented-on-main label on Jun 11, 2026

Development

Successfully merging this pull request may close these issues.

Bug: /model switch to named custom provider ignores custom_providers model context_length
