
fix(context): pass config_context_length to all get_model_context_length() call sites #12630

Closed
quinnmacro wants to merge 1 commit into NousResearch:main from quinnmacro:fix/context-length-passthrough

@quinnmacro commented Apr 19, 2026

What does this PR do?

Passes config_context_length to all get_model_context_length() call sites so that custom OpenAI-compatible endpoints (e.g., self-hosted, Infini-AI) that don't expose context_length via their /models API show and use the correct context window from custom_providers[].models.<model>.context_length in config.yaml.

Problem

get_model_context_length() accepts a config_context_length parameter that lets callers pass an explicit context length from custom_providers[].models.<model>.context_length in config.yaml. However, several display and utility call sites never pass this value, causing the function to fall through to the 128K default when custom endpoints don't return context_length from their /models API.
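The fall-through described above can be sketched in a few lines. This is a simplified illustration, not the real get_model_context_length(): the function name pick_context_length, the two-argument shape, the precedence of the configured value over the endpoint value, and the exact numeric value behind "128K" are all assumptions made for the example.

```python
from typing import Optional

# Assumption: "128K" is modelled here as 128_000; the real constant may differ.
DEFAULT_CONTEXT_LENGTH = 128_000


def pick_context_length(
    config_context_length: Optional[int],
    endpoint_context_length: Optional[int],
) -> int:
    """Hypothetical stand-in for the resolution order discussed in this PR."""
    # An explicit value from config.yaml is used when the caller passes it
    # (assumed here to take precedence).
    if config_context_length:
        return config_context_length
    # Otherwise use whatever the endpoint's /models API reported, if anything.
    if endpoint_context_length:
        return endpoint_context_length
    # Nothing known: fall through to the default.
    return DEFAULT_CONTEXT_LENGTH


# A call site that forgets to pass the configured value gets the default,
# even though config.yaml says 200000.
missing_passthrough = pick_context_length(None, None)
with_passthrough = pick_context_length(200_000, None)
```

The bug class is visible in the last two lines: both calls describe the same model, and only the second one forwards what the user configured.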

The AIAgent.__init__ path in run_agent.py correctly resolves context_length from custom_providers, but these call sites miss it:

| File | Line | Call site | Status |
| --- | --- | --- | --- |
| gateway/run.py | L3777 | @context reference expansion | Fixed in commit 1 |
| gateway/run.py | L4718 | /modelinfo display | Fixed in commit 1 |
| gateway/run.py | L5577 | /model switch display | Fixed in commit 1 |
| cli.py | L4750 | /model display (first variant) | Fixed in commit 1 |
| cli.py | L4979 | /model display (fallback variant) | Fixed in commit 1 |
| cli.py | L7934 | @context reference expansion | Fixed in commit 1 |
| run_agent.py | L6437 | Fallback model switch | Fixed in commit 2 |

The remaining 6 call sites already pass config_context_length:

  • gateway/run.py L4067: already passes _hyg_config_context_length
  • run_agent.py L1513: already passes _config_context_length
  • run_agent.py L1800: already passes getattr(self, "_config_context_length", None)
  • run_agent.py L2014: already passes _aux_context_config
  • agent/context_compressor.py L305: constructor parameter, already received
  • hermes_cli/web_server.py L624: intentionally passes None (wants auto-detected value)

Duplicate Check

I searched existing upstream PRs for the exact code path and issue terms before opening:

  • config_context_length
  • get_model_context_length passthrough
  • custom_providers context_length
  • 128K default context

I did not find an open PR that fixes this specific passthrough gap.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Security fix
  • Documentation update
  • Tests (adding or improving test coverage)
  • Refactor (no behavior change)

Changes Made

  • gateway/run.py: Added config_context_length=getattr(self, "_config_context_length", None) at L3777 (@context expansion path).
  • gateway/run.py: Added inline custom_providers resolution at L4718 (/modelinfo display) and L5577 (/model switch display) since these paths don't have self._config_context_length cached. The resolution logic mirrors AIAgent.__init__.
  • cli.py: Added config_context_length=getattr(self, "_config_context_length", None) at L4750, L4979, and L7934.
  • run_agent.py: Added config_context_length=getattr(self, "_config_context_length", None) at L6437 (fallback model switch path).

Impact

@context reference expansion (gateway + CLI)

@context expansion uses the returned context length to set injection limits (hard_limit = context_length * 0.50, soft_limit = context_length * 0.25). With the 128K default, a @context reference on a 200K model hits the hard limit at 64K tokens, 36K tokens short of the correct 100K limit. File content beyond 64K tokens is silently rejected instead of being injected.
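The arithmetic behind those figures, as a small worked example. The 0.50 and 0.25 factors come from the description above; the function name context_injection_limits is hypothetical, and 128_000 / 200_000 are taken as the numeric values of "128K" and "200K".

```python
def context_injection_limits(context_length: int) -> tuple[int, int]:
    """Return (hard_limit, soft_limit) in tokens for a given context length."""
    hard_limit = int(context_length * 0.50)
    soft_limit = int(context_length * 0.25)
    return hard_limit, soft_limit


default_hard, default_soft = context_injection_limits(128_000)        # (64000, 32000)
configured_hard, configured_soft = context_injection_limits(200_000)  # (100000, 50000)

# Tokens of injectable content lost when the default is used by mistake.
shortfall = configured_hard - default_hard  # 36000
```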

Fallback model switch (run_agent.py)

When the primary model fails and the agent falls back, get_model_context_length() at L6437 is called without config_context_length. If the fallback model is also a custom endpoint, the context compressor updates to an incorrect 128K threshold, causing premature compression.

Display-only (no functional impact)

/modelinfo and /model switch displays show 128K instead of the configured 200K — misleading but not functionally harmful.

How to Test

  1. Configure a custom OpenAI-compatible endpoint that does not return context_length from its /models API:

    # config.yaml
    custom_providers:
      - base_url: "https://your-custom-endpoint/v1"
        models:
          your-model:
            context_length: 200000
  2. Start Hermes with that model and run:

    /modelinfo
    

    Before: shows 128K context.
    After: shows 200K context (from config).

  3. Test @context reference expansion with a large file:

    @context Read the file at /path/to/large/file.txt
    

    Before: hard limit = 64K tokens; large files are refused injection.
    After: hard limit = 100K tokens; correct injection.

  4. Test /model switch display — switch to the custom model and verify context length shown is correct.
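For reference, a minimal sketch of how the config fragment from step 1 could be looked up once config.yaml has been parsed into plain Python objects. The function name lookup_configured_context_length is hypothetical, and matching providers by base_url is an assumption; the PR itself only states that the resolution mirrors AIAgent.__init__.

```python
from typing import Any, Optional

# The YAML from step 1, as it would look after parsing.
config: dict[str, Any] = {
    "custom_providers": [
        {
            "base_url": "https://your-custom-endpoint/v1",
            "models": {"your-model": {"context_length": 200000}},
        }
    ]
}


def lookup_configured_context_length(
    config: dict[str, Any], base_url: str, model: str
) -> Optional[int]:
    """Return the configured context_length for (base_url, model), or None."""
    for provider in config.get("custom_providers") or []:
        # Assumption: providers are matched on base_url, ignoring a trailing slash.
        if str(provider.get("base_url", "")).rstrip("/") != base_url.rstrip("/"):
            continue
        entry = (provider.get("models") or {}).get(model) or {}
        value = entry.get("context_length")
        # Accept only positive integers (bool is excluded on purpose).
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


configured = lookup_configured_context_length(
    config, "https://your-custom-endpoint/v1", "your-model"
)
```

A result of None is what lets the caller fall back to auto-detection instead of forcing a value.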

Testing

Verified with Infini-AI custom endpoint:

  • Before: /modelinfo showed 128K, @context hard limit = 64K
  • After: /modelinfo shows 200K (configured), @context hard limit = 100K

Full call-site audit: all 13 get_model_context_length() invocations across gateway/run.py, cli.py, run_agent.py, agent/context_compressor.py, and hermes_cli/web_server.py have been verified — 7 fixed, 6 already correct.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've verified all call sites of the affected function are covered

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (internal fix, no API change)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact — no platform-specific behavior

@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), comp/cli (CLI entry point, hermes_cli/, setup wizard), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on Apr 23, 2026
@quinnmacro force-pushed the fix/context-length-passthrough branch from 4ca8f04 to 3ae67e2 on April 26, 2026 01:00
@quinnmacro


PR Update: Rebased + Refactored

Rebased onto latest main (resolved 3 conflicts from upstream ModelInfo refactor) and extracted duplicated inline custom_providers parsing into a helper function.

Changes Summary

Before: +69/-1 across 3 files (minimal passthrough fix)
After: +94/-56 across 4 files (passthrough fix + deduplication)

| File | Change |
| --- | --- |
| agent/model_metadata.py | New resolve_custom_providers_context_length() helper: single source of truth for custom_providers context length resolution |
| cli.py | Resolved conflict: old fallback branch removed (upstream ModelInfo replaces it) |
| run_agent.py | 39-line inline block → 2-line helper call (warn_on_invalid=True preserves diagnostic logging) |
| gateway/run.py | 2 conflict resolutions + 2 inline blocks → helper calls (67 lines removed) |

Key Decisions

  • Upstream's ModelInfo refactor made two of the original fix sites unnecessary (cli.py L5401, gateway L5727) — those fallback branches were deleted
  • Helper placed in agent/model_metadata.py (same module as get_model_context_length)
  • warn_on_invalid kw-only param preserves run_agent.py init-path warning while keeping gateway paths silent
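The warn_on_invalid decision can be illustrated on the validation step alone. The sketch below is not the helper from this PR: parse_context_length, its signature, and the log message are hypothetical, and only the keyword-only flag and its default-silent behavior reflect what is described above.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def parse_context_length(raw: Any, *, warn_on_invalid: bool = False) -> Optional[int]:
    """Validate a raw context_length value taken from config (hypothetical helper)."""
    if raw is None:
        return None  # Nothing configured: not an error, so never warn.
    try:
        value = int(raw)
    except (TypeError, ValueError):
        value = 0  # Treat unparseable input the same as a non-positive number.
    if value > 0:
        return value
    # Only callers that opt in (e.g. an init path) emit the diagnostic;
    # display paths leave the flag off and stay silent.
    if warn_on_invalid:
        logger.warning("Ignoring invalid context_length %r in custom_providers", raw)
    return None
```

Because the flag sits after the bare `*`, it has to be spelled out at the call site, for example parse_context_length(raw, warn_on_invalid=True), which keeps the noisy variant easy to spot in review.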

All lint + syntax checks pass. Ready for review.


Gentle ping — this PR has been open for 7 days without review. The bug affects any user with custom_providers models (Infini-AI, local endpoints, etc.) where context_length gets silently truncated to 128K. Happy to make any adjustments the maintainers suggest.

@quinnmacro force-pushed the fix/context-length-passthrough branch from 3ae67e2 to 672c715 on April 26, 2026 10:05
… sites

Custom endpoints (like Infini-AI) don't return context_length from
their /models API, causing get_model_context_length() to fall through
to the 128K default before reaching hardcoded fallbacks. The AIAgent
init path correctly resolves context_length from custom_providers, but
several display and utility call sites never passed this value.

Changes:
- agent/model_metadata.py: add resolve_custom_providers_context_length()
  helper that replaces all inline custom_providers lookup loops
- gateway/run.py: use helper in @context expansion, /modelinfo display,
  and hygiene path (was 3× duplicated inline code, now 3× 2-line calls)
- run_agent.py: use helper in AIAgent.__init__ custom_providers check
  (preserves warn_on_invalid=True for user-facing diagnostics)
- cli.py: pass config_context_length to get_model_context_length() at
  @context expansion and /model display paths
- run_agent.py: pass config_context_length to fallback model switch

Note: upstream refactored /model switch display to use ModelInfo objects
(removing the else fallback), so the gateway _on_model_selected inline
block from the original PR is no longer needed.
@quinnmacro


Closing in favor of a new, leaner PR (#NEW).

Since this PR was opened, upstream merged #15844 which fixed the core bug (custom_providers context_length not passed to display/switch paths) using a different approach — adding a custom_providers kwarg to get_model_context_length() + a get_custom_provider_context_length() helper.

This branch accumulated 1035 commits of divergence (dirty merge state) and included unrelated code reversions. Rather than rebase-rescue it, I'm opening a clean PR from the latest main that:

  1. Fixes a remaining bug: the @context expansion path still doesn't pass custom_providers to get_model_context_length(), so step 0b is skipped
  2. Code cleanup: replaces 27-line inline custom_providers parsing in the hygiene run with the existing get_custom_provider_context_length() helper

Net diff: +19/-21 lines. One real bug fix, one deduplication.
