[Bug]: custom_providers.models.context_length not propagated to auxiliary compression feasibility check #12977

@hoanggia7398

Bug Description

When context_length is set under custom_providers.models, it is applied correctly to the main model's context window. However, the auxiliary compression feasibility check (_check_compression_model_feasibility) does NOT resolve it for the compression model when the compression model falls back to the main model.

This produces a spurious context-mismatch warning and unnecessarily auto-lowers the compression threshold:

⚠ Compression model (glm-5.1) context is 128,000 tokens, but the main model's compression threshold was 130,000 tokens. Auto-lowered this session's threshold to 128,000 tokens so compression can run.

This happens even though the user has configured context_length: 200000 for the model.

Steps to Reproduce

  1. Configure custom_providers with a model that has a context_length override different from the built-in default:
model:
  default: glm-5.1
  provider: custom
  base_url: http://localhost:8317/v1
  api_key: sk-xxx

custom_providers:
  - name: Local (localhost:8317)
    base_url: http://localhost:8317/v1
    api_key: sk-xxx
    model: glm-5.1
    models:
      glm-5.1:
        context_length: 200000
  2. Do NOT set auxiliary.compression.model or auxiliary.compression.context_length (so compression falls back to the main model).
  3. Leave compression.threshold at 0.65 (the default).
  4. Start Hermes.
  5. Observe the warning on startup: the configured 200K context should be recognized, but 128K is reported instead.

Expected Behavior

When the compression model matches a model defined in custom_providers.models, the feasibility check should resolve the context_length from custom_providers the same way the main model does (lines 1499-1536). No warning should appear since 0.65 × 200,000 = 130,000 < 200,000.
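The arithmetic behind the expected behavior can be sketched as follows. This is a minimal illustration, not Hermes code: the function names are hypothetical, and the strict less-than comparison is an assumption inferred from the warning text quoted above.

```python
def threshold_tokens(context_length: int, threshold: float) -> int:
    """Token count at which compression triggers for the main model."""
    return round(context_length * threshold)


def needs_auto_lower(aux_context: int, main_context: int, threshold: float) -> bool:
    """True when the compression model's window is smaller than the trigger point.

    Assumption: the real check compares with a strict less-than.
    """
    return aux_context < threshold_tokens(main_context, threshold)


# With the configured 200K window, the trigger point is 130,000 tokens.
trigger = threshold_tokens(200_000, 0.65)

# Built-in default (128K) wrongly used for the compression model: warning fires.
buggy = needs_auto_lower(128_000, 200_000, 0.65)

# Configured 200K used for the compression model: no warning.
fixed = needs_auto_lower(200_000, 200_000, 0.65)
```

With these inputs, `trigger` is 130,000, `buggy` is true, and `fixed` is false, which matches the warning reported in this issue and the behavior expected after a fix.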

Actual Behavior

The auxiliary feasibility check calls get_model_context_length() with config_context_length=None, falling back to the built-in default (128K for glm-5.1), ignoring the user's custom_providers.models.glm-5.1.context_length: 200000. This triggers a false warning and auto-lowers the compression threshold.

Affected Component

  • Agent Core (conversation loop, context compression, memory)
  • Configuration (config.yaml, .env, hermes setup)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Running on self-hosted VPS with CLIProxyAPI as local LLM gateway. Issue reproduced on CLI and Discord gateway.

Operating System

Ubuntu 24.04 VPS

Python Version

3.11

Hermes Version

v0.1.0 (editable install from NousResearch/hermes-agent main branch)

Root Cause Analysis

In run_agent.py, around lines 2080-2085, the feasibility check calls:

aux_context = get_model_context_length(
    aux_model,
    base_url=aux_base_url,
    api_key=aux_api_key,
    config_context_length=getattr(self, "_aux_compression_context_length_config", None),
)

This only passes _aux_compression_context_length_config (from auxiliary.compression.context_length in config), but does NOT resolve custom_providers.models context_length for the auxiliary model.

Meanwhile, the main model (lines 1499-1536) correctly resolves custom_providers.models context_length and stores it in _config_context_length. The auxiliary path skips this resolution entirely.

When the compression model is the same as the main model (default behavior), get_model_context_length receives config_context_length=None and falls back to built-in defaults (128K for glm-5.1).
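The fallback described above can be reproduced in isolation with a simplified stand-in. Note that `lookup_context_length` and `BUILTIN_DEFAULTS` are hypothetical names used only for illustration. The real get_model_context_length takes more parameters and may consult other sources. The sketch assumes only that an explicit config value wins and that None falls through to a built-in default.

```python
from typing import Optional

# Hypothetical table standing in for the built-in defaults; only the
# 128K figure for glm-5.1 comes from the warning quoted in this issue.
BUILTIN_DEFAULTS = {"glm-5.1": 128_000}


def lookup_context_length(
    model: str, config_context_length: Optional[int] = None
) -> Optional[int]:
    """Simplified stand-in for get_model_context_length (not the real function)."""
    if config_context_length is not None:
        return config_context_length
    return BUILTIN_DEFAULTS.get(model)


# Main-model path: the custom_providers override has already been resolved.
main_context = lookup_context_length("glm-5.1", config_context_length=200_000)

# Auxiliary path: auxiliary.compression.context_length is unset, so None is passed.
aux_context = lookup_context_length("glm-5.1", config_context_length=None)

print(main_context, aux_context)  # prints: 200000 128000
```

The same model name yields two different context lengths depending on which code path performs the lookup, which is the mismatch the warning reports.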

Proposed Fix

In _check_compression_model_feasibility, before calling get_model_context_length, resolve the custom_providers.models context_length for aux_model (mirroring the logic at lines 1499-1536) and pass it as config_context_length.

Alternatively, extract the custom_providers.models context resolution into a reusable function that both the main model and auxiliary paths call.
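One possible shape for such a reusable function is sketched below. This is an assumption-laden sketch rather than the existing logic at lines 1499-1536: the helper names, the optional base_url matching, and the rule that an explicit auxiliary.compression.context_length takes precedence are all hypothetical.

```python
from typing import Any, Optional


def resolve_custom_provider_context_length(
    custom_providers: Optional[list[dict[str, Any]]],
    model: str,
    base_url: Optional[str] = None,
) -> Optional[int]:
    """Return custom_providers[*].models[model].context_length, or None.

    Hypothetical helper: if base_url is given, only providers with the
    same base_url are considered.
    """
    for provider in custom_providers or []:
        if base_url and provider.get("base_url") != base_url:
            continue
        entry = (provider.get("models") or {}).get(model)
        if not isinstance(entry, dict):
            continue
        value = entry.get("context_length")
        # Accept only positive integers (bool is excluded explicitly).
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


def pick_aux_config_context_length(
    explicit_aux_override: Optional[int],
    custom_providers: Optional[list[dict[str, Any]]],
    aux_model: str,
    aux_base_url: Optional[str] = None,
) -> Optional[int]:
    """Assumed precedence: explicit auxiliary override first, then custom_providers."""
    if explicit_aux_override is not None:
        return explicit_aux_override
    return resolve_custom_provider_context_length(
        custom_providers, aux_model, aux_base_url
    )


# Config from the reproduction steps, as it would look after YAML parsing.
providers = [
    {
        "name": "Local (localhost:8317)",
        "base_url": "http://localhost:8317/v1",
        "model": "glm-5.1",
        "models": {"glm-5.1": {"context_length": 200000}},
    }
]

resolved = pick_aux_config_context_length(
    None, providers, "glm-5.1", "http://localhost:8317/v1"
)
```

Here `resolved` is 200000, and that value would then be passed as config_context_length to get_model_context_length in the feasibility check. Keeping the lookup in one function means the main and auxiliary paths cannot drift apart again.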

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Labels

  • P2: Medium, degraded but workaround exists
  • area/config: Config system, migrations, profiles
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • type/bug: Something isn't working
