Bug Description
When context_length is set in custom_providers.models, it correctly applies to the main model context window, but the auxiliary compression feasibility check (_check_compression_model_feasibility) does NOT resolve it for the compression model when the compression model falls back to the main model.
This produces an incorrect warning about context mismatch, and auto-lowers the compression threshold unnecessarily:
⚠ Compression model (glm-5.1) context is 128,000 tokens, but the main model's compression threshold was 130,000 tokens. Auto-lowered this session's threshold to 128,000 tokens so compression can run.
This happens even though the user has configured context_length: 200000 for the model.
Steps to Reproduce
- Configure custom_providers with a model whose context_length override differs from the built-in default:
model:
  default: glm-5.1
  provider: custom
  base_url: http://localhost:8317/v1
  api_key: sk-xxx
custom_providers:
  - name: Local (localhost:8317)
    base_url: http://localhost:8317/v1
    api_key: sk-xxx
    model: glm-5.1
    models:
      glm-5.1:
        context_length: 200000
- Do NOT set auxiliary.compression.model or auxiliary.compression.context_length (so compression falls back to the main model).
- Set compression.threshold: 0.65 (default).
- Start Hermes.
- Observe the warning on startup: the configured 200K context should be recognized, but 128K is reported instead.
Expected Behavior
When the compression model matches a model defined in custom_providers.models, the feasibility check should resolve the context_length from custom_providers the same way the main model does (lines 1499-1536). No warning should appear since 0.65 × 200,000 = 130,000 < 200,000.
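The arithmetic behind the expected behavior can be checked with a small sketch. The function names below are hypothetical and are not taken from run_agent.py; the real check may round or compare differently.

```python
def compression_threshold_tokens(context_length: int, threshold: float) -> int:
    """Token count at which compression is triggered (hypothetical helper)."""
    return int(round(context_length * threshold))


def threshold_exceeds_aux_context(
    main_context: int, threshold: float, aux_context: int
) -> bool:
    """True when the compression model cannot hold the content to be compressed."""
    return compression_threshold_tokens(main_context, threshold) > aux_context


# With the configured 200K window, the threshold fits and no warning is needed.
fits_with_config = not threshold_exceeds_aux_context(200_000, 0.65, 200_000)

# With the built-in 128K default, the same threshold no longer fits,
# which matches the warning shown in the bug description.
warns_with_default = threshold_exceeds_aux_context(200_000, 0.65, 128_000)
```

The only difference between the two cases is which context length is used for the compression model, which is why the warning is a false positive here.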
Actual Behavior
The auxiliary feasibility check calls get_model_context_length() with config_context_length=None, falling back to the built-in default (128K for glm-5.1), ignoring the user's custom_providers.models.glm-5.1.context_length: 200000. This triggers a false warning and auto-lowers the compression threshold.
Affected Component
Messaging Platform (if gateway-related)
N/A (CLI only)
Debug Report
Running on self-hosted VPS with CLIProxyAPI as local LLM gateway. Issue reproduced on CLI and Discord gateway.
Operating System
Ubuntu 24.04 VPS
Python Version
3.11
Hermes Version
v0.1.0 (editable install from NousResearch/hermes-agent main branch)
Root Cause Analysis
In run_agent.py, around lines 2080-2085, the feasibility check calls:
aux_context = get_model_context_length(
    aux_model,
    base_url=aux_base_url,
    api_key=aux_api_key,
    config_context_length=getattr(self, "_aux_compression_context_length_config", None),
)
This only passes _aux_compression_context_length_config (from auxiliary.compression.context_length in config), but does NOT resolve custom_providers.models context_length for the auxiliary model.
Meanwhile, the main model (lines 1499-1536) correctly resolves custom_providers.models context_length and stores it in _config_context_length. The auxiliary path skips this resolution entirely.
When the compression model is the same as the main model (default behavior), get_model_context_length receives config_context_length=None and falls back to built-in defaults (128K for glm-5.1).
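The mismatch can be illustrated with a simplified stand-in for the lookup. This is not the real get_model_context_length; lookup_context_length and BUILTIN_DEFAULTS are hypothetical names, and the sketch only assumes that an explicit config value wins over the built-in default, as described above.

```python
from typing import Optional

# Hypothetical defaults table; 128K for glm-5.1 is the value reported in the warning.
BUILTIN_DEFAULTS = {"glm-5.1": 128_000}


def lookup_context_length(
    model: str, config_context_length: Optional[int] = None
) -> Optional[int]:
    """Simplified stand-in: an explicit config value wins, else the built-in default."""
    if config_context_length is not None:
        return config_context_length
    return BUILTIN_DEFAULTS.get(model)


# Main model path: the custom_providers override has been resolved and is passed in.
main_context = lookup_context_length("glm-5.1", config_context_length=200_000)

# Auxiliary path: only auxiliary.compression.context_length is consulted, which is unset.
aux_context = lookup_context_length("glm-5.1", config_context_length=None)
```

The same model name therefore yields two different context lengths depending on which code path asks.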
Proposed Fix
In _check_compression_model_feasibility, before calling get_model_context_length, resolve the custom_providers.models context_length for aux_model (mirroring the logic at lines 1499-1536) and pass it as config_context_length.
Alternatively, extract the custom_providers.models context resolution into a reusable function that both the main model and auxiliary paths call.
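A minimal sketch of the second option is shown below. The helper name, its signature, and the base_url matching rule are assumptions made for illustration; the existing logic at lines 1499-1536 may match providers differently, so this would need to be aligned with it.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_custom_provider_context_length(
    config: Mapping[str, Any],
    model: str,
    base_url: Optional[str] = None,
) -> Optional[int]:
    """Return custom_providers.models.<model>.context_length, or None if absent."""
    for provider in config.get("custom_providers") or []:
        if not isinstance(provider, Mapping):
            continue
        provider_url = provider.get("base_url")
        # Assumption: when both URLs are known they must match, ignoring a trailing slash.
        if base_url and provider_url and provider_url.rstrip("/") != base_url.rstrip("/"):
            continue
        entry = (provider.get("models") or {}).get(model)
        if isinstance(entry, Mapping):
            value = entry.get("context_length")
            if isinstance(value, int) and not isinstance(value, bool) and value > 0:
                return value
    return None


# Config from the reproduction steps, expressed as a dict.
config = {
    "custom_providers": [
        {
            "name": "Local (localhost:8317)",
            "base_url": "http://localhost:8317/v1",
            "models": {"glm-5.1": {"context_length": 200000}},
        }
    ]
}

# An explicit auxiliary.compression.context_length would still take precedence.
aux_override = None
resolved = (
    aux_override
    if aux_override is not None
    else resolve_custom_provider_context_length(
        config, "glm-5.1", "http://localhost:8317/v1"
    )
)
```

The resolved value would then be passed as config_context_length in the feasibility check, so both the main and auxiliary paths read the override from one place.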
Are you willing to submit a PR for this?