feat: warn at session start when compression model context is too small by teknium1 · Pull Request #7894 · NousResearch/hermes-agent

teknium1 · 2026-04-11T18:29:30Z

Summary

Adds a session-start check that detects when the auxiliary compression model's context window is smaller than the main model's compression threshold. When this is the case, context compression will not be possible because the content to summarize will exceed the auxiliary model's capacity.

What changed

run_agent.py:

New method _check_compression_model_feasibility() on AIAgent
Called during __init__ right after the compressor is initialized
Resolves the auxiliary compression model via the same resolution chain as call_llm(task='compression') — respects auxiliary.compression.model, compression.summary_model, env overrides, and the auto-detection chain
Compares the auxiliary model's context length against threshold_tokens (= main_context * threshold_percent)
Emits warning via _emit_status() — covers all platforms: CLI (_vprint(force=True)), and every gateway platform (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Home Assistant, DingTalk, etc.) through status_callback('lifecycle', ...)
Also logs via logger.warning() to agent.log
Warns when no auxiliary LLM provider is configured at all
Entire check is wrapped in try/except — never blocks startup
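
The comparison described above can be sketched as a standalone function. This is a minimal illustration, not the code in run_agent.py: the names `compression_threshold` and `feasibility_warning`, their parameters, the integer truncation of the threshold, and the wording of the no-provider message are all assumptions made for the example.

```python
from typing import Optional


def compression_threshold(main_context: int, threshold_percent: float) -> int:
    """Token count at which the main model's context is compressed.

    Assumption: the product is truncated to an integer.
    """
    return int(main_context * threshold_percent)


def feasibility_warning(
    aux_model: Optional[str],
    aux_context: Optional[int],
    main_context: int,
    threshold_percent: float,
) -> Optional[str]:
    """Return a warning string, or None when compression looks feasible."""
    if aux_model is None:
        # Hypothetical wording for the "no provider configured" case.
        return "No auxiliary LLM provider is configured for compression."
    if aux_context is None:
        # Assumption: an unknown context length produces no warning.
        return None
    threshold = compression_threshold(main_context, threshold_percent)
    if aux_context < threshold:
        return (
            f"Compression model ({aux_model}) context is {aux_context:,} tokens, "
            f"but the main model's compression threshold is {threshold:,} tokens."
        )
    return None  # equal to or above the threshold: no warning
```

With a 200,000-token main model and a 50% threshold, `feasibility_warning("some-model", 32_768, 200_000, 0.5)` returns a warning, while an auxiliary context of exactly 100,000 tokens returns `None`.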

tests/run_agent/test_compression_feasibility.py: 8 tests covering:

Warning fires when aux context < threshold
No warning when aux context >= threshold
No provider configured → different warning
Compression disabled → check skipped
Exception safety (never crashes)
Gateway status_callback receives the warning
Exact boundary (equal = no warning)
One below boundary → warning fires
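
A few of the cases listed above (the exact boundary, one below the boundary, and exception safety) can be sketched against a stand-in check. Everything here is hypothetical: `check_feasibility`, its `resolve_aux_context` parameter, and the `emit` callback are illustrative names, not the fixtures used in tests/run_agent/test_compression_feasibility.py, and the sketch uses plain assertions rather than pytest to stay dependency-free.

```python
from typing import Callable, List


def check_feasibility(
    resolve_aux_context: Callable[[], int],
    threshold_tokens: int,
    emit: Callable[[str], None],
) -> None:
    """Emit a warning if the auxiliary context is below the threshold.

    Any exception raised while resolving the auxiliary model is swallowed,
    so the caller (agent startup, in the PR) is never blocked.
    """
    try:
        aux_context = resolve_aux_context()
        if aux_context < threshold_tokens:
            emit(f"aux context {aux_context:,} < threshold {threshold_tokens:,}")
    except Exception:
        # Assumption: real code might log here; this sketch just continues.
        pass


def collect_warnings(resolver: Callable[[], int], threshold: int) -> List[str]:
    """Run the check and return every warning that was emitted."""
    seen: List[str] = []
    check_feasibility(resolver, threshold, seen.append)
    return seen


def broken_resolver() -> int:
    raise RuntimeError("provider lookup failed")


# Exact boundary: equal means no warning.
assert collect_warnings(lambda: 100_000, 100_000) == []
# One below the boundary: the warning fires exactly once.
assert len(collect_warnings(lambda: 99_999, 100_000)) == 1
# A failing resolver neither raises nor warns.
assert collect_warnings(broken_resolver, 100_000) == []
```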

Example output

When a user has a 200K main model (threshold at 100K) but their auxiliary compression model only has 32K context:

📊 Context limit: 200,000 tokens (compress at 50% = 100,000)
⚠ Compression model (google/gemini-3-flash-preview) context is 32,768 tokens,
but the main model's compression threshold is 100,000 tokens. Context compression
will not be possible — the content to summarise will exceed the auxiliary model's
context window. Consider configuring a larger model via auxiliary.compression.model
in config.yaml.

Test plan

python -m pytest tests/run_agent/test_compression_feasibility.py -n0 -v  # 8 passed
python -m pytest tests/agent/test_context_compressor.py tests/run_agent/test_compressor_fallback_update.py tests/run_agent/test_compression_persistence.py tests/run_agent/test_compression_boundary.py tests/run_agent/test_413_compression.py -n0  # 66 passed

Two-phase design so the warning fires before the user's first message on every platform:

Phase 1 (__init__): _check_compression_model_feasibility() runs during agent construction. It resolves the auxiliary compression model (same chain as call_llm with task='compression') and compares its context length to the main model's compression threshold. If too small, it emits via _emit_status() (prints for CLI) and stores the warning in _compression_warning.

Phase 2 (run_conversation, first call): _replay_compression_warning() re-sends the stored warning through status_callback, which the gateway wires AFTER construction. The warning is then cleared so it only fires once.

This ensures:

- CLI users see the warning immediately at startup (right after the context limit line)
- Gateway users (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Home Assistant, DingTalk, etc.) receive it via status_callback('lifecycle', ...) on their first message
- logger.warning() always hits agent.log regardless of platform

Also warns when no auxiliary LLM provider is configured at all. Entire check wrapped in try/except, so it never blocks startup.

11 tests covering: core warning logic, boundary conditions, exception safety, two-phase store+replay, gateway callback wiring, and single-delivery guarantee.
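
The store-then-replay pattern from the two-phase design can be sketched with a stand-in class. This is an illustration under stated assumptions, not the AIAgent implementation: `AgentSketch` and its constructor argument are hypothetical, the attribute and method names are borrowed from the commit message, and clearing the stored warning even when no callback is wired is a guess.

```python
from typing import Callable, Optional

# Assumed callback shape: (event kind, message), based on
# status_callback('lifecycle', ...) in the commit message.
StatusCallback = Callable[[str, str], None]


class AgentSketch:
    """Hypothetical stand-in for the agent; only the warning flow is modelled."""

    def __init__(self, warning: Optional[str] = None) -> None:
        # The gateway wires this attribute AFTER construction.
        self.status_callback: Optional[StatusCallback] = None
        self._compression_warning: Optional[str] = None
        if warning:
            # Phase 1: surface the warning immediately (CLI) and store it.
            print(warning)
            self._compression_warning = warning

    def _replay_compression_warning(self) -> None:
        # Phase 2: re-send the stored warning once the callback exists.
        if self._compression_warning and self.status_callback:
            self.status_callback("lifecycle", self._compression_warning)
        # Assumption: cleared unconditionally so it is delivered at most once.
        self._compression_warning = None

    def run_conversation(self, message: str) -> str:
        self._replay_compression_warning()
        return f"echo: {message}"


received = []
agent = AgentSketch("compression model context is too small")
agent.status_callback = lambda kind, text: received.append((kind, text))
agent.run_conversation("first message")
agent.run_conversation("second message")
assert received == [("lifecycle", "compression model context is too small")]
```

The callback receives the warning on the first `run_conversation` call only; the second call finds `_compression_warning` already cleared.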

…i provider, community bug fixes)

- fix(vision): auto-resize oversized images + preserve aspect ratio
- feat: warn when compression model context is too small (NousResearch#7894)
- refactor(auxiliary): config.yaml priority over env vars (NousResearch#7889)
- fix: three high-impact community bugs (NousResearch#5819, NousResearch#6893, NousResearch#3388)
- fix(streaming): adaptive backoff + cursor strip (NousResearch#7683)
- fix(weixin): keep multi-line messages in single bubble
- fix(matrix): pass required args for mautrix >=0.21
- feat(xiaomi): add Xiaomi MiMo as first-class provider
- feat(migration): preview-then-confirm UX + docs
- fix: unify openai-codex model list
- docs: MiMo docs + compression context warning docs

Self-improve: automated improvement

teknium1 force-pushed the hermes/hermes-da2f08b5 branch 3 times, most recently from 1bca039 to df710a0 Compare April 11, 2026 18:58

teknium1 force-pushed the hermes/hermes-da2f08b5 branch from df710a0 to 765af0b Compare April 11, 2026 19:01

teknium1 merged commit dafe443 into main Apr 11, 2026
4 checks passed

SHL0MS mentioned this pull request Apr 11, 2026

[Tracking] /compress display bugs #7955

Closed

2 tasks

XiaoXiao0221 mentioned this pull request Apr 12, 2026

fix/windows gateway encoding v2 #8179

Closed

github-actions Bot mentioned this pull request Apr 15, 2026

chore: bump NousResearch/hermes-agent version from v2026.4.8 to v2026.4.13 Docker-Hub-sirmark/docker-hermes-agent#1

Merged

teknium1 merged 1 commit into main from hermes/hermes-da2f08b5
