feat: warn at session start when compression model context is too small #7894
Merged
Two-phase design so the warning fires before the user's first message
on every platform:
Phase 1 (__init__):
_check_compression_model_feasibility() runs during agent construction.
Resolves the auxiliary compression model (same chain as call_llm with
task='compression'), compares its context length to the main model's
compression threshold. If too small, emits via _emit_status() (prints
for CLI) and stores the warning in _compression_warning.
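Here is a minimal Python sketch of Phase 1. It assumes the auxiliary model's context length has already been resolved through the call_llm chain. AgentSketch, the emitted list, the strict less-than comparison, and the warning wording are all hypothetical. Only the names _check_compression_model_feasibility, _emit_status, and _compression_warning come from the description above.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("agent")  # hypothetical logger name


class AgentSketch:
    """Hypothetical stand-in for AIAgent that models only Phase 1."""

    def __init__(self, main_context: int, threshold_percent: float, aux_context: int):
        self.main_context = main_context
        self.threshold_percent = threshold_percent
        self.aux_context = aux_context  # assumed already resolved via the auxiliary chain
        self.emitted: List[str] = []    # sketch-only record of what _emit_status printed
        self._compression_warning: Optional[str] = None
        # Runs during construction, before any gateway status_callback exists.
        self._check_compression_model_feasibility()

    def _emit_status(self, message: str) -> None:
        # Sketch of the CLI path only: print immediately and remember the message.
        print(message)
        self.emitted.append(message)

    def _check_compression_model_feasibility(self) -> None:
        threshold_tokens = int(self.main_context * self.threshold_percent)
        if self.aux_context >= threshold_tokens:
            return  # assumption: equal-or-larger context is treated as sufficient
        # Hypothetical wording; the real message text is not shown in this PR page.
        msg = (f"Compression model context ({self.aux_context} tokens) is smaller than "
               f"the compression threshold ({threshold_tokens} tokens)")
        logger.warning(msg)              # always lands in the log
        self._emit_status(msg)           # immediate CLI output
        self._compression_warning = msg  # kept for the Phase 2 replay


# 200K main model, 50% threshold (100K tokens), 32K auxiliary model -> warns.
agent = AgentSketch(main_context=200_000, threshold_percent=0.5, aux_context=32_000)
```

Storing the message as well as emitting it is what makes Phase 2 possible. A gateway cannot receive the warning at construction time, so the stored copy is held until it can.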
Phase 2 (run_conversation, first call):
_replay_compression_warning() re-sends the stored warning through
status_callback, which the gateway wires up only AFTER construction
and which was therefore unavailable in Phase 1. The warning is then
cleared so it fires only once.
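Phase 2 can be sketched as a replay-once hook at the top of run_conversation. This is an illustration under stated assumptions. ReplaySketch, the echo placeholder, and the choice to clear the warning even when no callback is attached are hypothetical. The ('lifecycle', message) callback shape follows the status_callback('lifecycle', ...) call described in this PR.

```python
from typing import Callable, List, Optional, Tuple

StatusCallback = Callable[[str, str], None]


class ReplaySketch:
    """Hypothetical model of Phase 2: a warning stored during __init__ is re-sent
    once through status_callback, which the gateway attaches after construction."""

    def __init__(self, stored_warning: Optional[str]):
        self._compression_warning = stored_warning
        self.status_callback: Optional[StatusCallback] = None  # wired later by the gateway

    def _replay_compression_warning(self) -> None:
        if self._compression_warning and self.status_callback is not None:
            self.status_callback("lifecycle", self._compression_warning)
        # This sketch clears unconditionally, so the warning can never fire twice.
        self._compression_warning = None

    def run_conversation(self, user_message: str) -> str:
        self._replay_compression_warning()
        return f"echo: {user_message}"  # placeholder for the real agent loop


# Usage: the gateway wires the callback after construction, then two messages arrive.
received: List[Tuple[str, str]] = []
agent = ReplaySketch("compression model too small")
agent.status_callback = lambda kind, msg: received.append((kind, msg))
agent.run_conversation("hi")
agent.run_conversation("again")  # no second delivery
```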
This ensures:
- CLI users see the warning immediately at startup (right after the
context limit line)
- Gateway users (Telegram, Discord, Slack, WhatsApp, Signal, Matrix,
Mattermost, Home Assistant, DingTalk, etc.) receive it via
status_callback('lifecycle', ...) on their first message
- logger.warning() always writes to agent.log, regardless of platform
Also warns when no auxiliary LLM provider is configured at all.
The entire check is wrapped in try/except, so it never blocks startup.
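The following sketch shows the no-provider case and the exception safety together. It assumes a hypothetical resolver that returns the auxiliary model's context length, returns None when no auxiliary provider is configured, and may raise on lookup errors. The function name, resolver, and message wording are illustrative and are not the PR's actual code.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("agent")  # hypothetical logger name


def check_compression_feasibility(
    resolve_aux_context: Callable[[], Optional[int]],
    threshold_tokens: int,
) -> Optional[str]:
    """Return a warning string, or None when no warning is needed.

    resolve_aux_context is a hypothetical resolver: it returns the auxiliary
    compression model's context length, None when no auxiliary provider is
    configured, and may raise on lookup failures.
    """
    try:
        aux_context = resolve_aux_context()
        if aux_context is None:
            return "No auxiliary LLM provider configured; context compression is unavailable."
        if aux_context < threshold_tokens:
            return (f"Compression model context ({aux_context} tokens) is below "
                    f"the compression threshold ({threshold_tokens} tokens).")
        return None
    except Exception:
        # A diagnostics failure must never block startup: log quietly and move on.
        logger.debug("Compression feasibility check failed", exc_info=True)
        return None
```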
11 tests covering: core warning logic, boundary conditions, exception
safety, two-phase store+replay, gateway callback wiring, and
single-delivery guarantee.
Summary
Adds a session-start check that detects when the auxiliary compression model's context window is smaller than the main model's compression threshold. In that case, context compression cannot succeed: once compression triggers, the content to summarize already exceeds what the auxiliary model can accept.
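The condition reduces to one comparison, sketched below. The helper name compression_feasible is hypothetical. Its formula threshold_tokens = main_context * threshold_percent matches the description in this PR. It also simplifies by treating "content to summarize" as roughly threshold_tokens, ignoring prompt overhead.

```python
def compression_feasible(main_context: int, threshold_percent: float, aux_context: int) -> bool:
    """Hypothetical helper: can the auxiliary model take a threshold's worth of tokens?

    Compression triggers once the conversation reaches threshold_tokens, so the
    auxiliary model must accept at least roughly that many input tokens.
    """
    threshold_tokens = int(main_context * threshold_percent)
    return aux_context >= threshold_tokens


# 200K main model with a 50% threshold -> compression starts at 100K tokens.
print(compression_feasible(200_000, 0.50, 32_000))   # False: a 32K model cannot hold 100K
print(compression_feasible(200_000, 0.50, 128_000))  # True
```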
What changed
run_agent.py:
- New _check_compression_model_feasibility() on AIAgent, called from __init__ right after the compressor is initialized
- Resolves the auxiliary compression model through the same chain as call_llm(task='compression'), so it respects auxiliary.compression.model, compression.summary_model, env overrides, and the auto-detection chain
- Compares the auxiliary model's context length against threshold_tokens (= main_context * threshold_percent)
- Emits the warning via _emit_status(), which covers all platforms: CLI (_vprint(force=True)) and every gateway platform (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Home Assistant, DingTalk, etc.) through status_callback('lifecycle', ...)
- Also logs via logger.warning() to agent.log
tests/run_agent/test_compression_feasibility.py: 8 tests
Example output
When a user has a 200K main model (threshold at 100K) but their auxiliary compression model only has 32K context:
Test plan