feat: warn at session start when compression model context is too small #7894

Merged: teknium1 merged 1 commit into main from hermes/hermes-da2f08b5 on Apr 11, 2026
Conversation

@teknium1

Summary

Adds a session-start check that detects when the auxiliary compression model's context window is smaller than the main model's compression threshold. In that case compression cannot succeed, because the content to summarize exceeds the auxiliary model's context window.

What changed

run_agent.py:

  • New method _check_compression_model_feasibility() on AIAgent
  • Called during __init__ right after the compressor is initialized
  • Resolves the auxiliary compression model via the same resolution chain as call_llm(task='compression') — respects auxiliary.compression.model, compression.summary_model, env overrides, and the auto-detection chain
  • Compares the auxiliary model's context length against threshold_tokens (= main_context * threshold_percent)
  • Emits the warning via _emit_status(), which covers all platforms: the CLI (_vprint(force=True)) and every gateway platform (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Home Assistant, DingTalk, etc.) through status_callback('lifecycle', ...)
  • Also logs via logger.warning() to agent.log
  • Warns when no auxiliary LLM provider is configured at all
  • Entire check is wrapped in try/except — never blocks startup
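The comparison described in the bullets above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in run_agent.py: the function name `check_compression_feasibility`, its parameters, the message wording, and the `emit_status` callable are all assumptions made for the example, and the auxiliary model's context length is passed in rather than resolved.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("agent")


def check_compression_feasibility(
    main_context: int,
    threshold_percent: float,
    aux_context: Optional[int],
    aux_model: str,
    emit_status: Callable[[str], None],
) -> Optional[str]:
    """Return the warning text if compression looks infeasible, else None.

    Hypothetical stand-in for the method on AIAgent. aux_context=None
    models the "no auxiliary provider configured" case.
    """
    try:
        # threshold_tokens = main_context * threshold_percent, as in the PR text.
        threshold_tokens = int(main_context * threshold_percent)

        if aux_context is None:
            message = (
                "No auxiliary LLM provider is configured. "
                "Context compression will not be possible."
            )
        elif aux_context < threshold_tokens:
            # Strictly smaller: an equal context length does not warn.
            message = (
                f"Compression model ({aux_model}) context is {aux_context:,} tokens, "
                f"but the main model's compression threshold is {threshold_tokens:,} tokens."
            )
        else:
            return None

        logger.warning(message)
        emit_status(message)
        return message
    except Exception:
        # The check must never block startup, so swallow and log any failure.
        logger.debug("compression feasibility check failed", exc_info=True)
        return None
```

With the numbers from the example output below (200,000 main context, 50% threshold, 32,768 auxiliary context), `threshold_tokens` is 100,000 and the function returns a warning; with an auxiliary context of exactly 100,000 it returns `None`.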

tests/run_agent/test_compression_feasibility.py: 8 tests covering:

  • Warning fires when aux context < threshold
  • No warning when aux context >= threshold
  • No provider configured → different warning
  • Compression disabled → check skipped
  • Exception safety (never crashes)
  • Gateway status_callback receives the warning
  • Exact boundary (equal = no warning)
  • One below boundary → warning fires
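The boundary cases in this list can be expressed as a small table of inputs and expected outcomes. The sketch below is illustrative only and is not taken from tests/run_agent/test_compression_feasibility.py; `needs_warning`, `THRESHOLD`, `CASES`, and `run_cases` are hypothetical names.

```python
def needs_warning(aux_context: int, threshold_tokens: int) -> bool:
    """Hypothetical predicate: warn only when aux context is strictly below the threshold."""
    return aux_context < threshold_tokens


THRESHOLD = 100_000  # assumed: 200K main context with a 50% compression threshold

# (auxiliary context length, whether a warning is expected)
CASES = [
    (32_768, True),          # far below the threshold
    (THRESHOLD, False),      # exact boundary: equal means no warning
    (THRESHOLD - 1, True),   # one below the boundary
    (THRESHOLD + 1, False),  # above the threshold
]


def run_cases() -> list:
    """Return one boolean per case, True when the predicate matches the expectation."""
    return [needs_warning(aux, THRESHOLD) == expected for aux, expected in CASES]
```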

Example output

When a user has a 200K main model (threshold at 100K) but their auxiliary compression model only has 32K context:

📊 Context limit: 200,000 tokens (compress at 50% = 100,000)
⚠ Compression model (google/gemini-3-flash-preview) context is 32,768 tokens,
but the main model's compression threshold is 100,000 tokens. Context compression
will not be possible — the content to summarise will exceed the auxiliary model's
context window. Consider configuring a larger model via auxiliary.compression.model
in config.yaml.

Test plan

python -m pytest tests/run_agent/test_compression_feasibility.py -n0 -v  # 8 passed
python -m pytest tests/agent/test_context_compressor.py tests/run_agent/test_compressor_fallback_update.py tests/run_agent/test_compression_persistence.py tests/run_agent/test_compression_boundary.py tests/run_agent/test_413_compression.py -n0  # 66 passed

@teknium1 force-pushed the hermes/hermes-da2f08b5 branch 3 times, most recently from 1bca039 to df710a0 (April 11, 2026 18:58)
Two-phase design so the warning fires before the user's first message
on every platform:

Phase 1 (__init__):
  _check_compression_model_feasibility() runs during agent construction.
  Resolves the auxiliary compression model (same chain as call_llm with
  task='compression'), compares its context length to the main model's
  compression threshold. If too small, emits via _emit_status() (prints
  for CLI) and stores the warning in _compression_warning.

Phase 2 (run_conversation, first call):
  _replay_compression_warning() re-sends the stored warning through
  status_callback — which the gateway wires AFTER construction. The
  warning is then cleared so it only fires once.

This ensures:
- CLI users see the warning immediately at startup (right after the
  context limit line)
- Gateway users (Telegram, Discord, Slack, WhatsApp, Signal, Matrix,
  Mattermost, Home Assistant, DingTalk, etc.) receive it via
  status_callback('lifecycle', ...) on their first message
- logger.warning() always hits agent.log regardless of platform

Also warns when no auxiliary LLM provider is configured at all.
Entire check wrapped in try/except — never blocks startup.

11 tests covering: core warning logic, boundary conditions, exception
safety, two-phase store+replay, gateway callback wiring, and
single-delivery guarantee.
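The store-and-replay flow in this commit message might look roughly like the sketch below. It is a simplified model under stated assumptions: `AgentSketch` is a hypothetical class, the warning is passed into the constructor instead of being computed, and clearing the stored warning even when no callback is wired is a choice made for the example rather than confirmed behavior.

```python
from typing import Callable, Optional

# (event kind, message) -> None, mirroring status_callback('lifecycle', ...)
StatusCallback = Callable[[str, str], None]


class AgentSketch:
    """Hypothetical stand-in for AIAgent showing only the two-phase warning path."""

    def __init__(self, warning: Optional[str] = None) -> None:
        # The gateway assigns status_callback AFTER construction.
        self.status_callback: Optional[StatusCallback] = None
        self._compression_warning: Optional[str] = None
        if warning:
            # Phase 1: emit now (reaches the CLI) and store for later replay.
            self._emit_status(warning)
            self._compression_warning = warning

    def _emit_status(self, message: str) -> None:
        print(message)  # CLI path
        if self.status_callback is not None:
            self.status_callback("lifecycle", message)

    def _replay_compression_warning(self) -> None:
        # Phase 2: re-send through the callback the gateway has wired by now.
        if self._compression_warning and self.status_callback is not None:
            self.status_callback("lifecycle", self._compression_warning)
        # Clear so the warning is delivered at most once (assumption: clear
        # unconditionally, since the CLI already printed it in phase 1).
        self._compression_warning = None

    def run_conversation(self, user_message: str) -> str:
        self._replay_compression_warning()
        return f"echo: {user_message}"
```

A gateway-style caller would construct the agent, assign `status_callback`, and then call `run_conversation()`; the callback receives the warning on the first call only.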
@teknium1 force-pushed the hermes/hermes-da2f08b5 branch from df710a0 to 765af0b (April 11, 2026 19:01)
@teknium1 teknium1 merged commit dafe443 into main Apr 11, 2026
4 checks passed
forsonny pushed a commit to forsonny/hermes-agent that referenced this pull request Apr 11, 2026
…i provider, community bug fixes)

- fix(vision): auto-resize oversized images + preserve aspect ratio
- feat: warn when compression model context is too small (NousResearch#7894)
- refactor(auxiliary): config.yaml priority over env vars (NousResearch#7889)
- fix: three high-impact community bugs (NousResearch#5819, NousResearch#6893, NousResearch#3388)
- fix(streaming): adaptive backoff + cursor strip (NousResearch#7683)
- fix(weixin): keep multi-line messages in single bubble
- fix(matrix): pass required args for mautrix >=0.21
- feat(xiaomi): add Xiaomi MiMo as first-class provider
- feat(migration): preview-then-confirm UX + docs
- fix: unify openai-codex model list
- docs: MiMo docs + compression context warning docs

Self-improve: automated improvement
@SHL0MS SHL0MS mentioned this pull request Apr 11, 2026
2 tasks
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026