fix: unify gateway session hygiene with agent compression config #739

Merged
teknium1 merged 4 commits into main from hermes/hermes-0cbb57e2 on Mar 9, 2026

teknium1 commented Mar 9, 2026

Problem

The gateway had a separate compression system ('session hygiene') with hardcoded thresholds that were completely disconnected from the model's context length and the user's compression config in config.yaml:

  • auto_compress_tokens: 100,000 (hardcoded)
  • auto_compress_messages: 200 (hardcoded)

This caused premature auto-compression on Telegram/Discord:

  • ~60k tokens: The 200-message threshold was hit (tool-heavy sessions easily reach 200 messages even with low token counts)
  • ~220k tokens: The agent's own internal compressor finally triggered at the correct threshold
  • Neither matched the expected trigger point: 85% of claude-opus-4.6's 200k-token context, i.e. 170k tokens
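To make the failure mode concrete, here is a hypothetical reconstruction of the old trigger. Only the two hardcoded numbers come from this PR; the constant names, the use of >= and the "either limit fires" or-combination are assumptions inferred from the ~60k-token case above, not the gateway's actual code.

```python
# Hypothetical reconstruction of the OLD gateway trigger. Only the two numbers
# come from the PR; the names, ">=" and "or" are assumptions.
OLD_AUTO_COMPRESS_TOKENS = 100_000   # fixed, regardless of model context length
OLD_AUTO_COMPRESS_MESSAGES = 200     # fixed, regardless of token usage


def old_should_compress(token_count: int, message_count: int) -> bool:
    # Either limit alone fires, so a tool-heavy session can trip the
    # message limit while using only a fraction of the context window.
    return (token_count >= OLD_AUTO_COMPRESS_TOKENS
            or message_count >= OLD_AUTO_COMPRESS_MESSAGES)


# ~60k tokens spread across 200 short tool messages:
print(old_should_compress(60_000, 200))  # True
```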

CLI sessions worked correctly because they only use the agent's internal ContextCompressor, which properly reads compression.threshold from config.

Fix

Unified the gateway hygiene to use the exact same config as the agent:

  1. Reads model name from config.yaml → uses get_model_context_length() for context limit
  2. Reads compression.threshold from config.yaml (default 0.85)
  3. Respects compression.enabled and env var overrides (CONTEXT_COMPRESSION_THRESHOLD, CONTEXT_COMPRESSION_ENABLED)
  4. Removed the message-count-based trigger (redundant, caused false positives)
  5. Removed the undocumented session_hygiene config section
  6. Warn threshold is now 95% of model context (was hardcoded 200k)
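The unified logic can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the gateway's implementation: the lookup table stands in for get_model_context_length(), the top-level "model" key, the 128k fallback and the enabled-by-default behaviour are all assumptions, and lookup_context_length, HygieneThresholds, hygiene_thresholds and should_compress are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Assumed stand-in for get_model_context_length(); the project's real function
# and its model table may differ.
_ASSUMED_CONTEXT_LENGTHS = {"claude-opus-4.6": 200_000}
_ASSUMED_DEFAULT_CONTEXT = 128_000  # hypothetical fallback for unknown models

WARN_FRACTION = 0.95  # warn at 95% of the model's context


def lookup_context_length(model: str) -> int:
    return _ASSUMED_CONTEXT_LENGTHS.get(model, _ASSUMED_DEFAULT_CONTEXT)


@dataclass
class HygieneThresholds:
    enabled: bool
    compress_at: int  # token count at which hygiene compresses
    warn_at: int      # token count at which the gateway warns


def _parse_bool(value: str) -> bool:
    return value.strip().lower() in ("1", "true", "yes", "on")


def hygiene_thresholds(config: Mapping,
                       env: Mapping[str, str] = os.environ) -> HygieneThresholds:
    # "model" as a top-level key is an assumption about config.yaml's layout.
    context_length = lookup_context_length(config.get("model", ""))
    compression = config.get("compression", {})

    # Start from config.yaml (default threshold 0.85), then let env vars override.
    threshold = float(compression.get("threshold", 0.85))
    enabled = bool(compression.get("enabled", True))  # default on: assumption
    if "CONTEXT_COMPRESSION_THRESHOLD" in env:
        threshold = float(env["CONTEXT_COMPRESSION_THRESHOLD"])
    if "CONTEXT_COMPRESSION_ENABLED" in env:
        enabled = _parse_bool(env["CONTEXT_COMPRESSION_ENABLED"])

    return HygieneThresholds(
        enabled=enabled,
        compress_at=int(context_length * threshold),
        warn_at=int(context_length * WARN_FRACTION),
    )


def should_compress(token_count: int, t: HygieneThresholds) -> bool:
    # Token count only; message count no longer participates.
    return t.enabled and token_count >= t.compress_at
```

For example, hygiene_thresholds({"model": "claude-opus-4.6"}, env={}) gives compress_at=170000 and warn_at=190000, consistent with the 170k figure stated in this PR. Passing env explicitly keeps the function easy to test without touching the real process environment.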

Result for claude-opus-4.6: Gateway hygiene now triggers at 170k tokens (85% of 200k) instead of the old 100k/200-messages.

Test plan

  • Updated tests/gateway/test_session_hygiene.py — 13 tests covering:
    • Model-aware threshold scaling (128k, 200k, 1M models)
    • Custom threshold percentages
    • Message count alone no longer triggers compression
    • Warn threshold at 95% of context
  • Full suite: 2468 passed, 5 skipped
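The four test areas listed above can be illustrated with a small self-contained sketch. The helpers here are hypothetical and only mirror the described behaviour; the real tests/gateway/test_session_hygiene.py exercises the gateway's own code, which may be structured differently.

```python
# Hypothetical helpers mirroring the behaviour the tests describe; the real
# test file exercises the gateway's own code, which may differ.
WARN_FRACTION = 0.95  # warn threshold: 95% of the model's context


def compress_threshold(context_length: int, threshold: float = 0.85) -> int:
    return int(context_length * threshold)


def warn_threshold(context_length: int) -> int:
    return int(context_length * WARN_FRACTION)


def should_compress(token_count: int, message_count: int,
                    context_length: int, threshold: float = 0.85) -> bool:
    # message_count is accepted but ignored: the message-based trigger is gone.
    return token_count >= compress_threshold(context_length, threshold)


def test_threshold_scales_with_model():
    for context_length, expected in [(128_000, 108_800),
                                     (200_000, 170_000),
                                     (1_000_000, 850_000)]:
        assert compress_threshold(context_length) == expected


def test_custom_threshold():
    assert compress_threshold(200_000, threshold=0.75) == 150_000


def test_message_count_alone_does_not_trigger():
    # Many messages, few tokens: would have fired under the old 200-message rule.
    assert not should_compress(60_000, 500, 200_000)


def test_warn_threshold_is_95_percent():
    assert warn_threshold(200_000) == 190_000
```

The tests are plain functions, so pytest can collect them as-is, or they can be called directly.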

teknium1 added 4 commits March 8, 2026 19:41
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
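The bell option from the first commit amounts to something like the minimal sketch below. It assumes config.yaml has already been parsed into a plain dict; ring_bell_if_enabled is a hypothetical name, not the project's function.

```python
import sys
from typing import TextIO


def ring_bell_if_enabled(config: dict, stream: TextIO = sys.stdout) -> bool:
    """Write the terminal bell (BEL, '\\a') after a response, if configured.

    Assumes config is the parsed config.yaml as a plain dict; defaults to off.
    """
    if config.get("display", {}).get("bell_on_complete", False):
        stream.write("\a")  # BEL travels over SSH like any other output byte
        stream.flush()      # flush so the ding isn't held in a buffer
        return True
    return False
```

Taking the output stream as a parameter keeps the sketch testable with an in-memory buffer instead of a real terminal.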
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord, triggering at ~60k tokens
(from the 200-message threshold) or at inconsistent token counts.

Changes:
- Gateway hygiene now reads model name from config.yaml and uses
  get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
  config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
  false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
  standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
  CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
  models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
Major updates to reflect the current OBLITERATUS codebase:

- Change default recommendation from 'informed' (experimental) to
  'advanced' (reliable, well-tested multi-direction SVD)
- Add new CLI commands: tourney, recommend, strategies, report,
  aggregate, abliterate (alias)
- Add --direction-method flag (diff_means, svd, leace)
- Add strategies module (embedding/FFN ablation, head pruning,
  layer removal)
- Add evaluation module with LM Eval Harness integration
- Expand analysis modules from 15 to 28
- Add Apple Silicon (MLX) support
- Add study presets (quick, jailbreak, knowledge, etc.)
- Add --contribute, --verify-sample-size, --preset flags
- Add complete CLI command reference table
- Fix torch property name: total_mem -> total_memory (caught
  during live testing)

Tested: Successfully abliterated Qwen2.5-0.5B-Instruct using
'advanced' method — refusal rate 0.4%, coherence 1.0, model
responds without refusal to test prompts.
Added pitfalls discovered during live abliteration testing:
- Models < 1B have fragmented refusal and respond poorly (0.5B: 60%→20%)
- Models 3B+ work much better (3B: 75%→0% with advanced defaults)
- The aggressive method can backfire on small models (made it worse)
- Spectral certification RED is common even when refusal rate is 0%
- Fixed torch property: total_mem → total_memory
@teknium1 teknium1 merged commit c21d77c into main Mar 9, 2026
1 check passed
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
OBLITERATUS skill (PR NousResearch#408 updated):
- 9 CLI methods, 28 analysis modules, 116 model presets
- Default method: advanced (multi-direction SVD, norm-preserving)
- Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20%
- References, templates, and real-world pitfalls included

Gateway compression fix (PR NousResearch#739):
- Unified session hygiene with agent compression config
- Uses model context length × compression.threshold from config.yaml
- Removed hardcoded 100k/200-msg thresholds
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026