fix: unify gateway session hygiene with agent compression config#739
Merged
Conversation
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.
Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.
Config (default: off):
display:
bell_on_complete: true
Closes #318
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.
Changes:
- Gateway hygiene now reads model name from config.yaml and uses
get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
models, and that message count alone no longer triggers compression
For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
Major updates to reflect the current OBLITERATUS codebase: - Change default recommendation from 'informed' (experimental) to 'advanced' (reliable, well-tested multi-direction SVD) - Add new CLI commands: tourney, recommend, strategies, report, aggregate, abliterate (alias) - Add --direction-method flag (diff_means, svd, leace) - Add strategies module (embedding/FFN ablation, head pruning, layer removal) - Add evaluation module with LM Eval Harness integration - Expand analysis modules from 15 to 28 - Add Apple Silicon (MLX) support - Add study presets (quick, jailbreak, knowledge, etc.) - Add --contribute, --verify-sample-size, --preset flags - Add complete CLI command reference table - Fix torch property name: total_mem -> total_memory (caught during live testing) Tested: Successfully abliterated Qwen2.5-0.5B-Instruct using 'advanced' method — refusal rate 0.4%, coherence 1.0, model responds without refusal to test prompts.
Added pitfalls discovered during live abliteration testing: - Models < 1B have fragmented refusal, respond poorly (0.5B: 60%→20%) - Models 3B+ work much better (3B: 75%→0% with advanced defaults) - aggressive method can backfire on small models (made it worse) - Spectral certification RED is common even when refusal rate is 0% - Fixed torch property: total_mem → total_memory
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
OBLITERATUS skill (PR NousResearch#408 updated): - 9 CLI methods, 28 analysis modules, 116 model presets - Default method: advanced (multi-direction SVD, norm-preserving) - Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20% - References, templates, and real-world pitfalls included Gateway compression fix (PR NousResearch#739): - Unified session hygiene with agent compression config - Uses model context length × compression.threshold from config.yaml - Removed hardcoded 100k/200-msg thresholds
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
OBLITERATUS skill (PR NousResearch#408 updated): - 9 CLI methods, 28 analysis modules, 116 model presets - Default method: advanced (multi-direction SVD, norm-preserving) - Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20% - References, templates, and real-world pitfalls included Gateway compression fix (PR NousResearch#739): - Unified session hygiene with agent compression config - Uses model context length × compression.threshold from config.yaml - Removed hardcoded 100k/200-msg thresholds
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
OBLITERATUS skill (PR NousResearch#408 updated): - 9 CLI methods, 28 analysis modules, 116 model presets - Default method: advanced (multi-direction SVD, norm-preserving) - Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20% - References, templates, and real-world pitfalls included Gateway compression fix (PR NousResearch#739): - Unified session hygiene with agent compression config - Uses model context length × compression.threshold from config.yaml - Removed hardcoded 100k/200-msg thresholds
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
OBLITERATUS skill (PR NousResearch#408 updated): - 9 CLI methods, 28 analysis modules, 116 model presets - Default method: advanced (multi-direction SVD, norm-preserving) - Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20% - References, templates, and real-world pitfalls included Gateway compression fix (PR NousResearch#739): - Unified session hygiene with agent compression config - Uses model context length × compression.threshold from config.yaml - Removed hardcoded 100k/200-msg thresholds
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The gateway had a separate compression system ('session hygiene') with hardcoded thresholds that were completely disconnected from the model's context length and the user's
compressionconfig inconfig.yaml:auto_compress_tokens: 100,000(hardcoded)auto_compress_messages: 200(hardcoded)This caused premature auto-compression on Telegram/Discord:
CLI sessions worked correctly because they only use the agent's internal
ContextCompressor, which properly readscompression.thresholdfrom config.Fix
Unified the gateway hygiene to use the exact same config as the agent:
config.yaml→ usesget_model_context_length()for context limitcompression.thresholdfromconfig.yaml(default 0.85)compression.enabledand env var overrides (CONTEXT_COMPRESSION_THRESHOLD,CONTEXT_COMPRESSION_ENABLED)session_hygieneconfig sectionResult for claude-opus-4.6: Gateway hygiene now triggers at 170k tokens (85% of 200k) instead of the old 100k/200-messages.
Test plan
tests/gateway/test_session_hygiene.py— 13 tests covering: