fix(gateway): strip [CONTEXT COMPACTION] echoes from gateway responses#6272
Open
giwaov wants to merge 1 commit into
When the model parrots the stored compaction handoff summary back to the user (e.g. in response to a simple 'Hello?' after a session with prior compressed context), the raw [CONTEXT COMPACTION] blob leaks into the visible response. Strip the compaction prefix and boilerplate from the response before delivery, so the user sees a clean greeting instead of an opaque internal handoff dump. Closes NousResearch#6212
ilyasst added a commit to ilyasst/hermes-agent that referenced this pull request May 13, 2026
Symptom: local Qwen3.5 via llama-server with --jinja occasionally generates "\nuser\n[<user_name>] ..." past the assistant stop boundary. Hermes formats user messages as "[<user_name>] ..." (gateway/run.py:5692), so the model learned that pattern as the user-turn delimiter; when sampling drifts past the assistant stop, it continues into a plausible user turn. The gateway delivers that verbatim to Telegram, so the bot appears to literally echo "[ilyass] nudge" back at the user.

Belt + suspenders:
- run_agent.py: inject stop sequences ["\nuser\n[", "\n\nuser\n"] into chat_completions api_kwargs after _build_api_kwargs(). Cuts generation at the leak boundary, saving tokens too. Disable via HERMES_DISABLE_USER_PREFIX_STOP=1.
- gateway/run.py: post-process final_response with a regex that detects and strips "\nuser\n[<name>] ..." patterns. Safety net for any leak that escapes the stop sequence (e.g., on streaming providers where stop detection differs). Logs a warning when it fires so we can monitor.

Related upstream (all open, none merged): NousResearch#6272 (strip [CONTEXT COMPACTION] echoes), NousResearch#19887 (strip leaked template markers from gemma4 tool args), NousResearch#15369 (strip MEDIA directives), NousResearch#18529 (title leaks <thinking>).
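The two layers described in this commit message could look roughly like the sketch below. It is not the code from the referenced commit: the function names, the regex, and the logger are assumptions made for illustration. Only the stop sequences, the HERMES_DISABLE_USER_PREFIX_STOP variable, and the "\nuser\n[<name>] ..." leak shape come from the message itself.

```python
import logging
import os
import re

logger = logging.getLogger(__name__)

# Stop sequences quoted in the commit message.
USER_PREFIX_STOPS = ["\nuser\n[", "\n\nuser\n"]


def add_user_prefix_stops(api_kwargs: dict) -> dict:
    """Return a copy of api_kwargs with the leak-boundary stop sequences added.

    Hypothetical helper: the real change injects the stops after
    _build_api_kwargs(); how it merges with existing stops is assumed here.
    """
    if os.environ.get("HERMES_DISABLE_USER_PREFIX_STOP") == "1":
        return api_kwargs
    merged = dict(api_kwargs)
    existing = merged.get("stop") or []
    if isinstance(existing, str):
        existing = [existing]
    extra = [s for s in USER_PREFIX_STOPS if s not in existing]
    merged["stop"] = list(existing) + extra
    return merged


# Assumed pattern: a newline, the literal role name "user", a newline, then a
# bracketed name, followed by everything up to the end of the text.
_USER_TURN_LEAK = re.compile(r"\n+user\n\[[^\]\n]+\].*\Z", re.DOTALL)


def strip_user_turn_leak(text: str) -> str:
    """Cut a hallucinated user turn off the end of a response, if present."""
    match = _USER_TURN_LEAK.search(text)
    if match is None:
        return text
    logger.warning("Stripped leaked user turn from response")
    return text[: match.start()].rstrip()
```

With these definitions, `strip_user_turn_leak("Done.\nuser\n[ilyass] nudge")` returns `"Done."`, and a response without the pattern is returned unchanged. Keeping the regex as a second layer matters because a provider may ignore or mishandle the `stop` parameter.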
Summary
When a Telegram session has prior compressed context, the model can parrot the stored `[CONTEXT COMPACTION]` handoff summary back as its response to a simple greeting like `/start` or `Hello?`. This makes the bot dump an opaque internal handoff blob instead of greeting normally.
Root Cause
The context compressor stores a structured handoff summary prefixed with `[CONTEXT COMPACTION] Earlier turns in this conversation were compacted...` in the conversation history. When a user re-enters the conversation, the model sees this in its context and may echo it verbatim as the response.
Changes
gateway/run.py - response post-processing (after `agent_result.get('final_response')`)
Observed Behavior
Before:
After:
The compaction boilerplate is stripped, leaving only any user-facing content the model generated after the echo.
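A minimal sketch of this kind of post-processing follows. It is not the code from this PR: the helper name and the rule for where the echoed summary ends are assumptions. In particular, the sketch assumes the echoed block runs from the `[CONTEXT COMPACTION]` marker to the next blank line, or to the end of the response if there is none; the real summary format may need a different boundary.

```python
import re

# Marker text taken from the PR description; the rest is assumed.
COMPACTION_MARKER = "[CONTEXT COMPACTION]"

# Assumption: an echoed summary starts at the marker and ends at the first
# blank line after it, or at the end of the text when no blank line exists.
_COMPACTION_BLOCK = re.compile(
    re.escape(COMPACTION_MARKER) + r".*?(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def strip_compaction_echo(response: str) -> str:
    """Remove echoed compaction blocks and return whatever text remains."""
    if COMPACTION_MARKER not in response:
        return response
    return _COMPACTION_BLOCK.sub("", response).strip()
```

If the model echoed nothing but the summary, the function returns an empty string, so a caller would still need to decide what to send in that case (for example, a fallback greeting).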
Closes #6212