fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files #1639
Merged
When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just 'Error' as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as a non-retryable client error. Worse, the failed user message was still persisted to the transcript, making the session even larger on each attempt and creating an infinite loop.

Three-layer fix:

1. run_agent.py: Fallback heuristic. When a 400 error has a very short generic message AND the session is large (>40% of context or >80 messages), treat it as a probable context overflow and trigger compression instead of aborting.
2. run_agent.py + gateway/run.py: Don't persist failed messages. When the agent returns failed=True before generating any response, skip writing the user's message to the transcript/DB. This prevents the session from growing on each failure.
3. gateway/run.py: Smarter error messages. Detect context-overflow failures and suggest /compact or /reset specifically, instead of a generic 'try again' that will fail identically.
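The first two layers described above can be sketched roughly as follows. This is a minimal illustration, not the code from run_agent.py or gateway/run.py: the function names, the SessionStats type, and the 20-character cutoff for a "very short" message are all assumptions. Only the 40% and 80-message thresholds come from the description.

```python
from dataclasses import dataclass

# Thresholds stated in the description above; the constant names are hypothetical.
CONTEXT_FRACTION_THRESHOLD = 0.40
MESSAGE_COUNT_THRESHOLD = 80
# "Very short" is not quantified in the description; 20 characters is an assumption.
GENERIC_MESSAGE_MAX_LEN = 20


@dataclass
class SessionStats:
    """Hypothetical snapshot of session size at the time of the failure."""
    estimated_tokens: int
    context_window: int
    message_count: int


def is_probable_context_overflow(status_code: int, error_message: str, stats: SessionStats) -> bool:
    """Return True when a terse 400 on a large session looks like a context overflow."""
    if status_code != 400:
        return False
    message = (error_message or "").strip()
    if len(message) > GENERIC_MESSAGE_MAX_LEN:
        # A descriptive message should go through the phrase-based detection instead.
        return False
    usage = stats.estimated_tokens / stats.context_window if stats.context_window else 0.0
    return usage > CONTEXT_FRACTION_THRESHOLD or stats.message_count > MESSAGE_COUNT_THRESHOLD


def should_persist_user_message(failed: bool, produced_response: bool) -> bool:
    """Skip persisting the user's message when the run failed before any response."""
    return not (failed and not produced_response)
```

With these assumptions, a 400 carrying only "Error" on a session at 45% of its context window would be routed to compression, while the same error on a small session would still be treated as an ordinary client error.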
Adds two security layers to prevent prompt injection via skills hub cache files (#1558):

1. read_file: blocks direct reads of the ~/.hermes/skills/.hub/ directory (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json was the original injection vector: untrusted skill descriptions in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted ~/.hermes/skills/ directory, and detects common injection patterns in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.
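A rough sketch of the two checks described above, for illustration only. The helper names are hypothetical and do not come from the read_file or skill_view implementations. The pattern list contains only the two examples quoted in the description, which ends in "etc.", so the merged list is presumably longer.

```python
import re
from pathlib import Path
from typing import List

# Directories named in the description above.
SKILLS_DIR = Path.home() / ".hermes" / "skills"
HUB_CACHE_DIR = SKILLS_DIR / ".hub"

# Only the two patterns quoted in the description; the exact regexes are assumptions.
INJECTION_PATTERNS = {
    "ignore previous instructions": re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE),
    "<system>": re.compile(r"<\s*system\s*>", re.IGNORECASE),
}


def _is_within(path: Path, root: Path) -> bool:
    """True when path resolves to root or to something underneath it."""
    resolved = path.expanduser().resolve()
    root_resolved = root.expanduser().resolve()
    return resolved == root_resolved or root_resolved in resolved.parents


def is_blocked_read(path: str) -> bool:
    """Deny direct reads of anything under the hub cache directory."""
    return _is_within(Path(path), HUB_CACHE_DIR)


def is_trusted_skill_path(path: str) -> bool:
    """Skills outside ~/.hermes/skills/ would trigger a warning."""
    return _is_within(Path(path), SKILLS_DIR)


def find_injection_patterns(text: str) -> List[str]:
    """Return the labels of every known injection pattern found in text."""
    return [label for label, rx in INJECTION_PATTERNS.items() if rx.search(text)]
```

Resolving the path before comparing means a request such as ~/.hermes/skills/x/../.hub/index-cache is still recognised as pointing into the hub cache. A regex scan of this kind only catches known phrasings, which is why the description pairs it with the outright read block.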
This was referenced Mar 17, 2026
Closed
Summary
Fix 1: Prevent infinite 400 failure loop (#1630)
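When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just "Error" as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as non-retryable. The failed user message was persisted, making the session larger and creating an infinite loop.
Three-layer fix:
1. run_agent.py: fallback heuristic that treats a short, generic 400 on a large session (>40% of context or >80 messages) as a probable context overflow and triggers compression
2. run_agent.py + gateway/run.py: the user's message is not persisted when the agent fails before generating any response
3. gateway/run.py: context-overflow failures suggest /compact or /reset instead of generic 'try again'
Fix 2: Block prompt injection via skills hub cache (#1558, salvaged from PR #1562 by @ygd58)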
A user experienced the agent outputting threatening/adversarial text after it read a 3.5MB hub catalog cache file containing prompt injection content.
Two-layer fix (cherry-picked from @ygd58's PR):
1. read_file block: denies access to the ~/.hermes/skills/.hub/ directory (index-cache, catalog files)
2. skill_view detection: warns when skills are loaded from untrusted paths or contain injection patterns
Test plan
tests/test_1630_context_overflow_loop.py