Skip to content

fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files#1639

Merged
teknium1 merged 3 commits into
mainfrom
hermes/hermes-6bb9911e
Mar 17, 2026
Merged

fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files#1639
teknium1 merged 3 commits into
mainfrom
hermes/hermes-6bb9911e

Conversation

@teknium1

@teknium1 teknium1 commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Fix 1: Prevent infinite 400 failure loop (#1630)

When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just "Error" as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as non-retryable. The failed user message was persisted, making the session larger — creating an infinite loop.

Three-layer fix:

  1. Agent heuristic — generic 400 + short error + large session → treat as context overflow and compress
  2. Skip persistence on failure — don't write failed messages to transcript (both agent + gateway)
  3. Smarter error messages — suggest /compact or /reset instead of generic 'try again'

Fix 2: Block prompt injection via skills hub cache (#1558, salvaged from PR #1562 by @ygd58)

A user experienced the agent outputting threatening/adversarial text after it read a 3.5MB hub catalog cache file containing prompt injection content.

Two-layer fix (cherry-picked from @ygd58's PR):

  1. read_file block — denies access to ~/.hermes/skills/.hub/ directory (index-cache, catalog files)
  2. skill_view detection — warns when skills loaded from untrusted paths or contain injection patterns

Test plan

teknium1 and others added 3 commits March 17, 2026 01:45
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.
@teknium1 teknium1 changed the title fix: prevent infinite 400 failure loop on context overflow fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files Mar 17, 2026
@teknium1 teknium1 merged commit 96dac22 into main Mar 17, 2026
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants