fix: move pre_llm_call plugin context to user message, preserve prompt cache#5146
Merged
Conversation
…t cache Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes #5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR #5138)
Contributor
|
9 tasks
naoironman-hue
pushed a commit
to naoironman-hue/hermes-agent
that referenced
this pull request
Apr 5, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
jooray
added a commit
to jooray/hermes-agent
that referenced
this pull request
Apr 5, 2026
* upstream/main: (29 commits) style: use module-level re import instead of local import re as _re Preserve numeric credential labels in auth removal Honor provider reset windows in pooled credential failover docs: update docstring to mention Fireworks strict validation test: add strict API validation tests for Fireworks compatibility test: add test for _should_sanitize_tool_calls() refactor: use _should_sanitize_tool_calls in run_conversation() refactor: use _should_sanitize_tool_calls in _handle_max_iterations() refactor: use _should_sanitize_tool_calls in flush_memories() feat: add _should_sanitize_tool_calls() method test(redact): add regression tests for lowercase variable redaction (NousResearch#4367) (NousResearch#5185) docs(skill): claude-code v2.2 — add cheat sheet commands, env vars, rules, advanced features (NousResearch#5158) fix(telegram): prevent duplicate message delivery on send timeout (NousResearch#5153) fix: strip MEDIA: directives from streamed gateway messages (NousResearch#5152) docs(skill): comprehensive claude-code skill rewrite v2.0 (NousResearch#5155) fix(security): guard cron script against path traversal and redact output feat: add exit code context for common CLI tools in terminal results (NousResearch#5144) fix: move pre_llm_call plugin context to user message, preserve prompt cache (NousResearch#5146) fix: --yolo and other flags silently dropped when placed before 'chat' subcommand (NousResearch#5145) fix: include approval metadata in terminal tool results (NousResearch#5141) ...
Tommyeds
pushed a commit
to Tommyeds/hermes-agent
that referenced
this pull request
Apr 12, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
9 tasks
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Plugin context from
pre_llm_callhooks was injected into the system prompt, breaking the prompt cache prefix every turn when the content changed (typical for memory plugins — different query = different recalled memories = different system prompt = cache miss).Fix: All plugin context now goes into the current turn's user message. The system prompt stays identical across turns, preserving cached tokens. The system prompt is Hermes's territory (model guidance, tool enforcement, personality, skills). Plugins contribute context alongside the user's input.
This is a deliberate design choice, not a default — there is no
targetoption. Plugins cannot modify the system prompt.Supersedes #5138
@OutThisLife independently identified the same cache-busting bug in PR #5138 and proposed injecting plugin context as an uncached system suffix block after cache markers are placed. This PR goes further by removing system prompt injection entirely — simpler contract, zero chance of cache interference.
Changes
Core (
run_agent.py)Documentation (3 pages, +500 lines)
hooks.md— Every hook now fully defined: callback signature with types, parameter table, exact firing location and conditions, return value behavior, use cases, 1-2 complete examplesbuild-a-hermes-plugin.md— Hook reference table links to full definitions;pre_llm_callcontext injection section with return format, caching rationale, examples (memory recall, guardrails, observer-only)plugins.md— Hook table entries link to full definitions on hooks.mdCode (
plugins.py)invoke_hookdocstring documents user-message-only injection contractTests (
test_plugins.py, +5 tests)Test plan