Skip to content

fix: move pre_llm_call plugin context to user message, preserve prompt cache#5146

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-511b79a5
Apr 4, 2026
Merged

fix: move pre_llm_call plugin context to user message, preserve prompt cache#5146
teknium1 merged 1 commit into
mainfrom
hermes/hermes-511b79a5

Conversation

@teknium1

@teknium1 teknium1 commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when the content changed (typical for memory plugins — different query = different recalled memories = different system prompt = cache miss).

Fix: All plugin context now goes into the current turn's user message. The system prompt stays identical across turns, preserving cached tokens. The system prompt is Hermes's territory (model guidance, tool enforcement, personality, skills). Plugins contribute context alongside the user's input.

This is a deliberate design choice, not a default — there is no target option. Plugins cannot modify the system prompt.

Supersedes #5138

@OutThisLife independently identified the same cache-busting bug in PR #5138 and proposed injecting plugin context as an uncached system suffix block after cache markers are placed. This PR goes further by removing system prompt injection entirely — simpler contract, zero chance of cache interference.

Changes

Core (run_agent.py)

  • Plugin context collection simplified: single bucket, no target routing
  • User message injection: plugin context appended alongside memory manager prefetch at API call time
  • System prompt injection block removed, replaced with explanatory comment

Documentation (3 pages, +500 lines)

  • hooks.md — Every hook now fully defined: callback signature with types, parameter table, exact firing location and conditions, return value behavior, use cases, 1-2 complete examples
  • build-a-hermes-plugin.md — Hook reference table links to full definitions; pre_llm_call context injection section with return format, caching rationale, examples (memory recall, guardrails, observer-only)
  • plugins.md — Hook table entries link to full definitions on hooks.md

Code (plugins.py)

  • invoke_hook docstring documents user-message-only injection contract

Tests (test_plugins.py, +5 tests)

  • Dict context collection
  • Plain string returns
  • Multiple plugins collected
  • Routing logic simulation matching run_agent.py pattern
  • Context without target key

Test plan

python -m pytest tests/test_plugins.py -o 'addopts=' -q  # 23 passed

…t cache

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes #5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR #5138)
@github-actions

github-actions Bot commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

409:+        resp = httpx.post(f"{MEMORY_API}/recall", json={
680:+        resp = httpx.post(f"{MEMORY_API}/recall", json={
745:+        httpx.post(f"{MEMORY_API}/store", json={

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@teknium1 teknium1 merged commit 5879b3e into main Apr 4, 2026
6 of 7 checks passed
naoironman-hue pushed a commit to naoironman-hue/hermes-agent that referenced this pull request Apr 5, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
jooray added a commit to jooray/hermes-agent that referenced this pull request Apr 5, 2026
* upstream/main: (29 commits)
  style: use module-level re import instead of local import re as _re
  Preserve numeric credential labels in auth removal
  Honor provider reset windows in pooled credential failover
  docs: update docstring to mention Fireworks strict validation
  test: add strict API validation tests for Fireworks compatibility
  test: add test for _should_sanitize_tool_calls()
  refactor: use _should_sanitize_tool_calls in run_conversation()
  refactor: use _should_sanitize_tool_calls in _handle_max_iterations()
  refactor: use _should_sanitize_tool_calls in flush_memories()
  feat: add _should_sanitize_tool_calls() method
  test(redact): add regression tests for lowercase variable redaction (NousResearch#4367) (NousResearch#5185)
  docs(skill): claude-code v2.2 — add cheat sheet commands, env vars, rules, advanced features (NousResearch#5158)
  fix(telegram): prevent duplicate message delivery on send timeout (NousResearch#5153)
  fix: strip MEDIA: directives from streamed gateway messages (NousResearch#5152)
  docs(skill): comprehensive claude-code skill rewrite v2.0 (NousResearch#5155)
  fix(security): guard cron script against path traversal and redact output
  feat: add exit code context for common CLI tools in terminal results (NousResearch#5144)
  fix: move pre_llm_call plugin context to user message, preserve prompt cache (NousResearch#5146)
  fix: --yolo and other flags silently dropped when placed before 'chat' subcommand (NousResearch#5145)
  fix: include approval metadata in terminal tool results (NousResearch#5141)
  ...
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…t cache (NousResearch#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes NousResearch#5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR NousResearch#5138)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant