fix: move pre_llm_call plugin context to user message, preserve prompt cache by teknium1 · Pull Request #5146 · NousResearch/hermes-agent

teknium1 · 2026-04-04T23:54:08Z

Summary

Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when the content changed (typical for memory plugins — different query = different recalled memories = different system prompt = cache miss).

Fix: All plugin context now goes into the current turn's user message. The system prompt stays identical across turns, preserving cached tokens. The system prompt is Hermes's territory (model guidance, tool enforcement, personality, skills). Plugins contribute context alongside the user's input.

This is a deliberate design choice, not a default — there is no target option. Plugins cannot modify the system prompt.

Supersedes #5138

@OutThisLife independently identified the same cache-busting bug in PR #5138 and proposed injecting plugin context as an uncached system suffix block after cache markers are placed. This PR goes further by removing system prompt injection entirely — simpler contract, zero chance of cache interference.

Changes

Core (`run_agent.py`)

Plugin context collection simplified: single bucket, no target routing
User message injection: plugin context appended alongside memory manager prefetch at API call time
System prompt injection block removed, replaced with explanatory comment

Documentation (3 pages, +500 lines)

hooks.md — Every hook now fully defined: callback signature with types, parameter table, exact firing location and conditions, return value behavior, use cases, 1-2 complete examples
build-a-hermes-plugin.md — Hook reference table links to full definitions; pre_llm_call context injection section with return format, caching rationale, examples (memory recall, guardrails, observer-only)
plugins.md — Hook table entries link to full definitions on hooks.md

Code (`plugins.py`)

invoke_hook docstring documents user-message-only injection contract

Tests (`test_plugins.py`, +5 tests)

Dict context collection
Plain string returns
Multiple plugins collected
Routing logic simulation matching run_agent.py pattern
Context without target key

Test plan

python -m pytest tests/test_plugins.py -o 'addopts=' -q  # 23 passed

…t cache Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes #5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR #5138)

github-actions · 2026-04-04T23:54:21Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

409:+        resp = httpx.post(f"{MEMORY_API}/recall", json={
680:+        resp = httpx.post(f"{MEMORY_API}/recall", json={
745:+        httpx.post(f"{MEMORY_API}/store", json={

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)

* upstream/main: (29 commits) style: use module-level re import instead of local import re as _re Preserve numeric credential labels in auth removal Honor provider reset windows in pooled credential failover docs: update docstring to mention Fireworks strict validation test: add strict API validation tests for Fireworks compatibility test: add test for _should_sanitize_tool_calls() refactor: use _should_sanitize_tool_calls in run_conversation() refactor: use _should_sanitize_tool_calls in _handle_max_iterations() refactor: use _should_sanitize_tool_calls in flush_memories() feat: add _should_sanitize_tool_calls() method test(redact): add regression tests for lowercase variable redaction (NousResearch#4367) (NousResearch#5185) docs(skill): claude-code v2.2 — add cheat sheet commands, env vars, rules, advanced features (NousResearch#5158) fix(telegram): prevent duplicate message delivery on send timeout (NousResearch#5153) fix: strip MEDIA: directives from streamed gateway messages (NousResearch#5152) docs(skill): comprehensive claude-code skill rewrite v2.0 (NousResearch#5155) fix(security): guard cron script against path traversal and redact output feat: add exit code context for common CLI tools in terminal results (NousResearch#5144) fix: move pre_llm_call plugin context to user message, preserve prompt cache (NousResearch#5146) fix: --yolo and other flags silently dropped when placed before 'chat' subcommand (NousResearch#5145) fix: include approval metadata in terminal tool results (NousResearch#5141) ...

…t cache (NousResearch#5146) Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes NousResearch#5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR NousResearch#5138)

teknium1 merged commit 5879b3e into main Apr 4, 2026
6 of 7 checks passed

teknium1 mentioned this pull request Apr 4, 2026

fix: inject plugin context after cache markers to preserve Anthropic … #5138

Closed

9 tasks

NoxsMedia mentioned this pull request Apr 26, 2026

fix(plugins): honor pre_llm_call short_circuit_response #15205

Open

alt-glitch mentioned this pull request May 16, 2026

fix: inject plugin context after cache markers to preserve Anthropic prompt cache prefix stability #27093

Closed

9 tasks

xiaoyaner0201 mentioned this pull request Jun 4, 2026

feat(gateway): authoritative sender attribution in all chat contexts #13939

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: move pre_llm_call plugin context to user message, preserve prompt cache#5146

fix: move pre_llm_call plugin context to user message, preserve prompt cache#5146
teknium1 merged 1 commit into
mainfrom
hermes/hermes-511b79a5

teknium1 commented Apr 4, 2026

Uh oh!

github-actions Bot commented Apr 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented Apr 4, 2026

Summary

Supersedes #5138

Changes

Core (run_agent.py)

Documentation (3 pages, +500 lines)

Code (plugins.py)

Tests (test_plugins.py, +5 tests)

Test plan

Uh oh!

github-actions Bot commented Apr 4, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: Outbound network calls (POST/PUT)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Core (`run_agent.py`)

Code (`plugins.py`)

Tests (`test_plugins.py`, +5 tests)