Bug Description
post_llm_call hooks can return a replacement response, but the override is currently applied after session persistence.
This means the user-facing final_response can differ from the assistant message stored in result["messages"], the SQLite session DB, and the JSON session log. On the next turn or after resuming the session, Hermes replays the original model response instead of the hook-modified response.
This creates inconsistent behavior for plugins that use post_llm_call for response post-processing, rendering, policy transforms, or persona/style shaping.
Steps to Reproduce
- Register a
post_llm_call hook that returns a replacement response:
def on_post_llm_call(**kwargs):
return {"response": "patched response"}
ctx.register_hook("post_llm_call", on_post_llm_call)
- Run a normal conversation turn where the model produces a response, for example:
-
Observe the returned/displayed final_response.
-
Inspect any of the following:
result["messages"]
- the persisted session transcript
- the SQLite session DB
- the next turn's replayed conversation context
- a resumed session
Expected Behavior
If post_llm_call supports response overrides, the overridden response should be applied consistently to the completed assistant turn.
The following should all agree:
- returned
result["final_response"]
- last assistant message in
result["messages"]
- persisted session DB / JSON session log
- next-turn conversation replay
- resumed session transcript
Alternatively, if post_llm_call overrides are intended to be display-only, this should be documented explicitly to avoid plugin authors assuming durable response mutation.
Actual Behavior
The hook override affects the returned user-facing final_response, but does not update the already-persisted assistant message.
Current order is effectively:
self._persist_session(messages, conversation_history)
_post_results = invoke_hook("post_llm_call", ...)
for r in _post_results:
final_response = ...
As a result:
- the user may see
"patched response"
result["messages"] still contains "original response"
- persisted history still contains
"original response"
- the next turn uses
"original response" as prior assistant context
Affected Component
CLI (interactive chat), Agent Core (conversation loop, context compression, memory)
Messaging Platform (if gateway-related)
No response
Debug Report
Report https://paste.rs/pK9p0
agent.log https://paste.rs/N7B6x
Operating System
ubuntu 24.04
Python Version
No response
Hermes Version
No response
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?
Bug Description
post_llm_callhooks can return a replacement response, but the override is currently applied after session persistence.This means the user-facing
final_responsecan differ from the assistant message stored inresult["messages"], the SQLite session DB, and the JSON session log. On the next turn or after resuming the session, Hermes replays the original model response instead of the hook-modified response.This creates inconsistent behavior for plugins that use
post_llm_callfor response post-processing, rendering, policy transforms, or persona/style shaping.Steps to Reproduce
post_llm_callhook that returns a replacement response:Observe the returned/displayed
final_response.Inspect any of the following:
result["messages"]Expected Behavior
If
post_llm_callsupports response overrides, the overridden response should be applied consistently to the completed assistant turn.The following should all agree:
result["final_response"]result["messages"]Alternatively, if
post_llm_calloverrides are intended to be display-only, this should be documented explicitly to avoid plugin authors assuming durable response mutation.Actual Behavior
The hook override affects the returned user-facing
final_response, but does not update the already-persisted assistant message.Current order is effectively:
As a result:
"patched response"result["messages"]still contains"original response""original response""original response"as prior assistant contextAffected Component
CLI (interactive chat), Agent Core (conversation loop, context compression, memory)
Messaging Platform (if gateway-related)
No response
Debug Report
Operating System
ubuntu 24.04
Python Version
No response
Hermes Version
No response
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?