Skip to content

[Bug]: post_llm_call response overrides are applied after persistence, causing final_response/history mismatch #14894

@M1p0

Description

@M1p0

Bug Description

post_llm_call hooks can return a replacement response, but the override is currently applied after session persistence.

This means the user-facing final_response can differ from the assistant message stored in result["messages"], the SQLite session DB, and the JSON session log. On the next turn or after resuming the session, Hermes replays the original model response instead of the hook-modified response.

This creates inconsistent behavior for plugins that use post_llm_call for response post-processing, rendering, policy transforms, or persona/style shaping.

Steps to Reproduce

  1. Register a post_llm_call hook that returns a replacement response:
def on_post_llm_call(**kwargs):
    return {"response": "patched response"}

ctx.register_hook("post_llm_call", on_post_llm_call)
  1. Run a normal conversation turn where the model produces a response, for example:
original response
  1. Observe the returned/displayed final_response.

  2. Inspect any of the following:

  • result["messages"]
  • the persisted session transcript
  • the SQLite session DB
  • the next turn's replayed conversation context
  • a resumed session

Expected Behavior

If post_llm_call supports response overrides, the overridden response should be applied consistently to the completed assistant turn.

The following should all agree:

  • returned result["final_response"]
  • last assistant message in result["messages"]
  • persisted session DB / JSON session log
  • next-turn conversation replay
  • resumed session transcript

Alternatively, if post_llm_call overrides are intended to be display-only, this should be documented explicitly to avoid plugin authors assuming durable response mutation.

Actual Behavior

The hook override affects the returned user-facing final_response, but does not update the already-persisted assistant message.

Current order is effectively:

self._persist_session(messages, conversation_history)

_post_results = invoke_hook("post_llm_call", ...)
for r in _post_results:
    final_response = ...

As a result:

  • the user may see "patched response"
  • result["messages"] still contains "original response"
  • persisted history still contains "original response"
  • the next turn uses "original response" as prior assistant context

Affected Component

CLI (interactive chat), Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

Report     https://paste.rs/pK9p0
  agent.log  https://paste.rs/N7B6x

Operating System

ubuntu 24.04

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt buildercomp/pluginsPlugin system and bundled pluginstype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions