Skip to content

fix(gateway): strip internal fields from tool_calls on session reload to preserve KV cache#4563

Closed
ygd58 wants to merge 1 commit into
NousResearch:mainfrom
ygd58:fix/kv-cache-invalidation-session-reload
Closed

fix(gateway): strip internal fields from tool_calls on session reload to preserve KV cache#4563
ygd58 wants to merge 1 commit into
NousResearch:mainfrom
ygd58:fix/kv-cache-invalidation-session-reload

Conversation

@ygd58

@ygd58 ygd58 commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Fixes #4555

Problem

KV cache was fully invalidated on every new user message because session reload produced different tokens than the in-memory agentic loop. Three differences were identified.

Fix 1: Strip internal tool_call fields

call_id, response_item_id, finish_reason are Hermes-internal fields not part of OpenAI API spec. Stripped on session reload so tool_calls are byte-identical to agentic loop.

Fix 2: Normalize content whitespace

Assistant content trailing whitespace stripped consistently in both tool message path and simple message path.

Result

Messages sent to API are now consistent between agentic loop iteration and session reload, allowing local backends (llama.cpp, lemonade) to reuse KV cache across turns.

@lsunay

lsunay commented Apr 21, 2026

Copy link
Copy Markdown

Thanks for this fix! 👏

I wanted to mention a related but different issue: #13442

Your PR #4563:

  • Fixes KV cache invalidation on gateway session reload
  • Location: gateway/run.py
  • When: Gateway → CLI handoff

Issue #13442:

  • Fixes KV cache invalidation on every LLM request within run_conversation()
  • Location: run_agent.py (no global conversation state)
  • When: Agentic loop (multiple LLM requests per user message)

They're complementary! Both needed for full optimization:

  1. Your fix: Gateway reload cache invalidation ✅
  2. Our issue: Within-CLI agentic loop cache invalidation ❓

Would love your thoughts on #13442 if you have time!

@alt-glitch alt-glitch added type/perf Performance improvement or optimization P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery labels May 1, 2026
@teknium1

Copy link
Copy Markdown
Contributor

Automated hermes-sweeper review: this looks implemented on current main through the shared API-bound message sanitization path rather than by applying this exact gateway/run.py diff.

Evidence:

  • agent/conversation_loop.py:628-646 builds the API copy, removes top-level finish_reason, and calls _sanitize_tool_calls_for_strict_api() before sending.
  • run_agent.py:4878-4912 strips call_id and response_item_id from tool_calls on that outgoing copy while preserving internal history state.
  • agent/conversation_loop.py:708-716 normalizes string message content on the API copy specifically for prefix/KV-cache matching on local inference servers.
  • gateway/run.py:488-523 and tests/gateway/test_replay_entry_fields.py cover the gateway replay parity side for reasoning/reasoning_content/codex fields and finish_reason.

I also read the linked #4555 discussion and the #13442 comment here; #13442 is a related within-loop optimization, but this PR's gateway session-reload request is already covered on main.

@teknium1 teknium1 closed this Jun 10, 2026
@teknium1 teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists sweeper:implemented-on-main Sweeper: behavior already present on current main type/perf Performance improvement or optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: KV cache invalidation on new user message due to message format differences between agentic loop and session reload

4 participants