fix(gateway): strip internal fields from tool_calls on session reload to preserve KV cache#4563
Closed
ygd58 wants to merge 1 commit into
Closed
Conversation
… to preserve KV cache
|
Thanks for this fix! 👏 I wanted to mention a related but different issue: #13442 Your PR #4563:
Issue #13442:
They're complementary! Both needed for full optimization:
Would love your thoughts on #13442 if you have time! |
Contributor
|
Automated hermes-sweeper review: this looks implemented on current main through the shared API-bound message sanitization path rather than by applying this exact gateway/run.py diff. Evidence:
I also read the linked #4555 discussion and the #13442 comment here; #13442 is a related within-loop optimization, but this PR's gateway session-reload request is already covered on main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #4555
Problem
KV cache was fully invalidated on every new user message because session reload produced different tokens than the in-memory agentic loop. Three differences were identified.
Fix 1: Strip internal tool_call fields
call_id, response_item_id, finish_reason are Hermes-internal fields not part of OpenAI API spec. Stripped on session reload so tool_calls are byte-identical to agentic loop.
Fix 2: Normalize content whitespace
Assistant content trailing whitespace stripped consistently in both tool message path and simple message path.
Result
Messages sent to API are now consistent between agentic loop iteration and session reload, allowing local backends (llama.cpp, lemonade) to reuse KV cache across turns.