[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache by ptovam · Pull Request #27170 · vllm-project/vllm

ptovam · 2025-10-19T10:44:37Z

Purpose

Extends the existing /reset_prefix_cache endpoint with an optional reset_external flag.
When set, both the local (GPU) and external (connector) prefix caches are reset.

This allows clearing connector-side caches dynamically without restarting the server -useful for benchmarking and testing.

gemini-code-assist

Code Review

This pull request introduces a new API to reset the KV connector cache, which is a useful feature for testing and benchmarking. The implementation is well-structured, adding the reset_cache method to the base connector class and implementing it through the stack for both synchronous and asynchronous paths.

My main feedback is about a bug in the synchronous path where the boolean return value indicating the success of the cache reset is not propagated up to the LLM class. This results in LLM.reset_connector_cache having an incorrect return type hint and behavior. I've provided detailed comments and suggestions to fix this issue across the affected files. The asynchronous path used by the API server correctly handles this by design, as it doesn't check the return status.

Overall, the changes are good, and with the suggested fix, the new API will be more robust.

markmc · 2025-10-20T08:26:53Z

Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache ?

ptovam · 2025-10-20T09:08:53Z

Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache ?

Good point, there's no strict need for a separate endpoint.
The goal was mainly to keep the option to reset only the GPU prefix cache, or both GPU and connector caches together.
Extending /reset_prefix_cache with an optional flag would work just as well if that makes more sense.

ptovam · 2025-10-20T16:35:27Z

Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache ?

Updated to extend /reset_prefix_cache with an optional reset_external flag, replacing the separate endpoint.

markmc

lgtm

DarkLight1337 · 2025-10-22T13:19:36Z

cc @NickLucche

NickLucche

lgtm

NickLucche

Actually, sorry, one comment on the interface for the multiconnector case

NickLucche · 2025-10-22T13:28:23Z

+        Returns:
+            bool: True if the cache was successfully reset, False otherwise.
+        """
+        return False


this could be optional to implement the suggestion below

Optional[bool] or bool | None as return type, to allow for the all(..) check to work fine (you rule-out the connectors that do NOT implement it)

Why treat connectors that haven’t implemented this as successful?
It could confuse users who aren’t aware of the implementation details.

connectors that have not implemented it will return None which is neither.
If you do any(..), one sub-connector being successful will result in multiconnector returning successful transfers.

What I would expect semantically is that multiconnector only returns true when all sub-connectors that implement the interface return True (hence None return value is needed to discern those that have implemented and yet fail).

Got it - I’ve pushed a commit with that change, thanks for clarifying.

ptovam · 2025-10-23T09:19:16Z

Once the prefix cache metrics get merged - #26245, we should reset the metric too.

Edit: Done

ptovam · 2025-11-06T09:58:46Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a feature to reset the external KV connector cache via the /reset_prefix_cache endpoint, controlled by a new reset_external flag. The changes are well-propagated from the API layer down to the scheduler and connectors. My review focuses on the implementation of the reset logic, where I've identified two high-severity issues related to short-circuiting evaluation that could prevent all relevant caches from being reset. I've provided suggestions to ensure the reset operations are always fully executed as intended.

ptovam · 2025-11-10T10:27:32Z

@NickLucche Ready for another review — please let me know what you think.
Thanks.

…efix_cache Signed-off-by: tovam <tovam@pliops.com>

Signed-off-by: tovam <tovam@pliops.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>

ptovam · 2025-12-02T12:54:47Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a feature to reset the external KV connector cache through the existing /reset_prefix_cache endpoint by adding a reset_external flag. The implementation is well-structured, with the new functionality correctly propagated from the API layer down to the scheduler and KV connectors. The logic for handling cache resets, including for MultiConnector and for connectors that may not implement this functionality, is sound. The code is clean and adheres to the existing patterns within the codebase. I have not identified any high or critical severity issues in these changes.

NickLucche

Apologies for the delay in getting back to you, this looks well done!

…or that does not implement resetting Signed-off-by: tovam <tovam@pliops.com>

NickLucche · 2025-12-05T17:16:50Z

@ptovam enabling auto-merge, let's see if we can get CI green

ivanium · 2026-03-30T23:03:41Z

Curious should we make reset_connector=True by default? I may not have the full picture here but I think reset_prefix_cache is mainly used in RL cases, where we also need to reset connector caches.

…efix_cache (vllm-project#27170) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…nnector_cache ``Scheduler.reset_connector_cache()`` currently returns ``False`` when no KV connector is configured, which makes ``Scheduler.reset_prefix_cache(reset_running_requests, reset_connector=True)`` return ``False`` on every engine that doesn't have a connector -- even when the local prefix cache reset itself succeeded. A caller that interprets the return value as "did the reset I asked for succeed?" (e.g. an RL framework that asks for a clear before a weight update, or an ops tool that pipes the return value into an SLO check) sees a spurious failure for the most common deployment shape (no KV connector configured). The user-facing log line was also a ``warning``, even though "no connector exists, so nothing to reset" is the expected steady-state behavior of most engines. Fix: - Return ``True`` when ``self.connector is None``. There is no external KV store to invalidate, so the operation trivially succeeds; the local prefix cache (handled earlier in ``reset_prefix_cache``) decides the real return value. - Demote the log line from ``warning`` to ``debug``: this branch is hit on every engine without a connector, which is the norm rather than a misconfiguration. The True/False/None return contract on ``KVConnector.reset_cache`` itself (None = not implemented, False = explicit failure, anything else = success) is unchanged; only the "no connector at all" path moves from a sentinel failure to a sentinel success, matching how ``reset_mm_cache`` / ``reset_encoder_cache`` already behave when their target structures are empty. Tests: ``tests/v1/core/test_scheduler.py::test_reset_connector_cache_no_connector_is_no_op_success`` covers both the direct method call and the ``reset_prefix_cache(reset_connector=True)`` cascade end-to-end on a no-connector scheduler. ## Duplicate-work check - ``gh pr list --repo vllm-project/vllm --state open --search "reset_connector_cache in:body"`` -> no results - ``gh pr list --repo vllm-project/vllm --state open --search "reset_connector reset_prefix_cache"`` -> no results - ``gh issue list --repo vllm-project/vllm --state open --search "reset_connector_cache"`` -> no results The only prior PR touching this method is vllm-project#27170 (which introduced the ``reset_connector`` parameter); that PR is merged and this is a follow-up correctness fix in the same method. ## AI assistance This patch was authored with assistance from Claude Opus 4.7 (1M context). Every changed line was reviewed by the submitter before opening the PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: aoshen524 <aoshen524@gmail.com>

`AsyncLLM.pause_generation(clear_cache=True)` internally calls `EngineCore._reset_caches`, which then calls `Scheduler.reset_prefix_cache(reset_running_requests=True)` -- with the new `reset_connector` parameter (added in vllm-project#27170) defaulting to False. For engines with a configured external KV store connector (e.g. `MooncakeStoreConnector`, `LMCacheConnectorV1`, `HF3FSConnector`, ...) that means a user asking the engine to "clear cache" after a weight update gets the local prefix cache cleared but leaves the external store full of KV blocks hashed against the previous policy. For RL post-training workflows this is a silent stale-cache correctness hole: the next request can read external KV that was written by the old policy and serve it as a prefix hit against the new policy. This patch parameterizes `EngineCore._reset_caches` with `reset_connector: bool = True` and forwards the value to `reset_prefix_cache`. The default flips from False to True so that `pause_generation(clear_cache=True)` -- the user-facing API whose contract is "clear caches" -- now actually clears the external KV store too. `Scheduler.reset_connector_cache` already gracefully handles the no-connector case (warns + returns False), so this is a no-op for engines that don't configure a connector. Internal callers that want a local-only invalidation can override with `reset_connector=False`; outside callers can still bypass the cascade by calling `scheduler.reset_prefix_cache(reset_connector=False)` directly. This intentionally keeps `reset_connector` *out* of the user-facing `AsyncLLM.pause_generation` signature: threading an opt-in flag all the way up turns the safety net into a footgun (user forgets the flag -> silent stale KV). The user-facing surface stays a single `clear_cache` semantic ("clear means clear"). Future internal callers can override via the new kwarg if they need to. ## Why this is not a duplicate - `gh pr list --repo vllm-project/vllm --state open --search "_reset_caches reset_connector"` -> empty - `gh pr list --repo vllm-project/vllm --state open --search "pause_generation clear_cache reset_connector"` -> empty - `gh issue list --repo vllm-project/vllm --state open --search "pause_generation reset_connector"` -> empty Related PRs: - vllm-project#27170 introduced `reset_connector` on `Scheduler.reset_prefix_cache`. - vllm-project#42693 is a parallel correctness fix for the no-connector branch of the same code path (returns True instead of False when no connector is attached) -- independent of this change. - vllm-project#42694 implements `MooncakeStoreConnector.reset_cache`, the most immediate beneficiary of this cascade. ## Test plan No new test is added: the behavior change is exercised by the existing `tests/v1/core/test_reset_prefix_cache_e2e.py` (no-connector engine keeps working) and by the new tests in vllm-project#42694 (Mooncake connector now sees its `reset_cache` called via `pause_generation(clear_cache=True)`). Commands run locally: ```bash pre-commit run --files vllm/v1/engine/core.py # -> ruff check, ruff format, mypy-3.10, SPDX, etc. all Passed ``` The full pytest run could not be executed in my local sandbox (the editable `vllm` install in the venv points at a different worktree whose precompiled CUDA flash-attention extensions don't match upstream `main`). CI will be the source of truth. ## AI assistance This patch was authored with assistance from Claude Opus 4.7 (1M context). Every changed line was reviewed by the submitter before opening the PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: aoshen524 <aoshen524@gmail.com>

…efix_cache (vllm-project#27170) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

ptovam requested review from ApostaC, NickLucche, WoosukKwon, aarnphm, alexm-redhat, chaunceyjiang, comaniac, heheda12345, njhill, robertgshaw2-redhat and ywang96 as code owners October 19, 2025 10:44

mergify Bot added frontend v1 kv-connector labels Oct 19, 2025

gemini-code-assist Bot reviewed Oct 19, 2025

View reviewed changes

ptovam force-pushed the reset_connector_cache branch from 4b86b33 to ae370d8 Compare October 20, 2025 16:29

markmc approved these changes Oct 21, 2025

View reviewed changes

Comment thread vllm/entrypoints/openai/api_server.py Outdated

ptovam force-pushed the reset_connector_cache branch from ae370d8 to dc9168f Compare October 21, 2025 06:43

ptovam changed the title ~~[KVConnector][Feature] Add API to reset KV connector cache~~ [KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache Oct 21, 2025

markmc added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 21, 2025

NickLucche approved these changes Oct 22, 2025

View reviewed changes

NickLucche requested changes Oct 22, 2025

View reviewed changes

ptovam force-pushed the reset_connector_cache branch 2 times, most recently from 1dd3ae5 to 26292d1 Compare October 23, 2025 09:04

ptovam force-pushed the reset_connector_cache branch from 26292d1 to e8045dc Compare October 23, 2025 11:50

mergify Bot removed the needs-rebase label Oct 30, 2025

gemini-code-assist Bot reviewed Nov 6, 2025

View reviewed changes

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py Outdated

Comment thread vllm/v1/core/sched/scheduler.py Outdated

ptovam and others added 6 commits December 2, 2025 14:06

[KVConnector][Feature] Support KV connector cache reset via /reset_pr…

569cb19

…efix_cache Signed-off-by: tovam <tovam@pliops.com>

Use all() to ensure all connector cache resets succeed

c41a06c

Signed-off-by: tovam <tovam@pliops.com>

Make reset_cache optionally return None for unimplemented connectors

c2d42c0

Signed-off-by: tovam <tovam@pliops.com>

Reset prefix cache metric

a350945

Signed-off-by: tovam <tovam@pliops.com>

multi-connector: ensure all connectors are attempted to be reset

feddf13

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>

ensure local and connector caches are reset independently

12b5448

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>

ptovam force-pushed the reset_connector_cache branch from 395dd1a to 12b5448 Compare December 2, 2025 12:51

gemini-code-assist Bot reviewed Dec 2, 2025

View reviewed changes

NickLucche approved these changes Dec 4, 2025

View reviewed changes

Comment thread vllm/v1/core/sched/scheduler.py

Comment thread vllm/v1/core/sched/scheduler.py

ptovam added 2 commits December 4, 2025 13:37

Add a debug log for calls that require resetting a cache on a connect…

ff585c5

…or that does not implement resetting Signed-off-by: tovam <tovam@pliops.com>

Merge branch 'main' into reset_connector_cache

7bc54ab

NickLucche enabled auto-merge (squash) December 5, 2025 17:16

NickLucche merged commit adb3150 into vllm-project:main Dec 5, 2025
56 checks passed

This was referenced May 15, 2026

[KVConnector][Bugfix] Treat no-connector as no-op success in reset_connector_cache #42693

Closed

[KVConnector][Mooncake] Wire reset_cache cascade end-to-end #42694

Merged

[Engine] Reset KV connector cache in pause_generation cascade #42695

Closed

Uh oh!

Conversation

ptovam commented Oct 19, 2025 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

markmc commented Oct 20, 2025

Uh oh!

ptovam commented Oct 20, 2025

Uh oh!

ptovam commented Oct 20, 2025

Uh oh!

markmc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

DarkLight1337 commented Oct 22, 2025

Uh oh!

NickLucche left a comment

Choose a reason for hiding this comment

Uh oh!

NickLucche left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NickLucche Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

ptovam Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

NickLucche Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

ptovam Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

NickLucche Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

ptovam Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

ptovam commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ptovam commented Nov 6, 2025

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

ptovam commented Nov 10, 2025

Uh oh!

ptovam commented Dec 2, 2025

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

NickLucche left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

NickLucche commented Dec 5, 2025

Uh oh!

Uh oh!

ptovam commented Oct 19, 2025 •

edited by github-actions Bot

Loading

ptovam commented Oct 23, 2025 •

edited

Loading