fix(lmcache): correct store for cached requests and num_scheduled_tokens in lmcache_mp_connector.py by maobaolong · Pull Request #39655 · vllm-project/vllm

maobaolong · 2026-04-13T01:46:22Z

Purpose

Fix two bugs in lmcache_mp_connector.py related to incorrect KV store behavior for cached requests:

Fix the num_scheduled_tokens source for cached requests: In _process_cached_requests, num_new_tokens was incorrectly taken from cached_reqs.num_computed_tokens[idx] instead of scheduler_output.num_scheduled_tokens[request_id]. This was inconsistent with _process_new_requests, which uses the incremental num_scheduled_tokens.
Fix the min_available_blocks upper-bound calculation: computed_blocks now includes num_lmcache_hit_blocks so that the upper bound matches num_stored_blocks (which already covers hit blocks). Previously, hit blocks were excluded from the calculation, so the upper bound was too low and blocks that should have been staged for storage could be skipped.
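
The two fixes above can be illustrated with a small, self-contained sketch. This is not the connector's actual code: the classes CachedReqs and SchedulerOutput are simplified, hypothetical stand-ins that carry only the fields named in the description, and blocks_to_stage assumes a staging rule (stage block indices from num_stored_blocks up to the upper bound) that the description implies but does not spell out.

```python
from dataclasses import dataclass


@dataclass
class CachedReqs:
    """Hypothetical stand-in for the scheduler's cached-request data."""
    req_ids: list[str]
    num_computed_tokens: list[int]  # cumulative tokens already computed


@dataclass
class SchedulerOutput:
    """Hypothetical stand-in; only the fields named in the PR text."""
    num_scheduled_tokens: dict[str, int]  # tokens scheduled in this step
    cached_reqs: CachedReqs


def num_new_tokens_before(out: SchedulerOutput) -> dict[str, int]:
    # Buggy source: a cumulative count is treated as the per-step increment.
    return {
        rid: out.cached_reqs.num_computed_tokens[idx]
        for idx, rid in enumerate(out.cached_reqs.req_ids)
    }


def num_new_tokens_after(out: SchedulerOutput) -> dict[str, int]:
    # Fixed source: the incremental count, the same one used for new requests.
    return {rid: out.num_scheduled_tokens[rid] for rid in out.cached_reqs.req_ids}


def blocks_to_stage(
    num_stored_blocks: int,
    num_locally_computed_blocks: int,
    num_lmcache_hit_blocks: int,
    include_hit_blocks: bool,
) -> list[int]:
    """Assumed rule: stage block indices in [num_stored_blocks, upper_bound)."""
    upper_bound = num_locally_computed_blocks
    if include_hit_blocks:
        # The fix: count LMCache hit blocks so the bound is on the same
        # scale as num_stored_blocks, which already covers them.
        upper_bound += num_lmcache_hit_blocks
    return list(range(num_stored_blocks, upper_bound))


out = SchedulerOutput(
    num_scheduled_tokens={"req-0": 16},
    cached_reqs=CachedReqs(req_ids=["req-0"], num_computed_tokens=[48]),
)
print(num_new_tokens_before(out), num_new_tokens_after(out))
print(blocks_to_stage(4, 2, 4, False), blocks_to_stage(4, 2, 4, True))
```

In this toy setup the old token source reports 48 new tokens for a step that scheduled only 16. With 4 hit blocks already counted in num_stored_blocks and 2 locally computed blocks, excluding the hit blocks gives an upper bound of 2 and nothing is staged, while including them gives an upper bound of 6 and blocks 4 and 5 are staged.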

Test Plan

Test Result

gemini-code-assist

Code Review

This pull request updates the KV transfer logic to include LMCache hit blocks in the calculation of available blocks, ensuring the upper bound aligns with stored blocks. It also improves consistency in token tracking by using incremental scheduled tokens from the scheduler output. A redundant comment fragment was identified in the GetStoreMetadata function that should be removed for clarity.

ApostaC

LGTM! Thanks for the fix!

maobaolong · 2026-04-13T02:46:10Z

@ApostaC Thanks for your review. @KuntaiDu, would you like to take another look? Thanks~

…ens in lmcache_mp_connector.py (vllm-project#39655) Signed-off-by: baoloongmao <baoloongmao@tencent.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

maobaolong requested review from ApostaC, NickLucche and orozery as code owners April 13, 2026 01:46

mergify Bot added the kv-connector label Apr 13, 2026

gemini-code-assist Bot reviewed Apr 13, 2026

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py Outdated

maobaolong mentioned this pull request Apr 13, 2026

fix(mp): correct store cached requests in lmcache_mp_connector LMCache/LMCache#3012

Merged

fix(lmcache): correct store for cached requests and num_scheduled_tok…

6af8b32

…ens in lmcache_mp_connector.py Signed-off-by: baoloongmao <baoloongmao@tencent.com>

maobaolong force-pushed the fix_not_store_bugs.vllm branch from 194ec3c to 6af8b32 Compare April 13, 2026 01:52

ApostaC approved these changes Apr 13, 2026

ApostaC enabled auto-merge (squash) April 13, 2026 01:59

github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 13, 2026

ApostaC merged commit 2a3c32c into vllm-project:main Apr 13, 2026
58 checks passed

maobaolong mentioned this pull request Apr 13, 2026

fix(lmcache): correct store for cached requests while enable prefix cache #39719

Merged

5 tasks

wojciech-wais pushed a commit to wojciech-wais/vllm that referenced this pull request Apr 13, 2026

fix(lmcache): correct store for cached requests and num_scheduled_tok…

154b038

…ens in lmcache_mp_connector.py (vllm-project#39655) Signed-off-by: baoloongmao <baoloongmao@tencent.com>

whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026

fix(lmcache): correct store for cached requests and num_scheduled_tok…

ffdc9dc

…ens in lmcache_mp_connector.py (vllm-project#39655) Signed-off-by: baoloongmao <baoloongmao@tencent.com>

ApostaC merged 1 commit into vllm-project:main from maobaolong:fix_not_store_bugs.vllm
