[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector#23624

Closed

KuntaiDu wants to merge 71 commits into vllm-project:main from KuntaiDu:kuntai-support-hybrid-allocator

KuntaiDu commented Aug 26, 2025 •

edited by github-actions Bot

Collaborator

The checklist at the bottom has been considered.

Purpose

This PR adds support for the code path in which the hybrid allocator and the KV cache connector are used together.

Design doc: link

Related to #23079
Solves #22292

Test Plan

A local correctness test passed. Instructions that let other people reproduce it will follow.

Core test logic:

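        from vllm import SamplingParams

        # `llm` and `print_output` are defined elsewhere in the test script.
        first_prompt = "Hello, how are you?" * 5000 + "Hello, my name is"
        # Shares a 1000-repeat prefix with first_prompt, so its KV cache can be reused.
        second_prompt = [
            "Hello, how are you?" * 1000 + "Tell me a very long story",
        ]
        sampling_params = SamplingParams(temperature=0, top_p=0.95, max_tokens=10)
        print_output(llm, [first_prompt], sampling_params, "first")
        # Filler requests: the one-character prefix is intended to prevent prefix
        # reuse, so these requests push the first request out of GPU memory.
        print_output(llm, ["1" + first_prompt], sampling_params, "second")
        print_output(llm, ["2" + first_prompt], sampling_params, "second")

        # Now the first request is evicted. Run this request.
        # It will trigger KV cache loading from LMCache.
        print_output(llm, second_prompt, sampling_params, "third")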
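
The snippet above calls a `print_output` helper that is not shown in this PR. The following is a minimal sketch of what such a helper might look like, inferred only from the output format in the test result (a separator line, the generated text, and a timing line). The `FakeLLM`, `FakeRequestOutput`, and `FakeCompletion` classes are hypothetical stand-ins so the sketch runs without a GPU; in the real test, `llm` would be a `vllm.LLM` instance, whose `generate` method returns objects exposing `.outputs[0].text`. Returning the texts is an addition for convenience and is not implied by the PR.

```python
import time
from dataclasses import dataclass


def print_output(llm, prompts, sampling_params, req_str):
    """Run one batch through llm.generate, print each text and the elapsed time."""
    start = time.time()
    outputs = llm.generate(prompts, sampling_params)
    elapsed = time.time() - start
    texts = [output.outputs[0].text for output in outputs]
    print("-" * 50)
    for text in texts:
        print(f"Generated text: {text!r}")
    print(f"Generation took {elapsed:.2f} seconds, {req_str} request done.")
    print("-" * 50)
    return texts


# Hypothetical stand-ins that mimic the shape of vLLM's generate() results.
@dataclass
class FakeCompletion:
    text: str


@dataclass
class FakeRequestOutput:
    outputs: list


class FakeLLM:
    def generate(self, prompts, sampling_params):
        # Echo the prompt length instead of running a model.
        return [FakeRequestOutput([FakeCompletion(f"echo:{len(p)}")]) for p in prompts]


demo_texts = print_output(FakeLLM(), ["Hello, my name is"], None, "first")
```

Timing the call inside the helper keeps each request's latency next to its label, which makes it easy to compare the cold first request with the later request that loads KV cache from LMCache.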

Test Result

For the last request:

[2025-08-26 05:36:37,718] LMCache INFO: Reqid: 3, Total tokens 6007, LMCache hit tokens: 5888, need to load: 5888 (vllm_v1_adapter.py:1091:lmcache.integration.vllm.vllm_v1_adapter)
[2025-08-26 05:36:37,820] LMCache INFO: Retrieved 5888 tokens (vllm_v1_adapter.py:822:lmcache.integration.vllm.vllm_v1_adapter)
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.26it/s, est. speed input: 7594.87 toks/s, output: 12.64 toks/s]
--------------------------------------------------
Generated text: '.\n\nOkay, here we go...\n\nOnce'
Generation took 0.80 seconds, third request done.
--------------------------------------------------
[2025-08-26 05:36:44,886] LMCache INFO: Storage manager closed. (storage_manager.py:472:lmcache.v1.storage_backend.storage_manager)
[2025-08-26 05:36:48,332] LMCache INFO: LMCacheEngine closed. (cache_engine.py:965:lmcache.v1.cache_engine)
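
As a sanity check on the log above, the numbers are consistent with cache hits being counted in whole chunks. The sketch below assumes a chunk size of 256 tokens; the PR does not state the chunk size, so treat that value and the helper name `chunk_aligned_hit` as assumptions made for illustration.

```python
def chunk_aligned_hit(total_tokens: int, chunk_size: int = 256) -> int:
    """Largest multiple of chunk_size that does not exceed total_tokens."""
    return (total_tokens // chunk_size) * chunk_size


total_tokens = 6007  # "Total tokens" from the log
hit_tokens = chunk_aligned_hit(total_tokens)  # assumed chunk size: 256
recomputed_tokens = total_tokens - hit_tokens  # tokens left for normal prefill

# Time implied by the reported input throughput (toks/s) in the progress bar.
implied_seconds = total_tokens / 7594.87

print(hit_tokens, recomputed_tokens)  # 5888 119
print(f"{implied_seconds:.2f}")       # 0.79
```

Under that assumption, 23 full chunks (5888 tokens) are loaded from the cache and the remaining 119 tokens are computed normally, which matches the "LMCache hit tokens: 5888" line. The implied 0.79 s is also in line with the reported 1.26 it/s and the 0.80 s total generation time.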

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

KuntaiDu added 5 commits

August 26, 2025 05:51


          initial release

7e61f1a

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>


          fall back to simpler case: even when the allocation can fully fit int…

96910d7

…o GPU memory, the inference results are wrong. Fix this first.

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>


          vllm side of hybrid allocator impl

28d5d8e

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>


          remove previous debug footprint

5d0b504

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>


          remove debugging codes

9f0ac8c

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

KuntaiDu requested review from ProExpertProg, WoosukKwon, alexm-redhat, comaniac, hmellor, houseroad, mgoin, njhill, robertgshaw2-redhat, simon-mo, tlrmchlsmth, yewentao256, youkaichao and ywang96 as code owners

August 26, 2025 05:54

mergify Bot added the v1 label

mergify Bot commented Aug 26, 2025

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the needs-rebase label


          merge from main, and resolve conflict in worker

2f4b7b2

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

mergify Bot removed the needs-rebase label

gemini-code-assist Bot commented Aug 26, 2025

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.


          clean up some code diff footprint, and remove some debugging statements

a926b1d

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

KuntaiDu mentioned this pull request

[feat] Support hybrid allocator LMCache/LMCache#1436

Open

KuntaiDu marked this pull request as draft

August 26, 2025 06:05

KuntaiDu added 2 commits

August 26, 2025 21:51


          allow allocating when GPU memory is limited and make formatter happy

1cd2654

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>


          add an empty line to improve readability

34fbe1d

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

heheda12345 mentioned this pull request

[WIP][Don't Merge] Try to prototype hybrid allocator + kv cache connector #24840

Closed

5 tasks

heheda12345 and others added 6 commits

September 14, 2025 17:26


          further cleanup

5f8f21d

Signed-off-by: Chen Zhang <zhangch99@outlook.com>


          further cleanup

32f06e4

Signed-off-by: Chen Zhang <zhangch99@outlook.com>


          prototype hybrid allocator + connector

84953d0

Signed-off-by: Chen Zhang <zhangch99@outlook.com>


          remove assert

640cb04

Signed-off-by: Chen Zhang <zhangch99@outlook.com>


          Merge branch 'main' into kuntai-support-hybrid-allocator

b17e048


          Merge branch 'kuntai-support-hybrid-allocator' of https://github.com/…

d1e826f

…KuntaiDu/vllm into kuntai-support-hybrid-allocator

Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
Co-authored-by: heheda12345 <zhangch99@outlook.com>

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

hmellor reviewed

vllm/config/__init__.py Outdated


          [bugfix] count into , and align the function signature of , and fix c…

0fc91b0

…omments from @hmellor, and fix missing return value

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

KuntaiDu requested a review from ApostaC as a code owner

September 18, 2025 05:56

KuntaiDu added 2 commits

September 17, 2025 23:03


          [test] revert change to the test as we fall back to previous function…

63fd72c

… signature

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>


          [test] fix test

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

mergify Bot added the kv-connector label

KuntaiDu commented Sep 19, 2025

Collaborator Author

Per @heheda12345's suggestion, this PR will be split into smaller PRs to reduce the review overhead.

mergify Bot commented Sep 19, 2025

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the needs-rebase label

This was referenced Sep 22, 2025

[Core][Hybrid allocator + connector 1/n] Enable KV cache connector + hybrid allocator #25363

Closed

[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token #25431

Merged

ivanium mentioned this pull request

[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector #30166

Merged

5 tasks

ivanium added a commit to ivanium/vllm that referenced this pull request


          Squashed merge PR vllm-project#23624

e86fe9d

Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>

ivanium added a commit to ivanium/vllm that referenced this pull request


          Squashed merge PR vllm-project#23624

9847dda

Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>

github-actions Bot commented Dec 20, 2025

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions Bot added the stale label

heheda12345 commented Dec 21, 2025

Collaborator

We are working on this in #30166.

heheda12345 closed this

ivanium added a commit to ivanium/vllm that referenced this pull request


          Squashed merge PR vllm-project#23624

066eb9f

Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>

Reviewers

hmellor left review comments

heheda12345 left review comments

sfeng33 left review comments

WoosukKwon: awaiting requested review (code owner)

robertgshaw2-redhat: awaiting requested review (code owner)

njhill: awaiting requested review (code owner)

ywang96: awaiting requested review (code owner)

comaniac: awaiting requested review

alexm-redhat: awaiting requested review (code owner)

simon-mo: awaiting requested review

youkaichao: awaiting requested review (code owner)

mgoin: awaiting requested review (code owner)

tlrmchlsmth: awaiting requested review (code owner)

houseroad: awaiting requested review (code owner)

yewentao256: awaiting requested review (code owner)

ProExpertProg: awaiting requested review (code owner)

zhuohan123: awaiting requested review

NickLucche: awaiting requested review (code owner)

ApostaC: awaiting requested review (code owner)

Labels

kv-connector needs-rebase stale tpu v1