[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector#23624

Closed
KuntaiDu wants to merge 71 commits into
vllm-project:mainfrom
KuntaiDu:kuntai-support-hybrid-allocator

@KuntaiDu KuntaiDu commented Aug 26, 2025

Collaborator

[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector

The checklist at the bottom has been considered.

Purpose

This PR aims to support the code path that combines the hybrid allocator with the KV cache connector.

Design doc: link

Related to #23079
Solves #22292

Test Plan

A local correctness test passed. Instructions that let other people reproduce it are still to be written.

Core test logic:

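        # Long prompt; its KV cache is what the final request should later reuse.
        first_prompt = "Hello, how are you?" * 5000 + "Hello, my name is"
        # Shares its first 1000 repetitions with first_prompt, so its prefix
        # can be served from the KV cache saved for the first request.
        second_prompt = [
            "Hello, how are you?" * 1000 + "Tell me a very long story",
        ]
        # temperature=0 selects greedy decoding, so outputs are deterministic.
        sampling_params = SamplingParams(temperature=0, top_p=0.95, max_tokens=10)
        print_output(llm, [first_prompt], sampling_params, "first")
        # Two more long prompts with a different first character, intended to
        # push the first request's KV cache out of GPU memory.
        print_output(llm, ["1" + first_prompt], sampling_params, "second")
        print_output(llm, ["2" + first_prompt], sampling_params, "second")

        # Now the first request is evicted. Run this request.
        # It will trigger KV cache loading from LMCache.
        print_output(llm, second_prompt, sampling_params, "third")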

Test Result

For the last request:

[2025-08-26 05:36:37,718] LMCache INFO: Reqid: 3, Total tokens 6007, LMCache hit tokens: 5888, need to load: 5888 (vllm_v1_adapter.py:1091:lmcache.integration.vllm.vllm_v1_adapter)
[2025-08-26 05:36:37,820] LMCache INFO: Retrieved 5888 tokens (vllm_v1_adapter.py:822:lmcache.integration.vllm.vllm_v1_adapter)
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.26it/s, est. speed input: 7594.87 toks/s, output: 12.64 toks/s]
--------------------------------------------------
Generated text: '.\n\nOkay, here we go...\n\nOnce'
Generation took 0.80 seconds, third request done.
--------------------------------------------------
[2025-08-26 05:36:44,886] LMCache INFO: Storage manager closed. (storage_manager.py:472:lmcache.v1.storage_backend.storage_manager)
[2025-08-26 05:36:48,332] LMCache INFO: LMCacheEngine closed. (cache_engine.py:965:lmcache.v1.cache_engine)
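
The hit count in the first log line is consistent with chunk-granular caching: 5888 is the largest multiple of 256 that does not exceed 6007. The sketch below reproduces that arithmetic. It assumes a chunk size of 256 tokens (believed to be LMCache's default, but not stated in this PR) and that only whole chunks are stored; `chunk_aligned_hits` is a hypothetical helper, not an LMCache function.

```python
def chunk_aligned_hits(total_tokens: int, chunk_size: int = 256) -> int:
    """Number of tokens covered by whole chunks of `chunk_size` tokens."""
    return (total_tokens // chunk_size) * chunk_size


total = 6007                      # "Total tokens" from the log above
hits = chunk_aligned_hits(total)  # assumed chunk size: 256
print(hits)                       # 5888
print(f"{hits / total:.1%}")      # 98.0%
print(total - hits)               # 119 tokens left to prefill
```

Under these assumptions, about 98% of the prompt is loaded from LMCache and only the trailing 119 tokens need to be computed.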

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
…o GPU memory, the inference results are wrong. Fix this first.

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@mergify

mergify Bot commented Aug 26, 2025

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Aug 26, 2025
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@mergify mergify Bot removed the needs-rebase label Aug 26, 2025
@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
heheda12345 and others added 6 commits September 14, 2025 17:26
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…KuntaiDu/vllm into kuntai-support-hybrid-allocator

Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
Co-authored-by: heheda12345 <zhangch99@outlook.com>

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Comment thread vllm/config/__init__.py Outdated
…omments from @hmellor, and fix missing return value

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@KuntaiDu KuntaiDu requested a review from ApostaC as a code owner September 18, 2025 05:56
… signature

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
@mergify mergify Bot added the kv-connector label Sep 18, 2025
@KuntaiDu

Collaborator Author

Per @heheda12345's suggestion, this PR will be split into smaller PRs to reduce the review overhead.

@mergify

mergify Bot commented Sep 19, 2025

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @KuntaiDu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Sep 19, 2025
ivanium added a commit to ivanium/vllm that referenced this pull request Dec 14, 2025
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
ivanium added a commit to ivanium/vllm that referenced this pull request Dec 15, 2025
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
@github-actions


This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions Bot added the stale Over 90 days of inactivity label Dec 20, 2025
@heheda12345

Copy link
Copy Markdown
Collaborator

We are working on this in #30166.

ivanium added a commit to ivanium/vllm that referenced this pull request Dec 26, 2025
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
Labels

kv-connector, needs-rebase, stale (Over 90 days of inactivity), tpu (Related to Google TPUs), v1

4 participants