[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` by KuntaiDu · Pull Request #25431 · vllm-project/vllm

KuntaiDu · 2025-09-22T23:33:38Z

Purpose

This PR is split out from #23624, which aims to enable allocating tokens inside the sliding window when a long prefix is not cached in vLLM but is cached in the connector.

This PR unifies remove_skipped_blocks by adding a new function get_last_useful_token.
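To make the idea concrete, here is a minimal sketch of the shape such a unification could take: one shared `remove_skipped_blocks` in a base class, with a per-attention-type `get_last_useful_token` hook. Everything except those two method names is hypothetical (the `Toy*` classes, the `NULL_BLOCK` placeholder, the list-of-strings block table), and the return convention of the hook (the index of the earliest token still needed) is an assumption made for illustration, not the signature used in vLLM.

```python
from abc import ABC, abstractmethod

NULL_BLOCK = None  # hypothetical placeholder for a freed slot in the block table


class ToyManager(ABC):
    """Toy stand-in for a single-type KV cache manager (not vLLM's class)."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size

    @abstractmethod
    def get_last_useful_token(self, num_computed_tokens: int) -> int:
        """Assumed convention: index of the earliest token still attended to."""

    def remove_skipped_blocks(self, blocks: list, num_computed_tokens: int) -> list:
        """Shared logic: free every block that lies entirely before the hook's answer."""
        first_useful = self.get_last_useful_token(num_computed_tokens)
        num_skipped_blocks = max(0, first_useful) // self.block_size
        removed = []
        for i in range(min(num_skipped_blocks, len(blocks))):
            if blocks[i] is not NULL_BLOCK:
                removed.append(blocks[i])
                blocks[i] = NULL_BLOCK
        return removed


class ToyFullAttention(ToyManager):
    def get_last_useful_token(self, num_computed_tokens: int) -> int:
        # Full attention needs every previous token, so nothing is skipped.
        return 0


class ToySlidingWindow(ToyManager):
    def __init__(self, block_size: int, sliding_window: int) -> None:
        super().__init__(block_size)
        self.sliding_window = sliding_window

    def get_last_useful_token(self, num_computed_tokens: int) -> int:
        # The next token attends to itself plus the previous (window - 1) tokens.
        return max(0, num_computed_tokens - self.sliding_window + 1)


blocks = ["b0", "b1", "b2", "b3"]
freed = ToySlidingWindow(block_size=4, sliding_window=8).remove_skipped_blocks(blocks, 16)
```

With 16 computed tokens and a window of 8, the hook returns 9 in this toy model, so the two blocks covering tokens 0 to 7 are freed while the block containing token 9 is kept.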

Test Plan

pytest -v -s tests/v1/core/test_single_type_kv_cache_manager.py

Test Result

gemini-code-assist

Code Review

This pull request refactors the remove_skipped_blocks method by moving its implementation to the base SingleTypeKVCacheManager class, delegating attention-specific logic to a new get_last_useful_token method. This is a good simplification for most attention managers. However, this refactoring has introduced a critical issue for ChunkedLocalAttentionManager by removing logic essential for maintaining prefix cache integrity. My review focuses on this critical regression.
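For readers unfamiliar with chunked local attention, the following sketch shows the window arithmetic that an attention-specific hook for that case would need to capture. It is an illustration of the general technique only: the function names and parameters are hypothetical, and it does not reproduce the `ChunkedLocalAttentionManager` code or the prefix-cache logic the review refers to.

```python
def chunked_local_first_needed_token(
    num_computed_tokens: int, attention_chunk_size: int
) -> int:
    """Start of the chunk that the next token falls into.

    Under chunked local attention a token only attends to tokens in its own
    chunk, so everything before this index is no longer needed for decoding.
    """
    return (num_computed_tokens // attention_chunk_size) * attention_chunk_size


def num_skippable_blocks(first_needed_token: int, block_size: int) -> int:
    """Number of leading KV blocks that lie entirely before the needed range."""
    return first_needed_token // block_size


start = chunked_local_first_needed_token(num_computed_tokens=19, attention_chunk_size=8)
skippable = num_skippable_blocks(start, block_size=4)
```

Whether freed blocks may still be referenced by the prefix cache is a separate question that this arithmetic does not answer, which is the kind of concern the review raises.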

heheda12345 · 2025-10-02T04:24:01Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

sarckk · 2025-10-29T07:03:13Z

@KuntaiDu hi what's the status of this PR?

KuntaiDu · 2025-11-04T00:10:14Z

Now circle back to this PR. Was making Ray Summit talk slides lol.

heheda12345

LGTM! Only some nits related to the function name update in the last pass.

KuntaiDu · 2025-11-04T20:31:17Z

Now working on a correctness test for ChunkedLocalAttention using Llama 4. The plan is to run lm_eval on a long-context dataset and keep GPU memory small enough to trigger frequent chunked block eviction.

KuntaiDu · 2025-11-04T23:22:31Z

Fixed the comments. Thanks for the feedback @heheda12345

heheda12345 · 2025-11-05T00:30:40Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

heheda12345

LGTM! Can you adjust the figures a little bit?

heheda12345

LGTM!

KuntaiDu added 2 commits September 22, 2025 16:15

Introduce get_last_useful_token to get the last useful token inside the attention window

9ca160e

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

add comments

c33ecd0

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

KuntaiDu requested review from ApostaC, WoosukKwon, alexm-redhat, comaniac, heheda12345, njhill, robertgshaw2-redhat and ywang96 as code owners September 22, 2025 23:33

mergify Bot added the v1 label Sep 22, 2025

gemini-code-assist Bot reviewed Sep 22, 2025

KuntaiDu mentioned this pull request Sep 23, 2025

[Core][Hybrid allocator + connector 1/n] Enable KV cache connector + hybrid allocator #25363

Closed

5 tasks

heheda12345 reviewed Oct 2, 2025

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

Comment thread vllm/v1/core/single_type_kv_cache_manager.py

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

chatgpt-codex-connector Bot reviewed Oct 2, 2025

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

KuntaiDu added 2 commits November 3, 2025 16:15

resolve merge conflicts

f278b81

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

handle Chen's suggestion

ac0a895

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

heheda12345 reviewed Nov 4, 2025

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

KuntaiDu added 3 commits November 4, 2025 15:09

adjust the docstring

8cd2b21

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

adjust the docstring

f91f828

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

mention num_skipped_tokens in docstring

8b0592e

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

chatgpt-codex-connector Bot reviewed Nov 5, 2025

Comment thread vllm/v1/core/single_type_kv_cache_manager.py

heheda12345 approved these changes Nov 5, 2025

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

Comment thread vllm/v1/core/single_type_kv_cache_manager.py Outdated

adjust figure

1207f5e

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

Merge branch 'main' into kuntai-add-get-last-useful-token

522f085

KuntaiDu requested a review from heheda12345 November 5, 2025 08:10

heheda12345 approved these changes Nov 5, 2025

heheda12345 enabled auto-merge (squash) November 5, 2025 19:06

github-actions Bot added the ready label Nov 5, 2025

heheda12345 merged commit efe73e9 into vllm-project:main Nov 6, 2025
47 checks passed

KuntaiDu deleted the kuntai-add-get-last-useful-token branch November 6, 2025 21:00

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (vllm-project#25431)

dd6aa15

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

heheda12345 mentioned this pull request Nov 24, 2025

[V1] [Hybrid] Lighter Mamba Prefix Caching with standard memory layout #29272

Closed

5 tasks

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (vllm-project#25431)

ca511d7

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (vllm-project#25431)

928aad0

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026

[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (vllm-project#25431)

99ba7c6

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
