[P/D][Nixl] Make kv cache register compatible with hybrid memory allocator by sfeng33 · Pull Request #23079 · vllm-project/vllm

sfeng33 · 2025-08-18T05:09:54Z

Purpose

This PR refactors the register_kv_caches method of nixl_connector so that it works with or without the hybrid memory allocator (HMA).
Partially fixes #22292.

Background

With HMA, models with hybrid attention can have fewer physical KV cache tensors than layers, because different layers can share the same KV cache tensor. For example, gemma-3-4b-it's tensor count drops from 34 to 5.

Model	Attention Type	# of Layers	# of Tensors (no HMA)	# of Tensors (with HMA)
gpt-2	Full Attention only	12	12	12
gemma-3-4b-it	Full + Sliding Window Attention	34	34	5
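To make the 34 → 5 drop concrete, here is a simplified Python sketch. It is not vLLM's grouping code: it assumes that with HMA the layers of each attention type are split into groups whose size equals the layer count of the least common attention type, and that each physical tensor is shared by one layer from every group. It also assumes gemma-3-4b-it places one full-attention layer after every five sliding-window layers. estimate_physical_tensors is a hypothetical helper.

```python
from collections import Counter


def estimate_physical_tensors(layer_types: list[str], use_hma: bool) -> int:
    """Estimate the number of physical KV cache tensors (simplified model).

    Without HMA, every layer owns its own tensor. With HMA (assumed model),
    layers of each attention type are split into groups of `group_size`
    layers, where `group_size` is the count of the least common type, and
    each tensor is shared by one layer per group, giving `group_size` tensors.
    """
    if not use_hma:
        return len(layer_types)
    counts = Counter(layer_types)
    return min(counts.values())


# Assumed gemma-3-4b-it layout: every 6th layer uses full attention.
gemma_like = ["full" if (i + 1) % 6 == 0 else "sliding" for i in range(34)]
gpt2_like = ["full"] * 12

print(estimate_physical_tensors(gpt2_like, use_hma=False))   # 12
print(estimate_physical_tensors(gpt2_like, use_hma=True))    # 12
print(estimate_physical_tensors(gemma_like, use_hma=False))  # 34
print(estimate_physical_tensors(gemma_like, use_hma=True))   # 5
```

Under these assumptions the sketch reproduces both rows of the table: a full-attention-only model keeps one tensor per layer, while the 29 sliding-window and 5 full-attention layers collapse to 5 shared tensors.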

In nixl_connector's register_kv_caches() method, the two relevant responsibilities are:

1. Registering KV cache memory regions with NIXL for direct memory access.
2. Creating transfer descriptors for individual KV cache blocks.

This PR

Refactors KV cache memory region registration: it iterates over the KV caches and registers each unique base address only once, e.g.:

seen_base_addresses = set()
for layer_name, kv_cache_tensor in kv_caches.items():
    base_addr = kv_cache_tensor.data_ptr()
    if base_addr not in seen_base_addresses:
        # Layers sharing a physical tensor (e.g. under HMA) report the same
        # base address, so each memory region is registered only once.
        seen_base_addresses.add(base_addr)
        # register address with NIXL ...

The full implementation is in lines 742-761 of nixl_connector.py.
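A self-contained sketch of the same deduplication, collecting one (base address, size in bytes) pair per distinct tensor instead of calling NIXL. It assumes kv_caches maps layer names to tensor-like objects exposing data_ptr(), numel() and element_size() (torch.Tensor provides all three); FakeTensor and collect_unique_regions are hypothetical names used only for illustration.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor with the three methods used."""

    def __init__(self, addr: int, numel: int, element_size: int):
        self._addr = addr
        self._numel = numel
        self._element_size = element_size

    def data_ptr(self) -> int:
        return self._addr

    def numel(self) -> int:
        return self._numel

    def element_size(self) -> int:
        return self._element_size


def collect_unique_regions(kv_caches) -> list[tuple[int, int]]:
    """Return (base_addr, num_bytes) once per distinct physical tensor."""
    seen_base_addresses = set()
    regions = []
    for kv_cache_tensor in kv_caches.values():
        base_addr = kv_cache_tensor.data_ptr()
        if base_addr in seen_base_addresses:
            # Another layer already contributed this tensor (shared under HMA).
            continue
        seen_base_addresses.add(base_addr)
        num_bytes = kv_cache_tensor.numel() * kv_cache_tensor.element_size()
        regions.append((base_addr, num_bytes))
    return regions


shared = FakeTensor(0x1000, 256, 2)
other = FakeTensor(0x2000, 256, 2)
kv_caches = {"layers.0": shared, "layers.1": other, "layers.2": shared}
print(collect_unique_regions(kv_caches))  # [(4096, 512), (8192, 512)]
```

Three layers map to two regions because layers.0 and layers.2 share a tensor, which is the kv-sharing case a unit test would want to cover.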

Removes the complex logic for block shape, block length, slot length, etc. This information is available in kv_cache_config, which is passed in from the model runner and already accounts for the different attention backends and the shape calculation in _reshape_kv_cache_tensors(). The needed values can therefore be read from kv_cache_config whether HMA is on or off.
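With a single block length available, the second responsibility (per-block transfer descriptors) reduces to simple address arithmetic. A minimal sketch, assuming each registered region holds num_blocks contiguous blocks of block_len bytes and that descriptors are (address, length, device id) triples; make_block_descriptors is a hypothetical name, and num_blocks and block_len are assumed to come from kv_cache_config.

```python
def make_block_descriptors(
    base_addrs: list[int], num_blocks: int, block_len: int, device_id: int
) -> list[tuple[int, int, int]]:
    """Build one (addr, length, device_id) triple per KV cache block.

    Block i of a region is assumed to start at base_addr + i * block_len.
    """
    descs = []
    for base_addr in base_addrs:
        for block_id in range(num_blocks):
            addr = base_addr + block_id * block_len
            descs.append((addr, block_len, device_id))
    return descs


descs = make_block_descriptors([4096, 8192], num_blocks=3, block_len=128, device_id=0)
print(len(descs))  # 6
print(descs[:2])   # [(4096, 128, 0), (4224, 128, 0)]
```

Because only unique base addresses are passed in, layers that share a tensor also share its descriptors rather than producing duplicates.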

Test Plan

# Unit Test
python -m pytest tests/v1/kv_connector/unit/test_nixl_connector.py -v

# Integration Test
# Tested with prefill:decode ratios (1,1), (2,2), (1,2), (2,4), (4,4)
./tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh

# Manual Test
vllm serve google/gemma-3-4b-it \
  --port 8100 \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
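The integration matrix above can be scripted. A hedged Python sketch, assuming each (prefill, decode) pair maps to the PREFILLER_TP_SIZE and DECODER_TP_SIZE environment variables mentioned later in the review; build_runs and run_all are hypothetical helpers, and nothing is executed unless dry_run=False.

```python
import os
import subprocess

# (prefill, decode) pairs from the test plan; treating them as TP sizes
# is an assumption.
RATIOS = [(1, 1), (2, 2), (1, 2), (2, 4), (4, 4)]
SCRIPT = "./tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh"


def build_runs(ratios=RATIOS) -> list[dict[str, str]]:
    """Return one environment-override dict per (prefill, decode) pair."""
    return [
        {"PREFILLER_TP_SIZE": str(prefill_tp), "DECODER_TP_SIZE": str(decode_tp)}
        for prefill_tp, decode_tp in ratios
    ]


def run_all(dry_run: bool = True) -> None:
    for env_overrides in build_runs():
        prefix = " ".join(f"{key}={value}" for key, value in env_overrides.items())
        print(f"{prefix} {SCRIPT}")
        if not dry_run:
            # Inherit the current environment and add the TP sizes.
            subprocess.run([SCRIPT], env={**os.environ, **env_overrides}, check=True)


if __name__ == "__main__":
    run_all()
```

check=True stops the sweep at the first failing configuration, which keeps the failing TP combination visible.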

github-actions · 2025-08-18T05:10:04Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request refactors the KV cache registration in NixlConnector to support a hybrid memory allocator by using KVCacheConfig. This simplifies the logic by removing device- and backend-specific inferences of the cache layout. The changes are well-aligned with the goal, but I've identified two issues. First, the register_kv_caches method in NixlConnector has a signature that is incompatible with its base class, which could cause runtime errors. Second, there's a critical assumption that all KV cache tensors are of the same size, which might not be true with a hybrid allocator and could lead to silent data corruption. I've provided suggestions to address both points.
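Gemini's second point can be addressed by validating every tensor size before deriving one block length, instead of reading only the first tensor. A minimal sketch; derive_block_len is a hypothetical helper, and tensor_sizes and num_blocks are assumed to be the byte sizes and block count taken from kv_cache_config.

```python
def derive_block_len(tensor_sizes: list[int], num_blocks: int) -> int:
    """Derive one per-block byte length, failing loudly on mismatches."""
    if not tensor_sizes:
        raise ValueError("no KV cache tensors")
    distinct = set(tensor_sizes)
    if len(distinct) != 1:
        # Silently using the first size here could corrupt transfers.
        raise ValueError(f"KV cache tensors differ in size: {sorted(distinct)}")
    (size,) = distinct
    if size % num_blocks:
        raise ValueError(f"tensor size {size} not divisible by {num_blocks} blocks")
    return size // num_blocks


print(derive_block_len([4096, 4096], num_blocks=32))  # 128
```

Raising turns a potential silent data-corruption bug into an immediate, diagnosable error at registration time.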

sfeng33 · 2025-08-18T21:18:32Z

cc @robertgshaw2-redhat @NickLucche @njhill PTAL

heheda12345 · 2025-08-18T21:32:51Z

Can you join #feat-hybrid-allocator-kv-connector in slack to collaborate on kv connector + hybrid allocator?

njhill · 2025-08-18T22:07:09Z

cc @KuntaiDu

NickLucche

I think I like this initial refactoring, thanks for the work!
We still need to figure out the best logic and interface for sliding window attention layers, which is probably going to be the main piece of work in enabling HMA here, but the block_len and KV cache sharing logic look good.

Can we add tests for the kv sharing case to the nixl suite?
Also, are all other nixl tests running fine with the changes?

sfeng33 · 2025-08-20T03:30:22Z

Hey @NickLucche, thanks for the review! Before HMA can be enabled in the NIXL connector, start_load_kv also needs to be updated. What is mainly missing is this part from the design doc:

Connector: layout of KV [layer, block_ids, …]
Hybrid allocator: layout of KV [# of groups, block_ids, …]
We need a mapping between [# of groups, block_ids, …] →  [layer, block_ids, …]
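One way to express that mapping, as a sketch under the assumption (implied by the layouts above) that every layer in a KV cache group shares that group's block IDs. group_blocks_to_layer_blocks and its arguments are hypothetical, not an existing vLLM API.

```python
def group_blocks_to_layer_blocks(
    group_layer_names: list[list[str]], block_ids_per_group: list[list[int]]
) -> dict[str, list[int]]:
    """Expand [group, block_ids] into [layer, block_ids].

    group_layer_names[g] lists the layers in group g; block_ids_per_group[g]
    lists the block IDs allocated to a request in group g.
    """
    if len(group_layer_names) != len(block_ids_per_group):
        raise ValueError("need exactly one block ID list per group")
    layer_blocks = {}
    for layer_names, block_ids in zip(group_layer_names, block_ids_per_group):
        for layer_name in layer_names:
            # Copy so callers can modify one layer's list independently.
            layer_blocks[layer_name] = list(block_ids)
    return layer_blocks


groups = [["layers.0", "layers.2"], ["layers.1"]]
result = group_blocks_to_layer_blocks(groups, [[3, 7], [5]])
print(result)  # {'layers.0': [3, 7], 'layers.2': [3, 7], 'layers.1': [5]}
```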

I added a unit test. For the integration test, is it mainly run_accuracy_test and run_edge_case_test?

NickLucche

Thanks a lot for adding the test, looks good now!

For the integration test, is it mainly run_accuracy_test and run_edge_case_test

Yep, we just have to make sure run_accuracy_test passes. Could you also check whether this PR works with heteroTP? Just set PREFILLER_TP_SIZE and DECODER_TP_SIZE.
I have very limited access these days, sorry :(

I also left a comment about a small test refactoring in light of upcoming changes, hope that's ok.

Other than that this is LGTM.

sfeng33 · 2025-08-20T23:37:20Z

Thanks @NickLucche! The unit test is updated now. I've run run_accuracy_test and validated that it passes with prefill:decode ratios (1,1), (2,2), (1,2), (2,4), (4,4). Please let me know if anything is missing.

NickLucche

LGTM, thanks for the patience @sfeng33 !

sfeng33 · 2025-08-21T19:36:26Z

Thanks for the review! Since this PR requires approval from someone with write access, tagging @robertgshaw2-redhat and @njhill for a final look when you get a chance 🙏

njhill

Thanks @sfeng33 @NickLucche!

mergify Bot added v1 tpu Related to Google TPUs labels Aug 18, 2025

gemini-code-assist Bot reviewed Aug 18, 2025

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

sfeng33 changed the title from "[P/D][Nixl] Make kv cache register compatible with hybrid memory allocator" to "[WIP][P/D][Nixl] Make kv cache register compatible with hybrid memory allocator" Aug 18, 2025

sfeng33 marked this pull request as ready for review August 18, 2025 21:15

sfeng33 requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners August 18, 2025 21:15

sfeng33 changed the title from "[WIP][P/D][Nixl] Make kv cache register compatible with hybrid memory allocator" to "[P/D][Nixl] Make kv cache register compatible with hybrid memory allocator" Aug 18, 2025

sfeng33 force-pushed the pd branch from 832016c to 09661d0 Compare August 19, 2025 01:37

NickLucche requested changes Aug 19, 2025

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

mergify Bot removed the tpu Related to Google TPUs label Aug 20, 2025

NickLucche requested changes Aug 20, 2025

Comment thread tests/v1/kv_connector/unit/test_nixl_connector.py Outdated

sfeng33 force-pushed the pd branch from d23d358 to e5ab98c Compare August 21, 2025 01:36

NickLucche approved these changes Aug 21, 2025

njhill approved these changes Aug 21, 2025

njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 21, 2025

sfeng33 force-pushed the pd branch from e5ab98c to 8816499 Compare August 21, 2025 22:40

sfeng33 added 2 commits August 21, 2025 18:01

register cache

17a2ac7

Signed-off-by: sfeng33 <4florafeng@gmail.com>

fix

95f0101

Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 added 4 commits August 21, 2025 18:01

pre-commit

b88fe89

Signed-off-by: sfeng33 <4florafeng@gmail.com>

fix test

8cdcff3

Signed-off-by: sfeng33 <4florafeng@gmail.com>

Add test

0fcba51

Signed-off-by: sfeng33 <4florafeng@gmail.com>

fix test

9e9619b

Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 force-pushed the pd branch from 8816499 to 9e9619b Compare August 22, 2025 01:01

njhill merged commit 5341565 into vllm-project:main Aug 22, 2025
42 checks passed

sfeng33 deleted the pd branch August 24, 2025 20:44

KuntaiDu mentioned this pull request Aug 26, 2025

[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector #23624

Closed

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

3748cf9

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

b756915

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

8b25ae8

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

fa40868

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

8169f16

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

zhenwei-intel pushed a commit to zhenwei-intel/vllm that referenced this pull request Sep 10, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

3a9be26

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

ABC12345anouys pushed a commit to ABC12345anouys/vllm that referenced this pull request Sep 25, 2025

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

9922fe9

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

817456f

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

4aaa65a

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

b11a1d5

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026

[P/D][Nixl] Make kv cache register compatible with hybrid memory allo…

702ad5c

…cator (vllm-project#23079) Signed-off-by: sfeng33 <4florafeng@gmail.com>
