[Refactor]: add a centralized CreateGPUConnector()#2530

Merged
sammshen merged 8 commits into LMCache:dev from sammshen:reorg-gpu-connector
Feb 5, 2026

Conversation

@sammshen
Contributor

@sammshen sammshen commented Feb 2, 2026

  • Add logic to find local_worker_id and local_world_size
  • Move all GPU connector creation logic into a new package at lmcache/v1/gpu_connector/__init__.py

This prepares for:

  • a GPU KV cache format
  • fixing the FlashInfer format for the multilayer connector

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello @sammshen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the GPU connector creation and management within the LMCache system. By introducing a centralized factory function and reorganizing related utility code into dedicated modules, it aims to improve the clarity, maintainability, and extensibility of how GPU connectors are initialized and utilized across different integration points like SGLang and vLLM. The changes streamline device handling and parallel configuration logic, making the system more robust and easier to understand.

Highlights

  • Centralized GPU Connector Creation: Introduced a new CreateGPUConnector factory function in lmcache/v1/gpu_connector/__init__.py to centralize the logic for instantiating various GPU connector types (SGLang, vLLM) based on configuration and metadata. This replaces scattered conditional logic for connector creation.
  • Code Reorganization and Modularity: Refactored existing GPU connector-related code into new, more specific modules. The main gpu_connector.py file was renamed to gpu_connectors.py to specifically house connector implementations. Helper functions for memory copy (lmcache_memcpy_async_h2d, lmcache_memcpy_async_d2h) were moved to lmcache/v1/gpu_connector/gpu_ops.py, and the need_gpu_interm_buffer utility was moved to lmcache/v1/gpu_connector/utils.py.
  • Improved vLLM Integration Utilities: Added new utility functions get_vllm_torch_dev and calculate_local_rank_and_world_size to lmcache/integration/vllm/utils.py. These functions streamline device detection and local rank calculation, especially for multi-node vLLM deployments, and are now utilized by the centralized GPU connector creation.
  • Simplified Manager Logic: The LMCacheManager in lmcache/v1/manager.py has been significantly simplified by removing internal methods (_get_device_info, _create_gpu_connector, _need_gpu_interm_buffer) and delegating their responsibilities to the new centralized CreateGPUConnector function and utility modules.
  • Updated Imports Across Project: Numerous files across the lmcache project, including sglang_adapter.py, cache_engine.py, multiprocess/server.py, standalone/__main__.py, xpu_connector.py, and various test files, have been updated to reflect the new module structure and import paths for GPU connector components.
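The centralized factory described in the highlights above can be sketched roughly as follows. This is an illustrative sketch only: the real implementation lives in lmcache/v1/gpu_connector/__init__.py and receives the engine's config and metadata objects, while the connector classes below are stand-in stubs, not LMCache's actual classes.

```python
# Stand-in stubs for the real connector classes in
# lmcache/v1/gpu_connector/gpu_connectors.py.
class SGLangGPUConnector:
    def __init__(self, config, metadata):
        self.config, self.metadata = config, metadata


class VLLMPagedMemGPUConnectorV2:
    def __init__(self, config, metadata):
        self.config, self.metadata = config, metadata


# One dispatch table replaces the scattered if/else instantiation logic.
_CONNECTOR_REGISTRY = {
    "sglang": SGLangGPUConnector,
    "vllm": VLLMPagedMemGPUConnectorV2,
}


def CreateGPUConnector(config, metadata, engine: str):
    """Instantiate the right GPU connector for the given serving engine."""
    try:
        connector_cls = _CONNECTOR_REGISTRY[engine]
    except KeyError:
        raise ValueError(f"Unsupported engine: {engine!r}")
    return connector_cls(config, metadata)


connector = CreateGPUConnector({}, {}, "sglang")
print(type(connector).__name__)  # SGLangGPUConnector
```

A registry keyed by engine name keeps each integration point (SGLang, vLLM) to a single call site instead of duplicating conditional construction logic.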


Changelog
  • lmcache/integration/sglang/sglang_adapter.py
    • Removed need_gpu_interm_buffer function (moved to lmcache/v1/gpu_connector/utils.py).
    • Removed direct torch.cuda.device and torch.device calls.
    • Replaced conditional SGLangGPUConnector instantiation with CreateGPUConnector call.
    • Updated imports to reflect new CreateGPUConnector function.
  • lmcache/integration/vllm/utils.py
    • Added VllmConfig to TYPE_CHECKING imports.
    • Introduced get_vllm_torch_dev function for platform-agnostic torch device retrieval.
    • Added calculate_local_rank_and_world_size function for computing local worker ID and world size in parallel environments.
  • lmcache/v1/cache_engine.py
    • Updated import path for GPUConnectorInterface and related GPU connector classes to lmcache.v1.gpu_connector.gpu_connectors.
  • lmcache/v1/gpu_connector/__init__.py
    • New file: Implemented CreateGPUConnector factory function to centralize GPU connector instantiation logic.
  • lmcache/v1/gpu_connector/gpu_connectors.py
    • Renamed from lmcache/v1/gpu_connector.py.
    • Removed lmcache_memcpy_async_h2d and lmcache_memcpy_async_d2h helper functions (moved to lmcache/v1/gpu_connector/gpu_ops.py).
    • Removed import of LazyMemoryAllocator.
  • lmcache/v1/gpu_connector/gpu_ops.py
    • New file: Contains lmcache_memcpy_async_h2d and lmcache_memcpy_async_d2h functions, moved from lmcache/v1/gpu_connector.py.
  • lmcache/v1/gpu_connector/utils.py
    • New file: Contains need_gpu_interm_buffer function, moved from lmcache/integration/sglang/sglang_adapter.py.
  • lmcache/v1/manager.py
    • Imported CreateGPUConnector from lmcache.v1.gpu_connector.
    • Imported calculate_local_rank_and_world_size from lmcache.integration.vllm.utils.
    • Removed _get_device_info, _create_gpu_connector, and _need_gpu_interm_buffer methods.
    • Updated _create_lmcache_engine to use new utility functions and CreateGPUConnector for vLLM GPU connector instantiation.
    • Removed import of vllm.platforms.current_platform.
  • lmcache/v1/multiprocess/server.py
    • Updated import paths for lmcache_memcpy_async_d2h and lmcache_memcpy_async_h2d to lmcache.v1.gpu_connector.gpu_ops.
  • lmcache/v1/standalone/__main__.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
  • lmcache/v1/xpu_connector.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
  • tests/disagg/test_nixl_cache_engine.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
  • tests/v1/test_gpu_connector.py
    • Updated import paths for various GPU connector classes to lmcache.v1.gpu_connector.gpu_connectors.
  • tests/v1/utils.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
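The local-rank computation added in lmcache/integration/vllm/utils.py might work along these lines. This is a hedged sketch under stated assumptions: the real calculate_local_rank_and_world_size reads vLLM's parallel configuration, whereas this version takes plain integers, and the modulo-based mapping is an assumption about how ranks are laid out across nodes.

```python
def calculate_local_rank_and_world_size(
    global_rank: int, world_size: int, gpus_per_node: int
) -> tuple[int, int]:
    """Map a global worker rank to its rank and world size on the local node.

    Assumes workers are packed onto nodes in rank order, so the local rank
    is the global rank modulo the number of workers per node.
    """
    local_world_size = min(world_size, gpus_per_node)
    local_rank = global_rank % local_world_size
    return local_rank, local_world_size


# e.g. worker 5 of 8, spread over two 4-GPU nodes, sits at local rank 1
print(calculate_local_rank_and_world_size(5, 8, 4))  # (1, 4)
```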
Activity
  • The pull request was opened by sammshen with the title "[Refactor]: add a centralized CreateGPUConnector()".

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a valuable refactoring by centralizing the GPU connector creation logic into a CreateGPUConnector factory function. This significantly improves code maintainability by removing duplicated logic from sglang_adapter.py and manager.py. The new gpu_connector package is well-structured, and the added utility for calculating local rank and world size enhances multi-node support for vLLM. I've identified one potential issue regarding XPU platform support for layerwise connectors and a minor opportunity for code simplification.

Comment thread lmcache/v1/gpu_connector/__init__.py
Comment thread lmcache/v1/gpu_connector/utils.py
@sammshen sammshen mentioned this pull request Feb 2, 2026
12 tasks
Samuel Shen added 2 commits February 2, 2026 22:44
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen sammshen added the full Run comprehensive tests on this PR label Feb 2, 2026
Comment thread lmcache/integration/vllm/utils.py Outdated
```python
    return torch_dev, dev_name


def calculate_local_rank_and_world_size(vllm_config: "VllmConfig"):
```
Contributor Author

Logic to find the local worker id.

```python
    dtype=kv_dtype,
    device=device,
)
gpu_connector = CreateGPUConnector(config, metadata, "sglang")
```
Contributor Author

A KV cache format field can remove the need to pass an engine string like "sglang" in the future.
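The comment above could be realized along these lines: if the connector choice is keyed by the KV cache format recorded in the engine metadata, the factory no longer needs an engine string at all. Everything in this sketch is hypothetical, not an LMCache API; the class-name strings stand in for real connector classes.

```python
from dataclasses import dataclass


@dataclass
class EngineMetadata:
    # Hypothetical metadata field; LMCache's actual metadata object differs.
    kv_cache_format: str  # e.g. "paged" or "layerwise"


def create_connector_by_format(metadata: EngineMetadata) -> str:
    """Dispatch on the KV cache format instead of an engine string.

    Returns a placeholder class name; a real factory would instantiate
    the corresponding connector class.
    """
    if metadata.kv_cache_format == "layerwise":
        return "SGLangLayerwiseGPUConnector"
    return "VLLMPagedMemGPUConnectorV2"


print(create_connector_by_format(EngineMetadata("layerwise")))
```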



```python
def calculate_local_rank_and_world_size(vllm_config: "VllmConfig"):
    """
```
Contributor

Update the docstring and the return type hint.

Comment thread lmcache/v1/standalone/__main__.py Outdated
```diff
 from lmcache.v1.config import LMCacheEngineConfig
 from lmcache.v1.config_base import parse_command_line_extra_params
-from lmcache.v1.gpu_connector import VLLMPagedMemGPUConnectorV2
+from lmcache.v1.gpu_connector.gpu_connectors import VLLMPagedMemGPUConnectorV2
```
Contributor

See if we can use CreateGPUConnector() in this file

Comment thread lmcache/v1/cache_engine.py Outdated
```diff
 from lmcache.v1.config import LMCacheEngineConfig
 from lmcache.v1.event_manager import EventManager, EventStatus, EventType
-from lmcache.v1.gpu_connector import (
+from lmcache.v1.gpu_connector.gpu_connectors import (
```
Contributor

See if we can use CreateGPUConnector here and avoid depending on which specific connector type it is.

Contributor

Move to the gpu_connector/ folder.

Contributor

@ApostaC ApostaC left a comment

Otherwise LGTM!

Samuel Shen added 3 commits February 3, 2026 19:56
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Collaborator

@DongDongJu DongDongJu left a comment

Hello @sammshen, thanks for your continued hard work on this refactoring.
I left one nit request and a few comments, but it generally looks good!

Comment thread lmcache/integration/vllm/utils.py
```python
        device=device,
    )
else:
    return SGLangGPUConnector(
```
Collaborator

I'm not sure this case still works with the actual integration in the sglang codebase.
@Oasis-Git, do you think removing this case and forcing layerwise mode would be too brutal?

Contributor Author

This logic has not changed at all.

Collaborator

Yes, I saw that. I meant that the SGLangGPUConnector integration is no longer valid, if I remember correctly,
so I was asking for opinions on removing it after this PR.

Comment thread lmcache/v1/manager.py
Comment thread lmcache/v1/manager.py
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen
Contributor Author

sammshen commented Feb 4, 2026

@hickeyma it seems like all the GitHub runners just broke.

Update: should be fixed by: #2543

Collaborator

@DongDongJu DongDongJu left a comment

LGTM. Thanks for the hard work!

Contributor

@ApostaC ApostaC left a comment

LGTM!

@sammshen sammshen merged commit e5f2854 into LMCache:dev Feb 5, 2026
24 checks passed
@hickeyma hickeyma mentioned this pull request Feb 6, 2026
2 tasks
DongDongJu pushed a commit to DongDongJu/LMCache that referenced this pull request Feb 22, 2026
* CreateGPUConnector()

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* move the import outside of TYPE_CHECKING

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix need_gpu_interm_buffer UT

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* address AspostaC's comments

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* move xpu connector

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
sammshen added a commit to sammshen/LMCache that referenced this pull request Mar 1, 2026
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
Signed-off-by: shaoxiawjc <wjc2800@163.com>

Labels

full Run comprehensive tests on this PR

3 participants