[Refactor]: add a centralized CreateGPUConnector()#2530

Merged
sammshen merged 8 commits into LMCache:dev from sammshen:reorg-gpu-connector
Feb 5, 2026

Conversation

@sammshen
Contributor

@sammshen sammshen commented Feb 2, 2026

  • Add logic to find local_worker_id and local_world_size
  • Move all GPU connector creation logic into a new package at lmcache/v1/gpu_connector/__init__.py

This prepares for:

  • a GPU KV cache format
  • fixing the FlashInfer format for the multilayer connector

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello @sammshen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the GPU connector creation and management within the LMCache system. By introducing a centralized factory function and reorganizing related utility code into dedicated modules, it aims to improve the clarity, maintainability, and extensibility of how GPU connectors are initialized and utilized across different integration points like SGLang and vLLM. The changes streamline device handling and parallel configuration logic, making the system more robust and easier to understand.

Highlights

  • Centralized GPU Connector Creation: Introduced a new CreateGPUConnector factory function in lmcache/v1/gpu_connector/__init__.py to centralize the logic for instantiating various GPU connector types (SGLang, vLLM) based on configuration and metadata. This replaces scattered conditional logic for connector creation.
  • Code Reorganization and Modularity: Refactored existing GPU connector-related code into new, more specific modules. The main gpu_connector.py file was renamed to gpu_connectors.py to specifically house connector implementations. Helper functions for memory copy (lmcache_memcpy_async_h2d, lmcache_memcpy_async_d2h) were moved to lmcache/v1/gpu_connector/gpu_ops.py, and the need_gpu_interm_buffer utility was moved to lmcache/v1/gpu_connector/utils.py.
  • Improved vLLM Integration Utilities: Added new utility functions get_vllm_torch_dev and calculate_local_rank_and_world_size to lmcache/integration/vllm/utils.py. These functions streamline device detection and local rank calculation, especially for multi-node vLLM deployments, and are now utilized by the centralized GPU connector creation.
  • Simplified Manager Logic: The LMCacheManager in lmcache/v1/manager.py has been significantly simplified by removing internal methods (_get_device_info, _create_gpu_connector, _need_gpu_interm_buffer) and delegating their responsibilities to the new centralized CreateGPUConnector function and utility modules.
  • Updated Imports Across Project: Numerous files across the lmcache project, including sglang_adapter.py, cache_engine.py, multiprocess/server.py, standalone/__main__.py, xpu_connector.py, and various test files, have been updated to reflect the new module structure and import paths for GPU connector components.
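The centralized factory described in the highlights above can be sketched roughly as follows. This is an illustrative sketch only: the real implementation lives in lmcache/v1/gpu_connector/__init__.py and receives the engine's config and metadata objects, while the connector classes below are stand-in stubs, not LMCache's actual classes.

```python
# Stand-in stubs for the real connector classes in
# lmcache/v1/gpu_connector/gpu_connectors.py.
class SGLangGPUConnector:
    def __init__(self, config, metadata):
        self.config, self.metadata = config, metadata


class VLLMPagedMemGPUConnectorV2:
    def __init__(self, config, metadata):
        self.config, self.metadata = config, metadata


# One dispatch table replaces the scattered if/else instantiation logic.
_CONNECTOR_REGISTRY = {
    "sglang": SGLangGPUConnector,
    "vllm": VLLMPagedMemGPUConnectorV2,
}


def CreateGPUConnector(config, metadata, engine: str):
    """Instantiate the right GPU connector for the given serving engine."""
    try:
        connector_cls = _CONNECTOR_REGISTRY[engine]
    except KeyError:
        raise ValueError(f"Unsupported engine: {engine!r}")
    return connector_cls(config, metadata)


connector = CreateGPUConnector({}, {}, "sglang")
print(type(connector).__name__)  # SGLangGPUConnector
```

A registry keyed by engine name keeps each integration point (SGLang, vLLM) to a single call site instead of duplicating conditional construction logic.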


Changelog
  • lmcache/integration/sglang/sglang_adapter.py
    • Removed need_gpu_interm_buffer function (moved to lmcache/v1/gpu_connector/utils.py).
    • Removed direct torch.cuda.device and torch.device calls.
    • Replaced conditional SGLangGPUConnector instantiation with CreateGPUConnector call.
    • Updated imports to reflect new CreateGPUConnector function.
  • lmcache/integration/vllm/utils.py
    • Added VllmConfig to TYPE_CHECKING imports.
    • Introduced get_vllm_torch_dev function for platform-agnostic torch device retrieval.
    • Added calculate_local_rank_and_world_size function for computing local worker ID and world size in parallel environments.
  • lmcache/v1/cache_engine.py
    • Updated import path for GPUConnectorInterface and related GPU connector classes to lmcache.v1.gpu_connector.gpu_connectors.
  • lmcache/v1/gpu_connector/__init__.py
    • New file: Implemented CreateGPUConnector factory function to centralize GPU connector instantiation logic.
  • lmcache/v1/gpu_connector/gpu_connectors.py
    • Renamed from lmcache/v1/gpu_connector.py.
    • Removed lmcache_memcpy_async_h2d and lmcache_memcpy_async_d2h helper functions (moved to lmcache/v1/gpu_connector/gpu_ops.py).
    • Removed import of LazyMemoryAllocator.
  • lmcache/v1/gpu_connector/gpu_ops.py
    • New file: Contains lmcache_memcpy_async_h2d and lmcache_memcpy_async_d2h functions, moved from lmcache/v1/gpu_connector.py.
  • lmcache/v1/gpu_connector/utils.py
    • New file: Contains need_gpu_interm_buffer function, moved from lmcache/integration/sglang/sglang_adapter.py.
  • lmcache/v1/manager.py
    • Imported CreateGPUConnector from lmcache.v1.gpu_connector.
    • Imported calculate_local_rank_and_world_size from lmcache.integration.vllm.utils.
    • Removed _get_device_info, _create_gpu_connector, and _need_gpu_interm_buffer methods.
    • Updated _create_lmcache_engine to use new utility functions and CreateGPUConnector for vLLM GPU connector instantiation.
    • Removed import of vllm.platforms.current_platform.
  • lmcache/v1/multiprocess/server.py
    • Updated import paths for lmcache_memcpy_async_d2h and lmcache_memcpy_async_h2d to lmcache.v1.gpu_connector.gpu_ops.
  • lmcache/v1/standalone/__main__.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
  • lmcache/v1/xpu_connector.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
  • tests/disagg/test_nixl_cache_engine.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
  • tests/v1/test_gpu_connector.py
    • Updated import paths for various GPU connector classes to lmcache.v1.gpu_connector.gpu_connectors.
  • tests/v1/utils.py
    • Updated import path for VLLMPagedMemGPUConnectorV2 to lmcache.v1.gpu_connector.gpu_connectors.
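The local-rank computation added in lmcache/integration/vllm/utils.py might work along these lines. This is a hedged sketch under stated assumptions: the real calculate_local_rank_and_world_size reads vLLM's parallel configuration, whereas this version takes plain integers, and the modulo-based mapping is an assumption about how ranks are laid out across nodes.

```python
def calculate_local_rank_and_world_size(
    global_rank: int, world_size: int, gpus_per_node: int
) -> tuple[int, int]:
    """Map a global worker rank to its rank and world size on the local node.

    Assumes workers are packed onto nodes in rank order, so the local rank
    is the global rank modulo the number of workers per node.
    """
    local_world_size = min(world_size, gpus_per_node)
    local_rank = global_rank % local_world_size
    return local_rank, local_world_size


# e.g. worker 5 of 8, spread over two 4-GPU nodes, sits at local rank 1
print(calculate_local_rank_and_world_size(5, 8, 4))  # (1, 4)
```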
Activity
  • The pull request was opened by sammshen with the title "[Refactor]: add a centralized CreateGPUConnector()".

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a valuable refactoring by centralizing the GPU connector creation logic into a CreateGPUConnector factory function. This significantly improves code maintainability by removing duplicated logic from sglang_adapter.py and manager.py. The new gpu_connector package is well-structured, and the added utility for calculating local rank and world size enhances multi-node support for vLLM. I've identified one potential issue regarding XPU platform support for layerwise connectors and a minor opportunity for code simplification.

Comment thread lmcache/v1/gpu_connector/__init__.py
Comment thread lmcache/v1/gpu_connector/utils.py
@sammshen sammshen mentioned this pull request Feb 2, 2026
12 tasks
Samuel Shen added 2 commits February 2, 2026 22:44
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen sammshen added the full Run comprehensive tests on this PR label Feb 2, 2026
Comment thread lmcache/integration/vllm/utils.py Outdated
```python
    return torch_dev, dev_name


def calculate_local_rank_and_world_size(vllm_config: "VllmConfig"):
```
Contributor Author

Logic to find the local worker id.

```python
    dtype=kv_dtype,
    device=device,
)
gpu_connector = CreateGPUConnector(config, metadata, "sglang")
```
Contributor Author

A KV cache format field can remove the need to pass an engine string like "sglang" in the future.
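The comment above could be realized along these lines: if the connector choice is keyed by the KV cache format recorded in the engine metadata, the factory no longer needs an engine string at all. Everything in this sketch is hypothetical, not an LMCache API; the class-name strings stand in for real connector classes.

```python
from dataclasses import dataclass


@dataclass
class EngineMetadata:
    # Hypothetical metadata field; LMCache's actual metadata object differs.
    kv_cache_format: str  # e.g. "paged" or "layerwise"


def create_connector_by_format(metadata: EngineMetadata) -> str:
    """Dispatch on the KV cache format instead of an engine string.

    Returns a placeholder class name; a real factory would instantiate
    the corresponding connector class.
    """
    if metadata.kv_cache_format == "layerwise":
        return "SGLangLayerwiseGPUConnector"
    return "VLLMPagedMemGPUConnectorV2"


print(create_connector_by_format(EngineMetadata("layerwise")))
```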



```python
def calculate_local_rank_and_world_size(vllm_config: "VllmConfig"):
    """
```
Contributor

Update the docstring and the return type hint.

Comment thread lmcache/v1/standalone/__main__.py Outdated
```diff
 from lmcache.v1.config import LMCacheEngineConfig
 from lmcache.v1.config_base import parse_command_line_extra_params
-from lmcache.v1.gpu_connector import VLLMPagedMemGPUConnectorV2
+from lmcache.v1.gpu_connector.gpu_connectors import VLLMPagedMemGPUConnectorV2
```
Contributor

See if we can use CreateGPUConnector() in this file

Comment thread lmcache/v1/cache_engine.py Outdated
```diff
 from lmcache.v1.config import LMCacheEngineConfig
 from lmcache.v1.event_manager import EventManager, EventStatus, EventType
-from lmcache.v1.gpu_connector import (
+from lmcache.v1.gpu_connector.gpu_connectors import (
```
Contributor

See if we can use CreateGPUConnector here and avoid depending on which specific connector type it is.

Contributor

Move to the gpu_connector/ folder.

Contributor

@ApostaC ApostaC left a comment

Otherwise LGTM!

Samuel Shen added 3 commits February 3, 2026 19:56
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Collaborator

@DongDongJu DongDongJu left a comment

Hello @sammshen, thanks for your continued hard work on this refactoring.
I left one nit request and a few comments, but it generally looks good!

Comment thread lmcache/integration/vllm/utils.py
```python
        device=device,
    )
else:
    return SGLangGPUConnector(
```
Collaborator

I'm not sure this case still works with the actual integration in the sglang codebase.
@Oasis-Git, do you think removing this case and forcing layerwise mode would be too brutal?

Contributor Author

This logic has not changed at all.

Collaborator

Yes, I saw that. I meant that the SGLangGPUConnector integration is no longer valid, if I remember correctly,
so I was asking for opinions on removing it after this PR.

Comment thread lmcache/v1/manager.py
Comment thread lmcache/v1/manager.py
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen
Contributor Author

sammshen commented Feb 4, 2026

@hickeyma it seems like all the GitHub runners just broke.

Update: should be fixed by: #2543

Collaborator

@DongDongJu DongDongJu left a comment

LGTM. Thanks for the hard work!

Contributor

@ApostaC ApostaC left a comment

LGTM!

@sammshen sammshen merged commit e5f2854 into LMCache:dev Feb 5, 2026
24 checks passed
@hickeyma hickeyma mentioned this pull request Feb 6, 2026
2 tasks
DongDongJu pushed a commit to DongDongJu/LMCache that referenced this pull request Feb 22, 2026
* CreateGPUConnector()

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* move the import outside of TYPE_CHECKING

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix need_gpu_interm_buffer UT

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* address AspostaC's comments

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* move xpu connector

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
sammshen added a commit to sammshen/LMCache that referenced this pull request Mar 1, 2026
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
Signed-off-by: shaoxiawjc <wjc2800@163.com>

Labels

full Run comprehensive tests on this PR

3 participants