
Remove vllm deps from lookup client/server #2376

Merged
chunxiaozheng merged 4 commits into LMCache:dev from maobaolong:removeVllmFromLookupClient
Jan 12, 2026

Conversation

@maobaolong
Collaborator

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

@gemini-code-assist
Contributor

Summary of Changes

Hello @maobaolong, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to enhance the modularity and reduce the coupling of the LMCache lookup client and server with the vLLM framework. By extracting vLLM-specific configuration parameters into the LMCacheEngineMetadata object, the lookup components can now operate with a more generalized set of metadata, making them more independent and potentially reusable in different contexts. This change primarily involves updating function signatures and parameter passing across several files related to lookup client/server creation and RPC utility functions.

Highlights

  • Decoupling from vLLM Config: The lookup client and server components no longer directly depend on vllm_config, promoting greater modularity.
  • Centralized Metadata for RPC: Essential configuration details (like engine_id, num_ranks, kv_connector_extra_config) are now encapsulated and passed via the LMCacheEngineMetadata object.
  • Refactored RPC Path Generation: The get_zmq_rpc_path_lmcache utility function has been updated to accept engine_id directly, further reducing vLLM specific dependencies and improving clarity.
  • Cleanup of Type Hinting: Unnecessary TYPE_CHECKING imports related to vllm.config have been removed from several files, simplifying the codebase and reducing potential import issues.
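To make the highlights above concrete, here is a minimal sketch of what the extended metadata object might look like. Only `engine_id`, `num_ranks`, and `kv_connector_extra_config` are named in this PR's summary; everything else (the dataclass shape, defaults) is an assumption for illustration, not the actual LMCache definition.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical sketch of the extended LMCacheEngineMetadata described above.
# Only engine_id, num_ranks, and kv_connector_extra_config come from the PR
# summary; the dataclass layout and defaults are illustrative assumptions.
@dataclass
class LMCacheEngineMetadata:
    engine_id: str
    num_ranks: int
    kv_connector_extra_config: Dict[str, Any] = field(default_factory=dict)

# The lookup client/server can now be constructed from metadata alone,
# without importing vllm.config anywhere in the lookup code path.
meta = LMCacheEngineMetadata(engine_id="engine-0", num_ranks=2)
print(meta.engine_id, meta.num_ranks)
```

Passing one plain metadata object, rather than the whole `vllm_config`, is what lets the lookup components stay importable without vLLM installed.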


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request effectively decouples the lookup client and server from vllm dependencies by introducing new fields in LMCacheEngineMetadata and passing this object instead of the vllm_config. This is a great refactoring that improves modularity and maintainability. The changes are applied consistently across all relevant files. I also noticed and appreciate that a bug in LMCacheAsyncLookupServer.close() has been fixed and logging statements have been updated to use %-style formatting, which is a good practice for performance. I've found one minor issue with a misleading error message and have left a comment with a suggestion to fix it. Overall, this is a high-quality contribution.
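The two mechanics the review praises can be sketched together: an RPC-path helper that takes a plain `engine_id` instead of a `vllm_config`, and %-style (lazy) log formatting. This is a hypothetical sketch, not the actual code in lmcache/v1/rpc_utils.py; the path template and the `LMCACHE_RPC_DIR` environment variable are assumptions for illustration.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical sketch of the refactored helper: it derives the ZMQ socket
# path from a plain engine_id string, so no vllm_config is needed.
def get_zmq_rpc_path_lmcache(engine_id: str, rpc_port: int = 0) -> str:
    base = os.environ.get("LMCACHE_RPC_DIR", "/tmp")  # assumed env var
    path = f"ipc://{base}/lmcache_rpc_{engine_id}_{rpc_port}"
    # %-style logging defers string formatting until the record is actually
    # emitted, which is the performance point the review mentions.
    logger.debug("ZMQ RPC path for engine %s: %s", engine_id, path)
    return path

print(get_zmq_rpc_path_lmcache("engine-0"))
```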

Comment thread lmcache/v1/rpc_utils.py
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@maobaolong maobaolong force-pushed the removeVllmFromLookupClient branch from ba54e88 to ca94d41 on January 11, 2026 05:47
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@maobaolong
Collaborator Author

@sammshen Would you like to take a look at this PR? Thanks!

Comment thread lmcache/integration/vllm/utils.py Outdated
head_size = model_cfg.get_head_size()
kv_shape = (num_layer, 1 if use_mla else 2, chunk_size, num_kv_head, head_size)

# Extract engine_id from vllm_config if available
Contributor


In what cases is the engine_id unavailable?

Collaborator Author


engine_id was introduced by vllm-project/vllm#17751; before that PR, there was no engine_id within KVTransferConfig.

Contributor

@sammshen sammshen left a comment


LGTM! This is great!

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@maobaolong maobaolong added the full Run comprehensive tests on this PR label Jan 12, 2026
Collaborator

@chunxiaozheng chunxiaozheng left a comment


lgtm

@chunxiaozheng chunxiaozheng merged commit ff2b40e into LMCache:dev Jan 12, 2026
25 of 26 checks passed
DongDongJu pushed a commit to DongDongJu/LMCache that referenced this pull request Feb 22, 2026
* Remove vllm deps from lookup client/server

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* fix

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* fix

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* fix

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
sammshen pushed a commit to sammshen/LMCache that referenced this pull request Mar 1, 2026
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026

Labels

full Run comprehensive tests on this PR


3 participants