
[Platform]: Add Intel Gaudi (HPU) Support #2822

Merged
DongDongJu merged 3 commits into LMCache:dev from hlin99:ww12_PR_HPU on Mar 21, 2026

Conversation

@hlin99
Contributor

@hlin99 hlin99 commented Mar 19, 2026

Add HPU connector to support Intel Gaudi Platform

Signed-off-by: Tony Lin <tony.lin@intel.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates support for Intel Gaudi (HPU) accelerators into the LMCache system. It enables the LMCache engine to detect, utilize, and manage memory on HPU devices, expanding the range of hardware platforms compatible with the system. The changes primarily involve adding HPU-specific device detection, information retrieval, and a new GPU connector to facilitate efficient KV cache operations on Gaudi.

Highlights

  • Intel Gaudi (HPU) Support: Added comprehensive support for Intel Gaudi (HPU) devices, allowing LMCache to leverage these accelerators for its engine operations.
  • HPU Device Detection: Implemented logic to detect and utilize HPU devices within the LMCache engine, ensuring proper device selection for computations.
  • HPU GPU Connector: Introduced a dedicated GPU connector for HPU, VLLMPagedMemHPUConnectorV2, which handles memory management and data transfer for KV caches on Gaudi hardware.
  • GPU Information Retrieval: Extended the system's ability to retrieve detailed GPU information to include Intel Gaudi devices, providing insights into their properties and memory.
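The device-detection code itself does not appear on this page. As a rough, hypothetical sketch of the probe-and-fall-back pattern the highlights describe, consider the following; the function name, the probe order, and the torch-like argument are all illustrative assumptions, not LMCache's actual implementation:

```python
from types import SimpleNamespace

def detect_device(torch_mod) -> str:
    """Return the first available accelerator namespace on a
    torch-like module, falling back to plain CPU."""
    for name in ("cuda", "hpu", "xpu"):  # probe order is an assumption
        backend = getattr(torch_mod, name, None)
        if backend is not None and backend.is_available():
            return name
    return "cpu"

# Simulate a Gaudi-only environment with stub namespaces rather than
# importing torch, so the sketch runs anywhere.
fake_torch = SimpleNamespace(hpu=SimpleNamespace(is_available=lambda: True))
print(detect_device(fake_torch))  # hpu
```

The stub-based call at the end shows the intended behavior: with only an `hpu` namespace reporting availability, the Gaudi path is selected; with no accelerator namespaces at all, the function falls back to `"cpu"`.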

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for Intel Gaudi (HPU) by adding a new HPU connector and integrating it into the device detection and GPU information retrieval logic. The changes are well-structured, following existing patterns for XPU support. The new VLLMPagedMemHPUConnectorV2 class provides the necessary to_gpu and from_gpu functionalities using HPU-specific PyTorch operations. However, the get_shape method in the new connector is not implemented, which is a critical omission for a class inheriting from an abstract base class.
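The missing `get_shape` the review flags is more than a style issue: if the base class declares it abstract, Python will refuse to instantiate any subclass that skips it. A minimal, self-contained illustration follows; the class names and method signatures are stand-ins, not LMCache's actual interface:

```python
from abc import ABC, abstractmethod

class GPUConnectorInterface(ABC):
    """Stand-in for LMCache's connector base class."""

    @abstractmethod
    def to_gpu(self, memory_obj, start, end): ...

    @abstractmethod
    def from_gpu(self, memory_obj, start, end): ...

    @abstractmethod
    def get_shape(self, num_tokens): ...

class IncompleteHPUConnector(GPUConnectorInterface):
    """Implements the transfer hooks but forgets get_shape."""

    def to_gpu(self, memory_obj, start, end):
        pass

    def from_gpu(self, memory_obj, start, end):
        pass

try:
    IncompleteHPUConnector()
except TypeError as exc:
    # Python refuses instantiation and names the missing method.
    print(f"refused: {exc}")
```

This is why the omission is "critical" rather than cosmetic: the connector class would fail at construction time, before any HPU transfer code ever runs.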

@hlin99 hlin99 changed the title from "feat: add Intel Gaudi (HPU) support" to "[Platform]: add Intel Gaudi (HPU) support" Mar 19, 2026
@hlin99 hlin99 changed the title from "[Platform]: add Intel Gaudi (HPU) support" to "[Platform]: Add Intel Gaudi (HPU) support" Mar 19, 2026
@hlin99 hlin99 changed the title from "[Platform]: Add Intel Gaudi (HPU) support" to "[Platform]: Add Intel Gaudi (HPU) Support" Mar 19, 2026
@hlin99
Contributor Author

hlin99 commented Mar 19, 2026

hi @ApostaC @sammshen @YaoJiayi @DongDongJu

This is a rework of #1066.

The previous implementation was overly complex. I've made several refactoring passes on both the vLLM and LMCache sides over the past months to ensure that today's HPU connector in LMCache is clean and straightforward, similar in structure to the GPU connector.

HPU is an ASIC accelerator, and this PR demonstrates that LMCache's architecture-level design is not only compatible with GPUs, but also extensible to other ASIC processing units. Looking ahead, I hope this opens the door to a broader LMCache ecosystem supporting hardware from diverse vendors — if that aligns with the LMCache project's vision.

I'd greatly appreciate it if the maintainers could take 5 minutes to review this. Thank you! 😊
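As a sketch of what "extensible to other ASIC processing units" can mean in practice, here is a hypothetical registry pattern in which each platform contributes one connector class and one registry entry. Only `VLLMPagedMemHPUConnectorV2` is a name taken from this PR; the registry, the decorator, and the GPU class name are illustrative assumptions, not LMCache's actual wiring:

```python
CONNECTOR_REGISTRY = {}

def register_connector(device_type):
    """Class decorator that maps a device string to a connector class."""
    def wrap(cls):
        CONNECTOR_REGISTRY[device_type] = cls
        return cls
    return wrap

@register_connector("cuda")
class VLLMPagedMemGPUConnectorV2:
    """Illustrative stand-in for the existing GPU connector."""

@register_connector("hpu")
class VLLMPagedMemHPUConnectorV2:
    """The connector class this PR adds; body elided here."""

def create_connector(device_type):
    """Supporting a new accelerator means one new class plus one entry."""
    return CONNECTOR_REGISTRY[device_type]()

print(type(create_connector("hpu")).__name__)  # VLLMPagedMemHPUConnectorV2
```

Under a scheme like this, a new vendor's hardware needs only a connector class implementing the shared interface and a single registration, which is the architecture-level extensibility the comment argues for.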

@hlin99
Contributor Author

hlin99 commented Mar 19, 2026

++ @maobaolong @deng451e

hlin99 added 2 commits March 20, 2026 11:01
Signed-off-by: Tony Lin <tony.lin@intel.com>
@hlin99 hlin99 requested a review from sammshen March 21, 2026 02:21
Contributor

@sammshen sammshen left a comment


LGTM! Cleanups can come later.

Collaborator

@DongDongJu DongDongJu left a comment


much cleaner than before. Thanks for the work.
LGTM.

@DongDongJu DongDongJu enabled auto-merge (squash) March 21, 2026 14:04
@github-actions github-actions Bot added the "full" label (Run comprehensive tests on this PR) Mar 21, 2026
@DongDongJu DongDongJu merged commit 4406119 into LMCache:dev Mar 21, 2026
35 of 36 checks passed
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
* feat: add Intel Gaudi (HPU) support

Signed-off-by: Tony Lin <tony.lin@intel.com>

* address review comments

Signed-off-by: Tony Lin <tony.lin@intel.com>

---------

Signed-off-by: Tony Lin <tony.lin@intel.com>
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
* feat: add Intel Gaudi (HPU) support

Signed-off-by: Tony Lin <tony.lin@intel.com>

* address review comments

Signed-off-by: Tony Lin <tony.lin@intel.com>

---------

Signed-off-by: Tony Lin <tony.lin@intel.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
* feat: add Intel Gaudi (HPU) support

Signed-off-by: Tony Lin <tony.lin@intel.com>

* address review comments

Signed-off-by: Tony Lin <tony.lin@intel.com>

---------

Signed-off-by: Tony Lin <tony.lin@intel.com>
@hlin99 hlin99 deleted the ww12_PR_HPU branch April 25, 2026 05:30