[Fix][Operator] Add privileged mode and nvidia runtime for GPU visibility#2749

Merged
ApostaC merged 14 commits into LMCache:dev from royyhuang:fix/operator-gpu-driver
Mar 27, 2026

Conversation


@royyhuang royyhuang commented Mar 11, 2026

Summary

  • Add runtimeClassName: nvidia and privileged: true to the DaemonSet pod spec so LMCache pods get GPU visibility via the NVIDIA container runtime without claiming GPUs through nvidia.com/gpu resource requests
  • Add NVIDIA_DRIVER_CAPABILITIES=all env var alongside existing NVIDIA_VISIBLE_DEVICES=all
  • Update DESIGN.md and README.md with the new GPU visibility requirements and security implications
  • [CI] Use job-level path filtering to unblock operator-only PRs: Move path filtering from workflow-level paths:/paths-ignore: triggers to job-level if: conditions using dorny/paths-filter. When jobs are skipped via if:, GitHub reports them as "Success" — unlike workflow-level paths: which causes required checks to never report, permanently blocking PRs that don't touch relevant files.

Why

LMCache needs access to all GPUs on the node for CUDA IPC and custom data transfer kernels. However, requesting GPUs via the device plugin (nvidia.com/gpu) would exclusively claim them, making them unavailable to the serving engine (e.g., vLLM). The combination of runtimeClassName: nvidia + privileged: true + NVIDIA env vars allows full GPU access without consuming device plugin resources.
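For concreteness, here is a minimal sketch of what the rendered pod spec looks like with these settings, written as a Kubernetes manifest (the container name and image are placeholders, and the surrounding DaemonSet fields are elided):

spec:
  template:
    spec:
      runtimeClassName: nvidia            # route the pod through the NVIDIA container runtime
      containers:
        - name: lmcache                   # placeholder name
          image: example/lmcache:latest   # placeholder image
          securityContext:
            privileged: true              # full device access; see the security notes in DESIGN.md
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"                # expose every GPU on the node
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: "all"                # inject the full set of driver libraries
          # Deliberately no nvidia.com/gpu entry under resources: the device
          # plugin keeps counting all GPUs as allocatable for the serving engine.

Because the pod makes no nvidia.com/gpu request, the scheduler and device plugin continue to treat every GPU as allocatable, so a co-located vLLM pod can still claim them.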

CI workflow changes

  • test.yml: Always triggers on PRs, but skips the test matrix when no Python files (**.py, pyproject.toml, requirements/**.txt) changed
  • code_quality_checks.yml: Always triggers on PRs, but skips pre-commit checks when only operator/ files changed
  • Uses dorny/paths-filter@v3 for path detection at the job level
  • Jobs skipped via if: report as "Success" to GitHub, so required status checks pass without running unnecessary CI (see the workflow sketch after this list)
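The resulting workflow structure looks roughly like the sketch below (job names and the placeholder test step are illustrative, not copied verbatim from test.yml):

jobs:
  changes:
    runs-on: ubuntu-latest
    # Expose the filter result so downstream jobs can gate on it.
    outputs:
      python: ${{ steps.filter.outputs.python }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            python:
              - '**.py'
              - 'pyproject.toml'
              - 'requirements/**.txt'

  test:
    needs: changes
    # A job skipped by this condition is reported as "Success", so required
    # status checks still pass on PRs that touch no Python files.
    if: needs.changes.outputs.python == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the real test matrix here"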

Test plan

  • Verified nvidia-smi shows all 8 GPUs in a test pod with this configuration (no nvidia.com/gpu request); a sketch of such a test pod follows this list
  • Verified the operator builds cleanly (go build ./...)
  • Deploy LMCacheEngine CR on a cluster with NVIDIA GPU Operator and verify pods start successfully
  • Verify vLLM serving engine can still claim all GPUs on the same node
  • Verify CI workflows trigger correctly on this PR and required checks pass
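The first item can be reproduced with a throwaway pod along these lines (a hedged sketch: the pod name and CUDA image tag are assumptions, not taken from this PR):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-visibility-test              # hypothetical name
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  containers:
    - name: smi
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA base image should do
      command: ["nvidia-smi"]            # should list every GPU on the node
      securityContext:
        privileged: true
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "all"
      # No resources block: the pod must not request nvidia.com/gpu.

kubectl logs gpu-visibility-test should then print the same GPU table as running nvidia-smi on the host.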

@gemini-code-assist

Summary of Changes

This pull request introduces changes to the LMCache operator to enable LMCache pods to access all GPUs on a node without consuming GPU resources through the Kubernetes device plugin. This is critical for scenarios requiring CUDA IPC and custom data transfer kernels while ensuring that serving engines like vLLM can still claim and utilize the GPUs. The implementation involves setting the NVIDIA container runtime, enabling privileged mode for the pods, and configuring specific NVIDIA environment variables, with corresponding updates to the project's documentation regarding these new requirements and their security considerations.

Highlights

  • Enabled GPU Visibility: Configured DaemonSet pods with "runtimeClassName: nvidia" and "privileged: true" to allow LMCache pods to see all GPUs without claiming them via "nvidia.com/gpu" resource requests.
  • Added NVIDIA Driver Capabilities: Injected "NVIDIA_DRIVER_CAPABILITIES=all" environment variable alongside "NVIDIA_VISIBLE_DEVICES=all" for full driver access.
  • Updated Documentation: Reflected the new GPU visibility requirements and associated security implications in DESIGN.md and README.md.

Changelog
  • operator/DESIGN.md
    • Documented the injection of "runtimeClassName: nvidia", "privileged: true", and "NVIDIA_DRIVER_CAPABILITIES=all" into the pod spec.
    • Expanded the security implications section to cover "privileged: true" and "hostIPC: true".
    • Updated the DaemonSet reconciliation logic to include injecting "runtimeClassName: nvidia" and "privileged: true".
  • operator/README.md
    • Added the NVIDIA GPU Operator as a prerequisite for deployment.
    • Included a note explaining the use of "runtimeClassName: nvidia" and "privileged: true" for GPU visibility without resource requests.
    • Updated the description of automatic operator handling to explicitly mention GPU visibility.
  • operator/internal/resources/daemonset.go
    • Introduced variables for "nvidiaRuntime" and "privileged" mode.
    • Added the "NVIDIA_DRIVER_CAPABILITIES=all" environment variable to the container specification.
    • Configured the PodSpec to use "runtimeClassName: nvidia".
    • Set the container's SecurityContext to "privileged: true".
Activity
  • Verified "nvidia-smi" shows all 8 GPUs in a test pod with this configuration (no "nvidia.com/gpu" request).
  • Verified the operator builds cleanly ("go build ./...").
  • Pending verification: Deploy LMCacheEngine CR on a cluster with NVIDIA GPU Operator and verify pods start successfully.
  • Pending verification: Verify vLLM serving engine can still claim all GPUs on the same node.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for running LMCache pods with GPU visibility without consuming GPU resources from the Kubernetes device plugin. This is achieved by setting runtimeClassName: nvidia, privileged: true, and the necessary NVIDIA environment variables on the DaemonSet pods. The changes are well-implemented and correctly apply the required settings to the pod specification. The documentation in DESIGN.md and README.md has been updated accordingly to reflect these new requirements and their security implications, which is a great addition for users. I have one minor suggestion to improve code conciseness in daemonset.go.

Comment on lines +37 to +38
nvidiaRuntime := "nvidia"
privileged := true

Severity: medium

These local variables are only used to get pointers to a string and a boolean. To make the code more concise, you could use a helper function. Since you are already using Go generics in your tests (ptr[T any]), you could move that helper to a shared package (e.g., internal/resources/helpers.go) and use it directly at the call sites.

For example, with an exported Ptr[T any](v T) *T helper, you could change the call sites as follows and remove these variables:

// at line 155
RuntimeClassName:   resources.Ptr("nvidia"),

// at line 177
Privileged: resources.Ptr(true),

Alternatively, you could use the standard k8s.io/utils/pointer package which provides pointer.String() and pointer.Bool() for this exact purpose.

@royyhuang royyhuang changed the title [Operator] Add privileged mode and nvidia runtime for GPU visibility [Fix][Operator] Add privileged mode and nvidia runtime for GPU visibility Mar 12, 2026

@ruizhang0101 ruizhang0101 left a comment


LGTM :)))


@sammshen sammshen left a comment


LGTM!


@ApostaC ApostaC left a comment


LGTM!

@ApostaC ApostaC enabled auto-merge (squash) March 14, 2026 00:29
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 14, 2026

LMCache pods need access to all GPUs on the node for CUDA IPC and
custom data transfer kernels, but must not claim GPUs via nvidia.com/gpu
resource requests (otherwise the serving engine loses access to them).

Add runtimeClassName: nvidia, privileged: true, and NVIDIA env vars to
the DaemonSet pod spec so the NVIDIA container runtime injects driver
libraries and exposes all GPUs without consuming device plugin resources.

Update DESIGN.md and README.md with the new requirements.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

Update TestBuildDaemonSet_CustomEnvAndVolumes to expect 4 env vars
instead of 3, accounting for the new NVIDIA_DRIVER_CAPABILITIES=all
env var added in the previous commit.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

@royyhuang royyhuang force-pushed the fix/operator-gpu-driver branch from e72afdd to 45664d0 on March 16, 2026 22:17

royyhuang and others added 5 commits March 18, 2026 14:07

Move path filtering from workflow-level triggers to job-level `if:`
conditions using dorny/paths-filter. When jobs are skipped via `if:`,
GitHub reports them as "Success" — unlike workflow-level `paths:` which
causes required checks to never report, permanently blocking PRs.

- test.yml: skip test matrix when no Python files changed
- code_quality_checks.yml: skip pre-commit when only operator files changed

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

@royyhuang royyhuang disabled auto-merge March 20, 2026 20:55
@github-actions github-actions Bot removed the full Run comprehensive tests on this PR label Mar 20, 2026
royyhuang and others added 4 commits March 24, 2026 14:34

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

# Conflicts:
#	.github/workflows/code_quality_checks.yml
#	.github/workflows/test.yml
@ApostaC ApostaC enabled auto-merge (squash) March 25, 2026 18:22
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 25, 2026
@ApostaC ApostaC merged commit 6e02b91 into LMCache:dev Mar 27, 2026
25 checks passed
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…lity (LMCache#2749)

* [Operator] Add privileged mode and nvidia runtime for GPU visibility

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>