[Fix][Operator] Add privileged mode and nvidia runtime for GPU visibility#2749

Merged
ApostaC merged 14 commits into LMCache:dev from royyhuang:fix/operator-gpu-driver
Mar 27, 2026

Conversation


@royyhuang royyhuang commented Mar 11, 2026

Summary

  • Add runtimeClassName: nvidia and privileged: true to the DaemonSet pod spec so LMCache pods get GPU visibility via the NVIDIA container runtime without claiming GPUs through nvidia.com/gpu resource requests
  • Add NVIDIA_DRIVER_CAPABILITIES=all env var alongside existing NVIDIA_VISIBLE_DEVICES=all
  • Update DESIGN.md and README.md with the new GPU visibility requirements and security implications
  • [CI] Use job-level path filtering to unblock operator-only PRs: Move path filtering from workflow-level paths:/paths-ignore: triggers to job-level if: conditions using dorny/paths-filter. When jobs are skipped via if:, GitHub reports them as "Success" — unlike workflow-level paths: which causes required checks to never report, permanently blocking PRs that don't touch relevant files.

Why

LMCache needs access to all GPUs on the node for CUDA IPC and custom data transfer kernels. However, requesting GPUs via the device plugin (nvidia.com/gpu) would exclusively claim them, making them unavailable to the serving engine (e.g., vLLM). The combination of runtimeClassName: nvidia + privileged: true + NVIDIA env vars allows full GPU access without consuming device plugin resources.
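For concreteness, here is a minimal sketch of what the rendered pod spec looks like with these settings, written as a Kubernetes manifest (the container name and image are placeholders, and the surrounding DaemonSet fields are elided):

spec:
  template:
    spec:
      runtimeClassName: nvidia            # route the pod through the NVIDIA container runtime
      containers:
        - name: lmcache                   # placeholder name
          image: example/lmcache:latest   # placeholder image
          securityContext:
            privileged: true              # full device access; see the security notes in DESIGN.md
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"                # expose every GPU on the node
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: "all"                # inject the full set of driver libraries
          # Deliberately no nvidia.com/gpu entry under resources: the device
          # plugin keeps counting all GPUs as allocatable for the serving engine.

Because the pod makes no nvidia.com/gpu request, the scheduler and device plugin continue to treat every GPU as allocatable, so a co-located vLLM pod can still claim them.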

CI workflow changes

  • test.yml: Always triggers on PRs, but skips the test matrix when no Python files (**.py, pyproject.toml, requirements/**.txt) changed
  • code_quality_checks.yml: Always triggers on PRs, but skips pre-commit checks when only operator/ files changed
  • Uses dorny/paths-filter@v3 for path detection at the job level
  • Jobs skipped via if: report as "Success" to GitHub, so required status checks pass without running unnecessary CI (see the workflow sketch after this list)
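The resulting workflow structure looks roughly like the sketch below (job names and the placeholder test step are illustrative, not copied verbatim from test.yml):

jobs:
  changes:
    runs-on: ubuntu-latest
    # Expose the filter result so downstream jobs can gate on it.
    outputs:
      python: ${{ steps.filter.outputs.python }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            python:
              - '**.py'
              - 'pyproject.toml'
              - 'requirements/**.txt'

  test:
    needs: changes
    # A job skipped by this condition is reported as "Success", so required
    # status checks still pass on PRs that touch no Python files.
    if: needs.changes.outputs.python == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the real test matrix here"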

Test plan

  • Verified nvidia-smi shows all 8 GPUs in a test pod with this configuration (no nvidia.com/gpu request); a sketch of such a test pod follows this list
  • Verified the operator builds cleanly (go build ./...)
  • Deploy LMCacheEngine CR on a cluster with NVIDIA GPU Operator and verify pods start successfully
  • Verify vLLM serving engine can still claim all GPUs on the same node
  • Verify CI workflows trigger correctly on this PR and required checks pass
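The first item can be reproduced with a throwaway pod along these lines (a hedged sketch: the pod name and CUDA image tag are assumptions, not taken from this PR):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-visibility-test              # hypothetical name
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  containers:
    - name: smi
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA base image should do
      command: ["nvidia-smi"]            # should list every GPU on the node
      securityContext:
        privileged: true
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "all"
      # No resources block: the pod must not request nvidia.com/gpu.

kubectl logs gpu-visibility-test should then print the same GPU table as running nvidia-smi on the host.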

@gemini-code-assist

Summary of Changes

This pull request introduces changes to the LMCache operator to enable LMCache pods to access all GPUs on a node without consuming GPU resources through the Kubernetes device plugin. This is critical for scenarios requiring CUDA IPC and custom data transfer kernels while ensuring that serving engines like vLLM can still claim and utilize the GPUs. The implementation involves setting the NVIDIA container runtime, enabling privileged mode for the pods, and configuring specific NVIDIA environment variables, with corresponding updates to the project's documentation regarding these new requirements and their security considerations.

Highlights

  • Enabled GPU Visibility: Configured DaemonSet pods with "runtimeClassName: nvidia" and "privileged: true" to allow LMCache pods to see all GPUs without claiming them via "nvidia.com/gpu" resource requests.
  • Added NVIDIA Driver Capabilities: Injected "NVIDIA_DRIVER_CAPABILITIES=all" environment variable alongside "NVIDIA_VISIBLE_DEVICES=all" for full driver access.
  • Updated Documentation: Reflected the new GPU visibility requirements and associated security implications in DESIGN.md and README.md.

Changelog
  • operator/DESIGN.md
    • Documented the injection of "runtimeClassName: nvidia", "privileged: true", and "NVIDIA_DRIVER_CAPABILITIES=all" into the pod spec.
    • Expanded the security implications section to cover "privileged: true" and "hostIPC: true".
    • Updated the DaemonSet reconciliation logic to include injecting "runtimeClassName: nvidia" and "privileged: true".
  • operator/README.md
    • Added the NVIDIA GPU Operator as a prerequisite for deployment.
    • Included a note explaining the use of "runtimeClassName: nvidia" and "privileged: true" for GPU visibility without resource requests.
    • Updated the description of automatic operator handling to explicitly mention GPU visibility.
  • operator/internal/resources/daemonset.go
    • Introduced variables for "nvidiaRuntime" and "privileged" mode.
    • Added the "NVIDIA_DRIVER_CAPABILITIES=all" environment variable to the container specification.
    • Configured the PodSpec to use "runtimeClassName: nvidia".
    • Set the container's SecurityContext to "privileged: true".
Activity
  • Verified "nvidia-smi" shows all 8 GPUs in a test pod with this configuration (no "nvidia.com/gpu" request).
  • Verified the operator builds cleanly ("go build ./...").
  • Pending verification: Deploy LMCacheEngine CR on a cluster with NVIDIA GPU Operator and verify pods start successfully.
  • Pending verification: Verify vLLM serving engine can still claim all GPUs on the same node.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for running LMCache pods with GPU visibility without consuming GPU resources from the Kubernetes device plugin. This is achieved by setting runtimeClassName: nvidia, privileged: true, and the necessary NVIDIA environment variables on the DaemonSet pods. The changes are well-implemented and correctly apply the required settings to the pod specification. The documentation in DESIGN.md and README.md has been updated accordingly to reflect these new requirements and their security implications, which is a great addition for users. I have one minor suggestion to improve code conciseness in daemonset.go.

Comment on lines +37 to +38
nvidiaRuntime := "nvidia"
privileged := true

Severity: medium

These local variables are only used to get pointers to a string and a boolean. To make the code more concise, you could use a helper function. Since you are already using Go generics in your tests (ptr[T any]), you could move that helper to a shared package (e.g., internal/resources/helpers.go) and use it directly at the call sites.

For example, with an exported Ptr[T any](v T) *T helper, you could change the call sites as follows and remove these variables:

// at line 155
RuntimeClassName:   resources.Ptr("nvidia"),

// at line 177
Privileged: resources.Ptr(true),

Alternatively, you could use the standard k8s.io/utils/pointer package which provides pointer.String() and pointer.Bool() for this exact purpose.

@royyhuang royyhuang changed the title [Operator] Add privileged mode and nvidia runtime for GPU visibility [Fix][Operator] Add privileged mode and nvidia runtime for GPU visibility Mar 12, 2026

@ruizhang0101 ruizhang0101 left a comment


LGTM :)))


@sammshen sammshen left a comment


LGTM!


@ApostaC ApostaC left a comment


LGTM!

@ApostaC ApostaC enabled auto-merge (squash) March 14, 2026 00:29
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 14, 2026

LMCache pods need access to all GPUs on the node for CUDA IPC and
custom data transfer kernels, but must not claim GPUs via nvidia.com/gpu
resource requests (otherwise the serving engine loses access to them).

Add runtimeClassName: nvidia, privileged: true, and NVIDIA env vars to
the DaemonSet pod spec so the NVIDIA container runtime injects driver
libraries and exposes all GPUs without consuming device plugin resources.

Update DESIGN.md and README.md with the new requirements.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

Update TestBuildDaemonSet_CustomEnvAndVolumes to expect 4 env vars
instead of 3, accounting for the new NVIDIA_DRIVER_CAPABILITIES=all
env var added in the previous commit.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

@royyhuang royyhuang force-pushed the fix/operator-gpu-driver branch from e72afdd to 45664d0 on March 16, 2026 22:17

royyhuang and others added 5 commits March 18, 2026 14:07

Move path filtering from workflow-level triggers to job-level `if:`
conditions using dorny/paths-filter. When jobs are skipped via `if:`,
GitHub reports them as "Success" — unlike workflow-level `paths:` which
causes required checks to never report, permanently blocking PRs.

- test.yml: skip test matrix when no Python files changed
- code_quality_checks.yml: skip pre-commit when only operator files changed

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

@royyhuang royyhuang disabled auto-merge March 20, 2026 20:55
@github-actions github-actions Bot removed the full Run comprehensive tests on this PR label Mar 20, 2026
royyhuang and others added 4 commits March 24, 2026 14:34

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

# Conflicts:
#	.github/workflows/code_quality_checks.yml
#	.github/workflows/test.yml
@ApostaC ApostaC enabled auto-merge (squash) March 25, 2026 18:22
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 25, 2026
@ApostaC ApostaC merged commit 6e02b91 into LMCache:dev Mar 27, 2026
25 checks passed
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…lity (LMCache#2749)

* [Operator] Add privileged mode and nvidia runtime for GPU visibility

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>