Ensure that IPC sockets are not mounted read-only #1593
Merged
elezar merged 1 commit into NVIDIA:main on Jan 20, 2026
IPC sockets (nvidia-persistenced, nvidia-fabricmanager, nvidia-mps) no longer include the "ro" mount option. This matches the behavior of libnvidia-container and allows nested container runtimes like enroot to bind-mount these sockets.
Signed-off-by: Fagani Hajizada <fhajizada@nvidia.com>
Member
/cherry-pick release-1.18
Pull Request Test Coverage Report for Build 21167707185 (Coveralls)
Member
/ok-to-test fb15d14
🤖 Backport PR created for
Problem
CDI spec generation mounts IPC sockets (`nvidia-persistenced`, `nvidia-fabricmanager`, ...) with the `ro` (read-only) mount option. This breaks nested container runtimes like enroot/pyxis that need to bind-mount these sockets into containers.
When we try to run a Slurm job with enroot/pyxis on K8s:
Root Cause
The `ro` option is inherited from the default mount options in `mounts.go`, but IPC sockets should not be read-only. This is inconsistent with `libnvidia-container`, which does not use `MS_RDONLY` for IPC mounts (reference).
Fix
Define IPC-specific mount options in `ipc.go` that exclude `ro`, matching `libnvidia-container` behavior:
Testing
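As a rough illustration of the fix and how it could be checked, here is a minimal Go sketch. It is not the diff from this PR: the function names and the exact option lists are assumptions made for the example. Only the absence of `ro` from the IPC options reflects the description above.

```go
package main

import "fmt"

// defaultMountOptions stands in for the defaults defined in mounts.go.
// Assumption: the real list may contain different or additional options;
// the only property taken from the PR description is that it includes "ro".
func defaultMountOptions() []string {
	return []string{"ro", "nosuid", "nodev", "bind"}
}

// ipcMountOptions is a hypothetical IPC-specific list for ipc.go.
// It is the same as the defaults above, minus "ro", so that nested
// runtimes can bind-mount the sockets again.
func ipcMountOptions() []string {
	return []string{"nosuid", "nodev", "bind"}
}

// hasOption reports whether want appears in opts.
func hasOption(opts []string, want string) bool {
	for _, o := range opts {
		if o == want {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("default options contain ro:", hasOption(defaultMountOptions(), "ro"))
	fmt.Println("IPC options contain ro:    ", hasOption(ipcMountOptions(), "ro"))
}
```

A unit test along these lines would assert that no IPC mount in the generated CDI spec carries `ro`, while regular file mounts keep it.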
Additional Context
I tested two AWS EKS clusters with identical GPU Operator versions:
`nvidia-persistenced` runs inside the driver container. The socket is part of the overlay filesystem and it works fine.
tmpfs (ro)
This is critical to support Slurm on K8s (https://github.com/SlinkyProject/slurm-operator) with enroot/pyxis.
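To see which of the two cases a container falls into, the per-mount options of the socket can be read from mountinfo. The following Go sketch parses mountinfo-formatted text and reports whether a given mount point is read-only. The sample entries and the socket path are made up for illustration and were not captured from the clusters described above.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleMountinfo is illustrative data in /proc/<pid>/mountinfo format.
// Assumption: the IDs, paths and options are invented for this example.
const sampleMountinfo = `1150 1100 0:120 / / rw,relatime - overlay overlay rw
1201 1150 0:25 /nvidia-persistenced/socket /run/nvidia-persistenced/socket ro,nosuid,nodev - tmpfs tmpfs rw,mode=755`

// mountOptionsFor returns the per-mount options (6th field) of the last
// entry whose mount point (5th field) equals target.
func mountOptionsFor(mountinfo, target string) ([]string, bool) {
	var opts []string
	found := false
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 6 {
			continue // not a complete mountinfo entry
		}
		if fields[4] == target {
			opts = strings.Split(fields[5], ",")
			found = true // later entries shadow earlier ones
		}
	}
	return opts, found
}

// isReadOnly reports whether the option list contains "ro".
func isReadOnly(opts []string) bool {
	for _, o := range opts {
		if o == "ro" {
			return true
		}
	}
	return false
}

func main() {
	target := "/run/nvidia-persistenced/socket" // assumed socket path
	opts, ok := mountOptionsFor(sampleMountinfo, target)
	if !ok {
		fmt.Println("no mount entry for", target)
		return
	}
	fmt.Println("options:", opts, "read-only:", isReadOnly(opts))
}
```

In a real container, the contents of `/proc/self/mountinfo` could be passed in place of the sample string.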