fix: Set device node GID in CDI specs#1631
Merged
elezar merged 1 commit intoNVIDIA:mainfrom Feb 6, 2026
Merged
Conversation
Pull Request Test Coverage Report for Build 21756623430Details
💛 - Coveralls |
Member
Author
|
/cherry-pick release-1.18 |
70be01b to
62f0561
Compare
jgehrcke
reviewed
Feb 6, 2026
| Minor: dn.Minor, | ||
| FileMode: &dn.FileMode, | ||
| Permissions: string(dn.Permissions), | ||
| GID: ptrIfNonZero(dn.Gid), |
Collaborator
There was a problem hiding this comment.
Looking at this, I wonder which other properties on dn there are that we currently don't use within this new DeviceNode{} construction.
Member
Author
There was a problem hiding this comment.
The expanded Device type is:
type Device struct {
Rule struct {
// Type of device ('c' for char, 'b' for block). If set to 'a', this rule
// acts as a wildcard and all fields other than Allow are ignored.
Type Type `json:"type"`
// Major is the device's major number.
Major int64 `json:"major"`
// Minor is the device's minor number.
Minor int64 `json:"minor"`
// Permissions is the set of permissions that this rule applies to (in the
// cgroupv1 format -- any combination of "rwm").
Permissions Permissions `json:"permissions"`
// Allow specifies whether this rule is allowed.
Allow bool `json:"allow"`
}
// Path to the device.
Path string `json:"path"`
// FileMode permission bits for the device.
FileMode os.FileMode `json:"file_mode"`
// Uid of the device.
Uid uint32 `json:"uid,omitempty"` //nolint:revive // Suppress "var-naming: struct field Uid should be UID".
// Gid of the device.
Gid uint32 `json:"gid,omitempty"` //nolint:revive // Suppress "var-naming: struct field Gid should be GID".
}
Meaning that we don't use:
Allow: We are specifying device nodes and NOT cgroup rules directly and such this is out of scope.Type: We don't explicilty specify this, but could since we always inject character devices.UID: I elected to not include the UID since this should depend on the user in the contianer (if anything) and not the UID on the host. Note that the CDI package has a special case for whenUIDis not set: https://github.com/cncf-tags/container-device-interface/blob/e2632194760242fc74a30c3803107f9c1ba5718b/pkg/cdi/container-edits.go#L96-L100 and I didn't want to inadvertently interfere with this.
I can add a more complete explaination if you like?
Member
Author
There was a problem hiding this comment.
Added a more meaningful comment.
This change ensures that the GID for a device node is set in a generated CDI specification. This is important when a device node allows users of the non-root group to access the device node -- especially when running in non-root containers. Signed-off-by: Evan Lezar <elezar@nvidia.com>
62f0561 to
b4db92f
Compare
|
🤖 Backport PR created for |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change ensures that the GID for a device node is set in a generated CDI specification. This is important when a device node allows users of the non-root group to access the device node -- especially when running in non-root containers.
Testing:
On the host:
(where
44is the GID of thevideogroup)Without this change we have:
(indicating that the
/dev/nvmaphas the GID0associated with it).Running
nvidia-smishows the following errors (and memory is not reported):With this change:
(showing a
GIDof44matching the host):and then:
Note that the injection of additional GIDs (removing the need for
--group-add 44) is covered in #630.