Skip to content

fix: Set device node GID in CDI specs#1631

Merged
elezar merged 1 commit intoNVIDIA:mainfrom
elezar:set-gid-for-device-nodes
Feb 6, 2026
Merged

fix: Set device node GID in CDI specs#1631
elezar merged 1 commit intoNVIDIA:mainfrom
elezar:set-gid-for-device-nodes

Conversation

@elezar
Copy link
Member

@elezar elezar commented Feb 5, 2026

This change ensures that the GID for a device node is set in a generated CDI specification. This is important when a device node allows users of the non-root group to access the device node -- especially when running in non-root containers.

Testing:

On the host:

$ ls -aln /dev/nvmap
cr--r----- 1 0 44 10, 123 Feb  5 18:45 /dev/nvmap

(where 44 is the GID of the video group)

Without this change we have:

$ docker run --rm -ti --device=nvidia.com/gpu=0 -u 1000:1000 ubuntu ls -aln /dev/nvmap
cr--r----- 1 0 0 10, 123 Feb  5 20:54 /dev/nvmap

(indicating that the /dev/nvmap has the GID 0 associated with it).

Running nvidia-smi shows the following errors (and memory is not reported):

$ docker run --rm -ti --device=nvidia.com/gpu=0 -u 1000:1000 --group-add 44 ubuntu nvidia-smi -L
NvRmMemInitNvmap failed: error Permission denied
NvRmMemMgrInit failed: Memory Manager Not supported, line 333
NvRmMemMgrInit failed: error type 196626
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196625
NvRmMemInitNvmap failed: error Permission denied
NvRmMemMgrInit failed: Memory Manager Not supported, line 333
NvRmMemMgrInit failed: error type 196626
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196625
GPU 0: NVIDIA Thor (UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600)

With this change:

$  docker run --rm -ti --device=test.nvidia.com/gpu=0 -u 1000:1000 ubuntu ls -aln /dev/nvmap
cr--r----- 1 0 44 10, 123 Feb  5 20:52 /dev/nvmap

(showing a GID of 44 matching the host):
and then:

docker run --rm -ti --device=test.nvidia.com/gpu=0 -u 1000:1000 --group-add 44 ubuntu nvidia-smi -L
GPU 0: NVIDIA Thor (UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600)

Note that the injection of additional GIDs (removing the need for --group-add 44) is covered in #630.

@coveralls
Copy link

coveralls commented Feb 5, 2026

Pull Request Test Coverage Report for Build 21756623430

Details

  • 0 of 6 (0.0%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 39.547%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/edits/device.go 0 6 0.0%
Files with Coverage Reduction New Missed Lines %
internal/edits/device.go 1 43.18%
Totals Coverage Status
Change from base Build 21756016385: -0.02%
Covered Lines: 5724
Relevant Lines: 14474

💛 - Coveralls

@elezar elezar requested review from jgehrcke and klueska February 5, 2026 20:57
@elezar elezar added this to the v1.18.3 milestone Feb 5, 2026
@elezar elezar added the bug Issue/PR to expose/discuss/fix a bug label Feb 5, 2026
@elezar
Copy link
Member Author

elezar commented Feb 5, 2026

/cherry-pick release-1.18

@elezar elezar self-assigned this Feb 5, 2026
@elezar elezar force-pushed the set-gid-for-device-nodes branch 2 times, most recently from 70be01b to 62f0561 Compare February 6, 2026 14:22
Copy link
Collaborator

@jgehrcke jgehrcke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Minor: dn.Minor,
FileMode: &dn.FileMode,
Permissions: string(dn.Permissions),
GID: ptrIfNonZero(dn.Gid),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I wonder which other properties on dn there are that we currently don't use within this new DeviceNode{} construction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expanded Device type is:

type Device struct {
	Rule struct {
		// Type of device ('c' for char, 'b' for block). If set to 'a', this rule
		// acts as a wildcard and all fields other than Allow are ignored.
		Type Type `json:"type"`

		// Major is the device's major number.
		Major int64 `json:"major"`

		// Minor is the device's minor number.
		Minor int64 `json:"minor"`

		// Permissions is the set of permissions that this rule applies to (in the
		// cgroupv1 format -- any combination of "rwm").
		Permissions Permissions `json:"permissions"`

		// Allow specifies whether this rule is allowed.
		Allow bool `json:"allow"`
	}

	// Path to the device.
	Path string `json:"path"`

	// FileMode permission bits for the device.
	FileMode os.FileMode `json:"file_mode"`

	// Uid of the device.
	Uid uint32 `json:"uid,omitempty"` //nolint:revive // Suppress "var-naming: struct field Uid should be UID".

	// Gid of the device.
	Gid uint32 `json:"gid,omitempty"` //nolint:revive // Suppress "var-naming: struct field Gid should be GID".
}

Meaning that we don't use:

I can add a more complete explaination if you like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a more meaningful comment.

This change ensures that the GID for a device node is
set in a generated CDI specification. This is important when
a device node allows users of the non-root group to access the
device node -- especially when running in non-root containers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the set-gid-for-device-nodes branch from 62f0561 to b4db92f Compare February 6, 2026 15:48
@elezar elezar merged commit 6b81d3b into NVIDIA:main Feb 6, 2026
9 of 10 checks passed
@github-actions
Copy link

github-actions bot commented Feb 6, 2026

🤖 Backport PR created for release-1.18: #1635 ⚠️ (has conflicts)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Issue/PR to expose/discuss/fix a bug cherry-pick/release-1.18

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants