Skip to content

Add IsCoherent API to Device type#68

Merged
elezar merged 4 commits intoNVIDIA:mainfrom
elezar:add-is-coherent
Aug 20, 2025
Merged

Add IsCoherent API to Device type#68
elezar merged 4 commits intoNVIDIA:mainfrom
elezar:add-is-coherent

Conversation

@elezar
Copy link
Member

@elezar elezar commented Aug 15, 2025

This change adds an IsCoherent API to the Device interface. This returns whether the selected device has coherent access to system memory and is determined by the NVML Addressing mode.

This is blocked by:

@elezar elezar marked this pull request as draft August 15, 2025 13:30
@elezar elezar requested review from cdesiniotis and klueska August 15, 2025 13:30
@elezar elezar changed the title Add is coherent Add IsCoherent API to Device type Aug 15, 2025
@elezar elezar force-pushed the add-is-coherent branch 2 times, most recently from f72924c to 523d0ed Compare August 20, 2025 08:05
@elezar elezar marked this pull request as ready for review August 20, 2025 08:05
@elezar elezar self-assigned this Aug 20, 2025
@elezar elezar modified the milestone: https://github.com/NVIDIA/nvidia-container-toolkit/milestone/5 Aug 20, 2025
@elezar elezar force-pushed the add-is-coherent branch 2 times, most recently from 0521d1b to 77cc7c4 Compare August 20, 2025 08:35
elezar added 3 commits August 20, 2025 12:02
We use --exit-code instead of --quite to show the diffs causing
the check-vendor make target to fail.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
return "Hopper", nil
case nvml.DEVICE_ARCH_BLACKWELL:
return "Blackwell", nil
case nvml.DEVICE_ARCH_T23X:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required since the DEVICE_ARCH_T23X definition was removed from nvml.h as part of the CUDA 13.0 release.

Copy link
Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only a nit comment
lgtm

}

// IsCoherent returns whether the device is capable of coherent access to cpu
// and gpu memory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: GPU in upper case


check-vendor: vendor
git diff --quiet HEAD -- go.mod go.sum vendor
git diff --exit-code HEAD -- go.mod go.sum vendor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the curious reader: exits with non-zero code if there are differences

(was this a bug?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some vendoring mismatches that CI fails on. Switching to --exit-code allows the diff to actually be displayed in CI so that it's easier to see what caused the failure without requiring that one try to reproduce it locally.

jgehrcke
jgehrcke previously approved these changes Aug 20, 2025
Copy link
Collaborator

@jgehrcke jgehrcke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this with a grain of salt but LGTM

This change adds an IsCoherent function to the Device
type that can be used to check wither a device has coherent
access to system memory based on the supported addressing mode.

Note that this requires a CUDA 13.0 driver.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar dismissed stale reviews from jgehrcke and ArangoGutierrez via a308b4a August 20, 2025 12:12
@elezar elezar merged commit fdfe25d into NVIDIA:main Aug 20, 2025
4 checks passed
@elezar elezar deleted the add-is-coherent branch August 20, 2025 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants