
Query CUDA forward compatibility elf note if available #1598

Merged
elezar merged 3 commits into NVIDIA:main from elezar:cuda-elf-header on Jan 23, 2026

@elezar (Member) commented Jan 21, 2026

This change queries an ELF note section in libcuda in the container to determine whether the forward compat libraries in the container should be used over the host drivers.

If the ELF note section is not available, we fall back to the existing heuristic, which compares the major versions of the host and compat driver libraries.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
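
As a rough illustration of what reading an ELF note involves, the sketch below opens an ELF file with Go's standard `debug/elf` package, fetches a section by name, and decodes its entries using the generic ELF note layout (`namesz`, `descsz`, `type`, then the name and descriptor, each padded to 4 bytes). It is not the code from this PR: `Note`, `ParseNotes`, `EncodeNote`, `ReadNotes`, and `demoOrder` are hypothetical names, and the section name and descriptor layout that libcuda actually uses are not stated in this conversation, so the section name is left to the caller.

```go
package main

import (
	"bytes"
	"debug/elf"
	"encoding/binary"
	"fmt"
)

// Note is one decoded entry of an ELF note section (hypothetical type).
type Note struct {
	Name string
	Type uint32
	Desc []byte
}

// demoOrder is the byte order used by the in-memory demo in main.
var demoOrder binary.ByteOrder = binary.LittleEndian

// pad4 appends zero bytes until len(b) is a multiple of 4.
func pad4(b []byte) []byte {
	for len(b)%4 != 0 {
		b = append(b, 0)
	}
	return b
}

// EncodeNote builds a single note entry; it exists only to feed the demo.
func EncodeNote(order binary.ByteOrder, name string, typ uint32, desc []byte) []byte {
	nameBytes := append([]byte(name), 0) // names are NUL-terminated
	buf := make([]byte, 12)
	order.PutUint32(buf[0:4], uint32(len(nameBytes)))
	order.PutUint32(buf[4:8], uint32(len(desc)))
	order.PutUint32(buf[8:12], typ)
	buf = pad4(append(buf, nameBytes...))
	return pad4(append(buf, desc...))
}

// ParseNotes decodes the raw contents of an ELF note section.
func ParseNotes(data []byte, order binary.ByteOrder) ([]Note, error) {
	var notes []Note
	for len(data) > 0 {
		if len(data) < 12 {
			return nil, fmt.Errorf("truncated note header")
		}
		namesz := uint64(order.Uint32(data[0:4]))
		descsz := uint64(order.Uint32(data[4:8]))
		typ := order.Uint32(data[8:12])
		data = data[12:]
		nameEnd := (namesz + 3) &^ 3 // name is padded to 4 bytes
		if nameEnd+descsz > uint64(len(data)) {
			return nil, fmt.Errorf("truncated note body")
		}
		notes = append(notes, Note{
			Name: string(bytes.TrimRight(data[:int(namesz)], "\x00")),
			Type: typ,
			Desc: data[int(nameEnd):int(nameEnd+descsz)],
		})
		// Skip the padded descriptor; tolerate missing padding on the last entry.
		next := nameEnd + (descsz+3)&^3
		if next > uint64(len(data)) {
			next = uint64(len(data))
		}
		data = data[int(next):]
	}
	return notes, nil
}

// ReadNotes returns the notes stored in the named section of the ELF file at path.
// The caller supplies the section name; none is assumed here.
func ReadNotes(path, section string) ([]Note, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	s := f.Section(section)
	if s == nil {
		return nil, fmt.Errorf("section %q not found in %s", section, path)
	}
	data, err := s.Data()
	if err != nil {
		return nil, err
	}
	return ParseNotes(data, f.ByteOrder)
}

func main() {
	// Round-trip a made-up note in memory so the sketch runs without a library file.
	raw := EncodeNote(demoOrder, "DEMO", 1, []byte{1, 2, 3})
	notes, err := ParseNotes(raw, demoOrder)
	fmt.Println(notes, err)
}
```

A caller would treat any error from `ReadNotes`, including a missing section, as "note not available" and move on to the major-version comparison.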

@elezar added this to the v1.19.0 milestone on Jan 21, 2026

This change is a minor refactor to the enable-cuda-compat hook to allow
the mechanism for determining the compat libraries in the container to
be extended more easily.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

@elezar force-pushed the cuda-elf-header branch 2 times, most recently from e23b06e to 7902588, on January 21, 2026 14:58

@coveralls commented Jan 21, 2026

Pull Request Test Coverage Report for Build 21216900667

Details

  • 61 of 95 (64.21%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 36.998%

Changes Missing Coverage                             Covered Lines   Changed/Added Lines   %
cmd/nvidia-cdi-hook/cudacompat/cudacompat.go         22              36                    61.11%
cmd/nvidia-cdi-hook/cudacompat/cuda-elf-header.go    39              59                    66.1%

Files with Coverage Reduction                        New Missed Lines   %
cmd/nvidia-cdi-hook/cudacompat/cudacompat.go         1                  46.21%

Totals
Change from base Build 21183749927: 0.2%
Covered Lines: 5307
Relevant Lines: 14344

💛 - Coveralls

This change queries an ELF note section in libcuda in the container
to determine whether the forward compat libraries in the container
should be used over the host drivers.

If the ELF note section is not available, we fall back to the existing
heuristic, which compares the major versions of the host and compat
driver libraries.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

@ArangoGutierrez (Collaborator) left a comment

LGTM. One non-blocking nit.

m.logger.Debugf("Compat major version is not greater than the host driver major version (%v >= %v)", hostDriverVersion, compatDriverVersion)
return "", nil
// First check the elf header.
cudaCompatHeader, _ := GetCUDACompatElfHeader(libcudaCompatPath)

@ArangoGutierrez (Collaborator)

Nit: Could we at least log the error?

@elezar (Member, Author)

First, this runs in a hook whose logs are not captured, so logging here would not be very helpful. Second, I want to explicitly ignore errors at this stage so that we fall back to the heuristics we already have in place.
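
The fallback behaviour described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the hook's actual code: `useCompat`, `compatNewerByMajor`, `majorVersion`, and `errNoNote` are hypothetical names, the ELF note check is modelled as a plain function that returns a decision or an error, and the version strings are made-up examples. The heuristic follows the debug message quoted in the review: the compat libraries are chosen only when their major version is greater than the host driver's.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// errNoNote is a hypothetical sentinel meaning "no compat note was found".
var errNoNote = errors.New("no forward compatibility note")

// majorVersion extracts the leading component of a version such as "570.86.10".
func majorVersion(version string) (int, error) {
	major, _, _ := strings.Cut(version, ".")
	return strconv.Atoi(major)
}

// compatNewerByMajor is the fallback heuristic: it reports true only when the
// compat major version is strictly greater than the host major version.
func compatNewerByMajor(hostVersion, compatVersion string) (bool, error) {
	host, err := majorVersion(hostVersion)
	if err != nil {
		return false, fmt.Errorf("invalid host driver version %q: %w", hostVersion, err)
	}
	compat, err := majorVersion(compatVersion)
	if err != nil {
		return false, fmt.Errorf("invalid compat driver version %q: %w", compatVersion, err)
	}
	return compat > host, nil
}

// useCompat consults the note check first. Any error from it is deliberately
// dropped (not logged, not returned) so that the heuristic takes over.
func useCompat(noteCheck func() (bool, error), hostVersion, compatVersion string) (bool, error) {
	if decision, err := noteCheck(); err == nil {
		return decision, nil
	}
	return compatNewerByMajor(hostVersion, compatVersion)
}

func main() {
	// Simulate a libcuda without the note: the heuristic decides.
	missing := func() (bool, error) { return false, errNoNote }
	fmt.Println(useCompat(missing, "550.54.15", "570.86.10"))
}
```

Passing the note check in as a function keeps the two mechanisms independent, so either one can be exercised without a real library file.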

@elezar merged commit d4b70bc into NVIDIA:main on Jan 23, 2026 (16 checks passed)
@elezar deleted the cuda-elf-header branch on January 23, 2026 14:49
3 participants