nvidia arm64 & GPU operator test #583
Merged
jepio merged 15 commits into flatcar-master on Mar 14, 2025
jepio (Member) commented on Feb 27, 2025
- Add SkipFunc implementation for skipping test on unsupported instance types
- Add GPU operator test (includes nvidia-runtime sysext test)
- Add Arm64 support to both tests
- Add AWS support
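The skip logic from the first bullet can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the function name `shouldSkipNonGPU`, its signature, and the prefix table are all assumptions, and the listed instance families are only examples of GPU-capable sizes on Azure and AWS.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuInstancePrefixes is an illustrative table (an assumption, not the list
// used by the PR) of instance-type prefixes that come with NVIDIA GPUs.
var gpuInstancePrefixes = map[string][]string{
	"azure": {"Standard_NC", "Standard_ND", "Standard_NV"},
	"aws":   {"g4dn.", "g5.", "g5g.", "p3."},
}

// shouldSkipNonGPU is a hypothetical helper: it reports true when the test
// should be skipped because the instance type has no known GPU.
func shouldSkipNonGPU(platform, instanceType string) bool {
	for _, prefix := range gpuInstancePrefixes[platform] {
		if strings.HasPrefix(instanceType, prefix) {
			return false
		}
	}
	// Unknown platforms and non-GPU sizes are skipped.
	return true
}

func main() {
	fmt.Println(shouldSkipNonGPU("azure", "Standard_NC6s_v3")) // false
	fmt.Println(shouldSkipNonGPU("aws", "t3.medium"))          // true
}
```

Skipping by default on unknown platforms keeps the test from failing on hardware it was never meant to run on.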
PR Overview
This pull request adds support for NVIDIA GPU testing by introducing a SkipFunc for unsupported instance types, adding a GPU operator test (including an NVIDIA runtime sysext test), and extending support to the ARM64 architecture and AWS platform.
- Introduces skipOnNonGpu to conditionally skip tests on unsupported instances.
- Adds a new test (cl.misc.nvidia.operator) with a complete GPU operator installation and validation workflow.
- Updates existing NVIDIA installation test to incorporate ARM64 support via template configuration.
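One way a templated configuration could carry an architecture-specific setting is sketched below. The PR's actual template is not shown in this conversation, so everything here is an assumption: the Butane-style layout, the `/etc/flatcar/nvidia-metadata` path with `NVIDIA_DRIVER_VERSION`, and the names `configTemplate`, `configParams` and `renderConfig`. The override is emitted only when a version is supplied, so an empty value falls back to the image default.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// configTemplate is a hypothetical Butane-style config. The storage section
// is rendered only when a driver version override is provided.
const configTemplate = `variant: flatcar
version: 1.0.0
{{- if .DriverVersion }}
storage:
  files:
    - path: /etc/flatcar/nvidia-metadata
      contents:
        inline: |
          NVIDIA_DRIVER_VERSION={{ .DriverVersion }}
{{- end }}
`

// configParams holds the values substituted into the template.
type configParams struct {
	DriverVersion string // empty means "use the default"
}

// renderConfig executes the template with the given parameters.
func renderConfig(p configParams) (string, error) {
	tmpl, err := template.New("config").Parse(configTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// "PLACEHOLDER" stands in for a real driver version.
	out, err := renderConfig(configParams{DriverVersion: "PLACEHOLDER"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```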
Reviewed Changes
| File | Description |
|---|---|
| kola/tests/misc/nvidia.go | Added new constants, skip logic, GPU operator test implementation, and expanded platform/architecture support |
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
kola/tests/misc/nvidia.go:162
- The multi-line helm installation command uses backticks, which preserve literal newlines. Verify that the shell execution handles these newlines as intended, or consider converting it to a single-line command.
_ = c.MustSSH(m, `curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
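To illustrate the point about backticks: a Go raw string keeps both the trailing backslash and the newline, and a POSIX shell then treats the backslash-newline pair as a line continuation and removes it. The sketch below uses a made-up `echo` command rather than the truncated helm command above, and `flattenContinuations` is a hypothetical helper that mimics the shell's behaviour so the resulting single-line command can be inspected.

```go
package main

import (
	"fmt"
	"strings"
)

// script is an illustrative multi-line command in a raw string literal.
// The backslash and the newline are both preserved verbatim.
const script = `echo one \
  two`

// flattenContinuations removes backslash-newline pairs, which is what a
// POSIX shell does with an unquoted line continuation.
func flattenContinuations(cmd string) string {
	return strings.ReplaceAll(cmd, "\\\n", "")
}

func main() {
	fmt.Printf("raw:       %q\n", script)
	fmt.Printf("flattened: %q\n", flattenContinuations(script))
}
```

Under this reading the multi-line form is safe as long as every line break inside the raw string is preceded by a backslash or follows a complete command.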
kola/tests/misc/nvidia.go:101
- The SSH check in waitForNvidiaDriver only looks for the substring 'active (exited)', which may be too specific if the nvidia service enters other valid states. Consider broadening the check or adding a comment to clarify the expected state.
out, err := c.SSH(*m, "systemctl status nvidia.service")
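A broader check could look roughly like the sketch below. It assumes the command is changed to `systemctl show --property=ActiveState,SubState nvidia.service`, which prints `key=value` lines, and it treats any `active` unit as ready regardless of substate. The names `parseUnitState` and `driverUnitReady` are hypothetical, and whether ignoring the substate is appropriate depends on the unit type, which is not shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseUnitState extracts ActiveState and SubState from the key=value
// output of `systemctl show --property=ActiveState,SubState <unit>`.
func parseUnitState(out string) (active, sub string) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), "=")
		if !ok {
			continue
		}
		switch key {
		case "ActiveState":
			active = value
		case "SubState":
			sub = value
		}
	}
	return active, sub
}

// driverUnitReady reports whether the unit has reached the active state,
// accepting both "exited" and "running" substates.
func driverUnitReady(out string) bool {
	active, _ := parseUnitState(out)
	return active == "active"
}

func main() {
	fmt.Println(driverUnitReady("ActiveState=active\nSubState=exited\n"))    // true
	fmt.Println(driverUnitReady("ActiveState=activating\nSubState=start\n")) // false
}
```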
2 tasks
9e7301d to dbb49cb
krnowak reviewed on Mar 6, 2025
dbb49cb to 119cd04
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This relies on the nvidia-runtime sysext from the bakery.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
So that it doesn't look like a subtest, which messes with the retry logic in scripts.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Instead of a particular output, which only matches a single GPU type.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The driver version for arm64 has been changed in Flatcar, so we can rely on the default now.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
119cd04 to 2480322
krnowak approved these changes on Mar 7, 2025
krnowak (Member) left a comment
I think that the PR is fine as it is. I have some ideas below about moving the version numbers to constants to make it easier to bump them when the need arises. This could very well be done in a follow-up PR that could probably also add some automation. Up to you.
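The suggestion about constants could be sketched along these lines. The constant name, the placeholder value, and the helper `gpuOperatorInstallCmd` are assumptions made for illustration; the versions actually pinned by the test are not shown in this conversation. The command assumes a helm repository has already been added under the name `nvidia`.

```go
package main

import "fmt"

const (
	// gpuOperatorVersion is a placeholder, not the version used by the PR.
	// Keeping it in one place makes a later bump a one-line change.
	gpuOperatorVersion = "v0.0.0"
)

// gpuOperatorInstallCmd builds the helm command for a given chart version.
func gpuOperatorInstallCmd(version string) string {
	return fmt.Sprintf(
		"helm install --wait --generate-name --namespace gpu-operator "+
			"--create-namespace nvidia/gpu-operator --version=%s",
		version,
	)
}

func main() {
	fmt.Println(gpuOperatorInstallCmd(gpuOperatorVersion))
}
```

With the versions collected in a const block, automation for bumping them only needs to rewrite that block.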
krnowak approved these changes on Mar 14, 2025