added nvidia-repositories script #4553

Merged: rachit-google merged 2 commits into GoogleCloudPlatform:develop from rachit-google:rachit/a3_ultra_slurm_nvidia_script on Sep 16, 2025

Conversation

@rachit-google

Contributor

Submission Checklist

Added the NVIDIA repositories to the A3 Ultra Slurm blueprint, following the new script in the Guest OS accelerator images.
For an example, see:
https://github.com/GoogleCloudPlatform/cluster-toolkit/pull/3615/files

@rachit-google requested review from a team and samskillman as code owners on August 20, 2025 05:49
@rachit-google added the release-module-improvements label (Added to release notes under the "Module Improvements" heading.) on Aug 20, 2025

@samskillman left a comment

Collaborator

Please run the PR tests on this. Also, that script duplicates steps that appear lower in the blueprint, and it likely opens the build up to upgrading packages at the wrong time (before they are held). Lastly, sudo is not needed here, since these commands already run as root.

@samskillman left a comment

Collaborator

If we take this approach here, then let's apply it across a3u and a4, for both Slurm and VMs.

Please make sure to run all tests for A* slurm clusters.

@rachit-google

Contributor Author

> If we take this approach here, then let's apply it across a3u and a4, for both Slurm and VMs.
>
> Please make sure to run all tests for A* slurm clusters.

Sure. Once I get it working for A3 Ultra, I will change it for the others as well.

@rachit-google marked this pull request as draft on September 2, 2025 06:40
@rachit-google marked this pull request as ready for review on September 2, 2025 09:01

@arpit974 left a comment

Contributor

Changes look good; the suggested changes seem to have been taken care of.

@rachit-google

Contributor Author

> If we take this approach here, then let's apply it across a3u and a4, for both Slurm and VMs.
>
> Please make sure to run all tests for A* slurm clusters.

This will be taken care of in subsequent PRs.

@rachit-google dismissed samskillman’s stale review on September 16, 2025 06:27

Reason: will be taken care of in subsequent PRs.

addressed feedback

fix typo
@rachit-google force-pushed the rachit/a3_ultra_slurm_nvidia_script branch from 15fde06 to b7f4dac on September 16, 2025 08:03
@rachit-google merged commit 7189867 into GoogleCloudPlatform:develop on Sep 16, 2025
11 of 61 checks passed
rachit-google added a commit to rachit-google/cluster-toolkit that referenced this pull request Sep 17, 2025
…/a3_ultra_slurm_nvidia_script

added nvidia-repositories script

Labels

release-module-improvements Added to release notes under the "Module Improvements" heading.

3 participants