
Pin DCGM to version 4.5.3 #5721

Merged
shubpal07 merged 1 commit into GoogleCloudPlatform:develop from shubpal07:shubham/update-DCGM
May 31, 2026


@shubpal07

Contributor

This PR pins the datacenter-gpu-manager (DCGM) version to 4.5.3 across multiple machine learning example blueprints and deployment configurations.

https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

Change-Id: I623cbe17316f88102e8a8d30c0180eab36d728c8
shubpal07 requested a review from a team as a code owner May 29, 2026 09:38
shubpal07 changed the base branch from main to develop May 29, 2026 09:38
shubpal07 requested a review from LAVEEN May 29, 2026 09:38
shubpal07 self-assigned this May 29, 2026
shubpal07 added the release-improvements label (added to release notes under the "Improvements" heading) May 29, 2026
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request standardizes the DCGM version across several machine learning example blueprints by bumping the pinned version from 4.5.2 to 4.5.3. This ensures consistency and compatibility with the latest recommended releases for these deployment configurations.

Highlights

  • Dependency Update: Updated the datacenter-gpu-manager (DCGM) components to version 4.5.3 across multiple machine learning blueprints.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request updates the datacenter-gpu-manager package versions from 1:4.5.2-1 to 1:4.5.3-1 across several Slurm blueprint examples, including a3mega, a3ultra, and a4high. The feedback suggests defining the version string as a local variable within the shell scripts to avoid duplication and improve maintainability.
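The review's suggestion can be sketched roughly as follows. This is a minimal illustration, not the code from the blueprints: the variable names `DCGM_VERSION` and `DCGM_PACKAGES`, the helper `dcgm_install_cmd`, and the single-entry package list are all assumptions, and the snippet only prints the install command rather than running it.

```shell
#!/bin/sh
# Sketch only: define the pinned DCGM version once and reuse it everywhere.

# Version string quoted in the review summary above.
DCGM_VERSION="1:4.5.3-1"

# Assumed package list (space-separated); the real blueprints may pin
# more or different packages.
DCGM_PACKAGES="datacenter-gpu-manager"

# Hypothetical helper: build the apt-get command with every package pinned
# to DCGM_VERSION, and print it instead of executing it (dry run).
dcgm_install_cmd() {
  pinned=""
  for pkg in $DCGM_PACKAGES; do
    pinned="${pinned} ${pkg}=${DCGM_VERSION}"
  done
  echo "apt-get install -y${pinned}"
}

dcgm_install_cmd
```

With this shape, a later version bump would touch only the `DCGM_VERSION` line instead of every pinned package string.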

Comment thread examples/machine-learning/a3-megagpu-8g/a3mega-slurm-blueprint.yaml
Comment thread examples/machine-learning/a4-highgpu-8g/a4high-slurm-blueprint.yaml
@shubpal07

Contributor Author

The corresponding integration tests for Slurm a3 mega, a3 ultra, and a4 high passed.

LAVEEN left a comment

Collaborator


LGTM

shubpal07 merged commit 47f83e6 into GoogleCloudPlatform:develop May 31, 2026
73 of 86 checks passed

Labels

release-improvements Added to release notes under the "Improvements" heading.


2 participants