Enable JobSet and Nvidia Data Center monitoring by default #5384
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request updates the default configurations for GKE clusters to align with new internal requirements. It enables NVIDIA Data Center GPU Manager (DCGM) monitoring and the JobSet component by default, enhancing the out-of-the-box monitoring and job management capabilities for users.
Code Review
The pull request successfully updates the default monitoring components to include JOBSET and sets enable_dcgm_monitoring to true by default, aligning with the objective to enable JobSet and Nvidia Data Center monitoring. The README.md has also been updated to reflect the new default value for enable_dcgm_monitoring.
/gcbrun
a2a9e76 into GoogleCloudPlatform:develop
…udPlatform#5384) Co-authored-by: Swarna Bharathi Mantena <swarnabm@google.com> and Neelabh94
This PR enables NVIDIA DCGM monitoring and the JobSet component by default for GKE clusters.