Add A4X blueprints and gpu definition #4461

Merged
alyssa-sm merged 1 commit into GoogleCloudPlatform:develop from alyssa-sm:a4x-blueprints on Aug 6, 2025
Conversation

@alyssa-sm

Contributor

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@alyssa-sm alyssa-sm self-assigned this Jul 28, 2025
@alyssa-sm alyssa-sm force-pushed the a4x-blueprints branch 2 times, most recently from 1042812 to e39bc96 Compare July 28, 2025 22:20

@samskillman samskillman left a comment

Collaborator

Minor comments so far. Would like to understand if we should push more of the prologs into slurm-gcp instead of inlining here.

Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
@alyssa-sm alyssa-sm requested a review from samskillman August 4, 2025 21:38
@alyssa-sm alyssa-sm marked this pull request as ready for review August 4, 2025 21:38
@alyssa-sm alyssa-sm requested a review from a team as a code owner August 4, 2025 21:38
@alyssa-sm alyssa-sm assigned samskillman and unassigned alyssa-sm Aug 4, 2025
@alyssa-sm alyssa-sm added the release-key-new-features Added to release notes under the "Key New Features" heading. label Aug 4, 2025
samskillman previously approved these changes Aug 4, 2025

@samskillman samskillman left a comment

Collaborator

Couple last suggestions.

Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
@alyssa-sm alyssa-sm force-pushed the a4x-blueprints branch 2 times, most recently from 5d1d6ce to 7b39d5d Compare August 4, 2025 23:43
@alyssa-sm alyssa-sm requested a review from samskillman August 5, 2025 00:08
@alyssa-sm alyssa-sm force-pushed the a4x-blueprints branch 7 times, most recently from 9388b72 to 40e104c Compare August 6, 2025 14:59

@samskillman samskillman left a comment

Collaborator

One more fix for hostname issue

Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
Comment thread examples/machine-learning/a4x-highgpu-4g/a4xhigh-slurm-blueprint.yaml Outdated
samskillman previously approved these changes Aug 6, 2025
@alyssa-sm alyssa-sm merged commit efb855f into GoogleCloudPlatform:develop Aug 6, 2025
14 of 65 checks passed
@alyssa-sm alyssa-sm deleted the a4x-blueprints branch August 6, 2025 18:36
Labels

release-key-new-features Added to release notes under the "Key New Features" heading.

3 participants