Add a3u-gke-gcs blueprint #3454

Merged
samskillman merged 15 commits into GoogleCloudPlatform:develop from samskillman:examples/a3u-gke-gcs
Jan 21, 2025

Conversation

@samskillman (Collaborator) commented Dec 21, 2024

This PR adds the blueprint, which gives an opinionated way to mount buckets for training and checkpointing. I also modified the gke-persistent-volume module to use the mount_options specified in network_storage; it previously hardcoded implicit-dirs.
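
As a rough illustration of the behavior change described above, here is a minimal Python sketch (not the module's actual Terraform code). It assumes that `mount_options` arrives as a comma-separated string in a `network_storage` mapping and that `implicit-dirs` remains the fallback when no options are given; the helper name `resolve_mount_options` and the sample option values are hypothetical.

```python
from typing import Dict, List

# The value the PR description says was previously hardcoded.
DEFAULT_MOUNT_OPTIONS = "implicit-dirs"


def resolve_mount_options(network_storage: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return the mount options as a list.

    Assumes mount_options is a comma-separated string and that an
    empty or missing value falls back to the old hardcoded default.
    """
    raw = network_storage.get("mount_options") or DEFAULT_MOUNT_OPTIONS
    # Split on commas and drop empty entries and stray whitespace.
    return [opt.strip() for opt in raw.split(",") if opt.strip()]


# Illustrative input only; the option names here are placeholders.
training_bucket = {"mount_options": "implicit-dirs, option-a, option-b"}
training_options = resolve_mount_options(training_bucket)
```

With this shape, a blueprint could pass different options for the training and checkpointing buckets, while a bucket that sets nothing would keep the earlier behavior.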

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@samskillman added the release-key-new-features label (Added to release notes under the "Key New Features" heading.) on Dec 21, 2024
@samskillman requested a review from cboneti on December 21, 2024 07:46
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml Outdated
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml Outdated
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/kueue-configuration.yaml.tftpl Outdated
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml Outdated
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml Outdated
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/README.md
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/README.md Outdated

@ankitkinra (Contributor) left a comment

LGTM but will allow storage experts to approve.

We should try to create an e2e integration test for this blueprint as soon as we can.

@samskillman force-pushed the examples/a3u-gke-gcs branch 2 times, most recently from 9dc3cb9 to 443ac98, on January 16, 2025 22:10
@samskillman (Collaborator, Author) commented

Rebased on top of the current develop branch. I think this PR is ready for final review.

Comment thread examples/hypercompute_clusters/a3u-gke-gcs/a3u-gke-gcs.yaml
Comment thread examples/hypercompute_clusters/a3u-gke-gcs/kueue-configuration.yaml.tftpl Outdated
@samskillman merged commit 07460df into GoogleCloudPlatform:develop on Jan 21, 2025
@abbas1902 mentioned this pull request on Feb 6, 2025

Labels

release-key-new-features Added to release notes under the "Key New Features" heading.

5 participants