TPU v6e DWS flex integration tests #5135
shubpal07 merged 1 commit into GoogleCloudPlatform:develop
Summary of Changes
This pull request significantly enhances the GKE TPU offerings by introducing a new integration test for TPU v6e using the Dynamic Workload Scheduling (DWS) Flex Start consumption model. It provides a complete blueprint and documentation for deploying and managing dynamically scaled TPU clusters, alongside a detailed guide for local test execution. The changes ensure robust validation of the Flex Start lifecycle, from initial node provisioning to automatic scale-down, and integrate these tests into the CI/CD pipeline.
Code Review
This pull request introduces new integration tests and documentation for GKE TPU v6e and 7x with DWS Flex Start. It also updates the gke-node-pool module to correctly handle per-zone autoscaling limits required for TPU Flex Start, including new variables and preconditions. The changes enhance the project's testing capabilities and documentation for these new features.
examples/gke-consumption-options/dws-flex-start/gke-tpu-7x/gke-tpu-7x.yaml
examples/gke-consumption-options/dws-flex-start/gke-tpu-v6e/gke-tpu-v6e.yaml
023b390 to 4b83bc6
.../cloud-build/daily-tests/ansible_playbooks/test-validation/test-gke-tpu-flex-autoscaling.yml
shubpal07 left a comment:
Pushed revision.
Change-Id: I610b93194747bdfba9c58d3b489142f5b289af80
Change-Id: I4e0e25dbcaa97acaf6dab364b45c11d0f8c801e5
Change-Id: I8dd3c04ce61fd50a31c0d8085e100ce9fba8d45d
Change-Id: I89795626d0f7ff24bef420bdad9c7d8586d8bbb1
Change-Id: I529413e8a32c4f2982b9c02b31e6ae6eebcf5400
Change-Id: I92086bee858c216e2ff65acb70b5471c9a6f74a3
Change-Id: I1dfe968c8751307ec89cd6cfd18525d84662c4f3
c739da9 to 41d91fb
Overview
This PR introduces a new integration test for GKE TPU v6e using the Dynamic Workload Scheduling (DWS) Flex Start consumption model.
Key Changes
end-to-end lifecycle in CI/CD.
for TPU v6e topologies.
the Flex Start lifecycle:
Verification Results
Cloud Build
Successfully validated via:
gcloud builds submit --config tools/cloud-build/daily-tests/builds/gke-tpu-v6e-flex.yaml
Local Execution
Successfully verified the full validation suite running from a local Cloud Workstation.
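The Flex Start lifecycle that the validation covers (initial node provisioning followed by automatic scale-down) can be illustrated with a small check over observed node counts. This is only a sketch, not the playbook in this PR: the function name `flex_lifecycle_ok` and the idea of feeding it a time-ordered list of sampled node-pool sizes are assumptions made for illustration, and how those samples are collected is left out.

```python
from typing import Sequence


def flex_lifecycle_ok(node_counts: Sequence[int]) -> bool:
    """Return True if the samples show provisioning followed by scale-down.

    node_counts is a hypothetical, time-ordered series of node-pool sizes
    (for example, sampled while a test workload runs and then finishes).
    """
    if any(n < 0 for n in node_counts):
        raise ValueError("node counts must be non-negative")

    # Index of the first sample with at least one node (provisioning happened).
    first_up = next((i for i, n in enumerate(node_counts) if n > 0), None)
    if first_up is None:
        return False  # nodes were never provisioned

    # After provisioning, some later sample must be back at zero nodes.
    return any(n == 0 for n in node_counts[first_up + 1:])


if __name__ == "__main__":
    print(flex_lifecycle_ok([0, 0, 4, 4, 0]))  # provisioned, then scaled down
    print(flex_lifecycle_ok([0, 4, 4]))        # never scaled back down
```

A stricter variant could require the final sample to be zero; the version above only requires that scale-down is observed at some point after provisioning.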
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.