fix: Add tpu_topology conditional logic for TPU flex start #5655
Summary of Changes
This pull request introduces support for TPU Flex Start configurations within the hardware settings. It ensures that the configuration logic respects existing topology expressions and correctly bypasses static node count calculations when Flex Start is enabled, improving compatibility with dynamic TPU provisioning.
Code Review
This pull request updates the hardware configuration logic to preserve existing topology expressions in placement policies and introduces a skip for node calculations when 'enable_flex_start' is active. Feedback suggests implementing defensive checks for null or unknown values when evaluating boolean settings to prevent runtime panics. Additionally, it is recommended to use the attribute name 'accelerator_topology' instead of 'tpu_topology' to ensure consistency with the underlying Terraform provider.
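The defensive check suggested in the review can be sketched roughly as follows. This is a minimal illustration, not the code in hardware.go: the `Setting` type and the `isTrue` and `shouldComputeStaticNodes` helpers are hypothetical stand-ins, and treating a null or unknown 'enable_flex_start' as "not enabled" is an assumption about the desired behavior.

```go
package main

import "fmt"

// Setting is a hypothetical stand-in for a blueprint setting value that may be
// null or still unknown when the hardware logic runs. The real code uses its
// own value type; this struct exists only for illustration.
type Setting struct {
	Known bool // false while the value is an unresolved expression
	Null  bool // true when the setting was explicitly set to null
	Bool  bool // payload, meaningful only when Known && !Null
}

// isTrue reports whether s is a known, non-null true value.
// Missing, null, and unknown values yield false instead of panicking.
func isTrue(s *Setting) bool {
	return s != nil && s.Known && !s.Null && s.Bool
}

// shouldComputeStaticNodes sketches the skip described in the review:
// static node calculations are bypassed only when enable_flex_start is
// definitely true (assumption: anything else keeps the static path).
func shouldComputeStaticNodes(settings map[string]*Setting) bool {
	return !isTrue(settings["enable_flex_start"])
}

func main() {
	flex := map[string]*Setting{"enable_flex_start": {Known: true, Bool: true}}
	unknown := map[string]*Setting{"enable_flex_start": {Known: false}}

	fmt.Println(shouldComputeStaticNodes(flex))    // flex start on: skip
	fmt.Println(shouldComputeStaticNodes(unknown)) // unknown: keep static path
	fmt.Println(shouldComputeStaticNodes(nil))     // setting absent: keep static path
}
```

Funnelling every boolean read through one helper like `isTrue` keeps the null and unknown handling in a single place, so callers never convert an unresolved value directly.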
The PR-test-gke-tpu-v6e-flex run passed the test_deployment_variable_not_used validation, so the original error is resolved. The remaining failure is due to capacity constraints.
b0d62a0 into GoogleCloudPlatform:develop
This PR resolves a test validation failure in test_deployment_variable_not_used for the flex_start TPU blueprint.
The validator was incorrectly flagging tpu_topology as unused when placement_policy was injected via the hardware.go logic for deployments with a static node count > 1.
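One plausible way such a false positive can arise is sketched below. This is an illustration under assumptions, not the toolkit's validator: `unusedVars`, the flat string settings map, and the literal "4x4" are all hypothetical. The sketch assumes a check that scans setting expressions for `$(vars.NAME)` references, so overwriting an authored expression with a computed literal makes the variable look unused.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// varRef matches deployment variable references written as $(vars.NAME).
var varRef = regexp.MustCompile(`\$\(vars\.([A-Za-z_][A-Za-z0-9_]*)\)`)

// unusedVars returns the declared variables that no setting expression
// references, sorted by name. Hypothetical helper for illustration only.
func unusedVars(declared []string, settings map[string]string) []string {
	used := map[string]bool{}
	for _, expr := range settings {
		for _, m := range varRef.FindAllStringSubmatch(expr, -1) {
			used[m[1]] = true
		}
	}
	var unused []string
	for _, name := range declared {
		if !used[name] {
			unused = append(unused, name)
		}
	}
	sort.Strings(unused)
	return unused
}

func main() {
	declared := []string{"tpu_topology"}

	// The authored setting still carries the variable reference.
	authored := map[string]string{"placement_policy": "$(vars.tpu_topology)"}
	// An injected literal (hypothetical value) drops the reference.
	overwritten := map[string]string{"placement_policy": "4x4"}

	fmt.Println(unusedVars(declared, authored))
	fmt.Println(unusedVars(declared, overwritten))
}
```

Under these assumptions, preserving the existing expression instead of replacing it keeps the reference visible to the check.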
Key changes
injectCompactPlacementPolicy function since this is already being handled in the blueprint modules.
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.