Update GKE TPU examples by removing static_node_count var#5557
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the GKE TPU configuration to improve consistency and reduce redundancy. By removing an unused variable and updating the quota calculation logic to reference the node pool's internal state, the configuration becomes more robust and easier to maintain. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request removes the static_node_count variable from the blueprint and updates the tpu_quota calculation to reference an output from the gke-tpu-7x-pool module. Review feedback highlights two critical issues: first, removing the explicit node count may cause deployment failures if the module logic for automatic calculation is missing; second, the updated tpu_quota calculation will likely fail because it attempts to perform arithmetic on a list output (resulting from the module's use of count) without proper indexing.
shubpal07
left a comment
There was a problem hiding this comment.
LGTM!!
Automating static_node_count from topology and machine type reduces configuration complexity and prevents deployment failures due to manual calculation errors.
4e9329e
into
GoogleCloudPlatform:develop
1. Rationale
This PR eliminates
static_node_countvariable from TPU example blueprints as the node count value is automatically getting derived frommachine_type,tpu_topology, andnum_slices. This reduces deployment errors and simplifies the user experience.2. Changes Done
static_node_countvariable from YAML filesnode_count_staticfrom the GKE moduleSubmission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.