NAP support on GKE Clusters (gke-cluster module)#5420
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces support for GKE Node Auto-Provisioning (NAP) within the gke-cluster module. It enhances the cluster configuration by allowing users to define autoscaling limits and defaults, which are then processed and validated through new Go-based configuration logic. This change enables more flexible and efficient resource management for GKE clusters by automatically adjusting node pools based on workload demands. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request implements GKE Node Auto-Provisioning (NAP) and Cluster Autoscaling by adding Terraform configurations, input variables, and Go logic to process accelerator limits. Feedback highlights a critical logic error where machine types are incorrectly mapped to accelerator types, which will cause Terraform failures and requires updates to the unit tests. Additionally, it is recommended to replace hardcoded resource limits and disk settings with configurable variables to improve the module's flexibility and maintainability.
d66660a to
67c42f6
Compare
|
/gemini summary |
Summary of ChangesThis pull request introduces support for GKE Node Auto-Provisioning (NAP) within the gke-cluster module. It enhances the cluster configuration by allowing users to define autoscaling limits and defaults, which are then processed and validated through new Go-based configuration logic. This change enables more flexible and efficient resource management for GKE clusters by automatically adjusting node pools based on workload demands. Highlights
Activity
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for GKE Node Auto-Provisioning (NAP) and Cluster Autoscaling. Key changes include a new cluster_autoscaling variable in the GKE module, dynamic resource limit configuration, and Go-based logic to resolve accelerator details from machine types. Feedback focuses on the hardcoding of disk parameters (type and size) in the auto-provisioning defaults, which could cause compatibility issues, and a restrictive default for accelerator scaling limits that may hinder cluster expansion.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for GKE Node Auto-Provisioning (NAP) and Cluster Autoscaling by adding a new cluster_autoscaling input variable and implementing logic to dynamically configure resource limits and defaults. The changes include updates to the GKE cluster Terraform module and new Go functionality to resolve accelerator details from machine types. Feedback focuses on ensuring correct GKE-specific resource types for TPUs, avoiding duplicate resource limits for CPU and memory, failing fast on invalid accelerator configurations, and improving Go code maintainability by reusing existing configuration structures.
179e8ce to
44f608b
Compare
|
The failing tests are either capacity issues or known issues the team is working on. |
8fe32e7 to
8240c37
Compare
|
The failing tests are either capacity issues or known issues the team is working on. |
…treamline autoscaling to be optional, and add fractional GPU shorthands
… enabled flag, rename cache, and update v5litepod naming
|
Test failures: Reservation does not exist Capacity issues Known issues |
Neelabh94
left a comment
There was a problem hiding this comment.
added some nit comments on improvements. They can be done as a follow up PR.
LGTM!
6a44cf8
into
GoogleCloudPlatform:develop
GKE Node Auto-Provisioning (NAP) Support: Added support for GKE Node Auto-Provisioning (NAP) in the gke-cluster module, enabling dynamic node pool provisioning based on workload requirements.
Terraform Configuration Updates: Updated Terraform variables and main configuration to include cluster_autoscaling settings, allowing for dynamic resource limits and auto-provisioning defaults.
Go Configuration Logic: Implemented Go-based logic to parse machine types and inject internal variables for accelerator configuration, ensuring proper NAP setup.
Testing: Added comprehensive unit tests to validate the new autoscaling configuration logic and accelerator extraction.