feat: Implement dynamic machine configurations via Compute Engine API #5426
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request centralizes the management of GCE machine type mappings for GPUs and TPUs. Moving these definitions into a shared JSON file makes the configuration easier to maintain and lets it be reused across different parts of the codebase, including the Go logic, which reduces duplication and the potential for configuration drift.
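As an illustration of how Go code might consume a shared mapping like this, here is a minimal sketch. The JSON schema (machine type mapped to an accelerator `type` and `count`) and the `LoadAccelerators` helper are hypothetical, not the PR's actual format. In a real package the file could be pulled in with `//go:embed` instead of the inline `sampleConfig` constant used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Accelerator describes the accelerators attached to one machine type.
// The field names are a hypothetical schema, not the PR's real one.
type Accelerator struct {
	Type  string `json:"type"`
	Count int    `json:"count"`
}

// sampleConfig stands in for the contents of a shared accelerators JSON file.
const sampleConfig = `{
  "a2-highgpu-1g": {"type": "nvidia-tesla-a100", "count": 1},
  "a3-highgpu-8g": {"type": "nvidia-h100-80gb", "count": 8},
  "g2-standard-4": {"type": "nvidia-l4", "count": 1}
}`

// LoadAccelerators parses the shared machine-type-to-accelerator mapping.
func LoadAccelerators(data []byte) (map[string]Accelerator, error) {
	var m map[string]Accelerator
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("parsing accelerator config: %w", err)
	}
	return m, nil
}

func main() {
	accels, err := LoadAccelerators([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	a := accels["a3-highgpu-8g"]
	fmt.Printf("a3-highgpu-8g: %d x %s\n", a.Count, a.Type)
}
```

Because both Terraform and Go read the same file, adding a machine type becomes a single edit rather than one per consumer.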
Code Review
This pull request centralizes GPU and TPU machine type definitions by migrating hardcoded HCL maps from the gpu-definition and tpu-definition modules into a shared JSON configuration file at pkg/config/accelerators.json. The review identified a critical issue: several g4-standard machine types (g4-standard-6, g4-standard-12, and g4-standard-24) were omitted during the migration, which would be a breaking change for users of those machine types.
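A check like the following sketch could catch this class of regression by comparing the keys of the legacy map against the migrated data. `MissingKeys` is a hypothetical helper, and the maps in `main` are illustrative stand-ins for the old HCL map and the new JSON file (only machine type names are used, so no accelerator counts are assumed).

```go
package main

import (
	"fmt"
	"sort"
)

// MissingKeys returns, in sorted order, every key present in legacy
// but absent from migrated. Value types are irrelevant to the check.
func MissingKeys[A, B any](legacy map[string]A, migrated map[string]B) []string {
	var missing []string
	for k := range legacy {
		if _, ok := migrated[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing) // deterministic output for test failure messages
	return missing
}

func main() {
	// Illustrative data: the legacy map had four g4 entries, the migration kept one.
	legacy := map[string]struct{}{
		"g4-standard-6": {}, "g4-standard-12": {}, "g4-standard-24": {}, "g4-standard-48": {},
	}
	migrated := map[string]struct{}{"g4-standard-48": {}}
	fmt.Println("dropped during migration:", MissingKeys(legacy, migrated))
}
```

Run as a unit test against the real HCL-derived and JSON-derived maps, a non-empty result would fail CI before the breaking change lands.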
/gcbrun
/gemini review
Code Review
This pull request refactors GPU and TPU definitions by replacing hardcoded Terraform maps with a dynamic injection system that fetches machine configurations via gcloud during blueprint expansion. Feedback focuses on improving the robustness of this new mechanism, including handling gcloud errors to preserve offline functionality, using the encoding/json package for safer JSON construction, and relying on API data for TPU counts instead of fragile string parsing. Additionally, suggestions were made to fix an unused import, ensure consistent JSON schemas, and improve the reliability of the command caching logic.
/gcbrun
/gcbrun
Force-pushed from 9d34f09 to 8237199.
Force-pushed from 1cf461f to d9e90b0.
SUCCESS PR-test-gke go/ghpc-cb/c25a922d-e52e-408e-8f93-77c08cdbe7b2
Merged commit 8b0b811 into GoogleCloudPlatform:develop.
Summary
This PR modernizes machine configuration and accelerator discovery within the Cluster Toolkit by replacing hardcoded configuration maps with dynamic lookups against the Google Cloud Compute Engine API. In addition, all relevant Terraform module interfaces and Go structures have been updated to accurately reflect general machine specifications (CPUs, memory, GPUs, and TPUs).
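One possible Go shape for those general machine specifications is sketched below. `MachineSpec`, `AcceleratorSpec`, and their methods are hypothetical names, not the structures the PR defines, and the values in `main` are illustrative only.

```go
package main

import "fmt"

// AcceleratorSpec describes one group of identical accelerators.
type AcceleratorSpec struct {
	Type  string
	Count int
}

// MachineSpec is a hypothetical Go shape for general machine specs:
// CPUs, memory, and any attached GPUs or TPU chips.
type MachineSpec struct {
	Name      string
	GuestCPUs int
	MemoryMB  int
	GPUs      []AcceleratorSpec
	TPUs      []AcceleratorSpec
}

// sumCounts adds up Count across accelerator groups.
func sumCounts(groups []AcceleratorSpec) int {
	total := 0
	for _, g := range groups {
		total += g.Count
	}
	return total
}

// TotalGPUs returns the number of GPUs attached to one machine.
func (m MachineSpec) TotalGPUs() int { return sumCounts(m.GPUs) }

// TotalTPUChips returns the number of TPU chips attached to one machine.
func (m MachineSpec) TotalTPUChips() int { return sumCounts(m.TPUs) }

// IsCPUOnly reports whether the machine has no accelerators at all.
func (m MachineSpec) IsCPUOnly() bool {
	return m.TotalGPUs() == 0 && m.TotalTPUChips() == 0
}

func main() {
	// Illustrative values only; a real spec would come from the API lookup.
	spec := MachineSpec{
		Name:      "example-gpu-machine",
		GuestCPUs: 12,
		MemoryMB:  87040,
		GPUs:      []AcceleratorSpec{{Type: "nvidia-tesla-a100", Count: 1}},
	}
	fmt.Printf("%s: %d vCPUs, %d MB, %d GPU(s), CPU-only=%t\n",
		spec.Name, spec.GuestCPUs, spec.MemoryMB, spec.TotalGPUs(), spec.IsCPUOnly())
}
```

Keeping GPUs and TPUs as parallel slices on one struct lets CPU-only, GPU, and TPU machines share a single type instead of separate GPU and TPU definitions.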
Key Changes
Dynamic Machine Configurations via Go SDK:
High-Performance In-Memory Caching: