feat: Implement dynamic machine configurations via Compute Engine API by SwarnaBharathiMantena · Pull Request #5426 · GoogleCloudPlatform/cluster-toolkit

SwarnaBharathiMantena · 2026-03-30T08:27:45Z

Summary

This PR modernizes machine configuration and accelerator discovery within the Cluster Toolkit by replacing hardcoded configuration maps with dynamic lookups against the Google Cloud Compute Engine API. In addition, all relevant Terraform module interfaces and Go structures have been updated to accurately reflect general machine specifications (CPUs, memory, GPUs, and TPUs).

Key Changes

Dynamic Machine Configurations via Go SDK:

Implemented secure, direct HTTP/REST lookups using the official Google Cloud Compute Engine Go SDK (compute.Service).
Replaced static files (such as accelerators.json) with dynamic lookups, so the latest GKE machine and accelerator offerings are supported without updating a hardcoded list.

High-Performance In-Memory Caching:

Introduced a thread-safe caching layer (sync.Map) in pkg/config/machine_configs.go to prevent unnecessary API calls and optimize blueprint expansion speeds.
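
For reviewers unfamiliar with the pattern, here is a minimal sketch of a sync.Map-backed cache in front of a lookup function. It is not the code in pkg/config/machine_configs.go: the MachineConfig fields, the fetchFunc type, the Cache type, and the key layout are all hypothetical, and the fetch function stands in for the real Compute Engine call so the example runs offline.

```go
package main

import (
	"fmt"
	"sync"
)

// MachineConfig is a hypothetical, simplified view of a machine type.
type MachineConfig struct {
	CPUs     int
	MemoryMB int
	GPUs     int
}

// fetchFunc stands in for the real Compute Engine lookup (assumption).
type fetchFunc func(project, zone, machineType string) (MachineConfig, error)

// Cache memoizes successful lookups keyed by project/zone/machine type.
type Cache struct {
	entries sync.Map // key: string, value: MachineConfig
	fetch   fetchFunc
}

// NewCache wraps a lookup function with an in-memory cache.
func NewCache(fetch fetchFunc) *Cache {
	return &Cache{fetch: fetch}
}

// Get returns the cached config if present, otherwise fetches and stores it.
func (c *Cache) Get(project, zone, machineType string) (MachineConfig, error) {
	key := project + "/" + zone + "/" + machineType
	if v, ok := c.entries.Load(key); ok {
		return v.(MachineConfig), nil
	}
	cfg, err := c.fetch(project, zone, machineType)
	if err != nil {
		// Errors are not cached, so a later call can retry.
		return MachineConfig{}, err
	}
	// LoadOrStore keeps the first stored value if another goroutine won the race.
	actual, _ := c.entries.LoadOrStore(key, cfg)
	return actual.(MachineConfig), nil
}

func main() {
	calls := 0
	cache := NewCache(func(project, zone, machineType string) (MachineConfig, error) {
		calls++
		// Illustrative values only; a real lookup would query the API.
		return MachineConfig{CPUs: 12, MemoryMB: 87040, GPUs: 1}, nil
	})

	var cfg MachineConfig
	for i := 0; i < 3; i++ {
		var err error
		cfg, err = cache.Get("my-project", "us-central1-a", "a2-highgpu-1g")
		if err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
	}
	fmt.Printf("%+v fetched %d time(s)\n", cfg, calls)
}
```

One trade-off of this shape: two goroutines that miss on the same key at the same time may both call the fetch function. The stored result stays consistent, but the API can be hit more than once for that key.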

gemini-code-assist · 2026-03-30T08:32:36Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request centralizes the management of GCE machine type mappings for GPUs and TPUs. By moving these definitions into a shared JSON file, the configuration becomes easier to maintain and can be reused across different parts of the codebase, including Go logic, thereby reducing duplication and potential for configuration drift.

Highlights

Centralized Configuration: Introduced a new pkg/config/accelerators.json file to serve as the single source of truth for GPU and TPU machine type mappings.
Terraform Refactoring: Updated gpu-definition and tpu-definition Terraform modules to dynamically load accelerator configurations from the new JSON file instead of using hardcoded local maps.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request centralizes GPU and TPU machine type definitions by migrating hardcoded HCL maps from the gpu-definition and tpu-definition modules into a shared JSON configuration file at pkg/config/accelerators.json. A critical issue was identified where several g4-standard machine types (6, 12, and 24) were omitted during the migration, which would result in a breaking change for users of those machine types.

SwarnaBharathiMantena · 2026-03-30T09:21:46Z

/gcbrun

SwarnaBharathiMantena · 2026-04-02T09:50:49Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors GPU and TPU definitions by replacing hardcoded Terraform maps with a dynamic injection system that fetches machine configurations via gcloud during blueprint expansion. Feedback focuses on improving the robustness of this new mechanism, including handling gcloud errors to preserve offline functionality, using the encoding/json package for safer JSON construction, and relying on API data for TPU counts instead of fragile string parsing. Additionally, suggestions were made to fix an unused import, ensure consistent JSON schemas, and improve the reliability of the command caching logic.
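
To illustrate the encoding/json suggestion above, here is a small sketch that builds the JSON payload with json.Marshal rather than string concatenation. The AcceleratorConfig struct, its field names, and the encodeConfigs helper are hypothetical and are not taken from the PR; the point is only that the encoder handles quoting and escaping.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AcceleratorConfig is a hypothetical schema for one machine type's accelerator.
type AcceleratorConfig struct {
	Type  string `json:"type"`
	Count int    `json:"count"`
}

// encodeConfigs serializes the map with encoding/json, which escapes
// special characters and emits map keys in sorted order.
func encodeConfigs(configs map[string]AcceleratorConfig) (string, error) {
	b, err := json.Marshal(configs)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeConfigs(map[string]AcceleratorConfig{
		// Illustrative entry; real values would come from the API response.
		"a2-highgpu-1g": {Type: "nvidia-tesla-a100", Count: 1},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because every entry goes through the same struct, the output schema stays consistent across machine types, which also speaks to the review's point about consistent JSON schemas.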

SwarnaBharathiMantena · 2026-04-03T02:59:01Z

/gcbrun

SwarnaBharathiMantena · 2026-04-03T03:02:03Z

/gcbrun

SwarnaBharathiMantena · 2026-04-13T10:13:50Z

SUCCESS PR-test-gke go/ghpc-cb/c25a922d-e52e-408e-8f93-77c08cdbe7b2
SUCCESS PR-test-gke-a2-highgpu-kueue-onspot go/ghpc-cb/b5b42a75-8775-4b61-b94b-120a32fbd151
SUCCESS PR-test-gke-a3-highgpu-onspot go/ghpc-cb/4726f2ac-17d7-4a5a-a7ad-bd6e9c8097e0
SUCCESS PR-test-gke-a3-ultragpu-onspot go/ghpc-cb/999b6199-eccd-46ba-89d2-9dae7b69014d
SUCCESS PR-test-gke-a4-onspot go/ghpc-cb/b5fc9256-ed81-4602-82a9-cfacd9a0ca6d
SUCCESS PR-test-gke-a4x go/ghpc-cb/22de52f7-9a35-4c1d-bbeb-f97768e3d513
SUCCESS PR-test-gke-g4 go/ghpc-cb/41a18691-d728-4607-b0eb-aa90f2e5663c
SUCCESS PR-test-gke-h4d-onspot go/ghpc-cb/b9c27e66-1210-44de-95cd-89e2da43f8f6
SUCCESS PR-test-gke-inactive-reservation go/ghpc-cb/c8d3275b-6ab4-419f-acda-a413ce71bf0f
SUCCESS PR-test-gke-managed-lustre go/ghpc-cb/06fd9e27-360d-45c3-9f99-d635b1522fc7
SUCCESS PR-test-gke-storage go/ghpc-cb/6411ccae-656e-49f2-ad23-38b72c9d49a6
SUCCESS PR-test-gke-tpu-7x go/ghpc-cb/ea0a6a6d-5014-47f0-8a3c-d2a08a6203a7
SUCCESS PR-test-gke-tpu-v6e go/ghpc-cb/201276b9-d48a-464e-9d88-6313827cc57d
SUCCESS PR-test-ml-gke go/ghpc-cb/07bf272c-7c77-4ff5-b0d6-c5ae44ac90d4
SUCCESS PR-test-ml-gke-e2e go/ghpc-cb/5e22eb1f-c290-40ea-b20f-d76c4910b90c
SUCCESS PR-test-slurm-gke go/ghpc-cb/74df824f-8b99-4a20-9d05-718eb6ae50c2
FAILURE[2] PR-test-gke-a3-megagpu-onspot go/ghpc-cb/bb7359b1-43cc-49f1-87c2-d7b18a8c9dba
FAILURE[2] PR-test-gke-managed-hyperdisk go/ghpc-cb/dfe70a64-bdc3-4b6c-a707-13614429fab3
------- TOTAL:18 | SUCCESS: 16 | FAILURE: 2

SwarnaBharathiMantena added the release-module-improvements label (added to release notes under the "Module Improvements" heading) Mar 30, 2026

SwarnaBharathiMantena requested review from a team and samskillman as code owners March 30, 2026 08:27

kadupoornima requested changes Mar 30, 2026

Comment thread pkg/config/accelerators.json Outdated

gemini-code-assist Bot reviewed Mar 30, 2026

Comment thread pkg/config/accelerators.json Outdated

SwarnaBharathiMantena requested a review from kadupoornima March 30, 2026 08:38

SwarnaBharathiMantena marked this pull request as draft March 30, 2026 09:17

SwarnaBharathiMantena changed the title ~~Introduce accelerators.json as Single Source of Truth for GCE machine types~~ feat: Use dynamic gcloud commands to fetch machine info for Terraform modules Apr 2, 2026

gemini-code-assist Bot reviewed Apr 2, 2026

SwarnaBharathiMantena marked this pull request as ready for review April 3, 2026 03:00

kvenkatachala333 previously approved these changes Apr 3, 2026

SwarnaBharathiMantena dismissed kvenkatachala333’s stale review via 0b1c5a4 April 3, 2026 10:53

SwarnaBharathiMantena marked this pull request as draft April 3, 2026 11:31

SwarnaBharathiMantena marked this pull request as ready for review April 3, 2026 12:55

SwarnaBharathiMantena force-pushed the swarnabm/update_machine_info_map branch 2 times, most recently from 9d34f09 to 8237199 Compare April 6, 2026 10:03

SwarnaBharathiMantena requested a review from kvenkatachala333 April 7, 2026 03:54

SwarnaBharathiMantena changed the title ~~feat: Use dynamic gcloud commands to fetch machine info for Terraform modules~~ refactor: Implement dynamic machine configurations via Compute Engine Go SDK Apr 7, 2026

SwarnaBharathiMantena changed the title ~~refactor: Implement dynamic machine configurations via Compute Engine Go SDK~~ feat: Implement dynamic machine configurations via Compute Engine Go SDK Apr 7, 2026

SwarnaBharathiMantena requested review from Neelabh94 and cboneti April 8, 2026 05:40

SwarnaBharathiMantena changed the title ~~feat: Implement dynamic machine configurations via Compute Engine Go SDK~~ feat: Implement dynamic machine configurations via Compute Engine API Apr 8, 2026

kadupoornima reviewed Apr 10, 2026

kadupoornima previously approved these changes Apr 10, 2026

cboneti previously approved these changes Apr 10, 2026

SwarnaBharathiMantena added 21 commits April 13, 2026 04:31

Pass accelerator_configs to internal definition modules and update gke-cluster NAP

29ca07d

Update golden copy expectations for merge_flatten

4dc0bf7

Skip gcloud call for invalid-project in machine configs

0f830f5

Fix TPU detection for ct6e machine types

8adb682

Fix unsupported attribute and function call failure in PR 5426

70a952b

Fix zone detection and restore missing upstream changes in gke-cluster

a209713

Remove cluster_autoscaling changes from gke-cluster

52c1ff4

refactor: rename accelerator_configs to machine_configs

10882cb

test: update golden copy expectations for machine_configs

a1f53a5

chore: exclude GEMINI.md from pymarkdown pre-commit hook

b3af547

chore: remove GEMINI.md from git tracking

d5466f2

Refactor dynamic machine configuration fetching to use Compute Engine API rather than gcloud CLI

683842a

chore: clean up dead pkg/gcloud folder and files

ff9d123

style: clean up whitespace and align gke-cluster configuration order with upstream/develop

8a66f6d

style: clean up whitespace formatting in gke-cluster variables

b897302

style: verify pre-commit formatting for tpu-definition README

c02f813

refactor: address PR 5426 review comments

614d021

feat: mock machine configs for tests and update descriptions from GKE to GCE

0f0a298

docs: update community module READMEs via terraform-docs

d7fd19b

docs: update remaining module READMEs via terraform-docs

ee40fc8

Revert "docs: update community module READMEs via terraform-docs"

d9e90b0

This reverts commit d7fd19b.

SwarnaBharathiMantena force-pushed the swarnabm/update_machine_info_map branch from 1cf461f to d9e90b0 Compare April 13, 2026 04:47

SwarnaBharathiMantena added 2 commits April 13, 2026 04:49

Revert "docs: update remaining module READMEs via terraform-docs"

de60fca

This reverts commit ee40fc8.

style: revert table header formatting in specific READMEs to match upstream

0116517

kadupoornima reviewed Apr 14, 2026

Comment thread pkg/config/machine_configs.go

kadupoornima approved these changes Apr 14, 2026

SwarnaBharathiMantena merged commit 8b0b811 into GoogleCloudPlatform:develop Apr 14, 2026
32 of 72 checks passed

SwarnaBharathiMantena mentioned this pull request Apr 16, 2026

(Slurm) Implement dynamic machine configurations via API #5514

Merged

aslam-quad mentioned this pull request Apr 20, 2026

Release candidate: v1.88.1 #5531

Closed
