Support Compute Endpoint Override for Slurm image building and cluster deployment #5493

Merged
AdarshK15 merged 7 commits into GoogleCloudPlatform:develop from AdarshK15:packer-staging
Apr 27, 2026
@AdarshK15 AdarshK15 commented Apr 14, 2026

Member

Summary

This PR enables Cluster Toolkit to provide custom compute endpoint overrides for Slurm image building and cluster deployment.

Packer Upgrade: I upgraded the googlecompute Packer plugin to ~> 1.2.5, because older versions don't support the custom_endpoints variable.
Gcloud Override and Compute Endpoint Variables: I added gcloud_path_override and compute_endpoint_version as input variables to both the Packer and startup-script modules, so both values can be passed directly from the blueprint.
Startup Script Wrapper: I added a gcloud wrapper to the startup script that sets the custom compute endpoint. This ensures that when ansible-pull runs the slurm-gcp playbook to install Slurm, all underlying gcloud commands use the provided override.
Customizable Licenses: I changed the image_licenses field into a variable so that it can be overridden, since the default production license URL may not be available.
Explicit Region: I introduced an explicit region variable for the googlecompute builder. Previously, the logic truncated the zone name to infer the region name, which failed for zones that do not follow the region-[a-z] naming convention.
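
To illustrate the Explicit Region item, here is a minimal Python sketch of why inferring the region by truncating the zone name is fragile. It is not the module's code: the function names, the assumption that truncation dropped the last two characters, and the zone `example-zone-ab` are all hypothetical and chosen only to show the failure mode.

```python
import re
from typing import Optional


def infer_region_by_truncation(zone: str) -> str:
    """Hypothetical stand-in for the old behavior: drop the trailing '-x'.

    Assumes the zone ends in a hyphen plus one character.
    """
    return zone[:-2]


def follows_region_letter_convention(zone: str) -> bool:
    """True when the zone looks like '<region>-<one lowercase letter>'."""
    return re.fullmatch(r".+-[a-z]", zone) is not None


def resolve_region(zone: str, region: Optional[str] = None) -> str:
    """Prefer an explicit region; only infer when the zone fits the convention."""
    if region:
        return region
    if not follows_region_letter_convention(zone):
        raise ValueError(
            f"cannot infer region from zone {zone!r}; pass region explicitly"
        )
    return infer_region_by_truncation(zone)
```

With a conventional zone such as `us-central1-a`, truncation yields `us-central1`. With the made-up zone `example-zone-ab`, truncation yields the invalid name `example-zone-`, whereas passing `region` explicitly sidesteps the inference entirely.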

Testing

I verified these changes by building and deploying a4high, a4x, and g4 clusters. I also verified Rocky Linux 8 image building and cluster creation using the hpc-build-slurm-image.yaml blueprint.

@AdarshK15 AdarshK15 requested review from a team and samskillman as code owners April 14, 2026 09:06
@AdarshK15 AdarshK15 changed the title Support image building and cluster deployment in GCE staging Support Slurm image building and cluster deployment in GCE staging Apr 14, 2026
@AdarshK15 AdarshK15 added the release-chore To not include into release notes label Apr 14, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request adds support for custom Google Compute API endpoints and gcloud path overrides in the Packer custom image and startup-script modules, updates the googlecompute plugin version, and introduces variables for image licenses and regions. Feedback recommends extending the compute endpoint override to Packer's provisioner logic and using an OR operator in the startup script template to ensure the gcloud wrapper is created if either override is specified.
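
The second piece of feedback can be sketched in Python as a comparison of the two conditions. This is an illustration only: the real check lives in the startup-script template, the function names here are hypothetical, and the assumption that the original condition required both values is inferred from the review wording rather than from the diff.

```python
from itertools import product
from typing import List, Optional, Tuple


def wrapper_needed_and(gcloud_path_override: Optional[str],
                       compute_endpoint_version: Optional[str]) -> bool:
    """Assumed original behavior: wrapper only when BOTH overrides are set."""
    return bool(gcloud_path_override) and bool(compute_endpoint_version)


def wrapper_needed_or(gcloud_path_override: Optional[str],
                      compute_endpoint_version: Optional[str]) -> bool:
    """Recommended behavior: wrapper when EITHER override is set."""
    return bool(gcloud_path_override) or bool(compute_endpoint_version)


def differing_cases() -> List[Tuple[Optional[str], Optional[str]]]:
    """Input combinations where the two conditions disagree."""
    values = [None, "set"]
    return [
        (path, version)
        for path, version in product(values, values)
        if wrapper_needed_and(path, version) != wrapper_needed_or(path, version)
    ]
```

The two conditions disagree exactly when one override is set and the other is not, which is the case the review asks the template to handle.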

Comment thread modules/packer/custom-image/image.pkr.hcl Outdated
Comment thread modules/scripts/startup-script/templates/startup-script-custom.tftpl Outdated
@AdarshK15 AdarshK15 changed the title Support Slurm image building and cluster deployment in GCE staging Support Compute Endpoint Override for Slurm image building and cluster deployment Apr 17, 2026

@LAVEEN LAVEEN left a comment

Collaborator

LGTM

@AdarshK15 AdarshK15 merged commit 576d4fa into GoogleCloudPlatform:develop Apr 27, 2026
53 of 84 checks passed
@AdarshK15 AdarshK15 deleted the packer-staging branch May 3, 2026 18:37

Labels

release-chore To not include into release notes

2 participants