Release candidate: v1.68.0 #4746
Merged
Update nvidia DRA driver version to v25.3.0
Readme update for A3 Mega
Update the blueprints using managed_lustre vars to 36T/500MB Tier
Refactoring in gke persistent module
Merge v1.67.0 into develop
Updated A3-mega and A4-high Slurm blueprints to adopt the nvidia add-repository script.
update terraform-provider version to 7.3.0
Remove superfluous addition of chs logs to cloud ops config
downloading libnccl2 and libnccl-dev for a3u and a4h
Add nvidia-imex-* to list of held packages
When changing startup script content or changing partitions, the following error may occur:
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.slurm_controller.module.slurm_files.google_storage_bucket_object.nodeset_config["flexnodeset"] to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/google" produced an invalid new value for .md5hash: was known, but now unknown.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
With this change, the error no longer occurs.
Fixing output for gke-storage module
Add mufaqam-gcl to cluster-toolkit-writers.json
Updating google provider version upper bound to 7.6.0 - latest version
Update GKE cluster and firewall rules module versions to use recent google provider
…add-filestore-pvc Add Filestore, PV, and sample job template snippets to the GKE H4D blueprint
Add NUMA-aware scheduling in GKE clusters (enabled for G4)
Add slurm-gke blueprint
Migrate Kueue installation to use Helm chart
Update Toolkit release to v1.68.0
Updated the instance_image.family in a3ultra-vm.yaml to use ubuntu-accelerator-2204-amd64-with-nvidia-570 instead of nvidia-550 for improved compatibility and performance.
This pull request makes several updates to the GKE cluster and network modules, primarily focused on removing NUMA-aware scheduling support and aligning Terraform module and provider versions for improved compatibility. The changes simplify the configuration and ensure consistent dependency management across modules.

**Removal of NUMA-aware scheduling support:**

* Removed the `enable_numa_aware_scheduling` variable and all related configuration from the GKE cluster module, including the `kubelet_config` block and references in documentation and example files.

**Module and provider version alignment:**

* Updated the required versions for the `google` and `google-beta` Terraform providers from `>= 7.2` to `>= 6.16` in both the `versions.tf` and documentation files to standardize provider requirements.
* Changed the version constraints for the `workload_identity` module from `>= 40.0` to `~> 34.0` for compatibility, reflected in both code and documentation.
* Updated the version constraint for the `firewall_rule` module from `~> 12.0` to `~> 9.0` in both code and documentation for consistency.
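For readers less familiar with Terraform constraint syntax, the version alignment described above might look roughly like the following in a module's `versions.tf`. This is a minimal sketch, not the actual file from this pull request: only the constraint strings (`>= 6.16`, `~> 34.0`, `~> 9.0`) come from the description, while the provider `source` addresses and the overall layout are assumptions.

```hcl
# Hypothetical versions.tf sketch; only the constraint strings are taken
# from the PR description, everything else is assumed for illustration.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google" # assumed registry address
      version = ">= 6.16"          # lower bound only: any release from 6.16 upward
    }
    google-beta = {
      source  = "hashicorp/google-beta" # assumed registry address
      version = ">= 6.16"
    }
  }
}

# Module constraints from the description, shown as comments because the
# module sources and inputs are not given in the PR text:
#   workload_identity: version = "~> 34.0"  (accepts 34.x, rejects 35.0 and later)
#   firewall_rule:     version = "~> 9.0"   (accepts 9.x, rejects 10.0 and later)
```

The pessimistic operator `~>` lets only the rightmost stated component increase, so it pins a module to one major series, whereas `>=` leaves the upper end open.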
Release patch
arpit974
approved these changes
Oct 10, 2025
parulbajaj01
approved these changes
Oct 10, 2025
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.