Releases: GoogleCloudPlatform/cluster-toolkit

v1.92.0

04 Jun 15:05
dbd2806

What's Changed

Key New Features 🎉

Breaking Changes 🚨

  • Transition to Slurm native auth with resilient workbench key distribution by @arpit974 in #5695
  • Default to sauth for newer deployments in h4d and a3mega-gcsfuse blueprints by @arpit974 in #5707

New Modules 🧱

Module Improvements 🔨

  • Adding native K8s annotations and GKE cluster enhancements by @arpit974 in #5610
  • Default Kueue config for Pathways by @scaliby in #5628

Improvements 🛠

  • [Telemetry] Get blueprint even from deployment directory by @kadupoornima in #5656
  • [Telemetry] Capture exit code upon fatal command failures by @kadupoornima in #5658
  • (gke) Remove additional network settings from A3U blueprint by @agrawalkhushi18 in #5652
  • (gke) Remove additional networks from A4 and A4X family blueprints by @agrawalkhushi18 in #5682
  • (gke) Remove additional network settings from TPU v6e, 7x and G4 blueprints by @agrawalkhushi18 in #5692
  • [Telemetry] Add support to merge vars from deployment files and CLI --vars by @kadupoornima in #5694
  • [Telemetry] Add support for collection of CPU machines and Default machines when unset in module by @kadupoornima in #5696
  • Make Managed Lustre the default in A3u and A3m series Slurm blueprints by @saara-tyagi27 in #5396
  • [Telemetry] Add a retry mechanism to get the GCP Project information to eliminate transient issues by @kadupoornima in #5702
  • [Telemetry] Add an atomic flag to ensure the telemetry event is not sent repeatedly by @kadupoornima in #5705
  • Pin DCGM to version 4.5.3 by @shubpal07 in #5721
  • feat(gke): expose monitoring components as a parameter by @cboneti in #5722
  • feat(job submission): Dynamic topology routing for gke jobs by @Neelabh94 in #5664
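
The atomic-flag guard mentioned for #5705 can be pictured with a minimal Go sketch. This is an illustration only, not the toolkit's code: the `onceGuard` type and its `Do` method are hypothetical names, and the sketch assumes the goal is simply "send at most one event per process, even under concurrency".

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceGuard is a hypothetical helper: it lets exactly one caller through.
type onceGuard struct {
	fired atomic.Bool
}

// Do runs send only if no earlier call has already claimed the flag.
// It reports whether this call was the one that ran send.
func (g *onceGuard) Do(send func()) bool {
	// CompareAndSwap flips false -> true atomically; only one caller wins.
	if !g.fired.CompareAndSwap(false, true) {
		return false
	}
	send()
	return true
}

func main() {
	var guard onceGuard
	var sent atomic.Int32
	var wg sync.WaitGroup

	// Simulate several code paths trying to emit the same event.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			guard.Do(func() { sent.Add(1) })
		}()
	}
	wg.Wait()
	fmt.Println("events sent:", sent.Load()) // prints "events sent: 1"
}
```

A compare-and-swap on a single flag is enough here because the guard only needs to pick one winner; it does not make later callers wait for the send to finish, which is where it differs from `sync.Once`.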

Deprecations 💤

Version Updates ⏫

  • Fix A3 HighGPU test by pinning GKE version to 1.33 to resolve COS incompatibility by @kadupoornima in #5673
  • Update minimum required Packer version to 1.15.3 by @AdarshK15 in #5701

Bug fixes 🐞

Full Changelog: v1.91.0...v1.92.0

v1.91.0

14 May 06:41
c5e27e9

What's Changed

Key New Features 🎉

  • Allow parallel containers for TPU7x by @Neelabh94 in #5612
  • [Telemetry] Start collecting Telemetry data by adding a new "telemetry" command to GCluster CLI! by @kadupoornima in #5602

New Modules 🧱

  • Add new module direct-helm-install in the community folder by @arpit974 in #5578
  • Add new module spanner to Cluster Toolkit by @arpit974 in #5592

Module Improvements 🔨

  • Ensure fully qualified URLs for reservation subblocks by @scaliby in #5452
  • Introduce resource override inputs for the Kueue and JobSet controllers by @jamOne- in #5581

Improvements 🛠

  • [Telemetry] Use GitHub API and local caching for metadata retrieval by @kadupoornima in #5589
  • [Telemetry] Add support to collect the Blueprint name by @kadupoornima in #5547
  • [Telemetry] Add support to collect the Deployment File name by @kadupoornima in #5539
  • [Telemetry] Implement local caching to persistently store user config. Remove Firestore dependency completely by @kadupoornima in #5594
  • fix: Update hardware.go for tpu_topology extraction through workload_policy by @agrawalkhushi18 in #5600
  • feat: implement lean deployment modules by selective copying by @cboneti in #5482
  • Add PriorityClasses to example Kueue configs by @scaliby in #5614

Bug fixes 🐞

  • Correctly evaluate Docker credentials prerequisite state by @scaliby in #5607
  • fix(slurm): respect visible_core_count in cloud.conf generation by @saara-tyagi27 in #5529
  • fix(gke): Missing Pathways Quotas in Kueue by @Neelabh94 in #5645

Full Changelog: v1.90.0...v1.91.0

v1.90.0

07 May 03:42
e83c18b

What's Changed

Key New Features 🎉

Module Improvements 🔨

  • Expose accelerator_topology_mode and enable_slice_controller outputs by @jamOne- in #5573

Improvements 🛠

  • [Telemetry] Refactor getModules method to use cached standard modules from Firestore by @kadupoornima in #5570
  • [Telemetry] Add support to collect Toolkit installation mode by @kadupoornima in #5598

Bug fixes 🐞

Full Changelog: v1.89.0...v1.90.0

v1.89.0

30 Apr 04:54
ac96a6c

What's Changed

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

  • Fix NCCL test for A4X: update NCCL network interfaces and enroot paths by @LAVEEN in #5504

Other changes

New Contributors

Full Changelog: v1.88.0...v1.89.0

v1.88.0

16 Apr 04:36
14fba60

What's Changed

Key New Features 🎉

Breaking Changes 🚨

Module Improvements 🔨

Improvements 🛠

New Contributors

Full Changelog: v1.87.0...v1.88.0

v1.87.0

09 Apr 05:34
c4996d8

What's Changed

Key New Features 🎉

Breaking Changes 🚨

New Modules 🧱

  • refactor: Introduce internal semver compare module by @jamOne- in #5411
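
These notes do not show how the internal semver compare module from #5411 is implemented. As a rough picture of what such a comparison involves, here is a minimal Go sketch. The function names `compareSemver` and `parseVersion` are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` strings with an optional leading `v`; it ignores pre-release and build metadata and does not mirror the module's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "v1.92.0" or "1.92.0" into three non-negative integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// compareSemver returns -1 if a < b, 0 if a == b, and 1 if a > b.
func compareSemver(a, b string) (int, error) {
	pa, err := parseVersion(a)
	if err != nil {
		return 0, err
	}
	pb, err := parseVersion(b)
	if err != nil {
		return 0, err
	}
	// Compare numerically, most significant component first.
	for i := 0; i < 3; i++ {
		if pa[i] < pb[i] {
			return -1, nil
		}
		if pa[i] > pb[i] {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	c, _ := compareSemver("v1.91.0", "v1.92.0")
	fmt.Println(c) // prints -1
	c, _ = compareSemver("1.10.0", "1.9.0")
	fmt.Println(c) // prints 1 (numeric, not lexical, comparison)
}
```

Comparing components as integers matters because a plain string comparison would order `1.10.0` before `1.9.0`.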

Module Improvements 🔨

  • refactor: Fix pre-commit error in kubectl-apply by @jamOne- in #5427
  • feat: Add resource-policy accelerator_topology_mode by @jamOne- in #5393

Improvements 🛠

Deprecations 💤

Full Changelog: v1.86.0...v1.87.0

v1.86.0

02 Apr 05:14
bf80139

What's Changed

Key New Features 🎉

  • feat: Implement and configure GKE Image Streaming (GCFS) at the cluster level by @raushan2016 in #5387
  • Support vGPU (fractional GPU) for G4 GKE by @kadupoornima in #5399
  • Support Customer-Managed Encryption Keys (CMEK) in Slurm GCP deployments by @saara-tyagi27 in #5407

Breaking Changes 🚨

Module Improvements 🔨

Improvements 🛠

Version Updates ⏫

Bug fixes 🐞

New Contributors

Full Changelog: v1.85.0...v1.86.0

v1.85.0

25 Mar 08:06
eed7b09

What's Changed

Key New Features 🎉

  • feat(storage): Enable GCS zonal bucket capability with RAPID storage by @Neelabh94 in #5353
  • Support future reservation in name check validator by @saara-tyagi27 in #5252

Breaking Changes 🚨

Improvements 🛠

Version Updates ⏫

  • Pin shfmt and goimports version to resolve Go version conflict by @kadupoornima in #5365

Bug fixes 🐞

  • Refine gcluster deploy flag checks to only consider local flags by @scaliby in #5372

New Contributors

Full Changelog: v1.84.0...v1.85.0

v1.84.0

12 Mar 10:50
164f0a3

What's Changed

Key New Features 🎉

Version Updates ⏫

Bug fixes 🐞

Full Changelog: v1.83.0...v1.84.0

v1.83.0

05 Mar 11:33
8eb5de0

What's Changed

Key New Features 🎉

Breaking Changes 🚨

Module Improvements 🔨

Improvements 🛠

Bug fixes 🐞

New Contributors

Full Changelog: v1.82.0...v1.83.0