Support Customer-Managed Encryption Keys (CMEK) in Slurm GCP deployments #5407

Merged
saara-tyagi27 merged 7 commits into GoogleCloudPlatform:develop from saara-tyagi27:slurm-kms-support on Mar 27, 2026

@saara-tyagi27 saara-tyagi27 commented Mar 25, 2026

Summary

This PR adds comprehensive support for Customer-Managed Encryption Keys (CMEK) to Slurm GCP deployments, ensuring that all storage (GCS) and compute resources (disks) can be encrypted with a user-specified KMS key.

Changes

Core Implementation

  • Plumb KMS variables (disk_encryption_key, etc.) down from nodeset/login modules through instance_template wrappers.
  • Inject kms_key_self_link explicitly in the boot_disk override of google_compute_instance_from_template.
  • Add slurm_bucket_kms_key to the controller module to encrypt the generated Slurm configuration bucket.
  • Patch util.py to handle missing md5_hashes from CMEK-encrypted GCS configuration blobs by falling back to crc32c.
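
The hash fallback in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the actual util.py patch: `BlobMeta`, `blob_fingerprint`, and `config_changed` are hypothetical names, and `BlobMeta` is only a stand-in for a storage blob that exposes `md5_hash` and `crc32c` attributes. The final empty-string fallback follows the bot summary of this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlobMeta:
    """Hypothetical stand-in for a GCS blob's metadata (not the real client class)."""
    name: str
    md5_hash: Optional[str] = None  # may be absent for CMEK-encrypted blobs
    crc32c: Optional[str] = None


def blob_fingerprint(blob) -> str:
    """Return md5_hash when present, else crc32c, else an empty string."""
    return getattr(blob, "md5_hash", None) or getattr(blob, "crc32c", None) or ""


def config_changed(old_fingerprint: str, blob) -> bool:
    """Treat the config as changed when the stored fingerprint differs."""
    return old_fingerprint != blob_fingerprint(blob)
```

One consequence of this design is that a fingerprint recorded as an MD5 will not match a later CRC32C for the same content, so a cached config would be refetched once after encryption is turned on.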

Enhancements and Propagation

  • Propagate KMS variables through nodeset, partition, login, and controller modules to ensure full coverage.
  • Update internal_instance_template to utilize kms_key_self_link for disk encryption.
  • Add examples/hpc-slurm-kms.yaml blueprint demonstrating CMEK configuration for all components.
  • Update module and examples documentation to reflect CMEK support.
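
For readers preparing values for the KMS variables, here is a small sketch of building and checking a Cloud KMS key resource name of the form `projects/.../locations/.../keyRings/.../cryptoKeys/...`. The helper names are hypothetical, and whether each variable in this PR expects exactly this form is an assumption to check against the module READMEs.

```python
import re

# Cloud KMS CryptoKey resource name pattern.
_KMS_KEY_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build a CryptoKey resource name from its parts (hypothetical helper)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def parse_kms_key_name(name: str) -> dict:
    """Split a CryptoKey resource name into its parts, or raise ValueError."""
    match = _KMS_KEY_RE.match(name)
    if match is None:
        raise ValueError(f"not a CryptoKey resource name: {name!r}")
    return match.groupdict()
```

Checking the name before deployment catches a common mistake early, such as passing a key version or a bare key ID where a full key name is expected.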

Verification

The implementation has been verified by deploying a cluster with CMEK enabled and confirming that:

  1. The controller and login node boot disks are encrypted.
  2. The Slurm state GCS bucket is encrypted.
  3. Dynamically created compute nodes are encrypted.
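
The PR does not say how these checks were performed. One possible approach, sketched below under that caveat, inspects resource descriptions shaped like the Compute Engine disk resource (`diskEncryptionKey.kmsKeyName`) and the Cloud Storage bucket resource (`encryption.defaultKmsKeyName`) from the JSON APIs. The function names are hypothetical, and fetching the resources is left out.

```python
def disk_kms_key(disk: dict) -> str:
    """KMS key reported on a disk resource dict, or '' if none is set."""
    return disk.get("diskEncryptionKey", {}).get("kmsKeyName", "")


def bucket_kms_key(bucket: dict) -> str:
    """Default KMS key reported on a bucket resource dict, or '' if none is set."""
    return bucket.get("encryption", {}).get("defaultKmsKeyName", "")


def uses_key(reported: str, expected: str) -> bool:
    """True if `reported` is `expected` or one of its key versions.

    Assumption: a disk may report a specific key version
    (".../cryptoKeyVersions/N") rather than the bare key name.
    """
    return reported == expected or reported.startswith(
        expected + "/cryptoKeyVersions/"
    )
```

Running `uses_key(disk_kms_key(d), key)` over every controller, login, and compute node disk, plus `uses_key(bucket_kms_key(b), key)` for the bucket, would cover the three items above.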

cboneti and others added 2 commits March 25, 2026 18:28
@gemini-code-assist

Summary of Changes

This pull request significantly enhances the security posture of Slurm on Google Cloud deployments by integrating extensive support for Customer-Managed Encryption Keys (CMEK). Users can now encrypt critical components of their Slurm clusters, including all types of disks, source images, source snapshots, and the Slurm configuration bucket, using their own KMS keys. This provides greater control over data encryption, helping to meet stringent security and compliance requirements for High-Performance Computing (HPC) workloads.

Highlights

  • Customer-Managed Encryption Keys (CMEK) for Disks: Added comprehensive support for Customer-Managed Encryption Keys (CMEK) for boot disks and additional disks across various Slurm modules. This includes new variables such as disk_encryption_key and disk_encryption_key_service_account.
  • CMEK for Source Images and Snapshots: Introduced support for CMEK for source images and snapshots used to create disks, enabling encryption via source_image_encryption_key, source_image_encryption_key_service_account, source_snapshot_encryption_key, and source_snapshot_encryption_key_service_account variables.
  • CMEK for Slurm Configuration Bucket: Implemented a slurm_bucket_kms_key variable to allow encryption of the Slurm configuration bucket with Customer-Managed Encryption Keys.
  • Updated Documentation and Example: Updated README.md files for affected modules to reflect the new KMS encryption parameters and added a new example blueprint (hpc-slurm-kms.yaml) to demonstrate the usage of KMS encryption for Slurm deployments.
  • Utility Script Enhancement: Modified the util.py script to gracefully handle cases where md5_hash is missing for blobs (e.g., due to CMEK encryption), falling back to crc32c or an empty string for consistent hashing.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive support for Customer-Managed Encryption Keys (CMEK) across various disk configurations and the Slurm configuration bucket within the Slurm GCP modules. This includes adding new input variables for encryption keys and service accounts, and updating module logic to correctly pass these through to Google Cloud resources. A new example hpc-slurm-kms.yaml is also introduced to demonstrate CMEK usage.

Review feedback:

  • Critical: the dynamic "boot_disk" block in community/modules/internal/slurm-gcp/instance/main.tf uses an unsupported kms_key_self_link attribute, which will cause a Terraform error.
  • The local.state_disk definition in community/modules/scheduler/schedmd-slurm-gcp-v6-controller/controller.tf is missing the disk_encryption_key_service_account, which could lead to authentication errors.
  • The dynamic blocks for source_image_encryption_key and source_snapshot_encryption_key in community/modules/internal/slurm-gcp/internal_instance_template/main.tf are inconsistent in efficiency and fallback logic.
  • The new hpc-slurm-kms.yaml example needs to be registered in examples/README.md, as required by the repository style guide.

Comment thread community/modules/internal/slurm-gcp/instance/main.tf Outdated
Comment thread community/examples/hpc-slurm-kms.yaml
Comment thread community/modules/internal/slurm-gcp/internal_instance_template/main.tf Outdated
@saara-tyagi27 saara-tyagi27 changed the title from "Slurm kms encryption" to "Support Customer-Managed Encryption Keys (CMEK) in Slurm GCP deployments" Mar 27, 2026
@saara-tyagi27 saara-tyagi27 added the labels enhancement (New feature or request) and release-key-new-features (Added to release notes under the "Key New Features" heading) Mar 27, 2026
@saara-tyagi27 saara-tyagi27 marked this pull request as ready for review March 27, 2026 08:47
@saara-tyagi27 saara-tyagi27 requested review from a team and samskillman as code owners March 27, 2026 08:47
Comment thread examples/README.md Outdated
@saara-tyagi27 saara-tyagi27 merged commit 178e272 into GoogleCloudPlatform:develop Mar 27, 2026
14 of 70 checks passed

3 participants