Support Customer-Managed Encryption Keys (CMEK) in Slurm GCP deployments #5407
- Plumb KMS variables (disk_encryption_key, etc.) down from nodeset/login modules through instance_template wrappers.
- Inject kms_key_self_link explicitly in the boot_disk override of google_compute_instance_from_template.
- Add slurm_bucket_kms_key to the controller module to encrypt the generated Slurm configuration bucket.
- Patch util.py to handle missing md5_hashes from CMEK-encrypted GCS configuration blobs by falling back to crc32c.
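The crc32c fallback described for util.py can be illustrated with a minimal sketch. This is not the code from the PR: `BlobInfo` and `blob_fingerprint` are hypothetical names, and `BlobInfo` is only a stand-in for whatever blob metadata object util.py actually reads. It assumes, as the description states, that a CMEK-encrypted configuration blob may arrive without an MD5 hash while still exposing a CRC32C checksum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlobInfo:
    """Hypothetical stand-in for GCS blob metadata (not the real client class)."""
    name: str
    md5_hash: Optional[str] = None  # assumed to be missing on CMEK-encrypted blobs
    crc32c: Optional[str] = None


def blob_fingerprint(blob: BlobInfo) -> str:
    """Return a change-detection fingerprint, preferring MD5 and falling back to CRC32C."""
    if blob.md5_hash:
        return f"md5:{blob.md5_hash}"
    if blob.crc32c:
        # Fallback path: no MD5 available, so use the CRC32C checksum instead.
        return f"crc32c:{blob.crc32c}"
    raise ValueError(f"blob {blob.name!r} has neither md5_hash nor crc32c")
```

Prefixing the fingerprint with the algorithm name keeps an MD5 value from ever being compared against a CRC32C value if a blob's encryption settings change between reads.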
Summary of Changes (from Gemini Code Assist)
This pull request adds support for Customer-Managed Encryption Keys (CMEK) to Slurm on Google Cloud deployments. Users can now encrypt critical components of their Slurm clusters, including all types of disks, source images, source snapshots, and the Slurm configuration bucket, using their own KMS keys. This gives users greater control over data encryption and helps meet stringent security and compliance requirements for High-Performance Computing (HPC) workloads.
Code Review
This pull request introduces comprehensive support for Customer-Managed Encryption Keys (CMEK) across various disk configurations and the Slurm configuration bucket within the Slurm GCP modules. This includes adding new input variables for encryption keys and service accounts, and updating module logic to pass these through to Google Cloud resources. A new example, hpc-slurm-kms.yaml, is also introduced to demonstrate CMEK usage.
Review feedback:
- Critical: the dynamic "boot_disk" block in community/modules/internal/slurm-gcp/instance/main.tf uses an unsupported kms_key_self_link attribute, which will cause a Terraform error.
- The local.state_disk definition in community/modules/scheduler/schedmd-slurm-gcp-v6-controller/controller.tf is missing disk_encryption_key_service_account, which could lead to authentication errors.
- The dynamic blocks for source_image_encryption_key and source_snapshot_encryption_key in community/modules/internal/slurm-gcp/internal_instance_template/main.tf are inconsistent with each other in efficiency and fallback logic.
- The new hpc-slurm-kms.yaml example needs to be registered in examples/README.md, as required by the repository style guide.
…y service account to controller disk
178e272
into
GoogleCloudPlatform:develop
Summary
This PR adds comprehensive support for Customer-Managed Encryption Keys (CMEK) to Slurm GCP deployments, ensuring that all storage (GCS) and compute resources (disks) can be encrypted with a user-specified KMS key.
Changes
Core Implementation
- Plumb KMS variables (disk_encryption_key, etc.) down from nodeset/login modules through instance_template wrappers.
- Inject kms_key_self_link explicitly in the boot_disk override of google_compute_instance_from_template.
- Add slurm_bucket_kms_key to the controller module to encrypt the generated Slurm configuration bucket.
- Patch util.py to handle missing md5_hashes from CMEK-encrypted GCS configuration blobs by falling back to crc32c.
Enhancements and Propagation
- … nodeset, partition, login, and controller modules to ensure full coverage.
- … internal_instance_template to utilize kms_key_self_link for disk encryption.
Verification
The implementation has been verified by deploying a cluster with CMEK enabled and confirming that: