Integrate storage profile in GCSFuse#5476
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the GKE Cluster Toolkit by integrating GKE storage profiles with GCSFuse. The changes streamline the process of provisioning persistent volumes backed by Google Cloud Storage, allowing users to specify optimized storage classes for different workloads (e.g., training, checkpointing). This update simplifies configuration, improves performance management for GCS-backed storage, and includes automated permission handling for a smoother user experience. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request updates the gke-a3-ultragpu example to support GCS Fuse Storage Profiles, including a GKE version bump to 1.35 and the addition of gcsfuse_storage_class_name parameters. It also implements automated IAM role binding for the GKE service agent and updates the module documentation. Review feedback highlights critical issues where removing the pre-existing-network-storage modules breaks the gke-persistent-volume module's dependency on the network_storage variable. Other feedback points out a breaking change in mount option filtering, a potential Terraform error in the IAM resource count logic, and redundant variable assignments that violate the style guide.
gargnitingoogle
left a comment
There was a problem hiding this comment.
Minor comments other than whatever Gemini code assistant has commented. Otherwise LGTM. We should also confirm by testing these changes before going ahead with full-on review.
97a464e
into
GoogleCloudPlatform:develop
This PR integrates GCS Fuse Storage Profiles directly into the GKE persistent volume modules. We are moving away from the older method of manually configuring GCS mounts on the node host (via pre-existing-network-storage) and switching to using Kubernetes-native StorageClasses like gcsfusecsi-training.
What changed?
Module Updates: Added the
gcsfuse_storage_class_namesetting into the gke-persistent-volume module. It automatically maps inputs into the PV/PVC specs cleanly.IAM Security: Instead of trying to force Terraform to create global custom roles (which needs administrative permissions we don't want to grant automation runners), both the
gke.gcsfuse.profileUserrole creation and service account bindings are kept manual. I've added straightforward instructions in the module's README.Pre-flight Checks: Adde Go validations that check if permissions are set up correctly before provisioning to avoid messy deployment failures.
Testing
Everything builds successfully locally.
If a user updates their blueprint and runs
gcluster deploy -won an existing cluster which earlier had the mount options config, the only thing in this case that the user might need to do is deleting the PV's and PVC's manually because the first time, the PVCs were created with an empty storageClassName: "" (or defaulting to standard). Now, the update will try to change that field togcsfusecsi-checkpointingandgcsfusecsi-training. Kubernetes rejects the patch because it does not allow changing the storage class of an existing, bound PVC.To apply the new storage profiles, user will need to delete the old PVCs and PVs so that Terraform can recreate them from scratch with the new settings. Since the data is stored externally in GCS buckets, deleting the Kubernetes volume objects will not delete the actual data in GCS.
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.