Integrate storage profile in GCSFuse #5476

Merged
parulbajaj01 merged 10 commits into GoogleCloudPlatform:develop from parulbajaj01:parul/dev
May 4, 2026

Conversation

@parulbajaj01 parulbajaj01 commented Apr 9, 2026

Contributor

This PR integrates GCS Fuse Storage Profiles directly into the GKE persistent volume modules. We are moving away from the older method of manually configuring GCS mounts on the node host (via pre-existing-network-storage) and switching to using Kubernetes-native StorageClasses like gcsfusecsi-training.

What changed?

Module Updates: Added the gcsfuse_storage_class_name setting to the gke-persistent-volume module. The module maps this input into the generated PV and PVC specs.

IAM Security: Instead of trying to force Terraform to create global custom roles (which needs administrative permissions we don't want to grant automation runners), both the gke.gcsfuse.profileUser role creation and service account bindings are kept manual. I've added straightforward instructions in the module's README.

Pre-flight Checks: Added Go validations that check whether permissions are set up correctly before provisioning, to avoid messy deployment failures.
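As a rough illustration of the pre-flight idea (not the code added in pkg/validators/cloud.go), the sketch below checks an already-fetched IAM policy for the expected binding and fails early with an actionable message. The `Binding` type, the `checkRoleBinding` function, the project ID, the service account, and the project-level role path are all hypothetical names chosen for this example.

```go
package main

import "fmt"

// Binding is a simplified, hypothetical view of one IAM policy binding:
// a role name and the members that hold it.
type Binding struct {
	Role    string
	Members []string
}

// checkRoleBinding returns nil when member is bound to role in the given
// bindings, and a descriptive error otherwise. It only inspects data that
// was fetched beforehand; it makes no API calls itself.
func checkRoleBinding(bindings []Binding, role, member string) error {
	for _, b := range bindings {
		if b.Role != role {
			continue
		}
		for _, m := range b.Members {
			if m == member {
				return nil
			}
		}
	}
	return fmt.Errorf("member %q is not bound to role %q; create the binding manually before deploying", member, role)
}

func main() {
	// Assumed values for illustration only.
	role := "projects/example-project/roles/gke.gcsfuse.profileUser"
	member := "serviceAccount:example-sa@example-project.iam.gserviceaccount.com"

	policy := []Binding{
		{Role: "roles/storage.objectViewer", Members: []string{member}},
	}

	if err := checkRoleBinding(policy, role, member); err != nil {
		fmt.Println("pre-flight check failed:", err)
		return
	}
	fmt.Println("pre-flight check passed")
}
```

Keeping the check a pure function over the policy data makes it easy to unit test without cloud credentials; fetching the policy would be a separate step.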

Testing
Everything builds successfully locally.

If a user updates their blueprint and runs gcluster deploy -w on an existing cluster that previously used the mount options config, they may need to delete the PVs and PVCs manually. The PVCs were originally created with an empty storageClassName: "" (or defaulting to standard), and the update now tries to change that field to gcsfusecsi-checkpointing or gcsfusecsi-training. Kubernetes rejects the patch because it does not allow changing the storage class of an existing, bound PVC.
To apply the new storage profiles, the user needs to delete the old PVCs and PVs so that Terraform can recreate them from scratch with the new settings. Because the data is stored externally in GCS buckets, deleting the Kubernetes volume objects does not delete the actual data in GCS.
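The migration step above can be sketched as a small planning helper: given the existing claims and the storage class each one should have after the update, it lists the claims whose storage class would change and therefore need to be deleted and recreated. The `Claim` type, the `needsRecreate` function, and all PV/PVC names are hypothetical; the printed `kubectl delete` lines are only a suggestion of what a user might run by hand.

```go
package main

import "fmt"

// Claim is a simplified, hypothetical view of a bound PVC and its PV.
type Claim struct {
	Namespace    string
	Name         string // PVC name
	VolumeName   string // name of the bound PV
	StorageClass string // current storageClassName ("" when unset)
}

// needsRecreate returns the claims whose current storage class differs from
// the desired one. desired is keyed by "namespace/name"; claims without an
// entry are left alone.
func needsRecreate(claims []Claim, desired map[string]string) []Claim {
	var out []Claim
	for _, c := range claims {
		want, ok := desired[c.Namespace+"/"+c.Name]
		if ok && want != c.StorageClass {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Example state: one claim created before the change, one already migrated.
	claims := []Claim{
		{Namespace: "default", Name: "training-pvc", VolumeName: "training-pv", StorageClass: ""},
		{Namespace: "default", Name: "checkpoint-pvc", VolumeName: "checkpoint-pv", StorageClass: "gcsfusecsi-checkpointing"},
	}
	desired := map[string]string{
		"default/training-pvc":   "gcsfusecsi-training",
		"default/checkpoint-pvc": "gcsfusecsi-checkpointing",
	}

	for _, c := range needsRecreate(claims, desired) {
		// Deleting these objects does not touch the data in the GCS bucket.
		fmt.Printf("kubectl delete pvc %s -n %s\n", c.Name, c.Namespace)
		fmt.Printf("kubectl delete pv %s\n", c.VolumeName)
	}
}
```

With the example data, only the claim that still has an empty storage class is selected; the already-migrated claim is skipped.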

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@parulbajaj01 parulbajaj01 requested review from a team and samskillman as code owners April 9, 2026 08:50
@parulbajaj01 parulbajaj01 added the release-key-new-features Added to release notes under the "Key New Features" heading. label Apr 9, 2026
@parulbajaj01 parulbajaj01 marked this pull request as draft April 9, 2026 08:51
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the GKE Cluster Toolkit by integrating GKE storage profiles with GCSFuse. The changes streamline the process of provisioning persistent volumes backed by Google Cloud Storage, allowing users to specify optimized storage classes for different workloads (e.g., training, checkpointing). This update simplifies configuration, improves performance management for GCS-backed storage, and includes automated permission handling for a smoother user experience.

Highlights

  • GCSFuse Storage Profile Integration: Integrated GKE storage profiles into GCSFuse, allowing for simplified configuration of GCS-backed persistent volumes using predefined storage classes like gcsfusecsi-training and gcsfusecsi-checkpointing.
  • Automated IAM Role Binding: Implemented automatic binding of the custom IAM role gke.gcsfuse.profileUser to the GKE service agent when a GCSFuse storage class name is provided, ensuring necessary permissions for GCSFuse operations.
  • Example Configuration Update: Updated the gke-a3-ultragpu example to leverage the new GCSFuse storage profiles, removing older manual mount option configurations and upgrading the GKE version prefix.
  • Documentation and Input Variable: Added comprehensive documentation for GCS Fuse Storage Profiles prerequisites, including gcloud and Terraform instructions for creating the required custom IAM role. A new input variable gcsfuse_storage_class_name was introduced with validation.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the gke-a3-ultragpu example to support GCS Fuse Storage Profiles, including a GKE version bump to 1.35 and the addition of gcsfuse_storage_class_name parameters. It also implements automated IAM role binding for the GKE service agent and updates the module documentation. Review feedback highlights critical issues where removing the pre-existing-network-storage modules breaks the gke-persistent-volume module's dependency on the network_storage variable. Other feedback points out a breaking change in mount option filtering, a potential Terraform error in the IAM resource count logic, and redundant variable assignments that violate the style guide.

Comment thread examples/gke-a3-ultragpu/gke-a3-ultragpu.yaml
Comment thread modules/file-system/gke-persistent-volume/main.tf Outdated
Comment thread modules/file-system/gke-persistent-volume/main.tf Outdated
Comment thread examples/gke-a3-ultragpu/gke-a3-ultragpu.yaml Outdated

@gargnitingoogle gargnitingoogle left a comment

Contributor

Minor comments other than whatever Gemini code assistant has commented. Otherwise LGTM. We should also confirm by testing these changes before going ahead with full-on review.

Comment thread modules/file-system/gke-persistent-volume/main.tf Outdated
Comment thread modules/file-system/gke-persistent-volume/main.tf Outdated
Comment thread modules/file-system/gke-persistent-volume/main.tf Outdated
@parulbajaj01 parulbajaj01 requested a review from bytetwin April 17, 2026 07:13
@parulbajaj01 parulbajaj01 marked this pull request as ready for review April 17, 2026 07:14
@parulbajaj01 parulbajaj01 added the release-breaking-changes Prevents "smooth" re-deploy across versions label Apr 21, 2026
Comment thread modules/file-system/gke-persistent-volume/main.tf Outdated
Comment thread pkg/validators/cloud.go Outdated
Comment thread pkg/validators/cloud.go Outdated
Comment thread pkg/validators/cloud.go Outdated
@parulbajaj01 parulbajaj01 merged commit 97a464e into GoogleCloudPlatform:develop May 4, 2026
17 of 76 checks passed
Labels

release-breaking-changes Prevents "smooth" re-deploy across versions
release-key-new-features Added to release notes under the "Key New Features" heading.

3 participants