Compress the H4D blueprint with multivpc and vpc module update #5133

Merged
SwarnaBharathiMantena merged 7 commits into GoogleCloudPlatform:develop from SwarnaBharathiMantena:swarnabm/multivpc_for_rdma
Jan 27, 2026
@SwarnaBharathiMantena SwarnaBharathiMantena commented Jan 22, 2026

Highlight

The blueprint drops 30 lines and gains only 5!

Details

Several GKE blueprints spend 30 lines (15 each) declaring additional networks on the cluster and on the node pool.

The blueprints can be made concise by updating the multivpc and vpc modules to support additional variables. Then the multivpc module can be used for setting up the RDMA network in the GKE H4D blueprint.

This PR updates only the GKE H4D blueprint. Other blueprints can be updated in a follow-up PR.

Verification

  1. Cluster description. Running gcloud container clusters describe h4d-swarnabm-05 --region=us-central1 --project=PROJECT_ID shows the correct RDMA network name under additionalNodeNetworkConfigs:
  networkConfig:
    additionalNodeNetworkConfigs:
    - network: h4d-swarnabm-05-rdma-net-0
      subnetwork: h4d-swarnabm-05-rdma-net-0-subnet
    enablePrivateNodes: true
    networkTierConfig:
      networkTier: NETWORK_TIER_DEFAULT
    podIpv4CidrBlock: 10.64.0.0/19
    podIpv4RangeUtilization: 0.0938
    podRange: pods
    subnetwork: projects/hpc-toolkit-dev/regions/us-central1/subnetworks/h4d-swarnabm-05-sub
  2. Node pool description. The output shows the correct RDMA network name under additionalNodeNetworkConfigs.
    Command:
gcloud container node-pools describe h4d-highmem-192-lssd-h4d-pool \
    --cluster=h4d-swarnabm-05 \
    --location=us-central1 \
    --project=PROJECT_ID

Config snippet:

  networkConfig:
    additionalNodeNetworkConfigs:
    - network: h4d-swarnabm-05-rdma-net-0
      subnetwork: h4d-swarnabm-05-rdma-net-0-subnet
    enablePrivateNodes: true
    networkTierConfig:
      networkTier: NETWORK_TIER_DEFAULT
    podIpv4CidrBlock: 10.64.0.0/19
    podIpv4RangeUtilization: 0.0938
    podRange: pods
    subnetwork: projects/hpc-toolkit-dev/regions/us-central1/subnetworks/h4d-swarnabm-05-sub
  3. VM description
gcloud compute instances describe gke-h4d-swarnabm-05-h4d-highmem-192-l-c1268c1a-6z9z \
    --project=hpc-toolkit-dev \
    --zone=us-central1-b \
    --format="yaml(networkInterfaces)"

The output includes a nicType: IRDMA entry.
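
The three manual checks above could be scripted. The following is a minimal sketch, not part of this PR: it assumes the describe commands are run with --format=json instead of YAML and that the resulting JSON is loaded into a dict. The helper names rdma_networks and has_irdma_nic are hypothetical, and the VM sample is illustrative rather than real output.

```python
import json


def rdma_networks(node_pool: dict) -> list[str]:
    """Return the additional node network names from a node pool description."""
    configs = node_pool.get("networkConfig", {}).get("additionalNodeNetworkConfigs", [])
    return [cfg["network"] for cfg in configs]


def has_irdma_nic(instance: dict) -> bool:
    """Return True if any network interface on the VM reports nicType IRDMA."""
    return any(nic.get("nicType") == "IRDMA" for nic in instance.get("networkInterfaces", []))


# Trimmed from the node pool output shown in the verification steps.
node_pool = json.loads("""
{
  "networkConfig": {
    "additionalNodeNetworkConfigs": [
      {"network": "h4d-swarnabm-05-rdma-net-0",
       "subnetwork": "h4d-swarnabm-05-rdma-net-0-subnet"}
    ],
    "enablePrivateNodes": true,
    "podRange": "pods"
  }
}
""")

# Hypothetical VM description: only the nicType: IRDMA entry is confirmed by the PR.
instance = {"networkInterfaces": [{"name": "nic0"}, {"name": "nic1", "nicType": "IRDMA"}]}

networks = rdma_networks(node_pool)
irdma_present = has_irdma_nic(instance)
print(networks, irdma_present)
```

In practice the two dicts would come from json.load on the saved command output, so the same assertions can run against any deployment name.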

Explanation of the CIDR ranges

  1. The Primary GKE Network (gke-h4d-net)
    The primary network uses the 10.10.x.x and 10.11.x.x blocks. This keeps the node, pod, and service ranges distinct.
    Primary Subnet (Nodes/Control Plane): 10.10.0.0/20 (4,096 IPs)
    Secondary Range (Pods): 10.11.0.0/16 (65,536 IPs)
    Why: Pods are the hungriest for IPs. A /16 leaves ample headroom as the cluster autoscales.
    Secondary Range (Services): 10.10.16.0/20 (4,096 IPs)
    Why: Services (ClusterIPs) rarely need as much space as Pods.

  2. The RDMA/Falcon Network (gke-h4d-rdma-net)
    The RDMA network uses the 10.20.x.x block. This creates a "mental gap" between standard traffic and high-performance backend traffic.
    Global Range: 10.20.0.0/16
    Subnetwork Suffix: 24
    Why: If you scale to 8 or 16 RDMA networks (common in large GPU clusters), each network gets a full /24 (256 IPs), and the /16 has room for up to 256 such subnets. This aligns with hardware rack boundaries.

  3. The Firewall Refinement
    Use a specific super-block, 10.10.0.0/15 (10.10.0.0 through 10.11.255.255), which covers nodes, pods, and services.
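
The arithmetic behind this plan can be checked with Python's standard ipaddress module. This is a sketch for illustration only, using the ranges listed above; the variable names are not taken from the blueprint.

```python
import ipaddress

# Ranges as stated in the CIDR explanation.
primary = {
    "nodes": ipaddress.ip_network("10.10.0.0/20"),
    "pods": ipaddress.ip_network("10.11.0.0/16"),
    "services": ipaddress.ip_network("10.10.16.0/20"),
}
rdma_global = ipaddress.ip_network("10.20.0.0/16")
firewall = ipaddress.ip_network("10.10.0.0/15")

# Address counts for each primary range.
sizes = {name: net.num_addresses for name, net in primary.items()}

# Any pair of primary ranges that overlap (expected: none).
names = list(primary)
overlaps = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if primary[a].overlaps(primary[b])
]

# The firewall super-block should contain every primary range
# and should not touch the RDMA block.
covered = all(net.subnet_of(firewall) for net in primary.values())
rdma_isolated = not rdma_global.overlaps(firewall)

# Carve the RDMA global range into /24 subnets, one per RDMA network.
rdma_subnets = list(rdma_global.subnets(new_prefix=24))

print(sizes, overlaps, covered, rdma_isolated, len(rdma_subnets), rdma_subnets[0])
```

The check confirms that 10.10.0.0/15 is the smallest single block covering all three primary ranges without reaching into 10.20.0.0/16.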

@SwarnaBharathiMantena added the release-module-improvements (Added to release notes under the "Module Improvements" heading) and release-improvements (Added to release notes under the "Improvements" heading) labels on Jan 22, 2026
@gemini-code-assist

Summary of Changes

Hello @SwarnaBharathiMantena, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request streamlines the network configuration within the GKE H4D blueprint by migrating the RDMA network setup to an enhanced multivpc module. This change significantly reduces boilerplate code, introduces a more structured IP addressing scheme, and provides greater flexibility in managing network features like Cloud Routers, NATs, and private access through new module variables. The overall impact is a more concise, maintainable, and scalable network definition for GKE deployments.

Highlights

  • GKE H4D Blueprint Network Refactoring: The gke-h4d.yaml blueprint has been updated to use the multivpc module for RDMA network configuration, replacing a more verbose manual setup.
  • Updated IP Addressing Scheme: The primary GKE network's CIDR ranges for subnets, pods, and services have been revised to a 10.10.x.x block, and firewall rules updated to reflect these new ranges, including a 10.20.x.x block for RDMA.
  • Multivpc Module Enhancements: The multivpc module now supports additional variables like enable_cloud_router, enable_cloud_nat, and subnetwork_private_access, and its network_count validation has been relaxed to allow single VPC deployments.
  • VPC Module Flexibility: The base vpc module has been updated to accept a subnetwork_private_access variable, making its configuration more dynamic.
  • Simplified GKE Cluster/Nodepool Configuration: The explicit additional_networks blocks for RDMA in the GKE cluster and nodepool definitions have been removed, as the multivpc module now handles this integration more efficiently.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively compresses the GKE H4D blueprint by updating the multivpc and vpc modules to simplify network configuration. Using the multivpc module for the RDMA network is a good change that significantly improves the blueprint's conciseness by removing repetitive additional_networks blocks. The module updates are well-executed, and the new variables are properly documented. I have one suggestion regarding code cleanup and documentation consistency. Overall, this is a valuable improvement for the project's maintainability.

@SwarnaBharathiMantena SwarnaBharathiMantena merged commit 297bd66 into GoogleCloudPlatform:develop Jan 27, 2026
12 of 75 checks passed
AdarshK15 pushed a commit to AdarshK15/cluster-toolkit that referenced this pull request Jan 29, 2026