Compress the H4D blueprint with multivpc and vpc module update #5133
Summary of Changes
This pull request streamlines the network configuration within the GKE H4D blueprint by migrating the RDMA network setup to an enhanced…
Code Review
This pull request effectively compresses the GKE H4D blueprint by updating the multivpc and vpc modules to simplify network configuration. Using the multivpc module for the RDMA network is a good change that significantly improves the blueprint's conciseness by removing repetitive additional_networks blocks. The module updates are well-executed, and the new variables are properly documented. I have one suggestion regarding code cleanup and documentation consistency. Overall, this is a valuable improvement for the project's maintainability.
Merged commit 297bd66 into GoogleCloudPlatform:develop
Highlight
-30 lines in the blueprint, and only +5 lines!
Details
Multiple GKE blueprints use 30 lines (15 each) to add additional networks to the cluster and to the node pool.
The blueprints can be made concise by updating the multivpc and vpc modules to support additional variables. Then the multivpc module can be used for setting up the RDMA network in the GKE H4D blueprint.
This PR updates only the GKE H4D blueprint. Other blueprints can be updated in a follow-up PR.
Verification
Running
gcloud container clusters describe h4d-swarnabm-05 --region=us-central1 --project=PROJECT_ID
results in the additionalNodeNetworkConfigs displaying the correct RDMA network name.
Command:
Config snippet:
The description displayed includes a nicType: IRDMA entry.
Explanation of the CIDR ranges
The Primary GKE Network (gke-h4d-net)
We will use the 10.10.x.x and 10.11.x.x blocks. This keeps your management, pods, and services distinct.
Primary Subnet (Nodes/Control Plane): 10.10.0.0/20 (4,096 IPs)
Secondary Range (Pods): 10.11.0.0/16 (65,536 IPs)
Why: Pods are the hungriest for IPs. A /16 leaves ample headroom as you autoscale.
Secondary Range (Services): 10.10.16.0/20 (4,096 IPs)
Why: Services (ClusterIPs) rarely need as much space as Pods.
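The address counts quoted for the primary network can be checked with a minimal sketch using Python's standard ipaddress module. The three CIDR ranges are taken from the list above; the dictionary keys are illustrative labels only, not names used by the blueprint.

```python
import ipaddress

# Ranges copied from the description above; the keys are illustrative labels.
ranges = {
    "nodes": ipaddress.ip_network("10.10.0.0/20"),
    "pods": ipaddress.ip_network("10.11.0.0/16"),
    "services": ipaddress.ip_network("10.10.16.0/20"),
}

# Report how many addresses each range contains.
for name, net in ranges.items():
    print(f"{name}: {net} -> {net.num_addresses} addresses")

# Collect every pair of ranges that overlap; the list should be empty.
names = list(ranges)
overlaps = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if ranges[a].overlaps(ranges[b])
]
print("overlapping pairs:", overlaps)
```

The node and service ranges are adjacent /20 blocks inside 10.10.x.x, so the overlap check is the part worth running whenever either range is resized.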
The RDMA/Falcon Network (gke-h4d-rdma-net)
We will use the 10.20.x.x block. This creates a "mental gap" between standard traffic and high-performance backend traffic.
Global Range: 10.20.0.0/16
Subnetwork Suffix: 24
Why: If you scale to 8 or 16 RDMA networks (common in large GPU clusters), each network gets a full /24 (256 IPs). This aligns perfectly with hardware rack boundaries.
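To illustrate how a /16 global range divides into /24 subnetworks, here is a small sketch using Python's ipaddress module. It assumes the subnets are allocated sequentially from the start of the global range, which the description does not state, and network_count is a hypothetical value chosen for the example.

```python
import ipaddress

global_range = ipaddress.ip_network("10.20.0.0/16")
subnet_suffix = 24   # subnetwork suffix from the description
network_count = 8    # hypothetical number of RDMA networks

# Enumerate every /24 that fits in the global range.
rdma_subnets = list(global_range.subnets(new_prefix=subnet_suffix))

# Show the first few, assuming sequential allocation (an assumption).
for index, subnet in enumerate(rdma_subnets[:network_count]):
    print(index, subnet, subnet.num_addresses)

# Total number of /24 subnets available in the /16.
capacity = len(rdma_subnets)
print("available /24 subnets:", capacity)
```

Under these assumptions, 8 or 16 RDMA networks would use only a small fraction of the /24 subnets available in the /16.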
The Firewall Refinement
Use a specific super-block: 10.10.0.0/15 (10.10.0.0 through 10.11.255.255), which covers nodes, pods, and services.
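The super-block claim can be confirmed with a short sketch, again using Python's ipaddress module. All ranges come from the description; the variable names are illustrative.

```python
import ipaddress

super_block = ipaddress.ip_network("10.10.0.0/15")

# Node, pod, and service ranges from the primary GKE network.
primary_ranges = [
    ipaddress.ip_network("10.10.0.0/20"),
    ipaddress.ip_network("10.11.0.0/16"),
    ipaddress.ip_network("10.10.16.0/20"),
]
rdma_range = ipaddress.ip_network("10.20.0.0/16")

# Every primary range should sit inside the super-block.
covered = all(r.subnet_of(super_block) for r in primary_ranges)

# The RDMA range should stay outside it.
rdma_inside = rdma_range.subnet_of(super_block)

print("first/last address:", super_block[0], super_block[-1])
print("primary ranges covered:", covered)
print("RDMA range inside super-block:", rdma_inside)
```

The check shows that the /15 spans exactly the two /16 blocks used by the primary network and excludes the RDMA range.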