Skip to content

Fix network names backward compatibility for A3 Mega and A3 High#3811

Merged
sharabiani merged 4 commits into
GoogleCloudPlatform:developfrom
sharabiani:gke-network-postfix
Mar 20, 2025
Merged

Fix network names backward compatibility for A3 Mega and A3 High#3811
sharabiani merged 4 commits into
GoogleCloudPlatform:developfrom
sharabiani:gke-network-postfix

Conversation

@sharabiani

Copy link
Copy Markdown
Collaborator

The GKE network names' convention for A3Mega used to be "{cluster_name}-gpunet-{i}-subnet". Therefor, to keep k8s_network_names variable backward compatible, we need to have a postfix for the network names as well.

This PRs adds gvnic_postfix and rdma_postfix to k8s_network_names.

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch #
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines #

@sharabiani sharabiani requested review from a team and samskillman as code owners March 18, 2025 05:02
@sharabiani sharabiani added the release-module-improvements Added to release notes under the "Module Improvements" heading. label Mar 18, 2025
@sharabiani sharabiani enabled auto-merge March 18, 2025 05:27
ighosh98
ighosh98 previously approved these changes Mar 19, 2025
@SwarnaBharathiMantena

SwarnaBharathiMantena commented Mar 19, 2025

Copy link
Copy Markdown
Contributor

I see that the a3 megagpu integration test failed on this PR, with the following error. Let us fix that.

Error: Invalid template interpolation value

  on modules/embedded/modules/scheduler/gke-cluster/main.tf line 396, in locals:
 396:     merge(net, { name = "${var.k8s_network_names.gvnic_prefix}${idx + var.k8s_network_names.gvnic_start_index}${var.k8s_network_names.gvnic_postfix}" })
    ├────────────────
    │ var.k8s_network_names.gvnic_postfix is null

The expression result is null. Cannot include a null value in a string
template.

Comment thread modules/scheduler/gke-cluster/variables.tf Outdated
Comment thread modules/scheduler/gke-cluster/variables.tf Outdated

@SwarnaBharathiMantena SwarnaBharathiMantena left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default for that single value can be passed to avoid the error.

mr0re1
mr0re1 previously approved these changes Mar 19, 2025
@sharabiani sharabiani dismissed stale reviews from mr0re1 and ighosh98 via 75d8bb3 March 20, 2025 08:30

@SwarnaBharathiMantena SwarnaBharathiMantena left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - tests passed successfully

@sharabiani sharabiani merged commit fd48fd8 into GoogleCloudPlatform:develop Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-module-improvements Added to release notes under the "Module Improvements" heading.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants