helm_install module implemented #3933

Merged
ighosh98 merged 4 commits into GoogleCloudPlatform:develop from ighosh98:a4x on Apr 12, 2025
@ighosh98 ighosh98 commented Apr 12, 2025

Submission Checklist

  • Uses the helm_install module to install gpu_operator and nvidia_dra_driver. This is the recommended approach over kubectl.
  • Defines an extensible Helm module that can handle Helm dependency installation.
  • Tested with the nvidia-dra-driver Helm chart by provisioning a cluster that uses it; the tests passed.

Callout

  • This is a first-draft setup. We will evaluate making it more modular and cleaner in future iterations.
  • The gpu_operator Helm chart installation is flaky and, on some runs for newer machine families, may fail cluster provisioning with the following errors:
Warning: Helm release "" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.

  with module.workload-manager-install.module.install_gpu_operator_namespace[0].helm_release.apply_chart,
  on modules/embedded/modules/management/kubectl-apply/helm_install/main.tf line 15, in resource "helm_release" "apply_chart":
  15: resource "helm_release" "apply_chart" {


Error: context deadline exceeded

  with module.workload-manager-install.module.install_gpu_operator_namespace[0].helm_release.apply_chart,
  on modules/embedded/modules/management/kubectl-apply/helm_install/main.tf line 15, in resource "helm_release" "apply_chart":
  15: resource "helm_release" "apply_chart" {

Error: exit status 1

Error: context deadline exceeded

  with module.workload-manager-install.module.install_gpu_operator_namespace[0].helm_release.apply_chart,
  on modules/embedded/modules/management/kubectl-apply/helm_install/main.tf line 15, in resource "helm_release" "apply_chart":
  15: resource "helm_release" "apply_chart" {
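
A `context deadline exceeded` error like the ones above typically means the Helm operation did not finish within the release timeout. As a sketch of one possible mitigation (not part of this PR), the Terraform Helm provider's `helm_release` resource accepts `wait`, `timeout`, and `atomic` arguments. The release name, namespace, chart repository URL, and timeout value below are illustrative assumptions, not values taken from the helm_install module:

```hcl
# Hypothetical sketch: every value here is an assumption, not the module's actual configuration.
resource "helm_release" "apply_chart" {
  name             = "gpu-operator"                       # assumed release name
  namespace        = "gpu-operator"                       # assumed namespace
  create_namespace = true
  repository       = "https://helm.ngc.nvidia.com/nvidia" # assumed chart repository
  chart            = "gpu-operator"

  wait    = true # block until the release's resources report ready
  timeout = 900  # seconds; the provider default is 300
  atomic  = true # purge the release if the install fails
}
```

Whether a longer timeout resolves the flakiness on newer machine families is untested; it only gives slow rollouts more time before Terraform reports a failure.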

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@ighosh98 ighosh98 requested review from a team and samskillman as code owners April 12, 2025 07:09
@ighosh98 ighosh98 changed the title helm module implemented helm_install module implemented Apr 12, 2025
@ighosh98 ighosh98 added the release-key-new-features label (Added to release notes under the "Key New Features" heading.) Apr 12, 2025
@ighosh98

@mohitchaurasia91 One test is failing because of capacity issues. Could you please take a look and approve the PR if the changes look fine?

@mohitchaurasia91 mohitchaurasia91 left a comment

Added a few comments, PTAL; otherwise LGTM.

Comment thread modules/management/kubectl-apply/kubectl/versions.tf
Comment thread modules/management/kubectl-apply/main.tf
Comment thread modules/management/kubectl-apply/main.tf
Comment thread modules/management/kubectl-apply/main.tf Outdated
Comment thread modules/management/kubectl-apply/main.tf Outdated
@ighosh98 ighosh98 merged commit 6599507 into GoogleCloudPlatform:develop Apr 12, 2025
@ighosh98 ighosh98 deleted the a4x branch April 12, 2025 18:53