Skip to content

fix(bundler): re-enable aws-ebs-csi-driver by default and support --set disable#397

Merged
mchmarny merged 4 commits intoNVIDIA:mainfrom
yuanchen8911:fix/ebs-csi-default-enabled
Mar 13, 2026
Merged

fix(bundler): re-enable aws-ebs-csi-driver by default and support --set disable#397
mchmarny merged 4 commits intoNVIDIA:mainfrom
yuanchen8911:fix/ebs-csi-default-enabled

Conversation

@yuanchen8911
Copy link
Copy Markdown
Contributor

Summary

Re-enable aws-ebs-csi-driver by default in EKS recipes and add --set override support for disabling components at bundle time.

Motivation / Context

PR #382 disabled aws-ebs-csi-driver by default assuming EKS clusters have it as a managed addon. However, EKS only includes coredns, kube-proxy, and vpc-cni by default — EBS CSI must be explicitly installed. This caused Prometheus PVC provisioning to fail on EKS clusters without the addon.

Related: #382

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)

Implementation Notes

  1. Remove overrides: enabled: false from aws-ebs-csi-driver in recipes/overlays/eks.yaml
  2. Add getSetEnabledOverride() in pkg/bundler/bundler.go — checks --set overrides for enabled key before the recipe-level IsEnabled() check. --set takes precedence.

Users can disable at bundle time: --set awsebscsidriver:enabled=false

Testing

go test -race ./pkg/bundler -count=1   # PASS
go test -race ./pkg/recipe/... -count=1 # PASS

Verified on EKS (aicr-cuj1, p5.48xlarge):

  • Default: EBS CSI included, Prometheus PVC bound
  • --set awsebscsidriver:enabled=false: component skipped
  • undeploy.sh only manages components it deployed

Risk Assessment

  • Low — Restores previous default behavior, adds opt-out mechanism

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

…et disable

PR NVIDIA#382 disabled aws-ebs-csi-driver by default in EKS recipes assuming
it was pre-installed as an EKS addon. However, EKS does not include it
by default — only coredns, kube-proxy, and vpc-cni are included. This
caused Prometheus PVC provisioning to fail on clusters without the addon.

Changes:
- Re-enable aws-ebs-csi-driver in the EKS overlay (remove overrides.enabled: false)
- Add getSetEnabledOverride() so --set awsebscsidriver:enabled=false
  works at bundle time for clusters that have it as an EKS addon
- --set overrides take precedence over recipe-level overrides

Refs: NVIDIA#382

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 yuanchen8911 force-pushed the fix/ebs-csi-default-enabled branch from 7574a3e to f2e3df8 Compare March 13, 2026 02:03
@github-actions github-actions bot added size/M and removed size/S labels Mar 13, 2026
mchmarny

This comment was marked as resolved.

Copy link
Copy Markdown
Member

@mchmarny mchmarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the Helm value leaking, LGTM

@mchmarny mchmarny assigned yuanchen8911 and unassigned xdu31 Mar 13, 2026
@lockwobr
Copy link
Copy Markdown
Contributor

This is blocking me, so I am going to push a fix to cover the review feedback so this can merge.

Address review feedback on PR NVIDIA#397:
- Strip "enabled" key from --set overrides before passing to Helm chart values to prevent unintended side effects in charts
- Add test verifying enabled key does not leak into generated values.yaml
- Add default happy path test case (recipe enabled + no --set => included)
- Document enabled override in --set CLI help tex
mchmarny
mchmarny previously approved these changes Mar 13, 2026
Address review feedback on PR NVIDIA#397:
- Strip "enabled" key from --set overrides before passing to Helm chart values to prevent unintended side effects in charts
- Add test verifying enabled key does not leak into generated values.yaml
- Add default happy path test case (recipe enabled + no --set => included)
- Document enabled override in --set CLI help tex
@mchmarny mchmarny enabled auto-merge (squash) March 13, 2026 20:39
@lockwobr lockwobr disabled auto-merge March 13, 2026 20:41
@mchmarny mchmarny merged commit 88af95b into NVIDIA:main Mar 13, 2026
46 checks passed
xdu31 pushed a commit to xdu31/aicr that referenced this pull request Mar 24, 2026
…et disable (NVIDIA#397)

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
Co-authored-by: Brian Lockwood <lockwobr@gmail.com>
Co-authored-by: Mark Chmarny <mchmarny@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants