fix: rename prometheus component to kube-prometheus-stack #3

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:fix/kube-prometheus-stack-naming
Feb 2, 2026
@yuanchen8911
Contributor

Summary

  • Rename component from prometheus to kube-prometheus-stack to match the Helm chart name
  • Ensures values are correctly passed to the sub-chart in umbrella chart deployments

Problem

The component was named prometheus but the Helm chart is kube-prometheus-stack. When generating umbrella charts, the Chart.yaml dependency uses the actual chart name (kube-prometheus-stack), but values were keyed under prometheus:. This mismatch caused Helm values (like fullnameOverride) to not be passed to the sub-chart.
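
A minimal sketch of the scoping rule behind this mismatch, not the project's code: Helm hands a sub-chart only the block of parent values keyed by the dependency's alias, or by its name when no alias is set. The helper `subchart_values`, the dictionaries standing in for Chart.yaml and values.yaml, and the `monitoring` value are illustrative assumptions; `global` values are ignored here for brevity.

```python
# Illustrative stand-ins for Chart.yaml and values.yaml; not project code.

def subchart_values(parent_values: dict, dependency: dict) -> dict:
    """Return the values a sub-chart receives from its parent chart.

    The block is looked up by the dependency's alias if one is set,
    otherwise by its name. Values under any other key are not passed down.
    """
    key = dependency.get("alias") or dependency["name"]
    return parent_values.get(key, {})


# Dependency entry as it would appear in the umbrella chart's Chart.yaml.
dependency = {"name": "kube-prometheus-stack"}

# Before the fix: values keyed under the old component name.
old_values = {"prometheus": {"fullnameOverride": "monitoring"}}

# After the fix: values keyed under the chart name.
new_values = {"kube-prometheus-stack": {"fullnameOverride": "monitoring"}}

print(subchart_values(old_values, dependency))  # {}
print(subchart_values(new_values, dependency))  # {'fullnameOverride': 'monitoring'}
```

With the old key the sub-chart sees an empty block, which is why `fullnameOverride` had no effect.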

Changes

  • pkg/recipe/data/registry.yaml: Rename component from prometheus to kube-prometheus-stack
  • pkg/recipe/data/components/prometheus/ → pkg/recipe/data/components/kube-prometheus-stack/
  • pkg/recipe/data/overlays/base.yaml: Update component name and valuesFile path
  • pkg/recipe/data/overlays/monitoring-hpa.yaml: Update dependencyRef
  • Keep prometheus in valueOverrideKeys for backwards compatibility with --set prometheus:key=value
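
The backwards-compatibility item can be sketched roughly as follows. This is an assumption about how an alias list such as valueOverrideKeys might be consulted, not the repository's implementation: the table `VALUE_OVERRIDE_KEYS`, the function `resolve_override`, and the error messages are all hypothetical.

```python
# Hypothetical alias table: each component lists the prefixes accepted in --set.
VALUE_OVERRIDE_KEYS = {
    "kube-prometheus-stack": ["kube-prometheus-stack", "prometheus"],
}


def resolve_override(flag: str):
    """Split 'prefix:key=value' and map the prefix to its component name.

    Returns a (component, key, value) tuple, so an override written with the
    legacy 'prometheus' prefix lands on the renamed component.
    """
    target, eq, value = flag.partition("=")
    prefix, colon, key = target.partition(":")
    if not eq or not colon or not key:
        raise ValueError(f"expected component:key=value, got {flag!r}")
    for component, prefixes in VALUE_OVERRIDE_KEYS.items():
        if prefix in prefixes:
            return component, key, value
    raise ValueError(f"unknown component prefix {prefix!r}")


# The legacy prefix and the new name resolve to the same component.
print(resolve_override("prometheus:fullnameOverride=monitoring"))
print(resolve_override("kube-prometheus-stack:fullnameOverride=monitoring"))
```

Keeping the old prefix as an alias means existing scripts that pass `--set prometheus:key=value` keep working while generated values are keyed under the chart name.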

Test plan

  • Generate a bundle and verify Chart.yaml dependency name matches values.yaml key
  • Deploy with helm install and verify kube-prometheus-stack values are applied correctly
  • Verify --set prometheus:key=value still works for backwards compatibility
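
The first test-plan item could be automated along these lines. This is a sketch under assumptions: `check_bundle` is a hypothetical helper, it takes Chart.yaml and values.yaml already parsed into dictionaries, and it assumes the umbrella chart has no top-level values of its own apart from `global`.

```python
def check_bundle(chart: dict, values: dict) -> list:
    """Return mismatches between Chart.yaml dependencies and values.yaml keys.

    Any top-level values key (other than 'global') that matches no dependency
    name or alias is reported, since a sub-chart would never receive it.
    """
    expected = {
        dep.get("alias") or dep["name"] for dep in chart.get("dependencies", [])
    }
    problems = []
    for key in values:
        if key != "global" and key not in expected:
            problems.append(f"values key {key!r} matches no dependency")
    return problems


# Parsed stand-ins for the generated bundle files (illustrative content).
chart = {"dependencies": [{"name": "kube-prometheus-stack"}]}

print(check_bundle(chart, {"prometheus": {"fullnameOverride": "monitoring"}}))
print(check_bundle(chart, {"kube-prometheus-stack": {"fullnameOverride": "monitoring"}}))
```

The first call reports the orphaned `prometheus` key and the second returns an empty list. In practice the two dictionaries would come from loading the generated files with a YAML parser.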

🤖 Generated with Claude Code

@yuanchen8911 yuanchen8911 requested a review from mchmarny January 31, 2026 00:11
@yuanchen8911 yuanchen8911 force-pushed the fix/kube-prometheus-stack-naming branch from 56d3f26 to ba6c04c Compare January 31, 2026 00:23
@mchmarny
Member

Looks like there will be more work required on this to make it work in CI. @dims @lalitadithya anything we can replicate here from NVS?

dims added a commit to dims/cloud-native-stack that referenced this pull request Jan 31, 2026
Fork PRs have restricted GITHUB_TOKEN permissions that prevent posting
comments directly. This change uses the workflow_run pattern:

1. Main workflow uploads coverage data as artifact (read-only safe)
2. Separate workflow_run triggered workflow posts comment (write perms)

This is the recommended secure pattern per GitHub Security Lab:
https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/

Fixes: NVIDIA/aicr#3

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
dims added a commit to dims/aicr that referenced this pull request Jan 31, 2026
@dims dims force-pushed the fix/kube-prometheus-stack-naming branch from 0b5b091 to 43c038e Compare January 31, 2026 20:56
@dims dims force-pushed the fix/kube-prometheus-stack-naming branch from 43c038e to 6c6c199 Compare January 31, 2026 21:18
@github-actions

github-actions bot commented Jan 31, 2026

Coverage Report ✅

| Metric | Value |
| --- | --- |
| Coverage | 73.8% |
| Threshold | 70% |
| Status | Pass |
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-73.8%25-green)

Coverage unchanged by this PR.

@yuanchen8911 yuanchen8911 requested review from a team as code owners February 2, 2026 16:40
fix: rename prometheus component to kube-prometheus-stack

Align component name with Helm chart name to ensure values are correctly
passed to the sub-chart in umbrella chart deployments.

Changes:
- Rename component from 'prometheus' to 'kube-prometheus-stack' in registry
- Rename components/prometheus directory to components/kube-prometheus-stack
- Update base.yaml overlay to use new component name and values path
- Update monitoring-hpa.yaml dependency reference
- Keep 'prometheus' in valueOverrideKeys for backwards compatibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@yuanchen8911 yuanchen8911 force-pushed the fix/kube-prometheus-stack-naming branch from a6df5f4 to 27afca5 Compare February 2, 2026 16:48
@mchmarny mchmarny merged commit fc68f03 into NVIDIA:main Feb 2, 2026
3 checks passed
dims referenced this pull request in dims/aicr Feb 20, 2026
Add three new validation steps to the H100 inference test:

- Inference Gateway (#6): verify GatewayClass accepted and Gateway
  programmed with inference extension CRDs present
- Accelerator & AI Service Metrics (#4/#5): verify DCGM Exporter
  metrics, Prometheus scraping, and custom metrics API availability
- Secure Accelerator Access (#3): verify GPU access is DRA-mediated
  (no hostPath, no device plugin), with proper container security

Also adds diagnostics for gateway, metrics, and DRA state on failure.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>