Updated monitoring-stack #91

singh-kalpana · 2025-01-02T10:27:46Z

What this PR does / why we need it?
Install the Prometheus Operator and Grafana Operator instead of standalone Prometheus and Grafana instances.
The standalone instances do not include important CRDs such as ServiceMonitor (prometheus-community/helm-charts#3010), which are needed to monitor user-defined applications without directly modifying the Prometheus configuration. The absence of these CRDs also limits flexibility and prevents us from migrating monitoring resources to a production environment.
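
For illustration, a minimal sketch of what a ServiceMonitor for a user-defined application could look like once the operator CRDs are installed. The application name, labels, port name, and namespaces are hypothetical, and the `release` label value assumes a Helm release named `kube-prometheus-stack`; none of these come from this PR.

```yaml
# Hypothetical ServiceMonitor for a user-defined application.
# my-app, the "metrics" port name, and the namespaces are illustrative only.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    # With default chart settings, kube-prometheus-stack only selects
    # ServiceMonitors that carry the Helm release label (assumed value).
    release: kube-prometheus-stack
spec:
  # Match the Service that exposes the application's metrics endpoint.
  selector:
    matchLabels:
      app: my-app
  # Look for that Service in the application's namespace.
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics      # name of the Service port, not a number
      path: /metrics
      interval: 30s
```

With a resource like this, Prometheus picks up the new target through the operator, so the Prometheus configuration itself does not need to be edited.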

Changes:

Install the Prometheus Operator (kube-prometheus-stack) with the Grafana deployment set to false, because the stack otherwise installs a standalone Grafana instance alongside it.
PR #2172 suggests including the Grafana Operator within kube-prometheus-stack instead of a Grafana instance, but the Prometheus community recommends disabling Grafana in the stack and installing the Grafana Operator separately.
Install the Grafana Operator following this doc: https://github.com/grafana/grafana-operator
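
As a rough sketch, disabling the bundled Grafana in the kube-prometheus-stack chart comes down to a values fragment like the one below. It is assumed here that this lives in the values file passed to `helm install`; the exact contents of the file in this PR are not reproduced.

```yaml
# kube-prometheus-stack values fragment (sketch, not the full file from this PR).
# Turns off the Grafana subchart so that Grafana can be managed
# separately by the Grafana Operator.
grafana:
  enabled: false
```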

garethsb · 2025-01-07T14:33:23Z

playbooks/files/grafana.yaml

+    type: prometheus
+    uid: prometheus
+    access: proxy
+    url: http://kube-prometheus-stack-1735-prometheus.monitoring.svc.cluster.local:9090


Is this really a fixed URL?

I guess it will change, but this needs to be tested.

Thank you for pointing this out.

Initially, I had trouble setting the URL, so I followed this recommendation, and it worked.

But yes, you're right: the URL can change because of the suffix (1735). I used the --generate-name option when installing kube-prometheus-stack, and this option adds a unique suffix to the release name.

To avoid this, we can use the fixed release name "kube-prometheus-stack" instead of --generate-name. This creates a consistent service name: kube-prometheus-stack-prometheus

helm install kube-prometheus-stack --version {{ prometheus_stack }} prometheus-community/kube-prometheus-stack --create-namespace --namespace monitoring --values {{ ansible_user_dir }}/kube-prometheus-stack.values

With this, the URL will always be:
http://kube-prometheus-stack-prometheus.monitoring:9090 (as also described in this kube-prometheus-stack Grafana datasource file)

I tried this approach, and it worked.
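
For reference, a sketch of how the datasource could look with the fixed URL, assuming the Grafana Operator v5 `GrafanaDatasource` resource. The `type`, `uid`, `access`, and `url` values follow the diff and the discussion above; the resource name, namespace, and `instanceSelector` label are hypothetical and must match your own Grafana resource.

```yaml
# Sketch of a GrafanaDatasource pointing at the fixed Prometheus service name.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: prometheus          # hypothetical resource name
  namespace: monitoring     # assumed to match the Helm release namespace
spec:
  # Selects which Grafana instance receives this datasource.
  # The label below is illustrative, not taken from this PR.
  instanceSelector:
    matchLabels:
      dashboards: grafana
  datasource:
    name: prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    # Stable because the release name is fixed rather than generated.
    url: http://kube-prometheus-stack-prometheus.monitoring:9090
```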

ghost · 2025-02-19T20:05:23Z

playbooks/files/kube-prometheus-stack.values

@singh-kalpana not sure why you removed the additional scrape config; because of that, GPU metrics are not showing on the Grafana dashboard. I will fix that, but this is an FYI.

We don't need an additional scrape config, because the ServiceMonitor for the DCGM exporter is enabled, which allows Prometheus to scrape metrics from the DCGM exporter. It takes about a minute for metrics to show on the Grafana dashboard.
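
As a hedged illustration, if the DCGM exporter is deployed through the NVIDIA GPU Operator Helm chart (an assumption; this thread does not say where the ServiceMonitor is enabled), the corresponding values fragment would look roughly like this. The interval shown is illustrative.

```yaml
# GPU Operator values fragment (sketch, assuming the gpu-operator chart).
# Creates a ServiceMonitor so the Prometheus Operator discovers the
# DCGM exporter without an additional scrape config.
dcgmExporter:
  serviceMonitor:
    enabled: true
    interval: 15s   # illustrative scrape interval
```

Whether Prometheus selects this ServiceMonitor still depends on its ServiceMonitor selector settings, so the labels or selector configuration may need to be aligned with the kube-prometheus-stack release.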

Updated monitoring-stack

fd68503

garethsb reviewed Jan 7, 2025


kalpanas and others added 3 commits January 8, 2025 12:07

consistent grafana datasource URL

b2ebceb

Add kube-prometheus-stack.values file to copy task

a0546c0

Updated Prometheus Adapter URL and port

f18d85c

ghost merged commit ef96395 into NVIDIA:master Feb 4, 2025

ghost reviewed Feb 19, 2025


This pull request was closed.

singh-kalpana commented Jan 2, 2025

garethsb Jan 7, 2025

angudadev Jan 7, 2025

singh-kalpana Jan 7, 2025

ghost Feb 19, 2025

singh-kalpana Feb 20, 2025

3 participants
