[Docs] Add ServiceMonitor section and make some steps optional in Grafana & Prometheus page #53474
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
75e443e to 66b002c
We can add a new step to display the KubeRay Operator Dashboard once this PR is merged. Thanks!
doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
troychiu left a comment
Can we also mention that the KubeRay Helm chart provides a ServiceMonitor (https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/templates/servicemonitor.yaml)? Maybe in Step 3: Install a KubeRay operator.
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
Maybe simply KubeRay Metrics?
I think KubeRay Metrics should include both Controller Runtime Metrics and KubeRay Custom Metrics. Calling the latter "custom" differentiates them from the natively provided (Controller Runtime) metrics. Does that make sense to you?
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
Is this for production purposes?
doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
Yes, they serve different purposes. The install.sh script is mainly for users to quickly try out the Grafana dashboard, while the ServiceMonitor in the Helm chart is intended for production use.
@troychiu Then should I also make Helm charts for the other components in install.sh?
Signed-off-by: owenowenisme <mses010108@gmail.com>
Signed-off-by: owenowenisme <mses010108@gmail.com>
Signed-off-by: owenowenisme <mses010108@gmail.com>
Signed-off-by: owenowenisme <mses010108@gmail.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: Troy Chiu <114708546+troychiu@users.noreply.github.com> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
@kevin85421 PTAL
I’ve requested that the docs team review this PR. @owenowenisme, if you haven’t installed Vale yet, please do so and make sure your changes follow its rules.
Thanks for the reminder. I already ran Vale and made sure there are no new errors.
angelinalg left a comment
Just some style and grammatical fixes. Our style guide says to use sentence case for titles. Unfortunately, the titles you haven't changed in this PR aren't all sentence case. If it's not too much work for you to fix them in this PR, please do. Otherwise, I can follow up with a PR after yours merges to fix them. Thanks for updating the docs!
doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
ca7c630 to 181ab2e
Thanks @angelinalg, I just applied your suggestions.
angelinalg left a comment
I made a typo in one of my suggestions. Could you accept it? Also, are the corrections accurate? I want to verify that you meant RayService and not Ray Service for those metrics titles. @owenowenisme
doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
## Step 3: Install a KubeRay operator
* Follow [this document](kuberay-operator-deploy) to install the latest stable KubeRay operator via the Helm repository.
* You can enable the ServiceMonitor when installing the KubeRay operator with Helm. See [Step 7](#step-7-collect-kuberay-metrics-with-servicemonitor) for more details.
Remove this and provide instructions for installing the KubeRay operator with the ServiceMonitor enabled instead:
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.0 --set metrics.serviceMonitor.enabled=true
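The suggested command can be wrapped together with the Helm repository setup it relies on. This is a minimal sketch, assuming the KubeRay chart repository is registered under the name `kuberay` (as `kuberay/kuberay-operator` implies) and that the Prometheus Operator CRDs, which define the `ServiceMonitor` kind, already exist in the cluster. The function name `install_kuberay_operator` is hypothetical, not part of KubeRay.

```sh
# Hypothetical wrapper: install the KubeRay operator with its ServiceMonitor enabled.
# Assumes the Prometheus Operator CRDs (ServiceMonitor) are already installed.
install_kuberay_operator() {
  local version="${1:-1.4.0}"
  # Register the KubeRay chart repository, refresh it, then install the chart.
  # Each step runs only if the previous one succeeded.
  helm repo add kuberay https://ray-project.github.io/kuberay-helm/ &&
    helm repo update &&
    helm install kuberay-operator kuberay/kuberay-operator \
      --version "$version" \
      --set metrics.serviceMonitor.enabled=true
}
```

For example, `install_kuberay_operator 1.4.0` runs the same `helm install` as the suggestion above.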
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
* KubeRay provides an [install.sh script](https://github.com/ray-project/kuberay/blob/master/install/prometheus/install.sh) to:
  * Install the [kube-prometheus-stack v48.2.1](https://github.com/prometheus-community/helm-charts/tree/kube-prometheus-stack-48.2.1/charts/kube-prometheus-stack) chart and related custom resources, including **PodMonitor** and **PrometheusRule**, in the namespace `prometheus-system` automatically.
  * Import Ray Dashboard’s [Grafana JSON files](https://github.com/ray-project/kuberay/tree/master/config/grafana) into Grafana using the `--auto-load-dashboard true` flag. If the flag isn't set, the following step also provides instructions for manual import.
  * Install the [kube-prometheus-stack v48.2.1](https://github.com/prometheus-community/helm-charts/tree/kube-prometheus-stack-48.2.1/charts/kube-prometheus-stack) chart and related custom resources, including **PodMonitor**, **ServiceMonitor**, and **PrometheusRule**, in the namespace `prometheus-system` automatically.
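Running the script with automatic dashboard import might look like the following minimal sketch. It assumes a local checkout of the kuberay repository, with the script at the linked `install/prometheus/install.sh` path. The function name `install_prometheus_stack` and the `KUBERAY_DIR` variable are hypothetical.

```sh
# Hypothetical wrapper around KubeRay's install/prometheus/install.sh.
# KUBERAY_DIR (hypothetical) points at a kuberay checkout; defaults to the current directory.
install_prometheus_stack() {
  local script="${KUBERAY_DIR:-.}/install/prometheus/install.sh"
  # --auto-load-dashboard true also imports the Ray Dashboard Grafana JSON files.
  # On success, list what the script created in the prometheus-system namespace.
  "$script" --auto-load-dashboard true && kubectl get all -n prometheus-system
}
```

For example, run `KUBERAY_DIR=/path/to/kuberay install_prometheus_stack`, or call `install_prometheus_stack` from the repository root.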
Please remove the KubeRay operator ServiceMonitor from install.sh.
## Step 7: Collect custom metrics with Recording Rules
## Step 7: Collect KubeRay metrics with ServiceMonitor
Starting with KubeRay 1.4.0, KubeRay provides a [ServiceMonitor](https://github.com/ray-project/kuberay/blob/master/config/prometheus/serviceMonitor.yaml) to help Prometheus discover and scrape KubeRay metrics.
We only need to tell users that the ServiceMonitor is created when the KubeRay operator is installed, and then provide an instruction to help them verify it (something like `kubectl get servicemonitor`).
## Step 13: Embed Grafana panels in Ray Dashboard
## Step 14: View the KubeRay operator dashboard
Once the KubeRay operator dashboard is imported into Grafana, you can monitor metrics from the KubeRay operator. The dashboard provides a dropdown menu to filter and view controller runtime metrics for specific Ray CRs: `RayCluster`, `RayJob`, and `RayService`.
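To open the dashboard, Grafana has to be reachable from your machine. A minimal sketch, assuming kube-prometheus-stack was installed by `install.sh`, so that Grafana runs as the Deployment `prometheus-grafana` in the `prometheus-system` namespace and listens on port 3000. The function name `open_grafana` is hypothetical.

```sh
# Hypothetical helper: forward a local port to Grafana so the KubeRay operator
# dashboard is reachable at http://localhost:<port>.
open_grafana() {
  local local_port="${1:-3000}"
  # Blocks until interrupted (Ctrl+C), like any kubectl port-forward.
  kubectl port-forward deployment/prometheus-grafana -n prometheus-system "${local_port}:3000"
}
```

After running `open_grafana`, browse to http://localhost:3000, open the KubeRay operator dashboard, and use its dropdown to switch between `RayCluster`, `RayJob`, and `RayService`.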
We only need to include one picture here; four pictures are too many.
# KubeRay metrics references
## Controller runtime metrics
KubeRay is built with Controller Runtime, which natively exposes metrics that KubeRay includes in its metrics. These metrics include:
❯ vale doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
1:1 suggestion Use parentheses judiciously. Google.Parens
1:2 error Use 'KubeRay' instead of 'kuberay'. Google.WordList
6:38 error Use 'Kubernetes' instead of 'kubernetes'. Vale.Terms
8:57 error Use 'Kubernetes' instead of 'kubernetes'. Vale.Terms
32:95 suggestion In general, use active voice instead of passive voice ('is provisioned'). Google.Passive
33:158 suggestion Use parentheses judiciously. Google.Parens
49:313 suggestion In general, use active voice instead of passive voice ('is enabled'). Google.Passive
I suppose that we can ignore these.
doc/source/cluster/kubernetes/k8s-ecosystem/metrics-references.md
# KubeRay metrics references
## Controller runtime metrics
KubeRay is built with Controller Runtime, which natively exposes metrics that KubeRay includes in its metrics. These metrics include:
It's too verbose.
Please remove:
KubeRay is built with Controller Runtime ...
...
- Go runtime stats like Goroutines and GC duration
and use the following line instead:
"KubeRay exposes metrics provided by kubernetes-sigs/controller-runtime, including information about reconciliation, work queues, and more, to help users operate the KubeRay operator in production environments."
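To inspect these controller-runtime metrics by hand, you can scrape the operator's metrics endpoint directly. A minimal sketch, assuming the operator is exposed through a Service named `kuberay-operator` that serves Prometheus metrics on port 8080 at `/metrics` (check your chart values if this differs). The function name `show_controller_runtime_metrics` is hypothetical; `controller_runtime_reconcile_total` and `workqueue_depth` are standard metrics exported by controller-runtime.

```sh
# Hypothetical helper: print reconcile and work queue metrics from the KubeRay operator.
# Assumes Service "kuberay-operator" exposes /metrics on port 8080.
show_controller_runtime_metrics() {
  local namespace="${1:-default}"
  # Run the port-forward in the background and remember its PID.
  kubectl port-forward -n "$namespace" svc/kuberay-operator 8080:8080 >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2  # Give the port-forward a moment to start listening.
  # Keep only the reconcile counters and work queue depth.
  curl -s http://localhost:8080/metrics |
    grep -E '^(controller_runtime_reconcile_total|workqueue_depth)'
  # Stop the background port-forward.
  kill "$pf_pid" 2>/dev/null
}
```

Filtering with `grep` keeps the output short; drop the filter to see every metric, including the Go runtime statistics.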
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
```sh
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.0 \
  --set metrics.serviceMonitor.enabled=true
```
Add an instruction to verify whether the ServiceMonitor is created correctly.
## Step 7: Collect custom metrics with Recording Rules
## Step 7: Collect KubeRay metrics with ServiceMonitor
Installing the KubeRay operator automatically creates a ServiceMonitor to help Prometheus discover and scrape KubeRay metrics. You can verify the ServiceMonitor creation with:
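One way to run the check the reviewer asked for (`kubectl get servicemonitor`) is sketched below. It assumes the Prometheus Operator CRDs are installed, since otherwise `servicemonitor` is not a known resource type, and that you pass the namespace the kuberay-operator Helm release was installed into. The function name `check_kuberay_servicemonitor` is hypothetical.

```sh
# Hypothetical helper: list ServiceMonitors in the namespace where the
# kuberay-operator Helm release was installed (defaults to "default").
check_kuberay_servicemonitor() {
  local namespace="${1:-default}"
  kubectl get servicemonitor -n "$namespace"
}
```

If you aren't sure which namespace the release uses, `kubectl get servicemonitor -A` lists ServiceMonitors across all namespaces.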
honorLabels: true
* Same as PodMonitor, the **install.sh** script also creates the [serviceMonitor.yaml](https://github.com/ray-project/kuberay/blob/master/config/prometheus/serviceMonitor.yaml) shown above, so you don't need to create it manually.
doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com>
@owenowenisme please ping me when all CI tests pass.
@kevin85421 CI passed.
…ana & Promethus page (ray-project#53474) This PR update docs to show latest change of KubeRay. Changes including: - A page listing existing KubeRay Metrics https://anyscale-ray--53474.com.readthedocs.build/en/53474/cluster/kubernetes/k8s-ecosystem/metrics-references.html#kuberay-metrics-references - Instruction of ServiceMonitor in https://anyscale-ray--53474.com.readthedocs.build/en/53474/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html#step-7-collect-kuberay-metrics-with-servicemonitor <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: owenowenisme <mses010108@gmail.com> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com> Signed-off-by: You-Cheng Lin <mses010108@gmail.com> Co-authored-by: Jun-Hao Wan <ken89@kimo.com> Co-authored-by: Troy Chiu <114708546+troychiu@users.noreply.github.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
…ana & Promethus page (#53474) This PR update docs to show latest change of KubeRay. Changes including: - A page listing existing KubeRay Metrics https://anyscale-ray--53474.com.readthedocs.build/en/53474/cluster/kubernetes/k8s-ecosystem/metrics-references.html#kuberay-metrics-references - Instruction of ServiceMonitor in https://anyscale-ray--53474.com.readthedocs.build/en/53474/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html#step-7-collect-kuberay-metrics-with-servicemonitor <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes #1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: owenowenisme <mses010108@gmail.com> Signed-off-by: Owen Lin (You-Cheng Lin) <106612301+owenowenisme@users.noreply.github.com> Signed-off-by: You-Cheng Lin <mses010108@gmail.com> Co-authored-by: Jun-Hao Wan <ken89@kimo.com> Co-authored-by: Troy Chiu <114708546+troychiu@users.noreply.github.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Why are these changes needed?
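This PR updates the docs to reflect the latest changes in KubeRay.
Changes include:
- A page listing existing KubeRay metrics: https://anyscale-ray--53474.com.readthedocs.build/en/53474/cluster/kubernetes/k8s-ecosystem/metrics-references.html#kuberay-metrics-references
- Instructions for the ServiceMonitor: https://anyscale-ray--53474.com.readthedocs.build/en/53474/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html#step-7-collect-kuberay-metrics-with-servicemonitor
Related issue number
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.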