mgr/dashboard: add prometheus federation config for multi-cluster monitoring #54964
src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
8e64de8 to 9ad8cd4
9ad8cd4 to bc95834
jenkins retest this please
jenkins test make check
adk3798 left a comment
Minor comments. I can't speak much to the changes to the actual Prometheus config, but generally the code looks okay apart from the failing tests.
96182fd to c21d7ac
rkachach left a comment
I left some minor comments, plus a few more specific ones about security. I'd like to know whether we have done any security assessment of the implications of combining the security setup with this new feature. Is the system still secure? If not, what security issues could we face when enabling this new feature, and what should we do to overcome them?
relabel_configs:
  - source_labels: [__address__]
    target_label: cluster
    replacement: {{ cluster_fsid }}
This section assumes you are using secure communication. Is this the case? What security implications does this new feature have? Are we taking them into account? Have we done any security assessment of the impact?
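For readers unfamiliar with the relabel rule quoted above, here is a rough Python model of what a Prometheus `replace` relabel action does with those settings. It is a simplified sketch, not Prometheus code: the function name `apply_replace`, the example address, and the placeholder fsid are all made up for illustration, and edge cases (such as dropping a label whose new value is empty) are ignored.

```python
import re


def apply_replace(labels, source_labels, target_label, replacement,
                  regex="(.*)", separator=";"):
    """Simplified model of a Prometheus `replace` relabel rule (illustrative only)."""
    # Join the source label values with the separator before matching.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored, so the whole joined string must match.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: labels are left untouched
    # Translate $1 / ${1} references into Python's \g<1> group syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Placeholder value standing in for the rendered {{ cluster_fsid }}.
FSID = "00000000-0000-0000-0000-000000000000"

labels = apply_replace({"__address__": "10.0.0.5:9283"},
                       ["__address__"], "cluster", FSID)
print(labels["cluster"])
```

Because the replacement is a literal string with no `$1` reference, every scraped series ends up with the same `cluster` label regardless of its address, which is what lets one Prometheus instance tell clusters apart.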
except ArgumentError as e:
    return HandleCommandResult(-errno.EINVAL, "", (str(e)))

@_cli_write_command('orch prometheus set-target')
If I give a target like http://<ip>:port, it effectively kills the Prometheus daemon, and I had to remove the target and restart the prometheus module to get it working again. Should it fail like that for a simple error? If this is a critical mistake, there should be proper validation in place, or we might end up breaking a deployment.
At least some help text describing what a Prometheus target should look like would be helpful.
@nizamial09, this issue is being tracked at https://tracker.ceph.com/issues/64369. I will open a separate PR for the mentioned issues soon.
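As an illustration of the kind of validation requested above, here is a minimal sketch that accepts only `<host>:<port>` targets and rejects anything with a URL scheme or path. This is not the fix tracked in the linked issue; `validate_target` is a hypothetical helper name, and the exact rules (for example, requiring brackets around IPv6 addresses) are assumptions.

```python
def validate_target(target: str) -> str:
    """Return the target unchanged if it looks like <host>:<port>, else raise ValueError."""
    target = target.strip()
    if "://" in target:
        raise ValueError(f"{target!r}: drop the scheme, expected <host>:<port>")
    if "/" in target:
        raise ValueError(f"{target!r}: paths are not allowed, expected <host>:<port>")
    # Split on the last colon so bracketed IPv6 hosts keep their inner colons.
    host, sep, port = target.rpartition(":")
    if not sep or not host:
        raise ValueError(f"{target!r}: expected <host>:<port>")
    if ":" in host and not (host.startswith("[") and host.endswith("]")):
        raise ValueError(f"{target!r}: IPv6 hosts must be written as [addr]:<port>")
    if not (port.isascii() and port.isdigit() and 1 <= int(port) <= 65535):
        raise ValueError(f"{target!r}: port must be a number between 1 and 65535")
    return target


for candidate in ("10.0.0.5:9095", "http://10.0.0.5:9095", "10.0.0.5"):
    try:
        print("ok:", validate_target(candidate))
    except ValueError as err:
        print("rejected:", err)
```

A check along these lines could run before the target is written into the config, so that a malformed value produces an error message instead of a broken daemon.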
@adk3798 any updates?
d7faaaf to 852ecb8
@aaSharma14 I saw these in the unit test failures
failures were all the
852ecb8 to f8c0940
monitoring Signed-off-by: Aashish Sharma <aasharma@redhat.com>
f8c0940 to 82b50b4
jenkins test api |
Rendering the dashboards and alerts with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.
Additionally, this contains updates to the tests so that they succeed with
both configurations and avoid introducing regressions with
regard to multiCluster in the future.
There also are some consistency cleanups here and there:
* `datasource` was not used consistently
* `cluster` label_values are determined from `ceph_health_status`
* `job` template and filters on this label were removed to align multi cluster
support solely via the `cluster` label
* `ceph_hosts` filter now uses label_values from any ceph_metadata metric,
so it no longer shows all instance values, only those of hosts with some Ceph
component / daemon.
* Enable showMultiCluster=True since `cluster` label is now always present,
via ceph#54964
Fixes: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
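To make the "label filter" idea in this commit message concrete, here is a small Python sketch of a helper that adds a `cluster` matcher to a PromQL selector whenever multi-cluster mode is enabled. It is only an illustration: the function `selector`, its parameters, and the `cluster=~"$cluster"` Grafana-variable form are assumptions, not code taken from the dashboards themselves.

```python
def selector(metric, show_multi_cluster=True, extra=()):
    """Build a PromQL selector, adding the cluster matcher when multi-cluster is on."""
    matchers = []
    if show_multi_cluster:
        # "$cluster" stands for a dashboard template variable (assumed name).
        matchers.append('cluster=~"$cluster"')
    matchers.extend(extra)
    if not matchers:
        return metric
    return metric + "{" + ", ".join(matchers) + "}"


print(selector("ceph_health_status"))
print(selector("ceph_osd_metadata", extra=['hostname="node1"']))
print(selector("ceph_health_status", show_multi_cluster=False))
```

Routing every query through one helper like this is one way to avoid the inconsistencies the commit describes, where some panels filtered on `cluster` and others did not.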
Rendering the dashboards with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.
Additionally, this contains updates to the tests so that they succeed with
both configurations and avoid introducing regressions with
regard to multiCluster in the future.
There also are some consistency cleanups here and there:
* `datasource` was not used consistently
* `cluster` label_values are determined from `ceph_health_status`
* `job` template and filters on this label were removed to align multi cluster
support solely via the `cluster` label
* `ceph_hosts` filter now uses label_values from any ceph_metadata metric,
so it no longer shows all instance values, only those of hosts with some Ceph
component / daemon.
* Enable showMultiCluster=True since `cluster` label is now always present,
via ceph#54964
Fixes: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.
Additionally, this contains updates to the tests so that they succeed with
both configurations and avoid introducing regressions with
regard to multiCluster in the future.
There also are some consistency cleanups here and there:
* `datasource` was not used consistently
* `cluster` label_values are determined from `ceph_health_status`
* `job` template and filters on this label were removed to align multi cluster
support solely via the `cluster` label
* `ceph_hosts` filter now uses label_values from any ceph_metadata metric,
so it no longer shows all instance values, only those of hosts with some Ceph
component / daemon.
* Enable showMultiCluster=True since `cluster` label is now always present,
via ceph#54964
Improves: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.
Additionally, this contains updates to the tests so that they succeed with
both configurations and avoid introducing regressions with
regard to multiCluster in the future.
There also are some consistency cleanups here and there:
* `datasource` was not used consistently
* `cluster` label_values are determined from `ceph_health_status`
* `job` template and filters on this label were removed to align multi cluster
support solely via the `cluster` label
* `ceph_hosts` filter now uses label_values from any ceph_metadata metric,
so it no longer shows all instance values, only those of hosts with some Ceph
component / daemon.
* Enable showMultiCluster=True since `cluster` label is now always present,
via ceph#54964
Resolves: rhbz#2275936
Improves: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
(cherry picked from commit 2457451)
Rendering the dashboards with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.
Additionally, this contains updates to the tests so that they succeed with
both configurations and avoid introducing regressions with
regard to multiCluster in the future.
There also are some consistency cleanups here and there:
* `datasource` was not used consistently
* `cluster` label_values are determined from `ceph_health_status`
* `job` template and filters on this label were removed to align multi cluster
support solely via the `cluster` label
* `ceph_hosts` filter now uses label_values from any ceph_metadata metric,
so it no longer shows all instance values, only those of hosts with some Ceph
component / daemon.
* Enable showMultiCluster=True since `cluster` label is now always present,
via ceph#54964
Improves: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
(cherry picked from commit 090b8e1)
Introduce Prometheus federation in the Ceph dashboard. This is done by adding a
federate job to the Prometheus configuration. Targets (the Prometheus service endpoints of remote clusters) can be added to or removed from this job to scrape data from different clusters. These targets are written to the Prometheus config file through two new orch CLIs.
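The shape of such a federate job can be sketched in Python as a plain dictionary mirroring a Prometheus scrape config. The keys used (`job_name`, `honor_labels`, `metrics_path`, `params`, `static_configs`) are standard Prometheus scrape-config fields, but the job name, the `match[]` expression, the example addresses, and the helpers `federate_job`, `add_target` and `remove_target` are assumptions made for illustration; the template in this PR may differ.

```python
import json


def federate_job(targets, match='{__name__=~"ceph_.*"}'):
    """Build a dict mirroring a Prometheus federation scrape job (illustrative)."""
    return {
        "job_name": "federate",          # assumed job name
        "honor_labels": True,            # keep labels set by the remote Prometheus
        "metrics_path": "/federate",     # Prometheus federation endpoint
        "params": {"match[]": [match]},  # which series to pull (assumed selector)
        "static_configs": [{"targets": list(targets)}],
    }


def add_target(job, target):
    """Add a remote Prometheus endpoint if it is not already listed."""
    targets = job["static_configs"][0]["targets"]
    if target not in targets:
        targets.append(target)


def remove_target(job, target):
    """Remove a remote Prometheus endpoint if it is listed."""
    targets = job["static_configs"][0]["targets"]
    if target in targets:
        targets.remove(target)


job = federate_job(["10.0.0.5:9095"])
add_target(job, "10.0.1.5:9095")
remove_target(job, "10.0.0.5:9095")
print(json.dumps(job, indent=2))
```

Setting `honor_labels` matters in a federation setup because the `cluster` label attached by each remote Prometheus would otherwise be overwritten or renamed by the scraping instance.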