Skip to content

mgr/prometheus: introduce metric for collection time#36298

Merged
tchaikov merged 3 commits intoceph:masterfrom
p-se:mgr-prom-collect-time-metric
Mar 15, 2021
Merged

mgr/prometheus: introduce metric for collection time#36298
tchaikov merged 3 commits intoceph:masterfrom
p-se:mgr-prom-collect-time-metric

Conversation

@p-se
Copy link
Contributor

@p-se p-se commented Jul 27, 2020

Introduces metric prometheus_collect_duration_seconds summary (_sum and _count) for the time it
takes the Prometheus manager module to collect all the metrics.

# HELP ceph_prometheus_collect_duration_seconds_sum The sum of seconds took to collect all metrics of this exporter
# TYPE ceph_prometheus_collect_duration_seconds_sum counter
ceph_prometheus_collect_duration_seconds_sum{method="get_health"} 0.014837503433227539
ceph_prometheus_collect_duration_seconds_sum{method="get_pool_stats"} 0.015115976333618164
ceph_prometheus_collect_duration_seconds_sum{method="get_df"} 0.024508237838745117
ceph_prometheus_collect_duration_seconds_sum{method="get_fs"} 0.03028130531311035
ceph_prometheus_collect_duration_seconds_sum{method="get_quorum_status"} 0.030310869216918945
ceph_prometheus_collect_duration_seconds_sum{method="get_mgr_status"} 0.2740020751953125
ceph_prometheus_collect_duration_seconds_sum{method="get_pg_status"} 0.019570589065551758
ceph_prometheus_collect_duration_seconds_sum{method="get_osd_stats"} 0.01889801025390625
ceph_prometheus_collect_duration_seconds_sum{method="get_metadata_and_osd_status"} 0.21236467361450195
ceph_prometheus_collect_duration_seconds_sum{method="get_num_objects"} 0.012030839920043945
ceph_prometheus_collect_duration_seconds_sum{method="get_rbd_stats"} 0.011925935745239258
# HELP ceph_prometheus_collect_duration_seconds_count The amount of metrics gathered for this exporter
# TYPE ceph_prometheus_collect_duration_seconds_count counter
ceph_prometheus_collect_duration_seconds_count{method="get_health"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_pool_stats"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_df"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_fs"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_quorum_status"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_mgr_status"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_pg_status"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_osd_stats"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_metadata_and_osd_status"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_num_objects"} 107.0
ceph_prometheus_collect_duration_seconds_count{method="get_rbd_stats"} 107.0

Fixes: https://tracker.ceph.com/issues/46703

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@p-se p-se requested review from alfonsomthd and trociny July 27, 2020 13:58
@p-se p-se force-pushed the mgr-prom-collect-time-metric branch from 9d5e796 to 8ad84c0 Compare July 28, 2020 09:19
@p-se p-se requested review from b-ranto and tchaikov July 28, 2020 12:38
@trociny trociny requested a review from jan--f July 28, 2020 14:26
@trociny
Copy link
Contributor

trociny commented Jul 29, 2020

I think storing a method last run time may work in this particular case, because methods are called together with (before) get_collect_time_metrics.

But sill I would prefer if we have counters similar to other "time" counters like latency. I.e. it would be nice if the profile_method collected not the time of the last run, but the accumulated time and the count of runs.

And I would prefer 'method' label instead of 'subsection'.

It is just my opinion though.

Copy link
Contributor

@jan--f jan--f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @trociny change requests.
I.e. we should measure the duration with a summary/simple histogramm metric type (https://prometheus.io/docs/practices/histograms/#count-and-sum-of-observations).

Also 👍 for the s/subsection/method/ change.

@p-se p-se force-pushed the mgr-prom-collect-time-metric branch 2 times, most recently from 5661d65 to 6d69282 Compare September 17, 2020 09:38
@p-se p-se requested a review from jan--f September 17, 2020 12:24
@p-se
Copy link
Contributor Author

p-se commented Oct 8, 2020

I've updated the PR according to the comments and changed the counter metric to be a summary. Please review.

jan--f
jan--f previously requested changes Oct 9, 2020
Copy link
Contributor

@jan--f jan--f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comment. One nit from me would be that the move of the MetricCollectionThread is a bit confusing and hides the actual change somewhat, but I could live with that

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@p-se p-se force-pushed the mgr-prom-collect-time-metric branch from 6d69282 to ff57714 Compare October 13, 2020 07:48
@p-se p-se requested a review from jan--f October 13, 2020 07:49
@p-se p-se requested a review from LenzGr October 22, 2020 09:24
@p-se p-se dismissed jan--f’s stale review November 13, 2020 16:02

requested changes have been implemented

@p-se
Copy link
Contributor Author

p-se commented Nov 13, 2020

jenkins retest this please

@p-se p-se force-pushed the mgr-prom-collect-time-metric branch from ff57714 to d4f9768 Compare December 4, 2020 09:28
@github-actions github-actions bot added the pybind label Dec 4, 2020
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@p-se p-se force-pushed the mgr-prom-collect-time-metric branch from d4f9768 to 5109c14 Compare February 22, 2021 14:54
@p-se p-se requested review from epuertat and removed request for LenzGr February 23, 2021 10:10
p-se added 3 commits February 25, 2021 15:45
Introduces metric `prometheus_collect_duration_seconds` for the time it
takes the Prometheus manager module to collect all the metrics.

```
ceph_prometheus_collect_duration_seconds_sum{method="get_health"} 0.0002613067626953125
ceph_prometheus_collect_duration_seconds_sum{method="get_pool_stats"} 0.0018298625946044922
ceph_prometheus_collect_duration_seconds_sum{method="get_df"} 0.0005767345428466797
ceph_prometheus_collect_duration_seconds_sum{method="get_fs"} 0.0010402202606201172
ceph_prometheus_collect_duration_seconds_sum{method="get_quorum_status"} 0.0007524490356445312
ceph_prometheus_collect_duration_seconds_sum{method="get_mgr_status"} 0.0035364627838134766
ceph_prometheus_collect_duration_seconds_sum{method="get_pg_status"} 0.00021266937255859375
ceph_prometheus_collect_duration_seconds_sum{method="get_osd_stats"} 0.0018737316131591797
ceph_prometheus_collect_duration_seconds_sum{method="get_metadata_and_osd_status"} 0.0032796859741210938
ceph_prometheus_collect_duration_seconds_sum{method="get_num_objects"} 0.00011086463928222656
ceph_prometheus_collect_duration_seconds_sum{method="get_rbd_stats"} 0.00036144256591796875
ceph_prometheus_collect_duration_seconds_count{method="get_health"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_pool_stats"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_df"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_fs"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_quorum_status"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_mgr_status"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_pg_status"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_osd_stats"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_metadata_and_osd_status"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_num_objects"} 1.0
ceph_prometheus_collect_duration_seconds_count{method="get_rbd_stats"} 1.0
```

Fixes: https://tracker.ceph.com/issues/46703

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
@p-se p-se force-pushed the mgr-prom-collect-time-metric branch from 10e079b to 806ef8a Compare February 25, 2021 14:46
@tchaikov tchaikov merged commit 3ef8116 into ceph:master Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants