Skip to content

rgw: change section name of rgw_op counters#54623

Merged
alimaredia merged 2 commits intoceph:mainfrom
alimaredia:wip-rgw-op-metrics-section-rename
Feb 7, 2024
Merged

rgw: change section name of rgw_op counters#54623
alimaredia merged 2 commits intoceph:mainfrom
alimaredia:wip-rgw-op-metrics-section-rename

Conversation

@alimaredia
Copy link
Copy Markdown
Contributor

The rgw_op section of counter dump/schema becomes:

  • rgw_op_global for the global op counters
  • rgw_op_user for the user labeled counters
  • rgw_op_bucket for the bucket labeled counters

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e


"rgw_op": [
The counters in the ``rgw_op_global`` section reflect the totals of each op metric for a given Ceph Object Gateway.
The counters in the ``rgw_op_user`` and ``rgw_op_bucket`` sections are labeled counters of op metrics for a user or bucket respectively.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The counters in the ``rgw_op_user`` and ``rgw_op_bucket`` sections are labeled counters of op metrics for a user or bucket respectively.
The counters in the ``rgw_op_user`` and ``rgw_op_bucket`` sections are labelled counters of op metrics for a user or bucket respectively.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like both spellings are correct, it's just a matter of British vs American English: https://www.grammarly.com/blog/labeled-labelled/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zdover23 do you have an opinion on this spelling?

Copy link
Copy Markdown
Contributor

@mattbenjamin mattbenjamin Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly, we should try to lean into one spelling dialect

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW labelled looks wrong to me, and we've tended toward Murican English in the past for docs, so I vote to leave it as it is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I live in the Commonwealth, I think that American English is in ascendancy. "labelled".

@avanthakkar
Copy link
Copy Markdown
Contributor

Following points came upon discussion with @cloudbehl. @cloudbehl please feel free to add anything I may have missed.

- Lowercase for all labels (User, Bucket)
- Redundant "bucket" for list/del OPs  metrics name like ceph_rgw_op_bucket_list_buckets_lat_count, 
ceph_rgw_op_bucket_list_buckets_lat_sum
ceph_rgw_op_bucket_del_bucket_lat_count
- Also avoid Plural names for  list_buckets metrics (labels: user, bucket, global)
ceph_rgw_op_bucket_list_buckets_lat_count ```

@alimaredia
Copy link
Copy Markdown
Contributor Author

Following points came upon discussion with @cloudbehl. @cloudbehl please feel free to add anything I may have missed.

- Lowercase for all labels (User, Bucket)
- Redundant "bucket" for list/del OPs  metrics name like ceph_rgw_op_bucket_list_buckets_lat_count, 
ceph_rgw_op_bucket_list_buckets_lat_sum
ceph_rgw_op_bucket_del_bucket_lat_count
- Also avoid Plural names for  list_buckets metrics (labels: user, bucket, global)
ceph_rgw_op_bucket_list_buckets_lat_count ```

I'll go ahead and make the labels lowercase. For the other two suggestions, it's sounds like you have an issue with the names of the perf counters within the RGW.

In the example you gave list_buckets_* and del_bucket_* correspond to the operations for listing of all buckets (not the objects in a specific bucket) and deletion of a bucket.

I don't think it makes sense to change these perf counters internally just because the metrics sent to Prometheus has the word "bucket" twice. Do you have a better name for those perf counters?

@cloudbehl
Copy link
Copy Markdown
Contributor

@alimaredia Thanks for making the change. The naming convention and the metrics, looks good.

Related to the global metrics, do we want to call it global explicitly or can we just use the regular name rgw_op_ for global metrics?

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Nov 30, 2023

Related to the global metrics, do we want to call it global explicitly or can we just use the regular name rgw_op_ for global metrics?

i don't care for the 'global' part either - it only makes sense if you know there are per-bucket/per-user versions in contrast

what do you think about the names rgw_op, rgw_op_per_user, rgw_op_per_bucket?

@alimaredia alimaredia force-pushed the wip-rgw-op-metrics-section-rename branch from 606a48c to a59d7d2 Compare December 12, 2023 21:01
@alimaredia
Copy link
Copy Markdown
Contributor Author

@cloudbehl @avanthakkar updated with name changes

Copy link
Copy Markdown
Contributor

@cloudbehl cloudbehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nits in docs for the name change.

The sections are ``rgw_op_global``, ``rgw_op_user``, and ``rgw_op_bucket``.

"rgw_op": [
The counters in the ``rgw_op_global`` section reflect the totals of each op metric for a given Ceph Object Gateway.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The counters in the ``rgw_op_global`` section reflect the totals of each op metric for a given Ceph Object Gateway.
The counters in the ``rgw_op`` section reflect the totals of each op metric for a given Ceph Object Gateway.


To view op metrics in the Ceph Object Gateway go to the ``rgw_op`` sections of the output of the ``counter dump`` command::

"rgw_op_global": [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"rgw_op_global": [
"rgw_op": [

The sections are ``rgw_op_global``, ``rgw_op_user``, and ``rgw_op_bucket``.

"rgw_op": [
The counters in the ``rgw_op_global`` section reflect the totals of each op metric for a given Ceph Object Gateway.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The counters in the ``rgw_op_global`` section reflect the totals of each op metric for a given Ceph Object Gateway.
The counters in the ``rgw_op`` section reflect the totals of each op metric for a given Ceph Object Gateway.

@alimaredia alimaredia force-pushed the wip-rgw-op-metrics-section-rename branch from a59d7d2 to 02029cd Compare December 13, 2023 16:13
Copy link
Copy Markdown
Contributor

@cloudbehl cloudbehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me.

@alimaredia alimaredia requested a review from cbodley January 4, 2024 14:23
@avanthakkar
Copy link
Copy Markdown
Contributor

jenkins test api

@alimaredia
Copy link
Copy Markdown
Contributor Author

@cbodley can I get approval on this PR?

@alimaredia
Copy link
Copy Markdown
Contributor Author

jenkins test api

@alimaredia
Copy link
Copy Markdown
Contributor Author

@avanthakkar @cloudbehl for reference could you add some of the reasoning behind the motivation for this PR that you saw when you were building dashboards? Specifically why each section should have a consistent number of labels instead of varying labels.

@cloudbehl
Copy link
Copy Markdown
Contributor

@avanthakkar @cloudbehl for reference could you add some of the reasoning behind the motivation for this PR that you saw when you were building dashboards? Specifically why each section should have a consistent number of labels instead of varying labels.

Hi Ali, We have two major issues with that.

  1. cardinality : Single metric having multiple labels and that increases the cardinality and creates the performance bottleneck and scale issue. As in our case, we were having same 1 metric name that provides global counters, then user counters and bucket counters. When we were adding user metrics then we were adding the user label and when we were having bucket metric then we were having the bucket label and when no label is added then its global. So having such a scenario, increases metrics complexity. So its better to have 1 metric for one kind of data.

  2. Filtering issues: This is the major issue. Having single metric provide every data makes things complex while plotting the data in grafana charts and adding filters based on specific type like user or bucket. Like in our case I have to use regex in our case to make sue if the metric value has bucket label then this is a bucket chart, if the metric value has user label then plot user charts and if none are there then its global. Plus, Grafana provides ways to define variables based on metrics which is not possible to be done if the labels are non-consistent.

@alimaredia
Copy link
Copy Markdown
Contributor Author

@cbodley anything left to do here you see? should I run this through the verify suite in teuthology?

The rgw_op section of `counter dump/schema` becomes:
- rgw_op_global for the global op counters
- rgw_op_per_user for the user labeled counters
- rgw_op_per_bucket for the bucket labeled counters

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
@alimaredia alimaredia force-pushed the wip-rgw-op-metrics-section-rename branch from 02029cd to 6a0960b Compare January 31, 2024 22:18
@alimaredia
Copy link
Copy Markdown
Contributor Author

@cbodley this branched passed a minimal teuthology run: https://pulpito.ceph.com/amaredia-2024-02-01_05:08:51-rgw:verify-wip-rgw-op-metrics-section-rename-distro-default-smithi/

Is this enough or should I do a bigger one?

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 2, 2024

@cbodley this branched passed a minimal teuthology run: https://pulpito.ceph.com/amaredia-2024-02-01_05:08:51-rgw:verify-wip-rgw-op-metrics-section-rename-distro-default-smithi/

Is this enough or should I do a bigger one?

please run the full rgw suite

@alimaredia
Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants