Skip to content

ceph CLI: introduce daemonhistogram command#41600

Closed
ifed01 wants to merge 1 commit intoceph:mainfrom
ifed01:wip-ifed-daemonhistogram
Closed

ceph CLI: introduce daemonhistogram command#41600
ifed01 wants to merge 1 commit intoceph:mainfrom
ifed01:wip-ifed-daemonhistogram

Conversation

@ifed01
Copy link
Contributor

@ifed01 ifed01 commented May 31, 2021

This patch embeds daemon's histogram presentation functionality originally brought by ../src/tools/histogram_dump.py script into primary Ceph's CLI.
Some minor usability improvements have been made too:

  • allow daemon specification via both daemon's name and admin socket path
  • allow one-shot every daemon specification via type.all clause, e.g. osd.all
  • minor improvements in axis presentation

Signed-off-by: Igor Fedotov ifedotov@suse.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@ifed01
Copy link
Contributor Author

ifed01 commented May 31, 2021

histogram_demo

Copy link
Member

@jdurgin jdurgin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea to integrate this to make it easily accessible!

Could you add a test to invoke it somewhere so we can be sure it at least does not crash?


Usage::

ceph daemonperf {type.all|daemon_names|socket_paths} [ {batch|live} [{interval} [{count}]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this describe the meaning of the parameters / that you need to run it on the same host as the daemon?

@ifed01 ifed01 force-pushed the wip-ifed-daemonhistogram branch from 1ad51be to 53cf242 Compare June 16, 2021 17:31
@github-actions github-actions bot added the tests label Jun 16, 2021
@ifed01
Copy link
Contributor Author

ifed01 commented Jun 16, 2021

Great idea to integrate this to make it easily accessible!

Could you add a test to invoke it somewhere so we can be sure it at least does not crash?

@jdurgin - done!

@ifed01 ifed01 force-pushed the wip-ifed-daemonhistogram branch from 53cf242 to b7744fa Compare June 16, 2021 17:43
@neha-ojha
Copy link
Member

jenkins test make check

@ifed01 ifed01 force-pushed the wip-ifed-daemonhistogram branch from b7744fa to 3bbbacd Compare June 28, 2021 12:24
@ifed01
Copy link
Contributor Author

ifed01 commented Jul 20, 2021

@jdurgin - Josh, mind reviewing? Or may be recommend someone for that? Thanks!

@ifed01 ifed01 force-pushed the wip-ifed-daemonhistogram branch from 3bbbacd to 495b520 Compare July 22, 2021 16:38
ifed01 added a commit to ifed01/ceph that referenced this pull request Jul 22, 2021
This allows to monitor how fragmented resulting allocations are
depending on the requested block size.

Could be coupled with ceph#41600 for easy
live monitoring for either an individual OSD or full set of them.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
ifed01 added a commit to ifed01/ceph that referenced this pull request Jul 22, 2021
This allows to monitor how fragmented resulting allocations are
depending on the requested block size.

Could be coupled with ceph#41600 for easy
live monitoring for either an individual OSD or full set of them.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Copy link
Member

@jdurgin jdurgin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

OSD_TESTS+=" admin_heap_profiler"
OSD_TESTS+=" osd_tell_help_command"
OSD_TESTS+=" osd_compact"
OSD_TESTS+=" osd_daemon_histogram"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2021-07-29T08:48:54.174 INFO:tasks.workunit.client.0.smithi143.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:2960: main:  test_osd_daemon_histogram
2021-07-29T08:48:54.174 INFO:tasks.workunit.client.0.smithi143.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:2731: test_osd_daemon_histogram:  sudo ceph daemonhistogram osd.all osd.op_r_latency_out_bytes_histogram batch 1 3
2021-07-29T08:48:54.654 INFO:tasks.workunit.client.0.smithi143.stderr:no valid command found; 10 closest matches:
2021-07-29T08:48:54.654 INFO:tasks.workunit.client.0.smithi143.stderr:pg stat
2021-07-29T08:48:54.655 INFO:tasks.workunit.client.0.smithi143.stderr:pg getmap
2021-07-29T08:48:54.655 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump [<dumpcontents:all|summary|sum|delta|pools|osds|pgs|pgs_brief>...]
2021-07-29T08:48:54.655 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump_json [<dumpcontents:all|summary|sum|pools|osds|pgs>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump_pools_json
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls-by-pool <poolstr> [<states>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls [<pool:int>] [<states>...]
2021-07-29T08:48:54.657 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump_stuck [<stuckops:inactive|unclean|stale|undersized|degraded>...] [<threshold:int>]
2021-07-29T08:48:54.657 INFO:tasks.workunit.client.0.smithi143.stderr:Error EINVAL: invalid command

see https://pulpito.ceph.com/kchai-2021-07-29_08:05:42-rados-wip-kefu-testing-2021-07-29-1336-distro-basic-smithi/6300873/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov - thanks for testing! This however looks like a broken/uncomplete ceph tool deployment - the daemonhistogram call from the test case looks valid but not available in ceph tool. I presume legacy tool version is in use. I added a ceph tool help call prior to daemonhistogram invocation to verify that proper tool is in use. Could you please re-run the QA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this has been solved.

@ifed01 ifed01 force-pushed the wip-ifed-daemonhistogram branch from 5918bc2 to f4904b8 Compare December 20, 2021 16:24
@djgalloway djgalloway changed the base branch from master to main July 4, 2022 00:00
@djgalloway djgalloway requested a review from a team as a code owner July 4, 2022 00:00

def monids():
ret, outbuf, outs = json_command(cluster_handle, prefix='mon dump',
def monids(h):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this breaks the test of cephtool/test.sh, like

2022-07-27T09:23:37.246 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:35: expect_false:  set -x
2022-07-27T09:23:37.247 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  ceph ping mon.foo
2022-07-27T09:23:37.312 INFO:tasks.workunit.client.0.smithi154.stderr:[errno 2] RADOS object not found (error calling ping_monitor)
2022-07-27T09:23:37.313 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  return 0
2022-07-27T09:23:37.313 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:2639: test_mon_ping:  ceph ping 'mon.*'
2022-07-27T09:23:37.415 INFO:tasks.workunit.client.0.smithi154.stderr:monids() missing 1 required positional argument: 'h'

see https://pulpito.ceph.com/kchai-2022-07-27_06:59:49-rados-wip-kefu-testing-2022-07-27-0857-distro-default-smithi/

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Nov 28, 2022
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Dec 28, 2022
@ifed01 ifed01 reopened this Jul 12, 2024
@ifed01 ifed01 requested a review from a team as a code owner July 12, 2024 11:49
@github-actions github-actions bot removed the stale label Jul 12, 2024
daemonhistogram
---------------

Visualize Ceph daemons' performance histogram. To be executed from daemon's host.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/from/on/

Parameters::

* daemons - Daemons to collect data from. This can be either 'daemon_type.all' specification to use every daemon of a specific type. Or {daemon_names|socket_paths}*}. Or a comma separated list of daemon ids or admin socket paths.
* histogram_name - Performance histogram name, formatted as <component>.<name>. List of possible values is available via 'ceph daemon <daemon_name> perf histogram schema' command
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/command//

* daemons - Daemons to collect data from. This can be either 'daemon_type.all' specification to use every daemon of a specific type. Or {daemon_names|socket_paths}*}. Or a comma separated list of daemon ids or admin socket paths.
* histogram_name - Performance histogram name, formatted as <component>.<name>. List of possible values is available via 'ceph daemon <daemon_name> perf histogram schema' command
* mode - output mode, either live(default) or batch.
* interval - period (in seconds) to update the histogram, 1 sec if omitted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/1 sec if omitted/defaults to one second/

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM! The comments are about minors.

I will be happy to merge assuming it passes QA. The teuthology problem is already tackled, right?

OSD_TESTS+=" admin_heap_profiler"
OSD_TESTS+=" osd_tell_help_command"
OSD_TESTS+=" osd_compact"
OSD_TESTS+=" osd_daemon_histogram"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this has been solved.


sockpaths = []
for_whom = None
if len(childargs) > 0 and childargs[0] in ["daemonhistogram"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention here allow user to use shortcuts of the command?

>>> "daemonhistog" in "daemonhistogram"
True

If so, is this coherent with other parts of the CLI interface?

target = mask.split('.', 1)
service = target[0]
targets = ids_by_service_for_cluster(name, cluster_name, conffile, service)
if len(targets) <= 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: len() below zero?

if len(childargs) > 0:
try:
interval = float(childargs.pop(0))
if interval < 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a valid interval be zero?

if interval < 0:
raise ValueError
except ValueError:
print('daemonperf: interval should be a positive number', file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with the interval == 0 case.

self.termsize.update()

def create_histogram(self, sockets, counter, last, batch):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we really need this newline?

except Exception as e:
return (last,
"Couldn't connect to admin socket, result: \n{}".format(e))
current_datasets[socket] = json_d[c[0]][c[1]]['values']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plenty of magics. Naming c[0] and c[1] would this far more readable.

def _handle_sigwinch(self, signo, frame) -> None:
self.termsize.update()

def create_histogram(self, sockets, counter, last, batch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a public method? The only caller I'm able to find is _print_vals_histogram().

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 11, 2024
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants