ceph CLI: introduce daemonhistogram command#41600
Conversation
jdurgin
left a comment
There was a problem hiding this comment.
Great idea to integrate this to make it easily accessible!
Could you add a test to invoke it somewhere so we can be sure it at least does not crash?
doc/man/8/ceph.rst
Outdated
|
|
||
| Usage:: | ||
|
|
||
| ceph daemonperf {type.all|daemon_names|socket_paths} [ {batch|live} [{interval} [{count}]]] |
There was a problem hiding this comment.
can this describe the meaning of the parameters / that you need to run it on the same host as the daemon?
1ad51be to
53cf242
Compare
@jdurgin - done! |
53cf242 to
b7744fa
Compare
|
jenkins test make check |
b7744fa to
3bbbacd
Compare
|
@jdurgin - Josh, mind reviewing? Or may be recommend someone for that? Thanks! |
3bbbacd to
495b520
Compare
This allows to monitor how fragmented resulting allocations are depending on the requested block size. Could be coupled with ceph#41600 for easy live monitoring for either an individual OSD or full set of them. Signed-off-by: Igor Fedotov <ifedotov@suse.com>
This allows to monitor how fragmented resulting allocations are depending on the requested block size. Could be coupled with ceph#41600 for easy live monitoring for either an individual OSD or full set of them. Signed-off-by: Igor Fedotov <ifedotov@suse.com>
| OSD_TESTS+=" admin_heap_profiler" | ||
| OSD_TESTS+=" osd_tell_help_command" | ||
| OSD_TESTS+=" osd_compact" | ||
| OSD_TESTS+=" osd_daemon_histogram" |
There was a problem hiding this comment.
2021-07-29T08:48:54.174 INFO:tasks.workunit.client.0.smithi143.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:2960: main: test_osd_daemon_histogram
2021-07-29T08:48:54.174 INFO:tasks.workunit.client.0.smithi143.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:2731: test_osd_daemon_histogram: sudo ceph daemonhistogram osd.all osd.op_r_latency_out_bytes_histogram batch 1 3
2021-07-29T08:48:54.654 INFO:tasks.workunit.client.0.smithi143.stderr:no valid command found; 10 closest matches:
2021-07-29T08:48:54.654 INFO:tasks.workunit.client.0.smithi143.stderr:pg stat
2021-07-29T08:48:54.655 INFO:tasks.workunit.client.0.smithi143.stderr:pg getmap
2021-07-29T08:48:54.655 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump [<dumpcontents:all|summary|sum|delta|pools|osds|pgs|pgs_brief>...]
2021-07-29T08:48:54.655 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump_json [<dumpcontents:all|summary|sum|pools|osds|pgs>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump_pools_json
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls-by-pool <poolstr> [<states>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
2021-07-29T08:48:54.656 INFO:tasks.workunit.client.0.smithi143.stderr:pg ls [<pool:int>] [<states>...]
2021-07-29T08:48:54.657 INFO:tasks.workunit.client.0.smithi143.stderr:pg dump_stuck [<stuckops:inactive|unclean|stale|undersized|degraded>...] [<threshold:int>]
2021-07-29T08:48:54.657 INFO:tasks.workunit.client.0.smithi143.stderr:Error EINVAL: invalid command
There was a problem hiding this comment.
@tchaikov - thanks for testing! This however looks like a broken/uncomplete ceph tool deployment - the daemonhistogram call from the test case looks valid but not available in ceph tool. I presume legacy tool version is in use. I added a ceph tool help call prior to daemonhistogram invocation to verify that proper tool is in use. Could you please re-run the QA.
There was a problem hiding this comment.
I'm assuming this has been solved.
495b520 to
a504341
Compare
5918bc2 to
f4904b8
Compare
|
|
||
| def monids(): | ||
| ret, outbuf, outs = json_command(cluster_handle, prefix='mon dump', | ||
| def monids(h): |
There was a problem hiding this comment.
this breaks the test of cephtool/test.sh, like
2022-07-27T09:23:37.246 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:35: expect_false: set -x
2022-07-27T09:23:37.247 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false: ceph ping mon.foo
2022-07-27T09:23:37.312 INFO:tasks.workunit.client.0.smithi154.stderr:[errno 2] RADOS object not found (error calling ping_monitor)
2022-07-27T09:23:37.313 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false: return 0
2022-07-27T09:23:37.313 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:2639: test_mon_ping: ceph ping 'mon.*'
2022-07-27T09:23:37.415 INFO:tasks.workunit.client.0.smithi154.stderr:monids() missing 1 required positional argument: 'h'
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
| daemonhistogram | ||
| --------------- | ||
|
|
||
| Visualize Ceph daemons' performance histogram. To be executed from daemon's host. |
| Parameters:: | ||
|
|
||
| * daemons - Daemons to collect data from. This can be either 'daemon_type.all' specification to use every daemon of a specific type. Or {daemon_names|socket_paths}*}. Or a comma separated list of daemon ids or admin socket paths. | ||
| * histogram_name - Performance histogram name, formatted as <component>.<name>. List of possible values is available via 'ceph daemon <daemon_name> perf histogram schema' command |
| * daemons - Daemons to collect data from. This can be either 'daemon_type.all' specification to use every daemon of a specific type. Or {daemon_names|socket_paths}*}. Or a comma separated list of daemon ids or admin socket paths. | ||
| * histogram_name - Performance histogram name, formatted as <component>.<name>. List of possible values is available via 'ceph daemon <daemon_name> perf histogram schema' command | ||
| * mode - output mode, either live(default) or batch. | ||
| * interval - period (in seconds) to update the histogram, 1 sec if omitted |
There was a problem hiding this comment.
s/1 sec if omitted/defaults to one second/
| OSD_TESTS+=" admin_heap_profiler" | ||
| OSD_TESTS+=" osd_tell_help_command" | ||
| OSD_TESTS+=" osd_compact" | ||
| OSD_TESTS+=" osd_daemon_histogram" |
There was a problem hiding this comment.
I'm assuming this has been solved.
|
|
||
| sockpaths = [] | ||
| for_whom = None | ||
| if len(childargs) > 0 and childargs[0] in ["daemonhistogram"]: |
There was a problem hiding this comment.
Is the intention here allow user to use shortcuts of the command?
>>> "daemonhistog" in "daemonhistogram"
True
If so, is this coherent with other parts of the CLI interface?
| target = mask.split('.', 1) | ||
| service = target[0] | ||
| targets = ids_by_service_for_cluster(name, cluster_name, conffile, service) | ||
| if len(targets) <= 0: |
| if len(childargs) > 0: | ||
| try: | ||
| interval = float(childargs.pop(0)) | ||
| if interval < 0: |
There was a problem hiding this comment.
Can a valid interval be zero?
| if interval < 0: | ||
| raise ValueError | ||
| except ValueError: | ||
| print('daemonperf: interval should be a positive number', file=sys.stderr) |
There was a problem hiding this comment.
This is inconsistent with the interval == 0 case.
| self.termsize.update() | ||
|
|
||
| def create_histogram(self, sockets, counter, last, batch): | ||
|
|
There was a problem hiding this comment.
nit: do we really need this newline?
| except Exception as e: | ||
| return (last, | ||
| "Couldn't connect to admin socket, result: \n{}".format(e)) | ||
| current_datasets[socket] = json_d[c[0]][c[1]]['values'] |
There was a problem hiding this comment.
Plenty of magics. Naming c[0] and c[1] would this far more readable.
| def _handle_sigwinch(self, signo, frame) -> None: | ||
| self.termsize.update() | ||
|
|
||
| def create_histogram(self, sockets, counter, last, batch): |
There was a problem hiding this comment.
Why is this a public method? The only caller I'm able to find is _print_vals_histogram().
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |

This patch embeds daemon's histogram presentation functionality originally brought by ../src/tools/histogram_dump.py script into primary Ceph's CLI.
Some minor usability improvements have been made too:
Signed-off-by: Igor Fedotov ifedotov@suse.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox