Wip libcephfs perf dump by vshankar · Pull Request #62632 · ceph/ceph

vshankar · 2025-04-02T14:46:49Z

This refactor is done so as to be able to fetch the perf counters
via client APIs (say, libcephfs). The current mechanism of exporting
perf counters is via formatters (JSON, etc..) makes it a little
weird to expose the perf counters via a library API. This change
adds a mechanism to get perf counters by type packed in a struct.

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins test classic perf Jenkins Job | Jenkins Job Definition
jenkins test crimson perf Jenkins Job | Jenkins Job Definition
jenkins test signed Jenkins Job | Jenkins Job Definition
jenkins test make check Jenkins Job | Jenkins Job Definition
jenkins test make check arm64 Jenkins Job | Jenkins Job Definition
jenkins test submodules Jenkins Job | Jenkins Job Definition
jenkins test dashboard Jenkins Job | Jenkins Job Definition
jenkins test dashboard cephadm Jenkins Job | Jenkins Job Definition
jenkins test api Jenkins Job | Jenkins Job Definition
jenkins test docs ReadTheDocs | Github Workflow Definition
jenkins test ceph-volume all Jenkins Jobs | Jenkins Jobs Definition
jenkins test windows Jenkins Job | Jenkins Job Definition
jenkins test rook e2e Jenkins Job | Jenkins Job Definition

vshankar · 2025-04-07T06:30:35Z

jenkins retest this please

src/libcephfs.cc

vshankar · 2025-04-17T03:25:24Z

jenkins test make check

vshankar · 2025-04-17T03:25:29Z

jenkins test make check arm64

vshankar · 2025-04-22T17:25:50Z

I did a sample run with this change ported to squid branch for some downstream stuff. Here is the test run: https://pulpito.ceph.com/vshankar-2025-04-12_17:18:59-fs-wip-perf-dump-squid-2-testing-default-smithi/

vshankar · 2025-04-22T17:40:29Z

Oh, btw, we need to bump up the libcephfs version since a new API has been added.

batrick · 2025-04-22T18:13:02Z

Oh, btw, we need to bump up the libcephfs version since a new API has been added.

We're not very rigorous about this historically. I think as long as it's a new API/ABI that doesn't affect existing functions then there is probably no need.

batrick

I don't see a ticket associated with this work. Who is the intended user?

I'm a little curious why not just wire up cct->get_admin_socket()->execute_command() (sync or async) to send commands to itself. Then the caller can just use whatever json library to parse it.

mattbenjamin

does this change update the api version?

vshankar · 2025-04-23T06:53:12Z

I don't see a ticket associated with this work. Who is the intended user?

This is what's being asked: "Right now, the only way to get performance data out of a client is with the admin socket. If that's not the right interface for whatever reason, you're out of luck.
At a minimum we have to build libcephfs APIs that provide access to the existing metrics. We will consider building more observability/diagnostics in to the client library, but that will likely be future epics."

I'm a little curious why not just wire up cct->get_admin_socket()->execute_command() (sync or async) to send commands to itself. Then the caller can just use whatever json library to parse it.

I did that initially from Client.cc (or rather, invoke the relevant perf counter API) and handing the JSON back to the caller as JSON (or as std::string) seemed a bit off-putting to me. JSON is totally fine, but here I sticked to look-and-fell of other libcephfs APIs.

vshankar · 2025-04-24T08:06:34Z

I will create a tracker and include details there. @batrick

batrick

I'm a little curious why not just wire up cct->get_admin_socket()->execute_command() (sync or async) to send commands to itself. Then the caller can just use whatever json library to parse it.

I did that initially from Client.cc (or rather, invoke the relevant perf counter API) and handing the JSON back to the caller as JSON (or as std::string) seemed a bit off-putting to me. JSON is totally fine, but here I sticked to look-and-fell of other libcephfs APIs.

libcephfs also exports an mds_command/mds_command2 function which generally will return json as well (which is fine because the mgr is the main user and has json libraries). So I think think there is a case for consistency there. The gap was only that we had no method to send commands to Client itself.

I'd rather not have us try to export perf counters as raw types if we can avoid it. However if it's too troublesome for Ganesha to parse then this is okay with me.

batrick · 2025-04-24T13:38:24Z

src/client/Client.cc

    plb.add_time(l_c_wr_avg, "writeavg", "Average latency for processing write requests");
    plb.add_u64(l_c_wr_sqsum, "writesqsum", "Sum of squares ((to calculate variability/stdev) for write requests");
-    plb.add_u64(l_c_wr_ops, "rdops", "Total write IO operations");
+    plb.add_u64(l_c_wr_ops, "wrops", "Total write IO operations");


gregsfortytwo · 2025-05-06T01:58:47Z

@vshankar I've been thinking about this a lot and I agree with @batrick -- we should just return a json dump a la

int dump_perf_counters(struct ceph_mount_info *cmount, string *perf_dump_json)

There's a few reasons:

it's unlikely users will be doing anything except converting to json anyway,
the most likely second-most-common operation would be dumping a single value, which we can just add another function for,
doing it this way adds a huuuge interface we need to keep in sync, and it's not complete here (no histogram), and if the representation changes we may need to update these types (and we've just seen how troublesome ABI changes are), and it adds memory ownership complexity

I agree the big json dump feels icky, but unless it was a lot of complexity with locking somehow it just seems a lot simpler to implement and maintain as a stable interface. Can you post that one?

vshankar · 2025-05-07T08:51:00Z

@vshankar I've been thinking about this a lot and I agree with @batrick -- we should just return a json dump a la
int dump_perf_counters(struct ceph_mount_info *cmount, string *perf_dump_json)
There's a few reasons:

it's unlikely users will be doing anything except converting to json anyway,

the most likely second-most-common operation would be dumping a single value, which we can just add another function for,

doing it this way adds a huuuge interface we need to keep in sync, and it's not complete here (no histogram), and if the representation changes we may need to update these types (and we've just seen how troublesome ABI changes are), and it adds memory ownership complexity

I agree the big json dump feels icky, but unless it was a lot of complexity with locking somehow it just seems a lot simpler to implement and maintain as a stable interface. Can you post that one?

OK, I'm fine with the above. The requirements were terse and its was unclear (and IMO, its still unclear) if the caller is fine with parsing JSON. I will not throw away this branch altogether though in case there is future requirement for it. Will post an update for returning json string.

vshankar · 2025-05-07T09:24:44Z

https://tracker.ceph.com/issues/71233

vshankar · 2025-05-07T13:40:09Z

@gregsfortytwo @batrick - fixed and updated.

vshankar · 2025-05-07T13:41:47Z

I've used admin socket's execute_command() to fetch the perf dump json and return the json string for parsing (basic tests have been added for verifying the json string).

batrick

otherwise LGTM.

src/include/cephfs/libcephfs.h

src/client/Client.cc

src/test/libcephfs/perfcounters.cc

vshankar · 2025-05-08T16:45:54Z

@batrick PTAL.

batrick · 2025-05-08T18:09:16Z

jenkins test api

vshankar · 2025-05-08T20:03:24Z

This PR is under test in https://tracker.ceph.com/issues/71260.

It collides with total read operations ("rdops") and thereby not getting reported in perf dump. Signed-off-by: Venky Shankar <vshankar@redhat.com>

Fixes: http://tracker.ceph.com/issues/71233 Signed-off-by: Venky Shankar <vshankar@redhat.com>

Signed-off-by: Venky Shankar <vshankar@redhat.com>

vshankar · 2025-05-12T08:43:07Z

Updated to run ceph_test_libcephfs_perfcounters test via libcephfs workunit.

vshankar · 2025-05-15T09:19:07Z

This PR is under test in https://tracker.ceph.com/issues/71260.

Suite run looks good. This will be merged soon.

vshankar · 2025-05-20T04:59:13Z

https://tracker.ceph.com/projects/cephfs/wiki/QA_main_2025#wip-vshankar-testing-20250508200127-debug

vshankar requested review from a team, ljflores, ronen-fr and rzarzynski April 2, 2025 14:46

github-actions bot added build/ops cephfs Ceph File System common tests labels Apr 2, 2025

vshankar requested a review from a team April 7, 2025 06:30

vshankar force-pushed the wip-libcephfs-perf-dump branch from 168693b to dca9aa4 Compare April 8, 2025 08:56

markhpc added the performance label Apr 10, 2025

mchangir reviewed Apr 14, 2025

View reviewed changes

src/libcephfs.cc Outdated Show resolved Hide resolved

vshankar force-pushed the wip-libcephfs-perf-dump branch from dca9aa4 to cb5ebd4 Compare April 14, 2025 15:34

vshankar requested a review from a team April 16, 2025 12:35

vshankar requested a review from gregsfortytwo April 22, 2025 17:03

batrick reviewed Apr 22, 2025

View reviewed changes

mattbenjamin reviewed Apr 22, 2025

View reviewed changes

batrick reviewed Apr 24, 2025

View reviewed changes

vshankar force-pushed the wip-libcephfs-perf-dump branch from cb5ebd4 to f4d64f4 Compare May 7, 2025 13:34

batrick requested changes May 7, 2025

View reviewed changes

src/include/cephfs/libcephfs.h Outdated Show resolved Hide resolved

src/client/Client.cc Show resolved Hide resolved

src/test/libcephfs/perfcounters.cc Show resolved Hide resolved

vshankar force-pushed the wip-libcephfs-perf-dump branch from f4d64f4 to 5808c38 Compare May 8, 2025 16:44

batrick approved these changes May 8, 2025

View reviewed changes

vshankar added the wip-vshankar-testing label May 8, 2025

vshankar added 4 commits May 12, 2025 08:42

client: fix total write operations perf counter name

00ff16c

It collides with total read operations ("rdops") and thereby not getting reported in perf dump. Signed-off-by: Venky Shankar <vshankar@redhat.com>

libcephfs: add API to get client perf counters

259967e

Fixes: http://tracker.ceph.com/issues/71233 Signed-off-by: Venky Shankar <vshankar@redhat.com>

test: add test to fetch perf counters via libcephfs API

c51c9a8

Signed-off-by: Venky Shankar <vshankar@redhat.com>

libcephfs: increment library minor version

48446ab

Signed-off-by: Venky Shankar <vshankar@redhat.com>

vshankar force-pushed the wip-libcephfs-perf-dump branch from 5808c38 to 48446ab Compare May 12, 2025 08:42

vshankar merged commit 9d55fa0 into ceph:main May 20, 2025
12 checks passed

vshankar removed the wip-vshankar-testing label May 20, 2025

vshankar mentioned this pull request Jun 9, 2025

tentacle: Wip libcephfs perf dump #63806

Merged

Conversation

vshankar commented Apr 2, 2025

Contribution Guidelines

Checklist

Uh oh!

vshankar commented Apr 7, 2025

Uh oh!

Uh oh!

vshankar commented Apr 17, 2025

Uh oh!

vshankar commented Apr 17, 2025

Uh oh!

vshankar commented Apr 22, 2025

Uh oh!

vshankar commented Apr 22, 2025

Uh oh!

batrick commented Apr 22, 2025

Uh oh!

batrick left a comment

Choose a reason for hiding this comment

Uh oh!

mattbenjamin left a comment

Choose a reason for hiding this comment

Uh oh!

vshankar commented Apr 23, 2025

Uh oh!

vshankar commented Apr 24, 2025

Uh oh!

batrick left a comment

Choose a reason for hiding this comment

Uh oh!

batrick Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

gregsfortytwo commented May 6, 2025

Uh oh!

vshankar commented May 7, 2025

Uh oh!

vshankar commented May 7, 2025

Uh oh!

vshankar commented May 7, 2025

Uh oh!

vshankar commented May 7, 2025

Uh oh!

batrick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vshankar commented May 8, 2025

Uh oh!

batrick commented May 8, 2025

Uh oh!

vshankar commented May 8, 2025

Uh oh!

vshankar commented May 12, 2025

Uh oh!

vshankar commented May 15, 2025

Uh oh!

vshankar commented May 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants