crimson: implement crimson Omap iterate interface by liu-chunmei · Pull Request #62530 · ceph/ceph

liu-chunmei · 2025-03-27T06:24:16Z

https://tracker.ceph.com/issues/69612
https://tracker.ceph.com/issues/69990

For ceph_test_cls_rbd test, original omap list will cause slow request when osd_op_complaint_time <=3s, by this PR, don't meet omap list slow request even setting osd_op_complaint_time = 1s.

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- [x ] References tracker ticket
- Very recent bug; references commit where it was introduced
- [ x] New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- [ x] Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins test classic perf Jenkins Job | Jenkins Job Definition
jenkins test crimson perf Jenkins Job | Jenkins Job Definition
jenkins test signed Jenkins Job | Jenkins Job Definition
jenkins test make check Jenkins Job | Jenkins Job Definition
jenkins test make check arm64 Jenkins Job | Jenkins Job Definition
jenkins test submodules Jenkins Job | Jenkins Job Definition
jenkins test dashboard Jenkins Job | Jenkins Job Definition
jenkins test dashboard cephadm Jenkins Job | Jenkins Job Definition
jenkins test api Jenkins Job | Jenkins Job Definition
jenkins test docs ReadTheDocs | Github Workflow Definition
jenkins test ceph-volume all Jenkins Jobs | Jenkins Jobs Definition
jenkins test windows Jenkins Job | Jenkins Job Definition
jenkins test rook e2e Jenkins Job | Jenkins Job Definition

cyx1231st · 2025-03-27T09:28:03Z

Can you verify whether the mentioned teuthology test mentioned is passing or not? (Repeat 5 times to see if it's stable)

cyx1231st · 2025-03-27T09:40:59Z

Nit, by convention, commit title should match the related directory or file, e.g. :

2nd commit should be: "test/crimson/seastore/test_omap_manager: test omap_iterate()"
3rd: "crimson/osd/pg_backend: replace omap_get_keys/vals() by omap_iterate()"
4th: "crimson/osd/replicated_recovery_backend: replace omap_get_values() by omap_iterate()"
5th: "crimson/osd/../scrub_events: replace omap_get_values() by omap_iterate()"

cyx1231st

High-level comments.

src/crimson/os/futurized_store.h

src/crimson/os/seastore/seastore.h

src/crimson/os/seastore/omap_manager.h

Matan-B

(Left an additional review below)

src/crimson/os/alienstore/alien_store.cc

src/crimson/os/futurized_store.h

Matan-B

Alien/CyanStore side looks good so far! We should consider introducing the new crimson logic as coroutines. This could largely help with future maintainability. What do you think?

Matan-B · 2025-03-31T09:03:13Z

src/crimson/osd/pg_backend.cc

+      return maybe_do_omap_iterate(store, coll, os.oi, start_from, callback)
+        .safe_then([&delta_stats, &osd_op, &result, &num, &truncated](auto ret) {
+	  if (ret != ObjectStore::omap_iter_ret_t::STOP) {
+	    logger().warn("omap_iterate not meet a stop condition");
+	  }
+          encode(num, osd_op.outdata);
+          osd_op.outdata.claim_append(result);
+          encode(truncated, osd_op.outdata);
+          delta_stats.num_rd_kb += shift_round_up(osd_op.outdata.length(), 10);
+          delta_stats.num_rd++;
+        }).handle_error(
+          crimson::ct_error::enodata::handle([&osd_op] {
+            uint32_t num = 0;
+            bool truncated = false;
+            encode(num, osd_op.outdata);
+            encode(truncated, osd_op.outdata);
+            osd_op.rval = 0;
+            return seastar::now();
+          }),
+          ll_read_errorator::pass_further{}


This could be a great opportunity to introduce the new logic with coroutines:

auto ret = co_await maybe_do_omap_iterate((store, coll, os.oi, start_from, callback).handle_error(...) if (ret != ObjectStore::omap_iter_ret_t::STOP) { logger().warn("omap_iterate not meet a stop condition"); } encode(num, osd_op.outdata); ...

will do it.

It would be ok to do the new code as a coroutine and update the rest of the file later. Doing the whole file later would also be ok, I think.

Generally, avoiding the need for future combinators like do_with by using coroutines is a pretty huge readability win.

@Matan-B @athanatos any examples for coroutine with return type interruptible errorator, I try to co_return ll_read_ierrorator::make_ready_future<>(); seems not supported.

It should work, but you'll need crimson/common/coroutine.h for the supporting awaiter types.

There's already quite a bit of code that uses coroutines with errorated and interruptible futures.

@Matan-B @athanatos coroutine can't return errorator, so if I use coroutine, in co_await function, I only can do crimson::ct_error::assert_all{} to handle all errorators, the caller will not get errorator information. and in function which calling ca_await, I also can't use co_return to return errorator, only can do try{}throw error. Do you want me change like this or wait until our coroutine support return errorator? better to let coroutine support return errorator first. Thanks!

To handle errors we would need handle_error after awaiting the coroutine, see for example how OSD::mkfs handles mount_ertr from mount()

co_await store.mount().handle_error( crimson::stateful_ec::assert_failure([FNAME](const auto& ec) { ERROR("error mounting object store in {}: ({}) {}", local_conf().get_val<std::string>("osd_data"), ec.value(), ec.message()); })); .... co_return;

See test_crimson_coroutine.cc for more examples.

I was able to compile the following in replicated_backend:

+#include "crimson/common/coroutine.h" .. + + ll_read_errorator::future<> + do_stuff() { + co_return; + }; + + ll_read_ierrorator::future<> + do_stuff2() { + co_return; + }; + ll_read_ierrorator::future<ceph::bufferlist> _read(const hobject_t& hoid, uint64_t off, uint64_t len, uint32_t flags) override;

@Matan-B I have the two issues:

co_await maybe_do_omap_iterate(store, coll, os.oi, start_from, callback) .safe_then([&osd_op, &delta_stats, &result, &num, &truncated](auto ret) { ...... }).handle_error( crimson::ct_error::enodata::handle([&osd_op] { encode(uint32_t{0} /* num */, osd_op.outdata); encode(bool{false} /* truncated */, osd_op.outdata); osd_op.rval = 0; return ll_read_errorator::now(); }), ll_read_errorator::pass_further{} **//how does the caller get this errorator?** );

if (!os.exists || os.oi.is_whiteout()) { logger().debug("{}: object does not exist: {}", os.oi.soid); return crimson::ct_error::enoent::make(); **// how to return this errorator? can't use return or co_return** } ..... co_await ...

@liu-chunmei - I've added some examples to your questions and it should compile now. Please see: 2f835ae.
Unittests added as a separate PR- #62649

Matan-B · 2025-03-31T09:05:37Z

src/crimson/osd/osd_operations/scrub_events.cc

+        return seastar::do_with(
+          std::move(start_from),
+          std::move(callback),
+          [FNAME, this, &obj, &progress, &entry, &pg] (auto &start_from, auto &fun) {
+            return pg.shard_services.get_store().omap_iterate(
+              pg.get_collection_ref(),
+              obj,
+              start_from,
+              fun
+            ).safe_then([FNAME, this, &obj, &progress, &entry, &pg](auto result) {
+              if (result == ObjectStore::omap_iter_ret_t::NEXT) {
+                DEBUGDPP("op: {}, obj: {}, progress: {} omap done",
+                          pg, *this, obj, progress);
+                progress.keys_done = true;


Same here we can avoid the seastar::do_with:

auto result = co_await pg.shard_services.get_store().omap_iterate(..)..handle_error(..); if (result == ObjectStore::omap_iter_ret_t::NEXT) { ... }

src/include/buffer.h

Matan-B · 2025-03-31T09:10:48Z

src/osd/SnapMapper.cc

+    ch, hoid, FuturizedStore::Shard::omap_keys_t{key}
+  ).safe_then([&key, next] (FuturizedStore::Shard::omap_values_t&& vals) {


Does this change belong to this commit?

it is introduced by dropping omap_get_values(, , &start), so change it to use omap_get_values( , , keyset).

This came up in ceph#62530. Showcase an awaited throwing coroutine and a passed_further error. Signed-off-by: Matan Breizman <mbreizma@redhat.com>

Matan-B

Coroutines structure seem to work well! However, I'm not sure I understand why are we wrapping the awaiting calls with lambdas. I think that we can avoid that and simply co_await when needed. I've left an example on one of the usages.

Matan-B · 2025-04-07T08:38:28Z

src/crimson/osd/pg_backend.cc

-        }
+    std::min(max_return, max_omap_entries);
+
+  co_await [&os, &delta_stats, &osd_op, start_after, max_return, this]() -> ll_read_ierrorator::future<> {


Why is the lambda needed here?

ceph::bufferlist result; uint32_t num = 0; bool truncated = false; ... co_await maybe_do_omap_iterate ....

coroutine maybe_do_omap_iterate use reference for start_from and callback function. so, there will be problem for these variables' lifecycle. I put them in lambda and in a coroutine to make the alive.

Got it. It seems to me that this is the first suspension point therefore the references are still valid. However, any future additional suspension point prior to this one might result in a dangling reference.

I think that we should prioritize readability and skip the lambda wrapper as long as the references are well defined. We can add a comment above the co_await call that any additional suspension point prior to this one would require guarding the references lifecycle.

See:

CP.53: Parameters to coroutines should not be passed by reference Reason Once a coroutine reaches the first suspension point, such as a co_await, the synchronous portion returns. After that point any parameters passed by reference are dangling. Any usage beyond that is undefined behavior which may include writing to freed memory.

I do understand that this is tricky and therefore keeping the lambda is also ok by me.
@liu-chunmei, what do you think?

@Matan-B since there are result, num, truncated, and start_from shared between callback and omap following call, I don't want to use some share_ptr for these variables, just keep this co_await lambada for this case.

If removing the additional lambda (co_await [&os, &delta_stats, &osd_op, start_after, max_return, this]() -> ll_read_ierrorator::future<> { ... }), why "result, num, truncated, and start_from shared between callback and omap following call" can have lifecycle problems?

Seems to me that without the additional lambda, for example:

PGBackend::omap_get_keys(...) const { ... ceph::bufferlist result; uint32_t num = 0; bool truncated = false; ObjectStore::omap_iter_seek_t start_from{ ... }; std::function< ... > callback; co_await maybe_do_omap_iterate(...).safe_then(...); // IIUC result, num, truncated, start_from, callback should still be alive here // because omap_get_keys(...) isn't returned yet. }

^ If I'm correct above, these local variable's lifecycle are extended until omap_get_keys() is returned, which means they must be alive during co_await maybe_do_omap_iterate(...).safe_then(...);.

Is that correct?

Is that correct?

@cyx1231st, Yes though it's not the problematic usage that was addressed with the lambda.
See omap_get_keys signature:

PGBackend::omap_get_keys( const ObjectState& os, OSDOp& osd_op, object_stat_sum_t& delta_stats) const

The tricky variables are os, osd_op and delta_stats which are are all passed by ref.

// IIUC result, num, truncated, start_from, callback should still be alive here // because omap_get_keys(...) isn't returned yet.

This is true. These where also passed to the lambda so they would be accessible there and not for life-cycle maintenance.

Looking back, it seems easier to understand why the lambda is used here if it was replaced with do_with. But I don't see it as a blocker for now. What do you think?

The tricky variables are os, osd_op and delta_stats which are are all passed by ref.

They were passed by & from the caller (looks the caller makes sure they will be alive), and also passed by & in this additional lambda -- their lifecycle aren't changed with or without the additional lambda.

So I still don't see how this additional lambda is useful ..

They were passed by & from the caller (looks the caller makes sure they will be alive), and also passed by & in this additional lambda -- their lifecycle aren't changed with or without the additional lambda.

So I still don't see how this additional lambda is useful ..

Ah, that looks correct to me, I've missed that.
Unless we make sure the upper layer is keeping them alive or we call them before the first suspension point - we will have to fallback to do_with

edit: or use the lambda with copy captures, but seems that do_with or seastar::coroutine::lambda is more readable.

@cyx1231st I think you are right, local variable will keep alive until omap_get_keys return, needn't co_await to capture the local variable. I will remove it.

src/crimson/osd/pg_backend.cc

Matan-B · 2025-07-13T09:06:35Z

https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/

Matan-B · 2025-07-13T12:18:39Z

https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/
The failures are due to the suite branch used being newer than the branch tested (i.e not included #63144). Other than that:
All jobs passed!

@cyx1231st, was your latest review addressed? If so, this can be merged.

cyx1231st · 2025-07-14T07:16:00Z

See #62530 (comment), I'm still not convinced about the correctness if we fall into the new else case, even if the tests are passed. Previously we don't have this additional if/else, which has been working fine.

Can the additional if (iter != iter_begin()) { be dropped in this PR? If not, what becomes wrong specifically with this PR and why ?

cyx1231st · 2025-07-14T07:19:17Z

Some other (although minor) unresolved comments are not addressed as well.

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

needn't scan the entire omap tree. Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

… omap_iterate() Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

liu-chunmei · 2025-07-15T03:24:44Z

Some other (although minor) unresolved comments are not addressed as well.

seems address all. please take a look.

…rate Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

cyx1231st

LGTM!

Doesn't worth full tests as the changes were minor.

cyx1231st · 2025-07-15T03:48:19Z

Crimson-has-followup:

Drop duplicated OMapManager::omap_list(): crimson: implement crimson Omap iterate interface #62530 (comment)
Drop the unnecessary omap_list_config_t from OMapManager::omap_rm_key_range()

Matan-B · 2025-07-15T14:53:02Z

LGTM!

Doesn't worth full tests as the changes were minor.

Ack, merging based on https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/

only failures are cls which were not included in this branch.

Matan-B · 2025-07-15T14:57:56Z

Crimson-has-followup:

Drop duplicated OMapManager::omap_list(): crimson: implement crimson Omap iterate interface #62530 (comment)

Drop the unnecessary omap_list_config_t from OMapManager::omap_rm_key_range()

https://tracker.ceph.com/issues/69990#note-7

https://tracker.ceph.com/issues/69612 is resolved :)

liu-chunmei requested a review from a team as a code owner March 27, 2025 06:24

github-actions bot added crimson tests labels Mar 27, 2025

liu-chunmei requested review from Matan-B and cyx1231st March 27, 2025 06:27

Matan-B added this to Crimson Mar 27, 2025

Matan-B moved this to Awaits review in Crimson Mar 27, 2025

rzarzynski self-requested a review March 27, 2025 14:48

liu-chunmei force-pushed the omap_iterate branch from 9e7071a to 852252d Compare March 28, 2025 07:53

cyx1231st reviewed Mar 28, 2025

View reviewed changes

cyx1231st added the seastore label Mar 28, 2025

cyx1231st requested review from athanatos and xxhdx1985126 March 28, 2025 09:01

liu-chunmei force-pushed the omap_iterate branch from 852252d to 1e897c2 Compare March 30, 2025 10:18

liu-chunmei requested a review from a team as a code owner March 30, 2025 10:18

github-actions bot added the core label Mar 30, 2025

liu-chunmei force-pushed the omap_iterate branch from 1e897c2 to 56063bd Compare March 30, 2025 22:35

Matan-B reviewed Mar 31, 2025

View reviewed changes

src/crimson/os/alienstore/alien_store.cc Show resolved Hide resolved

src/crimson/os/alienstore/alien_store.cc Outdated Show resolved Hide resolved

src/crimson/os/futurized_store.h Outdated Show resolved Hide resolved

src/crimson/os/futurized_store.h Outdated Show resolved Hide resolved

Matan-B reviewed Mar 31, 2025

View reviewed changes

liu-chunmei force-pushed the omap_iterate branch 4 times, most recently from 4879953 to bbeb0e6 Compare April 3, 2025 00:33

Matan-B mentioned this pull request Apr 3, 2025

test/crimson/test_crimson_coroutines: showcase errorator examples #62649

Merged

14 tasks

liu-chunmei force-pushed the omap_iterate branch from bbeb0e6 to 2664845 Compare April 5, 2025 22:32

Matan-B reviewed Apr 7, 2025

View reviewed changes

liu-chunmei force-pushed the omap_iterate branch from f8484ca to 831d952 Compare July 1, 2025 23:54

liu-chunmei force-pushed the omap_iterate branch from 831d952 to 4c0f0e3 Compare July 9, 2025 06:19

Matan-B moved this from In Progress to Needs QA in Crimson Jul 13, 2025

Matan-B requested a review from cyx1231st July 13, 2025 09:06

Matan-B moved this from Needs QA to Tested in Crimson Jul 13, 2025

Matan-B added the TESTED label Jul 13, 2025

liu-chunmei added 6 commits July 14, 2025 19:58

crimson/os: add omap_iterate interface to futurized store

e696da8

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

test/crimson/seastore/test_omap_manager: test omap_iterate()

ed8ccc9

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

crimson/osd/pg_backend: replace omap_get_keys/vals() by omap_iterate()

9d10221

needn't scan the entire omap tree. Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

crimson/osd/replicated_recovery_backend: replace omap_get_values() by…

97000a1

… omap_iterate() Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

crimson/osd/../scrub_events: replace omap_get_values() by omap_iterate()

f109ce9

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

osd/PGLog.cc: replace omap_get_values by omap_iterate

3ccc8ec

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

liu-chunmei force-pushed the omap_iterate branch from 4c0f0e3 to 6a245a0 Compare July 15, 2025 03:15

liu-chunmei added 3 commits July 14, 2025 20:38

test/crimson/seastore: check_omap replace omap_get_values by omap_ite…

d725a9f

…rate Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

osd/SnapMapper: replace omap_get_values by omap_iterate in get_next

019a75c

Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>

crimson/os: drop off omap_get_values with start

aa2f606

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>

liu-chunmei force-pushed the omap_iterate branch from 6a245a0 to aa2f606 Compare July 15, 2025 03:39

cyx1231st approved these changes Jul 15, 2025

View reviewed changes

cyx1231st added the crimson-has-followup label Jul 15, 2025

Matan-B merged commit 853c609 into ceph:main Jul 15, 2025
13 checks passed

Matan-B moved this from Tested to Merged (Main) in Crimson Jul 15, 2025

Matan-B mentioned this pull request Jul 16, 2025

crimson: Add objectstore tool with partial functionality #63604

Merged

14 tasks

		ch, hoid, FuturizedStore::Shard::omap_keys_t{key}
		).safe_then([&key, next] (FuturizedStore::Shard::omap_values_t&& vals) {

Conversation

liu-chunmei commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution Guidelines

Checklist

Uh oh!

cyx1231st commented Mar 27, 2025

Uh oh!

cyx1231st commented Mar 27, 2025

Uh oh!

cyx1231st left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Matan-B left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Matan-B left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liu-chunmei Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Matan-B Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liu-chunmei Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Matan-B Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Matan-B Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Matan-B left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liu-chunmei commented Mar 27, 2025 •

edited

Loading

Matan-B left a comment •

edited

Loading

liu-chunmei Mar 31, 2025 •

edited

Loading

Matan-B Apr 2, 2025 •

edited

Loading

liu-chunmei Apr 2, 2025 •

edited

Loading

Matan-B Apr 3, 2025 •

edited

Loading

Matan-B Mar 31, 2025 •

edited

Loading

Matan-B Apr 16, 2025 •

edited

Loading