Skip to content

crimson: implement crimson Omap iterate interface#62530

Merged
Matan-B merged 9 commits intoceph:mainfrom
liu-chunmei:omap_iterate
Jul 15, 2025
Merged

crimson: implement crimson Omap iterate interface#62530
Matan-B merged 9 commits intoceph:mainfrom
liu-chunmei:omap_iterate

Conversation

@liu-chunmei
Copy link
Contributor

@liu-chunmei liu-chunmei commented Mar 27, 2025

https://tracker.ceph.com/issues/69612
https://tracker.ceph.com/issues/69990

For ceph_test_cls_rbd test, original omap list will cause slow request when osd_op_complaint_time <=3s, by this PR, don't meet omap list slow request even setting osd_op_complaint_time = 1s.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • [x ] References tracker ticket
    • Very recent bug; references commit where it was introduced
    • [ x] New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@liu-chunmei liu-chunmei requested a review from a team as a code owner March 27, 2025 06:24
@Matan-B Matan-B added this to Crimson Mar 27, 2025
@Matan-B Matan-B moved this to Awaits review in Crimson Mar 27, 2025
@cyx1231st
Copy link
Member

Can you verify whether the mentioned teuthology test mentioned is passing or not? (Repeat 5 times to see if it's stable)

@cyx1231st
Copy link
Member

Nit, by convention, commit title should match the related directory or file, e.g. :

  • 2nd commit should be: "test/crimson/seastore/test_omap_manager: test omap_iterate()"
  • 3rd: "crimson/osd/pg_backend: replace omap_get_keys/vals() by omap_iterate()"
  • 4th: "crimson/osd/replicated_recovery_backend: replace omap_get_values() by omap_iterate()"
  • 5th: "crimson/osd/../scrub_events: replace omap_get_values() by omap_iterate()"

Copy link
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level comments.

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Left an additional review below)

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alien/CyanStore side looks good so far! We should consider introducing the new crimson logic as coroutines. This could largely help with future maintainability. What do you think?

Comment on lines +1411 to +1431
return maybe_do_omap_iterate(store, coll, os.oi, start_from, callback)
.safe_then([&delta_stats, &osd_op, &result, &num, &truncated](auto ret) {
if (ret != ObjectStore::omap_iter_ret_t::STOP) {
logger().warn("omap_iterate not meet a stop condition");
}
encode(num, osd_op.outdata);
osd_op.outdata.claim_append(result);
encode(truncated, osd_op.outdata);
delta_stats.num_rd_kb += shift_round_up(osd_op.outdata.length(), 10);
delta_stats.num_rd++;
}).handle_error(
crimson::ct_error::enodata::handle([&osd_op] {
uint32_t num = 0;
bool truncated = false;
encode(num, osd_op.outdata);
encode(truncated, osd_op.outdata);
osd_op.rval = 0;
return seastar::now();
}),
ll_read_errorator::pass_further{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a great opportunity to introduce the new logic with coroutines:

auto ret = co_await maybe_do_omap_iterate((store, coll, os.oi, start_from, callback).handle_error(...)

if (ret != ObjectStore::omap_iter_ret_t::STOP) {
  logger().warn("omap_iterate not meet a stop condition");
}
encode(num, osd_op.outdata);
...

Copy link
Contributor Author

@liu-chunmei liu-chunmei Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ok to do the new code as a coroutine and update the rest of the file later. Doing the whole file later would also be ok, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, avoiding the need for future combinators like do_with by using coroutines is a pretty huge readability win.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Matan-B @athanatos any examples for coroutine with return type interruptible errorator, I try to co_return ll_read_ierrorator::make_ready_future<>(); seems not supported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work, but you'll need crimson/common/coroutine.h for the supporting awaiter types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already quite a bit of code that uses coroutines with errorated and interruptible futures.

Copy link
Contributor

@Matan-B Matan-B Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Matan-B @athanatos coroutine can't return errorator, so if I use coroutine, in co_await function, I only can do crimson::ct_error::assert_all{} to handle all errorators, the caller will not get errorator information. and in function which calling ca_await, I also can't use co_return to return errorator, only can do try{}throw error. Do you want me change like this or wait until our coroutine support return errorator? better to let coroutine support return errorator first. Thanks!

To handle errors we would need handle_error after awaiting the coroutine, see for example how OSD::mkfs handles mount_ertr from mount()

  co_await store.mount().handle_error(
    crimson::stateful_ec::assert_failure([FNAME](const auto& ec) {
      ERROR("error mounting object store in {}: ({}) {}",
	    local_conf().get_val<std::string>("osd_data"),
	    ec.value(), ec.message());
    }));

....

  co_return;

See test_crimson_coroutine.cc for more examples.


I was able to compile the following in replicated_backend:

+#include "crimson/common/coroutine.h"
..
+
+  ll_read_errorator::future<>
+    do_stuff() {
+        co_return;
+    };
+
+  ll_read_ierrorator::future<>
+    do_stuff2() {
+        co_return;
+    };
+
   ll_read_ierrorator::future<ceph::bufferlist>
     _read(const hobject_t& hoid, uint64_t off,
          uint64_t len, uint32_t flags) override;

Copy link
Contributor Author

@liu-chunmei liu-chunmei Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Matan-B I have the two issues:

      co_await maybe_do_omap_iterate(store, coll, os.oi, start_from, callback)
        .safe_then([&osd_op, &delta_stats, &result, &num, &truncated](auto ret) {
         ......
        }).handle_error(
          crimson::ct_error::enodata::handle([&osd_op] {
            encode(uint32_t{0} /* num */, osd_op.outdata);
            encode(bool{false} /* truncated */, osd_op.outdata);
            osd_op.rval = 0;
            return ll_read_errorator::now();
          }),
          ll_read_errorator::pass_further{}     **//how does the caller get this errorator?**
        );
 if (!os.exists || os.oi.is_whiteout()) {
    logger().debug("{}: object does not exist: {}", os.oi.soid);
    return crimson::ct_error::enoent::make();   **// how to return this errorator? can't use return or co_return** 
  }
.....
  co_await ...

Copy link
Contributor

@Matan-B Matan-B Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liu-chunmei - I've added some examples to your questions and it should compile now. Please see: 2f835ae.
Unittests added as a separate PR- #62649

Comment on lines +310 to +337
return seastar::do_with(
std::move(start_from),
std::move(callback),
[FNAME, this, &obj, &progress, &entry, &pg] (auto &start_from, auto &fun) {
return pg.shard_services.get_store().omap_iterate(
pg.get_collection_ref(),
obj,
start_from,
fun
).safe_then([FNAME, this, &obj, &progress, &entry, &pg](auto result) {
if (result == ObjectStore::omap_iter_ret_t::NEXT) {
DEBUGDPP("op: {}, obj: {}, progress: {} omap done",
pg, *this, obj, progress);
progress.keys_done = true;
Copy link
Contributor

@Matan-B Matan-B Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here we can avoid the seastar::do_with:

auto result = co_await pg.shard_services.get_store().omap_iterate(..)..handle_error(..);
if (result == ObjectStore::omap_iter_ret_t::NEXT) {
...
}

Comment on lines +119 to +113
ch, hoid, FuturizedStore::Shard::omap_keys_t{key}
).safe_then([&key, next] (FuturizedStore::Shard::omap_values_t&& vals) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change belong to this commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is introduced by dropping omap_get_values(, , &start), so change it to use omap_get_values( , , keyset).

@liu-chunmei liu-chunmei force-pushed the omap_iterate branch 4 times, most recently from 4879953 to bbeb0e6 Compare April 3, 2025 00:33
Matan-B added a commit to Matan-B/ceph that referenced this pull request Apr 3, 2025
This came up in ceph#62530.
Showcase an awaited throwing coroutine and a passed_further error.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B added a commit to Matan-B/ceph that referenced this pull request Apr 3, 2025
This came up in ceph#62530.
Showcase an awaited throwing coroutine and a passed_further error.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coroutines structure seem to work well! However, I'm not sure I understand why are we wrapping the awaiting calls with lambdas. I think that we can avoid that and simply co_await when needed. I've left an example on one of the usages.

}
std::min(max_return, max_omap_entries);

co_await [&os, &delta_stats, &osd_op, start_after, max_return, this]() -> ll_read_ierrorator::future<> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the lambda needed here?

    ceph::bufferlist result;
    uint32_t num = 0;
    bool truncated = false;
    ...

    co_await maybe_do_omap_iterate ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coroutine maybe_do_omap_iterate use reference for start_from and callback function. so, there will be problem for these variables' lifecycle. I put them in lambda and in a coroutine to make the alive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. It seems to me that this is the first suspension point therefore the references are still valid. However, any future additional suspension point prior to this one might result in a dangling reference.

I think that we should prioritize readability and skip the lambda wrapper as long as the references are well defined. We can add a comment above the co_await call that any additional suspension point prior to this one would require guarding the references lifecycle.

See:

CP.53: Parameters to coroutines should not be passed by reference
Reason Once a coroutine reaches the first suspension point, such as a co_await, the synchronous portion
returns. After that point any parameters passed by reference are dangling. Any usage beyond that is
undefined behavior which may include writing to freed memory.

I do understand that this is tricky and therefore keeping the lambda is also ok by me.
@liu-chunmei, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Matan-B since there are result, num, truncated, and start_from shared between callback and omap following call, I don't want to use some share_ptr for these variables, just keep this co_await lambada for this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If removing the additional lambda (co_await [&os, &delta_stats, &osd_op, start_after, max_return, this]() -> ll_read_ierrorator::future<> { ... }), why "result, num, truncated, and start_from shared between callback and omap following call" can have lifecycle problems?

Seems to me that without the additional lambda, for example:

PGBackend::omap_get_keys(...) const {
  ...

  ceph::bufferlist result;
  uint32_t num = 0;
  bool truncated = false;
  ObjectStore::omap_iter_seek_t start_from{ ... };
  std::function< ... > callback;

  co_await maybe_do_omap_iterate(...).safe_then(...);

  // IIUC result, num, truncated, start_from, callback should still be alive here
  // because omap_get_keys(...) isn't returned yet.
}

^ If I'm correct above, these local variable's lifecycle are extended until omap_get_keys() is returned, which means they must be alive during co_await maybe_do_omap_iterate(...).safe_then(...);.

Is that correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that correct?

@cyx1231st, Yes though it's not the problematic usage that was addressed with the lambda.
See omap_get_keys signature:

PGBackend::omap_get_keys(
  const ObjectState& os,
  OSDOp& osd_op,
  object_stat_sum_t& delta_stats) const

The tricky variables are os, osd_op and delta_stats which are are all passed by ref.

  // IIUC result, num, truncated, start_from, callback should still be alive here
  // because omap_get_keys(...) isn't returned yet.

This is true. These where also passed to the lambda so they would be accessible there and not for life-cycle maintenance.


Looking back, it seems easier to understand why the lambda is used here if it was replaced with do_with. But I don't see it as a blocker for now. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky variables are os, osd_op and delta_stats which are are all passed by ref.

They were passed by & from the caller (looks the caller makes sure they will be alive), and also passed by & in this additional lambda -- their lifecycle aren't changed with or without the additional lambda.

So I still don't see how this additional lambda is useful ..

Copy link
Contributor

@Matan-B Matan-B Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were passed by & from the caller (looks the caller makes sure they will be alive), and also passed by & in this additional lambda -- their lifecycle aren't changed with or without the additional lambda.

So I still don't see how this additional lambda is useful ..

Ah, that looks correct to me, I've missed that.
Unless we make sure the upper layer is keeping them alive or we call them before the first suspension point - we will have to fallback to do_with

edit: or use the lambda with copy captures, but seems that do_with or seastar::coroutine::lambda is more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyx1231st I think you are right, local variable will keep alive until omap_get_keys return, needn't co_await to capture the local variable. I will remove it.

@Matan-B Matan-B moved this from In Progress to Needs QA in Crimson Jul 13, 2025
@Matan-B
Copy link
Contributor

Matan-B commented Jul 13, 2025

@Matan-B Matan-B requested a review from cyx1231st July 13, 2025 09:06
@Matan-B Matan-B moved this from Needs QA to Tested in Crimson Jul 13, 2025
@Matan-B Matan-B added the TESTED label Jul 13, 2025
@Matan-B
Copy link
Contributor

Matan-B commented Jul 13, 2025

https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/
The failures are due to the suite branch used being newer than the branch tested (i.e not included #63144). Other than that:
All jobs passed!

@cyx1231st, was your latest review addressed? If so, this can be merged.

@cyx1231st
Copy link
Member

See #62530 (comment), I'm still not convinced about the correctness if we fall into the new else case, even if the tests are passed. Previously we don't have this additional if/else, which has been working fine.

Can the additional if (iter != iter_begin()) { be dropped in this PR? If not, what becomes wrong specifically with this PR and why ?

@cyx1231st
Copy link
Member

Some other (although minor) unresolved comments are not addressed as well.

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
needn't scan the entire omap tree.

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
… omap_iterate()

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
@liu-chunmei
Copy link
Contributor Author

Some other (although minor) unresolved comments are not addressed as well.

seems address all. please take a look.

…rate

Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Copy link
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Doesn't worth full tests as the changes were minor.

@cyx1231st
Copy link
Member

Crimson-has-followup:

@Matan-B
Copy link
Contributor

Matan-B commented Jul 15, 2025

LGTM!

Doesn't worth full tests as the changes were minor.

Ack, merging based on https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/

only failures are cls which were not included in this branch.

@Matan-B Matan-B merged commit 853c609 into ceph:main Jul 15, 2025
13 checks passed
@Matan-B
Copy link
Contributor

Matan-B commented Jul 15, 2025

Crimson-has-followup:

https://tracker.ceph.com/issues/69990#note-7

https://tracker.ceph.com/issues/69612 is resolved :)

@Matan-B Matan-B moved this from Tested to Merged (Main) in Crimson Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Merged (Main)

Development

Successfully merging this pull request may close these issues.

4 participants