crimson: implement crimson Omap iterate interface#62530
Conversation
|
Can you verify whether the mentioned teuthology test mentioned is passing or not? (Repeat 5 times to see if it's stable) |
|
Nit, by convention, commit title should match the related directory or file, e.g. :
|
9e7071a to
852252d
Compare
852252d to
1e897c2
Compare
1e897c2 to
56063bd
Compare
Matan-B
left a comment
There was a problem hiding this comment.
Alien/CyanStore side looks good so far! We should consider introducing the new crimson logic as coroutines. This could largely help with future maintainability. What do you think?
src/crimson/osd/pg_backend.cc
Outdated
| return maybe_do_omap_iterate(store, coll, os.oi, start_from, callback) | ||
| .safe_then([&delta_stats, &osd_op, &result, &num, &truncated](auto ret) { | ||
| if (ret != ObjectStore::omap_iter_ret_t::STOP) { | ||
| logger().warn("omap_iterate not meet a stop condition"); | ||
| } | ||
| encode(num, osd_op.outdata); | ||
| osd_op.outdata.claim_append(result); | ||
| encode(truncated, osd_op.outdata); | ||
| delta_stats.num_rd_kb += shift_round_up(osd_op.outdata.length(), 10); | ||
| delta_stats.num_rd++; | ||
| }).handle_error( | ||
| crimson::ct_error::enodata::handle([&osd_op] { | ||
| uint32_t num = 0; | ||
| bool truncated = false; | ||
| encode(num, osd_op.outdata); | ||
| encode(truncated, osd_op.outdata); | ||
| osd_op.rval = 0; | ||
| return seastar::now(); | ||
| }), | ||
| ll_read_errorator::pass_further{} |
There was a problem hiding this comment.
This could be a great opportunity to introduce the new logic with coroutines:
auto ret = co_await maybe_do_omap_iterate((store, coll, os.oi, start_from, callback).handle_error(...)
if (ret != ObjectStore::omap_iter_ret_t::STOP) {
logger().warn("omap_iterate not meet a stop condition");
}
encode(num, osd_op.outdata);
...
There was a problem hiding this comment.
It would be ok to do the new code as a coroutine and update the rest of the file later. Doing the whole file later would also be ok, I think.
There was a problem hiding this comment.
Generally, avoiding the need for future combinators like do_with by using coroutines is a pretty huge readability win.
There was a problem hiding this comment.
@Matan-B @athanatos any examples for coroutine with return type interruptible errorator, I try to co_return ll_read_ierrorator::make_ready_future<>(); seems not supported.
There was a problem hiding this comment.
It should work, but you'll need crimson/common/coroutine.h for the supporting awaiter types.
There was a problem hiding this comment.
There's already quite a bit of code that uses coroutines with errorated and interruptible futures.
There was a problem hiding this comment.
@Matan-B @athanatos coroutine can't return errorator, so if I use coroutine, in co_await function, I only can do crimson::ct_error::assert_all{} to handle all errorators, the caller will not get errorator information. and in function which calling ca_await, I also can't use co_return to return errorator, only can do try{}throw error. Do you want me change like this or wait until our coroutine support return errorator? better to let coroutine support return errorator first. Thanks!
To handle errors we would need handle_error after awaiting the coroutine, see for example how OSD::mkfs handles mount_ertr from mount()
co_await store.mount().handle_error(
crimson::stateful_ec::assert_failure([FNAME](const auto& ec) {
ERROR("error mounting object store in {}: ({}) {}",
local_conf().get_val<std::string>("osd_data"),
ec.value(), ec.message());
}));
....
co_return;
See test_crimson_coroutine.cc for more examples.
I was able to compile the following in replicated_backend:
+#include "crimson/common/coroutine.h"
..
+
+ ll_read_errorator::future<>
+ do_stuff() {
+ co_return;
+ };
+
+ ll_read_ierrorator::future<>
+ do_stuff2() {
+ co_return;
+ };
+
ll_read_ierrorator::future<ceph::bufferlist>
_read(const hobject_t& hoid, uint64_t off,
uint64_t len, uint32_t flags) override;
There was a problem hiding this comment.
@Matan-B I have the two issues:
co_await maybe_do_omap_iterate(store, coll, os.oi, start_from, callback)
.safe_then([&osd_op, &delta_stats, &result, &num, &truncated](auto ret) {
......
}).handle_error(
crimson::ct_error::enodata::handle([&osd_op] {
encode(uint32_t{0} /* num */, osd_op.outdata);
encode(bool{false} /* truncated */, osd_op.outdata);
osd_op.rval = 0;
return ll_read_errorator::now();
}),
ll_read_errorator::pass_further{} **//how does the caller get this errorator?**
);
if (!os.exists || os.oi.is_whiteout()) {
logger().debug("{}: object does not exist: {}", os.oi.soid);
return crimson::ct_error::enoent::make(); **// how to return this errorator? can't use return or co_return**
}
.....
co_await ...
There was a problem hiding this comment.
@liu-chunmei - I've added some examples to your questions and it should compile now. Please see: 2f835ae.
Unittests added as a separate PR- #62649
| return seastar::do_with( | ||
| std::move(start_from), | ||
| std::move(callback), | ||
| [FNAME, this, &obj, &progress, &entry, &pg] (auto &start_from, auto &fun) { | ||
| return pg.shard_services.get_store().omap_iterate( | ||
| pg.get_collection_ref(), | ||
| obj, | ||
| start_from, | ||
| fun | ||
| ).safe_then([FNAME, this, &obj, &progress, &entry, &pg](auto result) { | ||
| if (result == ObjectStore::omap_iter_ret_t::NEXT) { | ||
| DEBUGDPP("op: {}, obj: {}, progress: {} omap done", | ||
| pg, *this, obj, progress); | ||
| progress.keys_done = true; |
There was a problem hiding this comment.
Same here we can avoid the seastar::do_with:
auto result = co_await pg.shard_services.get_store().omap_iterate(..)..handle_error(..);
if (result == ObjectStore::omap_iter_ret_t::NEXT) {
...
}
src/osd/SnapMapper.cc
Outdated
| ch, hoid, FuturizedStore::Shard::omap_keys_t{key} | ||
| ).safe_then([&key, next] (FuturizedStore::Shard::omap_values_t&& vals) { |
There was a problem hiding this comment.
Does this change belong to this commit?
There was a problem hiding this comment.
it is introduced by dropping omap_get_values(, , &start), so change it to use omap_get_values( , , keyset).
4879953 to
bbeb0e6
Compare
This came up in ceph#62530. Showcase an awaited throwing coroutine and a passed_further error. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
This came up in ceph#62530. Showcase an awaited throwing coroutine and a passed_further error. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
bbeb0e6 to
2664845
Compare
Matan-B
left a comment
There was a problem hiding this comment.
Coroutines structure seem to work well! However, I'm not sure I understand why are we wrapping the awaiting calls with lambdas. I think that we can avoid that and simply co_await when needed. I've left an example on one of the usages.
src/crimson/osd/pg_backend.cc
Outdated
| } | ||
| std::min(max_return, max_omap_entries); | ||
|
|
||
| co_await [&os, &delta_stats, &osd_op, start_after, max_return, this]() -> ll_read_ierrorator::future<> { |
There was a problem hiding this comment.
Why is the lambda needed here?
ceph::bufferlist result;
uint32_t num = 0;
bool truncated = false;
...
co_await maybe_do_omap_iterate ....
There was a problem hiding this comment.
coroutine maybe_do_omap_iterate use reference for start_from and callback function. so, there will be problem for these variables' lifecycle. I put them in lambda and in a coroutine to make the alive.
There was a problem hiding this comment.
Got it. It seems to me that this is the first suspension point therefore the references are still valid. However, any future additional suspension point prior to this one might result in a dangling reference.
I think that we should prioritize readability and skip the lambda wrapper as long as the references are well defined. We can add a comment above the co_await call that any additional suspension point prior to this one would require guarding the references lifecycle.
See:
CP.53: Parameters to coroutines should not be passed by reference
Reason Once a coroutine reaches the first suspension point, such as a co_await, the synchronous portion
returns. After that point any parameters passed by reference are dangling. Any usage beyond that is
undefined behavior which may include writing to freed memory.
I do understand that this is tricky and therefore keeping the lambda is also ok by me.
@liu-chunmei, what do you think?
There was a problem hiding this comment.
@Matan-B since there are result, num, truncated, and start_from shared between callback and omap following call, I don't want to use some share_ptr for these variables, just keep this co_await lambada for this case.
There was a problem hiding this comment.
If removing the additional lambda (co_await [&os, &delta_stats, &osd_op, start_after, max_return, this]() -> ll_read_ierrorator::future<> { ... }), why "result, num, truncated, and start_from shared between callback and omap following call" can have lifecycle problems?
Seems to me that without the additional lambda, for example:
PGBackend::omap_get_keys(...) const {
...
ceph::bufferlist result;
uint32_t num = 0;
bool truncated = false;
ObjectStore::omap_iter_seek_t start_from{ ... };
std::function< ... > callback;
co_await maybe_do_omap_iterate(...).safe_then(...);
// IIUC result, num, truncated, start_from, callback should still be alive here
// because omap_get_keys(...) isn't returned yet.
}^ If I'm correct above, these local variable's lifecycle are extended until omap_get_keys() is returned, which means they must be alive during co_await maybe_do_omap_iterate(...).safe_then(...);.
Is that correct?
There was a problem hiding this comment.
Is that correct?
@cyx1231st, Yes though it's not the problematic usage that was addressed with the lambda.
See omap_get_keys signature:
PGBackend::omap_get_keys(
const ObjectState& os,
OSDOp& osd_op,
object_stat_sum_t& delta_stats) const
The tricky variables are os, osd_op and delta_stats which are are all passed by ref.
// IIUC result, num, truncated, start_from, callback should still be alive here
// because omap_get_keys(...) isn't returned yet.
This is true. These where also passed to the lambda so they would be accessible there and not for life-cycle maintenance.
Looking back, it seems easier to understand why the lambda is used here if it was replaced with do_with. But I don't see it as a blocker for now. What do you think?
There was a problem hiding this comment.
The tricky variables are
os,osd_opanddelta_statswhich are are all passed by ref.
They were passed by & from the caller (looks the caller makes sure they will be alive), and also passed by & in this additional lambda -- their lifecycle aren't changed with or without the additional lambda.
So I still don't see how this additional lambda is useful ..
There was a problem hiding this comment.
They were passed by
&from the caller (looks the caller makes sure they will be alive), and also passed by&in this additional lambda -- their lifecycle aren't changed with or without the additional lambda.So I still don't see how this additional lambda is useful ..
Ah, that looks correct to me, I've missed that.
Unless we make sure the upper layer is keeping them alive or we call them before the first suspension point - we will have to fallback to do_with
edit: or use the lambda with copy captures, but seems that do_with or seastar::coroutine::lambda is more readable.
There was a problem hiding this comment.
@cyx1231st I think you are right, local variable will keep alive until omap_get_keys return, needn't co_await to capture the local variable. I will remove it.
|
https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/ @cyx1231st, was your latest review addressed? If so, this can be merged. |
|
See #62530 (comment), I'm still not convinced about the correctness if we fall into the new Can the additional |
|
Some other (although minor) unresolved comments are not addressed as well. |
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
needn't scan the entire omap tree. Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
… omap_iterate() Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
4c0f0e3 to
6a245a0
Compare
seems address all. please take a look. |
…rate Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>
Signed-off-by: chunmei liu <chunmei.liu@ibm.com>
6a245a0 to
aa2f606
Compare
cyx1231st
left a comment
There was a problem hiding this comment.
LGTM!
Doesn't worth full tests as the changes were minor.
|
Crimson-has-followup:
|
Ack, merging based on https://pulpito.ceph.com/matan-2025-07-13_08:45:30-crimson-rados-wip-liucm-omap-empty-test-crimson-only-distro-crimson-debug-smithi/ only failures are cls which were not included in this branch. |
https://tracker.ceph.com/issues/69990#note-7 https://tracker.ceph.com/issues/69612 is resolved :) |
https://tracker.ceph.com/issues/69612
https://tracker.ceph.com/issues/69990
For ceph_test_cls_rbd test, original omap list will cause slow request when osd_op_complaint_time <=3s, by this PR, don't meet omap list slow request even setting osd_op_complaint_time = 1s.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition