rgw: async refcount operate in copy_obj#48155
Conversation
cbodley
left a comment
There was a problem hiding this comment.
this is really great, thank you! unfortunately i don't think librados::AioCompletion is the best primitive for this - would you be willing to try this with rgw's AioThrottles instead?
src/rgw/rgw_common.h
Outdated
| { | ||
| assert(!ios.empty()); | ||
| IO &io = ios.front(); | ||
| io.c->wait_for_complete(); |
There was a problem hiding this comment.
we're trying to avoid blocking waits like AioCompletion::wait_for_complete() now that we're running rgw requests as coroutines. we have an optional_yield-enabled version of this RGWIOManager in rgw_aio_throttle.h with rgw::make_throttle(uint64_t window_size, optional_yield y)
that's what we use for GetObj and PutObj to read/write object data, and it's a good fit here too
There was a problem hiding this comment.
Year, I will learn it and revise.
There was a problem hiding this comment.
thank you! please let me know if you have any more questions
src/rgw/rgw_rados.cc
Outdated
| append_rand_alpha(cct, tag, tag, 32); | ||
| } | ||
|
|
||
| RGWIOManager<rgw_raw_obj> ref_io_manager(cct, &ref_objs, cct->_conf->rgw_max_copy_obj_concurrent_io); |
There was a problem hiding this comment.
auto aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);
src/rgw/rgw_rados.cc
Outdated
| ioctx.locator_set_key(loc.loc); | ||
|
|
||
| ret = rgw_rados_operate(dpp, ioctx, loc.oid, &op, null_yield); | ||
| ret = ref_io_manager.schedule_io(&ioctx, loc.oid, &op, loc); |
There was a problem hiding this comment.
to schedule this librados op, we'd call something like this:
static constexpr uint64_t cost = 1; // 1 throttle unit per request
static constexpr uint64_t id = 0; // ids unused
rgw::AioResultList completed = aio->get(obj, rgw::Aio::librados_op(std::move(op), y), cost, id);the obj comes from services/svc_rados.h, which will need to replace the use of rgw_rados_ref ref here:
cls_refcount_get(op, ref_tag, true);
- const rgw_raw_obj& loc = miter.get_location().get_raw_obj(store);
-
- auto& ioctx = ref.pool.ioctx();
- ioctx.locator_set_key(loc.loc);
+ auto obj = svc.rados->obj(miter.get_location().get_raw_obj(store));
+ ret = obj.open(dpp);
if (ret < 0) {each call to aio->get() returns an AioResultList. we'll have to check these results for error codes. if we keep an AioResultList of all completions we got, the rollback logic below can loop over that list instead of tracking ref_obs
rgw::AioResultList completed = aio->get(obj, rgw::Aio::librados_op(std::move(op), y), cost, id);
+ ret = rgw::check_for_errors(completed);
+ all_results.splice(all_results.end(), completed);
if (ret < 0) {
goto done_ret;the rollback logic below would look something like:
/* rollback reference */
string ref_tag = tag + '\0';
- for (riter = ref_objs.begin(); riter != ref_objs.end(); ++riter) {
+ for (auto& r : all_results) {
+ if (r.result < 0) {
+ continue; // skip errors
+ }
ObjectWriteOperation op;
cls_refcount_put(op, ref_tag, true);
- ref.pool.ioctx().locator_set_key(riter->loc);
- int r = rgw_rados_operate(dpp, ref.pool.ioctx(), riter->oid, &op, null_yield);
+ rgw::AioResultList completed = aio->get(r.obj, rgw::Aio::librados_op(std::move(op), y), cost, id);
src/rgw/rgw_rados.cc
Outdated
| } | ||
|
|
||
| ref_objs.push_back(loc); | ||
| ret = ref_io_manager.drain_ios(); |
There was a problem hiding this comment.
- ret = ref_io_manager.drain_ios();
+ rgw::AioResultList completed = aio->drain();
+ ret = rgw::check_for_errors(completed);
+ all_results.splice(all_results.end(), completed);
if (ret < 0) {There was a problem hiding this comment.
Thank you really for such detailed illustrating, that's very helpful for me. At this please, I think it may does not have to add completed to all_results, because all_results is used for cleanup. What's your opinion?
There was a problem hiding this comment.
this drains the rest of the requests sent by the refcount loop above. if drain() returns any errors, we'll need to run that rollback logic below - and that rollback logic should undo any of the successful writes that we drained here too
so yes, i do think we need to add these completions to all_results here
There was a problem hiding this comment.
Yeal, my bad. I revise and commit again, please review when you have time.
src/common/options/rgw.yaml.in
Outdated
| - name: rgw_max_copy_obj_concurrent_io | ||
| type: int | ||
| level: advanced | ||
| desc: the async refcount io number processed meanwhile in copy_obj |
There was a problem hiding this comment.
Hello, thanks for contributing. I would like to see this description be more clear.
@cbodley Does the below make sense?
"desc: Number of refcount operations to process concurrently when executing copy_obj"
There was a problem hiding this comment.
ok, this is much better.
15c471b to
8364222
Compare
|
Docs Lgtm |
8364222 to
bd073d3
Compare
cbodley
left a comment
There was a problem hiding this comment.
this looks great, how does it work in testing? can we find a way to inject an error to test the rollback logic?
src/rgw/rgw_rados.cc
Outdated
| ret = rgw::check_for_errors(completed); | ||
| if (ret < 0) { |
There was a problem hiding this comment.
since this is the done_ret: error path, we need to make sure we return the original error code - so we can't overwrite ret this way
bd073d3 to
fce234e
Compare
I added a unittest, then implemented the refcount++ and rollback logic with BlockingAioThrottle. Please review again at your time. |
nice job getting the test to work! ultimately though, i don't think that makes for a good copyobj regression test because it duplicates the copy logic instead of testing the s3tests for CopyObj won't be able to cover the rollback logic, so i was just looking for confirmation that it works. i added an error after the @@ -4508,6 +4510,7 @@ int RGWRados::copy_obj(RGWObjectCtx& obj_ctx,
rgw::AioResultList completed = aio->drain();
ret = rgw::check_for_errors(completed);
all_results.splice(all_results.end(), completed);
+ ret = -EIO; // inject EIO on drain
if (ret < 0) {
ldpp_dout(dpp, 0) << "ERROR: failed to drain ios, the error code = " << ret <<dendl;
goto done_ret;in ceph/s3-tests#473 i added a test case that copies a 16mb object so we could see this ref-counting logic in action. and running that with -EIO error injection, i do see the rollback logic correctly calling could you please remove the test case? |
| } | ||
|
|
||
| auto aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y); | ||
| rgw::AioResultList all_results; |
There was a problem hiding this comment.
it looks like we could avoid allocating aio for the copy_itself case?
std::unique_ptr<rgw::Aio> aio;
rgw::AioResultList all_results;
if (!copy_itself) {
aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
fce234e to
1b4f6a2
Compare
ok. |
|
jenkins test api |
|
@liangmingyuanneo do you need this fix on the pacific or quincy release? we generally don't backport this kind of performance improvement |
No. So I don't backport it either now. |
When copy-object objects are called between buckets, the head object is copied in full, other objects only use refcount++ operation. Refcount does not require copying data, but it is processed serially in RGW. It is still time-consuming to go through all rados objects' refcount contained in an RGW object.
The refcount operation itself uses an RPC-like method to construct a request on the RGW side and send the request to the OSD side. After processing the request, the OSD process returns the result. Osd has a special thread pool to process requests, meanwhile different fragments may be sent to multiple OSD nodes for processing. Therefore, if the RGW can send multiple requests at the same time, the processing speed will be greatly improved. So I changed the refcount operation in copy_obj to async mode.
https://tracker.ceph.com/issues/57588
Signed-off-by: Mingyuan Liang liangmingyuan@baidu.com