
rgw: async refcount operate in copy_obj #48155

Merged
cbodley merged 1 commit into ceph:main from liangmingyuanneo:wip-rgw-aync-refcount on Oct 3, 2022

Conversation

@liangmingyuanneo
Contributor

When an object is copied between buckets, the head object is copied in full, while the other rados objects only need a refcount++ operation. Refcounting does not copy any data, but RGW currently processes it serially, so iterating over the refcounts of all the rados objects that make up an RGW object is still time-consuming.
The refcount operation itself works like an RPC: RGW constructs a request and sends it to the OSD side, and the OSD process returns the result after handling it. Each OSD has a dedicated thread pool to process requests, and the different fragments may be spread across multiple OSD nodes. Therefore, if RGW can keep multiple requests in flight at the same time, processing speed improves greatly. This change switches the refcount operations in copy_obj to async mode.

https://tracker.ceph.com/issues/57588

Signed-off-by: Mingyuan Liang liangmingyuan@baidu.com

Contributor

@cbodley cbodley left a comment

this is really great, thank you! unfortunately i don't think librados::AioCompletion is the best primitive for this - would you be willing to try this with rgw's AioThrottles instead?

{
assert(!ios.empty());
IO &io = ios.front();
io.c->wait_for_complete();
Contributor

we're trying to avoid blocking waits like AioCompletion::wait_for_complete() now that we're running rgw requests as coroutines. we have an optional_yield-enabled version of this RGWIOManager in rgw_aio_throttle.h with rgw::make_throttle(uint64_t window_size, optional_yield y)

that's what we use for GetObj and PutObj to read/write object data, and it's a good fit here too

Contributor Author

Yeah, I will learn it and revise.

Contributor

thank you! please let me know if you have any more questions

append_rand_alpha(cct, tag, tag, 32);
}

RGWIOManager<rgw_raw_obj> ref_io_manager(cct, &ref_objs, cct->_conf->rgw_max_copy_obj_concurrent_io);
Contributor

auto aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);

ioctx.locator_set_key(loc.loc);

ret = rgw_rados_operate(dpp, ioctx, loc.oid, &op, null_yield);
ret = ref_io_manager.schedule_io(&ioctx, loc.oid, &op, loc);
Contributor

to schedule this librados op, we'd call something like this:

static constexpr uint64_t cost = 1; // 1 throttle unit per request
static constexpr uint64_t id = 0; // ids unused
rgw::AioResultList completed = aio->get(obj, rgw::Aio::librados_op(std::move(op), y), cost, id);

the obj comes from services/svc_rados.h, which will need to replace the use of rgw_rados_ref ref here:

       cls_refcount_get(op, ref_tag, true);
-      const rgw_raw_obj& loc = miter.get_location().get_raw_obj(store);
-
-      auto& ioctx = ref.pool.ioctx();
-      ioctx.locator_set_key(loc.loc);
+      auto obj = svc.rados->obj(miter.get_location().get_raw_obj(store));
+      ret = obj.open(dpp);
       if (ret < 0) {

each call to aio->get() returns an AioResultList. we'll have to check these results for error codes. if we keep an AioResultList of all completions we got, the rollback logic below can loop over that list instead of tracking ref_objs

       rgw::AioResultList completed = aio->get(obj, rgw::Aio::librados_op(std::move(op), y), cost, id);
+      ret = rgw::check_for_errors(completed);
+      all_results.splice(all_results.end(), completed);
       if (ret < 0) {
         goto done_ret;

the rollback logic below would look something like:

     /* rollback reference */
     string ref_tag = tag + '\0';
-    for (riter = ref_objs.begin(); riter != ref_objs.end(); ++riter) {
+    for (auto& r : all_results) {
+      if (r.result < 0) {
+        continue; // skip errors
+      }
       ObjectWriteOperation op;
       cls_refcount_put(op, ref_tag, true);
-      ref.pool.ioctx().locator_set_key(riter->loc);
-      int r = rgw_rados_operate(dpp, ref.pool.ioctx(), riter->oid, &op, null_yield);
+      rgw::AioResultList completed = aio->get(r.obj, rgw::Aio::librados_op(std::move(op), y), cost, id);

}

ref_objs.push_back(loc);
ret = ref_io_manager.drain_ios();
Contributor

-    ret = ref_io_manager.drain_ios();
+    rgw::AioResultList completed = aio->drain();
+    ret = rgw::check_for_errors(completed);
+    all_results.splice(all_results.end(), completed);
     if (ret < 0) {

Contributor Author

Thank you for such a detailed illustration, that's very helpful. At this place, I think we may not have to add completed to all_results, because all_results is only used for cleanup. What's your opinion?

Contributor

this drains the rest of the requests sent by the refcount loop above. if drain() returns any errors, we'll need to run that rollback logic below - and that rollback logic should undo any of the successful writes that we drained here too

so yes, i do think we need to add these completions to all_results here

Contributor Author

Yeah, my bad. I revised and committed again, please review when you have time.

- name: rgw_max_copy_obj_concurrent_io
type: int
level: advanced
desc: the async refcount io number processed meanwhile in copy_obj
Contributor

Hello, thanks for contributing. I would like to see this description be more clear.

@cbodley Does the below make sense?

"desc: Number of refcount operations to process concurrently when executing copy_obj"

Contributor Author

ok, this is much better.

@anthonyeleven
Contributor

Docs Lgtm

Contributor

@cbodley cbodley left a comment

this looks great, how does it work in testing? can we find a way to inject an error to test the rollback logic?

Comment on lines +4577 to +4578
ret = rgw::check_for_errors(completed);
if (ret < 0) {
Contributor

since this is the done_ret: error path, we need to make sure we return the original error code - so we can't overwrite ret this way

Contributor Author

my bad.

@liangmingyuanneo
Contributor Author

this looks great, how does it work in testing? can we find a way to inject an error to test the rollback logic?

I added a unittest, then implemented the refcount++ and rollback logic with BlockingAioThrottle. Please review again at your time.

@liangmingyuanneo
Contributor Author

@cbodley

@cbodley
Contributor

cbodley commented Sep 30, 2022

I added a unittest, then implemented the refcount++ and rollback logic with BlockingAioThrottle. Please review again at your time.

nice job getting the test to work! ultimately though, i don't think that makes for a good copyobj regression test because it duplicates the copy logic instead of testing RGWRados::copy_obj() directly. we already have https://github.com/ceph/s3-tests to test the s3 copy APIs, and src/test/rgw/test_rgw_throttle.cc for low-level throttle testing

the s3tests for CopyObj won't be able to cover the rollback logic, so i was just looking for confirmation that it works. i added an error after the drain() to trigger this rollback:

@@ -4508,6 +4510,7 @@ int RGWRados::copy_obj(RGWObjectCtx& obj_ctx,
     rgw::AioResultList completed = aio->drain();
     ret = rgw::check_for_errors(completed);
     all_results.splice(all_results.end(), completed);
+    ret = -EIO; // inject EIO on drain
     if (ret < 0) {
       ldpp_dout(dpp, 0) << "ERROR: failed to drain ios, the error code = " << ret <<dendl;
       goto done_ret;

in ceph/s3-tests#473 i added a test case that copies a 16mb object so we could see this ref-counting logic in action. and running that with -EIO error injection, i do see the rollback logic correctly issuing `call refcount.put` on each of the 3 tail objects:

2022-09-30T12:22:13.945-0400 7f8cd0754640  0 req 5021485027944259865 0.009000298s s3:copy_obj ERROR: failed to drain ios, the error code = -5
2022-09-30T12:22:13.945-0400 7f8cd0754640  1 -- 192.168.245.128:0/352496449 --> [v2:192.168.245.128:6800/850826,v1:192.168.245.128:6801/850826] -- osd_op(unknown.0.0:4443 6.0 6:ffb5a3a3:::eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_1:head [call refcount.put in=84b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e22) v8 -- 0x558cea82a800 con 0x558ce2baf000
2022-09-30T12:22:13.945-0400 7f8cd0754640  1 -- 192.168.245.128:0/352496449 --> [v2:192.168.245.128:6800/850826,v1:192.168.245.128:6801/850826] -- osd_op(unknown.0.0:4444 6.0 6:74a0c32b:::eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_2:head [call refcount.put in=84b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e22) v8 -- 0x558cea960000 con 0x558ce2baf000
2022-09-30T12:22:13.945-0400 7f8cd0754640  1 -- 192.168.245.128:0/352496449 --> [v2:192.168.245.128:6800/850826,v1:192.168.245.128:6801/850826] -- osd_op(unknown.0.0:4445 6.0 6:13df50b3:::eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_3:head [call refcount.put in=84b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e22) v8 -- 0x558cea960400 con 0x558ce2baf000
2022-09-30T12:22:13.946-0400 7f8d9c1c6640  1 -- 192.168.245.128:0/352496449 <== osd.0 v2:192.168.245.128:6800/850826 4519 ==== osd_op_reply(4443 eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_1 [call] v22'62 uv62 ondisk = 0) v8 ==== 230+0+0 (crc 0 0 0) 0x558ce7a14d80 con 0x558ce2baf000
2022-09-30T12:22:13.947-0400 7f8d9c1c6640  1 -- 192.168.245.128:0/352496449 <== osd.0 v2:192.168.245.128:6800/850826 4520 ==== osd_op_reply(4444 eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_2 [call] v22'63 uv63 ondisk = 0) v8 ==== 230+0+0 (crc 0 0 0) 0x558ce7a14d80 con 0x558ce2baf000
2022-09-30T12:22:13.947-0400 7f8d9c1c6640  1 -- 192.168.245.128:0/352496449 <== osd.0 v2:192.168.245.128:6800/850826 4521 ==== osd_op_reply(4445 eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_3 [call] v22'64 uv64 ondisk = 0) v8 ==== 230+0+0 (crc 0 0 0) 0x558ce7a14d80 con 0x558ce2baf000
2022-09-30T12:22:13.947-0400 7f8cd8764640  2 req 5021485027944259865 0.011000365s s3:copy_obj completing
2022-09-30T12:22:13.947-0400 7f8cd8764640  0 WARNING: set_req_state_err err_no=5 resorting to 500

could you please remove the test case?

}

auto aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);
rgw::AioResultList all_results;
Contributor

it looks like we could avoid allocating aio for the copy_itself case?

  std::unique_ptr<rgw::Aio> aio;
  rgw::AioResultList all_results;
  if (!copy_itself) {
    aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);

Contributor Author

Well.

Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
@liangmingyuanneo
Contributor Author

liangmingyuanneo commented Oct 2, 2022

could you please remove the test case?

ok.

Contributor

@cbodley cbodley left a comment

@cbodley
Contributor

cbodley commented Oct 3, 2022

jenkins test api

@cbodley cbodley merged commit e4a967f into ceph:main Oct 3, 2022
@cbodley
Contributor

cbodley commented Oct 3, 2022

@liangmingyuanneo do you need this fix on the pacific or quincy release? we generally don't backport this kind of performance improvement

@liangmingyuanneo
Contributor Author

@liangmingyuanneo do you need this fix on the pacific or quincy release? we generally don't backport this kind of performance improvement

No. So I won't backport it for now.
