
rgw: async refcount operate in copy_obj #48155

Merged
cbodley merged 1 commit into ceph:main from liangmingyuanneo:wip-rgw-aync-refcount on Oct 3, 2022

Conversation

@liangmingyuanneo
Contributor

When an object is copied between buckets, the head object is copied in full, while the other rados objects only need a refcount++ operation. Refcounting does not copy any data, but RGW currently processes it serially, so iterating over the refcounts of all the rados objects that make up an RGW object is still time-consuming.
The refcount operation itself works like an RPC: RGW constructs a request and sends it to the OSD side, and the OSD process returns the result after handling it. Each OSD has a dedicated thread pool to process requests, and the different fragments may be spread across multiple OSD nodes. Therefore, if RGW can keep multiple requests in flight at the same time, processing speed improves greatly. This change switches the refcount operations in copy_obj to async mode.

https://tracker.ceph.com/issues/57588

Signed-off-by: Mingyuan Liang liangmingyuan@baidu.com

Contributor

@cbodley cbodley left a comment

this is really great, thank you! unfortunately i don't think librados::AioCompletion is the best primitive for this - would you be willing to try this with rgw's AioThrottles instead?

{
assert(!ios.empty());
IO &io = ios.front();
io.c->wait_for_complete();
Contributor

we're trying to avoid blocking waits like AioCompletion::wait_for_complete() now that we're running rgw requests as coroutines. we have an optional_yield-enabled version of this RGWIOManager in rgw_aio_throttle.h with rgw::make_throttle(uint64_t window_size, optional_yield y)

that's what we use for GetObj and PutObj to read/write object data, and it's a good fit here too

Contributor Author

Yeah, I will learn it and revise.

Contributor

thank you! please let me know if you have any more questions

append_rand_alpha(cct, tag, tag, 32);
}

RGWIOManager<rgw_raw_obj> ref_io_manager(cct, &ref_objs, cct->_conf->rgw_max_copy_obj_concurrent_io);
Contributor

auto aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);

ioctx.locator_set_key(loc.loc);

ret = rgw_rados_operate(dpp, ioctx, loc.oid, &op, null_yield);
ret = ref_io_manager.schedule_io(&ioctx, loc.oid, &op, loc);
Contributor

to schedule this librados op, we'd call something like this:

static constexpr uint64_t cost = 1; // 1 throttle unit per request
static constexpr uint64_t id = 0; // ids unused
rgw::AioResultList completed = aio->get(obj, rgw::Aio::librados_op(std::move(op), y), cost, id);

the obj comes from services/svc_rados.h, which will need to replace the use of rgw_rados_ref ref here:

       cls_refcount_get(op, ref_tag, true);
-      const rgw_raw_obj& loc = miter.get_location().get_raw_obj(store);
-
-      auto& ioctx = ref.pool.ioctx();
-      ioctx.locator_set_key(loc.loc);
+      auto obj = svc.rados->obj(miter.get_location().get_raw_obj(store));
+      ret = obj.open(dpp);
       if (ret < 0) {

each call to aio->get() returns an AioResultList. we'll have to check these results for error codes. if we keep an AioResultList of all completions we got, the rollback logic below can loop over that list instead of tracking ref_objs

       rgw::AioResultList completed = aio->get(obj, rgw::Aio::librados_op(std::move(op), y), cost, id);
+      ret = rgw::check_for_errors(completed);
+      all_results.splice(all_results.end(), completed);
       if (ret < 0) {
         goto done_ret;

the rollback logic below would look something like:

     /* rollback reference */
     string ref_tag = tag + '\0';
-    for (riter = ref_objs.begin(); riter != ref_objs.end(); ++riter) {
+    for (auto& r : all_results) {
+      if (r.result < 0) {
+        continue; // skip errors
+      }
       ObjectWriteOperation op;
       cls_refcount_put(op, ref_tag, true);
-      ref.pool.ioctx().locator_set_key(riter->loc);
-      int r = rgw_rados_operate(dpp, ref.pool.ioctx(), riter->oid, &op, null_yield);
+      rgw::AioResultList completed = aio->get(r.obj, rgw::Aio::librados_op(std::move(op), y), cost, id);

}

ref_objs.push_back(loc);
ret = ref_io_manager.drain_ios();
Contributor

-    ret = ref_io_manager.drain_ios();
+    rgw::AioResultList completed = aio->drain();
+    ret = rgw::check_for_errors(completed);
+    all_results.splice(all_results.end(), completed);
     if (ret < 0) {

Contributor Author

Thank you for such a detailed illustration, that's very helpful. At this place, I think we may not have to add completed to all_results, because all_results is only used for cleanup. What's your opinion?

Contributor

this drains the rest of the requests sent by the refcount loop above. if drain() returns any errors, we'll need to run that rollback logic below - and that rollback logic should undo any of the successful writes that we drained here too

so yes, i do think we need to add these completions to all_results here

Contributor Author

Yeah, my bad. I revised and committed again, please review when you have time.

- name: rgw_max_copy_obj_concurrent_io
type: int
level: advanced
desc: the async refcount io number processed meanwhile in copy_obj
Contributor

Hello, thanks for contributing. I would like to see this description be more clear.

@cbodley Does the below make sense?

"desc: Number of refcount operations to process concurrently when executing copy_obj"

Contributor Author

ok, this is much better.

@anthonyeleven
Contributor

Docs Lgtm

Contributor

@cbodley cbodley left a comment

this looks great, how does it work in testing? can we find a way to inject an error to test the rollback logic?

Comment on lines +4577 to +4578
ret = rgw::check_for_errors(completed);
if (ret < 0) {
Contributor

since this is the done_ret: error path, we need to make sure we return the original error code - so we can't overwrite ret this way

Contributor Author

my bad.

@liangmingyuanneo
Contributor Author

this looks great, how does it work in testing? can we find a way to inject an error to test the rollback logic?

I added a unittest, then implemented the refcount++ and rollback logic with BlockingAioThrottle. Please review again at your time.

@liangmingyuanneo
Contributor Author

@cbodley

@cbodley
Contributor

cbodley commented Sep 30, 2022

I added a unittest, then implemented the refcount++ and rollback logic with BlockingAioThrottle. Please review again at your time.

nice job getting the test to work! ultimately though, i don't think that makes for a good copyobj regression test because it duplicates the copy logic instead of testing RGWRados::copy_obj() directly. we already have https://github.com/ceph/s3-tests to test the s3 copy APIs, and src/test/rgw/test_rgw_throttle.cc for low-level throttle testing

the s3tests for CopyObj won't be able to cover the rollback logic, so i was just looking for confirmation that it works. i added an error after the drain() to trigger this rollback:

@@ -4508,6 +4510,7 @@ int RGWRados::copy_obj(RGWObjectCtx& obj_ctx,
     rgw::AioResultList completed = aio->drain();
     ret = rgw::check_for_errors(completed);
     all_results.splice(all_results.end(), completed);
+    ret = -EIO; // inject EIO on drain
     if (ret < 0) {
       ldpp_dout(dpp, 0) << "ERROR: failed to drain ios, the error code = " << ret <<dendl;
       goto done_ret;

in ceph/s3-tests#473 i added a test case that copies a 16mb object so we could see this ref-counting logic in action. and running that with -EIO error injection, i do see the rollback logic correctly issuing `call refcount.put` on each of the 3 tail objects:

2022-09-30T12:22:13.945-0400 7f8cd0754640  0 req 5021485027944259865 0.009000298s s3:copy_obj ERROR: failed to drain ios, the error code = -5
2022-09-30T12:22:13.945-0400 7f8cd0754640  1 -- 192.168.245.128:0/352496449 --> [v2:192.168.245.128:6800/850826,v1:192.168.245.128:6801/850826] -- osd_op(unknown.0.0:4443 6.0 6:ffb5a3a3:::eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_1:head [call refcount.put in=84b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e22) v8 -- 0x558cea82a800 con 0x558ce2baf000
2022-09-30T12:22:13.945-0400 7f8cd0754640  1 -- 192.168.245.128:0/352496449 --> [v2:192.168.245.128:6800/850826,v1:192.168.245.128:6801/850826] -- osd_op(unknown.0.0:4444 6.0 6:74a0c32b:::eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_2:head [call refcount.put in=84b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e22) v8 -- 0x558cea960000 con 0x558ce2baf000
2022-09-30T12:22:13.945-0400 7f8cd0754640  1 -- 192.168.245.128:0/352496449 --> [v2:192.168.245.128:6800/850826,v1:192.168.245.128:6801/850826] -- osd_op(unknown.0.0:4445 6.0 6:13df50b3:::eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_3:head [call refcount.put in=84b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e22) v8 -- 0x558cea960400 con 0x558ce2baf000
2022-09-30T12:22:13.946-0400 7f8d9c1c6640  1 -- 192.168.245.128:0/352496449 <== osd.0 v2:192.168.245.128:6800/850826 4519 ==== osd_op_reply(4443 eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_1 [call] v22'62 uv62 ondisk = 0) v8 ==== 230+0+0 (crc 0 0 0) 0x558ce7a14d80 con 0x558ce2baf000
2022-09-30T12:22:13.947-0400 7f8d9c1c6640  1 -- 192.168.245.128:0/352496449 <== osd.0 v2:192.168.245.128:6800/850826 4520 ==== osd_op_reply(4444 eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_2 [call] v22'63 uv63 ondisk = 0) v8 ==== 230+0+0 (crc 0 0 0) 0x558ce7a14d80 con 0x558ce2baf000
2022-09-30T12:22:13.947-0400 7f8d9c1c6640  1 -- 192.168.245.128:0/352496449 <== osd.0 v2:192.168.245.128:6800/850826 4521 ==== osd_op_reply(4445 eb6b7daa-f6c3-46a8-9a92-5e649c93b954.4137.7__shadow_.T2kNhevtg8qKE7paX1WOXBXh1kzQRkM_3 [call] v22'64 uv64 ondisk = 0) v8 ==== 230+0+0 (crc 0 0 0) 0x558ce7a14d80 con 0x558ce2baf000
2022-09-30T12:22:13.947-0400 7f8cd8764640  2 req 5021485027944259865 0.011000365s s3:copy_obj completing
2022-09-30T12:22:13.947-0400 7f8cd8764640  0 WARNING: set_req_state_err err_no=5 resorting to 500

could you please remove the test case?

}

auto aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);
rgw::AioResultList all_results;
Contributor

it looks like we could avoid allocating aio for the copy_itself case?

  std::unique_ptr<rgw::Aio> aio;
  rgw::AioResultList all_results;
  if (!copy_itself) {
    aio = rgw::make_throttle(cct->_conf->rgw_max_copy_obj_concurrent_io, y);

Contributor Author

Well.

Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
@liangmingyuanneo
Contributor Author

liangmingyuanneo commented Oct 2, 2022

could you please remove the test case?

ok.

Contributor

@cbodley cbodley left a comment

@cbodley
Contributor

cbodley commented Oct 3, 2022

jenkins test api

@cbodley cbodley merged commit e4a967f into ceph:main Oct 3, 2022
@cbodley
Contributor

cbodley commented Oct 3, 2022

@liangmingyuanneo do you need this fix on the pacific or quincy release? we generally don't backport this kind of performance improvement

@liangmingyuanneo
Contributor Author

@liangmingyuanneo do you need this fix on the pacific or quincy release? we generally don't backport this kind of performance improvement

No. So I won't backport it for now.
