
rgw: prevent spurious/lost notifications in the index completion thread #45131

Closed
yuvalif wants to merge 1 commit into ceph:wip-rgw-multisite-reshard from yuvalif:wip-yuval-completion-dpp

Conversation

@yuvalif
Contributor

@yuvalif yuvalif commented Feb 23, 2022

this was happening when async completions occurred during a reshard.
more information about testing:
https://gist.github.com/yuvalif/d526c0a3a4c5b245b9e951a6c5a10517

to trace possible issues, a dpp was added to the completion thread.
this should allow finding unhandled completions due to reshards; for example:
https://0x0.st/oKQ1.log

Signed-off-by: Yuval Lifshitz ylifshit@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@yuvalif yuvalif added the rgw label Feb 23, 2022
@yuvalif yuvalif changed the title from "rgw: add dpp to the completion thread and manager" to "rgw: prevent spurious/lost notifications in the index completion thread" on Feb 24, 2022
entry->bilog_op = bilog_op;
std::ostringstream oss;
dpp->gen_prefix(oss);
entry->original_prefix = oss.str();
Contributor

i suggested doing this for debugging, but i'd prefer not to merge it

this adds overhead (allocations and a global lock for iostream locale support) to every index operation, but is only relevant to a small subset of those (writes that prepared before reshard started, but completed after)

Contributor Author

i added debug logs to handle_completion() even for the non-reshard cases.
since these are async callbacks where the original dpp context is not lost (they are synchronous with the op),
i can probably pass in the original dpp without copying the prefix string


void RGWIndexCompletionManager::process()
{
auto dpp = std::make_unique<DoutPrefix>(store->ctx(), dout_subsys, "rgw index completion thread: ");
Contributor

can't this live on the stack?

const int num_shards;
ceph::containers::tiny_vector<ceph::mutex> locks;
std::vector<set<complete_op_data*>> completions;
std::vector<complete_op_data*> async_completions;
Contributor

the async_ piece is a bit misleading here? it's completions that were submitted asynchronously with aio_operate(). the completions in this vector failed with ERR_BUSY_RESHARDING and need to be retried synchronously. maybe call this retry_completions?

what was the motivation for collapsing RGWIndexCompletionThread into RGWIndexCompletionManager? i think it's better to keep the Thread class and encapsulate its state/locking there. that way we don't mix completions/locks and async_completions/async_completions_lock together in one place

Contributor Author

the async_ piece is a bit misleading here? it's completions that were submitted asynchronously with aio_operate(). the completions in this vector failed with ERR_BUSY_RESHARDING and need to be retried synchronously. maybe call this retry_completions?

sure, will rename

what was the motivation for collapsing RGWIndexCompletionThread into RGWIndexCompletionManager? i think it's better to keep the Thread class and encapsulate its state/locking there. that way we don't mix completions/locks and async_completions/async_completions_lock together in one place

there is no thread class. it is just a function called from the manager class inside its own thread.
also, the async_completions_lock is used inside functions called from different thread contexts, so it belongs to the manager class

Comment on lines +936 to +941
std::unique_lock l{async_completions_lock};
cond.wait(l, [this](){return _stop || !async_completions.empty();});
if (_stop) {
return;
}
async_completions.swap(comps);
Contributor

👍

@yuvalif
Contributor Author

yuvalif commented Feb 28, 2022

  • when testing with 2 zones (1 rgw in each zone), and uploading objects only to primary i do not see any difference between the zones
  • when testing with 2 zones (1 rgw in each zone), and uploading objects only to the same bucket in both zones, there is no difference between the objects; however, the secondary zone's sync status is "behind". see:
    https://paste.sh/aGzWOiJ8#UyGOOVvdILeFknWgd-nioBXd

@yuvalif yuvalif force-pushed the wip-yuval-completion-dpp branch from 2918b9b to 359db94 on February 28, 2022 18:32
Comment on lines +1401 to +1407
int cls_obj_complete_op(const DoutPrefixProvider *dpp, BucketShard& bs, const rgw_obj& obj, RGWModifyOp op, std::string& tag, int64_t pool, uint64_t epoch,
rgw_bucket_dir_entry& ent, RGWObjCategory category, std::list<rgw_obj_index_key> *remove_objs, uint16_t bilog_flags, rgw_zone_set *zones_trace = nullptr);
int cls_obj_complete_add(const DoutPrefixProvider *dpp, BucketShard& bs, const rgw_obj& obj, std::string& tag, int64_t pool, uint64_t epoch, rgw_bucket_dir_entry& ent,
RGWObjCategory category, std::list<rgw_obj_index_key> *remove_objs, uint16_t bilog_flags, rgw_zone_set *zones_trace = nullptr);
int cls_obj_complete_del(const DoutPrefixProvider *dpp, BucketShard& bs, std::string& tag, int64_t pool, uint64_t epoch, rgw_obj& obj,
ceph::real_time& removed_mtime, std::list<rgw_obj_index_key> *remove_objs, uint16_t bilog_flags, rgw_zone_set *zones_trace = nullptr);
int cls_obj_complete_cancel(const DoutPrefixProvider *dpp, BucketShard& bs, std::string& tag, rgw_obj& obj,
Contributor

this adds a dpp to lots of functions, apparently for RGWIndexCompletionManager::create_completion(), but the new dpp argument is unused there. can we safely remove these now? that would simplify backports

Contributor Author

ok, removing the dpp, as it is too expensive to cache it inside the callback args

this was happening when async completions happened during reshard.
more information about testing:
https://gist.github.com/yuvalif/d526c0a3a4c5b245b9e951a6c5a10517

we also add more logs to the completion manager.
should allow finding unhandled completions due to reshards.

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
@yuvalif yuvalif force-pushed the wip-yuval-completion-dpp branch from 359db94 to 1cc5fdf on February 28, 2022 19:31
@cbodley
Contributor

cbodley commented Feb 28, 2022

merged, thanks @yuvalif!
