rgw multisite reshard: generation numbers in data sync are not optional<> by cbodley · Pull Request #45417 · ceph/ceph

cbodley · 2022-03-16T13:32:18Z

this removes the idea of 'optional' generation numbers coming from data sync

with explicit generation numbers, there is no longer any ambiguity over the equality of two obligations (one with a generation and one without). so the 'bucket sync cache' can cache each generation separately, and prevent different generations from interacting in RGWDataSyncSingleEntryCR. see commit message for details

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows

the coroutine that 'bucket sync run' spawns only syncs a single log generation. the actual transition logic is built into data sync require the admin pass a specific --gen as input Signed-off-by: Casey Bodley <cbodley@redhat.com>

data sync was passing 'empty' generation numbers into `RGWDataSyncSingleEntryCR` when triggered by full sync. `RGWSyncBucketCR` treated these empty generation numbers as a request to sync the current generation tracked in `bucket_status.incremental_gen` `RGWDataSyncSingleEntryCR` uses the 'bucket sync cache' to prevent itself from spawning concurrent syncs on the same entry. so when a new request conflicts with an existing one, it must decide which one to throw away. the use of an optional generation number made it impossible for `RGWDataSyncSingleEntryCR` to enforce a strict ordering between these obligations, so it was just as likely to throw away the wrong one replace the 'empty' generation numbers from data full sync with the value 0. instead of requesting sync of the 'current' generation, this obligation with generation 0 is potentially much less restrictive: if the bucket has not ever been resharded, its only log generation would be 0, and this change would have no effect: the bucket's current would match both the 'empty' generation and an explicit generation 0 but if the bucket has been resharded, an obligation of `gen=0` only requires that we've made it past 'bucket full sync'. once we finish full sync and transition to incremental sync on its latest generation, we'll do the comparison with the input `gen=0` and return success after logging "requested sync of past generation 0 < Y, returning success". after 'data full sync' switches to incremental, we'll eventually see the datalog entries for any changes to later generations and trigger their sync then Signed-off-by: Casey Bodley <cbodley@redhat.com>

The alternative would be to compare generations and throw out older generation/no generation if we have a (newer) one. But if we have the potential for older generations and blank generations coming up on error retry, then we have to keep them around. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

cbodley · 2022-03-16T13:43:11Z

src/rgw/rgw_bucket_sync_cache.h

 struct State {
  // the source bucket shard to sync
-  rgw_bucket_shard key;
+  std::pair<rgw_bucket_shard, uint64_t> key;


TODO: introduce a rgw_bucket_log_shard struct that includes the generation

i avoided this before, because the optional gen was specific to data sync paths, so couldn't be generalized this way

but changing this would cause lots of backport conflicts that i don't want to deal with, so we'll hold off on it for now

yuvalif · 2022-03-16T16:34:29Z

src/rgw/rgw_admin.cc

      cerr << "ERROR: bucket not specified" << std::endl;
      return EINVAL;
    }
+    if (!gen.has_value()) {


how would the user know which gen to specify?

they don't, really. this is a hack on top of something that was kinda broken already

by 'kinda broken already', i mean that it doesn't transition over reshards, so would just sync the current gen then stop

cbodley · 2022-03-16T18:39:06Z

ugh, sorry, i don't think this design is correct in the case of data sync init after buckets have reached incremental sync

in a fresh cluster which has no sync status objects at all, we can safely assume that a sync obligation at gen=0 would trigger a 'bucket full sync', so catch the bucket up entirely

but for a bucket whose sync status is already in incremental on some gen>0, a sync obligation at gen=0 will do nothing but log "requested sync of past generation". in the absense of other writes in this bucket shard, nothing would come along later and trigger the sync of its current generation

the job of data sync init is to, by forcing a full sync, restore our invariant that all data on the source zone is tracked by some obligation to sync it. we can't accomplish that if data full sync only uses obligations with gen=0

cbodley and others added 4 commits March 15, 2022 15:26

radosgw-admin: 'bucket sync run' requires --gen

6c71058

the coroutine that 'bucket sync run' spawns only syncs a single log generation. the actual transition logic is built into data sync require the admin pass a specific --gen as input Signed-off-by: Casey Bodley <cbodley@redhat.com>

SQUASHME: bucket sync cache with non-optional gen

bc56485

cbodley added the rgw label Mar 16, 2022

cbodley requested review from adamemerson and yuvalif March 16, 2022 13:32

This was referenced Mar 16, 2022

rgw: Redimension bucket sync cache to include optional generation #45122

Closed

rgw/multisite: consider generation when canceling obligations #45389

Closed

rgw multisite: bucket reshard work in progress #39002

Merged

cbodley commented Mar 16, 2022

View reviewed changes

yuvalif reviewed Mar 16, 2022

View reviewed changes

adamemerson approved these changes Mar 16, 2022

View reviewed changes

cbodley closed this Mar 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rgw multisite reshard: generation numbers in data sync are not optional<>#45417

rgw multisite reshard: generation numbers in data sync are not optional<>#45417
cbodley wants to merge 4 commits intoceph:wip-rgw-multisite-reshardfrom
cbodley:wip-rgw-multisite-reshard-data-sync-gen

cbodley commented Mar 16, 2022

Uh oh!

cbodley Mar 16, 2022

Uh oh!

yuvalif Mar 16, 2022

Uh oh!

cbodley Mar 16, 2022

Uh oh!

cbodley Mar 16, 2022

Uh oh!

cbodley commented Mar 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cbodley commented Mar 16, 2022

Uh oh!

cbodley Mar 16, 2022

Choose a reason for hiding this comment

Uh oh!

yuvalif Mar 16, 2022

Choose a reason for hiding this comment

Uh oh!

cbodley Mar 16, 2022

Choose a reason for hiding this comment

Uh oh!

cbodley Mar 16, 2022

Choose a reason for hiding this comment

Uh oh!

cbodley commented Mar 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants