rgw multisite reshard: generation numbers in data sync are not optional<>#45417
rgw multisite reshard: generation numbers in data sync are not optional<>#45417cbodley wants to merge 4 commits intoceph:wip-rgw-multisite-reshardfrom
Conversation
the coroutine that 'bucket sync run' spawns only syncs a single log generation. the actual transition logic is built into data sync require the admin pass a specific --gen as input Signed-off-by: Casey Bodley <cbodley@redhat.com>
data sync was passing 'empty' generation numbers into `RGWDataSyncSingleEntryCR` when triggered by full sync. `RGWSyncBucketCR` treated these empty generation numbers as a request to sync the current generation tracked in `bucket_status.incremental_gen` `RGWDataSyncSingleEntryCR` uses the 'bucket sync cache' to prevent itself from spawning concurrent syncs on the same entry. so when a new request conflicts with an existing one, it must decide which one to throw away. the use of an optional generation number made it impossible for `RGWDataSyncSingleEntryCR` to enforce a strict ordering between these obligations, so it was just as likely to throw away the wrong one replace the 'empty' generation numbers from data full sync with the value 0. instead of requesting sync of the 'current' generation, this obligation with generation 0 is potentially much less restrictive: if the bucket has not ever been resharded, its only log generation would be 0, and this change would have no effect: the bucket's current would match both the 'empty' generation and an explicit generation 0 but if the bucket has been resharded, an obligation of `gen=0` only requires that we've made it past 'bucket full sync'. once we finish full sync and transition to incremental sync on its latest generation, we'll do the comparison with the input `gen=0` and return success after logging "requested sync of past generation 0 < Y, returning success". after 'data full sync' switches to incremental, we'll eventually see the datalog entries for any changes to later generations and trigger their sync then Signed-off-by: Casey Bodley <cbodley@redhat.com>
The alternative would be to compare generations and throw out older generation/no generation if we have a (newer) one. But if we have the potential for older generations and blank generations coming up on error retry, then we have to keep them around. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
| struct State { | ||
| // the source bucket shard to sync | ||
| rgw_bucket_shard key; | ||
| std::pair<rgw_bucket_shard, uint64_t> key; |
There was a problem hiding this comment.
TODO: introduce a rgw_bucket_log_shard struct that includes the generation
i avoided this before, because the optional gen was specific to data sync paths, so couldn't be generalized this way
but changing this would cause lots of backport conflicts that i don't want to deal with, so we'll hold off on it for now
| cerr << "ERROR: bucket not specified" << std::endl; | ||
| return EINVAL; | ||
| } | ||
| if (!gen.has_value()) { |
There was a problem hiding this comment.
how would the user know which gen to specify?
There was a problem hiding this comment.
they don't, really. this is a hack on top of something that was kinda broken already
There was a problem hiding this comment.
by 'kinda broken already', i mean that it doesn't transition over reshards, so would just sync the current gen then stop
|
ugh, sorry, i don't think this design is correct in the case of in a fresh cluster which has no sync status objects at all, we can safely assume that a sync obligation at gen=0 would trigger a 'bucket full sync', so catch the bucket up entirely but for a bucket whose sync status is already in incremental on some gen>0, a sync obligation at gen=0 will do nothing but log "requested sync of past generation". in the absense of other writes in this bucket shard, nothing would come along later and trigger the sync of its current generation the job of |
this removes the idea of 'optional' generation numbers coming from data sync
with explicit generation numbers, there is no longer any ambiguity over the equality of two obligations (one with a generation and one without). so the 'bucket sync cache' can cache each generation separately, and prevent different generations from interacting in
RGWDataSyncSingleEntryCR. see commit message for detailsShow available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows