rgw: Redimension bucket sync cache to include optional generation #45122
adamemerson wants to merge 1 commit into ceph:wip-rgw-multisite-reshard from
The alternative would be to compare generations and throw out older generation/no generation if we have a (newer) one. But if we have the potential for older generations and blank generations coming up on error retry, then we have to keep them around. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
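To make the chosen approach concrete, here is a minimal sketch of a cache whose key includes an optional generation. It is an illustration only, not the RGW implementation: `BucketSyncCache`, `SyncState`, `CacheKey` and the shard string format are all hypothetical names. The point it shows is that an empty generation and generation 0 land in separate entries, so neither has to be thrown out when the other appears.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical types for illustration; not the actual RGW classes.
using Generation = std::optional<uint64_t>;
using CacheKey = std::pair<std::string, Generation>;  // (bucket shard, generation)

struct SyncState {
  int refs = 0;  // stand-in for whatever per-sync state the cache tracks
};

class BucketSyncCache {
  // std::pair and std::optional both provide ordering, so the pair can be
  // used directly as a map key. An empty optional sorts before any value.
  std::map<CacheKey, SyncState> entries;

 public:
  // Return the entry for this exact (shard, generation) pair,
  // creating it on first use.
  SyncState& get(const std::string& shard, Generation gen) {
    return entries[CacheKey{shard, gen}];
  }

  std::size_t size() const { return entries.size(); }
};
```

With this keying, `get("bucket:0", std::nullopt)` and `get("bucket:0", Generation{0})` return different entries, which is what lets an older or blank generation that reappears on error retry coexist with a newer one.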
i took this commit into #45417 with an additional change, can you please review?
Where do blank generations come from? In discussion w/Yuval, the topic of generation "reset" also came up--is that actually needed?
empty generations come from 'data full sync', which walks through a listing of remote bucket metadata and tries to run sync on each. it doesn't get any information about log generations from that list of bucket metadata keys, so it passed an 'empty' generation number into bucket sync in
not sure what this means
@yuvalif was that related to bucket sync disable/enable? do you suspect there's a problem with empty gen vs. gen=0 there?
was wondering about the case where we have incremental sync with some gen going on, and then we disable+enable sync.
let's be careful to differentiate between 'bucket full sync' and 'data full sync'. bucket sync disable/enable is cycling through the remember this logic in the
@adamemerson unfortunately my approach in #45417 (comment) isn't going to work, so this PR may be the best we can do

my main concern with this one is that it allows us to spawn two concurrent syncs on the same thing (one with a real gen, and one with empty), now that they're tracked by separate cache entries

when the 'bucket sync cache' was added, it allowed us to detect racing calls to

however, along with that change to remove locking, i did at least make sure that all the sync status writes were using cls_version, so i believe that racing bucket syncs should be okay in general. they may duplicate some work, but the first time they try to record their progress in the sync status object, one of them will fail with ECANCELED

so let's take this fix, and try to exercise these races to make sure this untested error path does what we expect?
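The version-guarded write described above can be sketched as a toy model. This is not the cls_version API: `StatusObject` and `write_if_version` are hypothetical names, and the real check happens inside the OSD rather than in client code. The sketch only shows why the second of two racing writers fails with ECANCELED.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>

// Toy stand-in for a sync status object guarded by a version counter.
// Hypothetical type; the real object lives in RADOS.
struct StatusObject {
  uint64_t version = 0;
  std::string marker;
};

// Conditional write: succeeds only if the object still has the version the
// caller read earlier. On success the version is bumped, so any other writer
// holding the old version is rejected with -ECANCELED.
int write_if_version(StatusObject& obj, uint64_t read_version,
                     const std::string& new_marker) {
  if (obj.version != read_version) {
    return -ECANCELED;
  }
  obj.marker = new_marker;
  ++obj.version;
  return 0;
}
```

If two syncs (say, one with a real generation and one with an empty one) both read version 0, the first call to `write_if_version` succeeds and bumps the version to 1, and the second returns `-ECANCELED` and leaves the object untouched. Both may have duplicated work up to that point, but only one records progress.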
merged into wip-rgw-multisite-reshard, thanks @adamemerson