rgw multisite: data sync optimizations #34094
Conversation
force-pushed from 0b18c7d to fa6a701
force-pushed from c833844 to 251e800
force-pushed from 35d39d4 to 4fb8f81
force-pushed from 4fb8f81 to 1d24fb7
rebased

planning to add some simple unit test cases for the Cache class

@cbodley I started reviewing, but the rebase seems to have clobbered it. I'll need to restart, I think.

apologies! i won't force-push again until reviews are done
```cpp
  }
}
if (progress) {
  *progress = *std::min_element(shard_progress.begin(), shard_progress.end());
```
@cbodley: not sure if I fully understand this. What if a shard had no changes therefore it's completely up to date? What will be its timestamp?
for each shard, we'll either get the timestamp we read from its sync status, or the timestamp we last wrote to its sync status. the sync status will either store a) the last timestamp incremental sync wrote, b) the timestamp when full sync started, or c) an empty timestamp if those events happened before upgrade
if we get empty timestamps, we lose the state->progress_timestamp filtering optimization in DataSyncSingleEntry, and we'll retry RunBucketSourcesSync each time state->counter changes - this is effectively the same behavior we got from marker_tracker->need_retry() before
but eventually these timestamps will fill in. and in the general case, we should be seeing changes to these bucket shards, because that's what the datalog entry implies
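A minimal sketch of the min-over-shards behavior discussed above (illustrative only; `timestamp_t` and `overall_progress` are hypothetical stand-ins, not the actual rgw code). It shows why a single empty (pre-upgrade) shard status pins the reported progress at zero until that shard's status is rewritten with a real timestamp:

```cpp
#include <algorithm>
#include <vector>

// Illustrative stand-in for ceph::real_time; an empty timestamp is zero.
using timestamp_t = unsigned long long;

// The progress reported for the datalog shard is the minimum across all
// bucket-shard timestamps (as in the std::min_element call quoted above),
// so one empty pre-upgrade status dominates the result until it fills in.
timestamp_t overall_progress(const std::vector<timestamp_t>& shard_progress) {
  return *std::min_element(shard_progress.begin(), shard_progress.end());
}
```

Once every shard's sync status has been written at least once post-upgrade, the minimum becomes a meaningful lower bound and the progress_timestamp filtering kicks in.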
it's also worth noting how this timestamp strategy ties into the planned 'sync fairness' work, where we'll be expiring our mdlog and datalog leases somewhat regularly so other gateways have a chance to take over
when a datalog lease is close to this expiration, we'll stop any associated bucket sync. by tracking the timestamps of their progress, we can potentially retire the datalog/error-repo entries that spawned them, so the next gateway that gets the lease won't have to retry them
force-pushed from 9fa01d7 to 91ed2a3
addressed review comments, squashed, and rebased on top of the changes in #33982
force-pushed from 3d9f05c to 8b4d4e1
added some test fixes for ceph_test_cls_cmpomap and got a completely green run in http://pulpito.ceph.com/cbodley-2020-04-01_19:06:18-rgw-wip-cbodley-testing-distro-basic-smithi/
coroutines that want to sleep should just call RGWCoroutine::wait()
Signed-off-by: Casey Bodley <cbodley@redhat.com>

async notifications are just hints, and don't imply an obligation to sync the bucket shard. if we fail to sync, don't write it to the error repo for retry. we'll see the change later when processing the datalog
Signed-off-by: Casey Bodley <cbodley@redhat.com>

it's easier for DataSyncShard to handle parsing failures before calling MarkerTrack::start() and DataSyncSingleEntry
Signed-off-by: Casey Bodley <cbodley@redhat.com>
force-pushed from 8b4d4e1 to f897683
bucket sync remembers the latest timestamp that it successfully wrote to the bucket sync status. data sync can use this to make future decisions without having to reread its sync status
Signed-off-by: Casey Bodley <cbodley@redhat.com>

like write(), we need to apply the writev back to readv
Signed-off-by: Casey Bodley <cbodley@redhat.com>

use cls_version on bucket sync status to detect racing writes - whether from other gateways, or from radosgw-admin commands like 'bucket sync' or 'bucket sync init'. classes that require a non-null version tracker take it by reference
Signed-off-by: Casey Bodley <cbodley@redhat.com>

bucket sync now gets a const pointer to the DataSyncShard's lease to check whether the lease has expired
Signed-off-by: Casey Bodley <cbodley@redhat.com>

the error_repo writes need to be synchronous
Signed-off-by: Casey Bodley <cbodley@redhat.com>
force-pushed from f897683 to 482b44f
@yehudasa i think this is ready - any final review comments?
@ofriedma 7fb98ea is the main commit. similar use of cls_version in metadata and data sync was how i was planning to address https://tracker.ceph.com/issues/43357
This PR contains a series of related data sync optimizations that allow sites with a large backlog of stale datalog entries to recover much faster.
Data sync obligations (whether from datalog entries, error repo retries, or async notifications) were previously unbounded, meaning that bucket sync had to catch up completely before they could be retired. Each obligation now carries a timestamp, allowing it to be retired (or ignored) once the bucket sync status reaches that timestamp. The new
cls_cmpomap module from #33982 allows the error repo to store these timestamps in omap values, and to update/remove entries with atomic timestamp comparisons.

DataSyncShardCR now contains a cache of its associated bucket shards. This cache remembers the latest timestamp from the bucket shard's last sync, and uses that in DataSyncSingleEntryCR to quickly filter out obligations that we've already satisfied. The cache also detects racing calls to DataSyncSingleEntryCR and avoids duplicating bucket sync on the same entry. The cache is invalidated when the data sync lease expires.
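The filtering idea can be sketched as a small progress cache (a hypothetical illustration with made-up names, not the actual DataSyncShardCR code; plain integers stand in for ceph::real_time):

```cpp
#include <map>
#include <string>

// Illustrative stand-in for ceph::real_time.
using timestamp_t = unsigned long long;

// Hypothetical per-bucket-shard progress cache in the spirit of the one
// described above: obligations stamped at or before the cached progress
// were already satisfied by a previous bucket sync and can be retired.
class BucketShardCache {
  std::map<std::string, timestamp_t> progress; // shard -> last synced timestamp
 public:
  // true if a previous bucket sync already covered this obligation
  bool satisfied(const std::string& shard, timestamp_t obligation) const {
    auto it = progress.find(shard);
    return it != progress.end() && obligation <= it->second;
  }
  // record the latest timestamp bucket sync wrote to its sync status
  void update(const std::string& shard, timestamp_t t) {
    auto& cur = progress[shard];
    if (t > cur) cur = t;
  }
  // drop everything when the data sync shard's lease expires
  void clear() { progress.clear(); }
};
```

Invalidating on lease expiration matters because another gateway may have taken over the shard and advanced (or reset) the real sync status behind our back.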
Bucket sync leases are removed now that we don't duplicate calls to bucket sync within the same radosgw instance. Bucket sync is already covered by the lease of the data sync shard that spawned it, so different radosgws are similarly protected. The bucket sync status objects now use cls_version to detect racing writes in the event that a data sync shard lease expires (or the admin runs bucket sync or bucket sync init).
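The cls_version usage follows the familiar optimistic-concurrency pattern, sketched here with hypothetical names (this is not the actual cls_version/librados API): the writer asserts that the version it read is still current, so a racing writer is detected instead of silently overwritten.

```cpp
#include <string>

// Hypothetical stand-in for a bucket sync status object guarded by a
// version counter, in the spirit of cls_version (illustrative only).
struct StatusObject {
  std::string status;
  unsigned version = 0;
};

// Returns false (writing nothing) if another writer raced us since we
// read expected_version; the caller should reread the status and retry.
bool conditional_write(StatusObject& obj, unsigned expected_version,
                       const std::string& new_status) {
  if (obj.version != expected_version) {
    return false; // version mismatch: someone else wrote in between
  }
  obj.status = new_status;
  ++obj.version;
  return true;
}
```

In the PR this protects against, e.g., an admin running 'bucket sync init' while a data sync shard whose lease expired still holds a stale view of the status.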