
rgw multisite: data sync optimizations#34094

Merged
cbodley merged 28 commits into ceph:master from cbodley:wip-rgw-data-sync-cache
Apr 15, 2020

Conversation

cbodley (Contributor) commented Mar 20, 2020

This contains a series of related data sync optimizations which allow sites with a large backlog of stale datalog entries to recover much faster.

Data sync obligations (whether from datalog entries, error repo retries, or async notifications) were previously unbounded, meaning that bucket sync had to catch up completely before they could be retired. Each obligation now carries a timestamp, allowing them to be retired (or ignored) once bucket sync status reaches that timestamp. The new cls_cmpomap module from #33982 allows the error repo to store these timestamps in omap values, and update/remove entries with atomic timestamp comparisons.
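A minimal Python sketch of the timestamped error-repo semantics described above (class and method names here are illustrative, not the actual cls_cmpomap API): writes keep the oldest failure time so retries never skip work, and removes only retire an entry once sync progress has passed its stored timestamp.

```python
from datetime import datetime, timezone

class ErrorRepo:
    """Toy model of an error repo whose entries carry timestamps,
    mimicking the atomic timestamp comparisons cls_cmpomap provides
    on omap values. Illustrative only."""
    def __init__(self):
        self.entries = {}  # bucket_shard -> oldest failed timestamp

    def write(self, key, timestamp):
        # Keep the oldest failure time, so a retry covers everything
        # since the earliest failure.
        current = self.entries.get(key)
        if current is None or timestamp < current:
            self.entries[key] = timestamp

    def remove(self, key, timestamp):
        # Retire the entry only once sync progress has reached its timestamp.
        current = self.entries.get(key)
        if current is not None and current <= timestamp:
            del self.entries[key]

repo = ErrorRepo()
t1 = datetime(2020, 3, 1, tzinfo=timezone.utc)
t2 = datetime(2020, 3, 2, tzinfo=timezone.utc)
repo.write("bucket:shard0", t2)
repo.write("bucket:shard0", t1)   # older failure wins
repo.remove("bucket:shard0", t1)  # progress reached t1 -> entry retired
```

The comparison on remove is what lets obligations be retired (or ignored) once bucket sync status reaches their timestamp, without a read-modify-write race.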

DataSyncShardCR now contains a cache of its associated bucket shards. This cache remembers the latest timestamp from the bucket-shard's last sync, and uses that in DataSyncSingleEntryCR to quickly filter out obligations that we've already satisfied. This cache also detects racing calls to DataSyncSingleEntryCR and avoids duplicating bucket sync on the same entry. The cache is invalidated when the data sync lease expires.
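The cache's filtering and dedup behavior can be sketched in Python (a model of the logic described above, not the actual DataSyncShardCR code; integer stand-ins are used for timestamps):

```python
class BucketShardCache:
    """Per-data-sync-shard cache: remembers the latest synced timestamp
    per bucket shard so already-satisfied obligations can be dropped,
    and detects racing calls so the same entry isn't synced twice."""
    def __init__(self):
        self.progress = {}     # bucket_shard -> last synced timestamp
        self.in_flight = set()

    def should_sync(self, key, obligation_timestamp):
        last = self.progress.get(key)
        if last is not None and obligation_timestamp <= last:
            return False       # obligation already satisfied
        if key in self.in_flight:
            return False       # racing call; one bucket sync is enough
        self.in_flight.add(key)
        return True

    def finish(self, key, synced_timestamp):
        self.in_flight.discard(key)
        last = self.progress.get(key)
        if last is None or synced_timestamp > last:
            self.progress[key] = synced_timestamp

    def invalidate(self):
        # Called when the data sync lease expires: cached progress may
        # no longer be trustworthy.
        self.progress.clear()
        self.in_flight.clear()
```

Usage: call should_sync() when an obligation arrives (from a datalog entry, error-repo retry, or notification), and finish() with the timestamp that bucket sync actually reached.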

Bucket sync leases are removed now that we don't duplicate calls to bucket sync within the same radosgw instance. Bucket sync is already covered by the lease of the data sync shard that spawned it, so different radosgws are similarly protected. The bucket sync status objects now use cls_version to detect racing writes in the event that a data sync shard lease expires (or the admin runs bucket sync or bucket sync init).
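A minimal model of the cls_version-style conditional write on the sync status object (illustrative Python, not the rados/cls API): a writer presents the version it read, and a mismatch means another writer got in between, so the write is rejected and the caller must re-read.

```python
class VersionedStatusStore:
    """Sketch of version-checked writes: each successful write bumps the
    object version, and writes carrying a stale version are rejected
    (e.g. another gateway, or 'bucket sync init', wrote in between)."""
    def __init__(self):
        self.version = 0
        self.status = None

    def read(self):
        return self.status, self.version

    def write(self, status, expected_version):
        if expected_version != self.version:
            return False   # racing write detected; caller re-reads
        self.status = status
        self.version += 1
        return True

store = VersionedStatusStore()
_, ver = store.read()
assert store.write("incremental", ver)   # first writer succeeds
assert not store.write("full", ver)      # stale version -> rejected
```

The key design point, echoed later in the thread, is that the writer must use the same version it obtained from its own read of the status object.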

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@cbodley cbodley added the rgw label Mar 20, 2020
@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch from 0b18c7d to fa6a701 on March 20, 2020 18:49
@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch 2 times, most recently from c833844 to 251e800 on March 20, 2020 20:47
@cbodley cbodley requested a review from yehudasa March 24, 2020 12:58
@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch 3 times, most recently from 35d39d4 to 4fb8f81 on March 25, 2020 14:54
@cbodley cbodley requested a review from smanjara March 25, 2020 17:06
@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch from 4fb8f81 to 1d24fb7 on March 26, 2020 14:43
cbodley (Author) commented Mar 26, 2020

rebased

cbodley (Author) commented Mar 26, 2020

planning to add some simple unit test cases for the Cache class

yehudasa (Member) commented:

@cbodley I started reviewing but rebase seems to have clobbered it. I'll need to re-start I think.

cbodley (Author) commented Mar 26, 2020

> @cbodley I started reviewing but rebase seems to have clobbered it. I'll need to re-start I think.

apologies! i won't force-push again until reviews are done

yehudasa (Member) left a review:

@cbodley see my comments

The following review thread refers to this snippet from the progress calculation:

      }
    }
    if (progress) {
      *progress = *std::min_element(shard_progress.begin(), shard_progress.end());
yehudasa (Member):

@cbodley: not sure if I fully understand this. What if a shard had no changes therefore it's completely up to date? What will be its timestamp?

cbodley (Author):

for each shard, we'll either get the timestamp we read from its sync status, or the timestamp we last wrote to its sync status. the sync status will either store a) the last timestamp incremental sync wrote, b) the timestamp when full sync started, or c) an empty timestamp if those events happened before upgrade

if we get empty timestamps, we lose the state->progress_timestamp filtering optimization in DataSyncSingleEntry, and we'll retry RunBucketSourcesSync each time state->counter changes - this is effectively the same behavior we got from marker_tracker->need_retry() before

but eventually these timestamps will fill in. and in the general case, we should be seeing changes to these bucket shards, because that's what the datalog entry implies
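The behavior described here can be modeled in a few lines of Python (a sketch, not the rgw code): overall progress is the minimum of the per-shard timestamps, so a pre-upgrade shard whose sync status holds an "empty" (zero) timestamp conservatively pins overall progress until real timestamps fill in.

```python
from datetime import datetime, timezone

# Stand-in for an empty/zero timestamp from a pre-upgrade sync status.
EMPTY = datetime.min.replace(tzinfo=timezone.utc)

def overall_progress(shard_progress):
    # Overall progress is the minimum across shards: no obligation newer
    # than this can be retired, which is the conservative choice.
    return min(shard_progress)
```

With one real timestamp and one empty one, the empty timestamp dominates, which is why the progress-based filtering optimization is lost until the empty entries are replaced by real ones.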

cbodley (Author):

it's also worth noting how this timestamp strategy ties into the planned 'sync fairness' work, where we'll be expiring our mdlog and datalog leases somewhat regularly so other gateways have a chance to take over

when a datalog lease is pending this expiration, we'll stop any associated bucket sync. by tracking the timestamps of their progress, we can potentially retire the datalog/error-repo entries that spawned them. that way the next gateway that gets the lease won't have to retry it

@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch 2 times, most recently from 9fa01d7 to 91ed2a3 on March 30, 2020 21:14
cbodley (Author) commented Mar 30, 2020

addressed review comments, squashed, and rebased on top of the changes in #33982

@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch 5 times, most recently from 3d9f05c to 8b4d4e1 on April 1, 2020 20:43
cbodley (Author) commented Apr 1, 2020

added some test fixes for ceph_test_cls_cmpomap and got a completely green run in http://pulpito.ceph.com/cbodley-2020-04-01_19:06:18-rgw-wip-cbodley-testing-distro-basic-smithi/

cbodley added 3 commits April 13, 2020 11:06
coroutines that want to sleep should just call RGWCoroutine::wait()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
async notifications are just hints, and don't imply an obligation to
sync the bucket shard. if we fail to sync, don't write it to the error
repo for retry. we'll see the change later when processing the datalog

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
cbodley added 5 commits April 13, 2020 11:06
it's easier for DataSyncShard to handle parsing failures before calling
MarkerTrack::start() and DataSyncSingleEntry

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch from 8b4d4e1 to f897683 on April 13, 2020 15:36
cbodley added 15 commits April 13, 2020 14:08
bucket sync remembers the latest timestamp that it successfully wrote to
the bucket sync status. data sync can use this to make future decisions
without having to reread its sync status

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
like write(), we need to apply the writev back to readv

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
use cls_version on bucket sync status to detect racing writes - whether
from other gateways, or from radosgw-admin commands like 'bucket sync'
or 'bucket sync init'

classes that require a non-null version tracker take it by reference

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
bucket sync now gets a const pointer to the DataSyncShard's lease to
check whether the lease has expired

Signed-off-by: Casey Bodley <cbodley@redhat.com>
the error_repo writes need to be synchronous

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-rgw-data-sync-cache branch from f897683 to 482b44f on April 13, 2020 18:08
cbodley commented Apr 14, 2020

cbodley (Author) commented Apr 15, 2020

@yehudasa i think this is ready - any final review comments?

@yehudasa yehudasa self-requested a review April 15, 2020 14:52
yehudasa (Member) left a review:

lgtm

@cbodley cbodley merged commit b62e0c2 into ceph:master Apr 15, 2020
@cbodley cbodley deleted the wip-rgw-data-sync-cache branch April 15, 2020 15:00
cbodley (Author) commented Apr 7, 2022

@ofriedma 7fb98ea is the main commit that threaded RGWObjVersionTracker into the reads/writes of bucket sync status. the important thing here is that writers use the same instance of RGWObjVersionTracker that they used for the read

similar use of cls_version in metadata and data sync was how i was planning to address https://tracker.ceph.com/issues/43357
