rgw multisite: remove sharding from bucket full sync #37573
cbodley wants to merge 27 commits into ceph:master
Conversation
Force-pushed 0a5b9e5 to 8092929
https://pulpito.ceph.com/cbodley-2020-10-08_17:50:30-rgw-wip-cbodley-testing-distro-basic-smithi/ i seem to have broken the 'bucket sync disable' tests for pubsub again
Force-pushed 2a1183d to 5963f09
addressed by using exclusive create/cls_version in
Should we even continue supporting
@yehudasa remember the sync policy stuff is still experimental, and has to be manually enabled in the zonegroup configuration once all zones are upgraded to octopus+. once that's set up, we'll either need some automated conversion logic for disabled buckets, or documentation on how to set up the equivalent policy. until then, i think we're stuck supporting
Signed-off-by: Casey Bodley <cbodley@redhat.com>
allows other code to spawn this coroutine without having the class definition
RGWShardCollectCR was hard-coded to ignore ENOENT errors and print a 'failed to fetch log status' error message. this moves that logic into a handle_result() virtual function. it also exposes the member variables 'status' and 'max_concurrent' as protected, so they can be consulted or modified by overrides of handle_result() and spawn_next()
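A rough Python sketch of the shape of this refactor (the real code is a C++ coroutine; the class names and the collection loop below are simplified illustrations, not the actual RGW API):

```python
import errno

class ShardCollector:
    """Simplified stand-in for RGWShardCollectCR: gathers per-shard
    results and lets subclasses decide how each one is handled."""

    def __init__(self, max_concurrent=16):
        # exposed to subclasses, mirroring the now-protected members
        self.status = 0
        self.max_concurrent = max_concurrent

    def handle_result(self, shard_ret):
        """Override point: return the error to record, or 0 to ignore it."""
        raise NotImplementedError

    def collect(self, shard_results):
        for r in shard_results:
            err = self.handle_result(r)
            if err < 0 and self.status == 0:
                self.status = err  # remember the first unhandled error

class LogStatusCollector(ShardCollector):
    """Reproduces the previously hard-coded behavior as one override."""
    def handle_result(self, shard_ret):
        if shard_ret == -errno.ENOENT:
            return 0  # a missing log object is not an error
        if shard_ret < 0:
            print(f'failed to fetch log status: {shard_ret}')
        return shard_ret
```

With the decision pulled into an override, other callers can ignore different error codes or tune `max_concurrent` without touching the shared collection loop.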
a coroutine to initialize a bucket for full sync using a new bucket-wide sync status object
full sync happens at the bucket level, so the shards will always start in StateIncrementalSync
renamed ListBucketShardCR to ListRemoteBucketCR and removed the shard-id parameter; renamed BucketFullSyncShardMarkerTrack to BucketFullSyncMarkerTrack, which now updates the bucket-level rgw_bucket_sync_status; renamed BucketShardFullSyncCR to BucketFullSyncCR; BucketSyncCR now takes a bucket-wide lease during full sync
Force-pushed 5963f09 to c8f1c86
pushed an update which seems to be passing multisite tests, except for those with
if metadata sync hasn't finished, the 'bucket checkpoint' commands may not find their bucket info
the ability to filter tests by attribute is provided by the nose.plugins.attrib plugin, which wasn't being loaded by default
Force-pushed c8f1c86 to 6de4df0
this backoff is triggered often by the per-bucket lease for full sync, and causes tests to fail with checkpoint timeouts
Force-pushed eaa81b2 to 07834ac
Do we still need 'rgw_bucket_shard_full_sync_marker' in the rgw_bucket_shard_sync_info object, which now only tracks incremental sync?
i pushed this to a feature branch at https://github.com/ceph/ceph/commits/wip-rgw-multisite-reshard so we can continue development there (and target PRs at it) without merging to master
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
this is in #39002
currently, bucket full sync runs once for each bucket index shard, by listing all objects in that shard and fetching them. this coupling with the index shards is problematic for resharding, because any full syncs in progress would be unable to continue their listing after a reshard
by moving full sync from per-bucket-shard to per-bucket, we can use a normal ordered bucket listing, which is stable across reshards. to track this full sync status, we add a new per-bucket sync status object, along with some backward-compatibility logic to skip a full sync if all of its shard status objects exist and show incremental sync
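the backward-compatibility decision can be sketched roughly as follows (a hedged illustration: `need_full_sync`, `bucket_status`, and `shard_statuses` are hypothetical names, not the actual RGW data structures):

```python
FULL, INCREMENTAL = 'full-sync', 'incremental-sync'

def need_full_sync(bucket_status, shard_statuses):
    """Decide whether bucket-wide full sync must run.

    bucket_status: state from the new per-bucket sync status object,
        or None if that object doesn't exist yet.
    shard_statuses: per-shard states from the legacy status objects,
        with None for any shard whose status object is missing.
    """
    if bucket_status == INCREMENTAL:
        return False  # bucket-wide full sync already finished
    if bucket_status == FULL:
        return True   # bucket-wide full sync in progress, resume it
    # no bucket status object yet: consult the legacy per-shard
    # objects. if every shard status exists and already shows
    # incremental sync, full sync completed before the upgrade,
    # so it can be skipped.
    if shard_statuses and all(s == INCREMENTAL for s in shard_statuses):
        return False
    return True
```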
when starting full sync, a cls lock is acquired on this full sync status object to prevent other data sync shards from duplicating the work. the shards that fail to get this lock will retry until the full sync finishes
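the acquire-or-retry handoff between shards can be modeled like this (a simplified in-process sketch: the real code takes a cls lock on the RADOS status object, and the function and state names below are hypothetical):

```python
import threading
import time

def run_full_sync_once(lock, state, do_full_sync, retry_delay=0.01):
    """Each data sync shard calls this; only one performs full sync.

    Shards that fail to take the lock retry until full sync has
    finished, then all of them fall through to incremental sync.
    """
    while state['phase'] == 'full-sync':
        if lock.acquire(blocking=False):
            try:
                # re-check under the lock: another shard may have
                # finished full sync between our check and acquire
                if state['phase'] == 'full-sync':
                    do_full_sync()
                    state['phase'] = 'incremental-sync'
            finally:
                lock.release()
        else:
            time.sleep(retry_delay)  # another shard holds the lock
    # proceed to incremental sync for this shard
```

however many shards race here, `do_full_sync` runs exactly once, which mirrors the goal of the cls lock: no duplicated full-sync work across data sync shards.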
in the future, this bucket sync status object will be used to coordinate the per-shard incremental sync across reshard events