rgw: multisite stabilization for reef #48898
Conversation
@adamemerson @smanjara @soumyakoduri @yuvalif i could use your help tracking down all of the multisite commits we've taken to 5.3 for testing that haven't made it upstream yet. feel free to push commits directly to this ceph:wip-rgw-multisite-reshard-reef branch. regarding the fifo stuff in #48632, it's probably best to keep it separate while we're still iterating on it?
I have added a couple of commits to #48936. Will merge them into this branch after a few sanity tests. Meanwhile, please review the changes. Thanks!
local multisite test results:
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
once #49179 merges to main, i'll rebase this PR on top |
…ject and store it in a vector Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
In RGWDataSyncShardCR, after acquiring the lease, reread sync status shard object to fetch the latest marker & objv stored. Signed-off-by: Soumya Koduri <skoduri@redhat.com>
…will report that it's behind the remote's max-marker even if there are no more entries to sync for each behind shard. if we get an empty listing, remove that shard from behind_shards. Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Also clear objv before reading the bucket sync status. Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
this can be useful to prevent long-lived connections from being dropped due to inactivity Fixes: https://tracker.ceph.com/issues/48402 Signed-off-by: Casey Bodley <cbodley@redhat.com>
Sticking random #defines everywhere is just atrocious style. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Since we were taking them by reference and copying before, this is strictly better. Callers that give us an RValue can skip the copy. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
RGWDataSyncCR manages the lock instead, holding it through StateInit and StateBuildingFullSyncMaps but releasing it by StateSync. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
If someone else got there first, we won't smash their work. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
The `radosgw-admin data sync init` command does *not* use `cls_version` and just overwrites. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use of the RADOS async completion processor. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use of the RADOS async completion processor. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use of the RADOS async completion processor. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use of the RADOS async completion processor. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Lock latency in RGWContinuousLeaseCR gets high enough under load that the locks end up timing out, leading to incorrect behavior. Monitor lock latency and cut concurrent operations in half if it goes above ten seconds. Cut concurrency to one if it goes above twenty seconds. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Limited to only warn every five minutes. Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Fixes: https://tracker.ceph.com/issues/48416 bucket was passed in without bucket_id, now reading entrypoint info if needed. Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Added test cases for the various use-cases of multisite sync policy feature listed in https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/ Signed-off-by: Soumya Koduri <skoduri@redhat.com>
…s enabled/disabled Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…n when versioning is disabled on primary Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…ucket modifications Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Flush the marker tracker and abort if we don't still have it. Resolves: rhbz#2129718 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
…r update failures Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
If any data log entries are missing for the older generations, the sync server may not mark those shards as done and can get stuck in that old gen forever. To avoid that, whenever a future-gen entry is read, write the undone (shard, gen) entry to the error repo so that it can be processed and marked as done, and sync can progress eventually. Signed-off-by: Soumya Koduri <skoduri@redhat.com>
@cbodley We think this has all the things from 5.3. Do we want to put it through QA and merge it, or do we want to try doing load tests against it?

@adamemerson i'd like to see it qa'd and merged. we can continue testing on the main branch
we'll need to clean up the multisite functional tests. i won't block this merge due to the multisite failures, but i'd really like the tests to be green for the reef release. that way we can actually validate future multisite changes and their reef backports
tracks multisite stabilization work that hasn't yet merged to main, so we can validate all of it together in workload testing