rgw: multisite stabilization for reef #48898

Merged: adamemerson merged 26 commits into main from wip-rgw-multisite-reshard-reef on Jan 19, 2023
Conversation

cbodley (Contributor) commented Nov 15, 2022

tracks multisite stabilization work that hasn't yet merged to main, so we can validate all of it together in workload testing


cbodley (Contributor, Author) commented Nov 15, 2022

@adamemerson @smanjara @soumyakoduri @yuvalif i could use your help tracking down all of the multisite commits we've taken to 5.3 for testing that haven't made it upstream yet. feel free to push commits directly to this ceph:wip-rgw-multisite-reshard-reef branch

regarding the fifo stuff in #48632, it's probably best to keep it separate while we're still iterating on it?

soumyakoduri (Contributor) replied:

I have added a couple of commits to #48936. I will merge them into this branch after a few sanity tests; meanwhile, please review the changes. Thanks!

cbodley (Contributor, Author) commented Nov 23, 2022

added commits from #43609 #47566 #47797 #48451

cbodley (Contributor, Author) commented Nov 23, 2022

local multisite test results:

FAIL: test_multi.test_version_suspended_incremental_sync
FAIL: test_multi.test_zg_master_zone_delete
FAIL: test_multi.test_bucket_reshard_index_log_trim
Ran 54 tests in 2843.306s

FAILED (SKIP=16, failures=3)

github-actions commented:
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

cbodley (Contributor, Author) commented Dec 16, 2022

once #49179 merges to main, i'll rebase this PR on top

Shilpa Jagannath and others added 9 commits January 11, 2023 00:13
…ject and store it in a vector

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
In RGWDataSyncShardCR, after acquiring the lease, reread sync status
shard object to fetch the latest marker & objv stored.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
…will report that it's behind the remote's max-marker even if there are no more entries to sync for each behind shard. if we get an empty listing, remove that shard from behind_shards.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Also clear objv before reading the bucket sync status.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
this can be useful to prevent long-lived connections from being dropped
due to inactivity

Fixes: https://tracker.ceph.com/issues/48402

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Sticking random #defines everywhere is just atrocious style.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Since we were taking them by reference and copying before, this is
strictly better. Callers that give us an RValue can skip the copy.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
RGWDataSyncCR manages the lock instead, holding it through StateInit
and StateBuildingFullSyncMaps but releasing it by StateSync.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
If someone else got there first, we won't smash their work.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
The `radosgw-admin data sync init` command does *not* use
`cls_version` and just overwrites.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
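The version-guarded write described above can be illustrated with a hedged sketch. This is a hypothetical in-memory store, not the RADOS cls_version API: a guarded write succeeds only if the object's version still matches what the writer last read, while `data sync init` simply overwrites unconditionally.

```python
# Hypothetical sketch of optimistic, version-guarded writes (not the actual
# cls_version interface). A guarded write fails if someone else updated the
# object since our last read; an unconditional write "smashes" their work.

class VersionMismatch(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self.value = None
        self.version = 0

    def read(self):
        """Return the current value together with its version."""
        return self.value, self.version

    def write_guarded(self, value, expected_version):
        """Write only if nobody else has written since expected_version."""
        if self.version != expected_version:
            raise VersionMismatch  # someone else got there first
        self.value = value
        self.version += 1

    def write_unconditional(self, value):
        """Overwrite regardless of concurrent updates (like `data sync init`)."""
        self.value = value
        self.version += 1
```

A stale writer raises `VersionMismatch` instead of clobbering the newer state, which is the behavior the commit above preserves for normal sync-status updates.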
Don't go through the 'system object' cache. This also saves us the use
of the RADOS async completion processor.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use
of the RADOS async completion processor.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use
of the RADOS async completion processor.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Don't go through the 'system object' cache. This also saves us the use
of the RADOS async completion processor.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Lock latency in RGWContinuousLeaseCR gets high enough under load that
the locks end up timing out, leading to incorrect behavior.

Monitor lock latency and cut concurrent operations in half if it goes
above ten seconds.

Cut concurrency to one if it goes above twenty seconds.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
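The backoff policy in this commit can be sketched roughly as follows. The function name and shape are hypothetical (the real change lives in the coroutine machinery around RGWContinuousLeaseCR); only the thresholds mirror the commit message.

```python
# Hypothetical sketch of latency-based concurrency backoff, not the actual
# RGW code. Per the commit message: halve concurrent operations when lock
# latency exceeds ten seconds, and serialize entirely above twenty seconds.

def adjust_concurrency(current: int, lock_latency_secs: float) -> int:
    """Return the new concurrency limit given the observed lock latency."""
    if lock_latency_secs > 20:
        return 1                      # latency critical: fall back to serial
    if lock_latency_secs > 10:
        return max(1, current // 2)   # latency high: back off by half
    return current                    # latency acceptable: leave it alone
```

The point of the two-tier cutback is that high lock latency is itself a symptom of too many concurrent lease operations, so shedding load lets the locks renew before they time out.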
adamemerson force-pushed the wip-rgw-multisite-reshard-reef branch from 7a09368 to 3010abd on January 12, 2023 23:17
adamemerson and others added 3 commits January 13, 2023 08:27
Limited to warn only every five minutes.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Fixes: https://tracker.ceph.com/issues/48416

bucket was passed in without bucket_id, now reading entrypoint
info if needed.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Added test cases for the various use-cases of multisite sync
policy feature listed in
https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
adamemerson force-pushed the wip-rgw-multisite-reshard-reef branch from 3010abd to c55c505 on January 13, 2023 13:33
Shilpa Jagannath and others added 6 commits January 13, 2023 08:51
…s enabled/disabled

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…n when versioning

is disabled on primary

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…ucket modifications

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Flush the marker tracker and abort if we don't still have it.

Resolves: rhbz#2129718
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
…r update failures

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
If any data log entries are missing for the older
generations, the sync server may not mark those shards as done
and can get stuck in that old gen forever.

To avoid that, whenever a future-gen entry is read, write the undone (shard, gen)
entry to the error repo so that it can be processed and marked as done,
allowing sync to progress eventually.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
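The mechanism in this commit can be sketched in simplified form. The data structures here are hypothetical stand-ins (the real code operates on datalog generations, shards, and the RADOS error repo): on seeing an entry from a future generation, every not-yet-done (shard, gen) pair from the intervening older generations is recorded for retry.

```python
# Simplified sketch: when sync observes an entry from a future generation,
# record each undone (shard, gen) pair from older generations in an error
# repo so it gets reprocessed and eventually marked done, letting sync
# advance past the old generation instead of waiting on missing entries.

def record_undone_older_gens(current_gen: int, observed_gen: int,
                             num_shards: int, done: set,
                             error_repo: set) -> None:
    if observed_gen <= current_gen:
        return  # nothing from a future generation seen; no action needed
    for gen in range(current_gen, observed_gen):
        for shard in range(num_shards):
            if (shard, gen) not in done:
                error_repo.add((shard, gen))  # schedule for retry/completion
```

Once every shard of a generation is marked done, sync can advance to the next generation even though some of that generation's log entries were never delivered.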
adamemerson (Contributor) commented:
@cbodley We think this has all the things from 5.3. Do we want to put it through QA and merge it, or do we want to try doing load tests against it?

cbodley (Contributor, Author) commented Jan 13, 2023

@adamemerson i'd like to see it qa'd and merged. we can continue testing on the main branch

cbodley (Contributor, Author) commented Jan 19, 2023

we'll need to clean up the multisite functional tests. i won't block this merge due to the multisite failures, but i'd really like the tests to be green for the reef release. that way we can actually validate future multisite changes and their reef backports

adamemerson self-requested a review on January 19, 2023 17:24
adamemerson merged commit e0f68a1 into main on Jan 19, 2023
adamemerson deleted the wip-rgw-multisite-reshard-reef branch on January 19, 2023 17:26
badone mentioned this pull request on Jan 20, 2023