test/rgw: test_bucket_reshard verifies that ACLs are preserved by cbodley · Pull Request #44643 · ceph/ceph

cbodley · 2022-01-18T14:27:42Z

extends the bucket reshard test cases to also verify that ACL grants are unchanged. the test currently fails against the wip-rgw-multisite-reshard branch
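The kind of check described above can be sketched as a comparison of the ACL read before and after the reshard. This is a minimal illustration, not the test added by this PR: the helper names `normalize_grants` and `acls_match` are hypothetical, and the dict layout (`Owner`, `Grants`, `Grantee`, `Permission`) is assumed to follow the shape of an S3 GetBucketAcl response as returned by boto3. The actual test may use a different client.

```python
def normalize_grants(acl):
    """Reduce an ACL dict to an order-independent set of grants."""
    grants = set()
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        # A grantee is identified by a canonical user ID or, for groups, a URI.
        ident = grantee.get("ID") or grantee.get("URI") or ""
        grants.add((grantee.get("Type", ""), ident, grant.get("Permission", "")))
    return grants


def acls_match(before, after):
    """True when the owner and the set of grants are identical."""
    same_owner = before.get("Owner", {}).get("ID") == after.get("Owner", {}).get("ID")
    return same_owner and normalize_grants(before) == normalize_grants(after)
```

A test would read the ACL, reshard the bucket, read the ACL again, and assert `acls_match(before, after)`. Comparing sets rather than lists keeps the check independent of the order in which grants are returned.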



RGWShardCollectCR was hard-coded to ignore ENOENT errors and print a 'failed to fetch log status' error message. this moves that logic into a handle_result() virtual function. it also exposes the member variables 'status' and 'max_concurrent' as protected, so they can be consulted or modified by overrides of handle_result() and spawn_next() Signed-off-by: Casey Bodley <cbodley@redhat.com>
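The refactoring described above moves a hard-coded error policy into an overridable hook. A rough Python analogue of that pattern is sketched here; the class names `ShardCollect` and `IgnoreMissing` and the `collect()` method are hypothetical stand-ins, and the real class is a C++ coroutine whose details are not reproduced.

```python
import errno


class ShardCollect:
    """Gathers one result per shard and lets subclasses judge each result."""

    def __init__(self):
        # First error seen, or 0 if every result was accepted.
        # Kept accessible so that overrides can consult or modify it.
        self.status = 0

    def handle_result(self, r):
        # Default policy in this sketch: pass every result through unchanged.
        return r

    def collect(self, results):
        for r in results:
            r = self.handle_result(r)
            if r < 0 and self.status == 0:
                self.status = r
        return self.status


class IgnoreMissing(ShardCollect):
    def handle_result(self, r):
        # Treat "no such object" as success, like the old hard-coded behavior.
        return 0 if r == -errno.ENOENT else r
```

With the policy in a hook, a caller that needs different error handling subclasses the collector instead of changing the shared loop.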


renamed ListBucketShardCR to ListRemoteBucketCR and removed the shard-id parameter
renamed BucketFullSyncShardMarkerTrack to BucketFullSyncMarkerTrack, which now updates the bucket-level rgw_bucket_sync_status
renamed BucketShardFullSyncCR to BucketFullSyncCR
BucketSyncCR now takes a bucket-wide lease during full sync
Signed-off-by: Casey Bodley <cbodley@redhat.com>


if the full sync status object is missing, it's possible that we just haven't started syncing it again after upgrading from just the per-shard status objects. in this case, as long as we have a log generation 0, assume that we just haven't initialized the full status object and try to read the gen=0 per-shard incremental status for comparison
Signed-off-by: Casey Bodley <cbodley@redhat.com>


RGWDataSyncSingleEntryCR is the only caller of RGWRunBucketSourcesSyncCR. it always provides a source_bs, and never provides a target_bs. so remove all the complexity related to target_bs, and the idea that we'd need to sync several source bucket shards related to the target bucket. we now just have the single loop over the target buckets that use the given bucket as a source
Signed-off-by: Casey Bodley <cbodley@redhat.com>


The new bucket layout code didn't check whether the bucket is indexless prior to asking for the last entry in the layout log. The layout log appears to be empty for an indexless bucket, thereby putting the runtime in an undefined state that later may cause a failed assertion. This commit adds two safety checks and returns -EINVAL along with putting useful information on stderr when either stats are requested on an indexless bucket or when the layout log is empty. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

github-actions · 2022-01-20T17:42:35Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

…m bi

As one of the steps in `radosgw-admin bucket check --fix ...` it looks for bucket index entries for incomplete multipart uploads that do not have a corresponding ".meta" entry in the same bucket index. It then intends to delete those entries; however, the function that it calls to perform the bucket index deletions was flawed and did not direct the removals to the appropriate shard(s), but instead to a nonexistent oid. This commit determines the appropriate shard for each of the entries to be removed and asynchronously issues "dir suggest changes" to each of the shards to remove the entries.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
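The fix described above amounts to routing each removal to the shard that owns the entry, then issuing one batch per shard. A small sketch of that grouping step follows. The functions `shard_of` and `group_removals_by_shard` are hypothetical, and the CRC32-based hash is only a stand-in for illustration; it is not the hashing scheme RGW uses to place keys in bucket index shards.

```python
import zlib
from collections import defaultdict


def shard_of(key, num_shards):
    # Stand-in hash for illustration only; not RGW's real key-to-shard mapping.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_removals_by_shard(keys, num_shards):
    """Map each shard id to the list of index entries it should remove."""
    by_shard = defaultdict(list)
    for key in keys:
        by_shard[shard_of(key, num_shards)].append(key)
    return dict(by_shard)
```

Each resulting batch can then be sent to its own shard object, which is what allows the removals to be issued asynchronously and in parallel.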


github-actions · 2022-02-02T03:15:14Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

cbodley added rgw tests labels Jan 18, 2022

cbodley mentioned this pull request Jan 18, 2022

rgw multisite: bucket reshard work in progress #39002

Merged

31 tasks

dang approved these changes Jan 19, 2022


cbodley and others added 26 commits January 20, 2022 11:11

rgw: RGWSimpleRadosReadCR copies out objv_tracker

751a37d

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: remove unused RGWRunBucketsSyncBySourceCR

b8d16a2

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: rename to rgw_read_bucket_inc_sync_status

615c887

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: add data structures for bucket sync status

817ac42

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: rename to inc_status_oid

67029ef

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: use const for string constants in rgw_data_sync.cc

1347e8f

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: add full_status_oid() for buckets

7c9d2bc

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: rename to RGWSyncBucketShardCR

60f08c0

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: add sync_bucket_shard_cr() factory function

ee41898

allows other code to spawn this coroutine without having the class definition Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: add exclusive flag to RGWSimpleRadosWriteCR

01b5b4b

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: system objects can set exclusive on set_attrs()

d7764b1

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: RGWSimpleRadosWriteAttrsCR supports exclusive create

c1255b5

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: add InitBucketFullSyncStatusCR

9c99cbc

a coroutine to initialize a bucket for full sync using a new bucket-wide sync status object Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: split SyncBucket from SyncBucketShard

3b69b5f

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: InitBucketShardSyncStatus always sets state to Incremental

f81eb91

full sync happens at the bucket level, so the shards will always start in StateIncrementalSync Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: add rgw_read_bucket_full_sync_status()

3cf5380

Signed-off-by: Casey Bodley <cbodley@redhat.com>

radosgw-admin: 'bucket sync status' displays new full sync status

fb40a06

Signed-off-by: Casey Bodley <cbodley@redhat.com>

radosgw-admin: 'bucket sync checkpoint' waits for full sync

2e8c937

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: fix up BucketShardIncrementalSync log message

0b7a46e

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: RGWSyncBucketCR holds lease over Init state too

4a7a93d

Signed-off-by: Casey Bodley <cbodley@redhat.com>

qa/rgw: add missing meta checkpoint to test_multipart_object_sync

bee4cc6

if metadata sync hasn't finished, the 'bucket checkpoint' commands may not find its bucket info Signed-off-by: Casey Bodley <cbodley@redhat.com>

qa/rgw: disable multisite tests for 'bucket sync disable'

1dd34e3

Signed-off-by: Casey Bodley <cbodley@redhat.com>

qa/rgw: rgw_multisite_tests task loads default plugins

25e2219

the ability to filter tests by attribute is provided by the nose.plugins.attrib plugin, which wasn't being loaded by default Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: disable backoff on data sync error_retry_time

b11fbc6

this backoff is triggered often by the per-bucket lease for full sync, and causes tests to fail with checkpoint timeouts Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley and others added 16 commits January 20, 2022 11:12

rgw: resize status vector before reading inc_sync_status

5933d74

rgw_read_bucket_inc_sync_status() uses the size of this vector as the 'num_shards', so we need to resize it appropriately beforehand Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: rgw_read_bucket_inc_sync_status doesn't need bucket info

0905130

all we need to construct the per-shard bucket sync status object names are the bucket names themselves, which we already have from rgw_sync_bucket_pipe Signed-off-by: Casey Bodley <cbodley@redhat.com>

radosgw-admin: bucket sync status guards against shard count mismatch

44e6cf8

if the remote gives us more shards than we expect, just count those shards as 'behind' and avoid out-of-bounds access of shard_status Signed-off-by: Casey Bodley <cbodley@redhat.com>
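The guard described above can be illustrated with a short sketch. The function `count_shards_behind` and its list-of-markers inputs are hypothetical simplifications; the markers here are just comparable values and do not reflect the real bilog marker format.

```python
def count_shards_behind(local_markers, remote_markers):
    """Count remote shards that the local sync status has not caught up with."""
    behind = 0
    for shard_id, remote in enumerate(remote_markers):
        if shard_id >= len(local_markers):
            # The remote reported more shards than we track: count the shard
            # as behind instead of indexing past the end of the local list.
            behind += 1
        elif local_markers[shard_id] < remote:
            behind += 1
    return behind
```

Checking the index before every access means a shard count mismatch shows up as extra shards behind rather than as an out-of-bounds read.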

rgw: remove per-shard sync status object after incremental sync finishes

38a6671

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com> Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: prevent reshard from creating too many log generations

82b451c

Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: enable RGWReshard thread on any zone that supports it

0c3e33b

enable the background dynamic resharding thread based on RGWSI_Zone::can_reshard(), which takes the zonegroup features into account Fixes: https://tracker.ceph.com/issues/52877 Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: fix for uninitialized oldest_gen/latest_gen

3735992

when data sync queries RGWOp_BILog_Info from an un-upgraded gateway, it doesn't include the oldest_gen/latest_gen fields. so initialize these variables to 0 by default Signed-off-by: Casey Bodley <cbodley@redhat.com>
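The same defaulting idea, sketched in Python for a JSON reply: fields that an older gateway omits fall back to 0 instead of being left unset. The function name `parse_bilog_info` and the assumption that the reply is a flat JSON object are illustrative only; just the two field names come from the commit message.

```python
import json


def parse_bilog_info(body):
    """Read the generation fields from a reply, defaulting missing ones to 0."""
    info = json.loads(body)
    return {
        # An un-upgraded gateway does not send these fields at all.
        "oldest_gen": info.get("oldest_gen", 0),
        "latest_gen": info.get("latest_gen", 0),
    }
```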

radosgw-admin: allow reshard commands in multisite on secondary

977cd07

Signed-off-by: Casey Bodley <cbodley@redhat.com>

Revert "rgw: bucket sync: track progress by stack id"

7947307

This reverts commit c0baf3e.
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Conflicts: src/rgw/rgw_data_sync.cc no longer loops over num_shards

Revert "rgw: cr: add prealloc_stack()"

fed7c39

This reverts commit 7970f35. Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw/multisite: handle shard_progress correctly in RunBucketSources

6034078

we run bucket sync on each of the sync pipes, so size the vector accordingly Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: preserve 'bucket sync disable' over reshard

81db61c

if bucket sync is disabled, apply that flag to new index objects on bucket reshard Signed-off-by: Casey Bodley <cbodley@redhat.com>

rgw: fix reshard cancelling race condition

46217c1

this happens when resharding while objects are being uploaded. test steps are here: https://gist.github.com/yuvalif/060f66f03511bff881e952287df3087b
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>

adamemerson force-pushed the wip-rgw-multisite-reshard branch from 9ac3812 to b930a6e Compare January 20, 2022 17:21

cbodley requested a review from a team as a code owner January 20, 2022 17:21

github-actions bot added the needs-rebase label Jan 20, 2022

ivancich and others added 3 commits January 24, 2022 12:16

rgw/multisite: add type to RGW_OP_SYNC_DATALOG_NOTIFY2

8e192db

without that, the following errors happen during sync:
ERROR: AWS4 completion for operation: 0, NOT IMPLEMENTED
op->ERRORHANDLER: err_no=-2201 new_err_no=-2201
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>

test/rgw: test_bucket_reshard verifies that ACLs are preserved

d31cb8c

Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley force-pushed the wip-rgw-multisite-reshard-test-acls branch from 8ba1cb1 to d31cb8c Compare January 24, 2022 19:31

github-actions bot removed the needs-rebase label Jan 24, 2022

cbodley mentioned this pull request Jan 26, 2022

rgw: maintain bucket instance xattrs during successful & cancelled reshard #44788

Closed

14 tasks

adamemerson force-pushed the wip-rgw-multisite-reshard branch from b6342d0 to 8800bf7 Compare February 2, 2022 00:10

github-actions bot added the needs-rebase label Feb 2, 2022

cbodley closed this Feb 2, 2022


test/rgw: test_bucket_reshard verifies that ACLs are preserved #44643
cbodley wants to merge 141 commits into ceph:wip-rgw-multisite-reshard from cbodley:wip-rgw-multisite-reshard-test-acls


5 participants
