test/rgw: test_bucket_reshard verifies that ACLs are preserved #44643

Closed
cbodley wants to merge 141 commits into ceph:wip-rgw-multisite-reshard from cbodley:wip-rgw-multisite-reshard-test-acls

cbodley commented Jan 18, 2022

extends the bucket reshard test cases to also verify that ACL grants are unchanged after resharding. the test currently fails against the wip-rgw-multisite-reshard branch
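
A minimal sketch of the kind of check described above, assuming each grant can be reduced to a (grantee, permission) pair. The helper names and the dict layout are hypothetical and are not taken from the actual test code.

```python
def normalize_grants(grants):
    """Reduce a list of grant dicts to a sorted, order-independent form."""
    return sorted((g["grantee"], g["permission"]) for g in grants)


def assert_acls_preserved(before, after):
    """Raise AssertionError if the grants differ across a reshard.

    'before' and 'after' are hypothetical lists of grant dicts captured
    before and after the bucket is resharded.
    """
    old, new = normalize_grants(before), normalize_grants(after)
    assert old == new, f"ACL grants changed by reshard: {old} != {new}"
```

Comparing a normalized form keeps the check independent of the order in which grants are returned.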


cbodley and others added 26 commits January 20, 2022 11:11
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
allows other code to spawn this coroutine without having the class
definition

Signed-off-by: Casey Bodley <cbodley@redhat.com>
RGWShardCollectCR was hard-coded to ignore ENOENT errors and print a
'failed to fetch log status' error message. this moves that logic into a
handle_result() virtual function. it also exposes the member variables
'status' and 'max_concurrent' as protected, so they can be consulted or
modified by overrides of handle_result() and spawn_next()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
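
The refactoring in the commit message above can be pictured roughly as follows. This is a Python sketch of the pattern only, not the C++ class: `ShardCollect`, `LogStatusCollect` and `collect()` are hypothetical names, and only `handle_result()`, `status` and `max_concurrent` come from the commit message.

```python
import errno


class ShardCollect:
    """Hypothetical stand-in for a coroutine that collects per-shard results."""

    def __init__(self, max_concurrent):
        # Both members are visible to subclasses, mirroring 'protected'.
        self.max_concurrent = max_concurrent
        self.status = 0  # first error that was not filtered out

    def handle_result(self, r):
        """Hook: return 0 to ignore a result, or a negative errno to record it."""
        return r

    def collect(self, results):
        for r in results:
            r = self.handle_result(r)
            if r < 0 and self.status == 0:
                self.status = r
        return self.status


class LogStatusCollect(ShardCollect):
    """Keeps the previously hard-coded behavior as an override."""

    def handle_result(self, r):
        if r == -errno.ENOENT:
            return 0  # a missing status object is not an error here
        if r < 0:
            print("failed to fetch log status:", r)
        return r
```

With the policy in an overridable hook, other collectors can choose which errors to tolerate without copying the collection loop.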
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
a coroutine to initialize a bucket for full sync using a new bucket-wide
sync status object

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
full sync happens at the bucket level, so the shards will always start
in StateIncrementalSync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
renamed ListBucketShardCR to ListRemoteBucketCR and removed the shard-id
parameter

renamed BucketFullSyncShardMarkerTrack to BucketFullSyncMarkerTrack,
which now updates the bucket-level rgw_bucket_sync_status

renamed BucketShardFullSyncCR to BucketFullSyncCR

BucketSyncCR now takes a bucket-wide lease during full sync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
if metadata sync hasn't finished, the 'bucket checkpoint' commands may
not find its bucket info

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
the ability to filter tests by attribute is provided by the
nose.plugins.attrib plugin, which wasn't being loaded by default

Signed-off-by: Casey Bodley <cbodley@redhat.com>
this backoff is triggered often by the per-bucket lease for full sync,
and causes tests to fail with checkpoint timeouts

Signed-off-by: Casey Bodley <cbodley@redhat.com>
cbodley and others added 16 commits January 20, 2022 11:12
rgw_read_bucket_inc_sync_status() uses the size of this vector as the
'num_shards', so we need to resize it appropriately beforehand

Signed-off-by: Casey Bodley <cbodley@redhat.com>
all we need to construct the per-shard bucket sync status object names
are the bucket names themselves, which we already have from
rgw_sync_bucket_pipe

Signed-off-by: Casey Bodley <cbodley@redhat.com>
if the full sync status object is missing, it's possible that we just
haven't started syncing it again after upgrading from just the per-shard
status objects

in this case, as long as we have a log generation 0, assume that we just
haven't initialized the full status object and try to read the gen=0
per-shard incremental status for comparison

Signed-off-by: Casey Bodley <cbodley@redhat.com>
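
The fallback described in the commit message above amounts to a small decision, sketched here under assumptions. `pick_status_source` and its return strings are hypothetical and only illustrate the order of the checks.

```python
def pick_status_source(full_status_exists, log_generations):
    """Decide which sync status to compare against (hypothetical helper).

    full_status_exists: whether the bucket-wide full sync status object was read
    log_generations: iterable of log generation numbers known for the bucket
    """
    if full_status_exists:
        return "full"
    if 0 in log_generations:
        # Probably upgraded from per-shard status objects only: fall back
        # to the gen=0 per-shard incremental status.
        return "gen0-per-shard"
    return "none"
```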
if the remote gives us more shards than we expect, just count those
shards as 'behind' and avoid out-of-bounds access of shard_status

Signed-off-by: Casey Bodley <cbodley@redhat.com>
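
A rough illustration of the bounds handling described in the commit message above. It assumes markers can be modeled as integers that increase over time, which is a simplification; `count_shards_behind` and its arguments are hypothetical names.

```python
def count_shards_behind(remote_markers, shard_status):
    """Count shards whose local position trails the remote one.

    remote_markers: per-shard positions reported by the remote
    shard_status: per-shard positions recorded locally
    """
    behind = 0
    for shard_id, remote_marker in enumerate(remote_markers):
        if shard_id >= len(shard_status):
            # The remote reported a shard we have no status for: count it
            # as behind instead of indexing past the end of shard_status.
            behind += 1
            continue
        if shard_status[shard_id] < remote_marker:
            behind += 1
    return behind
```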
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
enable the background dynamic resharding thread based on
RGWSI_Zone::can_reshard(), which takes the zonegroup features into
account

Fixes: https://tracker.ceph.com/issues/52877

Signed-off-by: Casey Bodley <cbodley@redhat.com>
when data sync queries RGWOp_BILog_Info from an un-upgraded gateway, it
doesn't include the oldest_gen/latest_gen fields. so initialize these
variables to 0 by default

Signed-off-by: Casey Bodley <cbodley@redhat.com>
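
The defaulting described in the commit message above could look like this on the parsing side. This is a sketch that assumes a JSON response; `parse_bilog_info` is a hypothetical helper and the `max_marker` field is included only as an example of a field that an older gateway does send.

```python
import json


def parse_bilog_info(body):
    """Parse a bilog info response, tolerating un-upgraded gateways."""
    info = json.loads(body)
    return {
        "max_marker": info.get("max_marker", ""),
        # Older gateways omit the generation fields, so default them to 0.
        "oldest_gen": info.get("oldest_gen", 0),
        "latest_gen": info.get("latest_gen", 0),
    }
```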
Signed-off-by: Casey Bodley <cbodley@redhat.com>
RGWDataSyncSingleEntryCR is the only caller of RGWRunBucketSourcesSyncCR

it always provides a source_bs, and never provides a target_bs. so remove
all the complexity related to target_bs, and the idea that we'd need to
sync several source bucket shards related to the target bucket

we now just have the single loop over the target buckets that use the
given bucket as a source

Signed-off-by: Casey Bodley <cbodley@redhat.com>
This reverts commit c0baf3e.

Signed-off-by: Casey Bodley <cbodley@redhat.com>

Conflicts:
	src/rgw/rgw_data_sync.cc no longer loops over num_shards
This reverts commit 7970f35.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
we run bucket sync on each of the sync pipes, so size the vector
accordingly

Signed-off-by: Casey Bodley <cbodley@redhat.com>
if bucket sync is disabled, apply that flag to new index objects on
bucket reshard

Signed-off-by: Casey Bodley <cbodley@redhat.com>
this happens when resharding while objects are being uploaded.
test steps are here:
https://gist.github.com/yuvalif/060f66f03511bff881e952287df3087b

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
The new bucket layout code didn't check whether the bucket is
indexless prior to asking for the last entry in the layout log. The
layout log appears to be empty for an indexless bucket, thereby
putting the runtime in an undefined state that later may cause a
failed assertion.

This commit adds two safety checks and returns -EINVAL along with
putting useful information on stderr when either stats are requested
on an indexless bucket or when the layout log is empty.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
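
The two safety checks described in the commit message above follow a common guard pattern, sketched here. The function name, arguments and error messages are hypothetical; only the -EINVAL return and the message on stderr come from the commit message.

```python
import errno
import sys


def latest_layout_log_entry(bucket_name, indexless, layout_logs):
    """Return (0, entry) for the last layout log entry, or (-EINVAL, None)."""
    if indexless:
        print(f"ERROR: bucket {bucket_name} is indexless, stats are unavailable",
              file=sys.stderr)
        return -errno.EINVAL, None
    if not layout_logs:
        print(f"ERROR: bucket {bucket_name} has an empty layout log",
              file=sys.stderr)
        return -errno.EINVAL, None
    return 0, layout_logs[-1]
```

Checking both conditions before touching the last entry means the caller gets an error code instead of reading from an empty container.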
adamemerson force-pushed the wip-rgw-multisite-reshard branch from 9ac3812 to b930a6e January 20, 2022 17:21
cbodley requested a review from a team as a code owner January 20, 2022 17:21
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

ivancich and others added 3 commits January 24, 2022 12:16
…m bi

As one of the steps in `radosgw-admin bucket check --fix ...` it looks
for bucket index entries for incomplete multipart uploads that do not
have a corresponding ".meta" entry in the same bucket index. It then
intends to delete those entries. However, the function that it calls
to perform the bucket index deletions was flawed: it did not direct
the removals to the appropriate shard(s), but instead to a nonexistent
oid.

This commit determines the appropriate shard for each of the entries
to be removed and asynchronously issues "dir suggest changes" to each
of the shards to remove the entries.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
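
The per-shard routing described in the commit message above can be sketched as a grouping step. The hash below is a stand-in chosen for illustration (RGW uses its own hashing to map keys to index shards), and both function names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for_key(key, num_shards):
    """Hypothetical stand-in for the key-to-shard mapping."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_removals_by_shard(keys, num_shards):
    """Group index entries to remove by the shard that holds them."""
    by_shard = defaultdict(list)
    for key in keys:
        by_shard[shard_for_key(key, num_shards)].append(key)
    return dict(by_shard)
```

Each group can then be sent to its own shard, rather than sending every removal to a single object name.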
without that the following errors are happening during sync:

ERROR: AWS4 completion for operation: 0, NOT IMPLEMENTED
op->ERRORHANDLER: err_no=-2201 new_err_no=-2201

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>

github-actions bot commented Feb 2, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley cbodley closed this Feb 2, 2022

5 participants