src/test: Test case for bilog trim across reshard event#44758

Closed
TRYTOBE8TME wants to merge 150 commits into ceph:wip-rgw-multisite-reshard from TRYTOBE8TME:wip-rgw-bilog-tests-add
Conversation

@TRYTOBE8TME
Resharding a bucket and then performing the bilog trim

Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

cbodley and others added 27 commits January 31, 2022 14:07
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
allows other code to spawn this coroutine without having the class
definition

Signed-off-by: Casey Bodley <cbodley@redhat.com>
RGWShardCollectCR was hard-coded to ignore ENOENT errors and print a
'failed to fetch log status' error message. this moves that logic into a
handle_result() virtual function. it also exposes the member variables
'status' and 'max_concurrent' as protected, so they can be consulted or
modified by overrides of handle_result() and spawn_next()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
a coroutine to initialize a bucket for full sync using a new bucket-wide
sync status object

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
full sync happens at the bucket level, so the shards will always start
in StateIncrementalSync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
renamed ListBucketShardCR to ListRemoteBucketCR and removed the shard-id
parameter

renamed BucketFullSyncShardMarkerTrack to BucketFullSyncMarkerTrack,
which now updates the bucket-level rgw_bucket_sync_status

renamed BucketShardFullSyncCR to BucketFullSyncCR

BucketSyncCR now takes a bucket-wide lease during full sync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
if metadata sync hasn't finished, the 'bucket checkpoint' commands may
not find the bucket info

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
the ability to filter tests by attribute is provided by the
nose.plugins.attrib plugin, which wasn't being loaded by default

Signed-off-by: Casey Bodley <cbodley@redhat.com>
this backoff is triggered often by the per-bucket lease for full sync,
and causes tests to fail with checkpoint timeouts

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
ivancich and others added 7 commits February 1, 2022 18:50
Determining whether a bucket is indexless starting with an
RGWBucketInfo object requires traversing multiple data structures and
"inside knowledge" blurring the line between interface and
implementation. The same applies for retrieving the current index for
non-indexless buckets.

This commit adds to the RGWBucketInfo interface to make this
information readily accessible.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
The code for bucket stats was recently updated to check for an
indexless bucket before proceeding. The interface on RGWBucketInfo was
recently expanded to support these types of checks, so it is now used.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
The "bucket radoslist" sub-command of radosgw-admin is supposed to
list all rados objects tied to one or all directories and thereby
provide a way to determine orphaned rados objects.

But indexless buckets don't provide an index to employ for this
purpose. So warnings or errors should be provided depending on the
circumstances.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
With the new resharding code, some bucket metadata that is stored as
xattrs (e.g., ACLs, life-cycle policies) was not sent with the
updated bucket instance data when resharding completed. As a result,
resharding has a regression where that metadata is lost after a
successful reshard.

This commit restores the variable in the RGWBucketReshard class that
maintains the bucket attributes, so they can be saved when the bucket
instance object is updated.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
There appears to be a long-standing bug in RGW such that when
resharding is cancelled and the bucket instance is updated to reflect
the new resharding status, the xattrs were lost. The xattrs are used
to store metadata such as ACLs and LifeCycle policies.

This commit makes sure that all call paths that lead to a cancelled
reshard provide the xattrs, so they can be included when the bucket
instance info is updated.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
use an API that does not check for cache inconsistency;
hence, the "WARNING: The bucket info cache is inconsistent" warning is removed from reshard

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
@github-actions
github-actions bot commented Feb 2, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Resharding a bucket and then performing the bilog trim

Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
@cbodley
Contributor

cbodley commented Feb 8, 2022

i think that will require radosgw-admin metadata get on the bucket instance metadata, and inspecting its array of layout.logs

i added a new radosgw-admin bucket layout --bucket=name command in #44947 to simplify this part
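For reference, inspecting the array of layout.logs from the bucket instance metadata could be sketched like this in Python; the `data.layout.logs` JSON path is an assumption based on this discussion, so verify it against the actual `radosgw-admin metadata get` output:

```python
import json

def layout_logs(metadata_json):
    """Extract the list of bilog generations from the output of
    'radosgw-admin metadata get bucket.instance:<bucket>:<id>'.
    The 'data.layout.logs' path is assumed; check it against your build."""
    md = json.loads(metadata_json)
    return md['data']['layout']['logs']

# e.g. after two reshards one would expect three log generations:
sample = '{"data": {"layout": {"logs": [{"gen": 0}, {"gen": 1}, {"gen": 2}]}}}'
assert len(layout_logs(sample)) == 3
```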

@adamemerson adamemerson force-pushed the wip-rgw-multisite-reshard branch from d0f01cf to fe5ea5e Compare February 9, 2022 16:35
Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
@TRYTOBE8TME
Author

trim also has new logic to delete the old log generations entirely, so i'd like to verify that part too. i think that will require radosgw-admin metadata get on the bucket instance metadata, and inspecting its array of layout.logs

@cbodley Can you please elaborate on this part, or maybe give a hint on how this can be done?

@cbodley
Contributor

cbodley commented Feb 14, 2022

hey @TRYTOBE8TME, i'd start by creating a bucket with some objects in it, then playing around with radosgw-admin bucket layout, and see how it changes after you run radosgw-admin bucket reshard. each time you reshard, you should see an extra entry in the logs, until you hit the maximum number of logs at 4

then, once the other zone is all caught up with sync, you can run radosgw-admin bilog autotrim and see how that changes the output of radosgw-admin bucket layout. each time bilog autotrim runs, you should see one less entry in the list of logs, until there's only one entry left

this is exactly what we want to write a test for; for example, do 2 reshards and verify that bucket layout shows 3 logs. then do a bucket checkpoint to wait for sync to catch up. then bilog autotrim and verify that bucket layout removes 1 of the logs, then that another bilog autotrim removes another. finally, verify that bilog autotrim doesn't remove the last log
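The expected behavior described above can be sketched as a toy Python model (not Ceph code; the cap of 4 log generations is taken from the comment above, and the helper names are hypothetical):

```python
MAX_LOGS = 4  # maximum number of log generations, per the comment above


def reshard(logs):
    """Each reshard adds a new log generation, up to the cap."""
    if len(logs) < MAX_LOGS:
        return logs + [logs[-1] + 1]
    return logs


def bilog_autotrim(logs):
    """Each autotrim pass deletes the oldest generation,
    but never removes the last remaining log."""
    if len(logs) > 1:
        return logs[1:]
    return logs


# encode the test plan: 2 reshards -> 3 logs, checkpoint,
# then trim repeatedly until only the last log remains
logs = [0]
logs = reshard(reshard(logs))
assert len(logs) == 3
logs = bilog_autotrim(logs)
assert len(logs) == 2
logs = bilog_autotrim(logs)
assert len(logs) == 1
logs = bilog_autotrim(logs)  # the last log is never removed
assert logs == [2]
```

An actual test would drive the same sequence through radosgw-admin and check the log count via radosgw-admin bucket layout after each step.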

@adamemerson
Contributor

@TRYTOBE8TME Before you get on to anything else, can you rebase this?

@TRYTOBE8TME
Author

TRYTOBE8TME commented Feb 23, 2022

@TRYTOBE8TME Before you get on to anything else, can you rebase this?

@adamemerson sorry, I've created a new alias PR for this, #45053, and will be closing this one

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley cbodley closed this Apr 14, 2022
7 participants