Skip to content

[WIP] rgw: add generation number to async notification#39477

Closed
smanjara wants to merge 83 commits intoceph:wip-rgw-multisite-reshardfrom
smanjara:wip-rgw-multisite-reshard-async
Closed

[WIP] rgw: add generation number to async notification#39477
smanjara wants to merge 83 commits intoceph:wip-rgw-multisite-reshardfrom
smanjara:wip-rgw-multisite-reshard-async

Conversation

@smanjara
Copy link
Contributor

Signed-off-by: Shilpa Jagannath smanjara@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

cbodley and others added 30 commits February 4, 2021 16:11
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
allows other code to spawn this coroutine without having the class
definition

Signed-off-by: Casey Bodley <cbodley@redhat.com>
RGWShardCollectCR was hard-coded to ignore ENOENT errors and print a
'failed to fetch log status' error message. this moves that logic into a
handle_result() virtual function. it also exposes the member variables
'status' and 'max_concurrent' as protected, so they can be consulted or
modified by overrides of handle_result() and spawn_next()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
a coroutine to initialize a bucket for full sync using a new bucket-wide
sync status object

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
full sync happens as the bucket level, so the shards will always start
in StateIncrementalSync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
renamed ListBucketShardCR to ListRemoteBucketCR and removed the shard-id
parameter

renamed BucketFullSyncShardMarkerTrack to BucketFullSyncMarkerTrack,
which now updates the bucket-level rgw_bucket_sync_status

renamed BucketShardFullSyncCR to BucketFullSyncCR

BucketSyncCR now takes a bucket-wide lease during full sync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
if metadata sync hasn't finished, the 'bucket checkpoint' commands may
not find its bucket info

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
the ability to filter tests by attribute is provided by the
nose.plugins.attrib plugin, which wasn't being loaded by default

Signed-off-by: Casey Bodley <cbodley@redhat.com>
this backoff is triggered often by the per-bucket lease for full sync,
and causes tests to fail with checkpoint timeouts

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
adds a backward-compatible binary encoding for error repo keys that can
contain a generation number along with the bucket and shard

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley
Copy link
Contributor

cbodley commented Apr 15, 2021

sorry, i had review comments pending but didn't submit the review!

@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch 2 times, most recently from 12a932b to b90bc8d Compare April 21, 2021 08:15
@mattbenjamin
Copy link
Contributor

@smanjara @cbodley what's the current status on this one?

@smanjara
Copy link
Contributor Author

@smanjara @cbodley what's the current status on this one?

Have trouble building. Should be fixed shortly.

if the old index is still referenced by an InIndex log layout, we can't
call clean_index() to remove the index objects yet. log trimming will do
that later, once the bilogs are no longer needed

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@smanjara
Copy link
Contributor Author

smanjara commented May 3, 2021

@smanjara @cbodley what's the current status on this one?

Have trouble building. Should be fixed shortly.

This is done now. Need review.

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a lot of code duplicated between RGWOp_DATALog_Notify and Notify2 that could use some cleanup, but that doesn't need to happen now

have you been able to test this? you should be able to run the backward compat logic of RGWDataPostNotifyCR by commenting out support for notify2, like:

//    } else if (s->info.args.exists("notify2")) {
//      return new RGWOp_DATALog_Notify2;

@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch from 971ed53 to 998d434 Compare May 5, 2021 14:03
@smanjara
Copy link
Contributor Author

smanjara commented May 5, 2021

there's a lot of code duplicated between RGWOp_DATALog_Notify and Notify2 that could use some cleanup, but that doesn't need to happen now

have you been able to test this? you should be able to run the backward compat logic of RGWDataPostNotifyCR by commenting out support for notify2, like:

//    } else if (s->info.args.exists("notify2")) {
//      return new RGWOp_DATALog_Notify2;

will test and update.

@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch from 998d434 to f4e6bf3 Compare May 6, 2021 07:51
@smanjara
Copy link
Contributor Author

smanjara commented May 7, 2021

there's a lot of code duplicated between RGWOp_DATALog_Notify and Notify2 that could use some cleanup, but that doesn't need to happen now
have you been able to test this? you should be able to run the backward compat logic of RGWDataPostNotifyCR by commenting out support for notify2, like:

//    } else if (s->info.args.exists("notify2")) {
//      return new RGWOp_DATALog_Notify2;

will test and update.

Tested locally and it seems to work as expected. Some logs from RGWOp_DATALog_Notify::execute() https://gist.github.com/smanjara/098a10872dd95ddc3423746e9a65e9e2

@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch from f4e6bf3 to 0ef4de4 Compare May 7, 2021 06:36
@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch from 0ef4de4 to 7d2fff2 Compare May 10, 2021 05:51
@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch from 7d2fff2 to 5e4f514 Compare May 11, 2021 09:38
@adamemerson adamemerson force-pushed the wip-rgw-multisite-reshard branch from 36c2959 to 95355ab Compare May 12, 2021 16:05
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

cbodley and others added 2 commits May 17, 2021 13:22
this adds wrapper structs rgw_data_notify_v1_encoder and
rgw_data_notify_v1_decoder that can encode/decode the v1 json format
directly on the v2 data structure

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
@smanjara smanjara force-pushed the wip-rgw-multisite-reshard-async branch from 5e4f514 to fd82752 Compare May 17, 2021 07:53
@github-actions github-actions bot added the tests label May 17, 2021
@smanjara
Copy link
Contributor Author

@cbodley I think this is ready to be merged now.

@cbodley
Copy link
Contributor

cbodley commented May 17, 2021

thanks @smanjara! i rebased and merged into wip-rgw-multisite-reshard

@cbodley cbodley closed this May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants