
rgw/multisite-notification: Replicate Topic and Bucket notification in multisite config #54868

Closed
kchheda3 wants to merge 6 commits into ceph:main from kchheda3:wip-topic-repl

Conversation

@kchheda3
Contributor

@kchheda3 kchheda3 commented Dec 11, 2023

Implement a new notification mechanism that supports multisite replication.
For the detailed design of the new system and the migration plan, check the following doc.

Topics are stored as individual RADOS objects and have a buckets_and_notification_id stored in them, while bucket notifications are stored as bucket attrs with the topic information embedded. Both are synced in a multisite environment as part of metadata sync.
Also add a new feature that stores the mapping between topics and buckets, which helps users see, for a given topic, all the buckets subscribed to it.
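As an illustration only (the types below are hypothetical, not the actual RGW structs), the v2 layout described above can be sketched as:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of the v2 layout: each topic is its own RADOS
// object, and a bucket's notifications live in a bucket attribute that
// references topics by name.
struct TopicInfo {
  std::string name;
  std::string push_endpoint;                    // notification destination
  std::vector<std::string> subscribed_buckets;  // topic -> buckets mapping
};

struct BucketNotification {
  std::string id;
  std::string topic_name;  // reference to a per-topic RADOS object
  std::vector<std::string> event_types;
};

// stored as a bucket attr and replicated via metadata sync
using BucketNotificationAttr = std::map<std::string, BucketNotification>;
```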

Fixes: https://tracker.ceph.com/issues/50078

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@kchheda3 kchheda3 marked this pull request as ready for review December 11, 2023 21:22
@kchheda3 kchheda3 requested a review from a team as a code owner December 11, 2023 21:22
@kchheda3 kchheda3 force-pushed the wip-topic-repl branch 3 times, most recently from bb1305d to 4691032 Compare December 12, 2023 01:12
@yuvalif
Contributor

yuvalif commented Dec 12, 2023

please split the data model redesign work from the multisite syncing work.
ideally this is done in a separate PR - so we can add the migration work on top of that

@kchheda3

This comment was marked as resolved.

@cbodley
Contributor

cbodley commented Dec 12, 2023

i tend to agree with @kchheda3 that we shouldn't try to split the data model from its interactions with metadata sync. the metadata handler abstraction is critical there, and imposes some requirements on how we represent the data in the first place

i also don't see a need for separate zonegroup features. multisite could automatically try to replicate any topics that are in the new format. the feature would just control whether any radosgws/zones can write metadata in that new format

finally, pointing out that we'll see conflicts between this and #54777, where i try to remove/simplify some of these metadata abstractions. i'm happy to rebase that after we get this tested/merged

@yuvalif
Contributor

yuvalif commented Dec 12, 2023

i also don't see a need for separate zonegroup features. multisite could automatically try to replicate any topics that are in the new format. the feature would just control whether any radosgws/zones can write metadata in that new format

it is the other direction of dependency I'm worried about. we want to change the data model of topics and notifications to a more correct one, and to one that scales better. this should be done, and is unrelated to metadata sync.
the new model should be used in single zone setups and should not be dependent on zone features

@kchheda3

This comment was marked as resolved.

@kchheda3

This comment was marked as resolved.

@kchheda3
Contributor Author

happy to rebase that after we get this tested/merged

Thanks @cbodley, I see a lot of rebasing and merging due to #54777, but would appreciate it if you merge that PR once this one is merged.

Contributor

@cbodley cbodley left a comment


several minor comments but looking great overall. the metadata handler/backend stuff looks good, and the topic service is consistent with other examples 👍

still want to revisit the need to track buckets per topic

@cbodley
Contributor

cbodley commented Jan 19, 2024

regarding the fallback logic to read v1 if v2 fails, that logic should apply when the notification-version value is migration. If the value is either v1 or v2, then we should continue reading accordingly, and when the value is migration we could try reading v2 and fall back to v1

this should be how notifications are migrated (lazily after topic migration is done)
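a minimal sketch of that fallback rule, with toy in-memory maps standing in for the real v1/v2 topic backends (all helper names here are made up, not the PR's actual code):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

enum class NotificationVersion { V1, Migration, V2 };

// toy in-memory stores standing in for the v1 and v2 topic backends
std::map<std::string, std::string> v1_topics;
std::map<std::string, std::string> v2_topics;

std::optional<std::string> read_topic_v1(const std::string& name) {
  if (auto it = v1_topics.find(name); it != v1_topics.end()) return it->second;
  return std::nullopt;
}

std::optional<std::string> read_topic_v2(const std::string& name) {
  if (auto it = v2_topics.find(name); it != v2_topics.end()) return it->second;
  return std::nullopt;
}

// v1 and v2 read their own format; only migration tries v2 first and
// then falls back to v1
std::optional<std::string> read_topic(NotificationVersion ver,
                                      const std::string& name) {
  switch (ver) {
    case NotificationVersion::V1: return read_topic_v1(name);
    case NotificationVersion::V2: return read_topic_v2(name);
    case NotificationVersion::Migration:
      if (auto t = read_topic_v2(name)) return t;  // prefer v2
      return read_topic_v1(name);                  // fall back to v1
  }
  return std::nullopt;
}
```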

@yuvalif does the notification migration have to be lazy? it seems like multisite setups would benefit from the migration happening on the primary zone right away, so that events on the secondary zone(s) would start generating notifications

@yuvalif
Contributor

yuvalif commented Jan 21, 2024

regarding the fallback logic to read v1 if v2 fails, that logic should apply when the notification-version value is migration. If the value is either v1 or v2, then we should continue reading accordingly, and when the value is migration we could try reading v2 and fall back to v1

this should be how notifications are migrated (lazily after topic migration is done)

@yuvalif does the notification migration have to be lazy? it seems like multisite setups would benefit from the migration happening on the primary zone right away, so that events on the secondary zone(s) would start generating notifications

this would require going through all buckets, and may take significant time. do we want to block the system during that process?

@yuvalif yuvalif mentioned this pull request Jan 21, 2024
@cbodley
Contributor

cbodley commented Jan 22, 2024

this would require going through all buckets, and may take significant time. do we want to block the system during that process?

@yuvalif we don't have to visit every bucket. we can list only the bucket/topic objects in the log pool that need migration. those have oids corresponding to topics_oid():

std::string RadosBucket::topics_oid() const {
  return pubsub_oid_prefix + get_tenant() + ".bucket." + get_name() + "/" + get_marker();
}

we could either do a global listing of everything under pubsub_oid_prefix, or do a separate per-topic listing with the entire pubsub_oid_prefix + get_tenant() + ".bucket." + get_name() + "/" part as prefix
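the global-listing approach could be sketched like this (the prefix value and the filter are assumptions based on the oids quoted above, not the actual migration code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// assumed prefix; the discussion says v1 objects start with "pubsub.<tenant>"
const std::string pubsub_oid_prefix = "pubsub.";

// mirrors RadosBucket::topics_oid() quoted above, as a free function
std::string topics_oid(const std::string& tenant, const std::string& bucket,
                       const std::string& marker) {
  return pubsub_oid_prefix + tenant + ".bucket." + bucket + "/" + marker;
}

// global listing: keep only the per-bucket notification objects under the
// pubsub prefix (they contain ".bucket." after the tenant name)
std::vector<std::string> filter_migration_oids(
    const std::vector<std::string>& pool_oids) {
  std::vector<std::string> out;
  for (const auto& oid : pool_oids) {
    if (oid.rfind(pubsub_oid_prefix, 0) == 0 &&
        oid.find(".bucket.") != std::string::npos) {
      out.push_back(oid);
    }
  }
  return out;
}
```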

@yuvalif
Contributor

yuvalif commented Jan 22, 2024

this would require going through all buckets, and may take significant time. do we want to block the system during that process?

@yuvalif we don't have to visit every bucket. we can list only the bucket/topic objects in the log pool that need migration. those have oids corresponding to topics_oid():

std::string RadosBucket::topics_oid() const {
  return pubsub_oid_prefix + get_tenant() + ".bucket." + get_name() + "/" + get_marker();
}

we could either do a global listing of everything under pubsub_oid_prefix,

  • this could result in a large number of objects, and we will have to set the bucket attribute for each one of them. this might block the load process for a relatively long time
  • we will have to do this search on every reload (after migration). won't this take noticeable time?

or do a separate per-topic listing with the entire pubsub_oid_prefix + get_tenant() + ".bucket." + get_name() + "/" part as prefix

but this name has the bucket's name in it, so won't you need to go through all buckets for that?

@cbodley
Contributor

cbodley commented Jan 22, 2024

we could either do a global listing of everything under pubsub_oid_prefix,

  • this could result in a large number of objects, and we will have to set the bucket attribute for each one of them. this might block the load process for a relatively long time

  • we will have to do this search on every reload (after migration). won't this take noticeable time?

@yuvalif only if we're in the 'migrating' state where the global v1 topics object exists. and after migrating each bucket, we can delete its v1 bucket-topics object so we won't revisit it again if the migrating radosgw restarts

or do a separate per-topic listing with the entire pubsub_oid_prefix + get_tenant() + ".bucket." + get_name() + "/" part as prefix

but this name has the bucket's name in it, so won't you need to go though all buckets for that?

sorry right, get_name() is the bucket's name, not the topic's. we would have to do the listing globally

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@yuvalif
Contributor

yuvalif commented Jan 25, 2024

summary from discussion in the RGW refactoring meeting: video

@AliMasarweh, @cbodley, @kchheda3 please review.

  • "migrating" state will be derived implicitly from enabling of the topics v2 feature together with the existence of the v1 topics system object
  • migration will be performed on the master zone (based on being in the "migrating" state) during restart and/or realm reload
  • one RGW in that zone would perform the migration by using a cls_lock on the v1 topics object
  • during "migrating" state all topic and notification REST operations will be rejected with "503"
  • RGWs on 2ndary zones will forward any topic/notification requests to the master zone, where they will be rejected
  • sending notifications should not be interrupted by the migration process
  • migration will be performed both on topics and notifications (no lazy migration for notifications - explained in the video above):
    • for each tenant
    • prefix search of objects that start with pubsub.<tenant>...
    • the topics objects will be broken down to the per topic objects of v2
    • the notification objects (those that start with: pubsub.<tenant>.bucket...) will be converted to bucket attributes, and then deleted
    • at the end of the process the v1 topics object should be deleted, moving the system out of the "migrating" state
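the per-tenant steps above could be sketched with toy in-memory stores (every name here is hypothetical; the real migration also runs under a cls_lock on the v1 topics object, on one RGW of the master zone):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// toy stand-ins for the rados objects involved in the migration
struct Stores {
  std::map<std::string, std::vector<std::string>> v1_topics_obj;  // tenant -> topics
  std::map<std::string, std::string> v1_notification_objs;        // oid -> bucket
  std::map<std::string, bool> v2_topic_objs;  // per-topic v2 objects
  std::map<std::string, bool> bucket_attrs;   // bucket -> has notification attr
};

// "migrating" is implicit: topics-v2 enabled plus an existing v1 topics object
bool in_migrating_state(const Stores& s, const std::string& tenant) {
  return s.v1_topics_obj.count(tenant) > 0;
}

void migrate_tenant(Stores& s, const std::string& tenant) {
  // break the v1 topics object into per-topic v2 objects
  for (const auto& topic : s.v1_topics_obj[tenant]) {
    s.v2_topic_objs[tenant + "/" + topic] = true;
  }
  // convert per-bucket notification objects to bucket attrs, then delete them
  for (auto it = s.v1_notification_objs.begin();
       it != s.v1_notification_objs.end();) {
    s.bucket_attrs[it->second] = true;
    it = s.v1_notification_objs.erase(it);
  }
  // deleting the v1 topics object moves the system out of "migrating"
  s.v1_topics_obj.erase(tenant);
}
```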

handling of topic and notification on 2ndary zones will be done via documentation (we may provide a tool that does that post squid):

  • in theory, the same configuration should exist on all zones (otherwise we can have inconsistencies in the system regardless of the migration)
  • if for some reason this is not the case, there should be a documented manual process for moving v1 topics and notifications from the 2ndary zone to the master zone (before enabling v2)
  • eventually, the system objects should be deleted once the migration ends and metadata sync ends. these objects will have to be deleted manually (at the rados level). this should be done for cleanup, and to make sure that if a 2ndary zone becomes master, it does not accidentally go into "migrating" state
  • the RGW that does the migration on the master zone should not try to perform migration of topics and notifications from other zones

@kchheda3
Contributor Author

  • sending notifications should not be interrupted by the migration process

@yuvalif looks good. the only question or elaboration needed is on "sending notifications should not be interrupted by the migration process": we might need additional code that checks if the flag value == migration, then looks up the topic in v2 and, if not found, looks it up in v1?

@yuvalif
Contributor

yuvalif commented Jan 28, 2024

  • sending notifications should not be interrupted by the migration process

@yuvalif looks good. the only question or elaboration needed is on "sending notifications should not be interrupted by the migration process": we might need additional code that checks if the flag value == migration, then looks up the topic in v2 and, if not found, looks it up in v1?

not sure we should check for the "migration" state here.
when we have an s3 op on the bucket, we check for the bucket attribute. if we don't find it, we check for the v1 configuration object. if the object exists we should use it.
the per bucket system object may be there for several reasons:

  • we are in the process of migration and didn't reach this bucket yet
  • we could be on a 2ndary zone, and the bucket attribute was not synced yet
  • we can be on a 2ndary zone and notifications were never configured on this bucket on the primary zone (this is probably undesirable, but would require manual cleanup anyway)

also, on a 2ndary zone, it won't be possible to detect the "migration" state, because it involves checking for the existence of the v1 topics object, which would be automatically deleted only on the primary zone
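that lookup order (bucket attribute first, then the v1 per-bucket object, with no explicit migration-state check) might look roughly like this sketch with toy stores (names are illustrative only):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// toy stores standing in for the real backends
std::map<std::string, std::string> bucket_notif_attrs;    // v2: bucket attribute
std::map<std::string, std::string> v1_bucket_notif_objs;  // v1: per-bucket object

// check the bucket attribute first; fall back to the v1 per-bucket
// configuration object if the attribute is missing
std::optional<std::string> get_bucket_notifications(const std::string& bucket) {
  if (auto it = bucket_notif_attrs.find(bucket);
      it != bucket_notif_attrs.end()) {
    return it->second;
  }
  if (auto it = v1_bucket_notif_objs.find(bucket);
      it != v1_bucket_notif_objs.end()) {
    return it->second;
  }
  return std::nullopt;
}
```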

kchheda3 and others added 6 commits February 1, 2024 15:54
…n multisite config

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
…ions in multisite config.

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
…een bucket and topics.

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
…ions in multisite config.

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
both locally and in teuthology

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
v1 could be enabled only in local tests. teuthology tests would run with v2

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
@kchheda3
Contributor Author

kchheda3 commented Feb 1, 2024

@cbodley i have made changes to pull SiteConfig and run over all zonegroups to confirm the v2 flag is turned on.
Do take a look again.

@kchheda3 kchheda3 requested a review from cbodley February 1, 2024 22:24
@cbodley
Contributor

cbodley commented Feb 5, 2024

@cbodley i have made changes to pull SiteConfig and run over all zonegroups to confirm the v2 flag is turned on. Do take a look again.

thanks @kchheda3, i rebased #55152 on top of the latest here, and included 4 new commits related to SiteConfig

the main issue there was that SiteConfig::get_period() returns an optional period, because you only have a period if you're in a realm. when the period is empty, we have to check the local zonegroup instead. so i added a generic rgw::all_zonegroups_support() function that handles both cases, then made sure that all call paths had a SiteConfig to pass in

can you take a look at those new commits?
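under those assumptions, a rough shape of the generic check might be (minimal hypothetical types, not the real SiteConfig/Period definitions):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// minimal hypothetical shapes standing in for the real RGW types
struct Zonegroup { std::set<std::string> enabled_features; };
struct Period { std::map<std::string, Zonegroup> zonegroups; };
struct SiteConfig {
  std::optional<Period> period;  // only present when running in a realm
  Zonegroup local_zonegroup;     // checked when there is no period
};

// with a period, every zonegroup must support the feature;
// without one, only the local zonegroup is checked
bool all_zonegroups_support(const SiteConfig& site, const std::string& feature) {
  if (!site.period) {
    return site.local_zonegroup.enabled_features.count(feature) > 0;
  }
  for (const auto& [name, zg] : site.period->zonegroups) {
    if (zg.enabled_features.count(feature) == 0) {
      return false;
    }
  }
  return true;
}
```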

@kchheda3
Contributor Author

kchheda3 commented Feb 6, 2024

@cbodley i have made changes to pull SiteConfig and run over all zonegroups to confirm the v2 flag is turned on. Do take a look again.

thanks @kchheda3, i rebased #55152 on top of the latest here, and included 4 new commits related to SiteConfig

the main issue there was that SiteConfig::get_period() returns an optional period, because you only have a period if you're in a realm. when the period is empty, we have to check the local zonegroup instead. so i added a generic rgw::all_zonegroups_support() function that handles both cases, then made sure that all call paths had a SiteConfig to pass in

can you take a look at those new commits?

looks good. the generic rgw::all_zonegroups_support() makes sense!
i also tested the changes, and all the integration tests pass as well.

@cbodley
Contributor

cbodley commented Mar 8, 2024

merged in #55657, thanks!

@cbodley cbodley closed this Mar 8, 2024
@kchheda3 kchheda3 deleted the wip-topic-repl branch March 8, 2024 18:48