Skip to content

RGW: Migrate topics to data path v2#55214

Merged
yuvalif merged 3 commits intoceph:mainfrom
AliMasarweh:wip-alimasa-notif-data-path-v2
Apr 3, 2024
Merged

RGW: Migrate topics to data path v2#55214
yuvalif merged 3 commits intoceph:mainfrom
AliMasarweh:wip-alimasa-notif-data-path-v2

Conversation

@AliMasarweh
Copy link
Member

@AliMasarweh AliMasarweh commented Jan 17, 2024

RGW: Migrate topics to data path v2

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@AliMasarweh AliMasarweh requested review from a team as code owners January 17, 2024 12:33
@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch 4 times, most recently from e6366ff to 891411e Compare January 17, 2024 17:11
@AliMasarweh AliMasarweh removed the request for review from a team January 18, 2024 10:15
@AliMasarweh AliMasarweh marked this pull request as draft January 18, 2024 10:15
@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch from 891411e to b3c4d50 Compare January 18, 2024 12:27
@AliMasarweh AliMasarweh marked this pull request as ready for review January 18, 2024 14:44
@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch from b3c4d50 to 0ae0893 Compare January 18, 2024 16:58
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@yuvalif
Copy link
Contributor

yuvalif commented Jan 28, 2024

pleass see the ongoing design discussion here: #54868 (comment)

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch from 0ae0893 to 81c11d3 Compare February 6, 2024 12:06
@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch from 81c11d3 to 900fb0e Compare February 6, 2024 12:07
@AliMasarweh AliMasarweh changed the title RGW: Zone feautres supports migrating mode and prevent modification while migrtaing RGW: Migrate topics to data path v2 Feb 6, 2024
@AliMasarweh AliMasarweh marked this pull request as draft February 6, 2024 12:08
@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch from 900fb0e to 2c71a4c Compare February 6, 2024 13:01
@cbodley
Copy link
Contributor

cbodley commented Feb 6, 2024

early feedback: rados is the only backend that has topic metadata, so i think we should make this migration logic rados-specific

that means we don't need to add new interfaces to rgw_sal.h, rgw_sal_filter.h, etc. and the migration code itself would be under driver/rados/ instead of class RGWPubSub

@cbodley
Copy link
Contributor

cbodley commented Feb 6, 2024

p.s. there are additional commits in #55152, we should base the migration work on that

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-notif-data-path-v2 branch 3 times, most recently from d273ec9 to 42c6643 Compare February 7, 2024 10:57
@yuvalif
Copy link
Contributor

yuvalif commented Mar 11, 2024

we migrate the notifications and then the topics.

i don't think that's the case. migrate() does a single listing over pubsub. objects, and either calls migrate_notification() or migrate_topics() depending on the format of each object name

if we require ordering for correctness here, we'll need two separate loops to guarantee that we migrate all the notifications first

ok, so we need the separate loops. not just for the testing, but for the migration state handling to work.
we don't want to allow any changes to topics or notifications while we are doing the migration, and the way to detect that is to check of the system object holding the topic exists

@yuvalif
Copy link
Contributor

yuvalif commented Mar 15, 2024

with #56005 and c83521c, the migration tests are now running in teuthology.

see: https://pulpito.ceph.com/yuvalif-2024-03-15_07:46:28-rgw:notifications-wip-yuval-notif-data-path-v2-distro-default-smithi/

@AliMasarweh, once you add the tests described here: #55214 (comment)
please rerun teuthology.

to make sure that migration works, even without cls_lock, we should setup teuthology with multiple RGWs.
@AliMasarweh can you please tests that locally?
you can do that by running:

RGW_PER_ZONE=2 ../src/test/rgw/test-rgw-multisite.sh 1

this will give 2 RGWs with one zone and one zonegroup.

I will create a separate PR for running the notifications test suite with multiple RGWs, but I don't want it to be a blocker for squid.

@cbodley
Copy link
Contributor

cbodley commented Mar 15, 2024

I will create a separate PR for running the notifications test suite with multiple RGWs, but I don't want it to be a blocker for squid.

@yuvalif all you have to do is add another 'client' under the rgw task:

diff --git a/qa/suites/rgw/notifications/tasks/amqp/0-install.yaml b/qa/suites/rgw/notifications/tasks/amqp/0-install.yaml
index 9e7852d4d25..4d5d9b890e3 100644
--- a/qa/suites/rgw/notifications/tasks/amqp/0-install.yaml
+++ b/qa/suites/rgw/notifications/tasks/amqp/0-install.yaml
@@ -4,6 +4,7 @@ tasks:
 - openssl_keys:
 - rgw:
     client.0:
+    client.1:
 
 overrides:
   ceph:

same for kafka and others

@yuvalif
Copy link
Contributor

yuvalif commented Mar 19, 2024

@cbodley
Copy link
Contributor

cbodley commented Mar 26, 2024

@AliMasarweh @yuvalif can you please help me list the outstanding tasks here? i want to determine whether we can reasonably merge this week

@AliMasarweh
Copy link
Member Author

AliMasarweh commented Mar 26, 2024

@AliMasarweh @yuvalif can you please help me list the outstanding tasks here? i want to determine whether we can reasonably merge this week

I think merging this week is very unlikely, unless we cut down on the requirements,
if we agree to migrate also the internal topics then we probably can merge it this week

@cbodley
Copy link
Contributor

cbodley commented Mar 26, 2024

can we please set up a meeting today or tomorrow to discuss?

@cbodley
Copy link
Contributor

cbodley commented Mar 27, 2024

@yuvalif do we have a plan for the handling of internal topics?

@AliMasarweh
Copy link
Member Author

AliMasarweh commented Mar 27, 2024

@yuvalif do we have a plan for the handling of internal topics?

I pushed a change the solves the issue of internal topics
I need to run teuthology before merging

@yuvalif
Copy link
Contributor

yuvalif commented Mar 28, 2024

before the latest fixes from 3436daf
i have a clean teuthology run: https://pulpito.ceph.com/yuvalif-2024-03-28_15:00:43-rgw:notifications-wip-yuval-notif-data-path-v2-distro-default-smithi/

will build and run again.

@yuvalif
Copy link
Contributor

yuvalif commented Apr 1, 2024

fixed the latest comments and squashed commits.

teuthology is mostly passing: https://pulpito.ceph.com/yuvalif-2024-04-01_16:06:27-rgw:notifications-wip-yuval-notif-data-path-v2-distro-default-smithi/
only failure is with test_ps_s3_persistent_topic_stats which is a known issue: https://tracker.ceph.com/issues/63909
all 4 "data path v2" tests were successfull.

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise looks great

AliMasarweh and others added 3 commits April 2, 2024 19:28
also add migration tests

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
this is useful to:
* cover the cls_lock code with persistent notifications
* cover v1 to v2 migration collisions between RGWs

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
@yuvalif
Copy link
Contributor

yuvalif commented Apr 3, 2024

latest teuthology: https://pulpito.ceph.com/yuvalif-2024-04-03_15:02:16-rgw:notifications-wip-yuval-notif-data-path-v2-distro-default-smithi/
with 2 known failures:

  • kafka crash
  • test_ps_s3_persistent_topic_stats

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants