Skip to content

rgw: remove unused backend abstraction for metadata handlers#54777

Merged
cbodley merged 18 commits intoceph:mainfrom
cbodley:wip-rgw-meta-handlers
Sep 13, 2024
Merged

rgw: remove unused backend abstraction for metadata handlers#54777
cbodley merged 18 commits intoceph:mainfrom
cbodley:wip-rgw-meta-handlers

Conversation

@cbodley
Copy link
Contributor

@cbodley cbodley commented Dec 5, 2023

the metadata backend abstraction from #29118 predated zipper, and never evolved to support more than one backend. this was a heavyweight abstraction, and the complicated layering made it very hard to work with. in contrast, the base RGWMetadataHandler interface is easy to implement for new metadata types. it's important that we can keep multisite's metadata sync up-to-date with newer features like bucket notifications, roles, accounts, etc

to untangle this, i started by changing services like RGWSI_Bucket_SObj and RGWSI_User_RADOS to read/write their metadata with rados/sysobj interfaces instead of the metadata backends. then i was able to reimplement their metadata handlers in terms of RGWMetadataHandler directly

RGWSI_MBSObj_Handler_Module had provided a common abstraction for metadata listing which i was able to convert into a RGWMetadataLister class with a single virtual function

with respect to the creation of mdlog entries, RGWSI_MetaBackend_SObj had taken care of that with its pre_modify()/post_modify(). these mdlog entries must be written manually now, using the new RGWSI_MDLog::complete_entry(). this is only called after a successful write, instead of logging 'prepare' and 'complete' entries separately (metadata sync has always ignored 'prepare' entries)

the 'otp' and 'role' metadata got more significant changes, where their 'services' were removed entirely. namespaces rgwrados::otp and rgwrados::role now provide their rados metadata interfaces as free functions which are called by their metadata handlers and rgw_sal_rados.cc

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@cbodley
Copy link
Contributor Author

cbodley commented Dec 5, 2023

cc @pritha-srivastava since you worked on the roles stuff in #43597. i ended up refactoring a lot of that because i have future plans for it. in my work on iam accounts, i'll be making it so that roles can be owned by an account. so in addition to a global listing of roles, you'll also be able to list the roles of a single account for the iam ListRoles api. #54563 adds the cls_user interfaces to track those lists in omap, with similar 'path prefix' filtering to what you wrote for get_roles(). whenever we write/delete/replicate a role in an account, we'll need to update the list in cls_user accordingly

@cbodley
Copy link
Contributor Author

cbodley commented Dec 5, 2023

with respect to the creation of mdlog entries, RGWSI_MetaBackend_SObj had taken care of that with its pre_modify()/post_modify(). these mdlog entries must be written manually now, using the new RGWSI_MDLog::complete_entry(). this is only called after a successful write, instead of logging 'prepare' and 'complete' entries separately (metadata sync has always ignored 'prepare' entries)

cc @smanjara on this part

@pritha-srivastava
Copy link
Contributor

cc @pritha-srivastava since you worked on the roles stuff in #43597. i ended up refactoring a lot of that because i have future plans for it. in my work on iam accounts, i'll be making it so that roles can be owned by an account. so in addition to a global listing of roles, you'll also be able to list the roles of a single account for the iam ListRoles api. #54563 adds the cls_user interfaces to track those lists in omap, with similar 'path prefix' filtering to what you wrote for get_roles(). whenever we write/delete/replicate a role in an account, we'll need to update the list in cls_user accordingly

@cbodley: I reviewed the roles part - all looks fine to me. Just to let you know that currently also listing of roles is for a tenant.

@cbodley cbodley force-pushed the wip-rgw-meta-handlers branch from fec8fed to c4b3047 Compare January 5, 2024 17:19
@github-actions
Copy link

github-actions bot commented Jan 9, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Contributor Author

cbodley commented Feb 16, 2024

planning to rebase this once #55152 merges

@cbodley
Copy link
Contributor Author

cbodley commented Mar 15, 2024

planning to rebase this once #55152 merges

that did merge, but i'd still like to wait for the squid release so we don't complicate any urgent backports

@smanjara
Copy link
Contributor

@cbodley i'm guessing that this will be ready before my OIDC metadata handler changes complete. I probably should rebase on top of this pr?

@cbodley
Copy link
Contributor Author

cbodley commented Mar 15, 2024

@cbodley i'm guessing that this will be ready before my OIDC metadata handler changes complete. I probably should rebase on top of this pr?

i don't think this PR touches oidc stuff, so that's probably not necessary. just don't use any 'metadata backend' stuff to implement that

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label May 14, 2024
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Jun 13, 2024
@cbodley cbodley reopened this Aug 4, 2024
@cbodley cbodley force-pushed the wip-rgw-meta-handlers branch from c4b3047 to 1693850 Compare August 4, 2024 20:18
@cbodley cbodley force-pushed the wip-rgw-meta-handlers branch from 1693850 to 31d565b Compare August 4, 2024 20:47
@github-actions github-actions bot removed the stale label Aug 4, 2024
@cbodley cbodley force-pushed the wip-rgw-meta-handlers branch from 31d565b to 99dfbfe Compare August 4, 2024 21:02
@cbodley
Copy link
Contributor Author

cbodley commented Sep 9, 2024

failed qa in https://pulpito.ceph.com/cbodley-2024-09-07_03:10:16-rgw-wip-rgw-meta-handlers-distro-default-smithi/ with rerun https://pulpito.ceph.com/cbodley-2024-09-07_13:15:19-rgw-wip-rgw-meta-handlers-distro-default-smithi/

multisite job showing crashes in radosgw-admin bilog autotrim:

./src/rgw/driver/rados/rgw_trim_bilog.cc: In function 'virtual int AsyncMetadataList::_send_request(const DoutPrefixProvider*)' thread 7f0aaffe7640 time 2024-09-07T04:04:04.880902+0000
./src/rgw/driver/rados/rgw_trim_bilog.cc: 937: FAILED ceph_assert(keys.size() == 1)

that causes failures for:

FAIL: rgw_multi.tests.test_bucket_index_log_trim
FAIL: rgw_multi.tests.test_replication_status

a notification test is also failing, unrelated to bilog trim:

FAIL: rgw_multi.tests.test_topic_notification_sync

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
inherit from RGWMetadataHandler instead of
RGWMetadataHandler_GenericMetaBE to avoid all of the metadata backend
abstractions, and just call into the cls and sysobj services directly

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
RGWBucketMetadataHandler inherits from RGWMetadataHandler directly
instead of RGWMetadataHandler_GenericMetaBE

replace RGWSI_Bucket_SObj_Module with the RGWMetadataLister abstraction

RGWSI_Bucket_SObj now depends on RGWSI_MDLog and writes mdlog entries
itself

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
move everything from class RadosRole into driver/rados/role.cc and use
RGWSI_SysObj instead of the metadata backend. narrows the RGWRole
interface by handling the name/path objects internally

rgwrados::role::write() adds/remove name and path linkages in case they
change. admin ops don't allow this, but 'metadata put' could upload
arbitrary json that does change them

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-rgw-meta-handlers branch from bad475c to 8c56a23 Compare September 12, 2024 20:55
@cbodley
Copy link
Contributor Author

cbodley commented Sep 12, 2024

rebased/squashed

@cbodley cbodley added the TESTED label Sep 12, 2024
Copy link
Contributor

@adamemerson adamemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!

@cbodley
Copy link
Contributor Author

cbodley commented Sep 13, 2024

https://jenkins.ceph.com/job/ceph-api/81524/

Command failed with status 250: '../src/vstart.sh -n --nolockdep'

@cbodley
Copy link
Contributor Author

cbodley commented Sep 13, 2024

jenkins test api

@cbodley cbodley merged commit 1c941a6 into ceph:main Sep 13, 2024
@cbodley cbodley deleted the wip-rgw-meta-handlers branch September 13, 2024 16:56
@smanjara
Copy link
Contributor

while testing #53799 after a rebase, I see a crash while removing bucket instance:

2024-09-18T03:26:45.143-0400 7f8474a506c0 20 rgw async rados processor: do_read_bucket_instance_info, bucket instance not found (key=bucket1:41b6e36e-480d-4b64-baec-f43c3a4b73ff.4298.1)

2024-09-18T03:26:45.172-0400 7f8474a506c0 -1 *** Caught signal (Segmentation fault) **
in thread 7f8474a506c0 thread_name:rados_async

ceph version 19.3.0-5006-ge19dc9e23a4 (https://github.com/ceph/ceph/commit/e19dc9e23a4853e4e0e60badddce5448c1c27cf8) squid (dev)
1: /root/ceph/build/bin/radosgw(+0x2172cbb) [0x55f15bdb1cbb]
2: /lib64/libc.so.6(+0x3dbb0) [0x7f858d05fbb0]
3: pthread_mutex_lock()
4: (std::mutex::lock()+0x9) [0x7f858eaa0659]
5: (std::unique_lockstd::mutex::lock()+0x17) [0x7f858eaa08e7]
6: (ceph::logging::Log::submit_entry(ceph::logging::Entry&&)+0x31) [0x7f858ed2bfa1]
7: (RGWMetadataManager::remove(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&, optional_yield, DoutPrefixProvider const*)+0x1df) [0x55f15ba7fc6b]
8: (RGWAsyncMetaRemoveEntry::_send_request(DoutPrefixProvider const*)+0x48) [0x55f15b94bf40]
9: (RGWAsyncRadosRequest::send_request(DoutPrefixProvider const*)+0x1d) [0x55f15baffb35]
10: (RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)+0xc) [0x55f
15baf5c5a]
11: (RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x13) [0x55f15
bafe2a3]
12: (ThreadPool::WorkQueue::_void_process(void*, ThreadPool::TPHandle&)+0xa) [0x55f
15b933b1c]

this was probably never exercised because we didn't remove bucket instance when multisite sync is involved.

@smanjara
Copy link
Contributor

while testing #53799 after a rebase, I see a crash while removing bucket instance:

2024-09-18T03:26:45.143-0400 7f8474a506c0 20 rgw async rados processor: do_read_bucket_instance_info, bucket instance not found (key=bucket1:41b6e36e-480d-4b64-baec-f43c3a4b73ff.4298.1)

2024-09-18T03:26:45.172-0400 7f8474a506c0 -1 *** Caught signal (Segmentation fault) ** in thread 7f8474a506c0 thread_name:rados_async

ceph version 19.3.0-5006-ge19dc9e23a4 (https://github.com/ceph/ceph/commit/e19dc9e23a4853e4e0e60badddce5448c1c27cf8) squid (dev)
1: /root/ceph/build/bin/radosgw(+0x2172cbb) [0x55f15bdb1cbb]
2: /lib64/libc.so.6(+0x3dbb0) [0x7f858d05fbb0]
3: pthread_mutex_lock()
4: (std::mutex::lock()+0x9) [0x7f858eaa0659]
5: (std::unique_lockstd::mutex::lock()+0x17) [0x7f858eaa08e7]
6: (ceph::logging::Log::submit_entry(ceph::logging::Entry&&)+0x31) [0x7f858ed2bfa1]
7: (RGWMetadataManager::remove(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&, optional_yield, DoutPrefixProvider const*)+0x1df) [0x55f15ba7fc6b]
8: (RGWAsyncMetaRemoveEntry::_send_request(DoutPrefixProvider const*)+0x48) [0x55f15b94bf40]
9: (RGWAsyncRadosRequest::send_request(DoutPrefixProvider const*)+0x1d) [0x55f15baffb35]
10: (RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)+0xc) [0x55f
15baf5c5a]
11: (RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x13) [0x55f15
bafe2a3]
12: (ThreadPool::WorkQueue::_void_process(void*, ThreadPool::TPHandle&)+0xa) [0x55f
15b933b1c]

this was probably never exercised because we didn't remove bucket instance when multisite sync is involved.

ignore this. i was trying to log something in the RGWMetadataManager without initializing cct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants