cascading changes, and minor improvements. Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Consolidate objclass util services.
Move code out of RGWRados; refactor a bit to use the rados svc.
Move implementation out of the header.
Include json_spirit_value.h instead of json_spirit.h, and other minor changes.
Dependency reduction.
I didn't intend for it to be such a big commit, and it's not even compiling yet. This changes the structure of how the metadata manager and handlers work. The idea is to be able to relatively easily hook in different meta backends (or the same backends with different handling, such as the otp). Added new services for meta, the meta backend, and the meta backend sysobj implementation. The meta backend service is responsible for the final data storage and for updating the meta log (the log might be split out later, but at the moment it is kept together for simplicity). The handlers themselves are the ones responsible for reading or modifying the metadata. This means that they need to call the meta backend service instead of calling the utility functions. The utility functions need to call the handlers, and not the other way around. Handlers can have utility methods to assist. Left to do: get everything actually compiling and implemented. The structure is there; now the gaps need to be filled in.
Getting closer, but not there yet.
So that it can now be used in svc.bucket without needing the store.
And other changes.
And a lot of compilation fixes.
Notable changes are around user metadata. Create an API that uses the service interface (which requires a backend context) and use it for higher level functions. Still a lot to do.
Similar to svc, just for higher level APIs.
In rgw_user.cc.
rgw-admin now compiles.
No need to fetch attrs; these can be read via get() if the meta object has pattrs set (as happens in the RGWBucketEntryMetadataObject case).
This is needed so that we can provide backend-specific params from the top level. For example, in the current case we sometimes need to be able to provide a sysobj_ctx that will be used instead of a newly generated one. The layers in between don't need to know about the backend specifics.
To differentiate between the different uinfo cache indexes.
No need to pass mtime there.
So that requests to read_bucket_instance_info() can be satisfied through the cache.
This field was only initialized if we also read the bucket entrypoint, which is not always the case. Added an ep objv_tracker field on req_state instead, and changed the logic in delete_bucket() to make sure we do the right thing.
Was passing the wrong variable in constructor initialization; renamed the handler member to avoid future confusion.
Allow both [tenant/]bucket:instance and [tenant:]bucket:instance.
Was passing the period id (a string), which triggered implicit construction of the period object. Instead, pass the period object itself and make the constructor explicit to avoid similar issues in the future.
put_entry() doesn't write to the mdlog.
… key rgw_bucket_parse_bucket_instance() was returning a meta key, e.g., bucket:instance, and not just 'instance'. This caused issues when using read_bucket_instance_info(). Adjusted other callers of this function, now that the API changed.
We already get the substring as input; don't do it again.
Need to differentiate between needing to sync and actually syncing. A zone might need to sync, but when running in radosgw-admin we don't actually sync.
There and back again.
De-scrambling this omelette took way longer than I originally anticipated, and the scope of work grew way bigger than originally intended. There were a lot of interdependencies that needed untangling. This started as an effort to move stuff out of RGWRados into modules, and quickly turned into a refactoring of the metadata handling. The original metadata code was doing some funny things in order to ensure that when writing metadata entries (e.g., bucket/user creation/modification/removal), the metadata log is updated. The different calls (e.g., RGWRados::put_bucket_instance_info(), etc.) would translate the entity id (e.g., bucket, user) into a metadata key, and then would funnel into RGWMetadataManager::put_entry(), which would convert that metadata key into a rados {pool, oid} (by querying the specific metadata handler), update the metadata log, and write the corresponding system object. A generic metadata put would work similarly, but would go from RGWMetadataManager::put() into the corresponding handler's put() method, which in turn would call the manager's put_entry(). The otp handler called the manager's mutate() instead of put_entry(), as it wasn't doing a simple system object write but called a cls_otp method.
The new handling works differently: it tries to make sure that, on the one hand, RGWMetadataManager doesn't need to know anything about the metadata layout, and on the other hand that the various random utility functions (e.g., rgw_put_bucket_store_info()) don't call into the metadata manager directly or indirectly. This responsibility was pushed down to a set of svcs (I'll avoid stating the badly named term 'service'; it should have been 'module' -- if anyone wants to run a find-and-replace and change everything, be my guest). There are the meta and meta_be svcs. The meta_be (backend) svc is the base svc for meta backend handling. Then there are different meta_be implementations: meta_be_sobj and meta_be_otp (which extends sobj).
A meta_be svc has a backend type associated with it (e.g., SOBJ, OTP). Each can have a different type of data that it needs in order to access and control the backend. The abstract RGWSI_MetaBackend::Context is used to hold that data, and each backend type extends it. However, the meta_be svc is generic and doesn't know anything about buckets and users. For each metadata type we need to create a meta-backend handler instance. The meta-backend handler can generate a type-specific context that will be used by the specific backend. For the sobj backend there is a "module" that the handler holds, with a set of methods that can be used to translate between meta keys and the sobj {pool, oid}. Each metadata handler that wants to use the sobj backend implements a module instance that does these translations. At the moment things are explicit, and everything is created on initialization and ad hoc. The metadata handlers currently know that they will be used with sobj (or otp) backends. In the future we can construct these things a bit differently, but this structure should still hold.
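To make the "module" idea concrete, here is a minimal sketch of a translation module for the sobj backend. The class and method names (SObjModule, get_pool_and_oid, the oid naming scheme) are illustrative assumptions, not the actual Ceph API:

```cpp
#include <string>
#include <utility>

// Hypothetical base class for an sobj backend "module": the handler
// holds one of these and uses it to map metadata keys to rados objects.
struct SObjModule {
  virtual ~SObjModule() = default;
  // Translate a metadata key into the {pool, oid} where it is stored.
  virtual std::pair<std::string, std::string>
  get_pool_and_oid(const std::string& key) const = 0;
};

// Each metadata handler supplies its own translation; a bucket-instance
// handler might do something along these lines (naming scheme made up):
struct BucketInstanceModule : SObjModule {
  std::pair<std::string, std::string>
  get_pool_and_oid(const std::string& key) const override {
    return {"root-pool", "bucket.meta." + key};
  }
};
```

With this split, the generic meta_be_sobj code only ever calls get_pool_and_oid() and never hard-codes per-type layout knowledge.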
Now that everything was pushed down below the metadata manager and metadata handlers, the utility functions don't need to call into them to write certain metadata entries. However, new utility code was needed that uses the new scheme. Note that when creating a user (for example), we don't just create the metadata entry; there are a bunch of other objects that we create (the different indexes -- access_key, swift, email). These are also very specific to the backend being used. I created new base svcs for user and for bucket, and created implementation svcs: user_rados and bucket_sobj. Note that the user implementation has rados-specific calls, whereas bucket only does sobj-specific stuff (which is still only rados, but in theory can be replaced).
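As an illustration of why user creation is more than one metadata write, this sketch computes the extra index objects a rados-backed user svc would have to write alongside the user entry. The struct, the helper, and the oid prefixes are all hypothetical:

```cpp
#include <string>
#include <vector>

// Minimal stand-in for the user info being stored.
struct UserInfo {
  std::string user_id;
  std::vector<std::string> access_keys;
  std::vector<std::string> emails;
};

// Names of the extra index objects that must be written (and later
// removed) together with the user's metadata entry, so the user can be
// looked up by access key or email. Prefixes are made up.
std::vector<std::string> user_index_oids(const UserInfo& u) {
  std::vector<std::string> oids;
  for (const auto& k : u.access_keys)
    oids.push_back("key." + k);      // access-key -> uid index
  for (const auto& e : u.emails)
    oids.push_back("email." + e);    // email -> uid index
  return oids;
}
```

Because these index objects are backend-specific, they belong in the user_rados implementation svc rather than in the generic metadata path.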
It became clear to me that the svc layer is not where we really want to keep all the application APIs. It should serve as a lower abstraction layer for modular entities. One example of why this is needed: calls that create a user's bucket need to interact with both the user svc and the bucket svc, and having the two svcs know about each other would be problematic. Instead I created 'controls' (RGWBucketCtl, RGWUserCtl, RGWOTPCtl) that provide higher level APIs and themselves call into the different svcs. The lower level svcs need to be provided with an appropriate backend context. Some of them don't really need that context, but I made sure they get it anyway -- mostly so that it's explicit that those calls might be backend dependent. Also, the context passed to these calls needed to be explicitly typed so that we don't accidentally pass a wrong context. This was a real issue when dealing with svc.bucket, which has two separate backends: one for bucket, and one for bucket.instance.
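The "explicitly typed context" point can be sketched in a few lines: giving bucket and bucket.instance distinct context types turns passing the wrong one into a compile error instead of a runtime bug. All names below are hypothetical:

```cpp
#include <string>

// Two distinct context types, one per backend, so they cannot be mixed up.
struct BucketBECtx { std::string marker; };
struct BucketInstanceBECtx { std::string marker; };

// Each call accepts only the context type for its own backend.
std::string read_bucket_entrypoint_info(const BucketBECtx& ctx) {
  return "ep:" + ctx.marker;
}
std::string read_bucket_instance_info(const BucketInstanceBECtx& ctx) {
  return "instance:" + ctx.marker;
}

// read_bucket_instance_info(BucketBECtx{"x"});  // would not compile
```

This is cheap type safety: both structs could hold identical data, yet the compiler still rejects a mismatched call.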
I tried to avoid introducing new explicit dynamic allocations. This was a bit challenging, since the meta backends require a context passed into them from the top level, but the top level doesn't need to know anything about the specific implementation. One way to do it was to leverage ceph::static_ptr<>; however, that required exposing all context implementations to the static_ptr<> definition, which I wasn't happy about. The solution I preferred was providing an abstract method on the backend that takes a lambda receiving a context pointer. The backends then implement it by putting a context (of their type) on the stack and calling that lambda. Hopefully that prevents dynamic allocations and is actually useful. The same was done for the backend handlers; with the latter there were cases where two different context types needed to be passed (in the bucket + bucket.instance case).
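A minimal sketch of that stack-context-plus-lambda pattern, with made-up names (MetaContext, call_with_ctx, etc.) that do not match the real Ceph classes:

```cpp
#include <functional>
#include <string>

// Abstract backend context; each backend defines its own concrete type.
struct MetaContext {
  virtual ~MetaContext() = default;
  virtual std::string backend_type() const = 0;
};

// Abstract backend: the caller supplies a lambda, and the backend
// constructs its concrete context on the stack before invoking it, so
// no dynamic allocation happens and the caller stays backend-agnostic.
struct Backend {
  virtual ~Backend() = default;
  virtual int call_with_ctx(std::function<int(MetaContext*)> f) = 0;
};

struct SObjContext : MetaContext {
  std::string sysobj_state;  // stands in for sobj-specific data
  std::string backend_type() const override { return "sobj"; }
};

struct SObjBackend : Backend {
  int call_with_ctx(std::function<int(MetaContext*)> f) override {
    SObjContext ctx;         // concrete context lives on the stack
    return f(&ctx);
  }
};

// A generic caller that never sees the concrete context type.
int put_entry(Backend& be) {
  return be.call_with_ctx([](MetaContext* ctx) {
    return ctx->backend_type() == "sobj" ? 0 : -1;
  });
}
```

The bucket + bucket.instance case would extend this with a variant that hands the lambda two context pointers, one per backend.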
Other notable things:
TBD:
- There are still a few #warnings that need to be handled. The two main issues left behind are the metadata and chained caches, which, when I looked at them, I thought needed some rework (it looks like there are paths where we don't go through the metadata cache when dealing with bucket instances).
- Since the code is more generic and avoids being backend specific, the new ctl methods don't accept RGWObjectCtx (which is sobj specific). However, by removing that we lose something, and there needs to be a (good) way to add it back.
- There is an unrelated includes cleanup that snuck in and should be removed (and sent separately).