Skip to content

rgw: bucketmv#23994

Closed
mdw-at-linuxbox wants to merge 11 commits intoceph:masterfrom
mdw-at-linuxbox:wip-master-rgw-bucketmv
Closed

rgw: bucketmv#23994
mdw-at-linuxbox wants to merge 11 commits intoceph:masterfrom
mdw-at-linuxbox:wip-master-rgw-bucketmv

Conversation

@mdw-at-linuxbox
Copy link
Contributor

When converting from a global implicit tenant setup to a multi-tenancy setup, old buckets become inaccessible. Jewel had a somewhat strange compromise setup for this, which had several strange properties. Modern versions of ceph have more uniform properties, but require that buckets be somehow moved to the current data layout. This series of patches provides an administrative way to move buckets between tenant. As a bonus this can also be used to rename buckets.

This is a slightly complicated change, so instead of one giant scary commit, I have,

8297f4c rgw: making implicit_tenants backwards compatible.
e9cf92a Add several types to ceph-dencoder.
d6f71e0 Add "--bucket-new-name" option to radosgw-admin.
bb3ba20 rgw: bucket link: Add ability to name bucket w/ different tenant.
fa40823 rgw: bucket link: simplify use of get bucket info.
a847ace rgw: bucket link: use data from bucket_info to rewrite bucket_endpoint.
4499845 rearrange / simplify RGWBucket::link logic - start bucket move support
d8efa3b rgw: bucket link: base "bucket move" (tenant id only)
b33d851 rgw: bucket link: "bucket move"; handle bucket names too.
0208764 rgw: bucket link: "bucket move" documentation changes

Note that 8297f4c is (or should be the same) as PR#22378
e9cf92a (dencoder) isn't strictly required, but made debugging this a lot easier with the ability to more easily print out rados objects associated with rgw buckets. I see no reason not to include this.

The remaining patches incrementally build "bucket link" functionality; note that some patches overwrite changes from earlier patches - I'm not sure this will make it as easy to review this as I had hoped.

Several things are not here because they appear to not be necessary: this does not loop and rewrite object acls in a bucket. Swift at least doesn't appear to care. Doesn't rewrite tail placement; doesn't seem to break anything. No real provisions to lock the bucket. Resharding a bucket seems like the most likely competitor; its locking logic doesn't appear to be compatible to this. If you think I'm missing functionality please if possible provide testable examples - if I can't test it I can't know that my code is right.

So possible things left: squash patches? create more tracker issues and reference them? Other things people identify here?

@vumrao
Copy link
Contributor

vumrao commented Sep 9, 2018

I have created this tracker - http://tracker.ceph.com/issues/35885 for this issue.

RGWBucketInfo bucket_info;
map<string, bufferlist>::iterator aiter = attrs.find(RGW_ATTR_ACL);
if (aiter == attrs.end()) {
// XXX why isn't this an error? mdw 20180825
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sat down and traced old behavior re RGW_ATTR_ACL.

A long time before Argonaut, when ACLs were first introduced, the attribute was NOT mandatory, and the attribute contained a customized default ACL. If not present, it was generated from the owner uid on the fly. Regardless of being present or generated, it was then copied to the object, when a new object was created.

The attribute name has also changed multiple times, and no provisions for supporting old names was made, so they would have been broken LONG ago.

user.rgw.acl
user.s3.acl
user.s3acl

Dreamhost is probably the only org that has pre-argonaut buckets, and since it would have broken slightly long ago, I think making it an error now should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, robin

bool inline implicit_tenants_for_(const implicit_tenant_flag_bits bit)
{
assert(v != IMPLICIT_TENANTS_BAD);
return !!(v&bit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_cast<bool> or != 0 would be IMHO a bit more readable.

};


class S3AuthFactory : public rgw::auth::RemoteApplier::Factory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up!

enum implicit_tenant_flag_bits {IMPLICIT_TENANTS_SWIFT=1,
IMPLICIT_TENANTS_S3=2, IMPLICIT_TENANTS_BAD = -1, };
private:
int saved;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saved can be updated from different thread. Although in the current code it's hard to find a nasty scenario, it still might be desirable to use std::atomic_int even to put more attention on that and avoid issues in the future (evolving code).

return;
}

ldout(cct, 0) << "NOTICE: couldn't map swift user " << acct_user << dendl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely swift user isn't a new thing but might be worth fixing.

* only use the identifier space that would be used if the id were
* to be created. */

if (split_mode && !implicit_tenant)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   *
   * The truth table for the loads:
   *   split_mode = false, implicit_tenant = false -> (tenanted_uid IF acct_user.tenant.empty()) || acct_user
   *   split_mode = false, implicit_tenant = true  -> (tenanted_uid IF acct_user.tenant.empty()) || acct_user
   *   split_mode = true,  implicit_tenant = false -> acct_user
   *   split_mode = true,  implicit_tenant = true  -> tenanted_uid IF acct_user.tenant.empty() */


const AuthInfo info;
const bool implicit_tenants;
rgw::auth::ImplicitTenants& implicit_tenant_context;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does RemoteApplier really need this bit as non-const?

TYPE(RGWBucketEntryPoint)
TYPE(RGWUploadPartInfo)
TYPE(rgw_obj)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

for bucket link: optional new name
--shard-id=<shard-id> optional for:
mdlog list
data sync status
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.


// split possible tenant/name
auto pos = bucket_name.find('/');
if (pos != boost::string_ref::npos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider switching to std::string::npos as:

/*
 * A simple wrapper class for administrative bucket operations
 */

class RGWBucket
{
  // ...
  std::string bucket_name;

{
static const char *keys[] = {
"rgw_keystone_implicit_tenants",
NULL };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NULL -> nullptr.

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good, noticed one block of dead code?

}

string bucket_id = op_state.get_bucket_id();
#if 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is up with the #if 0 block?


int RGWBucket::init(RGWRados *storage, RGWBucketAdminOpState& op_state,
std::string *err_msg)
std::string *err_msg, map<string, bufferlist> *pattrs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pointer to pattrs (and err_msg, for that matter)--could these be reference/& arguments?

RGWBucketInfo bucket_info;
map<string, bufferlist>::iterator aiter = attrs.find(RGW_ATTR_ACL);
if (aiter == attrs.end()) {
// XXX why isn't this an error? mdw 20180825
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, robin

@mattbenjamin mattbenjamin self-assigned this Sep 10, 2018
@mattbenjamin
Copy link
Contributor

@mattbenjamin mattbenjamin self-requested a review September 11, 2018 15:16
Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR should get respond to reviewer comments, but modulo that, lgtm--also it passed an RGW suite run

bucket.tenant = tenant;
if (!op_state.new_bucket_name.empty()) {
auto pos = op_state.new_bucket_name.find('/');
if (pos != boost::string_ref::npos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string::npos here also

// XXX I am not convinced this is at all necessary; but previous
// versions would have found and copied VERSION_ATTR so I will
// do likewise for now... mdw 20180825
auto version_iter = attrs.find(VERSION_ATTR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree that this part isn't necessary. since we're creating a new object, we should throw out anything related to cls_version tracking of the old object

@stale
Copy link

stale bot commented Nov 10, 2018

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Nov 10, 2018
@robbat2
Copy link
Contributor

robbat2 commented Nov 13, 2018

stalebot: please keep this open

@stale stale bot removed the stale label Nov 13, 2018
@mattbenjamin
Copy link
Contributor

@robbat2 thanks for unstaling; @mdw-at-linuxbox please prioritize a rebase for this; I'll work with @mdw-at-linuxbox to get this a standalone suite run--the prior version had passed

@tchaikov tchaikov changed the title Wip master rgw bucketmv rgw: bucketmv Dec 21, 2018
@mdw-at-linuxbox
Copy link
Contributor Author

github apparently won't let me respond individually above. But:
1, added tracker issues to commits.
2, new extra commit for changes to adapt to newer master.
3, replaced hated !! with static_cast.
4, don't want to do atomics for configuration - I don't think in this case the difference will even be perceptible in human terms - and atomics incur significance overhead the other 99.9999% time.
5, const reference - no point. references are constant already.
6, -1: replaced some boost::stirng_ref::npos with std::string:npos
7, NULL->nullptr
8, can't use reference for *pattrs. Well, could, but this would obfuscate the pgm logic, which sets pattrs to one of 2 values in different stmts.
9, added error message to catch error from prehistoric data.
10, removed redundant VERSION_ATTR copy.

@mdw-at-linuxbox
Copy link
Contributor Author

github doesn't seem to have absorbed the push I did, and is otherwise acting oddly - I'll try to fix this once things seem more stable.

@robbat2
Copy link
Contributor

robbat2 commented Jan 11, 2019

@mdw-at-linuxbox try push again?
Specifically, for the PR to pick it up, it must be in this branch in your mdw-at-linuxbox/ceph-1 repo.
https://github.com/mdw-at-linuxbox/ceph-1/tree/wip-master-rgw-bucketmv

@mdw-at-linuxbox mdw-at-linuxbox force-pushed the wip-master-rgw-bucketmv branch from 0208764 to e2af58e Compare January 23, 2019 09:43
@mdw-at-linuxbox
Copy link
Contributor Author

Got it pushed. github apparently thought my browser was too old. :-(

@mdw-at-linuxbox
Copy link
Contributor Author

I've pushed a rebased version of this. Rebased on a fresh copy of master, passes my testing.

@adamemerson
Copy link
Contributor

@mdw-at-linuxbox
Copy link
Contributor Author

The assert failure you showed me offline (radosgw_admin_rest.py line 481) looks like something I thought was a previously existing bug. However, assuming that teuthology test is not supposed to fail, that probably means this is a regression. I'm trying to compare its behavior to an unmodified build of master (with suiltable instrumentation), but I haven't got a good run of that yet.

@pritha-srivastava
Copy link
Contributor

This comment is related to BZ 1670217. While running the ACL test against master without the changes in this PR, I could see that the test failed irrespective of whether or not the config rgw keystone implicit tenants is set to true. The reason being that buckets were created under tenant namespace in both the cases. Details are in the BZ.
When the changes in this PR were applied on master and then when I ran the test against it, the test failed when rgw keystone implicit tenants was set to true and passed when the config was absent. It failed because with the config being set to true, the buckets were created under tenant namespace, whereas when the config was absent, buckets were created under global (empty) namespace.
So is this the correct way to configure rgw, in order to get the test passing in BZ 1670217, and also is this the correct behavior when the config is set to true and when the config is absent?

@smanjara smanjara self-requested a review April 5, 2019 14:42
@smanjara
Copy link
Contributor

smanjara commented Apr 9, 2019

@mdw-at-linuxbox Could you please rebase?

@cbodley
Copy link
Contributor

cbodley commented Apr 9, 2019

it doesn't look like there have been any updates since @adamemerson ran it through teuthology, which flagged some radosgw-admin failures

In jewel, "rgw keystone implicit tenants" only applied to swift. As of
luminous), this option applies to s3 also.
Sites that used this feature with jewel now have outstanding data that
depends on the old behavior.

The fix here is to expand "rgw keystone implicit tenants" so that it
can be set to any of "none", "all", "s3" or "swift" (also 0=false=none,
1=true=all).  When set to "s3" or "swift", the actual id lookup
is also partitioned.

Formerly "rgw keystone implicit tenants" was a legacy opt.
This change converts it to the new style of option,
including support for dynamically changing it.

Fixes: http://tracker.ceph.com/issues/24348

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Add types: RGWBucketEntryPoint obj_version rgw_user
These are structures that are visible as data at rest
inside of rados when a bucket is made via radosgw.

RGWBucketEntryPoint is the contents of a rados
object with names that may be either "<bucket-name>"
or "<tenant>/<bucket-name>"

rgw_user is a structure contained inside of RGWBucketEntryPoint
and other structures.

obj_version is visible as the xattr "ceph.objclass.version"
on rados objects in ".rgw.meta" that contain ".bucket.meta."

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
This just adds the command line option and related bits.

Underlying functionality will be a later commit.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
This is not a complete fix; but it makes it possible for the
bucket link command to correctly find and attempt to link a bucket
to a user with a different tenant.  The reason this is not a complete
fix is that with just this change, the resulting bucket is "broken";
a duplicate endpoint but 0 length contents is created, and
the info entry is not correctly moved.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
The bucket link command was doing an extra bucket
get info call because it needed attributes.  Revised
::init so that attributes could optionally be requested,
and eliminate now unnecessary call.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
The bucket link command was doing a fetch of the entrypoint late
in the link process.  This makes it harder to do "bucket move"
functionality, because then it would need to know the old
bucket late in the process.  The bucketinfo structure has
all the data elements necessary to recreate the endpoint,
so the changes here arrange to just use that data.

In order to write the object it's also necessary to propagate
xattrs.  The only xattr that seems to be present on the endpoint
is "ceph.objclass.version", so that's what this copies out.
It appears that attribute may be set set separately by cls,
so I'm not sure this is actually necessary.  However, the
old code would have written it, so this code preserves
that behavior.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
The existing RGWBucket::link logic changed things incrementally in
order to relink a bucket.  When doing a "bucket move", this is no longer
a good idea - bucket objects must be written to new names which don't
exist, so it is better to create then wholly out of in-memory data.

Also, add != for rgw_bucket - inverse of existing of ==, provides
another option to arrange code to make it more readable.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
This is the base or primitive "bucket move" function.  It handles rewriting
the endpoint and info rados objects for the rgw bucket, plus deleting
the original rados objects that are no longer correct.  It doesn't
handle changing the bucket name; that's in a future commit.  Some of
the changes here will get overwritten by that commit.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
This is the remainder of bucket move function.  Implement bucket "rename"
- use string passed in via '--bucket-new-name' to optinally override
the resultant bucket name when doing bucket link.  This is basically
a slight generalization of changing the tenant of a bucket; the same
operations are required for either.

Further refinements included here are minor improvements to associated
error messages.

This does not change any "tail placement" information in any objects
contained in the bucket.  The bucket name is encoded there, along with
the bucket id, but neither appears to be used, and the existing reshard
logic which changes bucket ids also does not appear to alter that.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Improve and add to documentation for "bucket move" functionality;
including use moving to multi-tenancy and further deprecating bucket-id
which is no longer necessary.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
The previous few commits for rgw_bucket.cc could originally be applied
one at a time resulting in a complete buildable copy of ceph at each step.
Recent independent changes to rgw_bucket.cc have affected the same logic,
so the original commits no longer build.  This commit resyncs things
with master such that the result will build.

I'm preserving the intermediate commits for now, since that's what was
originally backported to jewel, If this causes problems, these commits
should be squashed together; except for backport purposes the intermediate
versions have no other value.

One other change here: omit logic that copied "VERSION_ATTR" to the
new container object.  This should already be provided for elsewhere.

Fixes: http://tracker.ceph.com/issues/35885

Signed-off-by: Marcus Watts <mwatts@redhat.com>
@mdw-at-linuxbox mdw-at-linuxbox force-pushed the wip-master-rgw-bucketmv branch from ff86b3b to e430ab2 Compare April 17, 2019 07:10
@mdw-at-linuxbox
Copy link
Contributor Author

I've pushed a new version against a fresh copy of master, and this copy matches the behavior I see with the corresponding luminous patch. The rebase I did before this had a lot of weirdness, probably a bad copy of master.

@smanjara
Copy link
Contributor

@mdw-at-linuxbox Can you please rebase again?

@stale
Copy link

stale bot commented Aug 9, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Aug 9, 2019
@cbodley
Copy link
Contributor

cbodley commented Aug 9, 2019

as far as i know, everything here merged with #28813

@cbodley cbodley closed this Aug 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants