
mgr/cephadm: simplify handling for rgw #39877

Merged
sebastian-philipp merged 7 commits into ceph:master from liewegas:cephadm-rgw-simplification on Mar 10, 2021

Conversation

@liewegas (Member) commented Mar 5, 2021

  • radosgw-admin fiddling with zone/realm config is out of scope
  • remove naming constraints
  • keep the --rgw-zone= and --rgw-realm= arguments, but make them optional. If omitted, we get the simple non-multisite rgw behavior with autocreated pools etc.
  • update rook to match

Existing octopus deployments will continue to work: they follow the old naming convention, and we're just allowing new specs to deviate from that convention.
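To make the change concrete, here is a minimal sketch of what an RGW service spec could look like after this change (the service name and placement are hypothetical, and the exact field layout should be checked against the cephadm docs):

```yaml
service_type: rgw
service_id: myrgw        # any name; the realm.zone naming convention is no longer required
placement:
  count: 2
spec:
  # both optional: omit them for the simple non-multisite behavior
  rgw_realm: myrealm
  rgw_zone: myzone
```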

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@ofriedma (Contributor) left a comment

I think we should add an rgw-zonegroup parameter too, so we will be able to create and use a non-default zonegroup name; otherwise LGTM.

@liewegas (Member, Author) commented Mar 7, 2021

I think that we should add rgw-zonegroup parameter too so we will be able to create and use non-default zonegroup name, otherwise LGTM

Since cephadm isn't creating the zone, zonegroup, or realm any more, I don't think adding the argument will serve any purpose. IIUC, there isn't any rgw_zonegroup parameter needed during startup... setting the rgw_realm and rgw_zone options for the newly created service is the only reason we are providing those parameters at this point.

@liewegas (Member, Author) commented Mar 7, 2021

jenkins run make test

@liewegas liewegas force-pushed the cephadm-rgw-simplification branch from 472419b to faef3a4 Compare March 8, 2021 14:27
@liewegas liewegas force-pushed the cephadm-rgw-simplification branch from 3560fda to 80560e6 Compare March 8, 2021 19:48
def validate(self) -> None:
    super(RGWSpec, self).validate()

    if not self.rgw_realm:
A reviewer (Member) commented on this diff:

I think this check is now inverted.
If the user sets a realm and a zone, is it worth checking whether the realm and the zone exist?
And if the user only specifies the realm or the zone, should we use "default" for the missing piece?
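A minimal, self-contained Python sketch of the validation direction discussed here (the class is a stand-in, not the actual RGWSpec from the ceph tree; the both-or-neither rule is an assumption answering the question above):

```python
# A stand-in sketch, not the actual RGWSpec from the ceph source tree.
class RGWSpec:
    def __init__(self, service_id=None, rgw_realm=None, rgw_zone=None):
        self.service_id = service_id
        self.rgw_realm = rgw_realm      # optional after this change
        self.rgw_zone = rgw_zone        # optional after this change

    def validate(self):
        # Omitting both realm and zone yields the simple non-multisite
        # behavior. Requiring both-or-neither is an assumption here,
        # one possible answer to the half-specified-pair question.
        if bool(self.rgw_realm) != bool(self.rgw_zone):
            raise ValueError('set both rgw_realm and rgw_zone, or neither')

    def config_overrides(self):
        # Per the PR description: set the rgw_realm/rgw_zone options for
        # the daemons only when values were actually specified.
        opts = {}
        if self.rgw_realm:
            opts['rgw_realm'] = self.rgw_realm
        if self.rgw_zone:
            opts['rgw_zone'] = self.rgw_zone
        return opts
```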

@jmolmo (Member) left a comment

I think you want to keep the orchestrator as a thin management layer, and this change really goes in that direction. My only concern is that with this approach cephadm does not fully replace other installation tools.

Keeping the orchestrator layer simple helps provide a consistent interface (bare metal, k8s), which is positive. But in my opinion we must work in parallel on tools that make the end user's life easier and, above all, if we change the installation method we must avoid losing functionality compared with previous installation tools.

With this change, I understand that the Ceph dashboard is going to be the only tool provided to install and manage a Ceph cluster. Is that true?

@epuertat (Member) commented Mar 9, 2021

@liewegas I lack the context for this change, as I don't find any tracker/reference linked.

Please correct me if I'm wrong, but as a side effect of this, from now on when ceph-dashboard implements RGW zones/etc. management (which is a goal for Quincy), we'll have to figure out another mechanism/tool providing orchestrator-like CLI access (since radosgw-admin is not available as a MonCommand), and then additionally call the orchestrator to deploy/restart/remove the impacted RGW daemons accordingly. Why not keep zone/zg/realm config changes tightly coupled with their associated service lifecycles, rather than spreading their management across different tools and interfaces?

I don't know about DeepSea, but compared to Ceph-ansible this is opening a feature gap, as it supported multi-site deployment.

@liewegas (Member, Author) commented Mar 9, 2021

@liewegas I lack the context for this change, as I don't find any tracker/reference linked.

Please correct me if I'm wrong, but as a side effect of this, from now on when ceph-dashboard implements RGW zones/etc. management (which is a goal for Quincy), we'll have to figure out another mechanism/tool providing orchestrator-like CLI access (since radosgw-admin is not available as MonCommand), and then additionally call the orchestrator for deploying/restarting/removing the impacted RGW daemons accordingly. Why not keeping zone/zg/realm config changes tightly coupled with their associated service lifecycles, rather than spreading their management across different tools and interfaces?

I don't know about DeepSea, but compared to Ceph-ansible this is opening a feature gap, as it supported multi-site deployment.

I would put it this way: responsibility for fiddling with the zone/realm configuration doesn't belong in cephadm, because realm configuration is very complex and cephadm lacks the knowledge of the users' intentions to do it well. The ceph-ansible support for setting up multisite assumes a particular vanilla kind of deployment, but we need to be able to deploy daemons for more complex multisite configurations.

Alternatives:

  • put a series of more friendly commands in mgr/orchestrator, where it could be leveraged by both rook and cephadm, or
  • create a new 'ceph rgw ...' CLI subnamespace (which, tbh, is where I wish we could move all of radosgw-admin). This would be more analogous to mgr/volumes, which creates a 'ceph fs volume create' command that (as a last step) tells the orchestrator API to start up the MDS daemons.

To get a simplified experience, I think we should start with what the CLI/GUI UX should look like, and then work backwards from there.

The main part I'm worried about is if/how we should allow you to define a multisite realm configuration in an orchestrator .spec file... is that something we should support? If so, then we'll probably have some bit in cephadm that takes the blob describing the multisite config and calls out to generic orchestrator code to go make it so before deploying the services...

@sebastian-philipp (Contributor) commented

The main part I'm worried about is if/how we should allow you to define a multisite realm configuration in an orchestrator .spec file... is that something we should support? If so, then we'll probably have some bit in cephadm that takes the blob describing the multisite config and calls out to generic orchestrator code to go make it so before deploying the services...

If we want to have some form of cluster specification to be able to repeatedly create the cluster, we would need to support this somehow. I think this is supported by ansible, right?

@liewegas (Member, Author) commented Mar 9, 2021

The main part I'm worried about is if/how we should allow you to define a multisite realm configuration in an orchestrator .spec file... is that something we should support? If so, then we'll probably have some bit in cephadm that takes the blob describing the multisite config and calls out to generic orchestrator code to go make it so before deploying the services...

If we want to have some form of cluster specification to be able to repeatedly create the cluster, we would need to support this somehow. I think this is supported by ansible, right?

Yeah. So, maybe there is a module (in mgr/orchestrator somewhere) that knows how to parse a section describing the realm configuration and apply it? Or maybe rgw already has such a way to fully specify the config in one go.

@liewegas (Member, Author) commented Mar 9, 2021

Or, maybe services can have "pre-exec" steps, which are just CLI commands to exec in a shell before the service is created, somewhat analogous to the new config section. Kind of a kludge, but sufficient.
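A hypothetical sketch of what such a "pre-exec" section might look like in a service spec. To be clear, this field does not exist; it is only an illustration of the idea floated above, and the radosgw-admin commands shown would vary with the actual realm layout:

```yaml
service_type: rgw
service_id: myrgw
pre_exec:                      # hypothetical field, not implemented
  - radosgw-admin realm create --rgw-realm=myrealm --default
  - radosgw-admin period update --commit
spec:
  rgw_realm: myrealm
```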

@sebastian-philipp (Contributor) commented

Yeah. So, maybe there is a module (in mgr/orchestrator somewhere) that knows how to parse a section describing the realm configuration and apply it? Or maybe rgw already has such a way to fully specify the config in one go.

This is getting big pretty quickly. https://github.com/rook/rook/blob/master/design/ceph/object/ceph-multisite-overview.md#ceph-multisite-data-model

realms:
  myrealm1: 
    zone_groups:
    - group1
    - group2
    master_zone_group: group1
    pull_from: ...
zone_groups:
  group1: 
    zones:
    - zone1  
    - zone2
    master_zone: zone1
zones:
  zone1:
    pools: ...

This looks pretty unrealistic to build just for cephadm.

It is simpler to consider this out of scope for the orchestrator.  The
user should set up their multisite realms/zones before deploying the
daemons (or the daemons will not start).  In the future we can wrap this
with a more friendly tool, perhaps.

Signed-off-by: Sage Weil <sage@newdream.net>
liewegas added 2 commits March 9, 2021 11:01
- Let users name the rgw service(s) whatever they like
- Make the rgw_zone and rgw_realm arguments optional, so that they can
  omit them and let radosgw start up in the generic single-cluster
  configuration (which, notably, has no realm).
- Continue to set the rgw_realm and rgw_zone options for the daemon(s),
  but only when those values are specified.
- Adjust the CLI accordingly.  Drop the subcluster argument, which was
  only used to generate a service_id.
- Adjust rook. This is actually a simplification and improved mapping onto
  the rook CRD, *except* that for octopus we enforced the realm.zone
  naming *and* realm==zone.  I doubt a single user actually did this,
  so it is not worth dealing with, but we need a special case for
  when there is a '.' in the service name.

Signed-off-by: Sage Weil <sage@newdream.net>
This is just so we can load up a stored spec after upgrade.  We'll silently
drop it, since we have the service_id, and this was only used to generate
that anyway.

Signed-off-by: Sage Weil <sage@newdream.net>
@github-actions bot commented Mar 9, 2021

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

liewegas added 4 commits March 9, 2021 14:29
Signed-off-by: Sage Weil <sage@newdream.net>
Note that cephadm.py will no longer do anything with rgw realms and
zones.  That means that the setup of rgw roles here is only useful
for the default zone and a non-multisite config.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
This is for backward compatibility: an octopus spec yaml can still be
applied to an existing cluster.  Note that it might not work on a new
cluster, since cephadm no longer tries to create the realm or zone if
they don't exist.

Signed-off-by: Sage Weil <sage@newdream.net>
@liewegas liewegas force-pushed the cephadm-rgw-simplification branch from 3560fda to cca7391 Compare March 9, 2021 19:29
@sebastian-philipp (Contributor) commented

@ofriedma want to review this?

@ofriedma (Contributor) commented

LGTM

@ofriedma ofriedma self-requested a review March 10, 2021 11:36
@sebastian-philipp (Contributor) commented Mar 10, 2021

sage: SebastianW: #39877 is passing tests

@sebastian-philipp sebastian-philipp merged commit b6a2dfc into ceph:master Mar 10, 2021
@sebastian-philipp (Contributor) commented

Follow-ups:

alfonsomthd added a commit to liewegas/ceph that referenced this pull request Aug 9, 2021
…ce spec.

Align rgw service id pattern with cephadm: ceph#39877
  - Update rgw pattern to allow service id for non-multisite config.
  - Extract realm and zone from service id (when detected) and add them to the service spec.

Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
alfonsomthd added a commit to liewegas/ceph that referenced this pull request Aug 10, 2021
…ce spec.

Align rgw service id pattern with cephadm: ceph#39877
  - Update rgw pattern to allow service id for non-multisite config.
  - Extract realm and zone from service id (when detected) and add them to the service spec.

Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
aaSharma14 pushed a commit to rhcs-dashboard/ceph that referenced this pull request Aug 25, 2021
…ce spec.

Align rgw service id pattern with cephadm: ceph#39877
  - Update rgw pattern to allow service id for non-multisite config.
  - Extract realm and zone from service id (when detected) and add them to the service spec.

Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
sebastian-philipp pushed a commit to sebastian-philipp/ceph that referenced this pull request Sep 2, 2021
…ce spec.

Align rgw service id pattern with cephadm: ceph#39877
  - Update rgw pattern to allow service id for non-multisite config.
  - Extract realm and zone from service id (when detected) and add them to the service spec.

Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 0575844)
