
python-common: clean-up ServiceSpec.service_id handling#35839

Merged
sebastian-philipp merged 2 commits into ceph:master from mgfritch:cephadm-ignore-mon-mgr-svc-id
Jul 23, 2020

Conversation

@mgfritch (Contributor)

ServiceSpec of type 'mon' and 'mgr' must not have a service_id

Fixes: https://tracker.ceph.com/issues/46175
Signed-off-by: Michael Fritch <mfritch@suse.com>
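The rule this PR enforces can be sketched as a small validation helper. This is only an illustration, not the actual Ceph code (names like `validate_service_id` are assumed); per the commit messages, iscsi, mds, nfs, osd, and rgw require a service_id, while other types such as 'mon' and 'mgr' must not carry one:

```python
# Sketch of the service_id rule (illustrative names, not the real cephadm code):
# some service types require a service_id, while 'mon', 'mgr', etc. must not
# have one at all.
REQUIRE_SERVICE_ID = {'iscsi', 'mds', 'nfs', 'osd', 'rgw'}


class ServiceSpecValidationError(ValueError):
    """Raised when a spec dict violates the service_id rules."""


def validate_service_id(service_type, service_id=None):
    if service_type in REQUIRE_SERVICE_ID:
        if not service_id:
            raise ServiceSpecValidationError(
                'service_id is required for service type %r' % service_type)
    elif service_id:
        raise ServiceSpecValidationError(
            'service of type %r should not contain a service_id' % service_type)
```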

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@mgfritch (Contributor Author)

PR #35838 needs to be merged before tests will pass.

@sebastian-philipp (Contributor) left a comment


What about services already in the cephadm spec store? Do we need a migration?

@mgfritch mgfritch force-pushed the cephadm-ignore-mon-mgr-svc-id branch from 34ce420 to ccf07a7 Compare June 30, 2020 22:16
@mgfritch mgfritch changed the title python-common: ignore service_id for 'mon' and 'mgr' python-common: clean-up ServiceSpec.service_id handling Jun 30, 2020
@mgfritch (Contributor Author)

> What about services already in the cephadm spec store? Do we need a migration?

I'm not sure a migration would be helpful; I've seen two cases thus far:

  1. A single spec (w/ service_id), but unable to map daemon to the spec (orphan)
  2. Two specs (one w/ service_id and another without service_id), where daemon is mapped to spec w/o service_id

In either case, the spec with a service_id would need to be manually removed.

@mgfritch mgfritch requested a review from jschmid1 June 30, 2020 22:24
@mgfritch
Copy link
Contributor Author

@jschmid1 this PR works out aok for OSD/DriveGroups?

@jschmid1 (Contributor)

jschmid1 commented Jul 1, 2020

> @jschmid1 this PR works out aok for OSD/DriveGroups?

Yes, I don't see a problem with this. We have been recommending a service_id for a while now.

@sebastian-philipp (Contributor)

> @jschmid1 this PR works out aok for OSD/DriveGroups?
>
> Yes, I don't see a problem with this. We have been recommending a service_id for a while now.

do we need to migrate existing clusters that have osdspecs w/o id?

@jschmid1 (Contributor)

jschmid1 commented Jul 2, 2020

> do we need to migrate existing clusters that have osdspecs w/o id?

mh.. that brings us back to the question:

"where to raise validation errors?"

Imo we shouldn't always automatically migrate things that have changed. In this case we should make the change, and then ask the user, in some form, to assign a service_id to the spec if necessary. Otherwise the deployment for this spec is blocked.

@sebastian-philipp (Contributor)

> do we need to migrate existing clusters that have osdspecs w/o id?
>
> mh.. that brings us back to the question:
>
> "where to raise validation errors?"
>
> Imo we shouldn't always automatically migrate things that have changed. In this case we should make the change, and then ask the user, in some form, to assign a service_id to the spec if necessary. Otherwise the deployment for this spec is blocked.

sounds like a HEALTH_WARN to me?

@jschmid1 (Contributor)

jschmid1 commented Jul 2, 2020

> sounds like a HEALTH_WARN to me?

maybe even an error, as this would block any new deployment?

@sebastian-philipp (Contributor) left a comment


approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.

@mgfritch (Contributor Author)

mgfritch commented Jul 2, 2020

> approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.

after a quick test, any spec that fails to validate appears to be handled by this block:

    except Exception as e:
        self.mgr.log.warning('unable to load spec for %s: %s' % (
            service_name, e))
        pass

@jschmid1 (Contributor)

jschmid1 commented Jul 3, 2020

> approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.
>
> after a quick test, any spec that fails to validate appears to be handled by this block:
>
>     except Exception as e:
>         self.mgr.log.warning('unable to load spec for %s: %s' % (
>             service_name, e))
>         pass

This would just silently fail :/

We should really implement an error reporting mechanism.
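One hypothetical shape for such a mechanism (all names here are assumptions, not existing cephadm code): record each failed spec load instead of only logging it, so the failures can later be surfaced, e.g. as a health warning, rather than disappearing silently:

```python
import logging


class SpecStore:
    """Sketch: keep failed spec loads around instead of silently dropping them."""

    def __init__(self, log=None):
        self.log = log or logging.getLogger('spec_store')
        self.specs = {}        # service_name -> parsed spec
        self.load_errors = {}  # service_name -> error message

    def load_one(self, service_name, raw, parse):
        """Parse one stored spec; on failure, log AND remember the error."""
        try:
            self.specs[service_name] = parse(raw)
        except Exception as e:
            msg = 'unable to load spec for %s: %s' % (service_name, e)
            self.log.warning(msg)
            self.load_errors[service_name] = msg  # surfaced later, not swallowed

    def health_report(self):
        # Feed this into a health check (e.g. HEALTH_WARN) instead of failing silently.
        return sorted(self.load_errors.values())
```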

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jul 14, 2020
@sebastian-philipp (Contributor)

jenkins test make check

@sebastian-philipp (Contributor)

still getting:

=================================== FAILURES ===================================
_____________________ test_service_name[rgw-s_id-rgw.s_id] _____________________

s_type = 'rgw', s_id = 's_id', s_name = 'rgw.s_id'

    @pytest.mark.parametrize(
        "s_type,s_id,s_name",
        [
            ('mgr', 's_id', 'mgr'),
            ('mon', 's_id', 'mon'),
            ('mds', 's_id', 'mds.s_id'),
            ('rgw', 's_id', 'rgw.s_id'),
            ('nfs', 's_id', 'nfs.s_id'),
            ('iscsi', 's_id', 'iscsi.s_id'),
            ('osd', 's_id', 'osd.s_id'),
        ])
    def test_service_name(s_type, s_id, s_name):
>       spec = ServiceSpec.from_json(_get_dict_spec(s_type, s_id))

ceph/tests/test_service_spec.py:187: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ceph/deployment/service_spec.py:39: in inner
    return method(cls, *args, **kwargs)
ceph/deployment/service_spec.py:483: in from_json
    return _cls._from_json_impl(c)  # type: ignore
ceph/deployment/service_spec.py:495: in _from_json_impl
    _cls = cls(**args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = RGWSpec({}), service_type = 'rgw', service_id = 's_id'
placement = PlacementSpec(hosts=[HostPlacementSpec(hostname='host1', network='1.1.1.1', name='')])
rgw_realm = 's_id', rgw_zone = 'zone', subcluster = None
rgw_frontend_port = None, rgw_frontend_ssl_certificate = None
rgw_frontend_ssl_key = None, unmanaged = False, ssl = False

    def __init__(self,
                 service_type: str = 'rgw',
                 service_id: Optional[str] = None,
                 placement: Optional[PlacementSpec] = None,
                 rgw_realm: Optional[str] = None,
                 rgw_zone: Optional[str] = None,
                 subcluster: Optional[str] = None,
                 rgw_frontend_port: Optional[int] = None,
                 rgw_frontend_ssl_certificate: Optional[List[str]] = None,
                 rgw_frontend_ssl_key: Optional[List[str]] = None,
                 unmanaged: bool = False,
                 ssl: bool = False,
                 ):
        assert service_type == 'rgw', service_type
        if service_id:
            a = service_id.split('.', 2)
            rgw_realm = a[0]
>           rgw_zone = a[1]
E           IndexError: list index out of range

ceph/deployment/service_spec.py:622: IndexError

https://jenkins.ceph.com/job/ceph-pull-requests/55577/consoleFull#-72518471e840cee4-f4a4-4183-81dd-42855615f2c1
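The IndexError comes from unconditionally reading `a[1]` after `service_id.split('.', 2)`, which only works when the id actually contains a dot; the parametrized test passes a bare `'s_id'`. A guarded version of that parsing might look like the following (a sketch only, not necessarily the fix the PR landed; the helper name is invented):

```python
def parse_rgw_service_id(service_id):
    """Split an RGW service_id of the form 'realm.zone' or
    'realm.zone.subcluster'.

    A bare id such as 's_id' has no realm/zone to derive, so return
    (None, None, None) instead of raising IndexError on a[1].
    """
    parts = service_id.split('.', 2)  # at most 3 components
    if len(parts) < 2:
        return None, None, None  # plain id: nothing to derive
    realm, zone = parts[0], parts[1]
    subcluster = parts[2] if len(parts) == 3 else None
    return realm, zone, subcluster
```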

@sebastian-philipp sebastian-philipp removed the wip-swagner-testing My Teuthology tests label Jul 15, 2020
@sebastian-philipp (Contributor)

jenkins test make check

@mgfritch (Contributor Author)

> approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.
>
> after a quick test, any spec that fails to validate appears to be handled by this block:
>
>     except Exception as e:
>         self.mgr.log.warning('unable to load spec for %s: %s' % (
>             service_name, e))
>         pass
>
> This would just silently fail :/
>
> We should really implement an error reporting mechanism.

agree 👍

@sebastian-philipp (Contributor)

Hm. can't test this together with #35667

@sebastian-philipp (Contributor)

sebastian-philipp commented Jul 22, 2020

do you also want to fix the documentation under https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification ?
(https://tracker.ceph.com/issues/46377 )

@mgfritch (Contributor Author)

> do you also want to fix the documentation under https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification ?
> (https://tracker.ceph.com/issues/46377 )

done

mgfritch added 2 commits July 22, 2020 16:41
service_id is required for iscsi, mds, nfs, osd, rgw.

any other service_type (mon, mgr, etc.) should not contain a service_id

Fixes: https://tracker.ceph.com/issues/46175
Signed-off-by: Michael Fritch <mfritch@suse.com>
example for deploying multiple specs via yaml was missing the service_id

Fixes: https://tracker.ceph.com/issues/46377
Signed-off-by: Michael Fritch <mfritch@suse.com>
@mgfritch mgfritch force-pushed the cephadm-ignore-mon-mgr-svc-id branch from a1e8642 to 7906460 Compare July 22, 2020 22:42
Contributor
ok that's scary. I'm pretty sure we have existing clusters with multiple MON specs applied already:

service_type: mon
placement:
  count: 5
---
service_type: mon
service_id: mon
placement:
  hosts:
  - host1
  - host2
  - host3

we need to handle this case properly
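One conceivable way to handle the second spec above would be a small migration that strips the stray service_id from stored spec dicts before validation. This is purely a sketch with an invented helper name, and whether to migrate automatically or surface a warning is exactly the open question in this thread:

```python
def migrate_spec_dict(spec):
    """Sketch: drop a service_id from spec dicts of types that must not
    carry one ('mon', 'mgr'), leaving every other spec untouched."""
    if spec.get('service_type') in ('mon', 'mgr') and 'service_id' in spec:
        spec = dict(spec)  # copy; don't mutate the stored version in place
        del spec['service_id']
    return spec
```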

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jul 23, 2020
