
python-common: clean-up ServiceSpec.service_id handling#35839

Merged
sebastian-philipp merged 2 commits into ceph:master from mgfritch:cephadm-ignore-mon-mgr-svc-id
Jul 23, 2020

Conversation

@mgfritch (Contributor)

ServiceSpec of type 'mon' and 'mgr' must not have a service_id

Fixes: https://tracker.ceph.com/issues/46175
Signed-off-by: Michael Fritch <mfritch@suse.com>
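The rule this PR enforces can be sketched as a small validation helper. This is only an illustration, not the actual Ceph code (names like `validate_service_id` are assumed); per the commit messages, iscsi, mds, nfs, osd, and rgw require a service_id, while other types such as 'mon' and 'mgr' must not carry one:

```python
# Sketch of the service_id rule (illustrative names, not the real cephadm code):
# some service types require a service_id, while 'mon', 'mgr', etc. must not
# have one at all.
REQUIRE_SERVICE_ID = {'iscsi', 'mds', 'nfs', 'osd', 'rgw'}


class ServiceSpecValidationError(ValueError):
    """Raised when a spec dict violates the service_id rules."""


def validate_service_id(service_type, service_id=None):
    if service_type in REQUIRE_SERVICE_ID:
        if not service_id:
            raise ServiceSpecValidationError(
                'service_id is required for service type %r' % service_type)
    elif service_id:
        raise ServiceSpecValidationError(
            'service of type %r should not contain a service_id' % service_type)
```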

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@mgfritch (Contributor Author)

PR #35838 needs to be merged before tests will pass.

@sebastian-philipp (Contributor) left a comment


What about services already in the cephadm spec store? Do we need a migration?

@mgfritch mgfritch force-pushed the cephadm-ignore-mon-mgr-svc-id branch from 34ce420 to ccf07a7 Compare June 30, 2020 22:16
@mgfritch mgfritch changed the title python-common: ignore service_id for 'mon' and 'mgr' python-common: clean-up ServiceSpec.service_id handling Jun 30, 2020
@mgfritch (Contributor Author)

> What about services already in the cephadm spec store? Do we need a migration?

I'm not sure a migration would be helpful; I've seen two cases thus far:

  1. A single spec (w/ service_id), but unable to map daemon to the spec (orphan)
  2. Two specs (one w/ service_id and another without service_id), where daemon is mapped to spec w/o service_id

In either case, the spec with a service_id would need to be manually removed.

@mgfritch mgfritch requested a review from jschmid1 June 30, 2020 22:24
@mgfritch
Copy link
Contributor Author

@jschmid1 this PR works out aok for OSD/DriveGroups?

@jschmid1 (Contributor)

jschmid1 commented Jul 1, 2020

> @jschmid1 this PR works out aok for OSD/DriveGroups?

Yes, I don't see a problem with this. We have been recommending a service_id for a while now.

@sebastian-philipp (Contributor)

> @jschmid1 this PR works out aok for OSD/DriveGroups?
>
> Yes, I don't see a problem with this. We have been recommending a service_id for a while now.

do we need to migrate existing clusters that have osdspecs w/o id?

@jschmid1 (Contributor)

jschmid1 commented Jul 2, 2020

> do we need to migrate existing clusters that have osdspecs w/o id?

mh.. that brings us back to the question:

"where to raise validation errors?"

Imo we shouldn't always automatically migrate things that have changed. In this case we should make the change, and then ask the user, in some form, to assign a service_id to the spec if necessary. Otherwise the deployment for this spec is blocked.

@sebastian-philipp (Contributor)

> do we need to migrate existing clusters that have osdspecs w/o id?
>
> mh.. that brings us back to the question:
>
> "where to raise validation errors?"
>
> Imo we shouldn't always automatically migrate things that have changed. In this case we should make the change, and then ask the user, in some form, to assign a service_id to the spec if necessary. Otherwise the deployment for this spec is blocked.

sounds like a HEALTH_WARN to me?

@jschmid1 (Contributor)

jschmid1 commented Jul 2, 2020

> sounds like a HEALTH_WARN to me?

maybe even an error, as this would block any new deployment?

@sebastian-philipp (Contributor) left a comment


approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.

@mgfritch (Contributor Author)

mgfritch commented Jul 2, 2020

> approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.

after a quick test, any spec that fails to validate appears to be handled by this block:

    except Exception as e:
        self.mgr.log.warning('unable to load spec for %s: %s' % (
            service_name, e))
        pass

@jschmid1 (Contributor)

jschmid1 commented Jul 3, 2020

> approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.
>
> after a quick test, any spec that fails to validate appears to be handled by this block:
>
>     except Exception as e:
>         self.mgr.log.warning('unable to load spec for %s: %s' % (
>             service_name, e))
>         pass

This would just silently fail :/

We should really implement an error reporting mechanism.
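One hypothetical shape for such a mechanism (all names here are assumptions, not existing cephadm code): record each failed spec load instead of only logging it, so the failures can later be surfaced, e.g. as a health warning, rather than disappearing silently:

```python
import logging


class SpecStore:
    """Sketch: keep failed spec loads around instead of silently dropping them."""

    def __init__(self, log=None):
        self.log = log or logging.getLogger('spec_store')
        self.specs = {}        # service_name -> parsed spec
        self.load_errors = {}  # service_name -> error message

    def load_one(self, service_name, raw, parse):
        """Parse one stored spec; on failure, log AND remember the error."""
        try:
            self.specs[service_name] = parse(raw)
        except Exception as e:
            msg = 'unable to load spec for %s: %s' % (service_name, e)
            self.log.warning(msg)
            self.load_errors[service_name] = msg  # surfaced later, not swallowed

    def health_report(self):
        # Feed this into a health check (e.g. HEALTH_WARN) instead of failing silently.
        return sorted(self.load_errors.values())
```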

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jul 14, 2020
@sebastian-philipp (Contributor)

jenkins test make check

@sebastian-philipp (Contributor)

still getting:

=================================== FAILURES ===================================
_____________________ test_service_name[rgw-s_id-rgw.s_id] _____________________

s_type = 'rgw', s_id = 's_id', s_name = 'rgw.s_id'

    @pytest.mark.parametrize(
        "s_type,s_id,s_name",
        [
            ('mgr', 's_id', 'mgr'),
            ('mon', 's_id', 'mon'),
            ('mds', 's_id', 'mds.s_id'),
            ('rgw', 's_id', 'rgw.s_id'),
            ('nfs', 's_id', 'nfs.s_id'),
            ('iscsi', 's_id', 'iscsi.s_id'),
            ('osd', 's_id', 'osd.s_id'),
        ])
    def test_service_name(s_type, s_id, s_name):
>       spec = ServiceSpec.from_json(_get_dict_spec(s_type, s_id))

ceph/tests/test_service_spec.py:187: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ceph/deployment/service_spec.py:39: in inner
    return method(cls, *args, **kwargs)
ceph/deployment/service_spec.py:483: in from_json
    return _cls._from_json_impl(c)  # type: ignore
ceph/deployment/service_spec.py:495: in _from_json_impl
    _cls = cls(**args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = RGWSpec({}), service_type = 'rgw', service_id = 's_id'
placement = PlacementSpec(hosts=[HostPlacementSpec(hostname='host1', network='1.1.1.1', name='')])
rgw_realm = 's_id', rgw_zone = 'zone', subcluster = None
rgw_frontend_port = None, rgw_frontend_ssl_certificate = None
rgw_frontend_ssl_key = None, unmanaged = False, ssl = False

    def __init__(self,
                 service_type: str = 'rgw',
                 service_id: Optional[str] = None,
                 placement: Optional[PlacementSpec] = None,
                 rgw_realm: Optional[str] = None,
                 rgw_zone: Optional[str] = None,
                 subcluster: Optional[str] = None,
                 rgw_frontend_port: Optional[int] = None,
                 rgw_frontend_ssl_certificate: Optional[List[str]] = None,
                 rgw_frontend_ssl_key: Optional[List[str]] = None,
                 unmanaged: bool = False,
                 ssl: bool = False,
                 ):
        assert service_type == 'rgw', service_type
        if service_id:
            a = service_id.split('.', 2)
            rgw_realm = a[0]
>           rgw_zone = a[1]
E           IndexError: list index out of range

ceph/deployment/service_spec.py:622: IndexError

https://jenkins.ceph.com/job/ceph-pull-requests/55577/consoleFull#-72518471e840cee4-f4a4-4183-81dd-42855615f2c1
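The IndexError comes from unconditionally reading `a[1]` after `service_id.split('.', 2)`, which only works when the id actually contains a dot; the parametrized test passes a bare `'s_id'`. A guarded version of that parsing might look like the following (a sketch only, not necessarily the fix the PR landed; the helper name is invented):

```python
def parse_rgw_service_id(service_id):
    """Split an RGW service_id of the form 'realm.zone' or
    'realm.zone.subcluster'.

    A bare id such as 's_id' has no realm/zone to derive, so return
    (None, None, None) instead of raising IndexError on a[1].
    """
    parts = service_id.split('.', 2)  # at most 3 components
    if len(parts) < 2:
        return None, None, None  # plain id: nothing to derive
    realm, zone = parts[0], parts[1]
    subcluster = parts[2] if len(parts) == 3 else None
    return realm, zone, subcluster
```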

@sebastian-philipp sebastian-philipp removed the wip-swagner-testing My Teuthology tests label Jul 15, 2020
@sebastian-philipp (Contributor)

jenkins test make check

@mgfritch (Contributor Author)

> approve. as long as CephadmOrchestrator.__init__ can load specs that fail to validate.
>
> after a quick test, any spec that fails to validate appears to be handled by this block:
>
>     except Exception as e:
>         self.mgr.log.warning('unable to load spec for %s: %s' % (
>             service_name, e))
>         pass
>
> This would just silently fail :/
>
> We should really implement an error reporting mechanism.

agree 👍

@sebastian-philipp (Contributor)

Hm. can't test this together with #35667

@sebastian-philipp (Contributor)

sebastian-philipp commented Jul 22, 2020

do you also want to fix the documentation under https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification ?
(https://tracker.ceph.com/issues/46377 )

@mgfritch (Contributor Author)

> do you also want to fix the documentation under https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification ?
> (https://tracker.ceph.com/issues/46377 )

done

mgfritch added 2 commits July 22, 2020 16:41
service_id is required for iscsi, mds, nfs, osd, rgw.

any other service_type (mon, mgr, etc.) should not contain a service_id

Fixes: https://tracker.ceph.com/issues/46175
Signed-off-by: Michael Fritch <mfritch@suse.com>
example for deploying multiple specs via yaml was missing the service_id

Fixes: https://tracker.ceph.com/issues/46377
Signed-off-by: Michael Fritch <mfritch@suse.com>
@mgfritch mgfritch force-pushed the cephadm-ignore-mon-mgr-svc-id branch from a1e8642 to 7906460 Compare July 22, 2020 22:42
Contributor
ok that's scary. I'm pretty sure we have existing clusters with multiple MON specs applied already:

service_type: mon
placement:
  count: 5
---
service_type: mon
service_id: mon
placement:
  hosts:
  - host1
  - host2
  - host3

we need to handle this case properly
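One conceivable way to handle the second spec above would be a small migration that strips the stray service_id from stored spec dicts before validation. This is purely a sketch with an invented helper name, and whether to migrate automatically or surface a warning is exactly the open question in this thread:

```python
def migrate_spec_dict(spec):
    """Sketch: drop a service_id from spec dicts of types that must not
    carry one ('mon', 'mgr'), leaving every other spec untouched."""
    if spec.get('service_type') in ('mon', 'mgr') and 'service_id' in spec:
        spec = dict(spec)  # copy; don't mutate the stored version in place
        del spec['service_id']
    return spec
```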

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jul 23, 2020
