cephadm: error trying to get ceph auth entry for crash daemon #35274
sebastian-philipp merged 1 commit into ceph:master
This would create chaos, with existing daemons using the bare name and new daemons using the FQDN. I think it is very late to change this for octopus. Do you think we can also fix this by improving ceph/src/pybind/mgr/cephadm/utils.py (lines 18 to 30 in 9e1e025)?
I think that not using the real name of the host is going to cause more problems in the future. Besides that, it requires a "consensus" about when to use the real name and when to use the abbreviated name... another source of problems.
I don't know. Ceph already prefers bare host names. Imagine you simply want to change the domain of a cluster: that would be extremely complicated if we added the domain to the daemon names.
I have the feeling that a change as radical as creating new daemons with a different naming scheme requires some more thought. Might be something to do for pacific?
Avoiding the error with a less-than-wonderful trick. (In Spanish we call this kind of thing a "ñapa".)
src/pybind/mgr/cephadm/module.py
Outdated
| partial_auth_key = daemon_id | ||
| if daemon_type == 'crash': # | ||
| partial_auth_key = host |
can you move this code into utils.name_to_auth_entity?
Done.
Note: It seems that mgr also changed its auth key.
|
Solved a couple of tricky collateral "mypy" issues. |
ping @jmolmo |
src/pybind/mgr/cephadm/module.py
Outdated
| host, # type str | ||
| entity, # type str | ||
| command, # type str | ||
| args, # type List[str] | ||
| addr = "", # type Optional[str] | ||
| stdin = "", # type Optional[str] | ||
| no_fsid = False, # type Optional[bool] | ||
| error_ok = False, # type Optional[bool] | ||
| image = False, # type Optional[str] | ||
| env_vars= None # type Optional[List[str]] |
why change this to the old style type annotation?
Because I didn't know which annotation style is preferred... from your comment I assume it is the "new" annotation style. I will change it.
| def name_to_auth_entity(daemon_type, # type: str | ||
| daemon_id, # type: str | ||
| host = "" # type Optional[str] = "" | ||
| ): | ||
| """ | ||
| Map from daemon names to ceph entity names (as seen in config) | ||
| Map from daemon names/host to ceph entity names (as seen in config) | ||
| """ | ||
| daemon_type = name.split('.', 1)[0] | ||
| if daemon_type in ['rgw', 'rbd-mirror', 'nfs', 'crash', 'iscsi']: | ||
| return 'client.' + name | ||
| if daemon_type in ['rgw', 'rbd-mirror', 'nfs', "iscsi"]: | ||
| return 'client.' + daemon_type + "." + daemon_id | ||
| elif daemon_type == 'crash': | ||
| return 'client.' + daemon_type + "." + host | ||
| elif daemon_type == 'mon': | ||
| return 'mon.' | ||
| elif daemon_type in ['osd', 'mds', 'mgr', 'client']: | ||
| return name | ||
| elif daemon_type == 'mgr': | ||
| return daemon_type + "." + daemon_id | ||
| elif daemon_type in ['osd', 'mds', 'client']: | ||
| return daemon_type + "." + daemon_id | ||
| else: | ||
| raise OrchestratorError("unknown auth entity name") |
you want to add a pytest for this?
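A minimal sketch of what such a test could look like, assuming the three-argument form of `name_to_auth_entity` shown in the diff above. The function is re-declared here so the example is self-contained, `OrchestratorError` is a local stand-in for the orchestrator's exception class, and the host names are taken from the examples in this PR. It is an illustration, not the test that was actually merged.

```python
class OrchestratorError(Exception):
    """Local stand-in for the orchestrator exception (assumption for this sketch)."""


def name_to_auth_entity(daemon_type: str, daemon_id: str, host: str = "") -> str:
    """Map a daemon type/id (and the host, for crash) to a ceph auth entity name."""
    if daemon_type in ['rgw', 'rbd-mirror', 'nfs', 'iscsi']:
        return 'client.' + daemon_type + '.' + daemon_id
    elif daemon_type == 'crash':
        # The crash key is built from the host name, not from the daemon id.
        return 'client.' + daemon_type + '.' + host
    elif daemon_type == 'mon':
        return 'mon.'
    elif daemon_type in ['mgr', 'osd', 'mds', 'client']:
        return daemon_type + '.' + daemon_id
    else:
        raise OrchestratorError("unknown auth entity name")


def test_crash_uses_host_name():
    fqdn = 'ceph-node-00.cephlab.com'
    entity = name_to_auth_entity('crash', 'ceph-node-00', fqdn)
    assert entity == 'client.crash.ceph-node-00.cephlab.com'


def test_other_daemon_types():
    assert name_to_auth_entity('rgw', 'foo') == 'client.rgw.foo'
    assert name_to_auth_entity('mon', 'a') == 'mon.'
    assert name_to_auth_entity('mgr', 'x') == 'mgr.x'
    assert name_to_auth_entity('osd', '0') == 'osd.0'


def test_unknown_type_raises():
    try:
        name_to_auth_entity('bogus', 'x')
    except OrchestratorError:
        return
    raise AssertionError("expected OrchestratorError")
```

The test functions use plain `assert` statements, so pytest can collect them without any extra imports.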
If your cluster has nodes with a . in the name. This will happen. Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Failures:
| raise OrchestratorError('no hosts defined') | ||
| out, err, code = self._run_cephadm( | ||
| host, None, 'pull', [], | ||
| host, '', 'pull', [], |
unfortunately, the upgrade test totally failed: http://pulpito.ceph.com/swagner-2020-06-23_11:55:14-rados:cephadm-wip-swagner-testing-2020-06-23-1057-distro-basic-smithi/5172323/
I'd like to verify this PR in a new run.
jenkins test make check |
This happens if your cluster has nodes with a "." in the name, which is the case if your hosts use an FQDN.
It seems that we changed the name of the host in the get_unique_name function... Maybe I am missing something, probably, but why did we need to change the name of the host?
I have removed the "problematic lines"
Fixes: https://tracker.ceph.com/issues/45726
Details:
I extracted the line where we try to get the auth details for the crash daemon and executed it on its own:
Checking which entries we have with "crash":
So it seems that we are searching with the wrong key (client.crash.ceph-node-00) instead of the valid key (client.crash.ceph-node-00.cephlab.com).
Digging into the code shows that we need "ename" in the daemon_id...
and the daemon_id is wrong ...
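The mismatch described above can be illustrated with a small sketch. It assumes the short name is produced by cutting the host name at the first dot. The `strip_domain` helper and the `existing_keys` set are hypothetical and only stand in for the real lookup against the cluster's auth entries; the host name is the one quoted in the details above.

```python
def strip_domain(host: str) -> str:
    """Hypothetical helper: keep only the part of the host name before the first dot."""
    return host.split('.')[0]


host = 'ceph-node-00.cephlab.com'

# Hypothetical stand-in for the auth entries known to the cluster:
# the crash key was created with the full host name.
existing_keys = {'client.crash.' + host}

# Key built from the shortened name, as in the failing lookup.
looked_up = 'client.crash.' + strip_domain(host)

# The shortened key is not among the existing entries, so the lookup fails.
found = looked_up in existing_keys
print(looked_up, found)
```

Building the key from the unmodified host name gives `client.crash.ceph-node-00.cephlab.com`, which matches the existing entry.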