mgr/cephadm: Use mon command to set grafana url in the dashboard#34592

Closed
sebastian-philipp wants to merge 3 commits into ceph:master from sebastian-philipp:cephadm-dont-break-dashbaord-setting
Conversation

@sebastian-philipp
Contributor

Fixes: https://tracker.ceph.com/issues/44877
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Daniel-Pivonka previously approved these changes Apr 16, 2020
@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Apr 16, 2020
@sebastian-philipp sebastian-philipp removed the wip-swagner-testing My Teuthology tests label Apr 17, 2020
Member

@epuertat epuertat left a comment

If this comes from a user requesting:

The dashboard has grafana-api-url set to https://host:3000. I want to set it to https://host.domain.tld:3000, but my change gets revoked every time!

... can't we simply make cephadm FQDN aware? This applies not only to Grafana, but also to every user-facing web server (Prometheus, Alertmanager, RGW, HAProxy...).

- url = 'https://%s:3000' % (self.inventory[host].get('addr', host))
+ url = 'https://%s:3000' % (socket.getfqdn(self.inventory[host].get('addr', host)))

That should try to reverse-lookup that IP/hostname into an FQDN. If that doesn't work properly, it's probably because some networking setting is missing. Alternatively, the inventory lookup could also resolve to FQDNs, which might allow users to set up complex networking with different domains.
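
A minimal sketch of the lookup suggested above, not the code from this PR. The `inventory` shape and the `resolve` parameter are assumptions made for illustration; `resolve` defaults to `socket.getfqdn` and is injectable so the behaviour can be checked without DNS.

```python
import socket
from typing import Callable, Dict


def grafana_url(inventory: Dict[str, Dict[str, str]],
                host: str,
                resolve: Callable[[str], str] = socket.getfqdn,
                port: int = 3000) -> str:
    """Build the Grafana URL from the inventory address, resolved to an FQDN.

    `inventory` is assumed to map host names to dicts with an optional
    'addr' key, mirroring self.inventory[host].get('addr', host) above.
    """
    addr = inventory.get(host, {}).get('addr', host)
    # socket.getfqdn() returns its argument unchanged if no FQDN is found,
    # so a missing reverse record degrades to the current behaviour.
    return 'https://%s:%d' % (resolve(addr), port)
```

If reverse resolution is not configured, `socket.getfqdn` falls back to the name it was given, so the URL is no worse than the one built today.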

@jjahns

jjahns commented Apr 23, 2020

Hi,

Is there any chance that this will get fixed? We have to have the FQDN to access Grafana, as we have certs with a CN containing FQDN, and our users do not have hosts entries on their machines to access the system containing the Grafana container. This is a big deal for operations staff, who have already flagged this as a non-starter.

@jjahns

jjahns commented Apr 24, 2020

Hi,

Is there any chance that this will get fixed? We have to have the FQDN to access Grafana, as we have certs with a CN containing FQDN, and our users do not have hosts entries on their machines to access the system containing the Grafana container. This is a big deal for operations staff, who have already flagged this as a non-starter.

To add context to this - if you use cephadm --bootstrap, and then FQDN, the grafana_api_url is set to the FQDN, but this is less than ideal, as there are some weird conditions that pop up with the list of services in the dashboard (i.e. any further hosts added only show the mgr daemon in the host list, and mons do not show up).

However, if you don't skip the monitoring components when bootstrapping, it will set the API URL to https://host:3000 (not FQDN).

This needs to be fixed if Grafana is going to be usable in the dashboard.

@sebastian-philipp sebastian-philipp force-pushed the cephadm-dont-break-dashbaord-setting branch from f7a7757 to 1fce1de Compare April 24, 2020 08:30
@sebastian-philipp sebastian-philipp force-pushed the cephadm-dont-break-dashbaord-setting branch 2 times, most recently from 0cac8ea to 2957e98 Compare April 24, 2020 08:34
@sebastian-philipp
Contributor Author

@ceph/dashboard I'm agreeing with Ernesto that the cleanest API is the Mon Command API. If that's ok for you, we should use that from now on.

@jjahns

jjahns commented Apr 24, 2020

Actually, here is the workaround I utilized on Octopus:
Bootstrap with --skip-monitoring-stack

cephadm shell
ceph orch host set-addr {hostname} {hostname.fqdn}
ceph orch apply grafana {hostname}
ceph orch apply alertmanager {hostname}
ceph orch apply node-exporter {hostname}
ceph orch apply prometheus {hostname}

Tests in our environment result in the FQDN being used for GRAFANA_API_URL, which is what we needed for cert compliance and to give clients outside the environment access to the monitoring dashboards through the Ceph dashboard.

I think the permanent solution needs to be a couple of things:

  • Documentation
  • Allow cephadm to specify the address of the host similar to orchestrator during bootstrap operations, so the monitoring stack doesn't need to be installed afterwards

Member

@epuertat epuertat left a comment

The code looks good (left some comments over there). Thanks for taking into account my suggestions!

Comment on lines +1669 to +1639
"""
Make sure, we have sane hostname.
"""
Member

Maybe it's just me, but it took me a while to understand this function. Perhaps some comments would help.

So this results in cephadm either allowing all hosts as FQDN or as short hostnames? Is it that?

Wouldn't it be better to make this explicit? For example, FQDN would require '--fqdn' flag, otherwise short hostnames is assumed. I'm not much of a fan of validation rules changing implicitly.

Contributor Author

Wouldn't it be better to make this explicit? For example, FQDN would require '--fqdn' flag, otherwise short hostnames is assumed. I'm not much of a fan of validation rules changing implicitly.

that would require

if '.' in hostname and not args.allow_fqdn_hostname:

to be moved into ceph itself. might be something for a follow-up pr
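
A rough sketch of what an explicit rule could look like, assuming a boolean that mirrors `args.allow_fqdn_hostname` from the line quoted above. The function name and error text are hypothetical and not taken from cephadm.

```python
def validate_hostname(hostname: str, allow_fqdn: bool = False) -> str:
    """Accept bare host names by default; accept FQDNs only when asked to.

    `allow_fqdn` stands in for args.allow_fqdn_hostname; everything else
    here is illustrative.
    """
    if not hostname:
        raise ValueError('hostname must not be empty')
    if '.' in hostname and not allow_fqdn:
        raise ValueError(
            'hostname %r looks like an FQDN; pass allow_fqdn=True to accept it'
            % hostname)
    return hostname
```

With this shape the validation rule never changes implicitly: the caller states up front which form of host name is expected.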

Use Dashboard's {get,set}-grafana-api-url

Fixes: https://tracker.ceph.com/issues/44877

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
@sebastian-philipp sebastian-philipp force-pushed the cephadm-dont-break-dashbaord-setting branch from 2957e98 to 9c8d900 Compare May 5, 2020 13:19
@sebastian-philipp sebastian-philipp changed the title from "mgr: Don't break dashboard's grafana url, if set by cephadm" to "mgr/cephadm: Use mon command to set grafana url in the dashboard" May 7, 2020
@sebastian-philipp
Contributor Author

2020-05-08T14:35:26.215 INFO:ceph.mgr.x.smithi073.stdout:May 08 14:35:25 smithi073 bash[21868]: Traceback (most recent call last):
2020-05-08T14:35:26.215 INFO:ceph.mgr.x.smithi073.stdout:May 08 14:35:25 smithi073 bash[21868]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1888, in _check_daemons
2020-05-08T14:35:26.215 INFO:ceph.mgr.x.smithi073.stdout:May 08 14:35:25 smithi073 bash[21868]:     ret, current_url, err = self.check_mon_command({"prefix": "get-grafana-api-url"})
2020-05-08T14:35:26.216 INFO:ceph.mgr.x.smithi073.stdout:May 08 14:35:25 smithi073 bash[21868]:   File "/usr/share/ceph/mgr/mgr_module.py", line 1096, in check_mon_command
2020-05-08T14:35:26.216 INFO:ceph.mgr.x.smithi073.stdout:May 08 14:35:25 smithi073 bash[21868]:     raise MonCommandFailed(f'{cmd_dict["prefix"]} failed: {r.stderr}')
2020-05-08T14:35:26.216 INFO:ceph.mgr.x.smithi073.stdout:May 08 14:35:25 smithi073 bash[21868]: mgr_module.MonCommandFailed: get-grafana-api-url failed: command not known

http://qa-proxy.ceph.com/teuthology/swagner-2020-05-08_13:49:07-rados-wip-swagner-testing-2020-05-08-1133-distro-basic-smithi/5034678/teuthology.log
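
One way to tolerate the failure in the traceback above is sketched here, under several loudly stated assumptions: `MonCommandFailed` and `check_mon_command` are local stand-ins for the `mgr_module` versions, the command prefixes and the `value` argument name are guesses passed in as parameters, and "leave an already-set URL alone" is just one possible policy, not necessarily what this PR implements.

```python
from typing import Callable, Dict, Tuple

# Stand-in for a function that sends a command and returns (ret, stdout, stderr).
MonCommand = Callable[[Dict[str, str]], Tuple[int, str, str]]


class MonCommandFailed(RuntimeError):
    """Local stand-in for mgr_module.MonCommandFailed."""


def check_mon_command(mon_command: MonCommand,
                      cmd: Dict[str, str]) -> Tuple[int, str, str]:
    """Run a command and raise if it returns a non-zero code."""
    ret, out, err = mon_command(cmd)
    if ret != 0:
        raise MonCommandFailed('%s failed: %s' % (cmd['prefix'], err))
    return ret, out, err


def ensure_grafana_url(mon_command: MonCommand,
                       url: str,
                       get_prefix: str = 'dashboard get-grafana-api-url',
                       set_prefix: str = 'dashboard set-grafana-api-url') -> bool:
    """Set the URL only if none is configured. Return True if it was set.

    Both prefixes and the 'value' key are assumptions for illustration.
    """
    try:
        _, current, _ = check_mon_command(mon_command, {'prefix': get_prefix})
    except MonCommandFailed:
        # e.g. "command not known": the dashboard may be disabled, so skip.
        return False
    if current.strip():
        # A URL is already configured (possibly by the user): do not revert it.
        return False
    check_mon_command(mon_command, {'prefix': set_prefix, 'value': url})
    return True
```

Catching the exception keeps a periodic check such as `_check_daemons` from aborting when the command is unavailable.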

@sebastian-philipp
Contributor Author

it's up to @ceph/dashboard to allow users to access Grafana.

@epuertat
Member

epuertat commented Jun 2, 2020

it's up to @ceph/dashboard to allow users to access Grafana.

Hey, @sebastian-philipp , why this change of mind?

@sebastian-philipp
Contributor Author

Mainly, cause (1) you took the responsibility to make this work and (2) I don't have any time to work on this.

@sebastian-philipp
Contributor Author

Just note that the current workaround of specifying the client's hostname of grafana in cephadm is wrong: cephadm prefers bare host names. If users need to access grafana using the FQDN, it's the dashboard's responsibility to make this possible.

@epuertat
Member

epuertat commented Jun 2, 2020

So, this means no FQDN support at all, including for RGW or HAProxy, right?

@sebastian-philipp
Contributor Author

So, this means no FQDN support at all, including for RGW or HAProxy, right?

this means that dashboard users have different requirements than cephadm.

@sebastian-philipp
Contributor Author

e.g.

  1. I'm setting up my cluster using hosts in /etc/hosts
  2. then I'm getting a signed certificate for my grafana instance using an FQDN
  3. and now I need to access grafana in the dashboard, without messing with cephadm.

@sebastian-philipp
Contributor Author

e.g.

  1. I'm setting up my cluster using hosts in /etc/hosts
  2. then I'm getting a signed certificate for my grafana instance using an FQDN
  3. and now I need to access grafana in the dashboard, without messing with cephadm.

Telling users to create a new Ceph cluster, just because they got a certificate for grafana, just isn't going to work.

@epuertat
Member

epuertat commented Jun 2, 2020

Just note that the current workaround of specifying the client's hostname of grafana in cephadm is wrong: cephadm prefers bare host names. If users need to access grafana using the FQDN, it's the dashboard's responsibility to make this possible.

As I sometimes have issues understanding what cephadm is and what is not, I simply tried and replaced cephadm in that sentence with something I know better (ceph-ansible, k8s) and the result doesn't make sense to me. How's it possible that the configuration management tool is not responsible for something as essential as dealing with FQDNs and setting, where appropriate, either the FQDN or the short hostname...

@sebastian-philipp
Contributor Author

As I sometimes have issues understanding what cephadm is and what is not, I simply tried and replaced cephadm in that sentence with something I know better (ceph-ansible, k8s) and the result doesn't make sense to me. How's it possible that the configuration management tool is not responsible for something as essential as dealing with FQDNs and setting, where appropriate, either the FQDN or the short hostname...

Ok, here are the gory details:

Mainly because cephadm has different requirements and demands than ceph-ansible. cephadm has very minimal requirements when it comes to resolving host names etc. Cephadm executes root@host where host can be resolved in four different ways:

  • a custom ssh config resolving the name to an IP
  • via an externally maintained /etc/hosts
  • via ceph orch host add <hostname> <IP>
  • DNS

Second, Ceph itself uses hostname to determine the name of the host and, in addition, cephadm demands that the name of the host given via ceph orch host add equals hostname on the remote host. Otherwise cephadm can't be sure that the host names returned by ceph * metadata match the hosts known to cephadm.

Third, it turns out, there are two valid ways to set hostname:

  1. bare:
     • make hostname return the bare host name
     • make hostname -f return the FQDN
  2. full:
     • make hostname return the FQDN
     • make hostname -s return the bare host name

quoting man hostname:

THE FQDN
The FQDN (Fully Qualified Domain Name) of the system is the name that the resolver(3) returns for the host name, such as, ursula.example.com. It is usually the hostname followed by the DNS domain name (the part after the first dot). You can check the FQDN using hostname --fqdn or the domain name using dnsdomainname.

  You cannot change the FQDN with hostname or dnsdomainname.

  The recommended method of setting the FQDN is to make the hostname be an alias for the fully qualified name using /etc/hosts, DNS, or NIS. For example, if the hostname was "ursula", one might have a line in /etc/hosts which reads

         127.0.1.1    ursula.example.com ursula

Which means, man hostname recommends hostname to return the bare host name. Which means that Ceph will return the bare hostnames in ceph * metadata. Which means that users run ceph orch host add <bare-name> to add a host to the cluster. Which means that the dashboard returns the bare host name when pointing the user to Grafana.

Fourth, if the user now gets a certificate with the "common name" pointing to the FQDN, we need a place for the user to make the dashboard return the FQDN instead of the bare name used by Ceph.

To sum up, cephadm uses hostname on the machines to identify hosts and the browser uses the DNS name to identify hosts. The two are not necessarily identical.
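
The distinction can be sketched as follows. This is only an illustration of keeping the two names separate: `bare_name`, `browser_facing_host` and the `fqdn_overrides` mapping are hypothetical and do not exist in cephadm or the dashboard.

```python
from typing import Mapping


def bare_name(hostname: str) -> str:
    """Return the part before the first dot, as `hostname -s` would."""
    return hostname.split('.', 1)[0]


def browser_facing_host(ceph_hostname: str,
                        fqdn_overrides: Mapping[str, str]) -> str:
    """Pick the name a browser should use for a host known to Ceph.

    `fqdn_overrides` is a hypothetical user-maintained mapping from bare
    host name to the FQDN that appears in the certificate.
    """
    return fqdn_overrides.get(bare_name(ceph_hostname), ceph_hostname)
```

Ceph keeps identifying the host by whatever `hostname` returns, while the URL handed to the browser can follow the certificate's common name.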
