Improve mon location handling for stretch clusters #40483
gregsfortytwo merged 5 commits into ceph:master
We can adopt new monmaps while bootstrapping, or in election messages, in addition to MonmapMonitor::update_from_paxos. Since we use the notification to update our election strategy and such, we need to notify from these locations as well! Fixes: https://tracker.ceph.com/issues/47654 Signed-off-by: Greg Farnum <gfarnum@redhat.com>
We blocked off the other routes to add location-less monitors, but if you turn on a monitor with the right keys it can auto-join via the MMonJoin functionality. Block that off! Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Drat this is running into the boost build issue I've seen referenced elsewhere. I opened a pacific PR for it as well since I imagine that one will build boost correctly... #40484
jenkins test make check
jenkins test api
I made a bug for the API tests; that seems to be happening everywhere: https://tracker.ceph.com/issues/50058
monthrash passed with 2 failures in ansible. Scheduled singleton-bluestore per Neha's request as a reproducer for the election mode bug: https://pulpito.ceph.com/gregf-2021-03-30_18:35:54-rados:singleton-bluestore-wip-stretch-mon-location-329-distro-basic-smithi/
src/messages/MMonJoin.h (outdated)
  /* The location members are for stretch mode. crush_loc is the location
   * (generally just a "datacenter=<foo>" statement) of the monitor. The
   * force_loc is whether the mon cluster should replace a previously-known
   * location. Generally the monitor will force an update if it's given a
src/messages/MMonJoin.h (outdated)
  std::string_view get_type_name() const override { return "mon_join"; }
  void print(std::ostream& o) const override {
-   o << "mon_join(" << name << " " << addrs << ")";
+   o << "mon_join(" << name << " " << addrs << crush_loc << ")";
src/mon/Monitor.cc (outdated)
  if (monmap->contains(name) &&
  !monmap->get_addrs(name).front().is_blank_ip()) {
  bool in_map = false;
  const auto& my_info = monmap->mon_info.find(name);
This looks like undefined behaviour: my_info will end up as a reference to a freed temporary, since find returns an iterator by value?
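For illustration, here is a minimal sketch of the lookup with the iterator held by value. As far as I know, binding a const reference to a returned temporary extends that temporary's lifetime, so the original line may not actually dangle, but keeping the iterator by value sidesteps the question. `MonInfoSketch` and `has_location` are hypothetical stand-ins, not Ceph's real types or functions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a monmap entry; not Ceph's real mon_info_t.
struct MonInfoSketch {
  std::map<std::string, std::string> crush_loc;
};

// Returns true when `name` is in the table and has a non-empty location.
bool has_location(const std::map<std::string, MonInfoSketch>& mon_info,
                  const std::string& name) {
  // Hold the iterator by value rather than binding a reference to the
  // temporary returned by find().
  auto it = mon_info.find(name);
  return it != mon_info.end() && !it->second.crush_loc.empty();
}
```

Calling `has_location(mons, "a")` on a table where "a" carries a `datacenter` entry returns true; an unknown name or an entry with no location returns false.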
src/mon/Monitor.cc (outdated)
  !monmap->get_addrs(name).front().is_blank_ip()) {
  bool in_map = false;
  const auto& my_info = monmap->mon_info.find(name);
  const map<string,string> *map_crush_loc;
Initialize to nullptr. Also, can you simply use map_crush_loc being non-null to indicate in_map and remove the second variable?
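A rough sketch of what this suggestion could look like, assuming a simplified table type: the pointer starts as nullptr and a non-null value doubles as the "in map" flag, so no separate bool is needed. `MonInfoSketch` and `needs_location_update` are hypothetical names for illustration only and do not reflect the code that was eventually merged.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a monmap entry; not Ceph's real mon_info_t.
struct MonInfoSketch {
  std::map<std::string, std::string> crush_loc;
};

// Returns true when `name` is already in the table and its stored location
// differs from `wanted`.
bool needs_location_update(
    const std::map<std::string, MonInfoSketch>& mon_info,
    const std::string& name,
    const std::map<std::string, std::string>& wanted) {
  // Initialized to nullptr; non-null means "this monitor is in the map".
  const std::map<std::string, std::string>* map_crush_loc = nullptr;
  auto it = mon_info.find(name);
  if (it != mon_info.end()) {
    map_crush_loc = &it->second.crush_loc;
  }
  return map_crush_loc != nullptr && *map_crush_loc != wanted;
}
```

The pointer stays valid only as long as the table is not modified, which holds here because the table is taken by const reference for the duration of the call.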
athanatos left a comment
I made a few minor comments. LGTM, though I don't know this code particularly well.
jenkins test api
…join This will let newly-created monitors auto-join on startup in stretch mode, by providing the needed location. Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Go to some effort to look at our location in the monmap and update it the same way we update names or IP addresses when necessary. Let users pass in the location on the CLI via "--set-crush-location". Signed-off-by: Greg Farnum <gfarnum@redhat.com>
As in dd63a3e for the OSDMap, this caused crashes when encoding for kernel clients, and is unnecessary for servers because they are separately gated. I did a full audit of every instance of "assert" I added to the codebase to make sure this is the very last one of these issues. Signed-off-by: Greg Farnum <gfarnum@redhat.com>
807b16e to 589de8b
Pushed updates for Sam's comments and will schedule upgrade suite against that. https://pulpito.ceph.com/gregf-2021-03-30_20:35:40-rados-wip-stretch-mon-location-329-distro-basic-smithi/ looks good
jenkins test make check
https://pulpito.ceph.com/gregf-2021-03-31_10:04:02-upgrade:octopus-x-wip-stretch-mon-location-331-distro-basic-smithi/
This PR has two parts: a short bug fix, and a fix for a hole in location
handling. The bug fix removes yet another invalid assert on structure
encoding triggered by kernel clients, but after an audit session it's the
last one. The hole in location handling is that monitors can auto-join
a cluster if they have the right keys, even if stretch mode is engaged
and no location is provided. This PR closes the hole and
provides a mechanism for the daemons and admins/orchestrator to set
that location even when using this auto-bootstrap functionality instead
of creating the monitor in the monmap ahead of time.
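The gate described above can be sketched roughly as follows. This is only an illustration of the rule "in stretch mode, a joining monitor must supply a location"; `JoinDecision` and `check_join` are hypothetical names and not the actual handler in the monitor code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical outcome of evaluating a join request.
enum class JoinDecision { Accept, RejectMissingLocation };

// Decide whether an auto-joining monitor may be added to the map.
// `stretch_mode` and `crush_loc` stand in for cluster state and the
// location carried by the join request.
JoinDecision check_join(bool stretch_mode,
                        const std::map<std::string, std::string>& crush_loc) {
  if (stretch_mode && crush_loc.empty()) {
    // Stretch mode needs to know where every monitor lives.
    return JoinDecision::RejectMissingLocation;
  }
  return JoinDecision::Accept;
}
```

Outside stretch mode the location is optional in this sketch, so a request with an empty `crush_loc` is still accepted there.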