mon: fix a race between mgr fail and MgrMonitor::prepare_beacon()#46318
mon: fix a race between mgr fail and MgrMonitor::prepare_beacon()#46318
mgr fail and MgrMonitor::prepare_beacon()#46318Conversation
3620da6 to
2abdfaf
Compare
|
I'm not understanding what race condition you're saying is here, nor how this fixes anything. propose_pending() will do an immediate propose, but without that the command still waits for a commit before replying to the client which invoked "mgr fail". And this PR doesn't change anything at all about the locking sequence you've displayed in the logs. I do note that if there are no standbys available in pending_map.standbys, promote_standby() won't make a change and just returns false. I'd guess you're seeing that? |
|
Hello Greg! Thanks for taking a look! I think the problem is not about synchronization between the pair:
but it involves the third entity too: a manager which takes a different path than the Manager misses it because IIUC |
|
Ah, that makes more sense. I see a few different solutions, though I don't know what the balance of issues is for the mgr dependencies:
Should probably discuss that with people who know more about the mgr. I'm guessing they do want the ability to fail the mgr though. So perhaps adjust this PR so that prepare_command() only proposes when handling a fail command? There's a |
|
My impression was the mgr-related admin commands are supposed to be executed infrequently, usually in a reaction to an operator's request (even when it happens indirectly – via the orchestrator). The other commands we would flush the pending map for are about turning on/off manager's modules. I believe they are rarely happening too while (I don't have any proof, just gut feelings) are vulnerable to similar races as When investigating possible approaches I was considering a pin-point fix around A way to restart mgrs is already neeeded by |
|
I just don't want to force commits by default on commands. It may not cause a problem immediately, but defaulting to a forced paxos commit is a bad smell in monitor code and can lead to hard-to-detect scaling issues if we expand these commands in the future. New mgr commands are most likely to be involved with upgrades or recovery situations where we're likely already loading the monitors down and needing to invoke the commands frequently! |
|
Ah, I see: future-proofness. You're worried it would be too easy to miss this flush when adding a new, possibly not so innocent and infrequent command. Makes sense. I will limit the flush to |
2abdfaf to
15d3dbc
Compare
|
@gregsfortytwo: would you mind taking another look? |
gregsfortytwo
left a comment
There was a problem hiding this comment.
Good enough for now. In the other monitor implementations we have a "should_propose" boolean that gets set in each prefix when appropriate, but that's a simple and obvious enough change if it comes up in future.
|
Teuthology Test Result Related Failure |
There is a race condition between the `mgr fail` handling and `mgrbeacon`. ```diff diff --git a/src/mon/MgrMonitor.cc b/src/mon/MgrMonitor.cc index 8ada44e..9000b2e0687 100644 --- a/src/mon/MgrMonitor.cc +++ b/src/mon/MgrMonitor.cc @@ -1203,7 +1203,9 @@ bool MgrMonitor::prepare_command(MonOpRequestRef op) } if (changed && pending_map.active_gid == 0) { + dout(5) << "========== changed and active_state == 0" << dendl; promote_standby(); + dout(5) << "========== after promote_standby: " << pending_map.active_gid << dendl; } } else if (prefix == "mgr module enable") { string module; ``` ``` 2022-05-17T00:11:19.602+0200 7f6bd5769700 0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "mgr fail", "who": "x"} v 0) v1 ... 2022-05-17T00:11:19.614+0200 7f6bd5769700 5 mon.a@0(leader).mgr e25 ========== changed and active_state == 0 2022-05-17T00:11:19.614+0200 7f6bd5769700 5 mon.a@0(leader).mgr e25 ========== after promote_standby: 0 2022-05-17T00:11:19.614+0200 7f6bd5769700 4 mon.a@0(leader).mgr e25 prepare_command done, r=0 ... 2022-05-17T00:11:19.630+0200 7f6bd5769700 4 mon.a@0(leader).mgr e25 selecting new active 4210 x (was 0 ) ``` ```cpp bool MgrMonitor::prepare_beacon(MonOpRequestRef op) if (pending_map.active_gid == m->get_gid()) { // ... } else if (pending_map.active_gid == 0) { // There is no currently active daemon, select this one. if (pending_map.standbys.count(m->get_gid())) { drop_standby(m->get_gid(), false); } dout(4) << "selecting new active " << m->get_gid() << " " << m->get_name() << " (was " << pending_map.active_gid << " " << pending_map.active_name << ")" << dendl; pending_map.active_gid = m->get_gid(); pending_map.active_name = m->get_name(); pending_map.active_change = ceph_clock_now() ``` The `25` version of `MgrMap`, when handled at `mgr.x`, doesn't trigger the `respawn()` path: ``` 2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr ms_dispatch2 active mgrmap(e 25) v1 2022-05-17T00:10:11.197+0200 7fa3d1e0a700 4 mgr handle_mgr_map received map epoch 25 2022-05-17T00:10:11.197+0200 7fa3d1e0a700 4 mgr handle_mgr_map active in map: 1 active is 4210 2022-05-17T00:10:11.197+0200 7fa3d6613700 10 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_read_frame_dispatch tag=17 2022-05-17T00:10:11.197+0200 7fa3d6613700 5 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_message got 43089 + 0 + 0 byte message. envelope type=1796 src mon.0 off 0 2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active ``` Fixes: https://tracker.ceph.com/issues/55711 Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
15d3dbc to
23c3f76
Compare
|
@rzarzynski hi Radek, did you manage to fix the test failure reported at #46318 (comment) ? if yes, i think this change is good to merge. |
|
Hi Kefu! Yup, I added the |
|
Hello @rzarzynski, I think I have a more robust solution to this in #50404. Please see commit |
There is a race condition between the
mgr failhandling andmgrbeacon.The
25th version ofMgrMap, when handled atmgr.x, doesn't trigger therespawn()path (while e.g.cephadmassumes it does):Signed-off-by: Radosław Zarzyński rzarzyns@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows