mon/OSDMonitor: Added extra check before mon.go_recovery_stretch_mode() #47340
Added bug reproducer for https://bugzilla.redhat.com/show_bug.cgi?id=2104207.
Added more logs in MON.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Problem: In certain scenarios in a degraded stretch cluster, the monitor tries to enter ``Monitor::go_recovery_stretch_mode()``, which leads to a `ceph_assert`.
Solution: Make sure ``dead_mon_buckets.size() == 0`` in ``OSDMonitor::update_from_paxos()`` before entering ``Monitor::go_recovery_stretch_mode()``.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2104207
Signed-off-by: Kamoltat <ksirivad@redhat.com>
|
Do we still need the |
|
removed |
gregsfortytwo
left a comment
I did not dig into the test case, but the assert patch looks good and the debugging output is fine.
  (osdmap.num_up_osd / (double)osdmap.num_osd) >
- cct->_conf.get_val<double>("mon_stretch_cluster_recovery_ratio")) {
+ cct->_conf.get_val<double>("mon_stretch_cluster_recovery_ratio") &&
+ mon.dead_mon_buckets.size() == 0) {
Hmm. This definitely works for our current 2-site setup, but we'll need to adjust it if we start supporting 3-site (and other count) stretch clusters with the explicit stretch mechanisms. That was something I was considering (and trying to keep easy) when writing the other code.
So I'd rather do something that won't require changing when we hit that point, but I don't have a good simple solution, so looks good.
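As a rough illustration of the guard discussed in this review, the two conditions visible in the diff can be modeled in isolation. This is a minimal standalone sketch, not the Ceph code: `StretchState` and `should_enter_recovery` are hypothetical names, the container type chosen for `dead_mon_buckets` is an assumption, and any other conditions that surround this check in `OSDMonitor::update_from_paxos()` are omitted.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the state the check reads; not a Ceph type.
struct StretchState {
  int num_up_osd;
  int num_osd;
  // Stands in for the "mon_stretch_cluster_recovery_ratio" config value.
  double recovery_ratio;
  // Assumed shape: bucket name -> names of monitors that are down in it.
  std::map<std::string, std::set<std::string>> dead_mon_buckets;
};

// Returns true only when enough OSDs are up AND no monitor bucket is dead.
// The second clause mirrors the condition this PR adds.
bool should_enter_recovery(const StretchState& s) {
  if (s.num_osd == 0) {
    return false;  // sketch-only guard; not taken from the patch
  }
  return (s.num_up_osd / (double)s.num_osd) > s.recovery_ratio &&
         s.dead_mon_buckets.size() == 0;
}
```

With 9 of 10 OSDs up and a ratio of 0.6, the function returns true when `dead_mon_buckets` is empty and false as soon as any bucket is recorded as dead, which is the case the patch is meant to keep out of `Monitor::go_recovery_stretch_mode()`.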
This PR introduced https://tracker.ceph.com/issues/58239; we are in the process of fixing the issue.
Problem:
In certain scenarios in a degraded stretch cluster, the monitor tries to enter ``Monitor::go_recovery_stretch_mode()``, which leads to a `ceph_assert`.

Solution:
Make sure ``dead_mon_buckets.size() == 0`` in ``OSDMonitor::update_from_paxos()`` before entering ``Monitor::go_recovery_stretch_mode()``.

Fixes: https://tracker.ceph.com/issues/57017

TODO: Need to separate the log commits and drop them before merging.

Signed-off-by: Kamoltat ksirivad@redhat.com