squid: mon [stretch mode]: support disable_stretch_mode#60629
squid: mon [stretch mode]: support disable_stretch_mode#60629SrinivasaBharath merged 4 commits intoceph:squidfrom
Conversation
Problem: Currently, Ceph lacks the ability to exit stretch mode and move back to normal cluster (non-stretched). Solution: Provide a command to allow the user to exit stretch mode gracefully: `ceph mon disable_stretch_mode <crush_rule> --yes-i-really-mean-it` User can either specify a crush rule that they want all pools to move to or not specify a rule and Ceph will use a default replicated crush rule. Fixes: https://tracker.ceph.com/issues/67467 Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com> (cherry picked from commit 78ce68d)
Added documentation about exiting stretch mode. Fixes: https://tracker.ceph.com/issues/67467 Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com> (cherry picked from commit 0680f17)
Test disabling stretch mode with the following scenario: 1. Healthy Stretch Mode 2. Degraded Stretch Mode Fixes: https://tracker.ceph.com/issues/67467 Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com> (cherry picked from commit 4d2f887)
Problem: Current dump for "removed_ranks" and "disallowed_leaders" doesn't have the correct format so the python test script can parse through these values. Solution: Modified the values such that it is in the correct format Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com> (cherry picked from commit a7f3b7b)
|
@NitzanMordhai this is just a backport of a PR you recently reviewed in main: #59483 |
|
@NitzanMordhai ping |
| } | ||
|
|
||
| void OSDMonitor::try_disable_stretch_mode(stringstream& ss, | ||
| bool *okay, |
There was a problem hiding this comment.
(I know it's a backport, so probably too late, but: why have an out 'okay' param, instead of returning a value?)
There was a problem hiding this comment.
Hi @ronen-fr
'okay' is a pointer to a variable that is used to indicate whether the operation is successful or not, the function try_disable_stretch_mode is called by another Paxos service called MonmapMonitor which is in charge of processing the command mon disable_stretch_mode. So the okay is a parameter to indicate whether this command is successful or not, if not it returns an error code and also goto a part of the code where we reply with no propose to mon map changes.
There was a problem hiding this comment.
That's not what I was trying to say. I understand the role of 'okay'. What I was asking: why return 'void' and use an 'out' parameter (and there are good reasons for minimizing out params usage), instead of returning the 'is okay' value as the ret val of the function.
|
apart from that question above - LGTM. |
|
Thank you, @ronen-fr, for the feedback. I just replied to your comment. Since it's a backport, I think having a core engineer review the backported code is sufficient (Greg is the ideal candidate to review, but I understand he is busy). I will queue this for the Teuthology test and see if there is no issue. |
|
Rados approved: https://tracker.ceph.com/issues/69323#note-3 |
backport tracker: https://tracker.ceph.com/issues/68840
backport of #59483
parent tracker: https://tracker.ceph.com/issues/67467
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh