Skip to content

squid: mon [stretch mode]: support disable_stretch_mode#60629

Merged
SrinivasaBharath merged 4 commits intoceph:squidfrom
kamoltat:wip-68840-squid
Jan 13, 2025
Merged

squid: mon [stretch mode]: support disable_stretch_mode#60629
SrinivasaBharath merged 4 commits intoceph:squidfrom
kamoltat:wip-68840-squid

Conversation

@kamoltat
Copy link
Copy Markdown
Member

@kamoltat kamoltat commented Nov 5, 2024

backport tracker: https://tracker.ceph.com/issues/68840


backport of #59483
parent tracker: https://tracker.ceph.com/issues/67467

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Problem:

Currently, Ceph lacks the ability
to exit stretch mode and move back
to normal cluster (non-stretched).

Solution:

Provide a command to allow
the user to exit stretch mode gracefully:

`ceph mon disable_stretch_mode <crush_rule> --yes-i-really-mean-it`

User can either specify a crush rule that
they want all pools to move to or not specify
a rule and Ceph will use a default replicated crush rule.

Fixes: https://tracker.ceph.com/issues/67467

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 78ce68d)
Added documentation about exiting stretch mode.

Fixes: https://tracker.ceph.com/issues/67467

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0680f17)
Test disabling stretch mode with the following scenario:

1. Healthy Stretch Mode
2. Degraded Stretch Mode

Fixes: https://tracker.ceph.com/issues/67467

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 4d2f887)
Problem:
Current dump for "removed_ranks" and "disallowed_leaders"
doesn't have the correct format so the python test
script can parse through these values.

Solution:
Modified the values such that it is in the correct format

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit a7f3b7b)
@kamoltat kamoltat requested review from a team as code owners November 5, 2024 19:08
@kamoltat kamoltat added this to the squid milestone Nov 5, 2024
@kamoltat kamoltat added the core label Nov 5, 2024
@kamoltat kamoltat self-assigned this Nov 5, 2024
@kamoltat
Copy link
Copy Markdown
Member Author

@NitzanMordhai this is just a backport of a PR you recently reviewed in main: #59483
I thought you might be the best person to give it a look. Thank you!

@kamoltat
Copy link
Copy Markdown
Member Author

@NitzanMordhai ping

Comment thread src/mon/OSDMonitor.cc
}

void OSDMonitor::try_disable_stretch_mode(stringstream& ss,
bool *okay,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I know it's a backport, so probably too late, but: why have an out 'okay' param, instead of returning a value?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ronen-fr
'okay' is a pointer to a variable that is used to indicate whether the operation is successful or not, the function try_disable_stretch_mode is called by another Paxos service called MonmapMonitor which is in charge of processing the command mon disable_stretch_mode. So the okay is a parameter to indicate whether this command is successful or not, if not it returns an error code and also goto a part of the code where we reply with no propose to mon map changes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I was trying to say. I understand the role of 'okay'. What I was asking: why return 'void' and use an 'out' parameter (and there are good reasons for minimizing out params usage), instead of returning the 'is okay' value as the ret val of the function.

@ronen-fr
Copy link
Copy Markdown
Contributor

apart from that question above - LGTM.
I'm not versed in 'stretch mode' issues, so you may want to wait for more approvals.

@kamoltat
Copy link
Copy Markdown
Member Author

kamoltat commented Nov 25, 2024

Thank you, @ronen-fr, for the feedback. I just replied to your comment. Since it's a backport, I think having a core engineer review the backported code is sufficient (Greg is the ideal candidate to review, but I understand he is busy). I will queue this for the Teuthology test and see if there is no issue.

@Naveenaidu
Copy link
Copy Markdown
Contributor

Rados approved: https://tracker.ceph.com/issues/69323#note-3

@SrinivasaBharath SrinivasaBharath merged commit 9ed8121 into ceph:squid Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants