Skip to content

mgr/cephadm: force flag for ok-to-stop and ok-to-stop for monitoring stack#38854

Merged
sebastian-philipp merged 1 commit intoceph:masterfrom
adk3798:ots-monitoring
Jan 27, 2021
Merged

mgr/cephadm: force flag for ok-to-stop and ok-to-stop for monitoring stack#38854
sebastian-philipp merged 1 commit intoceph:masterfrom
adk3798:ots-monitoring

Conversation

@adk3798
Copy link
Contributor

@adk3798 adk3798 commented Jan 11, 2021

Daemons that could cause data loss when stopped will always block.

Daemons that will only cause loss in availability should block but have
a workaround in the form of a force flag if the user is okay with the service
being down.

Also implements ok-to-stop for monitoring stack daemons that uses this system
of blocking on availability loss unless force flag is provided

Signed-off-by: Adam King adking@redhat.com
Signed-off-by: Daniel-Pivonka dpivonka@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Copy link
Member

@jmolmo jmolmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me is ok once solved the "serialization" problem pointed by Sebastian

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you prefix the title of your commit message with the subcomponent your are changing ? see https://github.com/ceph/ceph/blob/master/SubmittingPatches.rst#3-describe-your-changes

@adk3798 adk3798 changed the title Add notification if stopping monitoring service through maintenance mgr/cephadm: Add notification if stopping monitoring service through maintenance Jan 12, 2021
@tchaikov tchaikov dismissed their stale review January 13, 2021 01:32

dismissed

Copy link
Contributor

@sebastian-philipp sebastian-philipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's much better! Do you think it's easy to reduce the amount of copy&paste?

@adk3798 adk3798 changed the title mgr/cephadm: Add notification if stopping monitoring service through maintenance mgr/cephadm: force flag for ok-to-stop and ok-to-stop for monitoring stack Jan 13, 2021
@adk3798
Copy link
Contributor Author

adk3798 commented Jan 13, 2021

Just adjusted this pretty heavily to add a force flag to ok-to-stop rather than just adding notice messages for monitoring daemons.

Things to note:

  • changed _host_ok_to_stop to run through all daemons rather than returning on first failure. The idea here being if a user wanted to put a host in maintenance that had two daemons that would cause availability issues we want to show both of them rather than just showing one and then having a subsequent call with the force flag override both.
  • Did not include full use of this flag in places other than host maintenance. orch stop and orch daemon stop should likely make use of ok-to-stop and therefore this flag. I've left them alone for now because the way daemon actions is implemented through scheduling involves storing the action in a [host][daemon] -> action (str) mapping. I think that data structure would have to be adjusted (and therefore everywhere that uses the scheduled daemon actions as well) to pass the information that the force flag was set along. I could include that in this PR but I think it could also be its own change.
  • applying service specs also makes use of ok-to-stop. I currently have it default to force=True there as that maintains the old functionality. The apply command could theoretically also make use of the force flag. Again, not sure if we want something like that in this PR.
  • Currently, all the monitoring stack ok-to-stop functions just use an error code of 1 if they're going to block. I'm open to suggestions for better values to use. Swapped to EBUSY

@adk3798 adk3798 requested a review from jmolmo January 13, 2021 19:59
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Jan 13, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Jan 13, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
@sebastian-philipp
Copy link
Contributor

As a general rule of thumb: Please don't use Optional[bool] for parameter annotations.

The difference is:

def my_func(param: Optionl[bool]): ...

declares a variable as nullable. If you use Optionl[bool], param can be either True or False or None and you have to check that in the code. param = True or False is not identical to param = True or False or None. And on the other hand:

def my_func(param: bool = False): ...

declares param as optional when calling the function.

cc @ceph/orchestrators

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jan 14, 2021
@adk3798 adk3798 force-pushed the ots-monitoring branch 3 times, most recently from 541742d to d972bf4 Compare January 14, 2021 18:25
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Jan 14, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
@sebastian-philipp sebastian-philipp removed the wip-swagner-testing My Teuthology tests label Jan 18, 2021
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Jan 18, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
adk3798 pushed a commit to adk3798/ceph that referenced this pull request Jan 18, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

…stack

Daemons that could cause data loss when stopped will always block.

Daemons that will only cause loss in availability should block but have
a workaround in the form of a force flag if the user is okay with the service
being down.

Also implements ok-to-stop for monitoring stack daemons that uses this system
of blocking on availability loss unless force flag is provided

Signed-off-by: Adam King <adking@redhat.com>
Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
@sebastian-philipp
Copy link
Contributor

@sebastian-philipp
Copy link
Contributor

@sebastian-philipp sebastian-philipp merged commit 84b7852 into ceph:master Jan 27, 2021
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Jan 27, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Jan 27, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
Daniel-Pivonka pushed a commit to Daniel-Pivonka/ceph that referenced this pull request Feb 3, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
jmolmo pushed a commit that referenced this pull request Feb 19, 2021
depends on #38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
(cherry picked from commit 96dcb15)
sebastian-philipp pushed a commit to sebastian-philipp/ceph that referenced this pull request Feb 22, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
(cherry picked from commit 96dcb15)
myoungwon pushed a commit to myoungwon/ceph that referenced this pull request Mar 29, 2021
depends on ceph#38854

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants