Skip to content

osd: add osd_fast_shutdown_notify_mon option (default false)#38909

Merged
liewegas merged 2 commits intoceph:masterfrom
mfoliveira:osd_fast_shutdown_notify_mon
Mar 3, 2021
Merged

osd: add osd_fast_shutdown_notify_mon option (default false)#38909
liewegas merged 2 commits intoceph:masterfrom
mfoliveira:osd_fast_shutdown_notify_mon

Conversation

@mfoliveira
Copy link

The osd_fast_shutdown option may cause the cluster log to receive
too many entries of osd.X reported immediately failed by osd.Y,
depending on cluster scale.

This might be an issue for LMA stacks/tools that check ceph logs
for failed lines, and then require additional logic to filter on
an intended OSD (fast) shutdown; might not be an option/possible,
and require an admin to analyze.

So, add osd_fast_shutdown_notify_mon option for OSD to also tell
the monitor it is shutting down (done in slow/non-fast shutdown)
under osd_fast_shutdown.

This introduces minimal delay (the ack from the mon is required
to prevent the messages), and addresses the cluster log issue.
Note: the osd_mon_shutdown_timeout option can be used to control
the maximum amount of time waiting for the monitor ack to arrive.

Fixes: http://tracker.ceph.com/issues/46978
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>

Testing

Note: for testing the option value has been changed to true, so to exercise the code path difference (otherwise it's a nop).

This passed run-make-check.sh (transient failures cleared w/ reruns: unittest_bluefs and unittest_bluefs).

And qa/run-standalone.sh reported 3 apparently unrelated failures (not rerun): osd/osd-bluefs-volume-ops.sh, scrub/osd-recovery-scrub.sh, scrub/osd-scrub-snaps.sh

Testing with vstart.sh (10 OSDs) shows that without the option (or false) the mon log gets multiple reports per OSD, and with the option (true) it remains quiet, as in the slow/non-fast shutdown case, as expected.

  • Before / or with osd_fast_shutdown_notify_mon = false:
osd log:

2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 received  signal: Terminated from -bash  (PID: 408) UID: 1000
2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 osd.2 22 *** Got signal Terminated ***
2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 osd.2 22 *** Immediate shutdown (osd_fast_shutdown=true) ***

mon log:

$ cat out/mon.a.log | grep '^2021-01-09T18:59:' | grep 'osd.0 reported immediately failed by osd.' | rev | cut -d: -f1 | rev | sort | uniq -c
      4  osd.0 reported immediately failed by osd.1
      4  osd.0 reported immediately failed by osd.2
      4  osd.0 reported immediately failed by osd.3
      4  osd.0 reported immediately failed by osd.4
      4  osd.0 reported immediately failed by osd.5
      4  osd.0 reported immediately failed by osd.6
      4  osd.0 reported immediately failed by osd.7
      4  osd.0 reported immediately failed by osd.8
      4  osd.0 reported immediately failed by osd.9
  • After / with osd_fast_shutdown_notify_mon = true:
osd log:

2021-01-14T17:21:10.825+0000 7feceded1700 -1 received  signal: Terminated from -bash  (PID: 1750) UID: 1000
2021-01-14T17:21:10.825+0000 7feceded1700 -1 osd.0 80 *** Got signal Terminated ***
2021-01-14T17:21:10.825+0000 7feceded1700 -1 osd.0 80 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-01-14T17:21:10.825+0000 7feceded1700  0 osd.0 80 prepare_to_stop telling mon we are shutting down
...
2021-01-14T17:21:11.021+0000 7fecdac0c700  0 osd.0 80 got_stop_ack starting shutdown
2021-01-14T17:21:11.021+0000 7feceded1700  0 osd.0 80 prepare_to_stop starting shutdown

mon log:

2021-01-14T17:21:10.829+0000 7f62fce61700  0 log_channel(cluster) log [INF] : osd.0 marked itself down
2021-01-14T17:21:10.885+0000 7f62ff666700  1 mon.a@0(leader).osd e80 do_prune osdmap full prune enabled
2021-01-14T17:21:10.889+0000 7f62ff666700  0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
2021-01-14T17:21:10.957+0000 7f62fb65e700  1 mon.a@0(leader).osd e81 e81: 10 total, 9 up, 10 in
2021-01-14T17:21:11.013+0000 7f62fb65e700  0 log_channel(cluster) log [DBG] : osdmap e81: 10 total, 9 up, 10 in

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@smithfarm
Copy link
Contributor

smithfarm commented Jan 18, 2021

@neha-ojha @jdurgin @tchaikov The author is asking for this to be backported (all the way back to nautilus). Does it need a release note?

@mfoliveira
Copy link
Author

@smithfarm

@neha-ojha @jdurgin @tchaikov The author is asking for this to be backported (all the way back to nautilus). Does it need a release note?

Is this something I can do, say, adding a note to PendingReleaseNotes ?

Thanks

@smithfarm
Copy link
Contributor

Is this something I can do, say, adding a note to PendingReleaseNotes ?

Yes, I think so. At least, it would make sense to me if you added a commit for that to this PR.

Mauricio Faria de Oliveira added 2 commits January 26, 2021 12:56
The osd_fast_shutdown option may cause the cluster log to receive
too many entries of 'osd.X reported immediately failed by osd.Y',
depending on cluster scale.

This might be an issue for LMA stacks/tools that check ceph logs
for failed lines, and then require additional logic to filter on
an intended OSD (fast) shutdown; might not be an option/possible,
and require an admin to analyze.

So, add osd_fast_shutdown_notify_mon option for OSD to also tell
the monitor it is shutting down (done in slow/non-fast shutdown)
under osd_fast_shutdown.

This introduces minimal delay (the ack from the mon is required
to prevent the messages), and addresses the cluster log issue.
Note: the osd_mon_shutdown_timeout option can be used to control
the maximum amount of time waiting for the monitor ack to arrive.

Fixes: http://tracker.ceph.com/issues/46978
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Let's add the ``osd_fast_shutdown_notify_mon`` option to PendingReleaseNotes
so it is documented.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
@mfoliveira
Copy link
Author

mfoliveira commented Jan 26, 2021

@smithfarm

Is this something I can do, say, adding a note to PendingReleaseNotes ?

Yes, I think so. At least, it would make sense to me if you added a commit for that to this PR.

Done; thanks!

@mfoliveira
Copy link
Author

The failing checks don't seem to be related to the changes in this PR;
for ceph API tests the errors mention dashboard, and for make check
the log is no longer avaialable, but the changes previously passed locally.

@mfoliveira
Copy link
Author

@smithfarm hey, sorry to bother -- just following up -- is the current timeframe not the best for reviews, or maybe this just has to wait longer because it's lower priority/impact? Thank you!

@neha-ojha
Copy link
Member

@smithfarm hey, sorry to bother -- just following up -- is the current timeframe not the best for reviews, or maybe this just has to wait longer because it's lower priority/impact? Thank you!

We are currently focusing on our upcoming release, Pacific. We'll review this PR next week. Thanks.

@neha-ojha
Copy link
Member

jenkins test make check

@mfoliveira
Copy link
Author

@neha-ojha

We are currently focusing on our upcoming release, Pacific. We'll review this PR next week. Thanks.

Understand; thanks for clarifying!

@neha-ojha
Copy link
Member

jenkins retest this please

@neha-ojha
Copy link
Member

@liewegas do you seen any issues with this option (given that it is off by default)?

@jdurgin
Copy link
Member

jdurgin commented Feb 24, 2021

seems like this should be the default to me

Copy link
Member

@neha-ojha neha-ojha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's start with this as a non-default option

@@ -4154,6 +4154,8 @@ int OSD::shutdown()
{
if (cct->_conf->osd_fast_shutdown) {
derr << "*** Immediate shutdown (osd_fast_shutdown=true) ***" << dendl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick suggestion: log the three relevant values: osd_fast_shutdown, osd_fast_shutdown_notify_mon, and osd_mon_shutdown_timeout here. Perhaps even shift this outside of the osd_fast_shutdown clause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading the logs, it would be helpful to quickly know which shutdown scenario applies. This could also help catch instances where settings were modified between the logged issue and the sosreport capture.

That said, I don't expect these settings will be adjusted frequently. There is also enough logging elsewhere to deduce which settings were applied at run-time. So, I'm just leaving this as a suggested improvement.

@mfoliveira
Copy link
Author

Per SubmittingPatches-backports.rst:

Once the master PR has been merged, after checking that the change really needs to be backported (...)

Can someone please confirm it's OK to go ahead with backports of this down to Nautilus? Thanks!

@satoru-takeuchi
Copy link
Contributor

@neha-ojha @liewegas

let's start with this as a non-default option

I'd like to enable this option because I found some I/O delays possibly caused by this problem. Is there any potential risk of enabling this option? In other words, why is this option still false by default? It looks like that notifying mon when voluntary OSD shutdown should be done unconditionally.

@neha-ojha
Copy link
Member

@neha-ojha @liewegas

let's start with this as a non-default option

I'd like to enable this option because I found some I/O delays possibly caused by this problem. Is there any potential risk of enabling this option? In other words, why is this option still false by default? It looks like that notifying mon when voluntary OSD shutdown should be done unconditionally.

Yes, we should turn it on by default! Would you like to open a PR for it?

@satoru-takeuchi
Copy link
Contributor

@neha-ojha Got it. I'll do it soon.

@neha-ojha
Copy link
Member

@neha-ojha Got it. I'll do it soon.

Please link this ticket https://tracker.ceph.com/issues/53329 in your commit for backporting purposes.

@satoru-takeuchi
Copy link
Contributor

I'd already created #44016 that has a link to issue 53328. Should I close issue 53328 and change link to issue 53329?

@neha-ojha
Copy link
Member

I'd already created #44016 that has a link to issue 53328. Should I close issue 53328 and change link to issue 53329?

No need, I merged them, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants