osd/OSD: osd_fast_shutdown_notify_mon not quite right by NitzanMordhai · Pull Request #44807 · ceph/ceph

NitzanMordhai · 2022-01-27T13:19:36Z

When osd_fast_shutdown and osd_fast_shutdown_notify_mon are both set to true, the OSD is marked as down, but it should be marked as dead.

Fixes: https://tracker.ceph.com/issues/53327

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>

NitzanMordhai · 2022-01-28T08:01:46Z

Changed to also check osd_fast_shutdown, as well as osd_fast_shutdown_notify_mon, before marking the OSD as dead.

jdurgin

This looks right; however, it seems set_state(STOPPING); should precede the MarkMeDead message, since that's what stops us from accepting new connections and processing new messages.

Since the is_stopping_cond is waiting with a timeout, STATE_STOPPING may not be set yet at this point.
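The ordering point can be illustrated with a small standalone model: once the OSD is in the stopping state it refuses new work, so entering that state before notifying the mon means nothing new is accepted after the mon has been told. This is a sketch under stated assumptions, not Ceph code: OsdSketch, OsdState, handle_incoming() and the boolean fields are hypothetical names, and the booleans only stand in for the two config options discussed in this revision.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Hypothetical, simplified model of the shutdown ordering; not a Ceph API.
enum class OsdState { Active, Stopping };

struct OsdSketch {
  std::atomic<OsdState> state{OsdState::Active};
  bool fast_shutdown = true;             // stands in for osd_fast_shutdown
  bool fast_shutdown_notify_mon = true;  // stands in for osd_fast_shutdown_notify_mon
  std::vector<std::string> sent_to_mon;  // messages "sent" to the mon
  int processed = 0;                     // incoming messages accepted so far

  // In a real OSD other threads may call this concurrently, which is why the
  // state is atomic here; once STOPPING is set, new work is refused.
  bool handle_incoming() {
    if (state.load() == OsdState::Stopping) {
      return false;
    }
    ++processed;
    return true;
  }

  void shutdown() {
    // Enter STOPPING first, so nothing new is accepted after the mon is told...
    state.store(OsdState::Stopping);
    // ...and only then send the mark-dead notification, gated on both
    // options as in the revision under review.
    if (fast_shutdown && fast_shutdown_notify_mon) {
      sent_to_mon.push_back("MOSDMarkMeDead");
    }
  }
};
```

Setting the state before sending keeps the window between "mon thinks I'm dead" and "I stopped taking work" closed, which is the concern raised in the review.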

mlausch · 2022-02-16T09:22:08Z

I tested the functionality of this patch.
The MOSDMarkMeDead message is sent to the mon in only about 1/3 of the cases (when stopping all OSDs of a host).
Most of the time, the OSD process exits before the message can be transmitted to the mon.
I think the OSD should wait for an ack, as it already does for the down message.

ronen-fr · 2022-02-16T10:42:41Z

Note also @mlausch's comment in the tracker. I see two options:

- wait for the ack, although that might defeat the main idea of shutting down quickly even if the mon is not responding immediately (at least, I think that's part of the idea), or
- add a delay based on internal OSD data: either wait until we know that the message has gone out (not sure how easy it is to create the required low-level mechanism), or use a short (1 sec?) delay.

@jdurgin ? @neha-ojha ?

jdurgin · 2022-02-16T18:12:08Z

@ronen-fr I'd suggest the 1st approach - waiting for the ack - to keep it simple.

The speed of shutting down is mainly a concern during an upgrade. It's worse to not get marked down and affect client i/o than to shutdown a little bit slower to be sure the message gets to the mon.
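The agreed approach, waiting for the mon's ack while still bounding how long shutdown can block, maps naturally onto a condition variable with a timeout, similar in spirit to the timed wait on is_stopping_cond mentioned above. A minimal standalone sketch, assuming hypothetical names (MonAckWaiter, on_ack, wait_for_ack, demo_ack_after) and an arbitrary caller-chosen timeout rather than any real Ceph option:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: block until the mon acks, but never longer than a timeout.
class MonAckWaiter {
 public:
  // Called from the messenger thread when the mon's ack arrives.
  void on_ack() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      acked_ = true;
    }
    cv_.notify_all();
  }

  // Returns true if the ack arrived before the timeout expired.
  bool wait_for_ack(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(mu_);
    return cv_.wait_for(lk, timeout, [this] { return acked_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool acked_ = false;
};

// Usage demo: a simulated mon delivers the ack from another thread after
// `ack_delay`; shutdown waits at most `timeout` for it.
bool demo_ack_after(std::chrono::milliseconds ack_delay,
                    std::chrono::milliseconds timeout) {
  MonAckWaiter waiter;
  std::thread mon([&] {
    std::this_thread::sleep_for(ack_delay);
    waiter.on_ack();
  });
  bool got_ack = waiter.wait_for_ack(timeout);
  mon.join();  // join before `waiter` goes out of scope
  return got_ack;
}
```

With a responsive mon the wait ends as soon as the ack lands; with an unresponsive mon, shutdown proceeds after the timeout, which addresses the concern about not stalling forever.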

yuriw · 2022-02-16T19:31:00Z

@NitzanMordhai please add needs-qa when this is fixed.

neha-ojha · 2022-02-16T19:38:46Z

I agree that waiting for an ack is probably the better way to go. Also, osd_fast_shutdown_notify_mon is set to false by default (because of #44016 (comment)), so we should make sure we are testing this code path correctly; maybe add a test based on what @mlausch is doing.

jdurgin · 2022-02-16T19:56:58Z

I think we should change that default - there's very little impact outside of an edge case when mons are unresponsive.

neha-ojha · 2022-02-16T20:33:53Z

Agreed, how about we cherry-pick the commit from #44016 in this PR and test them together? @satoru-takeuchi does that sound good to you?

satoru-takeuchi · 2022-02-16T20:37:53Z

@neha-ojha Yes, sounds great.

neha-ojha · 2022-02-16T20:59:20Z

Thanks @satoru-takeuchi!

@NitzanMordhai please cherry-pick bf4d358 into this PR and keep the original Signed-off-by, thanks!

mlausch · 2022-02-27T18:15:23Z

Hi @NitzanMordhai. I tried out the current code. The mon skips the new message type, because it seems an OSD is not allowed to send it.
I have created a PR against your branch. I hope this is the right way to do this.
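The thread does not show the fix itself, so the following is only an assumed model of the failure mode described: a mon-side gate that accepts a message type from an OSD only if that type is explicitly registered, and silently skips anything else. MonOsdMessageGate, accepts() and allow() are hypothetical names, and the type names are plain strings for illustration, not Ceph's real dispatch mechanism.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical model of a mon-side gate: message types an OSD may send must be
// listed explicitly, and anything else is skipped. Not the actual Ceph check.
class MonOsdMessageGate {
 public:
  explicit MonOsdMessageGate(std::set<std::string> allowed)
      : allowed_(std::move(allowed)) {}

  // True if a message of this type from an OSD would be processed.
  bool accepts(const std::string& msg_type) const {
    return allowed_.count(msg_type) != 0;
  }

  // Registering a new message type is what lets it through.
  void allow(const std::string& msg_type) { allowed_.insert(msg_type); }

 private:
  std::set<std::string> allowed_;
};
```

In this model, adding a new message type on the OSD side is not enough: the receiving side has to be taught to accept it too, which matches the symptom reported above.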

mlausch · 2022-03-15T11:22:47Z

works for me as well.

jdurgin · 2022-03-15T21:18:03Z

src/osd/OSD.cc

+		whoami,
+		osdmap->get_addrs(whoami),
+		osdmap->get_epoch(),
+		true  // request ack
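The fragment above passes the OSD id, the OSD's addresses as recorded in its osdmap, the osdmap epoch, and a request-ack flag. The thread does not say how the mon uses the addresses; assuming (not stated here) that the mon compares them with the addresses of the currently up instance, a late message from an old incarnation of the OSD could not mark a restarted OSD dead. MarkMeDeadSketch, UpAddrs and should_mark_dead() are hypothetical names for illustration only.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the fields visible in the diff; not the real message class.
struct MarkMeDeadSketch {
  int whoami;         // OSD id of the sender
  std::string addrs;  // sender's addresses from its osdmap (simplified to a string)
  unsigned epoch;     // osdmap epoch the sender was at (carried, unused in this check)
  bool request_ack;   // ask the mon to reply
};

// Hypothetical mon-side view: OSD id -> addresses of the instance currently up.
using UpAddrs = std::map<int, std::string>;

// Assumed check: act only if the request comes from the instance the mon
// currently considers up.
bool should_mark_dead(const UpAddrs& up, const MarkMeDeadSketch& m) {
  auto it = up.find(m.whoami);
  return it != up.end() && it->second == m.addrs;
}
```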


is there a reason to not mark the osd dead during a graceful shutdown?

I don't think there is any reason not to; I couldn't find any.

let's make it unconditional then

@jdurgin @neha-ojha Done

mlausch · 2022-03-21T11:03:23Z

@neha-ojha can you add a backport label for pacific as well?

neha-ojha · 2022-03-21T20:33:38Z

The associated tracker has pacific and quincy in the backport field, https://tracker.ceph.com/issues/53327#note-3. I added the additional needs-quincy-backport label, because I'd like to include this patch in the first cut of quincy.

ljflores · 2022-03-25T15:58:35Z

http://pulpito.front.sepia.ceph.com/yuriw-2022-03-24_16:44:32-rados-wip-yuri-testing-2022-03-24-0726-distro-default-smithi/

Unrelated failures, tracked by:
https://tracker.ceph.com/issues/54990
https://tracker.ceph.com/issues/52124
https://tracker.ceph.com/issues/51904

Other unrelated failures include http://pulpito.front.sepia.ceph.com/yuriw-2022-03-24_16:44:32-rados-wip-yuri-testing-2022-03-24-0726-distro-default-smithi/6758316/, which is a rook test, and http://pulpito.front.sepia.ceph.com/yuriw-2022-03-24_16:44:32-rados-wip-yuri-testing-2022-03-24-0726-distro-default-smithi/6758319/, an upgrade test. http://pulpito.front.sepia.ceph.com/yuriw-2022-03-24_16:44:32-rados-wip-yuri-testing-2022-03-24-0726-distro-default-smithi/6758455/ has been analyzed by Sridhar and deemed unrelated.

Modify test_activate_osd() to get the type of scheduler in use and then verify the value of osd_max_backfills. This is needed because the mclock scheduler overrides this option to 1000 upon OSD initialization. The test previously passed because the OSD daemon was killed but not marked down, so when it was brought back up, the wait-for-OSD-up check passed quickly, before the OSD had the latest config values. Now, when the OSD is killed, the osd_fast_shutdown sequence notifies the mon (see PR: ceph#44807) and the OSD is marked down and dead. When it is brought back up, the wait-for-OSD-up check takes longer, which gives the config values enough time to be updated. As a result, the correct values are read from the config 'Values' map. Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

(cherry picked from commit 3aa2df2)
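The commit message explains why test_activate_osd() has to branch on the scheduler before checking osd_max_backfills. The actual change is in the shell helper ceph-helpers.sh (#45651); the C++ sketch below only restates the branching. expected_osd_max_backfills() is a hypothetical name; osd_op_queue is the Ceph option that selects the op scheduler, and the value 1000 is taken from the commit message.

```cpp
#include <string>

// Hypothetical helper: the value a test should expect for osd_max_backfills
// depends on which op scheduler the OSD runs.
int expected_osd_max_backfills(const std::string& osd_op_queue, int configured) {
  if (osd_op_queue == "mclock_scheduler") {
    return 1000;  // mclock overrides the option at OSD init (per the commit message)
  }
  return configured;  // other schedulers keep the configured value
}
```

Checking the scheduler first makes the test independent of how long the OSD takes to come back up, which is what changed once the fast-shutdown path started notifying the mon.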

github-actions bot added the core label Jan 27, 2022

NitzanMordhai requested review from jdurgin and neha-ojha January 27, 2022 13:20

neha-ojha requested a review from liewegas January 27, 2022 23:43

NitzanMordhai force-pushed the wip-nitzan-fast-shutdown-notify-mon branch from 93378e3 to 7f71ae1 January 28, 2022 08:00

NitzanMordhai force-pushed the wip-nitzan-fast-shutdown-notify-mon branch 2 times, most recently from 4b675f6 to 2c5d957 January 30, 2022 13:42

jdurgin requested changes Feb 2, 2022

NitzanMordhai force-pushed the wip-nitzan-fast-shutdown-notify-mon branch from 2c5d957 to c2892ca February 3, 2022 07:30

NitzanMordhai requested a review from jdurgin February 3, 2022 07:31

jdurgin approved these changes Feb 3, 2022

neha-ojha added needs-qa wip-yuri4-testing labels Feb 14, 2022

yuriw removed needs-qa wip-yuri4-testing labels Feb 16, 2022

NitzanMordhai force-pushed the wip-nitzan-fast-shutdown-notify-mon branch from c2892ca to 44761f2 February 21, 2022 14:26

github-actions bot added the mon label Feb 21, 2022

NitzanMordhai force-pushed the wip-nitzan-fast-shutdown-notify-mon branch from 44761f2 to 985a90b February 23, 2022 10:01

github-actions bot added the common label Feb 24, 2022

jdurgin reviewed Mar 15, 2022

jdurgin approved these changes Mar 15, 2022

neha-ojha added needs-quincy-backport backport required for quincy needs-qa labels Mar 17, 2022

yuriw added wip-yuri6-testing and removed wip-yuri6-testing labels Mar 17, 2022

osd/OSD: osd_fast_shutdown_notify_mon not quite right

07302d5

When osd_fast_shutdown and osd_fast_shutdown_notify_mon are set to true, the OSD is marked as down, but it should be marked as dead. Fixes: https://tracker.ceph.com/issues/53327 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>

NitzanMordhai force-pushed the wip-nitzan-fast-shutdown-notify-mon branch from 8d65ce8 to 07302d5 March 23, 2022 14:38

neha-ojha requested a review from jdurgin March 23, 2022 16:50

jdurgin approved these changes Mar 23, 2022

yuriw added the wip-yuri-testing label Mar 23, 2022

yuriw merged commit ea1727e into ceph:master Mar 25, 2022

sseshasa mentioned this pull request Mar 25, 2022

qa/standalone: Fix test_activate_osd() test in ceph-helpers.sh #45651

Merged

ljflores mentioned this pull request Mar 25, 2022

Quincy: fast shutdown backports #45653

Merged

This was referenced Mar 25, 2022

Pacific fast shutdown backports #45654

Merged

octopus: osd/OSD: osd_fast_shutdown_notify_mon not quite right #45655

Merged

NitzanMordhai mentioned this pull request Mar 28, 2022

pacific: osd/OSD: osd_fast_shutdown_notify_mon not quite right #45668

Closed

satoru-takeuchi mentioned this pull request Jul 1, 2022

osd: make osd_fast_shutdown_notify_mon option true by default #44016

Closed
