osd: Implement Context based completion for mon cmd to set a config option by sseshasa · Pull Request #47456 · ceph/ceph

sseshasa · 2022-08-04T11:18:17Z

The method, OSD::mon_cmd_set_config() currently sets a config option
related to mClock during OSD boot-up. The method used to wait on a
condition variable until the mon ack'ed the command. This was generally
not a problem.

But there could be scenarios where monitor could be slow to respond, or
due to a flaky network, response could be delayed. The OSD could therefore
be blocked from booting-up. To avoid this, the conditional wait can be
replaced with an async Context completion.

Moreover, persisting this in the monitor store is not very critical. An
existing fallback mechanism stores this value in the in-memory "values"
map of the config subsystem. This can be read by the OSD at any point
during its operation.

The issue of the OSDs being blocked from booting-up properly was
observed when running tests with failure injections during OSD boot-up.

Following are the changes:

The changes to mon_cmd_set_config() are generic and any osd specific
option may be set in the config monitor store using this method.

Implement Context based completion tracking of the mon command using
MonCmdSetConfigOnFinish which is derived from the base Context class.
In case of failures, the finish() method is overriden to save the config
option in the config subsystem's "values" map at CONF_DEFAULT level
using set_val_default() method. This allows users to modify this at a
later point using "ceph config set" cli command.
Additionally, if requested, finish() also checks if the config option
needs to be applied on each op shard and calls update_scheduler_config()
on each shard. This is required for some config options related to the
mClock scheduler.

Fixes: https://tracker.ceph.com/issues/57040
Signed-off-by: Sridhar Seshasayee sseshasa@redhat.com

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows

sseshasa · 2022-08-04T11:29:53Z

Teuthology Tests
https://pulpito.ceph.com/sseshasa-2022-08-03_18:56:32-smoke:basic-wip-sseshasa-testing-mon-nack-distro-default-smithi/
https://pulpito.ceph.com/sseshasa-2022-08-04_08:53:37-rbd:nbd-wip-sseshasa-testing-mon-nack-distro-default-smithi/
https://pulpito.ceph.com/sseshasa-2022-08-04_11:34:29-rbd:thrash-wip-sseshasa-testing-mon-nack-distro-default-smithi/

The above are thrash tests with msgr failures. The logs from the tests were analyzed and it was confirmed that the
OSDs now boots up without waiting, and sets the shard specific IOPS capacity when the Context is completed.

sseshasa · 2022-08-04T14:30:07Z

jenkins test make check

src/osd/OSD.cc

athanatos

LGTM

sseshasa · 2022-08-05T06:22:03Z

jenkins test make check

sseshasa · 2022-08-05T06:22:26Z

jenkins test api

sseshasa · 2022-08-05T06:23:38Z

jenkins test windows

sseshasa · 2022-08-05T06:24:48Z

jenkins render docs

sseshasa · 2022-08-05T06:26:18Z

jenkins test docs

sseshasa · 2022-08-05T06:26:55Z

jenkins retest this please

…ption The method, OSD::mon_cmd_set_config() currently sets a config option related to mClock during OSD boot-up. The method used to wait on a condition variable until the mon ack'ed the command. This was generally not a problem. But there could be scenarios where monitor could be slow to respond, or due to a flaky network, response could be delayed. The OSD could therefore be blocked from booting-up. To avoid this, the conditional wait is replaced with an async Context completion. Moreover, persisting this in the monitor store is not very critical. An existing fallback mechanism stores this value in the in-memory "values" map of the config subsystem. This can be read by the OSD at any point during its operation. The issue of the OSDs being blocked from booting-up properly was observed when running tests with failure injections during OSD boot-up. Following are the changes: The changes to mon_cmd_set_config() are generic and any osd specific option may be set in the config monitor store using this method. - Implement Context based completion tracking of the mon command using MonCmdSetConfigOnFinish which is derived from the base Context class. In case of failures, the finish() method is overriden to save the config option in the config subsystem's "values" map at CONF_DEFAULT level using set_val_default() method. This allows users to modify this at a later point using "ceph config set" cli command. - Additionally, if requested, finish() also checks if the config option needs to be applied on each op shard and calls update_scheduler_config() on each shard. This is required for some config options related to the mClock scheduler. Fixes: https://tracker.ceph.com/issues/57040 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

sseshasa · 2022-08-08T09:44:47Z

jenkins test windows

sseshasa · 2022-08-08T09:45:23Z

jenkins test make check

sseshasa · 2022-08-08T12:12:37Z

jenkins test make check

sseshasa · 2022-08-08T12:12:48Z

jenkins test windows

sseshasa · 2022-08-09T15:06:28Z

jenkins test windows

sseshasa · 2022-08-10T11:51:59Z

jenkins test windows

sseshasa · 2022-08-16T03:25:29Z

jenkins test windows

sseshasa · 2022-09-05T14:56:05Z

Teuthology Test Result
http://pulpito.front.sepia.ceph.com/?branch=wip-yuri5-testing-2022-08-18-0812
Summary: 294/319 Passed, 22 Failed, 3 dead jobs

Unrelated Failures
The following are the unrelated failures for which trackers already exist:

https://tracker.ceph.com/issues/57368 - The CustomResourceDefinition "installations.operator.tigera.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
https://tracker.ceph.com/issues/57122 - test failure: rados:singleton-nomsgr librados_hello_world - Ceph - RADOS
https://tracker.ceph.com/issues/55853 - cls_rgw test failures
https://tracker.ceph.com/issues/57165 - expected valgrind issues and found none
https://tracker.ceph.com/issues/57270 - stderr E: The repository 'https://download.ceph.com/debian-octopus jammy Release' does not have a Release file
https://tracker.ceph.com/issues/52321 - qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
https://tracker.ceph.com/issues/56850 - src/mon/PaxosService.cc: 193: FAILED ceph_assert(have_pending)

sseshasa · 2022-09-05T15:01:11Z

jenkins test windows

sseshasa requested a review from a team as a code owner August 4, 2022 11:18

github-actions bot added the core label Aug 4, 2022

sseshasa force-pushed the wip-fix-mon-cmd-nack branch from 191ef31 to c1f9db1 Compare August 4, 2022 18:08

athanatos reviewed Aug 4, 2022

View reviewed changes

src/osd/OSD.cc Outdated Show resolved Hide resolved

athanatos approved these changes Aug 4, 2022

View reviewed changes

sseshasa force-pushed the wip-fix-mon-cmd-nack branch from c1f9db1 to 9f48c34 Compare August 4, 2022 18:47

sseshasa added the needs-qa label Aug 8, 2022

sseshasa force-pushed the wip-fix-mon-cmd-nack branch from 9f48c34 to 3c0603c Compare August 8, 2022 07:59

yuriw added the wip-yuri5-testing label Aug 16, 2022

yuriw merged commit 99f42bf into ceph:main Sep 5, 2022

sseshasa mentioned this pull request Sep 6, 2022

quincy: osd: Implement Context based completion for mon cmd to set a config option #47983

Merged

Conversation

sseshasa commented Aug 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution Guidelines

Checklist

Uh oh!

sseshasa commented Aug 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sseshasa commented Aug 4, 2022

Uh oh!

Uh oh!

athanatos left a comment

Choose a reason for hiding this comment

Uh oh!

sseshasa commented Aug 5, 2022

Uh oh!

sseshasa commented Aug 5, 2022

Uh oh!

sseshasa commented Aug 5, 2022

Uh oh!

sseshasa commented Aug 5, 2022

Uh oh!

sseshasa commented Aug 5, 2022

Uh oh!

sseshasa commented Aug 5, 2022

Uh oh!

sseshasa commented Aug 8, 2022

Uh oh!

sseshasa commented Aug 8, 2022

Uh oh!

sseshasa commented Aug 8, 2022

Uh oh!

sseshasa commented Aug 8, 2022

Uh oh!

sseshasa commented Aug 9, 2022

Uh oh!

sseshasa commented Aug 10, 2022

Uh oh!

sseshasa commented Aug 16, 2022

Uh oh!

sseshasa commented Sep 5, 2022

Uh oh!

sseshasa commented Sep 5, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sseshasa commented Aug 4, 2022 •

edited

Loading

sseshasa commented Aug 4, 2022 •

edited

Loading