Enhancement #57040

closed

osd: Update osd's IOPS capacity using async Context completion instead of cond wait.

Added by Sridhar Seshasayee over 3 years ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Backport: quincy
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Tags (freeform):
Fixed In: v17.0.0-14813-g99f42bfba76
Released In: v18.2.0~1397
Upkeep Timestamp: 2025-07-13T06:12:05+00:00

Description

The method OSD::mon_cmd_set_config() sets a config option related to
mClock during OSD boot-up. The method waits on a condition variable
until the mon acks the command. This is generally not a problem, but if
the monitor is slow to respond, or a flaky network delays the response,
the OSD can be blocked from booting up. To avoid this, the wait on the
condition variable can be replaced with an async Context completion.

Moreover, persisting this value in the monitor store is not critical.
An existing fallback mechanism stores it in the in-memory "values" map
of the config subsystem, where the OSD can read it at any point during
its operation.

The issue of OSDs being blocked from booting up properly was observed
when running tests with failure injections during OSD boot-up.
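
The idea above can be illustrated with a minimal, self-contained C++ sketch.
It is not the code from the actual fix: the Context struct below is a
simplified stand-in for a completion callback, and FakeMonClient, OsdSketch,
and the "example_iops_capacity" key are hypothetical names invented for this
illustration. It only shows the general pattern, assuming the value is first
kept in a local in-memory map and the monitor ack is handled by a callback
rather than by blocking the boot path.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <utility>

// Simplified stand-in for a one-shot completion callback.
// This is NOT Ceph's actual Context class, only an illustration.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  // Run the callback once, then free it.
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Hypothetical monitor stub: commands are queued and only acked when
// deliver_acks() is called, which models a slow or unreachable monitor.
class FakeMonClient {
  struct Pending {
    std::string key, val;
    Context* on_ack;
  };
  std::queue<Pending> pending_;
  std::map<std::string, std::string> store_;

 public:
  ~FakeMonClient() {
    // Free callbacks whose ack never arrived.
    while (!pending_.empty()) {
      delete pending_.front().on_ack;
      pending_.pop();
    }
  }

  // Queue the command and return immediately; nothing blocks here.
  void start_command(std::string key, std::string val, Context* on_ack) {
    pending_.push(Pending{std::move(key), std::move(val), on_ack});
  }

  // Simulate the (possibly late) acks arriving from the monitor.
  void deliver_acks(int result) {
    while (!pending_.empty()) {
      Pending p = std::move(pending_.front());
      pending_.pop();
      if (result == 0) store_[p.key] = p.val;
      p.on_ack->complete(result);
    }
  }

  // Value persisted by the stub, or an empty string if none.
  std::string stored(const std::string& key) const {
    auto it = store_.find(key);
    return it == store_.end() ? std::string() : it->second;
  }
};

// Hypothetical OSD-like object showing the non-blocking pattern.
struct OsdSketch {
  std::map<std::string, std::string> local_values;  // in-memory fallback
  bool booted = false;
  int last_mon_result = 1;  // 1 means "no ack received yet"

  void set_config_async(FakeMonClient& mon, const std::string& key,
                        const std::string& val) {
    // Keep the value locally first so it is readable even without an ack.
    local_values[key] = val;
    struct OnAck : Context {
      OsdSketch* osd;
      explicit OnAck(OsdSketch* o) : osd(o) {}
      void finish(int r) override { osd->last_mon_result = r; }
    };
    mon.start_command(key, val, new OnAck(this));
  }

  void boot(FakeMonClient& mon) {
    set_config_async(mon, "example_iops_capacity", "315");
    booted = true;  // boot proceeds without waiting for the monitor
  }
};
```

In this sketch, boot() completes even if deliver_acks() is never called, and
the value stays readable from local_values in the meantime. The trade-off is
that a failed or missing ack is only recorded by the callback, so the caller
must not assume the value has been persisted by the monitor.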


Related issues 2 (0 open, 2 closed)

Has duplicate RADOS - Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in cluster log (Duplicate, Sridhar Seshasayee)
Copied to RADOS - Backport #57443: quincy: osd: Update osd's IOPS capacity using async Context completion instead of cond wait. (Resolved, Sridhar Seshasayee)
#1

Updated by Sridhar Seshasayee over 3 years ago

  • Status changed from New to Fix Under Review
  • Backport set to quincy
  • Pull request ID set to 47456
#2

Updated by Sridhar Seshasayee over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#3

Updated by Upkeep Bot over 3 years ago

  • Copied to Backport #57443: quincy: osd: Update osd's IOPS capacity using async Context completion instead of cond wait. added
#5

Updated by Sridhar Seshasayee over 3 years ago

  • Status changed from Pending Backport to Resolved
#6

Updated by Sridhar Seshasayee 9 months ago

  • Has duplicate Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in cluster log added
#7

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 99f42bfba7625df1dfb2d8f8bdf1b1fa118dc813
  • Fixed In set to v17.0.0-14813-g99f42bfba76
  • Released In set to v18.2.0~1397
  • Upkeep Timestamp set to 2025-07-13T06:12:05+00:00