Bug #70327

closed

v19.2.1 breaks compatibility with v19.2.0 and consumes all CPU available to OSDs

Added by Fred Heinecke about 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
% Done: 0%
Source: Community (user)
Backport: https://github.com/ceph/ceph/pull/59065
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Fixed In: v19.2.2-784-gbdaad91566
Released In: v19.2.3~209
Upkeep Timestamp: 2025-08-26T12:52:50+00:00

Description

The v19.2.1 release contained a backport (https://github.com/ceph/ceph/pull/59065) with a breaking change. This PR removed the `bdev_async_discard` configuration value and replaced it with `bdev_async_discard_threads`. The new option is a superset of the old one: it allows multiple threads to be used for a task that was previously single-threaded. However, removing the old config value within the stable v19.2 series breaks Ceph deployments in the following ways:
  • If the value was set via ceph.conf, it is now silently ignored, and async discards are disabled.
  • If the value was set via `ceph config set`, it is now silently ignored, and async discards are disabled.
  • If the value is set via `ceph config set` after upgrading to v19.2.1 (via automation such as Rook), the command errors. This causes Rook deployments (new ones and upgrades) to repeatedly fail.
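
One way automation could cope with both releases is to translate the legacy option before applying settings. The following is a minimal sketch, not code from Ceph or Rook: `migrate_discard_option` is a hypothetical helper, the replacement option is assumed to be spelled `bdev_async_discard_threads` as in upstream Ceph, and mapping an enabled legacy flag to one thread is an assumption based on the old behaviour being single-threaded.

```python
# Hypothetical helper for illustration only; not part of Ceph or Rook.
LEGACY_OPTION = "bdev_async_discard"
NEW_OPTION = "bdev_async_discard_threads"  # assumed upstream spelling

# String forms treated as "enabled" for the legacy boolean (an assumption).
TRUTHY = {"1", "true", "yes", "on"}


def migrate_discard_option(settings: dict[str, str]) -> dict[str, str]:
    """Return a copy of `settings` with the legacy option translated.

    An enabled legacy flag becomes one discard thread, matching the old
    single-threaded behaviour; a disabled flag becomes zero threads.
    An explicitly configured new option is never overwritten.
    """
    migrated = dict(settings)
    legacy = migrated.pop(LEGACY_OPTION, None)
    if legacy is not None and NEW_OPTION not in migrated:
        enabled = legacy.strip().lower() in TRUTHY
        migrated[NEW_OPTION] = "1" if enabled else "0"
    return migrated


if __name__ == "__main__":
    before = {"bdev_enable_discard": "true", "bdev_async_discard": "true"}
    print(migrate_discard_option(before))
```

Applying the translated settings instead of the raw ones avoids sending the removed option name to `ceph config set` on v19.2.1.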

This may be the cause of https://tracker.ceph.com/issues/70010, as the reporter stated (on Slack) that their issue went away after they replaced the removed option with the new one.

In addition to breaking backwards compatibility, setting the new value above 1 causes all CPU cores available to each OSD to run at 100% until the value is reduced to 1 (or 0). The graphs below show examples of this from three independent Ceph clusters and admins:

Going from unset to "4" to "1"


Going from unset to "4"


Going from unset to "4" to "2"
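
Until a fixed release is deployed, the workaround seen in these graphs is to keep the thread count at 1 or 0. A rough sketch of enforcing that in tooling follows; `discard_threads_command` is a hypothetical helper, the option spelling `bdev_async_discard_threads` is assumed from upstream Ceph, and the cap of 1 reflects only the behaviour reported in this ticket.

```python
# Hypothetical helper for illustration only; not part of Ceph or Rook.
def discard_threads_command(requested: int, who: str = "osd") -> list[str]:
    """Build a `ceph config set` argument list with the thread count capped.

    Values above 1 are reduced to 1 and negative values are raised to 0,
    because this ticket reports CPU saturation for any value above 1.
    """
    capped = max(0, min(requested, 1))
    return ["ceph", "config", "set", who, "bdev_async_discard_threads", str(capped)]


if __name__ == "__main__":
    # A request for 4 threads is emitted as 1.
    print(" ".join(discard_threads_command(4)))
```

The function only builds the argument list; running it (for example with `subprocess.run`) is left to the caller so the cap can be reviewed before anything touches the cluster.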

Related PRs:

Files

clipboard-202503051424-p2d1m.png (136 KB), Fred Heinecke, 03/05/2025 08:24 PM
clipboard-202503051424-ma5y7.png (40.5 KB), Fred Heinecke, 03/05/2025 08:24 PM
clipboard-202503051424-tjdna.png (96.4 KB), Fred Heinecke, 03/05/2025 08:24 PM

Related issues: 1 (0 open, 1 closed)

Related to bluestore - Bug #70010: [WRN] BLUESTORE_SLOW_OP_ALERT: x OSD(s) experiencing slow operations in BlueStore (Duplicate)

Actions #1

Updated by Igor Fedotov about 1 year ago

  • Related to Bug #70010: [WRN] BLUESTORE_SLOW_OP_ALERT: x OSD(s) experiencing slow operations in BlueStore added
Actions #2

Updated by Satoru Takeuchi about 1 year ago

Isn't "Pull request ID:" #62151?

Actions #3

Updated by Yite Gu about 1 year ago

  • Status changed from New to Fix Under Review
  • Assignee set to Yite Gu
  • Target version changed from v19.0.0 to v19.2.3
  • Pull request ID changed from 59065 to 62151
Actions #4

Updated by Yite Gu about 1 year ago

Satoru Takeuchi wrote in #note-2:

Isn't "Pull request ID:" #62151?

Yes

Actions #5

Updated by Igor Fedotov about 1 year ago

Additional PR to bring "bdev_async_discard" parameter back: https://github.com/ceph/ceph/pull/62254

Actions #6

Updated by Igor Fedotov 11 months ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to bdaad9156682257f74e33a0696a50fd1be6f8917
  • Fixed In set to v19.2.2-784-gbdaad915668
  • Upkeep Timestamp set to 2025-07-09T19:08:46+00:00
Actions #8

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.2.2-784-gbdaad915668 to v19.2.2-784-gbdaad91566
  • Upkeep Timestamp changed from 2025-07-09T19:08:46+00:00 to 2025-07-14T18:13:24+00:00
Actions #9

Updated by Upkeep Bot 7 months ago

  • Released In set to v19.2.3~209
  • Upkeep Timestamp changed from 2025-07-14T18:13:24+00:00 to 2025-08-26T12:52:50+00:00