
squid: a series of optimizations for KernelDevice discard #59065

Merged
SrinivasaBharath merged 12 commits into ceph:squid from YiteGu:wip-blk-discard-squid on Nov 4, 2024

Conversation

@YiteGu
Member

@YiteGu YiteGu commented Aug 7, 2024

  1. blk: multi-threaded discard support
  2. NCB fix for leaked space when bdev_async_discard is enabled
  3. React to bdev_enable_discard changes in handle_conf_change(); Fix several issues with stopping discard threads

backport tracker: https://tracker.ceph.com/issues/67139
backport tracker: https://tracker.ceph.com/issues/67404


backport of #55469, #56744, #58409
parent tracker: https://tracker.ceph.com/issues/65298, https://tracker.ceph.com/issues/66817

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

Matt1360 and others added 11 commits August 7, 2024 10:45
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
(cherry picked from commit 4ae47bd)
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
(cherry picked from commit d8815e1)
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
(cherry picked from commit 671e126)
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
(cherry picked from commit 5c4a234)
…bled

The fix calls bdev->discard_drain() before calling store_allocator() to make sure all freed space is reflected in the allocator before destaging it.
The fix sets a timeout on the drain call (500 msec); if it expires, the allocator is not stored (forcing a recovery on the next startup).
Fixes: https://tracker.ceph.com/issues/65298
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

(cherry picked from commit 3aa891d)
…toring the allocator.

On fast shutdown we will simply copy the discard queue entries to the allocator

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
(cherry picked from commit 4762ffa)
On fast shutdown, take over the main discard queue, copying it to the allocator, and only wait for the threads to commit their small private discard queues

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
(cherry picked from commit 1e10a9b)
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
(cherry picked from commit b37080c)
…_change()

This fixes two issues that were introduced by 755f3e0:
1. After an OSD boots, discard threads were not stopped when
   bdev_enable_discard was set to false, whereas that was the intent of
   that commit.
2. If bdev_enable_discard or bdev_async_discard_threads are configured
   with a mask that can't be evaluated at OSD boot (e.g. a device
   class), then async discard won't be enabled until a later config
   change to bdev_async_discard_threads.

Fixes: https://tracker.ceph.com/issues/66817
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
(cherry picked from commit 8ffe35e)
1. In _discard_stop(), the wait for !discard_threads.empty() was there
   from a prior implementation where there could theoretically be a race
   between _discard_start() and _discard_stop(). If that race does
   exist, this check won't help; not only is _discard_stop() not called
   if discard_threads is empty, but discard_threads won't be populated
   until _discard_start() runs and thus this won't detect such a race.
2. Calling _discard_stop() from handle_conf_change() is a guaranteed
   deadlock because discard_lock is already held, so don't do that. Use
   the same flow whether we're stopping a subset of threads or all
   threads.
3. Asking a subset of discard threads to stop was not guaranteed to take
   effect, since if they continued to find contents in discard_queue
   then they would continue to run indefinitely. Add additional logic to
   _discard_thread() to have threads stop if they have been requested to
   stop and other threads exist to continue draining discard_queue.
4. Make the flow of _discard_stop() and handle_conf_change() more
   similar.

Fixes: https://tracker.ceph.com/issues/66817
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
(cherry picked from commit 3d4a899)
Instead of having _discard_start() and _discard_stop() partially or
completely duplicate functionality in handle_conf_change(), have a
single _discard_update_threads() that can handle all three. Loops are
tidied slightly, the unnecessary target_discard_threads class variable
has been removed, and now handle_conf_change() will respect
support_discard.

Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
(cherry picked from commit 617c936)
@YiteGu YiteGu requested a review from a team as a code owner August 7, 2024 02:50
@github-actions github-actions bot added this to the squid milestone Aug 7, 2024
@YiteGu
Member Author

YiteGu commented Aug 7, 2024

jenkins test make check

@ljflores
Member

@YiteGu @ifed01 please see this error found in QA testing that looks related to this PR:
/a/skanta-2024-09-27_06:56:34-rados-wip-bharath14-testing-2024-09-26-2119-squid-distro-default-smithi/7921618

2024-09-27T09:22:38.858 INFO:tasks.rados.rados.0.smithi006.stdout:4227:  finishing copy_from to smithi00633089-523
2024-09-27T09:22:38.858 INFO:tasks.rados.rados.0.smithi006.stdout:update_object_version oid 523 v 1068 (ObjNum 1802 snap 167 seq_num 1802) dirty exists
2024-09-27T09:22:38.867 INFO:teuthology.orchestra.run.smithi179.stderr:error preparing db environment: (5) Input/output error
2024-09-27T09:22:38.870 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-09-27T09:22:38.872 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_396230dce0bbb423ab67ae5d3625348d61b9460c/qa/tasks/ceph_manager.py", line 192, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_396230dce0bbb423ab67ae5d3625348d61b9460c/qa/tasks/ceph_manager.py", line 1435, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_396230dce0bbb423ab67ae5d3625348d61b9460c/qa/tasks/ceph_manager.py", line 1171, in test_bluestore_reshard
    self.test_bluestore_reshard_action()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_396230dce0bbb423ab67ae5d3625348d61b9460c/qa/tasks/ceph_manager.py", line 1151, in test_bluestore_reshard_action
    raise Exception("ceph-bluestore-tool resharding failed.")
Exception: ceph-bluestore-tool resharding failed.

Ref: https://tracker.ceph.com/issues/68294

@YiteGu
Member Author

YiteGu commented Oct 2, 2024

@YiteGu @ifed01 please see this error found in QA testing that looks related to this PR: /a/skanta-2024-09-27_06:56:34-rados-wip-bharath14-testing-2024-09-26-2119-squid-distro-default-smithi/7921618


At first glance, this backport does not modify any code related to the ceph-bluestore-tool reshard action. Where can I see the log file of the ceph-bluestore-tool reshard?

@ifed01 ifed01 self-requested a review October 14, 2024 13:42
Contributor

@ifed01 ifed01 left a comment

Please bring discard_stop initialization back.

@ifed01
Contributor

ifed01 commented Oct 14, 2024

@YiteGu @ifed01 please see this error found in QA testing that looks related to this PR:
/a/skanta-2024-09-27_06:56:34-rados-wip-bharath14-testing-2024-09-26-2119-squid-distro-default-smithi/7921618

I have a feeling we have two issues in this batch.
The first one is a valgrind error on an uninitialized variable (presumably discard_stop),
see https://qa-proxy.ceph.com/teuthology/skanta-2024-09-27_06:56:34-rados-wip-bharath14-testing-2024-09-26-2119-squid-distro-default-smithi/7921444/teuthology.log

description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/none
  mon_election/classic msgr-failures/few msgr/async-v1only objectstore/bluestore-low-osd-mem-target
  rados tasks/rados_cls_all validater/valgrind}
duration: 914.6719253063202
failure_reason: 'valgrind error: UninitCondition

  KernelDevice::_discard_update_threads()

  KernelDevice::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&)

  BlueStore::read_meta(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*)'
flavor: default
owner: scheduled_skanta@teuthology
sentry_event: https://sentry.ceph.com/organizations/ceph/?query=1feadf9efd354f1590bb35efa2e49b1e
status: fail
success: false

This is definitely to be addressed within this PR's scope.

The second one is "ceph-bluestore-tool resharding failed" and it needs further investigation. Not sure if it's related to this PR.

Value discard_stop could be uninitialized.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit bdcc7da)
@YiteGu
Member Author

YiteGu commented Oct 15, 2024

Please bring discard_stop initialization back.

done

@YiteGu
Member Author

YiteGu commented Oct 15, 2024

jenkins test make check

@ifed01 ifed01 self-requested a review October 15, 2024 10:07
@ifed01
Contributor

ifed01 commented Oct 15, 2024

The second one "ceph-bluestore-tool resharding failed" and it needs further investigation. Not sure if it's related to this PR.

So after checking the logs and talking to Adam I'm pretty sure that my hypothesis is valid - there were two independent issues with the last QA run. The second one is tracked by https://tracker.ceph.com/issues/67911

@aclamk
Contributor

aclamk commented Oct 15, 2024

@ljflores The issue in #59065 (comment) is caused by:
https://tracker.ceph.com/issues/67911

@ifed01
Contributor

ifed01 commented Oct 15, 2024

So @ljflores - please include #59969 along with this PR to get rid of the above issues.

@ljflores
Member

ljflores commented Nov 6, 2024

@YiteGu @ifed01 This PR was merged prematurely before it got tested. Can you please review the revert PR (#60641) and re-raise this PR?

@parth-gr
Contributor

@ljflores @SrinivasaBharath This might be the reason for the BlueStore osd-expansion failing: https://github.com/rook/rook/pull/15251#issuecomment-2582895014

cc @travisn

@travisn
Member

travisn commented Jan 28, 2025

Looks like the revert PR #60641 has not been merged yet. So if this is related, we should be able to see the latest squid devel image failing in the resize testing: quay.ceph.io/ceph-ci/ceph:squid

@parth-gr
Contributor

Yes, it's failing with quay.ceph.io/ceph-ci/ceph:squid

@solidDoWant

Hey Ceph maintainers, this change removed bdev_async_discard, which is a breaking change.

While this is mentioned in the changelog, it only includes the PR title which isn't very descriptive. Can this change be rolled back until v20, or be made compatible with the current bdev_async_discard field, or can a large warning be added to the v19.2.1 changelog?

@solidDoWant

This change causes a pretty major performance regression when bdev_async_discard_threads is greater than 1:

[screenshot]

[screenshot]

The change from ~0 to ~12 is when I changed from 1 to 4.

I'm not the only one seeing this. Here's a few screenshots from some other users:

[screenshot]
Before setting it, after raising it to 4, and after setting it to 1

[screenshot]
Before setting it, after raising it to 4, and after dropping to 2

I can't file an issue on the official bugtracker because account creation requires manual approval. @YiteGu @ifed01 Sorry for the ping but I don't know how else to get this in front of somebody.

@YiteGu
Member Author

YiteGu commented Mar 4, 2025

Hey Ceph maintainers, this change removed bdev_async_discard, which is a breaking change.

While this is mentioned in the changelog, it only includes the PR title which isn't very descriptive. Can this change be rolled back until v20, or be made compatible with the current bdev_async_discard field, or can a large warning be added to the v19.2.1 changelog?

The purpose of bdev_async_discard_threads is to enable concurrent completion of discard ops using more threads. If it is set to 1, there is no change from the existing behaviour. At this point, the bdev_async_discard option becomes redundant.

@solidDoWant

I completely get that the bdev_async_discard field is now redundant, and setting it to 1 preserves the current behavior. However, because this requires ceph admins to take action to preserve the current behavior, this is by definition a breaking change.

Unless I am missing something here, ceph admins who have previously set ceph config set global bdev_async_discard 1, or have a config file with

[global]
bdev_async_discard = 1

will suddenly, without any notice, have async discards disabled on their clusters.

Additionally, running ceph config set global bdev_async_discard 1 now fails on v19.2.1 forward, which breaks automation and other tools that integrate with ceph, like Rook.

This change is breaking compatibility with v19 stable, and without backwards compatibility should have been delayed until v20.
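For admins affected by the removal, the migration discussed in this thread would look roughly like the fragment below. The option names are the ones mentioned here (bdev_enable_discard, bdev_async_discard_threads); the exact values and syntax are an assumption to be checked against your release's documentation:

```ini
# Old (pre-19.2.1) fragment -- the bdev_async_discard option was removed:
#[global]
#bdev_async_discard = 1

# Replacement: enable discard and pick a thread count; 1 preserves the
# old single-threaded async-discard behaviour.
[global]
bdev_enable_discard = true
bdev_async_discard_threads = 1
```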

@YiteGu
Member Author

YiteGu commented Mar 6, 2025

I completely get that the bdev_async_discard field is now redundant, and setting it to 1 preserves the current behavior. However, because this requires ceph admins to take action to preserve the current behavior, this is by definition a breaking change.


I completely understand how you feel. The transition period, while new configuration options are introduced and old ones are removed, is indeed a bit troublesome.

@YiteGu
Member Author

YiteGu commented Mar 6, 2025

This change causes a pretty major performance regression when bdev_async_discard_threads is greater than 1: [screenshot]

[…]

I can't file an issue on the official bugtracker because account creation requires manual approval. @YiteGu @ifed01 Sorry for the ping but I don't know how else to get this in front of somebody.

19.2.1 is missing #59529, which causes excessive CPU load during multi-threaded discard. Please keep bdev_async_discard_threads = 1 on 19.2.1.

JackMyers001 added a commit to JackMyers001/ducknet-ops that referenced this pull request May 30, 2025
I LOVE BREAKING CHANGES! See ceph/ceph#59065 for
details
