Skip to content

blk/KernelDevice: React to bdev_enable_discard changes in handle_conf_change(); Fix several issues with stopping discard threads#58409

Merged
ljflores merged 3 commits intoceph:mainfrom
baergj:upstream-fix-async-discard-on-start
Jul 29, 2024
Merged

blk/KernelDevice: React to bdev_enable_discard changes in handle_conf_change(); Fix several issues with stopping discard threads#58409
ljflores merged 3 commits intoceph:mainfrom
baergj:upstream-fix-async-discard-on-start

Conversation

@baergj
Copy link

@baergj baergj commented Jul 3, 2024

Fixes: https://tracker.ceph.com/issues/66817

Test cases I've run:

Boot with class-masked bdev_enable_discard=true, global bdev_async_discard_threads=1:
* Discards issued
* 2 discard threads running (this is not seen before these changes; two because two KernelDevices exist)
PASS

Set bdev_enable_discard=false:
* Discards stop
* Discard threads disappear
PASS

Set bdev_enable_discard=true:
* Discards issued
* 2 discard threads running
PASS

Set bdev_async_discard_threads=2:
* Discards issued
* 4 discard threads running
PASS

Set bdev_async_discard_threads=1:
* Discards issued
* 2 discard threads running
PASS

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

…_change()

This fixes two issues that were introduced by 755f3e0:
1. After an OSD boots, discard threads were not stopped when
   bdev_enable_discard was set to false, whereas that was the intent of
   that commit.
2. If bdev_enable_discard or bdev_async_discard_threads are configured
   with a mask that can't be evaluated at OSD boot (e.g. a device
   class), then async discard won't be enabled until a later config
   change to bdev_async_discard_threads.

Fixes: https://tracker.ceph.com/issues/66817
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
1. In _discard_stop(), the wait for !discard_threads.empty() was there
   from a prior implementation where there could theoretically be a race
   between _discard_start() and _discard_stop(). If that race does
   exist, this check won't help; not only is _discard_stop() not called
   if discard_threads is empty, but discard_threads won't be populated
   until _discard_start() runs and thus this won't detect such a race.
2. Calling _discard_stop() from handle_conf_change() is a guaranteed
   deadlock because discard_lock is already held, so don't do that. Use
   the same flow whether we're stopping a subset of threads or all
   threads.
3. Asking a subset of discard threads to stop was not guaranteed to take
   effect, since if they continued to find contents in discard_queue
   then they would continue to run indefinitely. Add additional logic to
   _discard_thread() to have threads stop if they have been requested to
   stop and other threads exist to continue draining discard_queue.
4. Make the flow of _discard_stop() and handle_conf_change() more
   similar.

Fixes: https://tracker.ceph.com/issues/66817
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
@baergj baergj requested a review from a team as a code owner July 3, 2024 13:07
@github-actions github-actions bot added the core label Jul 3, 2024
@baergj
Copy link
Author

baergj commented Jul 3, 2024

jenkins test make check

@rzarzynski rzarzynski requested review from aclamk, ifed01 and pereman2 July 8, 2024 18:06
Instead of having _discard_start() and _discard_stop() partially or
completely duplicate functionality in handle_conf_change(), have a
single _discard_update_threads() that can handle all three. Loops are
tidied slightly, the unnecessary target_discard_threads class variable
has been removed, and now handle_conf_change() will respect
support_discard.

Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
@baergj baergj force-pushed the upstream-fix-async-discard-on-start branch from ee5138d to 617c936 Compare July 15, 2024 14:47
@baergj
Copy link
Author

baergj commented Jul 15, 2024

I've re-run my documented test cases after pushing the latest commit and they all pass.

@amathuria
Copy link
Contributor

@yuriw
Copy link
Contributor

yuriw commented Jul 25, 2024

jenkins test make check

@yuriw
Copy link
Contributor

yuriw commented Jul 25, 2024

@baergj @ljflores pls merge when all checks passed
ref: https://tracker.ceph.com/issues/66955

@baergj
Copy link
Author

baergj commented Jul 26, 2024

jenkins test make check

@baergj
Copy link
Author

baergj commented Jul 26, 2024

Tests have passed but I don't have merge access - maybe @ljflores can help with that?

@ljflores ljflores merged commit 0ab5214 into ceph:main Jul 29, 2024
@ljflores
Copy link
Member

@baergj done!

@baergj baergj deleted the upstream-fix-async-discard-on-start branch July 29, 2024 18:34
@baergj
Copy link
Author

baergj commented Jul 29, 2024

@ljflores Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants