blk: threaded discard support#55469

Merged
yuriw merged 4 commits into ceph:main from Matt1360:main
Mar 14, 2024

Conversation

@Matt1360
Member

@Matt1360 Matt1360 commented Feb 6, 2024

We have encountered some drives that need discards enabled in order to stay performant; however, they aren't very quick at acting on the discard queue. I've turned the async discard functionality into a thread pool whose size can be tuned as needed; with a pool size of one, behaviour is unchanged from the existing implementation.

We're currently testing this in our lab (though against Pacific), and if there's appetite for this, I'll also backport it to Reef (selfishly, so we don't have to carry the patch). Note that because we're developing against Pacific, I might have missed something in the config here; the yaml file is new to me, so please let me know if I've missed anything there (I assume the build system does the appropriate generation).
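For readers curious what turning the async discard into a thread pool can look like, here is a minimal standalone sketch. It is not the PR's actual code: the class name `DiscardPool`, the `queue_discard`/`drain` API, and the completion callback are all invented for illustration. The idea it shows is the one described above: N worker threads drain one shared queue of discard regions, so a pool of size one degenerates to the existing single-threaded behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a discard thread pool: N workers pop
// (offset, length) regions from one shared queue and run a completion
// callback (in BlueStore terms, the point where the allocator would be
// told the space is free again).
class DiscardPool {
public:
  explicit DiscardPool(unsigned nthreads) {
    for (unsigned i = 0; i < nthreads; ++i)
      workers.emplace_back([this] { run(); });
  }
  ~DiscardPool() {
    {
      std::lock_guard<std::mutex> l(lock);
      stopping = true;
    }
    cond.notify_all();
    for (auto& t : workers) t.join();
  }
  // Queue one region; on_done fires after the (simulated) discard.
  void queue_discard(uint64_t offset, uint64_t length,
                     std::function<void()> on_done) {
    {
      std::lock_guard<std::mutex> l(lock);
      q.push_back({offset, length, std::move(on_done)});
    }
    cond.notify_one();
  }
  // Block until the queue is empty and no worker is mid-discard.
  void drain() {
    std::unique_lock<std::mutex> l(lock);
    drained.wait(l, [this] { return q.empty() && in_flight == 0; });
  }

private:
  struct Item { uint64_t offset, length; std::function<void()> on_done; };
  void run() {
    std::unique_lock<std::mutex> l(lock);
    while (true) {
      cond.wait(l, [this] { return stopping || !q.empty(); });
      if (stopping && q.empty()) return;
      Item item = std::move(q.front());
      q.pop_front();
      ++in_flight;
      l.unlock();
      // A real implementation would issue BLKDISCARD / NVMe Deallocate here.
      item.on_done();
      l.lock();
      --in_flight;
      if (q.empty() && in_flight == 0) drained.notify_all();
    }
  }
  std::mutex lock;
  std::condition_variable cond, drained;
  std::deque<Item> q;
  std::vector<std::thread> workers;
  unsigned in_flight = 0;
  bool stopping = false;
};
```

The tunable is simply the number of workers handed to the constructor, which mirrors how a pool-size config option would map onto slow-to-trim drives.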

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@Matt1360 Matt1360 requested a review from a team as a code owner February 6, 2024 15:27
@Matt1360 Matt1360 force-pushed the main branch 4 times, most recently from fbe5179 to cc8339f Compare February 6, 2024 16:30
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
Contributor

@ifed01 ifed01 left a comment

Good now!

@Matt1360
Member Author

jenkins test make check

@interestingyong

A few questions; I look forward to a reply.

  1. release_alloc_txc: a write op on osr->q may execute earlier than the async discard.
  2. A newly allocated (offset, len) may overlap with the discard_queue; I can't find any search or filter on discard_queue.
  3. Discarding/trimming the NVMe area [offset, len] would then lose data.

Thanks.

@ifed01
Contributor

ifed01 commented Jul 8, 2025

A few questions; I look forward to a reply.

  1. release_alloc_txc: a write op on osr->q may execute earlier than the async discard.
  2. A newly allocated (offset, len) may overlap with the discard_queue; I can't find any search or filter on discard_queue.
  3. Discarding/trimming the NVMe area [offset, len] would then lose data.

Thanks.

Please see

void BlueStore::_txc_release_alloc(TransContext *txc)
{
  bool discard_queued = false;
  // it's expected we're called with lazy_release_lock already taken!
  if (unlikely(cct->_conf->bluestore_debug_no_reuse_blocks ||
               txc->released.size() == 0 ||
               !alloc)) {
      goto out;
  }
  discard_queued = bdev->try_discard(txc->released);
  // if async discard succeeded, will do alloc->release when discard callback
  // else we should release here
  if (!discard_queued) {
      dout(10) << __func__ << "(sync) " << txc << " " << std::hex
               << txc->released << std::dec << dendl;
      alloc->release(txc->released);
  }

out:
  txc->released.clear();
}

BlueStore doesn't release extents immediately if discards are enabled. Instead, it postpones the release until the relevant discard op has completed:

void BlueStore::handle_discard(interval_set<uint64_t>& to_release)
{
  dout(10) << __func__ << dendl;
  ceph_assert(alloc);
  alloc->release(to_release);
}

Hence there is no way to allocate an extent while it's being discarded.
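The deferred-release invariant described above can be modelled in a few lines. This is a toy sketch, not BlueStore code: `ToyAllocator`, its `discarding` set, and the method names are invented for illustration. Extents handed to an in-flight discard sit in a pending set, and only the discard completion callback (the analogue of handle_discard) moves them back to the free set.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Toy model of the invariant: space being discarded is neither free nor
// allocated, so a new allocation can never overlap an in-flight discard.
using Extent = std::pair<uint64_t, uint64_t>;  // (offset, length)

struct ToyAllocator {
  std::set<Extent> free_set;    // allocatable space
  std::set<Extent> discarding;  // released, but discard still in flight

  // _txc_release_alloc analogue: if a discard was queued, park the extent.
  void release(Extent e, bool discard_queued) {
    if (discard_queued)
      discarding.insert(e);     // the discard callback will free it later
    else
      free_set.insert(e);       // synchronous path: free immediately
  }
  // handle_discard analogue: runs when the discard op completes.
  void on_discard_done(Extent e) {
    discarding.erase(e);
    free_set.insert(e);
  }
  bool allocatable(Extent e) const { return free_set.count(e) != 0; }
};
```

While an extent sits in `discarding` it cannot be handed out, which is why a new allocation cannot race with a pending trim regardless of how many discard threads are running.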

@interestingyong


Thanks, I got it.
BlueFS and BlueStore reset the bitmap for the release_set immediately only when discard_queued is false. Otherwise, the space is queued for discard and the bitmap reset is performed asynchronously by a background thread.


6 participants