os/BlueStore: NCB fix for leaked space when bdev_async_discard is ena…#56744

Merged
yuriw merged 4 commits into ceph:main from benhanokh:ncb_async_discard_fix on Jun 10, 2024
@benhanokh benhanokh commented Apr 7, 2024

…bled

On graceful shutdown we call bdev->discard_drain() before calling store_allocator() to make sure all freed space is reflected in the allocator before destaging it.
On fast shutdown we take over the discarded queue and copy all its entries into the allocator before calling store_allocator().
This is logically identical to the behavior before NCB, when freed space was reflected in the column-B entry in RocksDB, which was used to construct the allocator after shutdown even if the free operation didn't complete.
The PR also adds a drain wait for all discarded entries held in the worker threads' private space to be committed.
We had to limit the private space to 10 entries to guarantee that the drain finishes in a timely manner.
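
The two shutdown paths described above can be sketched roughly as follows. This is a toy model, not the BlueStore code: `Extent`, `ToyAllocator`, `ToyDevice` and `shutdown_and_snapshot` are hypothetical names, a plain vector stands in for `interval_set<uint64_t>`, and "discarding" is reduced to returning the extent to the allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; none of these types exist in Ceph.
struct Extent { uint64_t offset; uint64_t length; };

struct ToyAllocator {
  uint64_t free_bytes = 0;
  void release(const Extent& e) { free_bytes += e.length; }
};

struct ToyDevice {
  std::vector<Extent> discard_queued;  // freed space whose discard is still pending
  ToyAllocator* alloc;

  explicit ToyDevice(ToyAllocator* a) : alloc(a) {}

  // Graceful path: finish every pending discard; each completion
  // returns its extent to the allocator.
  void discard_drain() {
    for (const Extent& e : discard_queued) alloc->release(e);
    discard_queued.clear();
  }

  // Fast path: hand the pending entries to the caller without discarding them.
  void swap_discard_queued(std::vector<Extent>& other) {
    other.clear();
    other.swap(discard_queued);
  }
};

// Returns the free byte count that a store_allocator() call would persist.
uint64_t shutdown_and_snapshot(ToyDevice& dev, ToyAllocator& alloc, bool fast) {
  if (fast) {
    // Skip the discards, but still account for the freed space.
    std::vector<Extent> pending;
    dev.swap_discard_queued(pending);
    for (const Extent& e : pending) alloc.release(e);
  } else {
    dev.discard_drain();
  }
  // Either way, nothing may be left in the queue when the allocator is stored.
  assert(dev.discard_queued.empty());
  return alloc.free_bytes;
}
```

In this model both paths persist the same free-space total; the fast path only skips issuing the discards themselves.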

Fixes: https://tracker.ceph.com/issues/65298
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

@benhanokh benhanokh requested a review from aclamk April 7, 2024 11:10
@benhanokh benhanokh self-assigned this Apr 7, 2024
@benhanokh benhanokh requested a review from a team as a code owner April 7, 2024 11:10
…bled

The fix calls bdev->discard_drain() before calling store_allocator() to make sure all freed space is reflected in the allocator before destaging it.
The fix sets a timeout for the drain call (500 msec); if the timeout expires, the allocator is not stored (forcing a recovery on the next startup).
Fixes: https://tracker.ceph.com/issues/65298
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
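
A bounded drain of the kind this commit message describes could look roughly like the sketch below. It is not the code from the commit: `PendingDiscards` and `try_store_allocator` are hypothetical names, and the only behavior taken from the message is "wait up to a timeout, and skip storing the allocator if the wait expires".

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical helper that counts in-flight discards; not a Ceph class.
class PendingDiscards {
  std::mutex m;
  std::condition_variable cv;
  std::size_t pending = 0;

public:
  void add(std::size_t n) {
    std::lock_guard<std::mutex> l(m);
    pending += n;
  }

  void complete(std::size_t n) {
    {
      std::lock_guard<std::mutex> l(m);
      assert(n <= pending);
      pending -= n;
    }
    cv.notify_all();
  }

  // Returns true if all discards completed before the timeout expired.
  bool drain_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(m);
    return cv.wait_for(l, timeout, [this] { return pending == 0; });
  }
};

// Returns true when the allocator would be stored. On timeout the store is
// skipped, so the next startup has to rebuild the allocator (recovery).
bool try_store_allocator(PendingDiscards& q, std::chrono::milliseconds timeout) {
  if (!q.drain_for(timeout)) {
    return false;
  }
  // ... persist the allocator here ...
  return true;
}
```

With the value from the commit message, a caller would pass `std::chrono::milliseconds(500)` as the timeout.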
…toring the allocator.

On fast shutdown we will simply copy the discard queue entries to the allocator.

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
On fast shutdown, take over the main discarded queue, copy it to the allocator, and only wait for the threads to commit their small private discarded queues.

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
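
One way to keep the per-thread private queues small, so that waiting for them stays short, is to cap how many entries a worker takes from the main queue at a time. The sketch below illustrates that idea only; `take_private_batch` and `kMaxPrivateEntries` are hypothetical names, and the cap of 10 is taken from the PR description, not from the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Extent { uint64_t offset; uint64_t length; };

// Hypothetical cap mirroring the "10 entries" limit in the PR description.
constexpr std::size_t kMaxPrivateEntries = 10;

// Move at most kMaxPrivateEntries entries from the shared (main) queue into a
// worker-private batch. Whatever stays in the shared queue can still be taken
// over at fast shutdown; only the small private batch has to be waited for.
std::vector<Extent> take_private_batch(std::deque<Extent>& shared) {
  std::vector<Extent> batch;
  while (!shared.empty() && batch.size() < kMaxPrivateEntries) {
    batch.push_back(shared.front());
    shared.pop_front();
  }
  assert(batch.size() <= kMaxPrivateEntries);
  return batch;
}
```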
@benhanokh benhanokh force-pushed the ncb_async_discard_fix branch from ea535c9 to 1e10a9b Compare April 9, 2024 14:07
virtual bool try_discard(interval_set<uint64_t> &to_release, bool async=true) { return false; }
virtual void discard_drain() { return; }

virtual void swap_discard_queued(interval_set<uint64_t>& other) { other.clear(); }
Maybe name this function cancel_discards()?

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
@benhanokh

jenkins test api

@benhanokh benhanokh removed the request for review from aclamk April 14, 2024 06:31
@benhanokh

jenkins test make check arm64

@ronen-fr

jenkins test make check arm64

@benhanokh - don't bother. It won't work - and it's not a blocker

@pereman2 pereman2 left a comment


@yuriw yuriw merged commit f66d0b2 into ceph:main Jun 10, 2024