Bug #65298
closed
Free space can be leaked in Quincy+ when bdev_async_discard is enabled
100%
Description
Starting in Quincy, we no longer maintain a free space map in rocksdb in bluestore (https://github.com/ceph/ceph/pull/39871).
When bluestore is shutting down, it will serialize the current allocator state to disk, and this is what will be used on the next boot. The issue is that, with bdev_async_discard enabled, the allocator is not updated with any freed blocks immediately after a bluestore txn completes; rather, it's updated once the actual discard happens. KernelDevice::close() appears to wait for all discards to complete, but this will not happen until after BlueStore::close_db(), when the allocator state is serialized, thus leaking the free space for any outstanding discards.
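The ordering problem described above can be illustrated with a small toy model. This is only a sketch: `ToyAllocator`, `ToyDevice`, and `shutdown_buggy_order` are hypothetical names invented for illustration and do not correspond to actual Ceph classes. Extents are modelled as plain integers. The sketch assumes the sequence described in this report: allocator state is serialized first, and the queued discards are only drained afterwards.

```python
from collections import deque


class ToyAllocator:
    """Hypothetical stand-in for the allocator: tracks free extents only."""

    def __init__(self, free_extents):
        self.free = set(free_extents)

    def release(self, extent):
        self.free.add(extent)

    def serialize(self):
        # Snapshot of what would be written to disk at shutdown.
        return frozenset(self.free)


class ToyDevice:
    """Hypothetical block device with an async discard queue."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.discard_queue = deque()

    def queue_discard(self, extent):
        # With async discard, the extent is NOT returned to the allocator yet.
        self.discard_queue.append(extent)

    def drain_discards(self):
        # Models the discards completing: only now is the space released.
        while self.discard_queue:
            self.allocator.release(self.discard_queue.popleft())


def shutdown_buggy_order(alloc, dev):
    saved = alloc.serialize()  # analogous to serializing during close_db()
    dev.drain_discards()       # analogous to the device close waiting on discards
    return saved


alloc = ToyAllocator({0, 1})
dev = ToyDevice(alloc)
dev.queue_discard(2)  # space freed by a completed transaction
dev.queue_discard(3)

saved = shutdown_buggy_order(alloc, dev)
leaked = alloc.free - saved  # free in memory, but missing from the snapshot
print(sorted(leaked))  # [2, 3]
```

In this model, extents 2 and 3 are free by the time the device is closed, but the snapshot that would be loaded on the next boot does not know about them.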
I have not observed this directly; it came to mind as a possibility during a conversation with another Ceph user, who mentioned seeing something that sounds a lot like this (he has async discard queues that can back up for hours under some workloads). He mentioned that resharding would reclaim the leaked space, which I'm guessing is because the allocation map gets regenerated in this case.
A few options come immediately to mind:
- Create an in-memory freespace manager which is what gets serialized to disk at shutdown. This has more complexity, but means that we don't need to wait for discards to complete during OSD shutdown (though maybe we're already waiting for them during bdev shutdown, per above).
- Switch to synchronous discards and flush outstanding discards at the bdev level before proceeding with serializing allocator state. This can take a while in extreme circumstances.
- Similar to above, except simply disable discards and drop outstanding discards, returning pending space to the allocator.
- Never serialize allocator state when async discards are outstanding.
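The last three options can be sketched in the same toy terms. This is a hedged illustration only: the function names are hypothetical, extents are plain integers, `free` is a set of free extents, and `pending` is a queue of extents whose discards have not been issued yet. It does not reflect how the linked PR actually addresses the issue. Returning `None` in the last function stands for "no snapshot is written".

```python
from collections import deque


def shutdown_flush_first(free, pending, do_discard):
    """Flush option: issue every outstanding discard, then serialize."""
    while pending:
        extent = pending.popleft()
        do_discard(extent)  # may be slow if the queue is long
        free.add(extent)
    return frozenset(free)


def shutdown_drop_discards(free, pending):
    """Drop option: skip the discards but still return the space."""
    free.update(pending)
    pending.clear()
    return frozenset(free)


def shutdown_skip_if_pending(free, pending):
    """Skip option: write no snapshot while discards are outstanding."""
    return None if pending else frozenset(free)


issued = []
snap_flush = shutdown_flush_first({0, 1}, deque([2, 3]), issued.append)
snap_drop = shutdown_drop_discards({0, 1}, deque([2, 3]))
snap_skip = shutdown_skip_if_pending({0, 1}, deque([2, 3]))

print(sorted(snap_flush), issued)  # [0, 1, 2, 3] [2, 3]
print(sorted(snap_drop))           # [0, 1, 2, 3]
print(snap_skip)                   # None
```

The first two variants both produce a snapshot that includes the pending extents; they differ in whether the device ever sees the discards. The third avoids a stale snapshot entirely, at the cost of having nothing to load on the next boot.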
Updated by Gabriel BenHanokh almost 2 years ago
- % Done changed from 0 to 50
- Pull request ID set to 56744
PR https://github.com/ceph/ceph/pull/56744 should solve this issue
Updated by Igor Fedotov over 1 year ago
- Status changed from New to Pending Backport
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67139: squid: Free space can be leaked in Quincy+ when bdev_async_discard is enabled added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67140: reef: Free space can be leaked in Quincy+ when bdev_async_discard is enabled added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67141: quincy: Free space can be leaked in Quincy+ when bdev_async_discard is enabled added
Updated by Konstantin Shalygin about 1 year ago
- Status changed from Pending Backport to Resolved
- Assignee set to Gabriel BenHanokh
- % Done changed from 80 to 100
- Source set to Community (user)
Updated by Upkeep Bot 8 months ago
- Merge Commit set to f66d0b28ec82a68d408c200b227b83d7129b9e27
- Fixed In set to v19.3.0-2661-gf66d0b28ec8
- Upkeep Timestamp set to 2025-07-11T13:48:17+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-2661-gf66d0b28ec8 to v19.3.0-2661-gf66d0b28ec
- Upkeep Timestamp changed from 2025-07-11T13:48:17+00:00 to 2025-07-14T23:09:45+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2734
- Upkeep Timestamp changed from 2025-07-14T23:09:45+00:00 to 2025-11-01T01:33:55+00:00