librbd/pwl: fix memory leaks in discard operations by tchaikov · Pull Request #66876 · ceph/ceph

tchaikov · 2026-01-10T01:33:17Z

Fix memory leaks in librbd persistent write log (PWL) cache discard operations by properly completing and deleting request objects.

ASan reported the following leaks in unittest_librbd:

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
       /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
    #2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody()
       /ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
       /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
    #2 TestMockCacheSSDWriteLog_discard_Test::TestBody()
       /ceph/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:533:7

Plus multiple indirect leaks totaling 2,076 bytes:
- DiscardLogOperation (224 bytes x 2)
- SyncPoint (184 bytes)
- SyncPointLogEntry (168 bytes)
- DiscardLogEntry (152 bytes x 2)
- Various string and vector allocations

SUMMARY: AddressSanitizer: 2316 byte(s) leaked in 15 allocation(s).

Root cause analysis:

Missing request completion: C_DiscardRequest objects were never deleted because their complete() method was never called. The on_write_persist callback in setup_log_operations() called complete_user_request() to invoke the user's callback, but didn't call complete() on the request itself to trigger self-deletion via Context::complete().

Write requests use WriteLogOperationSet which takes the request as its on_finish callback, ensuring complete() is eventually called. Discard requests don't use WriteLogOperationSet and must explicitly call complete() in their on_write_persist callback.
Sync point not reset: m_current_sync_point was never explicitly reset during shutdown, keeping the sync point alive even after m_log_entries was cleared. This prevented proper cleanup of the sync point chain.
Indirect leaks: C_DiscardRequest holds shared_ptr, which holds shared_ptr, which holds shared_ptr to log entries. When the request leaked, all referenced objects also leaked through the shared_ptr reference chain.

Solution:

Add discard_req->complete(r) in the on_write_persist callback (Request.cc:445) to properly delete the request object after completing the user callback and releasing the cell. This follows the same pattern as WriteLogOperationSet for write requests.
Add m_current_sync_point.reset() in shut_down() (AbstractWriteLog.cc:682) after m_log_entries.clear() to explicitly release the sync point and break the reference chain.

These fixes ensure proper cleanup of all allocated objects in both production and test environments.

Test results:

Before: 2,316 bytes leaked in 15 allocations
After: 0 bytes leaked
unittest_librbd discard tests pass successfully with ASan

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins test classic perf Jenkins Job | Jenkins Job Definition
jenkins test crimson perf Jenkins Job | Jenkins Job Definition
jenkins test signed Jenkins Job | Jenkins Job Definition
jenkins test make check Jenkins Job | Jenkins Job Definition
jenkins test make check arm64 Jenkins Job | Jenkins Job Definition
jenkins test submodules Jenkins Job | Jenkins Job Definition
jenkins test dashboard Jenkins Job | Jenkins Job Definition
jenkins test dashboard cephadm Jenkins Job | Jenkins Job Definition
jenkins test api Jenkins Job | Jenkins Job Definition
jenkins test docs ReadTheDocs | Github Workflow Definition
jenkins test ceph-volume all Jenkins Jobs | Jenkins Jobs Definition
jenkins test windows Jenkins Job | Jenkins Job Definition
jenkins test rook e2e Jenkins Job | Jenkins Job Definition

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

src/librbd/cache/pwl/Request.cc

src/librbd/cache/pwl/AbstractWriteLog.cc

idryomov · 2026-02-16T23:11:43Z

This is a pretty serious memory leak, so we would want to backport. I have opened https://tracker.ceph.com/issues/74972, please add the corresponding Fixes tag to the commit message.

tchaikov · 2026-02-17T12:01:34Z

This is a pretty serious memory leak, so we would want to backport. I have opened https://tracker.ceph.com/issues/74972, please add the corresponding Fixes tag to the commit message.

Thanks for opening the tracker issue. I've added Fixes: https://tracker.ceph.com/issues/74972 to the commit message and updated it to address your review feedback.

Fix memory leak in librbd persistent write log (PWL) cache discard operations by properly completing request objects. ASan reported the following leaks in unittest_librbd: Direct leak of 240 byte(s) in 1 object(s) allocated from: #0 operator new(unsigned long) #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...) /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5 #2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody() /ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7 Plus multiple indirect leaks totaling 2,076 bytes through the shared_ptr reference chain. Root cause: C_DiscardRequest objects were never deleted because their complete() method was never called. The on_write_persist callback released the BlockGuard cell but didn't call complete() to trigger self-deletion. Write requests use WriteLogOperationSet which takes the request as its on_finish callback, ensuring complete() is eventually called. Discard requests don't use WriteLogOperationSet and must explicitly call complete() in their on_write_persist callback. Solution: Call discard_req->complete(r) in the on_write_persist callback and move cell release into finish_req() -- mirroring how C_WriteRequest handles it. The complete() -> finish() -> finish_req() chain ensures the cell is released after the user request is completed, preserving the same ordering as write requests. Test results: - Before: 2,316 bytes leaked in 15 allocations - After: 0 bytes leaked - unittest_librbd discard tests pass successfully with ASan Fixes: https://tracker.ceph.com/issues/74972 Signed-off-by: Kefu Chai <k.chai@proxmox.com>

tchaikov · 2026-02-21T07:27:26Z

changelog:

incorporated @idryomov 's suggestion
rebased against main branch

idryomov · 2026-02-22T15:57:52Z

No related failures:

https://pulpito.ceph.com/dis-2026-02-22_14:44:58-rbd-wip-dis-testing-distro-default-trial/

github-actions · 2026-02-22T15:59:51Z

This is an automated message by src/script/redmine-upkeep.py.

I found one or more Fixes: tags in the commit messages in

git log ce92c047dafa673adc54925eae496b63c6d71b76^..ce92c047dafa673adc54925eae496b63c6d71b76

The referenced tickets are:

https://tracker.ceph.com/issues/74972

Those tickets do not reference this merged Pull Request. If this Pull Request merge resolves any of those tickets, please update the "Pull Request ID" field on each ticket. A future run of this script will appropriately update them.

Update Log: https://github.com/ceph/ceph/actions/runs/22280412118

tchaikov requested a review from a team as a code owner January 10, 2026 01:33

github-actions bot added the rbd label Jan 10, 2026

tchaikov mentioned this pull request Jan 10, 2026

script/run-make: enable ASan #56537

Draft

14 tasks

idryomov mentioned this pull request Jan 12, 2026

rgw/posix: add destructor to BucketCache to fix memory leaks #66850

Merged

14 tasks

idryomov added the wip-dis-testing label Jan 22, 2026

idryomov reviewed Feb 16, 2026

View reviewed changes

src/librbd/cache/pwl/Request.cc Show resolved Hide resolved

src/librbd/cache/pwl/AbstractWriteLog.cc Outdated Show resolved Hide resolved

tchaikov force-pushed the wip-librbd-pwl-fix-leaks branch from e2873e3 to 9cd9b68 Compare February 17, 2026 12:08

tchaikov force-pushed the wip-librbd-pwl-fix-leaks branch from 9cd9b68 to fcda31a Compare February 21, 2026 07:25

tchaikov requested a review from idryomov February 21, 2026 11:07

idryomov approved these changes Feb 22, 2026

View reviewed changes

idryomov merged commit ce92c04 into ceph:main Feb 22, 2026
12 of 13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

librbd/pwl: fix memory leaks in discard operations#66876

librbd/pwl: fix memory leaks in discard operations#66876
idryomov merged 1 commit intoceph:mainfrom
tchaikov:wip-librbd-pwl-fix-leaks

tchaikov commented Jan 10, 2026

Uh oh!

Uh oh!

Uh oh!

idryomov commented Feb 16, 2026

Uh oh!

tchaikov commented Feb 17, 2026

Uh oh!

tchaikov commented Feb 21, 2026

Uh oh!

idryomov commented Feb 22, 2026

Uh oh!

Uh oh!

github-actions bot commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tchaikov commented Jan 10, 2026

Contribution Guidelines

Checklist

Uh oh!

Uh oh!

Uh oh!

idryomov commented Feb 16, 2026

Uh oh!

tchaikov commented Feb 17, 2026

Uh oh!

tchaikov commented Feb 21, 2026

Uh oh!

idryomov commented Feb 22, 2026

Uh oh!

Uh oh!

github-actions bot commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants