Showcase 62080 tested/merged #2

Closed
Matan-B wants to merge 2 commits into wip-matanb-crimson-showcase-62080-merged from wip-matanb-crimson-showcase-62080-tested

Conversation

Matan-B (Owner) commented Apr 15, 2025

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Matan-B and others added 2 commits April 15, 2025 07:53
pushes/pulls

Instead of throttling recovery/backfill operations

Fixes: https://tracker.ceph.com/issues/70180
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
(cherry picked from commit 81a6900)
(cherry picked from commit 7a1243db28491a2d32ba3c2b960fd73188647050)
pg->reset_pglog_based_recovery_op();
}
return seastar::make_ready_future<bool>(done);
return seastar::make_ready_future<bool>(!done);

This is the version that was tested successfully

Matan-B commented Apr 15, 2025

Closing as this one is just to show the diff for the commit message

Matan-B closed this Apr 15, 2025
Matan-B pushed a commit that referenced this pull request May 21, 2025
Previously, in __ceph_abort and related abort handlers, we allocated
ClibBackTrace instances using raw pointers without proper cleanup. Since
these handlers terminate execution, the leaks didn't affect production
systems but were correctly flagged by ASan during testing:

```
Direct leak of 288 byte(s) in 1 object(s) allocated from:
    #0 0x55aefe8cb65d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_ceph_assert+0x1f465d) (BuildId: a4faeddac80b0d81062bd53ede3388c0c10680bc)
    #1 0x7f3b84da988d in ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...) /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/assert.cc:157:21
    #2 0x55aefe8cf04b in supressed_assertf_line22() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:22:3
    #3 0x55aefe8ce4e6 in CephAssertDeathTest_ceph_assert_supresssions_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:31:3
    #4 0x55aefe99135d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
    #5 0x55aefe94f015 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
...
```

This commit resolves the issue by using std::unique_ptr to manage the
lifecycle of backtrace objects, ensuring proper cleanup even in
non-returning functions. While these leaks had no practical impact in
production (as the process terminates anyway), fixing them improves code
quality and eliminates false positives in memory analysis tools.
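
A minimal sketch of the ownership change described above, not the actual Ceph code. `FakeBackTrace`, `live_backtraces`, and `report_owned` are hypothetical names used only for illustration. Note that a `std::unique_ptr` releases its object when the scope exits normally or through an exception; a direct call to `abort()` skips destructors regardless.

```cpp
#include <cassert>
#include <memory>
#include <string>

int live_backtraces = 0;  // instrumentation for this sketch only

// Hypothetical stand-in for a backtrace object; it counts live instances so
// that cleanup can be observed.
struct FakeBackTrace {
  explicit FakeBackTrace(int skip_frames) : skip(skip_frames) { ++live_backtraces; }
  ~FakeBackTrace() { --live_backtraces; }
  int skip;
};

// The backtrace is owned by a unique_ptr, so it is destroyed when this
// function returns instead of being left behind as a raw `new` allocation.
inline std::string report_owned(const std::string& msg) {
  auto bt = std::make_unique<FakeBackTrace>(1);
  return msg + " (skip=" + std::to_string(bt->skip) + ")";
}
```

Calling `report_owned("boom")` yields the formatted message and leaves `live_backtraces` at zero afterwards.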

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request May 26, 2025
…ms in destructor

Fix memory leak in test_not_before_queue identified by AddressSanitizer.
Previously, the test was terminating without properly dequeueing all elements,
causing resource leaks during teardown.

This change implements a proper cleanup mechanism that:
1. Advances the queue's time until all elements become eligible for processing
2. Dequeues all remaining elements to ensure proper destruction
3. Guarantees clean teardown even for elements with future timestamps

The fix eliminates ASan-reported leaks occurring in the not_before_queue_t::enqueue
method when test cases were torn down prematurely.
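
The cleanup steps above can be sketched roughly as follows. `MiniNotBeforeQueue` is a deliberately simplified, hypothetical stand-in and does not reproduce the interface of the real `not_before_queue_t`; only the drain loop is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Simplified stand-in: elements become eligible once now >= not_before.
class MiniNotBeforeQueue {
 public:
  void enqueue(unsigned not_before, int value) { items_.emplace(not_before, value); }
  void advance_time(unsigned now) { if (now > now_) now_ = now; }
  // Returns the earliest eligible element, or nothing if none is eligible yet.
  std::optional<int> dequeue() {
    auto it = items_.begin();
    if (it == items_.end() || it->first > now_) return std::nullopt;
    int v = it->second;
    items_.erase(it);
    return v;
  }
  std::optional<unsigned> next_not_before() const {
    if (items_.empty()) return std::nullopt;
    return items_.begin()->first;
  }
  std::size_t size() const { return items_.size(); }

 private:
  unsigned now_ = 0;
  std::multimap<unsigned, int> items_;
};

// Teardown helper: advance time whenever nothing is eligible, then dequeue,
// until the queue is empty. Returns the number of elements drained.
inline std::size_t drain(MiniNotBeforeQueue& q) {
  std::size_t drained = 0;
  while (q.size() > 0) {
    if (q.dequeue()) { ++drained; continue; }
    q.advance_time(*q.next_not_before());
  }
  return drained;
}
```

A test fixture would call something like `drain(queue)` from its destructor or `TearDown()`, so elements with future timestamps are released as well.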

A sample of the error from ASan:

```
Direct leak of 1800 byte(s) in 15 object(s) allocated from:
    #0 0x7f71b1f1ab5b in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:86
    #1 0x55fb4c977058 in void not_before_queue_t<tv_t, test_time_t>::enqueue<tv_t const&>(tv_t const&) /home/kefu/dev/ceph/src/common/not_before_queue.h:164
    #2 0x55fb4c97748b in NotBeforeTest::load_test_data(std::vector<tv_t, std::allocator<tv_t> > const&) /home/kefu/dev/ceph/src/test/test_not_before_queue.cc:67
    #3 0x55fb4c961112 in NotBeforeTest_RemoveIfByClass_no_cond_Test::TestBody() /home/kefu/dev/ceph/src/test/test_not_before_queue.cc:213
    #4 0x55fb4ca05f67 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #5 0x55fb4ca1c4f7 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #6 0x55fb4c9e1104 in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
    #7 0x55fb4c9e16e2 in testing::TestInfo::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2874
    #8 0x55fb4c9e73b4 in testing::TestSuite::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:3052
    #9 0x55fb4c9f059b in testing::internal::UnitTestImpl::RunAllTests() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:6004
    #10 0x55fb4ca064ff in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #11 0x55fb4ca1d1bf in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #12 0x55fb4c9e124d in testing::UnitTest::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:5583
    #13 0x55fb4c97a0b6 in RUN_ALL_TESTS() /home/kefu/dev/ceph/src/googletest/googletest/include/gtest/gtest.h:2334
    #14 0x55fb4c979ffc in main /home/kefu/dev/ceph/src/googletest/googlemock/src/gmock_main.cc:71
    #15 0x7f71ae833ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 1800 byte(s) leaked in 15 allocation(s).
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 5, 2025
Previously, we had a memory leak in the test_bluestore_types.cc tests where
`BufferCacheShard` and `OnodeCacheShard` objects were allocated with
raw pointers but never freed, causing leaks detected by AddressSanitizer.

ASan rightly pointed this out:

```
Direct leak of 224 byte(s) in 1 object(s) allocated from:
    #0 0x55a7432a079d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluestore_types+0xf2e79d) (BuildId: c3bec647afa97df6bb147bc82eac937531fc6272)
    #1 0x55a743523340 in BlueStore::BufferCacheShard::create(BlueStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, ceph::common::PerfCounters*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueStore.cc:1678:9
    #2 0x55a74330b617 in ExtentMap_seek_lextent_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluestore_types.cc:1077:7
    #3 0x55a7434f2b2d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
    #4 0x55a7434b5775 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
    #5 0x55a74347005d in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2728:5
```
```
Direct leak of 9928 byte(s) in 1 object(s) allocated from:
    #0 0x7ff249d21a2d in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
    #1 0x6048ed878b76 in BlueStore::OnodeCacheShard::create(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::common::PerfCounters*) /home/kefu/dev/ceph/src/os/bluestore/BlueStore.cc:1219
    #2 0x6048ed66d4f9 in GarbageCollector_BasicTest_Test::TestBody() /home/kefu/dev/ceph/src/test/objectstore/test_bluestore_types.cc:2662
    #3 0x6048ed820555 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #4 0x6048ed80c78a in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #5 0x6048ed7b8bfa in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
```

In this change, we replace raw pointer allocation with unique_ptr to
ensure automatic cleanup when the objects go out of scope.
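
As an illustration of adopting a factory-returned pointer, here is a small sketch. `MiniCacheShard`, `LruShard`, `live_shards`, and `run_case` are hypothetical stand-ins; the real `BufferCacheShard::create()` and `OnodeCacheShard::create()` take different arguments.

```cpp
#include <cassert>
#include <memory>
#include <string>

int live_shards = 0;  // instrumentation for this sketch only

// Hypothetical cache shard with a factory that hands back an owning raw
// pointer, which the caller is responsible for freeing.
struct MiniCacheShard {
  virtual ~MiniCacheShard() { --live_shards; }
  virtual std::string type() const = 0;
  static MiniCacheShard* create(const std::string& type);

 protected:
  MiniCacheShard() { ++live_shards; }
};

struct LruShard final : MiniCacheShard {
  std::string type() const override { return "lru"; }
};

inline MiniCacheShard* MiniCacheShard::create(const std::string& type) {
  return type == "lru" ? new LruShard : nullptr;
}

// Test-body shape after the fix: adopt the pointer immediately so it is
// deleted when `shard` goes out of scope.
inline std::string run_case() {
  std::unique_ptr<MiniCacheShard> shard(MiniCacheShard::create("lru"));
  return shard ? shard->type() : "none";
}
```

The base class needs a virtual destructor here, because the `unique_ptr` deletes through a base-class pointer.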
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 9, 2025
Replace raw pointers with unique_ptr for AioCompletion instances in
test_librados_completion to prevent memory leaks. Previously, each test
case allocated AioCompletion objects but never freed them, causing
AddressSanitizer to report leaks.

The unique_ptr automatically manages the object lifecycle, ensuring
cleanup when the pointer goes out of scope.

Fixes ASan error:

```
==1199357==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x5602d9f0eaad in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_librados_completion+0x1f2aad) (BuildId: ef5b4b8f0a479e21b6a2686519ff4c3ef71b9f94)
    #1 0x7f3ac9b776f4 in librados::v14_2_0::Rados::aio_create_completion() /home/jenkins-build/build/workspace/ceph-pull-requests/src/librados/librados_cxx.cc:2892:10
    #2 0x5602d9f11a0a in CoroExcept_AioComplete_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/common/test_librados_completion.cc:54:14
    #3 0x5602da01d69d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 22, 2025
A memory leak was detected by AddressSanitizer in dbstore tests. Objects
inserted into a static objectmap were not being properly freed when
tests completed without explicitly deleting buckets.

The leak occurred because:

- Objects were preserved in a static map after insertion
- Objects were only freed by DB::objectmapDelete() when deleting the
  corresponding bucket
- In tests, buckets weren't always deleted after insertion, leaving
  objects allocated

ASan rightly pointed this out:

```
Direct leak of 200 byte(s) in 1 object(s) allocated from:
    #0 0x55c420f5274d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_dbstore_tests+0x4df74d) (BuildId: c19ed2d2b1ead306fdc59fc311f546a287300010)
    #1 0x55c42143cfaf in SQLGetBucket::Execute(DoutPrefixProvider const*, rgw::store::DBOpParams*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/driver/dbstore/sqlite/sqliteDB.cc:1689:11
    #2 0x55c42102751c in rgw::store::DB::ProcessOp(DoutPrefixProvider const*, std::basic_string_view<char, std::char_traits<char>>, rgw::store::DBOpParams*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/driver/dbstore/common/dbstore.cc:251:16
    #3 0x55c4210328c9 in rgw::store::DB::get_bucket_info(DoutPrefixProvider const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, RGWBucketInfo&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, ceph::buffer::v15_2_0::list>>>*, std::chrono::time_point<ceph::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l>>>*, obj_version*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/driver/dbstore/common/dbstore.cc:450:9
    #4 0x55c421034357 in rgw::store::DB::create_bucket(DoutPrefixProvider const*, std::variant<rgw_user, rgw_account_id> const&, rgw_bucket const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, rgw_placement_rule const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, ceph::buffer::v15_2_0::list>>> const&, std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&, std::optional<RGWQuotaInfo> const&, std::optional<std::chrono::time_point<ceph::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l>>>>, obj_version*, RGWBucketInfo&, optional_yield) /home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/driver/dbstore/common/dbstore.cc:504:9
    #5 0x55c420f6c3ed in DBStoreTest_CreateBucket_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/driver/dbstore/tests/dbstore_tests.cc:495:13
    #6 0x55c42174fa0d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
    #7 0x55c421717f25 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
    #8 0x55c4216d4bbd in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2728:5
```

In this change, we:

- Free objectmap entries in DB::Destroy() to ensure cleanup on shutdown
- Call DB::Destroy() in DBStoreTest::TearDown() to guarantee cleanup after each test
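
The two changes listed above might look like this in miniature. `MiniDB` and `ObjEntry` are hypothetical; the sketch only shows a static map of owning raw pointers being freed from a shutdown hook, which a test fixture's `TearDown()` would then call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct ObjEntry { int id; };  // hypothetical per-bucket payload

// Simplified stand-in for a store that keeps per-bucket objects in a static
// map of owning raw pointers.
class MiniDB {
 public:
  static void insert(const std::string& bucket, int id) {
    ObjEntry*& slot = objectmap()[bucket];  // new slots start as nullptr
    delete slot;                            // replace any previous entry
    slot = new ObjEntry{id};
  }
  // Shutdown hook: free every entry that was never explicitly deleted.
  static void destroy() {
    for (auto& kv : objectmap()) delete kv.second;
    objectmap().clear();
  }
  static std::size_t size() { return objectmap().size(); }

 private:
  static std::map<std::string, ObjEntry*>& objectmap() {
    static std::map<std::string, ObjEntry*> m;
    return m;
  }
};
```

With this shape, a test that inserts into a bucket and never deletes it still ends clean as long as its teardown calls `MiniDB::destroy()`.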

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 22, 2025
Previously, we had a resource leak in Directory::for_each() where
directory streams opened with fdopendir() were never properly closed,
causing a 32KB leak per unclosed directory handle.

In this change, we add a closedir() call to properly release directory stream
resources after enumeration completes.
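
A minimal POSIX sketch of the pattern, assuming a Linux-like system. `for_each_entry` is a hypothetical helper, not the real `Directory::for_each()`; it shows where the `closedir()` call belongs once `fdopendir()` has taken over the descriptor.

```cpp
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <functional>

// Visits every entry name in `path`. Returns the number of entries visited,
// or -1 if the directory could not be opened.
inline int for_each_entry(const char* path,
                          const std::function<void(const char*)>& fn) {
  int fd = ::open(path, O_RDONLY | O_DIRECTORY);
  if (fd < 0) return -1;
  DIR* dir = ::fdopendir(fd);
  if (dir == nullptr) {
    ::close(fd);  // fdopendir failed, so the descriptor is still ours
    return -1;
  }
  int visited = 0;
  while (dirent* entry = ::readdir(dir)) {
    fn(entry->d_name);
    ++visited;
  }
  // Releases the stream buffer and closes the underlying descriptor.
  ::closedir(dir);
  return visited;
}
```

After a successful `fdopendir()`, the descriptor belongs to the stream, so `closedir()` is the only cleanup call needed on that path.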

ASan report:

```
Direct leak of 32816 byte(s) in 1 object(s) allocated from:
    #0 0x738387320e15 in malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x738381383514  (/usr/lib/libc.so.6+0xe3514) (BuildId: 468e3585c794491a48ea75fceb9e4d6b1464fc35)
    #2 0x738381383418 in fdopendir (/usr/lib/libc.so.6+0xe3418) (BuildId: 468e3585c794491a48ea75fceb9e4d6b1464fc35)
    #3 0x6433e1aefac4 in for_each<rgw::sal::Directory::fill_cache(const DoutPrefixProvider*, optional_yield, rgw::sal::fill_cache_cb_t&)::<lambda(char const*)> > /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:874
    #4 0x6433e1aa10ab in rgw::sal::Directory::fill_cache(DoutPrefixProvider const*, optional_yield, fu2::abi_310::detail::function<fu2::abi_310::detail::config<true, false, 16ul>, fu2::abi_310::detail::property<true, false, int (DoutPrefixProvider const*, rgw_bucket_dir_entry&) const> > const&) /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:1077
    #5 0x6433e1aa8974 in rgw::sal::MPDirectory::fill_cache(DoutPrefixProvider const*, optional_yield, fu2::abi_310::detail::function<fu2::abi_310::detail::config<true, false, 16ul>, fu2::abi_310::detail::property<true, false, int (DoutPrefixProvider const*, rgw_bucket_dir_entry&) const> > const&) /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:1384
    #6 0x6433e1abf1ee in rgw::sal::POSIXBucket::fill_cache(DoutPrefixProvider const*, optional_yield, fu2::abi_310::detail::function<fu2::abi_310::detail::config<true, false, 16ul>, fu2::abi_310::detail::property<true, false, int (DoutPrefixProvider const*, rgw_bucket_dir_entry&) const> > const&) /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:2236
    #7 0x6433e1c43956 in file::listing::BucketCache<rgw::sal::POSIXDriver, rgw::sal::POSIXBucket>::fill(DoutPrefixProvider const*, file::listing::BucketCacheEntry<rgw::sal::POSIXDriver, rgw::sal::POSIXBucket>*, rgw::sal::POSIXBucket*, unsigned int, optional_yield) /home/kefu/dev/ceph/src/rgw/driver/posix/bucket_cache.h:368
    #8 0x6433e1c1c71c in file::listing::BucketCache<rgw::sal::POSIXDriver, rgw::sal::POSIXBucket>::list_bucket(DoutPrefixProvider const*, optional_yield, rgw::sal::POSIXBucket*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, fu2::abi_310::detail::function<fu2::abi_310::detail::config<true, false, 16ul>, fu2::abi_310::detail::property<true, false, bool (rgw_bucket_dir_entry const&) const> >) /home/kefu/dev/ceph/src/rgw/driver/posix/bucket_cache.h:410
    #9 0x6433e1ac1829 in rgw::sal::POSIXBucket::list(DoutPrefixProvider const*, rgw::sal::Bucket::ListParams&, int, rgw::sal::Bucket::ListResults&, optional_yield) /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:2256
    #10 0x6433e1ae1302 in rgw::sal::POSIXMultipartUpload::list_parts(DoutPrefixProvider const*, ceph::common::CephContext*, int, int, int*, bool*, optional_yield, bool) /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:3692
    #11 0x6433e1ae2f3b in rgw::sal::POSIXMultipartUpload::complete(DoutPrefixProvider const*, optional_yield, ceph::common::CephContext*, std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::__cxx11::list<rgw_obj_index_key, std::allocator<rgw_obj_index_key> >&, unsigned long&, bool&, RGWCompressionInfo&, long&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, ACLOwner&, unsigned long, rgw::sal::Object*, boost::container::flat_map<unsigned int, boost::container::flat_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, void>, std::less<unsigned int>, void>&) /home/kefu/dev/ceph/src/rgw/driver/posix/rgw_sal_posix.cc:3770
    #12 0x6433e0fffb73 in POSIXMPObjectTest::create_MPObj(rgw::sal::MultipartUpload*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/kefu/dev/ceph/src/test/rgw/test_rgw_posix_driver.cc:1717
    #13 0x6433e0e95ab9 in POSIXMPObjectTest_BucketList_Test::TestBody() /home/kefu/dev/ceph/src/test/rgw/test_rgw_posix_driver.cc:1863
    #14 0x6433e1308d17 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
```

Fixes https://tracker.ceph.com/issues/71505
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 22, 2025
Previously, error messages passed to luaL_error() were formatted using
std::string concatenation. Since luaL_error() never returns (it throws
a Lua exception via longjmp), the allocated std::string memory was
leaked, as detected by AddressSanitizer:

```
Direct leak of 105 byte(s) in 1 object(s) allocated from:
    #0 0x7fc5f1921a2d in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
    #1 0x563bd89144c7 in std::__new_allocator<char>::allocate(unsigned long, void const*) /usr/include/c++/15.1.1/bits/new_allocator.h:151
    #2 0x563bd89144c7 in std::allocator<char>::allocate(unsigned long) /usr/include/c++/15.1.1/bits/allocator.h:203
    #3 0x563bd89144c7 in std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) /usr/include/c++/15.1.1/bits/alloc_traits.h:614
    #4 0x563bd89144c7 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_allocate(std::allocator<char>&, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.h:142
    #5 0x563bd89144c7 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.tcc:164
    #6 0x563bd896ae1b in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.tcc:363
    #7 0x563bd896b256 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.tcc:455
    #8 0x563bd896b2bb in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::append(char const*) /usr/include/c++/15.1.1/bits/basic_string.h:1585
    #9 0x563bd943c2f2 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, char const*) /usr/include/c++/15.1.1/bits/basic_string.h:3977
    #10 0x563bd943c2f2 in rgw::lua::lua_state_guard::runtime_hook(lua_State*, lua_Debug*) /home/kefu/dev/ceph/src/rgw/rgw_lua_utils.cc:245
    #11 0x7fc5f139f8ef  (/usr/lib/liblua.so.5.4+0xe8ef) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #12 0x7fc5f139fbfe  (/usr/lib/liblua.so.5.4+0xebfe) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #13 0x7fc5f13b26fe  (/usr/lib/liblua.so.5.4+0x216fe) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #14 0x7fc5f139f581  (/usr/lib/liblua.so.5.4+0xe581) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #15 0x7fc5f139b735  (/usr/lib/liblua.so.5.4+0xa735) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #16 0x7fc5f139ba8f  (/usr/lib/liblua.so.5.4+0xaa8f) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #17 0x7fc5f139f696 in lua_pcallk (/usr/lib/liblua.so.5.4+0xe696) (BuildId: b7533e2973d4b0d82e10fc29973ec5a8d355d2b8)
    #18 0x563bd8a925ef in rgw::lua::request::execute(rgw::sal::Driver*, RGWREST*, OpsLogSink*, req_state*, RGWOp*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/kefu/dev/ceph/src/rgw/rgw_lua_request.cc:824
    #19 0x563bd8952e3d in TestRGWLua_LuaRuntimeLimit_Test::TestBody() /home/kefu/dev/ceph/src/test/rgw/test_rgw_lua.cc:1628
    #20 0x563bd8a37817 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #21 0x563bd8a509b5 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/kefu/dev/ceph/build/bin/unittest_rgw_lua+0x11199b5) (BuildId: b2628caba5290d882d25f7bea166f058b682bc85)
```

This change replaces std::string formatting with a stack-allocated buffer
and std::to_chars() to eliminate the memory leak.

Note: We cannot format int64_t directly through luaL_error() because
lua_pushfstring() does not support long long or int64_t format specifiers,
even in Lua 5.4 (see https://www.lua.org/manual/5.4/manual.html#lua_pushfstring).
Since libstdc++ uses int64_t for std::chrono::milliseconds::rep, we use
std::to_chars() for safe, efficient conversion without heap allocation.
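
A sketch of the formatting approach, under stated assumptions: `format_elapsed` and the message text are hypothetical, and the actual code in rgw_lua_utils.cc may be laid out differently. The point is that the number is written with `std::to_chars()` into a caller-provided stack buffer, so nothing is heap-allocated before the non-returning call.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <system_error>

// Writes "elapsed: <ms> ms" plus a terminating NUL into buf.
// Returns a view of the text, or an empty view if the buffer is too small.
inline std::string_view format_elapsed(std::int64_t ms, char* buf, std::size_t cap) {
  constexpr std::string_view prefix = "elapsed: ";
  constexpr std::string_view suffix = " ms";
  if (cap < prefix.size() + suffix.size() + 1) return {};
  char* p = buf;
  std::memcpy(p, prefix.data(), prefix.size());
  p += prefix.size();
  // Reserve room for the suffix and the terminating NUL.
  char* digits_end = buf + cap - suffix.size() - 1;
  auto res = std::to_chars(p, digits_end, ms);
  if (res.ec != std::errc{}) return {};
  p = res.ptr;
  std::memcpy(p, suffix.data(), suffix.size());
  p += suffix.size();
  *p = '\0';
  return std::string_view(buf, static_cast<std::size_t>(p - buf));
}
```

The NUL-terminated buffer could then be handed to `luaL_error(L, "%s", buf)`; because the buffer is an automatic array, skipping its destructor via longjmp leaks nothing.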

The maximum runtime limit was a configuration introduced by 3e3cb15.

Fixes: https://tracker.ceph.com/issues/71595
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 22, 2025
…write

Fix 4KB memory leak in ErasureCodePlugin_parity_delta_write_Test caused by
unmanaged raw buffer allocation. The test was allocating a 4096-byte raw
buffer to replace shard 4 for delta encoding validation, but the buffer::ptr
constructed from the raw pointer did not manage the buffer's lifecycle.

Detected by AddressSanitizer:

```
Direct leak of 4096 byte(s) in 1 object(s) allocated from:
    #0 0x7fb73a720e15 in malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x5562f4062ccc in ErasureCodePlugin_parity_delta_write_Test::TestBody() /home/kefu/dev/ceph/src/test/erasure-code/TestErasureCodePluginJerasure.cc:122
    #2 0x5562f41081a1 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #3 0x5562f40f3004 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #4 0x5562f409cbba in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
```

In this change, we replace raw pointer allocation with
create_bufferptr() to ensure proper memory management by buffer::ptr.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jun 22, 2025
Previously, SyncPoint allocated two C_Gather instances tracked by raw
pointers but failed to properly clean them up when only a single sync
point existed, causing memory leaks detected by AddressSanitizer.

This change fixes the leak by modifying AbstractWriteLog::shut_down()
to check for prior sync points in the chain. When the current sync point
is the only one present, we now activate the m_prior_log_entries_persisted
context to ensure:

- The onfinish callback executes and releases the captured strong
  reference to the enclosing SyncPoint
- The parent m_sync_point_persist context completes and gets properly
  released

This ensures all allocated contexts are cleaned up correctly during
shutdown, eliminating the memory leak.
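
The underlying ownership problem can be illustrated with a much smaller model. `MiniSyncPoint`, `on_persisted`, and `make_and_shut_down` are hypothetical and far simpler than the real SyncPoint/C_Gather machinery; the sketch only shows why a pending callback that captures a strong reference must be run (or dropped) for the object to be freed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Simplified stand-in: the pending callback may hold a strong reference back
// to the object that stores it, forming a cycle until it is activated.
struct MiniSyncPoint {
  std::function<void()> on_persisted;
  bool persisted = false;

  // Run the pending callback once and drop it, releasing whatever it captured.
  void activate() {
    auto cb = std::move(on_persisted);
    on_persisted = nullptr;
    if (cb) cb();
  }
};

// Shutdown path for the "only one sync point" case: activating the callback
// breaks the cycle, so the object is destroyed when the last handle goes away.
inline std::weak_ptr<MiniSyncPoint> make_and_shut_down() {
  auto sp = std::make_shared<MiniSyncPoint>();
  sp->on_persisted = [self = sp] { self->persisted = true; };
  std::weak_ptr<MiniSyncPoint> watch = sp;
  sp->activate();
  return watch;
}
```

If `activate()` were skipped, the callback stored inside the object would keep the object alive indefinitely, which is the same shape of leak ASan reports below.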

The ASan report:

```
Indirect leak of 2064 byte(s) in 1 object(s) allocated from:
    #0 0x56440919ae2d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_librbd+0x2f3de2d) (BuildId: 6a04677c6ee5235f1a41815df807f97c5b96d4cd)
    #1 0x56440bd67751 in __gnu_cxx::new_allocator<Context*>::allocate(unsigned long, void const*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:127:27
    #2 0x56440bd676e0 in std::allocator<Context*>::allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/allocator.h:185:32
    #3 0x56440bd676e0 in std::allocator_traits<std::allocator<Context*>>::allocate(std::allocator<Context*>&, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:464:20
    #4 0x56440bd6730b in std::_Vector_base<Context*, std::allocator<Context*>>::_M_allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:346:20
    #5 0x7fd33e00e8d1 in std::vector<Context*, std::allocator<Context*>>::reserve(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:78:22
    #6 0x7fd33e00c51c in librbd::cache::pwl::SyncPoint::SyncPoint(unsigned long, ceph::common::CephContext*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/SyncPoint.cc:20:27
    #7 0x56440bd65f26 in decltype(::new((void*)(0)) librbd::cache::pwl::SyncPoint(std::declval<unsigned long&>(), std::declval<ceph::common::CephContext*&>())) std::construct_at<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(librbd::cache::pwl::SyncPoint*, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:97:39
    #8 0x56440bd65b98 in void std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPoint>>::construct<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint>&, librbd::cache::pwl::SyncPoint*, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:518:4
    #9 0x56440bd657d3 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:519:4
    #10 0x56440bd65371 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(librbd::cache::pwl::SyncPoint*&, std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:651:6
    #11 0x56440bd65163 in std::__shared_ptr<librbd::cache::pwl::SyncPoint, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1342:14
    #12 0x56440bd650e6 in std::shared_ptr<librbd::cache::pwl::SyncPoint>::shared_ptr<std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:409:4
    #13 0x56440bd65057 in std::shared_ptr<librbd::cache::pwl::SyncPoint> std::allocate_shared<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint> const&, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:862:14
    #14 0x56440bca97e7 in std::shared_ptr<librbd::cache::pwl::SyncPoint> std::make_shared<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:878:14
    #15 0x56440bd443c8 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::new_sync_point(librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1905:20
    #16 0x56440bd42e4c in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_new_sync_point(librbd::cache::pwl::C_FlushRequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1951:3
    #17 0x56440bd9cbf2 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_new_sync_point_if_needed(librbd::cache::pwl::C_FlushRequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1990:5
    #18 0x56440bd9c636 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*)::'lambda'(librbd::cache::pwl::GuardedRequestFunctionContext&)::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:2152:9
    ceph#19 0x56440bd9b9b4 in boost::detail::function::void_function_obj_invoker<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*)::'lambda'(librbd::cache::pwl::GuardedRequestFunctionContext&), void, librbd::cache::pwl::GuardedRequestFunctionContext&>::invoke(boost::detail::function::function_buffer&, librbd::cache::pwl::GuardedRequestFunctionContext&) /opt/ceph/include/boost/function/function_template.hpp:100:11
    ceph#20 0x56440bd29321 in boost::function_n<void, librbd::cache::pwl::GuardedRequestFunctionContext&>::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const /opt/ceph/include/boost/function/function_template.hpp:789:14
    ceph#21 0x56440bd28d85 in librbd::cache::pwl::GuardedRequestFunctionContext::finish(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/Request.h:335:5
    ceph#22 0x5644091e0fe0 in Context::complete(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/include/Context.h:102:5
    ceph#23 0x56440bd9b378 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::detain_guarded_request(librbd::cache::pwl::C_BlockIORequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::GuardedRequestFunctionContext*, bool) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1202:20
    ceph#24 0x56440bd96c50 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:2154:3
    ceph#25 0x56440bd1e4b5 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:703:3
    ceph#26 0x56440bdb9022 in librbd::cache::pwl::TestMockCacheSSDWriteLog_compare_and_write_compare_matched_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:403:7
```

Fixes: https://tracker.ceph.com/issues/71335

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jul 9, 2025
Previously, SyncPoint allocated two C_Gather instances tracked by raw
pointers but failed to properly clean them up when only a single sync
point existed, causing memory leaks detected by AddressSanitizer.

This change fixes the leak by modifying AbstractWriteLog::shut_down()
to check for prior sync points in the chain. When the current sync point
is the only one present, we now activate the m_prior_log_entries_persisted
context to ensure:

- The onfinish callback executes and releases the captured strong
  reference to the enclosing SyncPoint
- The parent m_sync_point_persist context completes and gets properly
  released

This ensures all allocated contexts are cleaned up correctly during
shutdown, eliminating the memory leak.
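
As a rough illustration of why running the callback matters, the sketch below models a sync point whose completion callback captures a strong reference to the object itself. `SyncPointLike`, `on_persisted`, and `make_sync_point` are hypothetical names rather than the librbd types, and a `std::function` stands in for the C_Gather context. The only assumption is that the callback drops its capture once it has run.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Counts live instances so the effect of the self-reference is observable.
int live_sync_points = 0;

// Hypothetical stand-in for a sync point; not the librbd SyncPoint class.
struct SyncPointLike {
  std::function<void()> on_persisted;  // completion callback, fired at most once

  SyncPointLike() { ++live_sync_points; }
  ~SyncPointLike() { --live_sync_points; }

  // Runs the callback once and then destroys it, which also destroys
  // whatever the callback captured.
  void activate() {
    auto cb = std::move(on_persisted);
    on_persisted = nullptr;
    if (cb) {
      cb();
    }
  }  // cb goes out of scope here and releases its captured reference
};

// Builds a sync point whose callback holds a strong reference to itself,
// mimicking an onfinish callback that captures the enclosing object.
std::shared_ptr<SyncPointLike> make_sync_point() {
  auto sp = std::make_shared<SyncPointLike>();
  sp->on_persisted = [sp] { /* completion work would go here */ };
  return sp;
}
```

Until `activate()` runs, the object keeps itself alive through the captured `shared_ptr`, so dropping the caller's pointer alone would never destroy it. Calling `activate()` before releasing the last external reference breaks that cycle.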

The ASan report:

```
Indirect leak of 2064 byte(s) in 1 object(s) allocated from:
    #0 0x56440919ae2d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_librbd+0x2f3de2d) (BuildId: 6a04677c6ee5235f1a41815df807f97c5b96d4cd)
    #1 0x56440bd67751 in __gnu_cxx::new_allocator<Context*>::allocate(unsigned long, void const*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:127:27
    #2 0x56440bd676e0 in std::allocator<Context*>::allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/allocator.h:185:32
    ceph#3 0x56440bd676e0 in std::allocator_traits<std::allocator<Context*>>::allocate(std::allocator<Context*>&, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:464:20
    ceph#4 0x56440bd6730b in std::_Vector_base<Context*, std::allocator<Context*>>::_M_allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:346:20
    ceph#5 0x7fd33e00e8d1 in std::vector<Context*, std::allocator<Context*>>::reserve(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:78:22
    ceph#6 0x7fd33e00c51c in librbd::cache::pwl::SyncPoint::SyncPoint(unsigned long, ceph::common::CephContext*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/SyncPoint.cc:20:27
    ceph#7 0x56440bd65f26 in decltype(::new((void*)(0)) librbd::cache::pwl::SyncPoint(std::declval<unsigned long&>(), std::declval<ceph::common::CephContext*&>())) std::construct_at<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(librbd::cache::pwl::SyncPoint*, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:97:39
    ceph#8 0x56440bd65b98 in void std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPoint>>::construct<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint>&, librbd::cache::pwl::SyncPoint*, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:518:4
    ceph#9 0x56440bd657d3 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:519:4
    ceph#10 0x56440bd65371 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(librbd::cache::pwl::SyncPoint*&, std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:651:6
    ceph#11 0x56440bd65163 in std::__shared_ptr<librbd::cache::pwl::SyncPoint, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1342:14
    ceph#12 0x56440bd650e6 in std::shared_ptr<librbd::cache::pwl::SyncPoint>::shared_ptr<std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:409:4
    ceph#13 0x56440bd65057 in std::shared_ptr<librbd::cache::pwl::SyncPoint> std::allocate_shared<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint> const&, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:862:14
    ceph#14 0x56440bca97e7 in std::shared_ptr<librbd::cache::pwl::SyncPoint> std::make_shared<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:878:14
    ceph#15 0x56440bd443c8 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::new_sync_point(librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1905:20
    ceph#16 0x56440bd42e4c in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_new_sync_point(librbd::cache::pwl::C_FlushRequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1951:3
    ceph#17 0x56440bd9cbf2 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_new_sync_point_if_needed(librbd::cache::pwl::C_FlushRequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1990:5
    ceph#18 0x56440bd9c636 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*)::'lambda'(librbd::cache::pwl::GuardedRequestFunctionContext&)::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:2152:9
    ceph#19 0x56440bd9b9b4 in boost::detail::function::void_function_obj_invoker<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*)::'lambda'(librbd::cache::pwl::GuardedRequestFunctionContext&), void, librbd::cache::pwl::GuardedRequestFunctionContext&>::invoke(boost::detail::function::function_buffer&, librbd::cache::pwl::GuardedRequestFunctionContext&) /opt/ceph/include/boost/function/function_template.hpp:100:11
    ceph#20 0x56440bd29321 in boost::function_n<void, librbd::cache::pwl::GuardedRequestFunctionContext&>::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const /opt/ceph/include/boost/function/function_template.hpp:789:14
    ceph#21 0x56440bd28d85 in librbd::cache::pwl::GuardedRequestFunctionContext::finish(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/Request.h:335:5
    ceph#22 0x5644091e0fe0 in Context::complete(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/include/Context.h:102:5
    ceph#23 0x56440bd9b378 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::detain_guarded_request(librbd::cache::pwl::C_BlockIORequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::GuardedRequestFunctionContext*, bool) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1202:20
    ceph#24 0x56440bd96c50 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:2154:3
    ceph#25 0x56440bd1e4b5 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:703:3
    ceph#26 0x56440bdb9022 in librbd::cache::pwl::TestMockCacheSSDWriteLog_compare_and_write_compare_matched_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:403:7
```

Fixes: https://tracker.ceph.com/issues/71335

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 05fd6f9)
Matan-B pushed a commit that referenced this pull request Jul 16, 2025
Fix AddressSanitizer ODR (One Definition Rule) violation caused by
osdc/error_code.cc being compiled into both the osdc library and
ceph-common library.

The violation occurred because osdc_error_category was defined in
both the rbd binary (via osdc) and libceph-common.so, creating
duplicate symbols at runtime.

Since all targets that link against osdc also link against
ceph-common, removing osdc/error_code.cc from osdc doesn't break the
build while eliminating the duplicate definition.

ASan error sample:

```
  ==857433==ERROR: AddressSanitizer: odr-violation (0x5612ad665760):
    [1] size=22 'typeinfo name for osdc_error_category' /home/jenkins-build/build/workspace/ceph-pull-requests/src/osdc/error_code.cc in /home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/rbd
    [2] size=22 'typeinfo name for osdc_error_category' /home/jenkins-build/build/workspace/ceph-pull-requests/src/osdc/error_code.cc in /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libceph-common.so.2
  These globals were registered at these points:
    [1]:
      #0 0x5612acd62c88 in __asan_register_globals (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/rbd+0x815c88) (BuildId: 62a02cbbf3426e5470e16372e3b53a18cb18ce0f)
      #1 0x5612acd63d59 in __asan_register_elf_globals (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/rbd+0x816d59) (BuildId: 62a02cbbf3426e5470e16372e3b53a18cb18ce0f)
      #2 0x7f28d3b02eba in call_init csu/../csu/libc-start.c:145:3
      #3 0x7f28d3b02eba in __libc_start_main csu/../csu/libc-start.c:379:5
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Jul 27, 2025
Removing the vxattr 'ceph.dir.subvolume' from a directory on which
it was never set crashes the MDS. In that case the directory's
snaprealm is null, and the null check is missing. Setting the
vxattr creates the snaprealm for the directory, so the MDS does not
crash when the vxattr is set and then removed. This patch adds the
missing null check.

Reproducer:
$ mkdir /mnt/dir1
$ setfattr -x "ceph.dir.subvolume" /mnt/dir1

Traceback:
-------
Core was generated by `./ceph/build/bin/ceph-mds -i a -c ./ceph/build/ceph.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
 #0  0x00007f33f1aa8034 in __pthread_kill_implementation () from /lib64/libc.so.6
 #1  0x00007f33f1a4ef1e in raise () from /lib64/libc.so.6
 #2  0x0000562b148a6fd0 in reraise_fatal (signum=signum@entry=11) at /ceph/src/global/signal_handler.cc:88
 #3  0x0000562b148a83d9 in handle_oneshot_fatal_signal (signum=11) at /ceph/src/global/signal_handler.cc:367
 #4  <signal handler called>
 #5  Server::handle_client_setvxattr (this=0x562b4ee3f800, mdr=..., cur=0x562b4ef9cc00) at /ceph/src/mds/Server.cc:6406
 #6  0x0000562b145fadc2 in Server::handle_client_removexattr (this=0x562b4ee3f800, mdr=...) at /ceph/src/mds/Server.cc:7022
 #7  0x0000562b145fbff0 in Server::dispatch_client_request (this=0x562b4ee3f800, mdr=...) at /ceph/src/mds/Server.cc:2825
 #8  0x0000562b145fcfa2 in Server::handle_client_request (this=0x562b4ee3f800, req=...) at /ceph/src/mds/Server.cc:2676
 #9  0x0000562b1460063c in Server::dispatch (this=0x562b4ee3f800, m=...) at /ceph/src/mds/Server.cc:382
 #10 0x0000562b1450eb22 in MDSRank::handle_message (this=this@entry=0x562b4ef42008, m=...) at /ceph/src/mds/MDSRank.cc:1222
 #11 0x0000562b14510c93 in MDSRank::_dispatch (this=this@entry=0x562b4ef42008, m=..., new_msg=new_msg@entry=true)
     at /ceph/src/mds/MDSRank.cc:1045
 #12 0x0000562b14511620 in MDSRankDispatcher::ms_dispatch (this=this@entry=0x562b4ef42000, m=...) at /ceph/src/mds/MDSRank.cc:1019
 #13 0x0000562b144ff117 in MDSDaemon::ms_dispatch2 (this=0x562b4ee64000, m=...) at /ceph/src/common/RefCountedObj.h:56
 #14 0x00007f33f2f4974a in Messenger::ms_deliver_dispatch (this=0x562b4ee70000, m=...) at /ceph/src/msg/Messenger.h:746
 #15 0x00007f33f2f467e2 in DispatchQueue::entry (this=0x562b4ee703b8) at /ceph/src/msg/DispatchQueue.cc:202
 #16 0x00007f33f2ff61cb in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /ceph/src/msg/DispatchQueue.h:101
 #17 0x00007f33f2df4b5d in Thread::entry_wrapper (this=0x562b4ee70518) at /ceph/src/common/Thread.cc:87
 #18 0x00007f33f2df4b6f in Thread::_entry_func (arg=<optimized out>) at /ceph/src/common/Thread.cc:74
 #19 0x00007f33f1aa6088 in start_thread () from /lib64/libc.so.6
 #20 0x00007f33f1b29f8c in clone3 () from /lib64/libc.so.6
---------

Fixes: https://tracker.ceph.com/issues/70794
Signed-off-by: Kotresh HR <khiremat@redhat.com>
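
The missing guard described in this commit message can be sketched as follows. `Inode`, `SnapRealm`, `set_subvolume`, and `remove_subvolume` are simplified, hypothetical stand-ins, not the MDS types. Treating removal on a directory without a snaprealm as a no-op is an assumption made for illustration, and the actual patch may report the condition differently.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified stand-ins for the MDS structures.
struct SnapRealm {
  bool subvolume = false;
};

struct Inode {
  std::unique_ptr<SnapRealm> snaprealm;  // null until the flag is first set
};

// Setting the flag creates the snaprealm on demand.
void set_subvolume(Inode& in) {
  if (!in.snaprealm) {
    in.snaprealm = std::make_unique<SnapRealm>();
  }
  in.snaprealm->subvolume = true;
}

// Removing the flag must tolerate a directory that never had it set.
// Returns true if a flag was actually cleared.
bool remove_subvolume(Inode& in) {
  if (!in.snaprealm) {
    return false;  // assumed behavior: nothing to remove, do not dereference
  }
  const bool was_set = in.snaprealm->subvolume;
  in.snaprealm->subvolume = false;
  return was_set;
}
```

Without the early return, `remove_subvolume` on a fresh `Inode` would dereference a null pointer, which corresponds to the SIGSEGV in the traceback above.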
Matan-B pushed a commit that referenced this pull request Jul 27, 2025
Fix a memory leak where HybridAllocator's embedded BitmapAllocator
(bmap_alloc) was not properly cleaned up during destruction. The issue
occurred because virtual function calls in destructors don't dispatch
to derived class implementations - when AvlAllocator::~AvlAllocator()
calls shutdown(), it invokes AvlAllocator::shutdown() rather than
HybridAllocatorBase::shutdown(), leaving bmap_alloc unfreed.
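
The dispatch rule behind this can be reproduced in isolation. In the sketch below, `BaseAllocator` and `HybridLike` are hypothetical stand-ins for AvlAllocator and HybridAllocatorBase. It only records which `shutdown()` overrides run and does not model the allocators themselves.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the base allocator; not Ceph code.
struct BaseAllocator {
  std::vector<std::string>* log;  // records which shutdown() bodies ran

  explicit BaseAllocator(std::vector<std::string>* l) : log(l) {}

  virtual ~BaseAllocator() {
    // The derived part of the object is already destroyed at this point,
    // so this call resolves to BaseAllocator::shutdown().
    shutdown();
  }

  virtual void shutdown() { log->push_back("base"); }
};

// Hypothetical stand-in for the derived hybrid allocator.
struct HybridLike : BaseAllocator {
  using BaseAllocator::BaseAllocator;

  void shutdown() override {
    log->push_back("derived");  // where the embedded allocator would be freed
    BaseAllocator::shutdown();
  }
};

// Destroys a HybridLike without calling shutdown() first.
std::vector<std::string> destroy_without_shutdown() {
  std::vector<std::string> log;
  { HybridLike h(&log); }
  return log;
}

// Calls shutdown() explicitly before destruction, as BlueFS does.
std::vector<std::string> destroy_with_explicit_shutdown() {
  std::vector<std::string> log;
  {
    HybridLike h(&log);
    h.shutdown();
  }
  return log;
}
```

Only the explicit call reaches the derived override. The destructor path logs just the base entry, which is why cleanup that lives only in a derived `shutdown()` is skipped when the object is simply destroyed.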

This issue only affects unit tests and does not impact production,
as BlueFS always calls shutdown() explicitly when stopping allocators
in BlueFS::_stop_alloc().

This change has minimal impact on existing code that already calls
shutdown() explicitly, since shutdown() has idempotent behavior and
can be safely called multiple times. Additionally, destructors are
only invoked during system teardown, so the potential double shutdown()
call does not affect runtime performance.

Changes:
- Replace raw pointer bmap_alloc with unique_ptr for automatic cleanup
- Add explicit shutdown() call in BitmapAllocator destructor for RAII
- Replace manual bmap_alloc->shutdown() with bmap_alloc.reset()
- Add explicit default destructor to HybridAllocatorBase for clarity

This eliminates the need for manual shutdown() calls and ensures
proper resource cleanup through RAII principles.

Fixes ASan-reported indirect leak of BitmapAllocator resources.

Please see the leak reported by ASan for more details:

```
Indirect leak of 23 byte(s) in 1 object(s) allocated from:
    #0 0x7fe30b121a2d in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
    #1 0x5614e057ee11 in std::__new_allocator<char>::allocate(unsigned long, void const*) /usr/include/c++/15.1.1/bits/new_allocator.h:151
    #2 0x5614e057d279 in std::allocator<char>::allocate(unsigned long) /usr/include/c++/15.1.1/bits/allocator.h:203
    #3 0x5614e057d279 in std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) /usr/include/c++/15.1.1/bits/alloc_traits.h:614
    #4 0x5614e057d279 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_allocate(std::allocator<char>&, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.h:142
    #5 0x5614e057cf27 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.tcc:164
    #6 0x5614e05fe91b in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<true>(char const*, unsigned long) /usr/include/c++/15.1.1/bits/basic_string.tcc:291
    #7 0x5614e05f3cd4 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/15.1.1/bits/basic_string.h:617
    #8 0x5614e069b892 in ceph::mutex_debug_detail::mutex_debug_impl<false>::mutex_debug_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) /home/kefu/dev/ceph/src/common/mutex_debug.h:103
    #9 0x5614e06b8586 in ceph::mutex_debug_detail::mutex_debug_impl<false> ceph::make_mutex<char const (&) [23]>(char const (&) [23]) /home/kefu/dev/ceph/src/common/ceph_mutex.h:118
    #10 0x5614e06b8350 in AllocatorLevel02<AllocatorLevel01Loose>::AllocatorLevel02() /home/kefu/dev/ceph/src/os/bluestore/fastbmap_allocator_impl.h:526
    #11 0x5614e06b393b in BitmapAllocator::BitmapAllocator(ceph::common::CephContext*, long, long, std::basic_string_view<char, std::char_traits<char> >) /home/kefu/dev/ceph/src/os/bluestore/BitmapAllocator.cc:16
    #12 0x5614e072755f in HybridAllocatorBase<AvlAllocator>::_spillover_range(unsigned long, unsigned long) /home/kefu/dev/ceph/src/os/bluestore/HybridAllocator_impl.h:123
    #13 0x5614e06c9045 in AvlAllocator::_try_insert_range(unsigned long, unsigned long, boost::intrusive::tree_iterator<boost::intrusive::mhtraits<range_seg_t, boost::intrusive::avl_set_member_hook<>, &range_seg_t::offset_hook>, false>*) /home/kefu/dev/ceph/src/os/bluestore/AvlAllocator.h:210
    #14 0x5614e06c0104 in AvlAllocator::_add_to_tree(unsigned long, unsigned long) /home/kefu/dev/ceph/src/os/bluestore/AvlAllocator.cc:136
    #15 0x5614e0586e5f in HybridAllocatorBase<AvlAllocator>::_add_to_tree(unsigned long, unsigned long) /home/kefu/dev/ceph/src/os/bluestore/HybridAllocator.h:110
    #16 0x5614e06c6559 in AvlAllocator::init_add_free(unsigned long, unsigned long) /home/kefu/dev/ceph/src/os/bluestore/AvlAllocator.cc:494
    #17 0x5614e0582943 in HybridAllocator_fragmentation_Test::TestBody() /home/kefu/dev/ceph/src/test/objectstore/hybrid_allocator_test.cc:285
    #18 0x5614e061bee3 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #19 0x5614e0606562 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Aug 5, 2025
Fix a memory leak in ErasureCodePluginExample when plugin registration
fails. The allocated ErasureCodePluginExample instance was not being
freed if ErasureCodePluginRegistry::add() failed, which occurs in tests
that intentionally register duplicate plugins.

ASan detected the leak:

```
Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f4501321a2d in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
    #1 0x7f4501a5914d in __erasure_code_init /home/kefu/dev/ceph/src/test/erasure-code/ErasureCodePluginExample.cc:44
    #2 0x5589985be68d in ceph::ErasureCodePluginRegistry::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::ErasureCodePlugin**, std::ostream*) /home/kefu/dev/ceph/src/erasure-code/ErasureCodePlugin.cc:149
    #3 0x5589984984ee in ErasureCodePluginRegistryTest_all_Test::TestBody() /home/kefu/dev/ceph/src/test/erasure-code/TestErasureCodePlugin.cc:116
```

Use unique_ptr to manage the plugin instance lifecycle, following the
pattern used by other erasure code plugins. The instance is now
automatically destroyed if registry addition fails.
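
A minimal sketch of this ownership hand-off, under stated assumptions: `Plugin`, `Registry`, and `register_plugin` are hypothetical stand-ins, and the registry here is assumed to take ownership of the raw pointer only when `add()` succeeds. The real ErasureCodePluginRegistry interface may differ in its details.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <memory>
#include <string>

// Counts live plugin instances so a leak would be observable.
int live_plugins = 0;

// Hypothetical plugin type; not the Ceph ErasureCodePlugin class.
struct Plugin {
  Plugin() { ++live_plugins; }
  ~Plugin() { --live_plugins; }
};

// Hypothetical registry that owns the plugins it has accepted.
class Registry {
  std::map<std::string, Plugin*> plugins_;

public:
  // Takes ownership of p only when it returns 0.
  int add(const std::string& name, Plugin* p) {
    if (plugins_.count(name)) {
      return -EEXIST;  // duplicate name: the caller still owns p
    }
    plugins_[name] = p;
    return 0;
  }

  ~Registry() {
    for (auto& kv : plugins_) {
      delete kv.second;
    }
  }
};

// Creates a plugin and hands it to the registry without leaking on failure.
int register_plugin(Registry& reg, const std::string& name) {
  auto plugin = std::make_unique<Plugin>();
  const int r = reg.add(name, plugin.get());
  if (r == 0) {
    plugin.release();  // the registry owns the instance from here on
  }
  return r;  // on failure, plugin's destructor frees the instance
}
```

Registering the same name twice returns `-EEXIST` the second time, and the rejected instance is destroyed when `register_plugin` returns instead of being leaked.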

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Aug 5, 2025
Previously, run-cli-tests ignored all environment variables from the parent
process to ensure a clean test environment. However, this also dropped
sanitizer settings (ASAN_OPTIONS and LSAN_OPTIONS) needed when AddressSanitizer
is enabled.

This causes test failures with TCMalloc due to false-positive leak reports
from TCMalloc's internal objects, which is a known issue documented in
Google's C++ style guide. While recent gperftools releases have fixed this,
Ubuntu Jammy still ships with an older version that triggers these warnings.

This change preserves sanitizer environment variables while maintaining
the clean test environment for other variables.

Note: Once we upgrade to newer gperftools, we can remove the related
suppression rule in qa/lsan.supp.

The test failure with TCMalloc looks like:

```
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/ceph-kvstore-tool/help.t: failed
--- /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/ceph-kvstore-tool/help.t
+++ /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/ceph-kvstore-tool/help.t.err
@@ -21,3 +21,19 @@
     stats
     histogram [prefix]

+
+  =================================================================
+  ==87908==ERROR: LeakSanitizer: detected memory leaks
+
+  Direct leak of 45 byte(s) in 1 object(s) allocated from:
+      #0 0x562fd797265d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/ceph-kvstore-tool+0xe5e65d) (BuildId: 7eb56077b615aeb3c7aedafa0818ad89fdf3702d)
+      #1 0x562fd79815c8 in std::__new_allocator<char>::allocate(unsigned long, void const*) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/new_allocator.h:137:27
+      #2 0x562fd7981520 in std::allocator<char>::allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/allocator.h:188:32
+      #3 0x562fd7981520 in std::allocator_traits<std::allocator<char>>::allocate(std::allocator<char>&, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/alloc_traits.h:464:20
+      #4 0x562fd798115a in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_create(unsigned long&, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.tcc:155:14
+      #5 0x562fd798787f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.tcc:328:21
+      #6 0x562fd79876a7 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_append(char const*, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.tcc:420:8
+      #7 0x7fa1aa0286f0 in MallocExtension::Initialize() (/lib/x86_64-linux-gnu/libtcmalloc.so.4+0x2a6f0) (BuildId: eeef3d1257388a806e122398dbce3157ee568ef4)
+
+  SUMMARY: AddressSanitizer: 45 byte(s) leaked in 1 allocation(s).
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Sep 17, 2025
Replace unsafe string construction with bufferlist::length() to avoid
reading beyond buffer boundaries.

In commit 92ccbff, we introduced a bug when checking if a digest was
empty by constructing a std::string from bufferlist:

```
std::string(digest.second.c_str()).empty()
```

This is unsafe because bufferlist data is not guaranteed to be null-
terminated. The std::string constructor searches for a null terminator
and may read beyond the bufferlist's allocated memory, causing a
heap-buffer-overflow detected by AddressSanitizer:

```
==66092==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7e0c65215004 at pc 0x7fbc6e27c597 bp 0x7ffe29fb6100 sp 0x7ffe29fb58b8
READ of size 5 at 0x7e0c65215004 thread T0
    #0 0x7fbc6e27c596 in strlen /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:425
    #1 0x562c75fad91a in std::char_traits<char>::length(char const*) /usr/include/c++/15.2.1/bits/char_traits.h:393
    #2 0x562c75fb4222 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> >(char const*, std::allocator<char> const&) /usr/include/c++/15.2.1/bits/basic_string.h:713
    #3 0x562c761b81ae in operator() /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:1300
    #4 0x562c761d7d53 in operator()<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false> > /usr/include/c++/15.2.1/bits/predefined_ops.h:318
    #5 0x562c761d789c in __find_if<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, __gnu_cxx::__ops::_Iter_pred<ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > > /usr/include/c++/15.2.1/bits/stl_algobase.h:2095
    #6 0x562c761d72b2 in find_if<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:3921
    #7 0x562c761d5f6f in none_of<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:431
    #8 0x562c761d4a50 in any_of<mini_flat_map<shard_id_t, ceph::buffer::v15_2_0::list, signed char>::_iterator<false>, ScrubBackend::match_in_shards(const hobject_t&, auth_selection_t&, inconsistent_obj_wrapper&, std::stringstream&)::<lambda(const std::pair<const shard_id_t, ceph::buffer::v15_2_0::list&>&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:450
    #9 0x562c761bb84b in ScrubBackend::match_in_shards(hobject_t const&, auth_selection_t&, inconsistent_obj_wrapper&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:1297
    #10 0x562c761b4282 in ScrubBackend::compare_obj_in_maps[abi:cxx11](hobject_t const&) /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:941
    #11 0x562c761d44af in operator()<hobject_t> /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:887
    #12 0x562c761d4836 in for_each<std::_Rb_tree_const_iterator<hobject_t>, ScrubBackend::compare_smaps()::<lambda(const auto:422&)> > /usr/include/c++/15.2.1/bits/stl_algo.h:3798
    #13 0x562c761b3259 in ScrubBackend::compare_smaps() /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:884
    #14 0x562c761a478d in ScrubBackend::update_authoritative() /home/kefu/dev/ceph/src/osd/scrubber/scrub_backend.cc:315
```

Fix by checking bufferlist::length(), which reports whether the buffer
is empty without converting its content to a string.
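
The difference between the two checks can be sketched with a generic length-delimited buffer. `RawBuffer`, `digest_is_empty`, and `digest_view` are hypothetical names used for illustration. `RawBuffer` only mimics the relevant property of a bufferlist, namely that its bytes carry no trailing null terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical length-delimited buffer; a stand-in for ceph::bufferlist.
struct RawBuffer {
  std::vector<char> bytes;  // exactly the payload, with no trailing '\0'

  const char* data() const { return bytes.data(); }
  std::size_t length() const { return bytes.size(); }
};

// Safe emptiness check: uses the stored length and never scans for '\0'.
bool digest_is_empty(const RawBuffer& digest) {
  return digest.length() == 0;
}

// If the content is needed as text, pass the length explicitly so the
// view never extends past the end of the buffer.
std::string_view digest_view(const RawBuffer& digest) {
  return std::string_view(digest.data(), digest.length());
}
```

Constructing a `std::string` from `data()` alone would call `strlen` on bytes that are not terminated, which is the out-of-bounds read ASan reports above. Both helpers here rely only on the stored length.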

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B pushed a commit that referenced this pull request Oct 23, 2025
…ives

Add suppression rules for three categories of false positive warnings
encountered during ASan-enabled testing:

1. PyModule_ExecDef memory leaks: ASan incorrectly interprets Python's
   module loading behavior as memory leaks when the interpreter loads
   extension modules.

2. __cxa_throw interception failures: ASan's interceptor cannot properly
   intercept exception handling when libstdc++.so is loaded after the
   ASan shared library, causing CHECK failures.

3. ErasureCodePluginRegistry::load:
   `ceph::ErasureCodePluginRegistry::load()` is known to leak, as we
   don't free the memory allocated by the EC plugins which are
   registered in the `ErasureCodePluginRegistry` singleton. This is a
   known issue, but since the `ErasureCodePluginRegistry` instance is a
   singleton, we can live with it. In this change, we add a rule to
   suppress the leak report from LeakSanitizer. This rule also exists
   in qa/valgrind.supp.

All warnings are confirmed false positives that should be suppressed
to reduce noise in test output.

Example warnings:

```
Direct leak of 3264 byte(s) in 1 object(s) allocated from:
    #0 0x7f6027d20cb5 in malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x7f60277557ad  (/usr/lib/libpython3.13.so.1.0+0x1557ad) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #2 0x7f6027756067  (/usr/lib/libpython3.13.so.1.0+0x156067) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #3 0x7f60278471a0  (/usr/lib/libpython3.13.so.1.0+0x2471a0) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #4 0x7f602774d031  (/usr/lib/libpython3.13.so.1.0+0x14d031) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #5 0x7b60234093bb in __Pyx_modinit_type_init_code.constprop.0 /home/kefu/dev/ceph/build/src/pybind/rados/rados.c:82066
    #6 0x7b602340a826 in __pyx_pymod_exec_rados /home/kefu/dev/ceph/build/src/pybind/rados/rados.c:82755
    #7 0x7f6027856777 in PyModule_ExecDef (/usr/lib/libpython3.13.so.1.0+0x256777) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #8 0x7f602785baa3  (/usr/lib/libpython3.13.so.1.0+0x25baa3) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #9 0x7f6027793df2  (/usr/lib/libpython3.13.so.1.0+0x193df2) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #10 0x7f6027777cbe in _PyEval_EvalFrameDefault (/usr/lib/libpython3.13.so.1.0+0x177cbe) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #11 0x7f60277957de  (/usr/lib/libpython3.13.so.1.0+0x1957de) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #12 0x7f60277d11b9 in PyObject_CallMethodObjArgs (/usr/lib/libpython3.13.so.1.0+0x1d11b9) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #13 0x7f60277d0ee4 in PyImport_ImportModuleLevelObject (/usr/lib/libpython3.13.so.1.0+0x1d0ee4) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #14 0x7f6027779c0c in _PyEval_EvalFrameDefault (/usr/lib/libpython3.13.so.1.0+0x179c0c) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #15 0x7f602784e2c8 in PyEval_EvalCode (/usr/lib/libpython3.13.so.1.0+0x24e2c8) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #16 0x7f602788c88b  (/usr/lib/libpython3.13.so.1.0+0x28c88b) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #17 0x7f602788985c  (/usr/lib/libpython3.13.so.1.0+0x28985c) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #18 0x7f6027886f57  (/usr/lib/libpython3.13.so.1.0+0x286f57) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #19 0x7f6027886211  (/usr/lib/libpython3.13.so.1.0+0x286211) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #20 0x7f6027885b82  (/usr/lib/libpython3.13.so.1.0+0x285b82) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #21 0x7f6027883e50 in Py_RunMain (/usr/lib/libpython3.13.so.1.0+0x283e50) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #22 0x7f602783bbea in Py_BytesMain (/usr/lib/libpython3.13.so.1.0+0x23bbea) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
    #23 0x7f6027227674  (/usr/lib/libc.so.6+0x27674) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
    #24 0x7f6027227728 in __libc_start_main (/usr/lib/libc.so.6+0x27728) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
    #25 0x55dae17e6044 in _start (/usr/bin/python3.13+0x1044) (BuildId: 8c0dc848f5b978c56ebeb07255bb332b4b37ae4e)
```

```
AddressSanitizer: CHECK failed: asan_interceptors.cpp:335 "((__interception::real___cxa_throw)) != (0)" (0x0, 0x0) (tid=3246455)
    #0 0x7f345ea81979 in CheckUnwind ../../../../src/libsanitizer/asan/asan_rtl.cpp:69
    #1 0x7f345eaa790d in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86
    #2 0x7f345e9e1d54 in __interceptor___cxa_throw ../../../../src/libsanitizer/asan/asan_interceptors.cpp:335
    #3 0x7f345e9e1d54 in __interceptor___cxa_throw ../../../../src/libsanitizer/asan/asan_interceptors.cpp:334
    #4 0x7f3458623def in void boost::throw_exception<boost::bad_lexical_cast>(boost::bad_lexical_cast const&) /opt/ceph/include/boost/throw_exception.hpp:165
    #5 0x7f345997ad3b in void boost::conversion::detail::throw_bad_cast<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long>() /opt/ceph/include/boost/lexical_cast/bad_lexical_cast.hpp:93
    #6 0x7f3459979d35 in unsigned long boost::lexical_cast<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /opt/ceph/include/boost/lexical_cast.hpp:43
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Matan-B added a commit that referenced this pull request Dec 8, 2025
See comment:
```
  //TODO: should be changed to return future<> once all calls
  //	  to refresh are through co_await. We return LBAMapping
  //	  for now to avoid mandating the callers to make sure
  //	  the life of the lba mapping survives the refresh.
```

For now, introduce co_refresh and mark the existing refresh as
deprecated. Follow-up work will audit all the existing users of
refresh and move them to the new method. This change is not trivial,
so I prefer to follow up on this in a separate PR.

For now, move the trivial users to co_refresh to avoid UAR:
```
==103588==ERROR: AddressSanitizer: stack-use-after-return on address 0xffff80197e90 at pc 0xaaaacb941b24 bp 0xffff7e48dd80 sp 0xffff7e48dd78
READ of size 8 at 0xffff80197e90 thread T1
    #0 0xaaaacb941b20 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::swap(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:172:18
    #1 0xaaaacb941998 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::operator=(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:93:61
    #2 0xaaaacb933758 in crimson::os::seastore::LBAMapping::operator=(crimson::os::seastore::LBAMapping&&) /ceph/src/crimson/os/seastore/lba_mapping.h:46:48
    #3 0xaaaacde2fa54 in ... crimson::os::seastore::LBAMapping&&, std::array<crimson::os::seastore::LBAManager::remap_entry_t, 1ul>) (.resume) /ceph/src/crimson/os/seastore/transaction_manager.h:1282:11
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B added a commit that referenced this pull request Dec 8, 2025
See comment:
```
  //TODO: should be changed to return future<> once all calls
  //	  to refresh are through co_await. We return LBAMapping
  //	  for now to avoid mandating the callers to make sure
  //	  the life of the lba mapping survives the refresh.
```

For now, introduce co_refresh and mark the existing refresh as
deprecated. Follow-up work will audit all the existing users of
refresh and move them to the new method. This change is not trivial,
so I prefer to follow up on this in a separate PR.

For now, move the trivial users to co_refresh to avoid UAR:
```
==103588==ERROR: AddressSanitizer: stack-use-after-return on address 0xffff80197e90 at pc 0xaaaacb941b24 bp 0xffff7e48dd80 sp 0xffff7e48dd78
READ of size 8 at 0xffff80197e90 thread T1
    #0 0xaaaacb941b20 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::swap(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:172:18
    #1 0xaaaacb941998 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::operator=(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:93:61
    #2 0xaaaacb933758 in crimson::os::seastore::LBAMapping::operator=(crimson::os::seastore::LBAMapping&&) /ceph/src/crimson/os/seastore/lba_mapping.h:46:48
    #3 0xaaaacde2fa54 in ... crimson::os::seastore::LBAMapping&&, std::array<crimson::os::seastore::LBAManager::remap_entry_t, 1ul>) (.resume) /ceph/src/crimson/os/seastore/transaction_manager.h:1282:11
```

The deprecation marker is commented out since otherwise make check would fail.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B added a commit that referenced this pull request Dec 8, 2025
See comment:
```
  //TODO: should be changed to return future<> once all calls
  //	  to refresh are through co_await. We return LBAMapping
  //	  for now to avoid mandating the callers to make sure
  //	  the life of the lba mapping survives the refresh.
```

For now introduce co_refresh and mark the existing refresh as
deprecated. Following work will audit all the existing users of
refresh and move them to the new method. This change is not trivial
so I prefer to follow up on this as a separate PR.

This should help avoid UAR at suspension points:
```
==103588==ERROR: AddressSanitizer: stack-use-after-return on address 0xffff80197e90 at pc 0xaaaacb941b24 bp 0xffff7e48dd80 sp 0xffff7e48dd78
READ of size 8 at 0xffff80197e90 thread T1
    #0 0xaaaacb941b20 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::swap(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:172:18
    #1 0xaaaacb941998 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::operator=(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:93:61
    #2 0xaaaacb933758 in crimson::os::seastore::LBAMapping::operator=(crimson::os::seastore::LBAMapping&&) /ceph/src/crimson/os/seastore/lba_mapping.h:46:48
    #3 0xaaaacde2fa54 in ... crimson::os::seastore::LBAMapping&&, std::array<crimson::os::seastore::LBAManager::remap_entry_t, 1ul>) (.resume) /ceph/src/crimson/os/seastore/transaction_manager.h:1282:11
```

The deprecation marker is commented out since otherwise make check would fail.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B pushed a commit that referenced this pull request Dec 10, 2025
The static std::map max_prio_map was defined in the osd_types.h header
file, causing every translation unit that included this header to get
its own copy of the variable. This led to One Definition Rule (ODR)
violations where multiple instances of the same variable existed at
runtime.

During program cleanup, destructors for these multiple instances would
attempt to free the same memory regions, resulting in segmentation
faults in tcmalloc/memory allocator as seen with ceph-dencoder.

This issue surfaced after a not-yet-merged change which converts erasure_code
and json_spirit to OBJECT libraries. Before that change, these were
STATIC libraries that were linked via target_link_libraries. The
incorrect linkage meant their object files (and thus their copies of
max_prio_map) were kept separate and didn't conflict at runtime.

After converting to OBJECT libraries and properly incorporating them
into libceph-common.so (commit 8b0e3fb2c23), the multiple copies of
max_prio_map from different translation units all ended up in the same
shared library, exposing the ODR violation. During program exit, the
dynamic linker attempted to run destructors for all instances, leading
to double-free crashes.

Fix by moving the map into a static helper function in PeeringState.cc
(the only file that uses it). The map is now a function-local static
const variable, ensuring a single instance that is properly initialized
and destructed.
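
A minimal sketch of the function-local static pattern described above. The helper names `get_max_prio_map()` and `max_prio_for()`, as well as the keys and values in the table, are hypothetical and chosen only for illustration; they are not the contents of the real map in PeeringState.cc.

```cpp
#include <cassert>
#include <map>

// File-local helper: the map lives inside the function, so there is exactly
// one instance per translation unit that defines this helper, and nothing
// is defined in a shared header.
static const std::map<int, int>& get_max_prio_map() {
  // Initialized once, on first call; initialization of function-local
  // statics is thread-safe since C++11.
  static const std::map<int, int> max_prio_map = {
    {1, 10},  // illustrative entries only
    {2, 20},
  };
  return max_prio_map;
}

// Example lookup: returns -1 when no maximum is configured for base_prio.
int max_prio_for(int base_prio) {
  const auto& m = get_max_prio_map();
  auto it = m.find(base_prio);
  return it == m.end() ? -1 : it->second;
}
```

Every call returns a reference to the same object, so only one destructor runs at program exit.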

Backtrace before fix:
```
    #0  0x00007ffff7dbb1a0 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
    #1  0x00007ffff7dbb57f in tcmalloc::ThreadCache::Scavenge() () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
    #2  0x00007ffff6bc8aa2 in std::__new_allocator<std::_Rb_tree_node<std::pair<int const, int> > >::deallocate (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890, __n=1)
    #3  0x00007ffff6bc89f9 in std::allocator<std::_Rb_tree_node<std::pair<int const, int> > >::deallocate (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890, __n=1)
    #4  std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<int const, int> > > >::deallocate (__a=..., __p=0x555555f43890, __n=1)
    #5  std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_put_node (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890)
    #6  0x00007ffff6bc892e in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_drop_node (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890)
    #7  0x00007ffff6bc886e in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43890)
    #8  0x00007ffff6bc8854 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43cb0)
    #9  0x00007ffff6bc8854 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43ad0)
    #10 0x00007ffff6bc8805 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::~_Rb_tree (this=0x7ffff7d48f78 <max_prio_map>)
    #11 0x00007ffff6bc7345 in std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::~map (this=0x7ffff7d48f78 <max_prio_map>)
    #12 0x00007ffff484bd51 in __cxa_finalize (d=0x7ffff7d3f440) at ./stdlib/cxa_finalize.c:97
    #13 0x00007ffff6af9487 in __do_global_dtors_aux () from /home/kefu/dev/ceph/build/lib/libceph-common.so.2
    #14 0x00007ffff7fbfd20 in ?? ()
    #15 0x00007ffff7fc8fc2 in _dl_call_fini (closure_map=0x7fffffffd0f0, closure_map@entry=0x7ffff7fbfd20) at ./elf/dl-call_fini.c:43
    #16 0x00007ffff7fcbe72 in _dl_fini () at ./elf/dl-fini.c:120
    #17 0x00007ffff484c291 in __run_exit_handlers (status=0, listp=0x7ffff49f1680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:118
    #18 0x00007ffff484c35a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:148
    #19 0x00007ffff4833caf in __libc_start_call_main (main=main@entry=0x55555556cd90 <main(int, char const**)>, argc=argc@entry=2, argv=argv@entry=0x7fffffffd488) at ../sysdeps/nptl/libc_start_call_main.h:74
    #20 0x00007ffff4833d65 in __libc_start_main_impl (main=0x55555556cd90 <main(int, char const**)>, argc=2, argv=0x7fffffffd488, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd478) at ../csu/libc-start.c:360
    #21 0x00005555555695e1 in _start ()
```

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Matan-B pushed a commit that referenced this pull request Jan 20, 2026
Fix memory leaks detected by AddressSanitizer in unittest_dbstore_tests.
The test was failing with ASan enabled due to SQLObjectOp objects not
being properly cleaned up.

ASan reported the following leaks:

  Direct leak of 200 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 SQLGetBucket::Execute(DoutPrefixProvider const*, rgw::store::DBOpParams*)
       /src/rgw/driver/dbstore/sqlite/sqliteDB.cc:1689
    #2 rgw::store::DB::ProcessOp(DoutPrefixProvider const*, ...)
       /src/rgw/driver/dbstore/common/dbstore.cc:258

  Direct leak of 200 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 SQLInsertBucket::Execute(DoutPrefixProvider const*, rgw::store::DBOpParams*)
       /src/rgw/driver/dbstore/sqlite/sqliteDB.cc:1433
    #2 rgw::store::DB::ProcessOp(DoutPrefixProvider const*, ...)
       /src/rgw/driver/dbstore/common/dbstore.cc:258

  SUMMARY: AddressSanitizer: 460550 byte(s) leaked in 1823 allocation(s).

Root cause: The DB::Destroy() method had an early return when the db
pointer was NULL, preventing cleanup of the objectmap which stores
SQLObjectOp pointers. These objects were allocated during test execution
but never freed.

Changes:
- Modified DB::Destroy() to always clean up objectmap even when db is NULL
- Added explicit delete in objectmapDelete() for consistency
- Added lsan suppression for SQLite internal allocations (indirect leaks)

After the fix, all direct leaks are eliminated. Only indirect leaks from
SQLite's internal memory management remain, which are now suppressed.
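
The shape of the early-return fix can be sketched like this. `MiniDB`, `ObjectOp`, `add_op()` and the `live_ops` counter are hypothetical stand-ins used only to make the example self-contained; they do not mirror the real dbstore classes.

```cpp
#include <cassert>
#include <map>
#include <string>

static int live_ops = 0;  // counts live ObjectOp instances, for demonstration

// Hypothetical stand-in for the per-bucket operation objects.
struct ObjectOp {
  ObjectOp() { ++live_ops; }
  ~ObjectOp() { --live_ops; }
};

class MiniDB {
  void* db_ = nullptr;  // stands in for the database handle; may stay null
  std::map<std::string, ObjectOp*> objectmap_;
public:
  void add_op(const std::string& bucket) {
    auto [it, inserted] = objectmap_.emplace(bucket, nullptr);
    if (inserted) {
      it->second = new ObjectOp;
    }
  }
  void destroy() {
    // Free the map entries first, regardless of whether db_ was ever opened.
    for (auto& entry : objectmap_) {
      delete entry.second;
    }
    objectmap_.clear();
    if (!db_) {
      return;  // nothing else to close
    }
    db_ = nullptr;
  }
};
```

The point of the ordering is that the null-handle check no longer guards the map cleanup, so objects allocated without an open handle are still released.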

Test results:
- Before: 460,550 bytes leaked (including 2 direct leaks of 200 bytes each)
- After: 0 direct leaks, unittest_dbstore_tests passes with ASan

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Matan-B pushed a commit that referenced this pull request Jan 20, 2026
The ExtentMap.reshard_failure test was leaking memory by not properly
cleaning up the OnodeCacheShard and BufferCacheShard objects it created.

ASan reported:
  Direct leak of 9928 byte(s) in 1 object(s) allocated from:
    #1 BlueStore::OnodeCacheShard::create() BlueStore.cc:1221
    #2 ExtentMap_reshard_failure_Test::TestBody() test_bluestore_types.cc:1244

  Direct leak of 224 byte(s) in 1 object(s) allocated from:
    #1 BlueStore::BufferCacheShard::create() BlueStore.cc:1680
    #2 ExtentMap_reshard_failure_Test::TestBody() test_bluestore_types.cc:1246

  SUMMARY: AddressSanitizer: 10288 byte(s) leaked in 8 allocation(s).

Fix by:
1. Wrapping coll and onode in an additional scope block to ensure they
   are destroyed before the cache shards (releasing all blob references)
2. Adding proper cleanup with delete bc and delete oc at test end

This matches the cleanup pattern used in BlueStoreFixture::TearDown().
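
The scoping trick in step 1 can be illustrated with a small sketch. `Cache`, `Entry` and `run_test_body()` are hypothetical names; the reference counter only models the idea that objects pinned in a cache must be gone before the cache itself is deleted.

```cpp
#include <cassert>
#include <memory>

// Hypothetical cache shard that counts how many objects still reference it.
struct Cache {
  int refs = 0;
};

// Hypothetical object that pins itself in a cache for its whole lifetime.
struct Entry {
  Cache* cache;
  explicit Entry(Cache* c) : cache(c) { ++cache->refs; }
  ~Entry() { --cache->refs; }
};

// Returns the number of references still held at the time the cache is deleted.
int run_test_body() {
  Cache* oc = new Cache;
  {
    // Inner scope: everything that references the cache is destroyed at the
    // closing brace, before the cache is deleted below.
    auto onode = std::make_unique<Entry>(oc);
    assert(oc->refs == 1);
  }
  int refs_at_delete = oc->refs;
  delete oc;  // explicit cleanup at the end of the test body
  return refs_at_delete;
}
```

Without the inner braces, `onode` would outlive the `delete` and its destructor would touch freed memory.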

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Matan-B added a commit that referenced this pull request Jan 20, 2026
…yed static"

```
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: =================================================================
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ERROR: AddressSanitizer: stack-overflow on address 0x7b512f6c8dd8 (pc 0x0000046e7a72 bp 0x7b512de7c900 sp 0x7b512f6c8dd8 T0)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #0 0x0000046e7a72 in get_global_options() (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #1 0x0000046e540e in build_options() (/usr/bin/ceph-osd-crimson+0x46e540e) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #2 0x0000033b7949 in get_ceph_options() (/usr/bin/ceph-osd-crimson+0x33b7949) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #3 0x000003440540 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) (/usr/bin/ceph-osd-crimson+0x3440540) (BuildId: 2a860>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #4 0x0000046856a8 in crimson::common::ConfigProxy::ConfigProxy(EntityName const&, std::basic_string_view<char, std::char_traits<char> >) (/usr>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #5 0x000000eb6cb5 in seastar::shared_ptr_count_for<crimson::common::ConfigProxy>::shared_ptr_count_for<EntityName&, std::__cxx11::basic_string>
..
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #40 0x000000ed6434 in seastar::future<int> seastar::futurize<int>::apply<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::ope>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #41 0x000000ed672b in seastar::async<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::operator()() const::{lambda()#1}>(seast>
```

This reverts commit 1ab0a8c.
Matan-B added a commit that referenced this pull request Jan 20, 2026
…yed static"

```
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: =================================================================
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ERROR: AddressSanitizer: stack-overflow on address 0x7b512f6c8dd8 (pc 0x0000046e7a72 bp 0x7b512de7c900 sp 0x7b512f6c8dd8 T0)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #0 0x0000046e7a72 in get_global_options() (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #1 0x0000046e540e in build_options() (/usr/bin/ceph-osd-crimson+0x46e540e) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #2 0x0000033b7949 in get_ceph_options() (/usr/bin/ceph-osd-crimson+0x33b7949) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #3 0x000003440540 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) (/usr/bin/ceph-osd-crimson+0x3440540) (BuildId: 2a860>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #4 0x0000046856a8 in crimson::common::ConfigProxy::ConfigProxy(EntityName const&, std::basic_string_view<char, std::char_traits<char> >) (/usr>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #5 0x000000eb6cb5 in seastar::shared_ptr_count_for<crimson::common::ConfigProxy>::shared_ptr_count_for<EntityName&, std::__cxx11::basic_string>
..
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #40 0x000000ed6434 in seastar::future<int> seastar::futurize<int>::apply<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::ope>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #41 0x000000ed672b in seastar::async<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::operator()() const::{lambda()#1}>(seast>
```

This reverts commit 1ab0a8c.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B added a commit that referenced this pull request Jan 21, 2026
…yed static"

```
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: =================================================================
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ERROR: AddressSanitizer: stack-overflow on address 0x7b512f6c8dd8 (pc 0x0000046e7a72 bp 0x7b512de7c900 sp 0x7b512f6c8dd8 T0)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #0 0x0000046e7a72 in get_global_options() (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #1 0x0000046e540e in build_options() (/usr/bin/ceph-osd-crimson+0x46e540e) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #2 0x0000033b7949 in get_ceph_options() (/usr/bin/ceph-osd-crimson+0x33b7949) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #3 0x000003440540 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) (/usr/bin/ceph-osd-crimson+0x3440540) (BuildId: 2a860>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #4 0x0000046856a8 in crimson::common::ConfigProxy::ConfigProxy(EntityName const&, std::basic_string_view<char, std::char_traits<char> >) (/usr>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #5 0x000000eb6cb5 in seastar::shared_ptr_count_for<crimson::common::ConfigProxy>::shared_ptr_count_for<EntityName&, std::__cxx11::basic_string>
..
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #40 0x000000ed6434 in seastar::future<int> seastar::futurize<int>::apply<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::ope>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #41 0x000000ed672b in seastar::async<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::operator()() const::{lambda()#1}>(seast>
```

This reverts commit 1ab0a8c.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B added a commit that referenced this pull request Jan 21, 2026
…yed static"

```
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: =================================================================
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ERROR: AddressSanitizer: stack-overflow on address 0x7b512f6c8dd8 (pc 0x0000046e7a72 bp 0x7b512de7c900 sp 0x7b512f6c8dd8 T0)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #0 0x0000046e7a72 in get_global_options() (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #1 0x0000046e540e in build_options() (/usr/bin/ceph-osd-crimson+0x46e540e) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #2 0x0000033b7949 in get_ceph_options() (/usr/bin/ceph-osd-crimson+0x33b7949) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #3 0x000003440540 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) (/usr/bin/ceph-osd-crimson+0x3440540) (BuildId: 2a860>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #4 0x0000046856a8 in crimson::common::ConfigProxy::ConfigProxy(EntityName const&, std::basic_string_view<char, std::char_traits<char> >) (/usr>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #5 0x000000eb6cb5 in seastar::shared_ptr_count_for<crimson::common::ConfigProxy>::shared_ptr_count_for<EntityName&, std::__cxx11::basic_string>
..
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #40 0x000000ed6434 in seastar::future<int> seastar::futurize<int>::apply<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::ope>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]:     #41 0x000000ed672b in seastar::async<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::operator()() const::{lambda()#1}>(seast>
```

This reverts commit 1ab0a8c.

Fixes: https://tracker.ceph.com/issues/74481

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan-B pushed a commit that referenced this pull request Feb 9, 2026
Fix memory leak detected by AddressSanitizer in unittest_http_manager.
The test was failing with ASan enabled due to rgw_http_req_data objects
not being properly cleaned up when the HTTP manager thread exits.

ASan reported the following leaks:

  Direct leak of 17152 byte(s) in 32 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 RGWHTTPManager::add_request(RGWHTTPClient*)
       /ceph/src/rgw/rgw_http_client.cc:946:33
    #2 HTTPManager_SignalThread_Test::TestBody()
       /ceph/src/test/rgw/test_http_manager.cc:132:10

  Indirect leak of 768 byte(s) in 32 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 rgw_http_req_data::rgw_http_req_data()
       /ceph/src/rgw/rgw_http_client.cc:52:22
    #2 RGWHTTPManager::add_request(RGWHTTPClient*)
       /ceph/src/rgw/rgw_http_client.cc:946:37

  SUMMARY: AddressSanitizer: 17920 byte(s) leaked in 64 allocation(s).

Root cause: The rgw_http_req_data class uses reference counting
(inherits from RefCountedObject). When a request is unregistered,
unregister_request() calls get() to increment the refcount, expecting
a corresponding put() to be called later.

In manage_pending_requests(), unregistered requests are properly
handled with both _unlink_request() and put(). However, in the thread
cleanup code (reqs_thread_entry exit path), only _unlink_request() was
called without the matching put(), causing a reference count leak.

The fix adds the missing put() call in the thread cleanup code to match
the reference counting pattern used in manage_pending_requests().
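
The get()/put() pairing described above can be illustrated with a minimal
sketch. RefCounted and live_after_cleanup() are hypothetical names, and the
ownership model (one reference held at registration, one dropped by the
unlink step) is an assumption made for illustration, not the actual
rgw_http_req_data or RefCountedObject code.

```cpp
#include <cassert>

// Hypothetical stand-in for a RefCountedObject-style class; not Ceph's API.
struct RefCounted {
  static int live;   // number of objects currently alive
  int nref = 1;      // the creator holds the first reference
  RefCounted() { ++live; }
  ~RefCounted() { --live; }
  void get() { ++nref; }
  void put() {
    assert(nref > 0);
    if (--nref == 0) delete this;
  }
};
int RefCounted::live = 0;

// Returns how many objects are still alive after the thread-exit cleanup.
int live_after_cleanup(bool with_matching_put) {
  RefCounted* req = new RefCounted;  // registered request, nref == 1
  req->get();                        // unregister step takes an extra ref, nref == 2
  req->put();                        // unlink step drops one ref (assumed), nref == 1
  if (with_matching_put) {
    req->put();                      // the matching put(): nref == 0, object freed
  }
  const int remaining = RefCounted::live;
  if (!with_matching_put) {
    req->put();                      // release it so the sketch itself does not leak
  }
  return remaining;
}
```

Under these assumptions, live_after_cleanup(false) reports one surviving
object (the leak), while live_after_cleanup(true) reports none.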

Test results:
- Before: 17,920 bytes leaked in 64 allocations
- After: 0 leaks, unittest_http_manager passes with ASan

Fixes: https://tracker.ceph.com/issues/74762
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Matan-B pushed a commit that referenced this pull request Feb 22, 2026
This commit fixes a critical cache key collision bug in the ISA erasure
code plugin that could lead to heap-buffer-overflow and silent data
corruption.

Problem:
--------
The decoding table cache was indexed only by matrix type and erasure
signature (available/missing chunk pattern), but did NOT include the
(k,m) erasure code configuration parameters. This caused different EC
configurations with similar erasure patterns to collide in the cache,
leading to incorrectly-sized cached buffers being reused.

AddressSanitizer Report:
------------------------
==4904==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x5160001397b8 at pc 0x5de8e415296b bp 0x7ffc82260310 sp 0x7ffc8225fad0
READ of size 576 at 0x5160001397b8 thread T0
    #0 __asan_memcpy
    #1 ErasureCodeIsaTableCache::getDecodingTableFromCache()
       .../ErasureCodeIsaTableCache.cc:260:5
    #2 ErasureCodeIsaDefault::isa_decode()
       .../ErasureCodeIsa.cc:490:15

0x5160001397b8 is located 0 bytes after 568-byte region
[0x516000139580,0x5160001397b8) allocated by:
    #0 posix_memalign
    #1 ceph::buffer::raw_combined::alloc_data_n_controlblock()
    #2 ErasureCodeIsaTableCache::putDecodingTableToCache()
       .../ErasureCodeIsaTableCache.cc:319:18

Root Cause:
-----------
Scenario illustrating the bug:
1. First decode operation: k=2, m=1, erasure pattern "+0+2-1"
   - Creates cache entry with key "+0+2-1"
   - Allocates buffer: 2*(1+2)*32 = 192 bytes
2. Second decode operation: k=3, m=3, same erasure pattern "+0+2-1"
   - Looks up cache with key "+0+2-1" → COLLISION
   - Retrieves 192-byte buffer but needs 3*(3+3)*32 = 576 bytes
   - Result: Heap-buffer-overflow (reads 384 bytes beyond allocation)

Worse scenario (silent corruption):
1. First decode: k=3, m=3 → caches 576-byte table
2. Second decode: k=2, m=1 → retrieves wrong table
   - Uses incorrect decoding matrix
   - Result: Silent data corruption during recovery
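
The arithmetic in both scenarios can be checked with a small sketch. The
size formula k*(k+m)*32 is taken from the figures above; the function names
are hypothetical and do not come from the ISA plugin.

```cpp
#include <cassert>
#include <cstddef>

// Decoding table size in bytes, using the k*(k+m)*32 formula quoted above.
inline std::size_t decoding_table_size(std::size_t k, std::size_t m) {
  assert(k > 0 && m > 0);
  return k * (k + m) * 32;
}

// Bytes read past the end of a cached table when a (k, m) decode reuses an
// entry that was created for (cached_k, cached_m). Zero means no over-read,
// which is the silent-corruption case: the read fits, but the table is wrong.
inline std::size_t over_read_bytes(std::size_t cached_k, std::size_t cached_m,
                                   std::size_t k, std::size_t m) {
  const std::size_t cached = decoding_table_size(cached_k, cached_m);
  const std::size_t needed = decoding_table_size(k, m);
  return needed > cached ? needed - cached : 0;
}
```

With a (2,1) entry cached and a (3,3) decode, over_read_bytes(2, 1, 3, 3)
gives 384; in the reverse order it gives 0, so nothing crashes and the wrong
table is used unnoticed.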

Solution:
---------
Include the k and m parameters in the cache signature, so that entries from
different (k,m) configurations can no longer collide:
 - Old format: "+0+2+3-1-4"
 - New format: "k3m2a+0+2+3e-1-4"
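
A minimal sketch of how such signatures could be built, assuming "a" introduces
the available chunks and "e" the erased ones (an interpretation of the example
string above). old_signature() and new_signature() are hypothetical helpers,
not the plugin's functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Old-style key: only the erasure pattern, e.g. "+0+2+3-1-4".
std::string old_signature(const std::vector<int>& available,
                          const std::vector<int>& erased) {
  std::string sig;
  for (int c : available) sig += "+" + std::to_string(c);
  for (int c : erased) sig += "-" + std::to_string(c);
  return sig;
}

// New-style key: the pattern prefixed with k and m, e.g. "k3m2a+0+2+3e-1-4".
std::string new_signature(int k, int m,
                          const std::vector<int>& available,
                          const std::vector<int>& erased) {
  assert(k > 0 && m > 0);
  std::string sig = "k" + std::to_string(k) + "m" + std::to_string(m) + "a";
  for (int c : available) sig += "+" + std::to_string(c);
  sig += "e";
  for (int c : erased) sig += "-" + std::to_string(c);
  return sig;
}
```

For the pattern from the root-cause scenario, the old key is "+0+2-1" for both
configurations, while the new keys are "k2m1a+0+2e-1" and "k3m3a+0+2e-1".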

Test Fix:
---------
Also fixes a buffer overflow in TestErasureCodePlugins.cc where
hashes_bl offset was calculated using chunk_size instead of sizeof(uint32_t),
causing reads beyond the CRC buffer.
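
The offset rule can be sketched as follows, assuming the CRC buffer holds one
packed uint32_t per chunk. read_crc() and pack_crcs() are hypothetical helpers
for illustration; they are not code from TestErasureCodePlugins.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack CRC values back to back, one uint32_t per chunk.
std::vector<unsigned char> pack_crcs(const std::vector<std::uint32_t>& crcs) {
  std::vector<unsigned char> buf(crcs.size() * sizeof(std::uint32_t));
  if (!crcs.empty()) std::memcpy(buf.data(), crcs.data(), buf.size());
  return buf;
}

// Read the CRC of chunk i. The stride is sizeof(uint32_t); using chunk_size
// as the stride would step far past the end of the packed buffer.
bool read_crc(const std::vector<unsigned char>& hashes, std::size_t i,
              std::uint32_t* out) {
  assert(out != nullptr);
  const std::size_t off = i * sizeof(std::uint32_t);
  if (off + sizeof(std::uint32_t) > hashes.size()) return false;  // out of range
  std::memcpy(out, hashes.data() + off, sizeof(std::uint32_t));
  return true;
}
```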

Production Impact:
------------------

Backward Compatibility: FULLY COMPATIBLE
- Cache is ephemeral (in-memory only, not persisted)
- Cache cleared on process restart
- Rolling upgrades safe - each OSD restart gets fixed code
- Old cache entries automatically invalidated on upgrade
- No wire protocol or on-disk format changes
- No configuration changes required
- No breaking changes

Data Integrity:
- Eliminates silent data corruption risk
- Eliminates heap-buffer-overflow crashes
- Cache now correctly isolated by (k,m) configuration
- Correct decoding tables always used for recovery
- No risk of corrupting user data from the fix itself

Why Users Haven't Complained:
------------------------------

Several factors likely prevented widespread reports:

1. Low probability conditions required:
   - Need multiple EC pools with DIFFERENT (k,m) configurations
   - Need similar erasure patterns across pools
   - Need cache collision to occur during actual recovery operations
   - Recovery operations are relatively rare in healthy clusters

2. Crash vs silent corruption detection:
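   - Buffer overflows (easier to detect) occur when the second configuration
     needs a larger table than the cached one (k2,m2 > k1,m1)
   - Silent corruption (harder to detect) occurs when it needs a smaller
     table than the cached one (k2,m2 < k1,m1)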
   - Crashes might be attributed to other causes
   - Data corruption only detected during scrub or data verification

3. Common deployment patterns:
   - Many deployments use single EC configuration cluster-wide
   - Default EC configurations (k=2,m=1 or k=4,m=2) reduce collision space
   - Erasure pattern variety may be insufficient for collisions

4. ISA plugin usage:
   - Not universally deployed (requires Intel ISA-L library)
   - Some sites use jerasure plugin instead
   - Plugin selection depends on hardware and configuration

5. Detection difficulty:
   - ASan not enabled in production builds
   - Silent corruption only appears during:
     * Degraded reads with recovery
     * Scrub operations
     * Deep-scrub verification
   - Corrupted data might not be immediately accessed

Fixes: https://tracker.ceph.com/issues/74382

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Matan-B pushed a commit that referenced this pull request Mar 2, 2026
Fix memory leak in librbd persistent write log (PWL) cache discard
operations by properly completing request objects.

ASan reported the following leaks in unittest_librbd:

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
       /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
    #2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody()
       /ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7

  Plus multiple indirect leaks totaling 2,076 bytes through the
  shared_ptr reference chain.

Root cause:

C_DiscardRequest objects were never deleted because their complete()
method was never called. The on_write_persist callback released the
BlockGuard cell but didn't call complete() to trigger self-deletion.

Write requests use WriteLogOperationSet which takes the request as
its on_finish callback, ensuring complete() is eventually called.
Discard requests don't use WriteLogOperationSet and must explicitly
call complete() in their on_write_persist callback.

Solution:

Call discard_req->complete(r) in the on_write_persist callback and
move cell release into finish_req() -- mirroring how C_WriteRequest
handles it. The complete() -> finish() -> finish_req() chain ensures
the cell is released after the user request is completed, preserving
the same ordering as write requests.
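
The complete() -> finish() -> finish_req() chain can be sketched with a
simplified self-deleting completion. Completion, DiscardRequestSketch and
run_on_write_persist() are hypothetical names loosely modeled on the pattern
described above; they are not the librbd classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Simplified self-deleting completion: complete() runs finish() and then
// deletes the object. If complete() is never called, the object leaks.
class Completion {
 public:
  virtual ~Completion() = default;
  void complete(int r) {
    finish(r);
    delete this;
  }
 protected:
  virtual void finish(int r) = 0;
};

class DiscardRequestSketch : public Completion {
 public:
  explicit DiscardRequestSketch(Log* log) : log_(log) { assert(log != nullptr); }
  ~DiscardRequestSketch() override { log_->push_back("request deleted"); }
 protected:
  void finish(int r) override {
    log_->push_back(r == 0 ? "user request completed" : "user request failed");
    finish_req(r);  // the cell is released only after the user request is done
  }
 private:
  void finish_req(int) { log_->push_back("cell released"); }
  Log* log_;
};

// Stand-in for the on_write_persist callback: it only calls complete(r).
Log run_on_write_persist(int r) {
  Log log;
  auto* req = new DiscardRequestSketch(&log);
  req->complete(r);
  return log;
}
```

In this sketch the log records the user completion first, then the cell
release, then the deletion, which is the ordering the fix aims to preserve.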

Test results:
- Before: 2,316 bytes leaked in 15 allocations
- After: 0 bytes leaked
- unittest_librbd discard tests pass successfully with ASan

Fixes: https://tracker.ceph.com/issues/74972
Signed-off-by: Kefu Chai <k.chai@proxmox.com>