Skip to content

osd/ECExtentCache: prevent use-after-free by checking Object existence#64062

Closed
tchaikov wants to merge 1 commit intoceph:mainfrom
tchaikov:wip-ec-cache-fix-use-after-free
Closed

osd/ECExtentCache: prevent use-after-free by checking Object existence#64062
tchaikov wants to merge 1 commit intoceph:mainfrom
tchaikov:wip-ec-cache-fix-use-after-free

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Jun 20, 2025

Prevent use-after-free errors when clearing cache lines by preserving member variables and verifying Object existence before cleanup.

When clearing the last cache line referencing an Object, the Object may be automatically removed from the cache and deleted. Calling delete_maybe() on a deleted Object would result in a use-after-free error.

This change:

  • Preserves necessary member variables on the stack before lines.clear()
  • Queries the cache with the Object's oid to check it still exists
  • Only calls delete_maybe() if the Object is still present in cache

This ensures safe cleanup by avoiding operations on potentially deleted Objects while maintaining proper cache management.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@tchaikov tchaikov requested a review from a team as a code owner June 20, 2025 09:58
@github-actions github-actions bot added the core label Jun 20, 2025
@tchaikov
Copy link
Contributor Author

please see the attached report from ASan:

[ RUN      ] ECExtentCache.double_write_done
=================================================================
==438962==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c9a7a1e36e0 at pc 0x559f62a8af27 bp 0x7ffdd92c74d0 sp 0x7ffdd92c74c0
READ of size 8 at 0x7c9a7a1e36e0 thread T0
    #0 0x559f62a8af26 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::weak_ptr<ECExtentCache::Line> >, std::_Select1st<std::pair<unsigned long const, std::weak_ptr<ECExtentCache::Line> > >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::weak_ptr<ECExtentCache::Line> > > >::empty() const /usr/include/c++/15.1.1/bits/stl_tree.h:1653
    #1 0x559f62a81465 in std::map<unsigned long, std::weak_ptr<ECExtentCache::Line>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::weak_ptr<ECExtentCache::Line> > > >::empty() const /usr/include/c++/15.1.1/bits/stl_map.h:501
    #2 0x559f62a717fb in ECExtentCache::Object::delete_maybe() const /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:159
    #3 0x559f62a717d9 in ECExtentCache::Object::unpin(ECExtentCache::Op&) const /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:155
    #4 0x559f62a7309f in ECExtentCache::Op::~Op() /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:291
    #5 0x559f62ad2d36 in void std::destroy_at<ECExtentCache::Op>(ECExtentCache::Op*) /usr/include/c++/15.1.1/bits/stl_construct.h:88
    #6 0x559f62ad2ce5 in void std::_Destroy<ECExtentCache::Op>(ECExtentCache::Op*) /usr/include/c++/15.1.1/bits/stl_construct.h:164
    #7 0x559f62ad2589 in void std::allocator_traits<std::allocator<void> >::destroy<ECExtentCache::Op>(std::allocator<void>&, ECExtentCache::Op*) /usr/include/c++/15.1.1/bits/alloc_traits.h:819
    #8 0x559f62ad2589 in std::_Sp_counted_ptr_inplace<ECExtentCache::Op, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:615
    #9 0x559f62945c81 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (/home/kefu/dev/ceph/build/bin/unittest_extent_cache+0x319c81) (BuildId: 5cf6ae960585895dd6571b2ac20297385395c27e)
    #10 0x559f62951bd7 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:1069
    #11 0x559f629508b1 in std::__shared_ptr<ECExtentCache::Op, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:1531
    #12 0x559f629508cd in std::shared_ptr<ECExtentCache::Op>::~shared_ptr() /usr/include/c++/15.1.1/bits/shared_ptr.h:175
    #13 0x559f62965292 in std::_Optional_payload_base<std::shared_ptr<ECExtentCache::Op> >::_M_destroy() /usr/include/c++/15.1.1/optional:307
    #14 0x559f6295fdb4 in std::_Optional_payload_base<std::shared_ptr<ECExtentCache::Op> >::_M_reset() /usr/include/c++/15.1.1/optional:338
    #15 0x559f62957415 in std::_Optional_payload<std::shared_ptr<ECExtentCache::Op>, false, false, false>::~_Optional_payload() /usr/include/c++/15.1.1/optional:461
    #16 0x559f629508e9 in std::_Optional_base<std::shared_ptr<ECExtentCache::Op>, false, false>::~_Optional_base() /usr/include/c++/15.1.1/optional:485
    #17 0x559f62950905 in std::optional<std::shared_ptr<ECExtentCache::Op> >::~optional() /usr/include/c++/15.1.1/optional:778
    #18 0x559f6291886f in ECExtentCache_double_write_done_Test::TestBody() /home/kefu/dev/ceph/src/test/osd/test_extent_cache.cc:124
    #19 0x559f62a388bd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #20 0x559f62a23bd4 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #21 0x559f629cfca2 in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
    #22 0x559f629d1201 in testing::TestInfo::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2874
    #23 0x559f629d23aa in testing::TestSuite::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:3052
    #24 0x559f629f380d in testing::internal::UnitTestImpl::RunAllTests() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:6004
    #25 0x559f62a3c838 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #26 0x559f62a267b0 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #27 0x559f629f0057 in testing::UnitTest::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:5583
    #28 0x559f62979ae2 in RUN_ALL_TESTS() /home/kefu/dev/ceph/src/googletest/googletest/include/gtest/gtest.h:2334
    #29 0x559f62979a3a in main /home/kefu/dev/ceph/src/googletest/googlemock/src/gmock_main.cc:71
    #30 0x7f2a7d8376b4  (/usr/lib/libc.so.6+0x276b4) (BuildId: 468e3585c794491a48ea75fceb9e4d6b1464fc35)
    #31 0x7f2a7d837768 in __libc_start_main (/usr/lib/libc.so.6+0x27768) (BuildId: 468e3585c794491a48ea75fceb9e4d6b1464fc35)
    #32 0x559f629166e4 in _start (/home/kefu/dev/ceph/build/bin/unittest_extent_cache+0x2ea6e4) (BuildId: 5cf6ae960585895dd6571b2ac20297385395c27e)

0x7c9a7a1e36e0 is located 480 bytes inside of 656-byte region [0x7c9a7a1e3500,0x7c9a7a1e3790)
freed by thread T0 here:
    #0 0x7f2a83122b8d in operator delete(void*, unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:155
    #1 0x559f62973f41 in std::__new_allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > >::deallocate(std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >*, unsigned long) /usr/include/c++/15.1.1/bits/new_allocator.h:172
    #2 0x559f62969a33 in std::allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > >::deallocate(std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >*, unsigned long) /usr/include/c++/15.1.1/bits/allocator.h:215
    #3 0x559f62969a33 in std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > > >::deallocate(std::allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > >&, std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >*, unsigned long) /usr/include/c++/15.1.1/bits/alloc_traits.h:649
    #4 0x559f62969a33 in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_put_node(std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >*) /usr/include/c++/15.1.1/bits/stl_tree.h:1191
    #5 0x559f629640d3 in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_drop_node(std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >*) /usr/include/c++/15.1.1/bits/stl_tree.h:1274
    #6 0x559f6295e57c in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_erase(std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >*) /usr/include/c++/15.1.1/bits/stl_tree.h:2590
    #7 0x559f62a9d5eb in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::clear() /usr/include/c++/15.1.1/bits/stl_tree.h:1878
    #8 0x559f62a941db in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<hobject_t const, ECExtentCache::Object> >, std::_Rb_tree_const_iterator<std::pair<hobject_t const, ECExtentCache::Object> >) /usr/include/c++/15.1.1/bits/stl_tree.h:3127
    #9 0x559f62a8b0cb in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::erase(hobject_t const&) /usr/include/c++/15.1.1/bits/stl_tree.h:3141
    #10 0x559f62a8148a in std::map<hobject_t, ECExtentCache::Object, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::erase(hobject_t const&) /usr/include/c++/15.1.1/bits/stl_map.h:1159
    #11 0x559f62a71892 in ECExtentCache::Object::delete_maybe() const /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:160
    #12 0x559f62a71ccc in ECExtentCache::Object::erase_line(unsigned long) /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:175
    #13 0x559f62a7b81d in ECExtentCache::Line::~Line() /home/kefu/dev/ceph/src/osd/ECExtentCache.h:299
    #14 0x559f62ad2d51 in void std::destroy_at<ECExtentCache::Line>(ECExtentCache::Line*) /usr/include/c++/15.1.1/bits/stl_construct.h:88
    #15 0x559f62ad2d00 in void std::_Destroy<ECExtentCache::Line>(ECExtentCache::Line*) /usr/include/c++/15.1.1/bits/stl_construct.h:164
    #16 0x559f62ad2819 in void std::allocator_traits<std::allocator<void> >::destroy<ECExtentCache::Line>(std::allocator<void>&, ECExtentCache::Line*) /usr/include/c++/15.1.1/bits/alloc_traits.h:819
    #17 0x559f62ad2819 in std::_Sp_counted_ptr_inplace<ECExtentCache::Line, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:615
    #18 0x559f629585ad in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:174
    #19 0x559f62951b8b in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use_cold() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:198
    #20 0x559f62945e08 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (/home/kefu/dev/ceph/build/bin/unittest_extent_cache+0x319e08) (BuildId: 5cf6ae960585895dd6571b2ac20297385395c27e)
    #21 0x559f62951bd7 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:1069
    #22 0x559f62a7b8a1 in std::__shared_ptr<ECExtentCache::Line, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:1531
    #23 0x559f62a7b8bd in std::shared_ptr<ECExtentCache::Line>::~shared_ptr() /usr/include/c++/15.1.1/bits/shared_ptr.h:175
    #24 0x559f62aa4ef7 in void std::destroy_at<std::shared_ptr<ECExtentCache::Line> >(std::shared_ptr<ECExtentCache::Line>*) /usr/include/c++/15.1.1/bits/stl_construct.h:88
    #25 0x559f62a93b3d in void std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<ECExtentCache::Line> > > >::destroy<std::shared_ptr<ECExtentCache::Line> >(std::allocator<std::_List_node<std::shared_ptr<ECExtentCache::Line> > >&, std::shared_ptr<ECExtentCache::Line>*) /usr/include/c++/15.1.1/bits/alloc_traits.h:698
    #26 0x559f62a93b3d in std::__cxx11::_List_base<std::shared_ptr<ECExtentCache::Line>, std::allocator<std::shared_ptr<ECExtentCache::Line> > >::_M_destroy_node(std::_List_node<std::shared_ptr<ECExtentCache::Line> >*) /usr/include/c++/15.1.1/bits/stl_list.h:845
    #27 0x559f62a8aebd in std::__cxx11::_List_base<std::shared_ptr<ECExtentCache::Line>, std::allocator<std::shared_ptr<ECExtentCache::Line> > >::_M_clear() /usr/include/c++/15.1.1/bits/list.tcc:76
    #28 0x559f62a8143d in std::__cxx11::list<std::shared_ptr<ECExtentCache::Line>, std::allocator<std::shared_ptr<ECExtentCache::Line> > >::clear() /usr/include/c++/15.1.1/bits/stl_list.h:2084
    #29 0x559f62a717cd in ECExtentCache::Object::unpin(ECExtentCache::Op&) const /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:154
    #30 0x559f62a7309f in ECExtentCache::Op::~Op() /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:291
    #31 0x559f62ad2d36 in void std::destroy_at<ECExtentCache::Op>(ECExtentCache::Op*) /usr/include/c++/15.1.1/bits/stl_construct.h:88
    #32 0x559f62ad2ce5 in void std::_Destroy<ECExtentCache::Op>(ECExtentCache::Op*) /usr/include/c++/15.1.1/bits/stl_construct.h:164
    #33 0x559f62ad2589 in void std::allocator_traits<std::allocator<void> >::destroy<ECExtentCache::Op>(std::allocator<void>&, ECExtentCache::Op*) /usr/include/c++/15.1.1/bits/alloc_traits.h:819
    #34 0x559f62ad2589 in std::_Sp_counted_ptr_inplace<ECExtentCache::Op, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/15.1.1/bits/shared_ptr_base.h:615

previously allocated by thread T0 here:
    #0 0x7f2a83121a2d in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
    #1 0x559f62ab25a8 in std::__new_allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > >::allocate(unsigned long, void const*) /usr/include/c++/15.1.1/bits/new_allocator.h:151
    #2 0x559f62aa5124 in std::allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > >::allocate(unsigned long) /usr/include/c++/15.1.1/bits/allocator.h:203
    #3 0x559f62aa5124 in std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > > >::allocate(std::allocator<std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> > >&, unsigned long) /usr/include/c++/15.1.1/bits/alloc_traits.h:614
    #4 0x559f62aa5124 in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_get_node() /usr/include/c++/15.1.1/bits/stl_tree.h:1170
    #5 0x559f62a9e9a3 in std::_Rb_tree_node<std::pair<hobject_t const, ECExtentCache::Object> >* std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_create_node<hobject_t const&, ECExtentCache::Object>(hobject_t const&, ECExtentCache::Object&&) /usr/include/c++/15.1.1/bits/stl_tree.h:1253
    #6 0x559f62a95098 in std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_Auto_node::_Auto_node<hobject_t const&, ECExtentCache::Object>(std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >&, hobject_t const&, ECExtentCache::Object&&) /usr/include/c++/15.1.1/bits/stl_tree.h:2285
    #7 0x559f62a94df2 in std::_Rb_tree_iterator<std::pair<hobject_t const, ECExtentCache::Object> > std::_Rb_tree<hobject_t, std::pair<hobject_t const, ECExtentCache::Object>, std::_Select1st<std::pair<hobject_t const, ECExtentCache::Object> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::_M_emplace_hint_unique<hobject_t const&, ECExtentCache::Object>(std::_Rb_tree_const_iterator<std::pair<hobject_t const, ECExtentCache::Object> >, hobject_t const&, ECExtentCache::Object&&) /usr/include/c++/15.1.1/bits/stl_tree.h:3084
    #8 0x559f62a8bd52 in std::_Rb_tree_iterator<std::pair<hobject_t const, ECExtentCache::Object> > std::map<hobject_t, ECExtentCache::Object, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::emplace_hint<hobject_t const&, ECExtentCache::Object>(std::_Rb_tree_const_iterator<std::pair<hobject_t const, ECExtentCache::Object> >, hobject_t const&, ECExtentCache::Object&&) /usr/include/c++/15.1.1/bits/stl_map.h:663
    #9 0x559f62a8209f in std::pair<std::_Rb_tree_iterator<std::pair<hobject_t const, ECExtentCache::Object> >, bool> std::map<hobject_t, ECExtentCache::Object, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, ECExtentCache::Object> > >::emplace<hobject_t const&, ECExtentCache::Object>(hobject_t const&, ECExtentCache::Object&&) /usr/include/c++/15.1.1/bits/stl_map.h:624
    #10 0x559f62a72b6b in ECExtentCache::prepare(std::unique_ptr<GenContext<std::shared_ptr<ECExtentCache::Op>&>, std::default_delete<GenContext<std::shared_ptr<ECExtentCache::Op>&> > >&&, hobject_t const&, std::optional<ECUtil::shard_extent_set_t> const&, ECUtil::shard_extent_set_t const&, unsigned long, unsigned long, bool) /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:255
    #11 0x559f6293c023 in prepare<ECExtentCache_double_write_done_Test::TestBody()::<lambda(ECExtentCache::OpRef&)> > /home/kefu/dev/ceph/src/osd/ECExtentCache.h:369
    #12 0x559f62918778 in ECExtentCache_double_write_done_Test::TestBody() /home/kefu/dev/ceph/src/test/osd/test_extent_cache.cc:117
    #13 0x559f62a388bd in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #14 0x559f62a23bd4 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #15 0x559f629cfca2 in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
    #16 0x559f629d1201 in testing::TestInfo::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2874
    #17 0x559f629d23aa in testing::TestSuite::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:3052
    #18 0x559f629f380d in testing::internal::UnitTestImpl::RunAllTests() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:6004
    #19 0x559f62a3c838 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
    #20 0x559f62a267b0 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
    #21 0x559f629f0057 in testing::UnitTest::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:5583
    #22 0x559f62979ae2 in RUN_ALL_TESTS() /home/kefu/dev/ceph/src/googletest/googletest/include/gtest/gtest.h:2334
    #23 0x559f62979a3a in main /home/kefu/dev/ceph/src/googletest/googlemock/src/gmock_main.cc:71
    #24 0x7f2a7d8376b4  (/usr/lib/libc.so.6+0x276b4) (BuildId: 468e3585c794491a48ea75fceb9e4d6b1464fc35)
    #25 0x7f2a7d837768 in __libc_start_main (/usr/lib/libc.so.6+0x27768) (BuildId: 468e3585c794491a48ea75fceb9e4d6b1464fc35)
    #26 0x559f629166e4 in _start (/home/kefu/dev/ceph/build/bin/unittest_extent_cache+0x2ea6e4) (BuildId: 5cf6ae960585895dd6571b2ac20297385395c27e)

SUMMARY: AddressSanitizer: heap-use-after-free /home/kefu/dev/ceph/src/osd/ECExtentCache.cc:159 in ECExtentCache::Object::delete_maybe() const

@tchaikov tchaikov requested a review from aainscow June 20, 2025 09:59
@tchaikov
Copy link
Contributor Author

@aainscow hi Alex, could you please help review this change?

@tchaikov tchaikov mentioned this pull request Jun 20, 2025
14 tasks
Copy link
Contributor

@aainscow aainscow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for now, happy to have a discussion on this.

delete_maybe();
if (pg.objects.contains(oid)) {
delete_maybe();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a fix for this in #63873. You don't need the if clause, it is good enough to simply delete the delete_maybe() as the delete always happens at a higher level.

Copy link
Contributor Author

@tchaikov tchaikov Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aainscow as i explained in details in the commit message, we have to check for its existence. otherwise i still have the use-after-free failure.

regarding removing the delete_maybe(), i tried. but it failed a test. some object was not removed at the end of test.

i will paste the error messages from the failing test once i have access to my linux box.

Copy link
Contributor Author

@tchaikov tchaikov Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aainscow this is the failing test if we drop delete_maybe() call in unpin().

226: [ RUN      ] ECExtentCache.test_invalidate
226: /home/kefu/dev/ceph/src/osd/ECExtentCache.cc: In function 'void ECExtentCache::on_change2() const' thread 7f3e21c00200 time 2025-06-21T06:43:37.436513+0800
226: /home/kefu/dev/ceph/src/osd/ECExtentCache.cc: 335: FAILED ceph_assert(objects.empty())
226:  ceph version Development (no_version) tentacle (dev - Debug)
226:  1: (ceph::ClibBackTrace::ClibBackTrace(int)+0x121) [0x7f3e1f4626b9]
226:  2: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x443) [0x7f3e1f45dc85]
226:  3: (ECExtentCache::on_change2() const+0x76) [0x55a174bc9550]
226:  4: (ECExtentCache_test_invalidate_Test::TestBody()+0x1f0a) [0x55a174a8950e]
226:  5: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0xa4) [0x55a174b8e8be]
226:  6: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0xfe) [0x55a174b79bd5]
226:  7: (testing::Test::Run()+0x173) [0x55a174b25ca3]
226:  8: (testing::TestInfo::Run()+0x396) [0x55a174b27202]
226:  9: (testing::TestSuite::Run()+0x367) [0x55a174b283ab]
226:  10: (testing::internal::UnitTestImpl::RunAllTests()+0x8c8) [0x55a174b4980e]
226:  11: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0xa4) [0x55a174b92839]
226:  12: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0xfe) [0x55a174b7c7b1]
226:  13: (testing::UnitTest::Run()+0x178) [0x55a174b46058]
226:  14: (RUN_ALL_TESTS()+0x11) [0x55a174acfae3]
226:  15: main()
226:  16: /usr/lib/libc.so.6(+0x276b5) [0x7f3e1c8376b5]
226:  17: __libc_start_main()
226:  18: _start()
2/2 Test #226: unittest_extent_cache ............Subprocess aborted***Exception:   0.09 sec

or, probably what you suggest was to remove it from Object::erase_line() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am investigating the failure on my branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this same failure in my branch, which does contain other fixes to the extent cache. You may well be hitting another failure.

I ran valgrind against the same test you were running and I don't see any use-after-free errors in my code anymore.

Can we wait until my PR has gone in and address this then?

Copy link
Contributor Author

@tchaikov tchaikov Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this same failure in my branch, which does contain other fixes to the extent cache. You may well be hitting another failure.

I ran valgrind against the same test you were running and I don't see any use-after-free errors in my code anymore.

i am using ASan. probably, as you suggest, some other fixes already address it.

Can we wait until my PR has gone in and address this then?

sure. this PR is not a blocker at this moment. #56537 is pending on some other PRs as well.

Prevent use-after-free errors when clearing cache lines by preserving
member variables and verifying Object existence before cleanup.

When clearing the last cache line referencing an Object, the Object
may be automatically removed from the cache and deleted. Calling
delete_maybe() on a deleted Object would result in a use-after-free
error.

This change:
- Preserves necessary member variables on the stack before lines.clear()
- Queries the cache with the Object's oid to check it still exists
- Only calls delete_maybe() if the Object is still present in cache

This ensures safe cleanup by avoiding operations on potentially
deleted Objects while maintaining proper cache management.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
@tchaikov tchaikov force-pushed the wip-ec-cache-fix-use-after-free branch from caf9014 to 9a6426e Compare June 28, 2025 13:39
@tchaikov
Copy link
Contributor Author

closing in favor of #63873.

@tchaikov tchaikov closed this Jul 16, 2025
@tchaikov tchaikov deleted the wip-ec-cache-fix-use-after-free branch July 16, 2025 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants