Skip to content

Fix incursive deadlock in ~Snapshot(#2277)#2274

Merged
JaySon-Huang merged 2 commits intopingcap:release-5.0from
JaySon-Huang:try_fix_race
Jun 30, 2021
Merged

Fix incursive deadlock in ~Snapshot(#2277)#2274
JaySon-Huang merged 2 commits intopingcap:release-5.0from
JaySon-Huang:try_fix_race

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Jun 30, 2021

cherry-pick of #2277 to release-5.0

What problem does this PR solve?

Issue Number: close #2249

Problem Summary:
Running ./dbms/src/Storages/Page/tests/page_stress_page_storage "" 120 8 1024, it will create 8 writing threads and 1024 reading threads.

It will throw the same exception as #2249. The backtrace is[1].
We can see that inside ~Snapshot, it calls removeExpiredSnapshots and release another snapshot. It cause incursive deadlock on vset->read_write_mutex.
#2229 Add some logic in removeExpiredSnapshots and make it more likely to run into this case.

[1] backtrace

(gdb) bt
#0  0x00007ffff67da387 in raise () from /lib64/libc.so.6
#1  0x00007ffff67dba78 in abort () from /lib64/libc.so.6
#2  0x00000000006ff9d5 in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x0000000000682226 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x0000000000683329 in __cxa_call_terminate (ue_header=ue_header@entry=0x7ffd262cf100) at ../../.././libstdc++-v3/libsupc++/eh_call.cc:54
#5  0x00000000006829d8 in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=5138137972254386944, ue_header=0x7ffd262cf100, context=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_personality.cc:676
#6  0x0000000000716573 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7ffd262cf100, context=context@entry=0x7fff715bda10) at ../.././libgcc/unwind.inc:62
#7  0x0000000000716d8f in _Unwind_RaiseException (exc=exc@entry=0x7ffd262cf100) at ../.././libgcc/unwind.inc:131
#8  0x00000000006839b6 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7ffd262cf120, tinfo=tinfo@entry=0x9c61d8 <typeinfo for std::system_error>, dest=dest@entry=0x68b910 <std::system_error::~system_error()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:88
#9  0x000000000068bbde in std::__throw_system_error (__i=__i@entry=35) at ../../../.././libstdc++-v3/src/c++11/system_error.cc:81
#10 0x0000000000520735 in std::__shared_mutex_pthread::lock_shared (this=<optimized out>) at /usr/local/include/c++/7.3.0/shared_mutex:142
#11 std::shared_mutex::lock_shared (this=<optimized out>) at /usr/local/include/c++/7.3.0/shared_mutex:335
#12 std::shared_lock<std::shared_mutex>::shared_lock (__m=..., this=<optimized out>) at /usr/local/include/c++/7.3.0/shared_mutex:553

// try to get incursive lock on vset->read_write_mutex
#13 DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::compactOnDeltaRelease (tail=..., this=<optimized out>)
    at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:275
#14 DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot::~Snapshot (this=0x7ffd3b96a010, __in_chrg=<optimized out>)
    at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:144
#15 __gnu_cxx::new_allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (this=<optimized out>, __p=<optimized out>) at /usr/local/include/c++/7.3.0/ext/new_allocator.h:140
#16 std::allocator_traits<std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> >::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (__a=..., __p=<optimized out>) at /usr/local/include/c++/7.3.0/bits/alloc_traits.h:487
#17 std::_Sp_counted_ptr_inplace<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7ffd3b96a000) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:535
#18 0x00000000005185d9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ffd3b96a000) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154
#19 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684
#20 std::__shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123
#21 std::shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93

// call `removeExpiredSnapshots`
#22 DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::removeExpiredSnapshots (this=0x7ffff2e6f228) at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:320
#23 0x00000000005205dc in DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot::~Snapshot (this=0x7ffd26185790, __in_chrg=<optimized out>)
    at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:153
#24 __gnu_cxx::new_allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (this=<optimized out>, __p=<optimized out>) at /usr/local/include/c++/7.3.0/ext/new_allocator.h:140
#25 std::allocator_traits<std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> >::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (__a=..., __p=<optimized out>) at /usr/local/include/c++/7.3.0/bits/alloc_traits.h:487
#26 std::_Sp_counted_ptr_inplace<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7ffd26185780) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:535
#27 0x00000000004f5b09 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ffd26185780) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154
#28 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff715bdf48, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684
#29 std::__shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff715bdf40, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123
#30 std::shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::~shared_ptr (this=0x7fff715bdf40, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93
#31 PSReader::run (this=0x7ffff2e27ca0) at /data1/jaysonhuang/tics/dbms/src/Storages/Page/tests/stress_page_stroage.cpp:180
#32 0x000000000063b0e7 in Poco::PooledThread::run (this=0x7ffff2f62600) at /data1/jaysonhuang/tics/contrib/poco/Foundation/src/ThreadPool.cpp:214
#33 0x0000000000635a58 in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at /data1/jaysonhuang/tics/contrib/poco/Foundation/src/Thread_STD.cpp:139
#34 0x00000000006a9baf in std::execute_native_thread_routine (__p=0x7ffff2e8c1c0) at ../../../.././libstdc++-v3/src/c++11/thread.cc:83
#35 0x00007ffff795cea5 in start_thread () from /lib64/libpthread.so.0
#36 0x00007ffff68a29fd in clone () from /lib64/libc.so.6

[2] https://github.com/pingcap/tics/blob/8a4e9a6745e540144201b12194f820dea3ea0f11/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h#L149-L154

What is changed and how it works?

  • Do not call vset->removeExpiredSnapshots inside ~Snapshot, only remove useless snapshot weak_ptrs in PageEntriesVersionSetWithDelta::listAllLiveFiles
  • Add lock in MergeDeltaTaskPool::length (a small data race fixing)

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch:

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • Running ./dbms/src/Storages/Page/tests/page_stress_page_storage "" 120 8 1024, it won't throw error "Resource deadlock avoided"

Side effects

Release note

  • Fix the panic issue that occurs when the read load is heavy

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang JaySon-Huang added the CHERRY-PICK cherry pick label Jun 30, 2021
@JaySon-Huang JaySon-Huang self-assigned this Jun 30, 2021
@JaySon-Huang JaySon-Huang requested a review from flowbehappy June 30, 2021 12:11
@JaySon-Huang JaySon-Huang added the type/bugfix This PR fixes a bug. label Jun 30, 2021
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang changed the title Fix incursive deadlock in ~Snapshot Fix incursive deadlock in ~Snapshot(#2277) Jun 30, 2021
@JaySon-Huang JaySon-Huang added this to the v5.0.3 milestone Jun 30, 2021
Copy link
Contributor

@flowbehappy flowbehappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CHERRY-PICK cherry pick status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants