Skip to content

Fix incursive deadlock in ~Snapshot (#2277)#2280

Merged
JaySon-Huang merged 4 commits intopingcap:release-4.0from
ti-srebot:release-4.0-e9f28c717902
Jul 6, 2021
Merged

Fix incursive deadlock in ~Snapshot (#2277)#2280
JaySon-Huang merged 4 commits intopingcap:release-4.0from
ti-srebot:release-4.0-e9f28c717902

Conversation

@ti-srebot
Copy link
Collaborator

@ti-srebot ti-srebot commented Jun 30, 2021

cherry-pick #2277 to release-4.0
You can switch your code base to this Pull Request by using git-extras:

# In tics repo:
git pr https://github.com/pingcap/tics/pull/2280

After apply modifications, you can push your change to this PR via:

git push git@github.com:ti-srebot/tics.git pr/2280:release-4.0-e9f28c717902

What problem does this PR solve?

Issue Number: close #2249

Problem Summary:
Running ./dbms/src/Storages/Page/tests/page_stress_page_storage "" 120 8 1024, it will create 8 writing threads and 1024 reading threads.

It will throw the same exception as #2249. The backtrace is[1].
We can see that inside ~Snapshot, it calls removeExpiredSnapshots and release another snapshot. It cause incursive deadlock on vset->read_write_mutex.
#2229 Add some logic in removeExpiredSnapshots and make it more likely to run into this case.

[1] backtrace

(gdb) bt
#0  0x00007ffff67da387 in raise () from /lib64/libc.so.6
#1  0x00007ffff67dba78 in abort () from /lib64/libc.so.6
#2  0x00000000006ff9d5 in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x0000000000682226 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x0000000000683329 in __cxa_call_terminate (ue_header=ue_header@entry=0x7ffd262cf100) at ../../.././libstdc++-v3/libsupc++/eh_call.cc:54
#5  0x00000000006829d8 in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=5138137972254386944, ue_header=0x7ffd262cf100, context=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_personality.cc:676
#6  0x0000000000716573 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7ffd262cf100, context=context@entry=0x7fff715bda10) at ../.././libgcc/unwind.inc:62
#7  0x0000000000716d8f in _Unwind_RaiseException (exc=exc@entry=0x7ffd262cf100) at ../.././libgcc/unwind.inc:131
#8  0x00000000006839b6 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7ffd262cf120, tinfo=tinfo@entry=0x9c61d8 <typeinfo for std::system_error>, dest=dest@entry=0x68b910 <std::system_error::~system_error()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:88
#9  0x000000000068bbde in std::__throw_system_error (__i=__i@entry=35) at ../../../.././libstdc++-v3/src/c++11/system_error.cc:81
#10 0x0000000000520735 in std::__shared_mutex_pthread::lock_shared (this=<optimized out>) at /usr/local/include/c++/7.3.0/shared_mutex:142
#11 std::shared_mutex::lock_shared (this=<optimized out>) at /usr/local/include/c++/7.3.0/shared_mutex:335
#12 std::shared_lock<std::shared_mutex>::shared_lock (__m=..., this=<optimized out>) at /usr/local/include/c++/7.3.0/shared_mutex:553

// try to get incursive lock on vset->read_write_mutex
#13 DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::compactOnDeltaRelease (tail=..., this=<optimized out>)
    at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:275
#14 DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot::~Snapshot (this=0x7ffd3b96a010, __in_chrg=<optimized out>)
    at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:144
#15 __gnu_cxx::new_allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (this=<optimized out>, __p=<optimized out>) at /usr/local/include/c++/7.3.0/ext/new_allocator.h:140
#16 std::allocator_traits<std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> >::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (__a=..., __p=<optimized out>) at /usr/local/include/c++/7.3.0/bits/alloc_traits.h:487
#17 std::_Sp_counted_ptr_inplace<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7ffd3b96a000) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:535
#18 0x00000000005185d9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ffd3b96a000) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154
#19 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684
#20 std::__shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123
#21 std::shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93

// call `removeExpiredSnapshots`
#22 DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::removeExpiredSnapshots (this=0x7ffff2e6f228) at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:320
#23 0x00000000005205dc in DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot::~Snapshot (this=0x7ffd26185790, __in_chrg=<optimized out>)
    at /data1/jaysonhuang/tics/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h:153
#24 __gnu_cxx::new_allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (this=<optimized out>, __p=<optimized out>) at /usr/local/include/c++/7.3.0/ext/new_allocator.h:140
#25 std::allocator_traits<std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> >::destroy<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot> (__a=..., __p=<optimized out>) at /usr/local/include/c++/7.3.0/bits/alloc_traits.h:487
#26 std::_Sp_counted_ptr_inplace<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, std::allocator<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7ffd26185780) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:535
#27 0x00000000004f5b09 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ffd26185780) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154
#28 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff715bdf48, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684
#29 std::__shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff715bdf40, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123
#30 std::shared_ptr<DB::MVCC::VersionSetWithDelta<DB::PageEntriesForDelta, DB::PageEntriesView, DB::PageEntriesEdit, DB::DeltaVersionEditAcceptor>::Snapshot>::~shared_ptr (this=0x7fff715bdf40, __in_chrg=<optimized out>)
    at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93
#31 PSReader::run (this=0x7ffff2e27ca0) at /data1/jaysonhuang/tics/dbms/src/Storages/Page/tests/stress_page_stroage.cpp:180
#32 0x000000000063b0e7 in Poco::PooledThread::run (this=0x7ffff2f62600) at /data1/jaysonhuang/tics/contrib/poco/Foundation/src/ThreadPool.cpp:214
#33 0x0000000000635a58 in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at /data1/jaysonhuang/tics/contrib/poco/Foundation/src/Thread_STD.cpp:139
#34 0x00000000006a9baf in std::execute_native_thread_routine (__p=0x7ffff2e8c1c0) at ../../../.././libstdc++-v3/src/c++11/thread.cc:83
#35 0x00007ffff795cea5 in start_thread () from /lib64/libpthread.so.0
#36 0x00007ffff68a29fd in clone () from /lib64/libc.so.6

[2] https://github.com/pingcap/tics/blob/8a4e9a6745e540144201b12194f820dea3ea0f11/dbms/src/Storages/Page/mvcc/VersionSetWithDelta.h#L149-L154

What is changed and how it works?

  • Do not call vset->removeExpiredSnapshots inside ~Snapshot, only remove useless snapshot weak_ptrs in PageEntriesVersionSetWithDelta::listAllLiveFiles
  • Add lock in MergeDeltaTaskPool::length (a small data race fixing)

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch:

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • Running ./dbms/src/Storages/Page/tests/page_stress_page_storage "" 120 8 1024, it won't throw error "Resource deadlock avoided"

Side effects

Release note

  • Fix the panic issue that occurs when the read load is heavy

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot ti-srebot added CHERRY-PICK cherry pick status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug. labels Jun 30, 2021
@ti-srebot ti-srebot requested a review from flowbehappy June 30, 2021 12:46
@ti-srebot ti-srebot added this to the v4.0.14 milestone Jun 30, 2021
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the conflict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

JaySon-Huang and others added 2 commits July 6, 2021 14:58
@JaySon-Huang
Copy link
Contributor

/merge

@ti-srebot
Copy link
Collaborator Author

@JaySon-Huang Oops! This PR requires at least 1 LGTMs to merge. The current number of LGTM is 0

@JaySon-Huang
Copy link
Contributor

/run-all-tests

@flowbehappy flowbehappy self-requested a review July 6, 2021 09:03
Copy link
Contributor

@flowbehappy flowbehappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JaySon-Huang JaySon-Huang merged commit 858e93e into pingcap:release-4.0 Jul 6, 2021
@JaySon-Huang JaySon-Huang deleted the release-4.0-e9f28c717902 branch July 6, 2021 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CHERRY-PICK cherry pick status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants