Skip to content

Infinite hang on remote coprocessor read #2529

@JaySon-Huang

Description

@JaySon-Huang

branch: v5.0.3

After @jiangyuzhao run a workload for some days, we find that one table can not be physically dropped. [1]
Using pstack, we can see that one thread is hang by a remote coprocessor read[2] stack_95908.log.zip

Related logs: query.log.zip

[1]

[2021/07/30 14:31:43.980 +08:00] [INFO] [<unknown>] ["SchemaSyncService: Performing GC using safe point 426672359660322816"] [thread_id=9820]
[2021/07/30 14:31:43.980 +08:00] [INFO] [<unknown>] ["SchemaSyncService: Physically dropping table test(1).t_28449(28449)"] [thread_id=9820]
[2021/07/30 14:31:44.981 +08:00] [INFO] [<unknown>] ["SchemaSyncService: Physically drop table test(1).t_28449(28449) is skipped, reason: locking attempt has timed out!DB::Exception: WRITE locking attempt on \"t_28449\" for  has timed out! (1000ms) Possible deadlock avoided. queries on drop:[] queries on alter:[]"] [thread_id=9820]
[2021/07/30 14:31:44.981 +08:00] [INFO] [<unknown>] ["SchemaSyncService: Performed GC using safe point 426672359660322816 meet error, will try again later"] [thread_id=9820]

[2]

## The start_ts is 426644067638050887
(gdb) p *cop_request
$9 = {<google::protobuf::Message> = {<No data fields>}, static kIndexInFileMessages = 4, static kRegionsFieldNumber = 4, static kDataFieldNumber = 3, static kContextFieldNumber = 1, static kTpFieldNumber = 2, static kStartTsFieldNumber = 5, static kSchemaVerFieldNumber = 6,
  _internal_metadata_ = {<google::protobuf::internal::InternalMetadataWithArenaBase<google::protobuf::UnknownFieldSet, google::protobuf::internal::InternalMetadataWithArena>> = {ptr_ = 0x0, static kPtrTagMask = 1, static kPtrValueMask = -2}, <No data fields>}, regions_ = {<google::protobuf::internal::RepeatedPtrFieldBase> = {static kInitialSize = 0,
      arena_ = 0x0, current_size_ = 1, total_size_ = 4, static kRepHeaderSize = 8, rep_ = 0x7f8f7848f110}, <No data fields>}, data_ = {ptr_ = 0x7f8f8de3fc60}, context_ = 0x7f8f8de50a80, tp_ = 103, start_ts_ = 426644067638050887, schema_ver_ = 37697, _cached_size_ = {size_ = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 0},
      static is_always_lock_free = true}}}

Thread 162 (Thread 0x7f8fddbce700 (LWP 96105)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000083ef21c in __gthread_cond_wait (__mutex=<optimized out>, __cond=__cond@entry=0x7f8f87d0f7b0) at /root/flow/gcc-7.3.0/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:864
#2  std::condition_variable::wait (this=this@entry=0x7f8f87d0f7b0, __lock=...) at ../../../.././libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x0000000007884f4a in wait<pingcap::coprocessor::ResponseIter::next()::<lambda()> > (__p=..., __lock=..., this=0x7f8f87d0f7b0) at /usr/local/include/c++/7.3.0/condition_variable:99
#4  next (this=0x7f8f87d0f628) at /data2/yuzhao/tics/contrib/client-c/include/pingcap/coprocessor/Client.h:103
#5  DB::CoprocessorReader::nextResult (this=0x7f8f87d0f610) at /data2/yuzhao/tics/dbms/src/Flash/Coprocessor/CoprocessorReader.h:66
#6  0x000000000788d4f4 in DB::TiRemoteBlockInputStream<DB::CoprocessorReader>::fetchRemoteResult (this=this@entry=0x7f8f82c94010) at /data2/yuzhao/tics/dbms/src/DataStreams/TiRemoteBlockInputStream.h:99
#7  0x000000000788d6ad in DB::TiRemoteBlockInputStream<DB::CoprocessorReader>::fetchRemoteResult (this=this@entry=0x7f8f82c94010) at /data2/yuzhao/tics/dbms/src/DataStreams/TiRemoteBlockInputStream.h:151
#8  0x000000000788e028 in DB::TiRemoteBlockInputStream<DB::CoprocessorReader>::readImpl (this=0x7f8f82c94010) at /data2/yuzhao/tics/dbms/src/DataStreams/TiRemoteBlockInputStream.h:187
#9  0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f82c94010, res_filter=@0x7f8fddba0d38: 0x0, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#10 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#11 0x00000000073e2b4b in DB::ExpressionBlockInputStream::readImpl (this=<optimized out>, this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/ExpressionBlockInputStream.cpp:36
#12 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f87dd4590, res_filter=@0x7f8fddba0e78: 0xffffffff00000000, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#13 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#14 0x00000000073e2b4b in DB::ExpressionBlockInputStream::readImpl (this=<optimized out>, this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/ExpressionBlockInputStream.cpp:36
#15 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f87dd3e10, res_filter=@0x7f8fddba0fb8: 0x0, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#16 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#17 0x00000000073e2b4b in DB::ExpressionBlockInputStream::readImpl (this=<optimized out>, this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/ExpressionBlockInputStream.cpp:36
#18 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8d9e1cff10, res_filter=@0x7f8fddba10f8: 0x3662e85 <MemoryTracker::alloc(long)+165>, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#19 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#20 0x00000000073e2b4b in DB::ExpressionBlockInputStream::readImpl (this=<optimized out>, this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/ExpressionBlockInputStream.cpp:36
#21 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8dba4dc990, res_filter=@0x7f8fddba1238: 0x7f8fddba14b2, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#22 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#23 0x00000000070ace1c in DB::ConcatBlockInputStream::readImpl (this=0x7f8f87dd2290) at /data2/yuzhao/tics/dbms/src/DataStreams/ConcatBlockInputStream.h:34
#24 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f87dd2290, res_filter=@0x7f8fddba1418: 0x7f8fddba1ab0, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#25 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#26 0x000000000774a008 in DB::Aggregator::execute (this=this@entry=0x7f8f82d8d170, stream=..., result=..., file_provider=...) at /data2/yuzhao/tics/dbms/src/Interpreters/Aggregator.cpp:1042
#27 0x00000000076ced57 in DB::AggregatingBlockInputStream::readImpl (this=0x7f8f82d8ce10) at /data2/yuzhao/tics/dbms/src/DataStreams/AggregatingBlockInputStream.cpp:33
#28 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f82d8ce10, res_filter=@0x7f8fddba1b48: 0x7f8f87dd45b0, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#29 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#30 0x00000000073e2b4b in DB::ExpressionBlockInputStream::readImpl (this=<optimized out>, this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/ExpressionBlockInputStream.cpp:36
#31 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f87dd2a10, res_filter=@0x7f8fddba1c88: 0x7f8f87dd22b0, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#32 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#33 0x00000000073e2b4b in DB::ExpressionBlockInputStream::readImpl (this=<optimized out>, this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/ExpressionBlockInputStream.cpp:36
#34 0x000000000684991d in DB::IProfilingBlockInputStream::read (this=0x7f8f9fa16290, res_filter=@0x7f8fddba1dc8: 0x7f8e67446f10, return_filter=return_filter@entry=false) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:61
#35 0x0000000006849b07 in DB::IProfilingBlockInputStream::read (this=<optimized out>) at /data2/yuzhao/tics/dbms/src/DataStreams/IProfilingBlockInputStream.cpp:29
#36 0x000000000749689d in copyDataImpl<DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)::<lambda()>&, void (&)(const DB::Block&)> (progress=<optimized out>, is_cancelled=<synthetic pointer>, to=..., from=...) at /data2/yuzhao/tics/dbms/src/DataStreams/copyData.cpp:25
#37 DB::copyData (from=..., to=..., is_cancelled=is_cancelled@entry=0x0) at /data2/yuzhao/tics/dbms/src/DataStreams/copyData.cpp:64
#38 0x0000000007825788 in DB::DAGDriver<true>::execute (this=this@entry=0x7f8fddba31a0) at /data2/yuzhao/tics/dbms/src/Flash/Coprocessor/DAGDriver.cpp:107
#39 0x000000000785f9b3 in DB::BatchCoprocessorHandler::execute (this=this@entry=0x7f8fddba35c0) at /data2/yuzhao/tics/dbms/src/Flash/BatchCoprocessorHandler.cpp:69
#40 0x000000000780abf8 in DB::FlashService::<lambda()>::operator()(void) const (__closure=<optimized out>) at /data2/yuzhao/tics/dbms/src/Flash/FlashService.cpp:108
#41 0x000000000780acc4 in std::_Function_handler<grpc::Status(), DB::FlashService::BatchCoprocessor(grpc::ServerContext*, const coprocessor::BatchRequest*, grpc::ServerWriter<coprocessor::BatchResponse>*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/local/include/c++/7.3.0/bits/std_function.h:302
#42 0x000000000780ecbb in operator() (this=<optimized out>) at /usr/local/include/c++/7.3.0/bits/std_function.h:706
#43 __invoke_impl<grpc::Status, std::function<grpc::Status()>&> (__f=...) at /usr/local/include/c++/7.3.0/bits/invoke.h:60
#44 __invoke<std::function<grpc::Status()>&> (__fn=...) at /usr/local/include/c++/7.3.0/bits/invoke.h:96
#45 operator() (__closure=<optimized out>) at /usr/local/include/c++/7.3.0/future:1421
#46 operator() (this=0x7f8fddba4850) at /usr/local/include/c++/7.3.0/future:1339
#47 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<grpc::Status>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<grpc::Status ()>, std::allocator<int>, grpc::Status ()>::_M_run()::{lambda()#1}, grpc::Status> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/local/include/c++/7.3.0/bits/std_function.h:302
#48 0x000000000772075f in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
#49 0x00007f908ac88cf3 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103
#50 0x0000000007807f40 in __gthread_once (__func=<optimized out>, __once=0x7f8f963cb388) at /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:699
#51 call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>*, bool*> (__f=<unknown type in /data1/yuzhao/tiup_cluster_1/tidb-deploy/tiflash-49321/bin/tiflash/tiflash, CU 0x19ce74fe, DIE 0x19e64cea>, __once=...) at /usr/local/include/c++/7.3.0/mutex:684
#52 _M_set_result (__ignore_failure=false, __res=..., this=0x7f8f963cb370) at /usr/local/include/c++/7.3.0/future:401
#53 _M_run (this=0x7f8f963cb370) at /usr/local/include/c++/7.3.0/future:1423
#54 operator() (this=<optimized out>) at /usr/local/include/c++/7.3.0/future:1556
#55 operator() (__closure=<optimized out>) at /data2/yuzhao/tics/dbms/src/Flash/FlashService.cpp:306
#56 std::_Function_handler<void(), DB::FlashService::executeInThreadPool(const std::unique_ptr<ThreadPool>&, std::function<grpc::Status()>)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/local/include/c++/7.3.0/bits/std_function.h:316
#57 0x00000000079cd227 in operator() (this=0x7f8fddba48d0) at /usr/local/include/c++/7.3.0/bits/std_function.h:706
#58 ThreadPool::worker (this=0x7f9085f4b3c0) at /data2/yuzhao/tics/libs/libcommon/src/ThreadPool.cpp:92
#59 0x00000000083be12f in std::execute_native_thread_routine (__p=0x7f9071c7ef20) at ../../../.././libstdc++-v3/src/c++11/thread.cc:83
#60 0x00007f908ac82e63 in start_thread (arg=0x7f8fddbce700) at pthread_create.c:308
#61 0x00007f908a24755d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions