TiFlash compute node crashes after injecting a fault such as a network partition #9378

@Lily2025

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. Run the ch workload.
2. Inject a network partition on one of the compute nodes (CN).

2. What did you expect to see? (Required)

No crash.

3. What did you see instead (Required)

The TiFlash compute node crashed. Error log:

Log entry from container "errorlog" (stream "stdout", namespace "ha-test-disagg-tiflash-tps-7624098-1-212", pod "secondary-tc-tiflash-0", time "2024-08-26T20:27:08.239802511Z"):

[2024/08/27 04:27:07.278 +08:00] [ERROR] [BaseDaemon.cpp:560] ["
  0x563f5e51235e  faultSignalHandler(int, siginfo_t*, void*) [tiflash+124760926]
      libs/libdaemon/src/BaseDaemon.cpp:211
  0x7f96629e46f0   [libc.so.6+255728]
  0x7f9662a3194c  __pthread_kill_implementation [libc.so.6+571724]
  0x7f96629e4646  __GI_raise [libc.so.6+255558]
  0x7f96629ce7f3  abort [libc.so.6+165875]
  0x7f96629cf130  __libc_message.cold [libc.so.6+168240]
  0x7f96629dd1d7  __libc_assert_fail [libc.so.6+225751]
  0x7f9662a38109  __pthread_tpp_change_priority [libc.so.6+598281]
  0x7f9662a329c5  __pthread_mutex_lock_full [libc.so.6+575941]
  0x7f9667b987c6  std::__1::mutex::lock() [libc++.so.1+587718]
  0x563f5fb45843  std::__1::__function::__func<DB::StorageDisaggregated::buildReadTaskForWriteNodeTable(DB::Context const&, std::__1::shared_ptrDB::DM::ScanContext const&, DB::DM::DisaggTaskId const&, unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::mutex&, std::__1::list<std::__1::shared_ptrDB::DM::SegmentReadTask, std::__1::allocator<std::__1::shared_ptrDB::DM::SegmentReadTask>>&)::$_0, std::__1::allocator<DB::StorageDisaggregated::buildReadTaskForWriteNodeTable(DB::Context const&, std::__1::shared_ptrDB::DM::ScanContext const&, DB::DM::DisaggTaskId const&, unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::mutex&, std::__1::list<std::__1::shared_ptrDB::DM::SegmentReadTask, std::__1::allocator<std::__1::shared_ptrDB::DM::SegmentReadTask>>&)::$_0>, void ()>::operator()() (.139ff689715caee4ff84ce0b2eee41ae) [tiflash+148039747]
      /usr/local/bin/../include/c++/v1/__mutex/lock_guard.h:35
  0x563f58ef4e65  std::__1::packaged_task<void ()>::operator()() [tiflash+34463333]
      /usr/local/bin/../include/c++/v1/future:1891
  0x563f58ef3119  DB::ThreadPoolImpl<DB::ThreadFromGlobalPoolImpl>::worker(std::__1::__list_iterator<DB::ThreadFromGlobalPoolImpl, void*>) [tiflash+34455833]
      /usr/local/bin/../include/c++/v1/__functional/function.h:517
  0x563f58ef5973  std::__1::__function::__func<DB::ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<void DB::ThreadPoolImpl<DB::ThreadFromGlobalPoolImpl>::scheduleImpl(std::__1::function<void ()>, long, std::__1::optional, bool)::'lambda0'()>(void&&)::'lambda'(), std::__1::allocator<DB::ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<void DB::ThreadPoolImpl<DB::ThreadFromGlobalPoolImpl>::scheduleImpl(std::__1::function<void ()>, long, std::__1::optional, bool)::'lambda0'()>(void&&)::'lambda'()>, void ()>::operator()() [tiflash+34466163]
      dbms/src/Common/UniThreadPool.cpp:160
  0x563f58ef4608  void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_deletestd::__1::__thread_struct>, void DB::ThreadPoolImplstd::__1::thread::scheduleImpl(std::__1::function<void ()>, long, std::__1::optional, bool)::'lambda0'()>>(void*) [tiflash+34461192]
      /usr/local/bin/../include/c++/v1/__functional/function.h:517
  0x7f9662a2fc02  start_thread [libc.so.6+564226]
"] [source=BaseDaemon] [thread_id=145]
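
For readers of the trace: the abort comes from a libc assertion raised under `std::__1::mutex::lock()`, called from a lambda that `buildReadTaskForWriteNodeTable` runs on the thread pool and that receives a `std::__1::mutex&`. One scenario that can lead to this kind of failure is a pooled task locking a mutex after the stack frame that owns it has already unwound, for example when the caller exits early on an error. This is an assumption, not a confirmed root cause for this issue. The sketch below is not TiFlash code; `collectFromStores` and its parameters are hypothetical names used only to show the defensive pattern of waiting for every task before the mutex goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <list>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-store fan-out; NOT the TiFlash implementation.
// Every worker appends to `output` under `output_lock`. Both objects live on this
// function's stack, so all workers must finish before the function returns or throws.
std::size_t collectFromStores(int store_count, int failing_store)
{
    assert(store_count >= 0);
    std::mutex output_lock;
    std::list<int> output;

    std::vector<std::future<void>> futures;
    std::vector<std::thread> workers;
    for (int store = 0; store < store_count; ++store)
    {
        std::packaged_task<void()> task([&output_lock, &output, store, failing_store] {
            if (store == failing_store)
                throw std::runtime_error("store unreachable"); // simulated network partition
            std::lock_guard<std::mutex> guard(output_lock);
            output.push_back(store);
        });
        futures.push_back(task.get_future());
        workers.emplace_back(std::move(task));
    }

    // Join every worker first. After this loop no thread can touch output_lock,
    // so it is safe to leave this scope by return or by exception.
    for (auto & worker : workers)
        worker.join();

    // Only now surface a failure: get() rethrows the exception stored by the task.
    for (auto & future : futures)
        future.get();

    return output.size();
}
```

Calling `collectFromStores(4, -1)` collects from all four simulated stores, while `collectFromStores(4, 2)` throws `std::runtime_error` only after every worker has been joined. Rethrowing from the first failed future while other tasks are still running would be the hazardous ordering in this sketch.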

4. What is your TiFlash version? (Required)

/tiflash/tiflash version
TiFlash
Release Version: v8.4.0-alpha
Edition: Community
Git Commit Hash: 81cd947
Git Branch: heads/refs/tags/v8.4.0-alpha
UTC Build Time: 2024-08-26 11:38:41
Enable Features: jemalloc sm4(GmSSL) mem-profiling avx2 avx512 unwind thinlto
Profile: RELWITHDEBINFO
Compiler: clang++ 17.0.6

Raft Proxy
Git Commit Hash: f2e5fb8878eb51492c54f1094a847e0b958c6bb8
Git Commit Branch: HEAD
UTC Build Time: ""
Rust Version: rustc 1.77.0-nightly (89e2160c4 2023-12-27)
Storage Engine: tiflash
Prometheus Prefix: tiflash_proxy_
Profile: release
Enable Features: external-jemalloc portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine openssl-vendored
2024-08-27T03:04:36.672+0800 INFO k8s/client.go:135 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
./br -V
Release Version: v8.4.0-alpha
Git Commit Hash: 4eeeef8a1bbf22c2fd1bdd1e61f303e5d52764e0
Git Branch: heads/refs/tags/v8.4.0-alpha
Go Version: go1.21.10
UTC Build Time: 2024-08-26 11:37:18
Race Enabled: false
