Assert that pthread_join succeeds. #32584

Merged
yashykt merged 4 commits into grpc:master from laramiel:patch-1 on Mar 21, 2023

@laramiel laramiel commented Mar 9, 2023

Long story here:

CallbackAlternativeCQ operates a thread pool that processes a completion queue and invokes each completion function directly on the processing thread. The pool is created on the first Ref() and torn down on the last Unref().
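
The lifecycle described above can be sketched without gRPC. This is only an illustration of the Ref()/Unref() pattern, not CallbackAlternativeCQ's actual code: the class name `RefCountedPool`, the fixed count of two threads, and the yield loop standing in for completion queue polling are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pool that only exists while references are held.
class RefCountedPool {
 public:
  // The first reference starts the worker threads.
  void Ref() {
    std::lock_guard<std::mutex> lock(mu_);
    if (refs_++ == 0) {
      stop_.store(false);
      for (int i = 0; i < 2; ++i) {
        threads_.emplace_back([this] { Loop(); });
      }
    }
  }

  // The last reference stops and joins the workers. If this ran on one of the
  // workers, the join below would target the calling thread itself.
  void Unref() {
    std::lock_guard<std::mutex> lock(mu_);
    assert(refs_ > 0);
    if (--refs_ == 0) {
      stop_.store(true);
      for (std::thread& t : threads_) t.join();
      threads_.clear();
    }
  }

  std::size_t thread_count() {
    std::lock_guard<std::mutex> lock(mu_);
    return threads_.size();
  }

 private:
  // Placeholder for "poll the completion queue and run callbacks".
  void Loop() {
    while (!stop_.load()) std::this_thread::yield();
  }

  std::mutex mu_;
  int refs_ = 0;
  std::atomic<bool> stop_{false};
  std::vector<std::thread> threads_;
};
```

In this sketch, a second Ref() leaves the running threads alone, and only the Unref() that brings the count back to zero joins them. That is why it matters which thread happens to make that final call.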

When running an in-process synchronous server (as we do for tests, using https://github.com/google/tensorstore/blob/master/tensorstore/internal/grpc/grpc_mock.h) that is called through the async() interface, if the async() callback happens to drop the last reference to the grpc Channel, then the channel shutdown will attempt to run on one of the CallbackAlternativeCQ threads.

This causes a deadlock/race condition, as CallbackAlternativeCQ is not designed to shut itself down. When this happens, pthread_join(pthread_id_) returns EDEADLK and the thread keeps running. However, EDEADLK is silently ignored by Join(), so CallbackAlternativeCQ::Unref goes on to delete the underlying grpc_completion_queue, leading to a SIGSEGV later in the process.
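
The self-join behaviour can be reproduced in isolation. The sketch below is not gRPC code; `SelfJoin` and `SelfJoinResult` are hypothetical names. It starts a thread that calls pthread_join on itself and reports the return code. POSIX only lists EDEADLK as an error pthread_join may return, so the exact result is platform-dependent; glibc detects this case, and the backtrace in this PR shows the same on macOS.

```cpp
#include <pthread.h>

#include <cassert>
#include <cerrno>

// Thread body: tries to join the thread it is running on and stores the
// return code through the out-parameter.
static void* SelfJoin(void* arg) {
  int* result = static_cast<int*>(arg);
  *result = pthread_join(pthread_self(), nullptr);
  return nullptr;
}

// Starts a thread that joins itself and returns the code pthread_join gave it.
int SelfJoinResult() {
  int result = 0;
  pthread_t tid;
  if (pthread_create(&tid, nullptr, SelfJoin, &result) != 0) return -1;
  // Joining from a different thread is the supported case and should succeed.
  const int rc = pthread_join(tid, nullptr);
  assert(rc == 0);
  (void)rc;
  return result;
}
```

The failed self-join does not stop the thread: it carries on running after the call returns, which is why ignoring the return code lets the caller free resources the thread is still using.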

This adds an assert that pthread_join succeeded, which is useful because the process fails at the point of the error rather than with a later SIGSEGV. Alternatively, the thread implementation could gpr_log the error code before asserting.
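
A minimal sketch of the "log, then crash" variant follows. It is not the code in this PR: `JoinOrDie` is a hypothetical helper, and it writes to stderr with fprintf instead of using gRPC's logging so that it stays self-contained.

```cpp
#include <pthread.h>

#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: joins `tid` and aborts with a logged message if
// pthread_join reports an error such as EDEADLK.
void JoinOrDie(pthread_t tid) {
  const int err = pthread_join(tid, nullptr);
  if (err != 0) {
    std::fprintf(stderr, "pthread_join failed: %s\n", std::strerror(err));
    std::abort();
  }
}

static void* Noop(void*) { return nullptr; }

// Returns true when an ordinary join from another thread goes through.
bool JoinOrDieSucceedsOnNormalThread() {
  pthread_t tid;
  const int rc = pthread_create(&tid, nullptr, Noop, nullptr);
  assert(rc == 0);
  (void)rc;
  JoinOrDie(tid);
  return true;
}
```

Logging the error string before aborting means the crash report names the failing call, whereas a plain assert only records the source location.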

Example backtrace of crash:

frame #0: 0x0000000194f1e868 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x0000000194f55cec libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x0000000194e8e2c8 libsystem_c.dylib`abort + 180
frame #3: 0x0000000194e8d620 libsystem_c.dylib`__assert_rtn + 272
frame #4: 0x0000000100a64f50 grpc_kvstore_test`grpc_core::(anonymous namespace)::ThreadInternalsPosix::Join() + 188
frame #5: 0x00000001009c5dd0 grpc_kvstore_test`grpc_core::Thread::Join() + 56
frame #6: 0x0000000100154474 grpc_kvstore_test`grpc::(anonymous namespace)::CallbackAlternativeCQ::Unref() + 216
frame #7: 0x0000000100154390 grpc_kvstore_test`grpc::CompletionQueue::ReleaseCallbackAlternativeCQ(grpc::CompletionQueue*) + 120
frame #8: 0x000000010014130c grpc_kvstore_test`grpc::Channel::~Channel() + 220
frame #9: 0x00000001001413c8 grpc_kvstore_test`grpc::Channel::~Channel() + 28
frame #10: 0x000000010014d678 grpc_kvstore_test`std::__1::default_delete<grpc::Channel>::operator()(grpc::Channel*) const + 44
frame #11: 0x000000010014d358 grpc_kvstore_test`std::__1::__shared_ptr_pointer<grpc::Channel*, std::__1::shared_ptr<grpc::Channel>::__shared_ptr_default_delete<grpc::Channel, grpc::Channel>, std::__1::allocator<grpc::Channel> >::__on_zero_shared() + 72
frame #12: 0x000000010002ab5c grpc_kvstore_test`std::__1::__shared_count::__release_shared() + 60
frame #13: 0x000000010002ab00 grpc_kvstore_test`std::__1::__shared_weak_count::__release_shared() + 28
frame #14: 0x000000010002aad0 grpc_kvstore_test`std::__1::shared_ptr<grpc::ServerCredentials>::~shared_ptr() + 56
frame #15: 0x00000001000053ec grpc_kvstore_test`std::__1::shared_ptr<tensorstore_grpc::kvstore::grpc_gen::KvStoreService::Stub>::~shared_ptr() + 28
frame #16: 0x000000010014653c grpc_kvstore_test`grpc::ClientContext::~ClientContext() + 356
frame #17: 0x0000000100146570 grpc_kvstore_test`grpc::ClientContext::~ClientContext() + 28
frame #18: 0x00000001000ab000 grpc_kvstore_test`tensorstore::(anonymous namespace)::ReadTask::~ReadTask() + 68
frame #19: 0x00000001000aae90 grpc_kvstore_test`tensorstore::(anonymous namespace)::ReadTask::~ReadTask() + 28
frame #20: 0x00000001000aae18 grpc_kvstore_test`tensorstore::internal::intrusive_ptr_decrement(tensorstore::internal::AtomicReferenceCount<tensorstore::(anonymous namespace)::ReadTask> const*) + 68
frame #21: 0x00000001000aadc8 grpc_kvstore_test`void tensorstore::internal::DefaultIntrusivePtrTraits::decrement<tensorstore::(anonymous namespace)::ReadTask*>(tensorstore::(anonymous namespace)::ReadTask*) + 24
frame #22: 0x00000001000aad9c grpc_kvstore_test`tensorstore::internal::IntrusivePtr<tensorstore::(anonymous namespace)::ReadTask, tensorstore::internal::DefaultIntrusivePtrTraits>::~IntrusivePtr() + 52
frame #23: 0x00000001000a5994 grpc_kvstore_test`tensorstore::internal::IntrusivePtr<tensorstore::(anonymous namespace)::ReadTask, tensorstore::internal::DefaultIntrusivePtrTraits>::~IntrusivePtr() + 28
frame #24: 0x00000001000aac24 grpc_kvstore_test`tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status)::~() + 40
frame #25: 0x00000001000a6280 grpc_kvstore_test`tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status)::~() + 28
frame #26: 0x00000001000a84ac grpc_kvstore_test`std::__1::__compressed_pair_elem<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status), 0, false>::~__compressed_pair_elem() + 28
frame #27: 0x00000001000a86c0 grpc_kvstore_test`std::__1::__compressed_pair<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status), std::__1::allocator<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status)> >::~__compressed_pair() + 28
frame #28: 0x00000001000a8694 grpc_kvstore_test`std::__1::__compressed_pair<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status), std::__1::allocator<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status)> >::~__compressed_pair() + 28
frame #29: 0x00000001000a990c grpc_kvstore_test`std::__1::__function::__alloc_func<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status), std::__1::allocator<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status)>, void (grpc::Status)>::destroy() + 24
frame #30: 0x00000001000a7ea0 grpc_kvstore_test`std::__1::__function::__func<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status), std::__1::allocator<tensorstore::(anonymous namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*, absl::Time)::'lambda'(grpc::Status)>, void (grpc::Status)>::destroy() + 28
frame #31: 0x00000001000aabbc grpc_kvstore_test`std::__1::__function::__value_func<void (grpc::Status)>::~__value_func() + 68
frame #32: 0x00000001000aab68 grpc_kvstore_test`std::__1::__function::__value_func<void (grpc::Status)>::~__value_func() + 28
frame #33: 0x00000001000aab3c grpc_kvstore_test`std::__1::function<void (grpc::Status)>::~function() + 28
frame #34: 0x00000001000a6254 grpc_kvstore_test`std::__1::function<void (grpc::Status)>::~function() + 28
frame #35: 0x0000000100108ae0 grpc_kvstore_test`grpc::internal::CallbackWithStatusTag::Run(bool) + 368
frame #36: 0x0000000100108964 grpc_kvstore_test`grpc::internal::CallbackWithStatusTag::StaticRun(grpc_completion_queue_functor*, int) + 44
frame #37: 0x0000000100154cb0 grpc_kvstore_test`grpc::(anonymous namespace)::CallbackAlternativeCQ::ThreadLoop(void*) + 356
frame #38: 0x0000000100a650b8 grpc_kvstore_test`grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::operator()(void*) const + 240
frame #39: 0x0000000100a64fbc grpc_kvstore_test`grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) + 28
frame #40: 0x0000000194f5606c libsystem_pthread.dylib`_pthread_start + 148

laramiel commented Mar 9, 2023

I think that my specific completion queue issue can happen whenever there is (1) a channel which is owned by (2) an object which is deleted by an async callback:

struct Task {
  grpc::ClientContext context;
  std::shared_ptr<grpc::Channel> channel;
  RequestProto request;
  ResponseProto response;

  static void DoCall(std::shared_ptr<Task> task) {
    Service::NewStub(task->channel)->async()->Call(
        &task->context, &task->request, &task->response,
        [task](::grpc::Status s) {
          ConsumeProto(std::move(s), task->response);
          // The lambda's copy of `task` is dropped once the callback is done,
          // which happens on the completion queue thread. If it is the last
          // reference, the Channel is destroyed on that thread too.
        });
  }
};

auto task = std::make_shared<Task>();
task->channel = grpc::CreateChannel(address, grpc::InsecureCredentials());
Task::DoCall(std::move(task));
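
One possible way to avoid the pattern above is to hand the last reference to a different thread, so that destructors never run on the callback thread. The sketch below illustrates that idea without gRPC. `DeferredReleaser` and `Tracked` are hypothetical names, a plain std::thread stands in for the completion queue thread, and this is not necessarily the workaround tensorstore adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: parks references released on callback threads until a
// thread that may safely run destructors calls Drain().
class DeferredReleaser {
 public:
  void Release(std::shared_ptr<void> ref) {
    std::lock_guard<std::mutex> lock(mu_);
    pending_.push_back(std::move(ref));
  }

  // Drops every parked reference on the calling thread; returns the count.
  std::size_t Drain() {
    std::vector<std::shared_ptr<void>> doomed;
    {
      std::lock_guard<std::mutex> lock(mu_);
      doomed.swap(pending_);
    }
    const std::size_t n = doomed.size();
    doomed.clear();  // Destructors run here, outside the lock.
    return n;
  }

 private:
  std::mutex mu_;
  std::vector<std::shared_ptr<void>> pending_;
};

// Records which thread ran its destructor.
struct Tracked {
  explicit Tracked(std::thread::id* out) : destroyed_on(out) {}
  ~Tracked() { *destroyed_on = std::this_thread::get_id(); }
  std::thread::id* destroyed_on;
};

// Returns true if the object was destroyed on the calling thread even though
// its last owner was a callback running on another thread.
bool DestroyedOnCallerThread() {
  std::thread::id destroyed_on;
  DeferredReleaser releaser;
  auto task = std::make_shared<Tracked>(&destroyed_on);
  // Stand-in for the completion callback: it gives up ownership by parking
  // the reference instead of letting it die on this thread.
  std::thread worker([&releaser, task = std::move(task)]() mutable {
    releaser.Release(std::move(task));
  });
  worker.join();
  const std::size_t dropped = releaser.Drain();
  assert(dropped == 1);
  (void)dropped;
  return destroyed_on == std::this_thread::get_id();
}
```

The cost of this design is that some thread has to call Drain() at a suitable point, and until it does the parked objects (a Channel, in the scenario above) stay alive.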


laramiel commented Mar 9, 2023

And it seems likely that this resource management pattern is used elsewhere in grpc. Here are a few candidates:

src/core/lib/iomgr/executor.c
src/core/tsi/alts/handshaker/alts_shared_resource.cc

copybara-service bot pushed a commit to google/tensorstore that referenced this pull request Mar 10, 2023
Reworked several methods from grpc_kvstore, as I discovered a deadlock
in grpc that I needed to work around. Described more here:
grpc/grpc#32584

PiperOrigin-RevId: 515472854
Change-Id: I0f0909929680bd6d26d1e240360d739f10401773
Update Join() to log a failure prior to crashing.
@yashykt yashykt added the kokoro:force-run and release notes: no labels Mar 13, 2023

yashykt commented Mar 15, 2023

Please run tools/distrib/check_redundant_namespace_qualifiers.py


yashykt commented Mar 21, 2023

Re-running the tests. Will merge once the tests go green

@yashykt yashykt merged commit 3598c9f into grpc:master Mar 21, 2023
@copybara-service copybara-service bot added the imported label Mar 22, 2023
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023

https://github.com/grpc/grpc/blob/97ba9871324cb68b93f22fd1860934392cd476ee/src/cpp/common/completion_queue_cc.cc#L115


kfstorm commented May 6, 2023

Hi, guys. We hit the same issue when running unit tests with ASAN enabled.

This looks like a real bug, and this PR only provides a nicer crash log, right? Are there any plans to fix the underlying problem?

wanlin31 pushed a commit that referenced this pull request May 18, 2023
Labels

bloat/none, imported, lang/core, per-call-memory/neutral, per-channel-memory/neutral, release notes: no

4 participants