[chttp2] Fix ref-counting bug in Chttp2ServerListener around shutdown#37225
Closed
yashykt wants to merge 2 commits intogrpc:masterfrom
Closed
[chttp2] Fix ref-counting bug in Chttp2ServerListener around shutdown#37225yashykt wants to merge 2 commits intogrpc:masterfrom
yashykt wants to merge 2 commits intogrpc:masterfrom
Conversation
markdroth
approved these changes
Jul 16, 2024
| @@ -872,13 +869,15 @@ void Chttp2ServerListener::OnAccept(void* arg, grpc_endpoint* tcp, | |||
| // This ref needs to be taken in the critical region after having made | |||
Member
There was a problem hiding this comment.
I think this comment needs to be updated a bit. It seems like it originally referred only to the ref on the listener, but it should now refer to both that and the ref on the TCP server.
paulosjca
pushed a commit
to paulosjca/grpc
that referenced
this pull request
Nov 25, 2024
…grpc#37225) Noticed on a Core End2End test failure https://btx.cloud.google.com/invocations/dc3bf84d-e6ed-4b32-a24c-12489f981e46/targets/%2F%2Ftest%2Fcore%2Fend2end:cancel_with_status_test@poller%3Depoll1;config=56f5b09615e325097b100b58c41171656571290519a83c5d89a6067ef0283d46/log ``` F0000 00:00:1721017820.001684 87 tcp_server_posix.cc:354] Check failed: !s->shutdown *** Check failure stack trace: *** @ 0x7f32578da0e4 absl::lts_20240116::log_internal::LogMessage::SendToLog() @ 0x7f32578d9a94 absl::lts_20240116::log_internal::LogMessage::Flush() @ 0x7f32578da589 absl::lts_20240116::log_internal::LogMessageFatal::~LogMessageFatal() @ 0x7f3257e340a1 tcp_server_unref() @ 0x7f3258fcba8e grpc_core::Chttp2ServerListener::ActiveConnection::~ActiveConnection() @ 0x7f3258fd19e7 grpc_event_engine::experimental::MemoryAllocator::New<>()::Wrapper::~Wrapper() @ 0x7f3258fcc998 grpc_core::Chttp2ServerListener::OnAccept() @ 0x7f3257e34962 absl::lts_20240116::internal_any_invocable::LocalInvoker<>() @ 0x7f3257da6475 grpc_event_engine::experimental::PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept()::$_1::operator()() @ 0x7f3257da4437 grpc_event_engine::experimental::PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept() @ 0x7f3257da5fef absl::lts_20240116::base_internal::Callable::Invoke<>() @ 0x7f3257dca50a grpc_event_engine::experimental::PosixEngineClosure::Run() @ 0x7f3257c9013e grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step() @ 0x7f3257c8fe48 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody() @ 0x7f3257c906df grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke() @ 0x7f32579a106c grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::__invoke() @ 0x7f3257358609 start_thread ``` grpc#36563 changed the refcounting mechanism incorrectly and we ended up taking a ref on the tcp server outside the critical region, resulting in a time-of-check-to-time-of-use bug, where we could end up reffing the tcp server when it is already 0, i.e., when the listener has already been shutdown. This results in an attempt to destroy the tcp server twice and an eventual crash. Closes grpc#37225 COPYBARA_INTEGRATE_REVIEW=grpc#37225 from yashykt:FixChttp2Bug bc1e8df PiperOrigin-RevId: 654850991
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Noticed on a Core End2End test failure https://btx.cloud.google.com/invocations/dc3bf84d-e6ed-4b32-a24c-12489f981e46/targets/%2F%2Ftest%2Fcore%2Fend2end:cancel_with_status_test@poller%3Depoll1;config=56f5b09615e325097b100b58c41171656571290519a83c5d89a6067ef0283d46/log
#36563 changed the refcounting mechanism incorrectly and we ended up taking a ref on the tcp server outside the critical region, resulting in a time-of-check-to-time-of-use bug, where we could end up reffing the tcp server when it is already 0, i.e., when the listener has already been shutdown. This results in an attempt to destroy the tcp server twice and an eventual crash.