Skip to content

[chttp2] Fix ref-counting bug in Chttp2ServerListener around shutdown#37225

Closed
yashykt wants to merge 2 commits intogrpc:masterfrom
yashykt:FixChttp2Bug
Closed

[chttp2] Fix ref-counting bug in Chttp2ServerListener around shutdown#37225
yashykt wants to merge 2 commits intogrpc:masterfrom
yashykt:FixChttp2Bug

Conversation

@yashykt
Copy link
Copy Markdown
Member

@yashykt yashykt commented Jul 16, 2024

Noticed on a Core End2End test failure https://btx.cloud.google.com/invocations/dc3bf84d-e6ed-4b32-a24c-12489f981e46/targets/%2F%2Ftest%2Fcore%2Fend2end:cancel_with_status_test@poller%3Depoll1;config=56f5b09615e325097b100b58c41171656571290519a83c5d89a6067ef0283d46/log

F0000 00:00:1721017820.001684      87 tcp_server_posix.cc:354] Check failed: !s->shutdown 
*** Check failure stack trace: ***
    @     0x7f32578da0e4  absl::lts_20240116::log_internal::LogMessage::SendToLog()
    @     0x7f32578d9a94  absl::lts_20240116::log_internal::LogMessage::Flush()
    @     0x7f32578da589  absl::lts_20240116::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x7f3257e340a1  tcp_server_unref()
    @     0x7f3258fcba8e  grpc_core::Chttp2ServerListener::ActiveConnection::~ActiveConnection()
    @     0x7f3258fd19e7  grpc_event_engine::experimental::MemoryAllocator::New<>()::Wrapper::~Wrapper()
    @     0x7f3258fcc998  grpc_core::Chttp2ServerListener::OnAccept()
    @     0x7f3257e34962  absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
    @     0x7f3257da6475  grpc_event_engine::experimental::PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept()::$_1::operator()()
    @     0x7f3257da4437  grpc_event_engine::experimental::PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept()
    @     0x7f3257da5fef  absl::lts_20240116::base_internal::Callable::Invoke<>()
    @     0x7f3257dca50a  grpc_event_engine::experimental::PosixEngineClosure::Run()
    @     0x7f3257c9013e  grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step()
    @     0x7f3257c8fe48  grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody()
    @     0x7f3257c906df  grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke()
    @     0x7f32579a106c  grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::__invoke()
    @     0x7f3257358609  start_thread

#36563 changed the refcounting mechanism incorrectly and we ended up taking a ref on the tcp server outside the critical region, resulting in a time-of-check-to-time-of-use bug, where we could end up reffing the tcp server when it is already 0, i.e., when the listener has already been shutdown. This results in an attempt to destroy the tcp server twice and an eventual crash.

@@ -872,13 +869,15 @@ void Chttp2ServerListener::OnAccept(void* arg, grpc_endpoint* tcp,
// This ref needs to be taken in the critical region after having made
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs to be updated a bit. It seems like it originally referred only to the ref on the listener, but it should now refer to both that and the ref on the TCP server.

paulosjca pushed a commit to paulosjca/grpc that referenced this pull request Nov 25, 2024
…grpc#37225)

Noticed on a Core End2End test failure https://btx.cloud.google.com/invocations/dc3bf84d-e6ed-4b32-a24c-12489f981e46/targets/%2F%2Ftest%2Fcore%2Fend2end:cancel_with_status_test@poller%3Depoll1;config=56f5b09615e325097b100b58c41171656571290519a83c5d89a6067ef0283d46/log

```
F0000 00:00:1721017820.001684      87 tcp_server_posix.cc:354] Check failed: !s->shutdown
*** Check failure stack trace: ***
    @     0x7f32578da0e4  absl::lts_20240116::log_internal::LogMessage::SendToLog()
    @     0x7f32578d9a94  absl::lts_20240116::log_internal::LogMessage::Flush()
    @     0x7f32578da589  absl::lts_20240116::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x7f3257e340a1  tcp_server_unref()
    @     0x7f3258fcba8e  grpc_core::Chttp2ServerListener::ActiveConnection::~ActiveConnection()
    @     0x7f3258fd19e7  grpc_event_engine::experimental::MemoryAllocator::New<>()::Wrapper::~Wrapper()
    @     0x7f3258fcc998  grpc_core::Chttp2ServerListener::OnAccept()
    @     0x7f3257e34962  absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
    @     0x7f3257da6475  grpc_event_engine::experimental::PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept()::$_1::operator()()
    @     0x7f3257da4437  grpc_event_engine::experimental::PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept()
    @     0x7f3257da5fef  absl::lts_20240116::base_internal::Callable::Invoke<>()
    @     0x7f3257dca50a  grpc_event_engine::experimental::PosixEngineClosure::Run()
    @     0x7f3257c9013e  grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step()
    @     0x7f3257c8fe48  grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody()
    @     0x7f3257c906df  grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke()
    @     0x7f32579a106c  grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::__invoke()
    @     0x7f3257358609  start_thread
```

grpc#36563 changed the refcounting mechanism incorrectly and we ended up taking a ref on the tcp server outside the critical region, resulting in a time-of-check-to-time-of-use bug, where we could end up reffing the tcp server when it is already 0, i.e., when the listener has already been shutdown. This results in an attempt to destroy the tcp server twice and an eventual crash.

Closes grpc#37225

COPYBARA_INTEGRATE_REVIEW=grpc#37225 from yashykt:FixChttp2Bug bc1e8df
PiperOrigin-RevId: 654850991
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants