[export] [PH2] Fix crash with error flattener#41930
[export] [PH2] Fix crash with error flattener#41930copybara-service[bot] wants to merge 1 commit into
Conversation
266000c to
62eb0ee
Compare
Root cause 1. status_details_recv_ is not initialized at time of creation : https://github.com/grpc/grpc/blob/173b262da788de397ea3f7d788156c59272f3a89/src/core/load_balancing/rls/rls.cc#L684 2. It is then used to initialize an op, and is passed by reference via the op https://github.com/grpc/grpc/blob/173b262da788de397ea3f7d788156c59272f3a89/src/core/load_balancing/rls/rls.cc#L1705 3. This op goes all over the gRPC call space taking our status_details_recv_ , and many components assign stuff to it. 4. One such path misses assignment when error flattener is enabled : https://github.com/grpc/grpc/blob/173b262da788de397ea3f7d788156c59272f3a89/src/core/call/client_call.cc#L392 and https://github.com/grpc/grpc/blob/173b262da788de397ea3f7d788156c59272f3a89/src/core/call/client_call.cc#L451 5. And then we try to CSliceUnref(status_details_recv_) which was not initialized. And crash. https://github.com/grpc/grpc/blob/c0728bdae796f214a9fc56a617b21075c2ed3b40/src/core/load_balancing/rls/rls.cc#L1749 ``` I0123 21:12:16.698753 24 dual_ref_counted.h:216] ClientCall:0x7fab7402e660 weak_unref 1 -> 0 (refs=1) *** SIGSEGV received at time=1769202736 on cpu 1 *** PC: @ 0x7faba2478c57 (unknown) grpc_slice_refcount::Unref() @ 0x7fab9899041c 64 absl::lts_20250512::WriteFailureInfo() @ 0x7fab989900bb 304 absl::lts_20250512::AbslFailureSignalHandler() @ 0x7fab93da5420 1744 (unknown) @ 0x7faba2478b37 64 grpc_core::CSliceUnref() @ 0x7faba0e9ff14 1120 grpc_core::(anonymous namespace)::RlsLb::RlsRequest::OnRlsCallCompleteLocked() @ 0x7faba0e9f7d9 96 grpc_core::(anonymous namespace)::RlsLb::RlsRequest::OnRlsCallComplete()::$_0::operator()() @ 0x7faba0e9f795 32 std::__invoke_impl<>() @ 0x7faba0e9f775 32 std::__invoke<>() @ 0x7faba0e9f755 32 std::invoke<>() @ 0x7faba0e9f725 32 absl::lts_20250512::internal_any_invocable::InvokeR<>() @ 0x7faba0e9f652 32 absl::lts_20250512::internal_any_invocable::LocalInvoker<>() @ 0x7fab9c426f72 32 absl::lts_20250512::internal_any_invocable::Impl<>::operator()() @ 0x7fab95bd3957 528 grpc_core::WorkSerializer::WorkSerializerImpl::Run() @ 0x7fab974fef54 160 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step() @ 0x7fab974fe8da 176 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody() @ 0x7fab974ff591 48 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::operator()() @ 0x7fab974ff569 32 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke() @ 0x7fab95766d25 160 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::operator()() @ 0x7fab95766bf9 32 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::__invoke() @ 0x7fab93d99609 (unknown) start_thread ``` Fix option 1 : Initialize status_details_recv_ Fix option 2 : Modify the https://github.com/grpc/grpc/blob/173b262da788de397ea3f7d788156c59272f3a89/src/core/call/client_call.cc#L451 PR with the experiment enabled AND the fix for reference : #41920 ---- DO NOT SUBMIT. This PR is for testing purposes only. [cl/887957316](http://cl/887957316) PiperOrigin-RevId: 887957316
62eb0ee to
c4119e4
Compare
|
This doesn't look like the right fix to me. I think this would mask the symptom but not solve the underlying problem. The C-core API guarantees that when the recv_status_on_client op completes, the status and details will be set, so there should be no requirement that the caller initialize these two variables. If we're completing the op without setting these two variables, then we're violating the API contract. Initializing the variables in RLS may work around the immediate problem, but it will just crop up again later in other contexts. I think we need to figure out what code-path is completing the recv_status_on_client op without setting these two variables and fix it to make sure the variables are set. |
In that case, would it make sense to add an else case to this grpc/src/core/call/client_call.cc Line 445 in 488b14f and assign the required variables? Or would there be a better way? cc : @markdroth |
|
Good catch, Tanvi! You're right, that's where the bug is. I've sent you #41940 to fix it. Thanks! |
[export] [PH2] Fix crash with error flattener
Root cause
grpc/src/core/load_balancing/rls/rls.cc
Line 684 in 173b262
grpc/src/core/load_balancing/rls/rls.cc
Line 1705 in 173b262
grpc/src/core/call/client_call.cc
Line 392 in 173b262
grpc/src/core/call/client_call.cc
Line 451 in 173b262
grpc/src/core/load_balancing/rls/rls.cc
Line 1749 in c0728bd
Fix option 1 : Initialize status_details_recv_
Fix option 2 : Modify the
grpc/src/core/call/client_call.cc
Line 451 in 173b262
PR with the experiment enabled AND the fix for reference : #41920
DO NOT SUBMIT. This PR is for testing purposes only. cl/887957316