[Fix] Skip flaky active_call_spec.rb on linux arm64 #40770
I do have a passing run for this one, but the job is flaky, and a pass doesn't mean much without logs confirming the skip. I'm running a new test now with modified code that forces a failure: https://source.cloud.google.com/results/invocations/ac179b76-0301-4ec6-b023-6d45374f4a62
Confirmed: with a failure forced at the end of the suite, the tests are skipped on linux arm as expected (marked as pending): https://source.cloud.google.com/results/invocations/ac179b76-0301-4ec6-b023-6d45374f4a62
Now just double-checking that the tests are not skipped on non-arm linux: https://source.cloud.google.com/results/invocations/dc65bcf2-192d-4391-bf13-868b30748d1d
Not planning to merge this; we'll keep the failure visible until it's fixed.
…#41510)

On ARM64, server shutdown could hang for 20+ minutes due to a memory visibility issue in the C-core completion queue. The `shutdown_called` flag lacked memory barriers, causing blocked threads to never wake up on ARM's weak memory model.

An earlier workaround for Ruby sent a dummy RPC before shutdown to unblock the completion queue from the I/O side: [41223](#41223).

This PR addresses the root cause in the core, so that all wrapped languages benefit. It converts the `shutdown_called` flag from `bool` to `std::atomic<bool>` in all internal completion queue data structures. This guarantees that the shutdown state transition is atomically visible across threads, preventing race conditions and ensuring the completion queue drains and shuts down correctly on all architectures.

This PR addresses the issue skipped in [40770](#40770).

Closes #41510

COPYBARA_INTEGRATE_REVIEW=#41510 from zarinn3pal:fix/cc-queue-shutdown 5e23512
PiperOrigin-RevId: 905116782
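For readers unfamiliar with the pattern the commit message describes, here is a minimal sketch of a shutdown flag declared as `std::atomic<bool>`. It is not the gRPC code: `QueueState`, `PollUntilShutdown`, and `RunShutdownDemo` are hypothetical names, and the acquire/release orderings are an assumption chosen for illustration (the commit message only states that the flag became `std::atomic<bool>`).

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a completion queue's internal state.
// This is NOT the actual gRPC data structure.
struct QueueState {
  // With a plain `bool`, concurrent read/write would be a data race and the
  // reader would have no guarantee of ever observing the write.
  std::atomic<bool> shutdown_called{false};
};

// Worker loop: keeps polling until it observes the shutdown flag.
void PollUntilShutdown(QueueState& q) {
  while (!q.shutdown_called.load(std::memory_order_acquire)) {
    std::this_thread::yield();  // placeholder for real polling work
  }
}

// Starts a worker, requests shutdown, and waits for the worker to exit.
// Returns true once the worker has been joined after shutdown was requested.
bool RunShutdownDemo() {
  QueueState q;
  std::thread worker([&q] { PollUntilShutdown(q); });

  // Give the worker a moment to enter its loop (illustrative only).
  std::this_thread::sleep_for(std::chrono::milliseconds(10));

  // Publish the shutdown request; the atomic store makes it visible
  // to the worker's atomic load on every architecture.
  q.shutdown_called.store(true, std::memory_order_release);

  worker.join();  // returns only after the worker saw the flag
  return q.shutdown_called.load(std::memory_order_acquire);
}
```

Calling `RunShutdownDemo()` should return promptly, since the worker's atomic load is guaranteed to eventually observe the store.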
On linux arm, the test sometimes hangs at random until the 20-minute timeout. Normally it takes just a few seconds to pass.
Fail: https://source.cloud.google.com/results/invocations/e0038f3c-5d32-405e-9e99-c28c9328a3b7
Pass: https://source.cloud.google.com/results/invocations/7be4b597-a7bd-48e2-a631-83695020fef3
This PR selectively skips this test on linux arm until the root cause is determined.