[Ruby] Fix ARM64 server shutdown timeout in Ruby gRPC#41223

Closed
zarinn3pal wants to merge 2 commits into grpc:master from zarinn3pal:fix/flaky_active_call_spec_arm

@zarinn3pal zarinn3pal commented Dec 11, 2025

Contributor

On ARM64, server shutdown could hang for 20+ minutes because of a memory visibility issue in the C-core completion queue. The `shutdown_called` flag is accessed without memory barriers, so under ARM's weak memory model blocked threads may never observe the update and never wake up.

This workaround sends a dummy RPC before shutdown to unblock the completion queue from the I/O side, avoiding C-core changes.

This PR addresses the issue that was skipped in #40770.
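As a rough illustration of the workaround idea (not the code in this PR), the sketch below models a server thread that is blocked on an event queue and only re-checks its shutdown state when an event arrives. Posting a dummy event after requesting shutdown wakes the thread so it can exit. `DemoEventQueue`, `EventKind`, `ServeUntilShutdown`, and `RunWorkaroundDemo` are hypothetical names invented for this example; the actual change uses the Ruby gRPC API and a real RPC, neither of which is reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical event kinds: real work versus a wake-up-only event.
enum class EventKind { kRpc, kDummy };

// Hypothetical blocking queue standing in for a completion queue.
class DemoEventQueue {
 public:
  void Push(EventKind e) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      events_.push_back(e);
    }
    cv_.notify_one();  // wake a thread blocked in Next()
  }

  // Blocks until at least one event is available, then returns the oldest.
  EventKind Next() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !events_.empty(); });
    assert(!events_.empty());
    EventKind e = events_.front();
    events_.pop_front();
    return e;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<EventKind> events_;
};

// Serves events until a dummy event arrives while shutdown is requested.
// Returns the number of real RPC events handled.
inline int ServeUntilShutdown(DemoEventQueue& q,
                              const std::atomic<bool>& shutting_down) {
  int handled = 0;
  for (;;) {
    EventKind e = q.Next();  // the thread sleeps here when the queue is idle
    if (e == EventKind::kRpc) {
      ++handled;
      continue;
    }
    // Dummy event: no real work, it only gives the loop a chance to
    // re-check the shutdown state.
    if (shutting_down.load(std::memory_order_acquire)) return handled;
  }
}

// Sends two RPC events, requests shutdown, then posts the dummy event.
inline int RunWorkaroundDemo() {
  DemoEventQueue q;
  std::atomic<bool> shutting_down{false};
  int handled = -1;
  std::thread server([&] { handled = ServeUntilShutdown(q, shutting_down); });
  q.Push(EventKind::kRpc);
  q.Push(EventKind::kRpc);
  shutting_down.store(true, std::memory_order_release);
  q.Push(EventKind::kDummy);  // without this, the server would stay blocked
  server.join();
  return handled;
}
```

Calling `RunWorkaroundDemo()` from a `main` function returns 2, since both RPC events are queued ahead of the dummy event. In this model, a server that is blocked in `Next()` with an empty queue would never notice the shutdown request on its own, which is the situation the dummy RPC is meant to break.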

@zarinn3pal zarinn3pal added release notes: yes and removed lang/ruby labels Dec 11, 2025
@pawbhard pawbhard assigned asheshvidyut and unassigned markdroth Jan 6, 2026
@asheshvidyut

Member

If the actual fix is in Core, let's fix it in Core itself.

@zarinn3pal

Contributor Author

Added the fix in Core:
#41510

@zarinn3pal zarinn3pal closed this Mar 26, 2026
copybara-service Bot pushed a commit that referenced this pull request Apr 24, 2026
…#41510)

On ARM64, server shutdown could hang for 20+ minutes because of a memory visibility issue in the C-core completion queue. The `shutdown_called` flag is accessed without memory barriers, so under ARM's weak memory model blocked threads may never observe the update and never wake up.

A workaround was created for Ruby that sent a dummy RPC before shutdown to unblock the completion queue from the I/O side: [41223](#41223).

This PR addresses the issue in Core, so that all wrapped languages benefit and the root cause is fixed. It converts the `shutdown_called` flag from `bool` to `std::atomic<bool>` in all internal completion queue data structures. This guarantees that the shutdown state transition is atomically visible across threads, preventing race conditions and ensuring the completion queue drains and shuts down correctly on all architectures.
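A minimal sketch of the pattern, assuming a much simplified queue: one thread polls a `std::atomic<bool>` flag while another thread sets it. `DemoQueue`, `PollUntilShutdown`, `RequestShutdown`, and `RunShutdownDemo` are hypothetical names for illustration only and do not mirror the actual gRPC completion queue structures, where waiting threads block rather than spin.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a completion queue's internal state.
struct DemoQueue {
  // With a plain bool, concurrent access would be a data race and the
  // reader would have no guarantee of ever seeing the write.
  std::atomic<bool> shutdown_called{false};
  std::atomic<int> polls{0};  // loop iterations, only for the demo
};

// Worker loop: keeps polling until the shutdown flag becomes visible.
// The acquire load pairs with the release store in RequestShutdown().
inline void PollUntilShutdown(DemoQueue& q) {
  while (!q.shutdown_called.load(std::memory_order_acquire)) {
    q.polls.fetch_add(1, std::memory_order_relaxed);
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}

// Publishes the shutdown request to other threads.
inline void RequestShutdown(DemoQueue& q) {
  q.shutdown_called.store(true, std::memory_order_release);
}

// Starts a worker, requests shutdown, and waits for the worker to exit.
inline bool RunShutdownDemo() {
  DemoQueue q;
  std::thread worker(PollUntilShutdown, std::ref(q));
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  RequestShutdown(q);
  worker.join();  // returns only after the worker has observed the flag
  assert(!worker.joinable());
  return q.shutdown_called.load(std::memory_order_acquire);
}
```

The explicit acquire and release orderings are one possible choice for this sketch; the default sequentially consistent operations on `std::atomic<bool>` would also be correct here, at a potentially higher cost on weakly ordered hardware.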

This PR addresses the issue that was skipped in [40770](#40770).

Closes #41510

COPYBARA_INTEGRATE_REVIEW=#41510 from zarinn3pal:fix/cc-queue-shutdown 5e23512
PiperOrigin-RevId: 905116782
asheshvidyut pushed a commit to a-detiste/grpc that referenced this pull request Jun 10, 2026
…grpc#41510)

Labels

lang/ruby, release notes: yes

4 participants