[ruby] improve the way completion queue pluck operations handle signals and process shutdown #36903
Closed
apolcyn wants to merge 9 commits into grpc:master
stanley-cheung (Contributor) approved these changes on Jun 13, 2024 and left a comment:
LGTM. Any tests affected?
Contributor, Author
This doesn't require any test changes. It fixes a CBF in
apolcyn
added a commit
to apolcyn/grpc
that referenced
this pull request
Jun 13, 2024
…ls and process shutdown (grpc#36903) Fixes the CBF of `src/ruby/end2end/killed_client_thread_test.rb` (failure mode is a hang of the child process that receives the SIGTERM) that has been happening since grpc#36724. So far, grpc-ruby CQ pluck operations have used a 20ms-interval busy poll to check interrupts in case we've received a signal, handle process shutdown, etc. This means ongoing RPCs will not terminate their CQ operations if we need to terminate the process (the loop simply exits without waiting for the CQ op to finish), causing a leak. Those RPCs can leave refs over their corresponding channels, preventing [this](https://github.com/grpc/grpc/blob/8564f72e8e0334c25c480e0aec1df75bdc1fce14/src/ruby/ext/grpc/rb_channel.c#L653) from terminating (the channels don't reach state SHUTDOWN after being destroyed). The fix is to unblock CQ pluck operations by cancelling calls, thus allowing the CQ pluck to actually complete its operation. For server listening CQ operations, we unblock them by shutting down the server. A side win here is to remove the [20ms-interval busy poll](https://github.com/grpc/grpc/blob/8564f72e8e0334c25c480e0aec1df75bdc1fce14/src/ruby/ext/grpc/rb_completion_queue.c#L44) on CQ operations, which was only needed to handle shutdown. Closes grpc#36903 COPYBARA_INTEGRATE_REVIEW=grpc#36903 from apolcyn:fix_ruby_interrupt bed1ee2 PiperOrigin-RevId: 643046465
alto-ruby
added a commit
to alto-ruby/grpc
that referenced
this pull request
Jul 11, 2024
…k operations handle signals and process shutdown grpc#36903" (grpc#36916)" This reverts commit 3469d7b.
copybara-service bot
pushed a commit
that referenced
this pull request
Aug 7, 2024
Fixes #37234. Following up on the problem described in #36903, there are a number of paths in `client_server_spec.rb` and a few other tests where client call objects can leak because RPC lifecycles are not properly completed, leading to a thread not terminating. Some of the tests, which don't use the surface-level APIs, are changed to manually close calls (and not rely on GC, which might not happen before shutdown of Ruby threads). `client_server_spec.rb` is updated to use surface-level APIs, which manage call lifecycles correctly (this also improves the test's fidelity). While we're here: expose `cancel_with_status` on call operations. This was only accidentally private so far. The test refactoring caught it. Closes #37410 COPYBARA_INTEGRATE_REVIEW=#37410 from apolcyn:fix_call_leak b230472 PiperOrigin-RevId: 660430463
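
The "manually close calls (and not rely on GC)" point in the commit message above can be illustrated with a small toy model. This is only a sketch: `TrackedCall` and `with_call` are hypothetical names invented for illustration and are not grpc-ruby classes or helpers. It shows an `ensure` clause closing the call even when the test body raises.

```ruby
# Hypothetical stand-in for a client call object; NOT the grpc-ruby class.
class TrackedCall
  def initialize
    @closed = false
  end

  def closed?
    @closed
  end

  def close
    @closed = true
  end
end

# Hypothetical helper: yields a call and always closes it afterwards,
# instead of waiting for the garbage collector to finalize it.
def with_call
  call = TrackedCall.new
  yield call
ensure
  call&.close
end

seen = nil
begin
  with_call do |call|
    seen = call
    raise 'simulated failure mid-RPC'
  end
rescue RuntimeError
  # The failure is expected here; the call must still have been closed.
end

puts seen.closed?
```

Closing in `ensure` makes the cleanup deterministic, which matters when threads are shut down before a GC pass would have run.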
paulosjca
pushed a commit
to paulosjca/grpc
that referenced
this pull request
Nov 25, 2024
…ls and process shutdown (grpc#36903)
paulosjca
pushed a commit
to paulosjca/grpc
that referenced
this pull request
Nov 25, 2024
Fixes grpc#37234. Closes grpc#37410.
apolcyn
added a commit
to apolcyn/grpc
that referenced
this pull request
Apr 28, 2025
…le signals and process shutdown (grpc#36903)" This reverts commit d4b5e12.
Fixes the CBF of `src/ruby/end2end/killed_client_thread_test.rb` (failure mode is a hang of the child process that receives the SIGTERM) that has been happening since #36724.
So far, grpc-ruby CQ pluck operations have used a 20ms-interval busy poll to check interrupts in case we've received a signal, handle process shutdown, etc. This means ongoing RPCs will not terminate their CQ operations if we need to terminate the process (the loop simply exits without waiting for the CQ op to finish), causing a leak. Those RPCs can leave refs over their corresponding channels, preventing [this](https://github.com/grpc/grpc/blob/8564f72e8e0334c25c480e0aec1df75bdc1fce14/src/ruby/ext/grpc/rb_channel.c#L653) from terminating (the channels don't reach state SHUTDOWN after being destroyed).
Fix is to unblock CQ pluck operations by cancelling calls, and thus allowing the CQ pluck to actually complete its operation. For server listening CQ operations, we unblock them by shutting down the server.
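
As a rough illustration of the cancel-to-unblock idea, here is a toy model in plain Ruby. It is a sketch under loud assumptions: `FakeCall` and its internal `Queue` are hypothetical stand-ins for a call and its completion queue, and none of this is the actual C extension code.

```ruby
# Toy model only: FakeCall is hypothetical, NOT a grpc-ruby API.
class FakeCall
  attr_reader :status

  def initialize
    @events = Queue.new # stands in for the completion queue
    @status = nil
  end

  # Blocks until the pending operation completes; no timed polling.
  def pluck
    @status = @events.pop
  end

  # Cancelling posts a completion event, which is what lets a
  # blocked pluck return instead of being abandoned.
  def cancel
    @events << :cancelled
  end
end

call = FakeCall.new
worker = Thread.new { call.pluck }

# Simulated shutdown: cancel the in-flight call, then wait for the
# pluck to finish its operation.
call.cancel
worker.join

puts call.status
```

The point of the model is that the waiting thread always sees its operation complete, so nothing is left holding a reference when the process shuts down.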
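A side win here is to remove the [20ms-interval busy poll](https://github.com/grpc/grpc/blob/8564f72e8e0334c25c480e0aec1df75bdc1fce14/src/ruby/ext/grpc/rb_completion_queue.c#L44) on CQ operations, which was only needed to handle shutdown.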
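
For contrast, the previous behaviour can be modelled in the same toy style. Again, this is only a sketch: `polling_pluck` and `POLL_INTERVAL` are hypothetical names, and the real loop lives in the C extension. It shows a timed poll that gives up as soon as shutdown is requested, without the operation ever completing.

```ruby
# Toy model only: names are hypothetical, NOT grpc-ruby internals.
POLL_INTERVAL = 0.02 # 20 ms, matching the interval described above

# Returns the completed event, or :abandoned if shutdown was
# requested before any event arrived.
def polling_pluck(events, shutdown_requested)
  loop do
    begin
      return events.pop(true) # non-blocking; raises ThreadError if empty
    rescue ThreadError
      # The loop simply exits here; the pending operation is never finished.
      return :abandoned if shutdown_requested.call
      sleep POLL_INTERVAL
    end
  end
end

events = Queue.new
shutting_down = false
worker = Thread.new { polling_pluck(events, -> { shutting_down }) }

shutting_down = true
result = worker.value

puts result
puts events.empty?
```

In this model the caller gets `:abandoned` back while the operation itself is still outstanding, which corresponds to the leak described in the PR.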