
[Python] aio: skip grpc/aio shutdown if py interpreter is finalizing#40447

Closed
sergiitk wants to merge 3 commits into grpc:master from sergiitk:fix/aio/shutdown

Conversation

@sergiitk
Member

@sergiitk sergiitk commented Aug 14, 2025

This PR changes the logic of shutdown_grpc_aio to skip _actual_aio_shutdown if the Python interpreter is already being finalized (cleaning up resources, destroying objects, preparing for program exit, etc.). _actual_aio_shutdown involves shutting down the PollerCompletionQueue, followed by a call to the core grpc_shutdown API.
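
A minimal sketch of the guard described above, assuming the check is done with `sys.is_finalizing()`. The real change lives in the Cython aio layer and may look different; the `actual_shutdown` callback and the injectable `is_finalizing` parameter are hypothetical and exist only to keep the example self-contained.

```python
import sys


def shutdown_grpc_aio(actual_shutdown, is_finalizing=sys.is_finalizing):
    """Run actual_shutdown() unless the interpreter is being finalized.

    `actual_shutdown` stands in for _actual_aio_shutdown (poller completion
    queue shutdown followed by grpc_shutdown). `is_finalizing` is injectable
    only so the behavior can be exercised without exiting the interpreter.
    Returns True if the shutdown callback ran, False if it was skipped.
    """
    if is_finalizing():
        # Module globals and threads may already be gone; touching them
        # here is what produced the errors listed in the reasoning.
        return False
    actual_shutdown()
    return True
```

During normal execution `sys.is_finalizing()` returns False, so the shutdown path is unchanged; only the late-exit path is skipped.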

Reasoning:

  1. During finalization, in some cases the resources we're accessing may already be freed, and the order is not deterministic. Some of the resources that can be unloaded prior to the _actual_aio_shutdown call: _global_aio_state, the AsyncIOEngine enum, or even Python libraries like sys. This leads to errors like AttributeError: 'NoneType' object has no attribute 'POLLER'.
  2. PollerCompletionQueue.shutdown() will try to wait for its poller thread to finish gracefully. In py3.14, PythonFinalizationError is raised when Thread.join() is called during finalization. I think the logic here is similar to (1): these threads may have already been deallocated.
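
The failure in (1) can be simulated outside of interpreter exit. This is only an illustration, assuming the global name has been replaced with None by the time shutdown code reads it; the enum value and the `describe_engine` helper are hypothetical and are not grpc code.

```python
import enum


class AsyncIOEngine(enum.Enum):
    # Hypothetical member value; only the name POLLER comes from the PR text.
    POLLER = 0


def describe_engine(namespace):
    """Look up AsyncIOEngine.POLLER in a module-like namespace.

    Returns the member name on success, or the AttributeError message when
    the global has already been torn down.
    """
    engine_enum = namespace["AsyncIOEngine"]
    try:
        return engine_enum.POLLER.name
    except AttributeError as exc:
        return str(exc)


# Namespace while the program is running normally.
live = {"AsyncIOEngine": AsyncIOEngine}
# Namespace after teardown has replaced the global with None.
torn_down = {"AsyncIOEngine": None}
```

With `live` the lookup succeeds; with `torn_down` the helper returns the same message as in the reported error.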

Note that in some cases users were able to prevent _actual_aio_shutdown from being called by manually calling init_grpc_aio prior to initializing any grpc objects. This resulted in an incorrect positive refcount, which prevented _actual_aio_shutdown from being run. Before the above finalization check was added, this side effect was sometimes misused to avoid a deadlock on finalization (#22365).
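
A toy model of the refcount side effect described above. It assumes a plain counter that is incremented on init and decremented on shutdown; the `AioRuntime` class and `run_program` helper are hypothetical and only mirror the behavior in the note, not the actual grpc internals.

```python
class AioRuntime:
    """Toy refcounted runtime: the real shutdown runs when the count hits 0."""

    def __init__(self):
        self.refcount = 0
        self.actual_shutdowns = 0

    def init(self):
        # Stands in for init_grpc_aio.
        self.refcount += 1

    def shutdown(self):
        # Stands in for shutdown_grpc_aio without the finalization check.
        self.refcount -= 1
        if self.refcount == 0:
            self.actual_shutdowns += 1


def run_program(extra_init):
    """Return how many times the real shutdown ran for one grpc object."""
    runtime = AioRuntime()
    if extra_init:
        runtime.init()  # manual call made before any grpc object exists
    runtime.init()      # a grpc object is created
    runtime.shutdown()  # the grpc object is destroyed
    return runtime.actual_shutdowns
```

Without the extra call the count returns to zero and the real shutdown runs once; with it the count stays at one and the shutdown never runs, which is the workaround the note refers to.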

This PR:

  - Fixes #39520
  - Fixes #22365
  - Fixes #38679
  - Fixes #33342
  - Fixes #36655

@sergiitk sergiitk self-assigned this Aug 14, 2025
@sergiitk sergiitk added release notes: yes lang/Python and removed lang/Python labels Aug 14, 2025
@sergiitk sergiitk added this to the Python 3.14 support milestone Aug 19, 2025
@sergiitk
Member Author

sergiitk commented Sep 8, 2025

FYI @parthea

Contributor

@sreenithi sreenithi left a comment

LGTM

@copybara-service copybara-service bot closed this in 9e2283d Sep 9, 2025
@sergiitk sergiitk deleted the fix/aio/shutdown branch September 9, 2025 17:45
sergiitk added a commit to sergiitk/grpc that referenced this pull request Sep 9, 2025
…rpc#40447)

This PR changes the logic of `shutdown_grpc_aio` to skip `_actual_aio_shutdown` if the Python interpreter is already [being finalized](https://docs.python.org/3.14/glossary.html#term-interpreter-shutdown) (cleaning up resources, destroying objects, preparing for program exit, etc.). `_actual_aio_shutdown` involves shutting down the `PollerCompletionQueue`, followed by a call to the core [`grpc_shutdown`](https://grpc.github.io/grpc/core/grpc_8h.html#a35f55253e80714c17f4f3a0657e06f1b) API.

Reasoning:
1. During finalization, in some cases the resources we're accessing may already be freed, and the order is not deterministic. Some of the resources that can be unloaded prior to the `_actual_aio_shutdown` call: `_global_aio_state`, the `AsyncIOEngine` enum, or even Python libraries like `sys`. This leads to errors like `AttributeError: 'NoneType' object has no attribute 'POLLER'`.
2. `PollerCompletionQueue.shutdown()` will try to wait for its poller thread to finish gracefully. In py3.14, `PythonFinalizationError` is raised when `Thread.join()` is called during finalization. I think the logic here is similar to (1): these threads may have already been deallocated.

Note that in some cases users were able to prevent `_actual_aio_shutdown` from being called by manually calling `init_grpc_aio` prior to initializing any grpc objects. This resulted in an incorrect positive refcount, which prevented `_actual_aio_shutdown` from being run. Before the above finalization check was added, this side effect was sometimes misused to avoid a deadlock on finalization (grpc#22365).

This PR:
- Fixes grpc#39520
- Fixes grpc#22365
- Fixes grpc#38679
- Fixes grpc#33342
- Fixes grpc#36655

Closes grpc#40447

COPYBARA_INTEGRATE_REVIEW=grpc#40447 from sergiitk:fix/aio/shutdown 11114f6
PiperOrigin-RevId: 804971756
asheshvidyut pushed a commit to asheshvidyut/grpc that referenced this pull request Sep 12, 2025
…rpc#40447)

sergiitk added a commit to sergiitk/grpc that referenced this pull request Sep 16, 2025
…rpc#40447)

sergiitk added a commit that referenced this pull request Sep 17, 2025
…eter is finalizing (#40649)

Backport of #40447 to v1.75.x.
---
This PR changes the logic of `shutdown_grpc_aio` to skip
`_actual_aio_shutdown` if the Python interpreter is already [being
finalized](https://docs.python.org/3.14/glossary.html#term-interpreter-shutdown)
(cleaning up resources, destroying objects, preparing for program exit,
etc.). `_actual_aio_shutdown` involves shutting down the
`PollerCompletionQueue`, followed by a call to the core
[`grpc_shutdown`](https://grpc.github.io/grpc/core/grpc_8h.html#a35f55253e80714c17f4f3a0657e06f1b)
API.

Reasoning:
1. During finalization, in some cases the resources we're accessing may
already be freed, and the order is not deterministic. Some of the
resources that can be unloaded prior to the `_actual_aio_shutdown` call:
`_global_aio_state`, the `AsyncIOEngine` enum, or even Python libraries
like `sys`. This leads to errors like `AttributeError: 'NoneType' object
has no attribute 'POLLER'`.
2. `PollerCompletionQueue.shutdown()` will try to wait for its poller
thread to finish gracefully. In py3.14, `PythonFinalizationError` is
raised when `Thread.join()` is called during finalization. I think the
logic here is similar to (1): these threads may have already been
deallocated.

Note that in some cases users were able to prevent
`_actual_aio_shutdown` from being called by manually calling
`init_grpc_aio` prior to initializing any grpc objects. This resulted in
an incorrect positive refcount, which prevented `_actual_aio_shutdown`
from being run. Before the above finalization check was added, this
side effect was sometimes misused to avoid a deadlock on finalization
(#22365).

This PR:
- Fixes #39520
- Fixes #22365
- Fixes #38679
- Fixes #33342
- Fixes #36655
@anniefrchz anniefrchz mentioned this pull request Oct 3, 2025