Clean up the leftover connection for finished threads in pinned thread mode#471
Conversation
|
cc @bartdag I badly wanted to make this fix. I am happy that I am finally able to submit this PR. |
|
@HyukjinKwon awesome work to fix this! |
2060594 to
c093d12
Compare
c093d12 to
a05214e
Compare
a05214e to
4b87d93
Compare
|
it should be ready for a look now. |
bartdag
left a comment
There was a problem hiding this comment.
Awesome work. It took me a while to understand what was causing the issue in the first place.
I made a few suggestions, but I do not feel strongly towards them. Feel free to take what you want from them!
Thinking about the root cause, in the future, we may want to add a method to allow threads to explicitly tell when their work is done. With the default gateway approach, you can always call GatewayClient.close() to close all connections, but this does not make sense in the pinned thread model (you want to close your connection, not all connections).
That being said, your solution is the only sane way to stay backward compatible, so kudos!!
4b87d93 to
9e6525b
Compare
9e6525b to
ed5b97d
Compare
|
Merged! |
### What changes were proposed in this pull request? This PR upgrade Py4J 0.10.9.4, with relevant documentation changes. ### Why are the changes needed? Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled - it's enabled by default in PySpark at 41af409. We worked around this by enforcing users to use `InheritableThread` or `inhteritable_thread_target` as a workaround. After upgrading, we don't need to enforce users anymore because it automatically cleans up, see also py4j/py4j#471 ### Does this PR introduce _any_ user-facing change? Yes, users don't have to use `InheritableThread` or `inhteritable_thread_target` to avoid resource leaking problem anymore. ### How was this patch tested? CI in this PR should test it out. Closes #35871 from HyukjinKwon/SPARK-38563. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 8193b40) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? This PR upgrade Py4J 0.10.9.4, with relevant documentation changes. ### Why are the changes needed? Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled - it's enabled by default in PySpark at 41af409. We worked around this by enforcing users to use `InheritableThread` or `inhteritable_thread_target` as a workaround. After upgrading, we don't need to enforce users anymore because it automatically cleans up, see also py4j/py4j#471 ### Does this PR introduce _any_ user-facing change? Yes, users don't have to use `InheritableThread` or `inhteritable_thread_target` to avoid resource leaking problem anymore. ### How was this patch tested? CI in this PR should test it out. Closes #35871 from HyukjinKwon/SPARK-38563. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
This PR upgrade Py4J 0.10.9.4, with relevant documentation changes. Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled - it's enabled by default in PySpark at 41af409. We worked around this by enforcing users to use `InheritableThread` or `inhteritable_thread_target` as a workaround. After upgrading, we don't need to enforce users anymore because it automatically cleans up, see also py4j/py4j#471 Yes, users don't have to use `InheritableThread` or `inhteritable_thread_target` to avoid resource leaking problem anymore. CI in this PR should test it out. Closes #35871 from HyukjinKwon/SPARK-38563. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 8193b40) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What is the problem?
Correctly, there is resource leak when using the pinned thread mode (see also apache/spark#24898).
For example, if you repeat the codes below multiple times to create Py4J connections in multiple threads,
the number of leftover connections grows:
In the environment where multiple threads are used without a pool, it easily causes "Too many files open" due to the lack of file descriptors (as they are all occupied by unclosed sockets).
How do you fix?
This PR adds another variable to thread local that cleans up the connection right before the thread is finished. We need it as a separate thread local because
connectionis NOT cleaned because the reference is being held atJavaClient.deque.See also 50fe45e for more details.