Skip to content

Conversation

@vstinner
Copy link
Member

@vstinner vstinner commented Jun 26, 2018

Fix a race condition in Popen of
multiprocessing.popen_spawn_win32. The child process now duplicates
rhandle instead of "stealing" it.

Previously, rhandle was "stolen" by the child process, but it leaked
a handle if the child process has been terminated before it "stole"
the handle (and closed it in the parent process).

https://bugs.python.org/issue33929

@vstinner vstinner requested review from applio and pitrou June 26, 2018 00:10
@vstinner vstinner changed the title multiprocessing: fix handle leak on race condition bpo-33929: multiprocessing: fix handle leak on race condition Jun 26, 2018
@vstinner
Copy link
Member Author

@pitrou: My first version was wrong, but I fixed it and now tests pass on AppVeyor and VSTS (Windows). Would you mind to review this change please? I'm not 100% sure that it's correct, but I'm now sure that the current code has a race condition :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this function still used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the public API, no? At least, it's exported in all on Windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this comment is useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first version of my patch tried to close sentinel as soon as the process completed. But it broke a test. Then I understood that it's public attribute, and so it cannot be closed here. For me it's non-obvious that close() cannot be closed "soon" (at least, not by Popen itself).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. The comment confuses me more than the situation it's supposed to comment :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/it/the read end of pipe/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/has been/had been/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"before it could steal the handle from the parent process"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit annoying. One argument is a process handle and the other is a pid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

steal_handle() also has a weird API, but I agree that it's not a good reason to continue that :-) I changed the parameter to source_process, and moved the pid => handle dance to the caller.

Copy link
Member Author

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased my PR to be able to rewrite the commit message and update properly the NEWS entry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

steal_handle() also has a weird API, but I agree that it's not a good reason to continue that :-) I changed the parameter to source_process, and moved the pid => handle dance to the caller.

bpo-33929: Fix a race condition in Popen of
multiprocessing.popen_spawn_win32. The child process now duplicates
the read end of pipe instead of "stealing" it.

Previously, the read end of pipe was "stolen" by the child process,
but it leaked a handle if the child process had been terminated
before it could steal the handle from the parent process.
@vstinner
Copy link
Member Author

@pitrou: I fixed all the issues that you reported. Would you mind to review my updated PR? Thanks.

By the way, I found a similar but different bug: https://bugs.python.org/issue33966 But it's a different code path, so I think that it's fine to fix it in a different PR.

@vstinner
Copy link
Member Author

I don't know. The comment confuses me more than the situation it's supposed to comment :-)

Ok, fair. I removed the comment.

@pitrou
Copy link
Member

pitrou commented Jun 27, 2018

Thank you for unconfusing me :-)

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vstinner !

@vstinner
Copy link
Member Author

Thank you for unconfusing me :-)

LOL

@vstinner vstinner merged commit 2cc9d21 into python:master Jun 27, 2018
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@vstinner vstinner deleted the mp_spawn_win32 branch June 27, 2018 09:40
@bedevere-bot
Copy link

GH-7960 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 27, 2018
…GH-7921)

Fix a race condition in Popen of
multiprocessing.popen_spawn_win32. The child process now duplicates
the read end of pipe instead of "stealing" it.

Previously, the read end of pipe was "stolen" by the child process,
but it leaked a handle if the child process had been terminated
before it could steal the handle from the parent process.
(cherry picked from commit 2cc9d21)

Co-authored-by: Victor Stinner <vstinner@redhat.com>
@miss-islington
Copy link
Contributor

Sorry, @vstinner, I could not cleanly backport this to 3.6 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2cc9d21fffb8146d30e6fb4221e32410ba4b4ab7 3.6

@bedevere-bot
Copy link

GH-7961 is a backport of this pull request to the 3.6 branch.

@pitrou
Copy link
Member

pitrou commented Jun 27, 2018

Victor, I'm skeptical about backporting this. It's not an important issue and we risk introducing regressions in a bugfix release. Let's make it a 3.8-only change?

@vstinner
Copy link
Member Author

Victor, I'm skeptical about backporting this. It's not an important issue and we risk introducing regressions in a bugfix release. Let's make it a 3.8-only change?

I'm trying to understand why multiprocessing tests are failing randomly but also frequently, for one year (if not longer, 3 years?). Fixing race conditions should reduce random tests failures, but I'm not sure that leaking open Windows handles can explain these failures.

About regression: IMHO the new code is less fragile and should not have no race condition. But I'm ok to not backport this change, except if someone proves that this change helps to make tests more stable :-)

miss-islington added a commit that referenced this pull request Jun 27, 2018
Fix a race condition in Popen of
multiprocessing.popen_spawn_win32. The child process now duplicates
the read end of pipe instead of "stealing" it.

Previously, the read end of pipe was "stolen" by the child process,
but it leaked a handle if the child process had been terminated
before it could steal the handle from the parent process.
(cherry picked from commit 2cc9d21)

Co-authored-by: Victor Stinner <vstinner@redhat.com>
@vstinner
Copy link
Member Author

I rejected my 3.6 backport: PR 7961.

But @miss-islington merged its 3.7 backport: PR #7960 :-(

@pitrou: Should I revert the backport to 3.7? Maybe it's fine for the 3.7.1, since it's newer?

@vstinner
Copy link
Member Author

@pitrou: Should I revert the backport to 3.7? Maybe it's fine for the 3.7.1, since it's newer?

Hum, since I already found a bug in my fix, I proposed a PR to revert the 3.7 backport: PR #7963.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants