-
Notifications
You must be signed in to change notification settings - Fork 269
document and maybe report deadlock in python 3.10's subprocess.Popen.communicate #3697
Description
The problem
Without --model-unblocked-syscall-latency, running the following python script (with python 3.10) ends up looping forever in shadow:
import subprocess as sp
p = sp.Popen(['sleep', '1'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
while True:
try:
p.communicate(timeout=0.01)
except sp.TimeoutExpired:
print("timeout")
continue
break
print("done")Debugging
We can see a single-step through the relevant python code by inserting a ptyon breakpoint() statement to break into pdb...
@@ -3,6 +3,7 @@ import subprocess as sp
p = sp.Popen(['sleep', '1'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
while True:
try:
+ breakpoint()
p.communicate(timeout=0.01)
except sp.TimeoutExpired:
print("timeout")...and piping in an infinite sequence of s (single-step) commands
$ shadow-exec -- bash -c 'yes s | python3 subprocess_test.py'
We get a lot of output, but I think the main line of interest is at https://github.com/python/cpython/blob/9524203deefb6d4ea6a502661f855961dee1af85/Lib/subprocess.py#L2013
if timeout is not None and timeout < 0:
Basically what happens is:
- The first time around, a python
selectstatement is executed with the original timeout of 0.01s. (this ends up being apollsyscall on my machine) - This returns exactly 0.01s later with no ready events
- The subprocess module calculates the remaining time from the original timeout and gets exactly 0
- In the line referenced above, since this means that it instead of raising a timeout exception, it ends up calling select again, now with a timeout of exactly 0
- The
pollsyscall returns immediately and we do the loop again
TODO
This can be worked around in shadow with --model-unblocked-syscall-latency
One might be able to weakly argue that this is a bug in Python... if one believes it's unsafe to assume the clock will at some point tick forward. Outside of shadow I guess that's generally a pretty safe assumption though. Still, maybe python can be "fixed" here if we can convince them that such a fix is at least unlikely to do harm.
TODO:
- check what other versions of python are affected, particularly whether this is still the behavior at HEAD
- maybe file an issue and/or send a PR to python to fix it
- add to shadow's compatibility notes