The problem
Without --model-unblocked-syscall-latency, running the following python script (with python 3.10) ends up looping forever in shadow:
import subprocess as sp
p = sp.Popen(['sleep', '1'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
while True:
try:
p.communicate(timeout=0.01)
except sp.TimeoutExpired:
print("timeout")
continue
break
print("done")
Debugging
We can see a single-step through the relevant python code by inserting a ptyon breakpoint() statement to break into pdb...
@@ -3,6 +3,7 @@ import subprocess as sp
p = sp.Popen(['sleep', '1'], stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
while True:
try:
+ breakpoint()
p.communicate(timeout=0.01)
except sp.TimeoutExpired:
print("timeout")
...and piping in an infinite sequence of s (single-step) commands
$ shadow-exec -- bash -c 'yes s | python3 subprocess_test.py'
We get a lot of output, but I think the main line of interest is at https://github.com/python/cpython/blob/9524203deefb6d4ea6a502661f855961dee1af85/Lib/subprocess.py#L2013
if timeout is not None and timeout < 0:
Basically what happens is:
- The first time around, a python
select statement is executed with the original timeout of 0.01s. (this ends up being a poll syscall on my machine)
- This returns exactly 0.01s later with no ready events
- The subprocess module calculates the remaining time from the original timeout and gets exactly 0
- In the line referenced above, since this means that it instead of raising a timeout exception, it ends up calling select again, now with a timeout of exactly 0
- The
poll syscall returns immediately and we do the loop again
TODO
This can be worked around in shadow with --model-unblocked-syscall-latency
One might be able to weakly argue that this is a bug in Python... if one believes it's unsafe to assume the clock will at some point tick forward. Outside of shadow I guess that's generally a pretty safe assumption though. Still, maybe python can be "fixed" here if we can convince them that such a fix is at least unlikely to do harm.
TODO:
- check what other versions of python are affected, particularly whether this is still the behavior at HEAD
- maybe file an issue and/or send a PR to python to fix it
- add to shadow's compatibility notes
The problem
Without
--model-unblocked-syscall-latency, running the following python script (with python 3.10) ends up looping forever in shadow:Debugging
We can see a single-step through the relevant python code by inserting a ptyon
breakpoint()statement to break intopdb......and piping in an infinite sequence of s (single-step) commands
We get a lot of output, but I think the main line of interest is at https://github.com/python/cpython/blob/9524203deefb6d4ea6a502661f855961dee1af85/Lib/subprocess.py#L2013
Basically what happens is:
selectstatement is executed with the original timeout of 0.01s. (this ends up being apollsyscall on my machine)pollsyscall returns immediately and we do the loop againTODO
This can be worked around in shadow with
--model-unblocked-syscall-latencyOne might be able to weakly argue that this is a bug in Python... if one believes it's unsafe to assume the clock will at some point tick forward. Outside of shadow I guess that's generally a pretty safe assumption though. Still, maybe python can be "fixed" here if we can convince them that such a fix is at least unlikely to do harm.
TODO: