Conversation
I'm having trouble verifying what you're fixing here. Can you give more detail on the bug you're addressing? From what I can see, job containers use an |
thats a typo on my side, sorry ( Ultimately, that means waiting on a |
| } | ||
|
|
||
| // Terminate the IO if the copy does not complete in the requested time. | ||
| timeoutErrCh := make(chan error) |
There was a problem hiding this comment.
Instead of having a separate channel for timeout error, can we just have a separate error variable that will be set only in the <-t.C: block? That way we don't need to worry about closing the channel and the following code block:
if tErr := <-timeoutErrCh; ioErr == nil {
ioErr = tErr
}
can just be
if ioErr == nil {
ioErr = tErr
}
This approach seems easier to read/understand for me at least. What do you think?
There was a problem hiding this comment.
we need to block on timeoutErrCh regardless, so we need some mechanism to have Wait actually wait for the timeout
the alternatives (imo) would be to add a function that selects on <-allDoneCh and ctx.Done(), and pass it a context with the CopyAfterExitTimeout or something similar, which i would rather save for when we refactor Cmd wholesale to properly thread contexts everywhere
kevpar
left a comment
There was a problem hiding this comment.
Based on some testing I'm a little confused about the error handling behavior you're fixing.
I created a test that opens a connected Reader/Writer, then closes the Writer, and attempts a read from the Reader. I tried this in cases where they were backed by os.Pipe, and by a named pipe (created via winio). In both cases, the read returned io.EOF.
Do you recall some more context on what you were seeing?
If you run the host process IO tests from #1964 (specifically (test src) |
I dug into this a bit. I learned that After investigating, it seems you've actually found a bug in how job containers handle IO pipes :D So these test cases like The real problem here is that job containers are not following the contract expected by
In the case of job containers (or anything using The proper fix here is to not close the parent end of the pipes when the process exits, and let it be handled by |
|
Here's the PR for the fix I mentioned: #2021 |
|
#2021 has been merged. You should be unblocked now. |
Update `Cmd.Wait` to return a known error value if it times out waiting on IO copy after the command exits (and update `TestCmdStuckIo` to check for that error). Prior, the test checked for an `io.ErrClosedPipe`, which: 1. is not the best indicator that IO is stuck; and 2. is now ignored as an error value raised during IO relay. Update `stuckIOProcess` logic in `cmd_test.go` to mirror logic in `interal/exec.Exec`, using `os.Pipe` for std io that returns an `io.EOF` (instead of `io.Pipe`, which does not). Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Update `Cmd.Wait` to return a known error value if it times out waiting on IO copy after the command exits (and update `TestCmdStuckIo` to check for that error). Prior, the test checked for an `io.ErrClosedPipe`, which: 1. is not the best indicator that IO is stuck; and 2. is now ignored as an error value raised during IO relay. Update `stuckIOProcess` logic in `cmd_test.go` to mirror logic in `interal/exec.Exec`, using `os.Pipe` for std io that returns an `io.EOF` (instead of `io.Pipe`, which does not). Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Have
internal\cmd.Cmdignore relay and close errors from the underlying IO channel being closed, since not allio.Reader/io.Writers return anio.EOFif the are closed during an IO operation.This standardizes behavior between an
hcs/gcsProcess(which use ago-winio.win32Filefor their IO channels, and returnio.EOFfor read and write operations on a closed handle) andJobProcess(which uses anos.Pipethat instead return anos.ErrClosed).Additionally, ignore errors from closing an already-closed std IO channel.
Update
Cmd.Waitto return a known error value if it times out waiting on IO copy after the command exits (and updateTestCmdStuckIoto check for that error).Prior, the test checked for an
io.ErrClosedPipe, which: