Fix rclone subprocess handling#2855
Merged
MichaelEischer merged 6 commits intorestic:masterfrom Jul 26, 2020
Merged
Conversation
7 tasks
greatroar
reviewed
Jul 25, 2020
greatroar
reviewed
Jul 25, 2020
Contributor
I'm not familiar with HTTP2, but a quick glance at RFC 7540 suggests that it may still send control messages when the connection is in half-open state. Maybe it just acknowledges the close? In any case, tests pass on my machine. |
restic did not notice when the rclone subprocess exited unexpectedly. restic manually created pipes for stdin and stdout and used these for the connection to the rclone subprocess. The process creating a pipe gets file descriptors for the sender and receiver side of a pipe and passes them on to the subprocess. The expected behavior would be that reads or writes in the parent process fail / return once the child process dies as a pipe would now just have a reader or writer but not both. However, this never happened as restic kept the reader and writer file descriptors of the pipes. `cmd.StdinPipe` and `cmd.StdoutPipe` close the subprocess side of pipes once the child process was started and close the parent process side of pipes once wait has finished. We can't use these functions as we need access to the raw `os.File` so just replicate that behavior.
Calling `Close()` on the rclone backend sometimes failed during test execution with 'signal: Broken pipe'. The stdio connection closed both the stdin and stdout file descriptors at the same moment, therefore giving rclone no chance to properly send any final http2 data frames. Now the stdin connection to rclone is closed first and will only be forcefully closed after a timeout. In case rclone exits before the timeout then the stdio connection will be closed normally.
b0bd9c6 to
01b9581
Compare
Member
Author
The usual behavior seems to be to explicitly close the substreams, but I haven't seen commands to close the whole connection. At least nothing shows up when logging reads and writes on the stream. |
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What is the purpose of this change? What does it change?
restic currently does not detect when the rclone subprocess exits and hangs forever (see #2381 and #2797). We also see test failure from time to time which fail with "signal: broken pipe" while closing the rclone backend.
This PR is partially inspired by #2797 and reuses the new test.
restic currently holds on to file descriptors for both the read and write side of the stdin and stdout pipes to the rclone subprocess. That prevents the pipes from closing properly once rclone stops and is also the root cause of not noticing the subprocess exit. Calling
stdioConn.Close()once the child process has exited is still necessary to free the file descriptors.Failing rclone tests seems to be caused because for some reason rclone dies due to a SIGPIPE error which is then returned by
cmd.Wait()and gets printed assignal: broken pipe. I have no idea what data rclone writes to the stdio_conn whenCloseis called, maybe some leftover data on the HTTP2 connection? The fix for this is to give rclone time to shut down after stdin was closed, such that stdout is not closed while rclone still write to it.Was the change discussed in an issue or in the forum before?
fixes #2381
replaces #2797
Checklist
changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits