Skip to content

Fix rclone subprocess handling#2855

Merged
MichaelEischer merged 6 commits intorestic:masterfrom
MichaelEischer:rclone-exit
Jul 26, 2020
Merged

Fix rclone subprocess handling#2855
MichaelEischer merged 6 commits intorestic:masterfrom
MichaelEischer:rclone-exit

Conversation

@MichaelEischer
Copy link
Copy Markdown
Member

What is the purpose of this change? What does it change?

restic currently does not detect when the rclone subprocess exits and hangs forever (see #2381 and #2797). We also see test failure from time to time which fail with "signal: broken pipe" while closing the rclone backend.

This PR is partially inspired by #2797 and reuses the new test.

restic currently holds on to file descriptors for both the read and write side of the stdin and stdout pipes to the rclone subprocess. That prevents the pipes from closing properly once rclone stops and is also the root cause of not noticing the subprocess exit. Calling stdioConn.Close() once the child process has exited is still necessary to free the file descriptors.

Failing rclone tests seems to be caused because for some reason rclone dies due to a SIGPIPE error which is then returned by cmd.Wait() and gets printed as signal: broken pipe. I have no idea what data rclone writes to the stdio_conn when Close is called, maybe some leftover data on the HTTP2 connection? The fix for this is to give rclone time to shut down after stdin was closed, such that stdout is not closed while rclone still write to it.

Was the change discussed in an issue or in the forum before?

fixes #2381
replaces #2797

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@greatroar
Copy link
Copy Markdown
Contributor

I have no idea what data rclone writes to the stdio_conn when Close is called, maybe some leftover data on the HTTP2 connection?

I'm not familiar with HTTP2, but a quick glance at RFC 7540 suggests that it may still send control messages when the connection is in half-open state. Maybe it just acknowledges the close?

In any case, tests pass on my machine.

greatroar and others added 5 commits July 26, 2020 00:28
restic did not notice when the rclone subprocess exited unexpectedly.

restic manually created pipes for stdin and stdout and used these for the
connection to the rclone subprocess. The process creating a pipe gets
file descriptors for the sender and receiver side of a pipe and passes
them on to the subprocess. The expected behavior would be that reads or
writes in the parent process fail / return once the child process dies
as a pipe would now just have a reader or writer but not both.

However, this never happened as restic kept the reader and writer
file descriptors of the pipes. `cmd.StdinPipe` and `cmd.StdoutPipe`
close the subprocess side of pipes once the child process was started
and close the parent process side of pipes once wait has finished. We
can't use these functions as we need access to the raw `os.File` so just
replicate that behavior.
Calling `Close()` on the rclone backend sometimes failed during test
execution with 'signal: Broken pipe'. The stdio connection closed both
the stdin and stdout file descriptors at the same moment, therefore
giving rclone no chance to properly send any final http2 data frames.

Now the stdin connection to rclone is closed first and will only be
forcefully closed after a timeout. In case rclone exits before the
timeout then the stdio connection will be closed normally.
@MichaelEischer
Copy link
Copy Markdown
Member Author

I'm not familiar with HTTP2, but a quick glance at RFC 7540 suggests that it may still send control messages when the connection is in half-open state. Maybe it just acknowledges the close?

The usual behavior seems to be to explicitly close the substreams, but I haven't seen commands to close the whole connection. At least nothing shows up when logging reads and writes on the stream.

Copy link
Copy Markdown
Member

@fd0 fd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! LGTM!

@MichaelEischer MichaelEischer merged commit be54cef into restic:master Jul 26, 2020
@MichaelEischer MichaelEischer deleted the rclone-exit branch July 26, 2020 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

restic does not detect rclone termination

3 participants