Skip to content

add --stdin-from-command flag to backup command#4254

Closed
sebhoss wants to merge 5 commits intorestic:masterfrom
sebhoss:issue-4251
Closed

add --stdin-from-command flag to backup command#4254
sebhoss wants to merge 5 commits intorestic:masterfrom
sebhoss:issue-4251

Conversation

@sebhoss
Copy link
Copy Markdown
Contributor

@sebhoss sebhoss commented Mar 16, 2023

What does this PR change? What problem does it solve?

This fixes #4251 by providing the requested --stdin-from-command flag to the restic backup command. It skips snapshot creation in case the given commands fails as determined by exec.Cmd.Wait() and returns the stderr of the given command as an error message back to the user.

I have not written any tests, nor documentation because I wanted to get some feedback about the path taken here wrt. upstream inclusion first. I am seeing various test failures in existing tests but these seem to be unrelated and might just occur because I'm building/testing the project within a container. Using these changes locally does look promising though:

$ ./restic backup --stdin-from-command --repo ./backup -- pg_dump -h foobar
enter password for repository: 
repository 55e7d5a0 opened (version 2, compression level auto)
using parent snapshot cc6e1dec
error: read /stdin: no data read
Fatal: unable to save snapshot: pg_dump: error: connection to database "root" failed: could not translate host name "foobar" to address: No address associated with hostname

Was the change previously discussed in an issue or on the forum?

Closes #4251

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

sebhoss added 5 commits March 16, 2023 07:41
This new flag is added to the backup subcommand in order to allow restic to control the execution of a command and determine whether to save a snapshot if the given command succeeds.

Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
In order to run with --stdin-from-command we need to short-circuit some functions similar to how it is handled for the --stdin flag. The only difference here is that --stdin-from-command actually expects that len(args) should be greater 0 whereas --stdin does not expect any args at all.

Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
It acts similar to --stdin but reads its data from the stdout of the given command instead of os.Stdin.

Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
In order to determine whether to save a snapshot, we need to capture the exit code returned by a command. In order to provide a nice error message, we supply stderr as well.

Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
Return with an error containing the stderr of the given command in case it fails. No new snapshot will be created and future prune operations on the repository will remove the unreferenced data.

Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
@Enrico204
Copy link
Copy Markdown
Contributor

LGTM! I think that this option should be strongly suggested in the documentation in place of the current "stdin procedure".

Maybe we want to unset some sensible environment variables? (e.g., RESTIC_PASSWORD)

@sebhoss
Copy link
Copy Markdown
Contributor Author

sebhoss commented Mar 29, 2023

Thanks for the review!

That's a good point about environment variables - I completely missed that requirement from the original ticket and thus the current code executes without any environment variables. However inheriting them and unsetting some sensible ones makes sense to me!

@Enrico204
Copy link
Copy Markdown
Contributor

Enrico204 commented Mar 29, 2023

AFAIK, exec.CommandContext is creating a exec.Cmd with the default value (nil) in the .Env field. According to the documentation, a nil value in .Env will instruct Go to inherit the current environment set. So, the current code is already passing all env to the child process (which seems the same behavior as Borg's).

A very rough way might be to remove all prefixed variables indicated in the documentation:

diff --git a/cmd/restic/cmd_backup.go b/cmd/restic/cmd_backup.go
index ee355bb42..2068d5e6b 100644
--- a/cmd/restic/cmd_backup.go
+++ b/cmd/restic/cmd_backup.go
@@ -608,6 +608,20 @@ func runBackup(ctx context.Context, opts BackupOptions, gopts GlobalOptions, ter
                        if err != nil {
                                return err
                        }
+
+                       for _, v := range os.Environ() {
+                               if !strings.HasPrefix(v, "RESTIC_") &&
+                                       !strings.HasPrefix(v, "AWS_") &&
+                                       !strings.HasPrefix(v, "ST_") &&
+                                       !strings.HasPrefix(v, "OS_") &&
+                                       !strings.HasPrefix(v, "B2_") &&
+                                       !strings.HasPrefix(v, "AZURE_") &&
+                                       !strings.HasPrefix(v, "GOOGLE_") {
+
+                                       command.Env = append(command.Env, v)
+                               }
+                       }
+
                        if err := command.Start(); err != nil {
                                return err
                        }

Note: I haven't tested this code.

Other options include:

  • resetting the environment variables list to a safe default; or
  • remove specific variables (e.g. RESTIC_PASSWORD), leaving others untouched

I'd argue that the the first option is too restrictive, and the second is too much maintenance burden (maintainers/developers need to remember to add new sensitive variables in the list). So, resetting variables by their prefix may be a tradeoff.

What do you think?

@sebhoss
Copy link
Copy Markdown
Contributor Author

sebhoss commented Mar 29, 2023

Ah nice, I guess I misread the docs - thanks again!

Thinking about unsetting variables a bit more I'm concerned that it might be too magical for users since they cannot influence it directly. So maybe another option like --inherit-env=true|false would make sense? Using this flag, users could control wheher the command they are executing inherits all or none of the availble env variables.

Edit: Or maybe something like --filter-env='RESTIC_*' to remove specific variables and --filter-env='*' to remove them all?

@Enrico204
Copy link
Copy Markdown
Contributor

While I agree that it might seem magical, I think too many knobs are equally bad. I don't see any use case for RESTIC_* variables to be passed to the child program, and, in any case, one can still "restore" variables for the child command using env. For example:

$ restic backup \
   --stdin-from-command \
   --repo ./backup \
   -- env RESTIC_REPOSITORY=$RESTIC_REPOSITORY pg_dump -h foobar

So, if a variable is removed, it can be re-added explicitly (it's safe - a pattern might match more/fewer variables tomorrow).

This needs to be clearly documented, of course :-)

@sebhoss
Copy link
Copy Markdown
Contributor Author

sebhoss commented Mar 30, 2023

Yeah I agree that too many knobs do not feel that good either. I think in the end I would probably just leave this as-is given that 1) borg seems to be fine with that 2) the current --stdin procedure works like that as well and 3) users can overwrite/unset variables as you have shown in your last snippet anyway.

@Enrico204
Copy link
Copy Markdown
Contributor

I am testing your branch for some real world backup in my personal/lab. For now, everything is working as expected :-)

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of adding a --stdin-from-command option (or maybe --content-from-command). Implementation-wise, it shouldn't be necessary to modify the archiver at all; the fs.FS interface should be generic enough to abstract away everything.

}

if len(args) > 0 {
if len(args) > 0 && opts.StdinCommand == false {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use && !opts.StdinCommand

}

if opts.Command != nil {
errBytes, _ := io.ReadAll(opts.CommandStderr)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will deadlock if a process writes too much data to stderr. Please handle stderr similar as in the rclone/sftp backend.

Hostname: opts.Host,
ParentSnapshot: parentSnapshot,
Command: command,
CommandStderr: stderr,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to modify the archiver. Instead, the Close() error part of the ReadCloser of fs.Reader should wait for the command to exit and return an error if necessary.

cmdErr := opts.Command.Wait()
if cmdErr != nil {
debug.Log("error while executing command: %v", cmdErr)
return nil, restic.ID{}, errors.New(string(errBytes))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmdErr should not be dropped. It should at least be part of the returned error message.

filename := path.Join("/", opts.StdinFilename)
var closer io.ReadCloser
if opts.StdinCommand {
command = exec.CommandContext(ctx, args[0], args[1:]...)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move as much of the code that sets up the stdinCommand as possible out of runBackup. The latter function is already far too long.

f.StringVar(&backupOptions.ExcludeLargerThan, "exclude-larger-than", "", "max `size` of the files to be backed up (allowed suffixes: k/K, m/M, g/G, t/T)")
f.BoolVar(&backupOptions.Stdin, "stdin", false, "read backup from stdin")
f.StringVar(&backupOptions.StdinFilename, "stdin-filename", "stdin", "`filename` to use when reading from stdin")
f.BoolVar(&backupOptions.StdinCommand, "stdin-from-command", false, "execute command and store its stdout")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether --content-from-command (as in Borg) would be a better name or not. stdin-from-command somewhat contradicts its own description...

@sebhoss
Copy link
Copy Markdown
Contributor Author

sebhoss commented Apr 8, 2023

Implementation-wise, it shouldn't be necessary to modify the archiver at all; the fs.FS interface should be generic enough to abstract away everything.

Ok thanks for the pointer - that was exactly what I was looking for! So the idea here would be to :

  1. create an implementation of fs.FS that handles command execution and related errors
  2. refactor the code in runBackup to create an instance of that implementation
  3. pass it in as the targetFS to the existing archiver methods

Is that correct? Should that fs.FS implementation live in the same fs package? Is this and that how you would like to handle stderr output?

@MichaelEischer
Copy link
Copy Markdown
Member

Ok thanks for the pointer - that was exactly what I was looking for! So the idea here would be to :

1. create an implementation of `fs.FS` that handles command execution and related errors

Judging from https://github.com/restic/restic/pull/4254/files#diff-30d8672eafa9c0585766b6f437530377563cd1aac5606674c366f3ede4abea9eR618 an fs.FS implementation that wraps fs.Reader might work here.

2. refactor the code in `runBackup` to create an instance of that implementation

3. pass it in as the `targetFS` to the existing archiver methods

Yes.

Should that fs.FS implementation live in the same fs package? Is this and that how you would like to handle stderr output?

The fs package is probably the best place for now. And yep, I was referring to the goroutine that reads from stderr of the subprocess.

@MichaelEischer
Copy link
Copy Markdown
Member

@sebhoss Are you still interested in working on this PR?

@sebhoss
Copy link
Copy Markdown
Contributor Author

sebhoss commented Jul 10, 2023

Yes & no - we don't have any immediate pressure at work to get this in anymore, so I put this on my ever growing todo list. I still think that this would be a nice addition but I'm not sure when I'll be able to finish this. I'm fine with closing this PR in case you are doing some summer cleanups or re-assigning this to someone else(?)

@Enrico204
Copy link
Copy Markdown
Contributor

I can take over this PR. I am using the code here on a daily basis at home for backups, and I will be happy to have it merged :-)

However, don't I think I will be able to continue in this PR as the branch is from @sebhoss's repo, so maybe I can fork & create a new pull request with the same code, and then apply the suggestions from @MichaelEischer. What do you think?

@MichaelEischer
Copy link
Copy Markdown
Member

@Enrico204 Sounds good. Go ahead and create a new PR, I'll close the old PR afterwards.

@MichaelEischer
Copy link
Copy Markdown
Member

Superseded by #4410.

Enrico204 added a commit to Enrico204/restic that referenced this pull request Aug 27, 2023
The code has been refactored so that the archiver is back to the original code, and the stderr is handled using a go routine to avoid deadlock.
Enrico204 added a commit to Enrico204/restic that referenced this pull request Sep 16, 2023
The code has been refactored so that the archiver is back to the original code, and the stderr is handled using a go routine to avoid deadlock.
MichaelEischer pushed a commit to Enrico204/restic that referenced this pull request Oct 1, 2023
The code has been refactored so that the archiver is back to the original code, and the stderr is handled using a go routine to avoid deadlock.
Enrico204 added a commit to Enrico204/restic that referenced this pull request Oct 16, 2023
The code has been refactored so that the archiver is back to the original code, and the stderr is handled using a go routine to avoid deadlock.
MichaelEischer pushed a commit to Enrico204/restic that referenced this pull request Oct 27, 2023
The code has been refactored so that the archiver is back to the original code, and the stderr is handled using a go routine to avoid deadlock.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Backup flag to execute commands with restic, read content from standard input and consider return status

3 participants