Skip to content

S3: Don't archive metadata files on S3 Glacier#4584

Merged
MichaelEischer merged 2 commits intorestic:masterfrom
elkemper:fix-stop-archiving-metadata
Jan 20, 2024
Merged

S3: Don't archive metadata files on S3 Glacier#4584
MichaelEischer merged 2 commits intorestic:masterfrom
elkemper:fix-stop-archiving-metadata

Conversation

@elkemper
Copy link
Copy Markdown
Contributor

@elkemper elkemper commented Dec 5, 2023

What does this PR change? What problem does it solve?

It fixes issue when you upload snapshot to S3 Glacier, and metadata isn't available from any machine without cache.

It sets a fallback for the case where storage class should not be set - for metadata AND ONLY when specified tier will move files to archive. For the code simplicity it is done through the inverse logic operators: set storage class when file is data/ OR when storage class is not instant accessible (such as Standard etc. including 'Glacier Instant Retrieval').

Was the change previously discussed in an issue or on the forum?

Closes #4583

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

@MichaelEischer MichaelEischer changed the title Fix: Stop archiving metadata S3: Don't archive metadata files on S3 Glacier Dec 24, 2023
Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this PR is supposed to work. The directory metadata of a snapshot is also stored in pack files and consequently there's no chance for the backend to distinguish data and metadata.

@elkemper
Copy link
Copy Markdown
Contributor Author

Hey @MichaelEischer, thanks for picking the PR!

Here's my line of thought when I investigated this issue.
I figured, that because of everything in the repository - is just files then there should be one function, the sole responsibility of which is - save any file to the backend. Also, searching for the StorageClass argument, i decided, that Save function is what i'm looking for. And looking through the call queue - it appeared the last stand of Restic before handling the operation to the Minio dependency.
Knowing that - I looked to the arguments of the function and understood that h Handle could help me to determine what is being saved right now. And thanks to all of you, it had a parameter Type, which corresponds to the enum with all object types. So I thought that is safe to assume, that I can filter for the PackFile.

Well, after your question, I've done some more digging and went up on call stack and found that it used in all Flush functions - for data and for indexes. And handlers with datatype are passed down to it.

Also, as a proof of work - here's bunch of screenshots of S3 when I tried this code. (saving just one small text file)
Config (and keys) are stored in Standard because when I init-ed the repo I didn't use StorageClass option (not sure even will it have any effect thou). And you can mention the date of files created.
The Snapshots, Index and Data are created few minutes later, and you can see that they have different Storage Classes

Screenshots image imageimage image image image

And in opposite, here's my first attempts of saving to the Glacier, when I didn't know about issue.

Screenshots of my tries and errors image image image

Hope this helps.

@MichaelEischer
Copy link
Copy Markdown
Member

A pack file can contain either tree or data blobs. To allow a second host without a prefilled cache to use the repository for backups, the second host must also be able to retrieve the tree blobs. However, both tree and data blobs are stored in pack files. To see what I mean just try the following (while using your PR):

restic -o s3.storage-class=GLACIER backup some-small-folder
restic -o s3.storage-class=GLACIER backup --no-cache some-small-folder

The second backup run will use the snapshot from the first backup run as its parent snapshot. And it will try to download the corresponding tree blobs, which will fail.

For that to work a pack file must also not be archived if h.IsMetadata in h backend.Handle is true.

I'm not entirely sure how you intend the restore procedure to work. Do I have to retrieve the whole backup archive?

thus snapshot metadata could be accessed from machine without local cache.

https://github.com/restic/restic/issues/4583
https://github.com/restic/restic/issues/3202 No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not add full blown support for archive storage. As such it should be clear after reading that there is no official support for cold storages.

@elkemper
Copy link
Copy Markdown
Contributor Author

Thanks for the comments.
You are right @MichaelEischer, I didn't know this about packfiles. I tried to add to the test backup made with my first build and it couldn't get snapshot metadata.
After accepting your snippets I rechecked updating the backup as well as forget and prune and everything worked as expected.

As of how I'm expecting to use this backups - to make it the "1" from "3-2-1" backup strategy. I want a cheap storage for valuable media only in case of failures of local copies and raid setup. So yeah, if I will restore it, it will be a full restore.

@MichaelEischer MichaelEischer force-pushed the fix-stop-archiving-metadata branch from 6bc929a to ffbed06 Compare January 20, 2024 10:18
Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I've reworded the changelog and made a few additional minor cleanups.

@MichaelEischer MichaelEischer force-pushed the fix-stop-archiving-metadata branch from ffbed06 to a763a5c Compare January 20, 2024 10:25
@MichaelEischer MichaelEischer added this pull request to the merge queue Jan 20, 2024
Merged via the queue into restic:master with commit 6696195 Jan 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Backup to S3 Glacier archive triers make backups unusable/broken without cache

2 participants