feat(backends/s3): add warmup support before repacks and restores #5173

Merged
MichaelEischer merged 9 commits into restic:master from gilbsgilbs:warmup
Feb 1, 2025

Conversation

Contributor

@gilbsgilbs commented on Dec 7, 2024

What does this PR change? What problem does it solve?

This commit introduces basic support for transitioning pack files stored in cold storage to hot storage on S3 and S3-compatible providers.

To prevent unexpected behavior for existing users, the feature is gated behind new flags:

  • s3.enable-restore: opt-in flag (defaults to false)
  • s3.restore-days: number of days for the restored objects to remain in hot storage (defaults to 7)
  • s3.restore-timeout: maximum time to wait for a single restoration (defaults to 1 day)
  • s3.restore-tier: retrieval tier at which the restore will be processed (defaults to Standard)

As restoration times can be lengthy, this implementation preemptively restores selected packs to prevent repeated restore delays during downloads. This is slightly sub-optimal, as packs could be processed out-of-order (as soon as they're transitioned), but that would add too much complexity for a marginal gain in speed.

To maintain simplicity and prevent resource exhaustion with lots of packs, no new concurrency mechanisms or goroutines were added. This just hooks gracefully into the existing routines.

Limitations:

  • Tests against the backend were not written due to the lack of cold storage class support in MinIO. Testing was done manually on Scaleway's S3-compatible object storage and AWS S3. If necessary, automatic testing with LocalStack or mocks can be explored, though this requires further discussion.
  • Currently, this feature only warms up before restores and repacks (prune/copy), as those are the two main use-cases I came across. Support for other commands may be added in future iterations, as long as affected packs can be calculated in advance.
  • The feature is gated behind a new alpha s3-restore feature flag to make it explicit that the feature is still wet behind the ears.
  • There is no explicit user notification for ongoing pack restorations. While I think it is not necessary because of the opt-in flag, showing some notice may improve usability (but would probably require major refactoring of the progress bar, which I didn't want to start). Another possibility would be to add a flag to send restore requests and fail early.

Was the change previously discussed in an issue or on the forum?

Checklist

  • I have added tests for all code changes.
    • see "Limitations" section above: tests have only been written for warmups, not for actual S3-backend restores, because the MinIO server doesn't support cold storage classes. I'm happy to discuss the best way to write tests for that without adding too much boilerplate (e.g. introduce LocalStack or find a proper way to mock this).
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I'm done! This pull request is ready for review.

@gilbsgilbs force-pushed the warmup branch 2 times, most recently from 7108753 to 84c7179 on December 7, 2024 16:31
@gilbsgilbs force-pushed the warmup branch 7 times, most recently from 4948809 to aeeecf2 on December 8, 2024 18:03
Member

@MichaelEischer left a comment


The whole approach still feels somewhat experimental, so I think we should move the code behind an alpha feature flag.

In general, automatically warming up and waiting for the required pack files looks like a solid approach. I prefer that variant compared to having --warmup flags that require the user to guess when the warmup is complete (and in case of prune won't interact well with concurrent modifications of the repository). However, to further progress cold storage support, we'll need at least some feedback for the user to indicate that restic is actually waiting for files to be warmed up in contrast to being stuck.

Having more supported commands would also be nice, but that can be changed later on while the feature is still in alpha state.

@gilbsgilbs
Contributor Author

Thanks @MichaelEischer for the feedback. I'm happy that you find this approach reasonable and that we can make cold storage support move forward in restic.

  • I think I've responded to your questions. This PR and the choices I made when writing it are a bit cold in my mind (pun not intended) and I'm dealing with other personal struggles, so sorry if I am not perfectly on point. I'll see if I can do anything pragmatic to close any debate on "slightly oriented" questions, but tell me if you see any blocker.
  • I either agree or don't have strong opinions on your other non-question comments.

So I'll rebase my branch and fix everything probably tomorrow, or by the end of next week.

However, to further progress cold storage support, we'll need at least some feedback for the user to indicate that restic is actually waiting for files to be warmed up in contrast to being stuck.

IIRC, this is more tedious than it seems because the current progress bar implementations are "swallowing" everything. I'll see what I can do (probably in another branch if you don't mind, so as to not pollute this PR with UI-related stuff).

@gilbsgilbs force-pushed the warmup branch 6 times, most recently from 1b30397 to 27dd732 on January 21, 2025 20:05
@gilbsgilbs
Contributor Author

I think I addressed all the feedback.

Regarding user feedback, there is some fragmentation in the current codebase. Repack can only use a progress.Counter (which is initialized differently in the copy command and in the prune command). Restore uses a dedicated restoreui module which has more features. So I just passed a "log" callback to Repack, and used the existing Warn function in restoreui to warn the user that a warmup is ongoing. I can't say I'm too proud of that, but I don't see a proper way that wouldn't involve major rework of how the UI works. Tell me if you have a better idea on how to implement this.

@gilbsgilbs force-pushed the warmup branch 2 times, most recently from 71f8c30 to 7226f2a on January 22, 2025 00:28
@gilbsgilbs
Contributor Author

I just removed the PacksWarmer struct, which makes the whole implementation much simpler by removing one layer of indirection. We can always add such a structure in the future if it's needed. Giving all handles to the backend at once may also allow optimizing the number of requests sent (which I have not implemented but is now theoretically possible).

This commit introduces basic support for transitioning pack files stored
in cold storage to hot storage on S3 and S3-compatible providers.

To prevent unexpected behavior for existing users, the feature is gated
behind new flags:

- `s3.enable-restore`: opt-in flag (defaults to false)
- `s3.restore-days`: number of days for the restored objects to remain
  in hot storage (defaults to `7`)
- `s3.restore-timeout`: maximum time to wait for a single restoration
  (defaults to `1 day`)
- `s3.restore-tier`: retrieval tier at which the restore will be
  processed (defaults to `Standard`)

As restoration times can be lengthy, this implementation preemptively
restores selected packs to prevent incessant restore-delays during
downloads. This is slightly sub-optimal as we could process packs
out-of-order (as soon as they're transitioned), but this would really
add too much complexity for a marginal gain in speed.

To maintain simplicity and prevent resource exhaustion with lots of
packs, no new concurrency mechanisms or goroutines were added. This just
hooks gracefully into the existing routines.

**Limitations:**

- Tests against the backend were not written due to the lack of cold
  storage class support in MinIO. Testing was done manually on
  Scaleway's S3-compatible object storage. If necessary, we could
  explore testing with LocalStack or mocks, though this requires further
  discussion.
- Currently, this feature only warms up before restores and repacks
  (prune/copy), as those are the two main use-cases I came across.
  Support for other commands may be added in future iterations, as long
  as affected packs can be calculated in advance.
- The feature is gated behind a new alpha `s3-restore` feature flag to
  make it explicit that the feature is still wet behind the ears.
- There is no explicit user notification for ongoing pack restorations.
  While I think it is not necessary because of the opt-in flag, showing
  some notice may improve usability (but would probably require major
  refactoring in the progress bar which I didn't want to start). Another
  possibility would be to add a flag to send restores requests and fail
  early.

See restic#3202
It's easier to handle multiple handles in the backend directly, and it
may open the door to reducing the number of requests made to the backend
in the future.
@gilbsgilbs force-pushed the warmup branch 3 times, most recently from a6f761c to f40ec01 on January 26, 2025 10:01
Member

@MichaelEischer left a comment


Thanks for the changes so far. I have a few more comments below. Regarding the progress information, let's stick to the simple log messages for now. I don't see a simple way to add proper progress bars without a lot of refactoring beforehand.

dstRepo restic.Repository,
packs restic.IDSet,
keepBlobs repackBlobSet,
p *progress.Counter,
Member


When using progress.Printer it would be possible to also show a progress.Counter. For prune that would pretty much work out of the box, but the copy command has to be converted first to use setupTermstatus().

But the restore command is a much tougher problem. I'm not yet sure what the proper approach would be there. So, the log message will have to be good enough for now.

Contributor Author

@gilbsgilbs commented on Jan 26, 2025


I just want to point out that a progress bar for restores would not add a lot of value, because all restores will succeed at approximately the same time. A progress bar would remain at "0%" for hours, and then it would suddenly wake up, not much different from this PR. AFAIK, AWS doesn't provide any ETA for restores either. So I don't think heavy work is justified for this feature.

Member


True, then let's wait and see what feedback arrives. The progress bar also shows the elapsed duration of the operation, which might be reassuring for users that restic isn't stuck entirely, but the value of that is questionable.

if restore.OngoingRestore {
return false, true
}
if !restore.ExpiryTime.IsZero() {
Member


Should we consider files that expire soon as in need of a new restore call? Otherwise files could expire when performing multiple restores with a bit of time in between.

Contributor Author

@gilbsgilbs commented on Jan 26, 2025


I wasn't sure if calling RestoreObject on an already-restored object would increase the expiry or just leave the object untouched, so I didn't want to write any explicit logic about that. I need to check if it is possible to do this.

Also, even if it worked, I am not sure how to define "expire soon". If it's < RestoreDays, then we may just call RestoreObject every time and not bother. If it's less, how much? < RestoreDays / 2? Sounds arbitrary. Letting the user configure this value sounds too specific.

Contributor Author


Just tested it and it seems to work. RestoreDays can be increased as well as reduced with subsequent calls to RestoreObject.

Given that the expiry date is rounded up to the next day (GMT) by AWS, I think it is safe to call RestoreObject again if the current expiry date is set before now + RestoreDays. This guarantees RestoreDays is always respected (at least on AWS), and still avoids calling RestoreObject for nothing.

ca5f6d5

Member

@MichaelEischer left a comment


LGTM. Thanks for the quick changes! I've added a tiny fixup commit to fix a typo and correctly use feature.TestSetFlag().


@MichaelEischer enabled auto-merge (squash) on January 29, 2025 21:35
@MichaelEischer merged commit 536ebef into restic:master on Feb 1, 2025
wplapper pushed a commit to wplapper/restic that referenced this pull request Feb 4, 2025
…stic#5173)

* feat(backends/s3): add warmup support before repacks and restores


* ui: warn user when files are warming up from cold storage

* refactor: remove the PacksWarmer struct

@gilbsgilbs
Contributor Author

I somehow didn't notice this was merged. Thank you very much for your time, @MichaelEischer. I'll follow up with the missing commands in a few days or weeks.

gilbsgilbs added a commit to gilbsgilbs/restic that referenced this pull request Feb 13, 2025
gilbsgilbs added a commit to gilbsgilbs/restic that referenced this pull request Feb 13, 2025
gilbsgilbs added a commit to gilbsgilbs/restic that referenced this pull request Feb 18, 2025
gilbsgilbs added a commit to gilbsgilbs/restic that referenced this pull request Feb 18, 2025
gilbsgilbs added a commit to gilbsgilbs/restic that referenced this pull request Jun 28, 2025