Skip to content

Simplify cache logic#2856

Merged
MichaelEischer merged 1 commit intorestic:masterfrom
aawsome:remove-readahead
Sep 4, 2021
Merged

Simplify cache logic#2856
MichaelEischer merged 1 commit intorestic:masterfrom
aawsome:remove-readahead

Conversation

@aawsome
Copy link
Copy Markdown
Contributor

@aawsome aawsome commented Jul 28, 2020

What does this PR change? What problem does it solve?

Simplifies the cache logic:

  • Actually a "ReadAhead" closure is defined which basically keeps the information which pack is a "treepack".
    This closure is used for one single purpose: If a pack file is loaded and not yet contained in the cache, it will be loaded and saved in the cache for future use.
  • Actually when saving tree packs, the same pack is saved again in the cache in packer_manager.

I changed the logic and added BlobType to the handle used to access the backend. This is in fact only used by the cache to determine whether a pack file from the backend should be cached.
As a side effect no tree packs need to be determined and stored within the index implementation, but instead mixed packs are saved.

Note that this PR does change the behavior of restic with respect to these points:

  • When a pack files is accessed without knowing what blob types are within, this file will not be cached if not already present in the cache. This is e.g. the case for pack.List which is used in some places.
  • Also tests are not modified and will sometimes not use the cache

Was the change discussed in an issue or in the forum before?

No.

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression that this PR merely shift the complexity from one place to another: now there's a Backend and a CachedBackend which requires using the right one to get tree packs cached.

I'd also like to keep the explicit tracking of tree packs. My suggestion would be to replace the PerformReadahead closure with a method to register Handles which should be cached. This would also fit in nicely with the other already existing methods for cache management.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jan 3, 2021

Thank you guys for your feedback! I agree that the new CachedBackend isn't a good idea; I'll change this.

My main intention in this was to change how restic determines which files are to be cached. The ReadAhead solution is not only unflexible but it also needs to save the list of "tree packs" in memory. This is then actually saved multiple times which I wanted to prevent. Moreover, the already-cached-files can also be easily determined by just looking at the cache itself, so there is in priciple no need to save those filenames again somewhere..

Also I find it way more natural to determine by its attributes whether a file is to be cached and not by saving its name somewhere. For all files except pack files this is the way the cache already works (just determining by the filetype wheter the file is to be cached). So I would say it is much more natural to extend this to all files - and this requires for pack files to give the cache not only the information about the filetype, but also about the blobtype.

So I propose to extend the function signatures in the Backend interface such that the blob type can also be given for Save and Load. The real backend implementation would not use it, but a backend that wraps a cache or implements #3202 in some way would use it. To still support mixed pack files, the InvalidBlobType could be given for that case.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jan 22, 2021

I changed the logic (and the description of this PR). Now the Handle structure is extended by the BlobType which is set when saving/loading pack files and knowing whether it is a data/tree/mixed pack. This information is then only used by the cache backend to store tree packs in the cache.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of annotating which type of blobs in contained in a pack file in general. However, we could (accidentally) end up with a few places in the code which no longer cache a file, as the burden to determine the file type now has moved to the caller and out of the cache backend. On the other hand it should be easier for the caller to know whether a file will be cached or not.

I've found a few more places in the code whose behavior is affected by the cache changes:

  • CreateIndexFromPacks in internal/repository/repository calls PackList which then has to read a pack file from the repository. In most cases there pack file is probably not yet part of the repository index and thus this shouldn't make much of a difference.
  • pack.List is called in packsToBlobs of cmd_find. This would no longer cache the pack file if it is a tree pack. Although, I'd expect that only a very small number of packs will be used (actually just one before #3427)
  • cat pack ... in cmd_cat would no longer cache a tree pack when dumping its content. But I guess that's not much of a problem.
  • printPacks in cmd_debug uses pack.List which would no longer be cached. But I guess that would be ok for a debug command.
  • And finally examinePack in cmd_debug uses LoadAll and be.Load which would no longer cache an examined tree pack. But I think the same reasoning as with the previous debug command applies.

So in summary I think these small behavior changes should nevertheless be ok. I only have one nitpick left before we could merge this, see the comment below.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Sep 3, 2021

@MichaelEischer Thanks a lot for your review and big excuse for the very late reply. I was heavily busy with finding a new home for my family and finally relocating. And wow, that was really a very busy time...
While not completely finished I'm starting to see light at the end of the tunnel now ;-)

I also rebased this PR to solve the merge conflicts. Hope it is now in a good shape to be merged.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@MichaelEischer Thanks a lot for your review and big excuse for the very late reply. I was heavily busy with finding a new home for my family and finally relocating. And wow, that was really a very busy time...
While not completely finished I'm starting to see light at the end of the tunnel now ;-)

No problem. That really sounds like a lot of work, and I guess it's always surprising how much stuff accumulates over the years.

@MichaelEischer MichaelEischer merged commit fa5ca8a into restic:master Sep 4, 2021
@aawsome aawsome deleted the remove-readahead branch February 24, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants