
Remove blob size cache from restic mount #2787

Merged
MichaelEischer merged 1 commit into restic:master from greatroar:no-blobsize-cache on Jul 25, 2020

Conversation

@greatroar
Contributor

What is the purpose of this change? What does it change?

It removes the blob size cache from restic mount.

Peak RAM of restic mount followed by immediate umount goes down from 1.33GiB to 955.8MiB for a 1TB local repo, a 29.6% reduction.

This cache is designed to speed up listing of huge files. However, I don't see any performance problem when removing it: if I back up a 100GiB file of random data and run ls -l on it in a mountpoint, ls exits instantly, even on the first run with a fresh mount.

Was the change discussed in an issue or in the forum before?

The cache was added in #998 to solve #820.

#2587 tackles restic mount performance and also removes this cache, but seems stalled. In #2585, I removed the cache and suggested some optimizations that I no longer think are the right way to solve potential performance issues. It should be easy enough to merge/rebase #2587 with this patch.

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Contributor

@aawsome left a comment


This should definitely be merged!
If #2587 is too large to review soon, please merge this and I'll rebase #2587.

@MichaelEischer
Member

MichaelEischer commented Jun 16, 2020

What has changed since #820 / #998 such that restic no longer needs a blob size cache? The Index/MasterIndex structure back in restic v0.4.0 at first glance seems to be rather similar to the current state.

(Or could this have something to do with debug logging? #1549 also seems to have sped up index lookups a lot)

@greatroar
Contributor Author

@MichaelEischer I honestly have no idea. I've never been able to reproduce #820. In any case, with #2788 and/or #2789, there shouldn't be a performance issue left, at least not in the single-client case.

@aawsome
Contributor

aawsome commented Jun 17, 2020

@MichaelEischer I also don't have any idea why the blob size cache could significantly increase speed... The only issue I see is with a lot of index files, as `LookupBlobSize` needs to loop over all index files within the master index. There is also some small overhead in the `Lookup` it uses, but that will be removed with #2788 or #2789, as @greatroar correctly pointed out.

Moreover, #2587 will make restic mount much more responsive and only needs to call `LookupBlobSize` when the file is actually accessed.

So I suggest removing this memory-hungry cache and leaving further speed optimizations to #2587 or to a future PR that stores all loaded index information in one single data structure.

@MichaelEischer
Member

If the performance problem that was tackled using the blob size cache still exists, then #2587 only moves the incredibly slow operation from listing the directory to opening the file. That would improve performance a bit, but whether you have to wait 12 or 9 minutes (just taking the numbers from #820) isn't that much of a difference.

I'm actually able to reproduce the performance problem (tested on macOS, #820 was reported on Linux): I've mounted a backup containing a folder with a 100 GB VM image. The first call to ls -laR . in that folder takes roughly 5 seconds (it doesn't matter whether the debug log is enabled or not). The folder in #820 held roughly 900GB, and the optimizations in #1549 sped up lookups by a factor of 10-20. Taken together, this means the 5 seconds are in fact the scaled-down variant of the 12 minutes in #820.

I've noticed that the backup repository has 527 index files, which certainly doesn't help index performance, but that many index files is probably not uncommon.

@greatroar
Contributor Author

Does a second ls use the filesystem cache?

@aawsome
Contributor

aawsome commented Jun 18, 2020

I prepared #2799, which merges the in-memory index into one index data structure.
Once #2789 and #2799 are merged, I see no reason to keep an extra blob size cache.

@MichaelEischer
Member

@greatroar Please rebase this PR to the current master. The index implementation is probably fast enough by now to avoid a (big) performance regression.

@greatroar force-pushed the no-blobsize-cache branch from 301b927 to 07da61b on July 25, 2020 at 17:39
@greatroar
Contributor Author

Done!

Member

@MichaelEischer left a comment


LGTM. In my tests the performance seemed to be around two times slower than before, which should be fine, especially once #2587 is completed.

@MichaelEischer merged commit 020cab8 into restic:master on Jul 25, 2020
@greatroar deleted the no-blobsize-cache branch on September 21, 2020
