Smaller memory footprint for restic mount#2585

Closed
greatroar wants to merge 3 commits into restic:master from greatroar:mount-memory

Conversation

@greatroar
Contributor

What is the purpose of this change? What does it change?

restic mount used to cache the sizes of all blobs in the repo. By removing this cache, the memory footprint drops by 25%. To compensate for any loss of speed (which I didn't observe, but I only casually browsed through some old snapshots), the repo's index now allows concurrent access and its size lookup no longer needs to scan through all the packs.
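As a rough illustration of the concurrent-access idea (this is not restic's actual index code; the `blobIndex` type, its map-backed storage, and the string IDs below are simplified stand-ins), a size lookup guarded by a `sync.RWMutex` lets many FUSE requests read the index at the same time, so a separate per-mount size cache is no longer needed:

```go
package main

import (
	"fmt"
	"sync"
)

// blobIndex is a hypothetical, simplified stand-in for restic's
// in-memory repository index. Readers take only the read lock, so
// concurrent FUSE requests can look up sizes in parallel.
type blobIndex struct {
	mu    sync.RWMutex
	sizes map[string]uint // blob ID -> plaintext size
}

// LookupBlobSize returns the stored size without scanning packs.
func (idx *blobIndex) LookupBlobSize(id string) (uint, bool) {
	idx.mu.RLock()
	defer idx.mu.RUnlock()
	size, ok := idx.sizes[id]
	return size, ok
}

// Store records a blob's size under the write lock.
func (idx *blobIndex) Store(id string, size uint) {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	idx.sizes[id] = size
}

func main() {
	idx := &blobIndex{sizes: make(map[string]uint)}
	idx.Store("blob1", 4096)

	// Eight concurrent readers, as a FUSE mount might produce.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if size, ok := idx.LookupBlobSize("blob1"); !ok || size != 4096 {
				panic("lookup failed")
			}
		}()
	}
	wg.Wait()
	fmt.Println("concurrent lookups ok")
}
```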

Was the change discussed in an issue or in the forum before?

Not that I know of.

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

greatroar added 3 commits February 18, 2020 12:06
internal/repository.Repository.LookupBlobSize now allows concurrent
access. The fuse mountpoint may be slightly slower, but the memory use
of "restic mount" is down by >25% for a repo of 54M files.
@fbkopp

fbkopp commented Feb 18, 2020

Hello, could you please test the performance with large files (50 GB and up)? I previously had problems listing large files, which is why the file size cache was implemented.

@greatroar
Contributor Author

I have a snapshot that contains a single 58GB file. When I ls that file, I get instant results:

$ time ls
bigfile.tar.bz2

real	0m0,019s
user	0m0,003s
sys	0m0,000s

Is that enough of a test, or do you have a suggestion for further benchmarking?

@aawsome
Contributor

aawsome commented Feb 18, 2020

See also the discussion in #1680

About your PR:

  • I do agree about removing BlobSizeCache (if the index is fast enough to deliver the size).
  • Your index updates are covered (and improved considerably in terms of memory usage) by the change discussed in #2523 (Comparison of approaches to tackle index memory usage). My two cents: it doesn't help (much) to optimize here. Also, MasterIndex should be optimized to avoid looping over potentially many index files. That quickly leads to big changes in the index implementation, which is why I proposed a re-implementation.

About the test setting: testing a single large file is not enough, as mount performance is IMO mainly driven by listing dirs, i.e. loading tree blobs (of which you most likely have only one).
Index performance generally depends on the number of index entries and the number of index files. Both are comparatively small for a repo with just one big file (the average blob size should be around 8 MB).
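To make the MasterIndex point concrete (a minimal sketch, not restic's actual types; `subIndex`, `masterIndex`, and the string IDs are hypothetical), a master index that loops over one sub-index per index file makes every lookup O(number of index files), whereas a single merged map would answer in O(1):

```go
package main

import "fmt"

// subIndex is a hypothetical stand-in for the contents of one
// index file in the repository: blob ID -> size.
type subIndex map[string]uint

// masterIndex holds one subIndex per index file, mirroring the
// "loop over all index files" lookup pattern criticized above.
type masterIndex struct {
	indexes []subIndex
}

// lookup scans every sub-index until it finds the blob, so its
// cost grows linearly with the number of index files.
func (m *masterIndex) lookup(id string) (uint, bool) {
	for _, idx := range m.indexes {
		if size, ok := idx[id]; ok {
			return size, ok
		}
	}
	return 0, false
}

func main() {
	m := &masterIndex{indexes: []subIndex{
		{"a": 100},
		{"b": 200},
	}}
	size, ok := m.lookup("b") // found only after scanning the first sub-index
	fmt.Println(size, ok)
}
```

With many index files in a large repo, this per-lookup scan is what a merged-index re-implementation would avoid.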

@aawsome
Contributor

aawsome commented Feb 18, 2020

...when writing my last comment, I realized that I should have checked whether reading tree blobs is really the slow operation when listing files in FUSE. In fact, it was not...
So I made #2587, which should solve the main problem and should IMO really make caching blob sizes unnecessary, because it only reads the blob sizes when a file is read.

Thanks @greatroar for leading me to the right part to look at.
And sorry for proposing a "competing" PR... :-/

@greatroar
Contributor Author

I had seen #2523, but decided to do a more localized optimization for the short term. I'd missed #1680.

Re: performance on lots of small files, it's hard to get consistent measurements because the kernel/FUSE seems to be caching the responses from restic. The first ls of a dir of 10k small files takes seconds, the second 0.06s, both with master and with this PR. That also suggests that an LRU cache, which I considered as a potential further optimization, would be useless.

@aawsome
Contributor

aawsome commented Feb 25, 2020

@greatroar Could you test your example with 10k small files against #2587?
I think the first ls should already be very fast there, too...

@greatroar
Contributor Author

@aawsome Here's the test. Setup code:

export RESTIC_REPOSITORY=/tmp/testrepo
export RESTIC_PASSWORD=test

restic init
mkdir /tmp/testdata
for i in $(seq 10000); do
	touch /tmp/testdata/$i
done

restic backup /tmp/testdata

Then mount, and do

time ls $mountpoint/snapshots/latest/tmp/testdata

That gives, for master:

real	0m0,770s
user	0m0,034s
sys	0m0,102s

For this PR:

real	0m0,811s
user	0m0,036s
sys	0m0,119s

For #2587:

real	0m1,068s
user	0m0,051s
sys	0m0,153s

No big changes. Still, I'm closing in favor of #2587 because its changes are more localized.

@greatroar greatroar closed this Mar 3, 2020
@greatroar greatroar deleted the mount-memory branch June 17, 2020 11:17