
Fix quadratic file reading in restic mount #2790

Merged
MichaelEischer merged 3 commits into restic:master from greatroar:fix-quadratic-read on Jul 12, 2020

Conversation

@greatroar (Contributor) commented Jun 14, 2020

What is the purpose of this change? What does it change?

Doing a single read on a file in a restic mount takes time linear in the size of the file, since each read does a linear scan through an array of blob sizes, then another linear scan through an array of cached blob contents. That means that reading a whole file takes quadratic time O(n²/B), with B the block size.

This patch changes the linear scans to a binary search and a constant time operation. I measured the effect by backing up a 100GiB file of zeros (204800 blobs), then running cat inside a mount. The result is a more than 2× speedup. Before:

$ /usr/bin/time cat snapshots/latest/stdin >/dev/null 
2.45user 228.15system 1:07:33elapsed 5%CPU (0avgtext+0avgdata 2104maxresident)k
8inputs+0outputs (1major+132minor)pagefaults 0swaps

After:

$ /usr/bin/time cat snapshots/latest/stdin >/dev/null 
1.07user 138.08system 26:04.32elapsed 8%CPU (0avgtext+0avgdata 1960maxresident)k
0inputs+0outputs (0major+133minor)pagefaults 0swaps

That's half an hour wasted scanning arrays. Obviously, this gets much worse at terabyte scale.
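The idea behind the fix can be sketched as follows (illustrative Go with hypothetical names, not restic's actual code): precompute cumulative blob start offsets once, then locate the blob containing a given file offset with an O(log n) binary search instead of a linear scan.

```go
package main

import "fmt"

// cumOffsets turns per-blob sizes into cumulative start offsets:
// off[i] is the file offset where blob i starts, and the final
// entry is the total file size.
func cumOffsets(sizes []int) []int64 {
	off := make([]int64, len(sizes)+1)
	for i, s := range sizes {
		off[i+1] = off[i] + int64(s)
	}
	return off
}

// blobAt returns the index of the blob containing file offset pos.
// Loop invariant: off[lo] <= pos < off[hi].
func blobAt(off []int64, pos int64) int {
	lo, hi := 0, len(off)-1
	for hi-lo > 1 {
		mid := (lo + hi) / 2
		if off[mid] <= pos {
			lo = mid
		} else {
			hi = mid
		}
	}
	return lo
}

func main() {
	off := cumOffsets([]int{512, 1024, 256}) // starts: 0, 512, 1536; size 1792
	fmt.Println(blobAt(off, 0), blobAt(off, 511), blobAt(off, 512), blobAt(off, 1791))
	// prints: 0 0 1 2
}
```

With the cumulative offsets computed once per file, each read pays O(log n) for the lookup rather than O(n), which removes the quadratic behaviour.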

The cache semantics have changed, but IMHO the old semantics didn't make much sense. Reading blob i would flush blobs 0 through i-1 from the cache, so reading backwards would cause the whole file to get cached in RAM without any eviction, while reading forwards would only cache the last block, with a linear-time eviction every time.

Existing tests pass; they fail when I introduce off-by-one errors in the new block skipping code.

Was the change discussed in an issue or in the forum before?

#2587 optimizes reading performance, but misses the quadratic time issue.

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@MichaelEischer (Member) left a comment:

I like the usage of a binary search. It might be possible to speed this up even further by caching the result of the search.

The new cache implementation loses one useful property of the old one: if file blocks are read before and after a blob boundary, the old implementation ends up caching both blobs. As blobs are always read from the repository, this could cause a large slowdown. (I think PDF files trigger rather erratic disk access patterns.)

The proper solution here would be something like a global LRU cache that stores the last 50-100 MB of accessed blobs.

@greatroar force-pushed the fix-quadratic-read branch from e035196 to d868196 on June 17, 2020 10:29
@greatroar (Contributor, Author):

I added an LRU cache. Regardless of whether the kernel caches FUSE blocks, it can still be useful when comparing files across snapshots.
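A minimal LRU cache of the kind discussed here can be sketched with only the standard library (illustrative, hypothetical names; not the PR's actual implementation): a map for O(1) lookup plus a doubly linked list ordering entries by recency. Capacity is counted in blobs here for brevity.

```go
package main

import (
	"container/list"
	"fmt"
)

// blobCache is a minimal LRU cache mapping blob IDs to their contents.
type blobCache struct {
	maxEntries int
	order      *list.List // front = most recently used
	items      map[string]*list.Element
}

type cacheEntry struct {
	id   string
	data []byte
}

func newBlobCache(maxEntries int) *blobCache {
	return &blobCache{
		maxEntries: maxEntries,
		order:      list.New(),
		items:      make(map[string]*list.Element),
	}
}

// get returns the cached contents of blob id, marking it most recently used.
func (c *blobCache) get(id string) ([]byte, bool) {
	e, ok := c.items[id]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(e)
	return e.Value.(*cacheEntry).data, true
}

// add inserts (or refreshes) a blob and evicts the least recently
// used entry once the cache exceeds its capacity.
func (c *blobCache) add(id string, data []byte) {
	if e, ok := c.items[id]; ok {
		c.order.MoveToFront(e)
		e.Value.(*cacheEntry).data = data
		return
	}
	c.items[id] = c.order.PushFront(&cacheEntry{id: id, data: data})
	if c.order.Len() > c.maxEntries {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).id)
	}
}

func main() {
	c := newBlobCache(2)
	c.add("blob-a", []byte("aaa"))
	c.add("blob-b", []byte("bbb"))
	c.get("blob-a")                // touch a; b is now the LRU entry
	c.add("blob-c", []byte("ccc")) // evicts b
	_, ok := c.get("blob-b")
	fmt.Println("blob-b still cached:", ok) // prints: blob-b still cached: false
}
```

Unlike the old scheme, eviction order depends only on access recency, so forward and backward reads behave symmetrically.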

@greatroar force-pushed the fix-quadratic-read branch 7 times, most recently from cfde670 to d89da7d on June 22, 2020 07:39
@greatroar (Contributor, Author):

@MichaelEischer Merge?

@greatroar closed this Jun 30, 2020
@greatroar reopened this Jun 30, 2020
@greatroar force-pushed the fix-quadratic-read branch from 55084f2 to ba702ad on June 30, 2020 12:46
@MichaelEischer (Member) left a comment:

The PR looks mostly fine. I just think that there's still a small race condition regarding the LRU-cache usage.

@greatroar force-pushed the fix-quadratic-read branch 2 times, most recently from d2bdda5 to 5a31546 on July 5, 2020 20:57
@aawsome (Contributor) left a comment:

I would suggest not to re-implement binary search.

The other changes look good to me.

@aawsome (Contributor) commented Jul 10, 2020

@greatroar I just realized that your last commit kicked out exactly "import sort".
Is the standard sort really slowing things down that much? Doesn't the closure get inlined?

Just took a look at the source:
https://golang.org/src/sort/search.go?s=2246:2286#L49

I can't believe this is slower than your code, unless this is really about too many function calls...

@greatroar (Contributor, Author) commented Jul 10, 2020

The speed difference is more than a factor of 2, see the commit message for 5a31546. And it really is failure to inline:

$ go build -gcflags=-m |& grep -F Read.func
./file.go:125:51: can inline (*file).Read.func1

Can inline, but no "inlined" message. From what I can tell from the assembler, it's the call (stack handling?) that's expensive.

Anyway, the throughput difference will be very small compared to the sort.Search version. I wrote the custom binary search first, but only pushed it in response to @MichaelEischer's performance concerns, to avoid the who-locks-what-in-FUSE debate. So it comes down to readability and I'll happily revert the last commit if the maintainers want it.
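For reference, the sort.Search variant under discussion looks roughly like this (illustrative, hypothetical names). The closure passed to sort.Search is the call that the compiler declares inlinable but does not actually inline into the search loop.

```go
package main

import (
	"fmt"
	"sort"
)

// blobIndex locates the blob containing file offset pos, given cumulative
// blob start offsets (off[i] = start of blob i, final entry = file size).
// sort.Search returns the smallest index for which the predicate is true,
// i.e. the first blob starting beyond pos; the containing blob is the one
// just before it.
func blobIndex(off []int64, pos int64) int {
	return sort.Search(len(off), func(i int) bool { return off[i] > pos }) - 1
}

func main() {
	off := []int64{0, 512, 1536, 1792}
	fmt.Println(blobIndex(off, 0), blobIndex(off, 512), blobIndex(off, 1791))
	// prints: 0 1 2
}
```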

@aawsome (Contributor) commented Jul 10, 2020

The speed difference is more than a factor of 2, see the commit message for 5a31546. And it really is failure to inline:

I also just did the comparison myself 😉; the missing inlining explains it. However, I would vote for the sort.Search solution for easier readability.

If we do want to improve further, I would suggest optimizing for the sequential read case, which is probably still the most common access pattern. That is, remember the last blob read (respectively its index into Content) and check whether the newly requested offset falls into it (or the next blob, if the last one was read completely). If not, fall back to the binary search, which should already be fast enough for all other kinds of access.
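That sequential fast path can be sketched as follows (illustrative Go, hypothetical names): try the previously used blob and its successor before falling back to binary search.

```go
package main

import (
	"fmt"
	"sort"
)

// file remembers the blob index used by the previous read so that
// sequential reads can skip the binary search entirely.
type file struct {
	off  []int64 // cumulative blob start offsets; final entry = file size
	last int     // blob index hit by the previous read
}

// blobAt first tries the previously used blob and its successor
// (the common sequential cases), then falls back to binary search.
func (f *file) blobAt(pos int64) int {
	for i := f.last; i <= f.last+1 && i < len(f.off)-1; i++ {
		if f.off[i] <= pos && pos < f.off[i+1] {
			f.last = i
			return i
		}
	}
	// Arbitrary seek: O(log n) lookup, then remember the result.
	f.last = sort.Search(len(f.off), func(i int) bool { return f.off[i] > pos }) - 1
	return f.last
}

func main() {
	f := &file{off: []int64{0, 512, 1536, 1792}}
	// The first three reads are sequential (fast path); the last seeks back.
	fmt.Println(f.blobAt(0), f.blobAt(600), f.blobAt(1700), f.blobAt(100))
	// prints: 0 1 2 0
}
```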

@MichaelEischer (Member):

If we do want to improve further, I would suggest optimizing for the sequential read case, which is probably still the most common access pattern. That is, remember the last blob read (respectively its index into Content) and check whether the newly requested offset falls into it (or the next blob, if the last one was read completely). If not, fall back to the binary search, which should already be fast enough for all other kinds of access.

@greatroar That's what I had in mind for "caching the result of the search", but in my test with a 60GB file the binary search seemed to be fast enough. I actually prefer the 'sort.Search' solution to the reimplemented binary search.

I've noticed that the PR now reuses the simplelru package instead of adding another dependency to restic; I think it's rather important to keep the number of dependencies at a reasonable level.

So the only thing remaining is a small changelog entry that mentions the optimized file access in the mount command. That changelog entry could later also be extended to cover the other mount PRs.

@aawsome (Contributor) commented Jul 11, 2020

@greatroar That's what I had in mind for "caching the result of the search", but in my test with a 60GB file the binary search seemed to be fast enough. I actually prefer the 'sort.Search' solution to the reimplemented binary search.

I tested all three variants:

  • sort.Search
  • the manual binary search implemented by @greatroar
  • saving the index to be used by the next sequential read

with a 10GB file containing zeros (thus able to reuse the already-cached blob), using time cat file >/dev/null. This means the third variant always gets a hit here.

However, there was no notable time difference, and the searching part could not be seen in the CPU profile.
IMO this speaks for using the simple sort.Search solution.

@MichaelEischer (Member):

@aawsome I took a look at the blob lookup performance using the CPU profiler. Even for a 50GB file, the old simple loop consumed just 10% of the time in fuse/file.Read. As the slowdown is quadratic, this means the replaced code probably didn't consume more than 1% of the Read method's runtime at smaller file sizes, even with the inefficient implementation. So in total you'll only see a noticeable difference for 100GB+ files.

@greatroar force-pushed the fix-quadratic-read branch from 668b728 to e6cd22f on July 12, 2020 08:16
@greatroar (Contributor, Author):

OK, I've added a changelog entry and restored the sort.Search version.

@MichaelEischer (Member) left a comment:

LGTM, thanks. I've rebased the branch onto the current master and force pushed it to restart the CI run, which once again had problems with the rclone test.

@MichaelEischer merged commit b84f517 into restic:master on Jul 12, 2020
@greatroar deleted the fix-quadratic-read branch on July 13, 2020 04:11