
Add backup --file-read-concurrency flag#2750

Merged
MichaelEischer merged 2 commits into restic:master from metalsp0rk:min-packsize
Oct 2, 2022

Conversation

@metalsp0rk
Contributor

@metalsp0rk metalsp0rk commented May 24, 2020

What is the purpose of this change? What does it change?

I would like to add the following flags so that these values can be tuned rather than hardcoded:
--min-packsize
--file-read-concurrency

Was the change discussed in an issue or in the forum before?

https://forum.restic.net/t/control-the-minimal-pack-files-size/617

Closes #2291

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@metalsp0rk metalsp0rk changed the title Min packsize Add --min-packsize and --file-read-concurrency flags May 24, 2020
@metalsp0rk
Contributor Author

Supersedes PR #2671


@madsi1m madsi1m left a comment


Looks good

@dionorgua
Contributor

I was playing a bit with larger pack size (128MB) and found one thing.
Restic (internal/pack/pack.go) has an internal limit on the header size, plus an internal estimate of how many entries are usually placed in a pack:

	maxHeaderSize = 16 * 1024 * 1024
	// number of header entries to download as part of the header-length request
	eagerEntries = 15

I think that maxHeaderSize is still OK with 128MB packs, but it could probably be hit with some insane pack size like 1GB.

At the same time, I think it's better to adjust eagerEntries proportionally to --min-packsize. restic uses this value during rebuild-index to download the tail of a pack file from the backend. If the estimate is wrong (the whole header is not downloaded), restic downloads it again (this time with the correct size obtained from the already-downloaded part of the pack).

Even with --min-pack-size 32, rebuild-index currently uses two requests instead of one. I understand that the total number of requests will be smaller in any case (due to the reduced number of pack files), but a small fix for this would cut it in half again...

@dionorgua
Contributor

Even better:

// readHeader reads the header at the end of rd. size is the length of the
// whole data accessible in rd.
func readHeader(rd io.ReaderAt, size int64) ([]byte, error) {

We have the actual pack size here, so a simple formula should work.
I think that the current eagerEntries value (15) is very close to 4*(packSize / averageBlobSize). For the current 4MB pack and 1MB blob size it'll be 16.

So something like eagerEntries := 4*size/(1024*1024) should be OK (note the parentheses; without them the division and multiplication cancel out).
I can create an additional pull request for this if needed, or feel free to integrate it into this one.
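A minimal runnable sketch of that formula (a hypothetical helper, not restic's actual code; it assumes a 1 MiB average blob size and keeps the current value of 15 as a floor):

```go
package main

import "fmt"

// eagerEntriesFor scales the number of eagerly downloaded header entries
// with the pack size, following the 4*(packSize/averageBlobSize) heuristic
// discussed above. Hypothetical helper, not restic's actual code.
func eagerEntriesFor(packSize int64) int64 {
	const averageBlobSize = 1 << 20 // assume 1 MiB average blob size
	n := 4 * packSize / averageBlobSize
	if n < 15 {
		n = 15 // keep restic's current default as a lower bound
	}
	return n
}

func main() {
	fmt.Println(eagerEntriesFor(4 << 20))   // default 4 MiB pack size -> 16
	fmt.Println(eagerEntriesFor(128 << 20)) // larger 128 MiB pack size -> 512
}
```

With these assumptions, the default 4 MiB pack yields 16 eager entries, close to today's hardcoded 15.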

@metalsp0rk
Contributor Author

metalsp0rk commented May 27, 2020

While I definitely think this is a great improvement, and I want to implement it, there seems to be a complete rework of the prune functionality underway, which will likely change the way rebuild-index works. I'm not sure exactly what is happening with it, since I haven't really looked through it much, but I did read the top of it.

#2718 for reference.

I'll sleep on it, but it's probably not a bad idea to implement this now, seeing how it might take a while to get that other PR into master.

@aawsome
Contributor

aawsome commented Jun 3, 2020

While I definitely think this is a great improvement, and I want to implement it, there seems to be a complete rework of the prune functionality, which will likely change the way rebuild-index works.

More precisely, this re-implementation of prune does not use the rebuild-index functionality (i.e. reading all pack headers) any longer.

However, rebuild-index is not changed and there might be still some (rare) circumstances where you do want to use it.

Moreover, reading the pack header is also used by check (when set to checking pack files) and the repack algorithm used within the current and new prune implementation to repack packs also reads the pack header.

So I think it is a good idea to also change the handling of pack-header-reading within this PR and independently from prune changes.

EDIT: I forgot to mention that the new prune implementation allows repacking "small" packs, where small is exactly defined by repository.MinPackSize - so together with this PR it would allow even finer tuning of the pruning process if needed.

@metalsp0rk
Contributor Author

Ah, excellent. That clarifies the use case of rebuild-index.

I definitely agree that it's useful to implement it here. I'll go ahead and do that, when I get some time. I've suddenly found myself with a huge amount of work on my hands.

Contributor

@aawsome aawsome left a comment


Seems good to me. I added two minor suggestions, but IMO could be also merged like it is.

About the eagerEntry topic:

This is just a tuning parameter that may be optimized toward a correct guess of the number of blobs within a given pack file. A naive approach would always read the pack twice (first read the number of blobs, then read the complete header); eagerEntries optimizes this for "most" situations.

"Most" situations means that the number of blobs is estimated correctly which basically relies on:

  • the sizes of the blobs saved in the pack
  • the actual pack size

This PR influences the second factor, but only for pack files created by the current backup run. It would be much more sensible to base eagerEntries on the actual pack size rather than on minPackSize - but even then some guessing is needed about the first factor.

So I would suggest merging this PR without changing eagerEntries and opening a new issue instead. There we could discuss whether and how to optimize the reading of headers. Maybe we'll then just decide to keep it as is, because the impact on regular restic operations will likely be small in most cases.

@madsi1m

madsi1m commented Jun 5, 2020

good work @metalsp0rk

@metalsp0rk
Contributor Author

So I would suggest to merge this PR without changing eagerEntries but open a new issue.

I'm learning more about the internals of Restic every day. :)

Now that you've given that explainer, I definitely think that it's probably a bit too much scope creep and agree with your assessment of saving that for a new issue.

@metalsp0rk
Contributor Author

Everything's been resolved and I bolstered the docs a bit more as well. I think this is probably ready to merge, if the tests pass.

Member

@MichaelEischer MichaelEischer left a comment


I see why larger pack files could be useful, however, the code needs some further work to ensure that this won't break anything.

For the file read concurrency I'd like to see measurements that show a substantial improvement with values other than the default of 2.

Unless I've missed something (or a backend is horribly memory inefficient), restic shouldn't load full pack files into memory. So where does this change actually reduce memory usage? In my opinion the number of SaveBlob workers should probably rather line up with the backend connection limit.

Save Blob Concurrency
*********************

Restic defaults to creating a concurrent writer for each available CPU. This will require a
Member


Restic writes pack files into temporary files and then reads from those files when uploading. So I don't see why this behavior happens. Do you have an explanation?

Contributor Author


Which behavior, specifically?

If you're referring to the concurrent writers, I do not. It's a bit of a peculiarity to me as well.

Contributor


I agree with @MichaelEischer: it doesn't use the pack size per writer, only the blob size.
However, we should mention here that space for temporary files is needed.

Contributor Author


So it seems that tmpfs usage is directly related to concurrent writers. I've observed a bit more than the number of writers multiplied by the pack size.

Worth noting.

File Read Concurrency
*********************

In some instances, such as backing up traditional spinning disks, reducing the file read
Member


Do you have measurements to back that up? Reducing the file read concurrency from 2 to 1 probably doesn't change that much.

Contributor Author


I'm in the process of doing a large number of benchmarks. I'll share them in a Top Level comment and possibly make a thread on the forum when I'm through.

Contributor Author


So my numbers on spinning disks show that reducing concurrency does not help much, at least, not below default. It hurts significantly at higher levels of concurrency and with large numbers of small files. But that's expected behavior with a spinning disk.

I'll amend the docs to reflect this.

// Setting the default to 0 for these flags and plugging in the default
// values later fixes the text output for "(default: )".
if globalOptions.MinPackSize == 0 {
	globalOptions.MinPackSize = uint(minpacksize)
}
Member


We need some upper limit for the pack size, especially we don't want to cross the gigabyte boundary.

Contributor


We need some upper limit for the pack size, especially we don't want to cross the gigabyte boundary.

Fully agree! In fact the maximum pack size should be small enough such that it can be stored in a uint32... (to save some memory when saving offset)

Contributor Author


Our hard limit if we want uint32 would be 4095MB.
We don't want to cross the GB boundary, so can I get a sanity check on a 1023MB limit?

I don't know if we want to mess with converting to uint32 right now, because that might affect the repository format in a way that could cause problems.
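For what it's worth, the 4095MB figure follows directly from the uint32 range; a quick check (illustration only):

```go
package main

import (
	"fmt"
	"math"
)

// maxPackSizeMiB returns the largest whole number of MiB whose byte
// offsets still fit in a uint32 (constant integer division).
func maxPackSizeMiB() uint64 {
	return math.MaxUint32 / (1024 * 1024)
}

func main() {
	fmt.Println(maxPackSizeMiB()) // 4294967295 bytes -> 4095 MiB
}
```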

Member


I'm not sure whether that limit is strict enough. I wonder whether this causes memory usage problems for commands like check or prune.

Contributor Author


What would you consider to be a memory usage problem? I think properly documented memory usage estimates (or an inline warning, like I've implemented on save blob and file read concurrency in cmd_backup.go line 163) would be preferable to a hard limit. Some people will consider 1GB of memory usage a problem, others might consider 64GB to be perfectly acceptable.

I'm not sure what the project's preferences are when it comes to configurability. I'm of the opinion that we should allow the user to do whatever they want, as long as they're made aware of potential drawbacks.

I'm happy to run some tests on crazy large pack sizes for a prune and check, once I get my NAS fixed.

@aawsome
Contributor

aawsome commented Jun 7, 2020

Unless I've missed something (or a backend is horribly memory inefficient), restic shouldn't load full pack files into memory. So where does this change actually reduce memory usage? In my opinion the number of SaveBlob workers should probably rather line up with the backend connection limit.

Oh, that was my fault - I wrongly gave that hint. In fact you are right, it is not the full pack file. However, restic holds a chunk-sized buffer in memory for each chunker - and there is one chunker per concurrent writer. So for machines with many CPUs and little RAM it may make sense to reduce the number of concurrent writers.

@metalsp0rk
Contributor Author

metalsp0rk commented Jun 12, 2020

The initial info I'm getting on benchmarking this (on a spinning disk only, so far) is that larger pack sizes are very helpful. I've found that the sweet spot is somewhere between 24MB and 32MB on my USB 5400rpm disk.

At the moment, I'm considering a recommendation to increase the default min pack size to 8MB or 12MB. Is there any specific reason we've chosen 4MB?

I'll share my results when I've finished the benchmarks.

@dionorgua
Contributor

Here is a link to @fd0's comment on why 4MB was used:
https://forum.restic.net/t/slow-speeds-of-restic-with-google-drive-rclone-backend/818/15

@dionorgua dionorgua mentioned this pull request Jun 15, 2020
@vincent-ogury

Hi,

I'm looking forward to this feature.
On Borg I used to tune the chunk size. By default it is something like 500MB, which causes problems with Backblaze; sometimes the chunk doesn't fit on the server I send it to.
So I set the config in the repository to 100MB.

My backups on Borg contain around 5K files, which works really well in terms of exchange and speed with Backblaze.

I switched to restic, which does a really good job in terms of simplicity and speed. I also like the tagging and the cleanup by tag: I can do one backup with different paths, tag them (documents, configs, ...), and prune by tag.

For now the same backups on Backblaze use almost 200K files, which I would like to reduce with bigger chunks.
A 24M or 32M chunk would reduce the number of files to 25K, which would be awesome!

I'm aware that memory usage could be higher. Maybe something like a warning:
"You need 1GB / 4MB chunk for this to work, ensure all your machines have this"
Or something like that.

Also, could this setting be saved in the repository itself, so we can set it once and use the same value from all the machines we have? Something like "restic -r XXX set chunkMinSize=32".

@metalsp0rk
Contributor Author

metalsp0rk commented Jul 25, 2020

Okay, so: I've been busy with renovations and lost track of this. What follows are my notes:

Repo Total Size: 14.8G

CPU: i5-8300h, mitigations on, ht on.
Source Disk: Samsung 970 Pro 512GB
Dest Disk: WD EasyStore 8TB External USB 3.0
Backup Contents: 12G of git packs, source code, image assets. 2G of video files, 365355 total files

Notes: Dropping caches after every run.

PackSize      SaveBlob     FileRead     VmPeak     WallTime  BackupSize  Notes
------------  -----------  -----------  ---------  --------  ----------  -----------------
4M (default)  8 (default)  2 (default)  1062100k   6m43.4s   10415508    None
8M            8 (default)  2 (default)  1130648k   6m20.4s   10414316    None
16M           8 (default)  2 (default)  1130328k   6m06.1s   13999000    None
32M           8 (default)  2 (default)  1130328k   5m58.8s   14002628    None
64M           8 (default)  2 (default)  1130800k   5m54.0s   14005908    It appears that Restic's archiver does not efficiently write out packs at large pack sizes. Some funny-business happening here.
128M          8 (default)  2 (default)  1130264k   6m03.1s   13999660    It's quite evident here that there is an inefficiency in the write-to-disk process. Worth investigating.
------------  -----------  -----------  ---------  --------  ----------  -----------------
4M (default)  1            2 (default)  994608k    8m20.9s   14002940    A single blob saver appears to be a bottleneck.
4M (default)  2            2 (default)  994096k    6m51.2s   14004148    None
4M (default)  4            2 (default)  1062292k   6m32.5s   14010452    None
32M           1            2 (default)  926492k    6m15.1s   13993268    None
32M           2            2 (default)  994512k    6m20.3s   14009916    None
32M           4            2 (default)  1062420k   6m54.3s   14002076    None
32M           16           2 (default)  1334572k   6m02.1s   14000624    None
32M           32           2 (default)  1538680k   6m06.6s   13997180    None
------------  -----------  -----------  ---------  --------  ----------  -----------------
4M            8 (default)  1            1130520k   9m41.6s   14016132    None
4M            8 (default)  4            1198948k   5m26.3s   14011396    None
4M            8 (default)  8            1266536k   5m10.2s   14007528    None
4M            8 (default)  16           1470580k   5m06.3s   14015760    None
32M           8 (default)  1            1198110k   4m19.9s   14001264    None
32M           8 (default)  4            1198108k   3m29.5s   13995664    None
32M           4            8            1130648k   4m59.7s   14017040    None
32M           2            8            1130328k   3m37.1s   13997100    None
32M           1            8            1130520k   4m08.7s   14005376    1 blob saver still a bottleneck.
64M           2            8            1130584k   5m15.5s   14000364    Hmmm, it doesn't like >32M pack sizes... Will have to try with B2 later.
32M           8 (default)  16           1470580k   4m59.6s   14004980    None

So upon quick analysis, it appears we can significantly improve performance by increasing the file read concurrency. I suspect this is due to an inefficiency in the way that the core chunk pipeline works, however, I do not have time to confirm that at the moment.

Now, I've done additional testing (without collecting metrics) which showed that increasing file read concurrency negatively affects spinning rust. So to that end, I do not recommend changing the global default at this time.

@vincent-ogury to answer your questions:

I'm aware that memory usage could be higher. Maybe something like a warning

A warning is a good idea. I'm not sure what the exact memory usage will correlate to, which is why I have elected to exclude it for the time being.

For now the same backups on Backblaze use almost 200K files, which I would like to reduce with bigger chunks.

I'm going to do some benchmarking on backblaze with larger chunks. I suspect there will be some inefficiencies when fetching chunks for restores of small files, if you choose to go up much past 32M chunks.

Also, could this setting be saved in the repository itself, so we can set it once and use the same value from all the machines we have?

Hmmm, that's a good idea. At the moment, I do not know if we have the capability to save options like this. Maybe someone with more experience can chime in? I'd love to include this as an option.


All in all, I think this is ready for merge or at least very close. At time of writing, there are merge conflicts, so I'll give it a rebase when I get some time. In the interim, @MichaelEischer do these benchmarks satisfy your curiosity on the file-read-concurrency question? I'm happy to do more if need be, but I think I'll automate them in the future.

@MichaelEischer
Member

Some of the labels in the last sixth of your table seem to be broken. '1 default' in the SaveBlob column can't be right.

Is VmPeak the same as maximum resident set size? Or does this refer to the maximum virtual memory usage? In the latter case it is not meaningful: a higher file read concurrency means that Go has to spawn more OS threads, and these allocate, e.g. on linux/amd64, 8MB or more of stack by default. However, only the part of the stack that is actually used will be backed by real memory. Therefore it's no problem to allocate dozens of GB of virtual memory even if your system just has 1GB; well, that is, until you try to use it.

It appears that Restic's archiver does not efficiently write out packs at large pack sizes. Some funny-business happening here.

From which rows in your measurements do you derive this? The row with that remark looks pretty much like the previous ones. That is, from my point of view the first 6 rows say that for the default concurrency it pretty much doesn't matter which pack size is used.

So upon quick analysis, it appears we can significantly improve performance by increasing the file read concurrency. I suspect this is due to an inefficiency in the way that the core chunk pipeline works, however, I do not have time to confirm that at the moment.

The optimal file read concurrency depends on the filesystem and HDD/SSD. The current value '2' for the file read concurrency is the result of some quite extensive tests and discussion (can't remember in which issue that discussion took place). The inefficiency in the pipeline is pretty sure #2696 which causes file savers to wait for a pack upload to complete before continuing and therefore cause pipeline "stalls". You could test for that case by only backing up a few large files.

The blob savers currently have three tasks: Calculate the block checksum, assemble packs and upload them to the backend. A single blob saver therefore means that no uploads and pack assembly happen concurrently. So it's expected that you need a few blob savers for reasonable performance.

That is, a file read concurrency of 2 should be fine in general once #2696 is fixed. Manually increasing the number of BlobSavers beyond the number of CPU cores + maximum backend concurrency won't be useful and in the worst case can only increase memory usage. But I'm not sure whether we actually have a memory usage problem here at all.

Also, could this setting be saved in the repository itself, so we can set it once and use the same value from all the machines we have?

Hmmm, that's a good idea. At the moment, I do not know if we have the capability to save options like this. Maybe someone with more experience can chime in? I'd love to include this as an option.

There's currently no way to store configuration in the repository.

Pack files have a size limit for their header of maxHeaderSize = 16MB. With the increased pack size it's actually possible to hit that limit for some pathological cases in which case a pack file would become unreadable. We need a check to complete a pack file once / before it hits that limit.
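A sketch of such a check (hypothetical, not the code ultimately merged in this PR; it assumes the 37-byte header entry size mentioned later in this thread):

```go
package main

import "fmt"

const (
	maxHeaderSize   = 16 * 1024 * 1024 // restic's pack header size limit
	headerEntrySize = 37               // assumed bytes per pack header entry
)

// headerFull reports whether adding one more blob would push the pack
// header past maxHeaderSize, i.e. the pack must be finalized first.
// Hypothetical helper to illustrate the safeguard being discussed.
func headerFull(entries int) bool {
	return (entries+1)*headerEntrySize > maxHeaderSize
}

func main() {
	fmt.Println(headerFull(100))    // plenty of room left -> false
	fmt.Println(headerFull(453438)) // one more entry would overflow -> true
}
```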

@metalsp0rk
Contributor Author

Is VmPeak the same as maximum resident set size? Or does this refer to the maximum virtual memory usage?

VmPeak is defined as Peak virtual memory size. I think I snagged the wrong metric.

I may have been more interested in VmHWM, since that's the "peak Resident Set Size"

From which rows in your measurements do you derive this? The row with that remark looks pretty much like the previous ones. That is, from my point of view the first 6 rows say that for the default concurrency it pretty much doesn't matter which pack size is used.

I'm looking at the diminishing returns of the increased pack size. 5:58 to 5:54 is much smaller than 5:43 -> 5:20 or 5:20 -> 5:06

Some of the labels in the last sixth of your table seem to be broken. '1 default' in the SaveBlob column can't be right.

Yes, that can't be right. It should have simply been "1"

That is, a file read concurrency of 2 should be fine in general once #2696 is fixed. Manually increasing the number of BlobSavers beyond the number of CPU cores + maximum backend concurrency won't be useful and in the worst case can only increase memory usage. But I'm not sure whether we actually have a memory usage problem here at all.

I'm inclined to agree with you here. I think I may be seeing symptoms of that issue in my test results.

The blob savers currently have three tasks: Calculate the block checksum, assemble packs and upload them to the backend. A single blob saver therefore means that no uploads and pack assembly happen concurrently. So it's expected that you need a few blob savers for reasonable performance.

Agreed.

@davidjericho

davidjericho commented Aug 6, 2020

I'm coming late to this particular thread; for reference, I'm the person who originally put together a very naive PoC that @madsi1m then turned into the original PR, subsequently taken up by @metalsp0rk. The original motivation is worth sharing in the context of making this adjustable.

Thank you for all the discussion on this, I appreciate the thought being put into it, especially given I've not been able to give it the due effort myself.

The optimal file read concurrency depends on the filesystem and HDD/SSD. The current value '2' for the file read concurrency is the result of some quite extensive tests and discussion (can't remember in which issue that discussion took place).

I can entirely appreciate a relatively low number for direct attached storage in a single machine, and why we'd want to discourage many users from turning this number up too high.

In my particular use case, I have dedicated client nodes attached to a large distributed namespace holding nearly a billion files and just shy of 10PB, and growing. The limiting factor on any IO event is usually either an individual disk on a storage node in the case of a single file, or in the aggregate case, the CPU or network of a reader node. Statistically I'm extremely unlikely to access the same storage disk across several thousand threads for a read on any random file.

Other backup tools might be more suited if we were using some filesystem products, however the restic paradigm and method fits exceptionally well in the context of what a useful backup actually constitutes to us. We don't backup whole filesystems, we instead run nearly 100,000 individual jobs defined programmatically, as these constitute the atomic units that matter from the perspective of an end user. These do range from a handful of files, to upwards of 500TB and tens of millions of files, and we have very little to no control over what the structured layout is.

In experiments using my naive PoC, the thread count was incredibly useful, especially when using either NVMe or a RAM disk as the temporary directory, and I was easily saturating the 10Gbps network interfaces on my particular development node. Given the previously mentioned namespace design, as expensive as it may be, there was a near-linear throughput benefit from adding more CPUs and more network bandwidth across the job set.

Pack files have a size limit for their header of maxHeaderSize = 16MB. With the increased pack size it's actually possible to hit that limit for some pathological cases in which case a pack file would become unreadable. We need a check to complete a pack file once / before it hits that limit.

The interest in larger packs was to reduce the pack count stored within any restic repository. As I mentioned above, there are nearly a billion files being written onto offline archival storage, and the operational time cost for retrieving an offline file for access is the robot loading time and the tape seek. The cost of reading many thousands of files potentially scattered across tapes was far higher in terms of time and wear on the system, than reading large packs of a gigabyte or so even if we threw away a lot of the read as unwanted data.

@MichaelEischer
Member

In experiments using my naive PoC, the thread count was incredibly useful, especially when using either NVMe or a RAM disk as the temporary directory, and I was easily saturating the 10Gbps network interfaces on my particular development node. Given the previously mentioned namespace design, as expensive as it may be, there was a near-linear throughput benefit from adding more CPUs and more network bandwidth across the job set.

Well, that use case seems rather different from what's been discussed in this PR so far (at least the throughput is an order of magnitude larger) and this also leads to new bottlenecks: with just two file readers (= CPU cores) it is definitely not possible to saturate a 10Gbps network link. The chunker implementation alone is currently capable of processing "just" 500-600 MB/s, and with additional overhead from file access and other processing steps the throughput will probably stay well below 1GB/s.

Pack files have a size limit for their header of maxHeaderSize = 16MB. With the increased pack size it's actually possible to hit that limit for some pathological cases in which case a pack file would become unreadable. We need a check to complete a pack file once / before it hits that limit.

The interest in larger packs was to reduce the pack count stored within any restic repository. As I mentioned above, there are nearly a billion files being written onto offline archival storage, and the operational time cost for retrieving an offline file for access is the robot loading time and the tape seek. The cost of reading many thousands of files potentially scattered across tapes was far higher in terms of time and wear on the system, than reading large packs of a gigabyte or so even if we threw away a lot of the read as unwanted data.

The 16MB header size limit allows roughly 450k blobs per pack file (one pack header entry has 37 bytes). To fill up a 1GB pack file this would require an average filesize of over (1024−16)×1024^2/(16×1024^2/37)−32 = 2299 bytes (the pack header counts into the pack size limit and each blob in the pack has an overhead of 32bytes in addition to the pack header entry). As you apparently did not encounter the pack-header size limit (at least I haven't read of any complaints ;-) ), I assume the average filesize to be above that limit. In pathological cases the header and data part of a pack file could have about the same size (aka. both are half a GB in size). And I doubt that the repository index handles such a large pack header well. So the pack-header size limit is just a safeguard that shouldn't apply during normal use.
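The arithmetic above can be reproduced directly; a small sketch using the constants from the comment (37-byte entries, 32-byte per-blob overhead, 16MB header limit):

```go
package main

import "fmt"

const (
	maxHeaderSize = 16 * 1024 * 1024 // pack header size limit
	entrySize     = 37               // bytes per pack header entry
	blobOverhead  = 32               // per-blob overhead in the data part
)

// maxBlobsPerPack is the number of header entries that fit under the
// header size limit.
func maxBlobsPerPack() int { return maxHeaderSize / entrySize }

// minAvgBlobSize is the minimum average blob size needed to fill a pack
// of the given size before the header limit is hit (the header counts
// into the pack size limit).
func minAvgBlobSize(packSize int) int {
	return (packSize-maxHeaderSize)/maxBlobsPerPack() - blobOverhead
}

func main() {
	fmt.Println(maxBlobsPerPack())       // roughly 450k blobs per pack
	fmt.Println(minAvgBlobSize(1 << 30)) // ~2299 bytes for a 1 GiB pack
}
```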

@davidjericho

Well, that use case seems rather different from what's been discussed in this PR so far (at least the throughput is an order of magnitude larger) and this also leads to new bottlenecks: with just two file readers (= CPU cores) it is definitely not possible to saturate a 10Gbps network link. The chunker implementation alone is currently capable of processing "just" 500-600 MB/s, and with additional overhead from file access and other processing steps the throughput will probably stay well below 1GB/s.

I'd argue it matches the linked forum discussion, and while I know there has been a lot of experimentation about what works well on direct-attached storage, distributed storage is common nowadays and it does behave quite differently. I did have to increase the reader count, but I'd have to go digging in our ticket system to find the exact numbers I ended up using. In the general case, 500-600MB/s was more than sufficient, the goal really being to use aggregate throughput to hide the latency of fetching a lot of small files. CPU is cheaper than my sanity when someone asks again why a data set isn't archived yet :)

The 16MB header size limit allows roughly 450k blobs per pack file (one pack header entry has 37 bytes). To fill up a 1GB pack file this would require an average filesize of over (1024−16)×1024^2/(16×1024^2/37)−32 = 2299 bytes (the pack header counts into the pack size limit and each blob in the pack has an overhead of 32bytes in addition to the pack header entry). As you apparently did not encounter the pack-header size limit (at least I haven't read of any complaints ;-) ),

I did read the discussion on that with some concern, and I think you're probably right; it was luck that I didn't hit that limit. I did experiment with some much larger pack sizes and had some failures, but did not dig deeply, it being a naive experiment. From hazy memory, the pack sizes I tested up to and failed with would have lined up with what you calculate.
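As a rough sanity check of the reader-count discussion above (illustrative only; it assumes the 500 MB/s lower bound quoted for the chunker and ignores all other overhead):

```go
package main

import "fmt"

func main() {
	const (
		linkBitsPerSec   = 10_000_000_000 // 10 Gbps network link
		perReaderBytesPS = 500_000_000    // ~500 MB/s chunker throughput per reader (lower bound)
	)

	linkBytesPS := linkBitsPerSec / 8 // 1.25 GB/s of payload to fill

	// Readers needed to saturate the link (ceiling division);
	// any real-world overhead only pushes this number higher.
	readers := (linkBytesPS + perReaderBytesPS - 1) / perReaderBytesPS

	fmt.Println(linkBytesPS, readers) // 1250000000 3
}
```

So even under optimistic assumptions, the default of two readers cannot keep a 10Gbps link busy.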

@matthewlloyd

matthewlloyd commented Aug 18, 2020

Just my 2c, I am using this patch for my backups and it works great.

My use case is storing files on Google Cloud Storage using the Archive storage class, which is very cheap, with storage priced at just $1.20/TB/month and free ingress bandwidth, but API requests are expensive at $0.50/10,000 operations. Restic is perfect for this use case and superior to borg, since its local cache means it normally only needs to write to the cloud and no subsequent reads are needed. However, with the default 5MB pack size, the cost of the API PUT requests for an initial backup adds up to $10.00/TB. I'm using a 500MB pack size, which reduces the cost to $0.10/TB.

I second that it would be nice to be able to configure this once at the repository level, in addition to the existing command-line and environment variable additions.
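The cost figures above work out as follows (a quick sketch using only the prices quoted in the comment; actual GCS pricing may differ):

```go
package main

import "fmt"

func main() {
	const (
		terabyte      = 1_000_000_000_000.0
		costPer10kOps = 0.50 // USD per 10,000 operations (figure from the comment)
	)

	// PUT cost per terabyte uploaded, assuming one PUT request per pack file.
	putCostPerTB := func(packSizeBytes float64) float64 {
		puts := terabyte / packSizeBytes
		return puts / 10_000 * costPer10kOps
	}

	fmt.Printf("%.2f %.2f\n", putCostPerTB(5_000_000), putCostPerTB(500_000_000)) // 10.00 0.10
}
```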

@ncecere

ncecere commented Apr 12, 2022

Any status updates ?

@metalsp0rk metalsp0rk force-pushed the min-packsize branch 3 times, most recently from c7df932 to 06df336 on April 25, 2022 20:59
@metalsp0rk
Contributor Author

Reworked the check. Doing some E2E tests on my workstations now. I think it's good, but I'd like to spend a bit of time letting it marinate on my machines. Either way, the check has now landed in my branch.

@MichaelEischer
Member

Now that #3484 and #3475 are merged, let's move forward with some parts of this PR.

--save-blob-concurrency will be obsoleted by #3611, so let's get rid of that option. --file-read-concurrency has to wait at least until #3489 is merged, which will probably (hopefully) make that option obsolete for most use cases, so we'll have to reevaluate it later on.

The remaining interesting part is --min-packsize. I'd like to separate the code to properly support larger pack sizes (including the prune parts) into a separate PR. That PR should also increase the hard-coded (!) pack size to something in the range between 10-20MB. The discussion whether we want to add the option to customize the pack size or not should be handled separately.

@metalsp0rk
Contributor Author

Sounds good, I'll rip the --min-packsize components out into a new PR, and sticking with the base2 convention, I'll set a 16MB default pack size for now.

@metalsp0rk metalsp0rk mentioned this pull request Apr 30, 2022
@metalsp0rk
Contributor Author

The new PR is in: #3731. I'm happy to close this one, or leave it open.

@MichaelEischer
Member

Rebased to only include backup --file-read-concurrency. --save-blob-concurrency was obsoleted by #3611. And --pack-size was merged as part of #3731.

@MichaelEischer MichaelEischer changed the title Add --min-packsize and --file-read-concurrency flags Add backup --file-read-concurrency flag Aug 7, 2022
@JsBergbau
Contributor

JsBergbau commented Aug 11, 2022

I've noticed that when using compression=max, restic adds badly compressible files (tested with mp3s) to the repo quite slowly. Without compression this is done in a few seconds (SSD source and target), compared to about 90 seconds in my test case. That is of course somewhat expected. However, CPU usage is clearly below 50% (12 logical cores, 6 physical cores). As far as I understand, restic currently backs up two files in parallel. So if more files were backed up in parallel, restic should be much faster in this case because it would use more of the available CPUs.

Or to put it another way: with compression support this feature is even more important and should be included in the next release.

@MichaelEischer MichaelEischer linked an issue Aug 21, 2022 that may be closed by this pull request
f.StringVar(&backupOptions.StdinFilename, "stdin-filename", "stdin", "`filename` to use when reading from stdin")
f.Var(&backupOptions.Tags, "tag", "add `tags` for the new snapshot in the format `tag[,tag,...]` (can be specified multiple times)")

f.UintVar(&backupOptions.FileReadConcurrency, "file-read-concurrency", uint(fileReadConcurrency), "set concurrency on file reads. (default: $RESTIC_FILE_READ_CONCURRENCY or 2)")
Contributor

@rawtaz rawtaz Sep 15, 2022


Please change the first sentence in the description to "number of files to read concurrently" (removing the dot).

--files-from file read the files to backup from file (can be combined with file args; can be specified multiple times)
--files-from-raw file read the files to backup from file (can be combined with file args; can be specified multiple times)
--files-from-verbatim file read the files to backup from file (can be combined with file args; can be specified multiple times)
--file-read-concurrency uint set concurrency on file reads. (default: $RESTIC_FILE_READ_CONCURRENCY or 2)
Contributor


Match with changes above :)

@rawtaz
Contributor

rawtaz commented Sep 15, 2022

Sorry to be such a party pooper, but --file-read-concurrency is pretty long, and the "file" part is somewhat pointless as it's already mentioned in the description and also implied by the context of backing up files. What about changing it to just --read-concurrency? Or is there a potential need for another type of read concurrency setting in the future?

@MichaelEischer
Member

Or is there a potential need for another type of read concurrency setting in the future?

The only such flag I can think of right now would be reading multiple directories in parallel.

@JsBergbau
Contributor

The only such flag I can think of right now would be reading multiple directories in parallel.

Wouldn't that be handled by reading multiple files in parallel? So if read-concurrency is 4 and there are 4 directories with one file each also all 4 directories would be read in parallel.

@MichaelEischer
Member

Wouldn't that be handled by reading multiple files in parallel? So if read-concurrency is 4 and there are 4 directories with one file each also all 4 directories would be read in parallel.

In that special case, yes. Though for backups containing several million files, it would be possible to speed up the operation by listing multiple directories concurrently.

@MichaelEischer
Member

I've renamed the flag and reworked the flag / environment variable handling to be more in line with the other commands.

The documentation for the flag now recommends it for improving performance on fast disks (see #3731 for measurements). The previous claim that reducing the read concurrency can improve throughput was never backed up with measurements, as far as I can see from scrolling through the comments.

Member

@MichaelEischer MichaelEischer left a comment


LGTM. I've rebased the PR and also updated the changelog to match the documentation.



Development

Successfully merging this pull request may close these issues.

Ability to tweak chunk pack size
Make IO-parallelism configurable