Add backup --file-read-concurrency flag #2750
Conversation
Supersedes PR #2671
I was playing a bit with a larger pack size (128MB) and found one thing: when the packs get bigger, the header-read tuning should be adjusted as well.
Even better: we have the actual pack size here, so a formula based on it should work.
While I definitely think this is a great improvement and I want to implement it, there seems to be a complete rework of the prune functionality underway, which will likely change the way rebuild-index works. I'm not sure exactly what is happening with it, since I haven't really looked through it much, but I did read the head of it; see #2718 for reference. I'll sleep on it, but it's probably not a bad idea to implement this now, seeing how it might take a while to get that other PR into master.
More precisely, reading the pack header is also used by other commands, so I think it is a good idea to also change the handling of pack-header-reading within this PR, independently of the prune rework.
Ah, excellent. That makes the use case of rebuild-index much clearer. I definitely agree that it's useful to implement it here. I'll go ahead and do that when I get some time; I've suddenly found myself with a huge amount of work on my hands.
aawsome
left a comment
Seems good to me. I added two minor suggestions, but IMO it could also be merged as is.
About the eagerEntry topic:
This is just a tuning parameter which may be optimized to correctly guess the number of blobs within a given pack file. A naive approach would always read the pack twice (first read the number of blobs, then read the complete header); this parameter optimizes for "most" situations.
"Most" situations means that the number of blobs is estimated correctly which basically relies on:
- the sizes of the blobs saved in the pack
- the actual pack size
This PR influences the second factor, but only for pack files created by the current backup run. It would be much more sensible to base eagerEntries on the actual pack size and not on minPackSize; however, some guessing about the first factor would still be needed.
So I would suggest merging this PR without changing eagerEntries and opening a new issue instead. In that issue it could be discussed if and how to optimize the reading of headers. Maybe we'll then just decide to keep it as is, because the impact on regular restic operations will likely be small in most cases.
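The guessing described above could be sketched roughly like this (a hypothetical helper, not restic's actual implementation; the average blob size used here is a pure assumption for illustration):

```go
package main

import "fmt"

// assumedAvgBlobSize is an illustrative guess for the first factor
// mentioned above (the sizes of the blobs saved in the pack).
const assumedAvgBlobSize = 512 * 1024

// estimateEagerEntries guesses how many header entries to read eagerly
// based on the pack size, so that "most" packs need only a single read
// of the header instead of two.
func estimateEagerEntries(packSize int) int {
	entries := packSize / assumedAvgBlobSize
	if entries < 1 {
		entries = 1
	}
	return entries
}

func main() {
	fmt.Println(estimateEagerEntries(4 * 1024 * 1024))   // small pack
	fmt.Println(estimateEagerEntries(128 * 1024 * 1024)) // large pack
}
```

The point of the sketch is only that the guess should scale with the actual pack size rather than with minPackSize.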
Good work @metalsp0rk!
I'm learning more about the internals of Restic every day. :) Now that you've given that explainer, I definitely think it's a bit too much scope creep, and I agree with your assessment of saving that for a new issue.
Everything's been resolved, and I've bolstered the docs a bit more as well. I think this is probably ready to merge, if the tests pass.
MichaelEischer
left a comment
I see why larger pack files could be useful, however, the code needs some further work to ensure that this won't break anything.
For the file read concurrency I'd like to see measurements that show a substantial improvement with values other than the default of 2.
Unless I've missed something (or a backend is horribly memory inefficient), restic shouldn't load full pack files into memory. So where does this change actually reduce memory usage? In my opinion the number of SaveBlob workers should probably rather line up with the backend connection limit.
doc/047_tuning_backup_parameters.rst
Outdated
Save Blob Concurrency
*********************

Restic defaults to creating a concurrent writer for each available CPU. This will require a
Restic writes pack files into temporary files and then reads from those files when uploading. So I don't see why this behavior happens. Do you have an explanation?
Which behavior, specifically?
If you're referring to the concurrent writers, I do not. It's a bit of a peculiarity to me as well.
I agree with @MichaelEischer , it doesn't use the packsize per writer, but only the blobsize.
However, we should mention here that space for temporary files is needed.
So it seems that tmpfs usage is directly related to concurrent writers. I've observed a bit more than the number of writers multiplied by the pack size.
Worth noting.
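As a rough worked example of that observation (the numbers here are illustrative, not measurements from this thread):

```go
package main

import "fmt"

// tempSpaceLowerBoundMB reflects the observation above: temporary-file
// usage is a bit more than the number of concurrent writers multiplied
// by the pack size, so writers * packSize is a lower bound.
func tempSpaceLowerBoundMB(writers, packSizeMB int) int {
	return writers * packSizeMB
}

func main() {
	// e.g. 8 concurrent writers (one per CPU core) with a 16MB pack size
	fmt.Println(tempSpaceLowerBoundMB(8, 16), "MB of temp space, at least")
}
```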
doc/047_tuning_backup_parameters.rst
Outdated
File Read Concurrency
*********************

In some instances, such as backing up traditional spinning disks, reducing the file read
Do you have measurements to back that up? Reducing the file read concurrency from 2 to 1 probably doesn't change that much.
I'm in the process of doing a large number of benchmarks. I'll share them in a Top Level comment and possibly make a thread on the forum when I'm through.
So my numbers on spinning disks show that reducing concurrency does not help much, at least not below the default. It hurts significantly at higher levels of concurrency and with large numbers of small files. But that's expected behavior with a spinning disk.
I'll amend the docs to reflect this.
cmd/restic/global.go
Outdated
// setting default to 0 for these then plugging default values later fixes text output for (default: )
if globalOptions.MinPackSize == 0 {
	globalOptions.MinPackSize = uint(minpacksize)
}
We need some upper limit for the pack size, especially we don't want to cross the gigabyte boundary.
We need some upper limit for the pack size, especially we don't want to cross the gigabyte boundary.
Fully agree! In fact, the maximum pack size should be small enough that it can be stored in a uint32 (to save some memory when saving offsets).
Our hard limit if we want uint32 would be 4095MB.
We don't want to cross the GB boundary, so can I get a sanity check on a 1023MB limit?
I don't know if we want to mess with converting to uint32 right now, because that might affect the repository format in a manner that could cause problems.
I'm not sure whether that limit is strict enough. I wonder whether this causes memory usage problems for commands like check or prune.
What would you consider to be a memory usage problem? I think properly documented memory usage estimates (or an inline warning, like I've implemented on save blob and file read concurrency in cmd_backup.go line 163) would be preferable to a hard limit. Some people will consider 1GB of memory usage a problem, others might consider 64GB to be perfectly acceptable.
I'm not sure what the project's preferences are when it comes to configurability. I'm of the opinion that we should allow the user to do whatever they want, as long as they're made aware of potential drawbacks.
I'm happy to run some tests on crazy large pack sizes for a prune and check, once I get my NAS fixed.
Oh, that was my fault: I wrongly gave that hint. In fact you are right, it is not the full pack file. However, restic holds a chunk-sized buffer in memory for each chunker, and there is one chunker per concurrent writer. So for machines with many CPUs and little RAM it may make sense to reduce the number of concurrent writers.
The initial info I'm getting from benchmarking this (on a spinning disk only, so far) is that larger pack sizes are very helpful. I've found that the sweet spot is somewhere between 24MB and 32MB on my USB 5400rpm disk. At the moment, I'm considering a recommendation to increase the default minimum pack size to 8MB or 12MB. Is there any specific reason we've chosen 4MB? I'll share my results when I've finished the benchmarks.
Here is a link to @fd0's comment on why 4MB was used:
Hi, I'm looking forward to this feature. My backups on borg contain around 5K files, which fits really well in terms of exchange and speed with Backblaze. I switched to restic, which does a really good job in terms of simplicity and speed. I also like the tagging, and the cleanup by tag: I can do one backup with different paths, tag them (documents, configs, ...), and apply prune by tag. For now the same backups on Backblaze use almost 200K files, which I would like to reduce with bigger packs. I'm aware that memory usage could be higher; maybe something like a warning would help. Also, could this setting be saved in the repository itself, so we can set it once and use the same value from all the machines we have? Something like "restic -r XXX set chunkMinSize=32".
Okay, so: I've been busy with renovations and lost track of this. What follows are my notes.
Repo total size: 14.8G
CPU: i5-8300H, mitigations on, HT on.
Notes: dropping caches after every run.
So upon quick analysis, it appears we can significantly improve performance by increasing the file read concurrency. I suspect this is due to an inefficiency in the way the core chunk pipeline works; however, I do not have time to confirm that at the moment. Now, I've done additional testing (but did not collect metrics on it) which showed that increasing file read concurrency negatively affects spinning rust. So to that end, I do not recommend changing the global default at this time. @vincent-ogury, to answer your questions:
A warning is a good idea. I'm not sure what the exact memory usage will correlate to, which is why I have elected to exclude it for the time being.
I'm going to do some benchmarking on backblaze with larger chunks. I suspect there will be some inefficiencies when fetching chunks for restores of small files, if you choose to go up much past 32M chunks.
Hmmm, that's a good idea. At the moment, I do not know if we have the capability to save options like this. Maybe someone with more experience can chime in? I'd love to include this as an option. All in all, I think this is ready for merge, or at least very close. At the time of writing, there are merge conflicts, so I'll give it a rebase when I get some time. In the interim, @MichaelEischer, do these benchmarks satisfy your curiosity on the file-read-concurrency question? I'm happy to do more if need be, but I think I'll automate them in the future.
Some of the labels in the last sixth of your table seem to be broken; '1 default' in the SaveBlob column can't be right. Is VmPeak the same as the maximum resident set size, or does it refer to the maximum virtual memory usage? In the latter case it is not meaningful: a higher file read concurrency means that Go has to spawn more OS threads, and these allocate, on linux/amd64 for example, 8MB or more of stack by default. However, only the part of the stack that is actually used will be backed by real memory. Therefore it's no problem to allocate dozens of GB of virtual memory even if your system only has 1GB; well, that is, until you try to use it.
From which rows in your measurements do you derive this? The row with that remark looks pretty similar to the previous ones. From my point of view, the first six rows say that for the default concurrency it pretty much doesn't matter which pack size is used.
The optimal file read concurrency depends on the filesystem and HDD/SSD. The current value of 2 for the file read concurrency is the result of some quite extensive tests and discussion (I can't remember in which issue that discussion took place). The inefficiency in the pipeline is pretty surely #2696, which causes file savers to wait for a pack upload to complete before continuing and therefore causes pipeline "stalls". You could test for that case by only backing up a few large files. The blob savers currently have three tasks: calculate the blob checksum, assemble packs, and upload them to the backend. A single blob saver therefore means that no uploads and pack assembly happen concurrently, so it's expected that you need a few blob savers for reasonable performance. That is, a file read concurrency of 2 should be fine in general once #2696 is fixed. Manually increasing the number of BlobSavers beyond the number of CPU cores plus the maximum backend concurrency won't be useful and in the worst case can only increase memory usage. But I'm not sure whether we actually have a memory usage problem here at all.
There's currently no way to store configuration in the repository. Pack files have a size limit for their header of 16MB.
VmPeak is defined as peak virtual memory size. I think I snagged the wrong metric; I may have been more interested in VmHWM, since that's the peak resident set size.
I'm looking at the diminishing returns of the increased pack size: 5:58 to 5:54 is a much smaller improvement than 5:43 to 5:20 or 5:20 to 5:06.
Yes, that can't be right. It should have simply been "1"
I'm inclined to agree with you here. I think I may be seeing symptoms of that issue in my test results.
Agreed.
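For reference, the file-reader/blob-saver split discussed in this exchange can be sketched as a simple channel pipeline (a toy model under my own naming; not restic's actual implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// runPipeline models the pipeline shape discussed above: a small pool of
// file readers produces chunks, and a pool of blob savers consumes them
// (standing in for hashing, pack assembly and upload). It returns how
// many chunks were "saved".
func runPipeline(files []string, readConcurrency, saveConcurrency int) int {
	in := make(chan string)
	chunks := make(chan string)

	// File readers: read files and split them into chunks.
	var readers sync.WaitGroup
	for i := 0; i < readConcurrency; i++ {
		readers.Add(1)
		go func() {
			defer readers.Done()
			for f := range in {
				chunks <- f // stand-in for chunking file content
			}
		}()
	}

	// Blob savers: hash, assemble into packs and upload (stubbed here).
	var savers sync.WaitGroup
	var mu sync.Mutex
	saved := 0
	for i := 0; i < saveConcurrency; i++ {
		savers.Add(1)
		go func() {
			defer savers.Done()
			for range chunks {
				mu.Lock()
				saved++ // stand-in for hash + pack + upload
				mu.Unlock()
			}
		}()
	}

	for _, f := range files {
		in <- f
	}
	close(in)
	readers.Wait()
	close(chunks)
	savers.Wait()
	return saved
}

func main() {
	n := runPipeline([]string{"a", "b", "c", "d", "e"}, 2, 4)
	fmt.Println("chunks saved:", n)
}
```

The sketch shows why a saver pool that stalls (as in #2696) backs up the readers: the unbuffered channel between the two stages blocks producers whenever no consumer is ready.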
I'm coming late into this particular thread; however, for reference, I'm the person who originally put a very naive PoC together, which @madsi1m then turned into the original PR, subsequently taken up by @metalsp0rk. The original desire is worth sharing in the particular context of it being adjustable. Thank you for all the discussion on this; I appreciate the thought being put into it, especially given that I've not been able to give it the due effort myself.
I can entirely appreciate a relatively low number for direct-attached storage in a single machine, and why we'd want to discourage many users from turning this number up too high. In my particular use case, I have dedicated client nodes attached to a large distributed namespace holding nearly a billion files and just shy of 10PB, and growing. The limiting factor on any IO event is usually either an individual disk on a storage node in the case of a single file, or, in the aggregate case, the CPU or network of a reader node. Statistically I'm extremely unlikely to access the same storage disk across several thousand threads for a read on any random file. Other backup tools might be more suited if we were using some filesystem products, but the restic paradigm and method fit exceptionally well in the context of what a useful backup actually constitutes for us. We don't back up whole filesystems; we instead run nearly 100,000 individual jobs defined programmatically, as these constitute the atomic units that matter from the perspective of an end user. These range from a handful of files to upwards of 500TB and tens of millions of files, and we have very little to no control over what the structured layout is. In experiments using my naive PoC, the thread count was incredibly useful, especially when using either NVMe or a RAM disk as the temporary directory, and I was easily saturating the 10Gbps network interfaces into my particular development node. Given the previously mentioned namespace design, as expensive as it may be, adding more CPUs and more network bandwidth across the job set gave a near-linear throughput benefit.
The interest in larger packs was to reduce the pack count stored within any restic repository. As I mentioned above, there are nearly a billion files being written onto offline archival storage, and the operational time cost for retrieving an offline file is the robot loading time and the tape seek. The cost of reading many thousands of files potentially scattered across tapes was far higher, in terms of time and wear on the system, than reading large packs of a gigabyte or so, even if we threw away a lot of the read as unwanted data.
Well, that use case seems rather different from what's been discussed in this PR so far (at least the throughput is an order of magnitude larger), and this also leads to new bottlenecks: with just two file readers (= CPU cores) it is definitely not possible to saturate a 10Gbps network link. The chunker implementation alone is currently capable of processing "just" 500-600 MB/s, and with additional overhead from file access and other processing steps the throughput will probably stay well below 1GB/s.
The 16MB header size limit allows roughly 450k blobs per pack file (one pack header entry has 37 bytes). To fill up a 1GB pack file this would require an average filesize of over 2KB.
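The figures above can be checked quickly, using only the numbers stated in the comment (16MB header limit, 37 bytes per entry):

```go
package main

import "fmt"

const (
	headerLimit = 16 * 1024 * 1024 // pack header size limit in bytes
	entrySize   = 37               // bytes per pack header entry
)

// maxBlobsPerPack is how many header entries fit under the limit.
func maxBlobsPerPack() int { return headerLimit / entrySize }

// minAvgBlobSize is the smallest average blob size (in bytes) that can
// fill a pack of the given size without exceeding the header limit.
func minAvgBlobSize(packSize int) int { return packSize / maxBlobsPerPack() }

func main() {
	fmt.Println("max blobs per pack:", maxBlobsPerPack()) // roughly 450k
	fmt.Println("min avg blob size for a 1GB pack:", minAvgBlobSize(1024*1024*1024), "bytes")
}
```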
I propose it matches the linked forum discussion, and while I know there has been a lot of experimentation about what works great on direct-attached storage, distributed storage is common nowadays and it does behave quite differently. I did have to increase the reader count, but I'd have to go digging in our ticket system to find the exact numbers I ended up using. In the general case, 500-600MB/s was more than sufficient, the goal really being to use aggregate throughput to hide the latency of fetching a lot of small files. CPU is cheaper than my sanity when someone asks again why a data set isn't archived yet :)
I did read the discussion on that with some concern, and I think you're probably right; it was luck that I didn't hit that limit. I did experiment with some much larger pack sizes and had some failures, but did not dig deeply, it being a naive experiment. From hazy memory, the pack sizes I tested up to, and failed with, would have lined up with what you calculated.
Just my 2c: I am using this patch for my backups and it works great. My use case is storing files on Google Cloud Storage using the Archive storage class, which is very cheap, with storage priced at just $1.20/TB/month and free ingress bandwidth, but API requests are expensive at $0.50/10,000 operations. Restic is perfect for this use case and superior to borg, since its local cache means it normally only needs to write to the cloud and no subsequent reads are needed. However, with the default 5MB pack size, the cost of the API PUT requests for an initial backup adds up to $10.00/TB. I'm using a 500MB pack size, which reduces the cost to $0.10/TB. I second that it would be nice to be able to configure this once at the repository level, in addition to the existing command-line and environment variable additions.
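The savings described above follow directly from the quoted price ($0.50 per 10,000 operations; the figure is the one stated in the comment, not verified current GCS pricing):

```go
package main

import "fmt"

// putCostPerTB computes the API PUT cost for uploading 1TB of data as
// packs of the given size, at $0.50 per 10,000 operations (the price
// quoted in the comment above).
func putCostPerTB(packSizeMB float64) float64 {
	puts := 1e12 / (packSizeMB * 1e6) // uploads needed for 1TB (decimal units)
	return puts / 10000 * 0.50        // API cost in dollars
}

func main() {
	fmt.Printf("5MB packs:   $%.2f/TB\n", putCostPerTB(5))
	fmt.Printf("500MB packs: $%.2f/TB\n", putCostPerTB(500))
}
```

This reproduces the $10.00/TB vs. $0.10/TB figures from the comment.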
Any status updates?
c7df932 to 06df336 (compare)
Reworked the check. Doing some E2E tests on my workstations now. I think it's good, but I'd like to spend a bit of time letting it marinate on my machines. Either way, the check has now landed in my branch.
Now that #3484 and #3475 are merged, let's move forward with some parts of this PR.
The remaining interesting part is the file read concurrency.
06df336 to e8f1e3a (compare)
Sounds good, I'll rip the
The new PR is in: #3731. I'm happy to close this one, or leave it open.
e8f1e3a to 5855748 (compare)
backup --file-read-concurrency flag
Or to say it in other words: with compression support this feature is even much more important and should be included in the next release.
cmd/restic/cmd_backup.go
Outdated
f.StringVar(&backupOptions.StdinFilename, "stdin-filename", "stdin", "`filename` to use when reading from stdin")
f.Var(&backupOptions.Tags, "tag", "add `tags` for the new snapshot in the format `tag[,tag,...]` (can be specified multiple times)")

f.UintVar(&backupOptions.FileReadConcurrency, "file-read-concurrency", uint(fileReadConcurrency), "set concurrency on file reads. (default: $RESTIC_FILE_READ_CONCURRENCY or 2)")
Please change the first sentence in the description to "number of files to read concurrently" (removing the dot).
doc/manual_rest.rst
Outdated
--files-from file              read the files to backup from file (can be combined with file args; can be specified multiple times)
--files-from-raw file          read the files to backup from file (can be combined with file args; can be specified multiple times)
--files-from-verbatim file     read the files to backup from file (can be combined with file args; can be specified multiple times)
--file-read-concurrency uint   set concurrency on file reads. (default: $RESTIC_FILE_READ_CONCURRENCY or 2)
Match with changes above :)
Sorry to be such a party pooper, but
The only such flag I can think of right now would be for reading multiple directories in parallel.
Wouldn't that be handled by reading multiple files in parallel? So if read-concurrency is 4 and there are 4 directories with one file each, all 4 directories would also be read in parallel.
In that special case, yes. Though for backups containing several million files, it would be possible to speed up the operation by listing multiple directories concurrently.
5855748 to 6f17126 (compare)
I've renamed the flag and reworked the flag / environment variable handling to be more in line with the other commands. The documentation for the flag now recommends it to improve performance for fast disks (see #3731 for measurements). The previous claim that reducing the read concurrency can improve throughput was never backed up with measurements, as far as I can see from scrolling through the comments.
6f17126 to 2e606ca (compare)
MichaelEischer
left a comment
LGTM. I've rebased the PR and also updated the changelog to match the documentation.
What is the purpose of this change? What does it change?
I would like to add the following flags in order to make restic tunable rather than hardcoding these values:
--min-packsize
--file-read-concurrency
Was the change discussed in an issue or in the forum before?
https://forum.restic.net/t/control-the-minimal-pack-files-size/617
Closes #2291
Checklist
- changelog/unreleased/ entry that describes the changes for our users (template here)
- gofmt run on the code in all commits