Improving restic's performance on large, mostly unchanging datasets #1550
Description
Output of restic version
Self-compiled, latest master
Identified Issue
As a follow-up to #1538 and #1549, on large backups with few changed files (>14M files total, probably <10000 changed), performance is now limited by the Index.Has function (50.53% of restic's total CPU usage during my backup). In an effort to decrease backup times further, I looked into what could be done to reduce the effect of this function. There seem to be two significant sources of CPU usage:
- Calling Mutex.Lock() (8% of restic's total CPU usage). This is mostly just the overhead of having a lock; it is not caused by the lock being contested.
- Doing the actual lookup into the map (28.04% of restic's total CPU usage)
Note those are totals over the entire backup, not just time spent inside calls to Has().
Also mixed in there is the setup/teardown cost of using defer. Considering that the Go team is looking into optimizing restic's exact use case (deferred calls to Mutex.Unlock), it may not be worth worrying about this part.
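To make the two cost sources concrete, here is a minimal sketch of the two Has variants one would compare in a Go micro-benchmark: a map lookup behind a Mutex (with defer, as profiled above) versus the same lookup with no lock at all. These types are illustrative stand-ins, not restic's actual index code.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedIndex mimics the profiled pattern: every Has takes the
// mutex and releases it via defer.
type lockedIndex struct {
	m   sync.Mutex
	ids map[[32]byte]struct{} // blob IDs (SHA-256)
}

func (idx *lockedIndex) Has(id [32]byte) bool {
	idx.m.Lock()
	defer idx.m.Unlock()
	_, ok := idx.ids[id]
	return ok
}

// plainIndex is the lock-free baseline: just the map lookup.
type plainIndex struct {
	ids map[[32]byte]struct{}
}

func (idx *plainIndex) Has(id [32]byte) bool {
	_, ok := idx.ids[id]
	return ok
}

func main() {
	var id [32]byte
	id[0] = 1
	li := &lockedIndex{ids: map[[32]byte]struct{}{id: {}}}
	pi := &plainIndex{ids: li.ids}
	fmt.Println(li.Has(id), pi.Has(id))     // true true
	var other [32]byte
	fmt.Println(li.Has(other), pi.Has(other)) // false false
}
```

Wrapping each Has in a `testing.B` loop would put numbers on the Lock/defer overhead relative to the bare map lookup.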
Do you have an idea how to solve the issue?
- For the Mutex, I think replacing it with a RWMutex may help (I need to perform more benchmarking on this to be sure). Otherwise, I'm not sure there is a way to avoid it. If I understand the code base correctly, finalized Indexes don't need locking as they are read-only. Is this correct? And would it be worth trying to take advantage of this fact (with the associated complexity increase)?
- For the map, my only thought would be to decrease the size of it. From reading how the map lookup is done, this might help, but more benchmarking is needed. Is there a reason why the index size was chosen to be what it is?
- Alternately, restic could simply call Has() less. Restic calls Has() to check whether a blob is already present in the backup repository. However, if restic already finds the file in the ancestor tree, it could use that fact to skip checking whether the repository has the file's blobs. Does this sound reasonable? If so, should this wait on New archiver code #1494 or be done at the same time? If this is the preferred option but you want me to wait until after New archiver code #1494 before trying it, that's fine.
- My last idea would be to reorganize the indexes so that the MasterIndex (the main direct caller of Index.Has()) would only have to search through a limited number of indexes to find the blob. This could also help reduce memory usage by avoiding having to load all the indexes into memory at the start of restic (which consumes ~6-8G for my 14M file backup), and instead load them on demand. But this is a large change requiring a repository format break, so I can understand if this option is currently off the table.
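The RWMutex idea from the first bullet could look something like the sketch below: readers (Has) take only the read lock so concurrent lookups no longer serialize, and a finalized (read-only) index could skip locking entirely. The type and field names here are hypothetical, not restic's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// index is an illustrative stand-in for restic's in-memory index.
type index struct {
	mu        sync.RWMutex
	finalized bool // once true, the index is never written to again
	blobs     map[string]struct{}
}

func (idx *index) Has(id string) bool {
	// If the read-only invariant for finalized indexes holds,
	// the lock can be skipped completely on the hot path.
	if idx.finalized {
		_, ok := idx.blobs[id]
		return ok
	}
	idx.mu.RLock()
	_, ok := idx.blobs[id]
	idx.mu.RUnlock()
	return ok
}

func (idx *index) Store(id string) {
	idx.mu.Lock()
	idx.blobs[id] = struct{}{}
	idx.mu.Unlock()
}

func main() {
	idx := &index{blobs: map[string]struct{}{}}
	idx.Store("abc")
	fmt.Println(idx.Has("abc")) // true
	idx.finalized = true
	fmt.Println(idx.Has("abc"), idx.Has("def")) // true false
}
```

Note that RWMutex.RLock still pays an atomic operation per call, so it mainly helps under reader concurrency; the finalized fast path is what removes the per-call overhead entirely.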
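The "call Has() less" idea from the third bullet can be sketched as follows: if a file's metadata matches its node in the ancestor (parent) snapshot, its blobs must already be in the repository, so the per-blob Index.Has calls can be skipped outright. All names here are illustrative; restic's real ancestor comparison lives in the archiver and considers more attributes than this.

```go
package main

import "fmt"

// node is a simplified stand-in for a file node from the ancestor tree.
type node struct {
	ModTime int64
	Size    int64
	Blobs   []string
}

// unchanged reports whether the current file appears identical to the
// ancestor node (here: same mtime and size; a real check would compare more).
func unchanged(ancestor *node, modTime, size int64) bool {
	return ancestor != nil && ancestor.ModTime == modTime && ancestor.Size == size
}

// blobsToCheck returns the blob IDs that still need an Index.Has lookup.
func blobsToCheck(ancestor *node, modTime, size int64, blobs []string) []string {
	if unchanged(ancestor, modTime, size) {
		return nil // blobs are known to be in the repo; skip Has entirely
	}
	return blobs // changed or new file: fall back to checking each blob
}

func main() {
	anc := &node{ModTime: 100, Size: 4, Blobs: []string{"b1"}}
	fmt.Println(len(blobsToCheck(anc, 100, 4, []string{"b1"}))) // 0
	fmt.Println(len(blobsToCheck(anc, 200, 4, []string{"b1"}))) // 1
}
```

For a backup where <10000 of 14M files changed, this would turn tens of millions of Has calls into a comparatively tiny number.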
Alternately, I can understand if you are happy with the performance as it is right now (and I am generally happy too!) and would rather concentrate on other, more pressing features/bugs. I was just looking for low-hanging fruit to optimize in restic, and this seems to be my largest performance bottleneck right now. If you want to come back to this problem later, feel free to either leave this issue open or close it as suits you.
Did restic help you or made you happy in any way?
I've been looking for a backup solution to replace CrashPlan Home, one that can, most importantly, handle my ~7TB of data with incremental backups. Restic is the best solution I've found so far, and I'm currently planning on using it extensively. Especially considering how easy and fast it is to use!