check slows down quadratically with number of snapshots #2284
Description
3 seconds for every 200 snapshots. At 72 snapshots per day (3 per hour), this makes restic check painful to run.
Output of restic version
restic 0.9.5 compiled with go1.12.4 on linux/amd64
How did you run restic exactly?
I make backups every 20 minutes, so 500 snapshots per week. Please consider the following numbers in this light.
I'll skip my actual setup, and dive right into the:
Steps to reproduce the behavior
Call backup and check somewhat often.
This MRE repo does everything, and collects the data in a nice, somewhat usable format.
Specific setup:
- RESTIC_REPOSITORY=destdir/ RESTIC_PASSWORD=1234 restic backup fromdir
- RESTIC_REPOSITORY=destdir/ RESTIC_PASSWORD=1234 restic check
- Few changes to the repository, just like my actual home folder doesn't change completely all the time. The MRE makes no changes at all. However, this is unrelated to "Omit snapshot creation if there was no change" (#662).
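For readers who don't want to open the MRE repo, the two commands above can be alternated in a loop roughly like this. This is only a sketch, not the MRE script itself: the function name `timed_check_loop` and its `run` parameter are made up for illustration, it assumes the repository was already created with `restic init`, and it measures wall-clock time rather than the CPU time that /usr/bin/time reports.

```python
import os
import subprocess
import time


def timed_check_loop(rounds, repo="destdir/", source="fromdir",
                     password="1234", run=subprocess.run):
    """Alternate `restic backup` and `restic check`, timing each check.

    Returns a list of (snapshot_count, seconds) pairs. `run` is a
    hypothetical hook so the loop can be exercised without restic.
    """
    # Same environment variables as in the commands quoted in this report.
    env = dict(os.environ, RESTIC_REPOSITORY=repo, RESTIC_PASSWORD=password)
    timings = []
    for snapshot_count in range(1, rounds + 1):
        # One new snapshot per round; the source directory is left unchanged.
        run(["restic", "backup", source], env=env, check=True)
        start = time.perf_counter()
        run(["restic", "check"], env=env, check=True)
        # Wall-clock duration of this check, in seconds.
        timings.append((snapshot_count, time.perf_counter() - start))
    return timings
```

Calling `timed_check_loop(200)` against a freshly initialized repository would yield one timing per snapshot count, assuming the repository starts empty.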
What backend/server/service did you use to store the repository?
sftp for my actual setup, and "filesystem" for the MRE.
This should be irrelevant, as I'm complaining about an excessive CPU load.
Expected behavior
With more snapshots, restic check has more to do, I get it. But I expect something along the lines of 1 ms per "empty" snapshot.
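To put the numbers in this report side by side, here is the arithmetic as a small sketch. It assumes a purely linear cost per snapshot, which is a simplification; the helper name `added_check_seconds` is made up for illustration.

```python
# Figures taken from this report.
SNAPSHOTS_PER_WEEK = 3 * 24 * 7          # 3 per hour -> 504 per week
OBSERVED_SECONDS_PER_SNAPSHOT = 3 / 200  # "3 seconds for every 200 snapshots"
EXPECTED_SECONDS_PER_SNAPSHOT = 0.001    # "1 ms per empty snapshot"


def added_check_seconds(snapshots, seconds_per_snapshot):
    """Extra time one `restic check` run takes, under a linear cost model."""
    return snapshots * seconds_per_snapshot


observed = added_check_seconds(SNAPSHOTS_PER_WEEK, OBSERVED_SECONDS_PER_SNAPSHOT)
expected = added_check_seconds(SNAPSHOTS_PER_WEEK, EXPECTED_SECONDS_PER_SNAPSHOT)
print(f"after one week: observed ~{observed:.2f} s, expected ~{expected:.2f} s")
```

Under that assumption, every week of backups adds about 7.6 seconds to each check, where about half a second would match the expectation.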
Actual behavior
It looks like each snapshot makes check slower by roughly 15 ms in terms of actual CPU usage as reported by /usr/bin/time. Here are the measurements from the MRE:
Note that the quadratic fit is so perfect you can barely see it. The first few checks take virtually no time, and then it blows up into tens of seconds of high CPU load. Neither latency nor the number of files can explain this ("hundreds" of files is not enough for that kind of slowdown).
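One quick way to tell linear from quadratic growth in timings like these, without a plotting tool, is to look at second differences of evenly spaced samples: they are close to zero for linear growth and roughly constant and positive for quadratic growth. The sketch below uses synthetic series, not the MRE measurements; the 15 ms slope comes from this report, while the quadratic coefficient is an arbitrary assumption.

```python
def second_differences(values):
    """Second finite differences of evenly spaced samples."""
    first = [b - a for a, b in zip(values, values[1:])]
    return [b - a for a, b in zip(first, first[1:])]


# Synthetic check times (seconds) sampled every 200 snapshots.
snapshot_counts = range(0, 1001, 200)
linear = [0.015 * n for n in snapshot_counts]        # 15 ms per snapshot
quadratic = [1e-5 * n * n for n in snapshot_counts]  # hypothetical coefficient

print(second_differences(linear))     # values near zero
print(second_differences(quadratic))  # values near a positive constant
```

Real measurements are noisy, so this only gives a rough indication; a least-squares fit over all data points is more robust.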
I'd like to mention again: You can recreate this with the MRE script.
Do you have any idea what may have caused this?
My initial guess was that there is an "accidentally quadratic" bug, so I was pretty surprised to see linear growth on the graph. So it's probably not that.
My current guess is that snapshot parsing/scanning is unreasonably slow. Maybe some computation has to run several times per snapshot, whereas one run would suffice?
On the other hand, I cannot reproduce this using "fs" as backend. So the issue could be related to sftp. Or maybe it's quadratically many IO operations, and my hard drive is so fast that the latency doesn't matter anymore.
Do you have an idea how to solve the issue?
No idea, but the time seems to be spent between the output lines "check snapshots, trees and blobs" and "no errors were found". Maybe I'll sprinkle some printlns to bisect when the issue happens.
"Snapshot packs" (#523) probably don't help, but it depends on what the actual issue is.
Did restic help you or made you happy in any way?
Absolutely! I'm about to switch away from my old backup system, which needed root to restore anything, and uses system-wide configuration. With restic, both are better! :)
