check slows down quadratically with number of snapshots #2284

@BenWiederhake

Description

Roughly 3 seconds of extra runtime for every 200 snapshots. At 72 snapshots per day (3 per hour), this makes restic check painful to run.

Output of restic version

restic 0.9.5 compiled with go1.12.4 on linux/amd64

How did you run restic exactly?

I make backups every 20 minutes, which comes to roughly 500 snapshots per week. Please consider the following numbers in this light.

I'll skip my actual setup and dive right into the:

Steps to reproduce the behavior

Call backup and check somewhat often.

This MRE repo does everything, and collects the data in a nice, somewhat usable format.

Specific setup:

  • RESTIC_REPOSITORY=destdir/ RESTIC_PASSWORD=1234 restic backup fromdir
  • RESTIC_REPOSITORY=destdir/ RESTIC_PASSWORD=1234 restic check
  • Make few changes to the source directory between backups, just as my actual home folder doesn't change completely all the time. The MRE makes no changes at all. However, this is unrelated to Omit snapshot creation if there was no change #662.
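
The loop behind these steps can be sketched as follows. This is my reconstruction, not the author's actual MRE script; it reuses the placeholder repository path and password from the commands above and needs restic on PATH:

```python
# Reconstruction (not the original MRE script) of the backup/check loop,
# reusing the placeholder repo path and password from the steps above.
import os
import shutil
import subprocess
import time

ENV = dict(os.environ, RESTIC_REPOSITORY="destdir/", RESTIC_PASSWORD="1234")

def run(*args):
    """Run a restic subcommand quietly; returns the CompletedProcess."""
    return subprocess.run(["restic", *args], env=ENV,
                          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

def measure(runs=10):
    """Yield (snapshot_count, seconds_for_check) after each backup."""
    os.makedirs("fromdir", exist_ok=True)
    run("init")  # fails harmlessly if the repository already exists
    for i in range(1, runs + 1):
        run("backup", "fromdir")
        start = time.perf_counter()
        run("check")
        yield i, time.perf_counter() - start

if shutil.which("restic"):
    for n, secs in measure():
        print(n, f"{secs:.3f}")
else:
    print("restic not installed; skipping the measurement")
```

Plotting the printed pairs gives the kind of chart shown below.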

What backend/server/service did you use to store the repository?

sftp for my actual setup, and "filesystem" for the MRE.

This should be irrelevant, as I'm complaining about excessive CPU load.

Expected behavior

With more snapshots, restic check has more to do, I get it. But I expect something along the lines of 1 ms per "empty" snapshot.

Actual behavior

It looks like each snapshot makes check slower by roughly 15 ms in terms of actual CPU usage, as reported by /usr/bin/time. Here are the measurements from the MRE:

[Chart showing quadratic growth: x axis is the number of snapshots, y axis is time.]

Note that the quadratic fit is so good you can barely see it. The first few checks take virtually no time, and then it blows up into tens of seconds of high CPU load. This can be explained neither by latency nor by the number of files ("hundreds" is not enough for that kind of slowdown).
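
The two figures above (3 s per 200 snapshots, i.e. ~15 ms per snapshot) are consistent with a simple model. This is my reading rather than anything stated explicitly in the report: each individual check is linear in the snapshot count, so the total time spent checking after every backup grows quadratically:

```python
# Illustrative model only: assumes each existing snapshot adds ~15 ms of
# CPU time to a single `restic check` run (the slope reported above).
PER_SNAPSHOT_S = 0.015  # reported estimate: seconds of check time per snapshot

def check_time(snapshots: int) -> float:
    """Duration of one check: linear in the number of snapshots."""
    return PER_SNAPSHOT_S * snapshots

def cumulative_check_time(snapshots: int) -> float:
    """Total time if check runs after every backup: quadratic overall."""
    return sum(check_time(k) for k in range(1, snapshots + 1))

print(f"{check_time(200):.2f}")             # one check at 200 snapshots: ~3 s
print(f"{cumulative_check_time(500):.0f}")  # a week of 20-minute backups: ~31 min in total
```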

I'd like to mention again: You can recreate this with the MRE script.

Do you have any idea what may have caused this?

My initial guess was an "accidentally quadratic" bug, so I was pretty surprised to see that the time of each individual check grows only linearly. So it's probably not that.

My current guess is that snapshot parsing/scanning is unreasonably slow. Maybe some computation has to run several times per snapshot, whereas one run would suffice?

On the other hand, I cannot reproduce this using "fs" as the backend. So the issue could be related to sftp. Or maybe it's quadratically many IO operations, and my hard drive is fast enough that the latency doesn't matter.

Do you have an idea how to solve the issue?

No idea, but it seems to happen between the "check snapshots, trees and blobs" and "no errors were found" output lines. Maybe I'll sprinkle some printlns to bisect where the time goes.
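
The "sprinkle some printlns" approach could look like this, sketched in Python for illustration (restic itself is written in Go, and the phase names below are placeholders, not restic's real phases):

```python
# Throwaway timing instrumentation to bisect where the time goes.
# The phase names and sleeps are placeholders, not real restic phases.
import time
from contextlib import contextmanager

@contextmanager
def timed(name):
    """Print how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{name}: {time.perf_counter() - start:.3f}s")

with timed("load snapshots"):
    time.sleep(0.01)  # stand-in for real work
with timed("walk trees"):
    time.sleep(0.02)
```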

"Snapshot packs" (#523) probably don't help, but it depends on what the actual issue is.

Did restic help you or make you happy in any way?

Absolutely! I'm about to switch away from my old backup system, which needed root to restore anything and used system-wide configuration. With restic, both are better! :)
