restic is very slow to backup small files #2696

@blastrock

Output of restic version

restic 0.9.4 compiled with go1.11.6 on linux/amd64
Actually, the tests below use commit 5a7c27d compiled from master.

How did you run restic exactly?

~/go/bin/restic init && time ~/go/bin/restic backup . -o b2.connections=200

What backend/server/service did you use to store the repository?

I upload to b2 in the EU region.

Benchmarks

I ran a benchmark and reported the results in #1383 (comment).

I have since run a few more benchmarks and opened this new issue, as requested by @MichaelEischer. The issue is indeed specific to backing up small files (and I don't know whether it is related to b2).

I have found in restic's code that the parallelism is controlled by non-exposed variables set in func (o Options) ApplyDefaults() Options. My initial benchmarks showed that tweaking FileReadConcurrency had a great influence on the backup time, so these benchmarks focus on that variable.

  • fast.com measured my upload speed at 300Mbps
  • The hard drive spins at 7200RPM. hdparm reports 175MB/s for non-cached reads.
  • The Linux cache is cleared before each test with sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
  • I live in France and upload to b2 in their EU region
  • I used -o b2.connections=200 because tests with the default value give very low numbers...
  • Each test is run on a bare repository
  • I compiled commit 5a7c27d from master
  • I am using Debian unstable with Linux 5.4.0

Test #1

  • 1 file of 1GiB filled with random data

Drive type   FileReadConcurrency   Total time   Speed
HDD          2                     53s          19MiB/s
HDD          16                    49s          21MiB/s

Test #2

  • 1024 files of 1MiB filled with random data

Drive type   FileReadConcurrency   Total time   Speed
HDD          2                     5min19s      3.2MiB/s
tmpfs        2                     5min11s      3.3MiB/s
HDD          16                    54s          19MiB/s
tmpfs        16                    55s          19MiB/s

So regardless of the storage device (HDD or RAM), the current value of FileReadConcurrency (2 in the tests above) seems to slow down the backup of small files significantly.
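
To make the Test #2 numbers easier to compare, here is a small Go sketch that recomputes the average speed and estimates how long each file occupies one reader slot. The per-slot estimate rests on a rough assumption of mine: that all FileReadConcurrency slots stay busy for the whole run. The function names are hypothetical and nothing here comes from restic itself.

```go
package main

import "fmt"

// run holds one row of the Test #2 table (1024 files of 1MiB).
type run struct {
	drive       string
	concurrency int
	seconds     float64
}

// throughputMiBps returns the average speed for totalMiB transferred in seconds.
func throughputMiBps(totalMiB, seconds float64) float64 {
	return totalMiB / seconds
}

// secondsPerFile estimates how long one file occupies one reader slot,
// assuming every slot is busy for the entire run (a simplification).
func secondsPerFile(seconds float64, concurrency, files int) float64 {
	return seconds * float64(concurrency) / float64(files)
}

func main() {
	const files = 1024
	const totalMiB = 1024.0
	runs := []run{
		{"HDD", 2, 319},   // 5min19s
		{"tmpfs", 2, 311}, // 5min11s
		{"HDD", 16, 54},
		{"tmpfs", 16, 55},
	}
	for _, r := range runs {
		fmt.Printf("%-5s concurrency=%-2d %5.1f MiB/s %.2f s per file per slot\n",
			r.drive, r.concurrency,
			throughputMiBps(totalMiB, r.seconds),
			secondsPerFile(r.seconds, r.concurrency, files))
	}
}
```

Under that assumption, each 1MiB file holds a slot for roughly 0.6s at a concurrency of 2 and roughly 0.85s at 16, on both HDD and tmpfs. That is far longer than reading 1MiB should take, which suggests the time per file is dominated by something other than the read itself.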

Do you have an idea how to solve the issue?

I don't, but here's a hint: #1383 (comment)

So maybe the problem lies with restic's architecture. But maybe another solution (or workaround) could be to expose the concurrency settings on the command line so that we can tweak them.

Did restic help you today? Did it make you happy in any way?

restic didn't help me today, it helped me with my backups 2 days ago :)

(Short) story time: at work, my colleagues had problems with our server backup with duplicity (I don't remember what the actual problem was, though). I recommended restic, which I use for my personal backups. They set it up and seem happy with it. It is now running every day there!

Anyway, keep up the good work!
