Disk I/O throttling? #1145

Description

@benjamin-thomas

Thanks for your great program, I find it a pleasure to use!

I've had a problem with a particular backup though...

Output of restic version

restic 0.7.1
compiled with go1.8.3 on linux/amd64

The OS is: Ubuntu 14.04 LTS
The filesystem is: ext4

How did you start restic exactly? (Include the complete command line)

 envdir ~/.envdir/restic/ restic backup ~/my_dir

Note: envdir provided by daemontools

What backend/server/service did you use?

rest-server over HTTPS

Expected behavior

Initial backup of a big directory (230GB) should not render the system unresponsive.

Actual behavior

Stats of the running output, for reference:

scan [/home/user/my_dir]
scanned 7408 directories, 5106240 files in 5:45
[1:42:37] 52.03%  20.227 MiB/s  121.623 GiB / 233.752 GiB  448727 / 5113648 items  0 errors  ETA 1:34:36

The system became unresponsive because of a very high CPU I/O wait (%wa around 70, as shown in top).

I also confirmed via iotop that restic was the only process saturating the disk.

Steps to reproduce the behavior

Launch the initial backup on a folder containing millions of small files (I suspect this is the problem)

Do you have any idea what may have caused this?

The folder contains lots of small text files (eml), segmented into subdirectories based upon year/month/day to facilitate normal disk lookups (via /bin/ls, etc).

I guess this has a negative impact, forcing restic to do lots of disk lookups?

For the time being, I found a workaround by wrapping the restic command as follows:

envdir ~/.envdir/restic/ nice -n19 ionice -c3 restic backup ~/my_dir

For ionice to take effect, however, I also had to switch to a different I/O scheduler:

root@server:~# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
root@server:~# echo cfq > /sys/block/sda/queue/scheduler
root@server:~# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]  

It'd be nice if there were a way to throttle I/O operations, if only from a usability perspective.

Even with this workaround in place, I still get a huge load average that I have to ignore for the moment.

root@server:~# cat /proc/loadavg
11.38 11.10 10.84 2/355 30862

My server is responsive at the moment, but if I got a CPU or memory spike, I wouldn't be able to see it because of the muddied loadavg.

Any thoughts?
