Prevent local backend file fragmentation by file preallocation. #3261

Merged
MichaelEischer merged 3 commits into restic:master from DRON-666:prealloc-packs
Jun 9, 2023
@DRON-666
Contributor

@DRON-666 DRON-666 commented Feb 2, 2021

What does this PR change? What problem does it solve?

File preallocation has already been added to the restorer, but not to the local backend.
This PR moves preallocation code from restorer to the fs package and adds preallocation to the local backend.

Before this PR:

restic init --copy-chunker-params 
created restic repository f0e0af3f2f at repo2

restic backup "C:\Program Files" 
no parent snapshot found, will read all files

Files:       94897 new,     0 changed,     0 unmodified
Dirs:         9490 new,     0 changed,     0 unmodified
Added to the repo: 9.117 GiB

processed 94897 files, 10.011 GiB in 5:04
snapshot 5b287af7 saved

contig /a /s repo2

...
repo2\data\0d\0da13caef4a1746608248a8d1286dbddd20ef89dc524c6c2adb8da5c9e4a8a41 is in 16 fragments
repo2\data\0e\0eaa27e755e25a65f45327e400d058df6b426cdefa0b083c54a743d10fb90db9 is in 14 fragments
repo2\data\2c\2cb9a6bc2645656f8f288d131f251edcf5a76314f46cb67f2300febea6b61aaf is in 13 fragments
repo2\data\34\34a9dfd06cc971ef6d5768471245ae8ad26572942133d54313baa6484d116bb6 is in 12 fragments
repo2\data\74\74cfcbf057fc7af25b4fe930b766d95e2a96f1c0a166d66dff9812da85f8df30 is in 12 fragments
repo2\data\85\854e876ad694fcff94b5927520ac1aeb85dd555cf47f5d2525212047de4f0a7c is in 17 fragments
repo2\data\be\be958f56d448fe766f6fd7289d015c831f730678529d7355bb5b2ed790741a0b is in 16 fragments
repo2\data\bf\bfd119b20a3a4cdc7831d126f551f0211aaa9c56ae06603365a31365c6e4d3e9 is in 12 fragments
...

Summary:
     Number of files processed:      2322
     Number unsuccessfully procesed: 0
     Average fragmentation       : 2.9186 frags/file

Many files are split into 10-20 fragments.

After this PR:

restic init --copy-chunker-params 
created restic repository 665523814e at repo2

restic backup "C:\Program Files" 
no parent snapshot found, will read all files

Files:       94897 new,     0 changed,     0 unmodified
Dirs:         9490 new,     0 changed,     0 unmodified
Added to the repo: 9.117 GiB

processed 94897 files, 10.011 GiB in 4:55
snapshot f97d7203 saved

contig /a /s repo2

...
repo2\data is in 8 fragments
...

Summary:
     Number of files processed:      2317
     Number unsuccessfully procesed: 0
     Average fragmentation       : 1.00388 frags/file

Only the data dir (containing 256 subdirs) is fragmented.

Was the change discussed in an issue or in the forum before?

#2679
#2893

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@DRON-666
Contributor Author

DRON-666 commented Feb 2, 2021

Additionally, I tried various Windows-specific preallocation methods instead of the generic os.File.Truncate().
For backup, all methods perform the same within the margin of error, but the results for the external SMR drive are very strange: there, every preallocation method is about 35% slower than master.

Prealloc   CMR SATA HDD #1        CMR SATA HDD #2        CMR USB HDD            SMR USB HDD
method     sec     delta   frag   sec     delta   frag   sec     delta   frag   sec     delta    frag
None       96.93   +0.00%  2.16   256.76  -0.00%  3.19   173.06  +0.00%  2.34   410.82  -0.00%   2.52
Truncate   94.25   -2.77%  1.00   236.74  -7.80%  1.00   166.09  -4.02%  1.00   557.40  +35.68%  1.01
Method5    94.33   -2.68%  1.00   234.44  -8.69%  1.00   166.21  -3.96%  1.00   557.66  +35.74%  1.01
Method65   94.75   -2.25%  1.00   236.49  -7.90%  1.00   165.86  -4.16%  1.00   558.55  +35.96%  1.01

None - Master branch
Truncate - This PR
Method5 - SetFileInformationByHandle(FileAllocationInfo)
Method65 - SetFileInformationByHandle(FileEndOfFileInfo) + SetFileInformationByHandle(FileAllocationInfo)

The columns are not comparable with each other because they correspond to different computers and backup sources.
The first sub-column is the average time in seconds over 10 runs, the second is the change relative to master (None), and the third is the average fragmentation in fragments per file.
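The percentage sub-column appears to be the relative change against the None (master) row, i.e. (time - baseline) / baseline. A sketch of that calculation, assuming this interpretation (the function name is hypothetical):

```go
package main

import "fmt"

// percentDelta returns the relative change of t against base, in percent.
func percentDelta(base, t float64) float64 {
	return (t - base) / base * 100
}

func main() {
	// SMR USB HDD column: None (master) vs. Truncate (this PR).
	fmt.Printf("%+.2f%%\n", percentDelta(410.82, 557.40)) // +35.68%
}
```

Recomputing from the rounded averages can differ in the last digit; for the first CMR SATA HDD column it gives -2.76% rather than -2.77%.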

Member

@MichaelEischer MichaelEischer left a comment

As far as I know, Truncate internally calls SetFileInformationByHandle(FileEndOfFileInfo).

Regarding the results of the external SMR drive: I wonder whether this is related to writing multiple files to the backend at the same time, which, together with the flush (fsync) calls, requires writing to different locations on the HDD. Could you check what happens when there's only a single filestorer or when the local backend does not fsync?

@DRON-666
Contributor Author

DRON-666 commented Feb 7, 2021

@MichaelEischer Thanks a lot for the review, I'll fix it tomorrow.

As far as I know, Truncate internally calls SetFileInformationByHandle(FileEndOfFileInfo).

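No, it's FileEndOfFileInfo + FileAllocationInfo (Method65): in the restore benchmarks below, Truncate performs the same as Method65, while Method5 is measurably slower on three of the four drives.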

benchstat Restore_Truncate.txt Restore_Method5.txt
name           old time/op  new time/op  delta
CMR_HDD1_SATA    202s ± 2%    209s ± 1%  +3.40%  (p=0.000 n=10+10)
CMR_HDD2_SATA    658s ± 3%    714s ± 2%  +8.45%  (p=0.000 n=9+9)
CMR_HDD_USB      315s ± 1%    317s ± 1%  +0.83%  (p=0.002 n=9+9)
SMR_HDD_USB      370s ± 8%    371s ± 2%    ~     (p=0.549 n=9+10)

benchstat Restore_Truncate.txt Restore_Method65.txt
name           old time/op  new time/op  delta
CMR_HDD1_SATA    202s ± 2%    202s ± 2%   ~     (p=0.796 n=10+10)
CMR_HDD2_SATA    658s ± 3%    663s ± 2%   ~     (p=0.278 n=9+10)
CMR_HDD_USB      315s ± 1%    318s ± 5%   ~     (p=1.000 n=9+9)
SMR_HDD_USB      370s ± 8%    374s ± 6%   ~     (p=0.400 n=9+10)

Could you try what happens when there's only a single filestorer or when the local backend does not fsync?

Yes, of course, but every test run takes many hours, so give me a few days.

@DRON-666
Contributor Author

DRON-666 commented Feb 9, 2021

@MichaelEischer TL;DR: disabling fsync gives a speedup of two to three times. A single filestorer is bad for performance on SMR drives.

Variant         CMR SATA HDD             SMR USB HDD
                sec     delta    frag    sec     delta    frag
None            276.49  +0.00%   3.19    419.75  +0.00%   2.57
Truncate        254.81  -7.84%   1.00    561.25  +33.71%  1.01
Single          244.71  -11.50%  1.00    708.39  +68.76%  1.01
NoSync          130.04  -52.97%  1.00    170.36  -59.41%  1.01
NoSync+Single   130.77  -52.71%  1.00    165.94  -60.47%  1.01

None - Master branch
Truncate - This PR
Single - This PR and a single filestorer
NoSync - This PR and no fsync
NoSync+Single - This PR, no fsync and a single filestorer

benchstat Backup_None.txt Backup_Truncate.txt
name           old time/op  new time/op  delta
CMR_HDD2_SATA    276s ± 2%    255s ± 1%   -7.73%  (p=0.000 n=9+10)
SMR_HDD_USB      421s ± 4%    563s ± 3%  +33.69%  (p=0.000 n=10+10)

benchstat Backup_Truncate.txt Backup_NoSync.txt
name           old time/op  new time/op  delta
CMR_HDD2_SATA    255s ± 1%    130s ± 1%  -48.96%  (p=0.000 n=10+10)
SMR_HDD_USB      563s ± 3%    173s ±14%  -69.25%  (p=0.000 n=10+10)

benchstat Backup_Truncate.txt Backup_Single.txt
name           old time/op  new time/op  delta
CMR_HDD2_SATA    255s ± 1%    245s ± 1%   -3.98%  (p=0.000 n=10+10)
SMR_HDD_USB      563s ± 3%    713s ± 1%  +26.67%  (p=0.000 n=10+9)

benchstat Backup_Truncate.txt Backup_NoSync_Single.txt
name           old time/op  new time/op  delta
CMR_HDD2_SATA    255s ± 1%    131s ± 1%  -48.69%  (p=0.000 n=10+10)
SMR_HDD_USB      563s ± 3%    166s ± 8%  -70.52%  (p=0.000 n=10+9)

benchstat Backup_NoSync.txt Backup_NoSync_Single.txt
name           old time/op  new time/op  delta
CMR_HDD2_SATA    130s ± 1%    131s ± 1%   ~     (p=0.165 n=10+10)
SMR_HDD_USB      173s ±14%    166s ± 8%   ~     (p=0.278 n=10+9)

Raw benchstat data.

@MichaelEischer MichaelEischer linked an issue May 14, 2021 that may be closed by this pull request
@MichaelEischer
Member

@DRON-666 Could you repeat the measurements for this PR one last time? With the increased pack size of 16MB, I'd expect the performance regression on the SMR disk to be far lower. If that's the case, then I think we should just merge this PR. And if not, then let's close the PR.
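The expectation of a smaller regression can be checked with a rough pack count. Assuming the previous default pack size of 4 MiB and the 9.117 GiB added in the original benchmark, a 16 MiB pack size cuts the number of pack files, and presumably the number of per-file preallocation and fsync calls, to about a quarter. The helper below is hypothetical and ignores that the last packs of a backup can be smaller than the target size:

```go
package main

import (
	"fmt"
	"math"
)

// approxPackCount estimates how many pack files hold repoGiB of data if
// every pack is packMiB large.
func approxPackCount(repoGiB, packMiB float64) int {
	return int(math.Ceil(repoGiB * 1024 / packMiB))
}

func main() {
	const added = 9.117 // GiB, "Added to the repo" in the first benchmark
	for _, size := range []float64{4, 16} {
		fmt.Printf("%2.0f MiB packs: ~%d files\n", size, approxPackCount(added, size))
	}
}
```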

@DRON-666
Contributor Author

DRON-666 commented Jun 9, 2023

It's not the same SMR drive (now I only have two ST6000DM003 drives), but the results are very promising:

             │ Backup_None.txt │        Backup_Truncate.txt         │
             │     sec/op      │   sec/op    vs base                │
CMR_HDD_SATA       56.16 ± 10%   53.50 ± 8%        ~ (p=0.101 n=10)
CMR_HDD_USB        181.4 ±  4%   165.5 ± 5%   -8.77% (p=0.000 n=10)
SMR_HDD_USB        351.0 ±  1%   247.8 ± 2%  -29.40% (p=0.000 n=10)

Member

@MichaelEischer MichaelEischer left a comment

LGTM. The measurements look encouraging :-) . Sorry for taking so long to get this merged!

@MichaelEischer MichaelEischer merged commit 237f32c into restic:master Jun 9, 2023
@DRON-666 DRON-666 deleted the prealloc-packs branch June 9, 2023 09:51
Development

Successfully merging this pull request may close these issues.

High fragmentation and very slow backups

2 participants