-
Notifications
You must be signed in to change notification settings - Fork 594
feat(snapshots): Flush after restoring each file #4825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(snapshots): Flush after restoring each file #4825
Conversation
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
julio-lopez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot be the default behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR ensures data durability during snapshot restoration by adding an explicit filesystem sync operation after each file is restored. This prevents potential data loss when users immediately detach removable storage after restoration completes.
Key changes:
- Added
f.Sync()call after file write operations to flush buffered data to disk before closing files
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4825 +/- ##
==========================================
+ Coverage 75.86% 78.03% +2.16%
==========================================
Files 470 548 +78
Lines 37301 31417 -5884
==========================================
- Hits 28299 24515 -3784
+ Misses 7071 4851 -2220
- Partials 1931 2051 +120 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The file restore is using buffered IO, so after the restore completes, file data are actually written to the system cache, there is no guarantee all data have been flushed to disk. The flush may happen some time later according to the system cache's own data management policy.
On the other hand, users may restore data to a "removable disk", e.g., a virtual disk or a USB disk.
As a result, if users remove the disk immediately after the restore (e.g., detach the virtual disk from the hypervisor), the restore data may be lost.
This PR adds a flush call (
f.Sync) for each file after the restore.The flush may take some time because it waits for the kernel to completes the flush IO. So for large amount of files, the throughput may get affected.
However, I think for a restore tool, it is always correct to make sure the data is unconditionally restored to the disk. So the overhead is something ought to spend.
So I don't make this as a conditional call (e.g., adding a user option for it). Let me know if this affects other modules or scenarios.