I wonder if a better alternative would be to try and grow the file to the exact required size on flush. If we don't have enough space, we'll abort without touching the file. We currently grow the file automatically by writing to it, which means it's more likely to abort outside of entry boundaries.

One edge case with my suggestion is that we might have a bunch of zeroes at the end of the file. This is possible if we crash halfway through the flush after we've already grown the file. But we can easily detect this, as there's no valid entry there.

Speaking of which: a crash during flush is likely another scenario that reproduces the problem you described - writing a partial entry. Thoughts?
As we discussed with @agourlay, in general we cannot check whether the data in the file is correct. We can check an entry, but if the malformed data is in the last bytes after the last entry, we cannot check it - the external id may be any possible value. For the out-of-space case that's fine, but I'm not sure this proposal works in the crash case.
There is a draft PR that collects all pending mappings into a single buffer and writes them in one shot. It contains the size estimation for the pending changes and is a partial implementation of your proposal; we can extend it to implement your idea.
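The "grow to the exact required size on flush" idea could look roughly like the following sketch. `flush_preallocated` and its parameters are hypothetical names for illustration, not the draft PR's actual code; it assumes the pending changes have already been serialized into one buffer.

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

// Hypothetical sketch: grow the file to the exact final size up front, so an
// out-of-space error aborts before any entry bytes are written. `persisted`
// is the size of the valid data already on disk; `pending` is the serialized
// buffer of all pending mapping changes.
fn flush_preallocated(path: &Path, persisted: u64, pending: &[u8]) -> io::Result<u64> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let required = persisted + pending.len() as u64;
    // Reserve the space first; if this fails, no entry data has been touched.
    // Note: set_len may produce a sparse file on some filesystems, so this is
    // a best-effort reservation rather than a hard allocation guarantee.
    file.set_len(required)?;
    file.seek(SeekFrom::Start(persisted))?;
    file.write_all(pending)?;
    file.sync_all()?;
    Ok(required)
}
```

If we crash between `set_len` and `write_all`, the tail of the file is zeroes, which is the edge case mentioned above.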
```rust
// `persisted_mappings_size` may be less than actual file size.
let file_start_appending = persisted_mappings_size.load(std::sync::atomic::Ordering::Relaxed);
if file_start_appending <= file_len {
    file.seek(io::SeekFrom::Start(file_start_appending))
```
Idea: how about we truncate the file here instead (or do both). Then we also 'repair' the file for the next load if we don't finish this flush.
If we hit this case, we expect to always write the exact same sequence of bytes. And so it should not corrupt the file even more. But truncating may be a bit safer.
This isn't necessary, but it is nice to clean up partial mappings.
timvisee left a comment:
I like the idea, the implementation looks sound.
I did extend your implementation a bit to clean up bad state more aggressively. It isn't strictly necessary, but in my opinion it's a nice to have.
Specifically it now:
- on a failed flush, try to truncate the file to what we had before the flush; this actively removes partial entries while we still know how to correctly recover the state on disk
- if the file is too large when persisting new mappings, don't just seek to the append point but also truncate everything after it
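The truncate-on-failure cleanup described above could be sketched like this. The helper name and signature are hypothetical, not the PR's actual code; it only assumes we remembered the persisted size from before the flush.

```rust
use std::fs::OpenOptions;
use std::path::Path;

// Hypothetical sketch: if persisting mappings failed, truncate the file back
// to the last known-good size so partially written entries do not survive.
// Truncate errors are deliberately ignored, since load can recover from
// partial trailing entries anyway.
fn truncate_after_failed_flush(path: &Path, persisted_before_flush: u64) {
    if let Ok(file) = OpenOptions::new().write(true).open(path) {
        // Best effort: drop any bytes appended by the failed flush.
        let _ = file.set_len(persisted_before_flush);
    }
}
```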
```rust
// If persisting mappings failed, try to truncate the mappings file to what we had before
// in a best effort to get rid of partially persisted mappings. We can safely ignore
// truncate errors because load should properly handle partial entries as well.
if let Err(err) = stored {
```
It seems `store_mapping_changes` can throw a bunch of errors; I am not sure about matching on any error returned here.
Maybe we could at least check whether the file length changed before trying to set it back?
If the file wasn't touched, the following 'truncation' will be a no-op.
Do you think it's still better to isolate the error paths?
On second thought, I think it is fine; I wanted to have your opinion.
Isolating the error path would be super verbose.
Yeah.
There is extra work to re-open the file to allow truncation. But it only happens in the error path.
| // Get new persisted size as a position after writing all changes. | ||
| // Stream position is used here to handle cases where the pending changes are shorter than non-persisted part. | ||
| let new_persisted_size = file.stream_position().map_err(|err| { |
Did you test this stream_position thingy?
Is it supposed to give you the current cursor position after all the writes?
Yes.
In this case it gives us the size of the file in bytes, or the position we write the first new byte to.
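The `stream_position` behavior discussed here can be demonstrated in isolation. `append_and_report` is a hypothetical helper for illustration; it shows that after appending, the cursor position equals the total bytes written from the start, which is what the PR uses as the new persisted size.

```rust
use std::io::{self, Seek, Write};

// Append bytes and return the cursor position afterwards. When writing
// sequentially from offset 0, this equals the current end of the file,
// i.e. the offset the next byte would be written to.
fn append_and_report(file: &mut std::fs::File, bytes: &[u8]) -> io::Result<u64> {
    file.write_all(bytes)?;
    file.stream_position()
}
```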
I tested this PR locally using tmpfs (a filesystem in RAM) with a small disk size (8 MB). Test code with repro steps below; it shows the same error.

```rust
#[test]
fn mutable_id_tracker_overflow() {
    // First, create a file of 1MB size to simulate later disk space cleaning.
    let tmp_file_path = Path::new("/mnt/tiny/tmp_file");
    if fs::exists(&tmp_file_path).unwrap() {
        fs::remove_file(&tmp_file_path).unwrap();
    }
    let mut tmp_file = fs::File::create(&tmp_file_path).unwrap();
    tmp_file.write_all(&vec![0; 1024 * 1024]).unwrap();
    tmp_file.flush().unwrap();
    tmp_file.sync_all().unwrap();
    drop(tmp_file);

    // Then, create the ID tracker and fill it while possible.
    let path = Path::new("/mnt/tiny/id_tracker_overflow_test");
    if fs::exists(path).unwrap() {
        fs::remove_dir_all(path).unwrap();
    }
    std::fs::create_dir_all(path).unwrap();
    let mut id_tracker = make_mutable_tracker(&path);
    for i in 0.. {
        let internal_id = id_tracker.total_point_count() as PointOffsetType;
        let point_id = PointIdType::NumId(internal_id as u64);
        match id_tracker.set_link(point_id, internal_id) {
            Ok(_) => {}
            Err(err) => {
                println!("Stopped at iteration {i} due to error: {err}");
                break;
            }
        }
        // Flush every 1000 points.
        if i % 1000 == 0 {
            match id_tracker.mapping_flusher()() {
                Ok(_) => {}
                Err(err) => {
                    println!("Flushing stopped at iteration {i} due to error: {err}");
                    break;
                }
            }
        }
    }

    // Check that free space is not available.
    let fs_stats = fs2::available_space(path).unwrap();
    println!("Available space before cleanup: {fs_stats} bytes");

    // Then, add additional space by removing the temporary file.
    fs::remove_file(&tmp_file_path).unwrap();
    // Flush the file deletion to ensure the OS registers the freed space.
    #[cfg(unix)]
    {
        use std::process::Command;
        // Force filesystem sync.
        let _ = Command::new("sync").output();
    }
    // Wait for the system to register the freed space.
    std::thread::sleep(std::time::Duration::from_secs(3));

    // Check that free space is available.
    let fs_stats = fs2::available_space(path).unwrap();
    println!("Available space after cleanup: {fs_stats} bytes");

    // Add some more points to verify that the ID tracker can continue working.
    for i in 0..10 {
        let point_id = PointIdType::Uuid(Uuid::new_v4());
        let internal_id = id_tracker.total_point_count() as PointOffsetType;
        match id_tracker.set_link(point_id, internal_id) {
            Ok(_) => {}
            Err(err) => {
                println!("Stopped at iteration {i} due to error: {err}");
                break;
            }
        }
    }

    // Flush has to work at the end.
    id_tracker.mapping_flusher()().unwrap();

    // Reload the ID tracker to verify data integrity.
    drop(id_tracker);
    // The bug appears here with some probability.
    MutableIdTracker::open(&path).unwrap();
}
```
Turns out that message was from before the fix. No pending concerns then. 👍
Commits:

- id tracker persisted mappings offset
- remove obsolete check
- review remarks
- review remarks
- When persisting mappings, truncate file that is larger than we expect
- If persisting mappings fails, truncate file to what we had before. This isn't necessary, but it is nice to clean up partial mappings.
- Fix typos
- Fix typos
- Just truncate the file, we don't have to seek anymore
- Rename length variable to mappings_expected_len

Co-authored-by: timvisee <tim@visee.me>
This PR fixes the ID tracker flushing bug and continues PR #6173.
Bug description
In production, there is an error while loading the ID tracker:
Possible repro steps
There are no confirmed repro steps yet, but there is a possible scenario of how it could happen:
In this scenario, the file shrinking from #6173 does not help, because the data is malformed but we don't hit EOF.
Solution
As a solution, this PR tracks the number of successfully persisted bytes. In code, it's called `persisted_mappings_size`.

This `persisted_mappings_size` is initialized to the file size at the beginning. It's increased after each successful flush, and every flush starts appending at the `persisted_mappings_size` position (using `file::Seek`).

How it affects the described bug scenario

In the scenario from the possible repro steps, we don't update `persisted_mappings_size` if the flush failed. This means that at step 5, we won't append to the unfinished file; we will overwrite the unsuccessful part of the file instead.

Open questions

Do we need to shrink the file if `persisted_mappings_size` is not equal to the file size while flushing?
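The approach described above can be condensed into a rough sketch. The struct and method names here are hypothetical simplifications of the PR's actual code, which deals with serialized mapping entries rather than a raw byte buffer:

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom, Write};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch: track the number of successfully persisted bytes and
// start every flush at that offset, so leftovers of a failed flush are simply
// overwritten by the next attempt.
struct MappingsFlusher {
    persisted_mappings_size: AtomicU64,
}

impl MappingsFlusher {
    fn flush(&self, file: &mut File, pending: &[u8]) -> io::Result<()> {
        let start = self.persisted_mappings_size.load(Ordering::Relaxed);
        // Seek to the end of the last fully persisted entry; anything after
        // it is leftover from a previously failed flush and gets overwritten.
        file.seek(SeekFrom::Start(start))?;
        file.write_all(pending)?;
        file.sync_all()?;
        // Only advance the persisted size after the data is durably on disk.
        let new_size = file.stream_position()?;
        self.persisted_mappings_size.store(new_size, Ordering::Relaxed);
        Ok(())
    }
}
```

Because `persisted_mappings_size` only advances after `sync_all` succeeds, a failed or partial flush leaves it unchanged, and the next flush retries from the same offset.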