CLI: Rename data file during "format" command

Open sentientwaffle opened this issue 2 years ago • 4 comments

Right now, tigerbeetle format <file> writes directly to <file>.

But if tigerbeetle format does not complete properly (e.g. it is interrupted by Ctrl+C, or one of the disk writes fails) then the file shouldn't be used by a replica.

To make it harder to mistake an incompletely formatted data file for a completely formatted one, tigerbeetle format should write to <file>.(random suffix), and rename it to <file> only when it is done writing.

(This should probably be implemented in main.zig, not vsr/format.zig.)
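The write-then-rename pattern proposed above can be sketched in a few lines. This is only an illustration in Python, not TigerBeetle's code (which is Zig): the function name `format_atomically`, the `write_contents` callback and the 8-byte hex suffix are all assumptions made for the example.

```python
import contextlib
import os
import secrets


def format_atomically(path, write_contents):
    """Write to `<path>.<random suffix>`, then rename to `path` once complete.

    Hypothetical helper for illustration; `write_contents` is any callable
    that writes the full file contents to the open binary file it is given.
    """
    tmp_path = f"{path}.{secrets.token_hex(8)}"
    try:
        # Mode "x" fails if the temporary name already exists.
        with open(tmp_path, "xb") as f:
            write_contents(f)
            f.flush()
            os.fsync(f.fileno())  # make the data durable before renaming
        # os.replace overwrites an existing destination; a real formatter
        # would probably refuse to do so.
        os.replace(tmp_path, path)
    except BaseException:
        # Covers write errors and Ctrl+C (KeyboardInterrupt): the final
        # name never appears, and the partial temporary file is removed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Persist the rename itself by syncing the containing directory.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

With this shape, a file at the final name only ever exists after every write and the fsync have succeeded. An interrupted run leaves at most a file under the random-suffix name.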

sentientwaffle avatar Aug 29 '23 16:08 sentientwaffle

This is a good idea.

Thinking about this some more the last few minutes.

How can we solve this (the user experience issue) from another angle? For example, how can we indicate this in the superblock, rather than rely on the file system, to provide a better error when a data file is not completely formatted?

The reason is that this would also work for raw block devices, where we can't rename. Also, a rename on Windows immediately after writing to a file can sometimes fail when an antivirus intervenes: the AV wants to scan the file before it allows the rename, so IIRC you can get flaky EPERM.

jorangreef avatar Aug 29 '23 16:08 jorangreef

I think we already do that automatically -- when formatting, we first write the WAL, then we write the superblock (trailers then header). We could probably improve error messages.

But renaming moves the failure up a level -- the issue is visible from the file system, without running anything at all.
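The "write the header last" ordering mentioned above can be illustrated with a toy layout. To be clear, this is not TigerBeetle's superblock format: `MAGIC`, `HEADER_SIZE`, the SHA-256 checksum and both function names are assumptions made up for this sketch. It only shows why a reader can tell an interrupted format apart from a finished one without relying on the file system.

```python
import hashlib
import os

MAGIC = b"HYPOFMT1"  # hypothetical 8-byte marker, not a real TigerBeetle value
HEADER_SIZE = 64     # hypothetical fixed header size


def format_file(path, body):
    """Write the body first and the header last, syncing in between."""
    with open(path, "wb") as f:
        # Step 1: leave the header region zeroed and write the body.
        f.write(b"\0" * HEADER_SIZE)
        f.write(body)
        f.flush()
        os.fsync(f.fileno())
        # Step 2: only now write the header that marks the file as valid.
        header = MAGIC + hashlib.sha256(body).digest()
        f.seek(0)
        f.write(header.ljust(HEADER_SIZE, b"\0"))
        f.flush()
        os.fsync(f.fileno())


def check_formatted(path):
    """Return a status string describing whether formatting completed."""
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
        body = f.read()
    if len(header) < HEADER_SIZE or header[:len(MAGIC)] != MAGIC:
        return "incomplete format: header missing"
    if header[len(MAGIC):len(MAGIC) + 32] != hashlib.sha256(body).digest():
        return "corrupt: checksum mismatch"
    return "ok"
```

If formatting stops anywhere before step 2 completes, the header region is still zeroed and `check_formatted` reports the incomplete format. The same check would apply to a raw block device, where a rename is not available.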

sentientwaffle avatar Aug 29 '23 17:08 sentientwaffle

> (This should probably be implemented in main.zig, not vsr/format.zig.)

If I understand correctly, I actually think this should be in VSR, since VSR includes stable storage, and since the superblock is part of VSR as a framework, not state machine-specific. If it were state machine-specific, then it would be in main.zig.

jorangreef avatar Jan 15 '24 13:01 jorangreef

How can we detect the "incomplete format" issue for block devices (and provide a nicer error message)? Should we add code for that? Or could we attack it by minimizing the risk of it happening in the first place?

jorangreef avatar Jan 15 '24 13:01 jorangreef