Fix 'invalid magic number 0' bug #11338

Merged
codesome merged 3 commits into prometheus:main from codesome:magic-number
Sep 28, 2022

Conversation

@codesome
Member

Closes #7397

We already had a repair mechanism for this file when it was empty, but it can happen that the file is not zero-sized yet is filled with 0s. So this PR extends the repair logic to handle files filled with 0s as well.

I would like this to be a part of v2.39.

@codesome codesome requested a review from bwplotka September 22, 2022 08:46
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@codesome codesome mentioned this pull request Sep 23, 2022
@alvinlin123
Contributor

alvinlin123 commented Sep 28, 2022

I think this fix is good for detecting a segment file with magic number 0 and removing it.

But I also think we should maybe call file.Sync() right after writing the magic number to the .tmp file when cutting a new segment file. I believe this is the root cause of segment files sometimes having a magic number of 0 (I am thinking of a case where the OS did not flush the data to disk after the .tmp file was renamed).

From what I've read, cutting a new segment file in the head chunk involves:

  1. create a .tmp file
  2. pre-allocate the file to some size and fill the file with 0x00s
  3. write magic number
  4. close file
  5. rename file

According to Linux's close man page, data is not guaranteed to be written to disk on close, and I think Go's File.Close pretty much just makes the underlying OS system call.

I don't think fileutil.Rename() syncs the data to disk either.

I think in this PR maybe we can add a File.Sync() call after writing the magic number, and then do some log grepping to see if the invalid magic number 0 error still happens?

I also see the comment for repairLastChunkFile():

// repairLastChunkFile deletes the last file if it's empty.
// Because we don't fsync when creating these files, we could end
// up with an empty file at the end during an abrupt shutdown.

I am wondering why doing an fsync is not desirable here. Is it because of performance?

return files, errors.Wrap(err, "read file during last head chunk file repair")
}

remove = binary.BigEndian.Uint32(buf) == 0
Contributor

I think we should log something here to tell the user: "hey, we encountered a magic number 0 error".

Member Author

Hmm, I don't think adding this log line is useful or actionable. Also, this package does not log anywhere, and I would like to keep it that way. Deleting the last file is not a big issue.

Contributor

@alvinlin123 alvinlin123 Sep 28, 2022

Yeah, if we don't want to do an fsync, I don't think the log will be useful either. I mainly suggested this to see if doing an fsync would fix this problem too :)

@alvinlin123
Contributor

alvinlin123 commented Sep 28, 2022

Ok, I see the argument for not doing a sync after writing the magic number is noted in #8061 (comment), which I can agree with because I know fsync does come with overhead.

But I wonder whether syncing the 4 bytes of magic number would really take seconds; if it does, isn't the disk overloaded, meaning you have a bigger problem?

I get the idea of "fixing" the problem on the "read path" to avoid an fsync when cutting a new segment file in the head chunk, but:

  1. Is the saved fsync overhead worth a maintainability issue like this one? By not doing an fsync, we are essentially asking anyone changing the logic of cutting a new segment file to consider whether repairLastChunkFile needs an update too.
    1. In other words, if we had done the fsync back in 2020, we probably wouldn't have needed this PR?
  2. Do we cut new segment files so frequently that fsync overhead is an issue?

if info.Size() == 0 {

remove := info.Size() == 0 // Empty file.
if info.Size() >= MagicChunksSize {
Contributor

@alvinlin123 alvinlin123 Sep 28, 2022

nit: I think it might be better to move this if block to a helper function so the code can read like:

remove := info.Size() == 0
remove = remove || noMagicNumber(...)

if remove {
 ....
}

Member Author

Since this logic is only used here and repairLastChunkFile is not too huge, I prefer not to create another function here.

Contributor

@alvinlin123 alvinlin123 Sep 28, 2022

Makes sense; you have a good point.

@codesome
Member Author

It is not just about syncing the magic number. On some OSes the whole file is preallocated even before you write the magic number (for example, Windows). So any abrupt shutdown before you sync the magic number still needs to be detected (you can see that we already had code in place to detect empty files, but that is apparently not enough).

Also, fsync being on the hot path can be a dealbreaker (though with the addition of the async queue and other open work, it won't be on the hot path in future). This repair mechanism is acceptable, and I don't see it as a maintenance burden here; a repair mechanism is desirable for a database instead of a hard fail IMO (you cannot expect your hardware to work properly all the time; issues happen).

Additionally, if there was no magic number, it also means no chunks were flushed to the file, so flushing the magic number does not help us in any way. As you can see, trying to sync the magic number still leaves room for the error to occur.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@alvinlin123
Contributor

@codesome yes, you are absolutely right, what you did in this PR is needed; I wouldn't remove it, it's an awesome fix :)

In some OS the whole file is preallocated even before you write the magic number (for example Windows). So any abrupt shutdown before you sync the magic number still needs to be detected

Right, when you preallocate, I think different OSes may do different things; some may fill the file with 0x00, some may leave it empty, but that's not the problem here. The current code pre-allocates a temporary file, writes the magic number, then renames the file. If we do an fsync before renaming, then any abrupt termination before the rename will leave only the temporary file behind, and it will get cleaned up/ignored. In other words, if we can guarantee that the magic number is on disk in the temporary file before we rename it, then an abrupt termination should not result in a corrupted segment file (the temporary file can still be corrupted, but that doesn't matter).

If we don't do an fsync, then I can see how the actual segment file can be empty, filled with 0x00, or maybe filled with some dummy value like 0xFF, depending on the underlying OS's implementation of file preallocation.

In other words, I think if we do an fsync before renaming the file, we avoid the empty-file problem as well as the magic-number-0 problem.

Also, fsync being on the hot path can be a dealbreaker (though with the addition of async queue and another open work, it won't be in hot path in future)

My question is more about why having an fsync on the hot path is a dealbreaker: is fsync performance really that bad for typical use cases? A quick perf test on it would be nice. To put it more bluntly, I wonder if not doing an fsync is premature optimization, though of course I know I can be wrong :-) The argument that fsync can hurt performance totally makes sense; I am just wondering if it really hurts "that much" :)

@codesome
Member Author

It is already a problem without fsync under high load: the ChunkDiskMapper trying to flush/mmap many chunks to disk sometimes causes high write latencies because many chunks are waiting to be flushed (this happens on the hot path). Trying to fsync (which can happen when writing chunks, because you need to go to the next file) will only make it worse.

Again, a file getting corrupted like this during an abrupt shutdown is not a problem; we can gracefully recover from it without any data loss. Also, trying to fsync does not make the problem go away: the abrupt shutdown can happen right when trying to fsync. So I don't see any benefit of an fsync here.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@codesome codesome merged commit 83d738e into prometheus:main Sep 28, 2022
@alvinlin123
Contributor

It is already a problem without fsync on high loads: when the ChunkDiskMapper is trying to flush/mmap many chunks to the disk and it sometimes causes high write latencies because many chunks are waiting to be flushed (this happens on hot path). Trying to fsync (which can happen when writing chunks because you need to go to the next file) will only make it worse.

I see, thanks for the follow up :)

jan--f pushed a commit to jan--f/prometheus that referenced this pull request Oct 20, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
(cherry picked from commit 83d738e)
openshift-merge-robot added a commit to openshift/prometheus that referenced this pull request Oct 21, 2022
OCPBUGS-2640: Fix 'invalid magic number 0' bug (prometheus#11338)
roidelapluie pushed a commit to roidelapluie/prometheus that referenced this pull request Nov 4, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
roidelapluie pushed a commit that referenced this pull request Nov 4, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
raptorsun pushed a commit to raptorsun/prometheus that referenced this pull request Nov 4, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
(cherry picked from commit 83d738e)
@alexey-sh

I observed the following error on v2.45.0

ts=2023-06-28T08:18:26.074Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1687507200239 maxt=1687514400000 ulid=01H3KYJFEZR28K8AM1E15NSQ0E
ts=2023-06-28T08:18:26.074Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1687478400239 maxt=1687500000000 ulid=01H3KYJKXPA1TBFBXDQWHHA4F2
ts=2023-06-28T08:18:26.074Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1687514400239 maxt=1687521600000 ulid=01H3M5E6PXJ75N807800QM4Z76
ts=2023-06-28T08:18:26.230Z caller=main.go:878 level=info msg="Stopping scrape discovery manager..."
ts=2023-06-28T08:18:26.230Z caller=manager.go:995 level=info component="rule manager" msg="Starting rule manager..."
ts=2023-06-28T08:18:26.230Z caller=main.go:892 level=info msg="Stopping notify discovery manager..."
ts=2023-06-28T08:18:26.230Z caller=manager.go:1009 level=info component="rule manager" msg="Stopping rule manager..."
ts=2023-06-28T08:18:26.230Z caller=manager.go:1019 level=info component="rule manager" msg="Rule manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:929 level=info msg="Stopping scrape manager..."
ts=2023-06-28T08:18:26.230Z caller=main.go:874 level=info msg="Scrape discovery manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:888 level=info msg="Notify discovery manager stopped"
ts=2023-06-28T08:18:26.230Z caller=notifier.go:603 level=info component=notifier msg="Stopping notification manager..."
ts=2023-06-28T08:18:26.230Z caller=main.go:1150 level=info msg="Notifier manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:921 level=info msg="Scrape manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:1159 level=error err="opening storage failed: reloadBlocks: corrupted block 01GZWMTKQZMDF67X6MWP46DPJS: invalid magic number 0"



Development

Successfully merging this pull request may close these issues.

tsdb.Open fails with invalid magic number 0 when running with reverted previously mmaped chunks

4 participants