Fix 'invalid magic number 0' bug #11338

Merged
codesome merged 3 commits into prometheus:main from codesome:magic-number
Sep 28, 2022

Conversation

@codesome
Member

Closes #7397

We already had a repair mechanism for this file when it was empty, but it can happen that the file is not zero-sized yet is filled with 0s. So this PR extends the repair logic to handle files filled with 0s as well.

I would like this to be a part of v2.39.

@codesome codesome requested a review from bwplotka September 22, 2022 08:46
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@codesome codesome mentioned this pull request Sep 23, 2022
@alvinlin123
Contributor

alvinlin123 commented Sep 28, 2022

I think this fix is good for detecting a segment file with magic number 0 and removing it.

But I also think we should maybe call file.Sync() right after writing the magic number to the .tmp file when cutting a new segment file. I believe this is the root cause of segment files sometimes having a magic number of 0 (I am thinking of a case where the OS did not flush the data to disk after the .tmp file was renamed).

From what I've read, cutting a new segment file in the head chunk involves:

  1. create a .tmp file
  2. pre-allocate the file to some size and fill the file with 0x00s
  3. write magic number
  4. close file
  5. rename file

According to Linux's close man page, data is not guaranteed to be written to disk on close, and I think Go's File.Close pretty much just makes the underlying OS system call.

I don't think fileutil.Rename() syncs the data to disk either.

I think in this PR maybe we can add a File.Sync() call after writing the magic number, and then do some log grepping to see if the invalid magic number 0 error still happens?

I also see the comment for repairLastChunkFile():

// repairLastChunkFile deletes the last file if it's empty.
// Because we don't fsync when creating these files, we could end
// up with an empty file at the end during an abrupt shutdown.

I am wondering why doing an fsync is not desirable here. Is it because of performance?

return files, errors.Wrap(err, "read file during last head chunk file repair")
}

remove = binary.BigEndian.Uint32(buf) == 0
Contributor

I think we should log something here to tell the user: "hey, we encountered a magic number 0 error".

Member Author

Hmm, I don't think adding this log line is useful or actionable. Also, this package does not log anywhere, and I would like to keep it that way. Deleting the last file is not a big issue.

Contributor

@alvinlin123 alvinlin123 Sep 28, 2022

Yeah, if we don't want to do an fsync, I don't think the log will be useful either. I mainly suggested this to see if doing an fsync would fix this problem too :)

@alvinlin123
Contributor

alvinlin123 commented Sep 28, 2022

Ok, I see the argument for not doing a sync after writing the magic number is noted in #8061 (comment), which I can agree with because I know fsync does come with overhead.

But I wonder whether syncing the 4 bytes of magic number would really take seconds; if it does, isn't the disk overloaded, meaning you have a bigger problem?

I get the idea of "fixing" the problem on the "read path" to avoid an fsync when cutting a new segment file in the head chunk, but:

  1. Is the saved fsync overhead worth a maintainability issue like this one? By not doing an fsync, we are essentially asking anyone changing the logic of cutting a new segment file to consider whether repairLastChunkFile needs an update too.
    1. In other words, if we had done the fsync back in 2020, we probably wouldn't have needed this PR?
  2. Do we cut new segment files so frequently that fsync overhead is an issue?

if info.Size() == 0 {

remove := info.Size() == 0 // Empty file.
if info.Size() >= MagicChunksSize {
Contributor

@alvinlin123 alvinlin123 Sep 28, 2022

nit: I think it might be better to move this if block to a helper function so the code can read like:

remove := info.Size() == 0
remove = remove || noMagicNumber(...)

if remove {
 ....
}

Member Author

Since this logic is only used here and repairLastChunkFile is not too huge, I prefer not to create another function here.

Contributor

@alvinlin123 alvinlin123 Sep 28, 2022

Makes sense; you have a good point.

@codesome
Member Author

It is not just about syncing the magic number. On some OSes the whole file is preallocated even before you write the magic number (for example, Windows). So any abrupt shutdown before you sync the magic number still needs to be detected (you can see that we already had code in place to detect empty files, but that is apparently not enough).

Also, fsync being on the hot path can be a dealbreaker (though with the addition of the async queue and other open work, it won't be on the hot path in future). This repair mechanism is acceptable, and I don't see it as a maintenance burden here; a repair mechanism is desirable for a database instead of a hard fail IMO (you cannot expect your hardware to work properly all the time; issues happen).

Additionally, if there was no magic number, it also means no chunks were flushed to the file, so flushing the magic number does not help us in any way. As you can see, trying to sync the magic number still leaves room for the error to occur.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@alvinlin123
Contributor

@codesome yes, you are absolutely right, what you did in this PR is needed; I wouldn't remove it, it's an awesome fix :)

In some OS the whole file is preallocated even before you write the magic number (for example Windows). So any abrupt shutdown before you sync the magic number still needs to be detected

Right, when you preallocate, I think different OSes may do different things; some may fill the file with 0x00, some may leave it empty, but that's not the problem here. The current code pre-allocates a temporary file, writes the magic number, then renames the file. If we do an fsync before renaming, then any abrupt termination before the rename will leave only the temporary file behind, and it will get cleaned up/ignored. In other words, if we can guarantee that the magic number is on disk in the temporary file before we rename it, then an abrupt termination should not result in a corrupted segment file (the temporary file can still be corrupted, but that doesn't matter).

If we don't do an fsync, then I can see how the actual segment file can be empty, filled with 0x00, or maybe filled with some dummy value like 0xFF, depending on the underlying OS's implementation of file preallocation.

In other words, I think if we do an fsync before renaming the file, we avoid the empty-file problem as well as the magic-number-0 problem.

Also, fsync being on the hot path can be a dealbreaker (though with the addition of async queue and another open work, it won't be in hot path in future)

My question is more about why having an fsync on the hot path is a dealbreaker: is fsync performance really that bad for typical use cases? A quick perf test on it would be nice. To put it more bluntly, I wonder if not doing an fsync is premature optimization, though of course I know I can be wrong :-) The argument that fsync can hurt performance totally makes sense; I am just wondering if it really hurts "that much" :)

@codesome
Member Author

It is already a problem without fsync under high load: the ChunkDiskMapper trying to flush/mmap many chunks to disk sometimes causes high write latencies because many chunks are waiting to be flushed (this happens on the hot path). Trying to fsync (which can happen when writing chunks, because you need to go to the next file) will only make it worse.

Again, a file getting corrupted like this during an abrupt shutdown is not a problem; we can gracefully recover from it without any data loss. Also, trying to fsync does not make the problem go away: the abrupt shutdown can happen right when trying to fsync. So I don't see any benefit of an fsync here.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
@codesome codesome merged commit 83d738e into prometheus:main Sep 28, 2022
@alvinlin123
Contributor

It is already a problem without fsync on high loads: when the ChunkDiskMapper is trying to flush/mmap many chunks to the disk and it sometimes causes high write latencies because many chunks are waiting to be flushed (this happens on hot path). Trying to fsync (which can happen when writing chunks because you need to go to the next file) will only make it worse.

I see, thanks for the follow up :)

jan--f pushed a commit to jan--f/prometheus that referenced this pull request Oct 20, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
(cherry picked from commit 83d738e)
openshift-merge-robot added a commit to openshift/prometheus that referenced this pull request Oct 21, 2022
OCPBUGS-2640: Fix 'invalid magic number 0' bug (prometheus#11338)
roidelapluie pushed a commit to roidelapluie/prometheus that referenced this pull request Nov 4, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
roidelapluie pushed a commit that referenced this pull request Nov 4, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
raptorsun pushed a commit to raptorsun/prometheus that referenced this pull request Nov 4, 2022
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
(cherry picked from commit 83d738e)
@alexey-sh

I observed the following error on v2.45.0

ts=2023-06-28T08:18:26.074Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1687507200239 maxt=1687514400000 ulid=01H3KYJFEZR28K8AM1E15NSQ0E
ts=2023-06-28T08:18:26.074Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1687478400239 maxt=1687500000000 ulid=01H3KYJKXPA1TBFBXDQWHHA4F2
ts=2023-06-28T08:18:26.074Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1687514400239 maxt=1687521600000 ulid=01H3M5E6PXJ75N807800QM4Z76
ts=2023-06-28T08:18:26.230Z caller=main.go:878 level=info msg="Stopping scrape discovery manager..."
ts=2023-06-28T08:18:26.230Z caller=manager.go:995 level=info component="rule manager" msg="Starting rule manager..."
ts=2023-06-28T08:18:26.230Z caller=main.go:892 level=info msg="Stopping notify discovery manager..."
ts=2023-06-28T08:18:26.230Z caller=manager.go:1009 level=info component="rule manager" msg="Stopping rule manager..."
ts=2023-06-28T08:18:26.230Z caller=manager.go:1019 level=info component="rule manager" msg="Rule manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:929 level=info msg="Stopping scrape manager..."
ts=2023-06-28T08:18:26.230Z caller=main.go:874 level=info msg="Scrape discovery manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:888 level=info msg="Notify discovery manager stopped"
ts=2023-06-28T08:18:26.230Z caller=notifier.go:603 level=info component=notifier msg="Stopping notification manager..."
ts=2023-06-28T08:18:26.230Z caller=main.go:1150 level=info msg="Notifier manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:921 level=info msg="Scrape manager stopped"
ts=2023-06-28T08:18:26.230Z caller=main.go:1159 level=error err="opening storage failed: reloadBlocks: corrupted block 01GZWMTKQZMDF67X6MWP46DPJS: invalid magic number 0"



Development

Successfully merging this pull request may close these issues.

tsdb.Open fails with invalid magic number 0 when running with reverted previously mmaped chunks

4 participants