Closed
Description
We were seeing a state in which a shard would not perform full compactions, leading to a build-up of level 4 TSM files.
File state:
-rw-r--r--. 1 root root 2.1G Aug 5 20:59 000016684-000000007.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:02 000016684-000000008.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:04 000016684-000000009.tsm
-rw-r--r--. 1 root root 376M Aug 5 21:05 000016684-000000010.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000016812-000000004.tsm
-rw-r--r--. 1 root root 1.4G Aug 5 18:00 000016812-000000005.tsm
-rw-r--r--. 1 root root 1.3G Aug 5 21:21 000016844-000000002.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000016948-000000004.tsm
-rw-r--r--. 1 root root 1.4G Aug 5 18:00 000016948-000000005.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000017076-000000004.tsm
There is a rogue level 2 file (000016844-000000002.tsm) packed between the fully compacted files:
-rw-r--r--. 1 root root 2.1G Aug 5 20:59 000016684-000000007.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:02 000016684-000000008.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:04 000016684-000000009.tsm
-rw-r--r--. 1 root root 376M Aug 5 21:05 000016684-000000010.tsm
and the level 4 files:
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000016948-000000004.tsm
-rw-r--r--. 1 root root 1.4G Aug 5 18:00 000016948-000000005.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000017076-000000004.tsm
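To make the levels in the listing easier to read off, here is a minimal sketch that splits a TSM file name into its generation and sequence number and maps the sequence to a level. It assumes the level is the sequence number capped at 4 and that a generation takes its level from its first file; `parseTSMName` and `levelOf` are hypothetical helpers written for this issue, not functions from the tsm1 package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTSMName splits a name such as "000016844-000000002.tsm" into its
// generation and sequence numbers. Hypothetical helper, not the tsm1 API.
func parseTSMName(name string) (generation, sequence int, err error) {
	base := strings.TrimSuffix(name, ".tsm")
	parts := strings.SplitN(base, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected TSM file name: %q", name)
	}
	if generation, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if sequence, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return generation, sequence, nil
}

// levelOf assumes the compaction level is the sequence number capped at 4.
func levelOf(sequence int) int {
	if sequence < 4 {
		return sequence
	}
	return 4
}

func main() {
	// First file of each generation from the listing above.
	names := []string{
		"000016684-000000007.tsm",
		"000016812-000000004.tsm",
		"000016844-000000002.tsm",
		"000016948-000000004.tsm",
		"000017076-000000004.tsm",
	}
	for _, name := range names {
		generation, sequence, err := parseTSMName(name)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("generation %d: level %d\n", generation, levelOf(sequence))
	}
}
```

Under that assumption, generation 16844 is the only one below level 4, and it sits in the middle of the ordered list of generations.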
The code that causes a shard in this state to be skipped is here:
influxdb/tsdb/engine/tsm1/compact.go
Lines 620 to 670 in 22bec4f
// step is how may files to compact in a group. We want to clamp it at 4 but also stil
// return groups smaller than 4.
step := 4
if step > end {
	step = end
}

// slice off the generations that we'll examine
generations = generations[start:end]

// Loop through the generations in groups of size step and see if we can compact all (or
// some of them as group)
groups := []tsmGenerations{}
for i := 0; i < len(generations); i += step {
	var skipGroup bool
	startIndex := i

	for j := i; j < i+step && j < len(generations); j++ {
		gen := generations[j]
		lvl := gen.level()

		// Skip compacting this group if there happens to be any lower level files in the
		// middle. These will get picked up by the level compactors.
		if lvl <= 3 {
			skipGroup = true
			break
		}

		// Skip the file if it's over the max size and it contains a full block
		if gen.size() >= uint64(tsdb.MaxTSMFileSize) && gen.files[0].FirstBlockCount >= tsdb.DefaultMaxPointsPerBlock && !gen.hasTombstones() {
			startIndex++
			continue
		}
	}

	if skipGroup {
		continue
	}

	endIndex := i + step
	if endIndex > len(generations) {
		endIndex = len(generations)
	}
	if endIndex-startIndex > 0 {
		groups = append(groups, generations[startIndex:endIndex])
	}
}

if len(groups) == 0 {
	return nil, 0
}
We need to either add some sort of escape mechanism that allows compactions to proceed in this state, or simplify this logic.
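To illustrate how the grouping loop reacts to this file state, here is a simplified, self-contained re-creation of it. The `gen` struct and `planGroups` are hypothetical stand-ins for `tsmGeneration` and the planner code. The `full` flag replaces the size, full-block, and tombstone check, and its values below are guesses, since the issue does not say which generations meet that condition. The `start`/`end` slicing that precedes the loop is not modeled.

```go
package main

import "fmt"

// gen is a hypothetical stand-in for tsmGeneration.
type gen struct {
	id    int  // generation number from the file name
	level int  // compaction level of the generation
	full  bool // stands in for the size, full-block and no-tombstone check
}

// planGroups mirrors the grouping loop from compact.go in simplified form.
func planGroups(generations []gen, step int) [][]gen {
	var groups [][]gen
	for i := 0; i < len(generations); i += step {
		skipGroup := false
		startIndex := i

		for j := i; j < i+step && j < len(generations); j++ {
			g := generations[j]
			// Any generation at level 3 or below discards the whole group.
			if g.level <= 3 {
				skipGroup = true
				break
			}
			// A "full" generation advances the start of the group.
			if g.full {
				startIndex++
				continue
			}
		}
		if skipGroup {
			continue
		}

		endIndex := i + step
		if endIndex > len(generations) {
			endIndex = len(generations)
		}
		if endIndex-startIndex > 0 {
			groups = append(groups, generations[startIndex:endIndex])
		}
	}
	return groups
}

func main() {
	// Levels follow the file listing; the full flags are assumptions.
	shard := []gen{
		{16684, 4, true},
		{16812, 4, false},
		{16844, 2, false}, // the rogue level 2 generation
		{16948, 4, false},
		{17076, 4, true},
	}
	groups := planGroups(shard, 4)
	fmt.Println("planned groups:", len(groups))
}
```

Whatever the `full` flags are, the first group of four is always discarded here because it contains the level 2 generation, so 16812 and 16948 are never offered for a full compaction. With the flags assumed above, the remaining single-generation group is empty as well, and the planner has nothing to return.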
Steps to reproduce:
This issue would be very difficult to reproduce; we believe it was an artifact of running compactions on v1.12.1. As we understand it, the state outlined above results in a loop that never fully compacts the TSM files.