TSM file state leads to inescapable loop where compactFull is not running #26681

@devanbenz

Description

We observed a state in which a shard would not perform full compactions, leading to a buildup of level 4 TSM files.

File state:

-rw-r--r--.  1 root root 2.1G Aug  5 20:59 000016684-000000007.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 21:02 000016684-000000008.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 21:04 000016684-000000009.tsm
-rw-r--r--.  1 root root 376M Aug  5 21:05 000016684-000000010.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 18:00 000016812-000000004.tsm
-rw-r--r--.  1 root root 1.4G Aug  5 18:00 000016812-000000005.tsm
-rw-r--r--.  1 root root 1.3G Aug  5 21:21 000016844-000000002.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 18:00 000016948-000000004.tsm
-rw-r--r--.  1 root root 1.4G Aug  5 18:00 000016948-000000005.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 18:00 000017076-000000004.tsm

There is a rogue level 2 file (000016844-000000002.tsm) wedged between these fully compacted files:

-rw-r--r--.  1 root root 2.1G Aug  5 20:59 000016684-000000007.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 21:02 000016684-000000008.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 21:04 000016684-000000009.tsm
-rw-r--r--.  1 root root 376M Aug  5 21:05 000016684-000000010.tsm

and these level 4 files:

-rw-r--r--.  1 root root 2.1G Aug  5 18:00 000016948-000000004.tsm
-rw-r--r--.  1 root root 1.4G Aug  5 18:00 000016948-000000005.tsm
-rw-r--r--.  1 root root 2.1G Aug  5 18:00 000017076-000000004.tsm
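Each file name has the form `generation-sequence.tsm`, and the level is derived from the sequence number. The sketch below assumes that a generation's level mirrors what `gen.level()` in the planner code returns: the sequence of the generation's last file, with 1 to 3 mapping to themselves and anything higher treated as level 4. `parseTSMName` and `levelFromSequence` are hypothetical helpers for illustration, not InfluxDB APIs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTSMName splits a name such as "000016844-000000002.tsm" into its
// generation and sequence numbers, assuming both are zero-padded decimals.
func parseTSMName(name string) (generation, sequence int, err error) {
	parts := strings.Split(strings.TrimSuffix(name, ".tsm"), "-")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected TSM file name %q", name)
	}
	if generation, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if sequence, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return generation, sequence, nil
}

// levelFromSequence assumes sequences 1-3 are their own level and
// anything at 4 or above counts as level 4.
func levelFromSequence(sequence int) int {
	if sequence < 4 {
		return sequence
	}
	return 4
}

func main() {
	// One name per generation: its last file, which is assumed to determine the level.
	names := []string{
		"000016684-000000010.tsm",
		"000016812-000000005.tsm",
		"000016844-000000002.tsm",
		"000016948-000000005.tsm",
		"000017076-000000004.tsm",
	}
	for _, name := range names {
		gen, seq, err := parseTSMName(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("generation %d: level %d\n", gen, levelFromSequence(seq))
	}
}
```

Under these assumptions, generation 16844 is the only one below level 4.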

The following section of the compaction planner is where this state gets skipped:

// step is how many files to compact in a group. We want to clamp it at 4 but also still
// return groups smaller than 4.
step := 4
if step > end {
	step = end
}
// slice off the generations that we'll examine
generations = generations[start:end]
// Loop through the generations in groups of size step and see if we can compact all (or
// some of them as a group)
groups := []tsmGenerations{}
for i := 0; i < len(generations); i += step {
	var skipGroup bool
	startIndex := i
	for j := i; j < i+step && j < len(generations); j++ {
		gen := generations[j]
		lvl := gen.level()
		// Skip compacting this group if there happens to be any lower level files in the
		// middle. These will get picked up by the level compactors.
		if lvl <= 3 {
			skipGroup = true
			break
		}
		// Skip the file if it's over the max size and it contains a full block
		if gen.size() >= uint64(tsdb.MaxTSMFileSize) && gen.files[0].FirstBlockCount >= tsdb.DefaultMaxPointsPerBlock && !gen.hasTombstones() {
			startIndex++
			continue
		}
	}
	if skipGroup {
		continue
	}
	endIndex := i + step
	if endIndex > len(generations) {
		endIndex = len(generations)
	}
	if endIndex-startIndex > 0 {
		groups = append(groups, generations[startIndex:endIndex])
	}
}
if len(groups) == 0 {
	return nil, 0
}
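To see how this loop behaves on the file state above, here is a standalone replay of it. It is a minimal sketch under several assumptions: `gen`, `planGroups`, and `fileState` are hypothetical stand-ins for the real planner types; `start` and `end` are assumed to cover all five generations; sizes are rounded from the `ls -h` listing; every generation is assumed to start with a full block and have no tombstones; and the two constants are assumed to mirror `tsdb.MaxTSMFileSize` (2 GiB) and `tsdb.DefaultMaxPointsPerBlock` (1000).

```go
package main

import "fmt"

// gen is a hypothetical, simplified stand-in for a TSM generation.
type gen struct {
	name            string
	level           int
	size            uint64 // total bytes across the generation's files
	firstBlockCount int    // points in the first block of the first file
	hasTombstones   bool
}

const gib = uint64(1024 * 1024 * 1024)

// Assumed values, intended to mirror tsdb.MaxTSMFileSize and tsdb.DefaultMaxPointsPerBlock.
const (
	maxTSMFileSize           = 2 * gib
	defaultMaxPointsPerBlock = 1000
)

// planGroups replays the grouping loop from the issue over generations[start:end].
func planGroups(generations []gen, start, end int) [][]gen {
	step := 4
	if step > end {
		step = end
	}
	generations = generations[start:end]
	groups := [][]gen{}
	for i := 0; i < len(generations); i += step {
		skipGroup := false
		startIndex := i
		for j := i; j < i+step && j < len(generations); j++ {
			g := generations[j]
			// Any generation at level 3 or below abandons the whole group.
			if g.level <= 3 {
				skipGroup = true
				break
			}
			// Large generations that start with a full block and have no tombstones are skipped.
			if g.size >= maxTSMFileSize && g.firstBlockCount >= defaultMaxPointsPerBlock && !g.hasTombstones {
				startIndex++
				continue
			}
		}
		if skipGroup {
			continue
		}
		endIndex := i + step
		if endIndex > len(generations) {
			endIndex = len(generations)
		}
		if endIndex-startIndex > 0 {
			groups = append(groups, generations[startIndex:endIndex])
		}
	}
	return groups
}

// fileState approximates the listing in this issue (sizes rounded from ls -h).
func fileState() []gen {
	return []gen{
		{"000016684", 4, 66 * gib / 10, 1000, false}, // roughly 3 x 2.1G + 376M
		{"000016812", 4, 35 * gib / 10, 1000, false}, // 2.1G + 1.4G
		{"000016844", 2, 13 * gib / 10, 1000, false}, // the rogue level 2 file
		{"000016948", 4, 35 * gib / 10, 1000, false}, // 2.1G + 1.4G
		{"000017076", 4, 21 * gib / 10, 1000, false}, // 2.1G
	}
}

func main() {
	gens := fileState()
	groups := planGroups(gens, 0, len(gens))
	fmt.Println("groups planned:", len(groups)) // prints "groups planned: 0"
}
```

With step = 4, the first group (generations 16684 through 16948) is abandoned as soon as it reaches the level 2 generation, and the remaining group holds only 17076, which is filtered out as already full. `groups` ends up empty, so the planner returns `nil, 0` and the level 2 file is never folded into a full compaction.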

We need either an escape mechanism that allows compactions to proceed in this state, or a simplification of this logic.

Steps to reproduce:
This issue would be very difficult to reproduce; we believe the file state is an artifact of running compactions on v1.12.1. We do know, however, that the state outlined above results in a loop in which the TSM files are never fully compacted.