storage: Split chunks if more than 120 samples #8582
bwplotka merged 5 commits into prometheus:main from
Conversation
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
A friendly ping and reminder for @codesome 😊

Did you benchmark this?
codesome
left a comment
LGTM, only nits. And as Julien said, do you have any benchmark results showing the difference in size and in the time taken for compaction?
storage/merge_test.go
Outdated
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150

Suggested change:
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // [0 - 90)
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // [60 - 150)
storage/merge_test.go
Outdated
| ),
| },
| {
| name: "150 overlapping split chunk",

Suggested change:
| name: "150 overlapping samples, split chunk",
storage/merge_test.go
Outdated
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110

Suggested change:
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // [0 - 110)
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // [60 - 110)
storage/series.go
Outdated
| MaxTime: maxt,
| Chunk: chk,
| })
| // TODO: There's probably a nicer way than doing this here.
storage/series.go
Outdated
| i := 0
| seriesIter := s.Series.Iterator()
| for seriesIter.Next() {
| // Create a new chunk if too many samples in the current one

Also fix this in the other places where it was missed.

Suggested change:
| // Create a new chunk if too many samples in the current one.
/prombench main

Incorrect prombench syntax, please find the correct syntax here.

/prombench v2.26.0-rc.0

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After successful deployment, the benchmarking metrics can be viewed at: Other Commands:

Not much difference with prombench, as expected, since the split won't come into the picture here. It is mostly relevant for vertical compaction and for compacting blocks not written by Prometheus.

/prombench cancel

Benchmark cancel is in progress.
bwplotka
left a comment
Nice, some comments from my side.
I think it's great to finally see this - it will bring more consistent results on vertical compactions.
| }
|
| // TODO(bwplotka): Currently encoder will just naively build one chunk, without limit. Split it: https://github.com/prometheus/tsdb/issues/670
| const seriesToChunkEncoderSplit = 120
Hm, I wonder if we could use an existing constant:
Line 2334 in d614ae9
I guess we could try to figure something out. As you suspected, right now there would be an import cycle, and we would need to move the constant elsewhere.
In the end it might not make such a big difference. I don't expect this change before Prometheus v3.0.
storage/series.go
Outdated
| const seriesToChunkEncoderSplit = 120
|
| func (s *seriesToChunkEncoder) Iterator() chunks.Iterator {
| chks := []chunks.Meta{}
Can we create this closer to the usage?
storage/series.go
Outdated
| return errChunksIterator{err: err}
| }
| mint = int64(math.MaxInt64)
| // maxt is immediately overwritten below
Can we make this comment a full sentence?
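For readers following the thread: the reset pattern under discussion can be sketched as a standalone Go snippet (illustrative only; `trackRange` is a made-up name, not the actual series.go code). `mint` starts at `math.MaxInt64` so the first sample always lowers it, while `maxt` is simply overwritten on every sample.

```go
package main

import (
	"fmt"
	"math"
)

// trackRange is a hypothetical helper showing the min/max tracking pattern.
// mint starts at MaxInt64 so that the first sample always lowers it; maxt is
// immediately overwritten below, because samples arrive in increasing
// timestamp order, so its initial zero value never survives the loop.
func trackRange(timestamps []int64) (mint, maxt int64) {
	mint = int64(math.MaxInt64)
	for _, t := range timestamps {
		if t < mint {
			mint = t
		}
		maxt = t
	}
	return mint, maxt
}

func main() {
	mint, maxt := trackRange([]int64{10, 20, 30})
	fmt.Println(mint, maxt) // 10 30
}
```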
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Addressed the comments. Thanks for the good feedback 👍

It would be nice to have this feature merged!

Thanks!
This is my attempt at fixing #5862.
It's based on the work that @bwplotka recently did and uses NewSeriesSetToChunkSet. While iterating over the samples, we keep track of how many we've appended, and if that's more than 120 we append the current chunk to the chunk slice and create a new one to keep appending to.
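The splitting loop described above can be sketched roughly like this (a minimal standalone sketch, not the actual seriesToChunkEncoder code; the `sample` type and `splitIntoChunks` helper are made up for illustration, while the real code appends through a `chunkenc.Appender` into `chunks.Meta` values):

```go
package main

import "fmt"

// sample is a simplified (timestamp, value) pair standing in for real samples.
type sample struct {
	t int64
	v float64
}

// maxSamplesPerChunk mirrors the seriesToChunkEncoderSplit constant (120).
const maxSamplesPerChunk = 120

// splitIntoChunks walks the samples in order and starts a new chunk whenever
// the current one has reached the limit.
func splitIntoChunks(samples []sample) [][]sample {
	var chks [][]sample
	var cur []sample
	for _, s := range samples {
		// Create a new chunk if there are too many samples in the current one.
		if len(cur) >= maxSamplesPerChunk {
			chks = append(chks, cur)
			cur = nil
		}
		cur = append(cur, s)
	}
	if len(cur) > 0 {
		chks = append(chks, cur)
	}
	return chks
}

func main() {
	samples := make([]sample, 150)
	for i := range samples {
		samples[i] = sample{t: int64(i), v: float64(i)}
	}
	chks := splitIntoChunks(samples)
	fmt.Println(len(chks), len(chks[0]), len(chks[1])) // 2 120 30
}
```

With 150 samples this yields two chunks of 120 and 30 samples, matching the "150 overlapping samples, split chunk" test case discussed above.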
It's probably not perfect yet but I'd rather get feedback early.
/cc @codesome @hdost