storage: Split chunks if more than 120 samples #8582
bwplotka merged 5 commits into prometheus:main from
Conversation
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
A friendly ping and reminder for @codesome 😊

Did you benchmark this?
codesome
left a comment
LGTM, only nits. And as Julien said, do you have any benchmark results showing the difference in size and in the time taken for compaction?
storage/merge_test.go
Outdated
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150

Suggested change:
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // [0 - 90)
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // [60 - 150)
storage/merge_test.go
Outdated
| ),
| },
| {
| name: "150 overlapping split chunk",

Suggested change:
| name: "150 overlapping samples, split chunk",
storage/merge_test.go
Outdated
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110

Suggested change:
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // [0 - 110)
| NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // [60 - 110)
storage/series.go
Outdated
| MaxTime: maxt,
| Chunk: chk,
| })
| // TODO: There's probably a nicer way than doing this here.
storage/series.go
Outdated
| i := 0
| seriesIter := s.Series.Iterator()
| for seriesIter.Next() {
| // Create a new chunk if too many samples in the current one

Also fix this in the other places where it was missed.

Suggested change:
| // Create a new chunk if too many samples in the current one.
/prombench main

Incorrect prombench syntax, please find the correct syntax here.

/prombench v2.26.0-rc.0

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After successful deployment, the benchmarking metrics can be viewed at: Other Commands:

Not much difference with prombench, as expected, since the split won't come into the picture here. It is mostly relevant for vertical compaction and for compacting blocks not written by Prometheus.

/prombench cancel

Benchmark cancel is in progress.
bwplotka
left a comment
Nice, some comments from my side.
I think it's great to finally see this - it will bring more consistent results on vertical compactions.
| }
|
| // TODO(bwplotka): Currently encoder will just naively build one chunk, without limit. Split it: https://github.com/prometheus/tsdb/issues/670
| const seriesToChunkEncoderSplit = 120
Hm, I wonder if we could use an existing constant:
Line 2334 in d614ae9
I guess we could try to figure something out. As you suspected, right now there would be an import cycle, and we would need to move the constant elsewhere.
In the end it might not make such a big difference. I don't expect this change before Prometheus v3.0.
storage/series.go
Outdated
| const seriesToChunkEncoderSplit = 120
|
| func (s *seriesToChunkEncoder) Iterator() chunks.Iterator {
| chks := []chunks.Meta{}
Can we create this closer to the usage?
storage/series.go
Outdated
| return errChunksIterator{err: err}
| }
| mint = int64(math.MaxInt64)
| // maxt is immediately overwritten below
Can we make this comment a full sentence?
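For readers following the thread: the reset pattern under discussion can be sketched as a standalone Go snippet (illustrative only; `trackRange` is a made-up name, not the actual series.go code). `mint` starts at `math.MaxInt64` so the first sample always lowers it, while `maxt` is simply overwritten on every sample.

```go
package main

import (
	"fmt"
	"math"
)

// trackRange is a hypothetical helper showing the min/max tracking pattern.
// mint starts at MaxInt64 so that the first sample always lowers it; maxt is
// immediately overwritten below, because samples arrive in increasing
// timestamp order, so its initial zero value never survives the loop.
func trackRange(timestamps []int64) (mint, maxt int64) {
	mint = int64(math.MaxInt64)
	for _, t := range timestamps {
		if t < mint {
			mint = t
		}
		maxt = t
	}
	return mint, maxt
}

func main() {
	mint, maxt := trackRange([]int64{10, 20, 30})
	fmt.Println(mint, maxt) // 10 30
}
```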
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Addressed the comments. Thanks for the good feedback 👍

It would be nice to have this feature merged!

Thanks!
This is my attempt at fixing #5862.
It's based on the work that @bwplotka recently did and uses NewSeriesSetToChunkSet. While iterating over the samples, we keep track of how many we've appended, and if that's more than 120 we append the current chunk to the chunk slice and create a new one to keep appending to.
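The splitting loop described above can be sketched roughly like this (a minimal standalone sketch, not the actual seriesToChunkEncoder code; the `sample` type and `splitIntoChunks` helper are made up for illustration, while the real code appends through a `chunkenc.Appender` into `chunks.Meta` values):

```go
package main

import "fmt"

// sample is a simplified (timestamp, value) pair standing in for real samples.
type sample struct {
	t int64
	v float64
}

// maxSamplesPerChunk mirrors the seriesToChunkEncoderSplit constant (120).
const maxSamplesPerChunk = 120

// splitIntoChunks walks the samples in order and starts a new chunk whenever
// the current one has reached the limit.
func splitIntoChunks(samples []sample) [][]sample {
	var chks [][]sample
	var cur []sample
	for _, s := range samples {
		// Create a new chunk if there are too many samples in the current one.
		if len(cur) >= maxSamplesPerChunk {
			chks = append(chks, cur)
			cur = nil
		}
		cur = append(cur, s)
	}
	if len(cur) > 0 {
		chks = append(chks, cur)
	}
	return chks
}

func main() {
	samples := make([]sample, 150)
	for i := range samples {
		samples[i] = sample{t: int64(i), v: float64(i)}
	}
	chks := splitIntoChunks(samples)
	fmt.Println(len(chks), len(chks[0]), len(chks[1])) // 2 120 30
}
```

With 150 samples this yields two chunks of 120 and 30 samples, matching the "150 overlapping samples, split chunk" test case discussed above.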
It's probably not perfect yet but I'd rather get feedback early.
/cc @codesome @hdost