Make samplesPerChunk configurable #12055
Conversation
Force-pushed from 4b374a8 to 33409d1
/prombench main

@bboreham is not an org member nor a collaborator and cannot execute benchmarks.
/prombench main

Benchmark tests are running for 3 days! If this is intended, ignore this message; otherwise you can cancel it by commenting:
It appears to use slightly more memory. There are fewer chunks in memory, as expected. @leizor, what metric(s) were you looking to show an improvement on? (Also, it seems Prombench is executing no queries. That seems like a crucial omission.)
|
I think saving a bit of disk space in exchange for using slightly more memory is pretty much what I expected. It's a bummer about the queries, though, because I suspect that this change might also result in slightly longer query latencies, since each chunk is larger (I was hoping that wouldn't be the case). Is something in my PR making Prombench not execute any queries?
|
Benchmark tests are running for 5 days! If this is intended, ignore this message; otherwise you can cancel it by commenting:
|
/prombench cancel
|
@leizor is not an org member nor a collaborator and cannot execute benchmarks.
|
Converting this PR to a draft; I'm going to take a stab at forgoing sample-count limits in favor of chunk-size limits here instead.
|
Any ideas why it uses more RAM? With fewer chunks and a better compression ratio, I would expect less RAM usage. If at all, I would expect more CPU usage and longer query times (because we have to iterate through chunks with more samples).
|
/prombench cancel
|
Benchmark cancel is in progress.

Hmm, good point. I was thinking it'd use more RAM because it'd be holding more samples in memory at any given time, but I think you're right: the number of samples should actually still be the same, just in fewer chunks.
|
@beorn7 Actually, I stumbled across this today (from here):
If a chunk is flushed to disk as soon as a new one is cut (instead of only when the block is done), larger chunks would indeed result in increased memory usage, right?

EDIT: Looking a bit more closely, I think the code matches that behavior (after the head chunk is mmapped,
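As a back-of-envelope sketch of that argument (my own toy model, not Prometheus code): if a full chunk is mmapped as soon as a new one is cut, only the open head chunk stays on the Go heap, and its occupancy cycles from 0 up to samplesPerChunk. On average it therefore holds about half the target, so a larger target means more resident samples per series even though the total sample count is unchanged.

```go
package main

import "fmt"

// avgResidentSamples estimates the average number of samples held on-heap
// per series, assuming only the open head chunk is heap-resident and it
// fills linearly from 0 to samplesPerChunk before being cut and mmapped.
func avgResidentSamples(samplesPerChunk int) float64 {
	return float64(samplesPerChunk) / 2
}

func main() {
	for _, spc := range []int{120, 220} {
		fmt.Printf("samplesPerChunk=%d -> ~%.0f samples resident per series on average\n",
			spc, avgResidentSamples(spc))
	}
}
```

Under this toy model, 220 samples per chunk keeps roughly 110 samples resident per series versus 60 at the default 120, consistent with the small memory increase seen in the benchmark.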
|
Good point. Small note: if the data is queried, it will be brought back into memory. So it’s relevant that Prombench was not running any queries.
|
Side note: if we are considering changing that number, what about making this a flag (e.g. hidden for now)?
|
Yeah, sure, if we end up tweaking the number of samples per chunk, it could be a hidden flag for the time being. But maybe going down the path of setting a target in bytes rather than samples is better anyway. (Which then, of course, might be a (hidden) flag, too.)
Signed-off-by: Justin Lei <justin.lei@grafana.com>
Force-pushed from 33409d1 to fd08473
Signed-off-by: Justin Lei <justin.lei@grafana.com>
Force-pushed from fd08473 to 73ff91d
Do we want to add the flag in this PR or defer it to a later one?
|
I would say, let's put it into a (hidden) flag here in this PR (similar to …). With the flags, users can experiment, and based on the experiences, we can then set a new default limit that's generally better (or maybe we go down the path of limiting the number of bytes after all). But if we hardcode a different limit now, and it then turns out to be detrimental in some cases, it will create needless confusion.
|
I would say we need a flag.
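As a sketch of what such a flag could look like (the flag name here is illustrative; Prometheus wires its flags through kingpin, where `.Hidden()` keeps a flag out of the help text — this stand-in uses the standard library instead so it is self-contained). The range validation anticipates keeping the value within uint16:

```go
package main

import (
	"flag"
	"fmt"
)

// parseSamplesPerChunk shows one way the knob could be exposed on the
// command line. The flag name is hypothetical, and the [1, 65535] check
// keeps the value in uint16 range.
func parseSamplesPerChunk(args []string) (int, error) {
	fs := flag.NewFlagSet("tsdb", flag.ContinueOnError)
	spc := fs.Int("storage.tsdb.samples-per-chunk", 120, "Target number of samples per chunk.")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *spc < 1 || *spc > 65535 {
		return 0, fmt.Errorf("samples-per-chunk must be in [1, 65535], got %d", *spc)
	}
	return *spc, nil
}

func main() {
	v, err := parseSamplesPerChunk([]string{"--storage.tsdb.samples-per-chunk=220"})
	fmt.Println(v, err)
}
```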
Force-pushed from fbacc6d to ab4af48
Signed-off-by: Justin Lei <justin.lei@grafana.com>
Signed-off-by: Justin Lei <justin.lei@grafana.com>
Force-pushed from ab4af48 to c3e6b85
Ok, I've added the new flag
beorn7 left a comment:
I think that works.
@roidelapluie @bwplotka @jesusvazquez any last words? Otherwise I'll merge tomorrow.
jesusvazquez left a comment:
LGTM, nice job @leizor
I am also uncertain about why memory slightly increased with 220 samples; it would be good to look into that.
Also good call on adding the flag.
Thanks, merging now…
I've got two notes about this:

One is regarding the memory usage, for which I guess one of the reasons is the 8 extra bytes of that setting we're storing per memSeries. What was the reason to make this setting per memSeries? Is it to make sure that the 1/4th chunk estimation works correctly if the setting is changed in the Head? But this doesn't sound like a setting someone would want to change at runtime, and even if they did, the side effect of having uneven chunks once doesn't seem to be an argument to store that integer in every single series.

The other one is regarding how that strange amount of 220 samples (which isn't a multiple of 15, 30, or 60, the usual Prometheus scraping intervals) really works with the chunk size estimation (prometheus/tsdb/head_append.go, lines 1371 to 1377 in 37e5249).

I was wondering if the estimation algorithm would maybe try to issue 3 x 160-sample chunks for a 15s scrape interval (respecting the 220 maximum, making it useless). But after writing some tests, I see that setting this to 220, we'll actually issue 240-sample chunks:

```diff
diff --git tsdb/head_test.go tsdb/head_test.go
index af3df378e..7b69f11c1 100644
--- tsdb/head_test.go
+++ tsdb/head_test.go
@@ -1318,6 +1318,20 @@ func TestComputeChunkEndTime(t *testing.T) {
 			max:   1000,
 			res:   104,
 		},
+		{
+			// Test 15s scrape interval with 220 samples per chunk.
+			start: 0,                // Start at 0 to make it easier.
+			cur:   15e3 * (220 / 4), // We're at 1/4th after receiving 220/4 samples each 15s.
+			max:   2 * 60 * 60e3,    // 2h
+			res:   1 * 60 * 60e3,    // 1h, i.e. two 240-sample chunks per 2h block.
+		},
+		{
+			// Test 30s scrape interval with 220 samples per chunk.
+			start: 0,                // Start at 0 to make it easier.
+			cur:   30e3 * (220 / 4), // We're at 1/4th after receiving 220/4 samples each 30s.
+			max:   2 * 60 * 60e3,    // 2h
+			res:   2 * 60 * 60e3,    // 2h, i.e. one 240-sample chunk.
+		},
 	}
 	for _, c := range cases {
```

Maybe we should change the setting to 240 then? I know this is specific to the default 2h blocks, but I don't think anyone runs 110-minute blocks, and it's strange to see chunks issued with a different size compared to what was set.
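For readers following along, the estimation logic that TestComputeChunkEndTime exercises can be reproduced as a standalone sketch (adapted from my reading of tsdb/head_append.go; simplified into a self-contained program):

```go
package main

import "fmt"

// computeChunkEndTime estimates when to cut the next chunk. It is called
// at roughly 1/4 of the target chunk size and picks an end time so that
// the block range [start, max) is divided into an integer number of
// equally sized chunks, which is why 220 effectively rounds to 240.
func computeChunkEndTime(start, cur, max int64) int64 {
	a := (max - start) / ((cur - start + 1) * 4)
	if a == 0 {
		return max
	}
	return start + (max-start)/a
}

func main() {
	// 15s scrape interval, samplesPerChunk=220: checked after 55 samples.
	// The 2h block is split into two 1h chunks of 240 samples each.
	fmt.Println(computeChunkEndTime(0, 15e3*55, 2*60*60e3))
	// 30s scrape interval: the whole 2h block becomes one 240-sample chunk.
	fmt.Println(computeChunkEndTime(0, 30e3*55, 2*60*60e3))
}
```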
Very good point about storing the setting per memSeries. That seems unneeded indeed. Even if it were not fully responsible for the memory increase, it is still confusing conceptually (because it suggests that different memSeries might have a different setting). @leizor, would you like to create an amendment PR?

Also good thoughts about the exact value of samplesPerChunk and its consequences for the actual chunk size. Note that the final version of this PR does not set the value to 220 but leaves it at 120, with the option of setting it via the command line. So we are good on this front, but users of the new flag should be aware of the interdependencies of the different TSDB flags. (They are all hidden advanced flags, so I'm not too concerned right now that it is hard to set them to something meaningful.)
@beorn7 Yep, will do!

It was just a way to pass the information down to where it was needed; we need to know what the value of samplesPerChunk is in

I think I'm going to give memSeries a reference to HeadOptions instead. Unfortunately it'll use the same extra 8 bytes per memSeries, but I figure at least it wouldn't imply that the value could be different between memSeries.

EDIT: Actually, I think I'll add
Updated the PR title to not cause confusion in the future.
```go
mmMaxTime int64 // Max time of any mmapped chunk, only used during WAL replay.

nextAt          int64 // Timestamp at which to cut the next chunk.
samplesPerChunk int   // Target number of samples per chunk.
```
We should probably make this uint16 and have validation during init to keep it within that range. Saves 48 bytes per series.
I am wondering if setting it to a higher number gets hit by #8582. I see that the logic still exists.
This PR suggests a new value for samplesPerChunk based on some observations/assumptions about the average dataset:

- Samples arrive at a regular scrape interval, so the delta-of-delta of a timestamp usually compresses to 1 bit.
- About 60% of sample values repeat the previous value, so the value also compresses to 1 bit.
- The remaining values XOR down to a handful of meaningful bits stored behind the `10` control bits, averaging ~10 bits per value.

Therefore, this equation (n = number of samples) for the expected size per chunk covers 60% of the cases, with ~139 bits of fixed per-chunk overhead (header plus the uncompressed first sample):

    size_60(n) = 2n + 139 bits

The other 40% is covered by this:

    size_40(n) = 11n + 139 bits

Combining the two cases covers nearly 100% of the cases:

    E(n) = 0.6 · 2n + 0.4 · 11n + 139 = 5.6n + 139 bits

This allows us to arrive at an equation to describe the expected compression ratio, relative to the 128 bits of an uncompressed sample (64-bit timestamp plus 64-bit value):

    R(n) = 128n / (5.6n + 139)
That means that the compression ratio approaches an asymptote of 128 / 5.6 = 22.86. If we target achieving 90% of the maximum expected compression ratio, we'd be targeting 22.86 * .9 = 20.574. Solving for n, we arrive at 223 samples per chunk, which is 1388 bits or 174 bytes.
As a point of comparison, the current samples per chunk value yields ~83% of the maximum expected compression ratio with an expected chunk size of 811 bits or 102 bytes.
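The figures above can be sanity-checked numerically. This sketch evaluates the E(n) = 5.6n + 139 model (the 139-bit constant is my back-solved estimate of the per-chunk overhead, consistent with both the 811-bit and 1388-bit figures):

```go
package main

import "fmt"

// expectedBits returns the modeled compressed chunk size in bits:
// 5.6 bits per sample on average plus ~139 bits of fixed overhead.
func expectedBits(n float64) float64 { return 5.6*n + 139 }

// ratio returns the compression ratio vs. 128 raw bits per sample.
func ratio(n float64) float64 { return 128 * n / expectedBits(n) }

func main() {
	max := 128.0 / 5.6 // asymptotic ratio, ~22.86
	for _, n := range []float64{120, 223} {
		fmt.Printf("n=%.0f: %.0f bits (%.0f bytes), ratio %.2f (%.0f%% of max)\n",
			n, expectedBits(n), expectedBits(n)/8, ratio(n), 100*ratio(n)/max)
	}
}
```

At n=120 this reproduces the 811-bit (~102-byte) chunk at ~83% of the maximum ratio, and at n=223 the ~1388-bit (~174-byte) chunk at ~90%.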