perf(chunkenc): intern custom values for native histograms by krajorama · Pull Request #16565 · prometheus/prometheus

krajorama · 2025-05-07T12:30:22Z

Fixes: #13490 . Although we can do more if we implement not losing custom values in the iterator between iterator reuse.

$ benchstat nhcb-intern-baseline.txt nhcb-intern-pr.txt 
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/promql
cpu: Intel(R) Core(TM) Ultra 7 155U
                                                                          │ nhcb-intern-baseline.txt │        nhcb-intern-pr.txt         │
                                                                          │          sec/op          │   sec/op     vs base              │
NativeHistogramsCustomBuckets/sum-14                                                     1.260 ± 18%   1.157 ± 15%       ~ (p=0.132 n=6)
NativeHistogramsCustomBuckets/sum_rate_with_short_rate_interval-14                       1.232 ± 34%   1.422 ± 25%       ~ (p=0.589 n=6)
NativeHistogramsCustomBuckets/sum_rate_with_long_rate_interval-14                        2.204 ±  8%   2.194 ± 27%       ~ (p=0.818 n=6)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-14                1.185 ± 21%   1.194 ± 10%       ~ (p=1.000 n=6)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-14                 2.082 ± 14%   1.730 ± 27%       ~ (p=0.093 n=6)
geomean                                                                                  1.532         1.494        -2.45%

                                                                          │ nhcb-intern-baseline.txt │          nhcb-intern-pr.txt          │
                                                                          │           B/op           │     B/op       vs base               │
NativeHistogramsCustomBuckets/sum-14                                                    472.5Mi ± 1%   406.3Mi ±  2%  -14.03% (p=0.002 n=6)
NativeHistogramsCustomBuckets/sum_rate_with_short_rate_interval-14                      203.6Mi ± 4%   170.7Mi ±  0%  -16.19% (p=0.002 n=6)
NativeHistogramsCustomBuckets/sum_rate_with_long_rate_interval-14                       200.3Mi ± 0%   167.4Mi ±  0%  -16.46% (p=0.002 n=6)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-14               148.7Mi ± 0%   147.1Mi ± 10%        ~ (p=0.394 n=6)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-14                153.0Mi ± 5%   157.9Mi ±  0%        ~ (p=1.000 n=6)
geomean                                                                                 213.0Mi        193.2Mi         -9.29%

                                                                          │ nhcb-intern-baseline.txt │         nhcb-intern-pr.txt         │
                                                                          │        allocs/op         │  allocs/op   vs base               │
NativeHistogramsCustomBuckets/sum-14                                                     6.589M ± 0%   5.160M ± 0%  -21.68% (p=0.002 n=6)
NativeHistogramsCustomBuckets/sum_rate_with_short_rate_interval-14                       3.719M ± 0%   2.998M ± 0%  -19.37% (p=0.002 n=6)
NativeHistogramsCustomBuckets/sum_rate_with_long_rate_interval-14                        3.611M ± 0%   2.891M ± 0%  -19.95% (p=0.002 n=6)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-14                1.726M ± 0%   1.690M ± 3%        ~ (p=0.065 n=6)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-14                 1.633M ± 4%   1.611M ± 0%   -1.34% (p=0.002 n=6)
geomean                                                                                  3.015M        2.613M       -13.36%

Copy the benchmark for native histograms with exponential buckets and adopt to native histograms with custom buckets. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

The custom values are the "le" bucket boundaries of native histograms with custom buckets. They are never modified. It is ok to not copy them when iterating a chunk, just reference them. If we will ever have a function that modifies the custom values, like 'trim' for example. That function will have to make a copy on write. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama · 2025-05-07T14:03:19Z

It is an open question whether to keep counting the size of the custom values for the engine here.

There's no trivial answer. In this first step the interning works per chunk. So loading 1000 samples from a series could still results in 100x custom values. And of course loading 1000 series with 1 sample will result in 1000x custom values.

beorn7

Thanks. Looks good.

Let's the size logic as is for now. It's giving us the worst case, but it's also not such a big deal compared to the total size of the histogram. When we do "perfect interning" (no duplicate custom values at all), we can remove the variable part in the size calculation.

…eus/prometheus#16565

* Update mimir-prometheus * Replace references to `InstanceName` removed in prometheus/prometheus#16228 * Fix native histogram mangling behaviour to be compatible with prometheus/prometheus#16565 * Make build tags consistent with Prometheus * Update MQE tests to match upstream * Reinstate test cases that were skipped due to upstream issues * Bring in changes from grafana/mimir-prometheus#880 * Add changelog entries * Regenerate OTLP files * Fix breaking change * Update upstream tests

chore(engine): add simple NHCB benchmark

6dc6785

Copy the benchmark for native histograms with exponential buckets and adopt to native histograms with custom buckets. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama requested a review from beorn7 May 7, 2025 12:30

krajorama force-pushed the krajo/intern-custom-values branch from 426dcba to 6c64665 Compare May 7, 2025 12:40

krajorama marked this pull request as ready for review May 7, 2025 16:49

krajorama requested review from jesusvazquez and roidelapluie as code owners May 7, 2025 16:49

krajorama mentioned this pull request May 8, 2025

[Prometheus Engine] Race condition / inconsistency causing broken native histograms #16541

Closed

beorn7 approved these changes May 12, 2025

View reviewed changes

Merge branch 'main' into krajo/intern-custom-values

772d5ab

krajorama merged commit eb940d9 into main May 20, 2025
45 checks passed

krajorama deleted the krajo/intern-custom-values branch May 20, 2025 07:13

charleskorn mentioned this pull request May 26, 2025

Sync upstream at 14fc57e grafana/mimir-prometheus#878

Merged

charleskorn added a commit to grafana/mimir that referenced this pull request May 27, 2025

Fix native histogram mangling behaviour to be compatible with prometh…

862fe5b

…eus/prometheus#16565

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(chunkenc): intern custom values for native histograms#16565

perf(chunkenc): intern custom values for native histograms#16565
krajorama merged 3 commits intomainfrom
krajo/intern-custom-values

krajorama commented May 7, 2025 •

edited

Loading

Uh oh!

krajorama commented May 7, 2025 •

edited

Loading

Uh oh!

beorn7 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

krajorama commented May 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

krajorama commented May 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

beorn7 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

krajorama commented May 7, 2025 •

edited

Loading

krajorama commented May 7, 2025 •

edited

Loading