
promql: NHCB operations should support different bucket layouts #17255

@linasm


Currently, the FloatHistogram.Add (and .Sub) methods return an error when called with two histograms that have different custom bucket layouts:

```go
if h.UsesCustomBuckets() && !FloatBucketsMatch(h.CustomValues, other.CustomValues) {
	return nil, false, ErrHistogramsIncompatibleBounds
}
```
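To make the failing check concrete, here is a tiny stand-in for the layout comparison. It is a hypothetical `bucketsMatch` helper, not the Prometheus function; it assumes (without checking the actual source) that two layouts match only when they have the same number of bounds and every bound is identical. Under that assumption, adding a single bucket is enough to make two layouts incompatible.

```go
package main

import "fmt"

// bucketsMatch is a hypothetical stand-in for the layout check quoted above.
// It assumes two custom-bucket layouts match only if they have the same
// number of upper bounds and each bound is identical.
func bucketsMatch(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(bucketsMatch([]float64{5}, []float64{5, 10}))     // false: one layout has an extra bucket
	fmt.Println(bucketsMatch([]float64{5, 10}, []float64{5, 10})) // true: identical layouts
}
```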

The behaviour can be observed by executing this simple test:

```
# Test native histograms with custom buckets, adding a new bucket.
load 1m
    custom_buckets_histogram{instance="0"} {{schema:-53 sum:3 count:2 custom_values:[5] buckets:[1 1]}}
    custom_buckets_histogram{instance="1"} {{schema:-53 sum:5 count:4 custom_values:[5 10] buckets:[1 2 1]}}

eval instant at 1m sum(custom_buckets_histogram)
    {} {{schema:-53 sum:8 count:6 custom_values:[5] buckets:[2 4]}}
```
The test fails because the result is empty; in addition, a warning ("vector contains histograms with incompatible custom buckets (...)") is returned.

I believe this limitation is too strict. I would find it far more user-friendly if the operation returned a histogram containing only the buckets present in both histograms. The counts of mismatched buckets (present in only one of the histograms) would be rolled up into the next higher matching bucket (one present in both histograms). A matching catch-all bucket (+Inf) always exists for this purpose.
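A minimal Go sketch of this merge, for illustration only. The `nhcb` type and the `commonBounds`, `rollUp` and `addMixed` helpers are hypothetical and not part of the Prometheus `FloatHistogram` API. The sketch assumes dense, non-cumulative bucket counts (no spans), bounds that compare exactly, and simply adds sum and count. Applied to the two series from the test above, it yields the expected `custom_values:[5] buckets:[2 4]`.

```go
package main

import (
	"fmt"
	"math"
)

// nhcb is a hypothetical, simplified native histogram with custom buckets.
// Bounds holds the explicit upper bounds in ascending order; Counts holds one
// non-cumulative count per bucket, the last entry being the implicit +Inf
// bucket, so len(Counts) == len(Bounds)+1.
type nhcb struct {
	Bounds []float64
	Counts []float64
	Sum    float64
	Count  float64
}

// commonBounds returns the upper bounds present in both sorted slices.
func commonBounds(a, b []float64) []float64 {
	var out []float64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// rollUp re-buckets h into the layout given by target, which must be a subset
// of h.Bounds. Each source bucket's count goes to the first target bucket whose
// upper bound is >= the source bucket's upper bound; anything above the last
// target bound ends up in the +Inf bucket.
func rollUp(h nhcb, target []float64) []float64 {
	out := make([]float64, len(target)+1)
	t := 0
	for i, c := range h.Counts {
		upper := math.Inf(1)
		if i < len(h.Bounds) {
			upper = h.Bounds[i]
		}
		for t < len(target) && target[t] < upper {
			t++
		}
		out[t] += c
	}
	return out
}

// addMixed adds two histograms with possibly different custom bucket layouts,
// keeping only the shared bounds and rolling the other buckets up.
func addMixed(a, b nhcb) nhcb {
	bounds := commonBounds(a.Bounds, b.Bounds)
	ca, cb := rollUp(a, bounds), rollUp(b, bounds)
	counts := make([]float64, len(ca))
	for i := range ca {
		counts[i] = ca[i] + cb[i]
	}
	return nhcb{Bounds: bounds, Counts: counts, Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func main() {
	a := nhcb{Bounds: []float64{5}, Counts: []float64{1, 1}, Sum: 3, Count: 2}
	b := nhcb{Bounds: []float64{5, 10}, Counts: []float64{1, 2, 1}, Sum: 5, Count: 4}
	fmt.Printf("%+v\n", addMixed(a, b)) // {Bounds:[5] Counts:[2 4] Sum:8 Count:6}
}
```

Sub could follow the same shape, subtracting the rolled-up counts instead of adding them. Because the result layout is the intersection of both layouts, two histograms without any shared explicit bound end up with a single +Inf bucket.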

It can be argued that in the most degenerate case (histograms with no explicit buckets in common) the resulting histogram would have only the +Inf bucket. I believe this behaviour is fine; at least it does not seem worse than returning an empty (or partial) response. And I think we should not focus too much on this degenerate case of users shooting themselves in the foot.

I think the case that deserves more attention is adding new buckets to existing histograms. After all, one of the selling points of NHCB seems to be the low per-bucket overhead, so as a user I will be tempted to add more buckets to my instrumentation, likely over several iterations and with gradual rollouts. This means there will be prolonged periods during which my queries are affected by the current limitation.

I believe the solution proposed here works well for cases like this (adding new buckets; the same holds for removing buckets) and allows a smooth transition between bucket layouts.

As usual, I might be missing some important detail. @beorn7 I would appreciate your thoughts on this (and I hope it is not too late). I can help with the implementation if this gets a green light.
