Native histograms: reconcile mismatched NHCB bounds in Add/Sub#17278
Native histograms: reconcile mismatched NHCB bounds in Add/Sub#17278beorn7 merged 9 commits intoprometheus:mainfrom
Conversation
09841db to
9950ff1
Compare
|
Thanks for this. As said, I do not have capacity for a detailed review right now (but maybe others have). I peeked at it, though, because it is very intriguing. Just to clarify: From the PromQL perspective, this fixes all the cases where NHCB buckets are mismatched. (While the implementation happens in the Add/Sub methods, it helps for I have a lot of more strategic thoughts about this, which is more around how to release this (before/after declaring NH a stable feature, and how to do it if we do the latter) and if we should have annotation when buckets are reconciled. To not pollute the review of the actual implementation here, I'll add those thoughts to #17255. |
6ab35b3 to
6ac517f
Compare
|
General note (not having looked at the code in detail, maybe it already works): We also want to reconcile buckets for calculation of This then leads to the problem of counter reset detection. So far, we considered each change of the custom values a counter reset. However, now we should just do the normal detection after buckets have been reconciled. |
|
About reset detection: prometheus/model/histogram/float_histogram.go Lines 642 to 646 in 9e4d23d |
We originally planned to not include this in the initial feature set. But now prometheus/prometheus#17278 is underway, so let's aim for it. Signed-off-by: beorn7 <beorn@grafana.com>
b35fc4e to
397cf6c
Compare
I've added info level annotation, this is now ready for review.
IIUC this is being implemented separately, in #17312? |
Thanks. @krajorama or myself will have a look ASAP.
No, #17312 is a separate thing. I think we just have to update the code I have linked above to handle NHCB specially and include the reconciliation logic. I don't have smart ideas right now how to do it in detail, but I think it should/could be part of this PR. |
I'm not sure I understand - IMHO a change in |
That was our reasoning so far, and it was OK because we would never really aggregate NHCBs with different custom values anyway. But now that we can, thanks to your efforts, I realized that there are other things that could happen. For example, somebody might decide to dynamically remove a bucket, without really resetting the underlying classic histogram. The ability of removing buckets from a cumulative histogram, even on the fly or later in storage, was sold as one of the big reasons to go cumulative back in the days… So I guess it could actually happen, or maybe is already doing this. With the bucket reconciliation, we can now aggregate across those bucket changes, in particular calculating a rate over a timespan that includes the bucket removal. A false-positive in counter reset detection could wreak havoc with the result. (While previously, we would just not have returned a result at all and said "incompatible bucket layout".) So even if it might be rare, I do think we have to treat this correctly. |
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
No worries, my algorithm probably isn't perfect either, it might skip buckets that are not used, so the loop most likely needs to iterate on the custom values (boundaries) like |
f048e97 to
938e2d4
Compare
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
938e2d4 to
bfee24a
Compare
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
bfee24a to
82ac2af
Compare
|
@krajorama @beorn7 I've implemented counter reset detection based on reconciled buckets, O(n) complexity, no allocations. |
krajorama
left a comment
There was a problem hiding this comment.
LGMT, I'm suggesting a small change to reduce/simplify the code size a little bit. But after that I think this is ready for approve and merge from my side.
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
|
@krajorama I've addressed your comments regarding custom bounds intersection loop. There was another comment about consistency that I'm not sure about (see my response inline). |
Hi, I've elaborated on the comment. Not super important to be honest, as the result is the same, but I'd like to avoid confusion perhaps years from now. |
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
|
Will give this a final pass ASAP. Thank you both of you for the epic work so far. |
beorn7
left a comment
There was a problem hiding this comment.
Looks great. Just a minor suggestion to improve the wording of the info annotation.
And maybe there could be tests that calculate non-trivial rates (i.e. one with a bunch of buckets in the result) involving bucket reconciliation, one with a reset and one without a reset?
We originally planned to not include this in the initial feature set. But now prometheus/prometheus#17278 is underway, so let's aim for it. Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
|
@beorn7 I've addressed your feedback:
|
… improved custom bucket mismatch handling) Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
… improved custom bucket mismatch handling) Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
… improved custom bucket mismatch handling) Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Which issue(s) does the PR fix:
Fixes #17255
Does this PR introduce a user-facing change?