stats: add 'detailed' buckets-mode that captures all data from circlhist #27094
jmarantz merged 30 commits into envoyproxy:main
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There is a dependency change I've been wanting to make for ages that might be a good fit for this PR. Metadata change only:

diff --git a/bazel/repository_locations.bzl b/bazel/repository_locations.bzl
index 2bf42e8df8..6af423a00a 100644
--- a/bazel/repository_locations.bzl
+++ b/bazel/repository_locations.bzl
@@ -218,8 +217,8 @@ REPOSITORY_LOCATIONS_SPEC = dict(
),
com_github_circonus_labs_libcircllhist = dict(
project_name = "libcircllhist",
- project_desc = "An implementation of Circonus log-linear histograms",
- project_url = "https://github.com/circonus-labs/libcircllhist",
+ project_desc = "An implementation of OpenHistogram log-linear histograms",
+ project_url = "https://github.com/openhistogram/libcircllhist",
version = "39f9db724a81ba78f5d037f1cae79c5a07107c8e",
sha256 = "fd2492f6cc1f8734f8f57be8c2e7f2907e94ee2a4c02445ce59c4241fece144b",
strip_prefix = "libcircllhist-{version}",
@@ -228,7 +227,7 @@ REPOSITORY_LOCATIONS_SPEC = dict(
release_date = "2019-05-21",
cpe = "N/A",
license = "Apache-2.0",
- license_url = "https://github.com/circonus-labs/libcircllhist/blob/{version}/LICENSE",
+ license_url = "https://github.com/openhistogram/libcircllhist/blob/{version}/LICENSE",
),
com_github_cyan4973_xxhash = dict(
Also the commit we are using is from May 2019. I'm not sure what commit it is, but the latest commit fails in tests. It would be great to update to the latest commit at some point.

Hi @moderation -- any reason not to just issue a PR for that dep change so it isn't blocked on this one?

No, but I don't think I have the C++ chops to fully incorporate the latest libcircllhist code.

Ah, so there's an interface change? Then I definitely don't want to roll that into this one :) I'll look at updating the dep separately.

Actually, can you file a bug for the upgrade? You did say "it fails in tests". I think maybe someone has already looked at this issue but I forget who.

@moderation I drafted #27159 to see what CI has to say.
/retest
docs/root/operations/admin.rst
Outdated

  intervals=1:2, 2:3
  summary=P0(1,1) P25(1.0625,1.034) P50(2.0166,1.068) P75(2.058,2.005) P90(2.083,2.06) P95(2.091,2.08) P99(2.09,2.09) P99.5(2.099,2.098) P99.9(2.099,2.099) P100(2.1,2.1)

  Each bucket is shown as `value:count`. In the above example there are two buckets. `totals`
It's unclear from reading the docs what `value` means. Given that a bucket is always a range of values, what is this single value? Upper bound? Lower bound? Midpoint?
@ramaraochavali do you know? I think I tried to suss this out with a test but couldn't determine it, nor could I find precise doc for it other than the sparse comments in the circllhist header file.
I think you need to either use hist_bucket_to_string() or hist_bucket_to_double() and some other fields to figure out the bounds of the bucket and express that in the output.
Oh, I didn't see the hist_bucket_t APIs; I think I can use those to get a bucket-min and bucket-max, and ultimately use those to control the width of the histogram bar when rendering.
I'll go off and do that.
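For context, here is a rough model of the log-linear bucketing scheme, assuming each bucket keeps two significant decimal digits of its samples. This is my own sketch for illustration, not libcircllhist's real API; the `Bucket` struct and `bucketOf` helper are hypothetical, standing in for what the hist_bucket_t helpers appear to expose as a (lower bound, width) pair:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical (lower, width) pair for a bucket -- not the library's type.
struct Bucket {
  double lower; // inclusive lower bound
  double width; // bucket width; upper bound is lower + width
};

// Each decade [10^e, 10^(e+1)) is split into 90 linear bins, so a bucket
// keeps two significant decimal digits: a sample like 2.34 falls in [2.3, 2.4).
Bucket bucketOf(double sample) {
  // exp is the power of ten of the sample's leading digit.
  const int exp = static_cast<int>(std::floor(std::log10(sample)));
  const double width = std::pow(10.0, exp - 1);
  const double lower = std::floor(sample / width) * width;
  return Bucket{lower, width};
}
```

Under this model a rendered bar for the bucket containing 2.34 would span [2.3, 2.4), which is what the min/max-driven bar widths would reflect.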
source/common/stats/utility.cc
Outdated

  // Determine how we will map source buckets to destination buckets.
  // Consider 3 cases:
  // 1. if we have 15 source buckets and 5 destination buckets we'll
Depending on the distribution of the histogram, this may not be a good algorithm for reducing bucket count. It is often preferable to keep more buckets near the upper/lower limits, and fewer buckets in the middle.
Can we get away with not limiting the buckets at all, and just using whatever the histogram library gives us? I believe it is a finite number of buckets, and not excessively large.
I'm not sure what the maximum is from the histogram library, but I couldn't get it to give that many buckets, and felt that based on my experiments if I set 64 buckets that would generally not require any interpolation. The interpolation is just defensive to protect against a future version of circlhist, or a certain data pattern that I have not tried, resulting in a lot more buckets.
I agree with you that putting the remainder at the beginning is not necessarily best, but I'm not sure why another distribution would be better, so I opted for what would be easiest :)
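To make the remainder-at-the-beginning scheme concrete, here is a hedged sketch of the neighbor-combining idea (my own illustration, not the PR's actual code; the `Bucket` struct and `combineBuckets` name are invented for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each bucket as a (lower, width, count) triple, as discussed above.
struct Bucket {
  double lower;
  double width;
  uint64_t count;
};

// Fold `src` into at most `max_buckets` buckets by summing adjacent counts.
// With 15 sources and 5 destinations, each destination absorbs 3 sources;
// any remainder is absorbed by the first destination.
std::vector<Bucket> combineBuckets(const std::vector<Bucket>& src, size_t max_buckets) {
  if (src.size() <= max_buckets) {
    return src;
  }
  const size_t per = src.size() / max_buckets;
  const size_t remainder = src.size() % max_buckets;
  std::vector<Bucket> out;
  size_t i = 0;
  for (size_t d = 0; d < max_buckets; ++d) {
    const size_t take = per + (d == 0 ? remainder : 0);
    Bucket merged = src[i];
    for (size_t k = 1; k < take; ++k) {
      const Bucket& b = src[i + k];
      merged.width = (b.lower + b.width) - merged.lower; // widen to cover the span
      merged.count += b.count;
    }
    i += take;
    out.push_back(merged);
  }
  return out;
}
```

Note the merged bucket's width is stretched to cover the full span of its sources, which preserves total counts but can combine buckets of very different widths, the distortion concern raised above.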
I'd say for now, just add an ENVOY_BUG(buckets <= 100) or something and remove the interpolation code entirely.
That's probably OK. We could also decide on that at call-sites by calling getDetail(MAX_BUCKETS + 1) and then asserting at call-site we got <= MAX_BUCKETS.
Hmm, 190 buckets is a lot; but it's probably not so many that it would be difficult to render, right? But let me express this a different way: why try to keep the native resolution of the underlying histogram if we then change that resolution immediately by combining buckets? I believe the underlying design of this histogram has dynamic bucket widths. What if you are combining buckets of drastically different widths? I think that could result in strange distortions of the histogram. I'm wary of doing this type of math on histograms without fully understanding how it changes the data.
If we just want to render geometry it's probably OK, but I wanted to annotate with actual values as text. I think 30-40 is about the max we'd want to interpolate to.
My thinking is that circlhist is reasonably good at picking the buckets logarithmically needed for a dynamic range, and that incrementally changing them by combining neighbors until you get the number of buckets you want should avoid being misleading. Changing from just a bucket 'value' to a range (min,width) makes me feel like we are not distorting the data with this transform.
But in any case this isn't changing the existing behavior; this would (at least for now) only be used for graphical rendering. We could then iterate on how we combine them.
Another option is to have the C++ always yield all the buckets and sort out interpolation in JS.
I ran a few more experiments and found that the number of buckets appears to go up with the log of the number of distinct values I record. With 10M entries I get to 550 buckets. So from a memory and bandwidth perspective it might be OK to just stream out all the buckets as there will not be so many that it will OOM the binary.
But from a UI perspective we are going to want to combine buckets. We could do that in JS instead if that makes you feel better about this C++ interface. That would also make it possible for the UI to show detailed sub bucket info when hovering over an interpolated histogram bar.
Oh, that's interesting.
This is way outside the scope of this PR, but here's a thought: do we get any value from the current histogram implementation with dynamic buckets? For prometheus, we always coerce to a fixed set of buckets. For this new use case, you need to significantly coerce the buckets (although at least you're just moving them, not changing any of the bounds). Would it make more sense to just use a really simple histogram implementation with fixed buckets?
Assuming that's a topic for another day, I think in this PR I'd prefer having the JS do the bucket-merging for now so that we're not committing to a specific bucket merging behavior in Envoy; any client can merge in any way they see fit (linear, logarithmic, fancy-math, etc).
I think we don't get much value from the dynamic impl right now in terms of detailed buckets, but I suspect it currently helps us provide accurate p95, p99, p99.5 etc across a wide range of data values. And with this series of PRs we'd get detailed bucketing for a single proxy via its admin endpoint. I think this coercion I proposed would rarely be activated in practice; it'd only come into play with a very wide dynamic range of actual data.
We could implement the coercion in C++ but let the endpoint query-params determine whether to coerce at all or what the max bucket-count should be. I think in practice we'll get less than 30 buckets; but I don't have actual data for this yet.
However I'm warming up to moving the coercion to JS, as that way a UI could easily be added to show the detail within a merged bucket. So I'll remove the coercion from this PR. I'll move status to 'draft' for now and re-open soon. Thanks for working through this.
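As an aside on the accurate-percentile point above, a quantile like p95 can be approximated from (lower, width, count) buckets by walking cumulative counts and interpolating linearly inside the bucket that contains the target rank. This is my own sketch in the spirit of circllhist's hist_approx_quantile, not the library's actual implementation; the `Bucket` struct and `approxQuantile` name are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bucket triple, as in the discussion above.
struct Bucket {
  double lower;
  double width;
  uint64_t count;
};

// Approximate the q-th quantile (0 <= q <= 1) by linear interpolation
// inside the bucket where the cumulative count crosses q * total.
double approxQuantile(const std::vector<Bucket>& buckets, double q) {
  uint64_t total = 0;
  for (const Bucket& b : buckets) {
    total += b.count;
  }
  if (total == 0) {
    return 0.0;
  }
  const double target = q * static_cast<double>(total);
  double seen = 0.0;
  for (const Bucket& b : buckets) {
    if (b.count > 0 && seen + static_cast<double>(b.count) >= target) {
      const double frac = (target - seen) / static_cast<double>(b.count);
      return b.lower + frac * b.width;
    }
    seen += static_cast<double>(b.count);
  }
  // Fall back to the top of the last bucket.
  const Bucket& last = buckets.back();
  return last.lower + last.width;
}
```

Because buckets are narrow across a wide dynamic range, this interpolation stays accurate at the tail percentiles regardless of where the data lands.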
/retest
jmarantz
left a comment
Interpolation is now fully removed from this PR; it will re-emerge later in JS.
ggreenway
left a comment
Mostly looks good now.
/wait
/retest
Commit Message: The current mechanisms for extracting histograms leave a lot of useful data inaccessible. This adds a new extraction mechanism to provide the full computed detail provided by the histogram library, as well as JSON and textual access to this data through the admin port.
Additional Description: This partially addresses #11259 but does not add the new detailed bucket mode to the metrics service gRPC API.
This PR sets the stage for admin visualization of histograms in #26472. The visualization could've occurred without this but it was not very rich; it's got more potential for revealing insight if the full bucket data can be rendered. Note that when aggregating histograms over multiple Envoy proxies it may be better to use the configurable fixed bucket sizes so that the histogram data can be combined across Envoy instances by the time-series database (e.g. Prometheus).
This PR could be split into smaller ones if helpful:
That might be overkill in terms of PR granularity, but when reviewing it might be useful to consider these 3 levels of API separately.
Risk Level: low: this is new functionality.
Testing: //test/...
Docs Changes: added doc for new bucket-mode.
Release Notes: required
Platform Specific Features: n/a