
stats: add 'detailed' buckets-mode that captures all data from circlhist #27094

Merged
jmarantz merged 30 commits into envoyproxy:main from jmarantz:histogram-detail on May 18, 2023

Conversation

@jmarantz (Contributor) commented May 2, 2023

Commit Message: The current mechanisms for extracting histograms leave a lot of useful data inaccessible. This adds a new extraction mechanism to provide the full computed detail provided by the histogram library, as well as JSON and textual access to this data through the admin port.

Additional Description: This partially addresses #11259 but does not add the new detailed bucket mode to the metrics service gRPC API.

This PR sets the stage for admin visualization of histograms in #26472. The visualization could work without this change, but it would not be very rich; it can reveal much more insight if the full bucket data is rendered. Note that when aggregating histograms over multiple Envoy proxies it may be better to use the configurable fixed bucket sizes, so that the histogram data can be combined across Envoy instances by the time-series database (e.g. Prometheus).

This PR could be split into smaller ones if helpful:

  • ParentHistogram detailed bucket APIs and tests
  • New Admin histogram_buckets enum value, Text and JSON admin renderings and tests

That might be overkill in terms of PR granularity, but when reviewing it might be useful to consider these levels of API separately.

Risk Level: Low; this is new functionality.
Testing: //test/...
Docs Changes: added doc for new bucket-mode.
Release Notes: required
Platform Specific Features: n/a

jmarantz added 2 commits May 1, 2023 22:03
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #27094 was opened by jmarantz.


jmarantz added 5 commits May 2, 2023 00:01
…sted.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@moderation (Contributor)

There is a dependency change I've been wanting to make for ages that might be a good fit for this PR. It is a metadata-only change:

diff --git a/bazel/repository_locations.bzl b/bazel/repository_locations.bzl                                                                 
index 2bf42e8df8..6af423a00a 100644                                                                                                          
--- a/bazel/repository_locations.bzl                                                                                                         
+++ b/bazel/repository_locations.bzl                                                                                                         
@@ -218,8 +217,8 @@ REPOSITORY_LOCATIONS_SPEC = dict(                                                                                        
     ),
     com_github_circonus_labs_libcircllhist = dict(
         project_name = "libcircllhist",
-        project_desc = "An implementation of Circonus log-linear histograms",                                                               
-        project_url = "https://github.com/circonus-labs/libcircllhist",                                                                     
+        project_desc = "An implementation of OpenHistogram log-linear histograms",                                                          
+        project_url = "https://github.com/openhistogram/libcircllhist",                                                                     
         version = "39f9db724a81ba78f5d037f1cae79c5a07107c8e",
         sha256 = "fd2492f6cc1f8734f8f57be8c2e7f2907e94ee2a4c02445ce59c4241fece144b",
         strip_prefix = "libcircllhist-{version}",
@@ -228,7 +227,7 @@ REPOSITORY_LOCATIONS_SPEC = dict(                                                                                        
         release_date = "2019-05-21",
         cpe = "N/A",
         license = "Apache-2.0",
-        license_url = "https://github.com/circonus-labs/libcircllhist/blob/{version}/LICENSE",                                              
+        license_url = "https://github.com/openhistogram/libcircllhist/blob/{version}/LICENSE",                                              
     ),
     com_github_cyan4973_xxhash = dict(

@moderation (Contributor)

Also, the commit we are using is from May 2019. I'm not sure which commit it is, but the latest commit fails in tests. It would be great to update to the latest commit at some point.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz (Contributor, Author) commented May 3, 2023

Hi @moderation -- any reason not to just issue a PR for that dep change so it isn't blocked on this one?

@moderation (Contributor)

No, but I don't think I have the C++ chops to fully incorporate the latest libcircllhist code.

@jmarantz (Contributor, Author) commented May 3, 2023

Ah, so there's an interface change?

Then I definitely don't want to roll that into this one :) I'll look at updating the dep separately.

@jmarantz (Contributor, Author) commented May 3, 2023

Actually, can you file a bug for the upgrade? You did say it "fails in tests". I think someone may have already looked at this issue, but I forget who.

@jmarantz (Contributor, Author) commented May 3, 2023

@moderation I drafted #27159 to see what CI has to say.

jmarantz added 4 commits May 3, 2023 20:50
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz marked this pull request as ready for review May 4, 2023 02:58
jmarantz added 3 commits May 3, 2023 23:08
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz (Contributor, Author) commented May 4, 2023

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #27094 (comment) was created by @jmarantz.


jmarantz added 2 commits May 4, 2023 11:43
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@ggreenway (Member) left a comment:

/wait

intervals=1:2, 2:3
summary=P0(1,1) P25(1.0625,1.034) P50(2.0166,1.068) P75(2.058,2.005) P90(2.083,2.06) P95(2.091,2.08) P99(2.09,2.09) P99.5(2.099,2.098) P99.9(2.099,2.099) P100(2.1,2.1)

Each bucket is shown as `value:count`. In the above example there are two buckets. `totals`
@ggreenway (Member):

It's unclear from reading the docs what value means. Given that a bucket is always a range of values, what is this single value? Upper bound? Lower bound? Midpoint?

@jmarantz (Author):

@ramaraochavali do you know? I think I tried to suss this out with a test but couldn't determine it, nor could I find precise documentation for it other than the sparse comments in the circlhist header file.


@ggreenway (Member):

I think you need to either use hist_bucket_to_string() or hist_bucket_to_double() and some other fields to figure out the bounds of the bucket and express that in the output.

@jmarantz (Author):

Oh, I hadn't seen the hist_bucket_t APIs. I think I can use those to get a bucket min and bucket max, and ultimately use those to control the width of the histogram bar when rendering.

I'll go off and do that.


// Determine how we will map source buckets to destination buckets.
// Consider 3 cases:
// 1. if we have 15 source buckets and 5 destination buckets we'll
@ggreenway (Member):

Depending on the distribution of the histogram, this may not be a good algorithm for reducing bucket count. It is often preferable to keep more buckets near the upper/lower limits, and fewer buckets in the middle.

Can we get away with not limiting the buckets at all, and just using whatever the histogram library gives us? I believe it is a finite number of buckets, and not excessively large.

@jmarantz (Author):

I'm not sure what the maximum is from the histogram library, but I couldn't get it to produce that many buckets, and based on my experiments setting 64 buckets would generally not require any interpolation. The interpolation is just defensive, protecting against a future version of circlhist, or some data pattern I haven't tried, producing a lot more buckets.

I agree that putting the remainder at the beginning is not necessarily best, but I'm not sure why another distribution would be better, so I opted for what was easiest :)


@ggreenway (Member):

I'd say for now, just add an ENVOY_BUG(buckets <= 100) or something and remove the interpolation code entirely.

@jmarantz (Author):

That's probably OK. We could also decide that at call-sites by calling getDetail(MAX_BUCKETS + 1) and then asserting at the call-site that we got <= MAX_BUCKETS.

@ggreenway (Member):

Hmm, 190 buckets is a lot, but it's probably not so many that it would be difficult to render, right? But let me express this a different way: why try to keep the native resolution of the underlying histogram if we then change that resolution immediately by combining buckets? I believe the underlying design of this histogram has dynamic bucket widths. What if you are combining buckets of drastically different widths? I think that could result in strange distortions of the histogram. I'm wary of doing this type of math on histograms without fully understanding how it changes the data.

@jmarantz (Author):

If we just want to render geometry it's probably OK, but I wanted to annotate with actual values as text. I think 30-40 is about the max we'd want to interpolate to.

My thinking is that circlhist is reasonably good at picking the buckets logarithmically needed for a dynamic range, and that incrementally combining neighbors until you get the number of buckets you want should avoid being misleading. Changing from just a bucket 'value' to a range (min, width) also makes me feel we are not distorting the data with this transform.

But in any case this isn't changing the existing behavior; this would (at least for now) only be used for graphical rendering. We could then iterate on how we combine them.

Another option is to have the C++ always yield all the buckets and sort out interpolation in JS.

@jmarantz (Author):

I ran a few more experiments and found that the number of buckets appears to go up with the log of the number of distinct values I record. With 10M entries I get to 550 buckets. So from a memory and bandwidth perspective it might be OK to just stream out all the buckets as there will not be so many that it will OOM the binary.

But from a UI perspective we are going to want to combine buckets. We could do that in JS instead if that makes you more comfortable with this C++ interface. That would also make it possible for the UI to show detailed sub-bucket info when hovering over an interpolated histogram bar.

@ggreenway (Member):

Oh, that's interesting.

This is way outside the scope of this PR, but here's a thought: do we get any value from the current histogram implementation with dynamic buckets? For prometheus, we always coerce to a fixed set of buckets. For this new use case, you need to significantly coerce the buckets (although at least you're just moving them, not changing any of the bounds). Would it make more sense to just use a really simple histogram implementation with fixed buckets?

Assuming that's a topic for another day, I think in this PR I'd prefer having the JS do the bucket-merging for now so that we're not committing to a specific bucket merging behavior in Envoy; any client can merge in any way they see fit (linear, logarithmic, fancy-math, etc).

@jmarantz (Author) commented May 15, 2023:

I think we don't get much value from the dynamic impl right now in terms of detailed buckets, but I suspect it currently helps us provide accurate p95, p99, p99.5 etc across a wide range of data values. And with this series of PRs we'd get detailed bucketing for a single proxy via its admin endpoint. I think this coercion I proposed would rarely be activated in practice; it'd only come into play with a very wide dynamic range of actual data.

We could implement the coercion in C++ but let the endpoint query-params determine whether to coerce at all or what the max bucket-count should be. I think in practice we'll get less than 30 buckets; but I don't have actual data for this yet.

However, I'm warming up to moving the coercion to JS, as that way a UI could easily be added to show the detail within a merged bucket. So I'll remove the coercion from this PR. I'll move the status to 'draft' for now and re-open soon. Thanks for working through this.

jmarantz added 3 commits May 11, 2023 10:26
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz added 2 commits May 11, 2023 11:03
…uckets.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz marked this pull request as draft May 15, 2023 16:54
jmarantz added 2 commits May 15, 2023 16:09
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz marked this pull request as ready for review May 16, 2023 04:02
@jmarantz (Contributor, Author)

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #27094 (comment) was created by @jmarantz.


@jmarantz (Author) left a comment:

Interpolation is now fully removed from this PR; it will re-emerge later in JS.


@ggreenway (Member) left a comment:

Mostly looks good now.

/wait

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@ggreenway (Member) left a comment:

LGTM

@jmarantz (Contributor, Author)

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #27094 (comment) was created by @jmarantz.


Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz merged commit f4e2a26 into envoyproxy:main May 18, 2023
@jmarantz jmarantz deleted the histogram-detail branch May 18, 2023 01:05
reskin89 pushed a commit to reskin89/envoy that referenced this pull request Jul 11, 2023
…ist (envoyproxy#27094)


Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Ryan Eskin <ryan.eskin89@protonmail.com>