Skip to content

Streaming support for prometheus stats#24716

Closed
rulex123 wants to merge 34 commits intoenvoyproxy:mainfrom
rulex123:prometheus-stats-streaming
Closed

Streaming support for prometheus stats#24716
rulex123 wants to merge 34 commits intoenvoyproxy:mainfrom
rulex123:prometheus-stats-streaming

Conversation

@rulex123
Copy link
Copy Markdown
Contributor

@rulex123 rulex123 commented Dec 31, 2022

Commit Message: patch to add support for streaming/chunking of prometheus stats - basically adapts the algorithm introduced in #19693 for prometheus stats (while keeping common parts of the algorithm in a shared class to avoid code duplication). With this patch we should be able to improve the issue described here #16139

Risk Level: only underlying implementation changes, but otherwise no change of behaviour/features. So low risk I would say.
Testing: current tests should remain unchanged (though they might be moved around due to refactoring of code); new tests are also added
Docs Changes: n/a
Release Notes: n/a
[Optional Runtime guard:]
[Optional Fixes #Issue]: #16139

Preliminary benchmark data:

BEFORE

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_AllCountersText                      668 ms          666 ms            1
BM_UsedCountersText                    40.6 ms         40.4 ms           18
BM_FilteredCountersText                6705 ms         6675 ms            1
BM_Re2FilteredCountersText              292 ms          291 ms            2
BM_AllCountersJson                      730 ms          725 ms            1
BM_UsedCountersJson                    40.4 ms         40.3 ms           17
BM_FilteredCountersJson                6822 ms         6753 ms            1
BM_Re2FilteredCountersJson              311 ms          310 ms            2
BM_AllCountersPrometheus               5489 ms         5381 ms            1
BM_UsedCountersPrometheus               160 ms          157 ms            5
BM_FilteredCountersPrometheus          7684 ms         7637 ms            1
BM_Re2FilteredCountersPrometheus        577 ms          574 ms            1

AFTER

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_AllCountersText                      682 ms          680 ms            1
BM_UsedCountersText                    40.4 ms         40.3 ms           17
BM_FilteredCountersText                6749 ms         6715 ms            1
BM_Re2FilteredCountersText              297 ms          296 ms            2
BM_AllCountersJson                      715 ms          713 ms            1
BM_UsedCountersJson                    46.2 ms         45.6 ms           16
BM_FilteredCountersJson                7345 ms         7214 ms            1
BM_Re2FilteredCountersJson              323 ms          319 ms            2
BM_AllCountersPrometheus               1917 ms         1907 ms            1
BM_UsedCountersPrometheus              55.1 ms         54.6 ms           13
BM_FilteredCountersPrometheus          6886 ms         6824 ms            1
BM_Re2FilteredCountersPrometheus        305 ms          304 ms            2

@jmarantz for now a draft PR: I would like to get some preliminary feedback before going ahead with more refactoring (e.g. cleaning up code and moving tests around). Also it might be a good idea to loop in Greg at this point, especially to confirm I didn't overlook anything major around prometheus stats and scopes :)

rulex123 and others added 12 commits December 31, 2022 09:51
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
@repokitteh-read-only
Copy link
Copy Markdown

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #24716 was opened by rulex123.

see: more, trace.

Signed-off-by: rulex123 <29862113+rulex123@users.noreply.github.com>
@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Dec 31, 2022 via email

@jmarantz jmarantz self-assigned this Dec 31, 2022
Signed-off-by: rulex123 <erica.manno@gmail.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial pass; let me know what you think so far.

I'm not clear on what's missing from a refactoring perspective. Is there something else we should share between prom and non-prom that you haven't done yet?

namespace Envoy {
namespace Server {

template <class TR, class C, class G, class H>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you use longer names for the templated types? It's not immediately obvious what those represent.

namespace Server {

template <class TR, class C, class G, class H>
StatsRequestBase<TR, C, G, H>::StatsRequestBase(Stats::Store& stats, const StatsParams& params,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you make the class-name match the filename? In this case I'd rename the file stats_request_base.h.

actually, after reading more of the code in the PR, I think it might be better to call the base-class StatsRequest and leave it in the same files (stats_request.h, stats_request.cc) and have the two derived types calls: GroupedStatsRequest and UngroupedStatsRequest? I think that's the essence of why you wanted them templated.

I could see that it might be desirable to have a grouped view like for Prom in HTML (just for human-friendly viewing) or in JSON for processing by another tagged system.

By leaving the bulk of the implementation in stats_request.cc and .h it might be easier to see the changes in review. The UngroupedStatsRequest would be relatively small.

WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, I will do as you suggest 👍


namespace Envoy {
namespace Server {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some class comments describing why we are templating this class?


// phase-related state
uint64_t phase_stat_count_{0};
unsigned short phase_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't save anything to make this a short; maybe make it a uint32_t, and call it phase_index_?

Moreover, why did you elect to use an index here? I feel like it might be more error-prone to refer to these by index (noting the commit you just made changing a 2 to a 3).

I think it's not too hard in Prom to skip the text-readouts phase if that's not enabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index doesn't have to do with skipping the text-readouts per se. It's used to keep track of the current phase in the shared base class, and move on to the next phase w/o having to know what that phase is.

This is because the prom and non-prom algorithms go through phases in a different order. I hope that makes sense: if you can think of a better way to achieve something similar, I am happy to update the code. For now I will rename the field name and type.


// capture stat by either adding to a pre-existing variant or by creating a new variant, and
// then storing the variant into the map
const Stats::SymbolTable& global_symbol_table = stat->constSymbolTable();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can save this per-stat lookup by saving the symbol table as a class member variable, since you are already capturing [this]

const Stats::SymbolTable& global_symbol_table = stat->constSymbolTable();
std::string tag_extracted_name = global_symbol_table.toString(stat->tagExtractedStatName());

StatOrScopes variant = stat_map_.contains(tag_extracted_name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make this a little faster (just one hash lookup rather than 3 in some cases) with:

StatOrScopes& variant = stat_map_[tag_extracted_name];
if (variant.index() == absl::variant_npos) {
  variant = std::vector<Stats::RefcountPtr<StatType>>({stat});
} else {
  absl::get<std::vector<Stats::RefcountPtr<StatType>>>(variant).emplace_back(stat);
}

phase_ = 0;
}

template <class StatType> Stats::IterateFn<StatType> PrometheusStatsRequest::checkStat() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this saveMatchingStat? I see the original code used a local lambda check_stat for this, but that was mis-named as it's doing more than just checking. Might as well fix it now.


// check if filtered
if (params_.filter_ != nullptr) {
if (!std::regex_search(stat->name(), *params_.filter_)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if filters are ever used with Prom. It seems weird now that I think of it to filter on the fully elaborated name, but I guess that's easier from a coding and usage perspective than filtering on the tag-extracted name and then having a separate mechanism for filtering on the extracted tags.

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jan 8, 2023

I'm happy to take another pass when you are ready.

/wait

@rulex123
Copy link
Copy Markdown
Contributor Author

rulex123 commented Jan 9, 2023

Ty @jmarantz for the initial review!

I was on leave for a few days so I didn't get a chance to look at your feedback yet, but I am back now. The refactoring to share code between the prom and non-prom stats request classes is complete (bar the feedback you provided, which I still need to address).

The leftover refactoring concerns tests and the "old" prometheus stats code (to be removed, since it's now replaced by the PrometheusStatsRequest class). Hopefully I will be able to have a full-blown PR ready for you before end of week :-)

Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
@repokitteh-read-only repokitteh-read-only bot added mobile deps Approval required for changes to Envoy's external dependencies and removed waiting labels Jan 12, 2023
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/mobile-maintainers: FYI only for changes made to (mobile/).
envoyproxy/mobile-maintainers assignee is @Augustyniak
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @RyanTheOptimist

🐱

Caused by: #24716 was synchronize by rulex123.

see: more, trace.

@rulex123
Copy link
Copy Markdown
Contributor Author

I addressed the feedback provided so far and cleaned out the old prometheus stats code. I think the only outstanding item is to adapt pre-existing tests in prometheus_stats_test.cc (for now they are commented out).

Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
@RyanTheOptimist RyanTheOptimist removed deps Approval required for changes to Envoy's external dependencies mobile labels Jan 12, 2023
@RyanTheOptimist
Copy link
Copy Markdown
Contributor

I was added as a deps reviewer but I don't see any deps changes. I'm guessing that the deps changes were removed in a previous commit. Ditto for mobile. I'll remove the labels.

Signed-off-by: rulex123 <erica.manno@gmail.com>
@RyanTheOptimist RyanTheOptimist removed their assignment Jan 12, 2023
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing comments for now; will finish review tomorrow.

const std::string& prefixed_tag_extracted_name,
const Stats::Gauge& gauge) {
const std::string tags = PrometheusStatsFormatter::formattedTags(gauge.tags());
response.add(fmt::format("{0}{{{1}}} {2}\n", prefixed_tag_extracted_name, tags, gauge.value()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: absl::StrCat is probably faster for this use-case.

const std::string tags = PrometheusStatsFormatter::formattedTags(histogram.tags());
const std::string hist_tags = histogram.tags().empty() ? EMPTY_STRING : (tags + ",");

const Stats::HistogramStatistics& stats = histogram.cumulativeStatistics();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you leave a TODO in here that support for the 3 "bucket modes" offered in other formats hasn't been ported to Prom yet?

See Utility::HistogramBucketsMode in source/server/admin/utils.h

Thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing!

html_render->tableEnd(response_);
html_render->startPre(response_);
render_ = std::move(html_render);
render_.reset(html_render.release());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm not sure what I did here: I will revert to original.

// in the scopes.
for (const Stats::ConstScopeSharedPtr& scope : scopes_) {
StatOrScopes& variant = stat_map_[stats_.symbolTable().toString(scope->prefix())];
if (variant.index() == absl::variant_npos) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised that this if-statement is not needed. What puts the ScopeVec into the variant?

I'm just curious -- tests would obviously catch this if it didn't work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the operator[] of the btree_map that runs a try_emplace behind the scenes and so inserts the variant into the map (and instantiates it using the variant's default constructor) when the lookup key is not in the map.

So for this reason the condition of the if statement is never true and the whole if block is redundant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thank you! Can you add that as a code-comment here in place of the code?

@@ -28,20 +32,35 @@ class StatsRequest : public Admin::Request {
// is not a significant cost to this, but in a future PR we may choose to
// co-mingle the types. Note that histograms are groups together in the data
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan you fix my typo? s/groups/grouped/

thanks

Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
@rulex123
Copy link
Copy Markdown
Contributor Author

@jmarantz I've added new tests and adapted pre-existing ones now, and I also addressed any feedback you left so far - so I think this patch is finally "ready for review".

Before I mark it as such though: I mistakenly pushed a couple of unsigned commits, is it ok if I fix by force-pushing to the branch? Not sure how to address otherwise.

@jmarantz
Copy link
Copy Markdown
Contributor

RE force-pushing -- OK -- if you could scan over the PR for unresolved comments first that would be great, otherwise they will disappear when you force-push.

Another thing I sometimes do is abandon one PR and create a new one; that way you can still reference the old PR and its comments :) But it's up to you.

@jmarantz
Copy link
Copy Markdown
Contributor

@kyessenov FYI

Signed-off-by: Erica Manno <ericam@oktaproxylb.services.gew1.spotify.net>
@rulex123
Copy link
Copy Markdown
Contributor Author

Closing as this is now superseded by #24998

@rulex123 rulex123 closed this Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants