Streaming support for prometheus stats#24716
Streaming support for prometheus stats#24716rulex123 wants to merge 34 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <29862113+rulex123@users.noreply.github.com>
|
Fantastic! I will take a look!
And thanks for including the benchmarks. I am happy to review prior to
refractoring once I get back to a big screen.
…On Sat, Dec 31, 2022, 10:25 AM repokitteh-read-only[bot] < ***@***.***> wrote:
As a reminder, PRs marked as draft will not be automatically assigned
reviewers,
or be handled by maintainer-oncall triage.
Please mark your PR as ready when you want it to be reviewed!
🐱
Caused by: #24716 <#24716> was
opened by rulex123.
see: more <#24716>, trace
<https://prod.repokitteh.app/traces/ui/envoyproxy/envoy/0a01f300-88ed-11ed-9d5d-a8ded29505d3>
.
—
Reply to this email directly, view it on GitHub
<#24716 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO2IPIZ2736TRTVDZK4OATWP73YZANCNFSM6AAAAAATNORLJE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
jmarantz
left a comment
There was a problem hiding this comment.
I did an initial pass; let me know what you think so far.
I'm not clear on what's missing from a refactoring perspective. Is there something else we should share between prom and non-prom that you haven't done yet?
| namespace Envoy { | ||
| namespace Server { | ||
|
|
||
| template <class TR, class C, class G, class H> |
There was a problem hiding this comment.
nit: can you use longer names for the templated types? It's not immediately obvious what those represent.
| namespace Server { | ||
|
|
||
| template <class TR, class C, class G, class H> | ||
| StatsRequestBase<TR, C, G, H>::StatsRequestBase(Stats::Store& stats, const StatsParams& params, |
There was a problem hiding this comment.
nit: can you make the class-name match the filename? In this case I'd rename the file stats_request_base.h.
actually, after reading more of the code in the PR, I think it might be better to call the base-class StatsRequest and leave it in the same files (stats_request.h, stats_request.cc) and have the two derived types calls: GroupedStatsRequest and UngroupedStatsRequest? I think that's the essence of why you wanted them templated.
I could see that it might be desirable to have a grouped view like for Prom in HTML (just for human-friendly viewing) or in JSON for processing by another tagged system.
By leaving the bulk of the implementation in stats_request.cc and .h it might be easier to see the changes in review. The UngroupedStatsRequest would be relatively small.
WDYT?
There was a problem hiding this comment.
This makes sense, I will do as you suggest 👍
|
|
||
| namespace Envoy { | ||
| namespace Server { | ||
|
|
There was a problem hiding this comment.
can you add some class comments describing why we are templating this class?
|
|
||
| // phase-related state | ||
| uint64_t phase_stat_count_{0}; | ||
| unsigned short phase_; |
There was a problem hiding this comment.
I think it doesn't save anything to make this a short; maybe make it a uint32_t, and call it phase_index_?
Moreover, why did you elect to use an index here? I feel like it might be more error-prone to refer to these by index (noting the commit you just made changing a 2 to a 3).
I think it's not too hard in Prom to skip the text-readouts phase if that's not enabled.
There was a problem hiding this comment.
The index doesn't have to do with skipping the text-readouts per se. It's used to keep track of the current phase in the shared base class, and move on to the next phase w/o having to know what that phase is.
This is because the prom and non-prom algorithms go through phases in a different order. I hope that makes sense: if you can think of a better way to achieve something similar, I am happy to update the code. For now I will rename the field name and type.
|
|
||
| // capture stat by either adding to a pre-existing variant or by creating a new variant, and | ||
| // then storing the variant into the map | ||
| const Stats::SymbolTable& global_symbol_table = stat->constSymbolTable(); |
There was a problem hiding this comment.
nit: you can save this per-stat lookup by saving the symbol table as a class member variable, since you are already capturing [this]
| const Stats::SymbolTable& global_symbol_table = stat->constSymbolTable(); | ||
| std::string tag_extracted_name = global_symbol_table.toString(stat->tagExtractedStatName()); | ||
|
|
||
| StatOrScopes variant = stat_map_.contains(tag_extracted_name) |
There was a problem hiding this comment.
you can make this a little faster (just one hash lookup rather than 3 in some cases) with:
StatOrScopes& variant = stat_map_[tag_extracted_name];
if (variant.index() == absl::variant_npos) {
variant = std::vector<Stats::RefcountPtr<StatType>>({stat});
} else {
absl::get<std::vector<Stats::RefcountPtr<StatType>>>(variant).emplace_back(stat);
}
| phase_ = 0; | ||
| } | ||
|
|
||
| template <class StatType> Stats::IterateFn<StatType> PrometheusStatsRequest::checkStat() { |
There was a problem hiding this comment.
rename this saveMatchingStat? I see the original code used a local lambda check_stat for this, but that was mis-named as it's doing more than just checking. Might as well fix it now.
|
|
||
| // check if filtered | ||
| if (params_.filter_ != nullptr) { | ||
| if (!std::regex_search(stat->name(), *params_.filter_)) { |
There was a problem hiding this comment.
I'm wondering if filters are ever used with Prom. It seems weird now that I think of it to filter on the fully elaborated name, but I guess that's easier from a coding and usage perspective than filtering on the tag-extracted name and then having a separate mechanism for filtering on the extracted tags.
|
I'm happy to take another pass when you are ready. /wait |
|
Ty @jmarantz for the initial review! I was on leave for a few days so I didn't get a chance to look at your feedback yet, but I am back now. The refactoring to share code between the prom and non-prom stats request classes is complete (bar the feedback you provided, which I still need to address). The leftover refactoring concerns tests and the "old" prometheus stats code (to be removed, since it's now replaced by the |
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
)" This reverts commit 9dcf001.
Signed-off-by: rulex123 <erica.manno@gmail.com>
|
CC @envoyproxy/mobile-maintainers: FYI only for changes made to |
…proxy#24386)"" This reverts commit 709a2fe.
|
I addressed the feedback provided so far and cleaned out the old prometheus stats code. I think the only outstanding item is to adapt pre-existing tests in |
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
|
I was added as a deps reviewer but I don't see any deps changes. I'm guessing that the deps changes were removed in a previous commit. Ditto for mobile. I'll remove the labels. |
Signed-off-by: rulex123 <erica.manno@gmail.com>
jmarantz
left a comment
There was a problem hiding this comment.
flushing comments for now; will finish review tomorrow.
source/server/admin/stats_render.cc
Outdated
| const std::string& prefixed_tag_extracted_name, | ||
| const Stats::Gauge& gauge) { | ||
| const std::string tags = PrometheusStatsFormatter::formattedTags(gauge.tags()); | ||
| response.add(fmt::format("{0}{{{1}}} {2}\n", prefixed_tag_extracted_name, tags, gauge.value())); |
There was a problem hiding this comment.
nit: absl::StrCat is probably faster for this use-case.
| const std::string tags = PrometheusStatsFormatter::formattedTags(histogram.tags()); | ||
| const std::string hist_tags = histogram.tags().empty() ? EMPTY_STRING : (tags + ","); | ||
|
|
||
| const Stats::HistogramStatistics& stats = histogram.cumulativeStatistics(); |
There was a problem hiding this comment.
can you leave a TODO in here that support for the 3 "bucket modes" offered in other formats hasn't been ported to Prom yet?
See Utility::HistogramBucketsMode in source/server/admin/utils.h
Thanks!
source/server/admin/stats_request.cc
Outdated
| html_render->tableEnd(response_); | ||
| html_render->startPre(response_); | ||
| render_ = std::move(html_render); | ||
| render_.reset(html_render.release()); |
There was a problem hiding this comment.
hmm not sure what I did here: I will revert to original.
| // in the scopes. | ||
| for (const Stats::ConstScopeSharedPtr& scope : scopes_) { | ||
| StatOrScopes& variant = stat_map_[stats_.symbolTable().toString(scope->prefix())]; | ||
| if (variant.index() == absl::variant_npos) { |
There was a problem hiding this comment.
I'm a little surprised that this if-statement is not needed. What puts the ScopeVec into the variant?
I'm just curious -- tests would obviously catch this if it didn't work.
There was a problem hiding this comment.
It's the operator[] of the btree_map that runs a try_emplace behind the scenes and so inserts the variant into the map (and instantiates it using the variant's default constructor) when the lookup key is not in the map.
So for this reason the condition of the if statement is never true and the whole if block is redundant.
There was a problem hiding this comment.
cool thank you! Can you add that as a code-comment here in place of the code?
source/server/admin/stats_request.h
Outdated
| @@ -28,20 +32,35 @@ class StatsRequest : public Admin::Request { | |||
| // is not a significant cost to this, but in a future PR we may choose to | |||
| // co-mingle the types. Note that histograms are groups together in the data | |||
There was a problem hiding this comment.
scan you fix my typo? s/groups/grouped/
thanks
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
|
@jmarantz I've added new tests and adapted pre-existing ones now, and I also addressed any feedback you left so far - so I think this patch is finally "ready for review". Before I mark it as such though: I mistakenly pushed a couple of unsigned commits, is it ok if I fix by force-pushing to the branch? Not sure how to address otherwise. |
|
RE force-pushing -- OK -- if you could scan over the PR for unresolved comments first that would be great, otherwise they will disappear when you force-push. Another thing I sometimes do is abandon one PR and create a new one; that way you can still reference the old PR and its comments :) But it's up to you. |
|
@kyessenov FYI |
Signed-off-by: Erica Manno <ericam@oktaproxylb.services.gew1.spotify.net>
|
Closing as this is now superseded by #24998 |
Commit Message: patch to add support for streaming/chunking of prometheus stats - basically adapts the algorithm introduced in #19693 for prometheus stats (while keeping common parts of the algorithm in a shared class to avoid code duplication). With this patch we should be able to improve the issue described here #16139
Risk Level: only underlying implementation changes, but otherwise no change of behaviour/features. So low risk I would say.
Testing: current tests should remain unchanged (though they might be moved around due to refactoring of code); new tests are also added
Docs Changes: n/a
Release Notes: n/a
[Optional Runtime guard:]
[Optional Fixes #Issue]: #16139
Preliminary benchmark data:
BEFORE
AFTER
@jmarantz for now a draft PR: I would like to get some preliminary feedback before going ahead with more refactoring (e.g. cleaning up code and moving tests around). Also it might be a good idea to loop in Greg at this point, especially to confirm I didn't overlook anything major around prometheus stats and scopes :)