dynamic modules: support metrics with labels by defining metric vector counterparts#41003
Conversation
Signed-off-by: William Zhang <wtzhang23@gmail.com>
…c-vectors Signed-off-by: William Zhang <wtzhang23@gmail.com>
|
IMO Otel client library is probably the better UX than prometheus at this point https://opentelemetry.io/docs/languages/go/instrumentation/ |
|
I think an api similar to otel would require both label names and values to be inserted into the per-filter instance stat name pool. In addition, if we want a simple api where the existing callbacks for defining unlabelled metrics is used to define potentially-labeled metrics, then those counters must be lazily fetched during the lifecycle of the filter. I don't think this is that big of a deal given that the metric values will be in thread-local storage after first invocation, but not 100% sure. Not sure how expensive it is to not string intern here so not sure how big of a problem this is; at least the filters I've seen internally almost |
Signed-off-by: William T Zhang <wtzhang23@gmail.com>
|
AFAIK, interning requires a lock on the symbol table, so it cannot be trusted to be run on the data path. If we aspire to support SDKs, none of them really have interning capability AFAIK, and the tags are ultimately represented as vectors of strings somewhere. This is the same representation as Envoy's "inlined string", so it should perform comparatively the same. If you really care about performance, then I think putting a bit of thought into histograms might pay the best value. They are very expensive in Envoy today, so you might need an alternate storage backend to support more rudimentary use cases (e.g. for plugins willing to take a big hit in accuracy - which is not an option today, all histograms are ~5% accurate today). |
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> Signed-off-by: William T Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
mathetake
left a comment
There was a problem hiding this comment.
I think i like the latest comment on removing virtual -- i like the less code
| envoy_dynamic_module_type_module_buffer* label_values, | ||
| size_t label_values_length) { | ||
|
|
||
| ASSERT(label_values_length == label_names.size()); |
There was a problem hiding this comment.
so this will be hard for users to notice what they are doing wrong. ASSERT is only in debug build vs users would likely just use the release build, so this will simply result in the memory corruption without users noticing it, or segfault. In other words i feel this assertion is more likely being hit than not.
I would suggest that at the ABI level maybe allow return -1 to indicate that the label input it wrong as I assume that that is the only failure mode. Maybe we could use -2 for ASSERT(!filter_config->stat_creation_frozen_); as well for the same reason to provide better feedback to module devs. wdyt?
There was a problem hiding this comment.
Yes I think this is a good point. To take this further we could borrow from C and support getting the last errno for future callbacks that return an error that is hard to express in the return value (e.g. a pointer). But I think we could keep this simple and just return -error code for now.
There was a problem hiding this comment.
yeah, so currently the error feedback is not consistent right now. Some callback returns only false or true while logging and the other has an enum for all the failure mode. I think we should stick to the enum everywhere while results should be stored in a pointer passed as an argument. We can do the refactoring in a follow up if we want
There was a problem hiding this comment.
Think it might be worth having this in a separate PR focusing on standardizing the error type?
There was a problem hiding this comment.
Ah I see I can follow the callout approach for now.
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
|
Took the opposite approach where return a result error code but pass in a pointer for the resultant metric id. This way I didn't have to think about casting size_t to |
| } | ||
|
|
||
| impl<EHF: EnvoyHttpFilter> HttpFilterConfig<EHF> for StatsCallbacksFilterConfig { | ||
| fn new_http_filter(&mut self, envoy_filter: &mut EHF) -> Box<dyn HttpFilter<EHF>> { |
There was a problem hiding this comment.
can you also add more dynamic label tests like adding a header as a label value in request_headers callback instead of only just static label value?
There was a problem hiding this comment.
And let's verify that in integration tests
There was a problem hiding this comment.
Did these tests + more in an integration test.
There was a problem hiding this comment.
Added unit tests as well testing getting a value from a header and from a local variable.
|
@wtzhang23 sorry where did you see the flake? i couldn't find here |
|
oh in the prechecks... has this been happening on main branch as well? |
|
ok in this integration test |
|
so |
|
Ya though from my staring at the code it seems your implementation should be very sound. Edit: Oh does dispatcher really last that long? Don't know enough to see if the comment |
|
I believe dispatcher should live that long; I have a slight hunch that this is something to do with the FakeExternalCache sending local reply from the on_scheduled callback. Could you try pushing the following patch and see if this happens again here? I think this is in anyways weird diff --git a/source/extensions/filters/http/dynamic_modules/filter.cc b/source/extensions/filters/http/dynamic_modules/filter.cc
index 7ad53c5dca..82e99f119e 100644
--- a/source/extensions/filters/http/dynamic_modules/filter.cc
+++ b/source/extensions/filters/http/dynamic_modules/filter.cc
@@ -126,8 +126,8 @@ void DynamicModuleHttpFilter::sendLocalReply(
Code code, absl::string_view body,
std::function<void(ResponseHeaderMap& headers)> modify_headers,
const absl::optional<Grpc::Status::GrpcStatus> grpc_status, absl::string_view details) {
- decoder_callbacks_->sendLocalReply(code, body, modify_headers, grpc_status, details);
sent_local_reply_ = true;
+ decoder_callbacks_->sendLocalReply(code, body, modify_headers, grpc_status, details);
} |
Signed-off-by: William Zhang <wtzhang23@gmail.com>
|
Will add tests for errors to resolve coverage failures |
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
Signed-off-by: William Zhang <wtzhang23@gmail.com>
|
looks like the flake is gone? I think my hunch was right? |
|
Ya you are a genius lol |
|
cool lol... can you merge the main branch again so that we can be extra sure that it's gone |
mathetake
left a comment
There was a problem hiding this comment.
Thank you for the iterations!
|
//test/extensions/dynamic_modules/http:integration_test is failing ASAN, PTAL? link to raw test failure logs. |
|
oh ok thanks for the heads up. I will revert this one for now |
|
Is this failure related to this PR? Program counter being zero is somewhat strange and the test case is not a stats test case |
|
oh |
|
Is it possible the address sanitizer just can't read the symbol table/other metadata in the module binary and it checked randomly while the program was currently in the module? Not exactly sure how address sanitizers themselves are implemented |
|
|
|
#41094 opened an issue |
…r counterparts (envoyproxy#41003) Commit Message: dynamic modules: support metrics with labels by defining metric vector counterparts Additional Description: Here, we define CounterVec, GaugeVec, and HistogramVec, which behave similar to how prometheus metrics are defined in [golang](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewCounterVec) and other languages. Label names are fixed and set on filter config initialization whereas label values are dynamically provisioned for each new Filter instance. Risk Level: Low (still in active development, feature itself would be high) Testing: Unit and Integration tests Docs Changes: N/A Release Notes: Added in changelogs Platform Specific Features: N/A Closes envoyproxy#41002 --------- Signed-off-by: William Zhang <wtzhang23@gmail.com> Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com> Signed-off-by: Misha Badov <mbadov@google.com>
Commit Message: dynamic modules: support metrics with labels by defining metric vector counterparts
Additional Description: Here, we define CounterVec, GaugeVec, and HistogramVec, which behave similar to how prometheus metrics are defined in golang and other languages. Label names are fixed and set on filter config initialization whereas label values are dynamically provisioned for each new Filter instance.
Risk Level: Low (still in active development, feature itself would be high)
Testing: Unit and Integration tests
Docs Changes: N/A
Release Notes: Added in changelogs
Platform Specific Features: N/A
Closes #41002