Add configurable stats tags by mrice32 · Pull Request #1852 · envoyproxy/envoy

mrice32 · 2017-10-13T16:50:34Z

This is the second stage of #1751.

The concept of a TagExtractor is added. TagExtractors are used to remove part of the name of a stat on creation and set it as a named tag rather than the name. Regexes are used to extract the tags iteratively. There are a set of default tags included, but users can also pass in lists of their own name-regex combinations in the config (this portion is documented here).

In response to comments on #1751, the default regexes (if included wholesale using the config bool) are processed in a defined ordering from least specific (earlier in the string) to most specific (later in the string).

Fixes #1536.

Signed-off-by: Matt Rice <mattrice@google.com>

htuch

Looks solid, some comments and nits.

htuch · 2017-10-16T12:45:31Z

include/envoy/stats/stats.h

+   * represented in the name, the tags vector will remain unmodified and the return value will be a
+   * copy of name. The portion removed from the name may be different than the value put into
+   * the tag vector for readability purposes. Note: The extraction process is expected to be run
+   * iteratively. For a list of TagExtractors, the original name is expected to be passed into


Nit: s/fist/first/

htuch · 2017-10-16T12:46:21Z

include/envoy/stats/stats.h

+   * copy of name. The portion removed from the name may be different than the value put into
+   * the tag vector for readability purposes. Note: The extraction process is expected to be run
+   * iteratively. For a list of TagExtractors, the original name is expected to be passed into
+   * extractTag for the fist, and then each iteration after the modified name returned from the


This sentence just trails off..

htuch · 2017-10-16T12:48:17Z

include/envoy/stats/stats.h

   */
  virtual void addSink(Sink& sink) PURE;

+  /**


A single extractor or a vector of extractors?

htuch · 2017-10-16T12:51:31Z

source/common/config/utility.h

+  /**
+   * Creates the set of stats tag extractors requested by the config and transfers ownership to the
+   * caller.
+   * @param bootstrap bootstrap proto


Nit: prefer sentences to end with .

htuch · 2017-10-16T12:52:22Z

source/common/config/utility.h

+   * Creates the set of stats tag extractors requested by the config and transfers ownership to the
+   * caller.
+   * @param bootstrap bootstrap proto
+   * @return Owning vector of TagExtractors


Nit: prefer @return std::vector<Stats::TagExtractorPtr> tag extractor vector. or something to that effect, Envoy style guide has return type after @return.

Oh, I wasn't aware of this. Thanks for the heads up. Two questions:

This is pretty inconsistent in Envoy now. Should we clean this up?

We don't actually render the doxygen, do we? If we do, IIRC, the return type is provided there automatically. If we don't, we're just typing out the type that could be read a few lines below. Is there an advantage to that? And why don't we do this for @ params? I'm not necessarily suggesting we do it a different way, just curious about the reasoning as it's not immediately obvious to me.

TBH I think this was probably a habit picked up on some project I worked on long ago which I no longer remember. It may very well make no sense. I'm not sure how easy it would be to clean up. In a perfect world we would actually run doxygen if even for the only reason to check that comments are compliant and fail if they are not. (there is an issue on this)

Gotcha. I won't worry about it for now. I was just curious. Side note: publishing the generated doxygen would be pretty nice.

htuch · 2017-10-16T13:19:03Z

source/common/stats/stats_impl.cc

+                                                     const std::string& regex) {
+
+  if (name.empty()) {
+    throw EnvoyException("tag_name cannot be empty");


Do you have coverage for this?

htuch · 2017-10-16T13:21:49Z

source/common/stats/stats_impl.cc

+    tag.name_ = name_;
+    tag.value_ = value_subexpr.str();
+
+    // This call invalidates match and all derived objects because they contain references to


Not sure which call in the below expression this comment is referring to.

htuch · 2017-10-16T13:24:22Z

source/common/stats/stats_impl.cc

+std::string TagExtractorImpl::extractTag(const std::string& tag_extracted_name,
+                                         std::vector<Tag>& tags) const {
+  std::smatch match;
+  // The regex must match and contain one or more subexpressions (all after the first are ignored).


Maybe explain the remove_subexpr vs. value_expr distinction as well (i.e. what you capture vs. what you want to extract value wise).

Good point. Added this.

htuch · 2017-10-16T13:24:53Z

source/common/stats/stats_impl.h

+   * Creates a tag extractor from the regex provided or looks up a default regex.
+   * @param name name for tag extractor. Used to look up a default tag extractor if regex is empty.
+   * @param regex optional regex expression. Can be specified as an empty string to trigger a
+   * default regex lookup.


htuch · 2017-10-16T13:27:43Z

source/server/server.cc

+  server_stats_.reset(
+      new ServerStats{ALL_SERVER_STATS(POOL_GAUGE_PREFIX(stats_store_, "server."))});
+
+  failHealthcheck(false);


How come this moved?

This method calls server_stats_->live_.set(!fail);, so server_stats_ needs to be initialized before it's called.

Signed-off-by: Matt Rice <mattrice@google.com>

htuch

LGTM after comments.

htuch · 2017-10-23T13:59:22Z

source/common/config/well_known_names.cc

+  // *_rq(_<response_code_class>xx)
+  name_regex_pairs.push_back({RESPONSE_CODE_CLASS, "_rq(_(\\dxx))$"});
+
+  // http.<stat_prefix>.dynamodb.table.<table_name>.capacity.<operation_name>.__partition_id=<last_seven_characters_from_partition_id>


Should this be [<stat_prefix>.]? Otherwise, how come the lookahead?

htuch · 2017-10-23T14:03:22Z

source/common/config/well_known_names.cc

+  // http.[<stat_prefix>.]dynamodb.operation.(<operation_name>.)<base_stat> or
+  // http.[<stat_prefix>.]dynamodb.table.[<table_name>.]capacity.(<operation_name>.)[<partition_id>]
+  name_regex_pairs.push_back({DYNAMO_OPERATION, "^http(?=\\.).*?\\.dynamodb.(?:operation|table(?="
+                                                "\\.).*?\\.capacity)(\\.(.*?))(?:\\.|$)"});


Might be clearer as two regexes..

As it works now, we use names as unique identifiers (and we throw if the user attempts to use duplicate names), so if a tag shows up in two distinct places a single regex must capture both. This is how we ensure a default regex isn't used twice (one custom and one with the with the use_all_defaults flag for example). One alternative would be to check uniqueness of the regex strings themselves during creation instead of names. That would allow us to have duplicate names in the list of tag extractors. It wouldn't be a big change; do you think that would be preferred? My only hesitation is that it seems a little less intuitive for the user than requiring uniqueness of names.

Got it, it's good as is.

htuch · 2017-10-23T14:06:20Z

source/common/config/well_known_names.cc

+  name_regex_pairs.push_back({RATELIMIT_PREFIX, "^ratelimit\\.((.*?)\\.)\\w+?$"});
+
+  // cluster.(<cluster_name>.)*
+  name_regex_pairs.push_back({CLUSTER_NAME, "^cluster\\.((.*?)\\.)"});


QQ: will we pick up a some reserved words in stats names as cluster names here?

I'm not sure if I'm understanding your question correctly, but I don't think so. This regex should only be run once, so when it is run, the only string that can be following "cluster." is the actual name of the cluster. As long as no other regex removes the cluster name before this one, it should work as intended. Let me know if I'm missing something.

htuch · 2017-10-23T14:14:17Z

test/common/stats/stats_impl_test.cc

+  regex_tester.testRegex("http.fault_connection_manager.fault.fault_cluster.aborts_injected",
+                         "http.fault.aborts_injected",
+                         {fault_connection_manager, fault_downstream_cluster});
+}


An alternative testing strategy might be to run the tag extraction after an integration test, dump the outputs, manually verify create a "golden" state and then compare. I don't think you need to do this, there are already a bunch of reasonable tests, but this kind of testing would sanity check what happens when we run on real-ish data.

Good point. I tried this out and found a problem with my expectations (based on the docs) - "listener." is followed by <ip>_<port> rather than just port, so it wasn't being captured. Not sure the best way to do this though... read in a text file? Do we have other tests that work this way? I want to make it as low-maintenance as possible, meaning I would rather people not have to re-generate the output every time they add a stat. That sort of setup would make it easy for people to accidentally ignore an issue they created because it becomes the normal process to modify this test each time anyone touches stats.

One possible solution would be to have the "golden state" be a set of stats (and their processed forms) that are expected to exist. Stats added in the future will be ignored. This would eliminate what I would assume would be the most common stats modification - addition.

We don't really have tests that do this today. I think maybe just adding to one of the integration tests a validation of the full extraction would be useful and give the same benefit as well.

Signed-off-by: Matt Rice <mattrice@google.com>

mattklein123

Cool stuff. LGTM. Few random comments.

mattklein123 · 2017-10-23T17:27:46Z

source/common/config/utility.cc


+std::vector<Stats::TagExtractorPtr>
+Utility::createTagExtractors(const envoy::api::v2::Bootstrap& bootstrap) {
+


nit: del newline

mattklein123 · 2017-10-23T17:29:18Z

source/common/config/utility.cc

+  };
+
+  // Add defaults.
+  if (!bootstrap.stats_config().has_use_all_default_tags() ||


@htuch random unrelated thought: With the proto validator code, could we potentially auto add defaults for things like this if not specified? Might make things easier to read in general.

mattklein123 · 2017-10-23T17:39:50Z

source/common/stats/stats_impl.h

+                         std::vector<Tag>& tags) const override;
+
+private:
+  std::string name_;


nit: these can be const I think.

mattklein123 · 2017-10-23T17:40:15Z

source/common/stats/stats_impl.h


 private:
  const std::string name_;
+  const std::string tag_extracted_name_;


nit: can we make all of these const?

? These members are already const. Was this meant to point at something else?

Oops sorry nevermind mistake.

mattklein123 · 2017-10-23T17:42:46Z

source/server/server.cc


-    failHealthcheck(false);
-
    uint64_t version_int;


nit: instead of passing version_int, can you just do this where we now set the stat?

Signed-off-by: Matt Rice <mattrice@google.com>

mattklein123

LGTM pending any final comments from @htuch

…es with regexes. Signed-off-by: Matt Rice <mattrice@google.com>

Signed-off-by: Matt Rice <mattrice@google.com>

mrice32 · 2017-10-25T22:11:48Z

@htuch @mattklein123 fixed a few faulty assumptions revealed by dumping stats from an integration test and added a "golden" state integration test to ensure these actual envoy stats (as opposed to hardcoded strings pulled from the docs in the unit tests) don't break in the future. PTAL

mattklein123 · 2017-10-25T22:32:42Z

test/integration/xfcc_integration_test.cc

+  // Note: the test below is meant to check that default tags are being extracted correctly with
+  // real-ish input stats. If new stats are added, this test will not break because names that do
+  // not exist in the map are not checked. However, if stats are modified the below maps should be
+  // updated (or regenerated by printing in map literal format).


Can you add a bit more color on how you generated this list? I'm not sure exactly what "regenerated by printing in map literal format" means. Mainly concerned that we might have to regenerate and it will be confusing.

Sure, I can provide more details. I could also just toss the generation code in there and leave it commented out if you think that would decrease the friction?

+1, generally not a fan of commented out code but that seems fine here.

Signed-off-by: Matt Rice <mattrice@google.com>

htuch

LGTM from my side, I'll leave it to @mattklein123 for final approval/merge.

htuch · 2017-10-26T03:13:18Z

test/integration/xfcc_integration_test.cc

+  //   std::cout << std::endl << "{\"" << (*it)->name() << "\", \"" << (*it)->tagExtractedName() <<
+  //   "\"}";
+  // }
+  // std::cout << "};" << std::endl;


This is fine with me if OK with @mattklein123. Might also be a valid thing to do to factor this code into a function that is used to both generate the golden file and the current state for comparison, but I think you've got most of the win here by just having some real state being validated against your regexes.

mattklein123

LGTM. Nice work!

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

This reverts commit f85f49c.

Signed-off-by: Mike Schore <mike.schore@gmail.com> Signed-off-by: JP Simard <jp@jpsim.com>

**Description** Restores cache_creation_input_tokens field in OpenAI-format streaming responses for GCP Anthropic models. This field was accidentally omitted during a recent refactoring, causing clients to be unable to track cache write operations. Problem: The cache_creation_input_tokens field was missing from streaming responses when using GCP Anthropic with prompt caching. Internal Integration test failure: test_gcp_anthropic_cache_hit_stream[gcp.claude-3-7-sonnet] AssertionError: First stream (MISS) should have cache_creation > 0, but got 0 **Root Cause** Commit 4347d17 (Feb 7, 2026) refactored streaming logic from openai_gcpanthropic_stream.go into anthropic_helper.go. During this refactoring, the SetCacheCreationInputTokens() calls were accidentally omitted from two event handlers: 1. message_start handler - missing initial cache creation token storage 2. message_delta handler - missing incremental cache creation token accumulation The code correctly extracted the value from Anthropic's API but never stored it in the response. **Changes** internal/translator/anthropic_helper.go: - Added SetCacheCreationInputTokens() call in message_start handler - Added AddCacheCreationInputTokens() call in message_delta handler internal/translator/openai_gcpanthropic_test.go: - Updated existing test cases to verify cache_creation_input_tokens appears in responses - Added field to "basic text response" and "response with tool use" test cases --------- Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>

mrice32 added 2 commits October 13, 2017 12:39

Add stats tags

5ecfe79

Signed-off-by: Matt Rice <mattrice@google.com>

Changed map to vector to ensure planned ordering of default regexes

c418741

Signed-off-by: Matt Rice <mattrice@google.com>

htuch reviewed Oct 16, 2017

View reviewed changes

mrice32 added 4 commits October 21, 2017 20:27

Responded to comments

06bbb1c

Signed-off-by: Matt Rice <mattrice@google.com>

Corrected comment

fb2deae

Signed-off-by: Matt Rice <mattrice@google.com>

Merge branch 'master' into stats_tags_separated

cf23ec1

fixed format

fcd70aa

Signed-off-by: Matt Rice <mattrice@google.com>

htuch reviewed Oct 23, 2017

View reviewed changes

Corrected tag regex description comment

3bcabb5

Signed-off-by: Matt Rice <mattrice@google.com>

mattklein123 reviewed Oct 23, 2017

View reviewed changes

Responded to comments

21a1cde

Signed-off-by: Matt Rice <mattrice@google.com>

mattklein123 previously approved these changes Oct 23, 2017

View reviewed changes

Added golden state test to xfcc integration test. Fixed existing issu…

083aea0

…es with regexes. Signed-off-by: Matt Rice <mattrice@google.com>

mrice32 dismissed mattklein123’s stale review via 083aea0 October 24, 2017 21:43

Removed code comment

0540545

Signed-off-by: Matt Rice <mattrice@google.com>

mattklein123 reviewed Oct 25, 2017

View reviewed changes

Added sample code and more explanation for regeneration of map literals

56da11b

Signed-off-by: Matt Rice <mattrice@google.com>

htuch approved these changes Oct 26, 2017

View reviewed changes

mattklein123 approved these changes Oct 26, 2017

View reviewed changes

mattklein123 merged commit 005aee4 into envoyproxy:master Oct 26, 2017

mrice32 deleted the stats_tags_separated branch October 26, 2017 15:07

mrice32 mentioned this pull request Oct 26, 2017

Prometheus metrics? christian-posta/envoy-microservices-patterns#2

Closed

kevintchan mentioned this pull request Nov 8, 2017

stats: native datadog tagged metric support #2024

Closed

rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018

Update Envoy SHA to latest. (envoyproxy#1852)

f85f49c

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018

Revert "Update Envoy SHA to latest. (envoyproxy#1852)" (envoyproxy#1854)

121faee

This reverts commit f85f49c.

jpsim pushed a commit that referenced this pull request Nov 28, 2022

network: add basic unit coverage for dns refresh (#1852)

2f4baa9

Signed-off-by: Mike Schore <mike.schore@gmail.com> Signed-off-by: JP Simard <jp@jpsim.com>

jpsim pushed a commit that referenced this pull request Nov 29, 2022

network: add basic unit coverage for dns refresh (#1852)

a3a5a9d

Signed-off-by: Mike Schore <mike.schore@gmail.com> Signed-off-by: JP Simard <jp@jpsim.com>


		std::vector<Stats::TagExtractorPtr>
		Utility::createTagExtractors(const envoy::api::v2::Bootstrap& bootstrap) {

Conversation

mrice32 commented Oct 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

htuch left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrice32 Oct 16, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrice32 Oct 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

htuch left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrice32 Oct 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrice32 Oct 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrice32 Oct 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mattklein123 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

mrice32 commented Oct 13, 2017 •

edited

Loading

mrice32 Oct 16, 2017 •

edited

Loading

mrice32 Oct 22, 2017 •

edited

Loading

mrice32 Oct 23, 2017 •

edited

Loading

mrice32 Oct 23, 2017 •

edited

Loading

mrice32 Oct 23, 2017 •

edited

Loading