
utility: Add performance annotation library #2626

Merged
htuch merged 13 commits into envoyproxy:master from jmarantz:perf-annotation-lib
Feb 28, 2018

Conversation

@jmarantz (Contributor) commented Feb 15, 2018:

Description:
This is broken out from #2615 as it likely deserves its own review. This provides a mechanism to annotate and measure the costs of functions that are data-dependent, e.g. regexes.

This is step 3a-ish in the plan to improve startup performance, and another step toward addressing #2373.

Adds a perf annotation library that can be used to instrument code but disappears completely from the generated code unless enabled with bazel --define=perf_annotation=enabled

Produces tables in this format:

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category  Description
        4600        4   1150000      129099  1000000  1300000     alpha            1
         200        1    200000         nan   200000   200000     gamma            2
          87        3     29000        1000    28000    30000      beta            3

Instrumentation can be coded into the system but is turned off via compiler macros so there is zero cost in production.

Risk Level: Low -- new utility library not used by anything yet.

Release Notes: N/A

Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz added a commit to jmarantz/envoy that referenced this pull request Feb 15, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@htuch (Member) left a comment:

This will be pretty useful. I've left a few comments to get the review going, but I'd like to verify that it makes sense to be adding our own thing here, as previously discussed.

A quick look around gives https://github.com/LLNL/Caliper and USDT (http://www.linuxinternals.org/blog/2018/02/10/usdt-notes/, Linux-specific, unclear on overhead).

I think what you have is clean and portable and might be simpler than adopting other solutions, but other tools will likely have richer statistical output and support both tracing and event recording. It would be good to have a principled approach to debug at some point.

PerfAnnotationContext();

typedef std::pair<std::chrono::nanoseconds, uint64_t> DurationCount;
typedef std::map<std::string, DurationCount> DurationCountMap;
Member:

Is an ordered map here deliberate for stability in output order? If so, can you add a comment.

Contributor Author:

Originally, yes, but then I changed the output function to sort by time rather than name.

I also got into a habit, from before there was an official unordered_map, of using std::set or std::map unless I had a specific performance reason not to, for consistent iteration order in tests and predictably good performance without outliers. Your comment suggests that is not the Envoy culture, so I'll switch.

jmarantz (Contributor Author) commented Feb 25, 2018:

I tried changing to an unordered_map and ran into a heap of trouble getting it to compile. Other hash_map implementations I've used before unordered_map would have made this very easy: just change the declaration in the header. However, my usage of [] to do find/insert in one operation doesn't seem to work well with unordered_map, at least with the data structures I'm using as keys and values. Maybe I need to define custom hash/equals functors when my key is a pair.

There is some discussion about the differences here, which I didn't fully grasp: https://stackoverflow.com/questions/17172080/insert-vs-emplace-vs-operator-in-c-map .

Also there's some performance notes here: https://stackoverflow.com/questions/3902644/choosing-between-stdmap-and-stdunordered-map

I think I'll leave a TODO to try switching again to unordered_map.

Member:

I think general C++ best practice is to prefer unordered_map over map unless ordering matters; this I picked up via Google style rather than the Envoy code base. The key in your example is a string, so not sure why that is problematic when switching to a hash map.

Contributor Author:

The key is now a pair of strings but anyway once I defined an explicit hasher, unordered_map worked great. The compiler error messages when I lacked one were shockingly unhelpful :)

*/
PerfAnnotationContext();

typedef std::pair<std::chrono::nanoseconds, uint64_t> DurationCount;
Member:

Nit: prefer using vs. typedef.

context_(PerfAnnotationContext::getOrCreate()) {}

void PerfOperation::record(absl::string_view category, absl::string_view description) {
SystemTime end_time = ProdSystemTimeSource::instance_.currentTime();
Member:

Nit: const here and below.

Contributor Author:

Wish there were a lint feature to find those. TBH I'm not sure why local scalars ever need to be declared const; the compiler will figure it out either way.

Member:

This style quirk is not for compiler performance reasons (I agree, it should be able to infer). The idea is to just have immutability by default for correctness reasons.

jmarantz (Contributor Author) commented Feb 28, 2018:

OK, anyway it's a pretty minor point and I have no problem adding const to local scalars; I just have a lot of decades of not bothering to do that to try to overcome, which is why I wish I had a linter to help me :)

public:
/**
* Records time consumed by a category and description, which are just
* joined together in the library with " / ".
Member:

I think there's some context that should be added to parse the second part of this sentence.

Contributor Author:

Actually I decided to make them separate columns rather than joining with " / ".

std::vector<std::string> columns[num_columns];
for (size_t i = 0; i < num_columns; ++i) {
columns[i].push_back(headers[i]);
widths[i] = strlen(headers[i]);
Member:

Nit: generally prefer to promote to std::string rather than use C unsafe string library functions. This makes it easier to scan the code base for C unsafe string functions when hunting for possible security issues.

// it inline with the largest.
if (i != (num_columns - 1)) {
out.append(widths[i] - str.size(), ' ');
absl::StrAppend(&out, str, " ");
Member:

Did you consider printf style formatting with fmtlib (http://fmtlib.net/latest/syntax.html)?

Contributor Author:

OK, done. I'm not sure if it's easier to read but it's OK. I wound up factoring out the formatting strings and putting them into a separate vector.

(count == 0)
? "NaN"
: std::to_string(
std::chrono::duration_cast<std::chrono::nanoseconds>(duration).count() / count));
Member:

Have you considered tracking other cheap stats such as min/max/stddev? Having an idea of variance is pretty useful in understanding the average.

Contributor Author:

I've added a TODO for that; should be an easy incremental improvement.

Contributor Author:

FWIW that is now done.

- attempt to switch to unordered_map (deferred to TODO)
- change typedef to using
- more const for scalar locals
- switch the category/description to separate columns to make the API easier to explain
- use fmt::format for field right-justification
- consider adding other stats (deferred to TODO)

Signed-off-by: Joshua Marantz <jmarantz@google.com>
I realized my mistake and got unordered_map to work as expected.
Needed to define a hasher because there's not a builtin one for
std::pair, though it seems like there could/should be, as there is a
builtin operator== for std::pair.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Also fixes a compilation problem seen in CI due to lack of explicit conversion
from absl::string_view to std::string.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz (Contributor Author):

@mrice32 had several comments in #2615 which are applicable here, and repeated here:

  1. Is there a reason we're not using monotonic clocks/times here and throughout since we're just computing durations?

  2. Add doxygen annotation for params to PerfAnnotationContext::record() and elsewhere

  3. Is there any reason we can't just subclass the ThreadSafeSingleton to keep this singleton implementation consistent with others like it across Envoy?

  4. There don't appear to be any tests for PerfOperation. Can we add a few? I think just using a second timer to ensure that the recorded time will be greater than some value will suffice.

@jmarantz (Contributor Author):

RE @mrice32 comments:

  1. monotonic clocks: done
  2. doxygen annotation: done
  3. ThreadSafeSingleton: it does not appear to be used consistently now. I'm actually not sure I want to switch to it, because the explicit instantiation used currently works well with the tests. Also, if we want to use this for performance-critical code, we might want to put the object in thread-local storage and aggregate across the silos only when someone interactively requests output.
  4. PerfOperation (and all this new code) has 100% coverage via PerfAnnotationTest.testMacros (which is the recommended mechanism for code to do annotations). Do you think it's important to also test PerfOperation explicitly (as opposed to via PERF_OPERATION)?

…s alternative.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@mrice32 (Member) commented Feb 26, 2018:

  1. You're probably right that it's not used in all cases, but I don't think that's necessarily a reason not to use it, as it is the most common and supported implementation. In both cases there will be leakage across tests that is cleaned up with the CLEAR macro; IIUC, the only difference would be calling a different getter method? (Feel free to correct me if I'm wrong.) As for thread-local aggregation, the singleton implementation we use here would be overhauled regardless in that case, so I see no reason one would be preferred over the other. This is ultimately up to you as it's a relatively small detail, but I think it's generally good practice to reuse common libraries, especially when they are designed for your use-case, as they increase functional predictability and general readability across the codebase. The argument isn't that a special implementation of a singleton will be wrong, but that it might differ in subtle ways that end up violating others' assumptions about the object, especially with something like singletons. These sorts of small misunderstandings can lead to much larger bugs.

  2. Nope, you're totally right. I must've just glanced right past that test. Thanks for pointing that out.

… are in nanoseconds.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz (Contributor Author):

@mrice32 per f2f, one reason I'd prefer not to use ThreadSafeSingleton is that in the current state, the object remains completely uninstantiated unless enabled at compile-time. I actually was torn over whether to use a singleton at all for this, and went this way because I thought it would be easier to instrument code with zero production effect. In the future it may be worth putting the object into an appropriate context object so it's available where needed -- maybe using one of the thread-local objects.

@jmarantz (Contributor Author):

I think all comments are addressed; PTAL. One that I addressed really with just a comment was the possibility of taking another dependency for a library that's got some overlap with this: https://github.com/LLNL/Caliper. I did have a quick look at that and I agree there's a lot of overlap but I don't think the output was what I was looking for in #2615. Still it might be nice to think about utilizing that one as well.

@mrice32 (Member) commented Feb 28, 2018:

ping :) @dnoe

@htuch (Member) left a comment:

LGTM modulo minor comments.

context_->record(duration, category, description);
}

PerfAnnotationContext::PerfAnnotationContext() {}
Member:

Not needed?

Contributor Author:

Well I've explicitly declared it to be private to guide users to use getOrCreate(), so it needs to be implemented. And I don't think it wants to be inline due to the complexity of the map creation. Adding a comment.

}
stats.max_ = std::max(stats.max_, duration);
stats.total_ += duration;
// TODO(jmarantz): accumulate standard deviation.
Member:

What does this mean given the above stddev_.update?

Contributor Author:

Detritus; removed :)

// However, the macros for instrumenting code for performance analysis will expand
// to nothing.
//
// See also: https://github.com/LLNL/Caliper -- it may be worth integrating with
Member:

This methodology is also similar to KStats used at VMware for the VMM, see https://labs.vmware.com/vmtj/methodology-for-performance-analysis-of-vmware-vsphere-under-tier-1-applications. We wrote a bit about how we used this in https://dl.acm.org/citation.cfm?id=1899945&dl=ACM&coll=DL as well.

Contributor Author:

Added more comments for further reading.

columns[7].push_back(category_description.second);
for (size_t i = 0; i < num_columns; ++i) {
widths[i] = std::max(widths[i], columns[i].back().size());
}
Member:

FWIW, I'm thinking just dumping to HTML might be even easier, since then there is no manual layout engine work to be done, but this is fine.

Contributor Author:

I would rather generate columnar text for easy fast consumption while developing. I think an HTML table generator would also be very useful if/when this is hooked to the admin console, so I added a TODO for that.

return std::to_string(std::chrono::duration_cast<std::chrono::nanoseconds>(ns).count());
};
columns[0].push_back(microseconds_string(stats.total_));
uint64_t count = stats.stddev_.count();
Member:

Nit: const

Contributor Author:

done

* @param duration std::chrono::nanoseconds the duration.
* @param category absl::string_view the name of a category for the recording.
* @param description absl::string_view the name of a description for the recording.
*
Member:

Nit: remove blank line.

Contributor Author:

done

/**
* Report an event relative to the operation in progress. Note report can be called
* multiple times on a single PerfOperation, with distinct category/description combinations.
* @param category absl::string_view the name of a category for the recording.
Member:

FYI, we don't typically include types in @param. I know, I know, none of this makes any sense other than from a consistency argument perspective.

Contributor Author:

My bad; this is explicit in STYLE.md. Done.

double WelfordStandardDeviation::computeStandardDeviation() const {
const double variance = computeVariance();
// It seems very difficult for variance to go negative, but from the calculation in update()
// above, I can't quite convince myself it's impossible, so put in a guard to be sure.
Member:

Should this just be an ASSERT, since it shouldn't be possible to have negative variance?

Contributor Author:

I agree it shouldn't be possible, but this algorithm is an approximation and it's not super-obvious to me that it can't happen given the way it's calculated. So I thought it's easier just to guard against it and avoid runtime exceptions.

… const local scalars.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@htuch (Member) left a comment:

Thanks.

@htuch htuch merged commit 0888f3f into envoyproxy:master Feb 28, 2018
@jmarantz jmarantz deleted the perf-annotation-lib branch February 28, 2018 20:18
jpsim pushed a commit that referenced this pull request Nov 28, 2022
As discussed in the weekly meeting, this does not provide a C++ implementation of Platform filters, merely the ability to configure Envoy to use them.

Part of: #2498

Risk Level: Low
Testing: New unit tests
Docs Changes: N/A
Release Notes: Updated version_history.rst

Signed-off-by: Ryan Hamilton <rch@google.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Follow up from #2626.

Risk Level: None
Testing: N/A
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Ryan Hamilton <rch@google.com>
Signed-off-by: JP Simard <jp@jpsim.com>