utility: Add performance annotation library#2626
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There was a problem hiding this comment.
This will be pretty useful. I've left a few comments to get the review going, but I'd like to verify that it makes sense to be adding our own thing here, as previously discussed.
A quick look around gives https://github.com/LLNL/Caliper and USDT (http://www.linuxinternals.org/blog/2018/02/10/usdt-notes/, Linux-specific, unclear on overhead).
I think what you have is clean and portable and might be simpler than adopting other solutions, but other tools will likely have richer statistical output and support both tracing and event recording. It would be good to have a principled approach to debug at some point.
| PerfAnnotationContext(); | ||
|
|
||
| typedef std::pair<std::chrono::nanoseconds, uint64_t> DurationCount; | ||
| typedef std::map<std::string, DurationCount> DurationCountMap; |
There was a problem hiding this comment.
Is an ordered map here deliberate for stability in output order? If so, can you add a comment.
There was a problem hiding this comment.
Originally, yes, but then I changed the output function to sort by time rather than name.
Also, I also got into a habit from before there was an official unordered_map of using std::set or std::map unless I had a specific performance reason not to, just for consistent iteration order of tests and predictably good performance without outliers. Your comment suggests that is not the envoy culture so I'll switch.
There was a problem hiding this comment.
I tried changing to an unordered_map, and ran into a heap of trouble getting it to compile. Other hash_map implementations I've used before unordered_map would have been very easy; just change the declaration in the header. However my usage of [] to do find/insert in one operation doesn't seem to work well with unordered_map, at least with the data structures I'm using as keys and values. Maybe I need to make a custom hash/equals functors when my key is a pair.
There is some discussion about the differences here, which I didn't fully grasp: https://stackoverflow.com/questions/17172080/insert-vs-emplace-vs-operator-in-c-map .
Also there's some performance notes here: https://stackoverflow.com/questions/3902644/choosing-between-stdmap-and-stdunordered-map
I think I'll leave a TODO to try switching again to unordered_map.
There was a problem hiding this comment.
I think general C++ best practice is to prefer unordered_map over map unless ordering matters; this I picked up via Google style rather than the Envoy code base. The key in your example is a string, so not sure why that is problematic when switching to a hash map.
There was a problem hiding this comment.
The key is now a pair of strings but anyway once I defined an explicit hasher, unordered_map worked great. The compiler error messages when I lacked one were shockingly unhelpful :)
| */ | ||
| PerfAnnotationContext(); | ||
|
|
||
| typedef std::pair<std::chrono::nanoseconds, uint64_t> DurationCount; |
| context_(PerfAnnotationContext::getOrCreate()) {} | ||
|
|
||
| void PerfOperation::record(absl::string_view category, absl::string_view description) { | ||
| SystemTime end_time = ProdSystemTimeSource::instance_.currentTime(); |
There was a problem hiding this comment.
wish there was a lint feature to find those. TBH I'm not sure why local scalars ever need to be declared const. The compiler will figure it out either way.
There was a problem hiding this comment.
This style quirk is not for compiler performance reasons (I agree, it should be able to infer). The idea is to just have immutability by default for correctness reasons.
There was a problem hiding this comment.
OK, anyway it's a pretty minor point and I have no problem adding const to local scalars; I just have a lot of decades of not bothering to do that to try to overcome, which is why I wish I had a linter to help me :)
| public: | ||
| /** | ||
| * Records time consumed by a category and description, which are just | ||
| * joined together in the library with " / ". |
There was a problem hiding this comment.
I think there's some context that should be added to parse the second part of this sentence.
There was a problem hiding this comment.
Actually I decided to make them separate columns rather than joining with " / ".
| std::vector<std::string> columns[num_columns]; | ||
| for (size_t i = 0; i < num_columns; ++i) { | ||
| columns[i].push_back(headers[i]); | ||
| widths[i] = strlen(headers[i]); |
There was a problem hiding this comment.
Nit: generally prefer to promote to std::string rather than use C unsafe string library functions. This makes it easier to scan the code base for C unsafe string functions when hunting for possible security issues.
| // it inline with the largest. | ||
| if (i != (num_columns - 1)) { | ||
| out.append(widths[i] - str.size(), ' '); | ||
| absl::StrAppend(&out, str, " "); |
There was a problem hiding this comment.
Did you consider printf style formatting with fmtlib (http://fmtlib.net/latest/syntax.html)?
There was a problem hiding this comment.
OK, done. I'm not sure if it's easier to read but it's OK. I wound up factoring out the formatting strings and putting them into a separate vector.
| (count == 0) | ||
| ? "NaN" | ||
| : std::to_string( | ||
| std::chrono::duration_cast<std::chrono::nanoseconds>(duration).count() / count)); |
There was a problem hiding this comment.
Have you considered tracking other cheap stats such as min/max/stddev? Having an idea of variance is pretty useful in understanding the average.
There was a problem hiding this comment.
I've added a TODO for that; should be an easy incremental improvement.
There was a problem hiding this comment.
FWIW that is now done.
- attempt to switch to unordered_map (deferred to TODO) - change typedef to using - more const for scalar locals - switch the category/description to be a separate columns to make the API easier to explain - use fmt::format for field right-justification - consider adding other stats (deferred to TODO) Signed-off-by: Joshua Marantz <jmarantz@google.com>
I realized my mistake and got unordered_map to work as expected. Needed to define a hasher because there's not a builtin one for std::pair, though it seems like there could/should be, as there is a builtin operator== for std::pair. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Also fixes a compilation problem seen in CI due to lack of explicit conversion from absl::string_view to std::string. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@mrice32 had several comments in #2615 which are applicable here, and repeated here:
|
|
RE @mrice32 comments:
|
…s alternative. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
… are in nanoseconds. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@mrice32 per f2f, one reason I'd prefer not to use ThreadSafeSingleton is that in the current state, the object remains completely uninstantiated unless enabled at compile-time. I actually was torn over whether to use a singleton at all for this, and went this way because I thought it would be easier to instrument code with zero production effect. In the future it may be worth putting the object into an appropriate context object so it's available where needed -- maybe using one of the thread-local objects. |
|
I think all comments are addressed; PTAL. One that I addressed really with just a comment was the possibility of taking another dependency for a library that's got some overlap with this: https://github.com/LLNL/Caliper. I did have a quick look at that and I agree there's a lot of overlap but I don't think the output was what I was looking for in #2615. Still it might be nice to think about utilizing that one as well. |
|
ping :) @dnoe |
| context_->record(duration, category, description); | ||
| } | ||
|
|
||
| PerfAnnotationContext::PerfAnnotationContext() {} |
There was a problem hiding this comment.
Well I've explicitly declared it to be private to guide users to use getOrCreate(), so it needs to be implemented. And I don't think it wants to be inline due to the complexity of the map creation. Adding a comment.
| } | ||
| stats.max_ = std::max(stats.max_, duration); | ||
| stats.total_ += duration; | ||
| // TODO(jmarantz): accumulate standard deviation. |
There was a problem hiding this comment.
What does this mean given the above stddev_.update?
There was a problem hiding this comment.
detritrus; removed :)
| // However, the macros for instrumenting code for performance analysis will expand | ||
| // to nothing. | ||
| // | ||
| // See also: https://github.com/LLNL/Caliper -- it may be worth integrating with |
There was a problem hiding this comment.
This methodology is also similar to KStats used at VMware for the VMM, see https://labs.vmware.com/vmtj/methodology-for-performance-analysis-of-vmware-vsphere-under-tier-1-applications. We wrote a bit about how we used this in https://dl.acm.org/citation.cfm?id=1899945&dl=ACM&coll=DL as well.
There was a problem hiding this comment.
added more comments for further reading.
| columns[7].push_back(category_description.second); | ||
| for (size_t i = 0; i < num_columns; ++i) { | ||
| widths[i] = std::max(widths[i], columns[i].back().size()); | ||
| } |
There was a problem hiding this comment.
FWIW, I'm thinking just dumping to HTML might be even easier, since then there is no manual layout engine work to be done, but this is fine.
There was a problem hiding this comment.
I would rather generate columnar text for easy fast consumption while developing. I think an HTML table generator would also be very useful if/when this is hooked to the admin console, so I added a TODO for that.
| return std::to_string(std::chrono::duration_cast<std::chrono::nanoseconds>(ns).count()); | ||
| }; | ||
| columns[0].push_back(microseconds_string(stats.total_)); | ||
| uint64_t count = stats.stddev_.count(); |
| * @param duration std::chrono::nanoseconds the duration. | ||
| * @param category absl::string_view the name of a category for the recording. | ||
| * @param category absl::string_view the name of description for the recording. | ||
| * |
| /** | ||
| * Report an event relative to the operation in progress. Note report can be called | ||
| * multiple times on a single PerfOperation, with distinct category/description combinations. | ||
| * @param category absl::string_view the name of a category for the recording. |
There was a problem hiding this comment.
FYI, we don't typically include types in @param. I know, I know, none of this makes any sense other than from a consistency argument perspective.
There was a problem hiding this comment.
my bad; this is explicit in STYLE.md. Done.
| double WelfordStandardDeviation::computeStandardDeviation() const { | ||
| const double variance = computeVariance(); | ||
| // It seems very difficult for variance to go negative, but from the calculation in update() | ||
| // above, I can't quite convince myself it's impossible, so put in a guard to be sure. |
There was a problem hiding this comment.
Should this just be an ASSERT, since it shouldn't be possible to have negative variance..
There was a problem hiding this comment.
I agree it shouldn't be possible, but this algorithm is an approximation and it's not super-obvious to me that it can't happen given the way it's calculated. So I thought it's easier just to guard against it and avoid runtime exceptions.
… const local scalars. Signed-off-by: Joshua Marantz <jmarantz@google.com>
As discussed in the weekly meeting, this does not provide a C++ implementation of Platform filters, merely the ability to configure Envoy to use them. Part of: #2498 Risk Level: Low Testing: New unit tests Docs Changes: N/A Release Notes: Updated version_history.rst Signed-off-by: Ryan Hamilton <rch@google.com> Signed-off-by: JP Simard <jp@jpsim.com>
Follow up from #2626. Risk Level: None Testing: N/A Docs Changes: N/A Release Notes: N/A Signed-off-by: Ryan Hamilton <rch@google.com> Signed-off-by: JP Simard <jp@jpsim.com>
As discussed in the weekly meeting, this does not provide a C++ implementation of Platform filters, merely the ability to configure Envoy to use them. Part of: #2498 Risk Level: Low Testing: New unit tests Docs Changes: N/A Release Notes: Updated version_history.rst Signed-off-by: Ryan Hamilton <rch@google.com> Signed-off-by: JP Simard <jp@jpsim.com>
Follow up from #2626. Risk Level: None Testing: N/A Docs Changes: N/A Release Notes: N/A Signed-off-by: Ryan Hamilton <rch@google.com> Signed-off-by: JP Simard <jp@jpsim.com>
Description:
This is broken out from #2615 as it likely deserves its own review. This provides a mechanism to annotate and measure the costs of functions that are data-dependent, e.g. regexes.
This is step 3a-ish, in the plan to improve startup performance. This is another step toward addressing #2373
Adds perf annotation library that can be used to instrument code but disappear completely from generated code unless enabled with bazel --define=perf_annotation=enabled
Produces tables in this format.
Instrumentation can be coded into the system but is turned off via compiler macros so there is zero cost in production.
Risk Level: Low -- new utility library not used by anything yet.
Release Notes: N/A