
utility: Add performance annotation library #2626

Merged
htuch merged 13 commits into envoyproxy:master from jmarantz:perf-annotation-lib
Feb 28, 2018

Conversation

@jmarantz (Contributor) commented Feb 15, 2018:

Description:
This is broken out from #2615 as it likely deserves its own review. This provides a mechanism to annotate and measure the costs of functions that are data-dependent, e.g. regexes.

This is step 3a-ish in the plan to improve startup performance, and another step toward addressing #2373.

Adds a perf annotation library that can be used to instrument code but disappears completely from the generated code unless enabled with bazel --define=perf_annotation=enabled

Produces tables in this format:

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category  Description
        4600        4   1150000      129099  1000000  1300000     alpha            1
         200        1    200000         nan   200000   200000     gamma            2
          87        3     29000        1000    28000    30000      beta            3

Instrumentation can be coded into the system but is turned off via compiler macros so there is zero cost in production.

Risk Level: Low -- new utility library not used by anything yet.

Release Notes: N/A

Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz added a commit to jmarantz/envoy that referenced this pull request Feb 15, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@htuch (Member) left a comment:

This will be pretty useful. I've left a few comments to get the review going, but I'd like to verify that it makes sense to be adding our own thing here, as previously discussed.

A quick look around gives https://github.com/LLNL/Caliper and USDT (http://www.linuxinternals.org/blog/2018/02/10/usdt-notes/, Linux-specific, unclear on overhead).

I think what you have is clean and portable and might be simpler than adopting other solutions, but other tools will likely have richer statistical output and support both tracing and event recording. It would be good to have a principled approach to debug at some point.

PerfAnnotationContext();

typedef std::pair<std::chrono::nanoseconds, uint64_t> DurationCount;
typedef std::map<std::string, DurationCount> DurationCountMap;
Member:

Is an ordered map here deliberate for stability in output order? If so, can you add a comment.

Contributor Author:

Originally, yes, but then I changed the output function to sort by time rather than name.

I also got into a habit, from before there was an official unordered_map, of using std::set or std::map unless I had a specific performance reason not to, for consistent iteration order in tests and predictably good performance without outliers. Your comment suggests that is not the Envoy culture, so I'll switch.

jmarantz (Contributor Author) commented Feb 25, 2018:

I tried changing to an unordered_map and ran into a heap of trouble getting it to compile. Other hash_map implementations I've used before unordered_map would have made this very easy: just change the declaration in the header. However, my usage of [] to do find/insert in one operation doesn't seem to work well with unordered_map, at least with the data structures I'm using as keys and values. Maybe I need to define custom hash/equals functors when my key is a pair.

There is some discussion about the differences here, which I didn't fully grasp: https://stackoverflow.com/questions/17172080/insert-vs-emplace-vs-operator-in-c-map .

Also there's some performance notes here: https://stackoverflow.com/questions/3902644/choosing-between-stdmap-and-stdunordered-map

I think I'll leave a TODO to try switching again to unordered_map.

Member:

I think general C++ best practice is to prefer unordered_map over map unless ordering matters; this I picked up via Google style rather than the Envoy code base. The key in your example is a string, so not sure why that is problematic when switching to a hash map.

Contributor Author:

The key is now a pair of strings but anyway once I defined an explicit hasher, unordered_map worked great. The compiler error messages when I lacked one were shockingly unhelpful :)

*/
PerfAnnotationContext();

typedef std::pair<std::chrono::nanoseconds, uint64_t> DurationCount;
Member:

Nit: prefer using vs. typedef.

context_(PerfAnnotationContext::getOrCreate()) {}

void PerfOperation::record(absl::string_view category, absl::string_view description) {
SystemTime end_time = ProdSystemTimeSource::instance_.currentTime();
Member:

Nit: const here and below.

Contributor Author:

Wish there were a lint feature to find those. TBH I'm not sure why local scalars ever need to be declared const; the compiler will figure it out either way.

Member:

This style quirk is not for compiler performance reasons (I agree, it should be able to infer). The idea is to just have immutability by default for correctness reasons.

jmarantz (Contributor Author) commented Feb 28, 2018:

OK, anyway it's a pretty minor point and I have no problem adding const to local scalars; I just have a lot of decades of not bothering to do that to try to overcome, which is why I wish I had a linter to help me :)

public:
/**
* Records time consumed by a category and description, which are just
* joined together in the library with " / ".
Member:

I think there's some context that should be added to parse the second part of this sentence.

Contributor Author:

Actually I decided to make them separate columns rather than joining with " / ".

std::vector<std::string> columns[num_columns];
for (size_t i = 0; i < num_columns; ++i) {
columns[i].push_back(headers[i]);
widths[i] = strlen(headers[i]);
Member:

Nit: generally prefer to promote to std::string rather than use C unsafe string library functions. This makes it easier to scan the code base for C unsafe string functions when hunting for possible security issues.

// it inline with the largest.
if (i != (num_columns - 1)) {
out.append(widths[i] - str.size(), ' ');
absl::StrAppend(&out, str, " ");
Member:

Did you consider printf style formatting with fmtlib (http://fmtlib.net/latest/syntax.html)?

Contributor Author:

OK, done. I'm not sure if it's easier to read but it's OK. I wound up factoring out the formatting strings and putting them into a separate vector.

(count == 0)
? "NaN"
: std::to_string(
std::chrono::duration_cast<std::chrono::nanoseconds>(duration).count() / count));
Member:

Have you considered tracking other cheap stats such as min/max/stddev? Having an idea of variance is pretty useful in understanding the average.

Contributor Author:

I've added a TODO for that; should be an easy incremental improvement.

Contributor Author:

FWIW that is now done.

- attempt to switch to unordered_map (deferred to TODO)
- change typedef to using
- more const for scalar locals
- switch the category/description to separate columns to make the API easier to explain
- use fmt::format for field right-justification
- consider adding other stats (deferred to TODO)

Signed-off-by: Joshua Marantz <jmarantz@google.com>
I realized my mistake and got unordered_map to work as expected.
Needed to define a hasher because there's not a builtin one for
std::pair, though it seems like there could/should be, as there is a
builtin operator== for std::pair.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Also fixes a compilation problem seen in CI due to lack of explicit conversion
from absl::string_view to std::string.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz (Contributor Author):

@mrice32 had several comments in #2615 which are applicable here, and repeated here:

  1. Is there a reason we're not using monotonic clocks/times here and throughout since we're just computing durations?

  2. Add doxygen annotation for params to PerfAnnotationContext::record() and elsewhere

  3. Is there any reason we can't just subclass the ThreadSafeSingleton to keep this singleton implementation consistent with others like it across Envoy?

  4. There don't appear to be any tests for PerfOperation. Can we add a few? I think just using a second timer to ensure that the recorded time will be greater than some value will suffice.

@jmarantz (Contributor Author):

RE @mrice32 comments:

  1. monotonic clocks: done
  2. doxygen annotation: done
  3. ThreadSafeSingleton: it does not appear to be used consistently now. I'm actually not sure I want to switch to it, because the explicit instantiation used currently works well with the tests. Also, if we want to use this for performance-critical code, we might want to put the object in thread-local storage and aggregate across the silos only when someone interactively requests output.
  4. PerfOperation (and all this new code) has 100% coverage via PerfAnnotationTest.testMacros (which is the recommended mechanism for code to do annotations). Do you think it's important to also test PerfOperation explicitly (as opposed to via PERF_OPERATION)?

…s alternative.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@mrice32 (Member) commented Feb 26, 2018:

  1. You're probably right that it's not used in all cases, but I don't think that's necessarily a reason not to use it, as it is the most common and supported implementation. In both cases there will be leakage across tests that is cleaned up with the CLEAR macro; IIUC, the only difference would be calling a different getter method? (Feel free to correct me if I'm wrong.) As for thread-local aggregation, the singleton implementation we use here would be overhauled regardless in that case, so I see no reason one would be preferred over the other. This is ultimately up to you as it's a relatively small detail, but I think it's generally good practice to reuse common libraries, especially when they are designed for your use-case, as they increase functional predictability and general readability across the codebase. The argument isn't that a special implementation of a singleton will be wrong, but that it might differ in subtle ways that end up violating others' assumptions about the object, especially with something like singletons. These sorts of small misunderstandings can lead to much larger bugs.

  2. Nope, you're totally right. I must've just glanced right past that test. Thanks for pointing that out.

… are in nanoseconds.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz (Contributor Author):

@mrice32 per f2f, one reason I'd prefer not to use ThreadSafeSingleton is that in the current state, the object remains completely uninstantiated unless enabled at compile-time. I actually was torn over whether to use a singleton at all for this, and went this way because I thought it would be easier to instrument code with zero production effect. In the future it may be worth putting the object into an appropriate context object so it's available where needed -- maybe using one of the thread-local objects.

@jmarantz (Contributor Author):

I think all comments are addressed; PTAL. One that I addressed really with just a comment was the possibility of taking another dependency for a library that's got some overlap with this: https://github.com/LLNL/Caliper. I did have a quick look at that and I agree there's a lot of overlap but I don't think the output was what I was looking for in #2615. Still it might be nice to think about utilizing that one as well.

@mrice32 (Member) commented Feb 28, 2018:

ping :) @dnoe

@htuch (Member) left a comment:

LGTM modulo minor comments.

context_->record(duration, category, description);
}

PerfAnnotationContext::PerfAnnotationContext() {}
Member:

Not needed?

Contributor Author:

Well I've explicitly declared it to be private to guide users to use getOrCreate(), so it needs to be implemented. And I don't think it wants to be inline due to the complexity of the map creation. Adding a comment.

}
stats.max_ = std::max(stats.max_, duration);
stats.total_ += duration;
// TODO(jmarantz): accumulate standard deviation.
Member:

What does this mean given the above stddev_.update?

Contributor Author:

Detritus; removed :)

// However, the macros for instrumenting code for performance analysis will expand
// to nothing.
//
// See also: https://github.com/LLNL/Caliper -- it may be worth integrating with
Member:

This methodology is also similar to KStats used at VMware for the VMM, see https://labs.vmware.com/vmtj/methodology-for-performance-analysis-of-vmware-vsphere-under-tier-1-applications. We wrote a bit about how we used this in https://dl.acm.org/citation.cfm?id=1899945&dl=ACM&coll=DL as well.

Contributor Author:

Added more comments for further reading.

columns[7].push_back(category_description.second);
for (size_t i = 0; i < num_columns; ++i) {
widths[i] = std::max(widths[i], columns[i].back().size());
}
Member:

FWIW, I'm thinking just dumping to HTML might be even easier, since then there is no manual layout engine work to be done, but this is fine.

Contributor Author:

I would rather generate columnar text for easy fast consumption while developing. I think an HTML table generator would also be very useful if/when this is hooked to the admin console, so I added a TODO for that.

return std::to_string(std::chrono::duration_cast<std::chrono::nanoseconds>(ns).count());
};
columns[0].push_back(microseconds_string(stats.total_));
uint64_t count = stats.stddev_.count();
Member:

Nit: const

Contributor Author:

done

* @param duration std::chrono::nanoseconds the duration.
* @param category absl::string_view the name of a category for the recording.
* @param description absl::string_view the name of a description for the recording.
*
Member:

Nit: remove blank line.

Contributor Author:

done

/**
* Report an event relative to the operation in progress. Note report can be called
* multiple times on a single PerfOperation, with distinct category/description combinations.
* @param category absl::string_view the name of a category for the recording.
Member:

FYI, we don't typically include types in @param. I know, I know, none of this makes any sense other than from a consistency argument perspective.

Contributor Author:

My bad; this is explicit in STYLE.md. Done.

double WelfordStandardDeviation::computeStandardDeviation() const {
const double variance = computeVariance();
// It seems very difficult for variance to go negative, but from the calculation in update()
// above, I can't quite convince myself it's impossible, so put in a guard to be sure.
Member:

Should this just be an ASSERT, since it shouldn't be possible to have negative variance?

Contributor Author:

I agree it shouldn't be possible, but this algorithm is an approximation and it's not super-obvious to me that it can't happen given the way it's calculated. So I thought it's easier just to guard against it and avoid runtime exceptions.

… const local scalars.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@htuch (Member) left a comment:

Thanks.

@htuch htuch merged commit 0888f3f into envoyproxy:master Feb 28, 2018
@jmarantz jmarantz deleted the perf-annotation-lib branch February 28, 2018 20:18
jpsim pushed a commit that referenced this pull request Nov 28, 2022
As discussed in the weekly meeting, this does not provide a C++ implementation of Platform filters, merely the ability to configure Envoy to use them.

Part of: #2498

Risk Level: Low
Testing: New unit tests
Docs Changes: N/A
Release Notes: Updated version_history.rst

Signed-off-by: Ryan Hamilton <rch@google.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Follow up from #2626.

Risk Level: None
Testing: N/A
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Ryan Hamilton <rch@google.com>
Signed-off-by: JP Simard <jp@jpsim.com>