Skip to content

stats: faster versions of SymbolTableImpl::lessThan and StatName::startsWith#19563

Merged
jmarantz merged 19 commits intoenvoyproxy:mainfrom
jmarantz:symtab-compare-speed
Feb 1, 2022
Merged

stats: faster versions of SymbolTableImpl::lessThan and StatName::startsWith#19563
jmarantz merged 19 commits intoenvoyproxy:mainfrom
jmarantz:symtab-compare-speed

Conversation

@jmarantz
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz commented Jan 16, 2022

Commit Message: Improvements in the admin panel (#19546 #18670) put greater pressure on sorting of stats via StatName, as opposed to saving elaborated strings for every stat and sorting by that.

This new implementation provides an iterator-like interface for decoding the tokens in StatName, enabling us to early-exit when comparing stat names with multiple tokens. For example, if you are comparing "a.b.c.d" against "a.x.y.z" we can abort out after the "b" vs "x" comparison, and there is no need to compare "c" to "y". Of course it's not as fast as comparing strings, but it saves having to hold the elaborated strings in memory.

It also adds a new sortByStatNames free function which is more efficient than calling std::sort directly because it can take the symbol-table lock once for the whole sort, rather than re-taking the lock on each comparison. Taking uncontended locks is fast, but as the benchmark shows, it's not as fast as not taking locks.

Additional Description:
A benchmark for this is added in this PR:
OLD:

------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
bmCompareElements        174 ns          174 ns      4028127
bmStdSort          318591243 ns    318515288 ns            2

NEW:

------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
bmCompareElements       55.9 ns         55.9 ns     12515696
bmSortByStatNames   61342554 ns     61329512 ns           11
bmStdSort           80578659 ns     80562977 ns            9

The raw CompareElements numbers show the raw speed improvement offered by the early-exit as well as refraining from creating temp vectors, but count on 20M compares to sort 1M stats.

The sort numbers provide more context. in the old code, sorting 100k stats (from 1k clusters) takes 318ms (manually testing 1M stats is around 2.8 seconds but I didn't want to have CI run such a long test each time). Doing that using std::sort with the new comparison algorithm is 81ms and with the optimized sortByStatNames is 61ms.

The burst of main-thread activity due to an admin /stats request may impact long-tail data-plane latency on a heavily loaded system. which is why in #18670 we segment into scopes. In the meantime, this reduces the speed impact of sorting.

Risk Level: medium -- hopefully the existing unit tests covered all interesting corner cases
Testing: //test/common/stats/... plus the new benchmark
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19563 (comment) was created by @jmarantz.

see: more, trace.

@jmarantz jmarantz changed the title stats: faster versions of SymbolTableImpl::lessThan and StatName::StartsWith stats: faster versions of SymbolTableImpl::lessThan and StatName::startsWith Jan 16, 2022
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…mparison.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Contributor

@pradeepcrao pradeepcrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be interesting to see what the benchmarks say once the class hierarchy is removed, eliminating vtable lookups..

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…her than array+size

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Contributor Author

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review!

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@pradeepcrao
Copy link
Copy Markdown
Contributor

LGTM

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks reasonable to me, a few comments

TokenIter::TokenType prefix_type = prefix_iter.next();
TokenIter::TokenType this_type = this_iter.next();
if (prefix_type == TokenIter::TokenType::End) {
break; // "a.b.c" starts with "a.b" or "a.b.c"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use return here? It's a bit confusing having both break and return control flows here imo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You raise a good point. It's a bit annoying though to have a while(true) loop inside a function that returns a value, without a break, because I don't know how to get both gcc and clang to compile without complaint.

I was trying to use gcc to make debugging with gdb better (though I failed), but other people might want to compile gcc without warnings.

Now I can't reproduce the problem I thought this solved, so I went with all 'returns' in the loop, and an unreachable 'return true' outside it (with a comment).

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

We should probably have another maintainer sign off on this, any suggestions for who?

@jmarantz
Copy link
Copy Markdown
Contributor Author

Agreed - I think @ggreenway might have the most context - I think the Prometheus formatter he wrote might be the first beneficiary of this.

@snowp
Copy link
Copy Markdown
Contributor

snowp commented Feb 1, 2022

@ggreenway ping

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

@jmarantz jmarantz merged commit aded9c3 into envoyproxy:main Feb 1, 2022
@jmarantz jmarantz deleted the symtab-compare-speed branch February 1, 2022 20:38
jmarantz added a commit that referenced this pull request Feb 2, 2022
Commit Message: Removes the pure interface for symbol tables, as there is now only one implementation. It was originally added to enable a fake implementation while the impl and its usage was being refined, but now the interface layer just gets in the way. For example it's not possible to add a virtual template method to SymbolTable which would have been useful in #19563
Additional Description:
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
joshperry pushed a commit to joshperry/envoy that referenced this pull request Feb 13, 2022
…rtsWith (envoyproxy#19563)

Commit Message: Improvements in the admin panel (envoyproxy#19546 envoyproxy#18670) put greater pressure on sorting of stats via StatName, as opposed to saving elaborated strings for every stat and sorting by that.

This new implementation provides an iterator-like interface for decoding the tokens in StatName, enabling us to early-exit when comparing stat names with multiple tokens. For example, if you are comparing "a.b.c.d" against "a.x.y.z" we can abort out after the "b" vs "x" comparison, and there is no need to compare "c" to "y". Of course it's not as fast as comparing strings, but it saves having to hold the elaborated strings in memory.

It also adds a new sortByStatNames free function which is more efficient than calling std::sort directly because it can take the symbol-table lock once for the whole sort, rather than re-taking the lock on each comparison. Taking uncontended locks is fast, but as the benchmark shows, it's not as fast as not taking locks.

Additional Description:
A benchmark for this is added in this PR:
OLD:
```
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
bmCompareElements        174 ns          174 ns      4028127
bmStdSort          318591243 ns    318515288 ns            2
```
NEW:
```
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
bmCompareElements       55.9 ns         55.9 ns     12515696
bmSortByStatNames   61342554 ns     61329512 ns           11
bmStdSort           80578659 ns     80562977 ns            9
```
The raw CompareElements numbers show the raw speed improvement offered by the early-exit as well as refraining from creating temp vectors, but count on 20M compares to sort 1M stats.

The sort numbers provide more context. in the old code, sorting 100k stats (from 1k clusters) takes 318ms (manually testing 1M stats is around 2.8 seconds but I didn't want to have CI run such a long test each time). Doing that using std::sort with the new comparison algorithm is 81ms and with the optimized sortByStatNames is 61ms.

The burst of main-thread activity due to an admin /stats request may impact long-tail data-plane latency on a heavily loaded system. which is why in envoyproxy#18670 we segment into scopes. In the meantime, this reduces the speed impact of sorting.

Risk Level: medium -- hopefully the existing unit tests covered all interesting corner cases
Testing: //test/common/stats/... plus the new benchmark
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Josh Perry <josh.perry@mx.com>
joshperry pushed a commit to joshperry/envoy that referenced this pull request Feb 13, 2022
Commit Message: Removes the pure interface for symbol tables, as there is now only one implementation. It was originally added to enable a fake implementation while the impl and its usage was being refined, but now the interface layer just gets in the way. For example it's not possible to add a virtual template method to SymbolTable which would have been useful in envoyproxy#19563
Additional Description:
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Josh Perry <josh.perry@mx.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants