stats: faster versions of SymbolTableImpl::lessThan and StatName::startsWith#19563
stats: faster versions of SymbolTableImpl::lessThan and StatName::startsWith#19563jmarantz merged 19 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
/retest |
|
Retrying Azure Pipelines: |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…mparison. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
pradeepcrao
left a comment
There was a problem hiding this comment.
Will be interesting to see what the benchmarks say once the class hierarchy is removed, eliminating vtable lookups..
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…her than array+size Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz
left a comment
There was a problem hiding this comment.
thanks for the review!
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
LGTM |
snowp
left a comment
There was a problem hiding this comment.
Thanks this looks reasonable to me, a few comments
| TokenIter::TokenType prefix_type = prefix_iter.next(); | ||
| TokenIter::TokenType this_type = this_iter.next(); | ||
| if (prefix_type == TokenIter::TokenType::End) { | ||
| break; // "a.b.c" starts with "a.b" or "a.b.c" |
There was a problem hiding this comment.
Maybe just use return here? It's a bit confusing having both break and return control flows here imo
There was a problem hiding this comment.
You raise a good point. It's a bit annoying though to have a while(true) loop inside a function that returns a value, without a break, because I don't know how to get both gcc and clang to compile without complaint.
I was trying to use gcc to make debugging with gdb better (though I failed), but other people might want to compile gcc without warnings.
Now I can't reproduce the problem I thought this solved, so I went with all 'returns' in the loop, and an unreachable 'return true' outside it (with a comment).
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
snowp
left a comment
There was a problem hiding this comment.
LGTM, thanks!
We should probably have another maintainer sign off on this, any suggestions for who?
|
Agreed - I think @ggreenway might have the most context - I think the Prometheus formatter he wrote might be the first beneficiary of this. |
|
@ggreenway ping |
Commit Message: Removes the pure interface for symbol tables, as there is now only one implementation. It was originally added to enable a fake implementation while the impl and its usage was being refined, but now the interface layer just gets in the way. For example it's not possible to add a virtual template method to SymbolTable which would have been useful in #19563 Additional Description: Risk Level: low Testing: //test/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com>
…rtsWith (envoyproxy#19563) Commit Message: Improvements in the admin panel (envoyproxy#19546 envoyproxy#18670) put greater pressure on sorting of stats via StatName, as opposed to saving elaborated strings for every stat and sorting by that. This new implementation provides an iterator-like interface for decoding the tokens in StatName, enabling us to early-exit when comparing stat names with multiple tokens. For example, if you are comparing "a.b.c.d" against "a.x.y.z" we can abort out after the "b" vs "x" comparison, and there is no need to compare "c" to "y". Of course it's not as fast as comparing strings, but it saves having to hold the elaborated strings in memory. It also adds a new sortByStatNames free function which is more efficient than calling std::sort directly because it can take the symbol-table lock once for the whole sort, rather than re-taking the lock on each comparison. Taking uncontended locks is fast, but as the benchmark shows, it's not as fast as not taking locks. Additional Description: A benchmark for this is added in this PR: OLD: ``` ------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------ bmCompareElements 174 ns 174 ns 4028127 bmStdSort 318591243 ns 318515288 ns 2 ``` NEW: ``` ------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------ bmCompareElements 55.9 ns 55.9 ns 12515696 bmSortByStatNames 61342554 ns 61329512 ns 11 bmStdSort 80578659 ns 80562977 ns 9 ``` The raw CompareElements numbers show the raw speed improvement offered by the early-exit as well as refraining from creating temp vectors, but count on 20M compares to sort 1M stats. The sort numbers provide more context. in the old code, sorting 100k stats (from 1k clusters) takes 318ms (manually testing 1M stats is around 2.8 seconds but I didn't want to have CI run such a long test each time). Doing that using std::sort with the new comparison algorithm is 81ms and with the optimized sortByStatNames is 61ms. The burst of main-thread activity due to an admin /stats request may impact long-tail data-plane latency on a heavily loaded system. which is why in envoyproxy#18670 we segment into scopes. In the meantime, this reduces the speed impact of sorting. Risk Level: medium -- hopefully the existing unit tests covered all interesting corner cases Testing: //test/common/stats/... plus the new benchmark Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> Signed-off-by: Josh Perry <josh.perry@mx.com>
Commit Message: Removes the pure interface for symbol tables, as there is now only one implementation. It was originally added to enable a fake implementation while the impl and its usage was being refined, but now the interface layer just gets in the way. For example it's not possible to add a virtual template method to SymbolTable which would have been useful in envoyproxy#19563 Additional Description: Risk Level: low Testing: //test/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> Signed-off-by: Josh Perry <josh.perry@mx.com>
Commit Message: Improvements in the admin panel (#19546 #18670) put greater pressure on sorting of stats via StatName, as opposed to saving elaborated strings for every stat and sorting by that.
This new implementation provides an iterator-like interface for decoding the tokens in StatName, enabling us to early-exit when comparing stat names with multiple tokens. For example, if you are comparing "a.b.c.d" against "a.x.y.z" we can abort out after the "b" vs "x" comparison, and there is no need to compare "c" to "y". Of course it's not as fast as comparing strings, but it saves having to hold the elaborated strings in memory.
It also adds a new sortByStatNames free function which is more efficient than calling std::sort directly because it can take the symbol-table lock once for the whole sort, rather than re-taking the lock on each comparison. Taking uncontended locks is fast, but as the benchmark shows, it's not as fast as not taking locks.
Additional Description:
A benchmark for this is added in this PR:
OLD:
NEW:
The raw CompareElements numbers show the raw speed improvement offered by the early-exit as well as refraining from creating temp vectors, but count on 20M compares to sort 1M stats.
The sort numbers provide more context. in the old code, sorting 100k stats (from 1k clusters) takes 318ms (manually testing 1M stats is around 2.8 seconds but I didn't want to have CI run such a long test each time). Doing that using std::sort with the new comparison algorithm is 81ms and with the optimized sortByStatNames is 61ms.
The burst of main-thread activity due to an admin
/statsrequest may impact long-tail data-plane latency on a heavily loaded system. which is why in #18670 we segment into scopes. In the meantime, this reduces the speed impact of sorting.Risk Level: medium -- hopefully the existing unit tests covered all interesting corner cases
Testing: //test/common/stats/... plus the new benchmark
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a