-
Notifications
You must be signed in to change notification settings - Fork 248
IdentifierSet performance improvements #525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
sbarzowski
merged 2 commits into
google:master
from
inkel:identifierset-performance-improvements
Mar 25, 2021
Merged
IdentifierSet performance improvements #525
sbarzowski
merged 2 commits into
google:master
from
inkel:identifierset-performance-improvements
Mar 25, 2021
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This will be useful later on when implementing the performance improvements to compare how much we gain. Signed-off-by: Leandro López <leandro.lopez@grafana.com>
We know for sure how big the resulting slice will be, so instead of having Go perform checks and reallocations whenever the slice grows too small by using `append` we calculate the resulting size earlier and directly index each element into the output slice. This result in only two allocations all the time, no matter how big the resulting slice will be. With it also come big performance improvements: name old time/op new time/op delta ToOrderedSlice/1_unique_identifiers-8 173ns ±10% 150ns ± 4% -13.16% (p=0.000 n=10+9) ToOrderedSlice/10_unique_identifiers-8 1.24µs ± 6% 0.51µs ± 5% -58.95% (p=0.000 n=9+9) ToOrderedSlice/100_unique_identifiers-8 20.4µs ±20% 4.0µs ± 5% -80.25% (p=0.000 n=10+9) ToOrderedSlice/1000_unique_identifiers-8 253µs ± 9% 40µs ± 5% -84.19% (p=0.000 n=10+10) ToOrderedSlice/10000_unique_identifiers-8 3.44ms ± 9% 0.37ms ± 5% -89.32% (p=0.000 n=9+9) ToOrderedSlice/100000_unique_identifiers-8 48.6ms ±13% 3.6ms ± 4% -92.49% (p=0.000 n=10+10) name old alloc/op new alloc/op delta ToOrderedSlice/1_unique_identifiers-8 40.0B ± 0% 40.0B ± 0% ~ (all equal) ToOrderedSlice/10_unique_identifiers-8 520B ± 0% 184B ± 0% -64.62% (p=0.000 n=10+10) ToOrderedSlice/100_unique_identifiers-8 4.10kB ± 0% 1.82kB ± 0% -55.75% (p=0.000 n=10+10) ToOrderedSlice/1000_unique_identifiers-8 32.8kB ± 0% 16.4kB ± 0% -49.95% (p=0.000 n=10+10) ToOrderedSlice/10000_unique_identifiers-8 826kB ± 0% 164kB ± 0% -80.16% (p=0.000 n=10+10) ToOrderedSlice/100000_unique_identifiers-8 9.25MB ± 0% 1.61MB ± 0% -82.64% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ToOrderedSlice/1_unique_identifiers-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ToOrderedSlice/10_unique_identifiers-8 6.00 ± 0% 2.00 ± 0% -66.67% (p=0.000 n=10+10) ToOrderedSlice/100_unique_identifiers-8 9.00 ± 0% 2.00 ± 0% -77.78% (p=0.000 n=10+10) ToOrderedSlice/1000_unique_identifiers-8 12.0 ± 0% 2.0 ± 0% -83.33% (p=0.000 n=10+10) ToOrderedSlice/10000_unique_identifiers-8 21.0 ± 0% 2.0 ± 0% -90.48% (p=0.000 n=10+10) ToOrderedSlice/100000_unique_identifiers-8 31.0 ± 0% 2.0 ± 0% -93.55% (p=0.000 n=10+10) For ToSLice we use the same changes except the sort operation. In this case the results are even more impressive: allocations are stable at just 1 per operation, regardless of set size, and performance is orders of magnitude faster: name old time/op new time/op delta ToSlice/1_unique_identifiers-8 83.0ns ± 1% 92.4ns ± 0% +11.37% (p=0.000 n=9+8) ToSlice/10_unique_identifiers-8 295ns ± 4% 749ns ± 3% +154.21% (p=0.000 n=10+10) ToSlice/100_unique_identifiers-8 2.33µs ± 2% 4.28µs ± 2% +83.62% (p=0.000 n=9+8) ToSlice/1000_unique_identifiers-8 25.4µs ±10% 34.7µs ± 1% +36.51% (p=0.000 n=10+10) ToSlice/10000_unique_identifiers-8 215µs ± 2% 543µs ± 1% +152.15% (p=0.000 n=9+10) ToSlice/100000_unique_identifiers-8 2.05ms ± 2% 7.17ms ± 1% +249.53% (p=0.000 n=9+9) name old alloc/op new alloc/op delta ToSlice/1_unique_identifiers-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) ToSlice/10_unique_identifiers-8 160B ± 0% 496B ± 0% +210.00% (p=0.000 n=10+10) ToSlice/100_unique_identifiers-8 1.79kB ± 0% 4.08kB ± 0% +127.68% (p=0.000 n=10+10) ToSlice/1000_unique_identifiers-8 16.4kB ± 0% 32.8kB ± 0% +99.90% (p=0.000 n=10+10) ToSlice/10000_unique_identifiers-8 164kB ± 0% 826kB ± 0% +404.12% (p=0.000 n=8+10) ToSlice/100000_unique_identifiers-8 1.61MB ± 0% 9.25MB ± 0% +475.93% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ToSlice/1_unique_identifiers-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ToSlice/10_unique_identifiers-8 1.00 ± 0% 5.00 ± 0% +400.00% (p=0.000 n=10+10) ToSlice/100_unique_identifiers-8 1.00 ± 0% 8.00 ± 0% +700.00% (p=0.000 n=10+10) ToSlice/1000_unique_identifiers-8 1.00 ± 0% 11.00 ± 0% +1000.00% (p=0.000 n=10+10) ToSlice/10000_unique_identifiers-8 1.00 ± 0% 20.00 ± 0% +1900.00% (p=0.000 n=10+10) ToSlice/100000_unique_identifiers-8 1.00 ± 0% 30.00 ± 0% +2900.00% (p=0.000 n=10+10) Signed-off-by: Leandro López <leandro.lopez@grafana.com>
aaf82f5 to
e7d711f
Compare
Contributor
|
Nice! |
Contributor
Author
|
Thank you @sbarzowski ! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hello. While profiling another tool (Grafana's Tanka) I've found that 13% of memory profiling was spent at
ast.IdentifierSet.ToOrderedSlice. After checking the code I've found a small change that drastically improves not only memory consumption in a stable way (only 2 allocations per call toToOrderSlice, regardless the size of the output slice) but drastically improves the performance of this andToSlicemethods.I've added some benchmarks to test this and found the following: with the code at current master I get these numbers:
As you can see, the number of allocations and bytes per operation depends on the size of the underlying map.
With the changes proposed in this PR, I'm getting these numbers instead:
Performant was improved in all places by orders of magnitude, as reported by
benchstat:I recompiled my local Tanka using these improvements and now when profiling output this method isn't listed anymore, not even in the
top 10output.