@AaronO
Contributor

@AaronO AaronO commented Apr 14, 2023

Also cleanup, builds off #131

The overhead improvements show up in URI parsing for smaller values, where the overhead is relatively significant. They compound in header/count, which accumulates the overhead of jumping in and out of SIMD.

header/count

| | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|
| Before | 22 | 39 | 77 | 144 | 283 | 578 | 1092 | 2159 |
| After | 21 | 37 | 71 | 135 | 271 | 568 | 1025 | 2034 |

uri

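| | 1b | 2b | 4b | 8b | 16b | 32b | 64b | 128b | 256b | 512b | 1024b | 2048b | 4096b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Before | 7 | 8 | 9 | 11 | 8 | 6 | 7 | 11 | 19 | 34 | 67 | 127 | 270 |
| After | 5 | 5 | 7 | 9 | 6 | 5 | 6 | 9 | 20 | 31 | 60 | 119 | 255 |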

@seanmonstar
Owner

cc @Noah-Kennedy

@AaronO
Contributor Author

AaronO commented Apr 14, 2023

Using a small enum in lieu of a function pointer is marginally better thanks to branch prediction, as observed on header/count:

test header/count_1 ... bench:          21 ns/iter (+/- 5)
test header/count_2 ... bench:          35 ns/iter (+/- 5)
test header/count_4 ... bench:          66 ns/iter (+/- 2)
test header/count_8 ... bench:         130 ns/iter (+/- 53)
test header/count_16 ... bench:         259 ns/iter (+/- 80)
test header/count_32 ... bench:         499 ns/iter (+/- 43)
test header/count_64 ... bench:         978 ns/iter (+/- 195)
test header/count_128 ... bench:        1938 ns/iter (+/- 116)
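For readers unfamiliar with the two dispatch styles being compared, here is a minimal sketch. All names (`Backend`, `scan_scalar`, `scan_wide`, `scan_via_ptr`, `scan_via_enum`) are hypothetical and are not httparse's actual code, and `scan_wide` is only a chunked stand-in for a real SIMD routine. The idea, under those assumptions, is that a `match` on a small enum becomes a predictable branch with direct calls, whereas a function pointer is an indirect call.

```rust
// Hypothetical names throughout; this is not httparse's actual code.

/// Which implementation to use, decided once at runtime.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Backend {
    Scalar,
    Wide,
}

/// Scalar fallback: count the leading bytes that are not a space.
fn scan_scalar(bytes: &[u8]) -> usize {
    bytes.iter().take_while(|&&b| b != b' ').count()
}

/// Stand-in for a SIMD routine: same result, processed in 8-byte chunks.
fn scan_wide(bytes: &[u8]) -> usize {
    let mut n = 0;
    for chunk in bytes.chunks(8) {
        match chunk.iter().position(|&b| b == b' ') {
            Some(i) => return n + i,
            None => n += chunk.len(),
        }
    }
    n
}

/// Dispatch through a function pointer: an indirect call.
fn scan_via_ptr(f: fn(&[u8]) -> usize, bytes: &[u8]) -> usize {
    f(bytes)
}

/// Dispatch through a small enum: a branch followed by a direct call.
fn scan_via_enum(backend: Backend, bytes: &[u8]) -> usize {
    match backend {
        Backend::Scalar => scan_scalar(bytes),
        Backend::Wide => scan_wide(bytes),
    }
}

fn main() {
    let input = b"GET /index.html HTTP/1.1";
    // Both dispatch styles find the first space at index 3.
    assert_eq!(scan_via_ptr(scan_wide, input), 3);
    assert_eq!(scan_via_enum(Backend::Wide, input), 3);
    assert_eq!(scan_via_enum(Backend::Scalar, input), 3);
    println!("ok");
}
```

Whether the enum form actually wins depends on the compiler and the call site, so it is worth benchmarking as above rather than assuming.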

AaronO added a commit to AaronO/httparse that referenced this pull request Apr 15, 2023
First pass, building off seanmonstar#132
@AaronO AaronO mentioned this pull request Apr 15, 2023
@AaronO AaronO force-pushed the perf/simd-runtime-latency branch from 9c1232a to 5232599 Compare April 18, 2023 21:35
@AaronO AaronO force-pushed the perf/simd-runtime-latency branch from 30a143d to e7f4a84 Compare April 18, 2023 22:02
@AaronO
Contributor Author

AaronO commented Apr 18, 2023

@seanmonstar Squashed to a single commit, "cleanup: simd runtime detection", since this is now more of a cleanup than a perf improvement: we reverted to the atomic. That shouldn't be an issue in absolute terms, but I would rather fine-tune the overhead of runtime feature detection in a separate PR.

@AaronO AaronO changed the title from "perf: SIMD runtime latency" to "cleanup: SIMD runtime detection" Apr 18, 2023
@seanmonstar
Owner

I know when I originally added SIMD support to this crate, the is_x86_feature_detected! macro did not get inlined, so the function call was slower than caching in an atomic locally. Inline attributes were later added, so it could be that the cache is no longer worth keeping. Would be good to measure.
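As a rough illustration of the local atomic cache described above, here is a minimal sketch. The constants `UNKNOWN`, `SCALAR`, `AVX2` and the functions `detect` and `backend` are hypothetical names, not httparse's actual implementation. It assumes detection is idempotent, so `Ordering::Relaxed` is enough: two threads racing on first use would both store the same value.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical cache states; not httparse's actual constants.
const UNKNOWN: u8 = 0;
const SCALAR: u8 = 1;
const AVX2: u8 = 2;

static CACHED: AtomicU8 = AtomicU8::new(UNKNOWN);

/// Run the (potentially slower) feature detection.
fn detect() -> u8 {
    // The macro only exists on x86 targets, so guard it with cfg.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return AVX2;
        }
    }
    SCALAR
}

/// Return the cached answer, running detection on first use.
fn backend() -> u8 {
    match CACHED.load(Ordering::Relaxed) {
        UNKNOWN => {
            let found = detect();
            CACHED.store(found, Ordering::Relaxed);
            found
        }
        known => known,
    }
}

fn main() {
    let first = backend();
    assert_ne!(first, UNKNOWN);
    // Later calls read the cached value instead of detecting again.
    assert_eq!(backend(), first);
    println!("backend = {}", first);
}
```

If the macro is now inlined, as suggested, the cached load and the direct check may cost about the same, which is why measuring both is the sensible next step.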

@AaronO
Contributor Author

AaronO commented Apr 18, 2023

> I know when I originally added SIMD support to this crate, the is_x86_feature_detected! macro did not get inlined, so the function call was slower than caching in an atomic locally. Inline attributes were later added, so it could be that the cache is no longer worth keeping. Would be good to measure.

I did assembly dumps and it is inlined. It still requires more fine-tuning and analysis, which I think would be best addressed in its own PR.

@seanmonstar seanmonstar merged commit fbb0bdd into seanmonstar:master Apr 20, 2023
@AaronO AaronO deleted the perf/simd-runtime-latency branch April 20, 2023 20:18
AaronO added a commit to AaronO/httparse that referenced this pull request Apr 20, 2023
First pass at neon support, building off seanmonstar#132
AaronO added a commit to AaronO/httparse that referenced this pull request Apr 25, 2023
First pass at neon support, building off seanmonstar#132
seanmonstar pushed a commit that referenced this pull request Apr 25, 2023
First pass at neon support, building off #132