
Conversation

@seanmonstar
Owner

This takes over where #38 left off (huge thanks @kamyuentse!), and gets it working on 1.27.0 while keeping our minimum compiler version (1.10.0!). It might seem a little weird, so I'll explain how it does both runtime and compile-time detection for maximum performance.

Runtime Detection

The stable feature in Rust 1.27 includes is_x86_feature_detected!, which allows checking whether a certain target feature is available. Internally, it uses the unstable cfg(target_feature), but it can also query the CPU at runtime. As of 1.27, the runtime check isn't inlined, which meant that adding SIMD support was actually slower than with it disabled.
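For illustration, a minimal stand-alone use of the macro might look like this. The `have_sse42` wrapper is my own name, not part of httparse; the non-x86 fallback simply reports false so the sketch compiles everywhere:

```rust
// Minimal sketch of the Rust 1.27 stable macro.
// `have_sse42` is an illustrative name, not httparse's API.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn have_sse42() -> bool {
    // Queries the CPU at runtime; in 1.27 this query is not inlined.
    is_x86_feature_detected!("sse4.2")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn have_sse42() -> bool {
    false
}

fn main() {
    // For a given machine the answer is stable across calls.
    assert_eq!(have_sse42(), have_sse42());
    println!("sse4.2 detected: {}", have_sse42());
}
```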

A patch to the stdsimd crate has already landed to include checks, but in the meantime, httparse uses its own inlined cache. After querying the macro once, the feature set is stored in a local atomic, and checking it results in an overall speed improvement!
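The caching idea can be sketched like so (the names and value encoding are my own, not httparse's internals): probe once, store the answer in a static atomic, and branch on the cached value thereafter. A cfg!-based stand-in replaces the real is_x86_feature_detected! probe here so the sketch runs on any architecture:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 = not yet probed, 1 = feature unavailable, 2 = feature available.
static FEATURE_CACHE: AtomicUsize = AtomicUsize::new(0);

#[inline]
fn sse42_available() -> bool {
    match FEATURE_CACHE.load(Ordering::Relaxed) {
        0 => {
            // Illustrative stand-in; real code would call
            // is_x86_feature_detected!("sse4.2") here.
            let detected = cfg!(target_arch = "x86_64");
            FEATURE_CACHE.store(if detected { 2 } else { 1 }, Ordering::Relaxed);
            detected
        }
        2 => true,
        _ => false,
    }
}

fn main() {
    // The probe runs once; later calls only load the atomic.
    assert_eq!(sse42_available(), sse42_available());
    println!("cached sse4.2 result: {}", sse42_available());
}
```

Relaxed ordering is enough here because the cached value is a pure function of the CPU: a racing duplicate probe just stores the same answer.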

However, this cache slightly interferes with optimizations the compiler could make when compiling with -C target-cpu=native. That's because the macro internally uses cfg(target_feature), and when that is set, the entire branch can be eliminated.

Compile-time Detection

So, we already have a win with runtime detection. This change also includes support for compile-time detection, even though cfg(target_feature) isn't stable in Rust 1.27! It takes advantage of the fact that Cargo exposes a CARGO_CFG_TARGET_FEATURE environment variable to build scripts.

So, the new build script also looks for that environment variable, and if it detects that someone is compiling with certain features we can use (either sse4.2 or avx2), it emits that information as custom httparse cfg options.
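A build script following this approach might look roughly like the following. The cfg names emitted are placeholders of my own choosing, not necessarily the ones httparse uses; CARGO_CFG_TARGET_FEATURE itself is a real Cargo-provided variable containing a comma-separated feature list:

```rust
// Sketch of a build.rs: map Cargo's target-feature list to custom cfgs.
use std::env;

// Hypothetical helper so the mapping is testable; the cfg names are
// illustrative placeholders.
fn simd_cfgs(target_features: &str) -> Vec<&'static str> {
    let mut cfgs = Vec::new();
    for feature in target_features.split(',') {
        match feature.trim() {
            "sse4.2" => cfgs.push("cargo:rustc-cfg=httparse_simd_sse42"),
            "avx2" => cfgs.push("cargo:rustc-cfg=httparse_simd_avx2"),
            _ => {}
        }
    }
    cfgs
}

fn main() {
    // Cargo sets this for build scripts; empty if unset.
    let features = env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
    for line in simd_cfgs(&features) {
        println!("{}", line);
    }
}
```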

Then, the compilation of httparse will use a version that skips our cached feature detection and just uses is_x86_feature_detected! directly. Since the feature is known to be enabled at compile time, this will in most cases mean the branch is eliminated entirely.
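Selecting between the two code paths can then be sketched with cfg attributes. The cfg name and the scan function are illustrative placeholders, not httparse's actual symbols:

```rust
// When the build script emitted the custom cfg, only the first
// implementation is compiled; otherwise the fallback is used.
#[cfg(httparse_simd_sse42)]
fn scan(bytes: &[u8]) -> usize {
    // Under cfg(target_feature = "sse4.2") the runtime check folds to
    // `true`, so the SIMD path carries no branch cost.
    bytes.len() // placeholder for the SIMD path
}

#[cfg(not(httparse_simd_sse42))]
fn scan(bytes: &[u8]) -> usize {
    bytes.len() // placeholder for the scalar/runtime-detected path
}

fn main() {
    assert_eq!(scan(b"GET / HTTP/1.1"), 14);
    println!("scanned {} bytes", scan(b"GET / HTTP/1.1"));
}
```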

Both runtime and compile-time detection in httparse can be disabled, though this is currently meant for testing (so CI can run the tests with all the various parsing methods).

Benchmark improvements

Pre-1.27 (or with SIMD explicitly disabled):

bench_httparse       ... bench:    529 ns/iter (+/- 13) = 1328 MB/s
bench_httparse_short ... bench:    66 ns/iter (+/- 1) = 1030 MB/s
bench_pico           ... bench:    492 ns/iter (+/- 11) = 1428 MB/s
bench_pico_short     ... bench:    72 ns/iter (+/- 3) = 944 MB/s 

1.27 with runtime detection (and my CPU has SSE4.2):

bench_httparse       ... bench:    451 ns/iter (+/- 16) = 1558 MB/s
bench_httparse_short ... bench:    70 ns/iter (+/- 8) = 971 MB/s
bench_pico           ... bench:    492 ns/iter (+/- 11) = 1428 MB/s
bench_pico_short     ... bench:    72 ns/iter (+/- 3) = 944 MB/s 

1.27 when setting -C target-cpu=native (and my CPU has SSE4.2):

bench_httparse       ... bench:    405 ns/iter (+/- 23) = 1735 MB/s
bench_httparse_short ... bench:    62 ns/iter (+/- 1) = 1096 MB/s
bench_pico           ... bench:    492 ns/iter (+/- 11) = 1428 MB/s
bench_pico_short     ... bench:    72 ns/iter (+/- 3) = 944 MB/s 

Takeaways

  • Without SIMD, httparse is oh-so-slightly faster than Pico when the requests are tiny, but a bit slower on a more realistic request from a browser.
  • With runtime detection (and SSE4.2 on the CPU), httparse loses a couple nanoseconds on small requests (it adds a branch that wasn't there before), but sees ~15% improvement on the bigger, more common requests.
  • With -C target-cpu=native (and SSE4.2 on the CPU), httparse no longer loses time on smaller requests, since the branch is eliminated at compile time, and is another ~11% faster on normal requests than with runtime detection (a total of ~24% improvement)!

@seanmonstar seanmonstar merged commit 97b2925 into master Jun 22, 2018
@seanmonstar seanmonstar deleted the simd branch June 22, 2018 02:04
@kamyuentse
Contributor

@seanmonstar Awesome! Just posting a picture related to the approach we used here; maybe it will be helpful later.

[image: simd]

@nox nox mentioned this pull request Mar 15, 2021