use SSE4.2 and AVX2 instructions when available #40

This takes over where #38 left off (huge thanks @kamyuentse) and gets it working on 1.27.0, while keeping our minimum compiler version (1.10.0!). It might seem a little weird, so I'll explain how it does both runtime and compile-time detection for maximum performance.
Runtime Detection
The stable feature in Rust 1.27 includes `is_x86_feature_detected!`, which checks whether a certain target feature is enabled. Internally, it uses the unstable `cfg(target_feature)`, but it can also query the CPU at runtime. As of 1.27, the runtime check isn't inlined, which means that adding SIMD support was actually slower than leaving it disabled. A patch to the stdsimd crate has already landed to address this, but in the meantime, httparse uses its own inlined cache. After querying the macro once, the feature set is stored in a local atomic, and checking that cache results in an overall speed improvement!
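The caching idea can be sketched roughly as follows. This is a minimal sketch, not httparse's actual code: it assumes a hypothetical encoding of the feature set as a `usize` (the constant and function names are illustrative). The first call runs the std `is_x86_feature_detected!` macro and stores the result in a static atomic; every later call is just an inlined relaxed load.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical encoding of the detected feature set (illustrative only).
const INIT: usize = 0; // not yet detected
const NONE: usize = 1;
const SSE_42: usize = 2;
const AVX_2: usize = 3;
const AVX_2_AND_SSE_42: usize = 4;

// Process-wide cache of the detection result.
static FEATURE: AtomicUsize = AtomicUsize::new(INIT);

/// Cheap, inlinable check: after the first call this is a single atomic load.
#[inline]
pub fn detected_features() -> usize {
    match FEATURE.load(Ordering::Relaxed) {
        INIT => {
            // Slow path: query the CPU once and remember the answer.
            let feat = detect_uncached();
            FEATURE.store(feat, Ordering::Relaxed);
            feat
        }
        feat => feat,
    }
}

// The expensive, out-of-line query, kept separate from the hot path.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline(never)]
fn detect_uncached() -> usize {
    let sse42 = is_x86_feature_detected!("sse4.2");
    let avx2 = is_x86_feature_detected!("avx2");
    match (sse42, avx2) {
        (true, true) => AVX_2_AND_SSE_42,
        (true, false) => SSE_42,
        (false, true) => AVX_2,
        (false, false) => NONE,
    }
}

// Non-x86 targets never take a SIMD path.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
#[inline(never)]
fn detect_uncached() -> usize {
    NONE
}

/// Helper a parser could use to decide whether to take the SSE4.2 path.
#[inline]
pub fn has_sse42() -> bool {
    let feat = detected_features();
    feat == SSE_42 || feat == AVX_2_AND_SSE_42
}
```

If two threads race on the first call, both simply run the detection and store the same value, so relaxed ordering is enough here.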
However, this cache slightly interferes with optimizations the compiler could otherwise do when compiling with `target_cpu=native`. That's because the macro internally uses `cfg(target_feature)`, and when that is set, the entire branch can be eliminated.
Compile-time detection
So, we already have a win with runtime detection. This PR also adds support for compile-time detection, even though it isn't stable in Rust 1.27! It takes advantage of the fact that cargo exposes a `CARGO_CFG_TARGET_FEATURE` environment variable to build scripts.
The new build script therefore looks for that environment variable, and if it detects that someone is compiling with certain features we can use (either sse4.2 or avx2), it emits that information as custom httparse cfg options.
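A build script along these lines could do the forwarding. This is a minimal sketch under stated assumptions: Cargo sets `CARGO_CFG_TARGET_FEATURE` for build scripts to a comma-separated list of enabled target features, and a build script turns on a custom cfg by printing `cargo:rustc-cfg=NAME`. The cfg names `httparse_simd_target_feature_sse42` and `httparse_simd_target_feature_avx2` are hypothetical, chosen for illustration.

```rust
use std::env;

/// Map the comma-separated CARGO_CFG_TARGET_FEATURE value to the
/// `cargo:rustc-cfg` lines to emit (cfg names are hypothetical).
fn cfg_lines(target_features: &str) -> Vec<String> {
    let mut lines = Vec::new();
    for feature in target_features.split(',') {
        let cfg = match feature.trim() {
            "sse4.2" => "httparse_simd_target_feature_sse42",
            "avx2" => "httparse_simd_target_feature_avx2",
            _ => continue, // features we have no SIMD path for
        };
        lines.push(format!("cargo:rustc-cfg={}", cfg));
    }
    lines
}

/// What `fn main()` in build.rs would call.
fn emit_target_feature_cfgs() {
    // Cargo only sets this variable when it runs a build script.
    if let Ok(features) = env::var("CARGO_CFG_TARGET_FEATURE") {
        for line in cfg_lines(&features) {
            println!("{}", line);
        }
    }
}
```

Keeping the parsing in a pure function means it can be checked without running Cargo; only `emit_target_feature_cfgs` touches the environment and stdout.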
Then, httparse is compiled with a version that doesn't use our cached feature detection and just calls `is_x86_feature_detected!` directly. Since the build script has already seen that the feature is enabled, in most cases this means the branch is eliminated entirely.
Both runtime and compile-time detection in httparse can be disabled, though this is currently meant for testing (so that CI can run the tests with each of the various parsing methods).
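The two strategies can be combined with `#[cfg]` at the SIMD entry point. A minimal sketch, assuming the hypothetical `httparse_simd_target_feature_sse42` cfg emitted by a build script and a deliberately simplified cache: when the cfg is set, the function asks `is_x86_feature_detected!` directly, which resolves to `true` at compile time when SSE4.2 is an enabled target feature; otherwise it goes through the runtime cache. The returned strings stand in for the real SSE4.2 and scalar parsing routines.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 = not yet detected, 1 = no SSE4.2, 2 = SSE4.2 (hypothetical encoding).
static SSE42_CACHE: AtomicUsize = AtomicUsize::new(0);

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline]
fn cpu_has_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
#[inline]
fn cpu_has_sse42() -> bool {
    false
}

// Compile-time path: the build script reported SSE4.2, so use the macro
// directly and let the compiler fold the check away.
#[cfg(httparse_simd_target_feature_sse42)]
#[inline]
fn sse42_available() -> bool {
    cpu_has_sse42()
}

// Runtime path: consult (and fill) the cache.
#[cfg(not(httparse_simd_target_feature_sse42))]
#[inline]
fn sse42_available() -> bool {
    match SSE42_CACHE.load(Ordering::Relaxed) {
        0 => {
            let has = cpu_has_sse42();
            SSE42_CACHE.store(if has { 2 } else { 1 }, Ordering::Relaxed);
            has
        }
        cached => cached == 2,
    }
}

/// Dispatch to the fastest available parser (names stand in for real routines).
fn chosen_path() -> &'static str {
    if sse42_available() {
        "sse4.2"
    } else {
        "scalar"
    }
}
```

Either way the caller only sees `sse42_available()`; which implementation is compiled in is decided entirely by the cfg the build script emits.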
Benchmark improvements
Pre-1.27 (or when SIMD is specifically configured off):
1.27 with runtime detection (and my CPU has SSE4.2):
1.27 when setting `-C target_cpu=native` (and my CPU has SSE4.2):
Takeaways
With `-C target_cpu=native` (and SSE4.2 on the CPU), httparse no longer loses time on smaller requests, since the branch is eliminated at compile time, and it is another ~11% faster on normal requests than with runtime detection (a total improvement of ~24%)!