goodbye simd crate, hello std::arch #456
Merged
BurntSushi merged 6 commits into rust-lang:master on Mar 13, 2018
Is it possible to use a |
Indeed, it is possible:

    use std::arch::x86_64::__m256i;

    macro_rules! defunion {
        () => {
            #[derive(Clone, Copy)]
            #[allow(non_camel_case_types)]
            pub union u8x32 {
                vector: __m256i,
                bytes: [u8; 32],
            }
        }
    }
    defunion!();
This commit ports the Teddy searcher to use std::arch and moves off the portable SIMD vector API. Performance remains the same, and it looks like the codegen is identical, which is great! This also makes the `simd-accel` feature a no-op and adds a new `unstable` feature to the regex crate, which enables the Teddy optimization through std::arch. The `-C target-feature` or `-C target-cpu` settings are no longer necessary, since this now does runtime target feature detection. Once `unstable` is enabled, the Teddy optimizations become available automatically without any additional compile time flags.
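As a rough illustration of what runtime target feature detection looks like with std::arch, here is a minimal sketch. It is not the Teddy code from this PR: `count_byte`, `count_byte_avx2` and `count_byte_fallback` are hypothetical names, and the example assumes the stabilized `is_x86_feature_detected!` macro from the standard library.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Portable fallback used when AVX2 is not available (hypothetical helper).
fn count_byte_fallback(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// AVX2 variant. Compiled with AVX2 enabled for this function only, so the
/// rest of the crate needs no `-C target-feature` flags.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_byte_avx2(haystack: &[u8], needle: u8) -> usize {
    let splat = _mm256_set1_epi8(needle as i8);
    let mut count = 0;
    let mut chunks = haystack.chunks_exact(32);
    for chunk in &mut chunks {
        // Unaligned 32-byte load, then a bytewise equality comparison.
        let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        let eq = _mm256_cmpeq_epi8(v, splat);
        // One mask bit per byte; count the bits that are set.
        count += (_mm256_movemask_epi8(eq) as u32).count_ones() as usize;
    }
    // Handle the tail that does not fill a whole 32-byte vector.
    count + count_byte_fallback(chunks.remainder(), needle)
}

/// Safe entry point: picks an implementation based on the running CPU.
pub fn count_byte(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { count_byte_avx2(haystack, needle) };
        }
    }
    count_byte_fallback(haystack, needle)
}
```

The caller only ever sees the safe `count_byte` function; the check happens at runtime, so the same binary works on CPUs with and without AVX2.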
This commit adds a copy of the Teddy searcher that works on AVX2. We don't attempt to reuse any code between them just yet, and instead just copy & paste and tweak parts of it to work on 32 bytes instead of 16. (Some parts were trickier than others. For example, @jneem figured out how to nearly compensate for the lack of a real 256-bit bytewise PALIGNR instruction, which we borrow here.) Overall, AVX2 provides a nice bump in performance.
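To make the PALIGNR remark concrete: AVX2's `_mm256_alignr_epi8` operates on each 128-bit lane separately, so a shift across the full 256 bits needs an extra permute first. The sketch below shows one known way to do that for a shift that brings in the last two bytes of the previous block. It is an illustration under that assumption and not necessarily the exact code in this PR; the function names are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: the last 2 bytes of `prev`, then the first 30 bytes of `cur`.
fn shift_in_two_scalar(prev: &[u8; 32], cur: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..2].copy_from_slice(&prev[30..]);
    out[2..].copy_from_slice(&cur[..30]);
    out
}

/// AVX2 version of the same operation (hypothetical helper).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn shift_in_two_avx2(prev: &[u8; 32], cur: &[u8; 32]) -> [u8; 32] {
    let b = _mm256_loadu_si256(prev.as_ptr() as *const __m256i);
    let a = _mm256_loadu_si256(cur.as_ptr() as *const __m256i);
    // v holds the high 16 bytes of `prev` in its low lane and the low
    // 16 bytes of `cur` in its high lane.
    let v = _mm256_permute2x128_si256::<0x21>(b, a);
    // The per-lane alignr now sees the right neighbour in each lane:
    // low lane:  (cur[0..16]  : prev[16..32]) shifted right by 14 bytes
    // high lane: (cur[16..32] : cur[0..16])   shifted right by 14 bytes
    let r = _mm256_alignr_epi8::<14>(a, v);
    let mut out = [0u8; 32];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

/// Safe wrapper that falls back to the scalar version without AVX2.
pub fn shift_in_two(prev: &[u8; 32], cur: &[u8; 32]) -> [u8; 32] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { shift_in_two_avx2(prev, cur) };
        }
    }
    shift_in_two_scalar(prev, cur)
}
```

With `prev` holding the bytes 0 through 31 and `cur` holding 32 through 63, both versions produce the bytes 30 through 61.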
We no longer need to enable SIMD optimizations at compile time. They are automatically enabled when regex is compiled with the `unstable` feature.
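A minimal sketch of how a Cargo feature can gate this kind of automatic selection, assuming a feature named `unstable` as in this PR. The `TeddyVariant` enum and `select_variant` function are hypothetical and only illustrate the preference order (AVX2, then SSSE3, then no SIMD); they are not the regex crate's internal API.

```rust
/// Which searcher variant to use (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeddyVariant {
    Avx2,
    Ssse3,
    Fallback,
}

/// Picks the best variant available on the running CPU. Without the
/// `unstable` feature, or on non-x86_64 targets, no SIMD variant is used.
pub fn select_variant() -> TeddyVariant {
    #[cfg(all(feature = "unstable", target_arch = "x86_64"))]
    {
        // Prefer the wider vectors when the CPU supports them.
        if is_x86_feature_detected!("avx2") {
            return TeddyVariant::Avx2;
        }
        if is_x86_feature_detected!("ssse3") {
            return TeddyVariant::Ssse3;
        }
    }
    TeddyVariant::Fallback
}
```

Under this sketch, a downstream crate would opt in through its dependency declaration (for example by enabling the `unstable` feature on regex) rather than through compiler flags.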
This removes our compile time SIMD flags and replaces them with the `unstable` feature, which will cause CI to use whatever CPU features are available. Ideally, we would test each important CPU feature combination, but I'd like to avoid doing that in one CI job and instead split them out into separate CI jobs to keep CI times low. That requires more work.
This PR ports the regex crate's use of SIMD to `std::arch`, which in turn drops the dependency on the `simd` crate and any compile time SIMD configuration requirements. As a bonus, we also add an AVX2 variant of what used to be an exclusively SSSE3 algorithm.
We do this by adding a new feature, `unstable`, which, when enabled, will cause the regex crate to automatically use SSSE3 or AVX2 optimized variants of certain literal algorithms (specifically, the Teddy multi-matcher), depending on which CPU features are available at runtime. Once `std::arch` is stabilized, these optimizations will be enabled automatically.
Performance improvements from no-SIMD to SSSE3 (which roughly match the status quo, when SSSE3 is enabled at compile time):
And then improvements from SSSE3 to AVX2:
🎉