-
-
Notifications
You must be signed in to change notification settings - Fork 124
Closed
Description
Hi, first of all, thank you for this library. Learned a lot while reading through it. I've noticed SWAR implementation do checks for < and > characters while other vectored searches don't. SWAR is used as fallback to all SIMD implementations, so when the remaining buffer length is smaller than vector size (e.g < 32), caret chars won't be allowed. This might hurt consistency.
Lines 129 to 156 in 3308d11
| // XOR checks to catch '<' & '>' for correctness | |
| // | |
| // XOR can be thought of as a "distance function" | |
| // (somewhat extrapolating from the `xor(x, x) = 0` identity and ∀ x != y: xor(x, y) != 0` | |
| // (each u8 "xor key" providing a unique total ordering of u8) | |
| // '<' and '>' have a "xor distance" of 2 (`xor('<', '>') = 2`) | |
| // xor(x, '>') <= 2 => {'>', '?', '<'} | |
| // xor(x, '<') <= 2 => {'<', '=', '>'} | |
| // | |
| // We assume P('=') > P('?'), | |
| // given well/commonly-formatted URLs with querystrings contain | |
| // a single '?' but possibly many '=' | |
| // | |
| // Thus it's preferable/near-optimal to "xor distance" on '>', | |
| // since we'll slowpath at most one block per URL | |
| // | |
| // Some rust code to sanity check this yourself: | |
| // ```rs | |
| // fn xordist(x: u8, n: u8) -> Vec<(char, u8)> { | |
| // (0..=255).into_iter().map(|c| (c as char, c ^ x)).filter(|(_c, y)| *y <= n).collect() | |
| // } | |
| // (xordist(b'<', 2), xordist(b'>', 2)) | |
| // ``` | |
| const B3: usize = uniform_block(3); // (dist <= 2) + 1 to wrap | |
| const BGT: usize = uniform_block(b'>'); | |
| let xgt = x ^ BGT; | |
| let ltgtq = xgt.wrapping_sub(B3) & !xgt; |
Metadata
Metadata
Assignees
Labels
No labels