Skip to content

SWAR checks for carets while other SIMD implementations don't #201

@nikneym

Description

@nikneym

Hi, first of all, thank you for this library. Learned a lot while reading through it. I've noticed SWAR implementation do checks for < and > characters while other vectored searches don't. SWAR is used as fallback to all SIMD implementations, so when the remaining buffer length is smaller than vector size (e.g < 32), caret chars won't be allowed. This might hurt consistency.

httparse/src/simd/swar.rs

Lines 129 to 156 in 3308d11

// XOR checks to catch '<' & '>' for correctness
//
// XOR can be thought of as a "distance function"
// (somewhat extrapolating from the `xor(x, x) = 0` identity and ∀ x != y: xor(x, y) != 0`
// (each u8 "xor key" providing a unique total ordering of u8)
// '<' and '>' have a "xor distance" of 2 (`xor('<', '>') = 2`)
// xor(x, '>') <= 2 => {'>', '?', '<'}
// xor(x, '<') <= 2 => {'<', '=', '>'}
//
// We assume P('=') > P('?'),
// given well/commonly-formatted URLs with querystrings contain
// a single '?' but possibly many '='
//
// Thus it's preferable/near-optimal to "xor distance" on '>',
// since we'll slowpath at most one block per URL
//
// Some rust code to sanity check this yourself:
// ```rs
// fn xordist(x: u8, n: u8) -> Vec<(char, u8)> {
// (0..=255).into_iter().map(|c| (c as char, c ^ x)).filter(|(_c, y)| *y <= n).collect()
// }
// (xordist(b'<', 2), xordist(b'>', 2))
// ```
const B3: usize = uniform_block(3); // (dist <= 2) + 1 to wrap
const BGT: usize = uniform_block(b'>');
let xgt = x ^ BGT;
let ltgtq = xgt.wrapping_sub(B3) & !xgt;

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions