Faster string comparison for arena strings

Copied from https://github.com/oxc-project/backlog/issues/46#issuecomment-2603153807:

Related to #193.

---

The other thing that's slow with strings is `==` comparison. This affects both `==` comparison itself, and hash table lookups (since they perform a `==` comparison too).

### The problem

The problem is that you're not allowed to read out of bounds of the string so, as I understand it, string comparison for short strings (which is common for JS identifiers) is either implemented as a byte-by-byte comparison (slow), or is implemented roughly like this:

```js
fn eq(a: &str, b: &str) -> bool {
    if a.len() != b.len() {
        return false;
    }

    let len = a.len();
    let a_ptr = a.as_ptr();
    let b_ptr = b.as_ptr();

    if len >= 8 {
        // Omitted for brevity
    } else if len == 0 {
        true
    } else if len >= 4 {
        a_ptr.cast::<u32>().read_unaligned() == b_ptr.cast::<u32>().read_unaligned()
        && a_ptr.add(len - 4).cast::<u32>().read_unaligned() == b_ptr.add(len - 4).cast::<u32>().read_unaligned()
    } else if len >= 2 {
        a_ptr.cast::<u16>().read_unaligned() == b_ptr.cast::<u16>().read_unaligned()
        && a_ptr.add(len - 2).cast::<u16>().read_unaligned() == b_ptr.add(len - 4).cast::<u16>().read_unaligned()
    } else {
        a_ptr.read() == b_ptr.read()
    }
}
```

This involves lots of reads, and lots of branches. And as strings are of varying lengths in a pretty random pattern, it's terrible for the branch predictor - it won't be able to correctly predict the branch.

It also involves a function call into `memcmp` (the string comparison function from `libc` which does all this).

### A more efficient solution

If we made sure that all strings are stored in a location where:

1. The string is followed by at least `8 - (len % 8)` initalized bytes (i.e. `abc` has at least 5 more initialized bytes after it).
2. Those following bytes are never aliased by a `&mut` ref, so safe to read.

...then we can always read a batch of 8 bytes, and we just mask off the ones that aren't part of the string when comparing.

```rs
fn eq(a: Ident<'_>, b: Ident<'_>) -> bool {
    if a.len() != b.len() {
        return false;
    }

    let len = a.len();
    if len >= 8 {
        // Omitted for brevity
    } else {
        let a_block = a.as_ptr().cast::<u64>().read_unaligned();
        let b_block = b.as_ptr().cast::<u64>().read_unaligned();
        // XOR. Any bytes which are equal = 0
        let mut diff = a_block ^ b_block;
        // Shift bytes which are not part of the string off the top end
        diff <<= (8 - len) * 8;
        // If all bytes in diff are 0, strings are equal
        diff == 0
    }
}
```

https://godbolt.org/z/o8x33Pa7d

### How would we achieve the requirements?

The requirements are already met for all strings in source text *except* those which end less than 8 bytes before of the file.

Handle those last ones by: In lexer/parser, copy any strings/identifiers which end within 8 bytes of end of the file to elsewhere in the arena and write padding bytes after them. The lexer already has separate code paths for handling strings/identifiers in last chunk of the file, so it wouldn't affect the fast path.

All other strings which are allocated elsewhere in the arena: Write padding bytes after them to make the length up to a multiple of 8.

Or, even better: Store all strings which aren't part of source text in a different part of the arena from other data, tightly packed. So then each string has another string directly after it, satisfying the "must have initialized bytes after" requirement. We only deal in immutable strings, so none of that data can be aliased by a `&mut` reference. Then you only need to write padding bytes after the *last* string.

### Future extension - SIMD

With SIMD AVX2 ops could compare 2 strings of up to 32 bytes length with only 6 instructions and 1 very predictable branch (with AVX-512 this goes up to 64 bytes with same number of ops). Very few identifiers are longer than 32 bytes, so this fast path would cover the vast majority of cases.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Faster string comparison for arena strings #195

The problem

A more efficient solution

How would we achieve the requirements?

Future extension - SIMD

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Faster string comparison for arena strings #195

Description

The problem

A more efficient solution

How would we achieve the requirements?

Future extension - SIMD

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions