Skip to content

perf(lexer): try perfect hash table for keyword lookup#171

Closed
YangchenYe323 wants to merge 4 commits into
mainfrom
yangchen/perf/perfect-fast-hash
Closed

perf(lexer): try perfect hash table for keyword lookup#171
YangchenYe323 wants to merge 4 commits into
mainfrom
yangchen/perf/perfect-fast-hash

Conversation

@YangchenYe323

Copy link
Copy Markdown
Contributor

Building on #140 Try a faster hashing function (CityHash) from fasthash.

Since rust-phf don't support custom hasher yet, I wrote our own macros to generate specialized keyword hash tables and tried included the macro in the source code of oxc_parser. However there's some weird issues I can't resolve (It compiles and runs when building the whole workspace but fails to run benchmarks or examples). Therefore I expanded the macro offline and put the file in crates/oxc_parser/lexer/keyword_generated.rs

@YangchenYe323 YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from dc96b22 to 776750a Compare March 12, 2023 04:52
@YangchenYe323

Copy link
Copy Markdown
Contributor Author

Seems fasthash cannot build on Windows 🤦‍♂️. Here's my isolated benchmark result, just running the match_keyword on a bunch of JQuery words (Different hashing, no Simd):
Screen Shot 2023-03-11 at 10 55 40 PM

@github-actions

github-actions Bot commented Mar 12, 2023

Copy link
Copy Markdown
Contributor

Parser Benchmark Results - ubuntu-latest

group                      main                                   pr
-----                      ----                                   --
parser/babylon.max.js      1.00    108.7±0.21ms    95.0 MB/sec    1.02    110.9±0.28ms    93.1 MB/sec
parser/d3.js               1.00     14.0±0.04ms    39.1 MB/sec    1.02     14.2±0.04ms    38.5 MB/sec
parser/lodash.js           1.00      4.0±0.11ms   129.9 MB/sec    1.01      4.0±0.06ms   128.6 MB/sec
parser/pdf.js              1.00      7.7±0.07ms    52.4 MB/sec    1.02      7.8±0.08ms    51.6 MB/sec
parser/typescript.js       1.00    110.6±0.23ms    87.0 MB/sec    1.02    112.8±0.22ms    85.3 MB/sec
semantic/babylon.max.js    1.00    166.8±4.03ms    61.9 MB/sec    1.00    166.5±3.32ms    62.0 MB/sec
semantic/d3.js             1.00     16.8±1.61ms    32.4 MB/sec    1.18     19.8±2.36ms    27.6 MB/sec
semantic/lodash.js         1.19      5.0±0.31ms   101.9 MB/sec    1.00      4.2±0.10ms   121.3 MB/sec
semantic/pdf.js            1.00      8.6±1.78ms    46.9 MB/sec    1.16     10.0±1.92ms    40.3 MB/sec
semantic/typescript.js     1.00    130.8±0.94ms    73.5 MB/sec    1.01    131.6±0.84ms    73.1 MB/sec

@github-actions

github-actions Bot commented Mar 12, 2023

Copy link
Copy Markdown
Contributor

Parser Benchmark Results - windows-latest

group                      main                                   pr
-----                      ----                                   --
parser/babylon.max.js      1.00    151.6±4.30ms    68.1 MB/sec    1.03    156.3±6.19ms    66.0 MB/sec
parser/d3.js               1.00     18.9±0.46ms    28.9 MB/sec    1.05     19.8±0.72ms    27.5 MB/sec
parser/lodash.js           1.00      5.4±0.35ms    95.7 MB/sec    1.04      5.6±0.58ms    92.0 MB/sec
parser/pdf.js              1.00     10.6±0.28ms    37.8 MB/sec    1.02     10.9±0.36ms    37.0 MB/sec
parser/typescript.js       1.00    153.2±4.49ms    62.8 MB/sec    1.04    159.2±6.37ms    60.4 MB/sec
semantic/babylon.max.js    1.00   299.4±17.83ms    34.5 MB/sec    1.01   303.7±22.44ms    34.0 MB/sec
semantic/d3.js             1.00     29.3±0.93ms    18.7 MB/sec    1.10     32.1±1.64ms    17.0 MB/sec
semantic/lodash.js         1.09      6.4±0.50ms    79.8 MB/sec    1.00      5.9±0.24ms    86.6 MB/sec
semantic/pdf.js            1.01     12.4±0.45ms    32.5 MB/sec    1.00     12.2±0.93ms    33.0 MB/sec
semantic/typescript.js     1.00    234.8±9.07ms    41.0 MB/sec    1.08   253.7±19.16ms    37.9 MB/sec

@YangchenYe323

Copy link
Copy Markdown
Contributor Author

I wonder why this is even worse than rust-phf...

@Boshen

Boshen commented Mar 12, 2023

Copy link
Copy Markdown
Member

I think hashing on less values is the key here, then the next step is to change the hashing function, and then the phf table.

@YangchenYe323 YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch 5 times, most recently from 2c485ce to b6ef0a3 Compare March 12, 2023 09:39
@Boshen

Boshen commented Mar 12, 2023

Copy link
Copy Markdown
Member

It seems the performance of match_keyword function is so small that, you need at least a constant 5-10ms change on the typescript benchmark for it to be noticable.

@Boshen Boshen changed the title perf(lexer): try perfect hash table based on CityHash perf(lexer): try perfect hash table for keyword lookup Mar 12, 2023
@YangchenYe323 YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 2cacefd to 29c1e0d Compare March 12, 2023 21:29
@YangchenYe323

Copy link
Copy Markdown
Contributor Author

I think hashing on less values is the key here, then the next step is to change the hashing function, and then the phf table.

@Boshen After all my tweaking, my guess is that the cost of the routine is dominated by stuff in the CompactStr crate. The difference between matching and hashing is easily overshadowed. The creation, cloning of CompactStr all involves lots of branches to distinguish between heap and stack based strings.

I'm thinking maybe we should try avoiding the eager materialization of TokenValue? It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases.

@Boshen

Boshen commented Mar 13, 2023

Copy link
Copy Markdown
Member

I'm thinking maybe we should try avoiding the eager materialization of TokenValue? It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases.

Yes, I've been looking at this routine as well.

Most of the token values are useless, the code was originally from jsparagus, which is needed for their parser generator.

It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases.

It was done this way for ease of use, since we don't have to deal with &'a str any more.

Shall we try and make the token values lazy (for keywords only) and then come back to this?

@Boshen

Boshen commented Mar 13, 2023

Copy link
Copy Markdown
Member

Shall we try and make the token values lazy (for keywords only) and then come back to this?

I'll work on this.

@YangchenYe323

Copy link
Copy Markdown
Contributor Author

Shall we try and make the token values lazy (for keywords only) and then come back to this?

I'll work on this.

I'm actually running some experiments to remove materializing token values for both keywords and identifiers. I noticed the into_bump_str can already turn an owned string to an allocator-managed reference. I'm currently looking into making a new variantTokenValue::Str(&'a str) and move materialization away from parsing logic.

@Boshen

Boshen commented Mar 13, 2023

Copy link
Copy Markdown
Member

Shall we try and make the token values lazy (for keywords only) and then come back to this?

I'll work on this.

I'm actually running some experiments to remove materializing token values for both keywords and identifiers. I noticed the into_bump_str can already turn an owned string to an allocator-managed reference. I'm currently looking into making a new variantTokenValue::Str(&'a str) and move materialization away from parsing logic.

I changed it on main.

and move materialization away from parsing logic

This may require a lot of work, but you can try. The only problem I have with String<'a> is that all downstream code will become very hard to write.

@YangchenYe323 YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 29c1e0d to 7f2e6fd Compare March 13, 2023 01:52

const HASH_TABLE_SIZE: usize = 512usize;
const HASH_TABLE_SEED: u32 = 7_484_039u32;
static KEYWORD_HASH_TABLE: [(Kind, &'static str); HASH_TABLE_SIZE] = [

@Boshen Boshen Mar 13, 2023

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. Is &'static str still needed here?
  2. The table is so big! I wonder if loading this table in and out of cpu cache out weighs every things.

@YoniFei YoniFei Mar 13, 2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thought.
On the one hand, insofar as the hash function is poor (far from uniform distribution / high collisions), it's possible whole cache line(s) of the table are never touched.
On the other hand, it only takes a single entry hit to pay the price - and if the distribution is poor anyhow then there's also no benefit in having the larger table.

512 does seem unnecessarily large.
You can try increasing # of attempts and see if you get a 128 or 256 size table (without taking too long).

Comment thread crates/oxc_parser/src/lexer/kind.rs
let hash_code = hash::hash_u32(extract, HASH_TABLE_SEED);
let idx = hash_code as usize % HASH_TABLE_SIZE;
let (mut kind, key) = KEYWORD_HASH_TABLE[idx];
if s.len() != key.len() || s != key {

@YoniFei YoniFei Mar 13, 2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where having cmov instead of branches can make a significant difference.
Can try splitting

if s.len() != key.len() {
    kind = Ident
}
if s != key {
    kind = Ident;
}

and seeing what LLVM spits out.

Or use cmov crate?

@YoniFei

YoniFei commented Mar 13, 2023

Copy link
Copy Markdown
Contributor

This may require a lot of work, but you can try. The only problem I have with String<'a> is that all downstream code will become very hard to write.

(Unrelated to this PR)
About this:
I was looking at rustc's code to see what they do for keyword matching.
They just use a regular map with FxHash, but they do the lookup coupled with their own string interning.

Anyhow, I saw this:

        // SAFETY: we can extend the arena allocation to `'static` because we
        // only access these while the arena is still alive.
        let string: &'static str = unsafe { &*(string as *const str) };

They use an arena for interned strings, but they avoid needing to propagate lifetimes everywhere.
"Cheat" for convenience, because you know that you have a high-level guarantee - usage won't outlive the arena.

FYI for your consideration (you probably already thought of it, but just in case)

@Boshen

Boshen commented Mar 13, 2023

Copy link
Copy Markdown
Member

(you probably already thought of it, but just in case)

TIL! Thank you again for looking into the dark arts of Rust.

    let string: &'static str = unsafe { &*(string as *const str) };

This is making me really uncomfortable 🤯

@YangchenYe323 YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 7f2e6fd to d7bdb3d Compare March 13, 2023 18:50
@YangchenYe323 YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 2f881e3 to b05c951 Compare March 13, 2023 20:51
@YangchenYe323

Copy link
Copy Markdown
Contributor Author

I've attached a profiling result generated from pprof-rs for parsing lodash.js. Maybe the time spent on lexer is really trivial so it's not worth all the hassle.

⬇️ Time related to lexer in a 30s sample.
Screen Shot 2023-03-13 at 4 18 55 PM

I couldn't find a way to display the whole picture nicely... I thought of introducing pprof to the codebase but it seg-faulted on typescript.js

@Boshen

Boshen commented Mar 13, 2023

Copy link
Copy Markdown
Member

@YangchenYe323 Yes, let's stop here, in conclusion the performance improvement is negligible compared to the rest.

All your work is this still valuable and we learned so much in the process, some of the code is also improved along the way. We also recorded all the process for future references.

Again, thank you so much for trying things out and learning with us on this topic.

@Boshen Boshen deleted the yangchen/perf/perfect-fast-hash branch April 15, 2023 10:26
Boshen added a commit that referenced this pull request Oct 7, 2025
Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.

## Changes

**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.

**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
        || matches!(self, True | False | Null)  // Extra check needed!
}
```

**After**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Null as u8)  // Pure range check!
}
```

## Assembly Impact

### Before (range + OR)
```asm
; Range check for Await..=Yield
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #86
cset  w9, lo
; Additional checks for True/False/Null
cmp   w0, #168
ccmp  w0, #171, #0, ne
cset  w0, lo
orr   w0, w9, w0      ; Combine results
```
**7 instructions**

### After (pure range)
```asm
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #89
cset  w0, lo
```
**4 instructions** (43% reduction!)

## Why This Works

`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings

Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.

## Performance

- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution

Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Boshen added a commit that referenced this pull request Oct 7, 2025
Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.

## Changes

**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.

**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
        || matches!(self, True | False | Null)  // Extra check needed!
}
```

**After**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Null as u8)  // Pure range check!
}
```

## Assembly Impact

### Before (range + OR)
```asm
; Range check for Await..=Yield
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #86
cset  w9, lo
; Additional checks for True/False/Null
cmp   w0, #168
ccmp  w0, #171, #0, ne
cset  w0, lo
orr   w0, w9, w0      ; Combine results
```
**7 instructions**

### After (pure range)
```asm
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #89
cset  w0, lo
```
**4 instructions** (43% reduction!)

## Why This Works

`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings

Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.

## Performance

- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution

Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants