perf(lexer): try perfect hash table for keyword lookup#171
perf(lexer): try perfect hash table for keyword lookup#171YangchenYe323 wants to merge 4 commits into
Conversation
dc96b22 to
776750a
Compare
Parser Benchmark Results - ubuntu-latest |
Parser Benchmark Results - windows-latest |
|
I wonder why this is even worse than rust-phf... |
|
I think hashing on less values is the key here, then the next step is to change the hashing function, and then the phf table. |
2c485ce to
b6ef0a3
Compare
|
It seems the performance of |
2cacefd to
29c1e0d
Compare
@Boshen After all my tweaking, my guess is that the cost of the routine is dominated by stuff in the I'm thinking maybe we should try avoiding the eager materialization of TokenValue? It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases. |
Yes, I've been looking at this routine as well. Most of the token values are useless, the code was originally from jsparagus, which is needed for their parser generator.
It was done this way for ease of use, since we don't have to deal with Shall we try and make the token values lazy (for keywords only) and then come back to this? |
I'll work on this. |
I'm actually running some experiments to remove materializing token values for both keywords and identifiers. I noticed the |
I changed it on main.
This may require a lot of work, but you can try. The only problem I have with |
29c1e0d to
7f2e6fd
Compare
|
|
||
| const HASH_TABLE_SIZE: usize = 512usize; | ||
| const HASH_TABLE_SEED: u32 = 7_484_039u32; | ||
| static KEYWORD_HASH_TABLE: [(Kind, &'static str); HASH_TABLE_SIZE] = [ |
There was a problem hiding this comment.
Two questions:
- Is
&'static strstill needed here? - The table is so big! I wonder if loading this table in and out of cpu cache out weighs every things.
There was a problem hiding this comment.
Interesting thought.
On the one hand, insofar as the hash function is poor (far from uniform distribution / high collisions), it's possible whole cache line(s) of the table are never touched.
On the other hand, it only takes a single entry hit to pay the price - and if the distribution is poor anyhow then there's also no benefit in having the larger table.
512 does seem unnecessarily large.
You can try increasing # of attempts and see if you get a 128 or 256 size table (without taking too long).
| let hash_code = hash::hash_u32(extract, HASH_TABLE_SEED); | ||
| let idx = hash_code as usize % HASH_TABLE_SIZE; | ||
| let (mut kind, key) = KEYWORD_HASH_TABLE[idx]; | ||
| if s.len() != key.len() || s != key { |
There was a problem hiding this comment.
This is where having cmov instead of branches can make a significant difference.
Can try splitting
if s.len() != key.len() {
kind = Ident
}
if s != key {
kind = Ident;
}and seeing what LLVM spits out.
Or use cmov crate?
(Unrelated to this PR) Anyhow, I saw this: // SAFETY: we can extend the arena allocation to `'static` because we
// only access these while the arena is still alive.
let string: &'static str = unsafe { &*(string as *const str) };They use an arena for interned strings, but they avoid needing to propagate lifetimes everywhere. FYI for your consideration (you probably already thought of it, but just in case) |
TIL! Thank you again for looking into the dark arts of Rust.
This is making me really uncomfortable 🤯 |
7f2e6fd to
d7bdb3d
Compare
2f881e3 to
b05c951
Compare
|
I've attached a profiling result generated from pprof-rs for parsing lodash.js. Maybe the time spent on lexer is really trivial so it's not worth all the hassle. ⬇️ Time related to lexer in a 30s sample. I couldn't find a way to display the whole picture nicely... I thought of introducing pprof to the codebase but it seg-faulted on typescript.js |
|
@YangchenYe323 Yes, let's stop here, in conclusion the performance improvement is negligible compared to the rest. All your work is this still valuable and we learned so much in the process, some of the code is also improved along the way. We also recorded all the process for future references. Again, thank you so much for trying things out and learning with us on this topic. |
Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.
## Changes
**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.
**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
|| matches!(self, True | False | Null) // Extra check needed!
}
```
**After**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Null as u8) // Pure range check!
}
```
## Assembly Impact
### Before (range + OR)
```asm
; Range check for Await..=Yield
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #86
cset w9, lo
; Additional checks for True/False/Null
cmp w0, #168
ccmp w0, #171, #0, ne
cset w0, lo
orr w0, w9, w0 ; Combine results
```
**7 instructions**
### After (pure range)
```asm
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #89
cset w0, lo
```
**4 instructions** (43% reduction!)
## Why This Works
`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings
Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.
## Performance
- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution
Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.
## Changes
**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.
**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
|| matches!(self, True | False | Null) // Extra check needed!
}
```
**After**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Null as u8) // Pure range check!
}
```
## Assembly Impact
### Before (range + OR)
```asm
; Range check for Await..=Yield
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #86
cset w9, lo
; Additional checks for True/False/Null
cmp w0, #168
ccmp w0, #171, #0, ne
cset w0, lo
orr w0, w9, w0 ; Combine results
```
**7 instructions**
### After (pure range)
```asm
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #89
cset w0, lo
```
**4 instructions** (43% reduction!)
## Why This Works
`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings
Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.
## Performance
- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution
Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>


Building on #140 Try a faster hashing function (CityHash) from fasthash.
Since rust-phf don't support custom hasher yet, I wrote our own macros to generate specialized keyword hash tables and tried included the macro in the source code of oxc_parser. However there's some weird issues I can't resolve (It compiles and runs when building the whole workspace but fails to run benchmarks or examples). Therefore I expanded the macro offline and put the file in
crates/oxc_parser/lexer/keyword_generated.rs