perf(lexer): try perfect hash table for keyword lookup by YangchenYe323 · Pull Request #171 · oxc-project/oxc

YangchenYe323 · 2023-03-12T04:51:48Z

Building on #140 Try a faster hashing function (CityHash) from fasthash.

Since rust-phf don't support custom hasher yet, I wrote our own macros to generate specialized keyword hash tables and tried included the macro in the source code of oxc_parser. However there's some weird issues I can't resolve (It compiles and runs when building the whole workspace but fails to run benchmarks or examples). Therefore I expanded the macro offline and put the file in crates/oxc_parser/lexer/keyword_generated.rs

YangchenYe323 · 2023-03-12T04:59:30Z

Seems fasthash cannot build on Windows 🤦‍♂️. Here's my isolated benchmark result, just running the match_keyword on a bunch of JQuery words (Different hashing, no Simd):

github-actions · 2023-03-12T05:33:50Z

Parser Benchmark Results - ubuntu-latest

group                      main                                   pr
-----                      ----                                   --
parser/babylon.max.js      1.00    108.7±0.21ms    95.0 MB/sec    1.02    110.9±0.28ms    93.1 MB/sec
parser/d3.js               1.00     14.0±0.04ms    39.1 MB/sec    1.02     14.2±0.04ms    38.5 MB/sec
parser/lodash.js           1.00      4.0±0.11ms   129.9 MB/sec    1.01      4.0±0.06ms   128.6 MB/sec
parser/pdf.js              1.00      7.7±0.07ms    52.4 MB/sec    1.02      7.8±0.08ms    51.6 MB/sec
parser/typescript.js       1.00    110.6±0.23ms    87.0 MB/sec    1.02    112.8±0.22ms    85.3 MB/sec
semantic/babylon.max.js    1.00    166.8±4.03ms    61.9 MB/sec    1.00    166.5±3.32ms    62.0 MB/sec
semantic/d3.js             1.00     16.8±1.61ms    32.4 MB/sec    1.18     19.8±2.36ms    27.6 MB/sec
semantic/lodash.js         1.19      5.0±0.31ms   101.9 MB/sec    1.00      4.2±0.10ms   121.3 MB/sec
semantic/pdf.js            1.00      8.6±1.78ms    46.9 MB/sec    1.16     10.0±1.92ms    40.3 MB/sec
semantic/typescript.js     1.00    130.8±0.94ms    73.5 MB/sec    1.01    131.6±0.84ms    73.1 MB/sec

github-actions · 2023-03-12T05:37:46Z

Parser Benchmark Results - windows-latest

group                      main                                   pr
-----                      ----                                   --
parser/babylon.max.js      1.00    151.6±4.30ms    68.1 MB/sec    1.03    156.3±6.19ms    66.0 MB/sec
parser/d3.js               1.00     18.9±0.46ms    28.9 MB/sec    1.05     19.8±0.72ms    27.5 MB/sec
parser/lodash.js           1.00      5.4±0.35ms    95.7 MB/sec    1.04      5.6±0.58ms    92.0 MB/sec
parser/pdf.js              1.00     10.6±0.28ms    37.8 MB/sec    1.02     10.9±0.36ms    37.0 MB/sec
parser/typescript.js       1.00    153.2±4.49ms    62.8 MB/sec    1.04    159.2±6.37ms    60.4 MB/sec
semantic/babylon.max.js    1.00   299.4±17.83ms    34.5 MB/sec    1.01   303.7±22.44ms    34.0 MB/sec
semantic/d3.js             1.00     29.3±0.93ms    18.7 MB/sec    1.10     32.1±1.64ms    17.0 MB/sec
semantic/lodash.js         1.09      6.4±0.50ms    79.8 MB/sec    1.00      5.9±0.24ms    86.6 MB/sec
semantic/pdf.js            1.01     12.4±0.45ms    32.5 MB/sec    1.00     12.2±0.93ms    33.0 MB/sec
semantic/typescript.js     1.00    234.8±9.07ms    41.0 MB/sec    1.08   253.7±19.16ms    37.9 MB/sec

YangchenYe323 · 2023-03-12T05:49:06Z

I wonder why this is even worse than rust-phf...

Boshen · 2023-03-12T06:10:35Z

I think hashing on less values is the key here, then the next step is to change the hashing function, and then the phf table.

Boshen · 2023-03-12T14:03:33Z

It seems the performance of match_keyword function is so small that, you need at least a constant 5-10ms change on the typescript benchmark for it to be noticable.

YangchenYe323 · 2023-03-12T22:07:06Z

I think hashing on less values is the key here, then the next step is to change the hashing function, and then the phf table.

@Boshen After all my tweaking, my guess is that the cost of the routine is dominated by stuff in the CompactStr crate. The difference between matching and hashing is easily overshadowed. The creation, cloning of CompactStr all involves lots of branches to distinguish between heap and stack based strings.

I'm thinking maybe we should try avoiding the eager materialization of TokenValue? It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases.

Boshen · 2023-03-13T00:28:51Z

I'm thinking maybe we should try avoiding the eager materialization of TokenValue? It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases.

Yes, I've been looking at this routine as well.

Most of the token values are useless, the code was originally from jsparagus, which is needed for their parser generator.

It seems to me that a materialized TokenValue is redundant with the reference of the source text plus a Span at least in most cases.

It was done this way for ease of use, since we don't have to deal with &'a str any more.

Shall we try and make the token values lazy (for keywords only) and then come back to this?

Boshen · 2023-03-13T00:45:25Z

Shall we try and make the token values lazy (for keywords only) and then come back to this?

I'll work on this.

YangchenYe323 · 2023-03-13T00:55:03Z

Shall we try and make the token values lazy (for keywords only) and then come back to this?

I'll work on this.

I'm actually running some experiments to remove materializing token values for both keywords and identifiers. I noticed the into_bump_str can already turn an owned string to an allocator-managed reference. I'm currently looking into making a new variantTokenValue::Str(&'a str) and move materialization away from parsing logic.

Boshen · 2023-03-13T01:38:02Z

Shall we try and make the token values lazy (for keywords only) and then come back to this?

I'll work on this.

I'm actually running some experiments to remove materializing token values for both keywords and identifiers. I noticed the into_bump_str can already turn an owned string to an allocator-managed reference. I'm currently looking into making a new variantTokenValue::Str(&'a str) and move materialization away from parsing logic.

I changed it on main.

and move materialization away from parsing logic

This may require a lot of work, but you can try. The only problem I have with String<'a> is that all downstream code will become very hard to write.

Boshen · 2023-03-13T02:13:38Z

+
+const HASH_TABLE_SIZE: usize = 512usize;
+const HASH_TABLE_SEED: u32 = 7_484_039u32;
+static KEYWORD_HASH_TABLE: [(Kind, &'static str); HASH_TABLE_SIZE] = [


Two questions:

Is &'static str still needed here?

The table is so big! I wonder if loading this table in and out of cpu cache out weighs every things.

Interesting thought.
On the one hand, insofar as the hash function is poor (far from uniform distribution / high collisions), it's possible whole cache line(s) of the table are never touched.
On the other hand, it only takes a single entry hit to pay the price - and if the distribution is poor anyhow then there's also no benefit in having the larger table.

512 does seem unnecessarily large.
You can try increasing # of attempts and see if you get a 128 or 256 size table (without taking too long).

YoniFei · 2023-03-13T03:34:56Z

+    let hash_code = hash::hash_u32(extract, HASH_TABLE_SEED);
+    let idx = hash_code as usize % HASH_TABLE_SIZE;
+    let (mut kind, key) = KEYWORD_HASH_TABLE[idx];
+    if s.len() != key.len() || s != key {


This is where having cmov instead of branches can make a significant difference.
Can try splitting

if s.len() != key.len() { kind = Ident } if s != key { kind = Ident; }

and seeing what LLVM spits out.

Or use cmov crate?

YoniFei · 2023-03-13T03:58:26Z

This may require a lot of work, but you can try. The only problem I have with String<'a> is that all downstream code will become very hard to write.

(Unrelated to this PR)
About this:
I was looking at rustc's code to see what they do for keyword matching.
They just use a regular map with FxHash, but they do the lookup coupled with their own string interning.

Anyhow, I saw this:

        // SAFETY: we can extend the arena allocation to `'static` because we
        // only access these while the arena is still alive.
        let string: &'static str = unsafe { &*(string as *const str) };

They use an arena for interned strings, but they avoid needing to propagate lifetimes everywhere.
"Cheat" for convenience, because you know that you have a high-level guarantee - usage won't outlive the arena.

FYI for your consideration (you probably already thought of it, but just in case)

Boshen · 2023-03-13T05:12:17Z

(you probably already thought of it, but just in case)

TIL! Thank you again for looking into the dark arts of Rust.

    let string: &'static str = unsafe { &*(string as *const str) };

This is making me really uncomfortable 🤯

YangchenYe323 · 2023-03-13T21:22:36Z

I've attached a profiling result generated from pprof-rs for parsing lodash.js. Maybe the time spent on lexer is really trivial so it's not worth all the hassle.

⬇️ Time related to lexer in a 30s sample.

I couldn't find a way to display the whole picture nicely... I thought of introducing pprof to the codebase but it seg-faulted on typescript.js

Boshen · 2023-03-13T23:23:43Z

@YangchenYe323 Yes, let's stop here, in conclusion the performance improvement is negligible compared to the rest.

All your work is this still valuable and we learned so much in the process, some of the code is also improved along the way. We also recorded all the process for future references.

Again, thank you so much for trying things out and learning with us on this topic.

Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses. ## Changes **Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`. **Before**: ```rust pub fn is_any_keyword(self) -> bool { matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8) || matches!(self, True | False | Null) // Extra check needed! } ``` **After**: ```rust pub fn is_any_keyword(self) -> bool { matches!(self as u8, x if x >= Await as u8 && x <= Null as u8) // Pure range check! } ``` ## Assembly Impact ### Before (range + OR) ```asm ; Range check for Await..=Yield and w8, w0, #0xff sub w8, w8, #5 cmp w8, #86 cset w9, lo ; Additional checks for True/False/Null cmp w0, #168 ccmp w0, #171, #0, ne cset w0, lo orr w0, w9, w0 ; Combine results ``` **7 instructions** ### After (pure range) ```asm and w8, w0, #0xff sub w8, w8, #5 cmp w8, #89 cset w0, lo ``` **4 instructions** (43% reduction!) ## Why This Works `True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec: - Keywords: Reserved words that cannot be used as identifiers - Literals: Values with specific meanings Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness. ## Performance - Called from `advance()` on **every token** - **3 fewer instructions** on the hottest path - **Simpler control flow** = better branch prediction - **No OR operation** = faster execution Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from dc96b22 to 776750a Compare March 12, 2023 04:52

YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch 5 times, most recently from 2c485ce to b6ef0a3 Compare March 12, 2023 09:39

Boshen changed the title ~~perf(lexer): try perfect hash table based on CityHash~~ perf(lexer): try perfect hash table for keyword lookup Mar 12, 2023

Boshen mentioned this pull request Mar 12, 2023

perf(lexer): low hanging fruits #151

Closed

YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 2cacefd to 29c1e0d Compare March 12, 2023 21:29

Boshen mentioned this pull request Mar 13, 2023

refactor(lexer): change TokenValue::String(Atom) to TokenValue::String(&str) #174

Merged

YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 29c1e0d to 7f2e6fd Compare March 13, 2023 01:52

Boshen reviewed Mar 13, 2023

View reviewed changes

YoniFei reviewed Mar 13, 2023

View reviewed changes

Comment thread crates/oxc_parser/src/lexer/kind.rs

YoniFei reviewed Mar 13, 2023

View reviewed changes

YangchenYe323 added 2 commits March 13, 2023 13:03

perf(lexer): try phf for keyword matching

0d6e6df

Separate kind & str and use cmov intrinsics

d7bdb3d

YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 7f2e6fd to d7bdb3d Compare March 13, 2023 18:50

YangchenYe323 added 2 commits March 13, 2023 14:15

Use 256 hash table

56349f4

Move table generation to task/phf_gen

b05c951

YangchenYe323 force-pushed the yangchen/perf/perfect-fast-hash branch from 2f881e3 to b05c951 Compare March 13, 2023 20:51

YangchenYe323 closed this Mar 14, 2023

Boshen deleted the yangchen/perf/perfect-fast-hash branch April 15, 2023 10:26

Uh oh!

Uh oh!

Conversation

YangchenYe323 commented Mar 12, 2023

Uh oh!

YangchenYe323 commented Mar 12, 2023

Uh oh!

github-actions Bot commented Mar 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Parser Benchmark Results - ubuntu-latest

Uh oh!

github-actions Bot commented Mar 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Parser Benchmark Results - windows-latest

Uh oh!

YangchenYe323 commented Mar 12, 2023

Uh oh!

Boshen commented Mar 12, 2023

Uh oh!

Boshen commented Mar 12, 2023

Uh oh!

YangchenYe323 commented Mar 12, 2023

Uh oh!

Boshen commented Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boshen commented Mar 13, 2023

Uh oh!

YangchenYe323 commented Mar 13, 2023

Uh oh!

Boshen commented Mar 13, 2023

Uh oh!

Boshen Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

YoniFei Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

YoniFei Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

YoniFei commented Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boshen commented Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

YangchenYe323 commented Mar 13, 2023

Uh oh!

Boshen commented Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions Bot commented Mar 12, 2023 •

edited

Loading

github-actions Bot commented Mar 12, 2023 •

edited

Loading

Boshen commented Mar 13, 2023 •

edited

Loading

Boshen Mar 13, 2023 •

edited

Loading

YoniFei Mar 13, 2023 •

edited

Loading

YoniFei Mar 13, 2023 •

edited

Loading

YoniFei commented Mar 13, 2023 •

edited

Loading

Boshen commented Mar 13, 2023 •

edited

Loading

Boshen commented Mar 13, 2023 •

edited

Loading