Improve parse() performance by romgrk · Pull Request #278 · json5/json5

romgrk · 2022-06-14T01:25:13Z

Changes

While reading the parser code, I noticed a few things that could improve performance. They are summarized below.

Use integer constants for states

I saw that the state constants were strings, and thought that replacing them with named integer constants could be a quick gain. In my benchmarks, doing so improves parse() by ~20%.
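As a rough illustration of the idea, the sketch below dispatches on named integer constants instead of string literals. The state names and the describeState helper are hypothetical and chosen only for the example; the parser defines its own set of states.

```javascript
// Hypothetical state names for illustration; the real parser has its own set.
const STATE_DEFAULT = 0
const STATE_COMMENT = 1
const STATE_VALUE = 2

// Dispatch on an integer state rather than on a string such as 'default'.
function describeState (state) {
    switch (state) {
    case STATE_DEFAULT: return 'default'
    case STATE_COMMENT: return 'comment'
    case STATE_VALUE: return 'value'
    default: throw new Error(`unknown state: ${state}`)
    }
}

console.log(describeState(STATE_COMMENT)) // prints "comment"
```

The named constants keep the code readable while the comparisons themselves operate on small integers.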

Avoid allocation

While looking for other improvements, I also noticed that the token variable was assigned a new object every time. I therefore created an object, tokenRef, which is updated every time token needs to be assigned. I couldn't get a statistically significant improvement here: results were either 1% better or equal. I've included it anyway because removing allocations lightens the garbage collector's workload, which is hard to benchmark but always nice to have.
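A minimal sketch of the reuse pattern, assuming a token has only a type and a value. The newToken helper and the field names are assumptions made for the example, not necessarily what the PR uses.

```javascript
// Hypothetical sketch: one shared token object, overwritten on each call.
const tokenRef = {type: undefined, value: undefined}

function newToken (type, value) {
    tokenRef.type = type
    tokenRef.value = value
    return tokenRef
}

const first = newToken('numeric', 1)
const second = newToken('string', 'a')

console.log(first === second) // prints true: same object, no new allocation
console.log(first.type) // prints "string": the earlier token was overwritten
```

The trade-off is that a caller must finish with one token before requesting the next, since both references point at the same object.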

Use codepoints instead of chars

Similarly to the first point, I thought it could be faster to use (integer) codepoints instead of chars. I ran the parser source through a small script to generate the new code:

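const fs = require('fs')
// Replace each single-character string literal with its codepoint, keeping the literal as a comment.
const newCode = fs.readFileSync('./lib/parse.js').toString()
  .replaceAll(/'(\\\w|.|\\u....)'/g, (m) => `${eval(m).codePointAt(0)} /* ${m} */`)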

I then made sure functions were adapted to receive codepoints instead of chars. This last change had a bigger impact: I would say about a 70-80% improvement over the previous one.
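To show what codepoint-based code can look like after such a rewrite, here is a small sketch that scans a string by codepoint and compares against integer literals annotated with the original character. The isDigit and countDigits functions are hypothetical helpers written for this example, not code from the PR.

```javascript
// Compare integers instead of one-character strings; the block comments
// mirror the annotations produced by the rewrite script.
function isDigit (cp) {
    return cp >= 48 /* '0' */ && cp <= 57 /* '9' */
}

// Hypothetical helper: counts ASCII digits while walking the input by codepoint.
function countDigits (source) {
    let count = 0
    let pos = 0
    while (pos < source.length) {
        const cp = source.codePointAt(pos)
        if (isDigit(cp)) count++
        // Codepoints above U+FFFF occupy two UTF-16 code units.
        pos += cp > 0xFFFF ? 2 : 1
    }
    return count
}

console.log(countDigits('a1b22\u{1F600}3')) // prints 4
```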

Summary

The changes above end up giving an improvement of around 100% compared to the original version. All three are independent of each other, so you could choose to pick only some of them. If this PR seems too daunting to read in one go, each commit is complete and represents one change.

The benchmark code: https://gist.github.com/romgrk/eb4a2a16422e50bc37e811c934a66f8f

Small file:

$ node benchmark.js ./package.json5
parseOriginal x 4,210 ops/sec ±0.93% (95 runs sampled)
parseImproved x 8,285 ops/sec ±0.92% (94 runs sampled)
Fastest is parseImproved

Large file:

$ node benchmark.js ./data.json5
parseOriginal x 255 ops/sec ±0.31% (92 runs sampled)
parseImproved x 544 ops/sec ±0.78% (96 runs sampled)
Fastest is parseImproved

# `data.json5` is `package.json5` pasted a bunch of times in an array, with a few 
# added values to hit the numbers & booleans codepaths
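For reference, the "around 100%" figure can be checked against the ops/sec numbers reported above. The improvementPercent helper below is a hypothetical name used only for this calculation.

```javascript
// Relative improvement of the improved parser over the baseline, in percent.
function improvementPercent (baselineOps, improvedOps) {
    return (improvedOps / baselineOps - 1) * 100
}

console.log(improvementPercent(4210, 8285).toFixed(1)) // small file: prints "96.8"
console.log(improvementPercent(255, 544).toFixed(1)) // large file: prints "113.3"
```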

jordanbtucker · 2022-06-14T19:29:10Z

Thank you for these performance suggestions! Would you mind rebasing? We've made some CI changes, and I'd like to see the results of this PR in CI. Thanks!

romgrk · 2022-06-14T20:56:51Z

Done, I've rebased on main.

edit: And it's failing due to coverage. How can I see which lines aren't covered?

jordanbtucker

Looks great, except for a few minor changes! I'll take a closer look at everything when I have a bit more time.

jordanbtucker · 2022-06-14T22:53:49Z

To view coverage, run npm run coverage. This will create a file at coverage/lcov-report/index.html where you can review coverage in a browser.

romgrk · 2022-06-15T00:10:14Z

Build passing & all good on my side, this can be merged if you're satisfied with it.

jordanbtucker

Everything is looking good. We just need to run npm run lint and fix one issue and a nitpick.

romgrk · 2022-06-15T03:00:33Z

Linting complete.

edit: Oh wait no missed the 4 digits.
edit: Ok now we're good.

jordanbtucker · 2022-06-15T03:41:31Z

Awesome! I'm going to sit on this PR for a day just to make sure I'm not missing anything and vet the code a bit more. I really appreciate the code improvements and well-written description.

romgrk · 2022-10-14T11:58:09Z

Ping @jordanbtucker: is there still interest in this PR? If not, feel free to close.

I should also note, for completeness' sake, that if someone is really after performance, then using a pure JavaScript implementation of JSON5 won't cut it. The native JSON implementation in node/V8 is 20-40x faster in the same benchmarks used above. But saving cycles is always nice.

jordanbtucker

Everything is good except for the hex numbers. It would be great if you could refactor them as constants instead of using block comments to identify them.

jordanbtucker · 2022-10-16T06:04:19Z

If you're not able to make those changes, I don't mind merging this and making them myself.

jordanbtucker · 2022-10-16T06:04:43Z

Also, I think I might rebase this against the v3 branch instead of main.

romgrk · 2022-10-16T06:50:29Z

Everything is good except for the hex numbers. It would be great if you could refactor them as constants instead of using block comments to identify them.

Something like this?

const C_0 = 0x30
const C_1 = 0x31
// ...

jordanbtucker · 2022-10-16T07:03:42Z

Yeah, that works. Maybe use CHAR_0 instead though, just to make it more clear that these represent characters?
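A sketch of how the CHAR_ naming might read in practice. Only CHAR_0 is named in this thread; the other constant names and the isHexDigit helper are assumptions made for the example.

```javascript
// Named character constants replace bare hex numbers with block comments.
// Names other than CHAR_0 are hypothetical.
const CHAR_0 = 0x30
const CHAR_9 = 0x39
const CHAR_UPPER_A = 0x41
const CHAR_UPPER_F = 0x46
const CHAR_LOWER_A = 0x61
const CHAR_LOWER_F = 0x66

function isHexDigit (c) {
    return (c >= CHAR_0 && c <= CHAR_9) ||
        (c >= CHAR_UPPER_A && c <= CHAR_UPPER_F) ||
        (c >= CHAR_LOWER_A && c <= CHAR_LOWER_F)
}

console.log(isHexDigit('f'.codePointAt(0))) // prints true
console.log(isHexDigit('g'.codePointAt(0))) // prints false
```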

romgrk · 2022-11-30T14:45:11Z

I've been trying to find time for this but I've been super busy with work. If you're ok with merging as-is I think it's going to be faster. I've rebased on master, do you also need it rebased on v3?

mbehr1 · 2023-04-15T08:29:56Z

Any update here? IMHO those improvements would be quite useful.

romgrk · 2023-06-11T18:10:18Z

Ping @jordanbtucker, if you can list the minimum required changes to merge this, I might be able to complete it soon.

niczak · 2024-04-20T23:12:51Z

Somewhat criminal this branch was never merged.

romgrk · 2024-04-21T04:15:01Z

Open-source maintainers are free to merge or not; performance might not be a priority for this project. Merging this PR also means the maintainer(s) would need to support these changes, which do decrease readability a bit :|

Note that, from what I remember while benchmarking this PR, even with these changes the native JSON.parse is something like 20x faster. Anyone using this JSON5 parser should be using it because readability matters more than performance for their use case. If you need to pass a lot of data around, use standard JSON. It would be nice to add a note to the readme about the performance characteristics of the project, though, to keep people from using it in cases where it wouldn't be a good fit.

kshetline · 2025-06-01T22:20:42Z

@romgrk, while this PR sits here I've found a home for a somewhat modified version of your suggested changes in my own JSON5 fork. Thanks for the ideas!

jackytank · 2025-06-02T10:27:06Z

+1 vote to merge this PR. I just cloned this PR's branch and ran the benchmark myself, and found that it does indeed improve performance by 2x simply by using integer enums instead of string enums (I lied to make my vote more credible :v)

kshetline · 2025-06-03T01:18:19Z

I added some benchmarking to my own spin-off project. The benchmarks compare the built-in JSON.parse() function, the current JSON5, my old JSON-Z without the above enhancements (plus one other), and my new JSON-Z.

--------- 40 MB sample benchmarks (ranked) ----------
1. JSON      x 11.14 ops/sec ±5.31% (33 runs sampled)
2. JSONZ     x  0.99 ops/sec ±3.17% (7 runs sampled)
3. JSON5     x  0.48 ops/sec ±5.21% (6 runs sampled)
4. Old JSONZ x  0.35 ops/sec ±7.57% (5 runs sampled)
-----------------------------------------------------

--- 60 MB long string sample benchmarks (ranked) ----
1. JSON      x 7.85 ops/sec ±10.93% (24 runs sampled)
2. JSONZ     x 0.75 ops/sec ±2.01% (6 runs sampled)
3. JSON5     x 0.26 ops/sec ±7.38% (5 runs sampled)
4. Old JSONZ x 0.18 ops/sec ±0.61% (5 runs sampled)
-----------------------------------------------------

The test material has to be standard JSON, of course, or JSON.parse() couldn't be included in the benchmarks.

The first sample is a huge sample file I found, naturally enough, by googling for "large JSON sample". There might well be better samples out there, but this is probably decent. (I may try to find another example that exercises number, boolean, and null parsing more.)

The second sample is simply the first sample with a 20MB-long single string tacked on, related to another speed improvement from this PR: #233

Clearly nothing else comes close to JSON.parse() for speed, but at least these enhancements narrow the gap significantly.
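For anyone who wants to reproduce this kind of comparison, a minimal timing loop might look like the sketch below. It is not the benchmark code used for the numbers above: the opsPerSec helper and the generated sample are assumptions, and only JSON.parse is wired in so that the snippet runs without extra packages.

```javascript
// Runs fn repeatedly for roughly durationMs and returns the observed ops/sec.
function opsPerSec (fn, durationMs = 200) {
    const start = performance.now()
    let runs = 0
    let elapsed = 0
    do {
        fn()
        runs++
        elapsed = performance.now() - start
    } while (elapsed < durationMs)
    return runs / (elapsed / 1000)
}

// The sample is standard JSON so that JSON.parse can take part.
const sample = JSON.stringify({
    items: Array.from({length: 1000}, (_, i) => ({id: i, ok: i % 2 === 0}))
})

const parsers = {
    JSON: text => JSON.parse(text)
    // Assumption: other parsers (for example JSON5.parse) would be added here
    // after installing and loading the corresponding package.
}

for (const [name, parse] of Object.entries(parsers)) {
    console.log(`${name} x ${opsPerSec(() => parse(sample)).toFixed(2)} ops/sec`)
}
```

A loop like this gives only a rough figure; it does not report the statistical margin that the results above include.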

jordanbtucker self-assigned this Jun 14, 2022

jordanbtucker added the enhancement 👍 label Jun 14, 2022

romgrk force-pushed the perf-parse branch from 955c456 to 649b6ec Compare June 14, 2022 20:56

jordanbtucker requested changes Jun 14, 2022

romgrk commented Jun 14, 2022

jordanbtucker requested changes Jun 15, 2022

romgrk force-pushed the perf-parse branch from a41f883 to 352f447 Compare June 15, 2022 03:05

jordanbtucker approved these changes Jun 15, 2022

HolgerJeromin reviewed Jun 15, 2022

jordanbtucker added this to the v2.3.0 milestone Jun 18, 2022

jordanbtucker requested changes Oct 16, 2022

romgrk added 8 commits November 30, 2022 09:44

perf(parse): use int constants instead of strings

660518e

perf(parse): use int constants instead of strings

d11595b

perf(parse): avoid allocating object

40fb84a

perf(parse): use codepoints instead of chars

a14f6e5

fix(parse): bugs introduced in previous commit

db672b1

perf(parse): avoid more allocations

86d472b

fix: coverage & node compatibility issues

8880916

fix(util): restore isSpaceSeparator behavior

6176750

lint

a5a08ce

romgrk force-pushed the perf-parse branch from 352f447 to a5a08ce Compare November 30, 2022 14:44

ghost reviewed Jan 9, 2023

romgrk mentioned this pull request Sep 5, 2023

[DataGrid] Make the pinned rows be on top of the no rows overlay mui/mui-x#9986

Merged

1 task
