Conversation
|
Thank you for these performance suggestions! Would you mind rebasing as we've made some CI changes and I'd like to see the results of this PR in CI. Thanks! |
|
Done, I've rebased on main. edit: And it's failing due to coverage. How can I see which lines aren't covered? |
jordanbtucker
left a comment
There was a problem hiding this comment.
Looks great, except for a few minor changes! I'll take a closer look at everything when I have a bit more time.
|
To view coverage, run |
|
Build passing & all good on my side, this can be merged if you're satisfied with it. |
|
Linting complete. edit: Oh wait no missed the 4 digits. |
|
Awesome! I'm going to sit on this PR for a day just to make sure I'm not missing anything and vet the code a bit more. I really appreciate the code improvements and well-written description. |
|
Ping @jordanbtucker: is there still interest for this PR? If not feel free to close. I should also note, for completeness' sake, that if someone is really after performance, then using a pure javascript implementation of JSON5 won't cut it. The native JSON implementation in node/V8 is 20-40x faster in the same benchmarks used above. But saving cycles is always nice. |
|
If you're not able to make those changes, I don't mind merging this and making them myself. |
|
Also, I think I might rebase this against the v3 branch instead of main. |
Something like this? const C_0 = 0x30
const C_1 = 0x31
// ... |
|
Yeah, that works. Maybe use |
|
I've been trying to find time for this but I've been super busy with work. If you're ok with merging as-is I think it's going to be faster. I've rebased on master, do you also need it rebased on v3? |
|
any update here? Imho those improvements would be quite useful. |
|
Ping @jordanbtucker, if you can list the minimum required changes to merge this, I might be able to complete it soon. |
|
Somewhat criminal this branch was never merged. |
|
Open-source maintainers are free to merge or not, performance might not be a priority for this project. Merging this PR also means the maintainer(s) would need to support these changes, which do decrease readability a bit :| Note that from what I remember while benchmarking this PR, even with these changes the native |
This comment was marked as off-topic.
This comment was marked as off-topic.
|
@romgrk, while this PR sits here I've found a home for a somewhat modified version of your suggested changes in my own JSON5 fork. Thanks for the ideas! |
|
+1 vote to merge this PR. as I just clone this PR's branch and do the benchmark myself and found it indeed improve the performance by x2 simply use enum integer instead of enum string (I lied to make my vote more creditable :v) |
|
I added some benchmarking to my own spin-off project. The benchmarking measures the performance of the built-in The test material has to be standard JSON, of course, or The first sample is a huge sample file I found, naturally enough, googling for "large JSON sample". There might well be better samples out there, but this is probably decent. (I'll may try to find another example that exercises number, boolean, and null parsing more.) The second sample is simply the first sample with a 20MB-long single string tacked on, related to another speed improvement from this PR: #233 Clearly nothing else comes close to |
Changes
I was reading the parser code and saw a few things that could improve the performance, they are summarized below.
Use integers constants for states
I saw that the state constants where strings. I thought that it could be a quick gain to replace them with named integer constants. The benchmark results I get are that there is a ~20% improvement to
parse()by doing so.Avoid allocation
While I was looking for other improvements, I also noticed that the
tokenvariable was assigned a new object everytime, I therefore created an objecttokenRef, which is updated every timetokenneeds to be assigned. I couldn't get a statistically significant improvement here, results were either 1% better or equal. But I've included it anyway because removing allocations alleviates the job of the garbage collector, which is something that's hard to benchmark but always nice to have.Use codepoints instead of chars
Similarly to the first point, I thought it could be faster to use (integer) codepoints instead of chars. I ran the code through some code to get new code:
And made sure functions were adapted to receive codepoints instead of chars. This last change had a bigger impact, I would say about 70-80% improvement over the previous one.
Summary
The changes above end up giving an improvement of around 100% compared to the original version. All three of them are independant of each other, so you could choose to pick some of them only. If this PR seems too dauting to read in one go, each commit is complete and represents one change.
The benchmark code: https://gist.github.com/romgrk/eb4a2a16422e50bc37e811c934a66f8f
Small file:
Large file: