Skip to content

codegen performance: use trim instead of lodash/trimEnd#5255

Merged
loganfsmyth merged 4 commits intobabel:masterfrom
jwbay:codegen-perf
Feb 9, 2017
Merged

codegen performance: use trim instead of lodash/trimEnd#5255
loganfsmyth merged 4 commits intobabel:masterfrom
jwbay:codegen-perf

Conversation

@jwbay
Copy link
Copy Markdown
Contributor

@jwbay jwbay commented Feb 2, 2017

Q A
Patch: Bug Fix? yes
Major: Breaking Change? no
Minor: New Feature? no
Deprecations? no
Spec Compliancy? no
Tests Added/Pass? yes
Fixed Tickets #5081
License MIT
Doc PR no
Dependency Changes no

lodash/trimEnd executes a regex against potentially massive strings which can freeze node. Instead find out how much we need to trim off character by character, then slice it off all at once

Fixes #5081

Benchmark:

import Buffer from './buffer';

const buf = new Buffer();
const str = 'x \n \t\n '.repeat(1000000); //string length of 7 million chars

for (let index = 0; index < 20; index++) {
    const time = process.hrtime();
    const result = buf._trimEnd(str);
    const diff = process.hrtime(time);
    const timeNS = diff[0] * 1e9 + diff[1];
    const timeMS = timeNS / 1000000;
    console.log(`${timeMS} ms`);
}

Result, first with lodash, then with local implementation:

bash-3.2$ node lib/perftest
106.521502 ms
111.613859 ms
110.619436 ms
108.626456 ms
106.573851 ms
118.773115 ms
115.758886 ms
114.970173 ms
113.741202 ms
112.46398 ms
111.008178 ms
113.651342 ms
113.615203 ms
116.468036 ms
109.902413 ms
105.105529 ms
118.988414 ms
119.775734 ms
116.280489 ms
115.562662 ms
bash-3.2$ node lib/perftest
5.051242 ms
0.066738 ms
0.066952 ms
0.010192 ms
0.00808 ms
0.013173 ms
0.006453 ms
0.005716 ms
0.011435 ms
0.004782 ms
0.004849 ms
0.005258 ms
0.004975 ms
0.009258 ms
0.004691 ms
0.004398 ms
0.008326 ms
0.004975 ms
0.249306 ms
0.006354 ms

@mention-bot
Copy link
Copy Markdown

@jwbay, thanks for your PR! By analyzing the history of the files in this pull request, we identified @loganfsmyth, @davidaurelio and @jridgewell to be potential reviewers.

@jwbay jwbay changed the title implement local trimEnd for code generation codegen performance: implement local trimEnd Feb 2, 2017
_trimEnd(str: string): string {
let trimTarget = str.length;
const whitespace = /\s/;
while (trimTarget > 0 && str[trimTarget - 1].match(whitespace)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace.test(str[trimTarget - 1])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware of the perf difference. Done, thanks!

const map = this._map;
const result = {
code: trimEnd(this._buf.join("")),
code: this._trimEnd(this._buf.join("")),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is trimming actually necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a new contributor I'm not really equipped to answer this question. Trailing whitespace doesn't seem important, but I guess there could be misbehaving plugins that introduce large amounts of it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrewML @existentialism Do you know?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To throw out a guess, the generator injects lots of newlines trying to make formatting match, so this probably keeps the end-of-file formatting the same even through the node type at the end of the file may differ.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ That's what I assumed it was as well

@codecov-io
Copy link
Copy Markdown

codecov-io commented Feb 2, 2017

Codecov Report

Merging #5255 into master will not impact coverage.

@@           Coverage Diff           @@
##           master    #5255   +/-   ##
=======================================
  Coverage   89.22%   89.22%           
=======================================
  Files         203      203           
  Lines        9885     9885           
  Branches     2663     2663           
=======================================
  Hits         8820     8820           
  Misses       1065     1065
Impacted Files Coverage Δ
packages/babel-generator/src/buffer.js 100% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c3392f...56c5984. Read the comment docs.

@jridgewell
Copy link
Copy Markdown
Member

oh, or there's https://www.npmjs.com/package/trim-right which already has this optimization (from none other than @kittens).

@hzoo
Copy link
Copy Markdown
Member

hzoo commented Feb 2, 2017

Wow a good find @jridgewell! #2008

@erikdesjardins
Copy link
Copy Markdown
Contributor

erikdesjardins commented Feb 2, 2017

For reference: this was originally changed from trim-right to lodash in #3500; should probably add a comment so it doesn't get switched back again

@danez
Copy link
Copy Markdown
Member

danez commented Feb 2, 2017

And maybe we can try to get this improvement into lodash? :)

this._queue.unshift([str, line, column, identifierName, filename]);
}

_trimEnd(str: string): string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we throw a comment on here so it is clear why we do this rather than Lodash? Over time it'll be easy to lose info about which issue introduced this, and why.

Copy link
Copy Markdown
Member

@hzoo hzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya a comment sounds good

also cc @jdalton for reference.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 3, 2017

Nice find!

I haven't optimized trim yet (was waiting on a scenario before I took a look at it).
This looks like an easy tweak to include in future releases of Lodash.

@DrewML
Copy link
Copy Markdown
Member

DrewML commented Feb 3, 2017

Thanks for doing this @jwbay. I've been super busy with work, so your contribution is very appreciated!

/cc @DmitriiAbramov since Jest is waiting on this fix for a new release.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 3, 2017

Ah, checking a bit more: This was a case where I had previously optimized it but thought it was a bit overkill. But now there's a nice scenario, so 🤘.

Out of curiosity @jwbay could you test it with lodash.trimright (the v3 version of _.trimEnd). It used things like charCodeAt instead of index lookup, but it does a similar kind of optimization as this PR. I'm wondering if I could essentially just revert to get the wins.

@existentialism
Copy link
Copy Markdown
Member

existentialism commented Feb 3, 2017

Ran the above with lodash.trimright and it returns similar results:

4.843494 ms
0.023119 ms
0.020604 ms
0.002296 ms
0.001449 ms
0.00153 ms
0.00117 ms
0.001173 ms
0.001042 ms
0.000889 ms
0.000904 ms
0.000948 ms
0.000933 ms
0.023671 ms
0.004273 ms
0.001282 ms
0.000983 ms
0.000921 ms
0.002614 ms
0.000893 ms

@jridgewell
Copy link
Copy Markdown
Member

@jdalton: Note that the new code is considerably slower when there are a large amounts of spaces at the end. We could probably speed it up further by doing a charCodeAt comparison so everything that matches \s. But this PR is enough for Babel's use case.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 3, 2017

Thanks @existentialism!

Ran the above with lodash.trimright and it returns similar results

Would you mind sanity checking the built-in 'string'.trimRight() usage too.
Please specify the version of Node you tested (I'm interested in v4, v6, v7)

@jridgewell

Note that the new code is considerably slower when there are a large amounts of spaces at the end. We could probably speed it up further by doing a charCodeAt comparison so everything that matches \s. But this PR is enough for Babel's use case.

That's what Lodash v3 _.trimRight does.

@existentialism
Copy link
Copy Markdown
Member

@jdalton

I ran the tests on node 6.9.4.

With String.prototype.trimRight():

4.194117 ms
0.06576 ms
0.0295 ms
0.001513 ms
0.000977 ms
0.001038 ms
0.000771 ms
0.000657 ms
0.000675 ms
0.000616 ms
0.000533 ms
0.000589 ms
0.004922 ms
0.000634 ms
0.000528 ms
0.000578 ms
0.000621 ms
0.000531 ms
0.000584 ms
0.000576 ms

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 3, 2017

@existentialism Great!

Ya, I remember it being pretty speedy.
Here's Node v4.7.3 numbers for String#trimRight:

27.201798 ms
0.010351 ms
0.002709 ms
0.001111 ms
0.000863 ms
0.000719 ms
0.00078 ms
0.00075 ms
0.000703 ms
0.000747 ms
0.000757 ms
0.00074 ms
0.000728 ms
0.000742 ms
0.000726 ms
0.000743 ms
0.000796 ms
0.000711 ms
0.000716 ms
0.000722 ms

Looks like a quick fix (this PR) for v6 but then defer to built-in for v7 would do.

Update:
Actually, here are the Node 0.12.18 numbers:

0.083213 ms
0.009484 ms
0.002044 ms
0.001321 ms
0.00106 ms
0.000942 ms
0.00086 ms
0.000897 ms
0.001407 ms
0.001589 ms
0.001432 ms
0.001167 ms
0.0015 ms
0.001465 ms
0.00106 ms
0.000834 ms
0.000871 ms
0.00085 ms
0.00081 ms
0.000839 ms

Maybe just go with the built-in all up.

@jwbay
Copy link
Copy Markdown
Contributor Author

jwbay commented Feb 4, 2017

I switched to the built-in and left a comment in case it needs to be switched to something else in the future.

@jwbay jwbay changed the title codegen performance: implement local trimEnd codegen performance: use trimRight instead of lodash/trimEnd Feb 4, 2017
@loganfsmyth
Copy link
Copy Markdown
Member

Only .trim() is standard, .trimRight is a proposal that is now .trimEnd.

It looks like two output format tests fail if we just use .trim() but I'm fine if we update those tests to trim both ends.

@jwbay jwbay changed the title codegen performance: use trimRight instead of lodash/trimEnd codegen performance: use trim instead of lodash/trimEnd Feb 4, 2017
@jwbay
Copy link
Copy Markdown
Contributor Author

jwbay commented Feb 4, 2017

Ok, switched to trim. Here's the run on node 6:

4.412384 ms
0.034078 ms
0.03672 ms
0.002295 ms
0.002837 ms
0.002182 ms
0.001503 ms
0.001301 ms
0.001549 ms
0.001427 ms
0.001174 ms
0.001238 ms
0.001239 ms
0.001245 ms
0.001323 ms
0.001315 ms
0.001256 ms
0.001122 ms
0.001051 ms
0.001234 ms

@babel-bot
Copy link
Copy Markdown
Collaborator

Hey @jwbay! It looks like one or more of your builds have failed. I've copied the relevant info below to save you some time.

jwbay added 3 commits February 4, 2017 16:19
lodash/trimEnd executes a regex against potentially massive strings which can freeze node. instead find out how much we need to trim off character by character, then slice it off all at once
@@ -1,4 +1,2 @@
#!/usr/bin/env node


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@loganfsmyth: Does this affect sourcemaps?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Good call, dang.

Copy link
Copy Markdown
Member

@loganfsmyth loganfsmyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jridgewell is totally right, we can't use .trim() because the sourcemaps will point to the wrong lines.

@jwbay
Copy link
Copy Markdown
Contributor Author

jwbay commented Feb 4, 2017

Should we go back to local impl. or use https://www.npmjs.com/package/trim-right?

@loganfsmyth
Copy link
Copy Markdown
Member

Why don't we just use trim-right and call it good.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 4, 2017

@loganfsmyth

Why don't we just use trim-right and call it good.

Because trimRight is built-in and supported by all versions of Node. Why add another dep when it's completely not needed? The trimRight and trimLeft flavors have been in engines as long before trim was standardized.

@loganfsmyth
Copy link
Copy Markdown
Member

@jdalton It'd break the Web REPL on IE, at least according to MDN.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 4, 2017

@loganfsmyth isn't that a core-js concern?

@loganfsmyth
Copy link
Copy Markdown
Member

Babel relies on babel-runtime but that doesn't polyfill prototype methods, so it'd just throw.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 4, 2017

What is babel-runtime's browser support?
If it's not shimmable and babel-runtime is tied to browser support then that would be a problem.
Have you benchmarked trim-right?

@loganfsmyth
Copy link
Copy Markdown
Member

As a library, Babel doesn't make any assumptions about non-ES5 method availability and it relies on transform-runtime to pick up uses of ES6 functionality and points them at core-js's non-global polyfill library. babel-runtime's browser support is the same as core-js, it just doesn't expose things that can't be detected confidently with static analysis, like prototype methods.

trim-right was linked above in #5255 (comment) as something that has this optimization.

We can also just wait if you plan to optimize this in Lodash. I have no string feelings about how we fix it.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 4, 2017

As a library, Babel doesn't make any assumptions about non-ES5 method availability and it relies on transform-runtime to pick up uses of ES6 functionality and points them at core-js's non-global polyfill library. babel-runtime's browser support is the same as core-js, it just doesn't expose things that can't be detected confidently with static analysis, like prototype methods.

What's core-js browser support? I'm getting at if IE8 is involved (I hope not) but if it is then character access by index is not supported.

trim-right was linked above in #5255 (comment) as something that has this optimization.

Ya, it looks like it's comparable:

0.180566 ms
0.018806 ms
0.026337 ms
0.002897 ms
0.002399 ms
0.002713 ms
0.002148 ms
0.002398 ms
0.00205 ms
0.002236 ms
0.001994 ms
0.002223 ms
0.003143 ms
0.002006 ms
0.001956 ms
0.001909 ms
0.001959 ms
0.001976 ms
0.219092 ms
0.002705 ms

We can also just wait if you plan to optimize this in Lodash. I have no string feelings about how we fix it.

Lodash is into its v5 development. v4 updates are restricted to critical bugs. So this optimization won't come until v5 is released (some time away)

@loganfsmyth
Copy link
Copy Markdown
Member

What's core-js browser support? I'm getting at if IE8 is involved (I hope not) but if it is then character access by index is not supported.

Wait, IE8 doesn't support indexed string access? I'd say IE8 is beyond Babel's supported versions. I actually don't think we have an official number, but using .trimRight will certainly introduce something that will break on IE so avoiding it seems practical. Since .trimRight is not standards-track, I'm against relying on it. Does it exist in Nashorn, for example? I have no idea.

Since we can't wait on Lodash v5, and we can get acceptable performance from trim-right, that's my vote.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 5, 2017

Wait, IE8 doesn't support indexed string access?

Right-ish. It's wonky.

I'd say IE8 is beyond Babel's supported versions.

Yiss! It'd be nice to have Babel's support clearly outlined though to make things,
like deciding which APIs to use, easier.

Does it exist in Nashorn, for example? I have no idea.

Does Babel support Nashorn? Are you testing Nashorn?

Since we can't wait on Lodash v5, and we can get acceptable performance from trim-right, that's my vote.

Yap. Lodash 5 will actually rely on trimRight or trimEnd (so won't be shimming it).

@xtuc
Copy link
Copy Markdown
Member

xtuc commented Feb 5, 2017

Nashorn or IE aren't officially supported according to compiler-environment-support.md

I agree with @loganfsmyth, this is not worth breaking them.

@jdalton
Copy link
Copy Markdown
Member

jdalton commented Feb 5, 2017

@xtuc

Nashorn or IE aren't officially supported according to compiler-environment-support.md

Thanks!

I agree with @loganfsmyth, this is not worth breaking them.

Got it. Don't break enviros that are untested with unknown (un)supported states. 👌 😸

@hzoo
Copy link
Copy Markdown
Member

hzoo commented Feb 6, 2017

We are going to drop transform-runtime in 7.0 btw

@loganfsmyth loganfsmyth merged commit 1a325ce into babel:master Feb 9, 2017
@jwbay jwbay deleted the codegen-perf branch February 9, 2017 21:05
@hzoo hzoo added the PR: Internal 🏠 A type of pull request used for our changelog categories label Feb 13, 2017
@lock lock bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 6, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: generator PR: Internal 🏠 A type of pull request used for our changelog categories

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Code size limit bump (#4965) in babel-generator causes issues for very large files