Skip to content

SD-225 Use a "lzycompute" method for module initialization#5398

Merged
retronym merged 1 commit intoscala:2.12.0from
retronym:ticket/SD-225
Sep 15, 2016
Merged

SD-225 Use a "lzycompute" method for module initialization#5398
retronym merged 1 commit intoscala:2.12.0from
retronym:ticket/SD-225

Conversation

@retronym
Copy link
Member

@retronym retronym commented Sep 14, 2016

The monitors and module instantation were inliuned into the module accessor
method in b2e0911. However, this seems to have had a detrimental impact on
performance. This might be because the module accessors are now above the
"always inline" HotSpot threshold of 35 bytes, or perhaps because they
contain monitor-entry/exit and exception handlers.

This commit returns to the the 2.11.8 appraoch of factoring
the the second check of the doublecheck locking into a method.
I've done this by declaring a nested method within the accessor;
this will be lifted out to the class level by lambdalift.

This represents a slight deviation from the implementation strategy used
for lazy accessors, which create a symbol for the slowpath method in
the info transform and generate the corresponding DefDef as a class
member. I don't believe this deviation is particular worrisome, though.

I have bootstrapped the compiler through this commit and found that
the drastic regression in compiling the shapeless test suite is solved.

Fixes scala/scala-dev#225

@scala-jenkins scala-jenkins added this to the 2.12.0-RC2 milestone Sep 14, 2016
@retronym
Copy link
Member Author

Review by @adriaanm | @lrytz

@adriaanm
Copy link
Contributor

Looks like it needs a junit test update. Otherwise LGTM

@lrytz
Copy link
Member

lrytz commented Sep 14, 2016

LGTM, too

The monitors and module instantation were inliuned into the module accessor
method in b2e0911. However, this seems to have had a detrimental impact on
performance. This might be because the module accessors are now above the
"always inline" HotSpot threshold of 35 bytes, or perhaps because they
contain monitor-entry/exit and exception handlers.

This commit returns to the the 2.11.8 appraoch of factoring
the the second check of the doublecheck locking into a method.
I've done this by declaring a nested method within the accessor;
this will be lifted out to the class level by lambdalift.

This represents a slight deviation from the implementation strategy used
for lazy accessors, which create a symbol for the slowpath method in
the info transform and generate the corresponding DefDef as a class
member. I don't believe this deviation is particular worrisome, though.

I have bootstrapped the compiler through this commit and found that
the drastic regression in compiling the shapeless test suite is solved.
@retronym
Copy link
Member Author

retronym commented Sep 15, 2016

I've run a few more benchmarks that show the same regression in 2.12.0-RC1 and restored performance after bootstrapping through this change (highlighted below).

image

@retronym retronym merged commit be43eb5 into scala:2.12.0 Sep 15, 2016
@milessabin
Copy link
Contributor

👏

@milessabin
Copy link
Contributor

milessabin commented Sep 15, 2016

After this I'm seeing,

2.12.0 branch
coreJVM/compile 39s
coreJVM/test:compile 238s

Which is a significant improvement over RC1 but still almost twice as long as 2.11.8.

@retronym
Copy link
Member Author

retronym commented Sep 16, 2016

@milessabin Did you use a bootstrapped compiler when collecting those numbers? Until #5403 is merged, we're still using 2.12.0-RC1 as STARR.

@milessabin
Copy link
Contributor

milessabin commented Sep 16, 2016

I did a publishLocal using @adriaanm's recipe, then rebuilt with -Dstarr.version=..., then dist/mkPack and built shapeless and its tests using ++2.12.0-RC1=... path to pack ....

@milessabin
Copy link
Contributor

If you have an alternative recipe for testing please let me know and I'll try it out.

@retronym
Copy link
Member Author

I've just tested against 2.12.0-d274405-SNAPSHOT (the artifact from #5403).

I benchmarked the cold performance of compiling shapeless core with this version and 2.11.8.

Results: https://gist.github.com/retronym/220345f697a18a0facc259556a8a73b2

[info] Benchmark                    (_scalaVersion)  (extraArgs)                                      (source)  Mode  Cnt      Score     Error  Units
[info] ColdScalacBenchmark.compile           2.11.8               @/Users/jz/code/shapeless/args-2.11-core.txt    ss   12  19463.820 ± 453.189  ms/op
[info] ColdScalacBenchmark.compile  2.12.0-d274405-SNAPSHOT               @/Users/jz/code/shapeless/args-2.12-core.txt    ss   12  22751.384 ± 727.376  ms/op

This does show a performance regression of around ~15%, which I'm interested to investigate further.

For the tests project (only one iteration this time)

[info] Benchmark                    (_scalaVersion)  (extraArgs)                                      (source)  Mode  Cnt       Score   Error  Units
[info] ColdScalacBenchmark.compile           2.11.8               @/Users/jz/code/shapeless/args-2.11-test.txt    ss       113177.188          ms/op
[info] ColdScalacBenchmark.compile  2.12.0-d274405-SNAPSHOT               @/Users/jz/code/shapeless/args-2.12-test.txt    ss       131956.216          ms/op

That suggests a similar slowdown. But neither of these matches with your report of a 2x slowdown.

@milessabin
Copy link
Contributor

milessabin commented Sep 18, 2016

My latest timings ...

For 2.12.0 + #5403 + dist/mkPack, latest from shapeless master,

% sbt -Dsbt.profile=2.12.x
> ++2.12.0-RC1=/home/miles/projects/scala/build/pack
> clean
> coreJVM/compile
> coreJVM/test:compile

coreJVM/compile 37s
coreJVM/test:compile 200s

For 2.11.8, latest from shapeless master,

% sbt -Dsbt.profile=2.12.x
> ++2.11.8
> clean
> coreJVM/compile
> coreJVM/test:compile

coreJVM/compile 33s
coreJVM/test:compile 123s

So, I'm seeing about the same slowdown as you for coreJVM/compile but a much bigger slowdown for coreJVM/test:compile ... consistently 1.6x with my latest 2.12.0 build.

Are we compiling the exact same tests? Is it possible that you excluded some to avoid the constant val elimination problem?

Alternatively, am I doing something wrong?

@milessabin
Copy link
Contributor

milessabin commented Sep 18, 2016

(nb. the -Dsbt.profile=2.12.x just disables sbt-scoverage which doesn't have 2.12.x support yet).

@retronym
Copy link
Member Author

retronym commented Sep 18, 2016

@milessabin Just doing dist/mkPack will not use -opt when building. I suggest using ;setupPublishCore;dist/mkPack;publishLocal

Here are the compiler command lines I used (includind selection of test source files) https://gist.github.com/retronym/8bca64252f68fbfb354593af7b53ae74

I was also compiling outside of SBT. It is conceivable that there is an additional regression in the incremental compiler (which adds two compiler phases to extract dependencies and APIs from the compiled files).

@milessabin
Copy link
Contributor

I'm afraid I can't reproduce your results, even building with scalac outside SBT.

I'm quite prepared to believe that -optimise (or rather the lack of it) is the issue here, but if it is then none of the instructions above seem to be enough to coax it into being applied.

@milessabin
Copy link
Contributor

Could you publish a snapshot that I can test here?

@milessabin
Copy link
Contributor

Compiling with plain scalac, command line similar to yours above I see,

2.11.8  129s
2.12.x  155s

So 1.2x in this case ... does that line up with what you're seeing?

In SBT I'm seeing,

2.11.8  138s
2.12.x  237s

ie. 1.7x, which suggests that your hypothesis that there's a regression in the incremental compiler might be right.

@retronym
Copy link
Member Author

I have since found what might have been the root cause of this problem: our code gen from synchronized has subtly changed in a way that rendered them unJITtable.

I have proposed a fix for that in #5417

I think this change makes sense in any case, but if anyone want to revisit the decision, we could do some benchmarking without the lzycompute methods for module accessors.

@adriaanm adriaanm added the release-notes worth highlighting in next release notes label Oct 15, 2016
@adriaanm adriaanm modified the milestone: 2.12.0-RC2 Oct 29, 2016
@adriaanm adriaanm added the 2.12 label Oct 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes worth highlighting in next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants