Collect all statistics and optimize checks#6034
Conversation
|
|
||
| object StatisticsHelper { | ||
| /** Represents the method handle for a parameterless method returning a primitive boolean. */ | ||
| final val BooleanMethodType: MethodType = MethodType.methodType(classOf[Boolean]).unwrap |
There was a problem hiding this comment.
The unwrap here could probably be removed.
| total += System.nanoTime() - time | ||
| } | ||
| val total2 = System.nanoTime() - start | ||
| // Shouldn't we replace this println with a `reporter.info`? |
There was a problem hiding this comment.
I guess this is a TODO? I'd like to get some confirmation. Problem is that we have no access to reporters from here.
| println("Enabling statistics, measuring overhead = "+ | ||
| total/10000.0+"ns to "+total2/10000.0+"ns per timer") | ||
| _enabled = true | ||
| this.setValue(true) |
There was a problem hiding this comment.
Note that here we only can enable, but we cannot disable. Is this what we want, or would we like to enable/disable them for hot compiler? Personally, I think we could support both. It's true it will have an impact and will force the deoptimization of bunch a code, but this is the case when we enable statistics anyway.
|
Started the benchmarks, builds 929 and 930 in https://scala-ci.typesafe.com/benchq/tasks |
|
Scala.js does not compile scalac nor scala-reflect, so I don't think there's any insights to be had from me here ;) |
Yeah, that was my intuition, but you commented on the ticket ()scala/scala-dev#149 saying otherwise:
Which made me think that you're interested in compiling the compiler with Scalajs at some point in the future. 😄 |
|
True, I think it would be good to preemptively isolate such things in easy-to-swap-out files. But I don't have much interest in doing the work of porting scalac to Scala.js myself. |
|
It would be interesting to compare with a version where |
If this is the case, we will not be able to switch on/off statistics, right? |
Yes indeed, but it would be interesting to see if it's faster than the current situation where we read the |
|
@lrytz Gotcha, it's a good idea to know the lower and upper bound. I've added two commits to check the performance for both |
|
OK, should I run the benchmarks on these 3 new commits? |
|
Yes, please! @lrytz |
|
dbb7720 is slower in the current results. I think this is due to |
|
I retract what I said before, there is no place where we use the setter unnecessarily. Also, I am removing the last commit since it fails because of scala/bug#10460. Now that I have all the benchmark results, I see that making What I cannot grasp is how having a
Everything is green. So at this point it would be good to get some feedback from a reviewer. |
|
Note that if the results look good, I'll need to remove 2300797. |
| val bounded = handle.bindTo(this) | ||
| setTarget(bounded) | ||
| bounded | ||
| } |
There was a problem hiding this comment.
I'd suggest to put all the method handle contants into a Java source file so they can be try static final constants.
https://wiki.openjdk.java.net/display/HotSpot/Deconstructing+MethodHandles
If such a reference to a MethodHandle is held in a static final field then the runtime should be able to constant fold invocations on that reference and what it holds when inlining occurs.
There was a problem hiding this comment.
This is cool. I'll try this.
| @@ -267,19 +308,20 @@ quant) | |||
| */ | |||
| final val hotEnabled = false | |||
There was a problem hiding this comment.
To get a better look at the performance overhead (or absense thereof), perhaps you could extend the switchpoint technique to hotEnabled.
There was a problem hiding this comment.
I've decided to reuse canEnable for hotEnabled as well, so that when we do measure statistics we get the whole information. Let me know if you're ok with this.
184a61d to
01a4287
Compare
|
There seems to be some problem here with the CLA signature checker not detecting that I've signed the CLA in commit 01a4287 and blocking the rest of the commits from being compiled/tested. Any hints on how I could solve it? |
|
There's a noticeable hit in performance (~10%) when The use of statics in 63953e1 seems to have no effect in performance compared to the previous approach. What's your view on this @retronym? I haven't made |
|
Okay, I've made some arrangements. You can check all the most important benchmarks here: https://scala-ci.typesafe.com/grafana/dashboard/db/pr-validation?var-scalaSha=311a576030|bb3e997faa|6d006a5461|63953e1388|5254231683|b5a3aee4db&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@&from=1502073067834&to=1502882030576. At the end, the statics trick did have an effect on performance, b5a3aee addresses an oversight I had. I think this is already ready for another review. |
c675569 to
042895b
Compare
|
Okay, wrapping up, after all the experimentation that I've done with the different commits, my conclusions are:
All in all, this PR is ready for review. If you're okay with the final implementation, I'll squash accordingly. I'm planning to follow up with another proposal that will add more statistics. |
|
By the way, forgot to mention. Final benchmarks for:
|
|
What is the overhead of having statistics enabled? Ideally it should not skew the overall runtime too much (perhaps 10% overhead is okay). If the overhead is too high, we'll need to re-separate the toggle switches for hot and regular stats. |
|
Oh, I see that in 347d589 you've done that already. |
Right now, all of them are enabled and the overhead (when not enabled) is the one reported in my last comments. However, I have not computed the overhead when they are enabled because of scala/bug#10460. |
|
|
||
| private static final MethodHandle DEFAULT_VALUE_GETTER = DEFAULT_VALUE.createGetter(); | ||
|
|
||
| public static BooleanContainer getDefault() { |
There was a problem hiding this comment.
Should this actually be named isStatisticsEnabled? AFAICT it can't be used for more than one purpose, so it seems strange to leave it here.
If my assumptions are true, I'd suggest to move getDefault and the static fields it depends on to a class named StatisticsStatics or somesuch.
There was a problem hiding this comment.
Hmm, the naming is generic is because I thought that this BooleanContainer could be used in other scenarios (that require an almost final boolean), and so didn't want to hardcode it. There's in theory nothing tied to statistics on this class on purpose.
I'm happy to follow your advice if you think that this genericity will not be worth it in the future... but please confirm 😄.
There was a problem hiding this comment.
Yes, please do. IMO this doesn't make any more sense than exposing var default: Boolean here :)
|
We can use Rough numbers, one fork on my local machine.
It would be prudent to figure out which statistics are really the hot ones, and maybe remove them. |
|
YourKit suggests that maintaining the I have found the YourKit seems to overstate the cost of Anyway, removing stats: retronym@6e993e6 drops the cost of Options:
|
|
|
|
Thanks for the in-depth analysis, Jason.
I've always liked the idea of pseudorandom sampling. I think it could give good results on these super hot counters. However, before trying it out I think it's better to try easier alternatives like using a new almost final value for those hot stats you've identified + IMO, this effort could be split into a different PR. If 1866d1b doesn't reduce the overhead, I suggest merging what we have now (I'll clean up the git history first) and defer a decision and implementation to another PR. |
|
@retronym These are my results benchmarking locally with TurboBoost disabled. I could only benchmark 2.12.3 and the last commit in this PR, since the other artifacts weren't accessible, even though I have the following in a global sbt file: resolvers ++= (
if (scalaVersion.value.contains("-bin"))
List("scala-integration" at "https://scala-ci.typesafe.com/artifactory/scala-integration/")
else Nil
)Results for 2.12.4-bin-042895b-SNAPSHOTResults for 2.12.3ConclusionI'm surprised by these results, up until a point in which I would like someone to rerun the benchmarks and confirm them. Following these results, the overhead between the two is negligible. As a reminder, the last commit moves the statistics infrastructure in the initializer of Note that last commits are failing because the builds for 9a2a0cd were aborted. Can someone restart the CI for them? |
|
I have this to access PR validation builds. https://scala-ci.typesafe.com/artifactory/scala-integration/ contains fully bootstrapped builds triggered by the benchmark runs. Even without these, it is possible to run |
|
/rebuild https://scala-ci.typesafe.com/job/scala-2.12.x-validate-test/5457/consoleText |
|
I've just benchmarked everything again to make sure the results are consistent.
Rough numbers without error margins (which are more or less consistent):
I don't have benchmarks for I guess this can be reviewed again. |
The following commit changes the previous val into a def so that collection of all statistics is recorded. Before this change, most of the statistics were disabled even if `-Ystatistics` was enabled because the value of the `_enabled` was false when the object `Statistics` was initialized. In the case of `-Ystatistics` being enabled, `_enabled` was changed by the settings manager but that change never affected `canEnable`.
References: scala/scala-dev#149. It adds a mechanism to remove the overhead of checking for statistics in the common case (where statistics are disabled). It does so by reusing a mechanism proposed by Rémi Forax and brought to my attention by Jason (more here: scala/scala-dev#149). This commit adds the following changes: 1. It adds the infrastructure so that statistics are optimized. This means that now we have two different flags, `-Ystatistics` and `-Yhot-statistics`, that enable different counters and timers. The hot statistics are the ones that instrument key parts of the compiler infrastructure and heavily affect the runtime (around 10%). The ones for `-Ystatistics` enable most of the statistics, and are the ones that give most information about the compiler and yet have a slight effect on runtime. 2. It prints the statistics overhead per run, and does it through the reporter instead of `println`. The implementation of this commit has gone through several iterations until performance has been deemed to be minimal, both for the cases where statistics are enabled and disabled. For a concrete analysis, see the PR discussion: scala#6034. There could still be some work optimizing the overhead where statistics are enabled, but this commit does a pretty good job at it so far.
|
I have squashed the commits of this PR. The previous git history can be found in the |
|
Scheduled another benchmark to be doubleplus sure. |
retronym
left a comment
There was a problem hiding this comment.
Great work, thanks for your persistence in finding the optimal solution.
No conflict, include PRs scala#5762, scala#6034

References: scala/scala-dev#149.
This PR does the two following things, all in the domain of statistics:
reporting on a very narrow set of the available statistics). This is
achieved by turning
canEnableinto adefinstead ofval. Relatedhistory: ebb719f#diff-438d9e5ac0dd1965aabd4776e1ee70eaR246/
in the common case (where statistics are disabled). It does so by
reusing a mechanism proposed by Rémi Forax and brought to my
attention by @retronym (more here: optimize command-line flag checking in compiler using ConstantCallSite scala-dev#149).
There is one pending issues with this PR, which hasn't been solved yet:
compilers like Scalajs and Scala.native to default to a correct, inefficient
implementation?
I don't know how to do this (or how these compilers are replacing JVM-specific
implementations), so I would appreciate insights from @sjrd and @densh.
In the future, I could look into ways of expanding this implementation to all
our flag checking infrastructure, which is what the referenced ticket on the
top is about. However, I'd prefer to do this in a separate PR if possible. For
now, I wanted to create a very small change to see the prior results.
This PR is work-in-progress and I've just opened it to check the
performance penalty of my changes. Afterwards, I would like to detect
what's the performance penalty of having statistics enabled by default,
I'm sure we can draw interesting results from there.