Collect all statistics and optimize checks #6034

Merged
retronym merged 2 commits into scala:2.12.x from scalacenter:stats on Sep 20, 2017
Conversation

jvican (Member) commented Aug 10, 2017

References: scala/scala-dev#149.

This PR does the following two things, both in the domain of statistics:

  1. It allows the collection of all the statistics (before, we were only
    reporting on a very narrow set of the available statistics). This is
    achieved by turning canEnable into a def instead of val. Related
    history: ebb719f#diff-438d9e5ac0dd1965aabd4776e1ee70eaR246/
  2. It adds a mechanism to remove the overhead of checking for statistics
    in the common case (where statistics are disabled). It does so by
    reusing a mechanism proposed by Rémi Forax and brought to my
    attention by @retronym (more here: optimize command-line flag checking in compiler using ConstantCallSite scala-dev#149).
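The mechanism in point 2 can be sketched roughly as follows. This is a minimal Java sketch of the Forax-style "almost final value" (the class and method names are hypothetical; the PR's actual implementation differs): reads go through a MutableCallSite whose target is a constant MethodHandle, so once the JIT inlines the call site it can constant-fold the disabled case and eliminate the dead statistics code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

// Sketch of an "almost final" boolean (hypothetical names). Reads go
// through a MutableCallSite whose target is a constant MethodHandle, so
// the JIT can treat the value as a constant; flipping the flag retargets
// the call site, which deoptimizes and recompiles the affected code.
final class AlmostFinalBoolean {
    private final MutableCallSite callSite =
        new MutableCallSite(MethodHandles.constant(boolean.class, false));
    private final MethodHandle invoker = callSite.dynamicInvoker();

    boolean get() {
        try {
            return (boolean) invoker.invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    void set(boolean value) {
        callSite.setTarget(MethodHandles.constant(boolean.class, value));
        // Make sure other threads observe the new target.
        MutableCallSite.syncAll(new MutableCallSite[] { callSite });
    }

    public static void main(String[] args) {
        AlmostFinalBoolean enabled = new AlmostFinalBoolean();
        System.out.println(enabled.get()); // false: statistics disabled
        enabled.set(true);
        System.out.println(enabled.get()); // true: call sites retargeted
    }
}
```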

There is one pending issue with this PR, which hasn't been solved yet:

  • How do we keep this implementation runtime-agnostic, allowing
    compilers like Scala.js and Scala Native to default to a correct but
    inefficient implementation?

I don't know how to do this (or how these compilers are replacing JVM-specific
implementations), so I would appreciate insights from @sjrd and @densh.

In the future, I could look into ways of expanding this implementation to all
our flag checking infrastructure, which is what the referenced ticket on the
top is about. However, I'd prefer to do this in a separate PR if possible. For
now, I wanted to create a very small change to see the preliminary results.

This PR is a work in progress; I've opened it to check the
performance penalty of my changes. Afterwards, I would like to measure
the performance penalty of having statistics enabled by default;
I'm sure we can draw interesting conclusions from there.

@scala-jenkins scala-jenkins added this to the 2.12.4 milestone Aug 10, 2017

object StatisticsHelper {
/** Represents the method handle for a parameterless method returning a primitive boolean. */
final val BooleanMethodType: MethodType = MethodType.methodType(classOf[Boolean]).unwrap
Member Author:

The unwrap here could probably be removed.

total += System.nanoTime() - time
}
val total2 = System.nanoTime() - start
// Shouldn't we replace this println with a `reporter.info`?
Member Author:

I guess this is a TODO? I'd like to get some confirmation. Problem is that we have no access to reporters from here.

println("Enabling statistics, measuring overhead = " +
  total/10000.0 + "ns to " + total2/10000.0 + "ns per timer")
_enabled = true
this.setValue(true)
Member Author:

Note that here we can only enable, but we cannot disable. Is this what we want, or would we like to enable/disable them for the hot compiler? Personally, I think we could support both. It's true it will have an impact and will force the deoptimization of a bunch of code, but this is the case when we enable statistics anyway.
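For context on the enable-but-not-disable question: the JDK's SwitchPoint is inherently one-shot, while a MutableCallSite can be retargeted any number of times. A minimal sketch of the one-shot behavior (hypothetical names, not the PR's code):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.SwitchPoint;

// A SwitchPoint guards a fast path and can be invalidated exactly once;
// afterwards every call goes to the fallback forever. That models a
// one-way "enable" switch: re-disabling would need a fresh SwitchPoint,
// or a retargetable MutableCallSite instead.
public class SwitchPointDemo {
    public static void main(String[] args) throws Throwable {
        MethodHandle disabled = MethodHandles.constant(boolean.class, false);
        MethodHandle enabled  = MethodHandles.constant(boolean.class, true);
        SwitchPoint sp = new SwitchPoint();
        // While sp is valid, calls see `disabled`; after invalidation, `enabled`.
        MethodHandle statsEnabled = sp.guardWithTest(disabled, enabled);
        System.out.println((boolean) statsEnabled.invokeExact()); // false
        SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
        System.out.println((boolean) statsEnabled.invokeExact()); // true
    }
}
```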

lrytz (Member) commented Aug 10, 2017

Started the benchmarks, builds 929 and 930 in https://scala-ci.typesafe.com/benchq/tasks

sjrd (Member) commented Aug 10, 2017

Scala.js does not compile scalac nor scala-reflect, so I don't think there's any insights to be had from me here ;)

jvican (Member Author) commented Aug 10, 2017

Scala.js does not compile scalac nor scala-reflect, so I don't think there's any insights to be had from me here ;)

Yeah, that was my intuition, but you commented on the ticket (scala/scala-dev#149) saying otherwise:

Just pointing out that this will make it harder to compile scalac with non-JVM backends (duh, I can't just write "Scala.js" in those cases anymore ^^). It would be good if the way this is implemented is well isolated somewhere where it is easy to change the implementation for different platforms. (as opposed to spread this implementation detail all over the place)

Which made me think that you're interested in compiling the compiler with Scala.js at some point in the future. 😄

sjrd (Member) commented Aug 10, 2017

True, I think it would be good to preemptively isolate such things in easy-to-swap-out files. But I don't have much interest in doing the work of porting scalac to Scala.js myself.

lrytz (Member) commented Aug 10, 2017

It would be interesting to compare with a version where canEnable is a compile-time constant (final val canEnable = false)

jvican (Member Author) commented Aug 10, 2017

It would be interesting to compare with a version where canEnable is a compile-time constant (final val canEnable = false)

If this is the case, we will not be able to switch on/off statistics, right?

lrytz (Member) commented Aug 10, 2017

If this is the case, we will not be able to switch on/off statistics, right?

Yes indeed, but it would be interesting to see if it's faster than the current situation where we read the canEnable field.

jvican (Member Author) commented Aug 10, 2017

@lrytz Gotcha, it's a good idea to know the lower and upper bound. I've added two commits to check the performance for both false and true.

lrytz (Member) commented Aug 10, 2017

OK, should I run the benchmarks on these 3 new commits?

jvican (Member Author) commented Aug 10, 2017

Yes, please! @lrytz

jvican (Member Author) commented Aug 10, 2017

dbb7720 is slower in the current results. I think this is due to Global and other classes setting the value of enabled when, by default, all uses of enabled_= should be executed only if enabled is false -- otherwise we cause the JVM to deoptimize. I'm going to add a commit to test this hypothesis.

jvican (Member Author) commented Aug 10, 2017

I retract what I said before; there is no place where we use the setter unnecessarily. Also, I am removing the last commit since it fails because of scala/bug#10460.

Now that I have all the benchmark results, I see that making _enabled a compile-time constant (2300797) is more or less on par with 6d006a5; sometimes one wins over the other and vice versa (for example, in scalap 6d006a5 wins, but in better-files the other does). I assume this variation is inherent to the benchmark infrastructure and that they yield more or less the same performance. Given these results, I infer that the use of switchpoints has been a success -- we're able to get the same performance as with a compile-time constant.

What I cannot grasp is how having a private var _enabled is in any way better than the two previous alternatives, and yet the benchmarks report that this is the case. I assume this is noise, although it doesn't fall strictly within the error margin. I would appreciate it if someone could have a look. The only plausible explanations I see are:

  1. We've used different benchmarking machines for the first commit and the last one.
  2. The machine to run the benchmarks for all the commits is the same, but it's sometimes unstable.

Everything is green. So at this point it would be good to get some feedback from a reviewer.

jvican (Member Author) commented Aug 10, 2017

Note that if the results look good, I'll need to remove 2300797.

val bounded = handle.bindTo(this)
setTarget(bounded)
bounded
}
Member:

I'd suggest putting all the method handle constants into a Java source file so they can be true static final constants.

https://wiki.openjdk.java.net/display/HotSpot/Deconstructing+MethodHandles

If such a reference to a MethodHandle is held in a static final field then the runtime should be able to constant fold invocations on that reference and what it holds when inlining occurs.
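A sketch of that suggestion (hypothetical names, not the PR's actual code): the invoker handle lives in a static final field of a Java class, which is the shape the linked wiki page says the runtime can constant-fold.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch: the MethodHandle invoker is held in a static
// final field, so the runtime may constant-fold invocations through it
// once the getter is inlined.
public final class StatsFlag {
    private static final MutableCallSite SITE =
        new MutableCallSite(MethodHandles.constant(boolean.class, false));
    private static final MethodHandle GETTER = SITE.dynamicInvoker();

    public static boolean enabled() {
        try {
            return (boolean) GETTER.invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void enable() {
        SITE.setTarget(MethodHandles.constant(boolean.class, true));
        MutableCallSite.syncAll(new MutableCallSite[] { SITE });
    }

    public static void main(String[] args) {
        System.out.println(enabled()); // false
        enable();
        System.out.println(enabled()); // true
    }
}
```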

Member Author:

This is cool. I'll try this.

@@ -267,19 +308,20 @@ quant)
*/
final val hotEnabled = false
Member:

To get a better look at the performance overhead (or absence thereof), perhaps you could extend the switchpoint technique to hotEnabled.

Member Author:

I've decided to reuse canEnable for hotEnabled as well, so that when we do measure statistics we get the whole information. Let me know if you're ok with this.

@jvican jvican force-pushed the stats branch 3 times, most recently from 184a61d to 01a4287 Compare August 14, 2017 11:39
jvican (Member Author) commented Aug 14, 2017

There seems to be a problem here with the CLA signature checker not detecting that I've signed the CLA in commit 01a4287, blocking the rest of the commits from being compiled/tested. Any hints on how I could solve it?

@jvican jvican closed this Aug 14, 2017
@jvican jvican reopened this Aug 14, 2017
jvican (Member Author) commented Aug 14, 2017

There's a noticeable performance hit (~10%) when _hotEnabled forwards to canEnable. This can be seen in the benchmarks. I'm surprised by this -- I presume this change exhausts the JVM's inlining budget?

The use of statics in 63953e1 seems to have no effect on performance compared to the previous approach.

What's your view on this @retronym? I haven't made _hotEnabled use its own switchpoint because I believe that a MutableCallSite can only have one target; making such a change would require the switchpoint strategy to be more abstract, and potentially slower (I'm actually not sure whether this would be noticeable).

jvican (Member Author) commented Aug 14, 2017

Okay, I've made some arrangements. You can check all the most important benchmarks here: https://scala-ci.typesafe.com/grafana/dashboard/db/pr-validation?var-scalaSha=311a576030|bb3e997faa|6d006a5461|63953e1388|5254231683|b5a3aee4db&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@&from=1502073067834&to=1502882030576.

In the end, the statics trick did have an effect on performance; b5a3aee addresses an oversight I had.

I think this is ready for another review.

@jvican jvican force-pushed the stats branch 2 times, most recently from c675569 to 042895b Compare August 15, 2017 14:26
jvican (Member Author) commented Aug 15, 2017

Okay, wrapping up, after all the experimentation that I've done with the different commits, my conclusions are:

  1. The last two commits reduce the overhead of using this approach and
    enabling statistics compiler-wide. This approach uses AlmostFinalValue as
    proposed by Rémi Forax. It is, even though boxed, better than the one I
    started with (a special-cased implementation for Statistics) because it
    allows callers to forward to static fields (concretely
    BooleanContainer.getDefault() instead of a non-static field). The
    previous approach caused a ~10% slowdown.
  2. The statistics infrastructure is now a little more polished, and
    enabling -Ystatistics works (unlike before, when it only showed a very
    constrained subset of all the available statistics).
  3. The println in the setter for enabling statistics has been replaced by the use of a reporter, and the information on the variation of the statistics measurements happens in every run instead of per global.

All in all, this PR is ready for review. If you're okay with the final implementation, I'll squash accordingly. I'm planning to follow up with another proposal that will add more statistics.

jvican (Member Author) commented Aug 15, 2017

By the way, I forgot to mention. Final benchmarks for:

jvican (Member Author) commented Aug 28, 2017

@lrytz @retronym Can you have another look at this?

retronym (Member):

What is the overhead of having statistics enabled? Ideally it should not skew the overall runtime too much (perhaps 10% overhead is okay). If the overhead is too high, we'll need to re-separate the toggle switches for hot and regular stats.

retronym (Member):

Oh, I see that in 347d589 you've done that already.

jvican (Member Author) commented Aug 28, 2017

What is the overhead of having statistics enabled? Ideally it should not skew the overall runtime too much (perhaps 10% overhead is okay). If the overhead is too high, we'll need to re-separate the toggle switches for hot and regular stats.
Oh, I see that in 347d589 you've done that already.

Right now, all of them are enabled and the overhead (when not enabled) is the one reported in my last comments. However, I have not computed the overhead when they are enabled because of scala/bug#10460.


private static final MethodHandle DEFAULT_VALUE_GETTER = DEFAULT_VALUE.createGetter();

public static BooleanContainer getDefault() {
retronym (Member) commented Aug 29, 2017

Should this actually be named isStatisticsEnabled? AFAICT it can't be used for more than one purpose, so it seems strange to leave it here.

If my assumptions are true, I'd suggest to move getDefault and the static fields it depends on to a class named StatisticsStatics or somesuch.

Member Author:

Hmm, the naming is generic because I thought that this BooleanContainer could be used in other scenarios (that require an almost-final boolean), so I didn't want to hardcode it. There is, on purpose, nothing tied to statistics in this class.

I'm happy to follow your advice if you think that this genericity will not be worth it in the future... but please confirm 😄.

Member:

Yes, please do. IMO this doesn't make any more sense than exposing var default: Boolean here :)

retronym (Member) commented Aug 29, 2017

We can use -pextraArgs=-Ystatistics to have the benchmark harness enable statistics via the setting.

for V in 2.12.4-bin-347d589-SNAPSHOT 2.12.4-bin-042895b-SNAPSHOT 2.12.3; do sbt ++$V "hot -psource=scalap -f 1" "hot -psource=scalap -f 1 -pextraArgs=-Ystatistics"; done

Rough numbers, one fork on my local machine.

Version               -Ystatistics:false   -Ystatistics:true
2.12.3                781                  876
2.12.4-bin-a8c43dc    801                  1035   (parent of this PR)
2.12.4-bin-347d589    811                  1035   ("hot" stats commented out)
2.12.4-bin-042895b    816                  1067
  • The overhead of enabling statistics in 2.12.3 was low, largely because
    the setting didn't fully work: many counters were guarded by an if on a
    false compile-time constant.
  • The overhead of a working statistics implementation is 20-25%.
    • It doesn't seem to make much difference whether the counters that had
      been thought of as hot (and previously controlled with a separate
      constant) are enabled or not.

It would be prudent to figure out which statistics are really the hot ones, and maybe remove them.

retronym (Member):

YourKit suggests that maintaining the QuantMap of the different tree subclasses to instantiation count (in TreesStats.nodeByType) and typer timings per tree subclass (TyperStat.byTypeStack) are the dominant overhead.

[image: YourKit profiler snapshot]

I have found that YourKit seems to overstate the cost of Object.hashCode (maybe its profiling somehow interferes with the intrinsified nature of that method?).

Anyway, removing stats: retronym@6e993e6 drops the cost of -Ystatistics somewhat, to 945ms (15% overhead) (from 1067ms, or 30% overhead, above).

Options:

  • Do nothing, defer a decision and implementation to a later PR
  • Move those "actually" hot statistics behind a secondary almost-final value and compiler option
  • Make them more efficient by sampling, rather than counting every invocation. Naively, bump a counter and sample when counter % N == 0. Less naively, bump a cheap pseudorandom value and sample when counter % N == 0 (to avoid bias from patterns in data).
  • Delete them entirely
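The sampling option could look roughly like this. A naive, single-threaded sketch; the class name SampledCounter and the rate N = 64 are assumptions, not anything from the PR:

```java
import java.util.concurrent.ThreadLocalRandom;

// Naive sketch of sampled counting: only every Nth event is recorded,
// and reads scale the sampled count back up. The random starting offset
// reduces bias from periodic patterns in the data.
final class SampledCounter {
    private static final int N = 64;  // sampling rate (an assumption)
    private long sampled;             // number of sampled events
    private int ticket = ThreadLocalRandom.current().nextInt(N);

    void hit() {
        // Hot path: one increment and one cheap modulo check.
        if (++ticket % N == 0) sampled++;
    }

    long estimate() {
        return sampled * N;           // scale back to an estimated total
    }
}
```

Because the sampled ticket values are consecutive, any run of k*N hits records exactly k samples, so the estimate is exact for multiples of N and off by at most N otherwise.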

retronym (Member):

StatisticsInfo.print is also fairly expensive (3% of total compile time) as it traverses all trees.

jvican (Member Author) commented Aug 29, 2017

Thanks for the in-depth analysis, Jason.

Make them more efficient by sampling, rather than counting every invocation. Naively, bump a counter and sample when counter % N == 0. Less naively, bump a cheap pseudorandom value and sample when counter % N == 0 (to avoid bias from patterns in data).

I've always liked the idea of pseudorandom sampling. I think it could give good results on these super hot counters. However, before trying it out I think it's better to try easier alternatives, like using a new almost-final value for the hot stats you've identified plus StatisticsInfo.print. If that works, it should put the overhead of running normal statistics at around 12%. This is implemented in 1866d1b.

IMO, this effort could be split into a different PR. If 1866d1b doesn't reduce the overhead, I suggest merging what we have now (I'll clean up the git history first) and deferring a decision and implementation to another PR.

jvican (Member Author) commented Aug 29, 2017

@retronym These are my results benchmarking locally with TurboBoost disabled. I could only benchmark 2.12.3 and the last commit in this PR, since the other artifacts weren't accessible, even though I have the following in a global sbt file:

resolvers ++= (
  if (scalaVersion.value.contains("-bin"))
     List("scala-integration" at "https://scala-ci.typesafe.com/artifactory/scala-integration/")
  else Nil
)

Results for 2.12.4-bin-042895b-SNAPSHOT

[info] Setting version to 2.12.4-bin-042895b-SNAPSHOT
[info] Benchmark                                   (corpusVersion)  (extraArgs)  (source)    Mode  Cnt     Score    Error  Units
[info] HotScalacBenchmark.compile                           latest                 scalap  sample  220  1476.834 ± 34.233  ms/op
[info] HotScalacBenchmark.compile:compile·p0.00             latest                 scalap  sample       1262.486           ms/op
[info] HotScalacBenchmark.compile:compile·p0.50             latest                 scalap  sample       1437.598           ms/op
[info] HotScalacBenchmark.compile:compile·p0.90             latest                 scalap  sample       1687.368           ms/op
[info] HotScalacBenchmark.compile:compile·p0.95             latest                 scalap  sample       1797.888           ms/op
[info] HotScalacBenchmark.compile:compile·p0.99             latest                 scalap  sample       2012.008           ms/op
[info] HotScalacBenchmark.compile:compile·p0.999            latest                 scalap  sample       2099.249           ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999           latest                 scalap  sample       2099.249           ms/op
[info] HotScalacBenchmark.compile:compile·p1.00             latest                 scalap  sample       2099.249           ms/op
[info] Benchmark                                   (corpusVersion)   (extraArgs)  (source)    Mode  Cnt     Score    Error  Units
[info] HotScalacBenchmark.compile                           latest  -Ystatistics    scalap  sample   68  1579.926 ± 54.677  ms/op
[info] HotScalacBenchmark.compile:compile·p0.00             latest  -Ystatistics    scalap  sample       1411.383           ms/op
[info] HotScalacBenchmark.compile:compile·p0.50             latest  -Ystatistics    scalap  sample       1561.330           ms/op
[info] HotScalacBenchmark.compile:compile·p0.90             latest  -Ystatistics    scalap  sample       1707.291           ms/op
[info] HotScalacBenchmark.compile:compile·p0.95             latest  -Ystatistics    scalap  sample       1861.747           ms/op
[info] HotScalacBenchmark.compile:compile·p0.99             latest  -Ystatistics    scalap  sample       2168.455           ms/op
[info] HotScalacBenchmark.compile:compile·p0.999            latest  -Ystatistics    scalap  sample       2168.455           ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999           latest  -Ystatistics    scalap  sample       2168.455           ms/op
[info] HotScalacBenchmark.compile:compile·p1.00             latest  -Ystatistics    scalap  sample       2168.455           ms/op

Results for 2.12.3

[info] Setting version to 2.12.3
[info] Benchmark                                   (corpusVersion)  (extraArgs)  (source)    Mode  Cnt     Score    Error  Units
[info] HotScalacBenchmark.compile                           latest                 scalap  sample  218  1489.257 ± 42.212  ms/op
[info] HotScalacBenchmark.compile:compile·p0.00             latest                 scalap  sample       1247.805           ms/op
[info] HotScalacBenchmark.compile:compile·p0.50             latest                 scalap  sample       1442.841           ms/op
[info] HotScalacBenchmark.compile:compile·p0.90             latest                 scalap  sample       1716.729           ms/op
[info] HotScalacBenchmark.compile:compile·p0.95             latest                 scalap  sample       1867.409           ms/op
[info] HotScalacBenchmark.compile:compile·p0.99             latest                 scalap  sample       2194.040           ms/op
[info] HotScalacBenchmark.compile:compile·p0.999            latest                 scalap  sample       2214.593           ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999           latest                 scalap  sample       2214.593           ms/op
[info] HotScalacBenchmark.compile:compile·p1.00             latest                 scalap  sample       2214.593           ms/op
[info] Benchmark                                   (corpusVersion)   (extraArgs)  (source)    Mode  Cnt     Score    Error  Units
[info] HotScalacBenchmark.compile                           latest  -Ystatistics    scalap  sample   69  1545.206 ± 57.351  ms/op
[info] HotScalacBenchmark.compile:compile·p0.00             latest  -Ystatistics    scalap  sample       1367.343           ms/op
[info] HotScalacBenchmark.compile:compile·p0.50             latest  -Ystatistics    scalap  sample       1503.658           ms/op
[info] HotScalacBenchmark.compile:compile·p0.90             latest  -Ystatistics    scalap  sample       1690.305           ms/op
[info] HotScalacBenchmark.compile:compile·p0.95             latest  -Ystatistics    scalap  sample       1844.445           ms/op
[info] HotScalacBenchmark.compile:compile·p0.99             latest  -Ystatistics    scalap  sample       2088.763           ms/op
[info] HotScalacBenchmark.compile:compile·p0.999            latest  -Ystatistics    scalap  sample       2088.763           ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999           latest  -Ystatistics    scalap  sample       2088.763           ms/op
[info] HotScalacBenchmark.compile:compile·p1.00             latest  -Ystatistics    scalap  sample       2088.763           ms/op

Conclusion

I'm surprised by these results, so much so that I would like someone to rerun the benchmarks and confirm them. According to these results, the overhead between the two is negligible. As a reminder, the last commit moves the statistics instrumentation in the initializer of Trees, Typers and StatisticsInfo.print behind a hotEnabled almost-final value, which is turned on by the -Yhot-statistics compiler flag (if -Ystatistics is not present, hot statistics won't be shown).

Note that the last commits are failing because the builds for 9a2a0cd were aborted. Can someone restart the CI for them?

retronym (Member):

I have this in my sbt configuration to access PR validation builds:

resolvers ++= (
  if (scalaVersion.value.endsWith("-SNAPSHOT") || scalaVersion.value.contains("-nightly"))
    List(
      "pr-scala snapshots" at "https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots/",
      Resolver.mavenLocal
    )
  else
    Nil
)

https://scala-ci.typesafe.com/artifactory/scala-integration/ contains fully bootstrapped builds triggered by the benchmark runs.

Even without these, it is possible to run sbt setupPublishCore publishLocal to publish to your ~/.ivy2.

retronym (Member):

/rebuild

https://scala-ci.typesafe.com/job/scala-2.12.x-validate-test/5457/consoleText

[info] + Range inclusiveByOne.take: OK, passed 100 tests.
Build timed out (after 150 minutes). Marking the build as aborted.
Build was aborted
Archiving artifacts
Notifying endpoint 'HTTP:http://scala-ci.typesafe.com:8888/jenkins'
Notifying endpoint 'HTTP:https://scala-ci.typesafe.com/benchq/webhooks/jenkins'

jvican (Member Author) commented Aug 30, 2017

I've just benchmarked everything again to make sure the results are consistent.

  1. Logs for Scala 2.12.3
  2. Logs for Scala 2.12.4-bin-042895b-SNAPSHOT
  3. Logs for Scala 2.12.4-bin-347d589-SNAPSHOT
  4. Logs for Scala 2.12.4-bin-1866d1b-SNAPSHOT

Rough numbers without error margins (which are more or less consistent):

Version               -Ystatistics:false   -Ystatistics:true
2.12.3                1289                 1386
2.12.4-bin-a8c43dc    ???                  ???    (parent of this PR)
2.12.4-bin-347d589    1346                 1633   ("hot" stats commented out)
2.12.4-bin-042895b    1357                 1625   (all stats enabled & together)
2.12.4-bin-1866d1b    1326                 1370   ("hot" stats split)

I don't have benchmarks for 2.12.4-bin-a8c43dc because 2.12.4-bin-1866d1b provides almost the same benchmark results as 2.12.3, confirming that the overhead between the two is minimal, both when statistics are disabled and when they are enabled.

I guess this can be reviewed again.

The following commit changes the previous val into a def so that
all statistics are collected. Before this change, most of
the statistics were disabled even if `-Ystatistics` was enabled because
the value of the `_enabled` was false when the object `Statistics` was
initialized. In the case of `-Ystatistics` being enabled, `_enabled` was
changed by the settings manager but that change never affected
`canEnable`.
References: scala/scala-dev#149.
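The val-versus-def issue this commit message describes can be illustrated with a minimal Java analogue (all names here are hypothetical):

```java
// Hypothetical sketch: a final field initialized from a mutable flag
// freezes the value seen at class initialization, while an accessor
// method re-reads the flag and observes later updates -- the val-vs-def
// distinction the commit message describes.
public final class ValVsDef {
    static boolean _enabled = false;                    // flipped later by settings
    static final boolean canEnableVal = _enabled;       // captured at init: stays false
    static boolean canEnableDef() { return _enabled; }  // re-reads on every call

    public static void main(String[] args) {
        _enabled = true;                                // settings enable statistics
        System.out.println(canEnableVal);               // false
        System.out.println(canEnableDef());             // true
    }
}
```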

It adds a mechanism to remove the overhead of checking for statistics in
the common case (where statistics are disabled). It does so by reusing a
mechanism proposed by Rémi Forax and brought to my attention by
Jason (more here: scala/scala-dev#149).

This commit adds the following changes:

1. It adds the infrastructure so that statistics are optimized. This
means that now we have two different flags, `-Ystatistics` and
`-Yhot-statistics`, that enable different counters and timers. The hot
statistics are the ones that instrument key parts of the compiler
infrastructure and heavily affect the runtime (around 10%). The ones for
`-Ystatistics` enable most of the statistics, and are the ones that give
most information about the compiler and yet have a slight effect on
runtime.

2. It prints the statistics overhead per run, and does it through the
reporter instead of `println`.

The implementation of this commit has gone through several iterations
until the performance overhead was deemed minimal, both for the cases
where statistics are enabled and disabled.

For a concrete analysis, see the PR discussion:
scala#6034.

There could still be some work left to optimize the overhead when
statistics are enabled, but this commit already does a pretty good job at it.
jvican (Member Author) commented Sep 19, 2017

I have squashed the commits of this PR. The previous git history can be found in the stats-non-squashed branch.

retronym (Member) commented Sep 19, 2017

Scheduled another benchmark to be doubleplus sure.

retronym (Member) left a review:

Great work, thanks for your persistence in finding the optimal solution.

@retronym retronym merged commit 4fa380b into scala:2.12.x Sep 20, 2017
adriaanm added a commit to adriaanm/scala that referenced this pull request Sep 27, 2017