Improve hash code of Names by retronym · Pull Request #5474 · scala/scala

retronym · 2016-10-21T10:54:18Z

The old approach of using only the first, last, and middle characters
lays a trap for generated names that have little or no entropy
at these locations. Fresh existential names generated in "as seen
from" operations are one such case, and when compiling large
batches of files the name table can become imbalanced.

I found such an example in ScalaTest:

scala/scala-dev#246 (comment)

This commit uses all characters to compute the hashCode.
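
To make the imbalance concrete, here is a minimal sketch comparing the two strategies. The body of `oldHash` is an assumption reconstructed from the diff fragment quoted further down this thread (only its last two terms appear there), `newHash` is a plain `String.hashCode`-style polynomial standing in for "use all characters", and the `.type` suffix and id range are purely illustrative.

```scala
object NameHashSketch {
  // Assumed shape of the old hash: only the length and the first, last and
  // middle characters contribute.
  def oldHash(cs: Array[Char], offset: Int, len: Int): Int =
    if (len > 0)
      len * (41 * 41 * 41) +
        cs(offset) * (41 * 41) +
        cs(offset + len - 1) * 41 +
        cs(offset + (len >> 1))
    else 0

  // Stand-in for the new approach: every character contributes.
  def newHash(cs: Array[Char], offset: Int, len: Int): Int = {
    var h = 0
    var i = 0
    while (i < len) {
      h = 31 * h + cs(offset + i)
      i += 1
    }
    h
  }

  // Hypothetical fresh existential names: "_1000.type" ... "_1999.type".
  val names: Seq[String] = (1000 until 2000).map(id => "_" + id + ".type")

  // Number of distinct hash values the given function produces for `names`.
  def distinctHashes(hash: (Array[Char], Int, Int) => Int): Int =
    names.map { n => val cs = n.toCharArray; hash(cs, 0, cs.length) }.distinct.size
}
```

With these inputs, `distinctHashes(oldHash)` is 1: every name has the same length, starts with `_`, ends with `e`, and has `.` in the middle, so all 1000 names would share one bucket. `distinctHashes(newHash)` is 1000.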

Review by @adriaanm @lrytz

I expect that this change will be beneficial on large code bases
that regularly exercise AsSeenFrom#captureThis, and neutral on
most smaller code bases. I'll perform some benchmarks to make sure
of that "neutral" claim and post the results here.

retronym · 2016-10-21T10:55:42Z

Related to scala/scala-dev#246

dragos · 2016-10-21T11:11:45Z

src/reflect/scala/reflect/internal/Names.scala


  /**
-   * The hashcode of a name depends on the first, the last and the middle character,
+   * The hashcode of is equivalent to cs name depends on the first, the last and the middle character,


Hm, looks like a partial update, you probably need to rephrase the whole line.

Oops. I've just pushed the full change.

lrytz · 2016-10-21T11:19:45Z

great numbers for ScalaTest, hope there's no negative impact on other projects

DarkDimius · 2016-10-21T11:25:21Z

src/reflect/scala/reflect/internal/Names.scala

-       cs(offset + len - 1) * 41 +
-       cs(offset + (len >> 1)))
-    else 0
+  private def hashValue(cs: Array[Char], offset: Int, len: Int): Int = {


@retronym do you have a guarantee that strings don't repeat in your name table? (Dotty does.)
If yes, then you won't need to consider character values at all, as the default hashcode (system identity hashcode) would work correctly.

@DarkDimius how do you guarantee that strings don't repeat? Let's say you first create a name "abc", then a name "bc", will the name table only contain "abc"?

Also, to me, dotty's Names.scala looks quite similar to scala's. There's a hashValue method that looks the same as the one being replaced in this PR. Can you point out differences you have in mind?

@lrytz,

> @DarkDimius how do you guarantee that strings don't repeat? Let's say you first create a name "abc", then a name "bc", will the name table only contain "abc"?

It would contain both, but the next time you try to create abc you'll get the same one. See https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Names.scala#L245

> Also, to me, dotty's Names.scala looks quite similar to scala's. There's a hashValue method that looks the same as the one being replaced in this PR. Can you point out differences you have in mind?

hashValue is only used when creating new Term names. It's not used when comparing hashcodes.
Hashcode is https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Names.scala#L178

This looks to me exactly the same as in scala https://github.com/scala/scala/blob/2.12.x/src/reflect/scala/reflect/internal/Names.scala#L233, so I still don't understand how dotty differs.

Oh, I misunderstood the intention of this PR. I thought it changes the hashcode.
Dotty has the same issue.
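
The distinction discussed in this exchange can be sketched as follows: `hashValue` picks the bucket when a name is looked up or created, while the interned `Name` objects themselves can rely on identity `hashCode` and `equals`. This is a simplified model, not the code of either compiler; `NameTableSketch`, `Name`, and the table size are hypothetical.

```scala
object NameTableSketch {
  // An interned name; `next` links names that fall into the same bucket.
  // hashCode/equals are not overridden, so identity semantics apply.
  final class Name(val text: String, val next: Name)

  private val HashSize = 0x8000          // assumed table size (power of two)
  private val HashMask = HashSize - 1
  private val table = new Array[Name](HashSize)

  // Content-based hash, used only to choose a bucket.
  def hashValue(s: String): Int = s.hashCode

  // Returns the existing Name for `s` if one was created before,
  // otherwise creates it and prepends it to its bucket's chain.
  def newName(s: String): Name = {
    val bucket = hashValue(s) & HashMask
    var n = table(bucket)
    while ((n ne null) && n.text != s) n = n.next
    if (n ne null) n
    else {
      val fresh = new Name(s, table(bucket))
      table(bucket) = fresh
      fresh
    }
  }
}
```

Creating "abc", then "bc", then "abc" again yields two distinct entries, and the third call returns the same object as the first. The cost of each lookup is the length of the chain walked in the `while` loop, which is why a poorly distributed `hashValue` hurts even though names are compared by identity afterwards.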

retronym · 2016-10-22T04:04:23Z

Here's one way to perform the followup work (using per-compilation unit counters rather than a global counter to generate the fresh existential names):

diff --git a/src/reflect/scala/reflect/internal/Symbols.scala b/src/reflect/scala/reflect/internal/Symbols.scala
index f870ecf..6f74b2f 100644
--- a/src/reflect/scala/reflect/internal/Symbols.scala
+++ b/src/reflect/scala/reflect/internal/Symbols.scala
@@ -34,9 +34,7 @@ trait Symbols extends api.Symbols { self: SymbolTable =>
   def recursionTable = _recursionTable
   def recursionTable_=(value: immutable.Map[Symbol, Int]) = _recursionTable = value

-  private var existentialIds = 0
-  protected def nextExistentialId() = { existentialIds += 1; existentialIds }
-  protected def freshExistentialName(suffix: String) = newTypeName("_" + nextExistentialId() + suffix)
+  protected def freshExistentialName(suffix: String) = newTypeName(currentFreshNameCreator.newName("_") + suffix)

   // Set the fields which point companions at one another.  Returns the module.
   def connectModuleToClass(m: ModuleSymbol, moduleClass: ClassSymbol): ModuleSymbol = {
diff --git a/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala b/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
index 237afa0..2cbf0a7 100644
--- a/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
+++ b/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
@@ -10,9 +10,6 @@ private[reflect] trait SynchronizedSymbols extends internal.Symbols { self: Symb
   private lazy val atomicIds = new java.util.concurrent.atomic.AtomicInteger(0)
   override protected def nextId() = atomicIds.incrementAndGet()

-  private lazy val atomicExistentialIds = new java.util.concurrent.atomic.AtomicInteger(0)
-  override protected def nextExistentialId() = atomicExistentialIds.incrementAndGet()
-
   private lazy val _recursionTable = mkThreadLocalStorage(immutable.Map.empty[Symbol, Int])
   override def recursionTable = _recursionTable.get
   override def recursionTable_=(value: immutable.Map[Symbol, Int]) = _recursionTable.set(value)
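
The effect of the patch above on the name table can be illustrated with a small model. `FreshNameCreator` here is a hypothetical stand-in for whatever `currentFreshNameCreator` returns in the compiler, and the `.type` suffix is illustrative. The point is only that a counter scoped to a compilation unit restarts, so later units regenerate names that are already interned, whereas a global counter keeps minting new ones.

```scala
import scala.collection.mutable

object FreshNameSketch {
  // Hypothetical per-unit creator: one counter per prefix, starting at 1.
  final class FreshNameCreator {
    private val counters = mutable.HashMap.empty[String, Int]
    def newName(prefix: String): String = {
      val n = counters.getOrElse(prefix, 0) + 1
      counters(prefix) = n
      prefix + n
    }
  }

  // Global counter, modelled on the code removed by the diff.
  private var existentialIds = 0
  def globalFreshName(suffix: String): String = {
    existentialIds += 1
    "_" + existentialIds + suffix
  }

  // Names produced while compiling one unit that has its own creator.
  def unitFreshNames(count: Int, suffix: String): Seq[String] = {
    val creator = new FreshNameCreator
    (1 to count).map(_ => creator.newName("_") + suffix)
  }
}
```

Two units of three existentials each yield `_1.type`, `_2.type`, `_3.type` twice, so only three distinct names would reach the table; six calls to `globalFreshName` yield six distinct names.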

retronym · 2016-10-22T05:48:43Z

My first attempt to benchmark this for small programs showed no change (795ms -> 797ms) for better-files, and a 2% slowdown (1645ms -> 1678ms) for scalap. However, the results seem a little noisier on my laptop (even with TurboBoost disabled) than I was used to on my desktop, where I've done previous benchmarking, so I'm not sure I have enough warmup time or forks to say whether or not that 2% is just noise.

I plan to re-run the benchmarks, and also compare profiles before/after to see if hashValue is starting to show up.

adriaanm · 2016-12-19T22:45:56Z

Have you had a chance to confirm the benchmark results?

retronym · 2016-12-19T22:58:09Z

Nope, I haven't. I'm going to close this one for now and revisit once we've got some more benchmarking coverage next year.

The old approach of using only the first, last, and middle characters lays a trap for generated names that have little or no entropy at these locations.

Fresh existential names generated in "as seen from" operations are one such case, and when compiling large batches of files the name table can become imbalanced. This seems to be the bottleneck when compiling the enormous (generated) test suite for ScalaTest itself:

scala/scala-dev#246 (comment)

This commit uses all characters to compute the hashCode. It improves the compilation time of ScalaTest tests from 487s to 349s (0.71x).

It would still be useful to avoid generating these fresh names with a global counter, as this represents a steady name leak in long-lived Globals (e.g. the presentation compiler).

retronym · 2019-04-10T10:52:32Z

Benchmark run: https://scala-ci.typesafe.com/view/scala-bench/job/compiler-benchmark/2573/console

retronym · 2019-05-08T06:52:33Z

I'm happy with the performance, and will merge this now.

lrytz · 2019-05-08T07:15:58Z

> I'm happy with the performance

What did you observe?

retronym · 2019-05-08T07:34:06Z

@lrytz neutral to ~1% regression

I'm hopeful that #8019 will recover a little of the regression (if it is real, that is!). I'll keep an eye on the performance charts and the results of our profile runs now that both changes are merged.

retronym · 2019-05-08T07:36:21Z

The motivation to revive this, BTW, was a report from a customer that they had a name table bucket with 500 entries, suggesting that something in the compiler or a macro/plugin was running into the same problem that we found and fixed in #5506.

scala-jenkins added this to the 2.12.1 milestone Oct 21, 2016

retronym mentioned this pull request Oct 21, 2016

Scalatest compilation time regression in 2.12 scala/scala-dev#246

Closed

dragos reviewed Oct 21, 2016

DarkDimius reviewed Oct 21, 2016

DarkDimius mentioned this pull request Oct 21, 2016

Port NameTable collision rate reducing change from Scalac scala/scala3#1619

Closed

retronym force-pushed the topic/name-hash branch from 37cc1a0 to b70d7a0 Compare October 22, 2016 03:45

lrytz added the WIP label Oct 28, 2016

retronym force-pushed the topic/name-hash branch from b70d7a0 to 6991bf5 Compare November 8, 2016 01:23

retronym modified the milestones: 2.12.2, 2.12.1 Nov 29, 2016

retronym closed this Dec 19, 2016

SethTisue removed this from the 2.12.2 milestone Dec 20, 2016

smarter mentioned this pull request Jan 7, 2018

Fix #1619: Reduce collision rate in NameTable scala/scala3#3768

Merged

retronym reopened this Apr 10, 2019

scala-jenkins added this to the 2.12.9 milestone Apr 10, 2019

retronym force-pushed the topic/name-hash branch from 6991bf5 to 2d2b895 Compare April 10, 2019 10:09

retronym mentioned this pull request Apr 24, 2019

fresh implementation of Names #7976

Closed

retronym removed the WIP label May 8, 2019

retronym merged commit 92f6515 into scala:2.12.x May 8, 2019

SethTisue added the performance label May 8, 2019


7 participants
