Skip to content

Improve hash code of Names#5474

Merged
retronym merged 1 commit intoscala:2.12.xfrom
retronym:topic/name-hash
May 8, 2019
Merged

Improve hash code of Names#5474
retronym merged 1 commit intoscala:2.12.xfrom
retronym:topic/name-hash

Conversation

@retronym
Copy link
Member

The old approach of using the first, last, and middle characters
only lays a trap for generate names that have little or no entropy
at these locations. Fresh existential names generated in "as seen
from" operations are one such case, and when compiling large
batches of files the name table can become imbalanced.

I found such an example in ScalaTest:

scala/scala-dev#246 (comment)

This commit uses all characters to compute the hashCode.

Review by @adriaanm @lrytz

I expect that this change will be beneficial on large code bases
that regularly exercise AsSeenFrom#captureThis, and neutral on
most smaller code bases. I'll perform some benchmarks to make sure
of that "neutral" claim and post the results here.

@retronym
Copy link
Member Author

Related to scala/scala-dev#246


/**
* The hashcode of a name depends on the first, the last and the middle character,
* The hashcode of is equivalent to cs name depends on the first, the last and the middle character,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looks like a partial update, you probably need to rephrase the whole line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I've just pushed the full change.

@lrytz
Copy link
Member

lrytz commented Oct 21, 2016

great numbers for ScalaTest, hope there's no negative impact on other projects

cs(offset + len - 1) * 41 +
cs(offset + (len >> 1)))
else 0
private def hashValue(cs: Array[Char], offset: Int, len: Int): Int = {
Copy link
Contributor

@DarkDimius DarkDimius Oct 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retronym do you have a guarantee that strings don't repeat in your name-table?(Dotty does)
If yes, than you won't need to consider character values at all as default hashcode(system identity hashcode) would work correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarkDimius how do you guarantee that strings don't repeat? Let's say you first create a name "abc", then a name "bc", will the name table only contain "abc"?

Also, to me, dotty's Names.scala looks quite similar to scala's. There's a hashValue method that looks the same as the one being replaced in this PR. Can you point out differences you have in mind?

Copy link
Contributor

@DarkDimius DarkDimius Oct 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lrytz,

@DarkDimius how do you guarantee that strings don't repeat? Let's say you first create a name "abc", then a name "bc", will the name table only contain "abc"?

It would contain both, but the next time you try to create abc you'll get the same one. See https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Names.scala#L245

Also, to me, dotty's Names.scala looks quite similar to scala's. There's a hashValue method that looks the same as the one being replaced in this PR. Can you point out differences you have in mind?

hashValue is only used when creating new Term names. It's not used when comparing hashcodes.
Hashcode is https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Names.scala#L178

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to me exactly the same as in scala https://github.com/scala/scala/blob/2.12.x/src/reflect/scala/reflect/internal/Names.scala#L233, so I still don't understand where the difference is to dotty..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misunderstood the intention of this PR. I thought it changes the hashcode.
Dotty has the same issue.

@retronym
Copy link
Member Author

retronym commented Oct 22, 2016

Here's one way to perform the followup work (using per-compilation unit counters rather than a global counter to generate the fresh existential names):

diff --git a/src/reflect/scala/reflect/internal/Symbols.scala b/src/reflect/scala/reflect/internal/Symbols.scala
index f870ecf..6f74b2f 100644
--- a/src/reflect/scala/reflect/internal/Symbols.scala
+++ b/src/reflect/scala/reflect/internal/Symbols.scala
@@ -34,9 +34,7 @@ trait Symbols extends api.Symbols { self: SymbolTable =>
   def recursionTable = _recursionTable
   def recursionTable_=(value: immutable.Map[Symbol, Int]) = _recursionTable = value

-  private var existentialIds = 0
-  protected def nextExistentialId() = { existentialIds += 1; existentialIds }
-  protected def freshExistentialName(suffix: String) = newTypeName("_" + nextExistentialId() + suffix)
+  protected def freshExistentialName(suffix: String) = newTypeName(currentFreshNameCreator.newName("_") + suffix)

   // Set the fields which point companions at one another.  Returns the module.
   def connectModuleToClass(m: ModuleSymbol, moduleClass: ClassSymbol): ModuleSymbol = {
diff --git a/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala b/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
index 237afa0..2cbf0a7 100644
--- a/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
+++ b/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
@@ -10,9 +10,6 @@ private[reflect] trait SynchronizedSymbols extends internal.Symbols { self: Symb
   private lazy val atomicIds = new java.util.concurrent.atomic.AtomicInteger(0)
   override protected def nextId() = atomicIds.incrementAndGet()

-  private lazy val atomicExistentialIds = new java.util.concurrent.atomic.AtomicInteger(0)
-  override protected def nextExistentialId() = atomicExistentialIds.incrementAndGet()
-
   private lazy val _recursionTable = mkThreadLocalStorage(immutable.Map.empty[Symbol, Int])
   override def recursionTable = _recursionTable.get
   override def recursionTable_=(value: immutable.Map[Symbol, Int]) = _recursionTable.set(value)

@retronym
Copy link
Member Author

My first attempt to benchmark this for small programs showed no change (795ms->797ms) for better-files, and a 2% slowdown (1645ms -> 1678ms) for scalap. However, the results seem a little noiser on my laptop (even with TurboBoost disabled) than i was used to on my desktop where I've done previous benchmarking, so I'm not sure I have enough warmup time or forks to say whether or not that 2% is just noise.

I plan to re-run the benchmarks, and also compare profiles before/after to see is hashValue is starting to show up.

@lrytz lrytz added the WIP label Oct 28, 2016
@retronym retronym modified the milestones: 2.12.2, 2.12.1 Nov 29, 2016
@adriaanm
Copy link
Contributor

Have you had a chance to confirm the benchmark results?

@retronym
Copy link
Member Author

Nope, I haven't. I'm going to close this one for now and revisit once we've got some more benchmarking coverage next year.

The old approach of using the first, last, and middle characters
only lays a trap for generate names that have little or no entropy
at these locations.

For instance, fresh existential names generated in "as seen
from" operations are one such case, and when compiling large
batches of files the name table can become imbalanced.

This seems to be the bottleneck compiling the enourmous
(generated) test suite for ScalaTest itself:

   scala/scala-dev#246 (comment)

This commit uses all characters to compute the hashCode.

It improves the compilation time of ScalaTest tests from
487s to 349s (0.71x).

It would still be useful to avoid generating these fresh names
with a global counter, as this represents a steady name leak
in long-lived Globals (e.g. the presentation compiler.)
@retronym retronym reopened this Apr 10, 2019
@scala-jenkins scala-jenkins added this to the 2.12.9 milestone Apr 10, 2019
@retronym
Copy link
Member Author

@retronym
Copy link
Member Author

retronym commented May 8, 2019

I'm happy with the performance, and will merge this now.

@retronym retronym merged commit 92f6515 into scala:2.12.x May 8, 2019
@lrytz
Copy link
Member

lrytz commented May 8, 2019

I'm happy with the performance

What did you observe?

@retronym
Copy link
Member Author

retronym commented May 8, 2019

@lrytz neutral to ~1% regression

I'm hopeful that #8019 will recover a little of the regression (if it is real, that is!). I'll keep an eye on the performance charts and the results our profile runs now both changes are merged.

@retronym
Copy link
Member Author

retronym commented May 8, 2019

The motivation to revive this, BTW, was that we had a report from a customer that they had a name table bucket with 500 entries, suggesting that something in the compiler or a macro/plugin was running into the same problem as we found/fixed in #5506

@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label May 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance the need for speed. usually compiler performance, sometimes runtime performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants