Conversation
|
Related to scala/scala-dev#246 |
|
|
||
| /** | ||
| * The hashcode of a name depends on the first, the last and the middle character, | ||
| * The hashcode of is equivalent to cs name depends on the first, the last and the middle character, |
There was a problem hiding this comment.
Hm, looks like a partial update, you probably need to rephrase the whole line.
There was a problem hiding this comment.
Oops. I've just pushed the full change.
|
great numbers for ScalaTest, hope there's no negative impact on other projects |
| cs(offset + len - 1) * 41 + | ||
| cs(offset + (len >> 1))) | ||
| else 0 | ||
| private def hashValue(cs: Array[Char], offset: Int, len: Int): Int = { |
There was a problem hiding this comment.
@retronym do you have a guarantee that strings don't repeat in your name-table?(Dotty does)
If yes, than you won't need to consider character values at all as default hashcode(system identity hashcode) would work correctly.
There was a problem hiding this comment.
@DarkDimius how do you guarantee that strings don't repeat? Let's say you first create a name "abc", then a name "bc", will the name table only contain "abc"?
Also, to me, dotty's Names.scala looks quite similar to scala's. There's a hashValue method that looks the same as the one being replaced in this PR. Can you point out differences you have in mind?
There was a problem hiding this comment.
@DarkDimius how do you guarantee that strings don't repeat? Let's say you first create a name "abc", then a name "bc", will the name table only contain "abc"?
It would contain both, but the next time you try to create abc you'll get the same one. See https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Names.scala#L245
Also, to me, dotty's Names.scala looks quite similar to scala's. There's a hashValue method that looks the same as the one being replaced in this PR. Can you point out differences you have in mind?
hashValue is only used when creating new Term names. It's not used when comparing hashcodes.
Hashcode is https://github.com/lampepfl/dotty/blob/master/src/dotty/tools/dotc/core/Names.scala#L178
There was a problem hiding this comment.
This looks to me exactly the same as in scala https://github.com/scala/scala/blob/2.12.x/src/reflect/scala/reflect/internal/Names.scala#L233, so I still don't understand where the difference is to dotty..
There was a problem hiding this comment.
Oh, I misunderstood the intention of this PR. I thought it changes the hashcode.
Dotty has the same issue.
37cc1a0 to
b70d7a0
Compare
|
Here's one way to perform the followup work (using per-compilation unit counters rather than a global counter to generate the fresh existential names): diff --git a/src/reflect/scala/reflect/internal/Symbols.scala b/src/reflect/scala/reflect/internal/Symbols.scala
index f870ecf..6f74b2f 100644
--- a/src/reflect/scala/reflect/internal/Symbols.scala
+++ b/src/reflect/scala/reflect/internal/Symbols.scala
@@ -34,9 +34,7 @@ trait Symbols extends api.Symbols { self: SymbolTable =>
def recursionTable = _recursionTable
def recursionTable_=(value: immutable.Map[Symbol, Int]) = _recursionTable = value
- private var existentialIds = 0
- protected def nextExistentialId() = { existentialIds += 1; existentialIds }
- protected def freshExistentialName(suffix: String) = newTypeName("_" + nextExistentialId() + suffix)
+ protected def freshExistentialName(suffix: String) = newTypeName(currentFreshNameCreator.newName("_") + suffix)
// Set the fields which point companions at one another. Returns the module.
def connectModuleToClass(m: ModuleSymbol, moduleClass: ClassSymbol): ModuleSymbol = {
diff --git a/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala b/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
index 237afa0..2cbf0a7 100644
--- a/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
+++ b/src/reflect/scala/reflect/runtime/SynchronizedSymbols.scala
@@ -10,9 +10,6 @@ private[reflect] trait SynchronizedSymbols extends internal.Symbols { self: Symb
private lazy val atomicIds = new java.util.concurrent.atomic.AtomicInteger(0)
override protected def nextId() = atomicIds.incrementAndGet()
- private lazy val atomicExistentialIds = new java.util.concurrent.atomic.AtomicInteger(0)
- override protected def nextExistentialId() = atomicExistentialIds.incrementAndGet()
-
private lazy val _recursionTable = mkThreadLocalStorage(immutable.Map.empty[Symbol, Int])
override def recursionTable = _recursionTable.get
override def recursionTable_=(value: immutable.Map[Symbol, Int]) = _recursionTable.set(value) |
|
My first attempt to benchmark this for small programs showed no change (795ms->797ms) for I plan to re-run the benchmarks, and also compare profiles before/after to see is |
b70d7a0 to
6991bf5
Compare
|
Have you had a chance to confirm the benchmark results? |
|
Nope, I haven't. I'm going to close this one for now and revisit once we've got some more benchmarking coverage next year. |
The old approach of using the first, last, and middle characters only lays a trap for generate names that have little or no entropy at these locations. For instance, fresh existential names generated in "as seen from" operations are one such case, and when compiling large batches of files the name table can become imbalanced. This seems to be the bottleneck compiling the enourmous (generated) test suite for ScalaTest itself: scala/scala-dev#246 (comment) This commit uses all characters to compute the hashCode. It improves the compilation time of ScalaTest tests from 487s to 349s (0.71x). It would still be useful to avoid generating these fresh names with a global counter, as this represents a steady name leak in long-lived Globals (e.g. the presentation compiler.)
6991bf5 to
2d2b895
Compare
|
I'm happy with the performance, and will merge this now. |
What did you observe? |
|
@lrytz neutral to ~1% regression I'm hopeful that #8019 will recover a little of the regression (if it is real, that is!). I'll keep an eye on the performance charts and the results our profile runs now both changes are merged. |
|
The motivation to revive this, BTW, was that we had a report from a customer that they had a name table bucket with 500 entries, suggesting that something in the compiler or a macro/plugin was running into the same problem as we found/fixed in #5506 |
The old approach of using the first, last, and middle characters
only lays a trap for generate names that have little or no entropy
at these locations. Fresh existential names generated in "as seen
from" operations are one such case, and when compiling large
batches of files the name table can become imbalanced.
I found such an example in ScalaTest:
scala/scala-dev#246 (comment)
This commit uses all characters to compute the hashCode.
Review by @adriaanm @lrytz
I expect that this change will be beneficial on large code bases
that regularly exercise
AsSeenFrom#captureThis, and neutral onmost smaller code bases. I'll perform some benchmarks to make sure
of that "neutral" claim and post the results here.