Rationalize subclasses of Name by retronym · Pull Request #8044 · scala/scala

retronym · 2019-05-08T06:47:15Z

Due to alignment, TermName_R (which doesn't cache the provided string
for toString) takes up just as much space as TermName_S.

The code ends up somewhat easier to read with by just encoding the
difference with the a nullable field.

retronym · 2019-05-08T06:47:40Z

$ java -Djdk.attach.allowAttachSelf=true -cp $(coursier fetch -q -p 'org.openjdk.jol:jol-cli:0.9') org.openjdk.jol.Main internals -cp $(scala-classpath $(scala-ref-version 2.13.x)) 'scala.reflect.internal.Names$TermName_R'
# WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# WARNING | Compressed references base/shifts are guessed by the experiment!
# WARNING | Therefore, computed addresses are just guesses, and ARE NOT RELIABLE.
# WARNING | Make sure to attach Serviceability Agent to get the reliable addresses.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

Failed to find matching constructor, falling back to class-only introspection.

scala.reflect.internal.Names$TermName_R object internals:
 OFFSET  SIZE                                    TYPE DESCRIPTION                               VALUE
      0    12                                         (object header)                           N/A
     12     4                 scala.reflect.api.Names NameApi.$outer                            N/A
     16     4                                     int Name.index                                N/A
     20     4                                     int Name.len                                  N/A
     24     4   scala.reflect.internal.Names.TermName TermName.next                             N/A
     28     4                                         (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

...
scala.reflect.internal.Names$TermName_S object internals:
 OFFSET  SIZE                                    TYPE DESCRIPTION                               VALUE
      0    12                                         (object header)                           N/A
     12     4                 scala.reflect.api.Names NameApi.$outer                            N/A
     16     4                                     int Name.index                                N/A
     20     4                                     int Name.len                                  N/A
     24     4   scala.reflect.internal.Names.TermName TermName.next                             N/A
     28     4                        java.lang.String TermName_S.toString                       N/A
Instance size: 32 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

diesalbla · 2019-05-08T10:09:21Z

@retronym I have pushed the changes to the Partest that was failing, due to reflection.

src/reflect/scala/reflect/internal/Names.scala

mkeskells · 2019-05-08T21:51:18Z

src/reflect/scala/reflect/internal/Names.scala

    def longString: String      = nameKind + " " + decode
    def debugString = { val s = decode ; if (isTypeName) s + "!" else s }
+
+    override final def toString: String = if (cachedString == null) new String(chrs, index, len) else cachedString


we could consider caching the name if we have calculated it. Its a trade off, and would depend on some benchmarking when in use

We don't actually call Name.toString anywhere hot. When we build up full names (e.g. scala.Boolean or scala/Boolean we get the characters of the components from the underlying chrs array directly.

I've added a commit to clean up the part of code I'm talking about here (Symbol#fullName). After that change, I've deprecated direct access to Names.chrs.

if it is not useful, then why do we have the field at all?
There are lots of code paths that call toString, and probably don't all need to
e.g. append, prepend, string=

also the unapply of TermName and TypeName

Due to alignment, TermName_R (which doesn't cache the provided string for toString) takes up just as much space as TermName_S. The code ends up somewhat easier to read with by just encoding the difference with the a nullable field.

src/reflect/scala/reflect/internal/Names.scala

mkeskells · 2019-05-09T07:20:33Z

You could argue that, after we decake Name, the toString field may no longer be free, it may become expensive

If we will lose the outer pointer (not sure if we will), the toString field retain 8 bytes

retronym · 2019-05-09T07:28:27Z

We won't actually lose the outer pointer (it might in the future point to a NameTable rather than a Global.)

I started to look for places where Name.toString is called during compilation but could be avoided. I would hope that by addressing any hot uses of Name.toString we could eliminate cachedString altogether. This would reduce the retained size of the name table somewhat.

diesalbla · 2019-05-09T14:54:45Z

src/reflect/scala/reflect/internal/Names.scala

+
  /** Memory to store all names sequentially. */
-  var chrs: Array[Char] = new Array[Char](NAME_SIZE)
+  private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private


Suggested change

private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private

private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE)

diesalbla · 2019-05-09T18:32:53Z

@retronym Since you are trying to reduce space consumption on the names area, I had an idea and a doubt, on which I would like to get your opinion.

The idea is to tun _chrs from being an Array[Char], which is to say a UTF-16-encoded characters, into a UTF-8-encoded Array[Byte]. As I understand it, UTF-8 is a variable-length encoding, where most ASCII characters occupy 1 byte. Given that most of the Scala code out there is written in the ASCII subset (leave or take some fancy arrows), this would probably halve memory usage. I am guessing that, currently, for most programs, _chrs is wasting half its space with Character values that have a zero byte for its most significant bits.

The doubt I was having (which is more about looking both sides of the road) is about the Array[Char] type: have we checked if the JVM is using exactly two bytes for each position? I recall reading somewhere that, sometimes, the JVM may end up using a whole 32-bit word for smaller primitive values.

diesalbla · 2019-05-09T18:40:02Z

src/reflect/scala/reflect/internal/Names.scala


  // SYNCNOTE: caller to constructor must synchronize if `synchronizeNames` is enabled
-  sealed abstract class TermName(index0: Int, len0: Int, val next: TermName) extends Name(index0, len0) with TermNameApi {
+  final class TermName(index0: Int, len0: Int, val next: TermName, cachedString: String) extends Name(index0, len0, cachedString) with TermNameApi {


Would there be a benefit in reducing either index0 or len0 to Short or Byte? I do not think there are any identifiers over 127 characters long.

index0 cant be. Its an index into a blob of all of the identifiers. I might be more inclined to consider if 2G is big enough to cope with all of the names that a project could reference (It seems a strange and arbitrary limit to build in to the architecture)

What is the language limit, or the JVM limits? I don't think that we could impose a smaller limit for the length of the identifier, after all we have to be able to read a java identifier, and cope with any legal identifier in scala, and I think that is 65536, from below

from https://docs.oracle.com/javase/specs/jvms/se12/html/jvms-4.html#jvms-4.11

The length of field and method names, field and method descriptors, and other constant string values (including those referenced by ConstantValue (§4.7.2) attributes) is limited to 65535 characters by the 16-bit unsigned length item of the CONSTANT_Utf8_info structure (§4.4.7).

Perhaps you could use 24 bits for the index0 and 8 bits for the len0, or any such arrangement, so you can fit them inside a single Int field. That would allow you to have 16777216 characters, for words of up to 255 characters each.

That sounds dangerous simply because while we could forbid 256-character scala names, we can't impose such a restriction on Java (and being unable even to parse such a classfile does not sound acceptable to me). We could have an XL(Term|Type)Name to accommodate that, perhaps.

mkeskells · 2019-05-09T21:58:34Z

@diesalbla in my original proposal for the names reorganisation, we use just strings, and java 10 ( I think) there is already an optimisation to cope with strings that consist of 8 bit chars
I don't believe that this structure really is worth the effort. Its just a set of string like things IMO

hrhino

I think there's probably a few more things that could be followed up on but the change itself certainly LGTM.

hrhino · 2019-05-11T21:50:45Z

src/reflect/scala/reflect/internal/Names.scala

+  @deprecated("Don't access name table contents directly.", "2.12.9")
+  def chrs: Array[Char] = _chrs
+  @deprecated("Don't access name table contents directly.", "2.12.9")
+  def chrs_=(cs: Array[Char]) = _chrs = cs


Where is this used?

I strongly doubt it is ever used anywhere, I was just being conservative to keep the current API in a minor release.

hrhino · 2019-05-11T21:54:39Z

src/reflect/scala/reflect/internal/Names.scala


  // SYNCNOTE: caller to constructor must synchronize if `synchronizeNames` is enabled
-  sealed abstract class TermName(index0: Int, len0: Int, val next: TermName) extends Name(index0, len0) with TermNameApi {
+  final class TermName(index0: Int, len0: Int, val next: TermName, cachedString: String) extends Name(index0, len0, cachedString) with TermNameApi {


That sounds dangerous simply because while we could forbid 256-character scala names, we can't impose such a restriction on Java (and being unable even to parse such a classfile does not sound acceptable to me). We could have an XL(Term|Type)Name to accommodate that, perhaps.

scala-jenkins added this to the 2.12.9 milestone May 8, 2019

retronym force-pushed the topic/rationalize-names branch 2 times, most recently from 02414e7 to 07aa78f Compare May 8, 2019 07:23

hrhino reviewed May 8, 2019

View reviewed changes

src/reflect/scala/reflect/internal/Names.scala Outdated Show resolved Hide resolved

mkeskells reviewed May 8, 2019

View reviewed changes

Rationalize subclasses of Name

e5dab49

Due to alignment, TermName_R (which doesn't cache the provided string for toString) takes up just as much space as TermName_S. The code ends up somewhat easier to read with by just encoding the difference with the a nullable field.

retronym force-pushed the topic/rationalize-names branch from 84e4ac0 to e5dab49 Compare May 8, 2019 22:16

hrhino reviewed May 8, 2019

View reviewed changes

src/reflect/scala/reflect/internal/Names.scala Outdated Show resolved Hide resolved

retronym added 2 commits May 9, 2019 08:53

Avoid direct use of Names.chrs from Symbols

3b57788

Deprecate external access to Names.chrs

22f6779

retronym force-pushed the topic/rationalize-names branch from 09c257c to 22f6779 Compare May 8, 2019 22:53

diesalbla approved these changes May 9, 2019

View reviewed changes

diesalbla reviewed May 9, 2019

View reviewed changes

hrhino approved these changes May 11, 2019

View reviewed changes

retronym merged commit ae27fe9 into scala:2.12.x May 14, 2019

retronym mentioned this pull request Jul 25, 2019

More accurate outer checks in patterns #8261

Closed

	private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private
	private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE)

Conversation

retronym commented May 8, 2019

Uh oh!

retronym commented May 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

diesalbla commented May 8, 2019

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

retronym May 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mkeskells commented May 9, 2019

Uh oh!

retronym commented May 9, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

diesalbla commented May 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkeskells commented May 9, 2019

Uh oh!

hrhino left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

retronym commented May 8, 2019 •

edited

Loading

retronym May 8, 2019 •

edited

Loading

diesalbla commented May 9, 2019 •

edited

Loading