Skip to content

Rationalize subclasses of Name#8044

Merged
retronym merged 3 commits intoscala:2.12.xfrom
retronym:topic/rationalize-names
May 14, 2019
Merged

Rationalize subclasses of Name#8044
retronym merged 3 commits intoscala:2.12.xfrom
retronym:topic/rationalize-names

Conversation

@retronym
Copy link
Member

@retronym retronym commented May 8, 2019

Due to alignment, TermName_R (which doesn't cache the provided string
for toString) takes up just as much space as TermName_S.

The code ends up somewhat easier to read with by just encoding the
difference with the a nullable field.

@scala-jenkins scala-jenkins added this to the 2.12.9 milestone May 8, 2019
@retronym
Copy link
Member Author

retronym commented May 8, 2019

$ java -Djdk.attach.allowAttachSelf=true -cp $(coursier fetch -q -p 'org.openjdk.jol:jol-cli:0.9') org.openjdk.jol.Main internals -cp $(scala-classpath $(scala-ref-version 2.13.x)) 'scala.reflect.internal.Names$TermName_R'
# WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# WARNING | Compressed references base/shifts are guessed by the experiment!
# WARNING | Therefore, computed addresses are just guesses, and ARE NOT RELIABLE.
# WARNING | Make sure to attach Serviceability Agent to get the reliable addresses.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

Failed to find matching constructor, falling back to class-only introspection.

scala.reflect.internal.Names$TermName_R object internals:
 OFFSET  SIZE                                    TYPE DESCRIPTION                               VALUE
      0    12                                         (object header)                           N/A
     12     4                 scala.reflect.api.Names NameApi.$outer                            N/A
     16     4                                     int Name.index                                N/A
     20     4                                     int Name.len                                  N/A
     24     4   scala.reflect.internal.Names.TermName TermName.next                             N/A
     28     4                                         (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

...
scala.reflect.internal.Names$TermName_S object internals:
 OFFSET  SIZE                                    TYPE DESCRIPTION                               VALUE
      0    12                                         (object header)                           N/A
     12     4                 scala.reflect.api.Names NameApi.$outer                            N/A
     16     4                                     int Name.index                                N/A
     20     4                                     int Name.len                                  N/A
     24     4   scala.reflect.internal.Names.TermName TermName.next                             N/A
     28     4                        java.lang.String TermName_S.toString                       N/A
Instance size: 32 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

@retronym retronym force-pushed the topic/rationalize-names branch 2 times, most recently from 02414e7 to 07aa78f Compare May 8, 2019 07:23
@diesalbla
Copy link
Contributor

@retronym I have pushed the changes to the Partest that was failing, due to reflection.

def longString: String = nameKind + " " + decode
def debugString = { val s = decode ; if (isTypeName) s + "!" else s }

override final def toString: String = if (cachedString == null) new String(chrs, index, len) else cachedString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider caching the name if we have calculated it. Its a trade off, and would depend on some benchmarking when in use

Copy link
Member Author

@retronym retronym May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually call Name.toString anywhere hot. When we build up full names (e.g. scala.Boolean or scala/Boolean we get the characters of the components from the underlying chrs array directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit to clean up the part of code I'm talking about here (Symbol#fullName). After that change, I've deprecated direct access to Names.chrs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is not useful, then why do we have the field at all?
There are lots of code paths that call toString, and probably don't all need to
e.g. append, prepend, string=

also the unapply of TermName and TypeName

Due to alignment, TermName_R (which doesn't cache the provided string
for toString) takes up just as much space as TermName_S.

The code ends up somewhat easier to read with by just encoding the
difference with the a nullable field.
@retronym retronym force-pushed the topic/rationalize-names branch from 84e4ac0 to e5dab49 Compare May 8, 2019 22:16
@retronym retronym force-pushed the topic/rationalize-names branch from 09c257c to 22f6779 Compare May 8, 2019 22:53
@mkeskells
Copy link
Contributor

You could argue that, after we decake Name, the toString field may no longer be free, it may become expensive

If we will lose the outer pointer (not sure if we will), the toString field retain 8 bytes

@retronym
Copy link
Member Author

retronym commented May 9, 2019

We won't actually lose the outer pointer (it might in the future point to a NameTable rather than a Global.)

I started to look for places where Name.toString is called during compilation but could be avoided. I would hope that by addressing any hot uses of Name.toString we could eliminate cachedString altogether. This would reduce the retained size of the name table somewhat.


/** Memory to store all names sequentially. */
var chrs: Array[Char] = new Array[Char](NAME_SIZE)
private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private
private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE)

@diesalbla
Copy link
Contributor

diesalbla commented May 9, 2019

@retronym Since you are trying to reduce space consumption on the names area, I had an idea and a doubt, on which I would like to get your opinion.

The idea is to tun _chrs from being an Array[Char], which is to say a UTF-16-encoded characters, into a UTF-8-encoded Array[Byte]. As I understand it, UTF-8 is a variable-length encoding, where most ASCII characters occupy 1 byte. Given that most of the Scala code out there is written in the ASCII subset (leave or take some fancy arrows), this would probably halve memory usage. I am guessing that, currently, for most programs, _chrs is wasting half its space with Character values that have a zero byte for its most significant bits.

The doubt I was having (which is more about looking both sides of the road) is about the Array[Char] type: have we checked if the JVM is using exactly two bytes for each position? I recall reading somewhere that, sometimes, the JVM may end up using a whole 32-bit word for smaller primitive values.


// SYNCNOTE: caller to constructor must synchronize if `synchronizeNames` is enabled
sealed abstract class TermName(index0: Int, len0: Int, val next: TermName) extends Name(index0, len0) with TermNameApi {
final class TermName(index0: Int, len0: Int, val next: TermName, cachedString: String) extends Name(index0, len0, cachedString) with TermNameApi {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a benefit in reducing either index0 or len0 to Short or Byte? I do not think there are any identifiers over 127 characters long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index0 cant be. Its an index into a blob of all of the identifiers. I might be more inclined to consider if 2G is big enough to cope with all of the names that a project could reference (It seems a strange and arbitrary limit to build in to the architecture)

What is the language limit, or the JVM limits? I don't think that we could impose a smaller limit for the length of the identifier, after all we have to be able to read a java identifier, and cope with any legal identifier in scala, and I think that is 65536, from below

from https://docs.oracle.com/javase/specs/jvms/se12/html/jvms-4.html#jvms-4.11

The length of field and method names, field and method descriptors, and other constant string values (including those referenced by ConstantValue (§4.7.2) attributes) is limited to 65535 characters by the 16-bit unsigned length item of the CONSTANT_Utf8_info structure (§4.4.7).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could use 24 bits for the index0 and 8 bits for the len0, or any such arrangement, so you can fit them inside a single Int field. That would allow you to have 16777216 characters, for words of up to 255 characters each.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds dangerous simply because while we could forbid 256-character scala names, we can't impose such a restriction on Java (and being unable even to parse such a classfile does not sound acceptable to me). We could have an XL(Term|Type)Name to accommodate that, perhaps.

@mkeskells
Copy link
Contributor

@diesalbla in my original proposal for the names reorganisation, we use just strings, and java 10 ( I think) there is already an optimisation to cope with strings that consist of 8 bit chars
I don't believe that this structure really is worth the effort. Its just a set of string like things IMO

Copy link
Contributor

@hrhino hrhino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's probably a few more things that could be followed up on but the change itself certainly LGTM.

@deprecated("Don't access name table contents directly.", "2.12.9")
def chrs: Array[Char] = _chrs
@deprecated("Don't access name table contents directly.", "2.12.9")
def chrs_=(cs: Array[Char]) = _chrs = cs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly doubt it is ever used anywhere, I was just being conservative to keep the current API in a minor release.


// SYNCNOTE: caller to constructor must synchronize if `synchronizeNames` is enabled
sealed abstract class TermName(index0: Int, len0: Int, val next: TermName) extends Name(index0, len0) with TermNameApi {
final class TermName(index0: Int, len0: Int, val next: TermName, cachedString: String) extends Name(index0, len0, cachedString) with TermNameApi {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds dangerous simply because while we could forbid 256-character scala names, we can't impose such a restriction on Java (and being unable even to parse such a classfile does not sound acceptable to me). We could have an XL(Term|Type)Name to accommodate that, perhaps.

@retronym retronym merged commit ae27fe9 into scala:2.12.x May 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants