Rationalize subclasses of Name#8044
Conversation
|
02414e7 to
07aa78f
Compare
|
@retronym I have pushed the changes to the Partest that was failing, due to reflection. |
| def longString: String = nameKind + " " + decode | ||
| def debugString = { val s = decode ; if (isTypeName) s + "!" else s } | ||
|
|
||
| override final def toString: String = if (cachedString == null) new String(chrs, index, len) else cachedString |
There was a problem hiding this comment.
we could consider caching the name if we have calculated it. Its a trade off, and would depend on some benchmarking when in use
There was a problem hiding this comment.
We don't actually call Name.toString anywhere hot. When we build up full names (e.g. scala.Boolean or scala/Boolean we get the characters of the components from the underlying chrs array directly.
There was a problem hiding this comment.
I've added a commit to clean up the part of code I'm talking about here (Symbol#fullName). After that change, I've deprecated direct access to Names.chrs.
There was a problem hiding this comment.
if it is not useful, then why do we have the field at all?
There are lots of code paths that call toString, and probably don't all need to
e.g. append, prepend, string=
also the unapply of TermName and TypeName
Due to alignment, TermName_R (which doesn't cache the provided string for toString) takes up just as much space as TermName_S. The code ends up somewhat easier to read with by just encoding the difference with the a nullable field.
84e4ac0 to
e5dab49
Compare
09c257c to
22f6779
Compare
|
You could argue that, after we decake Name, the toString field may no longer be free, it may become expensive If we will lose the outer pointer (not sure if we will), the toString field retain 8 bytes |
|
We won't actually lose the outer pointer (it might in the future point to a I started to look for places where |
|
|
||
| /** Memory to store all names sequentially. */ | ||
| var chrs: Array[Char] = new Array[Char](NAME_SIZE) | ||
| private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private |
There was a problem hiding this comment.
| private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) // TODO this ought to be private | |
| private[this] var _chrs: Array[Char] = new Array[Char](NAME_SIZE) |
|
@retronym Since you are trying to reduce space consumption on the The idea is to tun The doubt I was having (which is more about looking both sides of the road) is about the |
|
|
||
| // SYNCNOTE: caller to constructor must synchronize if `synchronizeNames` is enabled | ||
| sealed abstract class TermName(index0: Int, len0: Int, val next: TermName) extends Name(index0, len0) with TermNameApi { | ||
| final class TermName(index0: Int, len0: Int, val next: TermName, cachedString: String) extends Name(index0, len0, cachedString) with TermNameApi { |
There was a problem hiding this comment.
Would there be a benefit in reducing either index0 or len0 to Short or Byte? I do not think there are any identifiers over 127 characters long.
There was a problem hiding this comment.
index0 cant be. Its an index into a blob of all of the identifiers. I might be more inclined to consider if 2G is big enough to cope with all of the names that a project could reference (It seems a strange and arbitrary limit to build in to the architecture)
What is the language limit, or the JVM limits? I don't think that we could impose a smaller limit for the length of the identifier, after all we have to be able to read a java identifier, and cope with any legal identifier in scala, and I think that is 65536, from below
from https://docs.oracle.com/javase/specs/jvms/se12/html/jvms-4.html#jvms-4.11
The length of field and method names, field and method descriptors, and other constant string values (including those referenced by ConstantValue (§4.7.2) attributes) is limited to 65535 characters by the 16-bit unsigned length item of the CONSTANT_Utf8_info structure (§4.4.7).
There was a problem hiding this comment.
Perhaps you could use 24 bits for the index0 and 8 bits for the len0, or any such arrangement, so you can fit them inside a single Int field. That would allow you to have 16777216 characters, for words of up to 255 characters each.
There was a problem hiding this comment.
That sounds dangerous simply because while we could forbid 256-character scala names, we can't impose such a restriction on Java (and being unable even to parse such a classfile does not sound acceptable to me). We could have an XL(Term|Type)Name to accommodate that, perhaps.
|
@diesalbla in my original proposal for the names reorganisation, we use just strings, and java 10 ( I think) there is already an optimisation to cope with strings that consist of 8 bit chars |
hrhino
left a comment
There was a problem hiding this comment.
I think there's probably a few more things that could be followed up on but the change itself certainly LGTM.
| @deprecated("Don't access name table contents directly.", "2.12.9") | ||
| def chrs: Array[Char] = _chrs | ||
| @deprecated("Don't access name table contents directly.", "2.12.9") | ||
| def chrs_=(cs: Array[Char]) = _chrs = cs |
There was a problem hiding this comment.
I strongly doubt it is ever used anywhere, I was just being conservative to keep the current API in a minor release.
|
|
||
| // SYNCNOTE: caller to constructor must synchronize if `synchronizeNames` is enabled | ||
| sealed abstract class TermName(index0: Int, len0: Int, val next: TermName) extends Name(index0, len0) with TermNameApi { | ||
| final class TermName(index0: Int, len0: Int, val next: TermName, cachedString: String) extends Name(index0, len0, cachedString) with TermNameApi { |
There was a problem hiding this comment.
That sounds dangerous simply because while we could forbid 256-character scala names, we can't impose such a restriction on Java (and being unable even to parse such a classfile does not sound acceptable to me). We could have an XL(Term|Type)Name to accommodate that, perhaps.
Due to alignment, TermName_R (which doesn't cache the provided string
for toString) takes up just as much space as TermName_S.
The code ends up somewhat easier to read with by just encoding the
difference with the a nullable field.