Fix VerifyError for by-name default args in object super calls#24983
Fix VerifyError for by-name default args in object super calls#24983tanishiking wants to merge 1 commit intoscala:mainfrom
Conversation
|
We might want to inline when the body has a reference to |
0e60141 to
e91399f
Compare
e91399f to
9020a4f
Compare
Nevermind, it doesn't matter that the body contains a reference to object Baz extends {
def defaultValue$1: Baz = Baz.E1
...
new Foo[Baz](Baz$$superArg$1()(defaultValue$1, arg1$1), ...)
} // ... |
| import ast.Trees.* | ||
| import core.NameKinds.SuperArgName | ||
|
|
||
| import core.Decorators.* |
There was a problem hiding this comment.
unrelated, just removed unused import
|
@liufengyun Would you mind reviewing this when you have a chance? I believe initialization is the area you specialize in. |
| * which generates invalid bytecode. | ||
| */ | ||
| private def isSyntheticByNameMethod(sym: Symbol, constr: Symbol): Boolean = | ||
| sym.is(Synthetic) && sym.is(Method) && sym.info.isInstanceOf[ExprType] && sym.owner == constr |
There was a problem hiding this comment.
Minor: "by-name method" can be misleading, what about "parameterless method"?
There was a problem hiding this comment.
In my opinion, Synthetic is a weak contract to check: what is special about Synthetic?
It would be good to check a strong contract that would make it easy to reason about the semantic correctness of the transform.
A code example here to clearly document the contract would be very helpful for long-term maintainability.
There was a problem hiding this comment.
In my opinion, Synthetic is a weak contract to check: what is special about Synthetic?
Ah, is(Synthetic) was just a safe guard (because I knew such parameterless methods are typically generated by LiftToDefs), but the essential constraints are:
- Owner is a constructor
- It's a parameterless method
(Actually, all methods whose parameter is constructor should be a problem, but not sure it might happen).
class Foo(x: => Int)
object Bar extends Foo({
def helper: Int = 42
helper
})
// becomes: which is safe without our change
object Bar extends Foo(Bar$superArg$1) {
synthetic def Bar$superArg$1: Int = {
def helper: Int = 42
helper
}
}Added some concrete example code in comment, and try removed is(Synthetic) check, let's see if it's not a problem.
There was a problem hiding this comment.
Thank you for the clarification @tanishiking , the PR is already in very good shape 👍
One thing that I worry about (related to the original question) is how we can be sure that the inlining does not change semantics. In particular, if we inline an expression into a by-name parameter, then the expression could be executed multiple times instead of once, which is only safe if the expression does not produce any observable side effects.
The document for LiftToDefs mentions the purity aspect:
scala3/compiler/src/dotty/tools/dotc/typer/EtaExpansion.scala
Lines 218 to 222 in 40075f7
The inlining undos the lifting, thus there needs some semantic justification to make the inlining safe.
There was a problem hiding this comment.
Right, to make sure inlining safe, I added an additional check: https://github.com/scala/scala3/pull/24983/changes#diff-707d4763f534decc252bdfd4d997af123dc4a9b8cb5c4b62a9add1514ffee594R204-R210
Now we inline only when the Ident (reference to inlinable parameterless method) is the entire body of a parameterless method. For example,
class Foo[T](x: => T, y: Int = 1, z: Int = 2)
object Bar extends Foo[MyValue](Bar.value, z = 3)
// After LiftToDefs
object Bar extends {
def defaultValue$1: MyValue = Bar.value
val arg1$1: Int = Foo.<init>$default$2
} with Foo[MyValue](() => defaultValue$1, arg1$1, 3)
// we can inline defaultValue$1 because the `Ident(defaultValue$1)` is the entire body of closure `() => defaultValue$1`.This restriction should be enough for our case, and we won't break the semantics when the body is more complex and evaluation order matters. What do you think?
There was a problem hiding this comment.
Thank you @tanishiking , the latest change looks great, I can't imagine one can do better 👍 All looks good to me.
/cc: @sjrd @smarter @odersky Please be notified about this change and comment if you have any concerns.
19c87f4 to
e7d37d7
Compare
Fixes scala#24201 When an object with - A super class with a by-name parameter - A by-name argument that gets lifted to a method - Named arguments (a Block in the super call) a `VerifyError` occurs at runtime because compiler generates a static method that incorrectly tries to access `this` before initialization. ```scala abstract class Foo[T](defaultValue: => T, arg1: Int = 1, arg2: Int = 2): def getValue: T = defaultValue enum Baz: case E1, E2 object Baz extends Foo[Baz](Baz.E1, arg2 = 2) @main def test = println(Baz.getValue) ``` The `Baz` in above Scala code, and it's compiled to the following JVM bytecode: ``` private Baz$(); Code: // get default argument for `arg1` and store_1 0: getstatic scala#34 // Field Foo$.MODULE$:LFoo$; 3: invokevirtual scala#38 // Method Foo$.$lessinit$greater$default$2:()I 6: istore_1 7: aload_0 // push uninitialized `this` for Foo.<init> // Trying to call this.defaultValue$1() but `this` is uninitialized! 8: aload_0 // ← aload_0 on uninitialized this 9: invokespecial scala#42 // Method defaultValue$1:()LBaz; ← INVALID! 12: checkcast scala#44 // class scala/Function0 15: iload_1 16: invokestatic scala#48 // Method Baz$superArg$1:(Lscala/Function0;I)Lscala/Function0; 19: iload_1 20: iconst_2 21: invokespecial scala#51 // Method Foo."<init>":(Lscala/Function0;II)V 24: return ``` The JVM verifier rejects this because `aload_0` at offset 8 loads an uninitialized `this` reference, and `invokespecial` at offset 9 tries to call an instance method on it before the super constructor completes. **Problem** After Typer, the Baz constructor looks like this: ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 val arg1$1: Int = Foo.<init>$default$2[Baz] new Foo[Baz](defaultValue$1, arg1$1, arg2 = 2) } { ... } ``` And, after `ElimByName` + `HoistSuperArgs`: `HoistSuperArgs` creates a static helper method that evaluates complex super call arguments outside the constructor context. ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // still an instance method ... new Foo[Baz](Baz$$superArg$1()(defaultValue$1, arg1$1), ...) // ^ passes this.defaultValue$1 → VerifyError } { ... private <static> def Baz$$superArg$1()(defaultValue$1: => Baz, ...): () ?=> Baz = () => defaultValue$1 } ``` So it creates: private <static> def Baz$$superArg$1()(defaultValue$1: => Baz, arg1$1: Int): () ?=> Baz = () => defaultValue$1 And the super call becomes: new Foo[Baz](Baz$$superArg$1()(defaultValue$1, arg1$1), arg1$1, 2) ^^^^^^^^^^^^^^^ // This is a REFERENCE to the instance method! The Tree ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // lifted by-name method (ExprType) val arg1$1: Int = Foo.<init>$default$2[Baz] new Foo[Baz](defaultValue$1, arg1$1, arg2 = 2) } { ... } ``` ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // still an instance method ... new Foo[Baz](Baz$$superArg$1()(defaultValue$1, arg1$1), ...) // ^ passes this.defaultValue$1 → VerifyError } { ... private <static> def Baz$$superArg$1()(defaultValue$1: => Baz, ...): () ?=> Baz = () => defaultValue$1 } ``` However, to pass `defaultValue$1` as an argument to this static method, the compiler generates `this.defaultValue$1()`, which requires `this` to be initialized. So even though the static method itself doesn't use `this`, the call site access `this` before initializatin. --- This commit fixes the invalid code generation by inlining `defaultValue$1`'s body directly into the static method, and now we don't need to reference `this` at the call site. ```scala object Baz extends { // defaultValue$1 removed ... new Foo[Baz](Baz$$superArg$1()(arg1$1), ...) } { ... private <static> def Baz$$superArg$1()(arg1$1: Int): () ?=> Baz = () => Baz.E1 // inlined body, no reference to this } ``` In HoistSuperArgs, detect lifted by-name methods (synthetic methods with ExprType info) in the super call Block, store their bodies, and inline references to them in the hoisted static method. This avoids passing instance method references to static methods during super call evaluation.
e7d37d7 to
c7f15bf
Compare
| /** Check if symbol is a parameterless method owned by the constructor. | ||
| * | ||
| * This is used to identify methods created by LiftToDefs for by-name arguments. | ||
| * These methods need to be inlined to avoid VerifyError on JVM. |
There was a problem hiding this comment.
| * These methods need to be inlined to avoid VerifyError on JVM. | |
| * These methods need to be inlined to avoid VerifyError on the JVM | |
| * or IR checking errors with Scala.js. |
| * We inline because $anonfun's body is exactly `Ident(defaultValue$1)`. | ||
| */ | ||
| private def isParameterlessMethod(sym: Symbol, constr: Symbol): Boolean = | ||
| sym.owner == constr && sym.is(Method) && sym.info.isInstanceOf[ExprType] |
There was a problem hiding this comment.
This is probably going to match too many things. In particular, it could identify a local parameterless def used several times in the constructor:
class Foo {
val x = {
def make: Int = ...
make + make
}
}We don't want such a def to be inlined. We need to restrict this test to defaultValue$n methods somehow, I believe.
There was a problem hiding this comment.
@sjrd It seems you missed the comments above the code: isParameterlessMethod is not used to decide inlining or not, but only to assist inlining.
There was a problem hiding this comment.
IIUC it's the criteria that decides whether to put a method inside inlinableMethods. And being in inlinableMethods is the criteria that decides whether to be inlined at case id: Ident if inlinableMethods.contains(id.symbol) =>.
There was a problem hiding this comment.
And being in inlinableMethods is the criteria that decides whether to be inlined at case id: Ident if inlinableMethods.contains(id.symbol) =>.
The criteria is actually stronger than that: the inlining site must be a method whose body is just the parameterless method.
@tanishiking Maybe rename inlinableMethods to parameterlessMethods to avoid confusion.
There was a problem hiding this comment.
The criteria is actually stronger than that: the inlining site must be a method whose body is just the parameterless method.
You could still have several such call sites, couldn't you?
There was a problem hiding this comment.
Hmm, I started to feel like it might be overly complicated to handle this case... As @sjrd mentioned, I'm not 100% confident this won't inline other call sites than the i24201-specific case (and adding more conditions doesn't seem right...).
In hindsight, what we're doing is basically undoing the lifting arguments by Typer(LiftToDefs). It might be simpler to just not lift by-name parameter arguments (at constructor site) in the first place there, instead of handling in HoistSuperArgs 🤔 (they're converted to () => x in ElimByName anyway). Let me consider how we can make this simpler.
Anyway, thank you so much for reviewing! @liufengyun
|
Closing for #25157 |
…esh syms fix scala#24201 This change also fixes scala#18123 (not fully understand why) **Background** `LiftToDefs` lifts by-name arguments to synthetic `def`s so that a method argument would not be duplicated both as the real argument and as an argument to a default getter. see scala#2939 A transformation for by-name semantics is done later in `ElimByName`, which turns a by-name actual argument into a closure. **Problem** However, lifting by-name arguments can be problematic in a specific scenario. scala#24201 For example, in `extends Foo[Baz](Baz.E1, arg2 = 3)`: ```scala abstract class Foo[T](value: => T, arg1: Int = 1, arg2: Int = 2) enum Baz { case E1 } object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` `LiftToDefs` lifts the by-name argument `Baz.E1` to the instance method like: ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // instance method val arg1$1: Int = Foo.$lessinit$greater$default$2[Baz] new Foo[Baz](defaultValue$1, arg1$1, arg2 = 3) } ``` where `defaultValue$1` is a synthesized instance method owned by the constructor. This is not valid code, because the constructor super call needs to access uninitialized `this` to call `defaultValue$1`, which results in a `VerifyError` during JVM bytecode verification. **Why previous patches did not work** - scala#24983 tried to undo lifting in a later phase (`HoistSuperArgs`) - scala#25157 skips lifting when the owner is a constructor but neither worked well, because both essentially cancel lifting in the specific code shape (when inside a constructor super call). However, cancelling lifting reintroduces the problem fixed by scala#3839 for those code shapes. **Solution** This commit fixes scala#24201 without breaking scala#2939 by removing `LiftToDefs` entirely. Instead, we avoid the duplicated symbol problem by copying by-name parameter arguments with fresh symbols. This removes the constructor-local synthetic-method path entirely and keeps the code as-is. (This is not a problem regarding evaluation order, because `ElimByName` translates `Baz.E1` into the closure `() => Baz.E1`.) ```scala object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` The default-getter handling is moved to the point where duplication actually occurs (`Applications.spliceMeth`). When `spliceMeth` rebuilds a default-getter call, it now freshens only by-name arguments. For example, the example code below: ```scala // i7477.scala def spawn(f: => Unit)(naming: Int = 4): Unit = ??? def test(): Unit = spawn { val x: Int = 5 x }() ``` previously, it is transformed into: ```scala def f$1: Unit = { val x: Int = 5 { x; () } } this.spawn(f$1)(this.spawn$default$2(f$1)) ``` now, the same tree is transformed into: ```scala this.spawn({ val x$1: Int = 5 { x$1 () }} )(this.spawn$default$2({ val x$2: Int = 5 { x$2 () } }) ) ```
…esh syms fix scala#24201 This change also fixes scala#18123 (not fully understand why) **Background** `LiftToDefs` lifts by-name arguments to synthetic `def`s so that a method argument would not be duplicated both as the real argument and as an argument to a default getter. see scala#2939 A transformation for by-name semantics is done later in `ElimByName`, which turns a by-name actual argument into a closure. **Problem** However, lifting by-name arguments can be problematic in a specific scenario. scala#24201 For example, in `extends Foo[Baz](Baz.E1, arg2 = 3)`: ```scala abstract class Foo[T](value: => T, arg1: Int = 1, arg2: Int = 2) enum Baz { case E1 } object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` `LiftToDefs` lifts the by-name argument `Baz.E1` to the instance method like: ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // instance method val arg1$1: Int = Foo.$lessinit$greater$default$2[Baz] new Foo[Baz](defaultValue$1, arg1$1, arg2 = 3) } ``` where `defaultValue$1` is a synthesized instance method owned by the constructor. This is not valid code, because the constructor super call needs to access uninitialized `this` to call `defaultValue$1`, which results in a `VerifyError` during JVM bytecode verification. **Why previous patches did not work** - scala#24983 tried to undo lifting in a later phase (`HoistSuperArgs`) - scala#25157 skips lifting when the owner is a constructor but neither worked well, because both essentially cancel lifting in the specific code shape (when inside a constructor super call). However, cancelling lifting reintroduces the problem fixed by scala#3839 for those code shapes. **Solution** This commit fixes scala#24201 without breaking scala#2939 by removing `LiftToDefs` entirely. Instead, we avoid the duplicated symbol problem by copying by-name parameter arguments with fresh symbols. This removes the constructor-local synthetic-method path entirely and keeps the code as-is. (This is not a problem regarding evaluation order, because `ElimByName` translates `Baz.E1` into the closure `() => Baz.E1`.) ```scala object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` The default-getter handling is moved to the point where duplication actually occurs (`Applications.spliceMeth`). When `spliceMeth` rebuilds a default-getter call, it now freshens only by-name arguments. For example, the example code below: ```scala // i7477.scala def spawn(f: => Unit)(naming: Int = 4): Unit = ??? def test(): Unit = spawn { val x: Int = 5 x }() ``` previously, it is transformed into: ```scala def f$1: Unit = { val x: Int = 5 { x; () } } this.spawn(f$1)(this.spawn$default$2(f$1)) ``` now, the same tree is transformed into: ```scala this.spawn({ val x$1: Int = 5 { x$1 () }} )(this.spawn$default$2({ val x$2: Int = 5 { x$2 () } }) ) ```
…esh syms fix scala#24201 This change also fixes scala#18123 (not fully understand why) **Background** `LiftToDefs` lifts by-name arguments to synthetic `def`s so that a method argument would not be duplicated both as the real argument and as an argument to a default getter. see scala#2939 A transformation for by-name semantics is done later in `ElimByName`, which turns a by-name actual argument into a closure. **Problem** However, lifting by-name arguments can be problematic in a specific scenario. scala#24201 For example, in `extends Foo[Baz](Baz.E1, arg2 = 3)`: ```scala abstract class Foo[T](value: => T, arg1: Int = 1, arg2: Int = 2) enum Baz { case E1 } object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` `LiftToDefs` lifts the by-name argument `Baz.E1` to the instance method like: ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // instance method val arg1$1: Int = Foo.$lessinit$greater$default$2[Baz] new Foo[Baz](defaultValue$1, arg1$1, arg2 = 3) } ``` where `defaultValue$1` is a synthesized instance method owned by the constructor. This is not valid code, because the constructor super call needs to access uninitialized `this` to call `defaultValue$1`, which results in a `VerifyError` during JVM bytecode verification. **Why previous patches did not work** - scala#24983 tried to undo lifting in a later phase (`HoistSuperArgs`) - scala#25157 skips lifting when the owner is a constructor but neither worked well, because both essentially cancel lifting in the specific code shape (when inside a constructor super call). However, cancelling lifting reintroduces the problem fixed by scala#3839 for those code shapes. **Solution** This commit fixes scala#24201 without breaking scala#2939 by removing `LiftToDefs` entirely. Instead, we avoid the duplicated symbol problem by copying by-name parameter arguments with fresh symbols. This removes the constructor-local synthetic-method path entirely and keeps the code as-is. (This is not a problem regarding evaluation order, because `ElimByName` translates `Baz.E1` into the closure `() => Baz.E1`.) ```scala object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` The default-getter handling is moved to the point where duplication actually occurs (`Applications.spliceMeth`). When `spliceMeth` rebuilds a default-getter call, it now freshens only by-name arguments. For example, the example code below: ```scala // i7477.scala def spawn(f: => Unit)(naming: Int = 4): Unit = ??? def test(): Unit = spawn { val x: Int = 5 x }() ``` previously, it is transformed into: ```scala def f$1: Unit = { val x: Int = 5 { x; () } } this.spawn(f$1)(this.spawn$default$2(f$1)) ``` now, the same tree is transformed into: ```scala this.spawn({ val x$1: Int = 5 { x$1 () }} )(this.spawn$default$2({ val x$2: Int = 5 { x$2 () } }) ) ```
…esh syms (#25502) fix #24201 This change also fixes #18123 (not fully understand why) **Background** `LiftToDefs` lifts by-name arguments to synthetic `def`s so that a method argument would not be duplicated both as the real argument and as an argument to a default getter. see #2939 A transformation for by-name semantics is done later in `ElimByName`, which turns a by-name actual argument into a closure. **Problem** However, lifting by-name arguments can be problematic in a specific scenario. #24201 For example, in `extends Foo[Baz](Baz.E1, arg2 = 3)`: ```scala abstract class Foo[T](value: => T, arg1: Int = 1, arg2: Int = 2) enum Baz { case E1 } object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` `LiftToDefs` lifts the by-name argument `Baz.E1` to the instance method like: ```scala object Baz extends { def defaultValue$1: Baz = Baz.E1 // instance method val arg1$1: Int = Foo.$lessinit$greater$default$2[Baz] new Foo[Baz](defaultValue$1, arg1$1, arg2 = 3) } ``` where `defaultValue$1` is a synthesized instance method owned by the constructor. This is not valid code, because the constructor super call needs to access uninitialized `this` to call `defaultValue$1`, which results in a `VerifyError` during JVM bytecode verification. **Why previous patches did not work** - #24983 tried to undo lifting in a later phase (`HoistSuperArgs`) - #25157 skips lifting when the owner is a constructor but neither worked well, because both essentially cancel lifting in the specific code shape (when inside a constructor super call). However, cancelling lifting reintroduces the problem fixed by #3839 for those code shapes. **Solution** This commit fixes #24201 without breaking #2939 by removing `LiftToDefs` entirely. Instead, we avoid the duplicated symbol problem by copying by-name parameter arguments with fresh symbols. This removes the constructor-local synthetic-method path entirely and keeps the code as-is. (This is not a problem regarding evaluation order, because `ElimByName` translates `Baz.E1` into the closure `() => Baz.E1`.) ```scala object Baz extends Foo[Baz](Baz.E1, arg2 = 3) ``` The default-getter handling is moved to the point where duplication actually occurs (`Applications.spliceMeth`). When `spliceMeth` rebuilds a default-getter call, it now freshens only by-name arguments. For example, the example code below: ```scala // i7477.scala def spawn(f: => Unit)(naming: Int = 4): Unit = ??? def test(): Unit = spawn { val x: Int = 5 x }() ``` previously, it is transformed into: ```scala def f$1: Unit = { val x: Int = 5 { x; () } } this.spawn(f$1)(this.spawn$default$2(f$1)) ``` now, the same tree is transformed into: ```scala this.spawn({ val x$1: Int = 5 { x$1 () }} )(this.spawn$default$2({ val x$2: Int = 5 { x$2 () } }) ) ``` <!-- Fixes #XYZ (where XYZ is the issue number from the issue tracker) --> <!-- TODO description of the change --> <!-- Ideally should have a title like "Fix #XYZ: Short fix description" --> <!-- TODO first sign the CLA https://contribute.akka.io/cla/scala --> <!-- if the PR is still a WIP, create it as a draft PR (or convert it into one) --> ## How much have your relied on LLM-based tools in this contribution? work with GPT5.4 under supervision ## How was the solution tested? <!-- If automated tests are included, mention it. If they are not, explain why and how the solution was tested. --> ## Additional notes Maybe this could be a code size problem when the by-name parameter arg has a huge body?
Fixes #24201
An object with
cause a
VerifyErrorat runtime because compiler generates a static method that incorrectly tries to accessthisbefore initialization.The
Bazin above Scala code, and it's compiled to the following JVM bytecode:The JVM verifier rejects this because
aload_0at offset 8 loads an uninitializedthisreference, andinvokespecialat offset 9 tries to call an instance method on it before the super constructor completes.Problem
After Typer, the Baz constructor looks like this:
And, after
ElimByName+HoistSuperArgs:HoistSuperArgscreates a static helper method that evaluates complex super call arguments outside the constructor context.So it creates:
And the super call becomes:
The Tree
However, to pass
defaultValue$1as an argument to this static method, the compiler generatesthis.defaultValue$1(), which requiresthisto be initialized. So even though the static method itself doesn't usethis, the call site accessthisbefore initializatin.This commit fixes the invalid code generation by inlining
defaultValue$1's body directly into the static method, and now we don't need to referencethisat the call site.In HoistSuperArgs, detect lifted by-name methods (synthetic methods with ExprType info) in the super call Block, store their bodies, and inline references to them in the hoisted static method. This avoids passing instance method references to static methods during super call evaluation.