Skip to content

Require less stack compiling 150 field case class (with a balanced AND)#9635

Merged
dwijnand merged 1 commit intoscala:2.13.xfrom
lihaoyi:fix-12397
May 18, 2021
Merged

Require less stack compiling 150 field case class (with a balanced AND)#9635
dwijnand merged 1 commit intoscala:2.13.xfrom
lihaoyi:fix-12397

Conversation

@lihaoyi
Copy link
Contributor

@lihaoyi lihaoyi commented May 16, 2021

Fixes scala/bug#12397 by turning the long change of &&s in the synthetic def equals method:

a && b && c && d && e && f && g && h 

Which currently parses into an unbalanced depth O(n) tree as follows:

(((((((a && b) && c) && d) && e) && f) && g) && h)

into a binary tree of depth O(log n):

(((a && b) && (c && d)) && ((e && f) && (g && h)))

Tested manually by pasting the following snippet into the sbt scala interpreter:

case class Big150(_0: Int, _1: Int, _2: Int, _3: Int, _4: Int, _5: Int, _6: Int, _7: Int,
                  _8: Int, _9: Int, _10: Int, _11: Int, _12: Int, _13: Int, _14: Int,
                  _15: Int, _16: Int, _17: Int, _18: Int, _19: Int, _20: Int, _21: Int,
                  _22: Int, _23: Int, _24: Int, _25: Int, _26: Int, _27: Int, _28: Int,
                  _29: Int, _30: Int, _31: Int, _32: Int, _33: Int, _34: Int, _35: Int,
                  _36: Int, _37: Int, _38: Int, _39: Int, _40: Int, _41: Int, _42: Int,
                  _43: Int, _44: Int, _45: Int, _46: Int, _47: Int, _48: Int, _49: Int,
                  _50: Int, _51: Int, _52: Int, _53: Int, _54: Int, _55: Int, _56: Int,
                  _57: Int, _58: Int, _59: Int, _60: Int, _61: Int, _62: Int, _63: Int,
                  _64: Int, _65: Int, _66: Int, _67: Int, _68: Int, _69: Int, _70: Int,
                  _71: Int, _72: Int, _73: Int, _74: Int, _75: Int, _76: Int, _77: Int,
                  _78: Int, _79: Int, _80: Int, _81: Int, _82: Int, _83: Int, _84: Int,
                  _85: Int, _86: Int, _87: Int, _88: Int, _89: Int, _90: Int, _91: Int,
                  _92: Int, _93: Int, _94: Int, _95: Int, _96: Int, _97: Int, _98: Int,
                  _99: Int, _100: Int, _101: Int, _102: Int, _103: Int, _104: Int,
                  _105: Int, _106: Int, _107: Int, _108: Int, _109: Int, _110: Int,
                  _111: Int, _112: Int, _113: Int, _114: Int, _115: Int, _116: Int,
                  _117: Int, _118: Int, _119: Int, _120: Int, _121: Int, _122: Int,
                  _123: Int, _124: Int, _125: Int, _126: Int, _127: Int, _128: Int,
                  _129: Int, _130: Int, _131: Int, _132: Int, _133: Int, _134: Int,
                  _135: Int, _136: Int, _137: Int, _138: Int, _139: Int, _140: Int,
                  _141: Int, _142: Int, _143: Int, _144: Int, _145: Int, _146: Int,
                  _147: Int, _148: Int, _149: Int)

This semi-reliably crashes the interpreter with a StackOverflow on 2.13.x, and works without issue on this PR.

I'm not sure where the tests should go, but let me know and I'll happily paste that snippet into your test suite (or you guys could do it on my behalf when merging!)

It's not clear to me if the other generated methods suffer the same unbalanced-AST issue, but glancing over the code it seems they don't: e.g. .hashCode has a long chain of val assignments of AST depth O(1), .productElement is one big pattern match of depth O(1), etc. The fact that this seems to fix the StackOverflow without it turning up somewhere else also supports the idea that .equals is the only generated method with this issue

Seems the problematic behavior was introduced 14 years ago in 8397c7b#diff-205537ac4c08ea690ada72e398df0018dcaf2a7c4987c0d8d8df322314469578R162

@scala-jenkins scala-jenkins added this to the 2.13.7 milestone May 16, 2021
@lihaoyi
Copy link
Contributor Author

lihaoyi commented May 16, 2021

Not sure if this should live in TreeDSL.scala instead, but a quick grep seems to suggest the only place def AND is used is here.

EDIT: I moved the code over, that gets rid of one useless .reduce call, and there probably isn't any scenario where someone would want to call def AND(guards: Tree*) without this balanced-binary-AST behavior anyway

gen.mkAnd(binaryTreeAnd(before), binaryTreeAnd(after))
}

assert(guards.nonEmpty, "Need at least one tree to AND() together, not zero")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case never happens AFAICT, so better to fail loudly if it does rather than returning a rubbish AST node

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"AFAICT"? The types allow it and there's nothing rubbish about EmptyTree. Could you add that back? I'd rather that than an assert and a ???.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT = As Far As I Can Tell. I grepped for AND() in the codebase and there is only one callsite (this one), and it doesn't allow an empty Seq to be passed in.

The EmptyTree branch is basically dead code that is never exercised, so who knows if it does what someone wants if it gets inadvertently hit? Wouldn't it be better to fail loudly rather than proceeding with an untested unexercised branch that nobody has ever thought about? If we really want to put something there, why not put a literal true, since that's the monoid zero of the && operation?

That's my reasoning anyway. If you still think we should put the EmptyTree back in, I'm happy to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EmptyTree branch is basically dead code that is never exercised, so who knows if it does what someone wants if it gets inadvertently hit? Wouldn't it be better to fail loudly rather than proceeding with an untested unexercised branch that nobody has ever thought about?

It's a toss-up between returning something one could call reasonable vs loudly and clearly failing early on unexpected input. Given this is out of scope for the objective here, I'd opt for the status quo (EmptyTree).

Even better I'm not against making AND take (guard: Tree, guards: Tree*) and avoid this dilemma entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually don't mind this returning TRUE (above in the file) on the empty case, instead.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented May 16, 2021

Test failure seems to benign, looks like we just need to update some golden files

@dwijnand

This comment has been minimized.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented May 16, 2021

@dwijnand I ended up just copy-pasting it manually but it seems to have worked, validate-main is green

@joroKr21
Copy link
Member

joroKr21 commented May 16, 2021

Wouldn't this mess up the order of comparisons? If you check equalsCore you see this comment:

compare primitive fields first, slow equality checks of non-primitive fields can be skipped when primitives differ

Edit: to preserve that optimisation one would need two binary trees - one for primitives and one for other fields and then && them together.

I also wonder why canEqual comes at the end in the current scheme 🤔

def AND(guards: Tree*) = {
def binaryTreeAnd(tests: Seq[Tree]): Tree = tests match{
case List() => ???
case List(single) => single
Copy link
Contributor

@viktorklang viktorklang May 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can match on List here.

scala> def f(i: Int*) = i match {
     |   case List() => 0
     |   case List(i) => 1
     |   case is => 2
     | }
def f(i: Int*): Int

scala> f()
val res11: Int = 0

scala> f(1)
val res12: Int = 2

scala> f(2,3)
val res13: Int = 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whereas:

scala> def f(i: Int*) = i match {
     |   case Seq() => 0
     |   case Seq(i) => 1
     |   case is => 2
     | }
def f(i: Int*): Int

scala> f()
val res18: Int = 0

scala> f(1)
val res19: Int = 1

scala> f(2,3)
val res20: Int = 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note: that the type of ArraySeq.splitAt is (ArraySeq, ArraySeq))

scala> def f(ints: Int*) = ints
def f(ints: Int*): Seq[Int]

scala> f(1,2,3).splitAt(2)
val res7: (Seq[Int], Seq[Int]) = (ArraySeq(1, 2),ArraySeq(3))

scala> f(1,2,3).splitAt(0)
val res8: (Seq[Int], Seq[Int]) = (ArraySeq(),ArraySeq(1, 2, 3))

scala> f(1,2,3).splitAt(3)
val res9: (Seq[Int], Seq[Int]) = (ArraySeq(1, 2, 3),ArraySeq())

def NOT(tree: Tree) = Select(tree, Boolean_not)
def AND(guards: Tree*) = if (guards.isEmpty) EmptyTree else guards reduceLeft gen.mkAnd
def AND(guards: Tree*) = {
def binaryTreeAnd(tests: Seq[Tree]): Tree = tests match{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to require List here and recurse on Nil, a :: Nil, twoOrMore convert to List before the outer call.

Also you could have this method be:

def binaryTreeAnd(head: Tree, tail: List[Tree]): Tree

And then this method clearly never has nested EmptyTree.

The EmptyTree case can be handled once at the outer level of the input is empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the use of splitAt recursively would make List version quadratic. I don't know if ArraySeq has O(1) splitAt. If it doesn't, the current code is also quadratic.

A better approach might be to copy to an Array and recurse on indices representing the upper and lower bounds of the array which will certainly only N log N.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnynek varargs are instantiated as ArraySeq. splitAt for that class seems to be implemented by s.c.AbstractIterable, which I presume uses take + drop as implementation. I'd probably go with getting the array from the ArraySeq and work with offsets, as you suggested, as then you can avoid allocating tuples and new collections.

@lihaoyi
Copy link
Contributor Author

lihaoyi commented May 17, 2021

I removed the assert, added back the EmptyTree, and replaced the List() pattern matches with Seq()s in response to @dwijnand and @viktorklang's feedback.

@joroKr21 I don't think this would change the order of evaluation; we're re-balancing the tree of comparisons, but the comparisons should still be evaluated from left to right, and the left-to-right order of the tree is unchanged.

@johnynek Given the O(n) copying I'm doing in .splitAt, the O(n) traversal I'm doing in .size, and the O(log n) levels of the binary recursion, I'm pretty sure the current PR implementation is O(n log n)? We can always micro-optimize it a bit further, but I'm following the same style of the surrounding code, which doesn't seem particularly micro-optimized

@johnynek
Copy link
Contributor

Yeah, I made a mistake. Since the depth is log, doing O(N) and work at each stage is N log N, which I think should be fine.

I think you can optimize it down to O(N) at the cost of less clear code. Probably not worth it since N is small in virtually all cases.

@joroKr21
Copy link
Member

joroKr21 commented May 17, 2021

@joroKr21 I don't think this would change the order of evaluation; we're re-balancing the tree of comparisons, but the comparisons should still be evaluated from left to right, and the left-to-right order of the tree is unchanged.

@lihaoyi indeed, thanks for explaining 👍

I had some non-profit fun with binary trees (untested):

  • dirty imperative version:
    def AND(guards: Tree*) =
      if (guards.isEmpty) EmptyTree else {
        val trees = guards.toArray
        var n = trees.length
        while (n > 1) {
          var i, j = n % 2
          while (j < n) {
            trees(i) = gen.mkAnd(trees(j), trees(j + 1))
            i += 1; j += 2
          }
          n = i
        }
        trees.head
      }
  • elegant functional version:
    def AND(guards: Tree*) = {
      @tailrec def binaryTreeAnd(queue: List[Tree], frontier: List[Tree], flip: Boolean): Tree =
        queue match {
          case x :: y :: queue =>
            val xy = if (flip) gen.mkAnd(y, x) else gen.mkAnd(x, y)
            binaryTreeAnd(queue, xy :: frontier, flip)
          case x :: Nil =>
            binaryTreeAnd(Nil, x :: frontier, flip)
          case Nil =>
            frontier match {
              case Nil => EmptyTree
              case x :: Nil => x
              case _ => binaryTreeAnd(frontier, Nil, !flip)
            }
        }

      binaryTreeAnd(guards.toList, Nil, flip = false)
    }

@dwijnand dwijnand changed the title Fix #12397: Crazy JVM stack requirements for compiling big-ish case class? Require less stack compiling 150 field case class (with a balanced AND) May 17, 2021
@dwijnand
Copy link
Member

If there's no response feedback, this just needs a squash now.

@SethTisue
Copy link
Member

SethTisue commented May 17, 2021

compare primitive fields first, slow equality checks of non-primitive fields can be skipped when primitives differ

@joroKr21 you said your order-of-operations concerns were addressed. I just want to double check, are we sure that this optimization wasn't lost here?

@joroKr21
Copy link
Member

@joroKr21 you said your order-of-operations concerns were addressed. I just want to double check, are we sure that this optimization wasn't lost here?

Yes, left-to-right is left-to-right regardless of the branching structure.

@dwijnand
Copy link
Member

scala> ((((((prim1 && prim2) && prim3) && prim4) && ref1) && ref2) && ref3) && ref4
prim1 standing by
prim2 standing by
prim3 standing by
prim4 standing by
ref1 standing by
ref2 standing by
ref3 standing by
ref4 standing by
val res0: Boolean = true

scala> ((prim1 && prim2) && (prim3 && prim4)) && ((ref1 && ref2) && (ref3 && ref4))
prim1 standing by
prim2 standing by
prim3 standing by
prim4 standing by
ref1 standing by
ref2 standing by
ref3 standing by
ref4 standing by
val res1: Boolean = true

@SethTisue
Copy link
Member

SethTisue commented May 17, 2021

left-to-right is left-to-right regardless of the branching structure

Sorry, I didn't phrase my question clearly. I understand that the branching structure doesn't affect order of operations. What I meant to ask is: this PR retains the partitioning into primitive and not-primitive, right?

@joroKr21
Copy link
Member

What I meant to ask is: this PR retains the partitioning into primitive and not-primitive, right?

Ah yeah it's done in equalsCore before we call AND

@lihaoyi lihaoyi force-pushed the fix-12397 branch 2 times, most recently from cdc13fc to 29ae6fa Compare May 18, 2021 02:45
…hetic `def equals` method:

```scala
a && b && c && d && e && f && g && h
```

Which currently parses into an unbalanced depth O(n) tree as follows:

```scala
(((((((a && b) && c) && d) && e) && f) && g) && h)
```

into a binary tree of depth O(log n):

```scala
(((a && b) && (c && d)) && ((e && f) && (g && h)))
```

Tested manually by pasting the following snippet into the `sbt scala` interpreter:

```scala
case class Big150(_0: Int, _1: Int, _2: Int, _3: Int, _4: Int, _5: Int, _6: Int, _7: Int,
                  _8: Int, _9: Int, _10: Int, _11: Int, _12: Int, _13: Int, _14: Int,
                  _15: Int, _16: Int, _17: Int, _18: Int, _19: Int, _20: Int, _21: Int,
                  _22: Int, _23: Int, _24: Int, _25: Int, _26: Int, _27: Int, _28: Int,
                  _29: Int, _30: Int, _31: Int, _32: Int, _33: Int, _34: Int, _35: Int,
                  _36: Int, _37: Int, _38: Int, _39: Int, _40: Int, _41: Int, _42: Int,
                  _43: Int, _44: Int, _45: Int, _46: Int, _47: Int, _48: Int, _49: Int,
                  _50: Int, _51: Int, _52: Int, _53: Int, _54: Int, _55: Int, _56: Int,
                  _57: Int, _58: Int, _59: Int, _60: Int, _61: Int, _62: Int, _63: Int,
                  _64: Int, _65: Int, _66: Int, _67: Int, _68: Int, _69: Int, _70: Int,
                  _71: Int, _72: Int, _73: Int, _74: Int, _75: Int, _76: Int, _77: Int,
                  _78: Int, _79: Int, _80: Int, _81: Int, _82: Int, _83: Int, _84: Int,
                  _85: Int, _86: Int, _87: Int, _88: Int, _89: Int, _90: Int, _91: Int,
                  _92: Int, _93: Int, _94: Int, _95: Int, _96: Int, _97: Int, _98: Int,
                  _99: Int, _100: Int, _101: Int, _102: Int, _103: Int, _104: Int,
                  _105: Int, _106: Int, _107: Int, _108: Int, _109: Int, _110: Int,
                  _111: Int, _112: Int, _113: Int, _114: Int, _115: Int, _116: Int,
                  _117: Int, _118: Int, _119: Int, _120: Int, _121: Int, _122: Int,
                  _123: Int, _124: Int, _125: Int, _126: Int, _127: Int, _128: Int,
                  _129: Int, _130: Int, _131: Int, _132: Int, _133: Int, _134: Int,
                  _135: Int, _136: Int, _137: Int, _138: Int, _139: Int, _140: Int,
                  _141: Int, _142: Int, _143: Int, _144: Int, _145: Int, _146: Int,
                  _147: Int, _148: Int, _149: Int)
```

This semi-reliably crashes the interpreter with a StackOverflow on 2.13.x, and works without issue on this PR.

I'm not sure where the tests should go, but let me know and I'll happily paste that snippet into your test suite (or you guys could do it on my behalf when merging!)

It's not clear to me if the other generated methods suffer the same unbalanced-AST issue, but glancing over the code it seems they don't: e.g. `.hashCode` has a long chain of `val` assignments of AST depth O(1), `.productElement` is one big pattern match of depth O(1), etc. The fact that this seems to fix the StackOverflow without it turning up somewhere else also supports the idea that `.equals` is the only generated method with this issue

Seems the problematic behavior was introduced 14 years ago in scala@8397c7b#diff-205537ac4c08ea690ada72e398df0018dcaf2a7c4987c0d8d8df322314469578R162
@lihaoyi
Copy link
Contributor Author

lihaoyi commented May 18, 2021

@dwijnand squashed it

Copy link
Member

@dwijnand dwijnand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you.

@dwijnand dwijnand enabled auto-merge May 18, 2021 11:01
@dwijnand dwijnand merged commit 8490d37 into scala:2.13.x May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crazy JVM stack requirements for compiling big-ish case class?

7 participants