List optimizations for List.from, ::: and prependedAll by joshlemer · Pull Request #8601 · scala/scala

joshlemer · 2019-12-18T18:21:10Z

::::

we avoid creating intermediate ListBuffer, and list.iterator objects, and also avoid doing extra work in ListBuffer's ++= method such as ensureUnaliased() calls and this.last0 = ...; this.size += ...; updates etc.

prependedAll:

same optimizations as above, but with the added bonus that currently prependedAll will also copy all of this since the super.prependedAll implementation is:

// StrictOptimizedSeqOps...
  override def prependedAll[B >: A](prefix: IterableOnce[B]): CC[B] = {
    val b = iterableFactory.newBuilder[B]
    b ++= prefix
    b ++= this
    b.result()
  }

This PR avoids that extra copying by doing a similar approach to the ::: algorithm.

Also note: This is the method that underlies list concatenation (++/concat/appendedAll)

Benchmarks

code:

  @Param(Array("0", "1", "10", "100", "1000", "10000"))
  var size: Int = _

  var prefix: List[Int] = Nil
  var prefixVector: Vector[Int] = Vector()
  var suffix: List[Int] = Nil

  @Setup(Level.Trial) def initKeys(): Unit = {
    prefix = List.fill(size)(0)
    prefixVector = Vector.fill(size)(0)
    suffix = List.fill(size)(0)
  }

  @Benchmark def colonColonColon(bh: Blackhole): Unit = {
    bh.consume(prefix ::: suffix)
  }
  @Benchmark def prependedAll(bh: Blackhole): Unit = {
    bh.consume(suffix.prependedAll(prefixVector))
  }

this branch:

[info] Benchmark                      (size)  Mode  Cnt      Score       Error  Units
[info] ListBenchmark.colonColonColon       0  avgt   20      3.210 ±     0.288  ns/op
[info] ListBenchmark.colonColonColon       1  avgt   20      7.205 ±     0.630  ns/op
[info] ListBenchmark.colonColonColon      10  avgt   20     51.701 ±     7.002  ns/op
[info] ListBenchmark.colonColonColon     100  avgt   20    552.057 ±    83.209  ns/op
[info] ListBenchmark.colonColonColon    1000  avgt   20   8129.771 ±  5933.402  ns/op
[info] ListBenchmark.colonColonColon   10000  avgt   20  54518.199 ± 15867.176  ns/op
[info] ListBenchmark.prependedAll          0  avgt   20      3.447 ±     0.144  ns/op
[info] ListBenchmark.prependedAll          1  avgt   20     39.401 ±    24.351  ns/op
[info] ListBenchmark.prependedAll         10  avgt   20     76.843 ±    11.932  ns/op
[info] ListBenchmark.prependedAll        100  avgt   20    478.878 ±    12.370  ns/op
[info] ListBenchmark.prependedAll       1000  avgt   20   5143.790 ±   841.873  ns/op
[info] ListBenchmark.prependedAll      10000  avgt   20  73089.342 ± 14821.023  ns/op

2.13.x:

[info] Benchmark                      (size)  Mode  Cnt       Score       Error  Units
[info] ListBenchmark.colonColonColon       0  avgt   20       2.769 ±     0.091  ns/op
[info] ListBenchmark.colonColonColon       1  avgt   20      15.965 ±     2.040  ns/op
[info] ListBenchmark.colonColonColon      10  avgt   20      93.735 ±     2.598  ns/op
[info] ListBenchmark.colonColonColon     100  avgt   20     933.721 ±    40.424  ns/op
[info] ListBenchmark.colonColonColon    1000  avgt   20    8770.294 ±   546.587  ns/op
[info] ListBenchmark.colonColonColon   10000  avgt   20  123632.790 ± 30704.689  ns/op
[info] ListBenchmark.prependedAll          0  avgt   20      23.060 ±     1.769  ns/op
[info] ListBenchmark.prependedAll          1  avgt   20      60.048 ±    29.049  ns/op
[info] ListBenchmark.prependedAll         10  avgt   20     178.528 ±    38.932  ns/op
[info] ListBenchmark.prependedAll        100  avgt   20    1107.418 ±   146.895  ns/op
[info] ListBenchmark.prependedAll       1000  avgt   20   10747.862 ±   627.832  ns/op
[info] ListBenchmark.prependedAll      10000  avgt   20  107582.601 ±  9426.232  ns/op

TODO

Write more property/unit tests for ::: and prependedAll
Benchmark more thoroughly with more forks and iterations and

src/library/scala/collection/immutable/List.scala

joshlemer · 2020-01-01T04:54:09Z

I've pushed an optimization of List.from(IterableOnce) which results in like 3-4x speedup!

This Branch

[info] Benchmark                 (size)  Mode  Cnt     Score     Error  Units
[info] ListBenchmark.fromVector       0  avgt   20     2.518 ±   0.021  ns/op
[info] ListBenchmark.fromVector       1  avgt   20    14.544 ±   0.461  ns/op
[info] ListBenchmark.fromVector       2  avgt   20    18.685 ±   1.176  ns/op
[info] ListBenchmark.fromVector       3  avgt   20    22.879 ±   0.678  ns/op
[info] ListBenchmark.fromVector       4  avgt   20    26.933 ±   0.335  ns/op
[info] ListBenchmark.fromVector       5  avgt   20    33.303 ±   1.208  ns/op
[info] ListBenchmark.fromVector      10  avgt   20    53.243 ±   1.774  ns/op
[info] ListBenchmark.fromVector     100  avgt   20   375.164 ±  10.294  ns/op
[info] ListBenchmark.fromVector    1000  avgt   20  4026.165 ± 301.378  ns/op


2.13.x

[info] Benchmark                 (size)  Mode  Cnt      Score     Error  Units
[info] ListBenchmark.fromVector       0  avgt   20      2.572 ±   0.119  ns/op
[info] ListBenchmark.fromVector       1  avgt   20     32.201 ±   0.308  ns/op
[info] ListBenchmark.fromVector       2  avgt   20     55.803 ±   6.275  ns/op
[info] ListBenchmark.fromVector       3  avgt   20     71.952 ±   3.001  ns/op
[info] ListBenchmark.fromVector       4  avgt   20     90.953 ±   1.730  ns/op
[info] ListBenchmark.fromVector       5  avgt   20    110.080 ±   1.715  ns/op
[info] ListBenchmark.fromVector      10  avgt   20    219.745 ±   8.785  ns/op
[info] ListBenchmark.fromVector     100  avgt   20   1930.478 ±  35.367  ns/op
[info] ListBenchmark.fromVector    1000  avgt   20  19149.557 ± 137.165  ns/op

diesalbla · 2020-01-01T20:35:41Z

For those benchmarks, could you also show the gc.alloc.rate results?

joshlemer · 2020-01-01T21:37:21Z

@diesalbla results were too big to post inline, you can see them here though: https://gist.github.com/joshlemer/63ea43be7cdd57c7b5cf605b5d8eebdb

diesalbla · 2020-01-01T22:34:05Z

thanks. The results indicate no changes for the fromVector, 16 fewer bytes for colonColonColon, and half allocations for prependedAll

joshlemer · 2020-01-01T22:52:49Z

@diesalbla yeah that's right. That's kinda confusing though, why there aren't any savings for fromVector considering that the new implementation saves on the allocation of the ListBuffer object...

NthPortal · 2020-01-02T04:01:54Z

what's the reason for the huge speedup vs ListBuffer.from(...).toList?

I feel like it would be better to fix ListBuffer#addAll than to add a custom List.from. The only thing I can guess is the performance hit is that ListBuffer#addAll writes to the fields in each iteration, rather than just at the end. Do you want to try keeping the building in local variables and just storing them at the end, and see if that evens up the performance @joshlemer?

mkeskells · 2020-01-02T07:47:15Z

test/junit/scala/collection/immutable/ListTest.scala

+
+  @Test def colonColonColon(): Unit = {
+
+    assertEquals(Nil, Nil ::: Nil)


it may be worth adding some assertNonAllocating on the appropriate cases

I'm not familiar with this method, what's the import? Or are you saying I could implement it?

I misremembered the name - its just nonAllocating

see HashSetTest
your test class needs to extend scala.tools.testing.AllocationTest
then you can call

def nonAllocating[T: Manifest](fn: => T)(implicit execution: AllocationExecution = AllocationExecution()): T def onlyAllocates[T: Manifest](size: Int)(fn: => T)(implicit execution: AllocationExecution = AllocationExecution()): T

so you can get tests like this (in HashSetTest)

@Test def nonAllocatingEqualsNotShared(): Unit = { val base = generate() val notShared = generate() assertTrue(nonAllocating { base == notShared }) assertTrue(nonAllocating { notShared == base }) }

joshlemer · 2020-01-02T13:36:12Z

what's the reason for the huge speedup vs ListBuffer.from(...).toList?

I feel like it would be better to fix ListBuffer#addAll than to add a custom List.from. The only thing I can guess is the performance hit is that ListBuffer#addAll writes to the fields in each iteration, rather than just at the end. Do you want to try keeping the building in local variables and just storing them at the end, and see if that evens up the performance @joshlemer?

@NthPortal this seems like a reasonable suggestion 😄 . Might not get to that for a few days though.

joshlemer · 2020-01-02T21:09:58Z

@NthPortal I tried this implementation of ListBuffer#addAll and it only moved the needle a couple of percent:

  override final def addAll(xs: IterableOnce[A]): this.type = {
    val it = xs.iterator
    if (it.hasNext) {
      ensureUnaliased()
      val last1 = new ::[A](it.next(), Nil)
      var _last0 = last0
      var _len = len + 1
      if (_len == 1) first = last1 else _last0.next = last1
      _last0 = last1
      while (it.hasNext) {
        val last1 = new ::[A](it.next(), Nil)
        _last0.next = last1
        _last0 = last1
        _len += 1
      }
      last0 = _last0
      len = _len
    }
    this
  }

NthPortal · 2020-01-02T23:50:16Z

hmm, oh well. was worth a shot I guess. I don't have any other ideas as to why it's so much worse

src/library/scala/collection/immutable/List.scala

szeiger · 2020-02-19T13:22:11Z

It looks like this is ready to merge. Is there anything you still want to change? In case of List.from I would go with the simpler version because there's no conclusive difference in the benchmarks.

src/library/scala/collection/immutable/List.scala

Optimize List#prependedAll Optimize List.from Property tests for List.from, ::: and prependedAll

joshlemer · 2020-03-20T00:56:45Z

@SethTisue This would be a really excellent speedup to include in 2.13.2 if possible

SethTisue · 2020-03-20T01:38:15Z

Stefan's out of commission at the moment, but it seems plausible he'll have time for a final review/merge here in time for 2.13.2

List optimizations for List.from, ::: and prependedAll

scala-jenkins added this to the 2.13.2 milestone Dec 18, 2019

joshlemer force-pushed the colon-colon-colon-opt branch 2 times, most recently from 8602022 to 186739d Compare December 18, 2019 20:33

joshlemer marked this pull request as ready for review December 18, 2019 21:07

joshlemer mentioned this pull request Dec 18, 2019

[WIP] Rework of immutable.Queue to avoid copying data into the "in" list #8602

Closed

5 tasks

diesalbla added the library:collections PRs involving changes to the standard collection library label Dec 19, 2019

hrhino reviewed Dec 20, 2019

View reviewed changes

src/library/scala/collection/immutable/List.scala Outdated Show resolved Hide resolved

NthPortal reviewed Dec 25, 2019

View reviewed changes

src/library/scala/collection/immutable/List.scala Outdated Show resolved Hide resolved

joshlemer force-pushed the colon-colon-colon-opt branch 3 times, most recently from e0fb2a3 to 15f729b Compare January 1, 2020 04:52

joshlemer changed the title ~~List optimizations for ::: and prependedAll~~ List optimizations for List.from, ::: and prependedAll Jan 1, 2020

mkeskells reviewed Jan 2, 2020

View reviewed changes

szeiger reviewed Jan 3, 2020

View reviewed changes

src/library/scala/collection/immutable/List.scala Outdated Show resolved Hide resolved

lrytz assigned szeiger Jan 8, 2020

SethTisue modified the milestones: 2.13.2, 2.13.3 Feb 6, 2020

joshlemer commented Feb 25, 2020

View reviewed changes

src/library/scala/collection/immutable/List.scala Outdated Show resolved Hide resolved

joshlemer force-pushed the colon-colon-colon-opt branch 2 times, most recently from eea23af to 2169dd6 Compare March 17, 2020 17:43

Optimize List#:::

55821d5

Optimize List#prependedAll Optimize List.from Property tests for List.from, ::: and prependedAll

joshlemer force-pushed the colon-colon-colon-opt branch from 2169dd6 to 55821d5 Compare March 20, 2020 00:52

SethTisue modified the milestones: 2.13.3, 2.13.2 Mar 20, 2020

SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Mar 20, 2020

szeiger approved these changes Apr 1, 2020

View reviewed changes

szeiger merged commit ca30256 into scala:2.13.x Apr 1, 2020

joshlemer deleted the colon-colon-colon-opt branch April 1, 2020 16:54

hamzaremmal pushed a commit to hamzaremmal/scala3 that referenced this pull request May 2, 2025

Merge pull request scala/scala#8601 from joshlemer/colon-colon-colon-opt

14f8e49

List optimizations for List.from, ::: and prependedAll

hamzaremmal pushed a commit to scala/scala3 that referenced this pull request May 7, 2025

Merge pull request scala/scala#8601 from joshlemer/colon-colon-colon-opt

7efb15d

List optimizations for List.from, ::: and prependedAll


		@Test def colonColonColon(): Unit = {

		assertEquals(Nil, Nil ::: Nil)

Conversation

joshlemer commented Dec 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

TODO

Uh oh!

Uh oh!

Uh oh!

joshlemer commented Jan 1, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

diesalbla commented Jan 1, 2020

Uh oh!

joshlemer commented Jan 1, 2020

Uh oh!

diesalbla commented Jan 1, 2020

Uh oh!

joshlemer commented Jan 1, 2020

Uh oh!

NthPortal commented Jan 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mkeskells Jan 2, 2020

Choose a reason for hiding this comment

Uh oh!

joshlemer Jan 2, 2020

Choose a reason for hiding this comment

Uh oh!

mkeskells Mar 17, 2020

Choose a reason for hiding this comment

Uh oh!

joshlemer commented Jan 2, 2020

Uh oh!

joshlemer commented Jan 2, 2020

Uh oh!

NthPortal commented Jan 2, 2020

Uh oh!

Uh oh!

szeiger commented Feb 19, 2020

Uh oh!

Uh oh!

joshlemer commented Mar 20, 2020

Uh oh!

SethTisue commented Mar 20, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

joshlemer commented Dec 18, 2019 •

edited

Loading

joshlemer commented Jan 1, 2020 •

edited

Loading

NthPortal commented Jan 2, 2020 •

edited

Loading