Skip to content

List optimizations for List.from, ::: and prependedAll#8601

Merged
szeiger merged 1 commit intoscala:2.13.xfrom
joshlemer:colon-colon-colon-opt
Apr 1, 2020
Merged

List optimizations for List.from, ::: and prependedAll#8601
szeiger merged 1 commit intoscala:2.13.xfrom
joshlemer:colon-colon-colon-opt

Conversation

@joshlemer
Copy link
Contributor

@joshlemer joshlemer commented Dec 18, 2019

::::

  • we avoid creating intermediate ListBuffer, and list.iterator objects, and also avoid doing extra work in ListBuffer's ++= method such as ensureUnaliased() calls and this.last0 = ...; this.size += ...; updates etc.

prependedAll:

  • same optimizations as above, but with the added bonus that currently prependedAll will also copy all of this since the super.prependedAll implementation is:
// StrictOptimizedSeqOps...
  override def prependedAll[B >: A](prefix: IterableOnce[B]): CC[B] = {
    val b = iterableFactory.newBuilder[B]
    b ++= prefix
    b ++= this
    b.result()
  }

This PR avoids that extra copying by doing a similar approach to the ::: algorithm.

Also note: This is the method that underlies list concatenation (++/concat/appendedAll)

Benchmarks

code:

  @Param(Array("0", "1", "10", "100", "1000", "10000"))
  var size: Int = _

  var prefix: List[Int] = Nil
  var prefixVector: Vector[Int] = Vector()
  var suffix: List[Int] = Nil

  @Setup(Level.Trial) def initKeys(): Unit = {
    prefix = List.fill(size)(0)
    prefixVector = Vector.fill(size)(0)
    suffix = List.fill(size)(0)
  }

  @Benchmark def colonColonColon(bh: Blackhole): Unit = {
    bh.consume(prefix ::: suffix)
  }
  @Benchmark def prependedAll(bh: Blackhole): Unit = {
    bh.consume(suffix.prependedAll(prefixVector))
  }

this branch:

[info] Benchmark                      (size)  Mode  Cnt      Score       Error  Units
[info] ListBenchmark.colonColonColon       0  avgt   20      3.210 ±     0.288  ns/op
[info] ListBenchmark.colonColonColon       1  avgt   20      7.205 ±     0.630  ns/op
[info] ListBenchmark.colonColonColon      10  avgt   20     51.701 ±     7.002  ns/op
[info] ListBenchmark.colonColonColon     100  avgt   20    552.057 ±    83.209  ns/op
[info] ListBenchmark.colonColonColon    1000  avgt   20   8129.771 ±  5933.402  ns/op
[info] ListBenchmark.colonColonColon   10000  avgt   20  54518.199 ± 15867.176  ns/op
[info] ListBenchmark.prependedAll          0  avgt   20      3.447 ±     0.144  ns/op
[info] ListBenchmark.prependedAll          1  avgt   20     39.401 ±    24.351  ns/op
[info] ListBenchmark.prependedAll         10  avgt   20     76.843 ±    11.932  ns/op
[info] ListBenchmark.prependedAll        100  avgt   20    478.878 ±    12.370  ns/op
[info] ListBenchmark.prependedAll       1000  avgt   20   5143.790 ±   841.873  ns/op
[info] ListBenchmark.prependedAll      10000  avgt   20  73089.342 ± 14821.023  ns/op

2.13.x:

[info] Benchmark                      (size)  Mode  Cnt       Score       Error  Units
[info] ListBenchmark.colonColonColon       0  avgt   20       2.769 ±     0.091  ns/op
[info] ListBenchmark.colonColonColon       1  avgt   20      15.965 ±     2.040  ns/op
[info] ListBenchmark.colonColonColon      10  avgt   20      93.735 ±     2.598  ns/op
[info] ListBenchmark.colonColonColon     100  avgt   20     933.721 ±    40.424  ns/op
[info] ListBenchmark.colonColonColon    1000  avgt   20    8770.294 ±   546.587  ns/op
[info] ListBenchmark.colonColonColon   10000  avgt   20  123632.790 ± 30704.689  ns/op
[info] ListBenchmark.prependedAll          0  avgt   20      23.060 ±     1.769  ns/op
[info] ListBenchmark.prependedAll          1  avgt   20      60.048 ±    29.049  ns/op
[info] ListBenchmark.prependedAll         10  avgt   20     178.528 ±    38.932  ns/op
[info] ListBenchmark.prependedAll        100  avgt   20    1107.418 ±   146.895  ns/op
[info] ListBenchmark.prependedAll       1000  avgt   20   10747.862 ±   627.832  ns/op
[info] ListBenchmark.prependedAll      10000  avgt   20  107582.601 ±  9426.232  ns/op

TODO

  • Write more property/unit tests for ::: and prependedAll

  • Benchmark more thoroughly with more forks and iterations and

@scala-jenkins scala-jenkins added this to the 2.13.2 milestone Dec 18, 2019
@joshlemer joshlemer force-pushed the colon-colon-colon-opt branch 2 times, most recently from 8602022 to 186739d Compare December 18, 2019 20:33
@joshlemer joshlemer marked this pull request as ready for review December 18, 2019 21:07
@diesalbla diesalbla added the library:collections PRs involving changes to the standard collection library label Dec 19, 2019
@joshlemer joshlemer force-pushed the colon-colon-colon-opt branch 3 times, most recently from e0fb2a3 to 15f729b Compare January 1, 2020 04:52
@joshlemer
Copy link
Contributor Author

joshlemer commented Jan 1, 2020

I've pushed an optimization of List.from(IterableOnce) which results in like 3-4x speedup!

This Branch

[info] Benchmark                 (size)  Mode  Cnt     Score     Error  Units
[info] ListBenchmark.fromVector       0  avgt   20     2.518 ±   0.021  ns/op
[info] ListBenchmark.fromVector       1  avgt   20    14.544 ±   0.461  ns/op
[info] ListBenchmark.fromVector       2  avgt   20    18.685 ±   1.176  ns/op
[info] ListBenchmark.fromVector       3  avgt   20    22.879 ±   0.678  ns/op
[info] ListBenchmark.fromVector       4  avgt   20    26.933 ±   0.335  ns/op
[info] ListBenchmark.fromVector       5  avgt   20    33.303 ±   1.208  ns/op
[info] ListBenchmark.fromVector      10  avgt   20    53.243 ±   1.774  ns/op
[info] ListBenchmark.fromVector     100  avgt   20   375.164 ±  10.294  ns/op
[info] ListBenchmark.fromVector    1000  avgt   20  4026.165 ± 301.378  ns/op


2.13.x

[info] Benchmark                 (size)  Mode  Cnt      Score     Error  Units
[info] ListBenchmark.fromVector       0  avgt   20      2.572 ±   0.119  ns/op
[info] ListBenchmark.fromVector       1  avgt   20     32.201 ±   0.308  ns/op
[info] ListBenchmark.fromVector       2  avgt   20     55.803 ±   6.275  ns/op
[info] ListBenchmark.fromVector       3  avgt   20     71.952 ±   3.001  ns/op
[info] ListBenchmark.fromVector       4  avgt   20     90.953 ±   1.730  ns/op
[info] ListBenchmark.fromVector       5  avgt   20    110.080 ±   1.715  ns/op
[info] ListBenchmark.fromVector      10  avgt   20    219.745 ±   8.785  ns/op
[info] ListBenchmark.fromVector     100  avgt   20   1930.478 ±  35.367  ns/op
[info] ListBenchmark.fromVector    1000  avgt   20  19149.557 ± 137.165  ns/op

@joshlemer joshlemer changed the title List optimizations for ::: and prependedAll List optimizations for List.from, ::: and prependedAll Jan 1, 2020
@diesalbla
Copy link
Contributor

For those benchmarks, could you also show the gc.alloc.rate results?

@joshlemer
Copy link
Contributor Author

@diesalbla results were too big to post inline, you can see them here though: https://gist.github.com/joshlemer/63ea43be7cdd57c7b5cf605b5d8eebdb

@diesalbla
Copy link
Contributor

thanks. The results indicate no changes for the fromVector, 16 fewer bytes for colonColonColon, and half allocations for prependedAll

@joshlemer
Copy link
Contributor Author

@diesalbla yeah that's right. That's kinda confusing though, why there aren't any savings for fromVector considering that the new implementation saves on the allocation of the ListBuffer object...

@NthPortal
Copy link
Contributor

NthPortal commented Jan 2, 2020

what's the reason for the huge speedup vs ListBuffer.from(...).toList?

I feel like it would be better to fix ListBuffer#addAll than to add a custom List.from. The only thing I can guess is the performance hit is that ListBuffer#addAll writes to the fields in each iteration, rather than just at the end. Do you want to try keeping the building in local variables and just storing them at the end, and see if that evens up the performance @joshlemer?


@Test def colonColonColon(): Unit = {

assertEquals(Nil, Nil ::: Nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be worth adding some assertNonAllocating on the appropriate cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this method, what's the import? Or are you saying I could implement it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misremembered the name - its just nonAllocating

see HashSetTest
your test class needs to extend scala.tools.testing.AllocationTest
then you can call

  def nonAllocating[T: Manifest](fn: => T)(implicit execution: AllocationExecution = AllocationExecution()): T 

  def onlyAllocates[T: Manifest](size: Int)(fn: => T)(implicit execution: AllocationExecution = AllocationExecution()): T 

so you can get tests like this (in HashSetTest)


  @Test
  def nonAllocatingEqualsNotShared(): Unit = {
    val base = generate()
    val notShared = generate()

    assertTrue(nonAllocating {
      base == notShared
    })
    assertTrue(nonAllocating {
      notShared == base
    })
  }

@joshlemer
Copy link
Contributor Author

what's the reason for the huge speedup vs ListBuffer.from(...).toList?

I feel like it would be better to fix ListBuffer#addAll than to add a custom List.from. The only thing I can guess is the performance hit is that ListBuffer#addAll writes to the fields in each iteration, rather than just at the end. Do you want to try keeping the building in local variables and just storing them at the end, and see if that evens up the performance @joshlemer?

@NthPortal this seems like a reasonable suggestion 😄 . Might not get to that for a few days though.

@joshlemer
Copy link
Contributor Author

@NthPortal I tried this implementation of ListBuffer#addAll and it only moved the needle a couple of percent:

  override final def addAll(xs: IterableOnce[A]): this.type = {
    val it = xs.iterator
    if (it.hasNext) {
      ensureUnaliased()
      val last1 = new ::[A](it.next(), Nil)
      var _last0 = last0
      var _len = len + 1
      if (_len == 1) first = last1 else _last0.next = last1
      _last0 = last1
      while (it.hasNext) {
        val last1 = new ::[A](it.next(), Nil)
        _last0.next = last1
        _last0 = last1
        _len += 1
      }
      last0 = _last0
      len = _len
    }
    this
  }

@NthPortal
Copy link
Contributor

hmm, oh well. was worth a shot I guess. I don't have any other ideas as to why it's so much worse

@SethTisue SethTisue modified the milestones: 2.13.2, 2.13.3 Feb 6, 2020
@szeiger
Copy link
Contributor

szeiger commented Feb 19, 2020

It looks like this is ready to merge. Is there anything you still want to change? In case of List.from I would go with the simpler version because there's no conclusive difference in the benchmarks.

@joshlemer joshlemer force-pushed the colon-colon-colon-opt branch 2 times, most recently from eea23af to 2169dd6 Compare March 17, 2020 17:43
Optimize List#prependedAll
Optimize List.from
Property tests for List.from, ::: and prependedAll
@joshlemer joshlemer force-pushed the colon-colon-colon-opt branch from 2169dd6 to 55821d5 Compare March 20, 2020 00:52
@joshlemer
Copy link
Contributor Author

@SethTisue This would be a really excellent speedup to include in 2.13.2 if possible

@SethTisue SethTisue modified the milestones: 2.13.3, 2.13.2 Mar 20, 2020
@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Mar 20, 2020
@SethTisue
Copy link
Member

Stefan's out of commission at the moment, but it seems plausible he'll have time for a final review/merge here in time for 2.13.2

@szeiger szeiger merged commit ca30256 into scala:2.13.x Apr 1, 2020
@joshlemer joshlemer deleted the colon-colon-colon-opt branch April 1, 2020 16:54
hamzaremmal pushed a commit to hamzaremmal/scala3 that referenced this pull request May 2, 2025
List optimizations for List.from, ::: and prependedAll
hamzaremmal pushed a commit to scala/scala3 that referenced this pull request May 7, 2025
List optimizations for List.from, ::: and prependedAll
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants