
Nested Future blocks do not run in parallel in Scala 2.13.x #12089

@lihaoyi


reproduction steps

In the code below, we can see that in Scala 2.13.2, the three nested Future blocks run sequentially, whereas in Scala 2.12.12 they run in parallel. I would expect them to run in parallel in both cases, as I am spawning a small number of Futures and have 16 cores on this machine that should be able to run them in parallel.

$ sbt '++2.13.2!; console'
Welcome to Scala 2.13.2 (OpenJDK 64-Bit Server VM, Java 1.8.0_252).
Type in expressions for evaluation. Or try :help.

scala> import scala.concurrent._, ExecutionContext.Implicits._, duration.Duration.Inf

scala> def slow(key: String) = Future{ println(s"$key start"); Thread.sleep(1000); println(s"$key end"); key }

scala> def runAsyncSerial(): Future[Seq[String]] = slow("A").flatMap { a => Future.sequence(Seq(slow("B"), slow("C"), slow("D"))) }

scala> Await.result(runAsyncSerial(), Inf)
A start
A end
D start
D end
C start
C end
B start
B end
val res0: Seq[String] = List(B, C, D)

$ sbt '++2.12.12!; console'
Welcome to Scala 2.12.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_252).

scala> import scala.concurrent._, ExecutionContext.Implicits._, duration.Duration.Inf

scala> def slow(key: String) = Future{ println(s"$key start"); Thread.sleep(1000); println(s"$key end"); key }

scala> def runAsyncSerial(): Future[Seq[String]] = slow("A").flatMap { a => Future.sequence(Seq(slow("B"), slow("C"), slow("D"))) }

scala> Await.result(runAsyncSerial(), Inf)
A start
A end
B start
C start
D start
B end
C end
D end
res0: Seq[String] = List(B, C, D)

AFAICT this is a minimal repro; any of (A) wrapping the Thread.sleep in blocking(), (B) using an explicit thread-pool execution context, or (C) removing the flatMap wrapper seems to make it run in parallel correctly in 2.13.2. This applies to all 2.13.x versions.
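For reference, workarounds (A) and (B) could look roughly like the sketch below. The names `slowBlocking`, `nested`, `runWithBlocking` and `runWithPool`, and the pool size of 4, are made up for illustration and are not part of the repro above; the only change to the original code is the `blocking` wrapper in (A) and the explicitly passed `ExecutionContext` in (B).

```scala
import java.util.concurrent.Executors
import scala.concurrent._
import scala.concurrent.duration.Duration

object NestedFutureWorkarounds {
  // Same as `slow` in the repro, but taking the ExecutionContext as a parameter.
  def slow(key: String)(implicit ec: ExecutionContext): Future[String] = Future {
    println(s"$key start"); Thread.sleep(1000); println(s"$key end"); key
  }

  // Workaround (A): mark the sleep as blocking (hypothetical helper name).
  def slowBlocking(key: String)(implicit ec: ExecutionContext): Future[String] = Future {
    println(s"$key start")
    blocking { Thread.sleep(1000) }
    println(s"$key end")
    key
  }

  // The nested shape from the repro: B, C and D are spawned inside A's flatMap.
  def nested(task: String => Future[String])(implicit ec: ExecutionContext): Future[Seq[String]] =
    task("A").flatMap { _ => Future.sequence(Seq(task("B"), task("C"), task("D"))) }

  // (A) keeps the global ExecutionContext and relies on `blocking`.
  def runWithBlocking(): Seq[String] = {
    import ExecutionContext.Implicits.global
    Await.result(nested(key => slowBlocking(key)), Duration.Inf)
  }

  // (B) uses an explicit fixed-size thread pool (size 4 is an arbitrary assumption).
  def runWithPool(): Seq[String] = {
    val pool = Executors.newFixedThreadPool(4)
    try {
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
      Await.result(nested(key => slow(key)), Duration.Inf)
    } finally pool.shutdown() // pool threads are non-daemon, so shut it down
  }

  def main(args: Array[String]): Unit = {
    println(runWithBlocking())
    println(runWithPool())
  }
}
```

Both variants return the same `Seq` as the repro; whether B, C and D actually overlap is visible from the interleaving of the start/end lines.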

OTOH -Ydelambdafy:inline or replacing the flatMap with map + flatten seems to make no difference. Neither does changing the JVM version from 8 to 11, or swapping between sbt console and Ammonite, or moving it from the REPL into a .sc script or full .scala file in a Mill project.

In the above example I'm spawning a small number of Futures, but in the case where I hit this in the wild I was spawning hundreds of Futures and having them all run serially instead of in parallel, resulting in a 16x slowdown over what I was expecting.

Notably, this slowdown applies regardless of how long the operations are: an operation that takes 1000ms when run 16x parallel would now take 16000ms, but an operation that takes 1ms when run 16x parallel would now take 16ms. Both could be equally bad depending on where it happens (e.g. in a lot of backend services, an extra 15ms on every operation would violate SLAs).

Furthermore, the Futures documentation precisely and accurately describes the behavior of the ExecutionContext.global pre-2.13, as @jimm-porch has pointed out (https://docs.scala-lang.org/overviews/core/futures.html):

The number of concurrently blocking computations can exceed the parallelism level only if each blocking call is wrapped inside a blocking call (more on that below).

Last but not least, you must remember that the ForkJoinPool is not designed for long lasting blocking operations.

This description is no longer accurate as of 2.13. From what I have gathered in the discussion on this thread, the 2.13 behavior is better described as:

The number of any concurrent computations can exceed one only if each call is wrapped in a blocking call

Last but not least, you must remember that the ForkJoinPool is not designed for CPU-bound operations.
