Skip to content

Optimize classpath implementation to speed up SBT#5956

Merged
retronym merged 2 commits intoscala:2.12.xfrom
retronym:faster/findClass
Jul 7, 2017
Merged

Optimize classpath implementation to speed up SBT#5956
retronym merged 2 commits intoscala:2.12.xfrom
retronym:faster/findClass

Conversation

@retronym
Copy link
Member

@retronym retronym commented Jun 22, 2017

SBT is a heavy user of Classpath.findClass, and its performance regressed, especially for large classpaths, under the new "flat" classpath implementation. The new implementation shipped in 2.11 experimentally (-YclasspathImpl:flat), and was promoted to the default an only implementation in 2.12.

In 2.12, the performance was improved some in 0f9a704 / #5112. This commit goes further by optimizing the lookup of a given class in a JAR classpath entry, and, most importantly, by avoiding a linear scan of the classpath on each lookup by indexing which classpath entries contribute classes to each package.

I tested a 2.11 backport of this patch using https://github.com/guardian/frontend as a test case.

Results: https://gist.github.com/3c89c21c54d27e566dec57a16a4bf208

tl;dr; No significant difference in performance between the classpath implementations after this change. The baseline performance was 55s, which regressed to 79s under -YclasspathImpl:flat and then progressed back to 55s using the same flat classpath implementation as in this PR.

I've also tried to patch SBT to use Symbol.associatedFile rather than scanning the classpath, but I haven't got that change into shape yet.

Fixes scala/bug#10289

@scala-jenkins scala-jenkins added this to the 2.12.3 milestone Jun 22, 2017
@retronym retronym requested a review from lrytz June 22, 2017 07:55
@retronym retronym force-pushed the faster/findClass branch 2 times, most recently from 1b72ec3 to 0ef7185 Compare June 22, 2017 07:59
@retronym retronym changed the title Optimize classpth implementation to speed up SBT Optimize classpath implementation to speed up SBT Jun 22, 2017
@dragos
Copy link
Contributor

dragos commented Jun 22, 2017

LGTM!

Perhaps not directly related, but still in this area of the code: we need to provide a classpath implementation in our codebase and it would be very helpful to widen the visibility of some classes/methods. They don't seem to be consistently private[nsc] anyway. It's mostly the *FileEntryImpl classes, that even though are implementing a trait, are used in many places as return type or type argument.

If that has chances to be merged I'm happy to open a PR.

Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

val classEntry = findEntry(isSource = false)
val sourceEntry = findEntry(isSource = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could search for both entries in a single traversal, but it's probably not going to make a difference, as there are few aggregates for a given package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done the in the BasicClassPath of #5957

@lrytz
Copy link
Member

lrytz commented Jun 22, 2017

widen the visibility of some classes/methods

I think that should be fine.

@retronym
Copy link
Member Author

@dragos I'm open to opening up access in those cases.

I think some of the cases you pointed out can be cleaned up by referring to the public types, as in https://github.com/retronym/scala/tree/topic/refactorClassPath

@smarter
Copy link
Member

smarter commented Jun 22, 2017

I've also tried to patch SBT to use Symbol.associatedFile rather than scanning the classpath, but I haven't got that change into shape yet.

Interestingly, the dotty sbt phases do not use findClass and were already using associatedFile, though now I'm wondering if that means we're tracking something incorrectly: https://github.com/lampepfl/dotty/blob/1d0579622da87655c6d32c4e02e7907e75a4f7b5/compiler/src/dotty/tools/dotc/sbt/ExtractDependencies.scala#L89-L90

@retronym
Copy link
Member Author

retronym commented Jun 22, 2017

We should give this a spin through the community build before merging.

Interestingly, the dotty sbt phases do not use findClass and were already using associatedFile, though now I'm wondering if that means we're tracking something incorrectly:

@smarter My patch seemed to have the correct behaviour, but didn't seem to improve performance in my test as I had expected.

@smarter
Copy link
Member

smarter commented Jun 22, 2017

@smarter My patch seemed to have the correct behaviour, but didn't seem to improve performance in my test as I had expected.

So findClass is not the culprit?

@MikeSkells
Copy link

MikeSkells commented Jun 22, 2017

Hi @retronym as discussed directly, I have some work in this area. Happy to share

I have a classpath cache that can use a file watcher to detect changes

This is showing a 20% reduction in memory allocation and around 10% in CPU

I think that my chnages should work well with yours - I have not been changing the AggregateClassPath, but focussed on the other structures.
I will email you directly to discuss

raw figures

 
                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline  510        8,592.24 [-2.68% +2.96%]            8,467.01 [-2.38% +2.97%]            2,588.77 [-0.04% +0.04%]
        00_linker_cp_raw2 529        7,761.38 [-6.41% +9.11%]            7,558.16 [-6.56% +8.12%]            2,124.32 [-0.06% +0.06%]

@mkeskells
Copy link
Contributor

/rebuild

@retronym
Copy link
Member Author

So findClass is not the culprit?

The best theory is that I've made some silly error in my attempt to modify SBT and/or my attempt to measure the difference. My gut feel is still that switching to associatedFile should also restore previous performance.

@retronym
Copy link
Member Author

/synch

@retronym
Copy link
Member Author

retronym commented Jun 23, 2017

Rebased and made a minor reversion of a change of signature to quiet MiMa, and tried again to schedule a community build at https://scala-ci.typesafe.com/job/scala-2.12.x-integrate-community-build/1621/

@retronym
Copy link
Member Author

Testing this locally in the Akka SBT build has flushed out a problem.

warn]                            ^
[error] /Users/jz/code/akka/akka-remote/src/main/java/akka/remote/WireFormats.java:3088: Class com.google.protobuf.InvalidProtocolBufferException not found - continuing with a stub.
[error]     boolean hasSerializerId();
[error]      ^
[error] Unable to locate class corresponding to inner class entry for Descriptor in owner com.google.protobuf.Descriptors
[error] Unable to locate class corresponding to inner class entry for Descriptor in owner com.google.protobuf.Descriptors
[error] /Users/jz/code/akka/akka-remote/src/main/java/akka/remote/WireFormats.java:6: Class com.google.protobuf.ByteString not found - continuing with a stub.
[error] public final class WireFormats {
[error]                    ^
[error] Unable to locate class corresponding to inner class entry for Descriptor in owner com.google.protobuf.Descriptors
[error] /Users/jz/code/akka/akka-remote/src/main/java/akka/remote/WireFormats.java:2385: Class com.google.protobuf.InvalidProtocolBufferException not found - continuing with a stub.
[error]       public akka.remote.WireFormats.AcknowledgementInfo getDefaultInstanceForType() {
[error]              ^

@retronym retronym added the WIP label Jun 23, 2017
@retronym
Copy link
Member Author

Testing this locally in the Akka SBT build has flushed out a problem.

Sigh, it slipped through my fingers and I can't reproduce now. Will see how the community build fares and try to retrace my steps locally in the Akka build. Marked as WIP so we don't merge just yet.

@SethTisue SethTisue added performance the need for speed. usually compiler performance, sometimes runtime performance. and removed reviewed labels Jun 27, 2017
@retronym
Copy link
Member Author

/synch

 - Rather than `findAll.filter(_.name == name)`, just lookup the entry
   with the right name to start with.
 - Avoid a linear scan of aggregate classpath by building and index keyed on
   package names
@retronym retronym force-pushed the faster/findClass branch from 109f46f to bd73417 Compare July 3, 2017 00:33
*
* Externally, it is used by sbt's compiler interface:
* https://github.com/sbt/sbt/blob/v0.13.15/compile/interface/src/main/scala/xsbt/CompilerInterface.scala#L249
* Jason has some improvements for that in the works (https://github.com/scala/bug/issues/10289#issuecomment-310022699)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could remove parts of this comment

@retronym retronym removed the WIP label Jul 7, 2017
@retronym
Copy link
Member Author

retronym commented Jul 7, 2017

Testing this locally in the Akka SBT build has flushed out a problem.

That turned out to be unrelated to this PR, and has been fixed separately in #5976.

@retronym retronym merged commit 17ef5ad into scala:2.12.x Jul 7, 2017
@jvican
Copy link
Member

jvican commented Jul 17, 2017

@smarter My patch seemed to have the correct behaviour, but didn't seem to improve performance in my test as I had expected.

Zinc 1.0 already checks for associatedFile and @romanowski (the author of the commit that introduced its use) experienced speedups in the codebase he worked on.

allanrenucci added a commit to allanrenucci/dotty that referenced this pull request Sep 12, 2017
allanrenucci added a commit to allanrenucci/dotty that referenced this pull request Sep 12, 2017
allanrenucci added a commit to allanrenucci/dotty that referenced this pull request Sep 13, 2017
allanrenucci added a commit to allanrenucci/dotty that referenced this pull request Sep 13, 2017
Adapted from scala commits b9520211b22e53d4f80801b0f64cdb6275ac55cf and
bd7341754f8aec4ddbf455544eb393ff6cf4a43a by Jason Zaugg and Lukas Rytz
in scala/scala#5956:

 - Rather than `findAll.filter(_.name == name)`, just lookup the entry
   with the right name to start with.
 - Avoid a linear scan of aggregate classpath by building and index
   keyed on
   package names
allanrenucci added a commit to dotty-staging/dotty that referenced this pull request Sep 13, 2017
Adapted from scala commits b9520211b22e53d4f80801b0f64cdb6275ac55cf and
bd7341754f8aec4ddbf455544eb393ff6cf4a43a by Jason Zaugg and Lukas Rytz
in scala/scala#5956:

 - Rather than `findAll.filter(_.name == name)`, just lookup the entry
   with the right name to start with.
 - Avoid a linear scan of aggregate classpath by building and index
   keyed on
   package names
allanrenucci added a commit to dotty-staging/dotty that referenced this pull request Sep 15, 2017
Adapted from scala commits b9520211b22e53d4f80801b0f64cdb6275ac55cf and
bd7341754f8aec4ddbf455544eb393ff6cf4a43a by Jason Zaugg and Lukas Rytz
in scala/scala#5956:

 - Rather than `findAll.filter(_.name == name)`, just lookup the entry
   with the right name to start with.
 - Avoid a linear scan of aggregate classpath by building and index
   keyed on
   package names
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance the need for speed. usually compiler performance, sometimes runtime performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants