Classpath improvements #5957

Closed
mkeskells wants to merge 2 commits into scala:2.12.x from rorygraves:2.12.x_classpath2

Conversation

@mkeskells
Contributor

@mkeskells mkeskells commented Jun 22, 2017

This PR provides a series of improvements to classpath handling.

12% reduced CPU and a 23% reduction in allocation overall, based on the figures in the comments below.

  • A clean implementation of ClassPath without AbstractFile dependencies in the implementation (still required in the specification, though)
  • Improvements to the data structures in the classpath implementation to facilitate efficient scanning and caching
  • The new ClassPath implementations allow for caching of lookups, avoiding expensive aggregation and/or file access
  • No locks maintained on ZipFiles, to aid Windows IDEs
  • Use of NIO to speed up file access
  • Ability to scan files and dirs on background threads, as this is not AST-affecting
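As a hedged sketch of the lookup-caching point above (the class and method names here are invented for illustration, not the PR's actual types): an aggregate classpath can memoise per-package results, so repeated lookups avoid re-walking and re-merging every underlying entry.

```scala
import scala.collection.concurrent.TrieMap

// Minimal stand-in for a classpath element that can list classes in a package.
trait ClassPathLike {
  def classesIn(pkg: String): Seq[String]
}

// Aggregates several elements, caching the merged result per package so each
// package is scanned and merged at most once.
final class AggregateClassPath(entries: Seq[ClassPathLike]) extends ClassPathLike {
  private val cache = new TrieMap[String, Seq[String]]
  def classesIn(pkg: String): Seq[String] =
    cache.getOrElseUpdate(pkg, entries.flatMap(_.classesIn(pkg)).distinct)
}
```

The thread-safe TrieMap also makes the cache usable from the background scanning threads mentioned above.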

I had intended to do some more work on this prior to the PR, but as @retronym opened #5956 it seems worthwhile to combine these 2 approaches.

@mkeskells
Contributor Author

mkeskells commented Jun 22, 2017

A slightly earlier version of this change had the following performance measurements

This benchmark included some other changes which are not related and are more WIP, so I have removed them from the PR.

I will run a benchmark on this overnight and post the results when I can, but here are the stats for a full compile, with a focus on the xsbt-dependency phase, which is the standout.

after 10 90%, full cycle

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline  510        8,592.24 [-2.68% +2.96%]            8,467.01 [-2.38% +2.97%]            2,588.77 [-0.04% +0.04%]
        00_linker_cp_raw2  529        7,761.38 [-6.41% +9.11%]            7,558.16 [-6.56% +8.12%]            2,124.32 [-0.06% +0.06%]

after 10 90%, phase xsbt-dependency, no GC
                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline    16            756.67 [-1.54% +1.75%]                758.79 [-1.16% +2.96%]                518.43 [-0.01% +0.01%]
        00_linker_cp_raw2    18            183.19 [-10.51% +13.15%]              183.16 [-14.69% +10.90%]               57.00 [-0.45% +0.18%]

@mkeskells
Contributor Author

Fuller results

Full performance figures, for 100 iterations of the Akka actor benchmark.

As a highlight, the 3 rows represent the baseline, this change with no args, and this change with the extra args "-Yclasspath-cache-enabled", "-Yclasspath-top-prefetch", "-Yclasspath-raw-jar", "-Yclasspath-raw-dir",
which enable, respectively, the intra-JVM jar cache, background cache population for the aggregate classpath, the BasicClassPath implementation of a jar classpath, and the BasicClassPath implementation of a directory classpath.

The phases are (annoyingly) not ordered, but the highlights are (based on the last 90 runs):
12% reduced CPU, 23% reduction in allocation overall.
We see a large reduction in sbt usage, but also in typer, which is expected, and in jvm, which surprised me.

ALL 100 cycles

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline   2700       9,224.06 [-13.60% +276.82%]      8,964.06 [-12.50% +248.27%]      2,585.83 [-0.49% +15.01%]
             00_cp-noargs   2700       9,098.14 [-14.30% +325.31%]      8,823.59 [-13.05% +280.37%]      2,753.42 [-0.41% +13.22%]
                00_cpargs   2700       8,198.62 [-12.88% +302.70%]      7,989.22 [-13.75% +290.17%]      2,100.26 [-0.95% +19.20%]

(no outliers, after warmup)
after 10 90% 

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline   2211       8,472.77 [-5.93% +5.94%]            8,303.82 [-5.54% +5.56%]            2,578.64 [-0.21% +0.06%]
             00_cp-noargs   2211       8,281.12 [-5.84% +6.91%]            8,136.19 [-5.71% +6.01%]            2,746.31 [-0.15% +0.16%]
                00_cpargs   2211       7,543.67 [-5.31% +4.66%]            7,362.85 [-6.41% +4.83%]            2,090.69 [-0.49% +0.52%]

after 20 90%

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline  1968       8,437.27 [-5.54% +6.39%]            8,268.88 [-5.14% +5.82%]            2,578.45 [-0.21% +0.06%]
             00_cp-noargs  1968       8,237.95 [-5.35% +6.25%]            8,094.84 [-5.23% +5.20%]            2,745.83 [-0.14% +0.17%]
                00_cpargs  1968       7,518.85 [-5.00% +4.65%]            7,336.59 [-6.08% +5.21%]            2,089.35 [-0.43% +0.56%]


after 10 90%, phase xsbt-dependency, no GC

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline    61        871.55 [-5.52% +7.27%]                868.34 [-6.43% +7.96%]                516.33 [-0.01% +0.01%]
             00_cp-noargs    63        845.80 [-5.50% +4.91%]                843.50 [-5.53% +5.59%]                658.14 [-0.01% +0.01%]
                00_cpargs    81        180.69 [-11.27% +4.07%]               180.36 [-13.37% +3.96%]                56.52 [-0.88% +0.31%]

after 10 90%, phase typer, no GC

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline    79         2,504.58 [-8.97% +9.99%]            2,485.17 [-8.83% +8.77%]               621.99 [-0.16% +0.20%]
             00_cp-noargs    77         2,430.84 [-8.88% +12.01%]           2,421.88 [-8.39% +11.61%]              624.65 [-0.16% +0.22%]
                00_cpargs    78         2,368.58 [-8.33% +10.61%]           2,359.58 [-7.95% +9.92%]               604.80 [-0.35% +0.34%]

after 10 90%, phase jvm, no GC

                  RunName                        AllWallMS                             CPU_MS                         Allocated
       00_linker_baseline    59         1,683.90 [-5.43% +6.15%]            1,653.87 [-5.52% +6.76%]               445.21 [-0.83% +0.96%]
             00_cp-noargs    68         1,656.25 [-5.71% +7.36%]            1,627.53 [-5.92% +6.57%]               460.99 [-0.09% +0.08%]
                00_cpargs    41         1,612.99 [-4.08% +4.35%]            1,583.08 [-5.25% +3.64%]               449.43 [-0.28% +0.17%]

@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Jun 27, 2017
@mkeskells
Contributor Author

Removed the default setting of the jar cache; that was failing a test.

@retronym
Member

retronym commented Jul 4, 2017

The speedups here look great, but it will take more time to polish and review than is available to include them in 2.12.3.

I'm not sure about the motivation for using WatchService here. As a point of comparison, the internal implementation of java.util.zip uses the file timestamp and inode to key its internal caches; this seems simpler and sufficient for our use case. Can you comment on your rationale?

@retronym retronym modified the milestones: 2.12.4, 2.12.3 Jul 4, 2017
@mkeskells
Contributor Author

mkeskells commented Jul 4, 2017

Hi @retronym,
I did have an implementation that worked using the file time. This has a couple of disadvantages as I see it, in that it doesn't work for directories and it requires a file access lookup whenever you compile.

I was looking for something that would work in an IDE, and with our internal, highly customised, build environment, and to ensure that it will cope when we build direct to jar in scalac or via zinc via sbt/zinc#305

If you have a jar/zip file then we don't want to hold it open, as this fails for updates to the jar (on Windows it blocks the update and on Unix it misses the update), so Watcher seemed a simple solution. It also seems (from code inspection only) that the current FlatClassPath caching implementation is broken, in that the internal cache can be invalid.

I did have an alternate implementation that uses last modified time and size to verify, but this requires verification at each reuse of the jar. It is not uncommon for us to have around 500 jars on a compile classpath, and several of these don't end up getting accessed in the actual compile (e.g. you know that you will use rt.jar, but probably not jsse etc). Watcher is efficient and a simple callback API.
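The WatchService approach described here can be sketched as follows (a hypothetical illustration with invented names, not the PR's code): register the jars' parent directory once, after which detecting a changed jar is a cheap poll of an event queue rather than a stat() of each of ~500 jars on every compile.

```scala
import java.nio.file.{FileSystems, Path, StandardWatchEventKinds => Kinds}
import java.util.concurrent.TimeUnit
import scala.collection.JavaConverters._

// Tracks changes in one directory; a cache could drop the index for any jar
// whose name appears in the changed set.
final class DirChangeTracker(dir: Path) {
  private val watcher = FileSystems.getDefault.newWatchService()
  dir.register(watcher, Kinds.ENTRY_CREATE, Kinds.ENTRY_MODIFY, Kinds.ENTRY_DELETE)

  /** Names of entries changed since the last call, waiting up to `seconds`. */
  def changedWithin(seconds: Long): Set[String] = {
    val key = watcher.poll(seconds, TimeUnit.SECONDS)
    if (key == null) Set.empty
    else
      try key.pollEvents().asScala.map(_.context.toString).toSet
      finally key.reset() // re-arm the key so later events are delivered
  }
}
```

One caveat of this approach: on some platforms (notably macOS) the default WatchService falls back to polling, so event delivery can be slow.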

If we wait until the time that we need the content of the jar, then there is less opportunity to spin this work onto a background thread and scan the jar in advance. All of this time is not currently captured by the benchmarks in this PR, but it should benefit from parallelisation, particularly when jars are on physical media or NAS.

We have had some discussion outside this forum, and I can see 4 reuse models; these could be selected based on settings and a pattern match on the classpath element. This would enable a comparison of the different mechanisms in terms of performance, complexity, and behaviour in different environments:

  1. No reuse
  2. watcher based
  3. size & time based (jar/zip only)
  4. known static

4 is marginally faster and useful in non-interactive environments where there is no external modification, like CI and command-line builds, or when some of the jars are read-only (e.g. maven/ivy).

As the classpath implementation is simpler than what I started with, it should be simple to provide the cache validation mechanism as a mixin trait.
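The mixin idea for the reuse models above might look something like this (a sketch under invented names; the real design was still under discussion): each strategy answers whether a cached index for a classpath element can still be reused.

```scala
import java.nio.file.{Files, Path}

// One strategy per reuse model; the classpath element would mix in (or be
// configured with) whichever check the settings select.
sealed trait ReuseCheck { def stillValid(): Boolean }

/** Model 4: known static -- never re-validate (CI, batch builds, read-only maven/ivy jars). */
final class StaticCheck extends ReuseCheck { def stillValid() = true }

/** Model 3: size & last-modified based (jar/zip only); re-checked on each reuse. */
final class SizeAndTimeCheck(path: Path) extends ReuseCheck {
  private val size0  = Files.size(path)
  private val mtime0 = Files.getLastModifiedTime(path)
  def stillValid(): Boolean =
    Files.exists(path) &&
      Files.size(path) == size0 &&
      Files.getLastModifiedTime(path) == mtime0
}
```

Models 1 (no reuse) and 2 (watcher based) would be further implementations of the same trait, so the mechanisms can be swapped and compared without touching the classpath code itself.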

Any additions to this will have to wait for three weeks, as I will have very limited network access during that time.

@rorygraves rorygraves deleted the 2.12.x_classpath2 branch August 5, 2017 11:23
@mkeskells mkeskells restored the 2.12.x_classpath2 branch August 10, 2017 20:25
@mkeskells
Contributor Author

/rebuild

}
}
}
//class CachedLazyIndexedMapping[V <: Named] (miss : (String => Iterable[V]) )
Contributor


The commented-out code. Did you overlook it, or is it expected (the WIP PR)?

Contributor Author

@mkeskells mkeskells Aug 10, 2017


This was an experiment, trying to optimise an access pattern, but I think it will be out of scope in this PR. I will delete it when we get close to agreement on what is viable.

Contributor Author

@mkeskells mkeskells Aug 10, 2017


It will be removed in the next push. It was to get around the access of a Seq[File/class etc] by name, e.g. for findClass.

Replaced by an optimisation in PackageInfo:

class PackageInfo(val packageName: String, val list: ClassPathEntries, val files: Seq[FileEntryType], val packages: Seq[PackageEntryImpl]) {
  lazy val filesByName: Map[String, AbstractFile] = files.map {
    file => file.name -> file.file
  }(collection.breakOut)
}
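A simplified, self-contained illustration of the same idea (the types here are stand-ins for the PR's FileEntryType/AbstractFile): building the map once turns a name lookup into an O(1) hash access instead of an O(n) scan of the Seq.

```scala
// Stand-in for a classpath file entry (real code holds an AbstractFile).
final case class FileEntry(name: String, file: String)

final class PackageIndex(files: Seq[FileEntry]) {
  // Built lazily, once; subsequent findClass calls avoid re-scanning the Seq.
  lazy val filesByName: Map[String, String] =
    files.map(f => f.name -> f.file).toMap
  def findClass(name: String): Option[String] = filesByName.get(name)
}
```

The PR's version uses `collection.breakOut` (Scala 2.12) to build the Map without an intermediate collection; `.toMap` above is the plainer equivalent.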

@retronym
Member

Moving to 2.12.5 as Mike and I are still working on the design for these changes.

@SethTisue SethTisue modified the milestones: 2.12.5, 2.12.6 Feb 22, 2018
@SethTisue
Member

needs rebase. tentatively moving to 2.12.6 milestone?

@mkeskells
Contributor Author

I will focus on this (or maybe a replacement) now that the backend changes are merged

@SethTisue SethTisue removed this from the 2.12.6 milestone Mar 23, 2018
@SethTisue SethTisue added this to the 2.12.7 milestone Mar 23, 2018
@adriaanm
Contributor

@mkeskells: ping -- shall we close this in favor of the future replacement PR you mentioned? I'm trying to decrease the PR queue depth a bit

@SethTisue
Member

closing for now. once activity resumes we're happy to reopen or pursue in a new PR, as appropriate.

@SethTisue SethTisue closed this Jun 6, 2018
@SethTisue SethTisue removed this from the 2.12.7 milestone Jun 6, 2018
@mkeskells
Contributor Author

We can reuse this in the work that we are about to start, but the PR will not merge from here, so we can restart with a new one when we have a proposal.
