Inter-project dependency tracking #2354

Merged
eed3si9n merged 2 commits into sbt:0.13 from eed3si9n:wip/internal on Jan 13, 2016

@eed3si9n
Member

@eed3si9n eed3si9n commented Jan 7, 2016

Fixes #2266

Adds trackInternalDependencies and exportToInternal settings. These
can be used to control whether to trigger compilation of dependent
subprojects when you call compile. Both keys take one of three
values: TrackLevel.NoTracking, TrackLevel.TrackIfMissing, and
TrackLevel.TrackAlways. By default they are both set to
TrackLevel.TrackAlways.

When trackInternalDependencies is set to TrackLevel.TrackIfMissing,
sbt will no longer try to compile internal (inter-project) dependencies
automatically, unless there are no *.class files (or no JAR file when
exportJars is true) in the output directory. When the setting is
set to TrackLevel.NoTracking, the compilation of internal
dependencies will be skipped. Note that the classpath will still be
appended, and the dependency graph will still show them as dependencies.
The motivation is to save the I/O overhead of checking for changes
on a build with many subprojects during development. Here's how to set
all subprojects to TrackIfMissing:

    lazy val root = (project in file(".")).
      aggregate(....).
      settings(
        inThisBuild(Seq(
          trackInternalDependencies := TrackLevel.TrackIfMissing,
          exportJars := true
        ))
      )
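
For comparison, here is a minimal sketch of the NoTracking case described above, set on a single depender instead of build-wide. The project names `core` and `app` are hypothetical; only the `trackInternalDependencies` key and the `TrackLevel` values come from this PR.

```scala
// build.sbt fragment (sbt 0.13 syntax); `core` and `app` are hypothetical names.
lazy val core = (project in file("core"))

lazy val app = (project in file("app")).
  dependsOn(core).
  settings(
    // With NoTracking, `app/compile` should not trigger `core/compile`.
    // Per the description above, core's output is still put on app's classpath.
    trackInternalDependencies := TrackLevel.NoTracking
  )
```

With this in place you would compile `core` explicitly (for example `core/compile`) whenever its sources change.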

The exportToInternal setting allows the dependee subprojects to opt
out of the internal tracking, which might be useful if you want to
track most subprojects except for a few. The intersection of the
trackInternalDependencies and exportToInternal settings will be
used to determine the actual track level. Here's an example that opts
out one project:

    lazy val dontTrackMe = (project in file("dontTrackMe")).
      settings(
        exportToInternal := TrackLevel.NoTracking
      )
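
One way to read "intersection" here is that the more restrictive of the two levels wins. The following standalone model illustrates that reading only; it is not sbt's source, and the `TrackModel` object, the `rank` ordering, and the `effective` function are assumptions made for this sketch.

```scala
// Standalone model of the track-level intersection; NOT sbt's implementation.
// The ordering NoTracking < TrackIfMissing < TrackAlways is an assumption.
object TrackModel {
  sealed abstract class Level(val rank: Int)
  case object NoTracking extends Level(0)
  case object TrackIfMissing extends Level(1)
  case object TrackAlways extends Level(2)

  // depender: the trackInternalDependencies value of the project being compiled.
  // dependee: the exportToInternal value of the project it depends on.
  // Assumed meaning of "intersection": pick the more restrictive level.
  def effective(depender: Level, dependee: Level): Level =
    if (depender.rank <= dependee.rank) depender else dependee
}

object TrackModelDemo extends App {
  import TrackModel._
  // A dependee that opts out is not tracked, even if the depender tracks always.
  println(effective(TrackAlways, NoTracking))
  // A dependee left at the default does not loosen the depender's setting.
  println(effective(TrackIfMissing, TrackAlways))
}
```

Under this reading, the `dontTrackMe` project above is never compiled on behalf of its dependers, whatever their own `trackInternalDependencies` value is.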

/review @gkossakowski @dwijnand, @jsuereth, @Duhemm

@eed3si9n eed3si9n changed the title from "Inter-project dependency tracking. Fixes #2266" to "Inter-project dependency tracking" on Jan 7, 2016
@dwijnand
Member

dwijnand commented Jan 9, 2016

I'll try and have a look when I have a bit more time to dig into it, but I thought perhaps we could cc in @fommil, given he's been spending time recently with this problem in parallel, and he opened the original issue.

Also seems codacy isn't happy.

@fommil
Contributor

fommil commented Jan 9, 2016

@dwijnand thanks, I have this on my todo list today to investigate / understand. Been wasting the whole day trying to build a Windows 7 image to test on.

Contributor

this is still going to kill the performance, as it invokes quite a lot of subtasks. Hence #2348

I'll do some graph timings with my patch to see what is invoked, but my suspicion is that this isn't going to make a massive difference to performance on Windows due to the large number of Tasks that will be created 😢

Member Author

I can certainly make a lighter-weight setting or task for the JAR file if that's going to trigger things. I just reopened #2348.

Contributor

Thanks. Btw, I don't think this is the only thing triggering lots of upstream Tasks.

@fommil
Contributor

fommil commented Jan 9, 2016

I think I've been reading too much of the sbt source code because I actually understand this 😨

Conceptually it looks sound, but I can't be sure about which tasks are being scheduled until I run this on my test projects (I'll turn off sbt-big-project settings). I'll report back when I do that.

@fommil
Contributor

fommil commented Jan 9, 2016

Running this on my simple test project https://github.com/fommil/sbt-big-project/tree/master/src/sbt-test/sbt-big-project/simple reconditioned to look like

    import sbt._
    import Keys._
    import fommil._

    object SimpleBuild extends Build {
      // Build-wide settings: every subproject uses TrackIfMissing and exports JARs.
      override lazy val settings = super.settings ++ Seq(
        scalaVersion := "2.10.6",
        version := "v1",
        trackInternalDependencies := TrackLevel.TrackIfMissing,
        exportJars := true,
        // Run one task at a time so the timings are easier to read.
        concurrentRestrictions in Global := Seq(Tags.limitAll(1))
      )
      def simpleProject(name: String): Project = {
        BigProjectTestSupport.createSources(name)
        Project(name, file(name)).settings(
          //BigProjectPlugin.overrideProjectSettings(Compile, Test),
          BigProjectTestSupport.testInstrumentation(Compile, Test)
        )
      }
      // Linear dependency chain: d -> c -> b -> a
      val a = simpleProject("a")
      val b = simpleProject("b") dependsOn(a)
      val c = simpleProject("c") dependsOn(b)
      val d = simpleProject("d") dependsOn(c)
    }

I'm still seeing a lot of dependent tasks in subprojects running. This doesn't look so bad for a small project, but for huge projects this is a lot of tasks. In particular, this is causing 32 tasks to run per dependent project, which is not scalable, especially for builds with 100+ projects.

Below is the output of #1209 (comment) with a filter to only show the tasks for the a subproject.

     16:  a/*:update: 152.999064 ms
     26:  a/compile:exportedProductsIfMissing: 94.152834 ms
     37:  a/compile:unmanagedJars: 75.065294 ms
     47:  a/*:projectDescriptors: 62.375231 ms
     54:  a/*:ivySbt: 47.900470999999996 ms
     55:  a/*:ivyConfiguration: 47.027559 ms
     58:  a/compile:internalDependencyClasspath: 43.097148 ms
     60:  a/*:ivyModule: 41.560179999999995 ms
     61:  a/compile:dependencyClasspath: 40.824545 ms
     63:  a/*:fullResolvers: 39.766501 ms
     64:  a/*:projectResolver: 39.411094999999996 ms
     67:  a/compile:externalDependencyClasspath: 36.671284 ms
     68:  a/*:moduleSettings: 36.525515 ms
     77:  a/*:allDependencies: 29.765629999999998 ms
     80:  a/compile:managedClasspath: 28.869135999999997 ms
     86:  a/*:update::unresolvedWarningConfiguration: 26.312904 ms
     87:  a/*:transitiveUpdate: 26.202226999999997 ms
     88:  a/compile:readAnalysis: 24.143079 ms
     91:  a/compile:unmanagedClasspath: 23.806255999999998 ms
     95:  a/*:projectDependencies: 23.214388 ms
     96:  a/compile:incCompileSetup: 23.113008999999998 ms
    103:  a/*:dependencyPositions: 20.61122 ms
    107:  a/*:credentials: 19.892084999999998 ms
    113:  a/compile:classpathConfiguration: 11.159374999999999 ms
    115:  a/compile:update: 10.568942999999999 ms
    128:  a/*:externalResolvers: 6.731135999999999 ms
    131:  a/*:bootResolvers: 6.668432999999999 ms
    132:  a/*:dependencyCacheDirectory: 6.656975999999999 ms
    133:  a/*:updateCacheName: 6.598299 ms
    139:  a/*:incOptions: 4.222926999999999 ms
    140:  a/compile:compileAnalysisFilename: 4.116671999999999 ms
    141:  a/compile:productDirectories: 0.990143 ms

and if I enable caching updateOptions := updateOptions.value.withCachedResolution(true) I still get these tasks.

So I honestly can't say if this improves things or not, sorry!

However, combined with sbt-big-project, this might actually improve the workflow because I've still got a few unsolved problems e.g. https://github.com/fommil/sbt-big-project/issues/11 (I'll be working on that actively, so you might want to check back)

Contributor

Won't this trigger multiple calls to compile task, one for each product?

Member Author

Within a dynamic task (Def.taskDyn) the last task value is deferred.
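
As a rough illustration of that deferral, here is a minimal `Def.taskDyn` sketch in sbt 0.13 syntax. It is not the code from this PR: the keys `skipUpstream` and `compileUnlessSkipped` are hypothetical and exist only to show that the task returned from the selected branch is the only one that gets scheduled.

```scala
// build.sbt fragment (sbt 0.13 syntax).
// `skipUpstream` and `compileUnlessSkipped` are hypothetical keys for illustration.
lazy val skipUpstream = settingKey[Boolean]("When true, do not compile at all.")
lazy val compileUnlessSkipped = taskKey[Unit]("Compiles only when skipUpstream is false.")

skipUpstream := true

// The body of taskDyn runs first and returns a task; only that returned task is
// then evaluated. The `compile` dependency in the branch that is not selected
// never becomes part of the executed task graph.
lazy val dynamic = Def.taskDyn {
  if (skipUpstream.value) Def.task { () }
  else Def.task {
    val analysis = (compile in Compile).value
    ()
  }
}

compileUnlessSkipped := dynamic.value
```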

Contributor

Ok, good to know.

@gkossakowski
Contributor

Could you give a context when is this really needed? Is it trying to workaround issues with NFS file systems?

What the incremental compiler is doing is equivalent to what git status is doing. For git status it takes 0.05s to check the status of 40k files on my machine (MBP 2015). I'm guessing performance scales linearly, so it would take 0.5s to check the status of 400k files, or about 1.2s for 1m files. Does sbt need to support projects larger than 1m files?
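
The extrapolation in the comment above can be reproduced with a few lines of Scala. This is only a sketch of the stated linear-scaling guess, using the quoted measurement (0.05s for 40k files); the object and method names are made up for the example.

```scala
// Linear extrapolation of the quoted `git status` measurement.
// Assumes, as the comment does, that cost grows linearly with file count.
object StatusScaling {
  val measuredSeconds: Double = 0.05 // quoted: 0.05s ...
  val measuredFiles: Long = 40000L   // ... for 40k files

  def estimateSeconds(files: Long): Double =
    measuredSeconds * files / measuredFiles
}

object StatusScalingDemo extends App {
  // Prints the estimates for 400k and 1m files, rounded to two decimals.
  println(f"400k files: ${StatusScaling.estimateSeconds(400000L)}%.2f s")
  println(f"1m files:   ${StatusScaling.estimateSeconds(1000000L)}%.2f s")
}
```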

@eed3si9n
Member Author

> Could you give a context when is this really needed? Is it trying to workaround issues with NFS file systems?

Primarily yes. More details are found in #2266 and https://github.com/eed3si9n/sbt-big-project-test, but on Windows a build with 100+ subprojects seems to take 27s just running compile without any source change.
I was also thinking that this feature could become an extension point down the line for further optimization like binary/source switching.

@eed3si9n
Member Author

Here's the timing breakdown that sbt.task.timings collected - https://docs.google.com/spreadsheets/d/1bb3mAo0mC2LqXaomiNVs0uKBlwMG1vPActzTUSKtWjQ/edit#gid=166341631

@gkossakowski
Contributor

Thanks for sharing the details. From the incremental compiler's point of view this looks good to me. I'll defer review of sbt task engine performance to other people, though.

@fommil
Contributor

fommil commented Jan 13, 2016

This is what I tried, using the simple project edited to not use the BigProjectSettings, and to use TrackIfMissing

    import sbt._
    import Keys._
    import Def.Initialize
    import fommil._
    object SimpleBuild extends Build {
      override lazy val settings = super.settings ++ Seq(
        scalaVersion := "2.10.6",
        version := "v1"
      )
      def simpleProject(name: String): Project = {
        BigProjectTestSupport.createSources(name)
        // The tracking settings are applied per project here, not build-wide.
        Project(name, file(name)).settings(
          trackInternalDependencies := TrackLevel.TrackIfMissing,
          exportJars := true
        )
      }
      // Linear dependency chain: d -> c -> b -> a
      val a = simpleProject("a")
      val b = simpleProject("b") dependsOn(a)
      val c = simpleProject("c") dependsOn(b)
      val d = simpleProject("d") dependsOn(c)
    }

  1. d/compile
  2. check that another d/compile doesn't invoke a/compile:compile, b/compile:compile or c/compile:compile
  3. note that compile does force compiles of everything. Interesting (probably because I added it for each project, not the settings, so the root project doesn't get the setting)
  4. exit sbt, start sbt
  5. check that d/compile doesn't invoke a/compile:compile, etc
  6. make a trivial change to something under b
  7. d/compile doesn't recompile b (this is expected behaviour; "desired" is a strong word, but it is the trade-off for this workflow. Hopefully I will address this in sbt-big-project)
  8. delete ./b/target/scala-2.10/b_2.10-v1.jar
  9. d/compile does recompile b (nice! desired behaviour)
  10. rm -f b/target/scala-2.10/classes/b/*
  11. d/compile does not recompile b (very much desired behaviour, believe it or not!)
  12. b/compile does recompile b (desired behaviour)
  13. b/compile does attempt to recompile b (desired behaviour)

so, basically, fantastic work!

The one thing that doesn't work quite as expected is this:

  1. d/compile
  2. trivial change to b
  3. b/compile
  4. d/compile or d/runMain => uses old version of b.jar

this is something I've been struggling with in sbt-big-project even without these changes, and I think the solution is to do something like this https://github.com/fommil/sbt-big-project/blob/c3d4bc3260ac5e25f63bdb94a7e0bd5c3ada6a7d/src/main/scala/BigProjectSettings.scala#L100-L117 (i.e. delete the jar if the compile produced anything). There are a couple of other sbt / zinc problems deep inside here.
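
A rough sketch of that "delete the jar" idea in sbt 0.13 syntax follows. It is not the linked sbt-big-project code: as a stand-in for "the compile produced anything", it assumes that any class file newer than the packaged JAR means the JAR is stale. It would be added to the settings of each project whose JAR should be invalidated.

```scala
// build.sbt fragment (sbt 0.13 syntax); a sketch under the assumptions stated above.
compile in Compile := {
  // Run the previously defined compile task first.
  val analysis = (compile in Compile).value
  val jar = (artifactPath in (Compile, packageBin)).value
  val classFiles = ((classDirectory in Compile).value ** "*.class").get
  // Assumption: a class file newer than the JAR means the JAR is out of date.
  if (jar.exists && classFiles.exists(_.lastModified > jar.lastModified)) {
    IO.delete(jar)
  }
  analysis
}
```

With TrackIfMissing, a missing JAR should then cause the next d/compile to rebuild and repackage b, matching steps 8 and 9 of the first list.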

AFAIK, this doesn't address the fundamental performance problems with sbt on large projects on Windows, but it gives me the levers that I need in order to implement sensible workarounds in sbt-big-project. Great work!

I probably need to go back to the drawing board with sbt-big-project because I don't necessarily need to cache everything that I'm caching, so I should find what the remaining bottlenecks are. I might not need to cache around packageBin anymore, for example.

Can you please squash and create a commit that cleanly applies against 0.13.9? (I get conflicts)

@dwijnand
Member

Thanks @fommil for trying this out.

@eed3si9n
Member Author

@fommil Thanks for the review.

> Can you please squash and create a commit that cleanly applies against 0.13.9? (I get conflicts)

See 98b26ae. I've squashed and backported the commits to 0.13.9, and merged that forward to 0.13.

eed3si9n added a commit that referenced this pull request Jan 13, 2016
Inter-project dependency tracking
@eed3si9n eed3si9n merged commit a5bda9d into sbt:0.13 Jan 13, 2016
@fommil
Contributor

fommil commented Jan 24, 2016

FYI, I'm formalising the above test script as a scripted test in sbt-big-project, including some aspects of the workflow that sbt-big-project expects (such as b/compile deleting b's packageBin and d/runMain needing to use the latest jar)
