Remove on-demand runfile tree creation from test strategy by buchgr · Pull Request #9244 · bazelbuild/bazel

buchgr · 2019-08-26T12:08:10Z

Remove on-demand runfile tree creation from test strategy

Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.

However, most execution strategies in Bazel don't use the runfiles tree:

Actions being executed using sandboxing don't have access to the
runfiles tree as their working directory is an ephemeral directory
created by the sandbox. Instead, the sandbox treats runfiles as normal
inputs and creates a new tree for every action.
Actions being executed with the remote strategy don't have access to
the runfiles tree either as they run on a different machine. Just like
the sandbox the remote strategy treats runfiles as normal inputs and
uploads them to the remote worker.

The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.

This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.

We found that this can lead to significant performance improvements in
particular for very large runfile trees. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.

As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.

I will send out a follow up change with a similar optimization the
input / output manifest.

Note to Reviewers:

RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory.
The locking logic in RunfilesTreeUpdater implements one monitor per
runfiles directory. The monitor ensures that a runfiles directory
can't be modified concurrently. It replaces synchronized(inputManifest)
from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
for two reasons:
1. The execution strategies don't have access to it.
2. As noted above, in a follow up change I want to enable this logic
  to also work when --nobuild_runfile_manifests is set in which case no
  inputManifest object to use as monitor will exist.

buchgr · 2019-08-26T12:50:29Z

@werkt I believe you will like this.

haxorz · 2019-08-26T20:01:07Z

if you can wait until @ericfelly comes back from OOO next week, he'd be a more qualified reviewer than i would be.

buchgr · 2019-08-27T13:32:01Z

I would typically be able to wait but this week is special because next Monday Bazel 1.0 will be cut :). I'd appreciate reviews before that. I do realize that many folks knowledge able in this area are currently on vacation :.

haxorz · 2019-08-27T16:25:54Z

i'm sorry but i have very little context on this PR or even this part of the bazel codebase. i'd have to spent 1-2 days just researching the background before i could do a competent review. and i'm already pretty busy this week.

from their internal calendars, it look like @ishikhman, @lberki, and @jmmv are all not-OOO this week. are they available to review? if not, i can try to find someone else.

buchgr · 2019-08-28T08:11:56Z

@haxorz I wasn't trying to pick on you to do the review. I requested reviews from folks that I was thought might be knowledgeable in this area.

Take an execRoot and OutErr instead of the whole ActionExecutionContext.

Bazel builds runfiles trees when building an executable target. This allows a binary to be run directly (without bazel test/run). The flag --nobuild_runfile_trees disables this behavior. In order to still be able to run tests that require a runfiles tree the Local/StandaloneTestStrategy has code to create / update it before executing the test. However, most execution strategies in Bazel don't use the runfiles tree: * Actions being executed using sandboxing don't have access to the runfiles tree as their working directory is an ephemeral directory created by the sandbox. Instead, the sandbox treats runfiles as normal inputs and creates a new tree for every action. * Actions being executed with the remote strategy don't have access to the runfiles tree either as they run on a different machine. Just like the sandbox the remote strategy treats runfiles as normal inputs and uploads them to the remote worker. The runfiles tree is only used by actions executed with the standalone/local or worker (with --worker_sandboxing=false) strategy as they use the exec root as their working directory. This change makes it so that if --nobuild_runfile_links is specified a runfiles tree is only created if the execution strategy actually uses it. We found that this can lead to significant performance improvements in particular for very large runfile trees. In one large build we were able to measure a 27% e2e test time reduction. Also, before this change in remote execution we would create a runfiles tree even for remote cache hits. As a side effect this change also removes an imparity between binaries and tests, in that binaries had no such logic to on-demand create a runfiles tree if --nobuild_runfile_links is set. I will send out a follow up change with a similar optimization the input / output manifest. Note to Reviewers: * RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory. * The locking logic in RunfilesTreeUpdater implements one monitor per runfiles directory. The monitor ensures that a runfiles directory can't be modified concurrently. It replaces synchronized(inputManifest) from getLocalRunfilesDirectory. We can't use synchronized(inputManifest) for two reasons: 1) The execution strategies don't have access to it. 2) As noted above, in a follow up change I want to enable this logic to also work when --nobuild_runfile_manifests is set in which case no inputManifest object to use as monitor will exist.

src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java

src/main/java/com/google/devtools/build/lib/analysis/config/BuildConfiguration.java

lberki

As an alternative, /cc @aragos is considering making it so that strategy selection is decided in the analysis phase. If that comes to pass, you can just make this decision in the analysis phase.and not worry about RunfilesTreeUpdater.

lberki · 2019-08-28T10:50:16Z

src/main/java/com/google/devtools/build/lib/standalone/StandaloneModule.java

+    boolean runfilesEnabled = BuildConfiguration.runfilesEnabled(coreOptions);
+    // True if all runfile trees will be created when building binaries/test. If so, we won't need
+    // the RunfilesTreeUpdater.
+    boolean createRunfilesAtBuild = BuildConfiguration.shouldCreateRunfilesTree(coreOptions);


This will not work if someone changes the value of this flag in a configuration transition. Sure, it would be kind of weird to do so, but one can do that from Starlark (same in WorkerModule). You'll need to find a way to convey that bit from the configured target itself.

Interesting point. Would you consider this a blocker for this change or can this be done as an immediate follow up?

Blocker. Let's not submit known-incorrect code.

lberki · 2019-08-28T10:53:06Z

src/main/java/com/google/devtools/build/lib/exec/RunfilesTreeUpdater.java

+    try {
+      // Avoid rebuilding the runfiles directory if the manifest in it matches the input manifest,
+      // implying the symlinks exist and are already up to date. If the output manifest is a
+      // symbolic link, it is likely a symbolic link to the input manifest, so we cannot trust it as


Is it? I thought that we create it anew so as to avoid this exact issue, don't we?

I copied this comment from the old code. I also could not find a place where the output manifest is a symlink to the input manifest. It might be time to retire the comment and remove the check?

lberki · 2019-08-28T10:57:47Z

src/main/java/com/google/devtools/build/lib/exec/RunfilesTreeUpdater.java

+    }
+  }
+
+  private AtomicInteger getMonitor(PathFragment runfilesDirectory) {


I think this works, but I'm kinda afraid of this kind of multi-threaded code. Do you have any other alternatives that do not require this? For example (off the top of my head, I haven't given this too much thought), you could synchronize on the Path object referring to one of the manifests (not an Artifact, though, that's not guaranteed to be unique)

Or you could just add the lock to the TestRunnerAction. That has the advantage of not abusing the monitor of a Path. Of course, this only works if this is the only case when a runfiles tree is reused (I don't know what happens when the same binary and its runfiles tree is used as a tool in two local genrules executing in parallel)

For example (off the top of my head, I haven't given this too much thought), you could synchronize on the Path object referring to one of the manifests (not an Artifact, though, that's not guaranteed to be unique)

Exactly. What this code does is map equal path objects to a unique object. I believe it's the most principled way of solving this problem, in particular because it pushes the complexity down the stack where it belongs.

Or you could just add the lock to the TestRunnerAction.

I think this would serialize execution of shards?

I don't know what happens when the same binary and its runfiles tree is used as a tool in two local genrules executing in parallel

What would happen is exactly what this code is trying to prevent :).

Let me word this more carefully: you are probably thinking public synchronized runTest() { }, I'm thinking

Object runfilesLock; public void runTest() { if (runfilesTreeNeeded()) { synchronized (runfilesLock) { ... } } }

...and what would that be in case of the two-genrules-using-the-same-tool situation?

@lberki

This would serialize runfiles tree creation for all tests where runfilesTreeNeeded() returns false. This is a regression.

...and what would that be in case of the two-genrules-using-the-same-tool situation?

As my change description states runfiles tree creation when --nobuild_runfile_links is true does not work for genrules. It only works for tests (because there is custom code). A side effect of this change is that it will make it work for genrules as well.

buchgr · 2019-08-28T11:15:18Z

As an alternative, /cc @aragos is considering making it so that strategy selection is decided in the analysis phase. If that comes to pass, you can just make this decision in the analysis phase.and not worry about RunfilesTreeUpdater.

I am aware of this and I agree. However, there will be lots of water flowing down the Isar until then :). We'll still need RunfilesTreeUpdater (--nobuild_runfiles_links won't go away) but we can make the decision when creating the action.

lberki · 2019-08-28T11:22:16Z

I am aware of this and I agree. However, there will be lots of water flowing down the Isar
Wat, that saying works in any language other than Hungarian, too? (we say Danube, but same difference :) )

buchgr · 2019-08-28T12:26:09Z

Wat, that saying works in any language other than Hungarian, too? (we say Danube, but same difference :) )

I also say Danube (makes sense right?) but adjusted it to Isar given our location. I am not aware of anyone else saying it that way :).

buchgr · 2019-08-28T19:01:17Z

@lberki @laszlocsomor

I have imported the CL, addressed all comments except for one and split it up into five smaller changes. Let's continue the review internally.

The one comment I did not fully address is @lberki's comments regarding getting rid of RunfilesTreeUpdater. I found that even the solution we discussed offline with RunfilesSupport.lock to not work satisfactory. Instead, I simplified RunfilesTreeUpdaterquite a bit by replacing all the lock-free code with a synchronized block which should be good enough.

Bazel builds runfiles trees when building an executable target. This allows a binary to be run directly (without bazel test/run). The flag --nobuild_runfile_trees disables this behavior. In order to still be able to run tests that require a runfiles tree the Local/StandaloneTestStrategy has code to create / update it before executing the test. However, most execution strategies in Bazel don't use the runfiles tree: * Actions being executed using sandboxing don't have access to the runfiles tree as their working directory is an ephemeral directory created by the sandbox. Instead, the sandbox treats runfiles as normal inputs and creates a new tree for every action. * Actions being executed with the remote strategy don't have access to the runfiles tree either as they run on a different machine. Just like the sandbox the remote strategy treats runfiles as normal inputs and uploads them to the remote worker. The runfiles tree is only used by actions executed with the standalone/local or worker (with --worker_sandboxing=false) strategy as they use the exec root as their working directory. This change makes it so that if --nobuild_runfile_links is specified a runfiles tree is only created if the execution strategy actually uses it. We found that this can lead to significant performance improvements in particular for very large runfile trees. Also, before this change in remote execution we would create a runfiles tree even for remote cache hits. As a side effect this change also removes an imparity between binaries and tests, in that binaries had no such logic to on-demand create a runfiles tree if --nobuild_runfile_links is set. I will send out a follow up change with a similar optimization the input / output manifest. Note to Reviewers: * RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory. * The locking logic in RunfilesTreeUpdater implements one monitor per runfiles directory. The monitor ensures that a runfiles directory can't be modified concurrently. It replaces synchronized(inputManifest) from getLocalRunfilesDirectory. We can't use synchronized(inputManifest) for two reasons: 1) The execution strategies don't have access to it. 2) As noted above, in a follow up change I want to enable this logic to also work when --nobuild_runfile_manifests is set in which case no inputManifest object to use as monitor will exist. Closes #9244. PiperOrigin-RevId: 265862140 Change-Id: I322787f24a5fcf3ca96a4da2da1f3b0808a0227e

googlebot added the cla: yes label Aug 26, 2019

buchgr force-pushed the runfiles branch from e8481c0 to a1ac117 Compare August 26, 2019 12:49

buchgr changed the title ~~Runfiles~~ Remove on-demand runfile tree creation from test strategy Aug 26, 2019

buchgr force-pushed the runfiles branch 3 times, most recently from 8b619dc to 5d15cd7 Compare August 26, 2019 17:38

buchgr marked this pull request as ready for review August 26, 2019 17:39

buchgr requested review from ishikhman, ola-rozenfeld and philwo as code owners August 26, 2019 17:39

buchgr requested review from ericfelly, haxorz, jmmv, lberki and ulfjack and removed request for ola-rozenfeld and philwo August 26, 2019 17:39

buchgr force-pushed the runfiles branch from 5d15cd7 to b89f732 Compare August 26, 2019 17:46

buchgr requested a review from laszlocsomor August 28, 2019 08:12

buchgr added 2 commits August 28, 2019 10:48

Simplify signature of SymlinkTreeHelper.createSymlinks

287af5b

Take an execRoot and OutErr instead of the whole ActionExecutionContext.

buchgr force-pushed the runfiles branch from b89f732 to f9667dc Compare August 28, 2019 08:50

laszlocsomor suggested changes Aug 28, 2019

View reviewed changes

lberki reviewed Aug 28, 2019

View reviewed changes

bazel-io closed this in 0324607 Aug 30, 2019

emfree mentioned this pull request Sep 15, 2019

remote_download_minimal: Error running tests with a directory dependency #9386

Closed

Conversation

buchgr commented Aug 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buchgr commented Aug 26, 2019

Uh oh!

haxorz commented Aug 26, 2019

Uh oh!

buchgr commented Aug 27, 2019

Uh oh!

haxorz commented Aug 27, 2019

Uh oh!

buchgr commented Aug 28, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lberki left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

buchgr commented Aug 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lberki commented Aug 28, 2019

Uh oh!

buchgr commented Aug 28, 2019

Uh oh!

buchgr commented Aug 28, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

buchgr commented Aug 26, 2019 •

edited

Loading

buchgr commented Aug 28, 2019 •

edited

Loading