Remove on-demand runfile tree creation from test strategy#9244
Remove on-demand runfile tree creation from test strategy#9244buchgr wants to merge 2 commits intobazelbuild:masterfrom
Conversation
|
@werkt I believe you will like this. |
8b619dc to
5d15cd7
Compare
|
if you can wait until @ericfelly comes back from OOO next week, he'd be a more qualified reviewer than i would be. |
|
I would typically be able to wait but this week is special because next Monday Bazel 1.0 will be cut :). I'd appreciate reviews before that. I do realize that many folks knowledge able in this area are currently on vacation :. |
|
i'm sorry but i have very little context on this PR or even this part of the bazel codebase. i'd have to spent 1-2 days just researching the background before i could do a competent review. and i'm already pretty busy this week. from their internal calendars, it look like @ishikhman, @lberki, and @jmmv are all not-OOO this week. are they available to review? if not, i can try to find someone else. |
|
@haxorz I wasn't trying to pick on you to do the review. I requested reviews from folks that I was thought might be knowledgeable in this area. |
Take an execRoot and OutErr instead of the whole ActionExecutionContext.
Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.
However, most execution strategies in Bazel don't use the runfiles tree:
* Actions being executed using sandboxing don't have access to the
runfiles tree as their working directory is an ephemeral directory
created by the sandbox. Instead, the sandbox treats runfiles as normal
inputs and creates a new tree for every action.
* Actions being executed with the remote strategy don't have access to
the runfiles tree either as they run on a different machine. Just like
the sandbox the remote strategy treats runfiles as normal inputs and
uploads them to the remote worker.
The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.
This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.
We found that this can lead to significant performance improvements in
particular for very large runfile trees. In one large build we were able
to measure a 27% e2e test time reduction. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.
As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.
I will send out a follow up change with a similar optimization the
input / output manifest.
Note to Reviewers:
* RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory.
* The locking logic in RunfilesTreeUpdater implements one monitor per
runfiles directory. The monitor ensures that a runfiles directory
can't be modified concurrently. It replaces synchronized(inputManifest)
from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
for two reasons:
1) The execution strategies don't have access to it.
2) As noted above, in a follow up change I want to enable this logic
to also work when --nobuild_runfile_manifests is set in which case no
inputManifest object to use as monitor will exist.
src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/analysis/config/BuildConfiguration.java
Show resolved
Hide resolved
lberki
left a comment
There was a problem hiding this comment.
As an alternative, /cc @aragos is considering making it so that strategy selection is decided in the analysis phase. If that comes to pass, you can just make this decision in the analysis phase.and not worry about RunfilesTreeUpdater.
| boolean runfilesEnabled = BuildConfiguration.runfilesEnabled(coreOptions); | ||
| // True if all runfile trees will be created when building binaries/test. If so, we won't need | ||
| // the RunfilesTreeUpdater. | ||
| boolean createRunfilesAtBuild = BuildConfiguration.shouldCreateRunfilesTree(coreOptions); |
There was a problem hiding this comment.
This will not work if someone changes the value of this flag in a configuration transition. Sure, it would be kind of weird to do so, but one can do that from Starlark (same in WorkerModule). You'll need to find a way to convey that bit from the configured target itself.
There was a problem hiding this comment.
Interesting point. Would you consider this a blocker for this change or can this be done as an immediate follow up?
There was a problem hiding this comment.
Blocker. Let's not submit known-incorrect code.
| try { | ||
| // Avoid rebuilding the runfiles directory if the manifest in it matches the input manifest, | ||
| // implying the symlinks exist and are already up to date. If the output manifest is a | ||
| // symbolic link, it is likely a symbolic link to the input manifest, so we cannot trust it as |
There was a problem hiding this comment.
Is it? I thought that we create it anew so as to avoid this exact issue, don't we?
There was a problem hiding this comment.
I copied this comment from the old code. I also could not find a place where the output manifest is a symlink to the input manifest. It might be time to retire the comment and remove the check?
| } | ||
| } | ||
|
|
||
| private AtomicInteger getMonitor(PathFragment runfilesDirectory) { |
There was a problem hiding this comment.
I think this works, but I'm kinda afraid of this kind of multi-threaded code. Do you have any other alternatives that do not require this? For example (off the top of my head, I haven't given this too much thought), you could synchronize on the Path object referring to one of the manifests (not an Artifact, though, that's not guaranteed to be unique)
Or you could just add the lock to the TestRunnerAction. That has the advantage of not abusing the monitor of a Path. Of course, this only works if this is the only case when a runfiles tree is reused (I don't know what happens when the same binary and its runfiles tree is used as a tool in two local genrules executing in parallel)
There was a problem hiding this comment.
For example (off the top of my head, I haven't given this too much thought), you could synchronize on the Path object referring to one of the manifests (not an Artifact, though, that's not guaranteed to be unique)
Exactly. What this code does is map equal path objects to a unique object. I believe it's the most principled way of solving this problem, in particular because it pushes the complexity down the stack where it belongs.
Or you could just add the lock to the TestRunnerAction.
I think this would serialize execution of shards?
I don't know what happens when the same binary and its runfiles tree is used as a tool in two local genrules executing in parallel
What would happen is exactly what this code is trying to prevent :).
There was a problem hiding this comment.
Let me word this more carefully: you are probably thinking public synchronized runTest() { }, I'm thinking
Object runfilesLock;
public void runTest() {
if (runfilesTreeNeeded()) {
synchronized (runfilesLock) {
...
}
}
}
...and what would that be in case of the two-genrules-using-the-same-tool situation?
There was a problem hiding this comment.
This would serialize runfiles tree creation for all tests where runfilesTreeNeeded() returns false. This is a regression.
...and what would that be in case of the two-genrules-using-the-same-tool situation?
As my change description states runfiles tree creation when --nobuild_runfile_links is true does not work for genrules. It only works for tests (because there is custom code). A side effect of this change is that it will make it work for genrules as well.
I am aware of this and I agree. However, there will be lots of water flowing down the Isar until then :). We'll still need RunfilesTreeUpdater (--nobuild_runfiles_links won't go away) but we can make the decision when creating the action. |
|
I also say Danube (makes sense right?) but adjusted it to Isar given our location. I am not aware of anyone else saying it that way :). |
|
I have imported the CL, addressed all comments except for one and split it up into five smaller changes. Let's continue the review internally. The one comment I did not fully address is @lberki's comments regarding getting rid of |
Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.
However, most execution strategies in Bazel don't use the runfiles tree:
* Actions being executed using sandboxing don't have access to the
runfiles tree as their working directory is an ephemeral directory
created by the sandbox. Instead, the sandbox treats runfiles as normal
inputs and creates a new tree for every action.
* Actions being executed with the remote strategy don't have access to
the runfiles tree either as they run on a different machine. Just like
the sandbox the remote strategy treats runfiles as normal inputs and
uploads them to the remote worker.
The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.
This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.
We found that this can lead to significant performance improvements in
particular for very large runfile trees. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.
As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.
I will send out a follow up change with a similar optimization the
input / output manifest.
Note to Reviewers:
* RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory.
* The locking logic in RunfilesTreeUpdater implements one monitor per
runfiles directory. The monitor ensures that a runfiles directory
can't be modified concurrently. It replaces synchronized(inputManifest)
from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
for two reasons:
1) The execution strategies don't have access to it.
2) As noted above, in a follow up change I want to enable this logic
to also work when --nobuild_runfile_manifests is set in which case no
inputManifest object to use as monitor will exist.
Closes #9244.
PiperOrigin-RevId: 265862140
Change-Id: I322787f24a5fcf3ca96a4da2da1f3b0808a0227e
Remove on-demand runfile tree creation from test strategy
Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.
However, most execution strategies in Bazel don't use the runfiles tree:
runfiles tree as their working directory is an ephemeral directory
created by the sandbox. Instead, the sandbox treats runfiles as normal
inputs and creates a new tree for every action.
the runfiles tree either as they run on a different machine. Just like
the sandbox the remote strategy treats runfiles as normal inputs and
uploads them to the remote worker.
The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.
This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.
We found that this can lead to significant performance improvements in
particular for very large runfile trees. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.
As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.
I will send out a follow up change with a similar optimization the
input / output manifest.
Note to Reviewers:
runfiles directory. The monitor ensures that a runfiles directory
can't be modified concurrently. It replaces synchronized(inputManifest)
from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
for two reasons:
to also work when --nobuild_runfile_manifests is set in which case no
inputManifest object to use as monitor will exist.