Skip to content

Remove on-demand runfile tree creation from test strategy#9244

Closed
buchgr wants to merge 2 commits intobazelbuild:masterfrom
buchgr:runfiles
Closed

Remove on-demand runfile tree creation from test strategy#9244
buchgr wants to merge 2 commits intobazelbuild:masterfrom
buchgr:runfiles

Conversation

@buchgr
Copy link
Copy Markdown
Contributor

@buchgr buchgr commented Aug 26, 2019

Remove on-demand runfile tree creation from test strategy

Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.

However, most execution strategies in Bazel don't use the runfiles tree:

  • Actions being executed using sandboxing don't have access to the
    runfiles tree as their working directory is an ephemeral directory
    created by the sandbox. Instead, the sandbox treats runfiles as normal
    inputs and creates a new tree for every action.
  • Actions being executed with the remote strategy don't have access to
    the runfiles tree either as they run on a different machine. Just like
    the sandbox the remote strategy treats runfiles as normal inputs and
    uploads them to the remote worker.

The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.

This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.

We found that this can lead to significant performance improvements in
particular for very large runfile trees. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.

As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.

I will send out a follow up change with a similar optimization the
input / output manifest.

Note to Reviewers:

  • RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory.
  • The locking logic in RunfilesTreeUpdater implements one monitor per
    runfiles directory. The monitor ensures that a runfiles directory
    can't be modified concurrently. It replaces synchronized(inputManifest)
    from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
    for two reasons:
    1. The execution strategies don't have access to it.
    2. As noted above, in a follow up change I want to enable this logic
      to also work when --nobuild_runfile_manifests is set in which case no
      inputManifest object to use as monitor will exist.

@buchgr buchgr changed the title Runfiles Remove on-demand runfile tree creation from test strategy Aug 26, 2019
@buchgr
Copy link
Copy Markdown
Contributor Author

buchgr commented Aug 26, 2019

@werkt I believe you will like this.

@buchgr buchgr force-pushed the runfiles branch 3 times, most recently from 8b619dc to 5d15cd7 Compare August 26, 2019 17:38
@buchgr buchgr marked this pull request as ready for review August 26, 2019 17:39
@buchgr buchgr requested review from ericfelly, haxorz, jmmv, lberki and ulfjack and removed request for ola-rozenfeld and philwo August 26, 2019 17:39
@haxorz
Copy link
Copy Markdown
Contributor

haxorz commented Aug 26, 2019

if you can wait until @ericfelly comes back from OOO next week, he'd be a more qualified reviewer than i would be.

@buchgr
Copy link
Copy Markdown
Contributor Author

buchgr commented Aug 27, 2019

I would typically be able to wait but this week is special because next Monday Bazel 1.0 will be cut :). I'd appreciate reviews before that. I do realize that many folks knowledge able in this area are currently on vacation :.

@haxorz
Copy link
Copy Markdown
Contributor

haxorz commented Aug 27, 2019

i'm sorry but i have very little context on this PR or even this part of the bazel codebase. i'd have to spent 1-2 days just researching the background before i could do a competent review. and i'm already pretty busy this week.

from their internal calendars, it look like @ishikhman, @lberki, and @jmmv are all not-OOO this week. are they available to review? if not, i can try to find someone else.

@buchgr
Copy link
Copy Markdown
Contributor Author

buchgr commented Aug 28, 2019

@haxorz I wasn't trying to pick on you to do the review. I requested reviews from folks that I was thought might be knowledgeable in this area.

@buchgr buchgr requested a review from laszlocsomor August 28, 2019 08:12
buchgr added 2 commits August 28, 2019 10:48
Take an execRoot and OutErr instead of the whole ActionExecutionContext.
Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.

However, most execution strategies in Bazel don't use the runfiles tree:
  * Actions being executed using sandboxing don't have access to the
    runfiles tree as their working directory is an ephemeral directory
    created by the sandbox. Instead, the sandbox treats runfiles as normal
    inputs and creates a new tree for every action.
  * Actions being executed with the remote strategy don't have access to
    the runfiles tree either as they run on a different machine. Just like
    the sandbox the remote strategy treats runfiles as normal inputs and
    uploads them to the remote worker.

The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.

This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.

We found that this can lead to significant performance improvements in
particular for very large runfile trees. In one large build we were able
to measure a 27% e2e test time reduction. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.

As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.

I will send out a follow up change with a similar optimization the
input / output manifest.

Note to Reviewers:
 * RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory.
 * The locking logic in RunfilesTreeUpdater implements one monitor per
   runfiles directory. The monitor ensures that a runfiles directory
   can't be modified concurrently. It replaces synchronized(inputManifest)
   from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
   for two reasons:
     1) The execution strategies don't have access to it.
     2) As noted above, in a follow up change I want to enable this logic
     to also work when --nobuild_runfile_manifests is set in which case no
     inputManifest object to use as monitor will exist.
Copy link
Copy Markdown
Contributor

@lberki lberki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative, /cc @aragos is considering making it so that strategy selection is decided in the analysis phase. If that comes to pass, you can just make this decision in the analysis phase.and not worry about RunfilesTreeUpdater.

boolean runfilesEnabled = BuildConfiguration.runfilesEnabled(coreOptions);
// True if all runfile trees will be created when building binaries/test. If so, we won't need
// the RunfilesTreeUpdater.
boolean createRunfilesAtBuild = BuildConfiguration.shouldCreateRunfilesTree(coreOptions);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work if someone changes the value of this flag in a configuration transition. Sure, it would be kind of weird to do so, but one can do that from Starlark (same in WorkerModule). You'll need to find a way to convey that bit from the configured target itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point. Would you consider this a blocker for this change or can this be done as an immediate follow up?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker. Let's not submit known-incorrect code.

try {
// Avoid rebuilding the runfiles directory if the manifest in it matches the input manifest,
// implying the symlinks exist and are already up to date. If the output manifest is a
// symbolic link, it is likely a symbolic link to the input manifest, so we cannot trust it as
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? I thought that we create it anew so as to avoid this exact issue, don't we?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this comment from the old code. I also could not find a place where the output manifest is a symlink to the input manifest. It might be time to retire the comment and remove the check?

}
}

private AtomicInteger getMonitor(PathFragment runfilesDirectory) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works, but I'm kinda afraid of this kind of multi-threaded code. Do you have any other alternatives that do not require this? For example (off the top of my head, I haven't given this too much thought), you could synchronize on the Path object referring to one of the manifests (not an Artifact, though, that's not guaranteed to be unique)

Or you could just add the lock to the TestRunnerAction. That has the advantage of not abusing the monitor of a Path. Of course, this only works if this is the only case when a runfiles tree is reused (I don't know what happens when the same binary and its runfiles tree is used as a tool in two local genrules executing in parallel)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example (off the top of my head, I haven't given this too much thought), you could synchronize on the Path object referring to one of the manifests (not an Artifact, though, that's not guaranteed to be unique)

Exactly. What this code does is map equal path objects to a unique object. I believe it's the most principled way of solving this problem, in particular because it pushes the complexity down the stack where it belongs.

Or you could just add the lock to the TestRunnerAction.

I think this would serialize execution of shards?

I don't know what happens when the same binary and its runfiles tree is used as a tool in two local genrules executing in parallel

What would happen is exactly what this code is trying to prevent :).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me word this more carefully: you are probably thinking public synchronized runTest() { }, I'm thinking

Object runfilesLock;

public void runTest() {
  if (runfilesTreeNeeded()) {
    synchronized (runfilesLock) {
      ...
    }
  }
}

...and what would that be in case of the two-genrules-using-the-same-tool situation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lberki

This would serialize runfiles tree creation for all tests where runfilesTreeNeeded() returns false. This is a regression.

...and what would that be in case of the two-genrules-using-the-same-tool situation?

As my change description states runfiles tree creation when --nobuild_runfile_links is true does not work for genrules. It only works for tests (because there is custom code). A side effect of this change is that it will make it work for genrules as well.

@buchgr
Copy link
Copy Markdown
Contributor Author

buchgr commented Aug 28, 2019

As an alternative, /cc @aragos is considering making it so that strategy selection is decided in the analysis phase. If that comes to pass, you can just make this decision in the analysis phase.and not worry about RunfilesTreeUpdater.

I am aware of this and I agree. However, there will be lots of water flowing down the Isar until then :). We'll still need RunfilesTreeUpdater (--nobuild_runfiles_links won't go away) but we can make the decision when creating the action.

@lberki
Copy link
Copy Markdown
Contributor

lberki commented Aug 28, 2019

I am aware of this and I agree. However, there will be lots of water flowing down the Isar
Wat, that saying works in any language other than Hungarian, too? (we say Danube, but same difference :) )

@buchgr
Copy link
Copy Markdown
Contributor Author

buchgr commented Aug 28, 2019

Wat, that saying works in any language other than Hungarian, too? (we say Danube, but same difference :) )

I also say Danube (makes sense right?) but adjusted it to Isar given our location. I am not aware of anyone else saying it that way :).

@buchgr
Copy link
Copy Markdown
Contributor Author

buchgr commented Aug 28, 2019

@lberki @laszlocsomor

I have imported the CL, addressed all comments except for one and split it up into five smaller changes. Let's continue the review internally.

The one comment I did not fully address is @lberki's comments regarding getting rid of RunfilesTreeUpdater. I found that even the solution we discussed offline with RunfilesSupport.lock to not work satisfactory. Instead, I simplified RunfilesTreeUpdaterquite a bit by replacing all the lock-free code with a synchronized block which should be good enough.

buchgr added a commit that referenced this pull request Aug 30, 2019
Bazel builds runfiles trees when building an executable target. This
allows a binary to be run directly (without bazel test/run). The flag
--nobuild_runfile_trees disables this behavior. In order to still be
able to run tests that require a runfiles tree the
Local/StandaloneTestStrategy has code to create / update it before
executing the test.

However, most execution strategies in Bazel don't use the runfiles tree:
  * Actions being executed using sandboxing don't have access to the
    runfiles tree as their working directory is an ephemeral directory
    created by the sandbox. Instead, the sandbox treats runfiles as normal
    inputs and creates a new tree for every action.
  * Actions being executed with the remote strategy don't have access to
    the runfiles tree either as they run on a different machine. Just like
    the sandbox the remote strategy treats runfiles as normal inputs and
    uploads them to the remote worker.

The runfiles tree is only used by actions executed with the standalone/local
or worker (with --worker_sandboxing=false) strategy as they use the exec
root as their working directory.

This change makes it so that if --nobuild_runfile_links is specified a
runfiles tree is only created if the execution strategy actually uses
it.

We found that this can lead to significant performance improvements in
particular for very large runfile trees. Also, before this change in
remote execution we would create a runfiles tree even for remote cache
hits.

As a side effect this change also removes an imparity between binaries
and tests, in that binaries had no such logic to on-demand create a
runfiles tree if --nobuild_runfile_links is set.

I will send out a follow up change with a similar optimization the
input / output manifest.

Note to Reviewers:
 * RunfilesTreeUpdater replaces TestStrategy#getLocalRunfilesDirectory.
 * The locking logic in RunfilesTreeUpdater implements one monitor per
   runfiles directory. The monitor ensures that a runfiles directory
   can't be modified concurrently. It replaces synchronized(inputManifest)
   from getLocalRunfilesDirectory. We can't use synchronized(inputManifest)
   for two reasons:
     1) The execution strategies don't have access to it.
     2) As noted above, in a follow up change I want to enable this logic
     to also work when --nobuild_runfile_manifests is set in which case no
     inputManifest object to use as monitor will exist.

Closes #9244.

PiperOrigin-RevId: 265862140
Change-Id: I322787f24a5fcf3ca96a4da2da1f3b0808a0227e
@bazel-io bazel-io closed this in 0324607 Aug 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants