Tree artifact up-to-dateness check can be very slow for large tree artifacts #17009
Closed
Labels: P3 (We're not considering working on this, but happy to review a PR.), team-Remote-Exec, type: bug
Description
Description of the bug:
When checking whether a local action cache entry is up to date, Bazel takes a long time on actions that have large tree artifacts among their inputs. While Bazel is busy with this, the stack trace is:
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@11.0.6/Native Method)
at java.io.FileInputStream.read(java.base@11.0.6/Unknown Source)
at com.google.common.io.ByteStreams.copy(ByteStreams.java:114)
at com.google.common.io.ByteSource.copyTo(ByteSource.java:257)
at com.google.common.io.ByteSource.hash(ByteSource.java:340)
at com.google.devtools.build.lib.vfs.FileSystem.getDigest(FileSystem.java:339)
at com.google.devtools.build.lib.unix.UnixFileSystem.getDigest(UnixFileSystem.java:452)
at com.google.devtools.build.lib.vfs.Path.getDigest(Path.java:690)
at com.google.devtools.build.lib.vfs.DigestUtils.manuallyComputeDigest(DigestUtils.java:194)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.constructFileArtifactValue(ActionMetadataHandler.java:564)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.constructFileArtifactValueFromFilesystem(ActionMetadataHandler.java:496)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.lambda$constructTreeArtifactValueFromFilesystem$0(ActionMetadataHandler.java:354)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler$$Lambda$1121/0x0000000800857040.visit(Unknown Source)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:411)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:393)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.constructTreeArtifactValueFromFilesystem(ActionMetadataHandler.java:342)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.getTreeArtifactValue(ActionMetadataHandler.java:317)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.getMetadata(ActionMetadataHandler.java:265)
at com.google.devtools.build.lib.actions.ActionCacheChecker.getMetadataOrConstant(ActionCacheChecker.java:566)
at com.google.devtools.build.lib.actions.ActionCacheChecker.getMetadataMaybe(ActionCacheChecker.java:579)
at com.google.devtools.build.lib.actions.ActionCacheChecker.validateArtifacts(ActionCacheChecker.java:207)
at com.google.devtools.build.lib.actions.ActionCacheChecker.mustExecute(ActionCacheChecker.java:541)
My theory is that this is slow because TreeArtifactValue.visitTree(), when called from ActionMetadataHandler.constructTreeArtifactValueFromFilesystem(), visits the tree on a single thread, so every file in the tree artifact is read and digested one after another.
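One rough way to test this theory outside Bazel is to hash the same files from the shell, first with a single process and then with several. The script below is a sketch, not Bazel's code: it assumes GNU find, xargs and sha256sum, uses SHA-256 as a stand-in for whatever digest function Bazel is configured with, and picks 8 workers and batches of 16 files arbitrarily. If the parallel run is much faster on the tree artifact from the reproduction below (assumed to be at bazel-bin/r/d), that is consistent with single-threaded digesting being the bottleneck.

```shell
#!/bin/bash
# Sketch: hash every file in a directory tree sequentially, then in parallel,
# and check that both runs produce the same digests.
# Assumptions: GNU find, xargs and sha256sum; SHA-256 is only a stand-in
# for Bazel's configured digest function.
set -euo pipefail

# Tree to hash: first argument (e.g. bazel-bin/r/d), else a small generated tree.
TREE="${1:-}"
GENERATED=""
if [[ -z "$TREE" ]]; then
  TREE="$(mktemp -d)"
  GENERATED="$TREE"
  for i in 1 2 3; do
    mkdir -p "$TREE/$i"
    for k in $(seq 1 20); do
      head -c 4096 /dev/urandom > "$TREE/$i/$k"
    done
  done
fi

SEQ="$(mktemp)"
PAR="$(mktemp)"
# Remove the temporary files, and the tree only if this script generated it.
trap 'rm -rf "$SEQ" "$PAR" ${GENERATED:+"$GENERATED"}' EXIT

# Sequential: one sha256sum process reads and hashes each file in turn,
# similar to a single-threaded tree visit that digests every file.
time (find "$TREE" -type f -print0 | xargs -0 sha256sum | sort > "$SEQ")

# Parallel: up to 8 sha256sum processes at once, 16 files each. Small batches
# keep each process's output small, so interleaved lines are unlikely.
time (find "$TREE" -type f -print0 | xargs -0 -P 8 -n 16 sha256sum | sort > "$PAR")

# The digests should be identical; only the wall-clock time should differ.
cmp -s "$SEQ" "$PAR" && echo "digests match"
```

For example, save it as hash_tree.sh (a hypothetical name) and run bash hash_tree.sh bazel-bin/r/d after the first build. The first run warms the page cache for the second, so repeat each timing or swap the order before drawing conclusions.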
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
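Run the following script, which creates a minimal workspace (WORKSPACE, r/BUILD, r/r.bzl and r/gen.sh), builds //r:c, restarts the server and builds it again: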
touch WORKSPACE
mkdir -p r
cat > r/BUILD <<'EOF'
load(":r.bzl", "r")

r(name = "ta")

genrule(
    name = "c",
    srcs = [":ta"],
    outs = ["co"],
    cmd = "find $(location :ta) > $@",
)

sh_binary(
    name = "gen",
    srcs = ["gen.sh"],
)
EOF
cat > r/r.bzl << 'EOF'
def _r_impl(ctx):
    ta = ctx.actions.declare_directory("d")
    ctx.actions.run(
        outputs = [ta],
        inputs = [],
        executable = ctx.executable._gen,
        arguments = [ta.path],
    )
    return [DefaultInfo(files = depset([ta]))]

r = rule(
    implementation = _r_impl,
    attrs = {
        "_gen": attr.label(default = "//r:gen", executable = True, cfg = "exec"),
    },
)
EOF
cat > r/gen.sh <<'EOF'
#!/bin/bash
OUT="$1"
mkdir -p "$OUT"
for i in $(seq 1 10); do
  for j in $(seq 1 10); do
    mkdir -p "$OUT/$i/$j"
    for k in $(seq 1 100); do
      #echo "$i $j $k" > "$OUT/$i/$j/$k"
      # Write 1 MiB of random data per file (/dev/urandom avoids blocking on older kernels).
      dd if=/dev/urandom of="$OUT/$i/$j/$k" bs=1024 count=1024
    done
  done
done
echo hello > "$OUT/hello"
EOF
chmod +x r/gen.sh
bazel build //r:c
bazel shutdown
bazel build //r:c # This is slow
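For scale, gen.sh writes 10 × 10 × 100 files of bs=1024 × count=1024 bytes each. Assuming, as the stack trace suggests, that the slow second build re-reads and re-digests every file in the tree artifact, this is the amount of data involved (about 9.8 GiB, ignoring the small "hello" file):

```shell
#!/bin/bash
# Data volume produced by r/gen.sh, ignoring the small "hello" file.
files=$((10 * 10 * 100))         # nested loops over i, j and k
bytes_per_file=$((1024 * 1024))  # dd bs=1024 count=1024
total=$((files * bytes_per_file))
echo "files=$files total_bytes=$total total_MiB=$((total / 1024 / 1024))"
# prints: files=10000 total_bytes=10485760000 total_MiB=10000
```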
Which operating system are you running Bazel on?
Linux @ Google
What is the output of bazel info release?
development version
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
From git commit de4746d.
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response