Bazel sometimes execs file downloaded from remote cache before executable bits have been set #12137

@pcjanzen

Description of the problem / feature request:

When using remote execution or remote caching with --remote_download_minimal, Bazel occasionally attempts to execute a binary that it has just downloaded from the remote to the local machine before it has set the file's executable bits.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I haven't been able to generate a simplified repeatable example because it is sensitive to other things going on in the same build. But my in-house test case looks like:

genrule(
    name = "some_data",
    outs = ["some_data.txt"],
    cmd = ":> $@",
)

cc_test(
    name = "hello_world",
    srcs = ["hello_world.c"],
    data = [":some_data"],
)

sh_test(
    name = "chmod_bug",
    srcs = ["chmod_bug.sh"],
    data = [":hello_world"],
    tags = ["no-remote-exec"],
    args = ["$(location :hello_world)"],
)

That is, there's a binary that's built and cached in remote execution, and another rule tagged "no-remote-exec" that requires that binary to be downloaded to the local host. With --remote_download_minimal, the binary is deleted after the local execution occurs, so the download/chmod/exec cycle happens every time the binary is required.

The shell script then just does:

stat -L $1
date --rfc-3339=ns -u
$1 || (sleep 1; stat -L $1; exit 1)

which occasionally leads to:

$ bazel test --remote_download_toplevel -t- //pcj:chmod_bug
==================== Test output for //pcj:chmod_bug:
  File: pcj/hello_world
  Size: 8600      	Blocks: 24         IO Block: 4096   regular file
Device: 802h/2050d	Inode: 25690430    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1007/pauljanzen)   Gid: ( 1007/pauljanzen)
Access: 2020-09-18 22:24:05.989466229 +0000
Modify: 2020-09-18 22:24:05.989466229 +0000
Change: 2020-09-18 22:24:05.989466229 +0000
 Birth: -
2020-09-18 22:24:06.026526515+00:00
line 5: pcj/hello_world: Permission denied
  File: pcj/hello_world
  Size: 8600      	Blocks: 24         IO Block: 4096   regular file
Device: 802h/2050d	Inode: 25690430    Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1007/pauljanzen)   Gid: ( 1007/pauljanzen)
Access: 2020-09-18 22:24:05.989466229 +0000
Modify: 2020-09-18 22:24:05.989466229 +0000
Change: 2020-09-18 22:24:06.093465133 +0000
 Birth: -
================================================================================

I have to build some other moderately-sized, unrelated target at the same time in order to trigger the bug. It seems like it always succeeds the first time after starting the Bazel server or after any change that causes the analysis cache to be discarded, but will reliably fail within two or three iterations after that.
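
The ordering visible in the two stat outputs above (mode 0644 when the test starts, 0755 a moment later) can be simulated outside Bazel. The following is a minimal sketch, not Bazel's actual download code: the temporary directory and the stand-in `hello_world` script are hypothetical, and the steps run sequentially rather than racing, purely to show what a consumer sees if it runs between the write and the chmod.

```shell
#!/bin/sh
# Sketch only: simulates the download/chmod/exec ordering outside Bazel.
workdir=$(mktemp -d)
bin="$workdir/hello_world"   # hypothetical stand-in for the downloaded binary

# Step 1: "download" the file. As in the first stat output, it has no
# executable bits yet.
printf '#!/bin/sh\necho hello\n' > "$bin"
chmod 0644 "$bin"

# Step 2: a consumer that runs at this point cannot exec the file; the
# shell reports "Permission denied" and returns a nonzero status.
if "$bin" 2>/dev/null; then early=ok; else early=denied; fi

# Step 3: the executable bits arrive, as in the second stat output.
chmod 0755 "$bin"
late=$("$bin")

echo "before chmod: $early"
echo "after chmod: $late"
rm -rf "$workdir"
```

If the consumer is only started after step 3, or if the file is made executable before it becomes visible at its final path, the window in step 2 does not exist.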

What operating system are you running Bazel on?

Ubuntu 18.04

What's the output of bazel info release?

release 3.4.1 (but have also verified that the problem is still present in release-3.6.0rc2 aa0d97c)

Have you found anything relevant by searching the web?

No.

Metadata

Labels

P2: We'll consider working on this in future. (Assignee optional)
team-Remote-Exec: Issues and PRs for the Execution (Remote) team
type: bug
