Skip to content

Preserve symlink structure of remote execution inputs #23620

@tpudlik

Description

@tpudlik

Description of the feature request:

If source artifact inputs of build actions include symlinks, these symlinks are represented as regular files when the build action is executed remotely. This can break certain inputs, in particular LLVM built in the "busybox" configuration. The FR is to preserve the symlink structure instead.

Let me unpack this a little bit.

Current Bazel behavior

Let me quote @tjgq from an internal conversation we've had about this:

  1. Every artifact tracked by Bazel has a type (file, directory or symlink) associated with it, which defines how its identity is determined - for artifacts of file or directory type, the content matters; for artifacts of symlink type, only the textual target path of the symlink does

  2. However (and this is where it gets complicated) the "Bazel type" of an artifact is somewhat independent from its filesystem representation. It's possible to have an artifact of file or directory type which is materialized in the filesystem as a symlink, in which case Bazel cares about the bytes at the other end of the symlink, not the path the symlink points to

  3. Source artifacts are, at the moment, always of file type; only output artifacts can be of directory or symlink type (that would be a ctx.actions.declare_directory or a ctx.actions.declare_symlink, respectively)

  4. For artifacts of file type, when we construct the input representation of a remote action, we always represent them as regular files, never symlinks, even if their filesystem representation is a symlink; so their "symlinkness" is effectively lost when the action is executed remotely

How this breaks LLVM toolchains

We use a hermetic LLVM toolchain, and that toolchain is part of the build inputs. The toolchain includes a bunch of "binaries" like bin/clang, bin/clang++, bin/lld, etc. But in fact, the LLVM version we use employs a "busybox" architecture, where these binaries are all symlinks to bin/llvm. However! Invoking bin/clang is not actually equivalent to invoking bin/llvm: the binary examines its argv[0], and behaves differently when invoked via symlink.

What is more, in some situations llvm will re-invoke itself. On the first invocation, we need the argv[0] to be clang. On the re-invocation, llvm will use the path from /proc/self/exe, which needs to end in llvm. If we merely have a copy, argv[0] is clang both times, producing errors like https://pwbug.dev/issues/364781685. I am not a toolchain expert, but I discussed this with some, and they assure me this behavior (i.e., reading /proc/self/exe an assuming it points to llvm and not e.g. clang, rather than just setting it to llvm) is unfortunately necessary due to the treatment of Clang reproducers and -canonical-prefixes (although I confess I could not follow their explanation).

Workarounds

There are workarounds for this issue:

  1. Replace bin/clang (etc) with symlinks created by ctx.actions.declare_symlink. Such Bazel-created symlinks will be faithfully sent to RBE.

  2. Wrap bin/clang (etc) in bash scripts like,

    #!/bin/bash
    exec -a clang $(location //:bin/llvm)  "$@"
    

    This has the advantage that no custom rules are required, you just genrule the wrapper scripts into existence. These wrapper scripts (thanks to the exec -a clang) have the same magic property as the symlinks, i.e. that argv[0] is different from the actual executed binary basename. However, this requires bash (/bin/sh doesn't support the -a flag).

However, this is definitely a sharp edge and it would be nice to remove it.

Further reading for Googlers

See internal discussions of this problem for more details:

Which category does this issue belong to?

Remote Execution

What underlying problem are you trying to solve with this feature?

No response

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

development version

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

I'm on d62e0a0, fetched by Bazelisk (so, Bazel 8 pre-release).

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

Remarkably, not really, this seems to be a pretty edge-case issue!

Any other information, logs, or outputs that you want to share?

For folks who run into similar issues in the future to find this: the cryptic errors produced by clang are,

clang: error: unknown argument: '-cc1'
clang: error: unknown argument '-triple'; did you mean '-Xclang -triple'?
clang: error: unknown argument '-clear-ast-before-backend'; did you mean '-Xclang -clear-ast-before-backend'?
clang: error: unknown argument '-main-file-name'; did you mean '-Xclang -main-file-name'?
clang: error: unknown argument '-mrelocation-model'; did you mean '-Xclang -mrelocation-model'?
clang: error: unknown argument '-pic-level'; did you mean '-Xclang -pic-level'?
clang: error: unknown argument '-fhalf-no-semantic-interposition'; did you mean '-Xclang -fhalf-no-semantic-interposition'?
clang: error: unknown argument '-mframe-pointer=all'; did you mean '-Xclang -mframe-pointer=all'?
clang: error: unknown argument '-nostdsysteminc'; did you mean '-Xclang -nostdsysteminc'?
clang: error: unknown argument '-target-cpu'; did you mean '-Xclang -target-cpu'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-crc'; did you mean '-mcrc'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument: '-ras'
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument: '-sb'

To make progress debugging this issue, you need to run clang under strace.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3We're not considering working on this, but happy to review a PR. (No assignee)team-Remote-ExecIssues and PRs for the Execution (Remote) teamtype: feature request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions