
V3 idea: Loosen the restriction of action input as Merkle tree #141

@moroten

I have observed that Bazel can spend a lot of CPU resources calculating Merkle tree digests. This has been discussed in bazelbuild/bazel#10875 and in "Extend the Action Cache with alias digests".

The key point is that the single input Merkle tree only needs to be resolved on a cache miss, which should be rare, so the client should be allowed to check for a cache hit using something cheaper to compute.

One idea was to create an alias cache entry for which the client could calculate the digest in any suitable way. The problem is that the alias would have to be uploaded by a client, whether a trusted CI machine or an untrusted developer machine, rather than by the remote execution server. Therefore, using action cache aliases makes the system vulnerable to cache poisoning.

Instead, @EricBurnett suggests loosening the restriction on the input so that it can describe partial trees:

#140 (comment)
For merkle trees as inputs, the general properties we care about are:

  • Recursively defined, so that sharing trees in inputs doesn't require
    operating on a whole tree each time
  • Parallelly uploadable, so that it doesn't add unnecessary round-trips
    on the order of the depth of the tree.

https://groups.google.com/forum/#!msg/remote-execution-apis/F0Qb4m0J4Vg/QANi1BMdAgAJ
I will note that Merkle Trees, when used as inputs, are defined as they are to achieve:

  1. Reusability (sub-trees shared by two actions will share Merkle Tree nodes),
  2. Determinism (the same set of inputs will always get the same tree, regardless of client)
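
To make the two properties concrete, here is a minimal sketch in Python. It is not the REAPI encoding: real `Directory` messages are serialized protobufs, whereas this sketch uses sorted JSON lists, and the names `blob_digest` and `directory_digest` are made up for illustration. It only shows that a recursively defined digest gives shared sub-tree nodes and is independent of the order in which the client enumerates entries.

```python
import hashlib
import json


def blob_digest(data: bytes) -> str:
    """SHA-256 hex digest of a blob (file content or encoded directory node)."""
    return hashlib.sha256(data).hexdigest()


def directory_digest(tree: dict, store: dict) -> str:
    """Return the digest of a directory node and record every node in `store`.

    `tree` maps a name to bytes (a file) or to a nested dict (a subdirectory).
    Entries are sorted by name so that the encoding is deterministic.
    JSON stands in for the real protobuf encoding (an assumption of this sketch).
    """
    entries = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            entries.append(["dir", name, directory_digest(child, store)])
        else:
            entries.append(["file", name, blob_digest(child)])
    encoded = json.dumps(entries).encode()
    digest = blob_digest(encoded)
    store[digest] = encoded  # node keyed by its own digest
    return digest


# Two actions share the same "lib" sub-tree, listed in a different order.
lib_a = {"a.h": b"int a;", "b.h": b"int b;"}
lib_b = {"b.h": b"int b;", "a.h": b"int a;"}
action1 = {"lib": lib_a, "main.c": b"int main() { return 0; }"}
action2 = {"lib": lib_b, "test.c": b"void test() {}"}

store1, store2 = {}, {}
root1 = directory_digest(action1, store1)
root2 = directory_digest(action2, store2)
shared_nodes = set(store1) & set(store2)
```

The roots differ, but the `lib` node appears in both stores under the same digest, so it only has to be uploaded and hashed once. The cost the issue describes comes from the fact that the root digest still depends on every node below it.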

What would be a good design?

  1. Extend message Directory to include extra roots, not just subdirectories?
  2. Let Action.input_root_digest be repeated?
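
As a rough illustration of the second option, the sketch below models an action with several input roots. Everything here is hypothetical: `MultiRootAction`, `input_root_digests`, `action_digest` and `RootDigestCache` are not part of the Remote Execution API, and the cache is keyed by root name only, with no invalidation. The point is only that a client could compute the digest of a large, rarely changing root (for example a toolchain) once and reuse it across actions.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class MultiRootAction:
    """Hypothetical action whose inputs are several independent roots."""
    command_digest: str
    input_root_digests: Tuple[str, ...]  # stands in for a repeated field


def action_digest(action: MultiRootAction) -> str:
    """Digest over the command and the ordered list of root digests."""
    h = hashlib.sha256()
    h.update(action.command_digest.encode())
    for root in action.input_root_digests:
        h.update(b"\0" + root.encode())
    return h.hexdigest()


class RootDigestCache:
    """Client-side memo of root digests (no invalidation in this sketch)."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute
        self._cache: Dict[str, str] = {}
        self.computations = 0

    def get(self, root_name: str) -> str:
        if root_name not in self._cache:
            self.computations += 1
            self._cache[root_name] = self._compute(root_name)
        return self._cache[root_name]


# Stand-in contents; a real client would walk and hash a directory tree.
roots = {"toolchain": b"compiler binaries", "sources": b"main.c v1"}
cache = RootDigestCache(lambda name: hashlib.sha256(roots[name]).hexdigest())

compile_action = MultiRootAction(
    "compile-cmd", (cache.get("toolchain"), cache.get("sources"))
)
link_action = MultiRootAction("link-cmd", (cache.get("toolchain"),))
```

In this sketch the toolchain root is hashed once even though two actions use it. Whether the roots would be merged into one directory on the worker, or mounted at separate paths, is a design question this sketch leaves open.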

Any other design ideas or any ideas to solve the problem in a totally different way?
