V3 idea: Loosen the restriction of action input as Merkle tree #141
Description
I have observed that Bazel can spend a lot of CPU resources calculating Merkle tree digests. This has been discussed in bazelbuild/bazel#10875 and in "Extend the Action Cache with alias digests".
The key point is that the single input Merkle tree only needs to be resolved on a cache miss, which should be rare, so the client should be allowed to check for a cache hit using something cheaper to compute.
One idea was to create an alias cache entry whose digest the client could calculate in any suitable way. The problem is that the alias has to be uploaded by clients, which may be trusted CI machines or untrusted developer machines, rather than by the remote execution server. Using action cache aliases therefore makes the system vulnerable to cache poisoning.
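To make the poisoning concern concrete, here is a minimal in-memory sketch. The class `AliasingActionCache`, its methods, and the example keys are all hypothetical and are not part of the Remote Execution API; the sketch only assumes that an alias is a client-chosen key that maps to an action digest.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class AliasingActionCache:
    """Toy action cache with client-uploaded aliases (hypothetical design)."""

    def __init__(self):
        self.results = {}  # action digest -> action result
        self.aliases = {}  # alias digest -> action digest

    def put_result(self, action_digest: str, result: str) -> None:
        self.results[action_digest] = result

    def put_alias(self, alias_digest: str, action_digest: str) -> None:
        # The server cannot check that the alias really describes this action:
        # the alias is computed "in any suitable way" by the client.
        self.aliases[alias_digest] = action_digest

    def get_by_alias(self, alias_digest: str) -> Optional[str]:
        action_digest = self.aliases.get(alias_digest)
        if action_digest is None:
            return None
        return self.results.get(action_digest)


cache = AliasingActionCache()

# An honest client stores a result and an alias that is cheap to compute.
action_digest = sha256_hex(b"action including the full input Merkle tree")
cache.put_result(action_digest, "good-output")
alias = sha256_hex(b"cheap client-side key")
cache.put_alias(alias, action_digest)
hit = cache.get_by_alias(alias)

# An untrusted client re-points the same alias at an unrelated action.
other_digest = sha256_hex(b"some unrelated action")
cache.put_result(other_digest, "unrelated-output")
cache.put_alias(alias, other_digest)
poisoned = cache.get_by_alias(alias)
```

After the second `put_alias`, every client that looks up the alias receives the result of the unrelated action, even though both stored results are individually legitimate.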
Instead, @EricBurnett suggests loosening the restriction on the input so that it can describe partial trees:
#140 (comment)
For merkle trees as inputs, the general properties we care about are:
- Recursively defined, so that sharing trees in inputs doesn't require operating on a whole tree each time
- Parallelly uploadable, so that it doesn't add unnecessary round-trips on the order of the depth of the tree.
https://groups.google.com/forum/#!msg/remote-execution-apis/F0Qb4m0J4Vg/QANi1BMdAgAJ
I will note that Merkle Trees, when used as inputs, are defined as they are to achieve:
- Reusability (sub-trees shared by two actions will share Merkle Tree nodes),
- Determinism (the same set of inputs will always get the same tree, regardless of client)
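The two properties above can be illustrated with a toy Merkle digest over nested dictionaries. This is only a sketch: the function `directory_digest` and its text encoding are assumptions made for illustration, whereas the real API hashes canonically serialized `Directory` messages. The encoding here is also not robust against file names containing `:` or newlines.

```python
import hashlib


def directory_digest(tree: dict) -> str:
    """Digest a nested dict mapping names to bytes (file) or dict (subdirectory)."""
    entries = []
    for name in sorted(tree):  # sorting makes the result independent of insertion order
        child = tree[name]
        if isinstance(child, dict):
            entries.append("dir:%s:%s" % (name, directory_digest(child)))
        else:
            entries.append("file:%s:%s" % (name, hashlib.sha256(child).hexdigest()))
    return hashlib.sha256("\n".join(entries).encode()).hexdigest()


# Determinism: the same inputs give the same digest regardless of listing order.
includes_a = {"a.h": b"int a;", "b.h": b"int b;"}
includes_b = {"b.h": b"int b;", "a.h": b"int a;"}
same_digest = directory_digest(includes_a) == directory_digest(includes_b)

# Reusability: two different actions share the node for their common sub-tree.
action1_root = {"include": includes_a, "src": {"main.c": b"int main(){}"}}
action2_root = {"include": includes_b, "src": {"other.c": b"void f(){}"}}
shared_node = directory_digest(action1_root["include"]) == directory_digest(action2_root["include"])
roots_differ = directory_digest(action1_root) != directory_digest(action2_root)
```

Note that this sketch recomputes every sub-tree on each call; a client that wants to avoid the CPU cost described in this issue would need to cache sub-tree digests between actions.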
What would be a good design?
- Extend message Directory to include more extra roots, not just subdirectories?
- Let Action.input_root_digest be repeated?
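As a rough sketch of the second option, the following models an action whose input is a list of partial-tree root digests. The `Action` dataclass, the plural field name `input_root_digests`, and the way the action digest is derived (including treating the order of roots as significant) are assumptions for illustration only, not a proposed wire format.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Action:
    command_digest: str
    input_root_digests: Tuple[str, ...]  # hypothetical repeated field

    def digest(self) -> str:
        # Assumption: the roots are hashed in the order given, without
        # merging the partial trees into one combined Merkle tree.
        payload = "\n".join((self.command_digest,) + self.input_root_digests)
        return sha256_hex(payload.encode())


command = sha256_hex(b"cc -c main.c")
toolchain_root = sha256_hex(b"toolchain tree")  # stands in for a cached partial-tree digest
sources_v1 = sha256_hex(b"sources tree, version 1")
sources_v2 = sha256_hex(b"sources tree, version 2")

a1 = Action(command, (toolchain_root, sources_v1))
a2 = Action(command, (toolchain_root, sources_v2))
```

In this sketch, changing the sources only requires a new digest for the sources tree; the toolchain digest is reused unchanged in both actions.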
Any other design ideas or any ideas to solve the problem in a totally different way?