Insufficient remote Merkle tree size causes slow builds #18686
Description of the bug:
After updating our version of Bazel, we saw a significant regression in some of our builds. A couple of actions towards the end of the build that have a lot of inputs took significantly longer. In the trace report, CPU usage is low while those actions execute, and the memory usage of the main Bazel process keeps going up and down.
Using Git bisect, we found #18015 to be the change that led to the biggest regression.
We figured out that the builds that regressed use --experimental_remote_merkle_tree_cache, and we could fix the regression by increasing --experimental_remote_merkle_tree_cache_size. With an insufficient size, Bazel keeps allocating and deallocating the Merkle trees. Presumably we saw a regression after that change because it keeps Merkle trees around for longer.
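To illustrate the kind of thrashing we suspect, here is a minimal sketch in Python. It is not a model of Bazel's actual cache implementation or eviction policy: `LruCache`, `simulate`, and all the numbers are hypothetical, and a plain LRU policy is assumed purely for illustration. It shows how a cache that is slightly smaller than the working set can end up rebuilding every entry on every pass.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_build(self, key, build):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: the value has to be rebuilt (allocated) again.
        self.misses += 1
        value = build(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value


def simulate(capacity, distinct_trees, actions):
    """Each hypothetical action requests the same set of trees in order."""
    cache = LruCache(capacity)
    for _ in range(actions):
        for tree_id in range(distinct_trees):
            cache.get_or_build(tree_id, lambda k: ("merkle-tree", k))
    return cache.hits, cache.misses


if __name__ == "__main__":
    # Cache smaller than the working set: every lookup misses.
    print(simulate(capacity=100, distinct_trees=150, actions=10))  # (0, 1500)
    # Cache large enough: only the first pass misses.
    print(simulate(capacity=200, distinct_trees=150, actions=10))  # (1350, 150)
```

In this toy model, raising the capacity above the number of distinct trees turns 1500 rebuilds into 150, which matches the kind of improvement we saw when increasing the cache size.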
While we can work around the issue by increasing the size, a warning or error when this starts happening would be appreciated.
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Haven't tried this myself, but I'm assuming this can be replicated on a decently sized Java project by enabling --experimental_remote_merkle_tree_cache and setting --experimental_remote_merkle_tree_cache_size to a low value.
Which operating system are you running Bazel on?
macOS 13.4 & Ubuntu Focal Fossa
What is the output of bazel info release?
/
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
We built on top of commit 286306e from the 6.x branch with some additional patches.
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?
git@github.com:bazelbuild/bazel.git
a7b96f45df00b4024de4e70b90989956904ca4fb
286306e8358542ce272f7442075bf157a2a62ec7
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response
