perf: avoid redundant slice allocation in leafHash assignment#301

Merged
tzdybal merged 1 commit into main from tzdybal/rm_leaf_copy on Aug 29, 2025

Conversation

Contributor

@tzdybal tzdybal commented Aug 28, 2025

Overview

To reduce the number of allocations, we can directly return precomputed leaf hashes (without copying them).

Benchmark results

```
$ go-perftuner benchstat before.txt after.txt
name                       old time/op    new time/op    delta
ComputeRoot/64-leaves-16      115µs ± 1%     112µs ± 0%   -2.80%  (p=0.008 n=5+5)
ComputeRoot/128-leaves-16     230µs ± 5%     223µs ± 1%        ~  (p=0.016 n=5+5)
ComputeRoot/256-leaves-16     456µs ± 1%     446µs ± 1%        ~  (p=0.008 n=5+5)
ComputeRoot/20k-leaves-16    49.7ms ± 1%    48.5ms ± 3%        ~  (p=0.095 n=5+5)

name                       old alloc/op   new alloc/op   delta
ComputeRoot/64-leaves-16     30.4kB ± 0%    27.4kB ± 0%  -10.09%  (p=0.008 n=5+5)
ComputeRoot/128-leaves-16    54.3kB ± 0%    48.1kB ± 0%  -11.32%  (p=0.008 n=5+5)
ComputeRoot/256-leaves-16     114kB ± 0%     102kB ± 0%  -10.73%  (p=0.008 n=5+5)
ComputeRoot/20k-leaves-16    12.2MB ± 0%    11.2MB ± 0%   -7.89%  (p=0.008 n=5+5)

name                       old allocs/op  new allocs/op  delta
ComputeRoot/64-leaves-16        402 ± 0%       338 ± 0%  -15.92%  (p=0.008 n=5+5)
ComputeRoot/128-leaves-16       788 ± 0%       660 ± 0%  -16.24%  (p=0.008 n=5+5)
ComputeRoot/256-leaves-16     1.56k ± 0%     1.30k ± 0%  -16.41%  (p=0.008 n=5+5)
ComputeRoot/20k-leaves-16      120k ± 0%      100k ± 0%  -16.64%  (p=0.008 n=5+5)
```

@tzdybal tzdybal self-assigned this Aug 28, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @tzdybal, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a performance optimization aimed at reducing memory allocations within the Namespaced Merkle Tree computation. By eliminating an unnecessary slice copy operation, the change improves efficiency and reduces the memory footprint, as evidenced by the provided benchmark results showing significant reductions in alloc/op and allocs/op.

Highlights

  • Performance Optimization: The computeRoot function in nmt.go was optimized to directly assign the leafHash instead of performing a redundant make([]byte, ...) and copy() operation. This change directly reduces memory allocations.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to improve performance by avoiding redundant slice allocations when assigning leaf hashes. The change is effective, as shown by the benchmark results, reducing allocations by up to 16%. However, this performance gain comes at the cost of breaking encapsulation. By returning a direct slice of the tree's internal state, there's a risk of data corruption if the caller modifies the returned slice. My review includes a comment detailing this risk with a concrete example and suggests reverting to the safer implementation that returns a copy. While the performance improvement is notable, ensuring the library's robustness and preventing subtle bugs should be a higher priority.

```diff
 case 1:
-	leafHash := make([]byte, len(n.leafHashes[start]))
-	copy(leafHash, n.leafHashes[start])
+	leafHash := n.leafHashes[start]
```


Severity: high

This change avoids a slice allocation by returning a slice of the internal n.leafHashes. While this improves performance, it introduces a risk of data corruption. A caller of an exported method like Root() could modify the returned slice, which would alter the internal state of the tree.

For example:

  1. A tree with one leaf is created.
  2. Root() is called. It returns a slice pointing directly to n.leafHashes[0].
  3. The caller modifies the returned slice.
  4. Another leaf is pushed to the tree.
  5. Root() is called again. The new root will be computed using the modified leaf hash from step 3, resulting in an incorrect root.

This breaks the encapsulation of the NamespacedMerkleTree and can lead to subtle bugs. To prevent this, it's safer to return a copy of the leaf hash. If the performance gain is critical, this behavior should be explicitly documented in all affected public methods (e.g., Root, Get, MinNamespace) to warn users against modifying the returned slices.

Suggested change

```diff
-leafHash := n.leafHashes[start]
+leafHash := make([]byte, len(n.leafHashes[start]))
+copy(leafHash, n.leafHashes[start])
```

Member


I didn't find the issue or PR, but iirc we had some issues before similar to the above comment 🤔

in theory likely not

Contributor Author


I looked into the git history, and copying was introduced alongside pre-calculating leaf hashes on Push.
"Encapsulation" also doesn't hold for Root - we're already returning a cached value (an internal slice).

I totally agree with:

this behavior should be explicitly documented in all affected public methods (e.g., Root, Get, MinNamespace) to warn users against modifying the returned slices.

Member

@rach-id rach-id left a comment


LGTM, given that the root computation tests are passing and the proofs are generated correctly.

@tzdybal tzdybal merged commit 6f32fad into main Aug 29, 2025
8 checks passed
@tzdybal tzdybal deleted the tzdybal/rm_leaf_copy branch August 29, 2025 07:24
