
perf: reuse buffer for hashes (root calculation and leaves push) #305

Merged
mcrakhman merged 4 commits into main from mcrakhman/reuse-hash-allocs on Sep 24, 2025

Conversation

@mcrakhman (Member):

Overview

This PR introduces a ReuseBuffers option which enables the NamespacedMerkleTree and the NmtHasher to reuse buffers. When the option is enabled, the pipeline looks like this:

  • on the first run the hasher is used normally, but underneath it tracks all the slices allocated for hashes
  • after we call Reset on the nmt, the buffers are reset and the previously allocated slices become available for reuse
  • on subsequent runs, no new allocations happen.

When used repeatedly, this gives roughly 15% higher performance, 3x less memory used, and 95% fewer allocations.

If the option is not enabled, tree functionality is unaffected.

Benchmarks

cpu: Apple M1 Max
BenchmarkNMTReuseVsNoReuse
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-NoReuse-10         	   16126	     73970 ns/op	   54377 B/op	     405 allocs/op
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-ReuseTree-10       	   17980	     67043 ns/op	   19035 B/op	      16 allocs/op
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-NoReuse-10         	    8052	    156591 ns/op	  114570 B/op	     793 allocs/op
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-ReuseTree-10       	    9129	    133909 ns/op	   37509 B/op	      18 allocs/op
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-NoReuse-10         	    3771	    316411 ns/op	  240045 B/op	    1565 allocs/op
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-ReuseTree-10       	    4594	    264981 ns/op	   78535 B/op	      20 allocs/op
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-NoReuse-10        	    1806	    660065 ns/op	  545798 B/op	    3110 allocs/op
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-ReuseTree-10      	    2260	    540510 ns/op	  160695 B/op	      26 allocs/op
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-NoReuse-10        	     877	   1376183 ns/op	 1037494 B/op	    6194 allocs/op
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-ReuseTree-10      	    1076	   1084137 ns/op	  325373 B/op	      38 allocs/op

rach-id (Member) previously approved these changes Sep 19, 2025:

LGTM 👍 Good stuff. Left a couple of questions to understand more.

@tzdybal (Contributor) left a comment:

Great job. The way allocs are optimized is really impressive.

I left some comments.

I don't like the bufferedHasher interface and the type casting. I played around a bit, and maybe we could pass a "reset function" as an option instead, see: mcrakhman/reuse-hash-allocs...tzdybal/reset_func (this branch is missing the actual option to expose this, but it's a detail :P)
It's just a suggestion; we can keep the interface (but we need to refactor it to be usable).

n.rawRoot = res
if n.reuseBuffers {
	// we will reuse root's bytes, so we copy
	n.rawRoot = make([]byte, len(res))
	copy(n.rawRoot, res)
}
Contributor:

How is this "reusing" if you're allocating a new slice?

Member Author:

Because if we reuse the buffers, we also reuse the hash for the root, as noted in the comment; that's why, if the root's buffer can be reused, we need to copy it. To be honest, it seems clear from the comment, but if you think otherwise, please propose how I should restructure it.

Contributor:

maybe something like: // because the root was allocated on a reusable buffer, we need to make a copy

Comment on lines +370 to +372
if cap(b) < size {
	b = make([]byte, size)
}
Contributor:

If size is predefined for the entire byteBuffer instance, this code can be removed for simplification.

}

// newBuffer creates a new byte buffer.
func newBuffer() *byteBuffer {
Contributor:

As this is used for hashes only (which all have the same size), we can simplify the code a bit by using constant-size slices.

Suggested change
- func newBuffer() *byteBuffer {
+ func newBuffer(size int) *byteBuffer {

Member Author:

Yes we can, but why do we need this? IMO it will just make the code more complicated: we would need to calculate the size up front (h.Size() + 2 * namespace size), and we would need to be sure that if we change something in the implementation we still pass the correct parameter to the buffer on init.

Right now the simplicity of this is that the hasher decides how many bytes it needs and just requests them; it can be any size, and there is less room for error.

Contributor:

Maybe it's negligible (perf-wise), but we have to check the size of the slice every time we reuse a buffer, and it will never be a different size; this looks like waste to me.

There is a Size method in NmtHasher already.

Comment on lines +204 to +206
if reuseHasher, ok := n.treeHasher.(bufferedHasher); ok && n.reuseBuffers {
	reuseHasher.resetBuffer()
}
Contributor:

Suggested change
- if reuseHasher, ok := n.treeHasher.(bufferedHasher); ok && n.reuseBuffers {
- 	reuseHasher.resetBuffer()
- }
+ if opts.ReuseBuffers {
+ 	if reuseHasher, ok := opts.Hasher.(bufferedHasher); ok {
+ 		reuseHasher.resetBuffer()
+ 	}
+ }

Member Author:

We don't have opts here, but I can split it into nested ifs if you want.

Contributor:

Ah, sorry, I copy-pasted ;) But you got my intention right: I would prefer a nested if (so we don't try the cast if we're not going to reuse buffers).


// bufferedHasher is the interface to which default hasher conforms, which resets
// the buffer and enables it to reuse for hashes.
type bufferedHasher interface {
Contributor:

nit: this should be called bufferResetter not bufferedHasher as there are no methods related to hashing in this interface.

Member Author:

Agreed, that makes it clearer.

// bufferedHasher is the interface to which default hasher conforms, which resets
// the buffer and enables it to reuse for hashes.
type bufferedHasher interface {
resetBuffer()
Contributor:

This method cannot be implemented outside of the package, so you would need to fork the repository to add a custom Hasher implementation. I think the entire idea behind the Hasher interface was to enable easy replacement when nmt is used as a library.

Both the interface (type) and the method have to be visible so they can be implemented. I would also suggest putting it next to the Hasher interface definition.

Member Author:

Yes, it can't be, but that's the point: we don't expose it outside the package (hence the camelCase name). This is basically a way to both allow calling NmtHasher's private resetBuffer method and keep working with custom hasher implementations that don't implement this func.

Contributor:

But then the API is confusing, because you can use your own Hasher and set ReuseBuffers to true at the same time, but it will have no effect.

Contributor:

The question is: do we need to expose Hasher as an option? Providing a custom underlying hash algorithm seems to be enough for most cases IMHO.

Comment on lines +1029 to +1051
type nonBufferedHasher struct {
	hasher *NmtHasher
}

func (n nonBufferedHasher) IsMaxNamespaceIDIgnored() bool {
	return n.hasher.IsMaxNamespaceIDIgnored()
}

func (n nonBufferedHasher) NamespaceSize() namespace.IDSize {
	return n.hasher.NamespaceSize()
}

func (n nonBufferedHasher) HashLeaf(data []byte) ([]byte, error) {
	return n.hasher.HashLeaf(data)
}

func (n nonBufferedHasher) HashNode(leftChild, rightChild []byte) ([]byte, error) {
	return n.hasher.HashNode(leftChild, rightChild)
}

func (n nonBufferedHasher) EmptyRoot() []byte {
	return n.hasher.EmptyRoot()
}
Contributor:

This can be simplified:

Suggested change
- type nonBufferedHasher struct {
- 	hasher *NmtHasher
- }
- func (n nonBufferedHasher) IsMaxNamespaceIDIgnored() bool {
- 	return n.hasher.IsMaxNamespaceIDIgnored()
- }
- func (n nonBufferedHasher) NamespaceSize() namespace.IDSize {
- 	return n.hasher.NamespaceSize()
- }
- func (n nonBufferedHasher) HashLeaf(data []byte) ([]byte, error) {
- 	return n.hasher.HashLeaf(data)
- }
- func (n nonBufferedHasher) HashNode(leftChild, rightChild []byte) ([]byte, error) {
- 	return n.hasher.HashNode(leftChild, rightChild)
- }
- func (n nonBufferedHasher) EmptyRoot() []byte {
- 	return n.hasher.EmptyRoot()
- }
+ type nonBufferedHasher struct {
+ 	*NmtHasher
+ }

But then, what's the point in having this type at all?

Member Author (Sep 24, 2025):

No, the point of this is to check that the algorithm works fine with a custom hasher which doesn't support resetBuffer, because this wrapper doesn't give us access to resetBuffer (note that we deliberately hide the underlying methods by using a hasher *NmtHasher field instead of embedding *NmtHasher).

Contributor:

Makes sense now.
Please add a comment clarifying this, because it's hard to spot that everything except a single method is exposed.

This was referenced Sep 24, 2025
@tzdybal (Contributor) left a comment:

Created a follow-up issue for API cleanup that could make bufferedHasher obsolete and the overall API cleaner (I will add more data there soon).

@tzdybal tzdybal self-requested a review September 24, 2025 10:39
@tzdybal tzdybal dismissed their stale review September 24, 2025 10:41

Created follow-up issue for refactoring.

@mcrakhman mcrakhman merged commit df8eeb6 into main Sep 24, 2025
8 checks passed
@mcrakhman mcrakhman deleted the mcrakhman/reuse-hash-allocs branch September 24, 2025 11:30
mcrakhman added a commit to celestiaorg/rsmt2d that referenced this pull request Sep 24, 2025
## Overview
Use a buffering approach during root calculation:
- each tree has a buffer for leaves (in the wrapper) and nmt hashes (in the tree)
- we limit parallel operations during root computation, so each tree is reused in a specific goroutine, and thus these buffers are not reallocated
- the trees are managed by a tree pool
- the buffers can be reused for the next square
- the trees can be reused for smaller square sizes; for larger sizes they are reallocated on the fly

For subsequent reuses this gives 95% fewer allocations, 95% less memory usage, and a 2x speedup.

**Note:** the PR relies on an improved NMT version which supports buffer reuse for hashes: celestiaorg/nmt#305

## Benchmarks
### Without buffering
```
cpu: Apple M1 Max
BenchmarkEDSRootsWithErasuredNMT
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10         	     381	   2894619 ns/op	 8566885 B/op	   44171 allocs/op
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10         	     151	   7806400 ns/op	33918235 B/op	  170768 allocs/op
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10      	      42	  26461898 ns/op	135224424 B/op	  670235 allocs/op
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10    	      10	 105217712 ns/op	544944785 B/op	 2653211 allocs/op
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10   	       3	 397317458 ns/op	2186366608 B/op	10559819 allocs/op
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10         	       1	2713678708 ns/op	8714358288 B/op	42129390 allocs/op
```

### With buffering
```
cpu: Apple M1 Max
BenchmarkEDSRootsWithBufferedErasuredNMT
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10         	    1626	    839702 ns/op	  328908 B/op	    1348 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10         	     384	   2951978 ns/op	 1283513 B/op	    2948 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10      	     115	  10319029 ns/op	 5000122 B/op	    6404 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10    	      22	  52192040 ns/op	19454644 B/op	   13828 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10   	       6	 171740514 ns/op	80885168 B/op	   29700 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10         	       2	 699427729 ns/op	329695408 B/op	   69636 allocs/op
```