
perf: reuse buffer for hashes (root calculation and leaves push) #305

Merged
mcrakhman merged 4 commits into main from mcrakhman/reuse-hash-allocs on Sep 24, 2025

Conversation

@mcrakhman (Member):

Overview

This PR introduces a ReuseBuffers option which enables the NamespacedMerkleTree and the NmtHasher to reuse buffers. When the option is enabled, the pipeline looks like this:

  • on the first run the hasher is used normally, but underneath it tracks all the slices allocated for hashes
  • after we call Reset on the nmt, the buffers are reset and the previously allocated slices become available for reuse
  • on subsequent runs, no new allocations happen.

When used repeatedly, this gives roughly 15% higher performance, 3x less memory used, and 95% fewer allocations.

If the option is not enabled, tree functionality is unaffected.

Benchmarks

cpu: Apple M1 Max
BenchmarkNMTReuseVsNoReuse
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-NoReuse-10         	   16126	     73970 ns/op	   54377 B/op	     405 allocs/op
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/128-leaves-Root-ReuseTree-10       	   17980	     67043 ns/op	   19035 B/op	      16 allocs/op
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-NoReuse-10         	    8052	    156591 ns/op	  114570 B/op	     793 allocs/op
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/256-leaves-Root-ReuseTree-10       	    9129	    133909 ns/op	   37509 B/op	      18 allocs/op
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-NoReuse-10         	    3771	    316411 ns/op	  240045 B/op	    1565 allocs/op
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/512-leaves-Root-ReuseTree-10       	    4594	    264981 ns/op	   78535 B/op	      20 allocs/op
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-NoReuse-10        	    1806	    660065 ns/op	  545798 B/op	    3110 allocs/op
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/1024-leaves-Root-ReuseTree-10      	    2260	    540510 ns/op	  160695 B/op	      26 allocs/op
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-NoReuse
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-NoReuse-10        	     877	   1376183 ns/op	 1037494 B/op	    6194 allocs/op
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-ReuseTree
BenchmarkNMTReuseVsNoReuse/2048-leaves-Root-ReuseTree-10      	    1076	   1084137 ns/op	  325373 B/op	      38 allocs/op

rach-id (Member) previously approved these changes Sep 19, 2025:

LGTM 👍 Good stuff. Left a couple of questions to understand more.

@tzdybal (Contributor) left a comment:

Great job. The way allocs are optimized is really impressive.

I left some comments.

I don't like the bufferedHasher interface and the type casting. I played around a bit, and maybe we could pass a "reset function" as an option instead, see: mcrakhman/reuse-hash-allocs...tzdybal/reset_func (this branch is missing the actual option to expose this, but it's a detail :P)
It's just a suggestion; we can keep the interface (but we need to refactor it to be usable).

n.rawRoot = res
if n.reuseBuffers {
	// we will reuse root's bytes, so we copy
	n.rawRoot = make([]byte, len(res))
	copy(n.rawRoot, res)
}
Contributor:

How is this "reusing" if you're allocating a new slice?

Member Author:

Because if we reuse the buffers, we also reuse the hash for the root, as noted in the comment; that's why, if the root's buffer can be reused, we need to copy it. To be honest, it seems clear from the comment, but if you think otherwise, please propose how I should restructure it.

Contributor:

maybe something like: // because the root was allocated on a reusable buffer, we need to make a copy

Comment on lines +370 to +372
if cap(b) < size {
	b = make([]byte, size)
}
Contributor:

If size is predefined for the entire byteBuffer instance, this code can be removed for simplification.

}

// newBuffer creates a new byte buffer.
func newBuffer() *byteBuffer {
Contributor:

As this is used for hashes only (which all have the same size), we can simplify the code a bit by using constant-size slices.

Suggested change
- func newBuffer() *byteBuffer {
+ func newBuffer(size int) *byteBuffer {

Member Author:

Yes we can, but why do we need this? IMO it will just make the code more complicated: we would need to calculate the size up front (h.Size() + 2 * namespace size), and we would need to be sure that if we change something in the implementation we still pass the correct parameter to the buffer on init.

Right now the simplicity of this is that the hasher decides how many bytes it needs and just requests them; it can be any size, and there is less room for error.

Contributor:

Maybe it's negligible (perf-wise), but we have to check the size of the slice every time we reuse a buffer, and it will never be a different size; this looks like waste to me.

There is a Size method in NmtHasher already.

Comment on lines +204 to +206
if reuseHasher, ok := n.treeHasher.(bufferedHasher); ok && n.reuseBuffers {
	reuseHasher.resetBuffer()
}
Contributor:

Suggested change
- if reuseHasher, ok := n.treeHasher.(bufferedHasher); ok && n.reuseBuffers {
- 	reuseHasher.resetBuffer()
- }
+ if opts.ReuseBuffers {
+ 	if reuseHasher, ok := opts.Hasher.(bufferedHasher); ok {
+ 		reuseHasher.resetBuffer()
+ 	}
+ }

Member Author:

We don't have opts here, but I can split it into nested ifs if you want.

Contributor:

Ah, sorry, I copy-pasted ;) But you got my intention right: I would prefer a nested if (so we don't try the cast if we're not going to reuse buffers).


// bufferedHasher is the interface to which default hasher conforms, which resets
// the buffer and enables it to reuse for hashes.
type bufferedHasher interface {
Contributor:

nit: this should be called bufferResetter not bufferedHasher as there are no methods related to hashing in this interface.

Member Author:

Agreed, that makes it clearer.

// bufferedHasher is the interface to which default hasher conforms, which resets
// the buffer and enables it to reuse for hashes.
type bufferedHasher interface {
resetBuffer()
Contributor:

This method cannot be implemented outside of the package, so you would need to fork the repository to add a custom Hasher implementation. I think the entire idea behind the Hasher interface was to enable easy replacement when nmt is used as a library.

Both the interface (type) and the method have to be visible so they can be implemented. I would also suggest putting it next to the Hasher interface definition.

Member Author:

Yes, it can't be, but that's the point: we don't expose it outside the package (hence the camelCase name). This is basically a way to both allow calling NmtHasher's private resetBuffer method and keep working with custom hasher implementations that don't implement this func.

Contributor:

But then the API is confusing, because you can use your own Hasher and set ReuseBuffers to true at the same time, but it will have no effect.

Contributor:

The question is: do we need to expose Hasher as an option? Providing a custom underlying hash algorithm seems to be enough for most cases IMHO.

Comment on lines +1029 to +1051
type nonBufferedHasher struct {
	hasher *NmtHasher
}

func (n nonBufferedHasher) IsMaxNamespaceIDIgnored() bool {
	return n.hasher.IsMaxNamespaceIDIgnored()
}

func (n nonBufferedHasher) NamespaceSize() namespace.IDSize {
	return n.hasher.NamespaceSize()
}

func (n nonBufferedHasher) HashLeaf(data []byte) ([]byte, error) {
	return n.hasher.HashLeaf(data)
}

func (n nonBufferedHasher) HashNode(leftChild, rightChild []byte) ([]byte, error) {
	return n.hasher.HashNode(leftChild, rightChild)
}

func (n nonBufferedHasher) EmptyRoot() []byte {
	return n.hasher.EmptyRoot()
}
Contributor:

This can be simplified:

Suggested change
- type nonBufferedHasher struct {
- 	hasher *NmtHasher
- }
- func (n nonBufferedHasher) IsMaxNamespaceIDIgnored() bool {
- 	return n.hasher.IsMaxNamespaceIDIgnored()
- }
- func (n nonBufferedHasher) NamespaceSize() namespace.IDSize {
- 	return n.hasher.NamespaceSize()
- }
- func (n nonBufferedHasher) HashLeaf(data []byte) ([]byte, error) {
- 	return n.hasher.HashLeaf(data)
- }
- func (n nonBufferedHasher) HashNode(leftChild, rightChild []byte) ([]byte, error) {
- 	return n.hasher.HashNode(leftChild, rightChild)
- }
- func (n nonBufferedHasher) EmptyRoot() []byte {
- 	return n.hasher.EmptyRoot()
- }
+ type nonBufferedHasher struct {
+ 	*NmtHasher
+ }

But then, what's the point in having this type at all?

Member Author (Sep 24, 2025):

No, the point of this is to check that the algorithm works fine with a custom hasher which doesn't support resetBuffer, because this wrapper doesn't give us access to resetBuffer (note that we deliberately hide the underlying methods by using a hasher *NmtHasher field instead of embedding *NmtHasher).

Contributor:

Makes sense now.
Please add a comment clarifying this, because it's hard to spot that everything except a single method is exposed.

This was referenced Sep 24, 2025
@tzdybal (Contributor) left a comment:

Created a follow-up issue for API cleanup that could make bufferedHasher obsolete and the overall API cleaner (I will add more data there soon).

@tzdybal tzdybal self-requested a review September 24, 2025 10:39
@tzdybal tzdybal dismissed their stale review September 24, 2025 10:41

Created follow-up issue for refactoring.

@mcrakhman mcrakhman merged commit df8eeb6 into main Sep 24, 2025
8 checks passed
@mcrakhman mcrakhman deleted the mcrakhman/reuse-hash-allocs branch September 24, 2025 11:30
mcrakhman added a commit to celestiaorg/rsmt2d that referenced this pull request Sep 24, 2025
## Overview
Use a buffering approach during root calculation:
- each tree has a buffer for leaves (in the wrapper) and nmt hashes (in the tree)
- we limit parallel operations during root computation, so each tree is reused in a specific goroutine, and thus these buffers are not reallocated
- the trees are managed by a tree pool
- the buffers can be reused for the next square
- the trees can be reused for smaller square sizes; for larger sizes they are reallocated on the fly

For subsequent reuses this gives 95% fewer allocations, 95% less memory usage, and a 2x speedup.

**Note:** the PR relies on an improved NMT version which supports buffer reuse for hashes: celestiaorg/nmt#305

## Benchmarks
### Without buffering
```
cpu: Apple M1 Max
BenchmarkEDSRootsWithErasuredNMT
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10         	     381	   2894619 ns/op	 8566885 B/op	   44171 allocs/op
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10         	     151	   7806400 ns/op	33918235 B/op	  170768 allocs/op
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10      	      42	  26461898 ns/op	135224424 B/op	  670235 allocs/op
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10    	      10	 105217712 ns/op	544944785 B/op	 2653211 allocs/op
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10   	       3	 397317458 ns/op	2186366608 B/op	10559819 allocs/op
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10         	       1	2713678708 ns/op	8714358288 B/op	42129390 allocs/op
```

### With buffering
```
cpu: Apple M1 Max
BenchmarkEDSRootsWithBufferedErasuredNMT
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10         	    1626	    839702 ns/op	  328908 B/op	    1348 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10         	     384	   2951978 ns/op	 1283513 B/op	    2948 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10      	     115	  10319029 ns/op	 5000122 B/op	    6404 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10    	      22	  52192040 ns/op	19454644 B/op	   13828 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10   	       6	 171740514 ns/op	80885168 B/op	   29700 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10         	       2	 699427729 ns/op	329695408 B/op	   69636 allocs/op
```