perf: reuse buffer for hashes (root calculation and leaves push)#305
Conversation
rach-id
left a comment
LGTM 👍 Good stuff. Left a couple of questions to understand more.
tzdybal
left a comment
Great job. The way allocs are optimized is really impressive.
I left some comments.
I don't like the bufferedHasher interface and the type casting. I played around a bit, and maybe we could pass a "reset function" as an option instead, see: mcrakhman/reuse-hash-allocs...tzdybal/reset_func (this branch is missing the actual option to expose this, but it's a detail :P)
It's just a suggestion; we can keep the interface (but we need to refactor it to be usable).
```go
n.rawRoot = res
if n.reuseBuffers {
	// we will reuse root's bytes, so we copy
	n.rawRoot = make([]byte, len(res))
```
How is this "reusing" if you're allocating a new slice?
Because if we reuse the buffers, we also reuse the hash for the root, as I specified in the comment; that's why, if the root can be reused, we need to copy it. To be honest, it seems clear from the comment, but if you think otherwise, please propose how I should restructure it.
maybe something like: `// because root was allocated on a reusable buffer, we need to make a copy`
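The effect this thread is about can be shown with a toy reusable-buffer hasher (not the real NmtHasher; the "hash" just fills bytes). Without the copy, the root's bytes are silently overwritten on the next use of the buffer:

```go
package main

import "fmt"

// hasher hands out 4-byte slices backed by one reusable buffer.
type hasher struct {
	buf  []byte
	used int
}

// hash returns a slice inside the shared buffer, growing it if needed.
func (h *hasher) hash(data byte) []byte {
	if h.used+4 > len(h.buf) {
		h.buf = append(h.buf, make([]byte, 4)...)
	}
	out := h.buf[h.used : h.used+4]
	h.used += 4
	for i := range out {
		out[i] = data // stand-in for a real hash computation
	}
	return out
}

// reset makes all previously handed-out slots reusable again.
func (h *hasher) reset() { h.used = 0 }

func main() {
	h := &hasher{}
	root := h.hash(1)

	// because root was allocated on a reusable buffer, we make a copy
	saved := make([]byte, len(root))
	copy(saved, root)

	h.reset()
	h.hash(2) // overwrites the slot that root still points into

	fmt.Println(root[0], saved[0]) // 2 1
}
```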
```go
if cap(b) < size {
	b = make([]byte, size)
}
```
If size is predefined for the entire byteBuffer instance, this check can be removed for simplification.
```go
}

// newBuffer creates a new byte buffer.
func newBuffer() *byteBuffer {
```
As this is used for hashes only (which all have the same size), we can simplify the code a bit by using constant-size slices:

```diff
-func newBuffer() *byteBuffer {
+func newBuffer(size int) *byteBuffer {
```
Yes, we can, but why do we need this? IMO it will just make the code more complicated: we would need to calculate the size from the start (h.Size + 2 * namespace size), and we would need to be sure that, if we change something in the implementation, we still provide the correct parameter to the buffer on init.
Right now the simplicity is that the hasher decides how many bytes it needs and just requests them; it can be any size, and there is less room for error.
Maybe it's negligible (perf-wise), but we have to check the size of the slice every time we reuse a buffer, and it will never be a different size; this looks like a waste to me.
There is already a Size method in NmtHasher.
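The fixed-size variant under discussion could look roughly like the sketch below. This is illustrative, not the actual implementation: the hash size is set once (it could come from NmtHasher's existing `Size` method), so no per-call capacity check is needed on the handed-out slice:

```go
package main

import "fmt"

// byteBuffer hands out fixed-size slots that survive a reset, so a second
// pass over a tree of the same shape allocates nothing.
type byteBuffer struct {
	slots [][]byte
	size  int // every hash written into the buffer has this length
	next  int
}

func newBuffer(size int) *byteBuffer {
	return &byteBuffer{size: size}
}

// get returns a reusable slice of the fixed hash size, allocating a new
// slot only when all existing slots are in use.
func (b *byteBuffer) get() []byte {
	if b.next == len(b.slots) {
		b.slots = append(b.slots, make([]byte, b.size))
	}
	s := b.slots[b.next]
	b.next++
	return s
}

// reset makes all previously allocated slots available again.
func (b *byteBuffer) reset() { b.next = 0 }

func main() {
	b := newBuffer(32)
	first := b.get()
	b.reset()
	again := b.get()
	fmt.Println(&first[0] == &again[0]) // true: same backing memory reused
}
```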
```go
if reuseHasher, ok := n.treeHasher.(bufferedHasher); ok && n.reuseBuffers {
	reuseHasher.resetBuffer()
}
```
```diff
-if reuseHasher, ok := n.treeHasher.(bufferedHasher); ok && n.reuseBuffers {
-	reuseHasher.resetBuffer()
-}
+if opts.ReuseBuffers {
+	if reuseHasher, ok := opts.Hasher.(bufferedHasher); ok {
+		reuseHasher.resetBuffer()
+	}
+}
```
We don't have opts here, but I can split it into nested ifs if you want.
Ah, sorry, I copy-pasted ;) But you got my intention right: I would prefer a nested if (so we don't attempt the cast if we're not going to reuse buffers).
```go
// bufferedHasher is the interface to which default hasher conforms, which resets
// the buffer and enables it to reuse for hashes.
type bufferedHasher interface {
```
nit: this should be called bufferResetter, not bufferedHasher, as there are no methods related to hashing in this interface.
Agree, that makes it more clear.
```go
// bufferedHasher is the interface to which default hasher conforms, which resets
// the buffer and enables it to reuse for hashes.
type bufferedHasher interface {
	resetBuffer()
```
This method cannot be implemented outside of the package; therefore you would need to fork the repository to add a custom Hasher implementation. I think the entire idea behind the Hasher interface was to enable easy replacement when nmt is used as a library.
Both the interface (type) and the method have to be visible so they can be implemented. I would also suggest putting it next to the Hasher interface definition.
Yes, it can't be, but that's the point: we deliberately don't expose it outside the package (hence the camelCase). This is basically a way to both allow calling NmtHasher's private resetBuffer method and keep working with custom hasher implementations that don't implement this func.
But then the API is confusing, because you can use your own Hasher and set ReuseBuffers to true at the same time, and it will have no effect.
The question is: do we need to expose Hasher as an option at all? Providing a custom underlying hash algorithm seems to be enough for most cases IMHO.
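If the interface and method were exported as suggested, an external hasher could opt into buffer reuse. A minimal sketch of that variant, with hypothetical names (`BufferResetter`, `ResetBuffer`):

```go
package main

import "fmt"

// BufferResetter would live next to the Hasher interface definition; both
// the type and the method are exported so they can be implemented
// outside the nmt package.
type BufferResetter interface {
	ResetBuffer()
}

// customHasher plays the role of a user-defined hasher in another package.
type customHasher struct{ resets int }

func (c *customHasher) ResetBuffer() { c.resets++ }

// resetIfSupported mirrors the type assertion the tree performs on Reset.
func resetIfSupported(h interface{}) {
	if r, ok := h.(BufferResetter); ok {
		r.ResetBuffer()
	}
}

func main() {
	h := &customHasher{}
	resetIfSupported(h)
	resetIfSupported("not a resetter") // silently ignored
	fmt.Println(h.resets) // 1
}
```

With unexported names, the `ok` branch can only ever be reached by types defined inside the package, which is exactly the limitation being debated.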
```go
type nonBufferedHasher struct {
	hasher *NmtHasher
}

func (n nonBufferedHasher) IsMaxNamespaceIDIgnored() bool {
	return n.hasher.IsMaxNamespaceIDIgnored()
}

func (n nonBufferedHasher) NamespaceSize() namespace.IDSize {
	return n.hasher.NamespaceSize()
}

func (n nonBufferedHasher) HashLeaf(data []byte) ([]byte, error) {
	return n.hasher.HashLeaf(data)
}

func (n nonBufferedHasher) HashNode(leftChild, rightChild []byte) ([]byte, error) {
	return n.hasher.HashNode(leftChild, rightChild)
}

func (n nonBufferedHasher) EmptyRoot() []byte {
	return n.hasher.EmptyRoot()
}
```
This can be simplified:

```diff
-type nonBufferedHasher struct {
-	hasher *NmtHasher
-}
-
-func (n nonBufferedHasher) IsMaxNamespaceIDIgnored() bool {
-	return n.hasher.IsMaxNamespaceIDIgnored()
-}
-
-func (n nonBufferedHasher) NamespaceSize() namespace.IDSize {
-	return n.hasher.NamespaceSize()
-}
-
-func (n nonBufferedHasher) HashLeaf(data []byte) ([]byte, error) {
-	return n.hasher.HashLeaf(data)
-}
-
-func (n nonBufferedHasher) HashNode(leftChild, rightChild []byte) ([]byte, error) {
-	return n.hasher.HashNode(leftChild, rightChild)
-}
-
-func (n nonBufferedHasher) EmptyRoot() []byte {
-	return n.hasher.EmptyRoot()
-}
+type nonBufferedHasher struct {
+	*NmtHasher
+}
```
But then, what's the point in having this type at all?
No, the point of this is to check that the algorithm works fine with a custom hasher that doesn't support resetBuffer: this hasher will not give us access to resetBuffer (note that we don't expose the underlying methods, by using the named field hasher *NmtHasher instead of embedding *NmtHasher).
Makes sense now.
Please add a comment clarifying this, because it's hard to spot that everything except a single method is exposed.
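The subtlety flagged here comes from Go's method promotion rules, shown with toy types below (illustrative only): a named field promotes nothing, so only explicitly forwarded methods are visible, while embedding would also promote the unexported resetBuffer:

```go
package main

import "fmt"

type inner struct{}

func (inner) Hash() string        { return "hash" }
func (inner) resetBuffer() string { return "reset" }

// wrapped mimics nonBufferedHasher: explicit forwarding, no promotion.
type wrapped struct{ h inner }

func (w wrapped) Hash() string { return w.h.Hash() }

// embedded is what the suggested embedding simplification would do:
// every method of inner, including resetBuffer, is promoted.
type embedded struct{ inner }

type resetter interface{ resetBuffer() string }

func main() {
	_, wrappedResets := interface{}(wrapped{}).(resetter)
	_, embeddedResets := interface{}(embedded{}).(resetter)
	fmt.Println(wrappedResets, embeddedResets) // false true
}
```

This is why the forwarding boilerplate is load-bearing: the embedded version would defeat the test's purpose by re-exposing resetBuffer.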
tzdybal
left a comment
Created a follow-up issue for the API cleanup/refactoring that could make bufferedHasher obsolete and the overall API cleaner (I will add more data there soon).
## Overview

Use a buffering approach during root calculation:

- each tree has a buffer for leaves (in the wrapper) and nmt hashes (in the tree)
- we limit parallel operations during root computation, so each tree is reused in a specific goroutine; thus we don't allocate these buffers
- the trees are managed by a tree pool, so we can reuse the buffers for the next square
- the trees can be reused for smaller square sizes; for larger sizes they will be reallocated on the fly

This gives 95% fewer allocations on subsequent reuses, 95% less memory usage, and a 2x speedup.

**Note:** the PR relies on an improved NMT version which supports buffer reuse for hashes: celestiaorg/nmt#305

## Benchmarks

### Without buffering

```
cpu: Apple M1 Max
BenchmarkEDSRootsWithErasuredNMT
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10              381    2894619 ns/op     8566885 B/op     44171 allocs/op
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10              151    7806400 ns/op    33918235 B/op    170768 allocs/op
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10            42   26461898 ns/op   135224424 B/op    670235 allocs/op
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10          10  105217712 ns/op   544944785 B/op   2653211 allocs/op
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10          3  397317458 ns/op  2186366608 B/op  10559819 allocs/op
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10       1 2713678708 ns/op  8714358288 B/op  42129390 allocs/op
```

### With buffering

```
cpu: Apple M1 Max
BenchmarkEDSRootsWithBufferedErasuredNMT
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10     1626     839702 ns/op      328908 B/op      1348 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10      384    2951978 ns/op     1283513 B/op      2948 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10   115   10319029 ns/op     5000122 B/op      6404 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10  22   52192040 ns/op    19454644 B/op     13828 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10  6  171740514 ns/op    80885168 B/op     29700 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10 2 699427729 ns/op   329695408 B/op     69636 allocs/op
```
## Overview

This PR introduces an option `ReuseBuffers` which enables the `NamespacedMerkleTree` and the `NmtHasher` to reuse their buffers. The pipeline looks like this (if the option is enabled): on `Reset` on the nmt, the buffers are reset and the previously allocated slices become available for reuse.

This gives us roughly 15% performance gains when used repeatedly, 3x less memory used, and 95% fewer allocations.

If the option is not enabled, the tree functionality is not affected.

## Benchmarks