
perf: use nmt buffering for root calculation #381

Merged
mcrakhman merged 2 commits into main from mcrakhman/buffered-nmt
Sep 24, 2025

Conversation


@mcrakhman mcrakhman commented Sep 19, 2025

Overview

Use buffering approach during root calculation:

  • each tree has a buffer for leaves (in the wrapper) and NMT hashes (in the tree)
  • we limit parallel operations during root computation, so each tree is reused within a single goroutine and its buffers are allocated only once
  • the trees are managed by a tree pool
  • the buffers can be reused for the next square
  • trees can be reused for smaller square sizes; for larger sizes the buffers are reallocated on the fly

On subsequent reuses this gives roughly 95% fewer allocations, about 95% less memory usage, and a ~2x speedup.

Note: this PR relies on an improved NMT version that supports buffer reuse for hashes: celestiaorg/nmt#305

Benchmarks

Without buffering

cpu: Apple M1 Max
BenchmarkEDSRootsWithErasuredNMT
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10         	     381	   2894619 ns/op	 8566885 B/op	   44171 allocs/op
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10         	     151	   7806400 ns/op	33918235 B/op	  170768 allocs/op
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10      	      42	  26461898 ns/op	135224424 B/op	  670235 allocs/op
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10    	      10	 105217712 ns/op	544944785 B/op	 2653211 allocs/op
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10   	       3	 397317458 ns/op	2186366608 B/op	10559819 allocs/op
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10         	       1	2713678708 ns/op	8714358288 B/op	42129390 allocs/op

With buffering

cpu: Apple M1 Max
BenchmarkEDSRootsWithBufferedErasuredNMT
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB
BenchmarkEDSRootsWithBufferedErasuredNMT/32x32x512_ODS=0MB,_EDS=0MB-10         	    1626	    839702 ns/op	  328908 B/op	    1348 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB
BenchmarkEDSRootsWithBufferedErasuredNMT/64x64x512_ODS=2MB,_EDS=8MB-10         	     384	   2951978 ns/op	 1283513 B/op	    2948 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB
BenchmarkEDSRootsWithBufferedErasuredNMT/128x128x512_ODS=8MB,_EDS=32MB-10      	     115	  10319029 ns/op	 5000122 B/op	    6404 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB
BenchmarkEDSRootsWithBufferedErasuredNMT/256x256x512_ODS=32MB,_EDS=128MB-10    	      22	  52192040 ns/op	19454644 B/op	   13828 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB
BenchmarkEDSRootsWithBufferedErasuredNMT/512x512x512_ODS=128MB,_EDS=512MB-10   	       6	 171740514 ns/op	80885168 B/op	   29700 allocs/op
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB
BenchmarkEDSRootsWithBufferedErasuredNMT/1024x1024x512_ODS=512MB,_EDS=2048MB-10         	       2	 699427729 ns/op	329695408 B/op	   69636 allocs/op

@mcrakhman mcrakhman marked this pull request as draft September 19, 2025 11:16
@gemini-code-assist

Summary of Changes

Hello @mcrakhman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of NMT root calculations within the data square by introducing a buffered NMT pooling mechanism. This change aims to minimize memory allocations and optimize concurrent processing, leading to more efficient extended data square computations. The new implementation includes a configurable parallelism limit and is thoroughly tested for correctness and performance benefits.

Highlights

  • Performance Optimization: Introduced a buffering approach for Namespace Merkle Tree (NMT) root calculation to reduce memory allocations and improve performance, especially in high-throughput scenarios.
  • NMT Pooling: Implemented a treePool to manage and reuse resizeableBufferTree instances, which are NMTs with pre-allocated buffers for share data.
  • Controlled Parallelism: Added parallelOps to the dataSquare structure and computeRoots method, allowing for a configurable limit on concurrent operations during root computation using errgroup.Group.SetLimit.
  • New API for Buffered Computation: Provided a new function ComputeExtendedDataSquareWithBuffer that leverages the buffered NMTs and controlled parallelism for extended data square computation.
  • Comprehensive Testing and Benchmarking: Added new benchmarks (BenchmarkEDSRootsWithBufferedErasuredNMT) and correctness tests (TestNmtVsBufferedNmtSameRoot, TestComputeExtendedDataSquareVsWithBuffer) to validate the new buffered approach and compare its performance and results against the standard method.
  • Dependency Update: Updated the github.com/celestiaorg/nmt dependency to a newer version to support the new buffering features.
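The bounded-parallelism idea behind parallelOps can be sketched with the standard library alone. The PR itself uses errgroup.Group.SetLimit; this hypothetical computeRoots uses a buffered channel as a semaphore, and plain int "roots" stand in for real NMT roots:

```go
package main

import (
	"fmt"
	"sync"
)

// computeRoots runs at most parallelOps workers at once, so a pool of
// parallelOps trees suffices for the whole square. The root function here
// is a placeholder for the per-row/column NMT root computation.
func computeRoots(rows [][]byte, parallelOps int, root func([]byte) int) []int {
	roots := make([]int, len(rows))
	sem := make(chan struct{}, parallelOps)
	var wg sync.WaitGroup
	for i, row := range rows {
		wg.Add(1)
		sem <- struct{}{} // blocks while parallelOps workers are active
		go func(i int, row []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			roots[i] = root(row)
		}(i, row)
	}
	wg.Wait()
	return roots
}

func main() {
	rows := [][]byte{{1, 2}, {3, 4}, {5, 6}}
	sum := func(b []byte) int {
		n := 0
		for _, v := range b {
			n += int(v)
		}
		return n
	}
	fmt.Println(computeRoots(rows, 2, sum)) // prints [3 7 11]
}
```

Capping concurrency is what makes the buffer reuse safe: each in-flight worker holds its own tree, so no buffer is ever written by two goroutines at once.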


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a buffering and pooling mechanism for NMT root calculation to improve performance by reducing memory allocations. This is achieved through a treePool of resizeableBufferTrees, with concurrency managed by a new parallelOps setting. The changes are logical and supported by new benchmarks and correctness tests that validate the buffered approach against the existing implementation.

I've found one critical issue and one of medium severity:

  1. The new file nmtbuffered_tree_test.go contains production code but is named as a test file, which will cause build issues.
  2. There's a redundant check in the Push method of resizeableBufferTree.

Details are in the review comments below. Once these are addressed, this looks like a solid performance improvement.

Comment on lines +129 to +131
if nidDataLen > t.bufferEntrySize {
	return fmt.Errorf("data is too large to be used with allocated buffer")
}


Severity: medium

This check is redundant and can be removed. nidDataLen is calculated as t.namespaceSize + len(data), and t.bufferEntrySize is t.namespaceSize + shareSize. The newDataSquare function already ensures that len(data) is equal to shareSize for all non-nil shares. Therefore, nidDataLen will always be equal to t.bufferEntrySize, and this condition will never be met.

evan-forbes previously approved these changes Sep 19, 2025
rach-id previously approved these changes Sep 19, 2025
@mcrakhman mcrakhman marked this pull request as ready for review September 22, 2025 22:08
rootulp previously approved these changes Sep 23, 2025
@mcrakhman mcrakhman dismissed stale reviews from rootulp, rach-id, and evan-forbes via f9a870d September 24, 2025 16:21
@mcrakhman mcrakhman merged commit 97ea8c7 into main Sep 24, 2025
6 checks passed
@mcrakhman mcrakhman deleted the mcrakhman/buffered-nmt branch September 24, 2025 18:30
