-
Notifications
You must be signed in to change notification settings - Fork 38.7k
merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497
The head ref may contain hidden characters: "l0rinc/pre\u2011reserve-merkle-leaves-to-max"
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32497. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsNo conflicts as of last run. |
luke-jr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK main commit. Not sure why the benchmark is being changed.
l0rinc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
yuvicc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Memory allocation/deallocation is slow and expensive.
|
I have thought on changing the benchmark tests as we would want to test the worst case scenario right? or else we could have both case for testing? |
yuvicc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark Testing
master@5757de4ddd37f9321ee6b338b40888fd3561fc00
- With 9000
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 52.62 | 19,005,746.07 | 0.2% | 0.01 | `MerkleRoot`
| 52.64 | 18,998,504.40 | 0.3% | 0.01 | `MerkleRoot`
| 52.63 | 18,999,727.67 | 0.2% | 0.01 | `MerkleRoot`
- With 9001
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 53.50 | 18,693,063.88 | 0.3% | 0.01 | `MerkleRoot`
| 53.53 | 18,681,211.49 | 0.5% | 0.01 | `MerkleRoot`
| 53.49 | 18,694,053.87 | 0.5% | 0.01 | `MerkleRoot`
Commit 39b6c139bd6be33699af781f1d71f6fed303d468
- With 9000
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 52.51 | 19,043,628.95 | 0.2% | 0.01 | `MerkleRoot`
| 52.52 | 19,040,989.96 | 0.2% | 0.01 | `MerkleRoot`
| 52.53 | 19,036,358.39 | 0.2% | 0.01 | `MerkleRoot`
- With 9001
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 53.40 | 18,713,525.67 | 0.3% | 0.01 | `MerkleRoot`
| 53.44 | 19,314,655.73 | 0.3% | 0.01 | `MerkleRoot`
| 53.41 | 18,883,462.75 | 0.3% | 0.01 | `MerkleRoot`
39b6c13 to
4cc5942
Compare
|
Updated the benchmark to showcase the before/after state better (resembles production code changes), by splitting out the vector copies to the unmetered lambda and having an odd number of elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 4cc5942895673591de4edceb6dd0c7188c302a72
Testing
Before Master@9a7eece5a4
| ns/leaf | leaf/s | err% | total | benchmark |
|---|---|---|---|---|
| 53.75 | 18,605,160.70 | 0.7% | 0.01 | MerkleRoot |
| 53.64 | 18,643,002.35 | 0.1% | 0.01 | MerkleRoot |
| 53.91 | 18,548,704.52 | 0.4% | 0.01 | MerkleRoot |
After commit@4cc5942895673591de4edceb6dd0c7188c302a72
| ns/leaf | leaf/s | err% | total | benchmark |
|---|---|---|---|---|
| 57.67 | 17,340,305.39 | 0.0% | 0.52 | MerkleRoot |
| 57.69 | 17,332,534.96 | 0.0% | 0.52 | MerkleRoot |
| 58.14 | 17,199,057.10 | 0.0% | 0.52 | MerkleRoot |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how meaningful this is, but it looks like there are a bunch of unrelated style changes that don't seem to have a benefit?
4cc5942 to
a1f2a4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes revert the original benchmark's behavior of including the leaf vector in the benchmark (more explicitly and reserved like the production code).
I've also changed the emplace_back calls for the hashes to push_bash and reverted the original loop conditions to minimize the diff in critical code sections.
Note to yuvicc: previous benchmark version was measuring something else than master (only the merkle call without the vector copy), so we can't directly compare the results.
The new results don't show any significant speed increase, only the memory allocation is more predictable - which isn't captured by the benchmark anymore.
a1f2a4c to
a73dd45
Compare
|
I've rebased the changed and adjusted the benchmark to be more similar to the other production usages in the first commit, rounded to even in the second (optimization) commit, so that we can realistically measure the speed difference before and after: % build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000 Before 7f620cffebee593e48434cf182cc2fd64a6d76be:
After a73dd45e95f5e909d0950ba7c76225589df74854:
|
a73dd45 to
be8f305
Compare
|
lgtm re-ACK be8f3053a7ad526823f32f1a70847b9c06c4c54b Before e87430ed5fad6f9d90c1e7d256b9b8276b936c24
After be8f3053a7ad526823f32f1a70847b9c06c4c54b
|
be8f305 to
6d1950d
Compare
|
Rebased and added a new commit on top for deduplicating the integer rounding. Ready for review again. |
6d1950d to
2d1a4e7
Compare
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
|
LGTM!
|
l0rinc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest push:
- rebases against latest master;
- reverts code back to simply change
leaves.resize(block.vtx.size())constructs before callingComputeMerkleRootto.reserve((block.vtx.size() + 1) & ~1ULL);
** added godbolt reproducer for each important combination of compiler and architecture - Separates the test changes to a separate commit explaining the motivation in the commit message
- updated commit messages and PR descriptions.
de6a283 to
e3b02f1
Compare
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed e3b02f1539e59a6a20a00b4161219b5aba6fefed
Thanks for reverting to a simpler version of this change!
2 inline questions.
14a4dca to
8af0a3a
Compare
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 8af0a3a
While the benefit of this change to consensus code is low, it's been simplified back to a more easily reviewable version.
The benchmark now has two parts, one that detects mutation issues and one which does not, corresponding to the non-witness & witness merkle root computations performed by Bitcoin Core on mainnet.
One extra level of paranoia?
I think this change really is simple enough to approve as-is. It's maybe not the same risk-level as changing standard library major versions, but adjacent to it. If people still object, one idea to improve confidence could be to splice in a new commit before the last one - it could add BlockMerkleRootNew() and BlockWitnessMerkleRootNew() alongside the old ones, and have a fuzz test which compares results between new & old versions. Then the last commit could merge new & old together, keeping the old name but using the new implementations, and removing the fuzz test again.
Benchmarks
Expand
₿ cmake --build build -t bench_bitcoin && build/bin/bench_bitcoin -filter="MerkleRoot" -min-time=100005fd62ed "bench: make MerkleRoot benchmark more representative"
| ns/leaf | leaf/s | err% | ins/leaf | cyc/leaf | IPC | bra/leaf | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 115.53 | 8,655,444.89 | 0.0% | 1,216.23 | 425.94 | 2.855 | 2.14 | 0.3% | 11.03 | `MerkleRoot`
| 116.84 | 8,558,882.10 | 0.0% | 1,231.24 | 430.78 | 2.858 | 3.14 | 0.2% | 10.99 | `MerkleRootWithMutation`
8af0a3a "validation: pre-reserve leaves to prevent reallocs with odd vtx count"
| ns/leaf | leaf/s | err% | ins/leaf | cyc/leaf | IPC | bra/leaf | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 113.25 | 8,830,206.60 | 0.0% | 1,223.17 | 417.50 | 2.930 | 2.63 | 0.2% | 10.99 | `MerkleRoot`
| 114.67 | 8,720,906.94 | 0.0% | 1,238.18 | 422.72 | 2.929 | 3.63 | 0.2% | 11.01 | `MerkleRootWithMutation`
Modified 8af0a3a which just does leaves.reserve(hashes.size())
| ns/leaf | leaf/s | err% | ins/leaf | cyc/leaf | IPC | bra/leaf | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 114.81 | 8,709,871.88 | 0.0% | 1,230.23 | 423.26 | 2.907 | 3.64 | 0.2% | 11.01 | `MerkleRoot`
| 116.21 | 8,605,021.23 | 0.0% | 1,245.23 | 428.41 | 2.907 | 4.64 | 0.2% | 11.00 | `MerkleRootWithMutation`
This last one shows that even just switching from resize() to reserve(), without always reserving an even number, does result in a measurable speedup by itself. Explains why I didn't see as big differences in my previous testing (#32497 (comment)) since all those variants used reserve().
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 8af0a3a
src/test/fuzz/integer.cpp
Outdated
| @@ -80,7 +81,8 @@ FUZZ_TARGET(integer, .init = initialize_integer) | |||
| } | |||
| constexpr uint256 u256_min{"0000000000000000000000000000000000000000000000000000000000000000"}; | |||
| constexpr uint256 u256_max{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"}; | |||
| const std::vector<uint256> v256{u256, u256_min, u256_max}; | |||
| std::vector<uint256> v256{u256, u256_min, u256_max}; | |||
| v256.reserve(RoundUpToEven(v256.size())); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's simpler if the caller does not have to be aware that they should allocate a larger vector to be passed in.
+1
I do not understand why the patch above works :-/ but I like that it is contained inside ComputeMerkleRoot().
Update the integer fuzz test to move the vector into `ComputeMerkleRoot`, matching production usage patterns and avoiding unnecessary copies. Update `merkle_test_BlockWitness` to use an odd number of transactions to ensure the test covers the scenario where leaf duplication occurs. Also switch to `GetWitnessHash` to match `BlockWitnessMerkleRoot` semantics. The manual vector setup retains the exact-size `resize` to explicitly verify the behavior against the calculated root.
8af0a3a to
9f39a83
Compare
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 9f39a83
Resolved conflict with #33805 and fixed code nit #32497 (comment).
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 9f39a83
src/bench/merkle_root.cpp
Outdated
|
|
||
| bool mutated{false}; | ||
| const uint256 root{ComputeMerkleRoot(std::move(leaves), mutate ? &mutated : nullptr)}; | ||
| assert(!mutate || mutated == (root.GetUint64(0) & 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 1a768c0 "bench: make MerkleRoot benchmark more representative"
I'm having a hard time understanding why root.GetUint64(0) & 1 is an indicator that the merkle root was mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it was a leftover anti-optimization trick since we're in a loop now (meant to confuse the compiler, not the reader). But you're right, it seems to work correctly without it: removed the assert and updated the measurements in the commit messages: https://github.com/bitcoin/bitcoin/compare/9f39a8309cb0a23df03937b0b225aff8ce9e1ec6..3dd815f048c80c9a35f01972e0537eb42531aec7
Two versions are run now, one with the mutation calculations, the other without.
To avoid unwanted compiler optimizations, we assert the expected hash, which should inhibit aggressive optimization.
To make the benchmark more similar to production `ComputeMerkleRoot` call sites, the input leaves-copying is made explicit before each run.
> ./build/bin/bench_bitcoin -filter='MerkleRoot.*' -min-time=1000
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 44.18 | 22,634,858.70 | 0.0% | 1.10 | `MerkleRoot`
| 44.66 | 22,390,601.03 | 0.0% | 1.10 | `MerkleRootWithMutation`
Massif memory measurements show the excessive memory reservations:
MB
1.332^ :
| # :
| # :
| # :
| # :
| # @ :
| # @ :
| # @ :
| # @ :
| # @ :
| # @ :
| # @ :
| # @ :
| #::::@::::::::::::::::::::::::::::::::::::::::::::::::::::::@:::::@::::
| #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
| #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
| #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
| #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
| #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
| #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
0 +----------------------------------------------------------------------->s
0 226.2
showing the reallocations clearly in the stacks:
97.87% (1,366,841B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->41.25% (576,064B) 0x969717: allocate (new_allocator.h:151)
| ->41.25% (576,064B) 0x969717: allocate (allocator.h:203)
| ->41.25% (576,064B) 0x969717: allocate (alloc_traits.h:614)
| ->41.25% (576,064B) 0x969717: _M_allocate (stl_vector.h:387)
| ->41.25% (576,064B) 0x969717: _M_realloc_append<const uint256&> (vector.tcc:572)
| ->41.25% (576,064B) 0x969717: push_back (stl_vector.h:1427)
| ->41.25% (576,064B) 0x969717: ComputeMerkleRoot(std::vector<uint256, std::allocator<uint256> >, bool*) (merkle.cpp:55)
| ->41.25% (576,064B) 0x2235A7: operator() (merkle_root.cpp:31)
| ->41.25% (576,064B) 0x2235A7: ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<MerkleRoot(ankerl::nanobench::Bench&)::{lambda()
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
`ComputeMerkleRoot` duplicates the last hash when the input size is odd. If the caller provides a `std::vector` whose capacity equals its size, that extra `push_back` forces a reallocation, doubling its capacity (allocating 3x the necessary memory). This affects roughly half of the created blocks (those with odd transaction counts), causing unnecessary memory fragmentation during every block validation. Fix this by pre-reserving the vector capacity to account for the odd-count duplication. The expression `(size + 1) & ~1ULL` adds 1 to the size and clears the last bit, effectively rounding up to the next even number. This syntax produces optimal assembly across x86/ARM and 32/64-bit platforms for gcc/clang, see https://godbolt.org/z/xzscoq7nv. Also switch from `resize` to `reserve` + `push_back` to eliminate the default construction of `uint256` objects that are immediately overwritten. > ./build/bin/bench_bitcoin -filter='MerkleRoot.*' -min-time=1000 | ns/leaf | leaf/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 43.73 | 22,867,350.51 | 0.0% | 1.10 | `MerkleRoot` | 44.17 | 22,640,349.14 | 0.0% | 1.10 | `MerkleRootWithMutation` Massif memory measurements after show 0.8 MB peak memory usage KB 801.4^ # | # | # | # | # | # | # | # :::::@:::::@: | #:::@@@::@:::::::::::::::@::@:@:::@@:::::::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@: 0 +----------------------------------------------------------------------->s 0 227.5 and the stacks don't show reallocs anymore: 96.37% (790,809B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->35.10% (288,064B) 0x2234AF: allocate (new_allocator.h:151) | ->35.10% (288,064B) 0x2234AF: allocate (allocator.h:203) | ->35.10% (288,064B) 0x2234AF: allocate (alloc_traits.h:614) | ->35.10% (288,064B) 0x2234AF: _M_allocate (stl_vector.h:387) | ->35.10% (288,064B) 0x2234AF: reserve (vector.tcc:79) | ->35.10% (288,064B) 0x2234AF: ToMerkleLeaves<std::vector<uint256>, MerkleRoot(ankerl::nanobench::Bench&)::<lambda()>::<lambda(bool, const auto:46&)> > (merkle.h:19) | ->35.10% (288,064B) 0x2234AF: operator() (merkle_root.cpp:25) | ->35.10% (288,064B) 0x2234AF: ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<MerkleRoot(ankerl::nanobench::Bench&)::{lambda() Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com> Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
9f39a83 to
3dd815f
Compare
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 3dd815f
Changes since previous ACK (#32497 (review)):
- Remove confusing assert in benchmark (#32497 (comment)).
- Updated benchmark timing results in commit messsages.
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 3dd815f
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 3dd815f with minor nits.
On my machine, the Massif data clearly shows the improvement.
Benchmark Results
| Version | Benchmark | ns/leaf | leaf/s |
|---|---|---|---|
| After (optimized) | MerkleRoot | 46.50 | 21.5M |
| After (optimized) | MerkleRootWithMutation | 45.97 | 21.8M |
| Before (master) | MerkleRoot | 47.19 | 21.2M |
Massif Memory Results
| Metric | Before (master) | After (optimized) | Change |
|---|---|---|---|
| Peak memory | 1.34 MB | 802 KB | -40% |
| Reallocation overhead | 576 KB (41%) | 0 | Eliminated |
- Memory reduction: 1.34 MB → 0.80 MB
- Reallocation eliminated: The _M_realloc_insert stack trace is gone entirely
- No performance regression: ~47 ns/leaf before and after
| hashes.resize(block.vtx.size()); | ||
| hashes[0].SetNull(); | ||
| hashes[1] = block.vtx[1]->GetHash().ToUint256(); | ||
| hashes.resize(vtx_count); // Note: leaving odd count to exercise old behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Old behavior" is not clear to me - leaf duplication for odd counts is the normal merkle tree algorithm, not legacy behavior, isn't it ?.
| hashes.resize(vtx_count); // Note: leaving odd count to exercise old behavior | |
| hashes.resize(vtx_count); // Odd count exercises leaf duplication in ComputeMerkleRoot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old behavior = not properly sized.
You're also referring to it as "old": #32497 (comment)
I don't mind adjusting this in a followup
| hashes[0].SetNull(); | ||
| hashes[1] = block.vtx[1]->GetHash().ToUint256(); | ||
| hashes.resize(vtx_count); // Note: leaving odd count to exercise old behavior | ||
| for (size_t pos{1}; pos < vtx_count; ++pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The old code explicitly had hashes[0].SetNull(). Now hashes[0] relies on implicit zero-initialization from resize(). A brief comment would clarify intent:
| for (size_t pos{1}; pos < vtx_count; ++pos) { | |
| // hashes[0] is zero (coinbase witness hash) from resize() | |
| for (size_t pos{1}; pos < vtx_count; ++pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that in a follow-up if you think it's necessarry
| std::vector<uint256> leaves; | ||
| leaves.resize(block.vtx.size()); | ||
| leaves[0].SetNull(); // The witness hash of the coinbase is 0. | ||
| leaves.reserve((block.vtx.size() + 1) & ~1ULL); // capacity rounded up to even |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern (size + 1) & ~1ULL with identical comment appears 4 times across 3 files:
- merkle.cpp:69
- merkle.cpp:79
- signet.cpp:61
- merkle_root.cpp:27
A small helper would reduce duplication and can be arguably more readable: "add 1 if odd":
constexpr size_t RoundUpToEven(size_t n) { return n + (n & 1); }Also: (size + 1) & ~1ULL works but assumes 64-bit arithmetic. assumptions.h explicitly supports 32-bit size_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's what I did originally, but reviewers found it too complicated
Summary
ComputeMerkleRootduplicates the last hash when the input size is odd. If the caller provides astd::vectorwhose capacity equals its size, that extrapush_backforces a reallocation, doubling its capacity (causing peak memory usage of 3x the necessary size).This affects roughly half of the created blocks (those with odd transaction counts), causing unnecessary memory fragmentation during every block validation.
Fix
(size + 1) & ~1ULL.uint256objects that are immediately overwritten by switching fromresizetoreserve+push_back.Memory Impact
Memory profiling shows 50% reduction in peak allocation (576KB → 288KB) and elimination of reallocation overhead.
Validation
The benchmark was updated to use an odd leaf count to demonstrate the real-world scenario where the reallocation occurs.
A full
-reindex-chainstateup to block 896 408 ran without triggering the asserts.Validation asserts
Temporary asserts (not included in this PR) confirm that
push_backnever reallocates and that the coinbase witness hash remains null:Benchmark Performance
While the main purpose is to improve predictability, the reduced memory operations also improve hashing throughput slightly.