refactor: optimize block index comparisons (1.4-6.8x faster)#33637
refactor: optimize block index comparisons (1.4-6.8x faster)#33637l0rinc wants to merge 7 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Approach ACK I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests |
f74572e to
c15d839
Compare
|
I attempted to run the script, not really sure what these results indicate. Just pasting what the results were Details |
|
Thanks for the measurements @Christewart, this is how your measurements compare to mine (but most importantly how it compares to |
laanwj
left a comment
There was a problem hiding this comment.
Code review ACK c15d839aefba017e64f14cd9cd2779655352f4b6
i did not run the benchmarks but the code changes look good, using <=> makes sense.
c15d839 to
a558471
Compare
|
Rebased after #29640 - only change was adjusting the comment |
mzumsande
left a comment
There was a problem hiding this comment.
Concept ACK
Not 100% sure yet about 2dd0f2ced35a268bcab661c671e7c70271cdd91f, seems like inlining gives most of the speedup, whereas the gain of using the spaceship operator (which comes at the cost of readability) is marginal.
src/test/blockchain_tests.cpp
Outdated
| if (pa->nChainWork > pb->nChainWork) return false; | ||
| if (pa->nChainWork < pb->nChainWork) return true; | ||
|
|
||
| // ... then by earliest time received, ... |
There was a problem hiding this comment.
should take the latest comments from the old function after #29640
There was a problem hiding this comment.
I have already done that, which parts do you think I'm missing? Edit: ah in the tests, good point, let me update it
src/test/blockchain_tests.cpp
Outdated
| @@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max) | |||
| BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max()); | |||
| } | |||
|
|
|||
| BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence) | |||
There was a problem hiding this comment.
this looks like a fuzz test, why not make it an actual fuzz target out of this?
There was a problem hiding this comment.
I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage on - not trivial to maintain (see #33731), it's definitely not my go-to way to test something. This is just a randomized property based unit test - easy to debug, easy to run and understand.
But looking at the tests again, I can make it more deterministic so that it hits all important branches which allows me to reduce the iteration count - code coverage and debugging shows this regularly tests each branch now (green line on left side):

Added you as coauthor.
There was a problem hiding this comment.
I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage ... not trivial to maintain ... not my go-to way to test something
There is a learning curve to writing and using fuzz testing but it is 100% worth getting in to. The maintenance aspect only relates to running them using libFuzzer on macOS.
On average, coverage guided fuzzing will be magnitudes better at finding bugs than this style of unit test. For something as small as the comparator your approach is probably good enough, although you could just have a fixed list of test cases that enumerate all possibilities (for the old comparator) instead? (maybe you do already, didn't look)
src/test/blockchain_tests.cpp
Outdated
| @@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max) | |||
| BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max()); | |||
| } | |||
|
|
|||
| BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence) | |||
There was a problem hiding this comment.
would suggest to add a bit more explanation, e.g. mentioning that old_comparator is a snapshot of an old implementation, so that people who look at this in years understand where it's coming from.
There was a problem hiding this comment.
I strongly dislike code comments, I prefer explaining with live code over dead comments - and if people disagree they can always do a blame which instantly reveals the purpose since I also over-explain in commit messages usually.
Are the names old_comparator in a cblockindex_comparator_equivalence not enough to make it obvious that? I
I have added a comment anyway, let me know if it helps.
src/node/blockstorage.h
Outdated
| @@ -87,12 +87,20 @@ static constexpr uint32_t UNDO_DATA_DISK_OVERHEAD{STORAGE_HEADER_BYTES + uint256 | |||
| using BlockMap = std::unordered_map<uint256, CBlockIndex, BlockHasher>; | |||
|
|
|||
| struct CBlockIndexWorkComparator { | |||
| bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const; | |||
| // First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending). | |||
There was a problem hiding this comment.
I find "ascending/descending" confusing here, I would have expected the opposites: "descending work" means most work first, but the strict weak ordering of C++ comparators would place the one with the lower work first.
Would suggest something like
"First compare by work (less first), then by earliest activatable time (higher sequence first), then by pointer value (higher first)."
There was a problem hiding this comment.
I wanted to use the SQL order-by terminology here
Ascending order puts smaller values first, where “smaller” is defined in terms of the < operator.
Similarly, descending order is determined with the > operator.
But you're right, I inverted the terminology, added a dedicated test to check for hard-coded order (first param is increasing, second is decreasing), updated the comments (this is why I find comments a lazy solution, it's too easy to keep them up-to-date), added you as coauthor.
da5eb75 to
5cbb9a4
Compare
|
Pushed, thanks for the review, you had a few good points, added you as coauthor.
the spaceship might not be very familiar to us yet, but it's arguably simpler, having fewer moving parts, and some compilers prefer that over the manual comparisons. |
|
utACK 5cbb9a40873203ea5a4dd0aa7127c9756da2f607 An evident, localized micro-optimization in a hot-path. I haven't run measurements. |
5cbb9a4 to
237eec0
Compare
I don't follow here. The issue is about the additional Instead of mentioning an unrelated issue, I think it could be better to mention what real-world effect this improvement has. I think that is |
The original issue won't be fixed since it's not exactly a bug, but we could still speed up the regression by optimizing another bottleneck.
Yes, this was mentioned in the benchmarking section.
Yes, it's used explicitly in git grep 'setBlockIndexCandidates.' src/validation.cpp | wc -l
44 |
| // Pointer tiebreak should only happen with blocks loaded from disk, as those share the same id: 0 for blocks on the best chain, 1 for all others. | ||
| bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const noexcept | ||
| { | ||
| return std::tie(pa->nChainWork, pb->nSequenceId, pb) |
There was a problem hiding this comment.
I thought of eliminating the worst case here (when pa == pb, where it has to do the comparison for every byte) with something like:
return pa != pb &&
std::tie(pa->nChainWork, pb->nSequenceId, pb)
< std::tie(pb->nChainWork, pa->nSequenceId, pa);But the benchmarks indicate that it's not really faster.
benchmark with pointer duplicates
// Copyright (c) 2025-present The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#include <arith_uint256.h>
#include <bench/bench.h>
#include <chain.h>
#include <node/blockstorage.h>
#include <random.h>
#include <uint256.h>
#include <algorithm>
#include <memory>
#include <vector>
static void CBlockIndexWorkComparator(benchmark::Bench& bench)
{
FastRandomContext rng{/*fDeterministic=*/true};
constexpr size_t n{1'000};
std::vector<std::shared_ptr<CBlockIndex>> blocks;
blocks.reserve(n);
for (size_t i{0}; i < n; ++i) {
auto block{std::make_shared<CBlockIndex>()};
block->nChainWork = UintToArith256(rng.rand256());
block->nSequenceId = int32_t(rng.rand32());
if (i % 10 == 1) {
// Have some duplicates
if (rng.randbool()) {
block = blocks.back();
} else {
if (rng.randbool()) block->nChainWork = blocks.back()->nChainWork;
if (rng.randbool()) block->nSequenceId = blocks.back()->nSequenceId;
}
}
blocks.push_back(std::move(block));
}
std::ranges::shuffle(blocks, rng);
const node::CBlockIndexWorkComparator comparator;
bench.batch(n * n).unit("cmp").run([&] {
for (size_t i{0}; i < n; ++i) {
const auto* lhs{blocks[i].get()};
for (size_t j{0}; j < n; ++j) {
const auto* rhs{blocks[j].get()};
const bool result{comparator(lhs, rhs)};
ankerl::nanobench::doNotOptimizeAway(result);
}
}
});
}
BENCHMARK(CBlockIndexWorkComparator, benchmark::PriorityLevel::HIGH);b17504b refactor: optimize CBlockIndexWorkComparator with std::tie
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 2.13 | 469,485,340.38 | 0.2% | 15.07 | 5.09 | 2.959 | 4.02 | 0.1% | 11.01 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 90,265.10 | 11,078.48 | 0.1% | 637,536.00 | 215,946.18 | 2.952 | 178,844.00 | 0.0% | 10.84 | CheckBlockIndex |
2b77e56 refactor: short-circuit CBlockIndex comparator for equality
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 2.60 | 385,002,212.20 | 0.0% | 18.00 | 6.22 | 2.894 | 5.00 | 0.0% | 11.00 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 94,110.53 | 10,625.80 | 0.0% | 682,454.03 | 225,242.05 | 3.030 | 186,486.01 | 0.0% | 11.00 | CheckBlockIndex |
|
Rebased after 4f65a1c#diff-f14e9423bd43609969886b2e42751c55606344e0b71780564cdb23515802cec0R12, ready for review again. |
| ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 131.79 | 7,588,049.73 | 0.1% | 10.78 | `PrivateBroadcastBench`
|
Rebased to fix silent merge conflict. |
6e9061f to
a37cd53
Compare
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
a37cd53 to
78b4306
Compare
|
Rebased after conflict with https://github.com/bitcoin/bitcoin/pull/34179/changes#diff-ed3f90693a242b38b9719af171de8f55183576957676dfa358945bea22276bd5R139, ready for review again. |
78b4306 to
80bfeaa
Compare
|
You should really add a fuzz test here to compare old and new behavior as suggested before; it's the perfect use case for it, as it avoids the need for putting presumed knowledge about potential bugs in the unit tests you create. |
80bfeaa to
5ab4789
Compare
Thanks for the hint, I've split the differential fuzzer and kept the simple unit tests to to lock in the expected ordering of both comparators. |
|
As pointed out earlier, most of the speedup comes from a move-only inline (#33637 (review))? I think it would be better to just submit a trivial to review move-only change stand-alone, which wouldn't need in-depth review and dedicated differential unit/fuzz tests. Or at least the move-only change should be the first change. This way, it is easier to see if the other changes are worth it to review/test at all. |
Profiling shows this comparator consumes a significant portion of `CheckBlockIndex()`:
... ChainstateManager::CheckBlockIndex()
... std::_Rb_tree<...>::find(...)
... node::CBlockIndexWorkComparator::operator()(...)
... base_uint<256u>::CompareTo(...) const
This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization.
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.35 | 229,642,637.11 | 0.2% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 38,213.19 | 26,168.97 | 0.5% | 1.10 | `CheckBlockIndex`
…nt256` This test was added before other changes to create a baseline behavior.
Move `CBlockIndexWorkComparator::operator()` and `CBlockIndexHeightOnlyComparator::operator()` to the header to facilitate inlining. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2.87 | 348,283,873.30 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 36,534.37 | 27,371.49 | 0.3% | 1.10 | `CheckBlockIndex`
Add equivalence coverage to validate behavior while optimizing comparison operations. The originals were kept exactly as the previous implementations were done. The `arith_uint256` equivalence check is implemented as the `arith_uint256_comparison_equivalence` fuzz target. It compares the refactored `operator<=>` against a snapshot of the old word-by-word comparison for all operators (<, >, <=, >=, ==, !=). The `CBlockIndexWorkComparator` equivalence check is implemented as the `block_index_work_comparator` fuzz target. It compares the refactored comparator against a snapshot of the old comparator logic (chain work, sequence ID, pointer tiebreak). Add deterministic unit tests to lock in the expected ordering of both comparators. Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code. Made them explicitly `noexcept`, which can enable small optimizer wins and aligns with expectations for hot-path comparators used in ordered containers. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2.00 | 499,008,850.73 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 32,297.54 | 30,962.11 | 0.9% | 1.10 | `CheckBlockIndex` Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments. This eliminates function call overhead in the hot path during block generation and chain selection. The comparison algorithm remains unchanged, iterating from most significant to least significant byte, but now returns `std::strong_ordering` directly, removing an extra conversion and enabling better inlining. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2.18 | 459,159,880.91 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 31,780.27 | 31,466.07 | 0.5% | 1.10 | `CheckBlockIndex`
Replace multiple comparisons with a single C++20 spaceship operator call directly. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 0.68 | 1,477,110,677.23 | 0.2% | 1.06 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 27,528.11 | 36,326.51 | 0.2% | 1.10 | `CheckBlockIndex`
That's actually what enables the other ones to shine, but I don't mind reordering the commits to make it obvious.
Measurementsfor commit in 8b54cc2e71b015b651d7dcbfc3033c1f76c37e4e 9304b3b225f4dcc06b007d073035a586e1806951 0473240b4f8d29016778ebd73745c610c2b437e8 d64d9432a06879e57702f8188a764eb3fc8f2751 c2f69e78ba69ee4232c5190a41509e142b039e8b 7cb0c5f1fc70d57277689d114ab2da14af2a0696 df0153a846036f94e51ab70b7c91491b138d3569; do \
git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' && \
rm -rfd build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && \
cmake --build build -j$(nproc) >/dev/null 2>&1 && \
for _ in $(seq 3); do \
sleep 5; \
sudo taskpolicy -t 5 -l 5 nice -n -20 ./build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=1000; \
done; \
done
8b54cc2e71 bench: add benchmark to measure `CBlockIndexWorkComparator` performance
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.41 | 226,584,742.96 | 0.8% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 38,253.37 | 26,141.49 | 0.9% | 1.09 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.35 | 229,642,637.11 | 0.2% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 38,213.19 | 26,168.97 | 0.5% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.37 | 228,881,057.73 | 0.3% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 38,287.78 | 26,117.99 | 0.4% | 1.10 | `CheckBlockIndex`
9304b3b225 test: add sorting tests for `CBlockIndexWorkComparator` and `arith_uint256`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.39 | 227,961,262.13 | 0.5% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 38,333.93 | 26,086.55 | 0.2% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.39 | 227,728,690.97 | 0.7% | 1.11 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 39,671.02 | 25,207.32 | 0.3% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.40 | 227,294,352.21 | 0.5% | 1.09 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 38,971.06 | 25,660.07 | 1.1% | 1.10 | `CheckBlockIndex`
0473240b4f move-only: inline `CBlockIndex` comparators to header
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.88 | 347,302,616.35 | 0.3% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 36,701.33 | 27,246.97 | 0.3% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.87 | 348,283,873.30 | 0.2% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 36,790.73 | 27,180.76 | 0.6% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.87 | 348,095,671.92 | 0.2% | 1.09 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 36,534.37 | 27,371.49 | 0.3% | 1.10 | `CheckBlockIndex`
d64d9432a0 test/fuzz: add equivalence coverage for comparison refactors
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.88 | 347,680,941.08 | 0.5% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 37,044.83 | 26,994.32 | 1.1% | 1.08 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.90 | 345,339,421.65 | 0.6% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 37,174.31 | 26,900.30 | 1.4% | 1.09 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.88 | 346,925,394.89 | 0.3% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 37,287.05 | 26,818.96 | 1.6% | 1.09 | `CheckBlockIndex`
c2f69e78ba refactor: optimize `CBlockIndexWorkComparator` with `std::tie`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.00 | 499,008,850.73 | 0.2% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 33,476.29 | 29,871.88 | 1.0% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.03 | 493,360,414.53 | 0.5% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 33,342.34 | 29,991.90 | 0.6% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.01 | 498,514,389.19 | 0.5% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 32,297.54 | 30,962.11 | 0.9% | 1.10 | `CheckBlockIndex`
7cb0c5f1fc refactor: inline `arith_uint256` comparison operator
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.18 | 459,159,880.91 | 0.2% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 32,006.20 | 31,243.95 | 0.7% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.18 | 458,414,231.91 | 0.3% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 31,780.27 | 31,466.07 | 0.5% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.19 | 457,429,572.18 | 0.2% | 1.10 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 32,536.14 | 30,735.06 | 1.2% | 1.09 | `CheckBlockIndex`
df0153a846 refactor: optimize `arith_uint256` comparison with spaceship operator
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 0.68 | 1,473,011,136.06 | 0.5% | 1.06 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 27,528.11 | 36,326.51 | 0.2% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 0.69 | 1,451,170,702.42 | 1.0% | 1.07 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 27,875.27 | 35,874.09 | 0.4% | 1.10 | `CheckBlockIndex`
| ns/cmp | cmp/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 0.68 | 1,477,110,677.23 | 0.2% | 1.06 | `CBlockIndexWorkComparator`
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 29,059.25 | 34,412.45 | 0.7% | 1.12 | `CheckBlockIndex`Besides moving the inlines earlier, I also added |
5ab4789 to
703165e
Compare


Context
The comparator is often called either directly or for the sorting of setBlockIndexCandidates, a red-black tree-set containing valid block headers where the comparator is invoked extensively.
Besided the sorted set usages, the comparator is used directly in
ChainstateinFindMostWorkChain,PruneBlockIndexCandidates,ActivateBestChainandInvalidateBlock; and inChainstateManagerinLoadBlockIndex,CheckBlockIndex,ActivateSnapshotandPopulateAndValidateSnapshot.It was profiled during the investigation of #33618 (comment).
Testing
To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure
CBlockIndexWorkComparatorperformance, and the second commit adds randomized tests comparing the new implementation with the original one.Results
CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x fasterCheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x fasterCBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x fasterCheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x fasterSee https://corecheck.dev/bitcoin/bitcoin/pulls/33637 for the CI's
CheckBlockIndexperformance.gcc and clang measurements
CBlockIndexWorkComparatorCheckBlockIndexe2e0217 refactor: inline
arith_uint256comparison operatorCBlockIndexWorkComparatorCheckBlockIndex85b74b0 refactor: optimize
arith_uint256comparison with spaceship operatorCBlockIndexWorkComparatorCheckBlockIndexdeb58ee refactor: optimize
CBlockIndexWorkComparatorwith std::tieCBlockIndexWorkComparatorCheckBlockIndexb60450f bench: add benchmark to measure
CBlockIndexWorkComparatorperformanceCBlockIndexWorkComparatorCheckBlockIndexe2e0217 refactor: inline
arith_uint256comparison operatorCBlockIndexWorkComparatorCheckBlockIndex85b74b0 refactor: optimize
arith_uint256comparison with spaceship operatorCBlockIndexWorkComparatorCheckBlockIndexdeb58ee refactor: optimize
CBlockIndexWorkComparatorwith std::tieCBlockIndexWorkComparatorCheckBlockIndexReproducer
Run the equivalence tests with:
Each commit shows how it changes the relevant benchmarks.
Benchmark script
Note: something similar was already started in #33334, but this is a broader optimization that doesn't use the same technique.