refactor: optimize block index comparisons (1.4-6.8x faster) by l0rinc · Pull Request #33637 · bitcoin/bitcoin

l0rinc · 2025-10-16T01:54:31Z

Context

The comparator is often called either directly or for the sorting of setBlockIndexCandidates, a red-black tree-set containing valid block headers where the comparator is invoked extensively.
Besided the sorted set usages, the comparator is used directly in Chainstate in FindMostWorkChain, PruneBlockIndexCandidates, ActivateBestChain and InvalidateBlock; and in ChainstateManager in LoadBlockIndex, CheckBlockIndex, ActivateSnapshot and PopulateAndValidateSnapshot.

It was profiled during the investigation of #33618 (comment).

Testing

To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure CBlockIndexWorkComparator performance, and the second commit adds randomized tests comparing the new implementation with the original one.

Results

GCC 15.0.1:

CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x faster
CheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x faster

Clang 22.0.0:

CBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x faster
CheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x faster

See https://corecheck.dev/bitcoin/bitcoin/pulls/33637 for the CI's CheckBlockIndex performance.

gcc and clang measurements

Compiler: gcc 15.0.1
b60450f bench: add benchmark to measure CBlockIndexWorkComparator performance

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
9.92	100,772,511.62	0.0%	63.98	35.64	1.795	14.17	1.9%	5.50	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
109,996.46	9,091.20	0.2%	1,014,421.11	394,979.29	2.568	313,025.11	0.0%	5.50	`CheckBlockIndex`

e2e0217 refactor: inline arith_uint256 comparison operator

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
6.48	154,439,491.66	0.0%	31.01	23.25	1.334	7.16	3.8%	5.50	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
105,754.86	9,455.83	0.1%	913,130.11	379,588.20	2.406	276,692.11	0.0%	5.50	`CheckBlockIndex`

85b74b0 refactor: optimize arith_uint256 comparison with spaceship operator

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
6.37	156,990,488.06	0.0%	28.85	22.87	1.261	8.61	3.2%	5.50	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
83,803.10	11,932.73	0.0%	743,565.09	300,824.84	2.472	232,646.08	0.0%	5.56	`CheckBlockIndex`

deb58ee refactor: optimize CBlockIndexWorkComparator with std::tie

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
1.52	656,674,205.98	0.0%	13.18	5.47	2.410	3.06	0.1%	5.50	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
67,726.84	14,765.19	0.2%	585,826.07	243,155.90	2.409	181,920.07	0.0%	5.54	`CheckBlockIndex`

Compiler: clang 22.0.0

b60450f bench: add benchmark to measure CBlockIndexWorkComparator performance

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
9.96	100,451,893.46	0.0%	61.28	35.75	1.714	13.62	2.1%	5.52	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
96,878.70	10,322.19	0.1%	802,827.10	347,679.01	2.309	234,823.10	0.0%	5.50	`CheckBlockIndex`

e2e0217 refactor: inline arith_uint256 comparison operator

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
6.13	163,183,693.10	0.0%	25.74	22.01	1.170	6.10	4.6%	5.50	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
86,307.94	11,586.42	1.2%	646,229.09	309,785.33	2.086	195,119.09	0.0%	5.54	`CheckBlockIndex`

85b74b0 refactor: optimize arith_uint256 comparison with spaceship operator

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
6.36	157,330,900.16	1.0%	26.20	22.61	1.159	6.55	4.4%	5.53	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
75,031.66	13,327.71	1.0%	650,604.15	266,934.04	2.437	149,967.14	0.0%	5.54	`CheckBlockIndex`

deb58ee refactor: optimize CBlockIndexWorkComparator with std::tie

ns/cmp	cmp/s	err%	ins/cmp	cyc/cmp	IPC	bra/cmp	miss%	total	benchmark
1.46	683,414,234.14	0.0%	16.12	5.25	3.067	4.02	0.1%	5.50	`CBlockIndexWorkComparator`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
69,559.70	14,376.14	0.1%	559,208.07	249,654.28	2.240	132,342.07	0.0%	5.51	`CheckBlockIndex`

Reproducer

Run the equivalence tests with:

cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests

Each commit shows how it changes the relevant benchmarks.

Benchmark script

for compiler in gcc clang; do \
  if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion); \
  else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi && \
  echo "> Compiler: $compiler $COMP_VER" && \
  for commit in 70c6244b709fcf6a853ea44b91b9ed219609ac90 516e05cd7d8dbb61b05854c59f5393367e502284 d053b27c06d06cccbc6395b0aa7dc1c4193a6fc3 8ef4ce2cb3ac80cbe58b8cf76e183f9eca6fafe4 703165e6d3b9fbd58b9c31ddce0e961047fc5e71; do \
    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && git log -1 --pretty='%h %s' && \
    rm -rf build && \
    cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 && \
    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
    for i in 1; do \
      build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=5000; \
    done; \
  done; \
done

Note: something similar was already started in #33334, but this is a broader optimization that doesn't use the same technique.

DrahtBot · 2025-10-16T01:54:37Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	mzumsande
Approach ACK	Raimo33
Stale ACK	laanwj, optout21

If your review is incorrectly listed, please copy-paste  into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34075 (fees: Introduce Mempool Based Fee Estimation to reduce overestimation by ismaelsadeeq)
#33854 (fix assumevalid is ignored during reindex by Eunovo)
#33324 (blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks by l0rinc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Raimo33 · 2025-10-17T09:16:18Z

Approach ACK

I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests

src/bench/blockstorage.cpp

Christewart · 2025-10-18T21:34:57Z

I attempted to run the script, not really sure what these results indicate. Just pasting what the results were

Details

Darwin Chriss-MacBook-Pro.local 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64

Apple clang version 17.0.0 (clang-1700.3.19.1)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.97 |      201,237,014.91 |    0.4% |      5.49 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           43,777.21 |           22,842.94 |    0.3% |      5.41 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.89 |      529,694,702.31 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,303.87 |           26,107.02 |    0.1% |      5.35 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.31 |      432,334,219.11 |    0.6% |      5.45 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           33,598.60 |           29,763.15 |    0.2% |      5.32 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.78 |    1,287,258,159.19 |    0.1% |      5.46 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           31,055.32 |           32,200.61 |    0.1% |      5.30 | `CheckBlockIndex`

l0rinc · 2025-10-19T00:24:34Z

Thanks for the measurements @Christewart, this is how your measurements compare to mine (but most importantly how it compares to master):

laanwj

Code review ACK c15d839aefba017e64f14cd9cd2779655352f4b6
i did not run the benchmarks but the code changes look good, using <=> makes sense.

l0rinc · 2025-10-28T23:19:25Z

Rebased after #29640 - only change was adjusting the comment

mzumsande

Concept ACK

Not 100% sure yet about 2dd0f2ced35a268bcab661c671e7c70271cdd91f, seems like inlining gives most of the speedup, whereas the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

mzumsande · 2025-10-30T20:41:18Z

src/test/blockchain_tests.cpp

+        if (pa->nChainWork > pb->nChainWork) return false;
+        if (pa->nChainWork < pb->nChainWork) return true;
+
+        // ... then by earliest time received, ...


should take the latest comments from the old function after #29640

~~I have already done that, which parts do you think I'm missing?~~ Edit: ah in the tests, good point, let me update it

src/test/arith_uint256_tests.cpp

mzumsande · 2025-10-30T21:15:27Z

src/test/blockchain_tests.cpp

@@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max)
    BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max());
 }

+BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence)


this looks like a fuzz test, why not make it an actual fuzz target out of this?

I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage on - not trivial to maintain (see #33731), it's definitely not my go-to way to test something. This is just a randomized property based unit test - easy to debug, easy to run and understand.
But looking at the tests again, I can make it more deterministic so that it hits all important branches which allows me to reduce the iteration count - code coverage and debugging shows this regularly tests each branch now (green line on left side):

Added you as coauthor.

I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage ... not trivial to maintain ... not my go-to way to test something

There is a learning curve to writing and using fuzz testing but it is 100% worth getting in to. The maintenance aspect only relates to running them using libFuzzer on macOS.

On average, coverage guided fuzzing will be magnitudes better at finding bugs than this style of unit test. For something as small as the comparator your approach is probably good enough, although you could just have a fixed list of test cases that enumerate all possibilities (for the old comparator) instead? (maybe you do already, didn't look)

mzumsande · 2025-10-30T21:18:50Z

src/test/blockchain_tests.cpp

@@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max)
    BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max());
 }

+BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence)


would suggest to add a bit more explanation, e.g. mentioning that old_comparator is a snapshot of an old implementation, so that people who look at this in years understand where it's coming from.

I strongly dislike code comments, I prefer explaining with live code over dead comments - and if people disagree they can always do a blame which instantly reveals the purpose since I also over-explain in commit messages usually.
Are the names old_comparator in a cblockindex_comparator_equivalence not enough to make it obvious that? I
I have added a comment anyway, let me know if it helps.

mzumsande · 2025-10-30T21:48:16Z

src/node/blockstorage.h

@@ -87,12 +87,20 @@ static constexpr uint32_t UNDO_DATA_DISK_OVERHEAD{STORAGE_HEADER_BYTES + uint256
 using BlockMap = std::unordered_map<uint256, CBlockIndex, BlockHasher>;

 struct CBlockIndexWorkComparator {
-    bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const;
+    // First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending).


I find "ascending/descending" confusing here, I would have expected the opposites: "descending work" means most work first, but the strict weak ordering of C++ comparators would place the one with the lower work first.

Would suggest something like
"First compare by work (less first), then by earliest activatable time (higher sequence first), then by pointer value (higher first)."

I wanted to use the SQL order-by terminology here

Ascending order puts smaller values first, where “smaller” is defined in terms of the < operator.
Similarly, descending order is determined with the > operator.

But you're right, I inverted the terminology, added a dedicated test to check for hard-coded order (first param is increasing, second is decreasing), updated the comments (this is why I find comments a lazy solution, it's too easy to keep them up-to-date), added you as coauthor.

l0rinc · 2025-10-31T13:13:33Z

Pushed, thanks for the review, you had a few good points, added you as coauthor.

the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

the spaceship might not be very familiar to us yet, but it's arguably simpler, having fewer moving parts, and some compilers prefer that over the manual comparisons.

src/test/arith_uint256_tests.cpp

src/test/blockchain_tests.cpp

optout21 · 2025-11-03T12:44:19Z

utACK 5cbb9a40873203ea5a4dd0aa7127c9756da2f607

An evident, localized micro-optimization in a hot-path. I haven't run measurements.

optout21 · 2025-11-03T12:52:15Z

reACK 80bfeaa Rebase.

Prev:
reACK 78b4306 Rebase, no content changes.
reACK 6e9061f
utACK 237eec0f7ca095bc60e40ee94ba1a160a7064753
Re-acked (after 7 mins :D ), only minor/formatting changes since last review

maflcko · 2025-11-06T15:34:25Z

Profiling the performance regression in #33618 (comment) revealed that CBlockIndexWorkComparator and its underlying base_uint<256u>::CompareTo are hot paths during block validation, consuming ~4% of CPU time.

I don't follow here. The issue is about the additional CWallet::WriteBestBlock overhead? CBlockIndexWorkComparator seems unrelated to the issue, at least I can't see how it changed the result between 29.x and 30.x.

Instead of mentioning an unrelated issue, I think it could be better to mention what real-world effect this improvement has. I think that is -checkblockindex (default in regtest) and possibly LoadExternalBlockFile/LoadBlockIndex?

l0rinc · 2025-11-06T16:22:42Z

CBlockIndexWorkComparator seems unrelated to the issue

The original issue won't be fixed since it's not exactly a bug, but we could still speed up the regression by optimizing another bottleneck.
Since you found the etymology confusing, I have removed the details from the description, only mentioned #33618 (comment) for the flame graph that show this being one of the bottlenecks. Not the biggest one, but the one we can easily optimize and clean up.

I think that is -checkblockindex (default in regtest)

Yes, this was mentioned in the benchmarking section.

possibly LoadExternalBlockFile/LoadBlockIndex

Yes, it's used explicitly in Chainstate in FindMostWorkChain, PruneBlockIndexCandidates, ActivateBestChain and InvalidateBlock.
And ChainstateManager in LoadBlockIndex, CheckBlockIndex, ActivateSnapshot and PopulateAndValidateSnapshot.
But the setBlockIndexCandidates usages should already suffice as motivation, I'd say:

git grep 'setBlockIndexCandidates.' src/validation.cpp | wc -l 
   44

l0rinc · 2025-11-12T22:50:32Z

src/node/blockstorage.h

+    // Pointer tiebreak should only happen with blocks loaded from disk, as those share the same id: 0 for blocks on the best chain, 1 for all others.
+    bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const noexcept
+    {
+        return std::tie(pa->nChainWork, pb->nSequenceId, pb)


I thought of eliminating the worst case here (when pa == pb, where it has to do the comparison for every byte) with something like:

return pa != pb && std::tie(pa->nChainWork, pb->nSequenceId, pb) < std::tie(pb->nChainWork, pa->nSequenceId, pa);

But the benchmarks indicate that it's not really faster.

benchmark with pointer duplicates

// Copyright (c) 2025-present The Bitcoin Core developers // Distributed under the MIT software license, see the accompanying // file COPYING or http://www.opensource.org/licenses/mit-license.php. #include <arith_uint256.h> #include <bench/bench.h> #include <chain.h> #include <node/blockstorage.h> #include <random.h> #include <uint256.h> #include <algorithm> #include <memory> #include <vector> static void CBlockIndexWorkComparator(benchmark::Bench& bench) { FastRandomContext rng{/*fDeterministic=*/true}; constexpr size_t n{1'000}; std::vector<std::shared_ptr<CBlockIndex>> blocks; blocks.reserve(n); for (size_t i{0}; i < n; ++i) { auto block{std::make_shared<CBlockIndex>()}; block->nChainWork = UintToArith256(rng.rand256()); block->nSequenceId = int32_t(rng.rand32()); if (i % 10 == 1) { // Have some duplicates if (rng.randbool()) { block = blocks.back(); } else { if (rng.randbool()) block->nChainWork = blocks.back()->nChainWork; if (rng.randbool()) block->nSequenceId = blocks.back()->nSequenceId; } } blocks.push_back(std::move(block)); } std::ranges::shuffle(blocks, rng); const node::CBlockIndexWorkComparator comparator; bench.batch(n * n).unit("cmp").run([&] { for (size_t i{0}; i < n; ++i) { const auto* lhs{blocks[i].get()}; for (size_t j{0}; j < n; ++j) { const auto* rhs{blocks[j].get()}; const bool result{comparator(lhs, rhs)}; ankerl::nanobench::doNotOptimizeAway(result); } } }); } BENCHMARK(CBlockIndexWorkComparator, benchmark::PriorityLevel::HIGH);

b17504b refactor: optimize CBlockIndexWorkComparator with std::tie

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark

2.13 469,485,340.38 0.2% 15.07 5.09 2.959 4.02 0.1% 11.01 CBlockIndexWorkComparator

ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark

90,265.10 11,078.48 0.1% 637,536.00 215,946.18 2.952 178,844.00 0.0% 10.84 CheckBlockIndex

2b77e56 refactor: short-circuit CBlockIndex comparator for equality

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark

2.60 385,002,212.20 0.0% 18.00 6.22 2.894 5.00 0.0% 11.00 CBlockIndexWorkComparator

ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark

94,110.53 10,625.80 0.0% 682,454.03 225,242.05 3.030 186,486.01 0.0% 11.00 CheckBlockIndex

l0rinc · 2025-11-25T14:53:23Z

Rebased after 4f65a1c#diff-f14e9423bd43609969886b2e42751c55606344e0b71780564cdb23515802cec0R12, ready for review again.

| ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 131.79 | 7,588,049.73 | 0.1% | 10.78 | `PrivateBroadcastBench`

l0rinc · 2026-01-15T17:13:16Z

Rebased to fix silent merge conflict.

DrahtBot · 2026-01-15T19:13:45Z

🚧 At least one of the CI tasks failed.
_{Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/21039886920/job/60501264056}
_{LLM reason (✨ experimental): IWYU reported a failure after adjusting includes in src/node/blockstorage.h, causing the CI to fail.}

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

l0rinc · 2026-02-02T10:44:43Z

Rebased after conflict with https://github.com/bitcoin/bitcoin/pull/34179/changes#diff-ed3f90693a242b38b9719af171de8f55183576957676dfa358945bea22276bd5R139, ready for review again.

sipa · 2026-02-02T17:16:04Z

You should really add a fuzz test here to compare old and new behavior as suggested before; it's the perfect use case for it, as it avoids the need for putting presumed knowledge about potential bugs in the unit tests you create.

l0rinc · 2026-02-02T21:46:26Z

You should really add a fuzz test here to compare old and new behavior

Thanks for the hint, I've split the differential fuzzer and kept the simple unit tests to to lock in the expected ordering of both comparators.
After a while (if fuzzers can't find any difference) we should be able to remove the fuzzers while keeping only the unit tests.

maflcko · 2026-02-03T07:07:59Z

As pointed out earlier, most of the speedup comes from a move-only inline (#33637 (review))?

I think it would be better to just submit a trivial to review move-only change stand-alone, which wouldn't need in-depth review and dedicated differential unit/fuzz tests. Or at least the move-only change should be the first change. This way, it is easier to see if the other changes are worth it to review/test at all.

Profiling shows this comparator consumes a significant portion of `CheckBlockIndex()`: ... ChainstateManager::CheckBlockIndex() ... std::_Rb_tree<...>::find(...) ... node::CBlockIndexWorkComparator::operator()(...) ... base_uint<256u>::CompareTo(...) const This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 4.35 | 229,642,637.11 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 38,213.19 | 26,168.97 | 0.5% | 1.10 | `CheckBlockIndex`

…nt256` This test was added before other changes to create a baseline behavior.

Move `CBlockIndexWorkComparator::operator()` and `CBlockIndexHeightOnlyComparator::operator()` to the header to facilitate inlining. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2.87 | 348,283,873.30 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 36,534.37 | 27,371.49 | 0.3% | 1.10 | `CheckBlockIndex`

Add equivalence coverage to validate behavior while optimizing comparison operations. The originals were kept exactly as the previous implementations were done. The `arith_uint256` equivalence check is implemented as the `arith_uint256_comparison_equivalence` fuzz target. It compares the refactored `operator<=>` against a snapshot of the old word-by-word comparison for all operators (<, >, <=, >=, ==, !=). The `CBlockIndexWorkComparator` equivalence check is implemented as the `block_index_work_comparator` fuzz target. It compares the refactored comparator against a snapshot of the old comparator logic (chain work, sequence ID, pointer tiebreak). Add deterministic unit tests to lock in the expected ordering of both comparators. Co-authored-by: Martin Zumsande <mzumsande@gmail.com>

Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code. Made them explicitly `noexcept`, which can enable small optimizer wins and aligns with expectations for hot-path comparators used in ordered containers. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2.00 | 499,008,850.73 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 32,297.54 | 30,962.11 | 0.9% | 1.10 | `CheckBlockIndex` Co-authored-by: Martin Zumsande <mzumsande@gmail.com>

Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments. This eliminates function call overhead in the hot path during block generation and chain selection. The comparison algorithm remains unchanged, iterating from most significant to least significant byte, but now returns `std::strong_ordering` directly, removing an extra conversion and enabling better inlining. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2.18 | 459,159,880.91 | 0.2% | 1.10 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 31,780.27 | 31,466.07 | 0.5% | 1.10 | `CheckBlockIndex`

Replace multiple comparisons with a single C++20 spaceship operator call directly. | ns/cmp | cmp/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 0.68 | 1,477,110,677.23 | 0.2% | 1.06 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 27,528.11 | 36,326.51 | 0.2% | 1.10 | `CheckBlockIndex`

l0rinc · 2026-02-03T11:44:47Z

As pointed out earlier, most of the speedup comes from a move-only inline

That's actually what enables the other ones to shine, but I don't mind reordering the commits to make it obvious.
The biggest optimization is the spaceship (with inline), it's now the last commit.

Measurements

for commit in 8b54cc2e71b015b651d7dcbfc3033c1f76c37e4e 9304b3b225f4dcc06b007d073035a586e1806951 0473240b4f8d29016778ebd73745c610c2b437e8 d64d9432a06879e57702f8188a764eb3fc8f2751 c2f69e78ba69ee4232c5190a41509e142b039e8b 7cb0c5f1fc70d57277689d114ab2da14af2a0696 df0153a846036f94e51ab70b7c91491b138d3569; do \
    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' && \
    rm -rfd build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && \
    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
    for _ in $(seq 3); do \
      sleep 5; \
      sudo taskpolicy -t 5 -l 5 nice -n -20 ./build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=1000; \
    done; \
done


8b54cc2e71 bench: add benchmark to measure `CBlockIndexWorkComparator` performance

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.41 |      226,584,742.96 |    0.8% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,253.37 |           26,141.49 |    0.9% |      1.09 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.35 |      229,642,637.11 |    0.2% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,213.19 |           26,168.97 |    0.5% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.37 |      228,881,057.73 |    0.3% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,287.78 |           26,117.99 |    0.4% |      1.10 | `CheckBlockIndex`

9304b3b225 test: add sorting tests for `CBlockIndexWorkComparator` and `arith_uint256`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.39 |      227,961,262.13 |    0.5% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,333.93 |           26,086.55 |    0.2% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.39 |      227,728,690.97 |    0.7% |      1.11 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           39,671.02 |           25,207.32 |    0.3% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.40 |      227,294,352.21 |    0.5% |      1.09 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,971.06 |           25,660.07 |    1.1% |      1.10 | `CheckBlockIndex`

0473240b4f move-only: inline `CBlockIndex` comparators to header

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.88 |      347,302,616.35 |    0.3% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           36,701.33 |           27,246.97 |    0.3% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.87 |      348,283,873.30 |    0.2% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           36,790.73 |           27,180.76 |    0.6% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.87 |      348,095,671.92 |    0.2% |      1.09 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           36,534.37 |           27,371.49 |    0.3% |      1.10 | `CheckBlockIndex`

d64d9432a0 test/fuzz: add equivalence coverage for comparison refactors

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.88 |      347,680,941.08 |    0.5% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           37,044.83 |           26,994.32 |    1.1% |      1.08 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.90 |      345,339,421.65 |    0.6% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           37,174.31 |           26,900.30 |    1.4% |      1.09 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.88 |      346,925,394.89 |    0.3% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           37,287.05 |           26,818.96 |    1.6% |      1.09 | `CheckBlockIndex`

c2f69e78ba refactor: optimize `CBlockIndexWorkComparator` with `std::tie`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.00 |      499,008,850.73 |    0.2% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           33,476.29 |           29,871.88 |    1.0% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.03 |      493,360,414.53 |    0.5% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           33,342.34 |           29,991.90 |    0.6% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.01 |      498,514,389.19 |    0.5% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           32,297.54 |           30,962.11 |    0.9% |      1.10 | `CheckBlockIndex`

7cb0c5f1fc refactor: inline `arith_uint256` comparison operator

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.18 |      459,159,880.91 |    0.2% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           32,006.20 |           31,243.95 |    0.7% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.18 |      458,414,231.91 |    0.3% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           31,780.27 |           31,466.07 |    0.5% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.19 |      457,429,572.18 |    0.2% |      1.10 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           32,536.14 |           30,735.06 |    1.2% |      1.09 | `CheckBlockIndex`

df0153a846 refactor: optimize `arith_uint256` comparison with spaceship operator

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.68 |    1,473,011,136.06 |    0.5% |      1.06 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           27,528.11 |           36,326.51 |    0.2% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.69 |    1,451,170,702.42 |    1.0% |      1.07 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           27,875.27 |           35,874.09 |    0.4% |      1.10 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.68 |    1,477,110,677.23 |    0.2% |      1.06 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           29,059.25 |           34,412.45 |    0.7% |      1.12 | `CheckBlockIndex`

Besides moving the inlines earlier, I also added noexcept to operator<=>(const base_uint and split the general unit test (which we should have regardless of the upcoming changes, they set the baseline to document current behavior) and the fuzz tests which probe and compare against the old code.
Added the best measurements of 3 from above to the non-test commit messages for reference.

DrahtBot added the Refactoring label Oct 16, 2025

l0rinc mentioned this pull request Oct 16, 2025

node: optimize CBlockIndexWorkComparator #33334

Closed

This was referenced Oct 16, 2025

fuzz: compact block harness #33300

Open

Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640

Merged

sipa reviewed Oct 17, 2025

View reviewed changes

src/bench/blockstorage.cpp Outdated Show resolved Hide resolved

l0rinc force-pushed the l0rinc/block_index_comparators branch from f74572e to c15d839 Compare October 18, 2025 05:55

l0rinc changed the title ~~refactor: optimize block index comparisons (1.4-7.7x faster)~~ refactor: optimize block index comparisons (1.4-6.8x faster) Oct 18, 2025

laanwj approved these changes Oct 22, 2025

View reviewed changes

DrahtBot added the Needs rebase label Oct 27, 2025

l0rinc force-pushed the l0rinc/block_index_comparators branch from c15d839 to a558471 Compare October 28, 2025 23:18

DrahtBot removed the Needs rebase label Oct 28, 2025

mzumsande reviewed Oct 30, 2025

View reviewed changes

l0rinc force-pushed the l0rinc/block_index_comparators branch 2 times, most recently from da5eb75 to 5cbb9a4 Compare October 31, 2025 13:10

optout21 reviewed Nov 3, 2025

View reviewed changes

src/test/arith_uint256_tests.cpp Outdated Show resolved Hide resolved

optout21 reviewed Nov 3, 2025

View reviewed changes

src/test/blockchain_tests.cpp Outdated Show resolved Hide resolved

DrahtBot requested review from laanwj and mzumsande November 3, 2025 12:44

l0rinc force-pushed the l0rinc/block_index_comparators branch from 5cbb9a4 to 237eec0 Compare November 3, 2025 12:48

l0rinc commented Nov 12, 2025

View reviewed changes

DrahtBot removed the Needs rebase label Nov 25, 2025

DrahtBot mentioned this pull request Dec 30, 2025

refactor: Enable transparent lookup for setBlockIndexCandidates to remove const_cast #34179

Merged

l0rinc force-pushed the l0rinc/block_index_comparators branch from 6e9061f to a37cd53 Compare January 15, 2026 17:13

DrahtBot added the CI failed label Jan 15, 2026

l0rinc force-pushed the l0rinc/block_index_comparators branch from a37cd53 to 78b4306 Compare January 15, 2026 19:19

DrahtBot removed the CI failed label Jan 15, 2026

DrahtBot mentioned this pull request Jan 18, 2026

fix assumevalid is ignored during reindex #33854

Open

DrahtBot mentioned this pull request Jan 26, 2026

blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks #33324

Open

DrahtBot added the Needs rebase label Feb 2, 2026

l0rinc force-pushed the l0rinc/block_index_comparators branch from 78b4306 to 80bfeaa Compare February 2, 2026 10:44

DrahtBot removed the Needs rebase label Feb 2, 2026

l0rinc force-pushed the l0rinc/block_index_comparators branch from 80bfeaa to 5ab4789 Compare February 2, 2026 21:42

l0rinc and others added 7 commits February 3, 2026 12:41

test: add sorting tests for CBlockIndexWorkComparator and `arith_ui…

f682c2b

…nt256` This test was added before other changes to create a baseline behavior.

l0rinc force-pushed the l0rinc/block_index_comparators branch from 5ab4789 to 703165e Compare February 3, 2026 11:46

DrahtBot mentioned this pull request Feb 6, 2026

fees: Introduce Mempool Based Fee Estimation to reduce overestimation #34075

Open

Conversation

l0rinc commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Testing

Results

Reproducer

Uh oh!

DrahtBot commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Conflicts

Uh oh!

Raimo33 commented Oct 17, 2025

Uh oh!

Uh oh!

Christewart commented Oct 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

l0rinc commented Oct 19, 2025

Uh oh!

laanwj left a comment

Choose a reason for hiding this comment

Uh oh!

l0rinc commented Oct 28, 2025

Uh oh!

mzumsande left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

l0rinc Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mzumsande Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

l0rinc commented Oct 31, 2025

Uh oh!

Uh oh!

Uh oh!

optout21 commented Nov 3, 2025

Uh oh!

optout21 commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented Nov 6, 2025

Uh oh!

l0rinc commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

l0rinc commented Nov 25, 2025

Uh oh!

l0rinc commented Jan 15, 2026

Uh oh!

DrahtBot commented Jan 15, 2026

Uh oh!

l0rinc commented Feb 2, 2026

Uh oh!

sipa commented Feb 2, 2026

Uh oh!

l0rinc commented Feb 2, 2026

l0rinc commented Oct 16, 2025 •

edited

Loading

DrahtBot commented Oct 16, 2025 •

edited

Loading

Christewart commented Oct 18, 2025 •

edited

Loading

l0rinc Oct 31, 2025 •

edited

Loading

mzumsande Oct 30, 2025 •

edited

Loading

optout21 commented Nov 3, 2025 •

edited

Loading

l0rinc commented Nov 6, 2025 •

edited

Loading

l0rinc commented Feb 3, 2026 •

edited

Loading