optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin #30326

l0rinc · 2024-06-23T22:01:29Z

Enhanced efficiency and readability of CCoinsViewCache::FetchCoin by replacing separate find() and emplace() calls with a single try_emplace(), reducing map lookups and potential insertions.

AssembleBlock shows FetchCoin as one of its bottlenecks:

These changes result in a modest performance improvement:

./src/bench/bench_bitcoin --filter='AssembleBlock' --min-time=10000

before:

ns/op	op/s	err%	total	benchmark
156,160.70	6,403.66	0.6%	10.91	`AssembleBlock`

after:

ns/op	op/s	err%	total	benchmark
152,971.97	6,537.15	0.2%	10.95	`AssembleBlock`

Further benchmarks: #30326 (comment)

DrahtBot · 2024-06-23T22:01:31Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	andrewtoth, sipa, achow101
Stale ACK	luke-jr

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

fanquake · 2024-06-24T14:07:58Z

These changes result in a modest performance improvement:

I ran the benchmark on an aarch64 machine, and saw mostly worse performance when using the latest GCC, or Clang, with libstdc++. There was a small improvement when using Clang and libc++.

GCC 14.1.1 20240620 (Red Hat 14.1.1-6)

Master

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          298,908.49 |            3,345.51 |    0.1% |    2,175,461.97 |      883,257.19 |  2.463 |     314,883.84 |    0.5% |     10.99 | `AssembleBlock`

PR

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          317,848.00 |            3,146.16 |    0.5% |    2,243,862.65 |      918,949.53 |  2.442 |     316,609.60 |    0.5% |     11.00 | `AssembleBlock`

Clang 18.1.6 (Fedora 18.1.6-4.fc41)

Master

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          290,702.08 |            3,439.95 |    0.2% |    2,047,925.12 |      858,324.72 |  2.386 |     290,750.85 |    0.6% |     10.99 | `AssembleBlock`

PR

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          301,625.77 |            3,315.37 |    0.2% |    2,138,789.21 |      891,494.06 |  2.399 |     294,669.20 |    0.5% |     10.96 | `AssembleBlock`

Clang libc++

Master

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          285,978.35 |            3,496.77 |    0.2% |    1,960,603.15 |      845,960.63 |  2.318 |     294,904.09 |    0.6% |     10.98 | `AssembleBlock`

PR

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          279,828.66 |            3,573.62 |    0.1% |    1,944,499.08 |      825,776.61 |  2.355 |     296,404.49 |    0.6% |     11.01 | `AssembleBlock`

l0rinc · 2024-06-24T14:17:10Z

Thanks for checking @fanquake!
As the profiler showed, this is only 5% of AssembleBlock, and the benchmark varied a lot on my machine as well, it might be why you also see the discrepancy.
Do you think I should add a dedicated benchmark for this only?

fanquake · 2024-06-24T14:27:52Z

I think that rather than trying to micro-optimise these somewhat sensitive functions, if you're interested in benchmarking / performance related changes, your time might be better spent reviewing Pull Requests like #28280 (which your changes conflict with).

I'd also note that all of your benchmarking / profiling seems to be being done on an arm64 macOS machine, using (I assume) Apple Clang as the compiler, and libc++ as the standard library. So when you create changes like these, even if you see improvments locally, they aren't necessarily going to be improvements for the compiler/std lib we primarily care about (because it's used to build the majority of our release binaries), which is GCC and libstdc++.

sipa · 2024-06-24T17:08:46Z

If you can benchmark pre-assumevalid IBD I think you'll see FetchCoins have a bigger impact than in AssembleBlock. It may need an actual -reindex though, I don't know if we have a benchmark with realistic workload.

l0rinc · 2024-06-25T09:09:11Z

Thanks for the compiler hints @fanquake, I had a lot of problems with MacOSX14.sdk, so was using Apple clang version 15.0.0 (clang-1500.3.9.4). Might give GCC and libstdc++ another try if I can find a readme of how to do that on a mac properly, everything just failed catastrophically so far.

And thanks for the IBD hint @sipa, by limiting `-stopatheight=200000` and disabling unrelated validations, the target method was indeed the main bottleneck (~20% of time spent):

Index: src/validation.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/validation.cpp b/src/validation.cpp
--- a/src/validation.cpp	(revision a114ced7192204963f19abb09db05d6be14cbf61)
+++ b/src/validation.cpp	(date 1719320825375)
@@ -2133,7 +2133,7 @@
                        bool cacheFullScriptStore, PrecomputedTransactionData& txdata,
                        std::vector<CScriptCheck>* pvChecks)
 {
-    if (tx.IsCoinBase()) return true;
+    return true;
 
     if (pvChecks) {
         pvChecks->reserve(tx.vin.size());

I ran bitcoind -reindex-chainstate -assumevalid=0 -par=1 -stopatheight=200000 for 20 iterations (skipped the first in both as a warmup) as such:

git checkout master && make -j10 >/dev/null 2>&1 && for i in {1..20}; do (time (./src/bitcoind -reindex-chainstate -assumevalid=0 -par=1 -stopatheight=200000 >/dev/null 2>&1)) 2>&1 | awk '{print $(NF-1)}'; done; git checkout paplorinc/FetchCoin && make -j10 >/dev/null 2>&1 && for i in {1..20}; do (time (./src/bitcoind -reindex-chainstate -assumevalid=0 -par=1 -stopatheight=200000 >/dev/null 2>&1)) 2>&1 | awk '{print $(NF-1)}'; done

which reveals a small but measurable difference between the two runs (~4%):

Is there any other test or benchmark or change you'd like me to do?

l0rinc · 2024-06-28T16:41:30Z

your time might be better spent reviewing Pull Requests like #28280

Thanks for the hint @fanquake, it's a tough PR, I have added my two sats to it: #28280 (review)

andrewtoth · 2024-07-04T22:59:18Z

Just a style nit, but I think if you return early on if (!inserted) it would make a cleaner diff and make it easier to see the benefit of this change.

diff --git a/src/coins.cpp b/src/coins.cpp
index a4e4d4ad32..9e8072ba02 100644
--- a/src/coins.cpp
+++ b/src/coins.cpp
@@ -41,20 +41,20 @@ size_t CCoinsViewCache::DynamicMemoryUsage() const {
 }
 
 CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const {
-    CCoinsMap::iterator it = cacheCoins.find(outpoint);
-    if (it != cacheCoins.end())
+    const auto [it, inserted] = cacheCoins.try_emplace(outpoint);
+    if (!inserted)
         return it;
-    Coin tmp;
-    if (!base->GetCoin(outpoint, tmp))
+    if (!base->GetCoin(outpoint, it->second.coin)) {
+        cacheCoins.erase(it);
         return cacheCoins.end();
-    CCoinsMap::iterator ret = cacheCoins.emplace(std::piecewise_construct, std::forward_as_tuple(outpoint), std::forward_as_tuple(std::move(tmp))).first;
-    if (ret->second.coin.IsSpent()) {
+    }
+    if (it->second.coin.IsSpent()) {
         // The parent only has an empty entry for this outpoint; we can consider our
         // version as fresh.
-        ret->second.flags = CCoinsCacheEntry::FRESH;
+        it->second.flags = CCoinsCacheEntry::FRESH;
     }
-    cachedCoinsUsage += ret->second.coin.DynamicMemoryUsage();
-    return ret;
+    cachedCoinsUsage += it->second.coin.DynamicMemoryUsage();
+    return it;
 }
 
 bool CCoinsViewCache::GetCoin(const COutPoint &outpoint, Coin &coin) const {

l0rinc · 2024-07-05T06:59:23Z

Thanks for the review @andrewtoth.
I'm not sure I see the advantage of inverting the if (inserted) { call, just to have two return it; calls after.
But I did rename it to ret, since we're getting the most likely return value immediately now.

andrewtoth · 2024-07-05T13:40:04Z

disabling unrelated validations

I don't think @sipa said to disable validations. "benchmark pre-assumevalid IBD" means testing IBD (or reindex) before the assumevalid block, which right now is block 824,000. By setting -assumevalid=0 that means you disabled using assumevalid entirely, so you will do signature checks from genesis.

limiting -stopatheight=200000

With this limit you should not see any benefit from this change. Using default dbcache settings the cache is not flushed until around block 250000 for the first time. This means that all cache entries will exist in the coinsCache map when calling FetchCoin so it will never enter the if (inserted) { block.

I think you should benchmark bitcoind -reindex-chainstate -stopatheight=820000 or so. And you should not change anything in validation.cpp, that could skew the benchmark.

I'm not sure I see the advantage of inverting the if (inserted) { call, just to have two return it; calls after.

It was because the diff is then +9/-9 and no lines change indentation, and the current diff is +12/-13 with indentation changing. So just a style nit, no advantage to the code. Also, after your change of renaming it to ret, the change I suggest would have a diff of only +6/-6 and would no longer conflict with #28280

luke-jr

utACK 00052ecc0402f2c53e3602b901a937b64aa7f7cc

There's a potential race condition for negative lookups, but I assume it's fine since the old code would also have a (different) race.

Performance might be slightly worse for negative lookups, but that seems an acceptable tradeoff since the positive lookup path is much more likely, especially during IBD.

l0rinc · 2024-07-07T11:18:58Z

Thanks for the review @andrewtoth and @luke-jr!

I think you should benchmark bitcoind -reindex-chainstate -stopatheight=820000

I don't have that much space yet, but ran it with:

sudo hyperfine \
--runs 3 \
--show-output \
--export-markdown bench.md \
--parameter-list commit bd5d16,00052ec \
--prepare "sudo purge; sync; git checkout {commit}; git reset --hard; make -j10" \
"./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0"

resulting in

Command	Mean [s]	Min [s]	Max [s]	Relative
`./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0 (commit = bd5d16)`	1012.260 ± 1.206	1011.416	1013.641	1.02 ± 0.00
`./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0 (commit = 00052ec)`	994.387 ± 3.077	991.018	997.050	1.00

i.e. a modest 2% speedup.

would no longer conflict with #28280

good point, if more changes will be needed, I'll do that!

positive lookup path is much more likely, especially during IBD.

Thanks Luke, that was my thinking as well. I'll do a full IBD this week to see how it behaves (once I free up some extra space locally).

andrewtoth

ACK 00052ecc0402f2c53e3602b901a937b64aa7f7cc

I benchmarked IBD from a local node connection to block 820k, twice on this branch twice on master.
This branch: 22265 seconds mean time
Master: 22523 seconds mean time

Also benchmarked -reindex-chainstate up to block 820k, twice on this branch and twice on master.
This branch: 22085 seconds mean time
Master: 22255 seconds mean time

So it appears to shave off ~3 minutes of time.

l0rinc · 2024-07-07T19:32:30Z

Thanks a lot for checking, @andrewtoth!

DrahtBot · 2024-07-22T11:57:27Z

🚧 At least one of the CI tasks failed.
_{Debug: https://github.com/bitcoin/bitcoin/runs/27072041125}

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Enhanced efficiency and readability of CCoinsViewCache::FetchCoin by replacing separate find() and emplace() calls with a single try_emplace(), reducing map lookups and potential insertions.

l0rinc · 2024-08-08T20:55:25Z

Rebased after #28280

andrewtoth

re-ACK 204ca67

sipa · 2024-08-09T20:04:00Z

utACK 204ca67

achow101 · 2024-08-12T20:06:04Z

ACK 204ca67

l0rinc force-pushed the paplorinc/FetchCoin branch from faf0347 to bf4b2ac Compare June 23, 2024 22:02

l0rinc changed the title ~~optimizations: Reduce cache lookups in CCoinsViewCache::FetchCoin~~ optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin Jun 23, 2024

l0rinc force-pushed the paplorinc/FetchCoin branch from bf4b2ac to bcacbf4 Compare June 24, 2024 07:57

DrahtBot mentioned this pull request Jun 24, 2024

Don't empty dbcache on prune flushes: >30% faster IBD #28280

Merged

l0rinc marked this pull request as ready for review June 24, 2024 10:10

l0rinc force-pushed the paplorinc/FetchCoin branch from bcacbf4 to 00052ec Compare July 5, 2024 06:58

luke-jr approved these changes Jul 6, 2024

View reviewed changes

andrewtoth approved these changes Jul 7, 2024

View reviewed changes

l0rinc mentioned this pull request Jul 12, 2024

precalculate SipHash constant salt XORs #30442

Merged

DrahtBot added the CI failed label Jul 22, 2024

DrahtBot removed the CI failed label Jul 22, 2024

DrahtBot added the Needs rebase label Aug 8, 2024

Reduce cache lookups in CCoinsViewCache::FetchCoin

204ca67

Enhanced efficiency and readability of CCoinsViewCache::FetchCoin by replacing separate find() and emplace() calls with a single try_emplace(), reducing map lookups and potential insertions.

l0rinc force-pushed the paplorinc/FetchCoin branch from 00052ec to 204ca67 Compare August 8, 2024 20:52

DrahtBot removed the Needs rebase label Aug 8, 2024

andrewtoth approved these changes Aug 9, 2024

View reviewed changes

DrahtBot requested a review from luke-jr August 9, 2024 17:57

achow101 merged commit 34ea750 into bitcoin:master Aug 12, 2024

l0rinc deleted the paplorinc/FetchCoin branch August 12, 2024 20:17

andrewtoth mentioned this pull request Sep 6, 2024

28.0rc1 synchronizes much slower on Windows #30833

Closed

l0rinc mentioned this pull request Mar 12, 2025

[IBD] Tracking PR for speeding up Initial Block Download #32043

Closed

13 tasks

l0rinc mentioned this pull request Mar 24, 2025

Draft: CCoinMap Experiments #32128

Closed

bitcoin locked and limited conversation to collaborators Aug 12, 2025

optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin #30326

optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin #30326

Uh oh!

Conversation

l0rinc commented Jun 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Jun 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage

Reviews

Conflicts

Uh oh!

fanquake commented Jun 24, 2024

Uh oh!

l0rinc commented Jun 24, 2024

Uh oh!

fanquake commented Jun 24, 2024

Uh oh!

sipa commented Jun 24, 2024

Uh oh!

l0rinc commented Jun 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

l0rinc commented Jun 28, 2024

Uh oh!

andrewtoth commented Jul 4, 2024

Uh oh!

l0rinc commented Jul 5, 2024

Uh oh!

andrewtoth commented Jul 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

luke-jr left a comment

Choose a reason for hiding this comment

Uh oh!

l0rinc commented Jul 7, 2024

Uh oh!

andrewtoth left a comment

Choose a reason for hiding this comment

Uh oh!

l0rinc commented Jul 7, 2024

Uh oh!

DrahtBot commented Jul 22, 2024

Uh oh!

l0rinc commented Aug 8, 2024

Uh oh!

andrewtoth left a comment

Choose a reason for hiding this comment

Uh oh!

sipa commented Aug 9, 2024

Uh oh!

achow101 commented Aug 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

l0rinc commented Jun 23, 2024 •

edited

Loading

DrahtBot commented Jun 23, 2024 •

edited

Loading

l0rinc commented Jun 25, 2024 •

edited

Loading

andrewtoth commented Jul 5, 2024 •

edited

Loading