-
Notifications
You must be signed in to change notification settings - Fork 38.7k
optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin #30326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
faf0347 to
bf4b2ac
Compare
bf4b2ac to
bcacbf4
Compare
I ran the benchmark on an aarch64 machine, and saw mostly worse performance when using the latest GCC, or Clang, with libstdc++. There was a small improvement when using Clang and libc++. GCC 14.1.1 20240620 (Red Hat 14.1.1-6) Master | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 298,908.49 | 3,345.51 | 0.1% | 2,175,461.97 | 883,257.19 | 2.463 | 314,883.84 | 0.5% | 10.99 | `AssembleBlock`PR | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 317,848.00 | 3,146.16 | 0.5% | 2,243,862.65 | 918,949.53 | 2.442 | 316,609.60 | 0.5% | 11.00 | `AssembleBlock`Clang 18.1.6 (Fedora 18.1.6-4.fc41) Master | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 290,702.08 | 3,439.95 | 0.2% | 2,047,925.12 | 858,324.72 | 2.386 | 290,750.85 | 0.6% | 10.99 | `AssembleBlock`PR | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 301,625.77 | 3,315.37 | 0.2% | 2,138,789.21 | 891,494.06 | 2.399 | 294,669.20 | 0.5% | 10.96 | `AssembleBlock`Clang libc++ Master | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 285,978.35 | 3,496.77 | 0.2% | 1,960,603.15 | 845,960.63 | 2.318 | 294,904.09 | 0.6% | 10.98 | `AssembleBlock`PR | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 279,828.66 | 3,573.62 | 0.1% | 1,944,499.08 | 825,776.61 | 2.355 | 296,404.49 | 0.6% | 11.01 | `AssembleBlock` |
|
Thanks for checking @fanquake! |
|
I think that rather than trying to micro-optimise these somewhat sensitive functions, if you're interested in benchmarking / performance related changes, your time might be better spent reviewing Pull Requests like #28280 (which your changes conflict with). I'd also note that all of your benchmarking / profiling seems to be being done on an arm64 macOS machine, using (I assume) Apple Clang as the compiler, and libc++ as the standard library. So when you create changes like these, even if you see improvments locally, they aren't necessarily going to be improvements for the compiler/std lib we primarily care about (because it's used to build the majority of our release binaries), which is GCC and libstdc++. |
|
If you can benchmark pre-assumevalid IBD I think you'll see |
|
Thanks for the compiler hints @fanquake, I had a lot of problems with MacOSX14.sdk, so was using
And thanks for the IBD hint @sipa, by limiting `-stopatheight=200000` and disabling unrelated validations, the target method was indeed the main bottleneck (~20% of time spent):
|
Thanks for the hint @fanquake, it's a tough PR, I have added my two sats to it: #28280 (review) |
|
Just a style nit, but I think if you return early on diff --git a/src/coins.cpp b/src/coins.cpp
index a4e4d4ad32..9e8072ba02 100644
--- a/src/coins.cpp
+++ b/src/coins.cpp
@@ -41,20 +41,20 @@ size_t CCoinsViewCache::DynamicMemoryUsage() const {
}
CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const {
- CCoinsMap::iterator it = cacheCoins.find(outpoint);
- if (it != cacheCoins.end())
+ const auto [it, inserted] = cacheCoins.try_emplace(outpoint);
+ if (!inserted)
return it;
- Coin tmp;
- if (!base->GetCoin(outpoint, tmp))
+ if (!base->GetCoin(outpoint, it->second.coin)) {
+ cacheCoins.erase(it);
return cacheCoins.end();
- CCoinsMap::iterator ret = cacheCoins.emplace(std::piecewise_construct, std::forward_as_tuple(outpoint), std::forward_as_tuple(std::move(tmp))).first;
- if (ret->second.coin.IsSpent()) {
+ }
+ if (it->second.coin.IsSpent()) {
// The parent only has an empty entry for this outpoint; we can consider our
// version as fresh.
- ret->second.flags = CCoinsCacheEntry::FRESH;
+ it->second.flags = CCoinsCacheEntry::FRESH;
}
- cachedCoinsUsage += ret->second.coin.DynamicMemoryUsage();
- return ret;
+ cachedCoinsUsage += it->second.coin.DynamicMemoryUsage();
+ return it;
}
bool CCoinsViewCache::GetCoin(const COutPoint &outpoint, Coin &coin) const { |
bcacbf4 to
00052ec
Compare
|
Thanks for the review @andrewtoth. |
I don't think @sipa said to disable validations. "benchmark pre-assumevalid IBD" means testing IBD (or reindex) before the assumevalid block, which right now is block 824,000. By setting
With this limit you should not see any benefit from this change. Using default I think you should benchmark
It was because the diff is then +9/-9 and no lines change indentation, and the current diff is +12/-13 with indentation changing. So just a style nit, no advantage to the code. Also, after your change of renaming |
luke-jr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 00052ecc0402f2c53e3602b901a937b64aa7f7cc
There's a potential race condition for negative lookups, but I assume it's fine since the old code would also have a (different) race.
Performance might be slightly worse for negative lookups, but that seems an acceptable tradeoff since the positive lookup path is much more likely, especially during IBD.
|
Thanks for the review @andrewtoth and @luke-jr!
I don't have that much space yet, but ran it with: sudo hyperfine \
--runs 3 \
--show-output \
--export-markdown bench.md \
--parameter-list commit bd5d16,00052ec \
--prepare "sudo purge; sync; git checkout {commit}; git reset --hard; make -j10" \
"./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0"resulting in
i.e. a modest 2% speedup.
good point, if more changes will be needed, I'll do that!
Thanks Luke, that was my thinking as well. I'll do a full IBD this week to see how it behaves (once I free up some extra space locally). |
andrewtoth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 00052ecc0402f2c53e3602b901a937b64aa7f7cc
I benchmarked IBD from a local node connection to block 820k, twice on this branch twice on master.
This branch: 22265 seconds mean time
Master: 22523 seconds mean time
Also benchmarked -reindex-chainstate up to block 820k, twice on this branch and twice on master.
This branch: 22085 seconds mean time
Master: 22255 seconds mean time
So it appears to shave off ~3 minutes of time.
|
Thanks a lot for checking, @andrewtoth! |
|
🚧 At least one of the CI tasks failed. HintsMake sure to run all tests locally, according to the documentation. The failure may happen due to a number of reasons, for example:
Leave a comment here, if you need help tracking down a confusing failure. |
Enhanced efficiency and readability of CCoinsViewCache::FetchCoin by replacing separate find() and emplace() calls with a single try_emplace(), reducing map lookups and potential insertions.
00052ec to
204ca67
Compare
|
Rebased after #28280 |
andrewtoth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 204ca67
|
utACK 204ca67 |
|
ACK 204ca67 |


Enhanced efficiency and readability of
CCoinsViewCache::FetchCoinby replacing separatefind()andemplace()calls with a singletry_emplace(), reducing map lookups and potential insertions.AssembleBlockshowsFetchCoinas one of its bottlenecks:These changes result in a modest performance improvement:
before:
AssembleBlockafter:
AssembleBlockFurther benchmarks: #30326 (comment)