-
Notifications
You must be signed in to change notification settings - Fork 38.7k
log: avoid collecting GetSerializeSize data when compact block logging is disabled
#33738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33738. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Thanks for catching this, the PR seems correct to me, why is it in draft? |
|
Concept ACK |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a benchmark that detects this?
|
crACK 1785634 |
|
Concept ACK |
464d35d to
10e0e96
Compare
`PartiallyDownloadedBlock::FillBlock()` calculated the header hash that's only needed when debug logging is enabled. Github-Pull: bitcoin#33738 Rebased-From: 10e0e96 (partial)
danielabrozzoni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 10e0e96
Code looks good to me, I ran the tests locally, but I didn’t perform any additional testing beyond that.
|
crACK 10e0e96 Lightly tested running on mainnet, logs for receiving look good: Haven't received a |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10e0e96
Agree the situation on master is not ideal.
However, when I see CTransaction I often think it should be renamed ImmutableTransaction (to counter CMutableTransaction).
In that light, and since it caches hashes, it seems worth at least exploring computing sizes as we are scanning the bytes to eagerly compute either hash. Even if some future endeavor has us lazy-computing the hash (which I would be happy to see), size & hash could still be computed together.
After realizing benchmarks were skewed, I replaced the block data from 2016 AD with a recent block to be more representative:
./build/bin/bitcoin-cli getblock 000000000000000000008ba770436ce094dc41adb020ce92eb00632b63af11a4 0 | xxd -r -p > src/bench/data/block413567.raw
(For the issue on more representative block data, see #32554).
First stab to cache the size during hashing
diff --git a/src/primitives/transaction.cpp b/src/primitives/transaction.cpp
index e80ab60fcd..37cd75099d 100644
--- a/src/primitives/transaction.cpp
+++ b/src/primitives/transaction.cpp
@@ -78,22 +78,74 @@ bool CTransaction::ComputeHasWitness() const
});
}
-Txid CTransaction::ComputeHash() const
+namespace {
+class CombinedWriter
{
- return Txid::FromUint256((HashWriter{} << TX_NO_WITNESS(*this)).GetHash());
+ HashWriter m_hash_writer;
+ SizeComputer m_size_computer;
+
+public:
+ void write(std::span<const std::byte> src)
+ {
+ m_hash_writer.write(src);
+ m_size_computer.write(src);
+ }
+
+ template <typename T>
+ CombinedWriter& operator<<(const T& obj)
+ {
+ ::Serialize(*this, obj);
+ return *this;
+ }
+
+ size_t GetSize() const {
+ return m_size_computer.size();
+ }
+
+ uint256 GetHash() {
+ return m_hash_writer.GetHash();
+ }
+};
+} // namespace
+
+Txid CTransaction::ComputeHash(size_t& out_size) const
+{
+ if (HasWitness()) {
+ return Txid::FromUint256((HashWriter{} << TX_NO_WITNESS(*this)).GetHash());
+ }
+
+ CombinedWriter combined_writer;
+ combined_writer << TX_NO_WITNESS(*this);
+ out_size = combined_writer.GetSize();
+ auto result{combined_writer.GetHash()};
+
+ // Costly, just for verification
+ //assert(result == (HashWriter{} << TX_NO_WITNESS(*this)).GetHash());
+ //assert(out_size == ::GetSerializeSize(TX_NO_WITNESS(*this)));
+
+ return Txid::FromUint256(result);
}
-Wtxid CTransaction::ComputeWitnessHash() const
+Wtxid CTransaction::ComputeWitnessHash(size_t& out_size) const
{
if (!HasWitness()) {
return Wtxid::FromUint256(hash.ToUint256());
}
- return Wtxid::FromUint256((HashWriter{} << TX_WITH_WITNESS(*this)).GetHash());
+ CombinedWriter combined_writer;
+ combined_writer << TX_WITH_WITNESS(*this);
+ out_size = combined_writer.GetSize();
+ auto result{combined_writer.GetHash()};
+
+ // Costly, just for verification
+ //assert(result == (HashWriter{} << TX_WITH_WITNESS(*this)).GetHash());
+ //assert(out_size == ::GetSerializeSize(TX_WITH_WITNESS(*this)));
+
+ return Wtxid::FromUint256(result);
}
-CTransaction::CTransaction(const CMutableTransaction& tx) : vin(tx.vin), vout(tx.vout), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
-CTransaction::CTransaction(CMutableTransaction&& tx) : vin(std::move(tx.vin)), vout(std::move(tx.vout)), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
+CTransaction::CTransaction(const CMutableTransaction& tx) : vin(tx.vin), vout(tx.vout), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash(/*out_size=*/m_size)}, m_witness_hash{ComputeWitnessHash(/*out_size=*/m_size)} {}
+CTransaction::CTransaction(CMutableTransaction&& tx) : vin(std::move(tx.vin)), vout(std::move(tx.vout)), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash(/*out_size=*/m_size)}, m_witness_hash{ComputeWitnessHash(/*out_size=*/m_size)} {}
CAmount CTransaction::GetValueOut() const
{
@@ -109,7 +161,7 @@ CAmount CTransaction::GetValueOut() const
unsigned int CTransaction::GetTotalSize() const
{
- return ::GetSerializeSize(TX_WITH_WITNESS(*this));
+ return m_size;
}
std::string CTransaction::ToString() const
diff --git a/src/primitives/transaction.h b/src/primitives/transaction.h
index 295bce61b9..406f6fb9ff 100644
--- a/src/primitives/transaction.h
+++ b/src/primitives/transaction.h
@@ -313,9 +313,10 @@ private:
const bool m_has_witness;
const Txid hash;
const Wtxid m_witness_hash;
+ size_t m_size;
- Txid ComputeHash() const;
- Wtxid ComputeWitnessHash() const;
+ Txid ComputeHash(size_t& out_size) const;
+ Wtxid ComputeWitnessHash(size_t& out_size) const;
bool ComputeHasWitness() const;
DeserializeBlockTest benchmark shows ~0.4% slowdown compared to master.
cmake --build build -t bench_bitcoin && build/bin/bench_bitcoin -filter="DeserializeBlockTest" -min-time=10000
Without diff (but still new block data):
| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 13,695,535.27 | 73.02 | 0.1% | 172,027,205.56 | 50,480,492.19 | 3.408 | 7,936,417.71 | 0.1% | 32.95 | `DeserializeBlockTest`
| 13,719,279.25 | 72.89 | 0.1% | 172,026,940.33 | 50,563,252.02 | 3.402 | 7,936,357.15 | 0.1% | 32.84 | `DeserializeBlockTest`
| 13,736,674.56 | 72.80 | 0.2% | 172,027,205.58 | 50,631,723.07 | 3.398 | 7,936,417.76 | 0.1% | 32.98 | `DeserializeBlockTest`
With new block data diff (size computed with hash):
| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 13,758,327.17 | 72.68 | 0.1% | 172,313,875.87 | 50,716,919.21 | 3.398 | 7,948,257.78 | 0.1% | 32.97 | `DeserializeBlockTest`
| 13,770,950.59 | 72.62 | 0.1% | 172,313,875.88 | 50,762,181.95 | 3.395 | 7,948,257.79 | 0.1% | 32.83 | `DeserializeBlockTest`
| 13,743,023.50 | 72.76 | 0.1% | 172,313,864.58 | 50,656,651.53 | 3.402 | 7,948,295.76 | 0.1% | 32.89 | `DeserializeBlockTest`
Second attempt, using CSHA256::bytes (preferred)
diff --git a/src/crypto/sha256.h b/src/crypto/sha256.h
index 3ac771c5d0..b97ace5fcc 100644
--- a/src/crypto/sha256.h
+++ b/src/crypto/sha256.h
@@ -23,6 +23,7 @@ public:
CSHA256();
CSHA256& Write(const unsigned char* data, size_t len);
void Finalize(unsigned char hash[OUTPUT_SIZE]);
+ uint64_t GetInputSize() const { return bytes; }
CSHA256& Reset();
};
diff --git a/src/hash.h b/src/hash.h
index 34486af64a..a4943fd69a 100644
--- a/src/hash.h
+++ b/src/hash.h
@@ -108,6 +108,8 @@ public:
ctx.Write(UCharCast(src.data()), src.size());
}
+ uint64_t GetInputSize() const { return ctx.GetInputSize(); }
+
/** Compute the double-SHA256 hash of all data written to this object.
*
* Invalidates this object.
diff --git a/src/primitives/transaction.cpp b/src/primitives/transaction.cpp
index e80ab60fcd..d9ebcfc1b7 100644
--- a/src/primitives/transaction.cpp
+++ b/src/primitives/transaction.cpp
@@ -78,22 +78,44 @@ bool CTransaction::ComputeHasWitness() const
});
}
-Txid CTransaction::ComputeHash() const
+Txid CTransaction::ComputeHash(size_t& out_size) const
{
- return Txid::FromUint256((HashWriter{} << TX_NO_WITNESS(*this)).GetHash());
+ if (HasWitness()) {
+ return Txid::FromUint256((HashWriter{} << TX_NO_WITNESS(*this)).GetHash());
+ }
+
+ HashWriter writer;
+ writer << TX_NO_WITNESS(*this);
+ out_size = writer.GetInputSize();
+ auto result{writer.GetHash()};
+
+ // Costly, just for verification
+ //assert(result == (HashWriter{} << TX_NO_WITNESS(*this)).GetHash());
+ //assert(out_size == ::GetSerializeSize(TX_NO_WITNESS(*this)));
+
+ return Txid::FromUint256(result);
}
-Wtxid CTransaction::ComputeWitnessHash() const
+Wtxid CTransaction::ComputeWitnessHash(size_t& out_size) const
{
if (!HasWitness()) {
return Wtxid::FromUint256(hash.ToUint256());
}
- return Wtxid::FromUint256((HashWriter{} << TX_WITH_WITNESS(*this)).GetHash());
+ HashWriter writer;
+ writer << TX_WITH_WITNESS(*this);
+ out_size = writer.GetInputSize();
+ auto result{writer.GetHash()};
+
+ // Costly, just for verification
+ //assert(result == (HashWriter{} << TX_WITH_WITNESS(*this)).GetHash());
+ //assert(out_size == ::GetSerializeSize(TX_WITH_WITNESS(*this)));
+
+ return Wtxid::FromUint256(result);
}
-CTransaction::CTransaction(const CMutableTransaction& tx) : vin(tx.vin), vout(tx.vout), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
-CTransaction::CTransaction(CMutableTransaction&& tx) : vin(std::move(tx.vin)), vout(std::move(tx.vout)), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash()}, m_witness_hash{ComputeWitnessHash()} {}
+CTransaction::CTransaction(const CMutableTransaction& tx) : vin(tx.vin), vout(tx.vout), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash(/*out_size=*/m_size)}, m_witness_hash{ComputeWitnessHash(/*out_size=*/m_size)} {}
+CTransaction::CTransaction(CMutableTransaction&& tx) : vin(std::move(tx.vin)), vout(std::move(tx.vout)), version{tx.version}, nLockTime{tx.nLockTime}, m_has_witness{ComputeHasWitness()}, hash{ComputeHash(/*out_size=*/m_size)}, m_witness_hash{ComputeWitnessHash(/*out_size=*/m_size)} {}
CAmount CTransaction::GetValueOut() const
{
@@ -109,7 +131,7 @@ CAmount CTransaction::GetValueOut() const
unsigned int CTransaction::GetTotalSize() const
{
- return ::GetSerializeSize(TX_WITH_WITNESS(*this));
+ return m_size;
}
std::string CTransaction::ToString() const
diff --git a/src/primitives/transaction.h b/src/primitives/transaction.h
index 295bce61b9..406f6fb9ff 100644
--- a/src/primitives/transaction.h
+++ b/src/primitives/transaction.h
@@ -313,9 +313,10 @@ private:
const bool m_has_witness;
const Txid hash;
const Wtxid m_witness_hash;
+ size_t m_size;
- Txid ComputeHash() const;
- Wtxid ComputeWitnessHash() const;
+ Txid ComputeHash(size_t& out_size) const;
+ Wtxid ComputeWitnessHash(size_t& out_size) const;
bool ComputeHasWitness() const;
DeserializeBlockTest benchmark shows no slowdown.
cmake --build build -t bench_bitcoin && build/bin/bench_bitcoin -filter="DeserializeBlockTest" -min-time=10000
Without diff (but still new block data):
| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 13,642,735.69 | 73.30 | 0.0% | 172,027,198.95 | 50,303,451.00 | 3.420 | 7,936,417.69 | 0.1% | 10.98 | `DeserializeBlockTest`
| 13,696,742.23 | 73.01 | 0.0% | 172,027,199.00 | 50,509,943.00 | 3.406 | 7,936,417.71 | 0.1% | 10.98 | `DeserializeBlockTest`
| 13,608,190.19 | 73.49 | 0.0% | 172,027,198.90 | 50,181,783.58 | 3.428 | 7,936,417.64 | 0.1% | 11.01 | `DeserializeBlockTest`
With new block data diff (size read off CSHA256::bytes):
| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 13,597,905.17 | 73.54 | 0.2% | 172,124,011.94 | 50,128,232.00 | 3.434 | 7,951,499.67 | 0.1% | 11.01 | `DeserializeBlockTest`
| 13,691,575.07 | 73.04 | 0.1% | 172,123,994.96 | 50,450,106.97 | 3.412 | 7,951,537.71 | 0.1% | 10.95 | `DeserializeBlockTest`
| 13,584,015.82 | 73.62 | 0.0% | 172,124,011.88 | 50,092,367.11 | 3.436 | 7,951,499.62 | 0.1% | 11.00 | `DeserializeBlockTest`
I can see the point of GetTotalSize() having so few users though. Not going to N-A-C-K current PR state, but might be a shame to change the name to ComputeTotalSize if we later decide to return a cached size value.
10e0e96 to
4f1b29d
Compare
I didn't understand at first how that's related, but slept on it and now I do :) If we want to cache size (or compute it alongside hashing), that should be evaluated based on broader call sites and overall construction cost, and done in a separate PR with focused benchmarks and review. I will investigate that in a separate PR.
I would consider that a feature: we should rename it back when we're not computing it anymore as it's confusing currently. Rebased in separate empty push and and pushed 2 suggestions in https://github.com/bitcoin/bitcoin/compare/4f1b29d32dea5fbcefe92df6dd0540807cfac1b2..2d3b557f1d414c509b992caa8b96f8ef3bb2aa46 |
4f1b29d to
2d3b557
Compare
… cached Transaction hashes are cached, it may not be intuitive that their sizes are actually recalculated every time. This is done before the other refactors to clarify why we want to avoid calling this method; Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
`PeerManagerImpl::SendBlockTransactions()` computed the total byte size of requested transactions for a debug log line by calling `ComputeTotalSize()` in a tight loop, triggering serialization even when debug logging is off. Guard the size accumulation with `LogAcceptCategory` so the serialization work only happens when the log line can be emitted. No behavior change when debug logging is enabled: the reported block hash, transaction count, and byte totals are the same. The bounds checks still run unconditionally; the debug-only loop iterates the already-validated response contents. Separating debug-only work from the critical path reduces risk and favors the performance-critical non-debug case. This also narrows the racy scope of when logging is toggled from another thread.
…is disabled `PartiallyDownloadedBlock::FillBlock()` computed the block header hash and summed missing transaction sizes for debug logging unconditionally, including when cmpctblock debug logging is disabled. Guard the debug-only hash and size computations with `LogAcceptCategory`. Since `txn_available` is invalidated after the first loop (needed for efficient moving), we compute `tx_missing_size` by iterating `vtx_missing` directly. This is safe because the later `tx_missing_offset` check guarantees `vtx_missing` was fully consumed during reconstruction. Use `block.GetHash()` instead of `header.GetHash()`, since header is cleared before logging. No behavior change when debug logging is enabled: the reported counts, hashes, and byte totals remain the same.
2d3b557 to
211149a
Compare
|
close/open dance to retrigger CI - likely caused by GitHub outage |
|
code-review reACK or crreACK 211149a |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crACK 211149a
The PR is a step forward, I'm just sad I couldn't recruit anyone yet to my wild-goose chase.
Wild-goose chase
The first commit renaming CTransaction::GetTotalSize => CTransaction::ComputeTotalSize irks me as I've taken the time to at least partially prove that the current name on master would probably be fine given that we incorporate something like "Second attempt, using CSHA256::bytes" in #33738 (review).
I can understand the willingness to leave that as a possible follow-up, and as it is not certain that such a follow-up would be merged, the rename is tolerable.
Context
The new accounting options introduced in #32582 can be quite heavy, and are not needed when debug logging is disabled.
Problem
PartiallyDownloadedBlock::FillBlock()andPeerManagerImpl::SendBlockTransactions()accumulate transaction sizes for debug logging by callingComputeTotalSize()in loops, which invokes expensiveGetSerializeSize()serializations.The block header hash is also only computed for the debug log.
Fixes
Guard the size and hash calculations with
LogAcceptCategory()checks so the serialization and hashing work only occurs when compact block debug logging is enabled.Also modernized the surrounding code a bit since the change is quite trivial.
Reproducer
You can test the change by starting an up-to-date
bitcoindnode with-debug=cmpctblockand observing compact block log lines such as:Test patch