-
Notifications
You must be signed in to change notification settings - Fork 38.7k
kernel: Separate UTXO set access from validation functions #32317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32317. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2025-12-03 |
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, as this PR increases the decoupling of components.
nit: The name SpendBlock may not be clear (unless this concept is already known and I'm missing the point).
Maybe BIP30Validation, something like that, would be clearer.
It is doing more than BIP-30 validation, so I don't think that would accurate either. I thought the name was pretty clear for what it does, but I'd be open to other suggestions. |
| bool CheckInputScripts(const CTransaction& tx, TxValidationState& state, | ||
| const CCoinsViewCache& inputs, unsigned int flags, bool cacheSigStore, | ||
| std::vector<CTxOut>&& spent_outputs, unsigned int flags, bool cacheSigStore, | ||
| bool cacheFullScriptStore, PrecomputedTransactionData& txdata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we keep passing the spent_outputs instead of removing it (to simplify the API) and requiring callers to initialize txdata before calling CheckInputScripts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My read is that the call to txdata.Init is delayed to after checking the script execution cache for performance reasons. Calling Init does some hashing, that would not be required if it is already in the cache. Moving the spent_outputs vector construction is already a potential performance downside for potential cache hits, though I think that is more acceptable in the grand scheme of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it makes sense. Thanks for the insights.
|
Updated 76a8f22 -> 16a695f (spendblock_0 -> spendblock_1, compare)
|
stickies-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Reducing the coupling between validation functions and the UTXO set makes the validation functions more usable, and more testable. The approach taken here seems sensible, but I'm going to think about alternatives a bit more.
| for (const auto& txin : tx.vin) { | ||
| const COutPoint& prevout = txin.prevout; | ||
| const Coin& coin = AccessCoin(prevout); | ||
| assert(!coin.IsSpent()); | ||
| spent_outputs.emplace_back(coin.out); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use AccessCoins here?
| for (const auto& txin : tx.vin) { | |
| const COutPoint& prevout = txin.prevout; | |
| const Coin& coin = AccessCoin(prevout); | |
| assert(!coin.IsSpent()); | |
| spent_outputs.emplace_back(coin.out); | |
| } | |
| for (const Coin& coin : AccessCoins(tx)) { | |
| assert(!coin.IsSpent()); | |
| spent_outputs.emplace_back(coin.out); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inconsistent naming between GetUnspentOutputs and spent_outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also use AccessCoin, but I think accessing it directly is a bit more efficient, because we don't allocate another vector. Do you have a better suggestion for the naming there? It is all a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't considered the extra vector allocation, yeah I'm okay leaving this as-is then.
I did nerd-snipe myself into looking into having AccessCoins return a view would be quite elegant in some ways, allowing lazy evaluation, but seems like it would require quite a bit of overhaul and might have other drawbacks I've not considered yet.
auto AccessCoins(const CTransaction& tx) const
{
return tx.vin | std::views::transform([this](const CTxIn& input) -> const Coin& {
return this->AccessCoin(input.prevout);
});
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a better suggestion for the naming there? It is all a bit confusing.
I agree the naming is confusing. I'd suggest GetUnspentOutputs -> GetPreviousOutputs and spent_outputs -> previous_outputs to avoid the spent/unspent terminology. What do you think?
| std::vector<CTxOut> spent_outputs{active_coins_tip.GetUnspentOutputs(tx)}; | ||
| txdata.Init(tx, std::move(spent_outputs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a way, it's a bit silly to add a parameter to the CheckInputScripts signature to then just move it into another parameter (txdata). An alternative approach could be to let the caller handle this, and assert txdata.m_spent_outputs_ready, but that could raise runtime assertion errors instead of catching them at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @TheCharlatan noted here, delaying txdata.Init could be on purpose for the potential cache hit here. I wonder if there is a cleaner version for achieving the same goal though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
I had a few more comments (sorry for the scattered feedback; didn't have the chance to review everything earlier).
|
PSA: Bitcoin Core PR Review Club will cover this PR in its next meeting at 2025-05-14 at 17:00 UTC. See https://bitcoincore.reviews/32317 for notes, questions, and instructions on how to join. |
|
Thank you for the reviews @stringintech and @stickies-v! 16a695f -> e8ac6d2 (spendblock_1 -> spendblock_2, compare)
|
|
A function that takes a |
|
Rebased 5988c71 -> 6493e47 (spendblock_14 -> spendblock_15, compare) |
Move the responsibility of retrieving coins from GetP2SHSigOpCount to its caller. This is a part of a series of commits for removing access to the CCoinsViewCache in consensus verification functions. The goal is to allow calling verification functions with pre-fetched, or a user-defined set of coins. Define two explicit template specializations for both a span of references to coins and a span of coins. This allows using it for both Coin entries referenced from the CCoinsViewCache, and from contiguous memory, like the vector in CBlockUndo.
Move the responsibility of retrieving coins from GetTransactionSigOpCost to its caller. This is a part of a series of commits for removing access to the CCoinsViewCache in consensus verification functions. The goal is to allow calling verification functions with pre-fetched, or a user-defined set of coins. Define two explicit template specializations for both a span of references to coins and a span of coins. This allows using it for both Coin entries referenced from the CCoinsViewCache, and from contiguous memory, like the vector in CBlockUndo.
Move the responsibility of retrieving coins from CheckTxInputs to its caller. The missing inputs check will be moved in an upcoming commit. This is a part of a series of commits for removing access to the CCoinsViewCache in consensus verification functions. The goal is to allow calling verification functions with pre-fetched, or a user-defined set of coins. Define two explicit template specializations for both a span of references to coins and a span of coins. This allows using it for both Coin entries referenced from the CCoinsViewCache, and from contiguous memory, like the vector in CBlockUndo.
…InputScripts Move the responsibility of retrieving coins from the CCoinsViewCache in CheckInputScripts to its caller. Add a helper method in CCoinsViewCache to collect all Coin's outputs spent by a transaction's inputs. Callers of CCoinsViewCache are updated to either pre-fetch the spent outputs, or pass in an empty vector if the precomputed transaction data has already been initialized with the required outputs. This is a part of a series of commits for removing access to the CCoinsViewCache in consensus verification functions. The goal is to allow calling verification functions with pre-fetched, or a user-defined set of coins.
Move the BIP30 checks from ConnectBlock to a new SpendBlock method. This is a move-only change, more content to SpendBlock is added in the next commits. The goal is to move logic requiring access to the CCoinsViewCache out of ConnectBlock and to the new SpendBlock method. SpendBlock will in future handle all UTXO set interactions that previously took place in ConnectBlock. Callers of ConnectBlock now also need to call SpendBlock before. This is enforced in a future commit by adding a CBlockUndo argument that needs to be passed to both. This is a part of a series of commits for removing access to the CCoinsViewCache in consensus verification functions. The goal is to allow calling verification functions with pre-fetched, or a user-defined set of coins.
This is a part of a series of commits for removing access to the CCoinsViewCache in consensus verification functions. The goal is to allow calling verification functions with pre-fetched, or a user-defined set of coins.
Move the remaining UTXO-related operations from ConnectBlock to SpendBlock. This includes moving the existence check, the UpdateCoins call, and CBlockUndo generation. ConnectBlock now takes a pre-populated CBlockUndo as an argument and no longer accesses or manipulates the UTXO set.
|
Rebased 6493e47 -> a0e9578 (spendblock_15 -> spendblock_16, compare)
|
ismaelsadeeq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
ACK first four commits, I think the remaining commits can even be made better.
comments are inline
|
|
||
| BOOST_CHECK(!::AreInputsStandard(CTransaction(txToNonStd1), coins)); | ||
| BOOST_CHECK_EQUAL(GetP2SHSigOpCount(CTransaction(txToNonStd1), coins), 16U); | ||
| auto coinsTxToNonStd1{coins.AccessCoins(CTransaction(txToNonStd1))}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In "consensus: Use Coin span in GetP2SHSigOpCount" 5709d07
nit: here and other places, these new variables should be snake case?
| * @param[in] tx Transaction for which we are computing the cost | ||
| * @param[in] inputs Map of previous transactions that have outputs we're spending | ||
| * @param[in] tx Transaction for which we are computing the cost | ||
| * @param[in] coins Sorted span of Coins containing previous transaction outputs we're spending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In "consensus: Use Coin span in GetP2SHSigOpCount" 5709d07
Here and other commits.
Sorted in what order?
| /** | ||
| * Spends the coins associated with a block. First checks that the block | ||
| * contains no duplicate transactions (BIP30), then for each transaction in | ||
| * the block checks that the outputs it is spending exist, spends them, and | ||
| * populates the CBlockUndo data structure. | ||
| * | ||
| * @param[in] block The block to be spent | ||
| * @param[in] pindex Points to the block map entry associated with block | ||
| * @param[in, out] view Its coins are spent and used to populate CBlockUndo during its execution | ||
| * @param[out] state This may be set to an Error state if any error occurred processing them | ||
| * @param[out] blockundo Coins consumed by the block are added to it. | ||
| */ | ||
| bool SpendBlock(const CBlock& block, const CBlockIndex* pindex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are doing this, I think this should also be separated into two functions.
EnforceBIP30— a method that performs BIP30 validation.SpendBlock— for each transaction in the block, checks that the outputs it is spending exist, spends them, and populates theCBlockUndodata structure.
It would also make testing easier, and removing BIP30 after a consensus cleanup soft fork would be straightforward.
What I will want to see is we have just SpendBlock and non-contextual block validation using the block data and undo data in a series of consensus rules validation functions.
| assert(pindex); | ||
| Assume(pindex->GetBlockHash() == block.GetHash()); | ||
| auto block_hash{pindex->GetBlockHash()}; | ||
|
|
||
| const CChainParams& params{m_chainman.GetParams()}; | ||
| // Special case for the genesis block, skipping connection of its transactions | ||
| // (its coinbase is unspendable) | ||
| if (block_hash == params.GetConsensus().hashGenesisBlock) { | ||
| return true; | ||
| } | ||
|
|
||
| // verify that the view's current state corresponds to the previous block | ||
| uint256 hashPrevBlock = pindex->pprev == nullptr ? uint256() : pindex->pprev->GetBlockHash(); | ||
| assert(hashPrevBlock == view.GetBestBlock()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In "validation: Add SpendBlock function" c937b12
why duplicate this, I think we should just add a comment warning that we expect they are already executed before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here. Afaict this is moved over from ConnectBlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather keep these duplicated for now. These are short-circuits to our logic, similar to how the coinbase checks work in various places. Based on that I'd rather keep them in place. This could also easily be improved in a follow-up.
| !warning_messages.empty() ? strprintf(" warning='%s'", warning_messages) : ""); | ||
| } | ||
|
|
||
| bool Chainstate::SpendBlock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In "validation: Add SpendBlock function" c937b12
The goal is to move logic requiring access to the
CCoinsViewCache out of ConnectBlock and to the new SpendBlock method.
SpendBlock will in future handle all UTXO set interactions that
previously took place in ConnectBlock.
This is counter to the objective of the PR; we are still accessing the UTXO set in this block validation function.
My hope was that we would have two UTXO-accessing validation functions that 1 populate the block undo data and 2. BIP30 separately, while the remaining validation functions would simply get passed the block undo data and operate as stateless validation functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is important that the BIP30 validation is done before we spend the coins. The current split makes sense to me for this reason. Can you elaborate a bit more what you think might not be optimal here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure if this is a good thing. I don't see why you would ever spend the coins in the utxo set without also doing the bip30 checks beforehand. To this end, I did implement it, but added a tag type that gets passed from BIP30Validate to SpendBlock: https://github.com/sedited/bitcoin/tree/spendblock_17 . Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve reviewed the branch, and it looks good overall. My only concern is the last commit that adds the tag. Initially, I thought it should be the caller’s responsibility to decide whether bip30validate is called, rather than mandating it here.
My motivation for separating SpendBlock from BIP30Validate is discussed in:
stratum-mining/sv2-apps#120
The branch without the last commit would make it easier to implement a variant of the idea described in stratum-mining/sv2-apps#120, where block validation for this JDS is as follows call SpendBlock and then ConnectBlock skip BIP30. However, the idea of skipping consensus checks is not acceptable.
Given that, I’m inclined to keep this as-is.
For stratum-mining/sv2-apps#120, we can instead expose IPC methods: SpendBlock, which takes a block and returns the coins, and ValidateBlock, which performs the full block validation checks. Since blocks from miners tend to share many transactions, they can still benefit from this approach, because the JDS needs to only call SpendBlock to validate newly seen transactions.
For Utreexo client, they provide their own undo data and cannot perform BIP30 validation anyway, so they benefit from this approach as well.
Overall, this is still an improvement over the current status quo.
ismaelsadeeq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK a0e9578
The current code makes it hard to call functions for checking the consensus-correctness of a block without having a full UTXO set at hand. This makes implementing features such as Utreexo and non-assumevalid swiftsync, where maintaining a full UTXO set defeats their purpose, through the kernel API difficult. Checks like the coinbase subsidy, or block sigops are only available through
ConnectBlock, which requires a complete coins cache.Solve this by moving accessing coins and the responsibility of spending them out of
ConnectBlockand into a new separate function calledSpendBlock. Pass a block's spent coins through the existingCBlockUndodata structure toConnectBlock.While the change is largely a refactor, some smaller behaviour changes were unavoidable. These should be highlighted in the commit messages. The last commit contains the most important part of the coins logic move, the commits before attempt to prepare it piecemeal.
This pull request was benchmarked on benchcoin, which showed no performance regression. While discussing this pull request it was noted that pre-fetching and eliminating the need for some additional map lookups might improve performance a bit, but this does not make a measurable difference in the grand scheme of things and was not the motivation for this pull request.
In future changes, ConnectBlock could be made even more self contained, accessing the coins from the internal map could be further optimized by consolidating existence checking and spending, and the function eventually exposed in an external kernel API (as done here)