Skip to content

Conversation

@sedited
Copy link
Contributor

@sedited sedited commented Apr 21, 2025

The current code makes it hard to call functions for checking the consensus-correctness of a block without having a full UTXO set at hand. This makes implementing features such as Utreexo and non-assumevalid swiftsync, where maintaining a full UTXO set defeats their purpose, through the kernel API difficult. Checks like the coinbase subsidy, or block sigops are only available through ConnectBlock, which requires a complete coins cache.

Solve this by moving accessing coins and the responsibility of spending them out of ConnectBlock and into a new separate function called SpendBlock. Pass a block's spent coins through the existing CBlockUndo data structure to ConnectBlock.

While the change is largely a refactor, some smaller behaviour changes were unavoidable. These should be highlighted in the commit messages. The last commit contains the most important part of the coins logic move, the commits before attempt to prepare it piecemeal.

This pull request was benchmarked on benchcoin, which showed no performance regression. While discussing this pull request it was noted that pre-fetching and eliminating the need for some additional map lookups might improve performance a bit, but this does not make a measurable difference in the grand scheme of things and was not the motivation for this pull request.

In future changes, ConnectBlock could be made even more self contained, accessing the coins from the internal map could be further optimized by consolidating existence checking and spending, and the function eventually exposed in an external kernel API (as done here)

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 21, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32317.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ismaelsadeeq
Concept ACK w0xlt, stickies-v, stringintech, enirox001

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34165 (coins: don't mutate main cache when connecting block by andrewtoth)
  • #34164 (validation: add reusable coins view for ConnectBlock by andrewtoth)
  • #34004 (Implementation of SwiftSync by rustaceanrob)
  • #32575 (refactor: Split multithreaded case out of CheckInputScripts by fjahr)
  • #32468 (rpc: generateblock to allow multiple outputs by polespinasa)
  • #32427 ((RFC) kernel: Replace leveldb-based BlockTreeDB with flat-file based store by sedited)
  • #31974 (Drop testnet3 by Sjors)
  • #31132 (validation: fetch block inputs on parallel threads 3x faster IBD by andrewtoth)
  • #29843 (policy: Allow non-standard scripts with -acceptnonstdtxn=1 (test nets only) by ajtowns)
  • #29060 (Policy: Report debug message why inputs are non standard by ismaelsadeeq)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • "...(amounts and maturity) This does not modify the UTXO set." -> "...(amounts and maturity). This does not modify the UTXO set." [Missing sentence separator (period) after the parenthesis makes the sentence run-on and harms readability.]

  • "Return a vector of unspent outputs of coins in the cache that are spent by the provided transaction." -> "Return a vector of unspent outputs (CTxOut) from the cache that are being spent by the provided transaction." [The original phrasing is internally contradictory/unclear ("unspent outputs ... that are spent"); rewording clarifies that these are the outputs in the cache which the transaction spends.]

  • "@param[in] blockundo Has to contain all coins spent by block. Written to disk on successful validation" -> "@param[in] blockundo Must contain all coins spent by the block and is written to disk on successful validation." [Sentence fragment and missing auxiliary verb; the original is grammatically fragmented and slightly ambiguous.]

2025-12-03

@sedited sedited changed the title kernel: Seperate UTXO set access from validation functions kernel: Separate UTXO set access from validation functions Apr 21, 2025
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/40876609750

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, as this PR increases the decoupling of components.

nit: The name SpendBlock may not be clear (unless this concept is already known and I'm missing the point).
Maybe BIP30Validation, something like that, would be clearer.

@sedited
Copy link
Contributor Author

sedited commented May 7, 2025

Re #32317 (review)

nit: The name SpendBlock may not be clear (unless this concept is already known and I'm missing the point).
Maybe BIP30Validation, something like that, would be clearer.

It is doing more than BIP-30 validation, so I don't think that would accurate either. I thought the name was pretty clear for what it does, but I'd be open to other suggestions.

Comment on lines 2174 to 2089
bool CheckInputScripts(const CTransaction& tx, TxValidationState& state,
const CCoinsViewCache& inputs, unsigned int flags, bool cacheSigStore,
std::vector<CTxOut>&& spent_outputs, unsigned int flags, bool cacheSigStore,
bool cacheFullScriptStore, PrecomputedTransactionData& txdata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we keep passing the spent_outputs instead of removing it (to simplify the API) and requiring callers to initialize txdata before calling CheckInputScripts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My read is that the call to txdata.Init is delayed to after checking the script execution cache for performance reasons. Calling Init does some hashing, that would not be required if it is already in the cache. Moving the spent_outputs vector construction is already a potential performance downside for potential cache hits, though I think that is more acceptable in the grand scheme of things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it makes sense. Thanks for the insights.

@sedited
Copy link
Contributor Author

sedited commented May 10, 2025

Updated 76a8f22 -> 16a695f (spendblock_0 -> spendblock_1, compare)

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Reducing the coupling between validation functions and the UTXO set makes the validation functions more usable, and more testable. The approach taken here seems sensible, but I'm going to think about alternatives a bit more.

Comment on lines +177 to +185
for (const auto& txin : tx.vin) {
const COutPoint& prevout = txin.prevout;
const Coin& coin = AccessCoin(prevout);
assert(!coin.IsSpent());
spent_outputs.emplace_back(coin.out);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use AccessCoins here?

Suggested change
for (const auto& txin : tx.vin) {
const COutPoint& prevout = txin.prevout;
const Coin& coin = AccessCoin(prevout);
assert(!coin.IsSpent());
spent_outputs.emplace_back(coin.out);
}
for (const Coin& coin : AccessCoins(tx)) {
assert(!coin.IsSpent());
spent_outputs.emplace_back(coin.out);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent naming between GetUnspentOutputs and spent_outputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use AccessCoin, but I think accessing it directly is a bit more efficient, because we don't allocate another vector. Do you have a better suggestion for the naming there? It is all a bit confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered the extra vector allocation, yeah I'm okay leaving this as-is then.

I did nerd-snipe myself into looking into having AccessCoins return a view would be quite elegant in some ways, allowing lazy evaluation, but seems like it would require quite a bit of overhaul and might have other drawbacks I've not considered yet.

    auto AccessCoins(const CTransaction& tx) const
    {
        return tx.vin | std::views::transform([this](const CTxIn& input) -> const Coin& {
            return this->AccessCoin(input.prevout);
        });
    }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a better suggestion for the naming there? It is all a bit confusing.

I agree the naming is confusing. I'd suggest GetUnspentOutputs -> GetPreviousOutputs and spent_outputs -> previous_outputs to avoid the spent/unspent terminology. What do you think?

Comment on lines +127 to +128
std::vector<CTxOut> spent_outputs{active_coins_tip.GetUnspentOutputs(tx)};
txdata.Init(tx, std::move(spent_outputs));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way, it's a bit silly to add a parameter to the CheckInputScripts signature to then just move it into another parameter (txdata). An alternative approach could be to let the caller handle this, and assert txdata.m_spent_outputs_ready, but that could raise runtime assertion errors instead of catching them at compile time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @TheCharlatan noted here, delaying txdata.Init could be on purpose for the potential cache hit here. I wonder if there is a cleaner version for achieving the same goal though.

Copy link
Contributor

@stringintech stringintech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.
I had a few more comments (sorry for the scattered feedback; didn't have the chance to review everything earlier).

@stickies-v
Copy link
Contributor

PSA: Bitcoin Core PR Review Club will cover this PR in its next meeting at 2025-05-14 at 17:00 UTC. See https://bitcoincore.reviews/32317 for notes, questions, and instructions on how to join.

@sedited
Copy link
Contributor Author

sedited commented May 15, 2025

Thank you for the reviews @stringintech and @stickies-v!

16a695f -> e8ac6d2 (spendblock_1 -> spendblock_2, compare)

  • Addressed @stickies-v's comments 1, 2, 3, 4, 5 improving docstrings for ConnectBlock and SpendBlock.
  • Addressedd @stickies-v's comment, removed block_hash argument from ConnectBlock and SpendBlock methods.
  • Addressed @stickies-v's comment, added CoinRef concept.
  • Addressed @stickies-v's comment, removed unneeded helper, use direct initialization to solve msvc compilation error.
  • Addressed @stickies-v's comment, add assert for blockundo.vtxundo.size() to rule out UB.
    *Addressed @stringintech's comment, pass in empty span to GetTransactionSigOpCost.
  • Addressed @stringintech's comment, update documentation for the removal of the inputs availability check in CheckTxInputs.
  • Addressed @stringintech's comment, removed line in commit message mentioning execution of the coinbase amount check even when another check has failed previously.

@purpleKarrot
Copy link
Contributor

A function that takes a const std::span<T> coins argument cannot modify coins, but it has mutable access to coins[x].

@sedited
Copy link
Contributor Author

sedited commented Nov 12, 2025

Rebased 5988c71 -> 6493e47 (spendblock_14 -> spendblock_15, compare)

Move the responsibility of retrieving coins from GetP2SHSigOpCount to
its caller.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.

Define two explicit template specializations for both a span of
references to coins and a span of coins. This allows using it for both
Coin entries referenced from the CCoinsViewCache, and from contiguous
memory, like the vector in CBlockUndo.
Move the responsibility of retrieving coins from GetTransactionSigOpCost
to its caller.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.

Define two explicit template specializations for both a span of
references to coins and a span of coins. This allows using it for both
Coin entries referenced from the CCoinsViewCache, and from contiguous
memory, like the vector in CBlockUndo.
Move the responsibility of retrieving coins from CheckTxInputs
to its caller. The missing inputs check will be moved in an upcoming
commit.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.

Define two explicit template specializations for both a span of
references to coins and a span of coins. This allows using it for both
Coin entries referenced from the CCoinsViewCache, and from contiguous
memory, like the vector in CBlockUndo.
…InputScripts

Move the responsibility of retrieving coins from the CCoinsViewCache
in CheckInputScripts to its caller.

Add a helper method in CCoinsViewCache to collect all Coin's outputs
spent by a transaction's inputs.

Callers of CCoinsViewCache are updated to either pre-fetch the spent
outputs, or pass in an empty vector if the precomputed transaction data
has already been initialized with the required outputs.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.
Move the BIP30 checks from ConnectBlock to a new SpendBlock method.
This is a move-only change, more content to SpendBlock is added in the
next commits. The goal is to move logic requiring access to the
CCoinsViewCache out of ConnectBlock and to the new SpendBlock method.
SpendBlock will in future handle all UTXO set interactions that
previously took place in ConnectBlock.

Callers of ConnectBlock now also need to call SpendBlock before. This is
enforced in a future commit by adding a CBlockUndo argument that needs
to be passed to both.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.
This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.
Move the remaining UTXO-related operations from ConnectBlock to
SpendBlock. This includes moving the existence check, the UpdateCoins
call, and CBlockUndo generation.

ConnectBlock now takes a pre-populated CBlockUndo as an argument and no
longer accesses or manipulates the UTXO set.
@sedited
Copy link
Contributor Author

sedited commented Dec 3, 2025

@sedited sedited moved this from API Development to Client Flexibility in The libbitcoinkernel Project Dec 14, 2025
Copy link
Member

@ismaelsadeeq ismaelsadeeq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

ACK first four commits, I think the remaining commits can even be made better.
comments are inline


BOOST_CHECK(!::AreInputsStandard(CTransaction(txToNonStd1), coins));
BOOST_CHECK_EQUAL(GetP2SHSigOpCount(CTransaction(txToNonStd1), coins), 16U);
auto coinsTxToNonStd1{coins.AccessCoins(CTransaction(txToNonStd1))};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "consensus: Use Coin span in GetP2SHSigOpCount" 5709d07

nit: here and other places, these new variables should be snake case?

* @param[in] tx Transaction for which we are computing the cost
* @param[in] inputs Map of previous transactions that have outputs we're spending
* @param[in] tx Transaction for which we are computing the cost
* @param[in] coins Sorted span of Coins containing previous transaction outputs we're spending
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "consensus: Use Coin span in GetP2SHSigOpCount" 5709d07

Here and other commits.
Sorted in what order?

Comment on lines 752 to 764
/**
* Spends the coins associated with a block. First checks that the block
* contains no duplicate transactions (BIP30), then for each transaction in
* the block checks that the outputs it is spending exist, spends them, and
* populates the CBlockUndo data structure.
*
* @param[in] block The block to be spent
* @param[in] pindex Points to the block map entry associated with block
* @param[in, out] view Its coins are spent and used to populate CBlockUndo during its execution
* @param[out] state This may be set to an Error state if any error occurred processing them
* @param[out] blockundo Coins consumed by the block are added to it.
*/
bool SpendBlock(const CBlock& block, const CBlockIndex* pindex,
Copy link
Member

@ismaelsadeeq ismaelsadeeq Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are doing this, I think this should also be separated into two functions.

  1. EnforceBIP30 — a method that performs BIP30 validation.
  2. SpendBlock — for each transaction in the block, checks that the outputs it is spending exist, spends them, and populates the CBlockUndo data structure.

It would also make testing easier, and removing BIP30 after a consensus cleanup soft fork would be straightforward.

What I will want to see is we have just SpendBlock and non-contextual block validation using the block data and undo data in a series of consensus rules validation functions.

Comment on lines +2819 to +2832
assert(pindex);
Assume(pindex->GetBlockHash() == block.GetHash());
auto block_hash{pindex->GetBlockHash()};

const CChainParams& params{m_chainman.GetParams()};
// Special case for the genesis block, skipping connection of its transactions
// (its coinbase is unspendable)
if (block_hash == params.GetConsensus().hashGenesisBlock) {
return true;
}

// verify that the view's current state corresponds to the previous block
uint256 hashPrevBlock = pindex->pprev == nullptr ? uint256() : pindex->pprev->GetBlockHash();
assert(hashPrevBlock == view.GetBestBlock());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "validation: Add SpendBlock function" c937b12
why duplicate this, I think we should just add a comment warning that we expect they are already executed before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean here. Afaict this is moved over from ConnectBlock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather keep these duplicated for now. These are short-circuits to our logic, similar to how the coinbase checks work in various places. Based on that I'd rather keep them in place. This could also easily be improved in a follow-up.

!warning_messages.empty() ? strprintf(" warning='%s'", warning_messages) : "");
}

bool Chainstate::SpendBlock(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "validation: Add SpendBlock function" c937b12

The goal is to move logic requiring access to the
CCoinsViewCache out of ConnectBlock and to the new SpendBlock method.
SpendBlock will in future handle all UTXO set interactions that
previously took place in ConnectBlock.

This is counter to the objective of the PR; we are still accessing the UTXO set in this block validation function.
My hope was that we would have two UTXO-accessing validation functions that 1 populate the block undo data and 2. BIP30 separately, while the remaining validation functions would simply get passed the block undo data and operate as stateless validation functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important that the BIP30 validation is done before we spend the coins. The current split makes sense to me for this reason. Can you elaborate a bit more what you think might not be optimal here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure if this is a good thing. I don't see why you would ever spend the coins in the utxo set without also doing the bip30 checks beforehand. To this end, I did implement it, but added a tag type that gets passed from BIP30Validate to SpendBlock: https://github.com/sedited/bitcoin/tree/spendblock_17 . Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve reviewed the branch, and it looks good overall. My only concern is the last commit that adds the tag. Initially, I thought it should be the caller’s responsibility to decide whether bip30validate is called, rather than mandating it here.

My motivation for separating SpendBlock from BIP30Validate is discussed in:
stratum-mining/sv2-apps#120

The branch without the last commit would make it easier to implement a variant of the idea described in stratum-mining/sv2-apps#120, where block validation for this JDS is as follows call SpendBlock and then ConnectBlock skip BIP30. However, the idea of skipping consensus checks is not acceptable.

Given that, I’m inclined to keep this as-is.
For stratum-mining/sv2-apps#120, we can instead expose IPC methods: SpendBlock, which takes a block and returns the coins, and ValidateBlock, which performs the full block validation checks. Since blocks from miners tend to share many transactions, they can still benefit from this approach, because the JDS needs to only call SpendBlock to validate newly seen transactions.

For Utreexo client, they provide their own undo data and cannot perform BIP30 validation anyway, so they benefit from this approach as well.

Overall, this is still an improvement over the current status quo.

Copy link
Member

@ismaelsadeeq ismaelsadeeq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK a0e9578

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Client Flexibility

Development

Successfully merging this pull request may close these issues.