[RFC] CChain Concurrency Improvement (Base + Tail Architecture)#34424
[RFC] CChain Concurrency Improvement (Base + Tail Architecture)#34424alexanderwiederin wants to merge 5 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34424. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2026-01-30 18:32:27 |
94e79ea to
a26ce37
Compare
ismaelsadeeq
left a comment
There was a problem hiding this comment.
Concept ACK on this.
Looks simpler than I thought.
A few comments after a quick skim through the PR
| @@ -0,0 +1,260 @@ | |||
| // Copyright (c) 2013 Adobe | |||
| // Copyright (c) 2025-present The Bitcoin Core developers | |||
There was a problem hiding this comment.
In "util: add copy-on-write smart pointer" 490324d
Is this a direct copy? Or has modifications, perhaps add a link in the commit message to the reference implementation.
I also think we should have a basic unit test for this utility.
There was a problem hiding this comment.
It's practically a direct copy. I removed the comments, documentation and deprecated fields and methods to keep it minimal. I also left a link to https://github.com/stlab/stlab-copy-on-write on the file header and just added Original source: https://github.com/stlab/stlab-copy-on-write/tree/abb4445 to the commit message.
Agree on the unit tests for this!
src/validation.h
Outdated
| //! @{ | ||
| Chainstate& ActiveChainstate() const; | ||
| CChain& ActiveChain() const EXCLUSIVE_LOCKS_REQUIRED(GetMutex()) { return ActiveChainstate().m_chain; } | ||
| CChain ActiveChainSnapshot() const { |
There was a problem hiding this comment.
In "validation: add ActiveChainSnapshot() method" 6b4db96
It reduces lock contention, indeed, but one downside is that it can be inconvenient for someone who just wants to know, e.g., the chain tip to get height or hash in a concurrent way. They have to copy the whole chain.
We should also support copying the chain tip, and returning the latest block index in the chain, not copying the whole chain.
There was a problem hiding this comment.
Of course, they can use the old ActiveChain, lock and make a copy, but in my opinion, we should strive to make the locking of cs_main internal to ChainstateManager and simply return copies of these values. Future code should only handle locks directly in performance-critical paths where making a copy is too expensive, or in synchronous contexts where the caller already holds the lock.
There was a problem hiding this comment.
For this RFC we chose to do a limited feature set. I agree that the end goal is something like you are describing.
There was a problem hiding this comment.
@ismaelsadeeq let me think about the best approach here. Thanks for the feedback.
There was a problem hiding this comment.
@ismaelsadeeq In the last push I refactored ActiveTip and ActiveHeight to use the snapshots and deduce the values from there. To reiterate, taking a copy of the chain is cheap due to copy-on-write - the underlying data is reference-counted. ActiveTip and ActiveHeight no longer require holding cs_main.
I've also removed the write_mutex on the CChain class in my last push to lean more into value semantics. The idea is that we treat the chain similar to how we would treat a std::vector. It's not internally thread-safe, and we expect the owner to synchronise.
I've also added documentation to CChain describing the performance characteristics and copy-on-write behaviour.
In the design section you describe the initial state with a single block in the tail, but a merge operation empties it. It seems nicer for those to be consistent?
I guess you could go an order of magnitude lower and match the coinbase maturity, if we need to rewrite the chain this deep we have bigger problems than a large amount of copies? |
Introduce copy-on-write wrapper from stlab library to enable efficient value semantics with shared ownership. Uses atomic reference counting to share instances and only copies on modification when non-unique. Based on Adobe's stlab library, distributed under Boost Software License 1.0 Original source: https://github.com/stlab/stlab-copy-on-write/tree/abb4445 Co-authored-by: janb84 <608446+janb84@users.noreply.github.com>
a26ce37 to
bbbed41
Compare
|
Appreciate the feedback @darosior!
I think the trade-off for Assuming the tail is copied on every The key insight is that
So far, the change has only resulted in a negligible IBD performance reduction, but I think we should pay attention to this parameter. |
bbbed41 to
84beaa9
Compare
|
Is there actually a need for stable If not, it seems a simpler alternative might be to just give |
Refactor CChain into a regular type with copy-on-write semantics, using a split base/tail architecture to enable snapshots. Key changes: - Make CChain a regular type supporting copy construction/assignment - Introduce COW Impl struct holding base (COW vector) and tail (mutable vector) - Optimize sequential appends by accumulating in tail until MAX_TAIL_SIZE - Merge tail into base only when size threshold is reached - Handle reorgs by rebuilding base and clearing tail - Update all methods (FindFork, FindEarliestAtLeast, etx.) for split storage This allows creating chain snapshots via simple copy while sharing the bulk of the data. Copies only occur on modification. Co-authored-by: janb84 <608446+janb84@users.noreply.github.com>
Add ActiveChainSnapshot() to ChainstateManager to enable lock-free access to the active chain. Unlike ActiveChain() which requires holding cs_main, this methods acquires the lock internally and returns a copy-on-write snapshot of the chain. Refactor ActiveTip() and ActiveHeight() to use ActiveChainSnapshot() instead of ActiveChain(), removing the EXCLUSIVE_LOCKS_REQUIRED annotation. This allows these methods to be called without holding cs_main, as they now manage locking internally. This reduces lock contention by allowing RPC methods and other consumers to work with a consistent view of the chain without holding cs_main for extended periods. Co-authored-by: janb84 <608446+janb84@users.noreply.github.com>
Replace cs_main-locked chain access with ActiveChainSnapshot() in getnetworkhashps to reduce lock contention.
Replace cs_main-locked chain access with ActiveChainSnapshot(), ActiveTip() and ActiveHeight() in blockchain RPC methods to reduce lock contention. These methods acquire cs_main internally and provide a consistent view of the chain without requiring the caller to hold cs_main throughout. For RPCs that also need BlockIndex lookups, cs_main is still acquired but only for the BlockIndex access, not for chain queries. Updated RPCs: - getblock, getblockheader: ActiveTip() for tip, cs_main for block lookup - getchaintips: ActiveChainSnapshot() for active chain, cs_main for BlockIndex iteration - getblockcount, getbestblockhash: ActiveHeight() only no cs_main needed - geblockhash, getdifficulty: ActiveChainSnapshot() via ParseHashOrHeight - pruneblockchain: ActiveHeight() for height calculation, cs_main for pruning - getchaintxstats: ActiveChainSnapshot() for tip and Contains check
84beaa9 to
711b3f4
Compare
Based on anecdotal experience, I believe yes. From what I understand this could be interesting for: a) RPC users in the context of mining and b) kernel API consumers for use cases like indexing, analytics and monitoring. But I acknowledge this needs a more concrete answer. In the meantime, @sedited and/or @Sjors, do you want to chime in? |
|
I believe that due to the fact that
I haven't tried this, or dug into what callers would need to change for this, so feel free to find counterexamples. But absent that, my thinking at a high level is that the |
There are a number of rpc calls where a stable I see this change as a first step towards moving some other data structures into a similar pattern and cleaning up our locking annotations for the chain and block index data structures (for example making the pprev pointer const). What might be less compelling for developers in this codebase is that the approach here arguably provides an API to the chain that is fairly easy to reason about for external, or rather kernel library, users. |
|
I certainly have no objections to improving locking annotations, and making locking cleaner. But in this case, I'm really not convinced: if you need lock-free access, just don't use Specifically for |
Mmh, going by its prevalence in the current code base it seems like developers would rather have random O(1) access than forego synchronization :P To provide a little more background, this topic was repeatedly brought up during review of the kernel api, where it was repeatedly requested to provide optimized, indexed, random access to entries in the block map without the need for synchronization. If there is no greater appetite for that here, then I guess that should be abandoned. If developers really require repeated, fast, random height queries, they can also build such an index themselves. I do agree though that in most cases in our code a single query to the tip (that could be protected by its own lock as per your suggestion), and then a lookup to |
Maybe, or that place in the code held a lock already, so there was no reason not to use Maybe we should investigate what would be required to drop |
|
There do seem to be a lot of places where Still, maybe it's possible to add an internal mutex to |
|
Your suggestions make sense - since I'll reconsider the approach with this in mind - thanks for taking the time! |
RFC: CChain Concurrency Improvement
This PR implements a copy-on-write based CChain design to enable concurrent access.
Motivation
Current CChain uses a single vector protected by
cs_main, creating bottlenecks for multi-threaded access. This PR lays the groundwork for removingcs_mainlocks by enabling consistent snapshots without holding locks.Design
See full design document: CChain Concurrency (Out of date after mutex removal - will be updated soon)
TL;DR: Split chain into base (bulk of history) + tail (recent ~1,000 blocks). Use nested
stlab::copy_on_writeso updates copy only tail, keeping base shared.Thanks to @purpleKarrot for pointing me to
stlab::copy_on_writeand making design suggestions.Changes
cs_mainlocks from RPC methods (where possible)cs_mainstill held during validation)Request for Comments
MAX_TAIL_SIZE = 1000reasonable?