@furszy furszy commented Jan 25, 2023

The current procedure for building the block filter index processes filters one at a time,
reading blocks, undo data, and previous headers from disk sequentially.

This PR introduces a mechanism to perform the work concurrently, dividing the filter
generation workload among a user-configurable pool of workers and significantly
speeding up index construction.

The same concurrent processing model has been applied to the transactions index as well.

The newly introduced init flag -indexworkers=<n> enables the concurrent sync
behavior, where "n" is the number of worker threads spawned at startup to create ranges
of block filters during the initial sync process. The worker pool is destroyed once the
initial sync completes.
Note: by default, the parallelized sync process is not enabled.
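
For example, to enable the block filter index with four worker threads:

./build/bin/bitcoind -blockfilterindex -indexworkers=4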

Now the juicy part:
On my machine, with the node built in debug mode and in IBD, with -indexworkers=4, the
block filter index generation took less than an hour, while on master the sync took more than 7 hours.

Important Note:
As access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.

Testing Notes:

  1. Sync your node without any index.

  2. Restart the node with one of the indexes (-blockfilterindex or -txindex) and -connect=0 (this syncs only the index, without running the net/validation threads; since threads won't be competing for cs_main, it gives a more accurate result).

    You’ll see a "[index name] is enabled at height [height]" log entry once it finishes. Then it’s just a matter of subtracting the index startup time from the "index synced" log time.

Keep in mind that threads are shared among all indexes you start. So if you run both indexes at the same time, your benchmark results cannot be compared against single-index runs.

Fuzz Test Coverage Report

Coverage diff for the fuzz test can be found on https://brunoerg.xyz/bitcoin-core-coverage/26966/coverage_report/

DrahtBot commented Jan 25, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/26966.

Reviews

See the guideline for information on the review process.

Concept ACK: w0xlt, TheCharlatan, ismaelsadeeq, mzumsande
Approach ACK: ryanofsky
Approach NACK: l0rinc
Stale ACK: pinheadmz

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33689 (http: replace WorkQueue and single threads handling for ThreadPool by furszy)
  • #31308 (ci, iwyu: Treat warnings as errors for specific directories by hebasto)
  • #31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)
  • #29770 (index: Check all necessary block data is available before starting to sync by fjahr)
  • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)
  • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
  • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
  • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
  • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • a posterior -> later [“a posterior” is incorrect here; use “later” (or “posterior”) to convey a subsequent check]


Sjors commented Jan 26, 2023

Cool, will take it for a spin...

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from d8b0f5e to 65dc850 on January 28, 2023, 15:04
furszy commented Jan 28, 2023

Cool @Sjors, just pushed a small update. Found a little bug.

Going to add an important note to the PR description (because otherwise testing results will vary a lot):

> As access to the block data on disk is protected by cs_main, this new feature runs substantially
> faster when the node is not in IBD.
> (Here "substantially" means a full index sync, with 5 workers, in less than 20 minutes on my machine.)

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 65dc850 to 09cc56a on January 28, 2023, 15:40
@mzumsande mzumsande left a comment


It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well? At first glance, txindex and the indices suggested in PRs (#24539, #26951) seem to be parallelizable too (not coinstatsindex, though).

furszy commented Jan 30, 2023

> It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well?

Yeah, that is part of the plan. I started with the block filter index because it requires special treatment that txindex parallelization does not: block/undo data reading and block filter creation can be parallelized, but writing must be done sequentially due to the need to link filter headers to their predecessors to build the filter headers-chain on disk.
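
To illustrate that constraint, here is a minimal sketch; ComputeFilter, DeriveHeader, and WriteToDisk are hypothetical stand-ins, not the PR's API. Filter computation fans out across threads, while headers are derived and written strictly in height order:

```cpp
#include <future>
#include <vector>

struct Filter { std::vector<unsigned char> data; };
struct FilterHeader { unsigned char hash[32]; };

// Hypothetical helpers standing in for the real block filter code:
Filter ComputeFilter(int height);                                     // parallel-safe
FilterHeader DeriveHeader(const Filter& f, const FilterHeader& prev); // commits to prev
void WriteToDisk(int height, const Filter& f, const FilterHeader& h);

void SyncRange(int start, int end, FilterHeader prev_header)
{
    // Phase 1: compute filters concurrently; each block is independent.
    std::vector<std::future<Filter>> jobs;
    for (int h = start; h <= end; ++h) {
        jobs.push_back(std::async(std::launch::async, ComputeFilter, h));
    }
    // Phase 2: consume results strictly in height order, since each filter
    // header links to its predecessor, forming the headers-chain on disk.
    for (int h = start; h <= end; ++h) {
        const Filter f = jobs[h - start].get();
        prev_header = DeriveHeader(f, prev_header);
        WriteToDisk(h, f, prev_header);
    }
}
```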

My idea was to get this one reviewed first, so the process gets as clean as possible, and then move forward with the generalization step. It's usually more natural to abstract processes when the specific cases are well-defined.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 09cc56a to 43e6237 on January 30, 2023, 16:15
@w0xlt w0xlt left a comment


Concept ACK

Perhaps the parallelization and index logic could be kept separate, so the parallelization could be reused by other indexes.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch 2 times, most recently from b2b62f5 to 4e1fba1 on January 30, 2023, 20:55
Sjors commented Feb 7, 2023

Building the index (AMD Ryzen 7950X, blocks stored on SSD):

  • master @ fe86616: 35'15" (mostly 1 alternating CPU thread)
  • this PR (rebased)
    • n=8: 5'20" (uses about 8 of 32 CPU threads as expected)
    • n=32: 4'26" (pleasantly close to 100% CPU usage with a dip every 10 seconds, but it drops to only 1 CPU in the last minute or two)

I made sure to not load any wallets and disabled other indexes.

I didn't test if the index was correct.

I wonder if, for users without this index, it would be faster to generate the index, rescan the wallet and then delete it again. Combined with #26951 you would only have to generate filters up to the age of the wallet (IIUC, cc @pstratem).

Note to self for future benchmarks: -txindex takes 1:07'14", -coinstatsindex takes 3.5 hours.

furszy commented Feb 7, 2023

Great results @Sjors!

You could also give it a run rebased on top of #27006.
On master, the index initial sync is slower when the node is in IBD because the index thread has to compete for access to block data on disk through cs_main acquisition.

> I didn't test if the index was correct.

The PR contains a test verifying it.


Side note:
I'm working on generalizing the parallelization flow so other indexes, like txindex and #26951, can make use of it too.

furszy commented Oct 14, 2025

> I also measured the txindex with -indexworkers=15 on a 32 vcpu machine and it was slightly slower than master. I guess this makes sense, since each thread needs to write the index data to leveldb and the actual work done by the CPU is minimal (serializing, then computing offsets), so each thread is waiting for the other threads to finish writing. For blockfilter, the filter computation is more intensive, and each filter is written to a separate file. Only after the filter is written is its location written to the shared leveldb.
>
> Edit: Just tried again on latest push f130332 on a node synced to 913866:
>
> ./build/bin/bitcoind -connect=0 -disablewallet -txindex=1 -indexworkers=5
>
> branch (f130332): 90 minutes 41 seconds
> master (becf150): 84 minutes 42 seconds
>
> So master was ~7% faster than this PR with 5 workers on an SSD.
>
> Maybe we don't enable parallel sync for txindex?

I'm still working on the current approach, but I synced the tx index in the same environment as before: bare metal with an SSD, synchronizing Signet up to block height 240593 (following the testing steps in the PR description). Results:

Parallel Mode (5 workers): ~5 minutes.

2025-10-13T19:54:55Z initload thread exit
2025-10-13T19:55:28Z Syncing txindex with block chain from height 149999
2025-10-13T19:55:59Z Syncing txindex with block chain from height 197999
2025-10-13T19:56:32Z Syncing txindex with block chain from height 224999
2025-10-13T19:57:19Z Syncing txindex with block chain from height 225999
2025-10-13T19:58:13Z Syncing txindex with block chain from height 231999
2025-10-13T19:58:57Z Syncing txindex with block chain from height 236999
2025-10-13T19:59:09Z txindex is enabled at height 240593

Sequential Mode: 13 minutes.

2025-10-13T20:00:02Z initload thread exit
2025-10-13T20:00:34Z Syncing txindex with block chain from height 64999
2025-10-13T20:01:04Z Syncing txindex with block chain from height 84999
2025-10-13T20:01:35Z Syncing txindex with block chain from height 130999
2025-10-13T20:02:05Z Syncing txindex with block chain from height 182999
2025-10-13T20:02:36Z Syncing txindex with block chain from height 192999
2025-10-13T20:03:16Z Syncing txindex with block chain from height 200999
2025-10-13T20:03:47Z Syncing txindex with block chain from height 212999
2025-10-13T20:04:20Z Syncing txindex with block chain from height 223999
2025-10-13T20:05:21Z Syncing txindex with block chain from height 226999
2025-10-13T20:05:54Z Syncing txindex with block chain from height 227999
2025-10-13T20:06:29Z Syncing txindex with block chain from height 228999
2025-10-13T20:06:59Z Syncing txindex with block chain from height 229999
2025-10-13T20:07:39Z Syncing txindex with block chain from height 230999
2025-10-13T20:08:39Z Syncing txindex with block chain from height 231999
2025-10-13T20:09:22Z Syncing txindex with block chain from height 232999
2025-10-13T20:10:09Z Syncing txindex with block chain from height 233999
2025-10-13T20:10:48Z Syncing txindex with block chain from height 234999
2025-10-13T20:11:38Z Syncing txindex with block chain from height 236999
2025-10-13T20:12:14Z Syncing txindex with block chain from height 237999
2025-10-13T20:13:02Z Syncing txindex with block chain from height 239999
2025-10-13T20:13:06Z txindex is enabled at height 240593

———

Furthermore, though I haven't pushed it yet, these results are from the batching DB writes branch:

Parallel Mode (5 workers), batching DB writes: ~2 minutes 30 seconds

2025-10-13T20:30:54Z initload thread exit
2025-10-13T20:31:25Z Syncing txindex with block chain from height 181999
2025-10-13T20:31:56Z Syncing txindex with block chain from height 217999
2025-10-13T20:32:29Z Syncing txindex with block chain from height 225999
2025-10-13T20:33:14Z Syncing txindex with block chain from height 230999
2025-10-13T20:33:50Z Syncing txindex with block chain from height 236999
2025-10-13T20:33:54Z txindex is enabled at height 240593
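
For reference, "batching DB writes" here means accumulating entries for many blocks and committing them in a single write instead of one write per block. A generic sketch using the raw leveldb API (Bitcoin Core wraps leveldb behind its own classes, so this is illustrative only):

```cpp
#include <string>
#include <utility>
#include <vector>

#include <leveldb/db.h>
#include <leveldb/write_batch.h>

// Commit index entries for a whole range of blocks with one disk write,
// instead of one leveldb write per block.
leveldb::Status FlushEntries(leveldb::DB& db,
                             const std::vector<std::pair<std::string, std::string>>& entries)
{
    leveldb::WriteBatch batch;
    for (const auto& [key, value] : entries) {
        batch.Put(key, value); // buffered in memory, nothing hits disk yet
    }
    return db.Write(leveldb::WriteOptions(), &batch); // single atomic write
}
```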

So I’d guess that it’s not only related to disk access, but the difference might also be tied to your CPU virtualization technology? @andrewtoth.

Moreover, I mentioned this somewhere earlier in this PR, but it probably got lost over the years: this PR also prepares the ground for batching DB writes across blocks, which will improve sequential runs as well and benefit every user.

Also, parallelization is disabled by default. I don’t think it’s worth removing it for the tx index just because some users might not be able to take advantage of it. It seems to me that the benefits for those who can run it greatly outweigh the limitations for those who can’t (they’re not losing anything with this PR, since sequential mode is still available).

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch 2 times, most recently from fcdd89b to f7cf59e on October 14, 2025, 20:10
l0rinc commented Oct 14, 2025

what's the reason for the frequent pushes here recently?

furszy commented Oct 14, 2025

> what's the reason for the frequent pushes here recently?

I wrote it above, #26966 (comment). I'm working on a few changes that improve parallelism across indexes and the structure of the code; they work properly locally but are not stable on the CI yet. I will update the PR with the modifications once finished.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch 2 times, most recently from 3dede1f to 831dec8 on October 15, 2025, 16:58
furszy commented Oct 15, 2025

Way forward

Given that this PR has struggled to attract reviewers over the past two years, I think we need a different approach.
This is a risky change that touches critical index infrastructure - let's do it in tiny, focused steps instead.

I'd like to propose keeping this PR open as a draft tracking PR that contains the full end-to-end implementation for reference and discussion. Meanwhile, we can merge the work through a series of smaller, focused PRs:

  • Start with a minimal implementation: single index (blockfilterindex), hardcoded 2 threads, minimal configurability
  • Get that reviewed, tested, and merged - let's see what users think
  • Then add the thread pool infrastructure in a separate PR (if still needed - we might find simpler per-index solutions work better)
  • Then add txindex and coinstatsindex (and eventually chainstate) support once we have confidence in the approach (and fix the current implementation issues).

For the "hardcoded 2 threads" step, I wrote this above but adding 2 threads or n threads represents the exact same work. The code complexity comes from the two-step procedure that allows sequential dumps after parallel block data construction (a restriction that comes from the filters headers-chain structure; headers are linked to each other). It is not related to the number of threads introduced, that's is simply controlled by the thread pool introduced in the first commit. That line simply doesn't make sense to me.

Furthermore, the description of this as "critical index infrastructure" is not entirely accurate. We are not dealing with consensus-related code here - this code lives on a different level. Existing tests already verify both its behavior and outcomes, and the previous sequential sync behavior is retained. This PR introduces a new optional feature which, in the worst-case scenario, can run slower than sequential sync if it is badly configured by the user (something that could also happen with other settings, such as a high number of block signature verification threads).

Also, about "Given that this PR has struggled to attract reviewers over the past two years":
I don't think that is fair to say at the moment. This PR has gotten quite nice reviews lately and has been ramping up nicely. That's mainly why I ended up modifying part of it and simplifying some workflows (an update will come in the next comment).
I think the problem in past years was mainly the lack of urgency for this new feature, plus the requirement of understanding not only what indexes are and how they are composed, but also the underlying structure of the block filters chain.
It is just a matter of coordination between people who have worked on this code before and anyone else who wants to spend time understanding it deeply. We are happily starting to get the ball rolling in that direction now.

furszy commented Oct 15, 2025

Now that the CI is happy again, an update regarding the introduced changes:

Before, we had queues at two different levels: one at the thread pool level and another at the index level. The initial rationale behind it was to granularly control the way the process interrupts them when needed, not draining the index's pending_tasks queue as soon as the first task fails to process or a shutdown is triggered. But, after talking with mzumsande last week, I realized that this was actually over-engineering for not much gain, and that it was harming parallelism across indexes: top-level index queues were racing with each other to take control of the underlying worker threads.
By unifying these two queues, we let the thread pool alone decide how and when to handle the tasks, with no index interference at all. This allows for much better parallelism among any piece of code using the same workers (not just indexes), and also for further improvements down the road, like replacing a single spot with a lock-free structure, or even prioritizing certain tasks at the worker level - this last one was an idea from l0rinc two weeks ago.
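
A minimal sketch of that unified single-queue design, assuming only standard C++ primitives; it is illustrative, not the PR's actual ThreadPool class:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SimpleThreadPool
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::function<void()>> m_tasks; // the single, shared queue
    std::vector<std::thread> m_workers;
    bool m_stop{false};

public:
    explicit SimpleThreadPool(int worker_count)
    {
        for (int i = 0; i < worker_count; ++i) {
            m_workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock lock{m_mutex};
                        m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                        if (m_stop && m_tasks.empty()) return;
                        task = std::move(m_tasks.front());
                        m_tasks.pop();
                    }
                    task(); // run outside the lock so other workers keep pulling
                }
            });
        }
    }

    // Any client (any index, or non-index code) submits into the same queue;
    // the pool alone decides when and where tasks run.
    void Submit(std::function<void()> task)
    {
        {
            std::lock_guard lock{m_mutex};
            m_tasks.push(std::move(task));
        }
        m_cv.notify_one();
    }

    ~SimpleThreadPool()
    {
        {
            std::lock_guard lock{m_mutex};
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& w : m_workers) w.join();
    }
};
```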

So, in short, the code should be cleaner and run faster now, using fewer mutexes. Also, there is a possible follow-up improvement: moving the final post-processing round into the last task to finish (in case more than one task finishes around the end of sync). I didn't do it here because it comes with extra code complexity that I don't think is worth adding to this PR; for the same reason, I don't think parallelizing the coinstats index here is a good idea, nor batching DB writes. Better to go in steps, as the gains introduced here are already a pretty solid step forward.

furszy commented Oct 21, 2025

Small update: per discussion, I will push the thread pool in an isolated PR in the coming week, replacing the current http server WorkQueue and all its related code. It is an easy dedup that adds test coverage to an area that has never been unit- or fuzz-tested, plus a way to start using this class in the repo so that others can build on it.

Also, I'm investigating the tx index slowdown with @andrewtoth, trying to understand the differences between our setups. Worst-case scenario, we can disable parallel sync for it (will update accordingly in the coming week).
That being said, this refactoring enables batching DB writes as well, which will speed up the tx index anyway. But I prefer to do that in a follow-up, to avoid expanding this PR further.

Last note: I reworked the large commit, splitting it into smaller chunks. We can come back to this one after (or while) the http server thread handling PR moves on. But I will focus on moving forward there first, per feedback.

furszy and others added 10 commits October 24, 2025 01:26
No behavior changes.

This separation lays the groundwork for parallelizing the block data
digest phase, while preserving sequential consistency for indexes
that require it before dumping the data to disk.
And it will also allow us to batch database and file writes across
multiple blocks.

Essentially, it introduces a two-phase block processing mechanism using
'CustomProcessBlock()' and 'CustomPostProcessBlocks()':

1) 'CustomProcessBlock()' will be in charge of digesting block data and
   returning a custom result object per index.
   The goal is to encapsulate all procedures that can safely run
   concurrently here.

2) 'CustomPostProcessBlocks()' receives all digested objects in order,
   allowing the index to perform any subsequent steps that require sequentiality.
   For example, the block filter index requires sequentiality in the headers-chain
   construction, as each record depends on its predecessor hash.

Important note: at present, the code executes entirely synchronously.
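
(An aside, not part of the commit message: a rough sketch of the two-phase interface shape described above. The concrete signatures in the PR may differ; the result type and return values here are assumptions.)

```cpp
#include <vector>

class CBlock; // block data, defined elsewhere in the codebase

class BaseIndexSketch
{
protected:
    struct BlockResult { /* index-specific digest of one block */ };

    // Phase 1: digest one block into a result object. Everything that can
    // safely run concurrently is encapsulated here.
    virtual BlockResult CustomProcessBlock(const CBlock& block) = 0;

    // Phase 2: receives the digests in block order, so steps that require
    // sequentiality (e.g. chaining filter headers, where each record depends
    // on its predecessor's hash) happen here before data is dumped to disk.
    virtual bool CustomPostProcessBlocks(std::vector<BlockResult>& results) = 0;
};
```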
No behavior changes. The index still processes one block at a time.
The code was refactored to allow handling multiple blocks in the
processing phase.

Introducing the concept of a "Task", which merely represents a range
of blocks and their resulting digests.

In the next commit, we will take advantage of this to process batches
of blocks concurrently.
Move progress-related timers (logging and locator write) into a shared
SyncContext struct with atomic members. This does not change behavior
but prepares the code for safe multi-threaded execution in the next
commit.
No behavior changes.

This refactor moves the existing processing logic from BaseIndex::Sync
into BaseIndex::RunTask, simplifying the main sync loop and laying
groundwork for parallelization of block processing in the upcoming commit.
The goal is to have BaseIndex::RunTask running concurrently.
And add an option to customize the thread pool worker count
This introduces parallel sync for the indexes' initial sync,
distributing the workload across multiple threads.

When enabled, the chain is divided into fixed-size block ranges called
"tasks". Worker threads consume tasks concurrently, calling
CustomProcessBlock over their assigned range and storing results in a
shared context, while opportunistically batching and dumping the
collected information to disk sequentially (when needed).

Since large reorgs are improbable during initial sync (headers-chain PoW
dictates the valid chain), reorgs are detected only before syncing begins
and once it completes. Any new blocks connected during the process are
caught up sequentially at the end. This, together with the fact that we
no longer depend on an intermediate "index best block" that might be out
of sync with the index m_best_block_index, allows us to remove the
index_reorg_crash test, as it is no longer possible to call Rewind on a
block that is not the index tip.
It also adds coverage for initial sync from a particular block,
mimicking a node restart.
Co-authored-by: furszy <matiasfurszyfer@protonmail.com>
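
(A hypothetical illustration of the fixed-size "tasks" described in the commit notes above; the PR's actual task size and bookkeeping may differ.)

```cpp
#include <algorithm>
#include <vector>

struct Task {
    int start_height;
    int end_height; // inclusive
};

std::vector<Task> MakeTasks(int from, int tip, int task_size)
{
    std::vector<Task> tasks;
    for (int h = from; h <= tip; h += task_size) {
        // Each worker later consumes one Task, calling CustomProcessBlock
        // for every block in [start_height, end_height].
        tasks.push_back({h, std::min(h + task_size - 1, tip)});
    }
    return tasks;
}
```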
@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from c38a5db to 0ad89bf on October 24, 2025, 05:34
@DrahtBot
🐙 This pull request conflicts with the target branch and needs rebase.
