
index: batch db writes during initial sync #34489

Open
furszy wants to merge 10 commits into bitcoin:master from furszy:2026_index_batch_processing

Conversation

@furszy
Member

@furszy furszy commented Feb 3, 2026

Decouples part of #26966.

Right now, index initial sync writes to disk for every block, which is far from optimal on HDDs. This change batches those writes, so disk writes are less frequent during initial sync. It also cuts down cs_main contention by reducing the number of NextBlockSync calls (instead of calling it for every block, we call it once per block window), making the node more responsive (IBD and validation) while indexes sync up. On top of that, it lays the groundwork for the bigger speedups, since part of the parallelization pre-work is already in place.

Just as a small summary:

  • Batch DB writes instead of flushing per block, which improves sync time on HDD due to the reduced number of disk write operations.
  • Reduce cs_main lock contention, which improves the overall node responsiveness (and primarily IBD) while the indexes threads are syncing.
  • Lays the groundwork for the real speedups, since part of the parallelization pre-work from #26966 (index: initial sync speedup, parallelize process) is introduced here as well.
  • Reduces the number of generated LDB files. See l0rinc's comment.
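As a rough illustration of the first bullet, here is a minimal, self-contained sketch of the batching idea. All names here (FakeDb, SyncBlocks, WINDOW_SIZE) are hypothetical stand-ins, not the actual Bitcoin Core types: the point is only that N per-block writes become roughly N / window disk writes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an index database: counts how many times a
// disk write happens, independent of how many rows each write carries.
struct FakeDb {
    int write_calls{0};
    std::vector<std::string> rows;
    void WriteBatch(const std::vector<std::string>& batch)
    {
        ++write_calls; // one disk write per flush, not per block
        rows.insert(rows.end(), batch.begin(), batch.end());
    }
};

constexpr std::size_t WINDOW_SIZE{500}; // illustrative window size

// Digest num_blocks blocks, flushing once per full window instead of
// once per block; returns the number of disk writes performed.
int SyncBlocks(FakeDb& db, int num_blocks)
{
    std::vector<std::string> batch;
    for (int height = 0; height < num_blocks; ++height) {
        batch.push_back("block-" + std::to_string(height));
        if (batch.size() == WINDOW_SIZE) { // window full: flush in one go
            db.WriteBatch(batch);
            batch.clear();
        }
    }
    if (!batch.empty()) db.WriteBatch(batch); // trailing partial window
    return db.write_calls;
}
```

For example, syncing 1250 blocks with a window of 500 costs three disk writes (500 + 500 + 250) instead of 1250.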

@DrahtBot
Contributor

DrahtBot commented Feb 3, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK arejula27, l0rinc
Stale ACK optout21

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34654 ([RFC] BlockMap and CChain Concurrency Improvement by alexanderwiederin)
  • #32875 (index: handle case where pindex_prev equals chain tip in NextSyncBlock() by HowHsu)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • // attached while m_synced is still false, and it would not be -> // attached while m_synced is still false, and it would not be indexed. [fragmentary comment missing its conclusion; adding "indexed." completes the sentence and makes the intent clear]

  • // No need to handle errors in Commit. If it fails, the error will be already be -> // No need to handle errors in Commit. If it fails, the error will already be [duplicate word "be" makes the sentence ungrammatical; removing the extra "be" restores correct syntax]

Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

  • WaitForHeight(&index, blocking_height + 2, 5s) in src/test/blockfilter_index_tests.cpp


@furszy furszy force-pushed the 2026_index_batch_processing branch from 4734e8d to cf065aa Compare February 3, 2026 04:42
@DrahtBot
Contributor

DrahtBot commented Feb 3, 2026

🚧 At least one of the CI tasks failed.
Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/21616547248/job/62296358632
LLM reason (✨ experimental): IWYU (include-what-you-use) reported a failure in the test script, causing the CI to fail.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Contributor

@fjahr fjahr left a comment

This relates to what you wrote here, right? It would be helpful if you could check whether it solves the LevelDB file issue in coinstatsindex, and I am also curious about your benchmark results because @l0rinc did not find that this was leading to a speedup.

@l0rinc
Contributor

l0rinc commented Feb 3, 2026

Concept ACK, I prefer this over #33306 - and will investigate if this solves that problem once my benchmarking servers free up.

because @l0rinc did not find that this was leading to a speed up.

I experimented with something similar in https://github.com/l0rinc/bitcoin/pull/37/changes, will compare it against this change. I wrote:

indexes: batch index writes: 34188s, 211M, 8 files
It solved the fragmentation, but didn't speed up anything.
I still think this is a better direction than adding manual compactions.

Most likely since writing isn't the bottleneck but MuHash calculations were.

@furszy
Member Author

furszy commented Feb 3, 2026

This relates to what you wrote here, right? It would be helpful if you could check whether it solves the LevelDB file issue in coinstatsindex, and I am also curious about your benchmark results because @l0rinc did not find that this was leading to a speedup.

@fjahr the comment was merely an excuse to decouple the DB writes batching code out of #26966 rather than me having a formed opinion in favor of or against #33306. That's why I didn't mention it in the PR description. Maybe the changes complement each other.

To be crystal clear, I just updated the PR description with further details on why this change is worthwhile on its own, independently of #33306.

To summarize it here, the goals of the changes are:

  • Batch DB writes instead of flushing per block, which will improve sync time on HDD due to the reduced number of IO operations.
  • Reduce cs_main lock contention, which orthogonally improves the overall node responsiveness (and primarily IBD) while the index threads are syncing.
  • Lays the groundwork for the real speedups, since part of the parallelization pre-work from #26966 (index: initial sync speedup, parallelize process) is introduced here as well.

@furszy furszy force-pushed the 2026_index_batch_processing branch 2 times, most recently from dd76491 to a2f8447 Compare February 3, 2026 20:25
@furszy
Member Author

furszy commented Feb 3, 2026

Updated per feedback from @hebasto (thanks!).
Changed the <cinttypes> include to <cstdint> to make IWYU happy.

@hebasto
Member

hebasto commented Feb 3, 2026

Updated per feedback from @hebasto (thanks!). Changed the <cinttypes> include to <cstdint> to make IWYU happy.

I guess this is a bug in IWYU caused by include-what-you-use/include-what-you-use@44480a2.

UPDATE: Fixed in #34498.

@maflcko
Member

maflcko commented Feb 4, 2026

Can you run all unit and functional tests on all commits? Or do they time out?

@furszy furszy force-pushed the 2026_index_batch_processing branch from a2f8447 to 8826900 Compare February 4, 2026 15:17
@furszy
Member Author

furszy commented Feb 4, 2026

Can you run all unit and functional tests on all commits? Or do they time out?

Bad squash, my bad. Thanks for the heads up. Fixed.
Also rebased the branch to pick up the CI changes.

@Jhackman2019

Jhackman2019 commented Feb 8, 2026

Builds clean on ARM64 (Pi 5, Debian Bookworm). All index-related functional tests and blockfilter_index unit tests pass.
:)

@furszy furszy force-pushed the 2026_index_batch_processing branch from 8826900 to 6f66ff7 Compare February 8, 2026 16:55
@optout21
Contributor

optout21 commented Feb 9, 2026

I am confused as to how this batching relates to the batching logic of the original code.
Looking at the pre-change BaseIndex::Sync(), it seems that Commit/WriteBatch occurred only every 30 seconds (or at the chain tip), not after each block. This change does a WriteBatch after every 500 blocks, which in fact may be even more frequent. The original commit-after-every-30-secs is still there, so I don't see how this change results in fewer writes. Maybe I misunderstand the change.

            if (!ProcessBlock(pindex)) return; // error logged internally
            ...
            if (current_time - last_locator_write_time >= SYNC_LOCATOR_WRITE_INTERVAL) {
                SetBestBlockIndex(pindex);
                last_locator_write_time = current_time;
                // No need to handle errors in Commit. See rationale above.
                Commit();
            }

@furszy
Member Author

furszy commented Feb 9, 2026

I am confused as to how this batching relates to the batching logic of the original code. Looking at the pre-change BaseIndex::Sync(), it seems that Commit/WriteBatch occurred only every 30 seconds (or at the chain tip), not after each block. This change does a WriteBatch after every 500 blocks, which in fact may be even more frequent. The original commit-after-every-30-secs is still there, so I don't see how this change results in fewer writes. Maybe I misunderstand the change.

            if (!ProcessBlock(pindex)) return; // error logged internally
            ...
            if (current_time - last_locator_write_time >= SYNC_LOCATOR_WRITE_INTERVAL) {
                SetBestBlockIndex(pindex);
                last_locator_write_time = current_time;
                // No need to handle errors in Commit. See rationale above.
                Commit();
            }

Commit() only persists the locator (i.e. "how far the index has synced") so we can resume from that point after a restart. It does not flush or batch the per-block digested data.

The block data is written inside CustomAppend() which is called from ProcessBlock() per block, and it writes to the db immediately.

We're not replacing the locator update interval nor merging it with anything else here; we're only batching something that previously wasn't batched at all.

Also, using a block window instead of reusing the existing time-based window is intentional. Time-based flushing makes memory usage hard to predict and difficult to test (see #32878), because the amount of buffered data depends on runtime speed, which does not necessarily correlate with available memory (batched data stays in memory until it's flushed). A fixed block window provides predictable memory usage and deterministic behavior, which is overall better and also integrates nicely with multi-worker sync (#26966). I would argue in favor of ditching the time window in the future, but that's not something related to this PR.
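The memory-bounding argument above can be illustrated with a small standalone sketch (PeakBuffered and WINDOW_SIZE are hypothetical names, for illustration only): with a fixed block window, the number of buffered entries never exceeds the window size, no matter how fast blocks are processed, whereas a time-based flush buffers "however many blocks fit in the interval".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t WINDOW_SIZE{500}; // illustrative window size

// Track the peak number of entries buffered in memory while digesting
// num_blocks blocks with a deterministic per-window flush.
std::size_t PeakBuffered(std::size_t num_blocks)
{
    std::size_t buffered{0};
    std::size_t peak{0};
    for (std::size_t i = 0; i < num_blocks; ++i) {
        ++buffered;                                // block digested into the batch
        peak = std::max(peak, buffered);
        if (buffered == WINDOW_SIZE) buffered = 0; // deterministic flush point
    }
    return peak;
}
```

Regardless of whether the loop processes 10 thousand or 10 million blocks, the peak stays at WINDOW_SIZE; a 30-second timer gives no such bound.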

@furszy furszy force-pushed the 2026_index_batch_processing branch from 6f66ff7 to d0f793a Compare February 9, 2026 19:58
@furszy
Member Author

furszy commented Feb 9, 2026

Updated per feedback, plus added test coverage for the introduced changes. Thanks @optout21.

@optout21
Contributor

optout21 commented Feb 9, 2026

Thanks for the explanations, and apologies for asking trivial/irrelevant questions.

Commit() only persists the locator (i.e. "how far the index has synced") so we can resume from that point after a restart.
The block data is written inside CustomAppend() which is called from ProcessBlock() per block, and it writes to the db immediately.
These are two important points, I can see now.

I was puzzled by the fact that there is a loop in Sync(), and another loop is introduced in ProcessBlocks(). But the two complement each other.

Another approach would be to keep the outer loop in Sync() block-by-block, store a CDBBatch variable, process one block, count the blocks in the batch, and flush the batch once the batch size is reached (outside of ProcessBlock). This would be a slightly smaller change (no secondary loop), but -- if I'm correct -- the cs_main lock would still be acquired at every block.

... I would argue in favor of ditching the time window in the future, but.. that's not something related to this PR.
Maybe Commit() could be called with the same granularity as batch writes, but that's out of scope here.

@furszy
Member Author

furszy commented Feb 9, 2026

Another approach would be to keep the outer loop in Sync() block-by-block, store a CDBBatch variable, process one block, count the blocks in the batch, and flush the batch once the batch size is reached (outside of ProcessBlock). This would be a slightly smaller change (no secondary loop), but -- if I'm correct -- the cs_main lock would still be acquired at every block.

Yes. That would still lock cs_main on every block, which harms the overall node responsiveness, and it is also not very helpful for the parallelization goal, which requires certain isolation between batches so worker threads can process them concurrently.

@furszy
Member Author

furszy commented Mar 3, 2026

CI failure is unrelated.

@furszy
Member Author

furszy commented Mar 3, 2026

Why would it be unrelated, when it consistently happens on this pull only?

This PR only touches indexes code and nothing else. CI was failing in interface_ipc.py this morning, and now after the restart fails on p2p_orphan_handling.py. None of these tests have any index enabled.

@maflcko
Member

maflcko commented Mar 3, 2026

None of these tests have any index enabled.

Yeah, I guess this is a good point. It is just odd that the p2p_orphan_handling.py failure happened twice here, but nowhere else so far. The IPC one is tracked in #34711.

Edit: Let's see if the CI fails in #34726 ...

@DrahtBot DrahtBot removed the CI failed label Mar 3, 2026
@furszy
Member Author

furszy commented Mar 4, 2026

Let's see if the CI fails in #34726

Could add some debug logging to see where it's getting stuck. At first glance, it looks like it stalls during block creation.

Also, just to be clear to other reviewers. Since the issue happens there as well, we know for sure it is not related to this PR.

@arejula27

looks like CI passed now

@furszy
Member Author

furszy commented Mar 4, 2026

looks like CI passed now

yes, review can continue.

*
* This index is used to serve BIP 157 net requests.
*/
class BlockFilterIndex final : public BaseIndex
@arejula27 arejula27 Mar 4, 2026

I'm not sure that removing final from BlockFilterIndex only to allow IndexBlockSim to subclass it for testing is the better decision. I understand final was used for a reason, and I don't think modifying the code to fit a test is the best option.

Restoring final makes the compilation fail with this error:

  [ 82%] Building CXX object src/test/CMakeFiles/test_bitcoin.dir/checkqueue_tests.cpp.o
/home/arejula27/workspaces/bitcoin/src/test/blockfilter_index_tests.cpp:440:7: error: cannot derive from ‘final’ base ‘BlockFilterIndex’ in derived type ‘blockfilter_index_tests::IndexBlockSim’
  440 | class IndexBlockSim : public BlockFilterIndex
      |       ^~~~~~~~~~~~~
gmake[2]: *** [src/test/CMakeFiles/test_bitcoin.dir/build.make:348: src/test/CMakeFiles/test_bitcoin.dir/blockfilter_index_tests.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:2217: src/test/CMakeFiles/test_bitcoin.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

Maybe we should derive the test from BaseIndex instead, like IndexReorgCrash.

Member Author

@furszy furszy Mar 4, 2026

Yeah, I went back and forth on this decision for a while before pushing it. The idea was to actually cover BlockFilterIndex methods during reorgs, not just the BaseIndex behavior, even if they aren't strictly required for this PR's changes.

Ideally, we should have both testing scenarios: reorgs during initial sync against BaseIndex, and reorgs against BlockFilterIndex. But doing that would require extracting BuildChainTestingSetup into a separate file, and I didn't want to expand the scope of this PR any further. This test, and even the existing index_reorg_crash (the one that uses IndexReorgCrash), do not belong in blockfilter_index_tests.cpp. They only test against the base class. But well, that is something that already exists in our codebase, not something we are introducing here, so it is better to leave it for a different PR.

Also, the final declaration here (and in all other indexes) seems really unnecessary. I cannot think of anything we gain from it. It only restricts our unit testing capabilities. I would remove it from all indexes.

That being said, most of this is just me ranting about improvements we could make. So I will take your suggestion and use the BaseIndex class here for now. Let's go step by step. Thanks for the feedback.


Ideally, we should have both testing scenarios: reorgs during initial sync against BaseIndex, and reorgs against BlockFilterIndex.

Why? Isn't that redundant?

They only test against the base class. But that is something that already exists in our codebase and is not included here, so it is better to leave it for a different PR.

Also, the final declaration here (and in all other indexes) seems really unnecessary. I cannot think of anything we gain from it. It only restricts our unit testing capabilities. I would remove it from all indexes.

Sounds reasonable to discuss in another PR. I like the idea of removing final if it is not needed, but as you said, it should be consistent across all indexes.

Member Author

Why? Isn't that redundant?

We can test at different levels.
The BaseIndex test can verify that we always follow the best chain during initial sync, that the appropriate child methods are called in order (append - remove), and check the db chain locator updates. Then, BlockFilterIndex test, and others, should only focus on manually connecting and disconnecting blocks, verifying that its data is consistent in memory and disk.

@arejula27

Two small comments. I feel the one about removing final is relevant; the other one not really, but I saw the same pattern in other files of the project, so if it is not an inconvenience you can add it; if not, just resolve it.

Waiting for the HDD benchmark from @l0rinc. I can do it if he is busy.

@furszy
Member Author

furszy commented Mar 4, 2026

Waiting for the HDD benchmark of l0rinc. I can do it if he is busy

Having more benchmarks does not hurt :). Feel more than welcome to add your results as well!

furszy added 3 commits March 4, 2026 17:19
No behavior change. This is just for correctness.
No behavior change.

Introduce ProcessBlocks(start, end) to handle a range of blocks
in forward order. Currently used per-block, but this lays the
foundation for future batch processing and parallelization.

This has the nice property of allowing us to collect the block
indexes we are about to process without locking 'cs_main', just
by traversing the chain backwards via each block index's pprev pointer.
@furszy furszy force-pushed the 2026_index_batch_processing branch from 80aea07 to 5be65db Compare March 4, 2026 20:25
@furszy
Member Author

furszy commented Mar 4, 2026

Updated per feedback. Thanks @arejula27.
Also rebased on master to pull the CI fixes.

@l0rinc
Contributor

l0rinc commented Mar 8, 2026

Did another index benchmark for the latest push on an HDD:

indexes | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
BEFORE="4c40a923f003420193aa574745f70788bcf35265"; AFTER="5be65dbcdceaafb6f5a0e34c69efc2b3dfcfe27a"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; \
wait_index() { \
  tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'index is enabled\|is enabled at height'; \
  sleep 2; \
  killall bitcoind 2>/dev/null || true; \
  wait; \
}; export -f wait_index; \
log_size() { \
  local label="$1"; local index_dir="$2"; \
  local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1); \
  local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l); \
  echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; \
}; export -f log_size; \
git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; \
for c in $BEFORE:build-before $AFTER:build-after; do \
  git checkout ${c%:*} >/dev/null 2>&1 && cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; \
done; \
echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && \
for INDEX in txindex blockfilterindex coinstatsindex txospenderindex; do \
  echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE"; \
  if [ "$INDEX" = "blockfilterindex" ]; then \
    INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic"; \
  else \
    INDEX_DIR="${DATA_DIR}/indexes/${INDEX}"; \
  fi; \
  export INDEX_DIR; \
  for BUILD in before after; do \
    if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi; \
    hyperfine --runs 1 --shell bash --sort command \
      --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log" \
      --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'" \
      -n "${BUILD} (${COMMIT})" \
      "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index" \
      2>&1 | tee -a "$RESULTS_FILE"; \
  done; \
done;

--- txindex ---

Benchmark 1: before (4c40a92)
Time (abs ≡): 23757.916 s [User: 13234.906 s, System: 1697.284 s]

Benchmark 1: after (5be65db)
Time (abs ≡): 21134.433 s [User: 12627.245 s, System: 1483.096 s]

--- blockfilterindex ---

Benchmark 1: before (4c40a92)
Time (abs ≡): 14205.566 s [User: 8328.684 s, System: 418.981 s]

Benchmark 1: after (5be65db)
Time (abs ≡): 14046.610 s [User: 8286.885 s, System: 415.464 s]

--- coinstatsindex ---

Benchmark 1: before (4c40a92)
Time (abs ≡): 40075.612 s [User: 34320.885 s, System: 403.669 s]

Benchmark 1: after (5be65db)
Time (abs ≡): 39869.235 s [User: 34307.695 s, System: 398.322 s]

--- txospenderindex ---

Benchmark 1: before (4c40a92)
Time (abs ≡): 36324.897 s [User: 23490.919 s, System: 2227.044 s]

Benchmark 1: after (5be65db)
Time (abs ≡): 26375.223 s [User: 21783.716 s, System: 1690.540 s]

Edit: added txospenderindex

@arejula27

Did another index benchmark for the latest push on an HDD:


It looks like the improvement is not very noticeable for two of the indexes. It might be interesting to try increasing the batch size. Do you have any suggestions on what could be causing this?

@sedited
Contributor

sedited commented Mar 8, 2026

It looks like the improvement is not very noticeable for two of the indexes. It might be interesting to try increasing the batch size. Do you have any suggestions on what could be causing this?

I'm not sure this is really surprising given the difference in the amount of data written to the latter two. I would expect a jump more similar to the txindex for the txospenderindex.

What is the expectation on the increase of memory usage here? I tried profiling it with massif, but usage when running with this patch seemed a bit erratic, which I guess is still an indication that it is likely to be higher. Can we really batch operations in this way without putting more memory pressure on the OS?

@furszy
Member Author

furszy commented Mar 8, 2026

Can we really batch operations in this way without putting more memory pressure on the OS?

The batch size is intentionally small. I was very conservative with the chosen value to avoid having to worry about it here. The goal of this PR is to land the structural improvements, with a net positive: less cs_main locking, merging prerequisites for the parallelization goal, and the latest discovery of reducing the number of created LDB files.

It's expected that any batching implementation adds some memory overhead, but that shouldn't be a problem; it pays off in many other areas, not only in the promise of a faster sync. And, if we ever wanted to, we could add a config flag to disable batching by setting the batch size to 1. But realistically speaking, I don't think anyone running indexes would want them to take hours to sync while locking cs_main and slowing down the whole node. People who run indexes want them to sync up quickly so they can start using them.

Also, an important point. All these changes only matter during the index initial sync period. Once the index is synced, they are no longer relevant.

We can benchmark this on small devices like the Pi 4 to see the benefits more clearly, but I honestly wouldn't spend too much time on it. If we agree this PR lands structural, scalable improvements with many benefits and no noticeable downside, and also lets us move forward with the major parallelization speedup path, it seems reasonable to just move forward. This also adds a good number of tests we currently lack.

furszy and others added 7 commits March 8, 2026 15:09
This is a refactoring that makes NextSyncBlock easily adaptable to
return block ranges instead of single blocks in the subsequent commit.

It also adds a fast-path that avoids calling 'FindFork()' and 'Next()'
when the index is synced.

Co-authored-by: Hao Xu <hao.xu@linux.dev>
Set the end of the window of blocks an index will process at a time,
stopping at either the configured window size or the chain tip.

This is the first step toward batch and parallel processing.
These block windows become units of work and live in memory until
flushed to disk in one go.

Since each index has a different data size, the range size
is configurable so it can be tuned considering memory
consumption.
Pass CDBBatch to subclasses so writes can be accumulated
and committed together instead of flushed per block.

Note:
Batch writes were intentionally not tied to the existing Commit() method.
The rationale is to bound memory consumption as batches accumulate in RAM
until flushed to disk. This leaves two separate workflows with different
purposes:

1) Batch writes flush data for a block range atomically: either all
blocks in the batch land in LevelDB or none do.

2) Commit() is about progress checkpointing: it writes the best block
locator alongside any other index state position (e.g. last written
file for the block filter index - which requires an fsync call), so the
node always has a consistent recovery point after an interruption or crash.

Reprocessing data after a crash is not really a problem because we would
just overwrite the existing one.
Verifies the index persists the last processed index when interrupted.
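The two separate workflows described in the commit note above (atomic batch flushes vs. Commit() progress checkpointing) can be sketched with hypothetical types (FakeIndex and its members are illustrative, not Core's API). A crash between a batch flush and a Commit() just means some blocks get re-processed on restart, and rewriting the same keys overwrites identical data:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: the db map stands in for flushed batch data, and
// the locator stands in for the best-block checkpoint written by Commit().
struct FakeIndex {
    std::map<int, std::string> db; // height -> digested block data
    int locator{-1};               // last committed "best block" height

    void AppendBlock(int height) { db[height] = "data-" + std::to_string(height); }
    void Commit(int height) { locator = height; } // progress checkpoint only

    // Resume after a restart: re-process everything past the locator.
    // Re-appending already-flushed heights is an idempotent overwrite.
    void Resume(int tip)
    {
        for (int h = locator + 1; h <= tip; ++h) AppendBlock(h);
        Commit(tip);
    }
};
```

If batches for heights 0..7 were flushed but the locator only reached 4 before a crash, resuming re-processes heights 5..7 and ends with a consistent index and locator at 7.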
@furszy furszy force-pushed the 2026_index_batch_processing branch from 5be65db to 1497086 Compare March 8, 2026 18:10
Contributor

@l0rinc l0rinc left a comment

Concept ACK on the goal: batching is the right direction for reducing write amplification, cs_main contention, and LDB file proliferation.

However, I think it's important to stress my concern that this change is more complex than it's being treated as.
It alters the invariants every index subclass relies on: in-memory state can now advance ahead of what's persisted, and the interrupt/shutdown paths haven't been fully thought through for each subclass.
The most serious issue seems to me that Commit() during interrupt writes advanced m_muhash alongside a stale locator, which causes CoinStatsIndex::CustomInit to reject the index as corrupted on restart. blockfilterindex has a milder variant (orphaned flat file data). I wonder if fuzzing could catch these.

More generally, several "no behavior change" refactoring commits don't explain why the restructuring is safe, and memory concerns are brushed away with "shouldn't be a problem".
I also left a ton of comments about the ProcessBlocks loop needing tighter bounds and the shutdown test only exercising a no-op BaseIndex subclass. It doesn't catch the actual subclass corruption paths.
Some of the earlier commits (e.g. the iterator ordering fix) could land as standalone PRs to reduce the review surface here.

See inline comments for details on these and other issues (interrupt handler scoping, NextSyncBlock invariants, loop safety, stale comments, etc.).

Note

in the meantime, I finished the HDD benchmarks with txospenderindex measurements, see #34489 (comment).

}
pindex = pindex_next;

if (!ProcessBlock(pindex_next)) return; // error logged internally

95fc91f index: sync, update iterator only after processing block:

We're changing behavior in a commit that claims it's not a behavior change, without adding context for why this change was needed or how to validate that it is correct. Given that it wasn't obvious to the original authors, an explanation is needed. Also, no test was added to exercise this behavior, explain exactly why it was necessary, and make sure it doesn't happen again.

It also seems independent enough to be pushed in a separate PR, with a test documenting the current behavior in the first commit and, in the next commit, a fix and a test update that reflects the new behavior.

}
}

if (m_interrupt) {

0cbe84d index: refactor, decouple interruption from sync loop:

The commit claims there's no behavior change, but it doesn't explain why the move is safe. Given all the internal returns and breaks it's not self-evident.


It seems to me that previously a synced, and later interrupted, path could hit

LogInfo("%s is enabled at height %d", GetName(), pindex->nHeight);

, but after the change it hits

LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());

and returns.

I'm not sure there is any other change, but I would appreciate a commit message explanation, and maybe a test exercising this path, preferably a fuzz test to cover all these weird combinations.


bool BaseIndex::ProcessBlocks(const CBlockIndex* start, const CBlockIndex* end)
{
// Collect all block indexes from [end...start] in order

0336c61 index: add method to process block ranges:

nit: for consistency we could use the same comment style as below:

Suggested change
// Collect all block indexes from [end...start] in order
// Collect all block indexes from end to start

// Collect all block indexes from [end...start] in order
std::vector<const CBlockIndex*> ordered_blocks;
ordered_blocks.reserve(end->nHeight - start->nHeight + 1);
for (const CBlockIndex* block = end; block && start->pprev != block; block = block->pprev) {

0336c61 index: add method to process block ranges:

When can block be nullptr here? Wouldn't it be a fatal error if we walk past genesis? And what if the caller mixes up start and end? Or if start and end have valid heights but are on different forks?

Given how intertwined the code is here, a serious mess-up could cause an infinite loop and would be really hard to trace back to the source.

I would probably sleep better if we asserted that end->nHeight >= start->nHeight and used a bounded for loop here instead.

const int range_size{end->nHeight - start->nHeight + 1};
Assert(range_size > 0);

// Collect all block indexes from end to start
std::vector<const CBlockIndex*> ordered_blocks;
{
    ordered_blocks.reserve(range_size);
    const CBlockIndex* it = end;
    for (int i{0}; i < range_size; ++i) {
        ordered_blocks.emplace_back(Assert(it));
        it = it->pprev;
    }
    Assert(it == start->pprev);
}


// And process blocks in forward order: from start to end
for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
if (m_interrupt) return false;

0336c61 index: add method to process block ranges:

The method comment states:

Returns false on unrecoverable failure or during interruption

But if the interruption happens before we call the first ProcessBlock, we should be safe since no state was changed - i.e. nothing to recover from.
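A toy model of that distinction (hypothetical `ToySync` name, not the real BaseIndex API): when the interrupt flag is observed before the first block, returning false signals "stopped", and no state was touched, so there is nothing to recover.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Toy model of the return-value semantics: false means "stopped", and if the
// interrupt fires before the first block, no state has changed at all.
struct ToySync {
    std::atomic<bool> interrupt{false};
    int processed{0};

    bool ProcessBlocks(const std::vector<int>& blocks)
    {
        for (int block : blocks) {
            if (interrupt.load()) return false; // checked before any work on this block
            ++processed;
            (void)block; // real code would index this block's data here
        }
        return true;
    }
};
```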


// Verifies that the index persists its sync progress when interrupted during initial sync.
// The index should resume from the last processed batch rather than restarting from genesis.
BOOST_FIXTURE_TEST_CASE(shutdown_during_initial_sync, BuildChainTestingSetup)

1497086 test: Add coverage for index locator persistence during shutdown:

This should have probably caught the partial write regression.
Can we extend IndexCommitStateSim to implement CustomCommit as well?

index.Stop();
}

// Ensure the initial sync batch window behaves as expected.

7c82cc8 test: index, add initial sync batch writes coverage:

Adding tests at the very end doesn't help document the progression of the changes introduced in this PR, commit by commit.
We could add this test as a first commit:

// Ensure initial sync can be restarted cleanly without overwriting earlier
// results. Tests sync from genesis and from a higher block to mimic a restart.
BOOST_FIXTURE_TEST_CASE(initial_sync_restart, BuildChainTestingSetup)

and later, in the batch size commit, just add filter_index.SetProcessingBatchSize(BATCH_SIZE); to the test to prove that it still passes. That way, we are asserting that the previous behavior is retained, not just that the new behavior is covered.

// Tests that indexes can complete their initial sync even if a reorg occurs mid-sync.
// The index is paused at a specific block while a fork is introduced a few blocks before the tip.
// Once unblocked, the index should continue syncing and correctly reach the new chain tip.
BOOST_FIXTURE_TEST_CASE(initial_sync_reorg, BuildChainTestingSetup)

28d8b04 test: index, add coverage for initial sync reorgs:

This test almost completely passes on master - we should add as much coverage as possible before the refactor.
But we need to make it more realistic, likely by overriding CustomAppend as well.

Comment on lines +522 to +524
constexpr int SHUTDOWN_HEIGHT = 45;
constexpr int BATCH_SIZE = 10;
constexpr int EXPECTED_LAST_SYNCED_BLOCK = 39;

1497086 test: Add coverage for index locator persistence during shutdown:

This passes before the change with:

    // The index will be interrupted while processing block 45. Since sync is
    // still per-block here, the last persisted block is expected to be 45.
    constexpr int SHUTDOWN_HEIGHT = 45;
    constexpr int EXPECTED_LAST_SYNCED_BLOCK = 45;

(and no SetProcessingBatchSize, of course)
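For reference, the post-change constants follow from batch-boundary arithmetic. A hedged sketch (my own helper, not code from the PR), assuming windows of BATCH_SIZE blocks starting at genesis height 0:

```cpp
#include <cassert>

// An interrupt while processing `interrupt_height` persists only the complete
// windows before it; returns -1 when nothing has been flushed yet.
int LastFlushedHeight(int interrupt_height, int batch_size, int first_height = 0)
{
    const int complete_batches = (interrupt_height - first_height) / batch_size;
    // Last height of the final complete window.
    return first_height + complete_batches * batch_size - 1;
}
```

With an interrupt at height 45 and a batch size of 10, four complete windows (0-9, 10-19, 20-29, 30-39) have been flushed, so the last persisted block is 39.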

bool BlockFilterIndex::Write(const BlockFilter& filter, uint32_t block_height, const uint256& filter_header)
bool BlockFilterIndex::Write(CDBBatch& batch, const BlockFilter& filter, uint32_t block_height, const uint256& filter_header)
{
size_t bytes_written = WriteFilterToDisk(m_next_filter_pos, filter);

We seem to be writing to disk too eagerly here; this is also misaligned with the batching.
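A minimal sketch of the alternative, with hypothetical names (not the real BlockFilterIndex flat-file code): buffer the filter payloads in memory and write the flat file once per batch flush, in step with the DB batch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Buffers filter data so the "flat file" is written once per batch flush,
// matching the DB batch cadence instead of one write per filter.
struct ToyFilterWriter {
    std::vector<std::string> pending;  // filters accumulated for this window
    std::vector<std::string> on_disk;  // simulated flat-file contents
    int disk_writes{0};

    void Append(std::string filter) { pending.push_back(std::move(filter)); }

    // Called alongside the DB batch flush.
    void FlushBatch()
    {
        if (pending.empty()) return;
        on_disk.insert(on_disk.end(), pending.begin(), pending.end());
        ++disk_writes; // one file write per window
        pending.clear();
    }
};
```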

