validation: reduce persisted UTXO set size by prioritizing positive lookups (RFC) #33817

l0rinc · 2025-11-07T12:43:23Z

draft to gather comments and conceptual reviews

Context

BIP30 prevents duplicate transaction IDs by checking whether outputs already exist in the UTXO set before adding them.
LevelDB's FilterPolicy stores a per-table probabilistic filter to optimize for negative lookups.
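For readers unfamiliar with BIP30, here is a minimal sketch of the check described above. It uses hypothetical, simplified types (`OutPoint`, `Tx`, `PassesBIP30`, and a `std::set` standing in for the coins view). In Bitcoin Core the equivalent loop lives in `ConnectBlock` and queries the coins view for each new output. The point is that for a valid block every probe is expected to miss.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT Bitcoin Core types: an outpoint is (txid, vout)
// and a std::set plays the role of the UTXO view.
using OutPoint = std::pair<std::string, uint32_t>;

struct Tx {
    std::string txid;
    uint32_t num_outputs;
};

// BIP30 in spirit: reject a block if any transaction would create an output
// that already exists, unspent, in the UTXO set. For a valid block every
// probe misses, so these are exactly the "negative lookups" a Bloom filter speeds up.
bool PassesBIP30(const std::vector<Tx>& block_txs, const std::set<OutPoint>& utxo_set)
{
    for (const Tx& tx : block_txs) {
        for (uint32_t vout{0}; vout < tx.num_outputs; ++vout) {
            if (utxo_set.count(OutPoint{tx.txid, vout}) > 0) return false; // duplicate unspent output
        }
    }
    return true;
}
```

A transaction whose txid collides with an earlier one only fails if one of the earlier outputs is still unspent.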

After the first ~230k blocks (BIP30/BIP34 windows), validation does not deliberately probe the UTXO set for missing entries (missing coins imply invalid transactions).
Bloom filters therefore slow the common case (present-key lookups) while bloating the on-disk tables.
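To illustrate why a filter only pays off for misses, here is a toy Bloom filter. `ToyBloom` and `ReadsAfterFilter` are hypothetical names, not LevelDB's `FilterPolicy` API. The bit positions follow the double-hashing pattern of LevelDB's `util/bloom.cc` as I understand it, but are seeded from `std::hash` instead of LevelDB's own hash. Every present key passes the filter, so the data-block read still happens and the probe is extra work; only absent keys can be short-circuited.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter for illustration only (NOT LevelDB's implementation).
class ToyBloom
{
public:
    // k ~= bits_per_key * ln(2), at least one probe; at least 64 bits of storage.
    ToyBloom(size_t num_keys, size_t bits_per_key)
        : m_bits(std::max<size_t>(64, num_keys * bits_per_key), false),
          m_k(std::max<size_t>(1, static_cast<size_t>(bits_per_key * 0.69))) {}

    void Add(const std::string& key)
    {
        uint32_t h{Hash(key)};
        const uint32_t delta{(h >> 17) | (h << 15)}; // rotate right by 17 bits
        for (size_t i{0}; i < m_k; ++i, h += delta) m_bits[h % m_bits.size()] = true;
    }

    // false: the key is definitely absent, so the data block can be skipped.
    // true:  the key MAY be present, so the data block must be read anyway.
    bool MayContain(const std::string& key) const
    {
        uint32_t h{Hash(key)};
        const uint32_t delta{(h >> 17) | (h << 15)};
        for (size_t i{0}; i < m_k; ++i, h += delta) {
            if (!m_bits[h % m_bits.size()]) return false;
        }
        return true;
    }

private:
    static uint32_t Hash(const std::string& key)
    {
        return static_cast<uint32_t>(std::hash<std::string>{}(key));
    }

    std::vector<bool> m_bits;
    size_t m_k;
};

// Number of lookups that still need a data-block read after the filter check.
size_t ReadsAfterFilter(const ToyBloom& filter, const std::vector<std::string>& keys)
{
    size_t reads{0};
    for (const std::string& key : keys) {
        if (filter.MayContain(key)) ++reads;
    }
    return reads;
}
```

Built from 1,000 keys at 10 bits per key, every inserted key still requires a read (no false negatives), while only a small fraction of unrelated keys slip through as false positives.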

History

Bloom filters were introduced in the Ultraprune PR (#1677) without explicit documentation of their purpose.

Fix

For blocks prior to the assumevalid anchor, we already skip script verification, relying on accumulated proof of work. Skipping BIP30 for those deeply buried blocks is consistent with assumevalid's purpose (especially after the recent checkpoint removal).

Removing the LevelDB bloom filters slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

Disclaimer

Nodes syncing from genesis with -assumevalid=0 still perform full BIP30 validation, which may be a few seconds slower (~11 s over the first 230k blocks in the benchmarks below).
BIP30 checks beyond height 1,983,701 remain enforced regardless of fScriptChecks.
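The gating described in this section can be summarized as a predicate. This is a hypothetical sketch of the described behavior, not the PR's diff. `ShouldEnforceBIP30`, `in_bip30_window` and `BIP30_REENFORCE_HEIGHT` are illustrative names. The real code derives the pre-BIP34 window from BIP34 activation rather than taking a boolean.

```cpp
#include <cassert>

// Hypothetical constant taken from the height quoted above: BIP30 is enforced
// again for heights beyond 1,983,701, whatever fScriptChecks says.
constexpr int BIP30_REENFORCE_HEIGHT{1'983'702};

// Sketch of the described decision, NOT the PR's actual code.
// in_bip30_window: block is in the pre-BIP34 range where BIP30 applies.
// script_checks:   mirrors fScriptChecks (false below the assumevalid anchor).
bool ShouldEnforceBIP30(int height, bool in_bip30_window, bool script_checks)
{
    if (height >= BIP30_REENFORCE_HEIGHT) return true; // always enforced this far out
    return in_bip30_window && script_checks;           // buried blocks skip it, like scripts
}
```

With `-assumevalid=0`, `script_checks` is true for every block, so the whole pre-BIP34 window is still checked, matching the disclaimer.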

Performance

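The performance change is best demonstrated by loading an AssumeUTXO snapshot, since this change was mostly motivated by UTXO set size and memory reduction.

With the default dbcache (450), AssumeUTXO loads bootstrap ~11% faster. The gap narrows with larger caches (~8% at dbcache=4500, ~1% at dbcache=45000, within noise).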

COMMITS="745eb053a41c487cc10f20644c65dc8455cf8974 5cb93dad7c06db82642169d8f7d07442d215f49c"; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/ShallowBitcoinData"; LOG_DIR="$BASE_DIR/logs"; UTXO_SNAPSHOT_PATH="$BASE_DIR/utxo-880000.dat"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done; echo "") && \
for DBCACHE in 450 4500 45000; do \
  (echo "assumeutxo load | 880000 blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)";) &&\
  hyperfine \
  --sort command \
  --runs 5 \
  --export-json "$BASE_DIR/assumeutxo-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$DBCACHE-$CC-$(date +%s).json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind 2>/dev/null; rm -rf $DATA_DIR/blocks $DATA_DIR/chainstate $DATA_DIR/chainstate_snapshot $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
             cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind bitcoin-cli -j2 && \
             ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 20 && \
             ./build/bin/bitcoind -datadir=$DATA_DIR -daemon -blocksonly -connect=0 -dbcache=$DBCACHE -printtoconsole=0; sleep 20" \
  --conclude "build/bin/bitcoin-cli -datadir=$DATA_DIR stop || true; killall bitcoind || true; sleep 10; \
              echo \"{COMMIT} | dbcache=$DBCACHE | chainstate: \$(find $DATA_DIR/chainstate_snapshot -type f 2>/dev/null | wc -l) files, \$(du -sb $DATA_DIR/chainstate_snapshot 2>/dev/null | cut -f1) bytes\" >> $DATA_DIR/debug.log; \
              cp $DATA_DIR/debug.log $LOG_DIR/debug-assumeutxo-{COMMIT}-dbcache-$DBCACHE-\$(date +%s).log" \
  "COMPILER=$CC DBCACHE=$DBCACHE ./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcclienttimeout=0 loadtxoutset $UTXO_SNAPSHOT_PATH"; \
done

745eb053a4 Merge bitcoin-core/gui#901: Add createwallet, createwalletdescriptor, and migratewallet to history filter
5cb93dad7c leveldb: remove bloom filters from leveldb

Benchmark 1: COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
  Time (mean ± σ):     696.452 s ± 57.904 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   655.482 s … 797.623 s    5 runs

Benchmark 2: COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
  Time (mean ± σ):     628.999 s ± 37.939 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   596.216 s … 673.440 s    5 runs

Relative speed comparison
        1.11 ±  0.11  COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
        1.00          COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)

assumeutxo load | 880000 blocks | dbcache 4500 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
Benchmark 1: COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
  Time (mean ± σ):     674.430 s ± 37.704 s    [User: 0.001 s, System: 0.001 s]
  Range (min … max):   642.483 s … 734.178 s    5 runs

Benchmark 2: COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
  Time (mean ± σ):     622.827 s ± 16.068 s    [User: 0.001 s, System: 0.002 s]
  Range (min … max):   610.489 s … 650.770 s    5 runs

Relative speed comparison
        1.08 ±  0.07  COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
        1.00          COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)

assumeutxo load | 880000 blocks | dbcache 45000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
Benchmark 1: COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
  Time (mean ± σ):     484.569 s ± 16.260 s    [User: 0.001 s, System: 0.002 s]
  Range (min … max):   469.979 s … 507.771 s    5 runs

Benchmark 2: COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
  Time (mean ± σ):     482.040 s ± 12.817 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   465.205 s … 500.719 s    5 runs

Relative speed comparison
        1.01 ±  0.04  COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
        1.00          COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
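The "Relative speed comparison" rows can be recomputed from the reported means and standard deviations. The sketch below assumes first-order propagation of relative errors for a quotient. This appears to be what hyperfine does, since it reproduces all three rows above to two decimals, but I have not checked hyperfine's source. `Timing`, `Ratio`, `RelativeSpeed` and `WithinOneSigma` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

struct Timing {
    double mean;   // seconds
    double stddev; // seconds
};

struct Ratio {
    double value;
    double stddev;
};

// Ratio of mean run times with first-order propagation of the relative errors:
//   sigma_r / r = sqrt((s1 / m1)^2 + (s2 / m2)^2)
Ratio RelativeSpeed(const Timing& slower, const Timing& faster)
{
    const double r{slower.mean / faster.mean};
    const double e1{slower.stddev / slower.mean};
    const double e2{faster.stddev / faster.mean};
    return {r, r * std::sqrt(e1 * e1 + e2 * e2)};
}

// True if the ratio lies within one combined standard deviation of 1.0.
bool WithinOneSigma(const Ratio& ratio)
{
    return std::abs(ratio.value - 1.0) <= ratio.stddev;
}
```

For example, the dbcache=45000 row (1.01 ± 0.04) falls within one combined standard deviation of 1.0, while the dbcache=4500 row (1.08 ± 0.07) does not.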

(note: image will be moved to a comment later)

For reference, here is how the change affects reindex-chainstate per 100k block chunk:

(note: image will be moved to a comment later)

Persisted Size

UTXO set size depends on LevelDB compaction scheduling. To stabilize measurements, we instrumented the code to compact after every block connection, which shows the exact effect of the bloom filters on the number of LevelDB files and their total size. This instrumentation is for on-disk size measurement only, not for performance.

compact after each block connection for stable size

log index directory stats for every update tip

From 3e5414c6ef6f4cefbb0ad49d3c164823850e42b2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?L=C5=91rinc?= <pap.lorinc@gmail.com>
Date: Wed, 29 Oct 2025 08:53:07 +0100
Subject: [PATCH] log index directory stats for every update tip

---
 src/validation.cpp | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/src/validation.cpp b/src/validation.cpp
index af523b06d74e4..cc68023b1e7dc 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -68,11 +68,13 @@
 #include <cassert>
 #include <chrono>
 #include <deque>
+#include <filesystem>
 #include <numeric>
 #include <optional>
 #include <ranges>
 #include <span>
 #include <string>
+#include <thread>
 #include <tuple>
 #include <utility>

@@ -2942,6 +2944,27 @@ void Chainstate::PruneAndFlush()
     }
 }

+static std::pair<size_t, size_t> GetDirectoryStats(const fs::path& dir_path)
+{
+    assert(fs::exists(dir_path) && fs::is_directory(dir_path));
+    for (int attempts{0}; attempts < 100; ++attempts) {
+        try {
+            size_t file_count{0}, total_bytes{0};
+            for (const auto& entry : fs::recursive_directory_iterator(dir_path)) {
+                if (entry.is_regular_file()) {
+                    ++file_count;
+                    total_bytes += entry.file_size();
+                }
+            }
+            return {file_count, total_bytes};
+        } catch (const fs::filesystem_error&) {
+            // can fail during compaction
+            std::this_thread::sleep_for(std::chrono::seconds(5));
+        }
+    }
+    std::terminate();
+}
+
 static void UpdateTipLog(
     const ChainstateManager& chainman,
     const CCoinsViewCache& coins_tip,
@@ -2953,8 +2976,12 @@ static void UpdateTipLog(

     AssertLockHeld(::cs_main);

-    // Disable rate limiting in LogPrintLevel_ so this source location may log during IBD.
-    LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, /*should_ratelimit=*/false, "%s%s: new best=%s height=%d version=0x%08x log2_work=%f tx=%lu date='%s' progress=%f cache=%.1fMiB(%utxo)%s\n",
+    const fs::path datadir{"/mnt/my_storage/BitcoinData"}; // TODO shouldn't be hard-coded
+    auto [chainstate_files, chainstate_bytes] = GetDirectoryStats(datadir / "chainstate");
+    auto [index_files, index_bytes] = GetDirectoryStats(datadir / "blocks" / "index");
+
+    LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, /*should_ratelimit=*/false,
+                   "%s%s: new best=%s height=%d version=0x%08x log2_work=%f tx=%lu date='%s' progress=%f cache=%.1fMiB(%utxo) chainstate=%zu files/%zu bytes index=%zu files/%zu bytes%s\n",
                    prefix, func_name,
                    tip->GetBlockHash().ToString(), tip->nHeight, tip->nVersion,
                    log(tip->nChainWork.getdouble()) / log(2.0), tip->m_chain_tx_count,
@@ -2962,6 +2989,8 @@ static void UpdateTipLog(
                    chainman.GuessVerificationProgress(tip),
                    coins_tip.DynamicMemoryUsage() * (1.0 / (1 << 20)),
                    coins_tip.GetCacheSize(),
+                   chainstate_files, chainstate_bytes,
+                   index_files, index_bytes,
                    !warning_messages.empty() ? strprintf(" warning='%s'", warning_messages) : "");
 }

From 76d866de450e30bd60edddd221a64266fb6488da Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?L=C5=91rinc?= <pap.lorinc@gmail.com>
Date: Tue, 4 Nov 2025 18:07:59 +0100
Subject: [PATCH] compact after each block connection

---
 src/dbwrapper.cpp  | 7 ++++++-
 src/dbwrapper.h    | 2 ++
 src/txdb.h         | 1 +
 src/validation.cpp | 2 ++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp
index fe5f9cb0893d7..8e2be54f35fd3 100644
--- a/src/dbwrapper.cpp
+++ b/src/dbwrapper.cpp
@@ -245,7 +245,7 @@ CDBWrapper::CDBWrapper(const DBParams& params)

     if (params.options.force_compact) {
         LogInfo("Starting database compaction of %s", fs::PathToString(params.path));
-        DBContext().pdb->CompactRange(nullptr, nullptr);
+        CompactFull();
         LogInfo("Finished database compaction of %s", fs::PathToString(params.path));
     }

@@ -348,6 +348,11 @@ bool CDBWrapper::IsEmpty()
     return !(it->Valid());
 }

+void CDBWrapper::CompactFull()
+{
+    DBContext().pdb->CompactRange(nullptr, nullptr);
+}
+
 struct CDBIterator::IteratorImpl {
     const std::unique_ptr<leveldb::Iterator> iter;

diff --git a/src/dbwrapper.h b/src/dbwrapper.h
index b9b98bd96ade3..8aba4feb08e6c 100644
--- a/src/dbwrapper.h
+++ b/src/dbwrapper.h
@@ -284,6 +284,8 @@ class CDBWrapper
         ssKey2 << key_end;
         return EstimateSizeImpl(ssKey1, ssKey2);
     }
+
+    void CompactFull();
 };

 #endif // BITCOIN_DBWRAPPER_H
diff --git a/src/txdb.h b/src/txdb.h
index ea0cf9d77e596..394993fa5264a 100644
--- a/src/txdb.h
+++ b/src/txdb.h
@@ -56,6 +56,7 @@ class CCoinsViewDB final : public CCoinsView

     //! @returns filesystem path to on-disk storage or std::nullopt if in memory.
     std::optional<fs::path> StoragePath() { return m_db->StoragePath(); }
+    void CompactFull() const { m_db->CompactFull(); }
 };

 #endif // BITCOIN_TXDB_H
diff --git a/src/validation.cpp b/src/validation.cpp
index cc68023b1e7dc..b6e796d539357 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -3027,6 +3027,8 @@ void Chainstate::UpdateTip(const CBlockIndex* pindexNew)
             }
         }
     }
+    m_blockman.m_block_tree_db->CompactFull();
+    this->CoinsDB().CompactFull();
     UpdateTipLog(m_chainman, coins_tip, pindexNew, __func__, "",
                  util::Join(warning_messages, Untranslated(", ")).original);
 }

Running a before/after reindex-chainstate and plotting the on-disk size of the chainstate index for every block shows that the PR reduces the UTXO index by roughly 222MB (2%).
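A back-of-the-envelope sketch relating the ~2% saving to the filter configuration. It assumes Bitcoin Core's `dbwrapper.cpp` configures `leveldb::NewBloomFilterPolicy(10)` (10 bits per key) and mirrors the sizing rules of LevelDB's `util/bloom.cc` as I understand them. The function names are hypothetical. At 10 bits per key this gives 6 probes, ~1.25 bytes per entry and a ~0.84% false-positive rate. LevelDB emits one such filter per 2 KiB of data-block offsets plus a small offset array, so any total derived from it is an estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Probes per key (assumed LevelDB rule): bits_per_key * 0.69 (~ln 2), clamped to [1, 30].
size_t ProbesPerKey(size_t bits_per_key)
{
    const size_t k{static_cast<size_t>(bits_per_key * 0.69)};
    return std::min<size_t>(30, std::max<size_t>(1, k));
}

// Size of one filter covering n keys: at least 64 bits, rounded up to whole
// bytes, plus one trailing byte that records k.
size_t FilterBytes(size_t n, size_t bits_per_key)
{
    const size_t bits{std::max<size_t>(64, n * bits_per_key)};
    return (bits + 7) / 8 + 1;
}

// Textbook false-positive estimate (1 - e^(-k n / m))^k, where m / n = bits_per_key.
double FalsePositiveRate(size_t bits_per_key)
{
    const double k{static_cast<double>(ProbesPerKey(bits_per_key))};
    return std::pow(1.0 - std::exp(-k / static_cast<double>(bits_per_key)), k);
}
```

Multiplying the per-entry cost by the number of UTXO entries gives a rough upper bound to compare against the measured difference.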

instrumented benchmark patch

COMMITS="76d866de450e30bd60edddd221a64266fb6488da fd69291daff5cee0763023203a24d52cd7aab183"; \
STOP=921129; DBCACHE=450; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
(echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
hyperfine \
  --sort command \
  --runs 1 \
  --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
    cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
  --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
              cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"

76d866de45 compact after each block connection
fd69291daf leveldb: remove bloom filters from leveldb

reindex-chainstate | 921129 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 76d866de450e30bd60edddd221a64266fb6488da)
  Time (abs ≡):        56611.911 s               [User: 45475.199 s, System: 5006.131 s]
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fd69291daff5cee0763023203a24d52cd7aab183)
  Time (abs ≡):        53522.995 s               [User: 41015.776 s, System: 4894.659 s]

Relative speed comparison
        1.06          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 76d866de450e30bd60edddd221a64266fb6488da)
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fd69291daff5cee0763023203a24d52cd7aab183)

(note: image will be moved to a comment later)

Full validation

To help with reproducibility, the first commit introduces a slight regression to demonstrate the need for the second commit.

With BIP30 checks still active and without LevelDB bloom filters, the first 230k blocks validate ~7% slower.

COMMITS="2b9c3511986bb2f55310dd5fe7b6367fcc63e44e 166d35713cf61986bb4b37283cb8b001ad013771"; \
STOP=230000; DBCACHE=450; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
(echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") && \
hyperfine \
  --sort command \
  --runs 2 \
  --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
    cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
  --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
              cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"

2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
166d35713c leveldb: remove bloom filters from leveldb

reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
  Time (mean ± σ):     170.615 s ±  0.468 s    [User: 186.278 s, System: 10.035 s]
  Range (min … max):   170.285 s … 170.946 s    2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
  Time (mean ± σ):     181.904 s ±  0.534 s    [User: 196.567 s, System: 10.482 s]
  Range (min … max):   181.526 s … 182.281 s    2 runs

Relative speed comparison
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
        1.07 ±  0.00  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
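To see why misses are the lookups that pay once filters are removed, here is a toy model of a point lookup across several sorted tables, newest first (roughly one candidate table per LevelDB level). It is not LevelDB code: `ToyTable`, `MakeTable` and `Get` are hypothetical, and the per-table filter is idealized as an exact set (a real Bloom filter adds ~1% false positives). Without filters a miss searches every candidate table; with filters it can skip them. BIP30 probes are exactly such misses, which is consistent with the ~7% slowdown above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// One sorted table plus its (idealized) filter.
struct ToyTable {
    std::vector<std::string> sorted_keys;
    std::unordered_set<std::string> filter; // stand-in for the per-table Bloom filter
};

ToyTable MakeTable(std::vector<std::string> keys)
{
    std::sort(keys.begin(), keys.end());
    return ToyTable{keys, std::unordered_set<std::string>(keys.begin(), keys.end())};
}

struct LookupCost {
    bool found;
    size_t block_reads; // tables whose data had to be searched
};

LookupCost Get(const std::vector<ToyTable>& tables, const std::string& key, bool use_filters)
{
    size_t reads{0};
    for (const ToyTable& table : tables) {
        if (use_filters && table.filter.count(key) == 0) continue; // filter: definitely absent
        ++reads;
        if (std::binary_search(table.sorted_keys.begin(), table.sorted_keys.end(), key)) {
            return {true, reads};
        }
    }
    return {false, reads}; // a miss searched every table that could not be ruled out
}
```

With three tables, a missing key costs three searches without filters and none with the idealized filters; a key found in the newest table costs one search either way.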

With BIP30 buried behind assumevalid and without LevelDB bloom filters, the first 230k blocks validate ~33% faster.

COMMITS="2b9c3511986bb2f55310dd5fe7b6367fcc63e44e 060a83df97a84e39a44a7f4a8ea27512d2e7b008"; \
STOP=230000; DBCACHE=450; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
(echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
hyperfine \
  --sort command \
  --runs 2 \
  --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
    cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
  --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
              cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"

2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
060a83df97 validation: bury bip30 checks behind assumevalid

reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
  Time (mean ± σ):     170.827 s ±  0.718 s    [User: 186.351 s, System: 10.223 s]
  Range (min … max):   170.319 s … 171.334 s    2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
  Time (mean ± σ):     128.569 s ±  0.168 s    [User: 143.057 s, System: 10.436 s]
  Range (min … max):   128.449 s … 128.688 s    2 runs

Relative speed comparison
        1.33 ±  0.01  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)

LevelDB's `FilterPolicy` stores a per-table probabilistic filter to optimize for negative lookups. Outside the BIP30/BIP34 window (first ~230k blocks), validation does not deliberately probe the UTXO set for missing entries (missing coins imply invalid transactions). Filters therefore slow the common case (present-key lookups) while adding a probabilistic structure to on-disk tables.

Bloom filters were introduced in the Ultraprune PR (bitcoin#1677) without explicit documentation of their purpose. Removing them slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

This commit is placed before burying BIP30 behind assumevalid to make performance changes reproducible in isolation. Benchmarking reindex-chainstate for the first 230k blocks (to quantify the cost of negative lookups without filters) shows only a small slowdown on misses, totaling a few seconds (~11 s, ~7%), while later blocks can be faster due to optimizing for the common case.

-----

2b9c351 Merge bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
166d35713c leveldb: remove bloom filters from leveldb

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c351)
  Time (mean ± σ):     170.615 s ±  0.468 s    [User: 186.278 s, System: 10.035 s]
  Range (min … max):   170.285 s … 170.946 s    2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
  Time (mean ± σ):     181.904 s ±  0.534 s    [User: 196.567 s, System: 10.482 s]
  Range (min … max):   181.526 s … 182.281 s    2 runs

BIP30 prevents duplicate transaction IDs by checking whether outputs already exist in the UTXO set before adding them. This applies to blocks <227,930 (pre-BIP34 activation) and is conservatively re-enforced after height 1,983,701. BIP30 checks are the only place in validation where we intentionally query the UTXO database for entries we expect not to find.

For blocks prior to the `assumevalid` anchor, we already skip script verification and other checks, relying on accumulated proof of work. Skipping BIP30 for those deeply buried blocks is consistent with assumevalid's purpose. This removes negative UTXO lookups during IBD when `assumevalid` is used.

Nodes syncing from genesis with -assumevalid=0 still perform full BIP30 validation. Checks beyond 1,983,701 remain enforced regardless of `fScriptChecks`.

-----

2b9c351 Merge bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
060a83df97 validation: bury bip30 checks behind assumevalid

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c351)
  Time (mean ± σ):     170.827 s ±  0.718 s    [User: 186.351 s, System: 10.223 s]
  Range (min … max):   170.319 s … 171.334 s    2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
  Time (mean ± σ):     128.569 s ±  0.168 s    [User: 143.057 s, System: 10.436 s]
  Range (min … max):   128.449 s … 128.688 s    2 runs

DrahtBot · 2025-11-07T12:43:34Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33817.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34004 (Implementation of SwiftSync by rustaceanrob)
#32317 (kernel: Separate UTXO set access from validation functions by sedited)
#30214 (refactor: Improve assumeutxo state representation by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2025-11-07T15:29:48Z

Skipping BIP30 for those deeply buried blocks

Not sure about this. Wouldn't this mean someone can feed a -nominimumchainwork node a bogus chain, so that the node crashes or is stuck irrecoverably on the bogus chain?

Even if it wasn't, I am not sure if touching validation.cpp is worth it for basically a rounding error on overall IBD speed?

l0rinc · 2025-11-07T15:34:14Z

I am not sure if touching validation.cpp is worth it

It's not about IBD necessarily, but reduced disk footprint and adjusting the database to resemble the usage more closely:

Removing the LevelDB bloom filters slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

gmaxwell · 2025-11-07T21:51:45Z

Hm. I don't know we were aware that you could turn off the filters in leveldb-- I thought they were used to also decide what level an entry might be in!

Have you tried to characterize if this opens up any DOS attacks with unconfirmed transactions?

I think assumeutxo load time is not the best benchmark for this-- it's a one time operation and already pretty fast. It would be more compelling if it could be shown to reduce IBD time or block validation time at tip-- though the validation time graph you've provided isn't very compelling.

Nor do I think 2% storage is particularly compelling. But if there isn't a potential downside, why not?

l0rinc · 2025-11-09T18:49:13Z

Thanks for the comments!
I'm still testing how this combines with other LevelDB options, and how it integrates with other changes such as #31132, which could benefit from faster reads (I'm getting mixed results on different systems for now), and how much memory is saved by skipping the filters (these are all really slow to measure reliably).
In the meantime please keep the conceptual reviews coming, appreciate the high-level context.

DrahtBot · 2025-12-16T15:28:58Z

🐙 This pull request conflicts with the target branch and needs rebase.

l0rinc added 2 commits November 7, 2025 12:50

DrahtBot added the Validation label Nov 7, 2025

This was referenced Nov 13, 2025:
kernel: Separate UTXO set access from validation functions #32317 (Open)
refactor: Improve assumeutxo state representation #30214 (Merged)

DrahtBot mentioned this pull request Dec 4, 2025:
Implementation of SwiftSync #34004 (Draft)

DrahtBot added the Needs rebase label Dec 16, 2025

kevkevinpal mentioned this pull request Dec 16, 2025:
refactor: Improve assumeutxo state representation kevkevinpal/bitcoin#231 (Closed)

4 participants
