Skip to content

refactor: don't zero-after-free during serialization#31

Draft
l0rinc wants to merge 3 commits intomasterfrom
detached340
Draft

refactor: don't zero-after-free during serialization#31
l0rinc wants to merge 3 commits intomasterfrom
detached340

Conversation

@l0rinc
Copy link
Owner

@l0rinc l0rinc commented Aug 26, 2025

image
for compiler in gcc clang; do   if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion);   else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi &&   echo "> Compiler: $compiler $COMP_VER" &&   for commit in d2d3d65e9c1fdc69ae758356de32bdb02879d081 d324342ce93a433499146a86b09a2f69e405a7c1; do     git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' &&     rm -rf build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 &&     cmake --build build -j$(nproc) >/dev/null 2>&1 &&     for i in $(seq 0 5); do       build/bin/bench_bitcoin -filter='DataStreamInitAndFree' -min-time=10000;     done;   done; done                                                                                                                                                        
root@rpi5-4:/mnt/my_storage/bitcoin# for compiler in gcc clang; do   if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion);   else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi &&   echo "> Compiler: $compiler $COMP_VER" &&   for commit in d2d3d65e9c1fdc69ae758356de32bdb02879d081 d324342ce93a433499146a86b09a2f69e405a7c1; do     git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' &&     rm -rf build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 &&     cmake --build build -j$(nproc) >/dev/null 2>&1 &&     for i in $(seq 0 5); do       build/bin/bench_bitcoin -filter='DataStreamInitAndFree' -min-time=10000;     done;   done; done

Compiler: gcc 15.0.1

d2d3d65 bench/streams: replace DataStream dummy with lightweight SpanReader

ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
732,457.24 1,365.27 0.4% 5,308,918.02 1,753,168.00 3.028 1,065,061.00 0.0% 10.64 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
730,263.54 1,369.37 0.1% 5,308,918.02 1,748,311.58 3.037 1,065,061.00 0.0% 10.65 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
729,825.82 1,370.19 0.0% 5,308,918.02 1,747,167.10 3.039 1,065,061.00 0.0% 10.69 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
729,934.13 1,369.99 0.1% 5,308,918.02 1,747,514.21 3.038 1,065,061.00 0.0% 10.67 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
731,197.06 1,367.62 0.2% 5,308,918.02 1,750,386.35 3.033 1,065,061.00 0.0% 10.73 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
730,506.77 1,368.91 0.1% 5,308,918.02 1,748,645.94 3.036 1,065,061.00 0.0% 10.69 DataStreamInitAndFree

d324342 serialization: stop zeroing DataStream bytes on destruction

ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
66,467.06 15,045.05 5.1% 131,532.00 159,121.33 0.827 16,478.00 0.0% 9.85 〰️ DataStreamInitAndFree (Unstable with ~12,963.4 iters. Increase minEpochIterations to e.g. 129634)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
68,677.37 14,560.84 3.9% 131,532.00 164,407.92 0.800 16,478.00 0.0% 10.48 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
63,976.19 15,630.82 5.3% 131,532.00 153,136.95 0.859 16,478.00 0.0% 10.43 〰️ DataStreamInitAndFree (Unstable with ~14,449.4 iters. Increase minEpochIterations to e.g. 144494)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
63,321.76 15,792.36 5.4% 131,532.00 151,607.55 0.868 16,478.00 0.0% 10.55 〰️ DataStreamInitAndFree (Unstable with ~14,854.8 iters. Increase minEpochIterations to e.g. 148548)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
62,451.38 16,012.46 5.2% 131,532.00 149,514.53 0.880 16,478.00 0.0% 10.82 〰️ DataStreamInitAndFree (Unstable with ~15,301.9 iters. Increase minEpochIterations to e.g. 153019)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
69,387.16 14,411.89 6.4% 131,532.00 166,094.90 0.792 16,478.00 0.0% 10.04 〰️ DataStreamInitAndFree (Unstable with ~13,136.5 iters. Increase minEpochIterations to e.g. 131365)

Compiler: clang 22.0.0

d2d3d65 bench/streams: replace DataStream dummy with lightweight SpanReader

ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
94,580.75 10,572.98 3.9% 197,121.00 226,433.31 0.871 32,871.00 0.0% 11.16 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
90,958.82 10,993.99 5.1% 197,121.00 217,736.99 0.905 32,871.00 0.0% 11.25 〰️ DataStreamInitAndFree (Unstable with ~11,154.2 iters. Increase minEpochIterations to e.g. 111542)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
92,194.76 10,846.60 4.9% 197,121.00 220,689.26 0.893 32,871.00 0.0% 11.30 DataStreamInitAndFree
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
91,627.27 10,913.78 5.5% 197,121.00 219,359.11 0.899 32,871.00 0.0% 11.34 〰️ DataStreamInitAndFree (Unstable with ~10,820.9 iters. Increase minEpochIterations to e.g. 108209)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
87,235.61 11,463.21 5.2% 197,121.00 208,857.27 0.944 32,871.00 0.0% 11.73 〰️ DataStreamInitAndFree (Unstable with ~11,321.9 iters. Increase minEpochIterations to e.g. 113219)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
92,227.82 10,842.72 3.2% 197,121.00 220,806.18 0.893 32,871.00 0.0% 11.07 DataStreamInitAndFree

d324342 serialization: stop zeroing DataStream bytes on destruction

ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
66,765.55 14,977.78 6.5% 131,533.00 159,818.15 0.823 16,478.00 0.0% 11.11 〰️ DataStreamInitAndFree (Unstable with ~15,176.2 iters. Increase minEpochIterations to e.g. 151762)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
65,376.58 15,296.00 5.3% 131,533.00 156,508.58 0.840 16,478.00 0.0% 11.28 〰️ DataStreamInitAndFree (Unstable with ~15,491.1 iters. Increase minEpochIterations to e.g. 154911)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
63,715.23 15,694.84 5.8% 131,533.00 152,564.98 0.862 16,478.00 0.0% 11.02 〰️ DataStreamInitAndFree (Unstable with ~15,204.4 iters. Increase minEpochIterations to e.g. 152044)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
61,714.34 16,203.69 5.2% 131,533.00 147,748.42 0.890 16,478.00 0.0% 10.97 〰️ DataStreamInitAndFree (Unstable with ~16,357.8 iters. Increase minEpochIterations to e.g. 163578)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
58,684.16 17,040.37 10.7% 131,533.00 140,520.65 0.936 16,478.00 0.0% 11.81 〰️ DataStreamInitAndFree (Unstable with ~17,445.5 iters. Increase minEpochIterations to e.g. 174455)
ns/MB MB/s err% ins/MB cyc/MB IPC bra/MB miss% total benchmark
64,245.00 15,565.41 5.5% 131,533.00 153,693.39 0.856 16,478.00 0.0% 10.88

@l0rinc l0rinc changed the title refactor: done zero after free during serialization refactor: don't zero after free during serialization Aug 26, 2025
@l0rinc l0rinc changed the title refactor: don't zero after free during serialization refactor: don't zero-after-free during serialization Aug 26, 2025
l0rinc and others added 3 commits December 30, 2025 15:08
Problem:
Block benches used `DataStream::Rewind()` to reuse the same input, but `::read()`/`::ignore()` clear the buffer when it is fully consumed, making a rewind after full deserialization observe an empty stream making the benchmarks useless. A dummy write was added originally to prevent this compaction, but was easy to misread as referring to `DataStream::Compact()` - which was in fact unused.

Fixes:
* Change the deserialize bench inputs to use a lightweight `SpanReader` instead and can be recreated each iteration instead of rewinding.
* Remove unused DataStream::Compact().
* Leave `DataStream::Rewind()` usage only in `PrevectorDeserialize` for now (refactored in a follow-up commit).
* Deduplicate and document the clear-on-fully-consumed behavior in `::read()`/`::ignore()` by calling clear().

Since the serialization benchmarks avoid `DataStream` (and implicitly `SerializeData` and `zero_after_free_allocator`) after this change, a dedicated benchmark was added to measure allocation and deallocation.

|               ns/MB |                MB/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           17,099.33 |           58,481.82 |    0.4% |     10.94 | `DataStreamInitAndFree`
By integrating rewinding with the other methods we can also use it for non-benchmark code as well.

The surrounding code was reformatted for consistency.
`DataStream` is used for non-secret data like P2P messages, block
serialization, and database keys/values. Using `zero_after_free_allocator`
for this data has been acknowledged as unnecessary since at least 2015 [1]:

  "network data should never be sensitive" - sipa
  "i'd prefer if we were more conservative with where that was used.
   for non-sensitive data, it just seems wasteful" - cfields

Benchmarking suggests this barrier may also inhibit other compiler optimizations.

For actual secrets, `secure_allocator` should be used instead, which
both zeros memory and calls mlock() to prevent paging to disk.

Use `std::vector<std::byte>` for `DataStream` storage instead of
`SerializeData`. Code that still relies on zeroing now includes
`zeroafterfree.h` directly (wallet migration code).

This improves `DataStreamInitAndFree` by about 33%:

|               ns/MB |                MB/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           12,877.39 |           77,655.49 |    0.6% |     10.94 | `DataStreamInitAndFree`

[1] IRC #bitcoin-core-dev 2015-11-06, 2016-11-22

Co-authored-by: David Gumberg <davidzgumberg@gmail.com>
@l0rinc
Copy link
Owner Author

l0rinc commented Jan 1, 2026

Weird, actual reindex-chainstate shows a serious slowdown for some reason:

COMMITS="2bcb3f64648ad27d4df10013b5bf0441124f1372 d324342ce93a433499146a86b09a2f69e405a7c1"; \
STOP=930000; DBCACHE=450; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
(echo "" && echo "IBD | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
hyperfine \
  --sort command \
  --runs 2 \
  --export-json "$BASE_DIR/ibd-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall -9 bitcoind 2>/dev/null; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
    cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo && ninja -C build bitcoind -j2 && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 20" \
  --conclude "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log && \
             grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -blocksonly -printtoconsole=0"

2bcb3f6464 Merge bitcoin/bitcoin#34112: rpc: [mempool] Remove erroneous Univalue integral casts
d324342ce9 serialization: stop zeroing `DataStream` bytes on destruction

IBD | 930000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
  Time (mean ± σ):     32809.269 s ± 825.872 s    [User: 59825.552 s, System: 4231.481 s]
  Range (min … max):   32225.290 s … 33393.249 s    2 runs
 
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)
  Time (mean ± σ):     34284.192 s ± 288.254 s    [User: 58633.844 s, System: 3870.922 s]
  Range (min … max):   34080.366 s … 34488.018 s    2 runs
 
Relative speed comparison
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
        1.04 ±  0.03  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)

and

for DBCACHE in 450; do \
  COMMITS="2bcb3f64648ad27d4df10013b5bf0441124f1372 d324342ce93a433499146a86b09a2f69e405a7c1"; \
  STOP=930000; \
  CC=gcc; CXX=g++; \
  BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
  (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
  (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\                                                                                                                  awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
  hyperfine \
    --sort command \
    --runs 1 \
    --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall -9 bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
      cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20; rm -f $DATA_DIR/debug.log" \
    --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0";
done

2bcb3f6464 Merge bitcoin/bitcoin#34112: rpc: [mempool] Remove erroneous Univalue integral casts
d324342ce9 serialization: stop zeroing `DataStream` bytes on destruction

reindex-chainstate | 930000 blocks | dbcache 450 | rpi5-16-2 | aarch64 | Cortex-A76 | 4 cores | 15Gi RAM | ext4 | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
  Time (abs ≡):        45238.631 s               [User: 82907.155 s, System: 7714.545 s]
 
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)
  Time (abs ≡):        45439.667 s               [User: 83052.216 s, System: 7648.153 s]
 
Relative speed comparison
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant