refactor: don't zero-after-free during serialization by l0rinc · Pull Request #31 · l0rinc/bitcoin

l0rinc · 2025-08-26T05:21:02Z

for compiler in gcc clang; do   if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion);   else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi &&   echo "> Compiler: $compiler $COMP_VER" &&   for commit in d2d3d65e9c1fdc69ae758356de32bdb02879d081 d324342ce93a433499146a86b09a2f69e405a7c1; do     git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' &&     rm -rf build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 &&     cmake --build build -j$(nproc) >/dev/null 2>&1 &&     for i in $(seq 0 5); do       build/bin/bench_bitcoin -filter='DataStreamInitAndFree' -min-time=10000;     done;   done; done                                                                                                                                                        
root@rpi5-4:/mnt/my_storage/bitcoin# for compiler in gcc clang; do   if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion);   else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi &&   echo "> Compiler: $compiler $COMP_VER" &&   for commit in d2d3d65e9c1fdc69ae758356de32bdb02879d081 d324342ce93a433499146a86b09a2f69e405a7c1; do     git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' &&     rm -rf build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 &&     cmake --build build -j$(nproc) >/dev/null 2>&1 &&     for i in $(seq 0 5); do       build/bin/bench_bitcoin -filter='DataStreamInitAndFree' -min-time=10000;     done;   done; done

Compiler: gcc 15.0.1

d2d3d65 bench/streams: replace DataStream dummy with lightweight SpanReader

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
732,457.24	1,365.27	0.4%	5,308,918.02	1,753,168.00	3.028	1,065,061.00	0.0%	10.64	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
730,263.54	1,369.37	0.1%	5,308,918.02	1,748,311.58	3.037	1,065,061.00	0.0%	10.65	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
729,825.82	1,370.19	0.0%	5,308,918.02	1,747,167.10	3.039	1,065,061.00	0.0%	10.69	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
729,934.13	1,369.99	0.1%	5,308,918.02	1,747,514.21	3.038	1,065,061.00	0.0%	10.67	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
731,197.06	1,367.62	0.2%	5,308,918.02	1,750,386.35	3.033	1,065,061.00	0.0%	10.73	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
730,506.77	1,368.91	0.1%	5,308,918.02	1,748,645.94	3.036	1,065,061.00	0.0%	10.69	`DataStreamInitAndFree`

d324342 serialization: stop zeroing DataStream bytes on destruction

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
66,467.06	15,045.05	5.1%	131,532.00	159,121.33	0.827	16,478.00	0.0%	9.85	〰️ `DataStreamInitAndFree` (Unstable with ~12,963.4 iters. Increase `minEpochIterations` to e.g. 129634)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
68,677.37	14,560.84	3.9%	131,532.00	164,407.92	0.800	16,478.00	0.0%	10.48	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
63,976.19	15,630.82	5.3%	131,532.00	153,136.95	0.859	16,478.00	0.0%	10.43	〰️ `DataStreamInitAndFree` (Unstable with ~14,449.4 iters. Increase `minEpochIterations` to e.g. 144494)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
63,321.76	15,792.36	5.4%	131,532.00	151,607.55	0.868	16,478.00	0.0%	10.55	〰️ `DataStreamInitAndFree` (Unstable with ~14,854.8 iters. Increase `minEpochIterations` to e.g. 148548)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
62,451.38	16,012.46	5.2%	131,532.00	149,514.53	0.880	16,478.00	0.0%	10.82	〰️ `DataStreamInitAndFree` (Unstable with ~15,301.9 iters. Increase `minEpochIterations` to e.g. 153019)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
69,387.16	14,411.89	6.4%	131,532.00	166,094.90	0.792	16,478.00	0.0%	10.04	〰️ `DataStreamInitAndFree` (Unstable with ~13,136.5 iters. Increase `minEpochIterations` to e.g. 131365)

Compiler: clang 22.0.0

d2d3d65 bench/streams: replace DataStream dummy with lightweight SpanReader

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
94,580.75	10,572.98	3.9%	197,121.00	226,433.31	0.871	32,871.00	0.0%	11.16	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
90,958.82	10,993.99	5.1%	197,121.00	217,736.99	0.905	32,871.00	0.0%	11.25	〰️ `DataStreamInitAndFree` (Unstable with ~11,154.2 iters. Increase `minEpochIterations` to e.g. 111542)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
92,194.76	10,846.60	4.9%	197,121.00	220,689.26	0.893	32,871.00	0.0%	11.30	`DataStreamInitAndFree`

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
91,627.27	10,913.78	5.5%	197,121.00	219,359.11	0.899	32,871.00	0.0%	11.34	〰️ `DataStreamInitAndFree` (Unstable with ~10,820.9 iters. Increase `minEpochIterations` to e.g. 108209)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
87,235.61	11,463.21	5.2%	197,121.00	208,857.27	0.944	32,871.00	0.0%	11.73	〰️ `DataStreamInitAndFree` (Unstable with ~11,321.9 iters. Increase `minEpochIterations` to e.g. 113219)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
92,227.82	10,842.72	3.2%	197,121.00	220,806.18	0.893	32,871.00	0.0%	11.07	`DataStreamInitAndFree`

d324342 serialization: stop zeroing DataStream bytes on destruction

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
66,765.55	14,977.78	6.5%	131,533.00	159,818.15	0.823	16,478.00	0.0%	11.11	〰️ `DataStreamInitAndFree` (Unstable with ~15,176.2 iters. Increase `minEpochIterations` to e.g. 151762)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
65,376.58	15,296.00	5.3%	131,533.00	156,508.58	0.840	16,478.00	0.0%	11.28	〰️ `DataStreamInitAndFree` (Unstable with ~15,491.1 iters. Increase `minEpochIterations` to e.g. 154911)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
63,715.23	15,694.84	5.8%	131,533.00	152,564.98	0.862	16,478.00	0.0%	11.02	〰️ `DataStreamInitAndFree` (Unstable with ~15,204.4 iters. Increase `minEpochIterations` to e.g. 152044)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
61,714.34	16,203.69	5.2%	131,533.00	147,748.42	0.890	16,478.00	0.0%	10.97	〰️ `DataStreamInitAndFree` (Unstable with ~16,357.8 iters. Increase `minEpochIterations` to e.g. 163578)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
58,684.16	17,040.37	10.7%	131,533.00	140,520.65	0.936	16,478.00	0.0%	11.81	〰️ `DataStreamInitAndFree` (Unstable with ~17,445.5 iters. Increase `minEpochIterations` to e.g. 174455)

ns/MB	MB/s	err%	ins/MB	cyc/MB	IPC	bra/MB	miss%	total	benchmark
64,245.00	15,565.41	5.5%	131,533.00	153,693.39	0.856	16,478.00	0.0%	10.88

Problem: Block benches used `DataStream::Rewind()` to reuse the same input, but `::read()`/`::ignore()` clear the buffer when it is fully consumed, making a rewind after full deserialization observe an empty stream making the benchmarks useless. A dummy write was added originally to prevent this compaction, but was easy to misread as referring to `DataStream::Compact()` - which was in fact unused. Fixes: * Change the deserialize bench inputs to use a lightweight `SpanReader` instead and can be recreated each iteration instead of rewinding. * Remove unused DataStream::Compact(). * Leave `DataStream::Rewind()` usage only in `PrevectorDeserialize` for now (refactored in a follow-up commit). * Deduplicate and document the clear-on-fully-consumed behavior in `::read()`/`::ignore()` by calling clear(). Since the serialization benchmarks avoid `DataStream` (and implicitly `SerializeData` and `zero_after_free_allocator`) after this change, a dedicated benchmark was added to measure allocation and deallocation. | ns/MB | MB/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 17,099.33 | 58,481.82 | 0.4% | 10.94 | `DataStreamInitAndFree`

By integrating rewinding with the other methods we can also use it for non-benchmark code as well. The surrounding code was reformatted for consistency.

`DataStream` is used for non-secret data like P2P messages, block serialization, and database keys/values. Using `zero_after_free_allocator` for this data has been acknowledged as unnecessary since at least 2015 [1]: "network data should never be sensitive" - sipa "i'd prefer if we were more conservative with where that was used. for non-sensitive data, it just seems wasteful" - cfields Benchmarking suggests this barrier may also inhibit other compiler optimizations. For actual secrets, `secure_allocator` should be used instead, which both zeros memory and calls mlock() to prevent paging to disk. Use `std::vector<std::byte>` for `DataStream` storage instead of `SerializeData`. Code that still relies on zeroing now includes `zeroafterfree.h` directly (wallet migration code). This improves `DataStreamInitAndFree` by about 33%: | ns/MB | MB/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 12,877.39 | 77,655.49 | 0.6% | 10.94 | `DataStreamInitAndFree` [1] IRC #bitcoin-core-dev 2015-11-06, 2016-11-22 Co-authored-by: David Gumberg <davidzgumberg@gmail.com>

l0rinc · 2026-01-01T12:10:09Z

Weird, actual reindex-chainstate shows a serious slowdown for some reason:

COMMITS="2bcb3f64648ad27d4df10013b5bf0441124f1372 d324342ce93a433499146a86b09a2f69e405a7c1"; \
STOP=930000; DBCACHE=450; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
(echo "" && echo "IBD | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
hyperfine \
  --sort command \
  --runs 2 \
  --export-json "$BASE_DIR/ibd-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall -9 bitcoind 2>/dev/null; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
    cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo && ninja -C build bitcoind -j2 && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 20" \
  --conclude "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log && \
             grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -blocksonly -printtoconsole=0"

2bcb3f6464 Merge bitcoin/bitcoin#34112: rpc: [mempool] Remove erroneous Univalue integral casts
d324342ce9 serialization: stop zeroing `DataStream` bytes on destruction

IBD | 930000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
  Time (mean ± σ):     32809.269 s ± 825.872 s    [User: 59825.552 s, System: 4231.481 s]
  Range (min … max):   32225.290 s … 33393.249 s    2 runs
 
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)
  Time (mean ± σ):     34284.192 s ± 288.254 s    [User: 58633.844 s, System: 3870.922 s]
  Range (min … max):   34080.366 s … 34488.018 s    2 runs
 
Relative speed comparison
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
        1.04 ±  0.03  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -blocksonly -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)

and

for DBCACHE in 450; do \
  COMMITS="2bcb3f64648ad27d4df10013b5bf0441124f1372 d324342ce93a433499146a86b09a2f69e405a7c1"; \
  STOP=930000; \
  CC=gcc; CXX=g++; \
  BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
  (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
  (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\                                                                                                                  awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
  hyperfine \
    --sort command \
    --runs 1 \
    --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall -9 bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
      cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20; rm -f $DATA_DIR/debug.log" \
    --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0";
done

2bcb3f6464 Merge bitcoin/bitcoin#34112: rpc: [mempool] Remove erroneous Univalue integral casts
d324342ce9 serialization: stop zeroing `DataStream` bytes on destruction

reindex-chainstate | 930000 blocks | dbcache 450 | rpi5-16-2 | aarch64 | Cortex-A76 | 4 cores | 15Gi RAM | ext4 | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
  Time (abs ≡):        45238.631 s               [User: 82907.155 s, System: 7714.545 s]
 
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)
  Time (abs ≡):        45439.667 s               [User: 83052.216 s, System: 7648.153 s]
 
Relative speed comparison
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2bcb3f64648ad27d4df10013b5bf0441124f1372)
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=930000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = d324342ce93a433499146a86b09a2f69e405a7c1)

l0rinc changed the title ~~refactor: done zero after free during serialization~~ refactor: don't zero after free during serialization Aug 26, 2025

l0rinc changed the title ~~refactor: don't zero after free during serialization~~ refactor: don't zero-after-free during serialization Aug 26, 2025

l0rinc and others added 3 commits December 30, 2025 15:08

refactor: simplify Rewind method calls

f219f82

By integrating rewinding with the other methods we can also use it for non-benchmark code as well. The surrounding code was reformatted for consistency.

l0rinc force-pushed the detached340 branch from c2b2e33 to d324342 Compare December 30, 2025 20:28

l0rinc closed this Dec 30, 2025

l0rinc reopened this Dec 30, 2025

l0rinc mentioned this pull request Feb 3, 2026

refactor: Use SpanReader over DataStream bitcoin/bitcoin#34483

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: don't zero-after-free during serialization#31

refactor: don't zero-after-free during serialization#31
l0rinc wants to merge 3 commits intomasterfrom
detached340

l0rinc commented Aug 26, 2025 •

edited

Loading

Uh oh!

l0rinc commented Jan 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

l0rinc commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

l0rinc commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

l0rinc commented Aug 26, 2025 •

edited

Loading

l0rinc commented Jan 1, 2026 •

edited

Loading