Improve runtime performance of --reindex
Background
During the first part of reindexing, LoadExternalBlockFile() sequentially reads raw blocks from the blocks/blk00nnn.dat files (rather than receiving them from peers, as with initial block download) and eventually adds all of them to the block index. When an individual block is initially read, it can't be immediately added unless all its ancestors have been added, which is rare (only about 8% of the time), because the blocks are not sorted by height. When the block can't be immediately added to the block index, its disk location is saved in a map so it can be added later. When its parent is later added to the block index, LoadExternalBlockFile() reads and deserializes the block from disk a second time and adds it to the block index. Most blocks (92%) get deserialized twice.
This PR
During the initial read, it's rarely useful to deserialize the entire block; only the header is needed to determine if the block can be added to the block index immediately. This change to LoadExternalBlockFile() initially deserializes only a block's header, then deserializes the entire block only if it can be added immediately. This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.
Summary: The performance gain is the result of deserializing each block only once, except its header which is deserialized twice, but the header is only 80 bytes.
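Since the header is a fixed 80-byte structure (4-byte version, 32-byte prev hash, 32-byte merkle root, 4-byte time, 4-byte bits, 4-byte nonce), extracting hashPrevBlock is cheap. A minimal Python sketch of the idea (illustrative only; the PR itself uses the existing C++ CBlockHeader deserialization):

```python
import struct

# 4 version + 32 prev hash + 32 merkle root + 4 time + 4 bits + 4 nonce
HEADER_SIZE = 80

def prev_block_hash(raw: bytes) -> str:
    """Return hashPrevBlock from an 80-byte serialized block header.
    Only this field is needed to decide whether the block's parent has
    already been added to the block index."""
    assert len(raw) == HEADER_SIZE
    # the hash is stored little-endian on disk; reverse for display order
    return raw[4:36][::-1].hex()
```

The byte reversal reflects that hashes are stored little-endian on disk but conventionally displayed big-endian.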
Note: this comment is outdated; the scope of this PR has been reduced.
Running bitcoind --reindex causes the client to reread the .bitcoin/blocks/blknnnnn.dat files and re-derive all the index and chainstate information. It's similar to IBD except that the client uses these local files rather than downloading blocks from the network. As LoadExternalBlockFile() reads each block from disk, it checks whether its parent has already been seen (and processed). If so, then this block can be immediately processed (by giving it to AcceptBlock()). If this block's parent hasn't yet been seen, it cannot be processed until after its parent is seen. Its prev-hash (parent's hash) is put into a map of "pending" blocks (called mapBlocksUnknownParent). Later, when its parent is encountered and processed, this map is consulted to see if it has a child that's already been seen that can now be processed. If so, LoadExternalBlockFile() has to go back and re-read the child block from disk (ReadBlockFromDisk()), deserialize it again, and then process it (pass it to AcceptBlock()).
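The deferral logic described above can be sketched in a few lines of Python (hashes stand in for the disk positions that the real mapBlocksUnknownParent stores, and appending to `accepted` stands in for AcceptBlock()):

```python
GENESIS = "genesis"

def load_blocks(file_order):
    """Sketch of LoadExternalBlockFile's out-of-order handling.
    file_order is a list of (block_hash, prev_hash) pairs in the order
    they appear in a blk*.dat file; returns the acceptance order."""
    seen = {GENESIS}   # hashes already accepted into the index
    pending = {}       # prev_hash -> deferred children (mapBlocksUnknownParent)
    accepted = []
    for blk, prev in file_order:
        if prev not in seen:
            pending.setdefault(prev, []).append(blk)  # defer: parent unseen
            continue
        # accept this block, then any deferred descendants it unblocks
        queue = [blk]
        while queue:
            h = queue.pop()
            seen.add(h)
            accepted.append(h)
            queue.extend(pending.pop(h, []))
    return accepted
```

Note the inner loop: accepting one parent can unblock a whole chain of previously deferred descendants, and in the real code each of those requires a fresh ReadBlockFromDisk().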
Performance isn't very good because about 90% of blocks are read before their parents are read. (I think this is a result of the headers-first IBD performance enhancement of a few years ago.) This can be seen by running bitcoind --reindex --debug=reindex, causing many messages like:
2019-09-29T00:29:00Z LoadExternalBlockFile: Out of order block 00000000a2268fb2d4ddc0408fae961a96668db58de415d3c9e0a7694eeb1657, parent 000000005fb53da5c3cda78aef28cbc44fde3e435f1dc2ac04df53387049171f not known
then, a short time later:
2019-09-29T00:29:00Z LoadExternalBlockFile: Processing out of order child 00000000a2268fb2d4ddc0408fae961a96668db58de415d3c9e0a7694eeb1657 of 000000005fb53da5c3cda78aef28cbc44fde3e435f1dc2ac04df53387049171f
So 90% of blocks end up being read from disk twice (actually, not twice, since the operating system's buffer cache likely still has the data, but there is still a memory-to-memory copy), and deserialized twice.
This PR mitigates this problem by taking advantage of the fact that recent blocks are still in the buffer (blkdat) that is used to read from the blk files. Since rewinding to an earlier offset in the buffer has been fixed by the recently merged #16577, we can see if the child block is still in memory (the probability is increased by making the buffer somewhat larger), and if so, avoid having to read the child block data from disk. It turns out that the child block is still in the buffer 60% of the time. This depends on some randomness that's introduced during IBD, so it may be different for you. (My IBD was done using all the default settings, nothing special.)
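As a toy model of what the rewind makes possible (not CBufferedFile's actual implementation; the names and sizes here are invented), consider a reader that pulls large chunks from its source and keeps a trailing window of bytes that can still be re-read from memory:

```python
import io

class BufferedReader:
    """Toy model of a chunked reader with bounded rewind: a recently
    passed child block can be re-read from memory instead of disk,
    as long as it hasn't fallen out of the retained window."""
    def __init__(self, source, chunk=4096, window=16384):
        self.source = source
        self.chunk = chunk      # bytes fetched per underlying read
        self.window = window    # trailing bytes kept rewindable
        self.buf = bytearray()  # retained bytes
        self.base = 0           # source offset of buf[0]
        self.pos = 0            # current read offset in the source

    def read(self, n):
        # pull chunks until the requested range is buffered
        while self.base + len(self.buf) < self.pos + n:
            data = self.source.read(self.chunk)
            if not data:
                raise EOFError("source exhausted")
            self.buf += data
            drop = len(self.buf) - self.window
            if drop > 0:        # discard bytes that left the window
                del self.buf[:drop]
                self.base += drop
        start = self.pos - self.base
        self.pos += n
        return bytes(self.buf[start:start + n])

    def rewind_to(self, offset):
        """True if offset is still buffered (a memory hit)."""
        if offset < self.base or offset > self.base + len(self.buf):
            return False        # miss: would need a real disk read
        self.pos = offset
        return True
```

A larger window keeps more already-read blocks rewindable, which is exactly the space-time knob discussed above.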
The second part of this change initially deserializes only the block header, rather than the entire block, since about 90% of the time we're going to have to process this block later; only the hashPrevBlock field of the header is needed to determine if we can process this block immediately (if this block's parent has been seen). This does mean that 10% of the time, we deserialize the header, and then immediately also the full block (where it would have been more efficient to simply deserialize the full block), but overall this is a big performance win.
On my system, this PR reduces the time to --reindex on mainnet by 48 minutes. I encourage reviewers to attempt to replicate my findings. Reindexing has two parts, reading the blocks/blknnnnn.dat files (which this PR improves), and then connecting the blocks to the blockchain (the UpdateTip messages), which this PR doesn't affect. For me, this PR reduces the first part from 123 to 75 minutes. After you start seeing UpdateTip messages, you can stop the client, and you will have the time measurement you need from debug.log.
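For reviewers replicating this: one way to get the measurement without watching the log live is a small script over debug.log (the message texts and timestamp format are taken from the excerpts in this thread):

```python
from datetime import datetime

def phase1_duration(debug_log_lines):
    """Time between the first 'Reindexing block file' message and
    'Reindexing finished', using the leading ISO-8601 timestamps
    that bitcoind writes to debug.log."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = end = None
    for line in debug_log_lines:
        if start is None and "Reindexing block file" in line:
            start = datetime.strptime(line.split()[0], fmt)
        elif "Reindexing finished" in line:
            end = datetime.strptime(line.split()[0], fmt)
    return end - start
```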
Thanks for the suggestion, @promag, done (force-pushed).
I forgot to update the CBufferedFile unit tests, just did that and another force-push.
I changed this PR to WIP because I thought of a further improvement, which I'll try to push by the end of the week. It will build on the existing two commits, so reviewing those wouldn't be a waste of time. Thanks in advance for any reviews and suggestions.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
No conflicts as of last run.
Testnet performance isn't important, but as a sanity check, I verified that the new code runs faster there too on my laptop. The time between Reindexing block file blk00001.dat... and Reindexing finished reduced from 10m21s to 7m13s.
@mzumsande, those are very good thoughts, thank you. I will study this further (and invite others to also). One nice thing about this design is that any amount of memory will work, in that it will be an improvement -- it is one of those classic space-time tradeoffs. The idea of making the buffer not have a hard-coded size is intriguing and promising. I think we would not go back to the pre-second iteration (multiple-files), because even reducing the buffer memory to the original amount would be improved by the multi-file version. (That is to say, there is no downside to the latest version here.) That's because all the multi-file commit means is that we do not discard cached data when switching files.
One thing I hope I made clear enough in an earlier comment is that I tried to keep the memory usage during reindexing be strictly less than what is needed during normal operation, so that there would be no overall increase in memory demand. We certainly would not want this design to create a spike (high water mark) of memory usage. But I was very likely mistaken, since I was not aware that memory could be "dialed down" for small environments.
I force-pushed a reorganization of the commits to make reviewing easier. I split the largest commit, speed up reindex by reading out-of-order blocks from memory, into two separate commits:
- Commit message: initially deserialize only CBlockHeader, not CBlock. Deserialize only a block header (only 80 bytes) initially, instead of a full block. The block header is all that's needed to tell us if this block's parent has been seen yet; if not, the block will have to be deserialized later, after its parent is found. Nearly all blocks (89% on my system) are seen before their parents, so this is a big improvement.
- Commit message: speed up reindex by reading out-of-order blocks from memory. If we've just processed a block that's the parent of an earlier-seen block, we must now process that earlier block. Instead of always reading that block from disk, see if it happens to still be in the circular disk read memory buffer (CBufferedFile blkdat). This saves a disk read.
We could merge the first commit without the second. The first commit is pretty simple; the second commit is a little more involved.
These two commits leave the buffer size unchanged from master, 8 MB (2*MAX_BLOCK_SERIALIZED_SIZE), so there is no change in memory usage. But with this buffer size, only 29% of the out-of-order blocks are found in the buffer. If we increase the buffer modestly to 32 MB (an earlier commit, now erased by force-push, used 640 MB), the "hit rate" increases to 50%. This is probably a good space-time balance.
Here are the block-load times on my system (with all default settings), which is just the first part of reindexing (each row builds on the previous row):
| version | real time (minutes) |
|---|---|
| master | 130 |
| deserialize only header | 90 |
| look for blocks in buffer, 8 MB | 84 |
| look for blocks in buffer, 32 MB | 78 |
| look for blocks in buffer, 640 MB | 72 |
(Take with a grain of salt, because the times varied even on identical runs, and a lot probably depends on the specifics of my system.) Keep in mind that this buffer exists only during the block-load part of reindex.
Under what circumstances is a -reindex needed these days? Only when disabling -prune? Though, then you'll download from the network anyway.
@MarcoFalke Building new indices is required if upgrading from pre-0.17 to post-0.17 (https://bitcoin.org/en/release/v0.17.0) with a txindex.
If your node has a txindex, the txindex db will be migrated the first time you run 0.17.0 or newer, which may take up to a few hours. Your node will not be functional until this migration completes.
I haven't looked into whether this PR speeds up that process, but it is something to consider.
@MarcoFalke Building new indices is required if upgrading from pre-0.17 to post-0.17 (https://bitcoin.org/en/release/v0.17.0) with a txindex.
Isn't -reindex-chainstate enough in that case? I don't think that requires a rescan of the block files.
Rebased. I think running --reindex is also good when for whatever reason you don't trust your index files; for example, think files may be missing or may have been corrupted. It's true that in that case you could just re-IBD, but this is faster and less disruptive to peers.
I wasn't sure how much variation there might be in the runtimes I posted here earlier, so to get a few more data points, I added assert(0) immediately after printing Reindexing finished, and ran bitcoind --reindex several times in a shell loop, timing each run (using the standard time bash builtin). Here are the real times without this PR (which I reported as 130 minutes in the earlier comment):
real 131m52.120s
real 130m55.904s
real 133m20.911s
real 134m37.052s
real 146m17.564s
real 135m17.301s
and here are the times for runs with this PR (I ran this a few more times), this is what I reported as 84 minutes above:
real 79m55.050s
real 92m30.341s
real 82m44.119s
real 80m50.067s
real 89m31.793s
real 87m36.802s
real 97m42.247s
real 86m31.999s
real 87m13.304s
real 82m21.497s
Rebased to fix merge conflicts, force-pushed.
Rebased, force-pushed
Reviewers, please take a fresh look at the latest force-push; this PR is much better than before. I'll explain from scratch, so it's not necessary to read any of the previous comments in this PR.
The previous version of this PR included two separate changes (both affecting -reindex):
- initially deserializing only the header instead of the entire block
- when we find a block whose child has already been read, attempt to "rewind" in the stream memory buffer to access and process the child (instead of always reading it again from disk)
These should always have been separate PRs. This PR now contains only the first change, which is the more effective and localized. I'll make a separate PR for the second change if it turns out to be a significant improvement. This PR no longer has any effect on memory usage, which reviewers had concerns about.
To go over some basics again, for background: During -reindex, blocks are read from disk (the blocks/blk00???.dat files), deserialized, and processed. But a block can't be processed if its parent hasn't yet been seen (we know this only after deserializing the block). In that case, its disk location is saved in a map and we go on to the next block.
Whenever we do successfully process a block (we've got its parent), we check to see if its child has already been seen (is present in the map, looked up by block hash), and, if so, we read the child block from disk (again), deserialize it, and process it (and remove it from the map).
It turns out that, due to the "headers first" initial block download (IBD) algorithm, over 92% of blocks (on my system) are out of order, in the sense that 92% of the time when we first read a block, we haven't yet seen its parent. You can see these by specifying the -debug=reindex option (see the second comment in this PR).
What this PR does -- and now all it does -- is to initially deserialize only the header (80 bytes), not the entire block. From the header, we can tell if this block's parent has already been seen (by its prev-hash). If not (92% of the time), we skip over the rest of the block. So the savings are to not deserialize the entire block in the most common case where that's unnecessary.
It's true that if all blocks were in order, this would be slightly slower because we'd deserialize the header, discover that the block's parent is already known, then deserialize the full block. (There's no easy way to deserialize the rest of the block.)
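Put roughly in numbers (the average block size below is an assumed round figure; only the ratio matters):

```python
# Back-of-envelope model of deserialization work per block.
p_out_of_order = 0.92    # fraction of blocks read before their parent (from this PR)
full_block = 1_000_000   # assumed average serialized block size, bytes
header = 80

# master: full deserialize on first read, and again later if deferred
cost_before = (1 + p_out_of_order) * full_block

# this PR: one header deserialize plus exactly one full deserialize,
# whether or not the block was in order
cost_after = header + full_block
```

So the wasted header deserialization in the in-order case is 80 bytes against an avoided full-block deserialization in the common case, which is why the tradeoff is so one-sided.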
To test this change, you may want to apply this patch:
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -692,6 +692,7 @@ static void ThreadImport(std::vector<fs::path> vImportFiles)
pblocktree->WriteReindexing(false);
fReindex = false;
LogPrintf("Reindexing finished\n");
+ StartShutdown();
// To avoid ending up in a situation without genesis block, re-try initializing (no-op if reindexing worked):
LoadGenesisBlock(chainparams);
}
Then you can time the reindexing in isolation (without having to look at timestamps in the log file):
$ time src/bitcoind --reindex --maxconnections=0
Setting maxconnections to zero makes the timing less variable, since you can get very different peers each time.
When I run this, I consistently get 83 minutes with this PR's branch, and 125 minutes without, an improvement of 42 minutes.
Just to clarify, the last bullet point in the review club writeup:
Before this PR, we would always read entire blocks from disk into our buffer. If the predecessor of a block was not available, the blocks would have to be read again at a later point. This PR changes behavior such that initially only the 80 byte block header is read into the buffer, and if we can’t accept the block yet, the rest of the block is skipped for now
is slightly incorrect. Even with this PR, we always read entire blocks from disk (most of them twice); the disk read behavior before and after this PR is exactly the same. The difference is that with this PR, only the header is (initially) deserialized, rather than the entire block, which saves CPU.
The CBufferedFile object does somewhat large reads, around 4 MB (MAX_BLOCK_SERIALIZED_SIZE), each of which often includes multiple entire blocks and a partial block (because we don't know where the boundaries are; we don't know how much to read to get exactly a block or blocks). Recent blocks are close to full, so each disk read probably reads in only one or two blocks (plus a partial block).
But now I'm wondering if this PR should work as described in the review club comment. We could read only the 80-byte header from disk, and then, if we've seen its parent, read the rest of the block. Two disk reads are less efficient than one large read (a disk read has a fixed overhead and a length-proportional overhead), but that happens only about 8% of the time. If we haven't seen this block's parent (probability 92%), we could seek ahead in the file, instead of reading it into memory. But as explained earlier, there could be more than one block per large read, so we may end up exchanging one large read for 2 or 3 80-byte reads -- hard to say which is better. I'd recommend leaving this PR as is, because it's an (almost) unambiguous improvement on the status quo, and changing it to do 80-byte reads and seeks is more complex than this PR is currently. (The reason I say almost is because, if the blocks are already sorted, this ends up deserializing the header one extra time.)
I'm just now thinking, we could make reindex (part 1) screaming fast by storing the headers in separate "parallel" files in the blocks/ directory: blk01234.dat would be the same as now, but there would be a hdr01234.dat (similar to the way there's a rev01234.dat today) that would be that file's blocks' headers in the same order. Actually, that file would be a sequence of (header, file-offset, size) tuples. The file-offset is the file offset of that block's block in the same-named blk file. That's only about 100 MB of additional disk space (about 100 bytes times about 1M blocks; far less than 1% of the size of the blocks/ directory).
Then -reindex could do 4 MB reads, same as today except reading the hdr01234.dat files, bringing in 50k headers in a single read -- super-efficient. Then loop through each of those, and if the block can be accepted now (its parent has been seen, 8% case), then seek and read from the blk01234.dat file (we would know the exact seek offset and the size); that would be slower than today. But the 92% case would be much faster because we wouldn't be reading entire blocks into memory unnecessarily. This would be a significant reduction in the number of disk reads. (Definitely this would be a separate PR.) Opinions?
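Nothing like this exists yet, but to make the proposal concrete, one possible record layout for such a side file is a fixed 88 bytes per block (the struct format here is purely hypothetical):

```python
import struct

# Hypothetical record for a proposed hdr side file: the 80-byte block
# header, plus the block's offset and serialized size in the
# same-numbered blk file.
HDR_RECORD = struct.Struct("<80sII")   # 88 bytes per record

def pack_records(records):
    """records: iterable of (header_bytes, file_offset, block_size)."""
    return b"".join(HDR_RECORD.pack(h, off, size) for h, off, size in records)

def unpack_records(buf):
    """Inverse of pack_records: recover the list of tuples."""
    return [HDR_RECORD.unpack_from(buf, i)
            for i in range(0, len(buf), HDR_RECORD.size)]
```

Fixed-size records also mean the Nth header in a file could be found with a direct seek, without scanning.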
~Btw https://bitcoincore.reviews/16981.html notes that this fixes a bug as well. I would really like if this bug-fix was its own separate commit (followed by the commit doing the actual runtime performance improvement).~
This seems to be a misunderstanding. I don't think that this PR fixes a bug, the sentence in question referred to the older PR #16577 which is already merged.
OH. Yeah, that's what it says. I wasn't reading carefully enough.
Concept ACK, also reviewed the code and mostly satisfied. I think there is good potential for performance improvement here. I will wait with testing until the current comments are addressed.
Edit: I forgot to mention; this node has txindex enabled.
- [x] Benchmarks running on a50f956e304adc7c75428ec391f1a76a99c1ddde:
25.11s user 163.37s system 7% cpu 39:53.51 total
25.59s user 162.06s system 7% cpu 39:43.73 total
25.35s user 161.31s system 7% cpu 39:16.82 total
24.98s user 162.59s system 7% cpu 39:06.45 total
25.36s user 162.80s system 8% cpu 39:01.81 total
- [x] With SCHED_BATCH off:
24.64s user 155.84s system 8% cpu 35:31.20 total
25.22s user 156.17s system 8% cpu 36:08.64 total
25.31s user 156.53s system 8% cpu 36:44.35 total
25.08s user 156.75s system 8% cpu 37:30.11 total
24.97s user 156.26s system 8% cpu 37:15.24 total
- [x] Master (too slow; only ran one time, on 3d28c886f077ce22fb7755fe9ec1f4e08d3d4a62):
4261.85s user 151.57s system 66% cpu 1:49:57.76 total
@vasild, thanks for the timings, I like both of your suggestions (here and here) so I just force-pushed both of them. I also fixed the reindex functional test, which I hadn't noticed had been broken for a month.
Oops, I accidentally force-pushed a rebase onto latest master (because I was trying out your suggestion, which didn't merge cleanly without rebasing), sorry about that, reviewers.
ACK fbf708b8c
Compiled and tested locally.
I viewed the changes introduced in the latest force-push (0e27461 -> fbf708b8c) using:
diff -u <(git diff 0e27461~..0e27461) <(git diff fbf708b8c~..fbf708b8c)
Do you think it would be possible to demonstrate that it produces identical result as master? That is - reindex with master, dump the leveldb database from blocks/index in some textual form, reindex with this PR, dump again and compare the two dumps?
@vasild
Do you think it would be possible to demonstrate that it produces identical result as master? That is - reindex with master, dump the leveldb database from blocks/index in some textual form, reindex with this PR, dump again and compare the two dumps?
That's a good idea, but I can't find a tool to create such a dump. (Maybe that would be a nice addition to contrib/?)
I did think of a test to convince me that it's working:
I ran bitcoind --reindex --maxconnections=0 and let it finish reindexing. Not just the first part that this PR modifies, but the entire reindex procedure, to the point it had logged an UpdateTip of a very recent block. Then, of course, it couldn't add new blocks because it had no peers.
Then I restarted the client without any arguments, and it started syncing at that very recent block height (since now it had peers) and within about a minute was fully synced.
This convinces me because once before I'd purposely damaged the first part of reindexing by simulating a disk read error early in the process, which resulted in one block not being processed. The client could not load blocks from that point onward, because this missing block's child couldn't be processed (parent is required), so that child block's child couldn't be processed, and so on, recursively. The system very nicely recovered by going into IBD mode and getting all blocks following that missing block from peers and then running normally.
If this PR broke the reindexing process, the same would have happened (it would have started IBD-ing from when the bug introduced damage, probably very early). But it didn't. It picked up downloading from a very recent height, and accepted these blocks ("UpdateTip: new best=..."), proving that all previous blocks were recorded correctly in the block index.
What do you think, does that seem like a good test to you?
I had an idea for a further speedup, but it didn't work out as well as I had hoped. It's about 10% faster, but probably should not be merged because the added complexity isn't worth it. The branch is here (based on this PR's branch plus a single commit): https://github.com/LarryRuane/bitcoin/tree/reindex-speedup-thread
The idea is to move the delayed processing of child blocks (delayed because the parent wasn't yet available) into a separate thread. This overlaps the child processing with the reading of the block files using the CBufferedFile object and deserializing the header, which is a natural separation. If the times to do those two things were equal, this would have given a 2x improvement. But some debug printing shows that processing the child blocks takes the majority of the time (the parent thread ends up waiting for the child thread near the bottom of the function). And, unfortunately, the processing of child blocks can't be multithreaded, because each child depends on its parent. (If there was a long chain split, it may be possible to process the two chains in parallel, but those are very rare.)
Code looks good to me so far but I would like it more if the refactoring was in a separate commit and the commit message could go into more detail on how the performance improvement is achieved.
But I am currently also having trouble reproducing the performance improvement itself. I am comparing the current master (d8dfcea5d956126fe0e2b1f36db74c8ef805e7ab) with this PR, using testnet locally with no other special configuration. I am looking at the time span between Reindexing block file blk00001.dat and Leaving InitialBlockDownload, so this includes the entire reindex procedure.
My two runs with this PR were 1:10 h and 1:14 h while on master they were 1:05 h and 1:06 h. Any idea why that could be the case?
Thanks, @fjahr, good idea, I just force-pushed three commits to replace the previous single commit so the changes are separated out more clearly.
I also just updated the description, which was outdated.
I timed testnet and this PR was 4 minutes faster than latest master (with this PR based on latest master). On mainnet, it's more like 24 minutes. I think the difference is that testnet blocks are smaller and less complex on average, so the savings of deserializing only the header instead of the entire block is less. If you still have your debug.log files, how much time was there between reindexing the first block file and Reindexing finished? That's the only phase of reindexing that this PR changes. I suspect there is enough randomness in the entire reindex that the 4 minutes advantage on testnet can be lost in the noise.
One suggestion, if you could try it again, is to start bitcoind with the -maxconnections=0. I think each time you start bitcoind you can get a different set of peers, and different peers can put a different load on your node, so not having any peers reduces the variance, at least that's been my experience.
One suggestion, if you could try it again, is to start bitcoind with the -maxconnections=0. I think each time you start bitcoind you can get a different set of peers, and different peers can put a different load on your node, so not having any peers reduces the variance, at least that's been my experience.
Sorry, I should have said that I already set -maxconnections=0, I tried to replicate your setup described earlier. It wouldn't be the first time my laptop gets odd benchmark results (see https://github.com/bitcoin/bitcoin/pull/18352) but I don't see a direct connection.
Force-pushed and rebased to fix merge conflicts. This PR had momentum for a while but that's faded. Reviewers may be put off by the number of comments, but the result of all the discussion is a much simpler PR. I believe all reviewers' comments have been addressed. I did make sure that the description (first comment) is up-to-date and accurate.
I believe the tests may be the best aspect of this PR; see especially the feature_reindex.py functional test. It tests out-of-order block processing in LoadExternalBlockFile() which has never been (auto-) tested before. Even if all we merged from this PR is that test, it probably would be worth it. Thanks in advance for helping to get this PR over the hump.
The fixup commit ("fix CI failures, same changes as #18216") would be better squashed into whichever commit introduced the failures.
@vasild, good suggestion, done (squashed and force-pushed).
I re-ran the timing experiments again since a lot has changed. I started the client with maxconnections=0 (so it wouldn't be bothered by peers). I ran the experiments multiple times in a controlled environment, so the variation was very small. These are the times of the first phase of -reindex (reading the blk?????.dat files) on mainnet:
| version | Real time (minutes, seconds) | User CPU time | System CPU time |
|---|---|---|---|
| master | 133m35s | 116m11s | 4m28s |
| master + PR | 87m3s | 70m36s | 3m35s |
The real time with the PR (probably what people care about most) is 65% of the real time without this PR, or 46m32s faster.
Force-pushed rebase to eliminate merge conflict.
Force-push to fix comment (review suggestion), also took the opportunity to improve the test by checking for expected text in debug.log.
Running a -reindex benchmark on this; will report back with results.
Ran a pair of benchmark tests:
With PR: 1 hour, 8 minutes 5b1e92f539c24b6518f577b92874ccac0bbf617f
2020-07-24T22:41:21Z Reindexing block file blk00000.dat...
...
2020-07-24T23:49:24Z Reindexing block file blk02165.dat...
2020-07-24T23:49:27Z Loaded 126 blocks from external file in 2570ms
2020-07-24T23:49:27Z Reindexing finished
Without PR: 1 hour, 40 minutes HEAD~3 2c0c3f8e8ca6c64234b1861583f55da2bb385446
2020-07-25T00:29:27Z Reindexing block file blk00000.dat...
...
2020-07-25T02:10:48Z Reindexing block file blk02165.dat...
2020-07-25T02:10:52Z Loaded 126 blocks from external file in 3120ms
2020-07-25T02:10:52Z Reindexing finished
Nice job @LarryRuane !
Concept ACK.
ACK 5b1e92f539c24b6518f577b92874ccac0bbf617f
I did a fresh re-review of the code. I currently don't have access to the machine where I saw no speed-up before but I ran another test on a different machine and that confirms the benchmarks that others have seen here: 2:05h on current master vs. 1:22h with this PR.
Benchmarked and tested 5b1e92f539c24b6518f577b92874ccac0bbf617f on Linux Mint 20 (x86_64).
Setup:
- block height=640740
- recent block file: blk02168.dat
- command-line arguments: -nonetworkactive -reindex
master (40a04814d130dfc9131af3f568eb44533e2bcbfc)
2020-07-25T14:56:14Z [main] Bitcoin Core version v0.20.99.0-40a04814d (release build)
...
2020-07-25T14:56:14Z [loadblk] loadblk thread start
2020-07-25T14:56:14Z [loadblk] Reindexing block file blk00000.dat...
...
2020-07-25T16:29:02Z [loadblk] Reindexing block file blk02167.dat...
2020-07-25T16:29:03Z [loadblk] Loaded 2 blocks from external file in 1515ms
2020-07-25T16:29:03Z [loadblk] Reindexing block file blk02168.dat...
2020-07-25T16:29:14Z [loadblk] Loaded 837 blocks from external file in 10413ms
2020-07-25T16:29:14Z [loadblk] Reindexing finished
2020-07-25T16:29:14Z [loadblk] Pre-allocating up to position 0x100000 in rev00000.dat
2020-07-25T16:29:14Z [loadblk] UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09T02:54:25Z' progress=0.000000 cache=0.0MiB(1txo)
...
2020-07-25T21:00:01Z [loadblk] UpdateTip: new best=0000000000000000000f9ddd409d655a2e420c442a877813a7697289a84d4525 height=640740 version=0x37ffe000 log2_work=92.145733 tx=551982677 date='2020-07-25T14:51:51Z' progress=0.999868 cache=586.0MiB(4419378txo)
2020-07-25T21:00:01Z [loadblk] Leaving InitialBlockDownload (latching to false)
2020-07-25T21:00:01Z [loadblk] Imported mempool transactions from disk: 137 succeeded, 0 failed, 0 expired, 0 already there, 0 waiting for initial broadcast
2020-07-25T21:00:01Z [loadblk] loadblk thread exit
...
master + this PR
2020-07-25T21:14:58Z [main] Bitcoin Core version v0.20.99.0-b0a2130fb (release build)
...
2020-07-25T21:14:58Z [loadblk] loadblk thread start
2020-07-25T21:14:58Z [loadblk] Reindexing block file blk00000.dat...
...
2020-07-25T22:13:00Z [loadblk] Reindexing block file blk02167.dat...
2020-07-25T22:13:01Z [loadblk] Loaded 0 blocks from external file in 1233ms
2020-07-25T22:13:01Z [loadblk] Reindexing block file blk02168.dat...
2020-07-25T22:13:02Z [loadblk] Loaded 0 blocks from external file in 1210ms
2020-07-25T22:13:02Z [loadblk] Reindexing finished
2020-07-25T22:13:02Z [loadblk] Pre-allocating up to position 0x100000 in rev00000.dat
2020-07-25T22:13:02Z [loadblk] UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09T02:54:25Z' progress=0.000000 cache=0.0MiB(1txo)
...
2020-07-26T01:09:08Z [loadblk] UpdateTip: new best=000000000000000001ee925f0e1570c6b61990ac9777fd4a73983b623494e894 height=467466 version=0x20000002 log2_work=86.465712 tx=224507800 date='2017-05-21T16:50:34Z' progress=0.408111 cache=256.7MiB(1729948txo)
2020-07-26T01:09:09Z [loadblk] Imported mempool transactions from disk: 1 succeeded, 136 failed, 0 expired, 0 already there, 0 waiting for initial broadcast
2020-07-26T01:09:09Z [loadblk] loadblk thread exit
...
Please note some differences:
- log messages about the blk02168.dat reindexing: 837 vs 0 blocks
- block height at which the loadblk thread exits: 640740 vs 467466
- this PR log lacks the Leaving InitialBlockDownload (latching to false) log message
- Imported mempool transactions from disk transaction statistics
Also the resulted index sizes differ substantially:
- master (40a04814d130dfc9131af3f568eb44533e2bcbfc):
$ ls -l
total 168532
-rw------- 1 hebasto hebasto 83614454 Jul 26 15:59 000003.log
-rw------- 1 hebasto hebasto 27 Jul 26 15:59 000004.log
-rw------- 1 hebasto hebasto 88941405 Jul 26 15:59 000005.ldb
-rw------- 1 hebasto hebasto 16 Jul 26 14:15 CURRENT
-rw------- 1 hebasto hebasto 0 Jul 26 14:15 LOCK
-rw------- 1 hebasto hebasto 50 Jul 26 14:15 MANIFEST-000002
- master + this PR:
$ ls -l
total 122436
-rw------- 1 hebasto hebasto 60722874 Jul 26 14:07 000003.log
-rw------- 1 hebasto hebasto 27 Jul 26 14:07 000004.log
-rw------- 1 hebasto hebasto 64632267 Jul 26 14:07 000005.ldb
-rw------- 1 hebasto hebasto 16 Jul 26 13:09 CURRENT
-rw------- 1 hebasto hebasto 0 Jul 26 13:09 LOCK
-rw------- 1 hebasto hebasto 50 Jul 26 13:09 MANIFEST-000002
I found the root cause of all the differences in the logs:
- master (40a04814d130dfc9131af3f568eb44533e2bcbfc):
[loadblk] Reindexing block file blk00877.dat...
...
[loadblk] LoadExternalBlockFile: Processing out of order child 000000000000000001ee925f0e1570c6b61990ac9777fd4a73983b623494e894 of 000000000000000001d22f4d6a05420ba43ee2eb457efea06be8280f8a5f14d2
[loadblk] LoadExternalBlockFile: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error
...
[loadblk] Loaded 98 blocks from external file
[loadblk] Reindexing block file blk00878.dat...
...
- master + this PR
[loadblk] Reindexing block file blk00877.dat...
...
[loadblk] LoadExternalBlockFile: Processing out of order child 000000000000000001ee925f0e1570c6b61990ac9777fd4a73983b623494e894 of 000000000000000001d22f4d6a05420ba43ee2eb457efea06be8280f8a5f14d2
...
[loadblk] Loaded 68 blocks from external file
[loadblk] Reindexing block file blk00878.dat...
...
Is this change in error handling behavior expected?
I haven't looked closely at the changes here or the debug.log from these runs, but reindexing to 450,000 isn't showing much difference on my end between the HEAD of this branch and the mergebase.

@jamesob
I'll try to describe the problem better. The blocks/blk00877.dat file is somehow broken on my disk.
The master branch:
- encounters an error in `blkdat >> block`
- reports the error with the "Deserialize or I/O error..." message
- tries to read from the next position
This PR:
- sees no error during `blkdat >> header`
- a broken block (or just random bytes) is treated as a block with an unknown parent
An example with this PR:
$ ./src/bitcoind -loadblock=/home/hebasto/blk00877.dat.error
...
2020-07-26T17:19:28Z [loadblk] loadblk thread start
2020-07-26T17:19:28Z [loadblk] Importing blocks file /home/hebasto/blk00877.dat.error...
...
2020-07-26T17:19:28Z [loadblk] LoadExternalBlockFile: Out of order block eda32a9dea60fff9b0c4df31a425cd6d0e19b74f4c88d8f08a5b8f9e04265872, parent 01ee925f0e1570c6b61990ac9777fd4a73983b623494e89420000000000f3aee not known
...
Please note that the blocks
- `eda32a9dea60fff9b0c4df31a425cd6d0e19b74f4c88d8f08a5b8f9e04265872`, and
- `01ee925f0e1570c6b61990ac9777fd4a73983b623494e89420000000000f3aee`
just do not exist in the Universe yet :)
May I suggest a patch:
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -4719,6 +4719,7 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
blkdat.Skip(nBlockPos + nSize - blkdat.GetPos());
const uint256 hash = header.GetHash();
+ if (!CheckProofOfWork(hash, header.nBits, chainparams.GetConsensus())) continue;
{
LOCK(cs_main);
// Store positions of out of order blocks for later.
that makes loading blocks from files bullet-proof?
The excerpt from the log (master + this PR + patch):
...
2020-07-27T14:10:13Z [loadblk] loadblk thread start
2020-07-27T14:10:13Z [loadblk] Reindexing block file blk00000.dat...
...
2020-07-27T15:33:46Z [loadblk] Reindexing block file blk02167.dat...
2020-07-27T15:33:47Z [loadblk] Loaded 2 blocks from external file in 1282ms
2020-07-27T15:33:47Z [loadblk] Reindexing block file blk02168.dat...
2020-07-27T15:33:58Z [loadblk] Loaded 837 blocks from external file in 10813ms
2020-07-27T15:33:58Z [loadblk] Reindexing finished
...
It takes 84 min to build a new block index vs 93 min on master (see https://github.com/bitcoin/bitcoin/pull/16981#issuecomment-663951709), that is ~10% improvement.
@jamesob:
... reindexing to 450,000 isn't showing much difference on my end between the HEAD of this branch and the mergebase
That's strange. What does reindexing to 450,000 mean? I didn't know you could do that. Is it reindexing up to that height, or only beyond that height?
Are the debug.log files still available for those runs? I'd be curious to know when the message Reindexing finished appears relative to the start, because this PR should only affect that part of a -reindex run.
I think the benchmarks for this PR should additionally report the "Out of order block..." log message count. The lower this count, the lower the expected -reindex speedup.
@hebasto, thank you for doing all those tests. I did more testing (that's why I changed this PR to a Draft), and I'm now convinced there's no functionally important difference between master and this PR when there's data corruption in the blk.dat files. There can be a behavior and timing difference, but I don't think it matters. The behavior difference (as you have explained) is that if there's corruption in the middle of a block's serialization on disk, it's most likely not in the header, and because this PR only initially deserializes the header, the corruption isn't detected immediately. master deserializes the entire block, so it detects it immediately.
Assume there's corruption in the middle of a block (not in the header). This happens within LoadExternalBlockFile():
With master, the blkdat >> block throws an exception, this is caught and the catch handler prints "Deserialize or I/O error", and we move on to the next block. So it was as if that block didn't exist in the blk.dat files.
With this PR, the same thing happens if the block can be processed immediately (that is, if we've already seen its parent). Otherwise, an entry gets added to `mapBlocksUnknownParent` and we move on to the next block, and the corruption is undetected for now. A short time later, we reach this corrupt block's parent and then call ReadBlockFromDisk() on the corrupt block, which detects the corruption, logs the error, and returns false. LoadExternalBlockFile() just continues, so, again, it's as if this block didn't exist in the blk.dat files.
In both cases (with and without this PR), subsequent blocks no longer have AcceptBlock() called on them, because the chain is broken (the corrupt block is a missing link in the blockchain unless the block happens to not be part of the best chain, which is unlikely). All these blocks just get added to mapBlocksUnknownParent (and never removed and processed).
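The deferred-processing behavior described above can be modeled with a small sketch. This is an illustrative simplification, not the actual `validation.cpp` code: block names stand in for hashes, a vector index stands in for the disk position, and `ProcessInFileOrder` is an invented name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of LoadExternalBlockFile()'s out-of-order handling: accept a
// block only once its parent is known; otherwise remember it keyed by its
// parent's hash, and drain waiting descendants once the parent arrives.
struct SimBlock { std::string hash; std::string prev; };

std::vector<std::string> ProcessInFileOrder(const std::vector<SimBlock>& file,
                                            const std::string& genesis_parent)
{
    std::multimap<std::string, size_t> unknown_parent; // prev hash -> file index
    std::map<std::string, bool> have;                  // blocks already accepted
    std::vector<std::string> accepted;                 // acceptance order

    for (size_t i = 0; i < file.size(); ++i) {
        const SimBlock& b = file[i];
        if (b.prev != genesis_parent && !have.count(b.prev)) {
            unknown_parent.emplace(b.prev, i); // defer: parent not seen yet
            continue;
        }
        // Accept this block, then recursively drain any deferred children
        // (in the real code this is where blocks are re-read from disk).
        std::vector<std::string> queue{b.hash};
        have[b.hash] = true;
        accepted.push_back(b.hash);
        while (!queue.empty()) {
            const std::string parent = queue.back();
            queue.pop_back();
            auto range = unknown_parent.equal_range(parent);
            for (auto it = range.first; it != range.second; ++it) {
                const SimBlock& child = file[it->second];
                have[child.hash] = true;
                accepted.push_back(child.hash);
                queue.push_back(child.hash);
            }
            unknown_parent.erase(range.first, range.second);
        }
    }
    // A corrupt block behaves as if absent: its descendants stay in
    // unknown_parent forever, never accepted.
    return accepted;
}
```

For example, with file order {C(prev=B), B(prev=A), A(prev=genesis)}, the acceptance order is A, B, C; dropping B from the file leaves C stranded in `unknown_parent`, which mirrors the "Loaded 0 blocks" behavior once the chain is broken.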
Here's a patch to simulate corruption
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -4720,8 +4720,20 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
if (hash != chainparams.GetConsensus().hashGenesisBlock && !LookupBlockIndex(block.hashPrevBlock)) {
LogPrint(BCLog::REINDEX, "%s: Out of order block %s, parent %s not known\n", __func__, hash.ToString(),
block.hashPrevBlock.ToString());
- if (dbp)
- mapBlocksUnknownParent.insert(std::make_pair(block.hashPrevBlock, *dbp));
+ static bool docorrupt = true;
+ // simulate corruption in the first block that doesn't have a parent yet
+ if (dbp) {
+ if (docorrupt) {
+ // Simulate corruption in the first block that can't be processed
+ // immediately (that's when there's a difference between master
+ // and the PR) -- don't add an entry to mapBlocksUnknownParent.
+ docorrupt = false;
+ LogPrintf("%s: simulate corruption block %s, parent %s\n", __func__, hash.ToString(),
+ block.hashPrevBlock.ToString());
+ } else {
+ mapBlocksUnknownParent.insert(std::make_pair(block.hashPrevBlock, *dbp));
+ }
+ }
continue;
}
@@ -4778,13 +4790,13 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
}
}
} catch (const std::exception& e) {
- LogPrintf("%s: Deserialize or I/O error - %s\n", __func__, e.what());
+ LogPrintf("%s: Deserialize or I/O error - %s pos %s offset %d\n", __func__, e.what(), dbp->ToString(), blkdat.GetPos());
}
}
} catch (const std::runtime_error& e) {
AbortNode(std::string("System error: ") + e.what());
}
- LogPrintf("Loaded %i blocks from external file in %dms\n", nLoaded, GetTimeMillis() - nStart);
+ LogPrintf("Loaded %i blocks from external file in %dms mapBlocksUnknownParent size %i\n", nLoaded, GetTimeMillis() - nStart, mapBlocksUnknownParent.size());
}
void CChainState::CheckBlockIndex(const Consensus::Params& consensusParams)
(The final LogPrintf in the function also prints the number of entries in mapBlocksUnknownParent.)
Here's the output when I run this on master on testnet:
2020-07-30T19:55:07Z Reindexing block file blk00000.dat...
2020-07-30T19:55:11Z Loaded 753 blocks from external file in 3837ms mapBlocksUnknownParent size 85998
2020-07-30T19:55:11Z Reindexing block file blk00001.dat...
2020-07-30T19:55:14Z Loaded 0 blocks from external file in 2985ms mapBlocksUnknownParent size 139893
2020-07-30T19:55:14Z Reindexing block file blk00002.dat...
2020-07-30T19:55:17Z Loaded 0 blocks from external file in 3162ms mapBlocksUnknownParent size 202285
(...)
2020-07-30T20:05:43Z Reindexing block file blk00185.dat...
2020-07-30T20:05:47Z Loaded 0 blocks from external file in 3997ms mapBlocksUnknownParent size 1781217
It's always zero blocks loaded from then on. Then (either with this PR or master), it enters the ActivateBestChain phase, and it gets blocks from peers. So it automatically goes into IBD mode. When it gets near the tip, it finishes syncing, logs `InitialBlockDownload (latching to false)`, and then runs normally. I verified that if I restart the node, it's still synced and runs normally. This is both with and without the PR.
@LarryRuane
You described the scenario where a block is corrupted within a block file. In my situation (I don't know how it happened, though), a block file has random/broken bytes, but all the blocks within it are correct.
Therefore, with the suggested patch --reindex completes without getting blocks from peers.
a block file has random/broken bytes but all block are correct within it.
Can you explain more? Where exactly is the corruption?
TL;DR
I think the PR is good as it is, and it can and should be merged.
@hebasto, I understand exactly what's going on, and having that corrupted file really helped, thanks. Before explaining what I found, could you try a different fix? This will confirm my understanding. At this point in LoadExternalBlockFile():
// Skip the rest of this block; position to the marker before the next block.
nRewind = nBlockPos + nSize;
Just comment out that setting of nRewind. I expect that all the blocks will load (you won't see Loaded 0 blocks... messages).
Explanation
There's a bad block whose marker bytes, 0xf9, 0xbe, 0xb4, 0xd9, start at offset 0x3147657. The size field that follows is 0xf3afd = 998141, which is reasonable. This is the number of bytes of the serialized block. If you move ahead that many bytes in the stream, you should be at the marker bytes of the next block (or be at the end of the file). That should be 0x3147657 + 4 + 4 + 998141 = 0x323b15c. Looking at that part of the hex dump of the file, we see:
0323b150 ac ad 21 07 00 f9 be b4 d9 03 3b 0f 00 02 00 00 |..!.......;.....|
That's 16 bytes; offset 0xc should be where the next marker bytes start, but instead we see marker bytes starting at offset 0x5. These are the marker bytes for a legitimate block, and if we don't load it, all subsequent blocks will have no parent.
So, with the PR, what happens is: we deserialize just the header, and it's probably impossible for that to fail because it's just 80 bytes and everything is fixed-size. Its parent isn't found (of course, since its prevhash is garbage and doesn't even start with zeros), so we add an entry to mapBlocksUnknownParent[], and then skip ahead to where we think the next block's marker bytes start. But that's just beyond where the marker bytes actually are, so we miss that entire block. We get resynchronized (FindByte()) with the following block.
In contrast, with master, the attempt to deserialize the entire block fails, so we resynchronize starting at the byte after that bad block's first marker byte, and thus we do find and load this block.
The additional call to CheckProofOfWork() that @hebasto suggested acts here as a sanity-check on the block header, which this nonsense header fails, so then again we attempt to resynchronize starting at the byte after that bad block's first marker byte, and we find the next block.
So, a way to summarize the problem is that this PR makes the block size (that follows the marker bytes, nSize) more important than it is with master. On master, when we fail to deserialize the block, we go back to resynchronizing with the stream (looking for the marker bytes) and the block size isn't used.
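To make the difference concrete, here is a toy sketch of the two scanning strategies, using a 1-byte marker and a 1-byte size field instead of the real 4-byte message-start bytes and 4-byte length (the names and layout here are invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// After parsing a (possibly bogus) block at marker_pos, where do we look for
// the next block? Trusting the size field can jump past a real marker; a
// resync scan starting one byte later cannot.
constexpr uint8_t MARKER = 0xf9;

size_t NextPosTrustingSize(const std::vector<uint8_t>& buf, size_t marker_pos)
{
    // toy layout: marker (1 byte), size (1 byte), then `size` payload bytes
    const uint8_t size = buf[marker_pos + 1];
    return marker_pos + 2 + size;
}

size_t NextPosResyncing(const std::vector<uint8_t>& buf, size_t marker_pos)
{
    for (size_t i = marker_pos + 1; i < buf.size(); ++i) {
        if (buf[i] == MARKER) return i;
    }
    return buf.size(); // end of stream
}
```

With a junk "block" at offset 0 whose size byte claims 8 payload bytes but a genuine marker sitting at offset 4, NextPosTrustingSize jumps to offset 10 (past the real block), while NextPosResyncing lands on it at offset 4.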
One thing we can do to fix this in this PR is that, after deserializing just the header, if its parent hasn't been seen, then after adding it to mapBlocksUnknownParent, we could resynchronize with the stream, instead of just skipping ahead by the size of the block. We can do that by removing this line. But that would be much slower.
I coded a patch to speed up `FindByte()`
--- a/src/streams.h
+++ b/src/streams.h
@@ -851,12 +851,21 @@ public:
//! search for a given byte in the stream, and remain positioned on it
void FindByte(char ch) {
+ // start is the index into vchBuf[] at which to start searching
+ size_t start = nReadPos % vchBuf.size();
while (true) {
if (nReadPos == nSrcPos)
Fill();
- if (vchBuf[nReadPos % vchBuf.size()] == ch)
- break;
- nReadPos++;
+ size_t n = vchBuf.size() - start;
+ if (n > nSrcPos - nReadPos)
+ n = nSrcPos - nReadPos;
+ auto it_start = vchBuf.begin()+start;
+ auto it_find = std::find(it_start, it_start+n, ch);
+ size_t inc = it_find - it_start;
+ nReadPos += inc;
+ if (inc < n) break;
+ start += inc;
+ if (start >= vchBuf.size()) start = 0;
}
And this works, and is much faster (std::find() seems to be very efficient in this case).
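The idea behind the patch can be sketched as a standalone function: search the circular buffer segment by segment with `std::find` (which compilers typically lower to a fast `memchr`-style scan) instead of testing one byte per loop iteration. This is a simplified model of `CBufferedFile`, not the real class; it assumes the data fits the buffer's wrap-around invariant and, like the patched loop, that the target byte occurs before `src_pos`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// buf holds the stream bytes modulo buf.size(); read_pos and src_pos are
// logical stream positions (src_pos = number of bytes available). Returns
// the logical position of the first occurrence of ch at or after read_pos.
size_t FindByteCircular(const std::vector<char>& buf, size_t read_pos,
                        size_t src_pos, char ch)
{
    size_t start = read_pos % buf.size(); // index into buf to start searching
    while (read_pos < src_pos) {
        // Scan one contiguous segment: up to the buffer end or the data end.
        const size_t n = std::min(buf.size() - start, src_pos - read_pos);
        const auto it_start = buf.begin() + start;
        const auto it_find = std::find(it_start, it_start + n, ch);
        const size_t inc = static_cast<size_t>(it_find - it_start);
        read_pos += inc;
        if (inc < n) break;                // found within this segment
        start = (start + inc) % buf.size(); // wrap to the buffer start
    }
    return read_pos;
}
```

The segment split is the whole trick: each pass hands `std::find` the longest contiguous run, so the wrap-around costs at most one extra iteration.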
But on further reflection, I don't think we should do this, because this fixes only this specific kind of file corruption -- this problem can happen anyway. There can be all kinds of corruption that makes a block unserializable, and once that happens, all subsequent blocks cannot be loaded. But I have seen that when this occurs, the node is able to download blocks from peers and synchronize with the network anyway. It's true that it doesn't seem to correct damaged blk.dat files, but if that's worth doing, it should be for a separate PR.
But I'll do some benchmarking to see if this resynchronizing (with the faster FindByte()) is close to the PR as it is now, and if it's only a few percent slower, maybe it's worth doing. I'll give an update later.
I benchmarked this latest version, and it's only 3% slower than the PR version. This latest version has the faster FindByte() and also resynchronizes with the byte stream in the (very common) case where, after deserializing only the header, the parent hasn't yet been seen. This is instead of depending on the nSize value (moving ahead by this much). This makes reindexing more tolerant of various kinds of file corruption (as good as master). So I'll push that as a separate commit; I think the added robustness is worth it (and this is still much faster than master).
@LarryRuane
- The `FindByte()` speedup is great, but combining two different speed optimizations in a single PR is not a good idea. I'd suggest separating the `FindByte()` patch into a dedicated PR, which would allow measuring each optimization approach individually.
- Resynchronizing on each out-of-order block (commenting out `nRewind = nBlockPos + nSize;`) indeed improves robustness. Let's compare the two solutions:
  - resynchronizing on each header that fails `CheckProofOfWork()` (https://github.com/bitcoin/bitcoin/pull/16981#issuecomment-664474119) has a small constant-time (optimized hashing) overhead per block
  - resynchronizing on each out-of-order block has a much greater time overhead (read operations) per out-of-order block, and the number of such blocks can vary from user to user. Moreover, this approach inserts into `mapBlocksUnknownParent` a piece of junk that causes additional time overhead when searching through this map.

  Therefore, I'd strictly prefer the former approach.
- While touching the log message you could combine ideas from #19600.
@hebasto, that makes sense, I took your suggestion. Force-pushed (diff). The only slight change I made from your suggestion is this:
if (!CheckProofOfWork(hash, header.nBits, chainparams.GetConsensus())) continue;
{
LOCK(cs_main);
(...)
looks somewhat as if the large block of code that follows were inside the `if` statement (if you don't notice the `continue`). So I wrote the `continue` on a separate line.
@LarryRuane
Thanks!
Two last commits with the same messages are confusing. Did you mean that they should be a single commit (that seems reasonable)?
Two last commits with the same messages are confusing
Yes, that was a mistake, thanks for catching that. Force-pushed (diff).
Was writing a review on e5724b5fde855b67ff500e21f11274f12b5c893a and noticed a fresh push. What is its goal?
@hebasto, you're too quick! Sorry, I confused myself with my lack of git skills... I just made a force-push to do a small cleanup (diff) and ended up doing some extra commits by mistake. I had noticed that the `-debug=reindex` in the python test was unneeded, because the test nodes are started with `-debug` (which turns on all debug logging).
I ran another round of benchmarks of a reindex up to block 600_000 on 84b995a9ef468e0badf7bc0378851ea8c698dfc7 ~(it says ibd on the graph, but it's a reindex).~
~There is a definite speedup but with a much wider range of peak memory usage:~ [edit: I got this wrong. Will repost with the proper results.]
@pinheadmz, @hebasto, I just added another commit, please review. I discovered an additional minor performance improvement in the case where the block is already present in the block index (the message Block Import: already had block ... appears). This happens when you start with -reindex but the client stops before completing, and you start again without -reindex. I didn't attempt to measure the performance difference. I recommend ignoring whitespace because I took the chance to fix some broken indentation since I'm changing some of those exact lines.
@LarryRuane top commit makes sense to me, no point in reading the block data if we already have it. I'm running another reindex now to test it out, and will also try to get an "already had..." message. Also digesting the suggestion by @hebasto to skip corrupted block headers. Am able to modify the test by adding "corrupt" bytes and make sure the process just skips the block instead of completely borking.
The latest commit 969366d07c8cfdf8ea56cdb2ae11f5c648152345 looks good, but it adds another behavior change into this PR. I think it deserves its own PR to get the measurable effects from each improvement.
I see it differently. In my view, there's only one behavior change, which is "don't deserialize the full block unless necessary." If I had initially made a single commit that's the same as what are now two commits, no one would have said this is two behavior changes. What happened instead is that I simply missed one of the code paths in which it's not necessary to deserialize the entire block (only the header is needed). I made a second, separate commit to fix that only because some reviewers may have already seen the first commit. These two commits should probably be squashed. I see no reason to measure the performance of the two commits separately. It's clear that if one of these commits is worth merging, the other one is too.
Concept ACK
Rebased, no changes except to improve a few of the commit messages.
Rebased to resolve merge conflicts.
Force-pushed rebase for merge conflicts, and also, @hebasto, I had added a call to CheckProofOfWork() as you suggested in this comment above, but I looked more carefully and found that this function is already called as part of importing each block (reindexing), so I removed that call as part of this rebase. It would help performance only if many blocks failed this check, but very few, if any, do.
Rebased (no changes to this PR's code), there were no conflicts but just because some people would like to review, nice to have a fresh base.
I tested the performance of this PR on a Raspberry Pi 4 (RPi4, a myNode), and this PR reduces the reindex real time by 7 hours (from 28h15m down to 21h15m) for the first phase of reindex (reading the blocks from disk into the block index). This is, admittedly, a best-case comparison, because the RPi has a slow CPU, so not deserializing most blocks twice is a significant win.
One further update on performance measurement: most of the -reindex time is spent in the second phase, during which the node is building the best chain (the UpdateTip: new best= log messages), and this PR doesn't change that part of the reindex process. This myNode required 4 days and 15 hours to complete the entire reindex (both phases). The 7-hour improvement is, of course, a much smaller fraction of this time, about 6%. That's not as dramatic but still significant.
The reindex time was probably affected by the following configuration settings (which are the default for myNode, but I don't know what all these do):
txindex=1
dbcache=1000
maxorphantx=10
maxmempool=250
maxconnections=40
maxuploadtarget=1000
Enabling txindex definitely slows down reindex, so without that option, the percentage improvement from this PR would increase slightly (but I don't know by how much). Tor was also enabled, but I don't think that affects reindex time.
I recall that IBD on this myNode took 2 or 3 weeks (I wish I had measured it, I can do that if anyone's interested), so, at least on this hardware, doing a reindex is a much better option than a full IBD, to minimize both elapsed time and network usage.
tACK f8f9ba31f
I finished my performance review, and the results seem to align with Larry's findings:
The test setup is as follows: on each machine, I ran the first part of the reindex multiple times consecutively to also measure the stability of the timings. This set of reindexes was run both on the bitcoin:master branch as well as on LarryRuane:reindex-speedup f8f9ba31f.
On my laptop, reindexing took 1h43m on average (n=3) on the master branch. On f8f9ba31f, the average (n=3) was only 1h13m, a decrease of 29%.
On my Raspberry Pi 4, reindexing took 18h46m on average (n=2) on the master branch. On f8f9ba31f, the average (n=2) was only 13h27m, a decrease of 28%.
Across all these runs, the maximum runtime difference on each setup was < 4%, so even though sample size is pretty small, the results look quite reliable.
Force-pushed to implement review suggestion to not hold cs_main while deserializing the block, also clean up the commits, ready for review.
Rebased to latest master to fix CI failure.
Checking the reindex performance of the latest commit 7da6497bd7f0eaf6d20a1d0f83a2c849f3ba344f, on my Dell XPS-15 laptop, mainnet, default configuration except txindex is enabled. I also specified -networkactive=0 to eliminate any possible measurement noise caused by background networking activity (reindexing does not require any networking).
The result is that the reduction in reindex time with this PR is 82 minutes (from 362 minutes on master, down to 280 minutes with the PR for the first part of reindexing, adding all blocks to the index).
I also measured CPU time, and it's 81 minutes lower with this PR, which makes sense since this PR is a CPU-time improvement.
@hebasto, @fjahr, @vasild, @kallewoof, @pinheadmz, @stickies-v , would you mind taking another look and re-ACK if appropriate? Thanks.
Force-pushed dd83d9e to fix CI failure with 5a9387d, which had a hidden merge conflict with #22932. This actually reduced the size and complexity of this patch.
Concept ACK
Force-pushed to resolve hidden conflict, and make the functional test less brittle (not rely on hardcoded constants).
Force-pushed rebase due to conflict with #25571
@stickies-v, thanks for the review, I decided to force-push all your suggestions. This force-push also includes a small refactor (adds the FillBufferUpToLength() to eliminate some duplicated code). Just to make sure the performance improvement still exists, I reindexed testnet3 (it's much faster than mainnet), and with this PR in its current form, the time to read the block files (set up the block index) is 8 minutes, and without this PR, the time is 13 minutes.
Force-pushed again to fix (I hope) CI failure (ARM (32-bit), where size_t is 32 bits instead of 64).
Force-pushed for review suggestions, thanks!
Force-pushed to fix CI failures
Force-pushed to address review comments (thank you, @stickies-v), ready for review.
🐙 This pull request conflicts with the target branch and needs rebase.