-
Notifications
You must be signed in to change notification settings - Fork 88
Orphans sometimes have mismatched hashes in the db #942
Description
Blocks in the leveldb occasionally appear to have bad hashes. At startup many messages will appear in the log like this:
2021-05-14T03:40:58Z LoadBlockIndexGuts: Skipping Block 116: e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85 (status=2)- Block Hash does not match Index Key
[slightly modified the checked-in code to show block height]
It's unclear whether this is related to #692 (which I did trigger on occasion) and this is quite complex so I am filing a separate bug.
With some additional debugging in WriteBatchSync, a search over the database produces some more information when blocks enter this state:
2021-05-14T03:40:58Z (debug) WriteBatchSync: !!!! Search found bad block! 116 - Wrote: bf18646c65c8200e3c28a25fa7378070fbc70f95ccc646bbf8cf785f84112495, Read: e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85
Block indexes have a block hash cached in the index, and under certain circumstances, another hash of the index is created.
The cause of this difference is that orphaned blocks do not always have an updated mapAccumulatorHashes (all entries are set to the default 0)1, although they have a cached hash of what said map should be, in hashAccumulators. A hash can be retrieved in these ways:
CBlockIndex::GetBlockHashreturns a cached hash.
Lines 394 to 397 in b425596
uint256 GetBlockHash() const { return *phashBlock; } CBlockIndex::GetBlockHeaderconstructs a CBlockHeader object that will compute the hash. In doing this, it hashes the current accumulator map
Line 386 in b425596
block.hashAccumulators = SerializeHash(mapAccumulatorHashes); CDiskBlockIndex::GetBlockHashconstructs a header to compute the hash, and provides its cached version ofhashAccumulators.
Line 707 in b425596
block.hashAccumulators = hashAccumulators;
The problem here is inconsistency: GetBlockHash is not virtual, so its result depends on which class we consider the object to be while we read it--because one method recalculates hashAccumulators rather than use the saved value.
So, change 1: CDiskBlockIndex::GetBlockHash uses the cached hashAccumulators instead of rehashing (an empty map).
The issue persists but is a little clearer about what happens. From a regtest experiment:
2021-05-14T03:29:31Z UpdateTip: new best=28a4b2824e6212746424513c4b917b03cb49d34944714412be7b975069dcc952 height=116 typ
e=Sha256d version=0x31000000 tx=127 date='2021-05-14T03:29:07Z' cache=0.0MiB(1txo)
2021-05-14T03:29:31Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=199; Chain height=116
2021-05-14T03:29:31Z PruneStaleBlockIndexes: Erasing 0 of 82 irrelevant indexes. LastPrunedHeight now 0.
2021-05-14T03:29:31Z [default wallet] AddToWallet e9723950d5fe21c412ae7e342b52e3dbe9184a8db82b2fe9702030cb498cad45 new
2021-05-14T03:29:31Z [default wallet] keypool keep 2
2021-05-14T03:29:33Z UpdateTip: new best=a6076adc6b5eb783bc069adf7cd320e6ff711f7f3dd83f1bd088ef784416cf69 height=117 type=Sha256d version=0x31000000 tx=128 date='2021-05-14T03:29:31Z' cache=0.0MiB(2txo)
2021-05-14T03:29:33Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=200; Chain height=117
...
2021-05-14T03:31:01Z UpdateTip: new best=716fd96058bc67a094c2cb73df6c7127e5fba665a1dcbe4164b250a8c58e9255 height=123 type=PoS version=0x30000000 tx=136 date='2021-05-14T03:30:32Z' cache=0.0MiB(12txo)
2021-05-14T03:31:01Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=207; Chain height=123
2021-05-14T03:31:01Z PruneStaleBlockIndexes: Erasing 1 of 83 irrelevant indexes. LastPrunedHeight now 2.
2021-05-14T03:31:12Z UpdateTip: new best=00e3e2cb8aad309699b6c767b063b496a70fe849fde1856cfa9c7a3e8db82efb height=124 type=Sha256d version=0x31000000 tx=138 date='2021-05-14T03:31:02Z' cache=0.0MiB(14txo)
2021-05-14T03:31:12Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=207; Chain height=124
2021-05-14T03:31:12Z PruneStaleBlockIndexes: Erasing 1 of 82 irrelevant indexes. LastPrunedHeight now 3.
2021-05-14T03:31:12Z [default wallet] AddToWallet 02cd94b276169359253627290de0d7f108a69f4c98855bea322985169fcfc94b new
2021-05-14T03:31:12Z [default wallet] AddToWallet 19c632b8c0fd4be93f0f2c05d8fa42683b1f2beafd30ad07aa955943626cb860 update
2021-05-14T03:31:15Z UpdateTip: new best=716fd96058bc67a094c2cb73df6c7127e5fba665a1dcbe4164b250a8c58e9255 height=123 type=PoS version=0x30000000 tx=136 date='2021-05-14T03:30:32Z' cache=0.0MiB(13txo)
2021-05-14T03:31:15Z UpdateTip: new best=28706523b91a2bf1532fe58f32bc94e67436f7cea4e129ed00aae1cb80fdaab7 height=124 type=PoS version=0x30000000 tx=138 date='2021-05-14T03:31:04Z' cache=0.0MiB(15txo)
2021-05-14T03:31:15Z [default wallet] AddToWallet 02cd94b276169359253627290de0d7f108a69f4c98855bea322985169fcfc94b
2021-05-14T03:31:15Z [default wallet] AddToWallet 19c632b8c0fd4be93f0f2c05d8fa42683b1f2beafd30ad07aa955943626cb860 update
2021-05-14T03:31:15Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=207; Chain height=124
2021-05-14T03:31:15Z PruneStaleBlockIndexes: Erasing 0 of 82 irrelevant indexes. LastPrunedHeight now 3.
We reach height 116 + 117 without incident (this is very shortly after starting up, too), and then we continue to height 124, where the miner loses to the staker. We now have three orphan blocks (even though we didn't see it for 116 and 117).
When we sync the blocks to disk, debugging messages indicate 116 and 117 have an all-zero accumulator map and a hashAccumulators that disagrees2. 124 does not. All three are not in the active chain:
2021-05-14T03:32:39Z GetBlockHeader:445 WARNING index has empty accumulator map! block=CBlockIndex(pprev=0x7b5400087a00, nHeight=116, merkle=fcb0848297079344a6f40ce926b04ded479edfc0ac10c9e2c413f1c33597e988, hashBlock=bf18646c65c8200e3c28a25fa7378070fbc70f95ccc646bbf8cf785f84112495) hashAccum=d73c609e12078dc73ced3307cb070c5e24945bd2ba6c573e10448827ee324252
2021-05-14T03:32:39Z WriteBatchSync: Writing orphan block: height=116 hash=bf18646c65c8200e3c28a25fa7378070fbc70f95ccc646bbf8cf785f84112495 prev=01caa4fcfa305748acd9e67a5b1ab7aedc96c4f111083562ba8f438d62ae7291 exp=28a4b2824e6212746424513c4b917b03cb49d34944714412be7b975069dcc952
2021-05-14T03:32:39Z WriteBatchSync: Writing orphan block: height=124 hash=28706523b91a2bf1532fe58f32bc94e67436f7cea4e129ed00aae1cb80fdaab7 prev=716fd96058bc67a094c2cb73df6c7127e5fba665a1dcbe4164b250a8c58e9255 exp=00e3e2cb8aad309699b6c767b063b496a70fe849fde1856cfa9c7a3e8db82efb
2021-05-14T03:32:39Z GetBlockHeader:445 WARNING index has empty accumulator map! block=CBlockIndex(pprev=0x7b54000aff00, nHeight=117, merkle=ff83068c32b175237d783514282d1dc972382c6ee61a69d8cd3cb666c274461e, hashBlock=bfd3e3c47db9adbdfbe0c7f5d104f5694f828eccc4eac4b3ceb3489622b13844) hashAccum=d73c609e12078dc73ced3307cb070c5e24945bd2ba6c573e10448827ee324252
2021-05-14T03:32:39Z WriteBatchSync: Writing orphan block: height=117 hash=bfd3e3c47db9adbdfbe0c7f5d104f5694f828eccc4eac4b3ceb3489622b13844 prev=28a4b2824e6212746424513c4b917b03cb49d34944714412be7b975069dcc952 exp=a6076adc6b5eb783bc069adf7cd320e6ff711f7f3dd83f1bd088ef784416cf69
When we start up again, we hit this section of code, but only for orphan 116:
Lines 5673 to 5700 in b425596
| if (IsWitnessEnabled(pindexIter->pprev, params.GetConsensus()) && !(pindexIter->nStatus & BLOCK_OPT_WITNESS) && !chainActive.Contains(pindexIter)) { | |
| // Reduce validity | |
| pindexIter->nStatus = std::min<unsigned int>(pindexIter->nStatus & BLOCK_VALID_MASK, BLOCK_VALID_TREE) | (pindexIter->nStatus & ~BLOCK_VALID_MASK); | |
| // Remove have-data flags. | |
| pindexIter->nStatus &= ~(BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO); | |
| // Remove storage location. | |
| pindexIter->nFile = 0; | |
| pindexIter->nDataPos = 0; | |
| pindexIter->nUndoPos = 0; | |
| // Remove various other things | |
| pindexIter->nTx = 0; | |
| pindexIter->nChainTx = 0; | |
| pindexIter->nSequenceId = 0; | |
| // Make sure it gets written. | |
| setDirtyBlockIndex.insert(pindexIter); | |
| // Update indexes | |
| setBlockIndexCandidates.erase(pindexIter); | |
| std::pair<std::multimap<CBlockIndex*, CBlockIndex*>::iterator, std::multimap<CBlockIndex*, CBlockIndex*>::iterator> ret = mapBlocksUnlinked.equal_range(pindexIter->pprev); | |
| while (ret.first != ret.second) { | |
| if (ret.first->second == pindexIter) { | |
| mapBlocksUnlinked.erase(ret.first++); | |
| } else { | |
| ++ret.first; | |
| } | |
| } | |
| } else if (pindexIter->IsValid(BLOCK_VALID_TRANSACTIONS) && pindexIter->nChainTx) { | |
| setBlockIndexCandidates.insert(pindexIter); | |
| } |
2021-05-14T03:32:46Z init: Checking block index
2021-05-14T03:32:46Z GetBlockHeader:445 WARNING index has empty accumulator map! block=CBlockIndex(pprev=0x7b5400006900,
nHeight=116, merkle=fcb0848297079344a6f40ce926b04ded479edfc0ac10c9e2c413f1c33597e988, hashBlock=bf18646c65c8200e3c28a25
fa7378070fbc70f95ccc646bbf8cf785f84112495) hashAccum=0000000000000000000000000000000000000000000000000000000000000000
2021-05-14T03:32:46Z WriteBatchSync: Writing orphan block: height=116 hash=bf18646c65c8200e3c28a25fa7378070fbc70f95ccc64
6bbf8cf785f84112495 prev=01caa4fcfa305748acd9e67a5b1ab7aedc96c4f111083562ba8f438d62ae7291 exp=28a4b2824e6212746424513c4b917b03cb49d34944714412be7b975069dcc952
2021-05-14T03:32:46Z WriteBatchSync: Checking the writes
2021-05-14T03:32:46Z ToString:820 WARNING index has empty accumulator map! hashAccum=0000000000000000000000000000000000000000000000000000000000000000
2021-05-14T03:32:46Z (debug) WriteBatchSync: !!!! Read failed for CDiskBlockIndex(CBlockIndex(pprev=0x7b5400006900, nHeight=116, merkle=fcb0848297079344a6f40ce926b04ded479edfc0ac10c9e2c413f1c33597e988, hashBlock=bf18646c65c8200e3c28a25fa7378070fbc70f95ccc646bbf8cf785f84112495)
hashBlock=e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85, hashPrev=01caa4fcfa305748acd9e67a5b1ab7aedc96c4f111083562ba8f438d62ae7291): e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85
2021-05-14T03:32:46Z GetBlockHeader:445 WARNING index has empty accumulator map! block=CBlockIndex(pprev=0x7b5400006900, nHeight=116, merkle=fcb0848297079344a6f40ce926b04ded479edfc0ac10c9e2c413f1c33597e988, hashBlock=bf18646c65c8200e3c28a25fa7378070fbc70f95ccc646bbf8cf785f84112495) hashAccum=0000000000000000000000000000000000000000000000000000000000000000
2021-05-14T03:32:46Z WriteBatchSync: Checking the whole database
2021-05-14T03:32:46Z (debug) WriteBatchSync: !!!! Search found bad block! 116 - Wrote: bf18646c65c8200e3c28a25fa7378070fbc70f95ccc646bbf8cf785f84112495, Read: e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85
2021-05-14T03:32:46Z Block 116 accumulator data: Accumulators ZERO HASH
...
2021-05-14T03:32:51Z Syncing txindex with block chain from height 116
I'm not sure how it happened, but orphan 116's hashAccumulators has become 0 (which is not the hash of the zero accumulator map ZERO_HASH1), causing the CBlockIndex and the CDiskBlockIndex versions of the class to disagree on the block hash (the former caching the old value, the latter recalculating it including the hashAccumulators value of 0. In WriteBatchSync, additional reads for debugging iterate through the list of block indexes to verify against the database, and the reads attempted to read the block with the invalid has e4331ddb...
We also get the "Search found bad block" debug message when we shut down again. And on startup:
2021-05-14T03:35:39Z LoadBlockIndexGuts: Skipping Block 116: e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85 (status=2)- Block Hash does not match Index Key
These errors repeat on every run, with no further changes to the state of orphan 116.
No such debug messages or errors occur for 117 or 124. Eventually we get a second orphan in this state (208), but there is no mention of the other two orphans mentioned here, until they are pruned away:
2021-05-14T04:12:22Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=255; Chain height=237
2021-05-14T04:12:22Z PruneStaleBlockIndexes: Erasing 0 of 17 irrelevant indexes. LastPrunedHeight now 102.
2021-05-14T04:12:55Z UpdateTip: new best=fd61eeaa8fa4d0200082ee3a789d785e9db9645db3840c909c470ba5a03e4353 height=238 type=PoS version=0x38000001 tx=314 date='2021-05-14T04:12:45Z' cache=0.0MiB(45txo)
2021-05-14T04:12:55Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=256; Chain height=238
2021-05-14T04:12:55Z PruneStaleBlockIndexes: Erasing 1 of 17 irrelevant indexes. LastPrunedHeight now 117.
...
2021-05-14T04:16:48Z PruneStaleBlockIndexes: Checking for stale indexes. Block index size=261; Chain height=245
2021-05-14T04:16:48Z PruneStaleBlockIndexes: Erasing 1 of 15 irrelevant indexes. LastPrunedHeight now 124.
In the meanwhile, despite the pruning height passing it entirely, orphan 116 is still in the db:
2021-05-14T04:13:40Z LoadBlockIndexGuts: Skipping Block 116: e4331ddbe300b6eb008d4f0313e155bd477038beb6f2787048d422659f2add85 (status=2)- Block Hash does not match Index Key
Actions:
- Use the cached hash of the accumulator map in
CDiskBlockIndex::GetBlockHash. - Investigate where blocks are provided the accumulator map from their parent, and ensure orphans are given it as well.
- Determine how 116 became corrupt between the initial sync that wrote it to disk and the subsequent startup and
RewindBlockIndexspecial case.
[1] The hash of the empty/all-zero accumulator map is 92ec7c1f8c0355c09d57aa8ff3e903f8fac348cebcc82c2adb215d66771ad412. Easy to use as a quick comparison.
[2] Debugging added to CBlockIndex::GetBlockHeader:
439 if (SerializeHash(mapAccumulatorHashes) == ZERO_HASH) {
440 if (hashAccumulators == ZERO_HASH)
441 LogPrintf("%s:%d WARNING index has zero accumulators. block=%s\n",
442 __func__, __LINE__, ToString());
443 else
444 LogPrintf("%s:%d WARNING index has empty accumulator map! block=%s "
445 "hashAccum=%s\n", __func__, __LINE__, ToString(), hashAccumulators.GetHex());
446 }[3] Debugging added to CBlockTreeDB::WriteBatchSync:
256 CBlockIndex* tmp;
257 if ((tmp = chainActive[diskindex.nHeight]) && (tmp->GetBlockHash() != (*it)->GetBlockHash())) {
258 LogPrintf("%s: Writing orphan block: height=%d hash=%s prev=%s exp=%s\n",
259 __func__, diskindex.nHeight, (*it)->GetBlockHash().GetHex(),
260 (*it)->GetPrevBlockHash().GetHex(),
261 tmp->GetBlockHash().GetHex());
262 }