Skip to content

Conversation

@mzumsande
Copy link
Contributor

@mzumsande mzumsande commented May 22, 2024

This addresses the problem from #11600 that the Rolling Forward process after a crash (ReplayBlocks()) is uninterruptible. ReplayBlocks can take a long time to finish, and this can be especially annoying to GUI users who are taunted to "press q to shutdown" even though pressing "q" does nothing.

Now, when an interrupt is received during ReplayBlocks(), the intermediate progress is saved:
In addition to writing the updated coins to disk, the flush adjusts the lower head block (saved in DB_HEAD_BLOCKS), so that with the next restart, the replay continues from where it was stopped without losing progress.

I tested this manually on signet: A situation where ReplayBlocks() becomes necessary can be created by syncing with -dbcrashratio=1 and stopping the node after ~70k blocks. Then, after the next restart, the replay process can be interrupted. I then used the dumptxoutset output to check that the final utxo-set hash is unaffected by how many times the process is interrupted.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 22, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30155.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK TheCharlatan, andrewtoth, l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33443 (log: reduce excessive "rolling back/forward" messages during block replay by l0rinc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@mzumsande mzumsande force-pushed the 202405_replay_blocks branch from 4bd626e to 26e78f2 Compare May 22, 2024 20:32
std::vector<uint256> old_heads = GetHeadBlocks();
if (old_heads.size() == 2) {
if (old_heads[0] != hashBlock) {
LogPrintLevel(BCLog::COINDB, BCLog::Level::Error, "The coins database detected an inconsistent state, likely due to a previous crash or shutdown. You will need to restart bitcoind with the -reindex-chainstate or -reindex configuration option.\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a safe assumption that this cannot happen except an interrupted replay?

Copy link
Contributor Author

@mzumsande mzumsande May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we abort with an assert in the next line after the log, so the expectation in master is that this shouldn't happen outside of db corruption. I couldn't find any reports on the internet of users that have run into this error / assert, so it probably doesn't happen very often in practice.
But theoretically speaking, if the coins db gets corrupted, I guess anything can happen depending on the specifics of the corruption?

While working on this PR, I initially went with the different approach of adding an explicit in_replay flag to BatchWrite(), and setting that flag only during the interrupt flush. I switched to the current approach of detecting this implicitly inside BatchWrite() because it seemed more elegant and led to less boilerplate code, but it's still an option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "validation: Make ReplayBlocks interruptible" (26e78f2)

I switched to the current approach of detecting this implicitly inside BatchWrite() because it seemed more elegant and led to less boilerplate code, but it's still an option.

I would definitely be curious to see the other approach. Not just because it might be better at detecting inconsistent states like Luke is suggesting, but also because it seems like it would make the code more clear.

I think it would be pretty hard to look at the code after this PR (without looking at the PR itself), and understand why old_heads[0] != hashBlock implies that this BatchWrite call is finalizing a replay that is currently being interrupted.

I was also confused by the name of the "unfinished_replay" variable below, because I thought "unfinished replay" was just referring to a replay of an previously unfinished flush. I did not realize it was referring specifically to an interrupted replay of an unfinished flush.

Also as a possible followup, for future maintainability of this code I think it could be a good idea to remove the replay logic entirely from BatchWrite(), since ReplayBlocks() is the only BatchWrite() caller that should need it. Other BatchWrite calls should be able to assume and assert that GetHeadBlocks() is empty, and this is just a normal write, not a replay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely be curious to see the other approach. Not just because it might be better at detecting inconsistent states like Luke is suggesting, but also because it seems like it would make the code more clear.

See mzumsande@db8232d - because of the inheritance the bool interrupted_replay needed to be added to quite a few spots, which I didn't love.

I think it would be pretty hard to look at the code after this PR (without looking at the PR itself), and understand why old_heads[0] != hashBlock implies that this BatchWrite call is finalizing a replay that is currently being interrupted.

I'll add more documentation (if I don't switch to the other approach).

I was also confused by the name of the "unfinished_replay" variable below,

Maybe interrupted_replay is better? But the issue remains that it's the replay that is being interrupted (or unfinished), and not the flush.

Also as a possible followup, for future maintainability of this code I think it could be a good idea to remove the replay logic entirely from BatchWrite(), since ReplayBlocks() is the only BatchWrite() caller that should need it. Other BatchWrite calls should be able to assume and assert that GetHeadBlocks() is empty, and this is just a normal write, not a replay.

Seems reasonable for a follow-up, though I haven't looked into details yet.

@sedited
Copy link
Contributor

sedited commented Oct 15, 2024

Concept ACK

@andrewtoth
Copy link
Contributor

Concept ACK

It appears to me that the size of the cache used in ReplayBlocks is never checked if it grows too large. Wouldn't this mean that if the node was sufficiently far behind, the rolling forward would fill up the cache to levels exceeding dbcache? This would result in a cycle of rolling forward and then OOM crashing, never being able to write any chainstate to disk.

@mzumsande
Copy link
Contributor Author

It appears to me that the size of the cache used in ReplayBlocks is never checked if it grows too large. Wouldn't this mean that if the node was sufficiently far behind, the rolling forward would fill up the cache to levels exceeding dbcache? This would result in a cycle of rolling forward and then OOM crashing, never being able to write any chainstate to disk.

Good point - I think that is true, ReplayBlocks() uses its own cache, which doesn't seem to be subject to any limits. However, the fact that we got into this position in the first place means that the user specified a certain dbcache in their original run that was shutdown uncleanly, and I guess that will be respected by design because we don't connect any blocks that weren't also connected in the first run?!

I guess users don't get the chance to change their preference in between the original run and the replay though, which is not ideal. Maybe the method introduced in this PR could also be used to do regular flushes.

@l0rinc
Copy link
Contributor

l0rinc commented Sep 20, 2025

I have tested this with a recent interrupted reindex, it correctly interrupts and flushes and on next start continues from where we left off:

2025-09-20T06:11:25Z Rolling forward 00000000127100b0bcdde9e9f363ae231c73e885e2d527821ef9ce0457930494 (51141)
2025-09-20T06:11:25Z Rolling forward 0000000013484b73eff09514abc3f916b319eb9f43c55f051be9e760ad2210d0 (51142)
^C2025-09-20T06:11:25Z Flushing intermediate state of replay
2025-09-20T06:11:25Z [error] Unable to replay blocks. You will need to rebuild the database using -reindex-chainstate.
Error: Unable to replay blocks. You will need to rebuild the database using -reindex-chainstate.
2025-09-20T06:11:25Z Shutdown in progress...

and on a next run continues from where we left off:

2025-09-20T06:14:27Z Replaying blocks
2025-09-20T06:14:27Z Rolling forward 0000000002a5a4503b56e432dea4647e99037ac20b9499d27a0782d7c2fc417a (51143)
2025-09-20T06:14:27Z Rolling forward 000000001732f85802afe0ff3680ce45f62c21e03b6924054df5993c3def3c85 (51144)

Needs a rebase, but otherwise Concept ACK

Now, when an interrupt is received during ReplayBlocks the
intermediate progress is flushed:
In addition to updating the coins on disk, this
flush adjusts the head blocks, so that with the next
restart, the replay continues from where we stopped.
@mzumsande
Copy link
Contributor Author

Needs a rebase, but otherwise Concept ACK

Rebased for fresh CI (rebase seemed clean, didn't change anything)

@mzumsande
Copy link
Contributor Author

Closing as "up for grabs" - #11600 is already closed, and because we now flush the chainstate every hour replays in general shouldn't take as much time anymore, so there is less need to interrupt/continue them.

That said, it could still be a nice-to-have if someone wants to pick it up (note #30155 (comment) in this case, maybe there is a more elegant way I didn't see).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants