Skip to content

spec:abci2.0 - clarify crash recovery mechanism#469

Merged
jmalicevic merged 11 commits intofeature/abci++veffrom
jasmina/spec-crash-recovery
Mar 17, 2023
Merged

spec:abci2.0 - clarify crash recovery mechanism#469
jmalicevic merged 11 commits intofeature/abci++veffrom
jasmina/spec-crash-recovery

Conversation

@jmalicevic
Copy link
Copy Markdown
Collaborator

Closes #203 .

@jmalicevic jmalicevic added abci Application blockchain interface spec Specification-related labels Mar 6, 2023
@jmalicevic jmalicevic added this to the 2023-Q1 milestone Mar 6, 2023
@jmalicevic jmalicevic requested a review from a team as a code owner March 6, 2023 12:13
@jmalicevic jmalicevic self-assigned this Mar 6, 2023
@jmalicevic jmalicevic requested a review from a team as a code owner March 6, 2023 12:13
Copy link
Copy Markdown
Contributor

@lasarojc lasarojc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking.

last block it successfully completed Commit for.
last block it successfully completed `Commit` for.

The application is expected to persist it's state during the `Commit` method.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The application is expected to persist it's state during the `Commit` method.
The application is expected to persist its state during the `Commit` method.

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really part of this PR, but in line 824, below, it states that either Commit completed and last_block_height is now H or Commit failed and last_block_height is still H-1, but even if Commit crashed midway, it may have crashed after the advancing to H. So the restriction should be last_block_height and last_block_app_hash agree, not that they are agree on H-1.

Suggestion

If the app successfully committed block `H`, then `last_block_height = H` and `last_block_app_hash = <hash returned by Commit for block H>`. 
If the app failed during the Commit of block `H`, then either 
- `last_block_height = H-1` and `last_block_app_hash = <hash returned by Commit for block H-1, which is the hash in the header of block H>`; or, 
- `last_block_height = H` and `last_block_app_hash = <hash returned by Commit for block H>`. 

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, first, refer that last_block_height and last_block_app_hash are the fields returned by the Info method (https://github.com/cometbft/cometbft/blob/7c7b48039453aa66fd6b128af6701651e1398319/spec/abci/abci%2B%2B_methods.md). This because it was not clear to me, and these variables are only used here on the text.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second, is it not possible to have one of the fields updated (to H) but the other not?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third, I agree with Lasaro here that the app could crash during the Commit of height H and it will have the two fields updated to H state, although Comet will not be aware of this because the crash occurs before returning the control to Comet.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value assigned mid-commit before the crash persisted? As in, is the state not updated atomically (or at least saved)?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope #493 can help us.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value assigned mid-commit before the crash persisted? As in, is the state not updated atomically (or at least saved)?

It must be the case that both values are updated during commit, very likely as the the last steps, and the problem is ensuring that both updates are part of one atomic operation, probably along with putting the final nail on the old's state coffin. Since we don't control that, as it happens on the app side, we must require the app to implement this behavior.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The application typically uses something like leveldb. Correct me if I am wrong, but this kind of databases do not support transactions. So, while both writes are probably atomic, if a crash occurs between them the state can become inconsistent.

Copy link
Copy Markdown
Collaborator

@sergio-mena sergio-mena Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main invariant is that the App state is always committed after consensus persists the results (returned by the app) of executing the block.
So the app and consensus "persist" actions don't need to be atomic. Upon recovery, the logic described later on in this section is used to replay anything the app has "forgotten" (this is figured out with the handshake after recovery, using Info method).

Copy link
Copy Markdown

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking++.

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, first, refer that last_block_height and last_block_app_hash are the fields returned by the Info method (https://github.com/cometbft/cometbft/blob/7c7b48039453aa66fd6b128af6701651e1398319/spec/abci/abci%2B%2B_methods.md). This because it was not clear to me, and these variables are only used here on the text.

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second, is it not possible to have one of the fields updated (to H) but the other not?

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third, I agree with Lasaro here that the app could crash during the Commit of height H and it will have the two fields updated to H state, although Comet will not be aware of this because the crash occurs before returning the control to Comet.

Comment on lines +831 to +839
```
APP: Execute block Persist state
/ return ResultFinalizeBlock /
/ /
Event: ------------- block_stored -------------/ -------------state_stored ----------------/---app_persisted_state
| / | / |
CometBFT: Decides - Persistes block ---Call `FinalizeBlock` Persist results------------Commit----
on in the (txResults, validator
Block block store updated,..)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```
APP: Execute block Persist state
/ return ResultFinalizeBlock /
/ /
Event: ------------- block_stored -------------/ -------------state_stored ----------------/---app_persisted_state
| / | / |
CometBFT: Decides - Persistes block ---Call `FinalizeBlock` Persist results------------Commit----
on in the (txResults, validator
Block block store updated,..)

APP: Execute block Persist application
/ return ResultFinalizeBlock / state
/ / |
Event: -------------- block_stored -----------/-------------state_stored -------------/---app_persisted_state
| / | /
CometBFT: Decide ---- Persist ----------- Call ------------- Persist ------------ Call Commit----
on Block block in the FinalizeBlock results
blockstore (txResults, validator updated,..)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rendered weirdly :) I got the suggestion I think, but please confirm :)

Copy link
Copy Markdown
Collaborator

@sergio-mena sergio-mena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome improvement of the crash-recovery section, finally! Thanks for taking care of this!

@jmalicevic jmalicevic merged commit 19700ac into feature/abci++vef Mar 17, 2023
@jmalicevic jmalicevic deleted the jasmina/spec-crash-recovery branch March 17, 2023 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

abci Application blockchain interface spec Specification-related

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants