Skip to content

Rebased to master the existing ProcessProposal PR#7752

Merged
sergio-mena merged 10 commits intomasterfrom
sergio/7735-rebase-process_proposal_PR
Feb 8, 2022
Merged

Rebased to master the existing ProcessProposal PR#7752
sergio-mena merged 10 commits intomasterfrom
sergio/7735-rebase-process_proposal_PR

Conversation

@sergio-mena
Copy link
Contributor

@sergio-mena sergio-mena commented Feb 2, 2022

This PR is rebasing PR #7091 implementing ProcessProposal

The logic here is out of sync with the current ABCI++ spec, but rebasing the existing implementation is a first step to filling the gap between spec and implementation, which will be addressed via another issue (#7656)

This addresses #7735

Copy link
Contributor

@williambanfield williambanfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor fixups. I think that, without pretty good justification, we should revert the TwoThirdPrevoteRound change since it's a bit unrelated and may be confounding to people trying to translate between this implementation and the algorithm in the paper.


stateMachineValidBlock, err := cs.blockExec.ProcessProposal(cs.ProposalBlock)
if err != nil {
logger.Error("state machine returned an error when trying to process proposal block", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it makes sense to make this change now or in later fixups, but we should issue the prevote for nil in this error condition check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, this is an unexpected error from ABCI infra, or maybe from the App (the nil-prevoting seems to take place in line 1520 below).
In this case I'd propose we do the same we do for existing API calls (e.g., BeginBLock, DeliverTx, EndBlock)... I'll look that up and then we can see what to do here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look at the code and actually in this "if" we fall through to the next one. When BlockExecutor's method ProcessProposal returns an error, it also returns false as first parameter, so the condition at the next "if" will be true and we'll prevote nil.
Maybe adding stateMachineValidBlock = false inside this "if" will help readability. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for diving in. This feels like a bit of a conundrum since we didn't have this kind of application logic anywhere in the consensus engine at all before. The error type returned by ProcessProposal is an ErrInvalidBlock, which sounds like the block should not be considered valid, however, the IsOk check that happens on the response object appears to cover that condition instead. This means that this is a check is determining if communication with the app was successful or not.

In this situation, I think we get to choose how Tendermint responds when it has an issue communicating with the application. Our options are 1. Issue a prevote for nil 2. Crash, 3. Return from the method, issuing no prevote.

Thinking this through a bit, I would actually be in favor of crashing since it's not clear how to recover from this condition. The application is effectively not participating in consensus. We could hide the fact that it's not participating in consensus by just logging the error and continuing. I'd definitely be interested to hear what other people have to say about this though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with you: as we don't know how to react to this, I'd crash. If we try to (keep calm and) carry on, we might end up with another weird "manifestation" of the problem later on, which could be a nightmare to diagnose. In these cases, as a troubleshooter, I prefer to see where the problem actually started, in the form of a backtrace (or core file, or similar) rather than having to trace the problem back (via logs, etc).

But you're right: let's see if folks think the same way or have different views.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was involved in a similar discussion before where we had these three options. My ideas then was to log the error and not vote at all. My thinking was that crashing seems a bit extreme if there was only something temporarily wrong with for example the socket connection but was eventually able to recover in the following round.

As far as other nodes are concerned, not voting looks the same as crashing i.e. it's an omission fault. Then again I don't probably have sufficient knowledge on how the connection could fail and am not entirely opposed to doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can further discuss about this if you have concerns. Mine is that If the communication is disrupted between Tendermint and the App (e.g., when using a socket), that is something we don't know how to recover from, even at a theoretical level. For instance, what if the error at this point is produced because the App just missed the last FinalizeBlock message?

I propose to leave the panic. If that gets in the way of some user in an unreasonable way, we can then "learn the lesson" and change it to an omission fault.

@sergio-mena sergio-mena force-pushed the sergio/7735-rebase-process_proposal_PR branch from d8c4fc4 to 06e4265 Compare February 4, 2022 11:47
@sergio-mena sergio-mena marked this pull request as ready for review February 4, 2022 12:00
@sergio-mena sergio-mena force-pushed the sergio/7735-rebase-process_proposal_PR branch from 06e4265 to 6dbe453 Compare February 4, 2022 16:39
Comment on lines +1518 to +1519
// Consensus says we must vote nil
logger.Error("prevote step: consensus deems this block to be mustVoteNil", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clean up the comments here i.e. it's the application that his deemed the block invalid not consensus (I know this was part of the earlier commit). Also I'm unsure whether it makes sense to log an error. Logged errors are telling the node operator that they should be doing something. There isn't really anything they can do in this case. I would simply log this as either Info or Debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after reflecting on this, I left the log at error level. If Application rejects blocks, according to the spec, someone is byzantine, so I think operators should be aware of it. I clarified the error message

Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor clean up work left.

@sergio-mena sergio-mena force-pushed the sergio/7735-rebase-process_proposal_PR branch from b3ca1d7 to f96daf7 Compare February 8, 2022 16:09
@sergio-mena sergio-mena merged commit 27297a4 into master Feb 8, 2022
@sergio-mena sergio-mena deleted the sergio/7735-rebase-process_proposal_PR branch February 8, 2022 16:32
@sergio-mena sergio-mena mentioned this pull request Jul 22, 2022
35 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants