Rebased to master the existing ProcessProposal PR#7752
Conversation
williambanfield
left a comment
There was a problem hiding this comment.
A few minor fixups. I think that, without pretty good justification, we should revert the TwoThirdPrevoteRound change since it's a bit unrelated and may be confounding to people trying to translate between this implementation and the algorithm in the paper.
internal/consensus/state.go
Outdated
|
|
||
| stateMachineValidBlock, err := cs.blockExec.ProcessProposal(cs.ProposalBlock) | ||
| if err != nil { | ||
| logger.Error("state machine returned an error when trying to process proposal block", "err", err) |
There was a problem hiding this comment.
Not sure if it makes sense to make this change now or in later fixups, but we should issue the prevote for nil in this error condition check.
There was a problem hiding this comment.
AFAIU, this is an unexpected error from ABCI infra, or maybe from the App (the nil-prevoting seems to take place in line 1520 below).
In this case I'd propose we do the same we do for existing API calls (e.g., BeginBLock, DeliverTx, EndBlock)... I'll look that up and then we can see what to do here
There was a problem hiding this comment.
Took a closer look at the code and actually in this "if" we fall through to the next one. When BlockExecutor's method ProcessProposal returns an error, it also returns false as first parameter, so the condition at the next "if" will be true and we'll prevote nil.
Maybe adding stateMachineValidBlock = false inside this "if" will help readability. What do you think?
There was a problem hiding this comment.
Ah, thanks for diving in. This feels like a bit of a conundrum since we didn't have this kind of application logic anywhere in the consensus engine at all before. The error type returned by ProcessProposal is an ErrInvalidBlock, which sounds like the block should not be considered valid, however, the IsOk check that happens on the response object appears to cover that condition instead. This means that this is a check is determining if communication with the app was successful or not.
In this situation, I think we get to choose how Tendermint responds when it has an issue communicating with the application. Our options are 1. Issue a prevote for nil 2. Crash, 3. Return from the method, issuing no prevote.
Thinking this through a bit, I would actually be in favor of crashing since it's not clear how to recover from this condition. The application is effectively not participating in consensus. We could hide the fact that it's not participating in consensus by just logging the error and continuing. I'd definitely be interested to hear what other people have to say about this though.
There was a problem hiding this comment.
I tend to agree with you: as we don't know how to react to this, I'd crash. If we try to (keep calm and) carry on, we might end up with another weird "manifestation" of the problem later on, which could be a nightmare to diagnose. In these cases, as a troubleshooter, I prefer to see where the problem actually started, in the form of a backtrace (or core file, or similar) rather than having to trace the problem back (via logs, etc).
But you're right: let's see if folks think the same way or have different views.
There was a problem hiding this comment.
I think I was involved in a similar discussion before where we had these three options. My ideas then was to log the error and not vote at all. My thinking was that crashing seems a bit extreme if there was only something temporarily wrong with for example the socket connection but was eventually able to recover in the following round.
As far as other nodes are concerned, not voting looks the same as crashing i.e. it's an omission fault. Then again I don't probably have sufficient knowledge on how the connection could fail and am not entirely opposed to doing that.
There was a problem hiding this comment.
We can further discuss about this if you have concerns. Mine is that If the communication is disrupted between Tendermint and the App (e.g., when using a socket), that is something we don't know how to recover from, even at a theoretical level. For instance, what if the error at this point is produced because the App just missed the last FinalizeBlock message?
I propose to leave the panic. If that gets in the way of some user in an unreasonable way, we can then "learn the lesson" and change it to an omission fault.
d8c4fc4 to
06e4265
Compare
06e4265 to
6dbe453
Compare
internal/consensus/state.go
Outdated
| // Consensus says we must vote nil | ||
| logger.Error("prevote step: consensus deems this block to be mustVoteNil", "err", err) |
There was a problem hiding this comment.
Can we clean up the comments here i.e. it's the application that his deemed the block invalid not consensus (I know this was part of the earlier commit). Also I'm unsure whether it makes sense to log an error. Logged errors are telling the node operator that they should be doing something. There isn't really anything they can do in this case. I would simply log this as either Info or Debug
There was a problem hiding this comment.
Thanks for spotting this!
There was a problem hiding this comment.
Actually, after reflecting on this, I left the log at error level. If Application rejects blocks, according to the spec, someone is byzantine, so I think operators should be aware of it. I clarified the error message
cmwaters
left a comment
There was a problem hiding this comment.
Just some minor clean up work left.
add processproposal proto/boilerplate/logic mockery gofmt fix test gofmt move UNKNOWN response behaviour to reject
b3ca1d7 to
f96daf7
Compare
This PR is rebasing PR #7091 implementing
ProcessProposalThe logic here is out of sync with the current ABCI++ spec, but rebasing the existing implementation is a first step to filling the gap between spec and implementation, which will be addressed via another issue (#7656)
This addresses #7735