Skip to content
This repository was archived by the owner on Feb 23, 2022. It is now read-only.

ABCI++ spec. First complete version#366

Merged
sergio-mena merged 33 commits intomasterfrom
sergio/351-abci++-properties
Jan 17, 2022
Merged

ABCI++ spec. First complete version#366
sergio-mena merged 33 commits intomasterfrom
sergio/351-abci++-properties

Conversation

@sergio-mena
Copy link
Contributor

@sergio-mena sergio-mena commented Nov 22, 2021

This PR is addressing (some of) the comments in #351
The first steps are:

Copy link
Contributor

@josef-widder josef-widder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the clarifications in this draft. We will discuss my comments, and the TODOs in a call later today.

@sergio-mena
Copy link
Contributor Author

sergio-mena commented Nov 23, 2021

Sketch of items discussed in yesterday's meeting (Attendees: Sergio, Josef; Date: 2021-11-22):

  • Discussed open comments (see above)
  • We came up with the idea of adding a top-level third section (in addition to Description and Properties) called Application Modes, which should be a guide for the App developer on different ways to structure the Application
    • [AI] Write down first version
  • VerifyVoteExtension
    • Can it have side effects on state?
      • To which state do you apply the extension?
      • Bear in mind
        • several possible (unconfirmed) states from ProcessProposal
        • several possible different extensions from different validators in a round
        • several possible (different?) extensions from the same validator in different rounds
        • several possible different extensions from the same (byzantine) validator in the same round (i.e., equivocation)
      • We came to the conclusion it would be close to impossible to specify, or even forsee how it could be managed by the Application
      • Therefore, current approach: no side effects
    • Can Reject force consensus to reject a (non-nil) precommit value?
      • If it can --> important liveness implications to consensus
        • Imagine a run in which v is locked by f+1 correct processes in round r0, and a process outside that quorum repeatedly rejects precommits based on the vote extensions in rounds r > r0
      • Therefore, current approach: Reject cannot force rejection of precommit message if it is properly formatted and signed
    • If no side effects, and no influence on precommit message acceptance/rejection:
      • Could VerifyVoteExtension's Application logic be part or PrepareProsoal and ProcessProposal of the next block?
  • PreparePrososal
    • We need to specify the "limits" of the function w.r.t to modifying the block passed as parameter: what can and cannot be changed
      • Our current understanding is that the Application can:
        • modify the order of transactions
        • remove transactions
        • aggregate transactions (?)
        • add new transactions (which were not in the mempool)
        • Question: How is then the mempool maintained?
      • How about the block header?
    • PreparePrososal doesn't return Accept/Reject in Dev's pseudo-code, so they are not asking for it.
      • [AI] Remove it and adapt the text
  • Follow-ups:
    • Rework the text in aspects that are clearer now
    • Develop TODOs that weren't addressed so far (e.g., "Parameters and return values")
    • Continue the discusion (another iteration on the text)

@sergio-mena
Copy link
Contributor Author

Sketch of items discussed in today's meeting (Attendees: Sergio, Josef; Date: 2021-11-29):

  • Overall, we agreed that, after the changes coming up today (as result of this meeting) we should convert this draft PR into a full PR, and start gathering feedback from App folks
  • We went through the newly drafted "Properties" section
  • Properties 1 and 2
    • Non-determinism is needed by proposer-based timestamp.
    • [AI] However, as this is rather a "non-property", let's move it to the end of the section as a Note.
  • Property 3
    • Straightforward property, but needed
    • [AI] Need to flesh out which fields the App wants to be able to modify as part of PrepareProposal
  • Property 4
    • [AI] Add sentence saying the consequences on liveness of violating this property led to introducing $\bot$
    • [AI] Actually, define the $\bot$-based solution here. Add forward references from occurrences of $\bot$ in the "Methods" section
  • [AI] Swap properties 5 and 6
  • Property 7
    • This was the core of our discussion.
    • Josef convinced me that it is too restrictive for the things the App folks want to do (e.g. Validator-based oracles). It's a pity, cause it simplifies the whole thing a lot
    • [AI] Remove property 7, and adapt/remove the TODO that follows it
  • Property 8
    • As in property 4, what can happen if the App violates it? (a bug)
    • [AI] Add TODO, before property 8: "If all we can do when verifying vote extensions is to flag those that failed validation, couldn't we implement that logic in PrepareProposal?
  • Vote extend/verify, in general. We need to make clear somewhere in the draft, or when discussing with the App folks, the tradeoff we are facing here:
    • either we provide "best effort" properties/guarantees on extensions,
    • or we provide agreement guarantees
    • the latter MUST go through consensus, there's no way around it
  • Follow-ups:
    • Update the draft with the AIs above
    • Convert the draft PR into a PR
    • Start gathering feedback from App folks

@sergio-mena sergio-mena marked this pull request as ready for review November 30, 2021 00:12
@sergio-mena sergio-mena requested a review from cmwaters November 30, 2021 00:12
@sergio-mena
Copy link
Contributor Author

sergio-mena commented Nov 30, 2021

These are my notes from today's meeting with Callum (goal: record decisions made):

  • PrepareProposal: remove all params marked with (*) in the draft, as the App's modification options of the header are limited. Check the "two modes" comment below regarding what the App can modify.
  • Question for ABCI++ bi-weekly: Is the proposer's Tendermint supposed call ProcessProposal (in the round it is proposing), just as any other process? Or should we just assume there's no value added in doing that?
  • No events in PrepareProposal (so, remove the corresponding params). The App can keep them and send them at FinalizeBlock time. This makes sense for most Apps. Add this in "Usage".
  • Mempool management approach. We agreed on approach 2 of the draft. I'm removing the rest and turning the TODO into "official" text.
  • Question for ABCI++ bi-weekly: Approach 2 also supports "too early" transactions. If App want to delay a TX, it just removes it from the returned list. However, the App should be careful not to cause leaks. Is that OK for App folks?
  • We discussed about the two App modes: (a) "immediate execution" or "same-block hash"; and (b) "delayed execution" or "next-block hash"
    • As a safety measure ABCI++ should know the App mode so that it can't change things in the header it is not supposed to
    • The mode should be present in the Genesis file. It cannot be part of Consensus params, since changing it on the fly is calling for trouble
    • In mode (b), the App can't touch the header inside PrepareProposal, only the TXs
    • In mode (a), the App can (actually, should) modify the App Hash. It can also modify ConsensusParams, and ValidatorUpdates
    • In mode (a), the parts of the header that can be modified will take effect 1 block earlier:
      • the App Hash at height H refers to height H itself
      • Consensus Params in H's header will come into force at H+1
      • Validator Updates in H's header will come into effect for H+1
      • [@cmwaters , @josef-widder , please, keep me honest on these]
  • ProcessProposal: We agreed to not consider evidence collecting at this point, since the App can keep them and provide them at FinalizeBlock time (just like events in PrepareProposal). I'll remove it from the text
  • Finally, we discussed the $\bot$-based approach to help liveness in case of a bug in PrepareProposal or ProcessProposal that forces the latter to reject proposals it shouldn't.
    • We agreed that it's a good idea, but we need to sort out the implementation details (e.g., what the header will look like, etc)
    • If $\bot$ is decided, there should be a flag somewhere in the header to make it obvious that "there is a problem", so that it is not mistaken as an empty block due to lack of activity
  • Follow-ups:
    • We will continue the discussion tomorrow where we stopped today
    • I am addressing these outcomes on the text. A commit will follow up shortly.
  • Thanks a lot @cmwaters for the help!

@sergio-mena
Copy link
Contributor Author

sergio-mena commented Dec 2, 2021

Notes from meeting with Callum yesterday (2021-12-01):

  • There is a "bug" in the "When does Tendermint call it?" section of PrepareProposal. It should explain what happens when validValue is not nil.
  • Current protobuf version of ExtendVote (as of proto: abci++ changes #348 ) passed the whole Vote message to the app. We went through the fields inside Vote and only the ones currently present in the current spec are needed.
  • VerifyVoteExtension not forcing Tendermint to precommit nil should be discussed in the ABCI++ meeting.
  • heightas parameter in RequestPrepareProposal, RequestProcessProposal, and FinalizeBlock is redundant, as it is also contained in the Headerstructure.
  • VerifyVoteExtensionRequest should also have these parameters: block_id, and validator's address (sender of the extended vote).
  • Should the extension have a signed and an unsigned part? Is unsigned part useful? Discuss with App folks.
  • Make sure that we say ExtendVote is non-deterministic, just as PrepareProposal. Should probably go into Properties section.
  • Text needs to describe what Tendermint does with extended votes
  • Should FinalizeBlockRequest only include the block's hash for efficiency (the block was part of a PrepareProposal or ProcessProposal)?
    • Agreement is to include the whole block. We can implement the optimization later, as it will not be breaking this API.
  • Property 2. Clarify that the App is not modifying the header directly, but modifying the params that are the source of various hashes in the header (which Tendermint then modifies).
  • Add a property saying that a vote extension cannot be "used" in the same height as it is produced.
  • Follow-ups
    • Address this in the text
    • Discuss open points with App folks in ABCI++ meeting

@sergio-mena
Copy link
Contributor Author

sergio-mena commented Dec 2, 2021

Notes from ABCI++ meeting yesterday (2021-12-01, attendees Michael, William, Marko, Josef, Callum, Sergio, John, [anyone I'm forgetting?]):

  • Initial part of the meeting was presenting the draft ABCI++ spec, its overall structure, and its core part (i.e., Properties section) at a high level.
    • Also encouraging the attendees to review it if they have the opportunity, and provide feedback.
  • Main items discussed:
    a) should VerifyVoteExtension be prevented from forcing Tendermint to precommit nil on the block the extended vote is referring to?
    b) if the answer to a) is yes, can the logic of VerifyVoteExtension always be implemented as part of (next block's) PrepareProposal, thus rendering VerifyVoteExtension unnecessary?
  • Regarding a), everyone seemed to agree that a bug on ExtendVote/VerifyVoteExtension is not a reason to stop Consensus progress (validators getting stuck on very high round numbers, unable to decide).
    • Rather,
      • the vote extension should be flagged "invalid" by Tendermint if VerifyVoteExtension returns "reject", but
      • Tendermint should precommit the proposed block according to the algorithm.
    • This allows the blocks to keep on coming. Other transactions can thus continue to be processed, and the "invalid" flags in the extensions (included in the next block's header) will signal the maintainers that something needs to be fixed ASAP.
  • Regarding b), there was not a clear agreement. AFAICT, no strong argument was made against, but I think the issue needs to be discussed further.
  • Follow-ups:
    • Clarify point b) with further discussions so we know how to deal with VerifyVoteExtension in the spec's text.
    • Discuss the other unclear points remaining:
      • $\bot$-based improvement to liveness at ProcessProposal time.
      • The utility of "unsigned" part in the vote extension.
      • Is the proposer expected to call ProcessProposal, just as any other? Or can we assume that PrepareProposal has done the job?
        • Follow-up question: with the current determinism properties of ProcessProposal in the draft, there is not point in calling ProcessProposal on a block already received in a previous round of the current height. Is this correct?
      • Present proposed mempool management solution. See if there are issues with it.
        • In particular, the possibility for the App to delay a transaction.
      • Should the vote extension have a signed and an unsigned part?
      • Fork-join mechanism
      • Making ProposeTimeout part of ConsensusParams, since PrepareProposal and ProcessProposal now stand in the critical execution path
      • Should we require that a proposer sticks to its initial proposal for this height?
        • Goal: bound the number of candidate states that the App has to deal with in a particular height

@cmwaters
Copy link
Contributor

cmwaters commented Dec 7, 2021

I think another thing worth considering in this document is how blocksync might use the new ABCI calls. So far we've just considered the conditions within the consensus state machine in which the ABCI call needs to happen but I assume we will use these calls also in blocksync (but perhaps just ProcessPropsoal and FinalizeBlock).

@cmwaters
Copy link
Contributor

cmwaters commented Dec 7, 2021

I also talked to William and Daniel today about how they were handling proto changes in the spec and they said they were doing it in a long-lived branch. I think ABCI++ should follow the same pattern.

@sergio-mena
Copy link
Contributor Author

@cmwaters , regarding blocksync. I'm adding an AI in my notes to extend the document regarding this aspect.

@sergio-mena
Copy link
Contributor Author

@cmwaters , as for the branch. I agree with the proposed approach.
Moreover, as I mentioned yesterday, I'd propose to go even further: the spec should have the same release branch structure as the Tendermint Go repo (and probably, as you said, the spec versions should match the code's versions for simplicity).
E.g., the spec for existing ABCI should still live unmodified in the release (or throttle) branches corresponding to versions that still contain it (and which people may still use), and the new spec for ABCI++ should live in all release branches where ABCI++ is present; and ABCI++ branches should not contain ABCI spec.
For the moment, I'll work on the protobufs in the ABCI++ branch in the Go repo (so won't touch those on the spec). Then, when rebasing or merging the ABCI++ branch, I'll "export" any protobuf changes back to the spec repo.
Let me know if you have any thoughts on this.

@cmwaters
Copy link
Contributor

cmwaters commented Dec 8, 2021

For the moment, I'll work on the protobufs in the ABCI++ branch in the Go repo (so won't touch those on the spec). Then, when rebasing or merging the ABCI++ branch, I'll "export" any protobuf changes back to the spec repo.

This makes sense to me considering the ABCI++ branch already contains a set of proto changes

@sergio-mena
Copy link
Contributor Author

sergio-mena commented Dec 8, 2021

My notes on Monday's meeting with Josef & Dev (2021-12-06) and follow-up conversations with Josef and Callum:

  • Most of the meeting was devoted to re-discussing item a) from ABCI++ meeting on 2021-12-01 (see above)
    • Namely, a precommit message with an extension rejected by VerifyVoteExtension should nevertheless be accepted as a valid precommit message (part of the 2/3+ precommits used for deciding)
    • Dev isn't convinced about that, as the uses cases he has in mind (threshold signatures?) require that at least 2/3+ extensions used at next block's PrepareProposal are valid.
    • We explained that, if a bug is introduced in the ExtendVote-VerfiyVoteExtension pair, e.g., a deterministic bug, the block production will halt, even if the transactions in the proposed blocks are all valid.
  • After some discussion, we got to a possible solution:
    • Let's have a config parameter (an integer) called "protected rounds"
      • For every round r < r_protected, we reject precommit messages with invalid extensions.
      • For every round r >= r_protected, we accept precommit messages with invalid extensions to protect liveness of consensus.
  • In follow-up conversations, a similar solution was proposed:
    • The config is extended (Genesis file?, ConsensusParams?) with a boolean called "Failure Mode" that defines two modes of operation:
      • "halt": precommit messages with invalid vote extensions are rejected.
        • If vote extension/verification code has a problem the blockchain will get stuck with high round numbers of a height.
      • "continue": precommit messages with invalid vote extensions are flagged and accepted.
        • If vote extension/verification code has a problem the blockchain will continue to produce blocks with transactions, preserving safety properties, albeit with some flagged/invalid extensions provided to the App in the next height.
    • It is up to the App developer to choose the mode they want to operate on. But they should be aware of the liveness consequences of their choice.
    • Finally, this solution can be extended to the PrepareProposal-ProcessProposal pair. The implications of a problem in that code are not the same, but similar, as all discussions we've had boiled down to
      • "do we want to continue producing blocks while we fix the problem?, or
      • are we OK if the blockchain gets stuck while we find a fix?"
  • We have introduced an item in the agenda for this Thursday's Tendermint Core meeting

Copy link
Contributor

@williambanfield williambanfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. My comments here are a mixture of not having previously had a complete grasp on the proposal of ABCI++ and some points that may need some further thinking or clarification. Overall, I'm really excited about the direction that ABCI++ is headed.

Copy link
Contributor Author

@sergio-mena sergio-mena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks William for going through the text and providing feedback.

@williambanfield
Copy link
Contributor

Thanks for responding! I left a few comments, but a lot of my previous queries were addressed.

@sergio-mena
Copy link
Contributor Author

Thanks William. Pushing a small commit to address your latest comments

@sergio-mena
Copy link
Contributor Author

sergio-mena commented Dec 10, 2021

My notes on the Tendermint Core meeting on 2021-12-09, where core points of ABCI++ design were discussed:

  • Recording Passcode: XZZZDz^9
  • Slides
  • Two main items were discussed
    • Item#1. ProcessProposal using fork-join as described in v4.md in the spec files (Item#1 in the slides)
      • After some discussion, we agreed to
        • not include this mechanism in the first version of ABCI++
        • open an issue to revisit the Tendermint algorithm to see if we can nevertheless insert the join just before Tendermint locks on a value
    • Item#2. Failure modes.
      • The two failure modes, namely "halt" and "continue" were presented, first in general terms, then applied to PrepareProposal-ProcessProposal, and ExtendVote-VerifyVoteExtension
      • We engaged in a deep and thorough discussion of the pros and cons of each of the failure modes in the case of a deterministic bug in Proposal-related code, or Extension-related code
      • The config parameter "failure mode" I was proposing in the slides was not viewed by many as a workable solution
        • main argument against was that different SDK modules might want different failure modes, making the configurable parameter a source of confusion rather than something helpful
      • From that point on, the discussion focused on deciding which mode we wanted Tendermint to implement (as it won't be configurable)
      • Several participants in the meeting (including myself) went from being convinced "continue" was best, to being convinced "halt" was inevitable, or the other way around
      • The final decision was to:
        • go for the "halt" failure mode
        • but the application SHOULD always return "true" on ProcessProposal and VerifyVoteExtension, unless they are ready to shoot themselves in the foot, regarding liveness
        • this way, the application will, in the general case, implement "continue", but it also has the choice to go for "halt" if really needed

Copy link
Contributor

@josef-widder josef-widder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergio-mena thanks a lot for the great work!! This clarifies a lot. There is so much information in this document. Let's think about a way to cut it in digestible chunks ;-)

Copy link
Contributor

@williambanfield williambanfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't check all of the TODOs / application modes discussion at the bottom, but this is looking great.

@williambanfield
Copy link
Contributor

My current question on this spec: Are there additional specification questions that need to be answered before implementation can begin? I think that the design for most of this is reasonably clear, with a few minor exceptions like how/if verifyVoteExtension will initially be implemented if the data is not signed over by Tendermint. Given that many of the major technical decisions seem settled, I'm wondering if anything stops us from starting to implement?

Copy link
Contributor Author

@sergio-mena sergio-mena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, William and Josef, for your review. You folks spotted (again!) important glitches in the text 😊

@sergio-mena
Copy link
Contributor Author

My current question on this spec: Are there additional specification questions that need to be answered before implementation can begin? I think that the design for most of this is reasonably clear, with a few minor exceptions like how/if verifyVoteExtension will initially be implemented if the data is not signed over by Tendermint. Given that many of the major technical decisions seem settled, I'm wondering if anything stops us from starting to implement?

There is already an implementation in a long-lived branch. It consists of 5 PRs (4 merged, 1 outstanding) AFAICT. I wanted to dive into those to identify the differences between that implementation and the current spec. But I haven't had the time so far :-( ... maybe this week

@williambanfield
Copy link
Contributor

There is already an implementation in a long-lived branch. It consists of 5 PRs (4 merged, 1 outstanding) AFAICT. I wanted to dive into those to identify the differences between that implementation and the current spec. But I haven't had the time so far :-( ... maybe this week

That definitely makes sense, you've been making a lot of great progress on the spec so it makes sense that you have not dug into all parts of the code. I was more wondering if there were additional pieces of this spec that you felt were blocking us from beginning an implementation? I.e. major unanswered questions or risks that would require us to completely shift after we'd begun? As far as I can tell, we have a few clearly-defined new method calls that we are planning to add to tendermint. These will interact directly with the consensus engine in well specified places. Because of the specificity of the changes, I think we can begin implementing a lot of this without answers to each and every question.

sergio-mena and others added 19 commits January 17, 2022 09:43
…bci++_basic_concepts_002_draft.md

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
@sergio-mena
Copy link
Contributor Author

Rebasing the branch again to tip of master...

@sergio-mena sergio-mena force-pushed the sergio/351-abci++-properties branch from 1c627c3 to 51224e7 Compare January 17, 2022 08:45
@sergio-mena
Copy link
Contributor Author

My notes on the ABCI++ meeting last Friday (2022-01-14), where some points of ABCI++ design were discussed:

  • ProposeTimeout on config.toml, rather than on ConsensusParams for v0.36 (as an interim solution)
    • Reason for interim solution: soft upgrades will be done in 0.37 (or later) and changing ConsensusParams needs soft upgrades (or hashing should be partial)
    • When discussing the main disadvantage of this interim solution (i.e., not being able to change it on the fly), Sam pointed out that you can even change it on the fly via "social consensus". Need to restart the nodes, but that's a somewhat common operation
    • William noted that PBTS are adding changes to ConsensusParams, so we could sort of piggyback on those changes
  • Producing empty blocks/need_new_block boolean
    • This new boolean in combination with existing create_empty_blocks/create_empty_blocks_interval in config.toml, there are some combinations that don't make sense, or shouldn't be supported.
    • It was agreed that more discussion is needed, and therefore this item should be "punted out" of ABCI++
    • As this was self-contained in one commit of this PR, agreement was to revert that commit
  • Future of CheckTx
    • Marko has a proposal. We will discuss about it this week
    • My view: let's leave CheckTx as it is for the moment while we land ABCI++, then we can decide about CheckTx

@mpoke
Copy link

mpoke commented Jan 18, 2022

Here are a few comments regarding the spec. In general it looks nice; great job. I’d just add a bit more context and structure. Here is a list with more detailed comments:

Basic concepts and definitions

  • It jumps directly into Connections / Errors … without an overview; this makes hard to follow, especially the function names.
  • There is no ABCI vs ABCI++ overview.
  • I'd first describe a concept and then add examples of where it is used. For example, in the Events section, you start to describe different methods that were not yet introduced.
  • Last sentence in VoteExtensions is unclear.
  • In general, if this is the entry point, then it should contain a bit more context for the reader (some overview, diagram, high-level description).

Application requirements

  • Equations are not visible on github (i.e., $p_i$)
  • The structure is not clear, i.e., bullet points vs paragraphs. This makes it hard to find something without reading the entire section. Also, some statements may be lost (e.g., third sentence is not really emphasized and I assume it’s important)
  • In Requirement 4, you redefine p and q, which where defined in the beginning of the section.
  • In Requirement 7, $e^r_p$ is not defined

Tendermint’s expected behavior

  • When describing the grammar line by line, add a comment with the path to get there, e.g., start -> clean-start / recovery -> consensus-exec before describing consensus-exec
  • Not clear why a non-proposer starts with calling VerifyVoteExtension multiple times.
  • “We have kept some of the ABCI++ methods out of the grammar” here should be ABCI
  • I’d add the functions in the last section (i.e., Adapting existing Applications that use ABCI) as pseudocode.

Methods

  • Some methods are missing (e.g., BeginBlock)
  • Why is the Field Number relevant to the spec?
  • For some methods, the Usage section is very large (e.g., PrepareProposal). I’d try to add some structure, e.g., pre- and post-conditions, pseudocode.
  • For ProcessProposal, under When does Tendermint call it?
    • You state that the validator p may be the same as the proposer q, and then at (# 5) you state that “If Tendermint should prevote for the block just received” then it calls ProcessProposal. This is kind of unclear. It depends on the condition to prevote. For example, if a proposer prevotes on its own proposals, this contradicts the statement from Tendermint’s Expected Behavior “ProcessProposal will be called exactly once at all processes except the proposer”.
    • (# 2) -> PrepareProposal has only 5 steps.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants