fix(blocksync)!: don't block in blocksync if our voting power is blocking the chain#3406
fix(blocksync)!: don't block in blocksync if our voting power is blocking the chain#3406sergio-mena merged 14 commits intomainfrom
Conversation
69ccf42 to
0160866
Compare
cason
left a comment
There was a problem hiding this comment.
Not sure about this workaround.
We should now whether we should run block sync outside the protocol. But, ok, it works. But by changing the block Reactor constructor, we breaking a lot of code.
cason
left a comment
There was a problem hiding this comment.
I would approve, but the >=1/3 vs >2/3 question remains open.
See associated comment (line 515).
.changelog/unreleased/bug-fixes/3406-blocksync-dont-stall-if-blocking-chain.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/bug-fixes/3406-blocksync-dont-stall-if-blocking-chain.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Daniel <daniel.cason@informal.systems>
|
I think it's definitely safe to backport, it can't really affect mainnets as you need one Val w/ over 1/3 to do anything. (And it only helps users right now if it's on the 38 line) |
We need to find a solution for v0.37.x too... |
To me, it's a bug, so unless there is big risk identified I'd backport it. Besides, this is clearly holding teams back, which are on |
…king the chain (#3406) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> (cherry picked from commit bd95579) # Conflicts: # internal/blocksync/reactor.go
…king the chain (#3406) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> (cherry picked from commit bd95579) # Conflicts: # .changelog/v0.38.3/bug-fixes/3406-blocksync-dont-stall-if-blocking-chain.md # blocksync/reactor.go # blocksync/reactor_test.go # node/node.go
…king the chain (#3406) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> (cherry picked from commit bd95579) # Conflicts: # blocksync/reactor_test.go # internal/blocksync/reactor.go # node/node.go # node/setup.go
…king the chain (#3406) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems>
…king the chain (backport #3406) (#3420) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3406 done by [Mergify](https://mergify.com). --------- Co-authored-by: Sergio Mena <sergio@informal.systems> Co-authored-by: Daniel <daniel.cason@informal.systems>
…king the chain (#3406) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems>
…king the chain (#3406) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems>
…ing the chain (backport #3406) (#3421) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3406 done by [Mergify](https://mergify.com). --------- Co-authored-by: Sergio Mena <sergio@informal.systems> Co-authored-by: Daniel <daniel.cason@informal.systems>
…ing the chain (backport #3406) (#3422) Partially addresses #3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3406 done by [Mergify](https://mergify.com). --------- Co-authored-by: Sergio Mena <sergio@informal.systems> Co-authored-by: Daniel <daniel.cason@informal.systems>
Contributes to #3415 This is mainly refactoring to simplify `onlyValidatorIsUs` and `localNodeBlocksTheChain` (since the latter implies the former). It is a follow-up of #3406 (this is the part of #3406 that doesn't need to be backported) --- #### PR checklist - ~[ ] Tests written/updated~ - ~[ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog)~ - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ing the chain (backport cometbft#3406) (cometbft#3421) Partially addresses cometbft#3415 The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height. However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers. Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following: * _I am a node and I am starting... shall I run blocksync?_ * _Well, looks like I have 1/3 of the voting power (or more) at my current height... so there's no way the chain could advance in my absence... so I don't need to blocksync"_ Explanation of commits: * Commit 1: `e2e` testbed reproducing the issue * Commit 2: commit with a trivial change to trigger `e2e` tests. Check the error: ❌ next to the commit hash (3fb1057) * Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that `e2e` are passing now. * Commit 4: revert commit2 * Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason * Commit 6: Fixed unit tests All further commits: addressing other comments and tidying up the code --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code comments~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3406 done by [Mergify](https://mergify.com). --------- Co-authored-by: Sergio Mena <sergio@informal.systems> Co-authored-by: Daniel <daniel.cason@informal.systems>
* Added votes to header + added secp256k1 + other changes * updated import * txHash fix+update canonical rep * removed sig size * docs: fix consensus spec formatting (cometbft#3804) * abci/server: recover from app panics in socket server (cometbft#3809) fixes cometbft#3800 * abci/client: fix DATA RACE in gRPC client (cometbft#3798) * Remove go func {}() closes #357 - Remove go func(){}() that caused race condiditon - To reproduce - add -race in make file to `install_abci` - Remove `CGO_ENABLED=0` & add -race to `install` Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * remove -race * fix data race also, reorder callbacks similarly to socket client * docs: "Writing a built-in Tendermint Core application in Go" guide (cometbft#3608) * docs: go built-in guide * fix package imports, add badger db, simplify Query * newTendermint function * working example * finish the first guide * add one more note * add the second Golang guide - external ABCI app * fix typos * libs: Remove db from tendermint in favor of tendermint/tm-cmn (cometbft#3811) * Remove db from tendemrint in favor of tendermint/tm-cmn - remove db from `libs` - update dependancy, there have been no breaking changes in the updated deps - https://github.com/grpc/grpc-go/releases - https://github.com/golang/protobuf/releases Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * changelog add * gofmt * more gofmt * docs: add A TOC to the Readme.md of ADR Section (#3820) * ADR TOC in readme.md * Added A TOC to the Readme.md of ADR Section - Added table of contents to the Readme of the architecture section. - Easier to traverse and when you know what is there. - If the Adr's become viewable online it would help guide the user Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * add tm-cmn to subprojects * normalize word * rpc: make max_body_bytes and max_header_bytes configurable (cometbft#3818) * rpc: make max_body_bytes and max_header_bytes configurable * update changelog pending * p2p/conn: Add Bufferpool (cometbft#3664) * use byte buffer pool to decreass allocs * wrap to put buffer in defer * wapper defer * add dependency * remove Gopkg,* * add change log * rpc: /broadcast_evidence (cometbft#3481) * implement broadcast_duplicate_vote endpoint * fix test_cover * address comments * address comments * Update abci/example/kvstore/persistent_kvstore.go Co-Authored-By: mossid <torecursedivine@gmail.com> * Update rpc/client/main_test.go Co-Authored-By: mossid <torecursedivine@gmail.com> * address comments in progress * reformat the code * make linter happy * make tests pass * replace BroadcastDuplicateVote with BroadcastEvidence * fix test * fix endpoint name * improve doc * fix TestBroadcastEvidenceDuplicateVote * Update rpc/core/evidence.go Co-Authored-By: Thane Thomson <connect@thanethomson.com> * add changelog entry * fix TestBroadcastEvidenceDuplicateVote * mempool: make max_msg_bytes configurable (cometbft#3826) * mempool: make max_msg_bytes configurable * apply suggestions from code review * update changelog pending * apply suggestions from code review again * rpc: return err if page is incorrect (less than 0 or greater than tot… (cometbft#3825) * rpc: return err if page is incorrect (less than 0 or greater than total pages) Fixes cometbft#3813 * fix rpc_test * blockchain: Reorg reactor (cometbft#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry * Renamed wire.go to codec.go (cometbft#3827) * Renamed wire.go to codec.go - Wire was the previous name of amino - Codec describes the file better than `wire` & `amino` Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * ide error * rename amino.go to codec.go * docs: add guides to docs (cometbft#3830) * add staticcheck linting (cometbft#3828) cleanup to add linter grpc change: https://godoc.org/google.golang.org/grpc#WithContextDialer https://godoc.org/google.golang.org/grpc#WithDialer grpc/grpc-go#2627 prometheous change: due to UninstrumentedHandler, being deprecated in the future empty branch = empty if or else statement didn't delete them entirely but commented couldn't find a reason to have them could not replicate the issue cometbft#3406 but if want to keep it commented then we should comment out the if statement as well * types: move MakeVote / MakeBlock functions (cometbft#3819) to the types package Paritally Fixes cometbft#3584 * p2p: Fix error logging for connection stop (cometbft#3824) * p2p: fix false-positive error logging when stopping connections This changeset fixes two types of false-positive errors occurring during connection shutdown. The first occurs when the process invokes FlushStop() or Stop() on a connection. While the previous behavior did properly wait for the sendRoutine to finish, it did not notify the recvRoutine that the connection was shutting down. This would cause the recvRouting to receive and error when reading and log this error. The changeset fixes this by notifying the recvRoutine that the connection is shutting down. The second occurs when the connection is terminated (gracefully) by the other side. The recvRoutine would get an EOF error during the read, log it, and stop the connection with an error. The changeset detects EOF and gracefully shuts down the connection. * bring back the comment about flushing * add changelog entry * listen for quitRecvRoutine too * we have to call stopForError Otherwise peer won't be removed from the peer set and maybe readded later. * p2p: Do not write 'Couldn't connect to any seeds' if there are no seeds (cometbft#3834) * Do not write 'Couldn't connect to any seeds' if there are no seeds * changelog * remove privValUpgrade * Fix typo in changelog * Update CHANGELOG_PENDING.md Co-Authored-By: Marko <marbar3778@yahoo.com> I'm setting up all peers dynamically by calling dial_peers, so p2p.seeds in configs is empty, and I'm seeing error log a lot in logs. * docs: add a footer to guides (cometbft#3835) * docs: "Writing a Tendermint Core application in Kotlin (gRPC)" guide (cometbft#3838) * add abci grpc kotlin guide * Update docs/guides/kotlin.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com> * Update docs/guides/kotlin.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com> * Update docs/guides/kotlin.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com> * Update kotlin.md * node: allow replacing existing p2p.Reactor(s) (cometbft#3846) * node: allow replacing existing p2p.Reactor(s) using [`CustomReactors` option](https://godoc.org/github.com/tendermint/tendermint/node#CustomReactors). Warning: beware of accidental name clashes. Here is the list of existing reactors: MEMPOOL, BLOCKCHAIN, CONSENSUS, EVIDENCE, PEX. * check the absence of "CUSTOM" prefix * merge 2 tests * add doc.go to node package * gocritic (1/2) (cometbft#3836) Add gocritic as a linter The linting is not complete, but should i complete in this PR or in a following. 23 files have been touched so it may be better to do in a following PR Commits: * Add gocritic to linting - Added gocritic to linting Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * gocritic * pr comments * remove switch in cmdBatch * tm-cmn to tm-db (cometbft#3850) * tm-cmn to tm-db * go.mod changes * go.mod changes * more go.mod * fix tm-db * ci fix, pending change * version tmdb (cometbft#3854) * txindexer: Refactor Tx Search Aggregation (cometbft#3851) - Replace the previous intersect call, which was called at each query condition, with a map intersection. - Replace fmt.Sprintf with string() closes: cometbft#3076 Benchmarks ``` Old goos: darwin goarch: amd64 pkg: github.com/tendermint/tendermint/state/txindex/kv BenchmarkTxSearch-4 200 103641206 ns/op 7998416 B/op 71171 allocs/op PASS ok github.com/tendermint/tendermint/state/txindex/kv 26.019s New goos: darwin goarch: amd64 pkg: github.com/tendermint/tendermint/state/txindex/kv BenchmarkTxSearch-4 1000 38615024 ns/op 13515226 B/op 166460 allocs/op PASS ok github.com/tendermint/tendermint/state/txindex/kv 53.618s ``` ~62% performance improvement Commits: * Refactor tx search * Add pending changelog entry * Add tx search benchmarking * remove intermediate hashes list also reset timer in BenchmarkTxSearch and fix other benchmark * fix import * Add test cases * Fix searching * Replace fmt.Sprintf with string * Update state/txindex/kv/kv.go Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com> * Rename params * Cleanup * Check error in benchmarks * release for v0.32.2 * Merge PR cometbft#3860: Update log v0.32.2 * changelog updates * pr comments * Fix for panic in signature verification if a peer sends a nil public key. * update version.go * Changelog update * Update CHANGELOG.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com> * update changelog * p2p: only allow ed25519 pubkeys when connecting also, recover from any possible failures in acceptPeers Refs cometbft#4030 * update changelog and bump version to v0.32.6 * set the date to today * cs: panic only when WAL#WriteSync fails - modify WAL#Write and WAL#WriteSync to return an error * types: validate Part#Proof add ValidateBasic to crypto/merkle/SimpleProof * cs: limit max bit array size and block parts count * cs: test new limits * cs: only assert important stuff * update changelog and bump version to 0.32.7 * fixes after Ethan's review * align max wal msg and max consensus msg sizes * fix tests * fix test * use bor * add data in commit * remove votes from header * new: add proposal results in vote * fix: go mod * new: add sidechannel proto objects * new: add begin side blocker and deliver side tx * new: add side tx results in begin side block * add: add side tx results into request begin side-block * chg: add address in sig object * chg: add events in side block * chg: allow empty sig * chg: add flag to execute side-tx while not syncing * chg: remove data from vote * fix: use last byte on bigendian bytes * fix: call sidetx result for string method * feat: add rollback feature * Use bor version v0.2.16 * Change log level tag from a single character to a full word This will change logging format from: D[2016-05-02|11:06:44.322] to: DEBUG[2016-05-02|11:06:44.322] The purpose is to unify the logging with bor. * consensus,scripts,state,store,types: change PartSetHeader total to uint32 * libs/log: add warn log level (cometbft#27) * libs/log: add warn log level * mardizzone/POS-1609: dev: chg: bump btcd dep and solve related issues * mardizzone/POS-1609: dev: chg: solve vulnerabilities associated with some packages * mardizzone/POS-1609: dev: chg: update bor version and replace tm-db * mardizzone/POS-1609: dev: chg: bump go version * mardizzone/POS-1609: dev: chg: bump go version to latest patch * Changed the value of default maxNumInboundPeers and maxNumOutboundPeers * made Stopping peer for error log as debig (cometbft#30) * made dialing failed log as debug (cometbft#31) * Added log to print number of peers (cometbft#32) * added log to print number of peers * update * peppermint: changes to crypto * Modified NewFilePV to generate secp256k1 * (temporarily) allow both tendermint/P*KeySecp256k1 and comet/P*KeySecp256k1Uncompressed to ease migration * Forward-port disabled `MaxSignatureSize` checks (+ new ones needed) * cherry pick secp256k1 migration commits + go mod tidy * blocksync,consensus,crypto,libs,types: fix tests and more conflicts * consensus,libs,types: fix tests, vulns from govuln and some lint errors * ci: bump go version to 1.21.4 * Fixed `TestPubKeySecp256k1Address` * crypto: enforce curve group order checks in genPrivKey * abci,crypto: fix conflicts and tests * types: fix TestInvalidPrecommitExtensions * fix lint * Extend kvstore example add with with key types * Fix `TestReactorValidatorSetChanges` * Fix UTs in `execution_test.go` * Fix `TestEvidencePoolBasic` * Fix `TestVoteExtension` * test/e2e: use go 1.21.4 in docker * test/e2e: use secp256k1 as default key type in testnet setup * p2p/conn: use secp256k1 for p2p authentication * p2p/conn: allow both secp256k1 and ed25519 key types for authentication * all: address PR comments * types,blocksync: fix lint + tests + bump deps complained by govuln * crypto,state,test: resolve conflicts from v0.38.5 * abci: resolve conflicts from v0.38.5 * resolve go mod deps * Revert "Merge branch 'v0.38.5-upstream' into raneet10/peppermint-changes" This reverts commit 2706fc9, reversing changes made to e404e0f. * Revert "Revert "Merge branch 'v0.38.5-upstream' into raneet10/peppermint-changes"" This reverts commit fc56973. * all: fix issue from merge * docs: remove Warn log definition from ADR * state: remove outdated comments * types: increase MaxSignatureSize to 65 and unskip related tests * cmd: minor refactor Co-authored-by: Sergio Mena <sergio@informal.systems> * libs/protoio: minor refactor Co-authored-by: Sergio Mena <sergio@informal.systems> * libs/pubsub: minor refactor Co-authored-by: Sergio Mena <sergio@informal.systems> * state: minor refactor Co-authored-by: Sergio Mena <sergio@informal.systems> * state: minor restructure in test Co-authored-by: Sergio Mena <sergio@informal.systems> * types: fix TestMaxCommitBytes + lint * state,types: fix TestTxFilter and TestBlockMaxDataBytes * types: fix TestBlockMaxDataBytesNoEvidence * types: fix TestInvalidPrecommitExtensions * abci,types: address comments * crypto,proto: add secp256k1_uncompressed oneof in PublicKey proto message type * remove revive from .golangci.yml * remove replace of go-ethereum dep with bor and go mod tidy --------- Co-authored-by: vaibhavchellani <vaibhavchellani223@gmail.com> Co-authored-by: Alex Dupre <sysadmin@alexdupre.com> Co-authored-by: Roman Useinov <roman.useinov@gmail.com> Co-authored-by: Marko <marbar3778@yahoo.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> Co-authored-by: Jun Kimura <junkxdev@gmail.com> Co-authored-by: zjubfd <296179868@qq.com> Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com> Co-authored-by: folex <0xdxdy@gmail.com> Co-authored-by: Ivan Kushmantsev <kushmantsev@gmail.com> Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com> Co-authored-by: Ethan Buchman <ethan@coinculture.info> Co-authored-by: Zaki Manian <zaki@manian.org> Co-authored-by: Zaki Manian <zaki@tendermint.com> Co-authored-by: Jaynti Kanani <jdkanani@gmail.com> Co-authored-by: Sai Kumar <sai@vitwit.com> Co-authored-by: Krishna Upadhyaya <krishnau1604@gmail.com> Co-authored-by: Jerry <jerrycgh@gmail.com> Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com> Co-authored-by: marcello33 <marcelloardizzone@hotmail.it> Co-authored-by: Vaibhav Jindal <vaibhavjindal29@gmail.com> Co-authored-by: VaibhavJindal <74560896+VAIBHAVJINDAL3012@users.noreply.github.com> Co-authored-by: Pratik Patil <pratikspatil024@gmail.com> Co-authored-by: Sergio Mena <sergio@informal.systems>
Partially addresses #3415
The a node has no peers, blocksync gets stuck without switching to consesnus, because it needs info from other peers to have an idea of maximum height.
However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blockchain if the only condition is that we don't have peers.
Moreover, in order to block a chain, 1/3 of the voting power is enough, so the reasoning of this fix is the following:
Explanation of commits:
e2etestbed reproducing the issuee2etests. Check the error: ❌ next to the commit hash (3fb1057)e2eare passing now.All further commits: addressing other comments and tidying up the code
PR checklist
.changelog(we use unclog to manage our changelog)[ ] Updated relevant documentation (docs/orspec/) and code comments