fix(blocksync)!: don't block in blocksync if our voting power is blocking the chain#3406

Merged
sergio-mena merged 14 commits into main from sergio/blocksync-stalled-no-peers on Jul 4, 2024

@sergio-mena (Collaborator) commented Jul 3, 2024

Partially addresses #3415

When a node has no peers, blocksync gets stuck without switching to consensus, because it needs information from other peers to estimate the maximum height.

However, there is an edge case (mainly when testing) where a validator might have >2/3 of the voting power and the other validators are not started. In this case, we know we are blocking the chain, so we don't need to stay in blocksync if the only condition is that we don't have peers.

Moreover, 1/3 of the voting power is enough to block a chain, so the reasoning behind this fix is the following:

  • I am a node and I am starting... shall I run blocksync?
  • Well, it looks like I have 1/3 of the voting power (or more) at my current height, so there is no way the chain could advance in my absence, so I don't need to blocksync.
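
The reasoning above can be sketched in Go as follows. This is only an illustration under stated assumptions, not the code merged in this PR: the `validator` type and the `blocksTheChain` function are hypothetical, the threshold follows the "1/3 of the voting power (or more)" wording of the description (the review discussion left the exact bound open), and the sketch assumes voting powers small enough that `3*local` does not overflow `int64`.

```go
package main

import "fmt"

// validator is a hypothetical, simplified stand-in for a validator set entry.
type validator struct {
	address string
	power   int64
}

// blocksTheChain reports whether the validator at localAddr holds at least
// one third of the total voting power. If it does, the remaining validators
// hold at most 2/3, which is not enough to commit blocks (more than 2/3 is
// required), so the chain cannot advance while the local node is absent.
func blocksTheChain(vals []validator, localAddr string) bool {
	var total, local int64
	for _, v := range vals {
		total += v.power
		if v.address == localAddr {
			local = v.power
		}
	}
	if total == 0 || local == 0 {
		// Empty set, or the local node is not a validator at this height.
		return false
	}
	// Integer form of local/total >= 1/3 (assumes no int64 overflow).
	return 3*local >= total
}

func main() {
	vals := []validator{{"A", 34}, {"B", 33}, {"C", 33}}
	fmt.Println(blocksTheChain(vals, "A")) // A holds 34 of 100: true
	fmt.Println(blocksTheChain(vals, "B")) // B holds 33 of 100: false
}
```

With a check like this, a node that has no peers but is itself blocking the chain has no reason to wait in blocksync for a maximum height that no peer can report.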

Explanation of commits:

  • Commit 1: e2e testbed reproducing the issue
  • Commit 2: A trivial change to trigger the e2e tests. Check the error: ❌ next to the commit hash (3fb1057)
  • Commit 3: Tentative fix. Although there is a ❌ next to the commit hash (16a46ea), if you click on it, you'll see that the e2e tests are passing now.
  • Commit 4: Revert commit 2
  • Commit 5: Move the check for "local node is blocking the chain" outside the pool, as suggested by @cason
  • Commit 6: Fixed unit tests

All further commits: addressing other comments and tidying up the code


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • [ ] Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@sergio-mena sergio-mena self-assigned this Jul 3, 2024
@sergio-mena sergio-mena force-pushed the sergio/blocksync-stalled-no-peers branch from 69ccf42 to 0160866 Compare July 3, 2024 11:54
@cason cason left a comment

Not sure about this workaround.

We should know whether we should run block sync outside the protocol. But, ok, it works. However, by changing the block Reactor constructor, we are breaking a lot of code.

@sergio-mena sergio-mena marked this pull request as ready for review July 3, 2024 17:55
@sergio-mena sergio-mena requested a review from a team as a code owner July 3, 2024 17:55
@sergio-mena sergio-mena requested a review from a team July 3, 2024 17:55
@sergio-mena sergio-mena added the bug and block-sync labels Jul 3, 2024
@cason cason left a comment

I would approve, but the >=1/3 vs >2/3 question remains open.

See associated comment (line 515).

@ValarDragon
Contributor

I think it's definitely safe to backport, it can't really affect mainnets as you need one Val w/ over 1/3 to do anything. (And it only helps users right now if it's on the 38 line)

@cason commented Jul 4, 2024

Is this safe to backport to 0.38/v1 (post-rc1 v1)?

We need to find a solution for v0.37.x too...

@sergio-mena
Collaborator Author

> Is this safe to backport to 0.38/v1 (post-rc1 v1)?

To me, it's a bug, so unless a big risk is identified I'd backport it. Besides, this is clearly holding back teams that are on v0.38.x/v0.37.x. Please reply if you don't agree.

@sergio-mena sergio-mena added this pull request to the merge queue Jul 4, 2024
@sergio-mena sergio-mena added the backport-to-v0.37.x and backport-to-v0.38.x labels Jul 4, 2024
Merged via the queue into main with commit bd95579 Jul 4, 2024
@sergio-mena sergio-mena deleted the sergio/blocksync-stalled-no-peers branch July 4, 2024 08:54
mergify bot pushed a commit that referenced this pull request Jul 4, 2024
…king the chain (#3406)

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit bd95579)

# Conflicts:
#	internal/blocksync/reactor.go
mergify bot pushed a commit that referenced this pull request Jul 4, 2024
…king the chain (#3406)

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit bd95579)

# Conflicts:
#	.changelog/v0.38.3/bug-fixes/3406-blocksync-dont-stall-if-blocking-chain.md
#	blocksync/reactor.go
#	blocksync/reactor_test.go
#	node/node.go
mergify bot pushed a commit that referenced this pull request Jul 4, 2024
…king the chain (#3406)

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit bd95579)

# Conflicts:
#	blocksync/reactor_test.go
#	internal/blocksync/reactor.go
#	node/node.go
#	node/setup.go
sergio-mena added a commit that referenced this pull request Jul 4, 2024
sergio-mena added a commit that referenced this pull request Jul 4, 2024
…king the chain (#3406)

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
melekes pushed a commit that referenced this pull request Jul 5, 2024
…king the chain (backport #3406) (#3420)

This is an automatic backport of pull request #3406 done by [Mergify](https://mergify.com).

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Daniel <daniel.cason@informal.systems>
sergio-mena added a commit that referenced this pull request Jul 5, 2024
sergio-mena added a commit that referenced this pull request Jul 5, 2024
…king the chain (#3406)

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
sergio-mena added a commit that referenced this pull request Jul 5, 2024
sergio-mena added a commit that referenced this pull request Jul 5, 2024
…king the chain (#3406)

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
sergio-mena added a commit that referenced this pull request Jul 5, 2024
…ing the chain (backport #3406) (#3421)

This is an automatic backport of pull request #3406 done by [Mergify](https://mergify.com).

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Daniel <daniel.cason@informal.systems>
sergio-mena added a commit that referenced this pull request Jul 5, 2024
…ing the chain (backport #3406) (#3422)

This is an automatic backport of pull request #3406 done by [Mergify](https://mergify.com).

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Daniel <daniel.cason@informal.systems>
github-merge-queue bot pushed a commit that referenced this pull request Sep 11, 2024
Contributes to #3415

This is mainly refactoring to simplify `onlyValidatorIsUs` and
`localNodeBlocksTheChain` (since the former implies the latter).
It is a follow-up of #3406 (this is the part of #3406 that doesn't need
to be backported)

---

#### PR checklist

- ~[ ] Tests written/updated~
- ~[ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)~
- ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments~

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
roy-dydx pushed a commit to dydxprotocol/cometbft that referenced this pull request Feb 3, 2025
…ing the chain (backport cometbft#3406) (cometbft#3421)

This is an automatic backport of pull request cometbft#3406 done by [Mergify](https://mergify.com).

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Daniel <daniel.cason@informal.systems>
jmalicevic pushed a commit to informalsystems/cometbft that referenced this pull request May 14, 2025
* Added votes to header + added secp256k1 + other changes

* updated import

* txHash fix+update canonical rep

* removed sig size

* docs: fix consensus spec formatting (cometbft#3804)

* abci/server: recover from app panics in socket server (cometbft#3809)

fixes cometbft#3800

* abci/client: fix DATA RACE in gRPC client (cometbft#3798)

* Remove go func {}()

closes #357

- Remove go func(){}() that caused race condition

- To reproduce
	- add -race in make file to `install_abci`
	- Remove `CGO_ENABLED=0` & add -race to `install`

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* remove -race

* fix data race

also, reorder callbacks similarly to socket client

* docs: "Writing a built-in Tendermint Core application in Go" guide (cometbft#3608)

* docs: go built-in guide

* fix package imports, add badger db, simplify Query

* newTendermint function

* working example

* finish the first guide

* add one more note

* add the second Golang guide - external ABCI app

* fix typos

* libs: Remove db from tendermint in favor of tendermint/tm-cmn (cometbft#3811)

* Remove db from tendermint in favor of tendermint/tm-cmn

- remove db from `libs`
- update dependency, there have been no breaking changes in the updated deps
	- https://github.com/grpc/grpc-go/releases
	- https://github.com/golang/protobuf/releases

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* changelog add

* gofmt

* more gofmt

* docs: add A TOC to the Readme.md of ADR Section (#3820)

* ADR TOC in readme.md

* Added A TOC to the Readme.md of ADR Section

- Added table of contents to the Readme of the architecture section.
	- Easier to traverse and when you know what is there.
	- If the Adr's become viewable online it would help guide the user

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* add tm-cmn to subprojects

* normalize word

* rpc: make max_body_bytes and max_header_bytes configurable (cometbft#3818)

* rpc: make max_body_bytes and max_header_bytes configurable

* update changelog pending

* p2p/conn: Add Bufferpool (cometbft#3664)

* use byte buffer pool to decrease allocs

* wrap to put buffer in defer

* wapper defer

* add dependency

* remove Gopkg,*

* add change log

* rpc: /broadcast_evidence (cometbft#3481)

* implement broadcast_duplicate_vote endpoint

* fix test_cover

* address comments

* address comments

* Update abci/example/kvstore/persistent_kvstore.go

Co-Authored-By: mossid <torecursedivine@gmail.com>

* Update rpc/client/main_test.go

Co-Authored-By: mossid <torecursedivine@gmail.com>

* address comments in progress

* reformat the code

* make linter happy

* make tests pass

* replace BroadcastDuplicateVote with BroadcastEvidence

* fix test

* fix endpoint name

* improve doc

* fix TestBroadcastEvidenceDuplicateVote

* Update rpc/core/evidence.go

Co-Authored-By: Thane Thomson <connect@thanethomson.com>

* add changelog entry

* fix TestBroadcastEvidenceDuplicateVote

* mempool: make max_msg_bytes configurable (cometbft#3826)

* mempool: make max_msg_bytes configurable

* apply suggestions from code review

* update changelog pending

* apply suggestions from code review again

* rpc: return err if page is incorrect (less than 0 or greater than tot… (cometbft#3825)

* rpc: return err if page is incorrect (less than 0 or greater than total pages)

Fixes cometbft#3813

* fix rpc_test

* blockchain: Reorg reactor (cometbft#3561)

* go routines in blockchain reactor

* Added reference to the go routine diagram

* Initial commit

* cleanup

* Undo testing_logger change, committed by mistake

* Fix the test loggers

* pulled some fsm code into pool.go

* added pool tests

* changes to the design

added block requests under peer

moved the request trigger in the reactor poolRoutine, triggered now by a ticker

in general moved everything required for making block requests smarter in the poolRoutine

added a simple map of heights to keep track of what will need to be requested next

added a few more tests

* send errors to FSM in a different channel than blocks

send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests

* more pool tests

* lint errors

* more tests

* more tests

* switch fast sync to new implementation

* fixed data race in tests

* cleanup

* finished fsm tests

* address golangci comments :)

* address golangci comments :)

* Added timeout on next block needed to advance

* updating docs and cleanup

* fix issue in test from previous cleanup

* cleanup

* Added termination scenarios, tests and more cleanup

* small fixes to adr, comments and cleanup

* Fix bug in sendRequest()

If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be blackholed in a peer
that was removed and never retried.

While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks in the math for requesting blocks ahead of current
processing height so eventually there will be no more blocks requested
until the already received ones are consumed.

* remove bpPeer's didTimeout field

* Use distinct err codes for peer timeout and FSM timeouts

* Don't allow peers to update with lower height

* review comments from Ethan and Zarko

* some cleanup, renaming, comments

* Move block execution in separate goroutine

* Remove pool's numPending

* review comments

* fix lint, remove old blockchain reactor and duplicates in fsm tests

* small reorg around peer after review comments

* add the reactor spec

* verify block only once

* review comments

* change to int for max number of pending requests

* cleanup and godoc

* Add configuration flag fast sync version

* golangci fixes

* fix config template

* move both reactor versions under blockchain

* cleanup, golint, renaming stuff

* updated documentation, fixed more golint warnings

* integrate with behavior package

* sync with master

* gofmt

* add changelog_pending entry

* move to improvements

* suggestion to changelog entry

* Renamed wire.go to codec.go (cometbft#3827)

* Renamed wire.go to codec.go

- Wire was the previous name of amino
- Codec describes the file better than `wire` & `amino`

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* ide error

* rename amino.go to codec.go

* docs: add guides to docs (cometbft#3830)

* add staticcheck linting (cometbft#3828)

cleanup to add linter

    grpc change:
        https://godoc.org/google.golang.org/grpc#WithContextDialer
        https://godoc.org/google.golang.org/grpc#WithDialer
        grpc/grpc-go#2627
    prometheus change:
        due to UninstrumentedHandler being deprecated in the future
    empty branch = empty if or else statement
        didn't delete them entirely but commented
        couldn't find a reason to have them
    could not replicate the issue cometbft#3406
        but if we want to keep it commented then we should comment out the if statement as well

* types: move MakeVote / MakeBlock functions (cometbft#3819)

to the types package

Partially fixes cometbft#3584

* p2p: Fix error logging for connection stop (cometbft#3824)

* p2p: fix false-positive error logging when stopping connections

This changeset fixes two types of false-positive errors occurring during
connection shutdown.

The first occurs when the process invokes FlushStop() or Stop() on a
connection. While the previous behavior did properly wait for the sendRoutine
to finish, it did not notify the recvRoutine that the connection was shutting
down. This would cause the recvRoutine to receive an error when reading and
log this error. The changeset fixes this by notifying the recvRoutine that
the connection is shutting down.

The second occurs when the connection is terminated (gracefully) by the other side.
The recvRoutine would get an EOF error during the read, log it, and stop the connection
with an error. The changeset detects EOF and gracefully shuts down the connection.
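The distinction between expected and unexpected read errors can be illustrated with a small sketch. It is not the connection code from this changeset: `shouldLogRecvErr`, `demo` and the `quit` channel are hypothetical names, and the sketch only covers the decision of whether a read error deserves an error-level log line, assuming a channel that is closed when the local side starts shutting down.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// shouldLogRecvErr reports whether a read error in a receive loop should be
// logged as an error. quit is assumed to be closed when the local side is
// stopping the connection.
func shouldLogRecvErr(err error, quit <-chan struct{}) bool {
	if err == nil {
		return false
	}
	select {
	case <-quit:
		// Local shutdown in progress: a failed read is expected here.
		return false
	default:
	}
	// EOF means the remote side closed the connection gracefully.
	return !errors.Is(err, io.EOF)
}

// demo evaluates three cases: remote EOF, a genuine failure, and the same
// failure observed after local shutdown has begun.
func demo() []bool {
	quit := make(chan struct{})
	boom := errors.New("connection reset")
	out := []bool{
		shouldLogRecvErr(io.EOF, quit),
		shouldLogRecvErr(boom, quit),
	}
	close(quit)
	return append(out, shouldLogRecvErr(boom, quit))
}

func main() {
	fmt.Println(demo())
}
```

Only the genuine failure on a live connection is treated as worth logging; the other two cases are normal shutdown paths.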

* bring back the comment about flushing

* add changelog entry

* listen for quitRecvRoutine too

* we have to call stopForError

Otherwise the peer won't be removed from the peer set and may be re-added
later.

* p2p: Do not write 'Couldn't connect to any seeds' if there are no seeds (cometbft#3834)

* Do not write 'Couldn't connect to any seeds' if there are no seeds

* changelog

* remove privValUpgrade

* Fix typo in changelog

* Update CHANGELOG_PENDING.md

Co-Authored-By: Marko <marbar3778@yahoo.com>

I'm setting up all peers dynamically by calling dial_peers, so p2p.seeds in the config is empty, and I'm seeing this error logged frequently.

* docs: add a footer to guides (cometbft#3835)

* docs: "Writing a Tendermint Core application in Kotlin (gRPC)" guide (cometbft#3838)

* add abci grpc kotlin guide

* Update docs/guides/kotlin.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Update docs/guides/kotlin.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Update docs/guides/kotlin.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Update kotlin.md

* node: allow replacing existing p2p.Reactor(s)  (cometbft#3846)

* node: allow replacing existing p2p.Reactor(s)

using [`CustomReactors`
option](https://godoc.org/github.com/tendermint/tendermint/node#CustomReactors).
Warning: beware of accidental name clashes. Here is the list of existing
reactors: MEMPOOL, BLOCKCHAIN, CONSENSUS, EVIDENCE, PEX.
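The name-clash warning can be illustrated with a toy model. This sketch does not import the node package or use its real types: reactors are represented as plain strings, and `applyCustom` is a hypothetical helper that only mimics the described behavior, where a custom reactor registered under an existing name replaces the built-in one.

```go
package main

import (
	"fmt"
	"sort"
)

// applyCustom registers custom reactors into existing, keyed by name, and
// returns the sorted names of built-in reactors that were replaced.
// Reactors are modeled as strings purely for illustration.
func applyCustom(existing, custom map[string]string) []string {
	var replaced []string
	for name, r := range custom {
		if _, clash := existing[name]; clash {
			replaced = append(replaced, name) // same name: built-in is replaced
		}
		existing[name] = r
	}
	sort.Strings(replaced)
	return replaced
}

func main() {
	existing := map[string]string{
		"MEMPOOL": "builtin", "BLOCKCHAIN": "builtin", "CONSENSUS": "builtin",
		"EVIDENCE": "builtin", "PEX": "builtin",
	}
	custom := map[string]string{"MEMPOOL": "custom", "FOO": "custom"}
	fmt.Println("replaced:", applyCustom(existing, custom))
}
```

Reporting the replaced names makes an accidental clash visible instead of silently swapping out a built-in reactor.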

* check the absence of "CUSTOM" prefix

* merge 2 tests

* add doc.go to node package

* gocritic (1/2) (cometbft#3836)

    Add gocritic as a linter

    The linting is not complete; should I complete it in this PR or in a follow-up?

    23 files have been touched, so it may be better to do it in a follow-up PR.


Commits:

* Add gocritic to linting

- Added gocritic to linting

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* gocritic

* pr comments

* remove switch in cmdBatch

* tm-cmn to tm-db (cometbft#3850)

* tm-cmn to tm-db

* go.mod changes

* go.mod changes

* more go.mod

* fix tm-db

* ci fix, pending change

* version tmdb (cometbft#3854)

* txindexer: Refactor Tx Search Aggregation (cometbft#3851)

- Replace the previous intersect call, which was called at each query condition, with a map intersection.
- Replace fmt.Sprintf with string()

closes: cometbft#3076

Benchmarks

```
Old
goos: darwin
goarch: amd64
pkg: github.com/tendermint/tendermint/state/txindex/kv
BenchmarkTxSearch-4   	     200	 103641206 ns/op	 7998416 B/op	   71171 allocs/op
PASS
ok  	github.com/tendermint/tendermint/state/txindex/kv	26.019s

New
goos: darwin
goarch: amd64
pkg: github.com/tendermint/tendermint/state/txindex/kv
BenchmarkTxSearch-4   	    1000	  38615024 ns/op	13515226 B/op	  166460 allocs/op
PASS
ok  	github.com/tendermint/tendermint/state/txindex/kv	53.618s
```

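~62% performance improvement in time per operation (about 103.6 ms/op down to 38.6 ms/op); bytes and allocations per operation increased.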
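As a rough illustration of the map-intersection idea mentioned above, the sketch below narrows a set of candidate transaction hashes one query condition at a time. The names `intersect` and `search`, and the use of plain string hashes, are assumptions made for the example; the real indexer works on its own key and hash types.

```go
package main

import (
	"fmt"
	"sort"
)

// intersect keeps only the hashes in matches that are already present in acc.
func intersect(acc map[string]struct{}, matches []string) map[string]struct{} {
	out := make(map[string]struct{})
	for _, h := range matches {
		if _, ok := acc[h]; ok {
			out[h] = struct{}{}
		}
	}
	return out
}

// search returns the hashes that satisfy every condition. Each element of
// conditions holds the hashes matched by one query condition.
func search(conditions [][]string) []string {
	if len(conditions) == 0 {
		return nil
	}
	// Seed the accumulator with the matches of the first condition.
	acc := make(map[string]struct{})
	for _, h := range conditions[0] {
		acc[h] = struct{}{}
	}
	for _, matches := range conditions[1:] {
		acc = intersect(acc, matches)
		if len(acc) == 0 {
			break // nothing can match any more
		}
	}
	out := make([]string, 0, len(acc))
	for h := range acc {
		out = append(out, h)
	}
	sort.Strings(out) // deterministic order for display
	return out
}

func main() {
	fmt.Println(search([][]string{{"a", "b", "c"}, {"b", "c", "d"}, {"c", "b"}}))
}
```

Map lookups make each narrowing step linear in the size of the new match list, at the price of allocating a map per step.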

Commits:

* Refactor tx search

* Add pending changelog entry

* Add tx search benchmarking

* remove intermediate hashes list

also reset timer in BenchmarkTxSearch
and fix other benchmark

* fix import

* Add test cases

* Fix searching

* Replace fmt.Sprintf with string

* Update state/txindex/kv/kv.go

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Rename params

* Cleanup

* Check error in benchmarks

* release for v0.32.2

* Merge PR cometbft#3860: Update log v0.32.2

* changelog updates

* pr comments

* Fix for panic in signature verification if a peer sends a nil public key.

* update version.go

* Changelog update

* Update CHANGELOG.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* update changelog

* p2p: only allow ed25519 pubkeys when connecting

also, recover from any possible failures in acceptPeers

Refs cometbft#4030
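A minimal sketch of the two ideas in this commit, under stated assumptions: `safeAccept` and `checkKeyType` are hypothetical helpers, and key types are modeled as strings. The sketch shows a per-connection handler whose panic is converted into an error, together with a check that accepts only ed25519 keys.

```go
package main

import "fmt"

// allowedKeyType is the only key type accepted in this sketch.
const allowedKeyType = "ed25519"

// checkKeyType rejects any public key type other than ed25519.
func checkKeyType(keyType string) error {
	if keyType != allowedKeyType {
		return fmt.Errorf("unsupported pubkey type %q", keyType)
	}
	return nil
}

// safeAccept runs handle for one inbound connection and converts a panic
// into an error, so a single bad peer cannot bring down the accept loop.
func safeAccept(handle func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered while accepting peer: %v", r)
		}
	}()
	return handle()
}

func main() {
	fmt.Println(safeAccept(func() error { return checkKeyType("ed25519") }))
	fmt.Println(safeAccept(func() error { return checkKeyType("secp256k1") }))
	// A handler that panics is reported as an error instead of crashing.
	fmt.Println(safeAccept(func() error { panic("nil public key") }))
}
```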

* update changelog and bump version to v0.32.6

* set the date to today

* cs: panic only when WAL#WriteSync fails

- modify WAL#Write and WAL#WriteSync to return an error

* types: validate Part#Proof

add ValidateBasic to crypto/merkle/SimpleProof

* cs: limit max bit array size and block parts count

* cs: test new limits

* cs: only assert important stuff

* update changelog and bump version to 0.32.7

* fixes after Ethan's review

* align max wal msg and max consensus msg sizes

* fix tests

* fix test

* use bor

* add data in commit

* remove votes from header

* new: add proposal results in vote

* fix: go mod

* new: add sidechannel proto objects

* new: add begin side blocker and deliver side tx

* new: add side tx results in begin side block

* add: add side tx results into request begin side-block

* chg: add address in sig object

* chg: add events in side block

* chg: allow empty sig

* chg: add flag to execute side-tx while not syncing

* chg: remove data from vote

* fix: use last byte on bigendian bytes

* fix: call sidetx result for string method

* feat: add rollback feature

* Use bor version v0.2.16

* Change log level tag from a single character to a full word

This will change logging format from:

D[2016-05-02|11:06:44.322]

to:

DEBUG[2016-05-02|11:06:44.322]

The purpose is to unify the logging with bor.
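The format change can be sketched as a small prefix function. `prefix` and `demo` are hypothetical names and this is not the logger's actual implementation; the time layout is an assumption chosen to reproduce the timestamps shown above.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// prefix renders the level tag and timestamp at the start of a log line.
// With fullWord set, the whole level name is used instead of its first
// letter.
func prefix(level string, t time.Time, fullWord bool) string {
	tag := strings.ToUpper(level)
	if !fullWord && len(tag) > 0 {
		tag = tag[:1]
	}
	return fmt.Sprintf("%s[%s]", tag, t.Format("2006-01-02|15:04:05.000"))
}

// demo renders the old and new prefix for the timestamp used in the
// commit message.
func demo() (oldStyle, newStyle string) {
	t := time.Date(2016, 5, 2, 11, 6, 44, 322000000, time.UTC)
	return prefix("debug", t, false), prefix("debug", t, true)
}

func main() {
	oldStyle, newStyle := demo()
	fmt.Println(oldStyle)
	fmt.Println(newStyle)
}
```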

* consensus,scripts,state,store,types: change PartSetHeader total to uint32

* libs/log: add warn log level (cometbft#27)

* libs/log: add warn log level

* mardizzone/POS-1609: dev: chg: bump btcd dep and solve related issues

* mardizzone/POS-1609: dev: chg: solve vulnerabilities associated with some packages

* mardizzone/POS-1609: dev: chg: update bor version and replace tm-db

* mardizzone/POS-1609: dev: chg: bump go version

* mardizzone/POS-1609: dev: chg: bump go version to latest patch

* Changed the value of default maxNumInboundPeers and maxNumOutboundPeers

* made Stopping peer for error log as debug (cometbft#30)

* made dialing failed log as debug (cometbft#31)

* Added log to print number of peers (cometbft#32)

* added log to print number of peers

* update

* peppermint: changes to crypto

* Modified NewFilePV to generate secp256k1

* (temporarily) allow both tendermint/P*KeySecp256k1 and comet/P*KeySecp256k1Uncompressed to ease migration

* Forward-port disabled `MaxSignatureSize` checks (+ new ones needed)

* cherry pick secp256k1 migration commits + go mod tidy

* blocksync,consensus,crypto,libs,types: fix tests and more conflicts

* consensus,libs,types: fix tests, vulns from govuln and some lint errors

* ci: bump go version to 1.21.4

* Fixed `TestPubKeySecp256k1Address`

* crypto: enforce curve group order checks in genPrivKey

* abci,crypto: fix conflicts and tests

* types: fix TestInvalidPrecommitExtensions

* fix lint

* Extend kvstore example app with key types

* Fix `TestReactorValidatorSetChanges`

* Fix UTs in `execution_test.go`

* Fix `TestEvidencePoolBasic`

* Fix `TestVoteExtension`

* test/e2e: use go 1.21.4 in docker

* test/e2e: use secp256k1 as default key type in testnet setup

* p2p/conn: use secp256k1 for p2p authentication

* p2p/conn: allow both secp256k1 and ed25519 key types for authentication

* all: address PR comments

* types,blocksync: fix lint + tests + bump deps complained by govuln

* crypto,state,test: resolve conflicts from v0.38.5

* abci: resolve conflicts from v0.38.5

* resolve go mod deps

* Revert "Merge branch 'v0.38.5-upstream' into raneet10/peppermint-changes"

This reverts commit 2706fc9, reversing
changes made to e404e0f.

* Revert "Revert "Merge branch 'v0.38.5-upstream' into raneet10/peppermint-changes""

This reverts commit fc56973.

* all: fix issue from merge

* docs: remove Warn log definition from ADR

* state: remove outdated comments

* types: increase MaxSignatureSize to 65 and unskip related tests

* cmd: minor refactor

Co-authored-by: Sergio Mena <sergio@informal.systems>

* libs/protoio: minor refactor

Co-authored-by: Sergio Mena <sergio@informal.systems>

* libs/pubsub: minor refactor

Co-authored-by: Sergio Mena <sergio@informal.systems>

* state: minor refactor

Co-authored-by: Sergio Mena <sergio@informal.systems>

* state: minor restructure in test

Co-authored-by: Sergio Mena <sergio@informal.systems>

* types: fix TestMaxCommitBytes + lint

* state,types: fix TestTxFilter and TestBlockMaxDataBytes

* types: fix TestBlockMaxDataBytesNoEvidence

* types: fix TestInvalidPrecommitExtensions

* abci,types: address comments

* crypto,proto: add secp256k1_uncompressed oneof in PublicKey proto message type

* remove revive from .golangci.yml

* remove replace of go-ethereum dep with bor and go mod tidy

---------

Co-authored-by: vaibhavchellani <vaibhavchellani223@gmail.com>
Co-authored-by: Alex Dupre <sysadmin@alexdupre.com>
Co-authored-by: Roman Useinov <roman.useinov@gmail.com>
Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Jun Kimura <junkxdev@gmail.com>
Co-authored-by: zjubfd <296179868@qq.com>
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
Co-authored-by: folex <0xdxdy@gmail.com>
Co-authored-by: Ivan Kushmantsev <kushmantsev@gmail.com>
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Ethan Buchman <ethan@coinculture.info>
Co-authored-by: Zaki Manian <zaki@manian.org>
Co-authored-by: Zaki Manian <zaki@tendermint.com>
Co-authored-by: Jaynti Kanani <jdkanani@gmail.com>
Co-authored-by: Sai Kumar <sai@vitwit.com>
Co-authored-by: Krishna Upadhyaya <krishnau1604@gmail.com>
Co-authored-by: Jerry <jerrycgh@gmail.com>
Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Co-authored-by: marcello33 <marcelloardizzone@hotmail.it>
Co-authored-by: Vaibhav Jindal <vaibhavjindal29@gmail.com>
Co-authored-by: VaibhavJindal <74560896+VAIBHAVJINDAL3012@users.noreply.github.com>
Co-authored-by: Pratik Patil <pratikspatil024@gmail.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>

Labels

backport-to-v0.38.x (Tell Mergify to backport the PR to v0.38.x), block-sync, bug (Something isn't working)

Projects

No open projects
Status: Done

4 participants