Skip to content

fix(mempool)!: stop accepting TXs in the mempool if we can't keep up with reCheckTX#3314

Merged
hvanz merged 17 commits intomainfrom
sergio/bound-rechecktx
Jun 24, 2024
Merged

fix(mempool)!: stop accepting TXs in the mempool if we can't keep up with reCheckTX#3314
hvanz merged 17 commits intomainfrom
sergio/bound-rechecktx

Conversation

@sergio-mena
Copy link
Collaborator

@sergio-mena sergio-mena commented Jun 21, 2024

This PR is a combination of ideas from @ValarDragon, @hvanz and @sergio-mena to alleviate nodes that, while not having their mempool full "officially", they have too many TXs lingering in the mempool which causes them to fall behind.
The mechanism works as follows:

  • We mark when we start and end reChecking
  • If, by the time a new block is decided we are still running the previous reCheckTx, we declare the mempool as rechecktx-full
  • Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking us from cutting v1.0.0-rc1.

Some UTs need to be fixed, hence posting as draft for the moment.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@andynog andynog marked this pull request as ready for review June 21, 2024 18:01
@andynog andynog requested a review from a team as a code owner June 21, 2024 18:01
@andynog andynog requested a review from a team June 21, 2024 18:01
Copy link
Collaborator

@greg-szabo greg-szabo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have Debug level for the newly introduced log message, because it's a bit cryptic for validators. For n end-user, I could consider a "mempool is soft full" message useful, although there's not much they can do about it.

Copy link
Collaborator

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we respect the Recheck config setting?

@hvanz hvanz added this pull request to the merge queue Jun 24, 2024
Merged via the queue into main with commit 0cd2907 Jun 24, 2024
@hvanz hvanz deleted the sergio/bound-rechecktx branch June 24, 2024 21:00
mergify bot pushed a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (#3314)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
(cherry picked from commit 0cd2907)
mergify bot pushed a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (#3314)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
(cherry picked from commit 0cd2907)

# Conflicts:
#	mempool/clist_mempool.go
#	mempool/clist_mempool_test.go
mergify bot pushed a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (#3314)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
(cherry picked from commit 0cd2907)

# Conflicts:
#	internal/consensus/replay_stubs.go
#	mempool/clist_mempool.go
#	mempool/clist_mempool_test.go
#	mempool/errors.go
#	mempool/reactor_test.go
#	state/execution.go
#	state/execution_test.go
hvanz pushed a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (backport #3314) (#3336)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3314 done by
[Mergify](https://mergify.com).

Co-authored-by: Sergio Mena <sergio@informal.systems>
hvanz added a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (#3314)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
hvanz added a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (backport #3314) (#3337)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3314 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
hvanz added a commit that referenced this pull request Jun 24, 2024
…with reCheckTX (backport #3314) (#3338)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3314 done by
[Mergify](https://mergify.com).

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 27, 2024
…with reCheckTX (backport cometbft#3314) (cometbft#3338)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3314 done by
[Mergify](https://mergify.com).

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 27, 2024
… with reCheckTX (backport cometbft#3314) cometbft#3338  (#117)

* Revert "Add ResetUpdate, so non-block proposers no longer fail to sync (#113)"

This reverts commit 158bcbe.

* fix(mempool)!: stop accepting TXs in the mempool if we can't keep up with reCheckTX (backport cometbft#3314) (cometbft#3338)

This PR is a combination of ideas from @ValarDragon, @hvanz and
@sergio-mena to alleviate nodes that, while not having their mempool
full "officially", they have too many TXs lingering in the mempool which
causes them to fall behind.
The mechanism works as follows:

* We mark when we start and end reChecking
* If, by the time a new block is decided we are still running the
previous `reCheckTx`, we declare the mempool as rechecktx-full
* Otherwise, we declare the mempool as not rechecktx-full

We have tested this and it fixes the failing nightlies that are blocking
us from cutting `v1.0.0-rc1`.

Some UTs need to be fixed, hence posting as draft for the moment.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3314 done by
[Mergify](https://mergify.com).

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
hvanz added a commit that referenced this pull request Jun 28, 2024
…its logic to Lock (#3361)

This PR partially reverts the backport of #3314 into the recently
released `v0.38.8` (and `v0.37.7`). With this change the `Mempool`
interface is the same as in previous versions.

The reason is that we do not want to break the public API. We still keep
in the code the feature that #3314 introduced by moving it inside the
existing `Lock` method. We also keep the `RecheckFull bool` field that
we added to `ErrMempoolIsFull`.

<!--

Please add a reference to the issue that this PR addresses and indicate
which
files are most critical to review. If it fully addresses a particular
issue,
please include "Closes #XXX" (where "XXX" is the issue number).

If this PR is non-trivial/large/complex, please ensure that you have
either
created an issue that the team's had a chance to respond to, or had some
discussion with the team prior to submitting substantial pull requests.
The team
can be reached via GitHub Discussions or the Cosmos Network Discord
server in
the #cometbft channel. GitHub Discussions is preferred over Discord as
it
allows us to keep track of conversations topically.
https://github.com/cometbft/cometbft/discussions

If the work in this PR is not aligned with the team's current
priorities, please
be advised that it may take some time before it is merged - especially
if it has
not yet been discussed with the team.

See the project board for the team's current priorities:
https://github.com/orgs/cometbft/projects/1

-->

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
hvanz added a commit that referenced this pull request Jun 28, 2024
…its logic to Lock (#3363)

Complements #3361 

This PR partially reverts the backport of
#3314 into the recently
released v0.37.7. With this change the Mempool interface is the same as
in previous versions.

The reason is that we do not want to break the public API. We still keep
in the code the feature that
#3314 introduced by moving it
inside the existing Lock method. We also keep the RecheckFull bool field
that we added to ErrMempoolIsFull.

<!--

Please add a reference to the issue that this PR addresses and indicate
which
files are most critical to review. If it fully addresses a particular
issue,
please include "Closes #XXX" (where "XXX" is the issue number).

If this PR is non-trivial/large/complex, please ensure that you have
either
created an issue that the team's had a chance to respond to, or had some
discussion with the team prior to submitting substantial pull requests.
The team
can be reached via GitHub Discussions or the Cosmos Network Discord
server in
the #cometbft channel. GitHub Discussions is preferred over Discord as
it
allows us to keep track of conversations topically.
https://github.com/cometbft/cometbft/discussions

If the work in this PR is not aligned with the team's current
priorities, please
be advised that it may take some time before it is merged - especially
if it has
not yet been discussed with the team.

See the project board for the team's current priorities:
https://github.com/orgs/cometbft/projects/1

-->

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
github-merge-queue bot pushed a commit that referenced this pull request Aug 9, 2024
Complements #3314

This PR adds a new error with a more clear error message when checking
if the mempool is full while rechecking is in progress after a new block
was committed.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
mergify bot pushed a commit that referenced this pull request Aug 9, 2024
Complements #3314

This PR adds a new error with a more clear error message when checking
if the mempool is full while rechecking is in progress after a new block
was committed.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit 987fa83)
mergify bot pushed a commit that referenced this pull request Aug 9, 2024
Complements #3314

This PR adds a new error with a more clear error message when checking
if the mempool is full while rechecking is in progress after a new block
was committed.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit 987fa83)

# Conflicts:
#	mempool/clist_mempool.go
hvanz added a commit that referenced this pull request Aug 9, 2024
Complements #3314

This PR adds a new error with a more clear error message when checking
if the mempool is full while rechecking is in progress after a new block
was committed.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3654 done by
[Mergify](https://mergify.com).

Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
hvanz added a commit that referenced this pull request Aug 9, 2024
Complements #3314

This PR adds a new error with a more clear error message when checking
if the mempool is full while rechecking is in progress after a new block
was committed.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3654 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Hernán Vanzetto <15466498+hvanz@users.noreply.github.com>
Co-authored-by: hvanz <hernan.vanzetto@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-to-v0.38.x Tell Mergify to backport the PR to v0.38.x bug Something isn't working mempool

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants