Conversation
My current theory is that the flowrate lib we're using to control flow (we multiplex over a single TCP connection) was not designed w/ large blobs (1MB batch of txs) in mind. I've tried decreasing the Mempool reactor priority, but that did not have any visible effect. What actually worked is adding a time.Sleep into mempool.Reactor#broadcastTxRoutine after each successful send == manual flow control, of sorts. Closes #5796
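The time.Sleep workaround can be sketched roughly like this (a toy model: the constant, the loop shape, and the send callback are illustrative, not Tendermint's actual reactor code):

```go
package main

import (
	"fmt"
	"time"
)

// sleepAfterSend is a hypothetical pause inserted after each successful
// send. It acts as a crude form of manual flow control: spacing out large
// writes so the rate limiter on the shared TCP connection is not hit with
// back-to-back 1MB batches.
const sleepAfterSend = 10 * time.Millisecond

// broadcast simulates the broadcast loop: try to send each tx, and sleep
// briefly after every successful send. It returns the number of txs sent.
func broadcast(txs [][]byte, send func([]byte) bool) int {
	sent := 0
	for _, tx := range txs {
		if !send(tx) {
			continue // peer busy; retry logic omitted in this sketch
		}
		sent++
		time.Sleep(sleepAfterSend)
	}
	return sent
}

func main() {
	txs := [][]byte{[]byte("tx1"), []byte("tx2"), []byte("tx3")}
	n := broadcast(txs, func(tx []byte) bool { return true })
	fmt.Println("sent", n)
}
```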
this is expensive
Codecov Report
@@            Coverage Diff             @@
##           master    #5800      +/-   ##
==========================================
+ Coverage   59.77%   59.82%   +0.05%
==========================================
  Files         262      262
  Lines       23705    23688      -17
==========================================
+ Hits        14169    14171       +2
+ Misses       8023     8007      -16
+ Partials     1513     1510       -3
I hope you are sure about this @melekes, you know Tendermint better than me, but this workaround looks like burying a very serious problem that will eventually have to be solved with a complete refactor in the future.
The rate limits are configurable. @p4u, can you see if increasing them helps?
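For reference, those limits live under the [p2p] section of config.toml. A sketch (the values shown are illustrative defaults, and the exact key spelling may differ between Tendermint versions):

```toml
[p2p]
# Maximum rate at which data can be sent/received, in bytes/second.
# Raising these may help if large mempool batches trip the flow-rate limiter.
send-rate = 5120000
recv-rate = 5120000
```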
This is something we're trying to balance. Although we intend to do a complete refactor, it's not slated to begin for a few months, and it will probably take a few months itself. I think we're looking at mid-2021 before the mempool refactor is ready to roll. In the meantime, we'd like to see if there's a quick (but safe) fix that we can apply to get everything working for you again.
Sure, I understand. I just wanted to point out that if the workaround is not safe enough, it would become a mess for anyone using Tendermint and upgrading to 0.34. If we are not sure of its safety, I'd revert the whole Batch Tx feature for now.
Well, not only for me but for anyone using Tendermint I hope 👍 EDIT: I just noticed that the PR has been changed, so my comment was in the previous context that was quite risky IMO.
Is this PR description still accurate, @melekes?
yes
@p4u from vocdoni.io reported that the mempool might behave incorrectly under a high load. The consequences can range from pauses between blocks to peers disconnecting from this node. My current theory is that the flowrate lib we're using to control flow (we multiplex over a single TCP connection) was not designed w/ large blobs (1MB batch of txs) in mind. I've tried decreasing the Mempool reactor priority, but that did not have any visible effect. What actually worked is adding a time.Sleep into mempool.Reactor#broadcastTxRoutine after each successful send == manual flow control, of sorts. As a temporary remedy (until the mempool package is refactored), the max-batch-bytes was disabled. Transactions will be sent one by one, without batching. Closes #5796
This configuration is not used anymore; it's a leftover of batching txs in the mempool, which was deprecated (tendermint/tendermint#5800)
This configuration is not used anymore; it's a leftover of batching txs in the mempool, which was deprecated (tendermint/tendermint#5800) (cherry picked from commit dab72ad)
The PR (tendermint/tendermint#5800) is the change that disabled transaction batching, not the issue (#5796) which reported the problem.
@p4u from vocdoni.io reported that the mempool might behave incorrectly under a
high load. The consequences can range from pauses between blocks to the peers
disconnecting from this node.
My current theory is that the flowrate lib we're using to control flow
(multiplex over a single TCP connection) was not designed w/ large blobs
(1MB batch of txs) in mind.
I've tried decreasing the Mempool reactor priority, but that did not
have any visible effect. What actually worked is adding a time.Sleep
into mempool.Reactor#broadcastTxRoutine after each successful send ==
manual flow control, of sorts.
As a temporary remedy (until the mempool package
is refactored), the max-batch-bytes was disabled. Transactions will be
sent one by one, without batching.
Closes #5796
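The flow-rate theory above can be illustrated with a toy token-bucket model (hypothetical, and not the flowrate library's actual algorithm): a single 1MB write consumes a large chunk of the per-second byte budget at once, stalling everything else multiplexed over the same connection for a noticeable stretch.

```go
package main

import "fmt"

// bucket is a toy token bucket: refillPerSec bytes of budget are added per
// second, and writes drain it. A write larger than the available budget
// forces the sender to wait through refill intervals, during which every
// other channel multiplexed over the same connection is also stalled.
type bucket struct {
	tokens       int // bytes currently available
	refillPerSec int // bytes of budget regained per second
}

// secondsToSend returns how long a blocking write of n bytes would stall
// the connection under this toy model.
func (b *bucket) secondsToSend(n int) float64 {
	if n <= b.tokens {
		return 0
	}
	return float64(n-b.tokens) / float64(b.refillPerSec)
}

func main() {
	// 5120000 B/s mirrors a common default send rate; 1MB (1<<20 bytes)
	// is the batch size from the report above.
	b := bucket{tokens: 0, refillPerSec: 5120000}
	fmt.Printf("1MB blob stalls the connection for ~%.2fs\n", b.secondsToSend(1<<20))
}
```

Sending txs one by one, with a short sleep between sends, keeps each write comfortably inside the per-interval budget instead of draining it in one burst.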