Construct test case to exercise long block execution times.

We're aware that blocks that take a long time to process (e.g. the Osmosis epoch block) can create backpressure within tendermint. This issue tracks the work of building a reproduction for this case.

There are a number of theories about the cause that this test case needs to be able to exercise: 
- first, that there's some back pressure from the large number of events created during `EndBlock` (or `FinalizeBlock`). It would be good, then, to have two kinds of tests: one that creates a large number of transactions in this block, *and* one where the block takes a long time to process but contains few transactions.
- second, that the `MConnectionConfig` settings for heartbeats (ping/pong) are tuned too tightly. Changing these timeouts would show whether that could be a successful workaround.
- third, that there's lock contention in the `consensus.State` object, which is triggered by the settings on the `queryMaj23Routine` process. For experimentation, we might want to be able to run this test without this setting, or to change how frequently the routine runs (which is configurable). My recent change in https://github.com/tendermint/tendermint/commit/eed617c2d9da5e6ba1d742b5f59940dec6682f99 may address some of the lock pressure on the node, so it would be useful to run this reproduction case without this change.

There are several larger solutions to this problem:
- work within the application to reduce the scope of the epoch block,
- improve infrastructure below the application in the SDK (e.g. databases, iavl->smt, etc.),
- decouple the transport/connection protocol (e.g. mconnection) from the higher level constructs to preclude the possibility of this kind of back pressure (perhaps with libp2p).

Nevertheless, having a test case will help us validate that any remediation or solution has actually fixed the issue.

The best path for implementing this isn't straightforward, even though it should be possible to reproduce and observe the problem manually.

It would be good if we could write this reproduction as a standard Go test case, although it may be difficult to construct the right kind of test fixture (application, node configuration) given the current level of isolation. The e2e framework would similarly need a little work to expose the options required to orchestrate this test.

Construct test case to exercise long block execution times. #7797
