Summary
RFC 004 discussed the value of the e2e framework and highlighted a few ways in which it could be improved. I would like to propose that some of these suggestions be turned into more concrete designs, and that we prioritise this work as part of our objective to gain greater confidence in our releases.
Specifically, I think our end-to-end testing should separate two desired outcomes, namely:
- Correctness: does Tendermint perform state machine replication correctly, even given arbitrary failures in less than 1/3 of the network?
- Performance: is Tendermint robust enough to handle a variety of stress conditions (not just transaction load but other variables like the size of the network)?
Our existing test suite does a good job of detecting correctness failures. I would like to leverage the existing tooling (i.e. node orchestration) to support performance testing in what would be similar to a staging environment or canary testnet. This would be deployed not after every commit to master, but after large projects are merged and before releases are cut.
This performance-oriented testnet should have the following characteristics:
- Large and geographically dispersed set of participating nodes. We currently don't run networks larger than 7 nodes, whilst production-grade networks have a couple hundred. We also run all nodes on the same machine, which doesn't account for network latency when passing messages.
- Flexible workload generation. In the same way that the current e2e test systematically generates every permutation over a range of possible configurations, we should apply the same approach to workload generation, systematically injecting transactions across several dimensions: number of transactions, size of transactions, duration of load, frequency of load (i.e. in bursts), and distribution of load (do we submit all transactions to a single node or across multiple?).
- Metrics gathering. Use existing tools like Prometheus and Grafana to monitor for trends and record statistics for comparison across versions / commits / network configurations.
Rather than producing a full product, we should focus first on a minimal design that can provide some insight and value.