QA Process report for v0.37.x (and baseline for v0.34.x)#9499
sergio-mena merged 36 commits into main
Conversation
this is amazing, super good insights!! is there interest in testing:
This isn't planned for this round of QA, because every additional set of parameters represents an additional dimension to the tests, and the cost of testing grows exponentially with each added dimension. We could, however, look into incorporating variations on mempool size and consensus params into future rounds of QA or research.
@marbar3778, further to @thanethomson's reply: when in doubt, we chose to set parameters to their defaults, so the reader can have an idea of what to expect if they don't tweak parameters.
100%, was curious. This work is amazing, thank you for it!!
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Fixed some 'png's that got swapped. Excluded '.*-node-exporter' processes from memory plots
> We can observe there are some very high latencies, towards the end of the test.
> Upon suspicion that they are duplicate transactions, we examined the latencies
> raw file and discovered there are more than 100K duplicate transactions.
Why were there so many duplicate transactions? I thought the scheme we were using with the load generation tool would ensure uniqueness among the transactions by way of the value component of the key/value pairs?
The reason I ask is that this could explain why we had so few valid transactions during the tests, which skews the throughput rate calculation.
The load generator produces unique transactions; the proof is that the v0.37 run doesn't contain any duplicate TXs. My impression is that somehow some transactions are gossiped around and keep on being accepted into blocks. I know there is a protection built into the mempool to prevent this, but I guess it is not working all the time.
My gut feeling is that there is a bug that was triggered in the v0.34 run, but not in the v0.37 run. My next step is to look into @jmalicevic's test to see if the reason for what she saw is also duplicate TXs. If that is the case, I will add some info to the issue she wrote; otherwise, I'll create another issue.
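For reference, checking a run for duplicates can be done by counting repeated hashes in the loadtime "raw" CSV. The sketch below is hedged: the `tx_hash` column name and the file layout are assumptions for illustration, not the tool's documented format.

```python
# Sketch only: count duplicate transactions in a loadtime "raw" CSV.
# The column name 'tx_hash' is an assumed/hypothetical CSV layout.
import csv
from collections import Counter

def count_duplicate_txs(path):
    with open(path, newline='') as f:
        hashes = [row['tx_hash'] for row in csv.DictReader(f)]
    counts = Counter(hashes)
    # A hash seen n times contributes n - 1 duplicates.
    return sum(n - 1 for n in counts.values() if n > 1)
```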
Could it be that the same issue that affects the 200 node network at high load also affected this test? I.e. that the mempool cache is not large enough and so duplicate transactions end up being submitted.
I think so. I added an extra sentence to line 204.
Signed-off-by: Thane Thomson <connect@thanethomson.com>
Update the latency vs throughput script to rather generate plots from the "raw" CSV output from the loadtime reporting tool as opposed to the separated CSV files from the experimental method. Also update the relevant documentation, and regenerate the images from the raw CSV data (resulting in pretty much the same plots as the previous ones). Signed-off-by: Thane Thomson <connect@thanethomson.com>
Signed-off-by: Thane Thomson <connect@thanethomson.com>
Adjust experiment start time to be more accurate and re-plot latency vs throughput Signed-off-by: Thane Thomson <connect@thanethomson.com>
> The plot of latencies can be used as a baseline to compare with other releases.
> ## Prometheus Metrics on the Chosen Experiment
> The following plot summarizes average latencies versus overall throughputs
@thanethomson, do you agree to move line 88 below your paragraph (and make it plural)? I believe the plot you have introduced should also be used in the comparison against the baseline.
> * (non-critical, not fixed) [\#9537] - With the default mempool cache setting,
>   duplicated transactions are not rejected when gossiped and eventually flood
>   all mempools. The 200 node testnets were thus run with a value of 200000 (as
>   opposed to the default 10000).
Thanks for tidying this up :-)
```python
avg_latency = exp['total_latencies'] / exp['tx_count']
exp_start_time = exp['block_time_min'] - exp['block_time_min_duration']
exp_duration = exp['block_time_max'] - exp_start_time
throughput_rate = exp['tx_count'] / exp_duration
```
The rest of the text uses "transactions per minute". Shall we change this to tx/min, or the rest of the text to txs/s?
What's the usual convention when measuring throughput for such systems? I thought it was always txs/s? (cc @cason)
There is no real convention here, although in our papers, and in most papers I am aware of, we use transactions per second, using base-10 multipliers if the numbers are too high (which is not the case here). If the convention is to adopt the international SI, we should use seconds.
I saw that the text uses blocks per minute, which makes more sense since we don't have more than a block, actually not even one block, per second. I saw that in the text we use transactions per minute (e.g., 4154) to keep this correspondence with the blocks (e.g., 45), but I think we have enough transactions per second (e.g., 69.2) for this metric to make sense.
In summary, it is always about readability: we can reason better with 45 blocks/min than with 0.75 blocks/s, and with 69.2 tx/s than 4154 tx/min. In particular, I think that we reason better about 4.154 Ktx/min than 4154 tx/min. The problem here is that it is not natural, or straightforward, to divide by 60 to see the correspondence between the two metrics...
Yes, the timescales here do make more intuitive sense when thinking in minutes as opposed to seconds. Should I regenerate the latency vs throughput graphs using txs/min instead?
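Either unit is a constant factor away from the other, so regenerating the plots would just scale the rate series. A minimal sketch of the conversion (the function name is illustrative, not from the script):

```python
# Sketch: the same throughput expressed in tx/s or tx/min.
# exp_duration_s is in seconds, matching the script excerpt quoted above.
def throughput(tx_count, exp_duration_s, per_minute=False):
    rate = tx_count / exp_duration_s  # tx/s
    return rate * 60.0 if per_minute else rate
```

With the numbers from this discussion, 4154 transactions over a 60 s window is roughly 69.2 tx/s, i.e. 4154 tx/min.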
```python
# Sort stats for each number of connections in order of increasing
# throughput rate, and then extract average latencies and throughput rates
# as separate data series.
conns = sorted(stats.keys())
```
Aren't we doing the same in line 52?
Yes, but returning the sorted connection numbers would be a micro optimization that may hamper readability. Right now we're sorting an array of 3 integers here.
I'm actually not sure why we need to sort these at all. We already have the data for throughput vs latency correlated, so what does doing this add?
@thanethomson's script is plotting with lines, unlike the latencies plot. So I imagine you need to sort so that the lines are properly drawn. In the case of the connections loop, I understand it's for the legend to show the proper order
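To make that concrete, here is a hedged sketch (not the script's actual code) of why line plots need ordered points: matplotlib connects points in the order they are given, so unsorted (throughput, latency) pairs would draw a zig-zag rather than a curve.

```python
# Sketch: order (throughput, latency) points before drawing them as a line.
def sort_series(points):
    """points: iterable of (throughput_rate, avg_latency) pairs."""
    ordered = sorted(points, key=lambda p: p[0])
    xs = [p[0] for p in ordered]
    ys = [p[1] for p in ordered]
    return xs, ys  # suitable for plt.plot(xs, ys)
```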
sergio-mena left a comment
I just reviewed @thanethomson's part.
I can't hit Approve as I would be approving my own commits, but I "Approve" Thane's part.
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
scripts/qa/reporting/README.md (Outdated)
```
# This will generate a plot in a PNG file called 'tm034.png' in the current
# directory based on the reporting tool CSV output in the "raw.csv" file. The
# default title of the plot is overridden below.
```
Not sure I understand what this line means, @thanethomson?
In the example below, is the -t flag used to override the default name of the plot?
The default title text at the top of the plot is Tendermint latency vs throughput (defined here).
If you want to override that and customize the title text (mostly a cosmetic thing), then use the -t flag.
"Title" is a specific term as defined by matplotlib, the plotting library I'm using: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html
Right, it's not clear to me which thing in the below text is overriding the title. I think that clarifying that the -t flag is doing that would be helpful.
It should hopefully be clearer now.
Also, one could just run `./latency_throughput.py --help` to get more information on script usage.
```python
# Compute average latency and throughput rate for each experiment
for exp_id, exp in experiments.items():
    conns = exp['connections']
    avg_latency = exp['total_latencies'] / exp['tx_count']
```
avg_latency is helpful for understanding how the majority of transactions fared when executed on the network. I think 95th percentile latency would also be useful to plot to understand how the least lucky transactions did in these scenarios.
I think percentiles are useful information too, but they are not as trivial to extract as the average.
Regarding that, I see little use for the plots presenting all latencies. I would replace them with a CDF of latencies, which provides very useful information about the latency distribution. To produce a CDF we need to order all latencies by their "value", and normalize the transaction count into the 1-100 range.
This is true. We have the full data set and it should be straightforward enough to find a python library to perform the computation. I'm fine leaving this as average, I think we've put in enough alteration to this report and it's helpful enough in its current form.
Yeah grouping the data into percentiles should be easy enough. We can look at doing that for the next round of QA perhaps. Speaking of which, we should probably put together a "laundry list" issue keeping track of all the ways in which we'd want to improve the next round of QA.
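For a future round, both suggestions are doable with the standard library alone. This is a sketch under the assumption that the raw latencies are available as a flat list; it uses the nearest-rank percentile method and emits the CDF as sorted (value, cumulative %) pairs:

```python
# Sketch: 95th-percentile latency (nearest-rank) and an empirical CDF,
# assuming 'latencies' is a flat list of per-transaction latencies.
def p95(latencies):
    s = sorted(latencies)
    rank = (95 * len(s) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return s[rank - 1]

def cdf_points(latencies):
    """Return (latency, cumulative percentage) pairs, ready to plot."""
    s = sorted(latencies)
    n = len(s)
    return [(v, 100.0 * (i + 1) / n) for i, v in enumerate(s)]
```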
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
williambanfield left a comment
What's presented here looks good. We definitely have room for improvement in future releases, but for a first go at generating this report, I think we have a great baseline.
It looks like @cason has a few comments left as well.
Yes, but none of them should prevent merging this PR.
Signed-off-by: Thane Thomson <connect@thanethomson.com>
* 1st version. 200 nodes. Missing rotating node
* Small fixes
* Addressed @jmalicevic's comment
* Explain in method how to set the tmint version to test. Improve result section
* 1st version of how to run the 'rotating node' testnet
* Apply suggestions from @williambanfield Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
* Addressed @williambanfield's comments
* Added reference to Unix load metric
* Added total TXs
* Fixed some 'png's that got swapped. Excluded '.*-node-exporter' processes from memory plots
* Report for rotating node
* Addressed remaining comments from @williambanfield
* Cosmetic
* Addressed some of @thanethomson's comments
* Re-executed the 200 node tests and updated the corresponding sections of the report
* Ignore Python virtualenv directories Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Add latency vs throughput script Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Add README for latency vs throughput script Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Fix local links to folders Signed-off-by: Thane Thomson <connect@thanethomson.com>
* v034: only have one level-1 heading Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Adjust headings Signed-off-by: Thane Thomson <connect@thanethomson.com>
* v0.37.x: add links to issues/PRs Signed-off-by: Thane Thomson <connect@thanethomson.com>
* v0.37.x: add note about bug being present in v0.34 Signed-off-by: Thane Thomson <connect@thanethomson.com>
* method: adjust heading depths Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Show data points on latency vs throughput plot Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Add latency vs throughput plots Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Correct mentioning of v0.34.21 and add heading Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Refactor latency vs throughput script: update the latency vs throughput script to rather generate plots from the "raw" CSV output from the loadtime reporting tool as opposed to the separated CSV files from the experimental method. Also update the relevant documentation, and regenerate the images from the raw CSV data (resulting in pretty much the same plots as the previous ones). Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Remove unused default duration const Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Adjust experiment start time to be more accurate and re-plot latency vs throughput Signed-off-by: Thane Thomson <connect@thanethomson.com>
* Addressed @williambanfield's comments
* Apply suggestions from code review Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
* scripts: Update latency vs throughput readme for clarity Signed-off-by: Thane Thomson <connect@thanethomson.com>

Signed-off-by: Thane Thomson <connect@thanethomson.com>
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: Thane Thomson <connect@thanethomson.com>
(cherry picked from commit b06e1ce)
(#9578) QA Process report for v0.37.x (and baseline for v0.34.x) (#9499)
* Remove v037 dir
* Removed reference to v0.37 testnets
Co-authored-by: Sergio Mena <sergio@informal.systems>
…x-rc1
* release/v0.37.0-rc1:
  QA Process report for v0.37.x (and baseline for v0.34.x) (tendermint#9499) (tendermint#9577)
  Fix TX payload for DO testnets (tendermint#9540) (tendermint#9542)
  blocksync: retry requests after timeout (backport tendermint#9518) (tendermint#9533)
  Extend the load report tool to include transactions' hashes (tendermint#9509) (tendermint#9513)
  build(deps): Bump styfle/cancel-workflow-action from 0.10.0 to 0.10.1 (tendermint#9502)
  build(deps): Bump actions/stale from 5 to 6 (tendermint#9494)
  loadtime: add block time to the data point (tendermint#9484) (tendermint#9489)
  config: Add missing storage section when generating config (tendermint#9483) (tendermint#9487)
  Sync Vote.Verify() in spec with implementation (tendermint#9466) (tendermint#9476)
  fix spec (tendermint#9467) (tendermint#9469)
  metrics: fix panic because of absent prometheus label (tendermint#9455) (tendermint#9474)
  Ensure Dockerfile stages use consistent Go version (backport tendermint#9462) (tendermint#9472)
  build(deps): Bump slackapi/slack-github-action from 1.21.0 to 1.22.0 (tendermint#9432)
  build(deps): Bump bufbuild/buf-setup-action from 1.7.0 to 1.8.0 (tendermint#9453)
  state: restore previous error message (tendermint#9435) (tendermint#9440)
  build(deps): Bump gonum.org/v1/gonum from 0.11.0 to 0.12.0 (tendermint#9411)
  docs: Update ADRs for v0.37 (tendermint#9399) (tendermint#9418)
  build(deps): Bump github.com/spf13/viper from 1.12.0 to 1.13.0 (tendermint#9410)
  build(deps): Bump github.com/lib/pq from 1.10.6 to 1.10.7 (tendermint#9409)
  feat: support HTTPS inside websocket (tendermint#9416) (tendermint#9422)
  Removed unused param (tendermint#9394)
  test: generate uuid on startup for load tool (tendermint#9383) (tendermint#9392)
  add redirect links (tendermint#9385) (tendermint#9389)
  refactor: migrate to cosmos/gogoproto (backport tendermint#9356) (tendermint#9381)
  cmd: print all versions of tendermint and its sub protocols (tendermint#9329) (tendermint#9386)
  Add missing changes changelog files (backport tendermint#9376) (tendermint#9382)
  add separated runs by UUID (backport tendermint#9367) (tendermint#9379)
  spec: abci++ cleanup for v0.37 (backport tendermint#9288) (tendermint#9374)
  ci: Remove "(WARNING: BETA SOFTWARE)" tagline from all upcoming releases (tendermint#9371) (tendermint#9372)
  Update rpc client header (tendermint#9276) (tendermint#9349)
  ci: Pre-release workflows (backport tendermint#9366) (tendermint#9368)
  test: add the loadtime report tool (tendermint#9351) (tendermint#9364)
  Update Tendermint version to v0.37.0 (tendermint#9354)
  test: add the loadtime tool (tendermint#9342) (tendermint#9357)

# Conflicts:
#   version/version.go
Closes #9123 and #9333
(rendered)
This PR provides the QA report for both v0.34.x (as a baseline) and v0.37.x.
PR checklist
- [ ] `CHANGELOG_PENDING.md` updated, or no changelog entry needed
- [ ] Updated relevant documentation (`docs/`) and code comments, or no documentation updates needed