
QA Process report for v0.37.x (and baseline for v0.34.x)#9499

Merged
sergio-mena merged 36 commits into main from sergio/qa-report
Oct 17, 2022
Conversation

@sergio-mena
Contributor

@sergio-mena sergio-mena commented Sep 29, 2022

Closes #9123 and #9333

(rendered)

This PR provides the QA report for both v0.34.x (as a baseline) and v0.37.x.

  • There is a method section that describes how the testnets were run, and how the relevant data was then extracted for this report
    • The method is highly manual at the moment, and the idea is to increase automation with successive iterations.
  • The results section uses a sequence of plots to show the relevant data requested in the description of Capture v0.34 baseline metrics #9123
  • For v0.37.x, it also compares those plots to the baseline

PR checklist

  • Tests written/updated, or no tests needed
  • CHANGELOG_PENDING.md updated, or no changelog entry needed
  • Updated relevant documentation (docs/) and code comments, or no
    documentation updates needed

@sergio-mena sergio-mena changed the title QA Process report for v0.37.x (and baseline for v0.34.x) [WIP ]QA Process report for v0.37.x (and baseline for v0.34.x) Sep 29, 2022
@sergio-mena sergio-mena changed the title [WIP ]QA Process report for v0.37.x (and baseline for v0.34.x) [WIP] QA Process report for v0.37.x (and baseline for v0.34.x) Sep 29, 2022
@tac0turtle
Contributor

this is amazing, super good insights!!

is there interest in testing:

  • variable mempool sizes?
  • different consensus configurations (timeouts)?

@thanethomson
Contributor

is there interest in testing:

  • variable mempool sizes?

  • different consensus configurations (timeouts)?

This isn't planned for this round of QA, because every additional set of parameters represents an additional dimension to the tests. The costs of testing grow exponentially with each additional dimension.

We could, however, potentially look into how we could incorporate variations on mempool size and consensus params into future rounds of QA or research.

@sergio-mena
Contributor Author

@marbar3778, further to @thanethomson's reply, when in doubt, we chose to set parameters to their defaults, so the reader can have an idea of what to expect if they don't tweak parameters.
The variations you propose make a lot of sense, but remember this is, at the moment, a highly manual activity (so, costly).

@tac0turtle
Contributor

100%, was curious. This work is amazing, thank you for it!!

@sergio-mena sergio-mena marked this pull request as ready for review October 11, 2022 13:59
@sergio-mena sergio-mena requested a review from a team October 11, 2022 13:59
@sergio-mena sergio-mena changed the title [WIP] QA Process report for v0.37.x (and baseline for v0.34.x) QA Process report for v0.37.x (and baseline for v0.34.x) Oct 11, 2022
@sergio-mena sergio-mena added S:backport-to-v0.34.x Tell mergify to backport the PR to v0.34.x S:backport-to-v0.37.x Tell mergify to backport the PR to v0.37.x labels Oct 11, 2022

We can observe there are some very high latencies, towards the end of the test.
Upon suspicion that they are duplicate transactions, we examined the latencies
raw file and discovered there are more than 100K duplicate transactions.
Contributor


Why were there so many duplicate transactions? I thought the scheme we were using with the load generation tool would ensure uniqueness among the transactions by way of the value component of the key/value pairs?
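For reference, a load generator typically guarantees uniqueness by embedding a fresh UUID or sequence number in each transaction's value. A minimal sketch of that scheme (function and key names hypothetical, not the actual loadtime tool):

```python
import uuid

def make_tx(key: str, seq: int) -> bytes:
    # Embed a fresh UUID plus a sequence number in the value so that no
    # two generated transactions can be byte-identical.
    value = f"{uuid.uuid4().hex}-{seq}"
    return f"{key}={value}".encode()

txs = [make_tx("load", i) for i in range(1000)]
assert len(set(txs)) == len(txs)  # every generated transaction is unique
```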

Contributor


The reason why I ask this, is that this could explain why we had so few valid transactions during the tests. This skews the throughput rate calculation.

Contributor Author

@sergio-mena sergio-mena Oct 11, 2022


The load generator produces unique transactions; the proof is that the v0.37 run doesn't contain any duplicate TXs. The impression I have is that somehow some transactions keep being gossiped around and accepted into blocks. I know there is a protection built into the mempool to prevent this, but I guess it is not working all the time.
My gut feeling is that there is a bug that was triggered in the v0.34 run, but not in the v0.37 run. My next step is to look into @jmalicevic's test to see if the reason for what she saw is also duplicate TXs. If that is the case, I will add some info to the issue she wrote; otherwise, I'll create another issue.

Contributor


Could it be that the same issue that affects the 200 node network at high load also affected this test? I.e. that the mempool cache is not large enough and so duplicate transactions end up being submitted.
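The cache hypothesis can be illustrated with a toy model (a simplified sketch, not Tendermint's actual mempool cache implementation): with a bounded FIFO cache, re-gossiped transactions that have already been evicted look brand new again.

```python
from collections import OrderedDict

class TxCache:
    """Toy bounded FIFO cache of seen transactions."""
    def __init__(self, size: int):
        self.size = size
        self.seen = OrderedDict()

    def push(self, tx: bytes) -> bool:
        # True means "new, accept"; False means "known duplicate, reject".
        if tx in self.seen:
            return False
        if len(self.seen) >= self.size:
            self.seen.popitem(last=False)  # evict the oldest entry
        self.seen[tx] = True
        return True

cache = TxCache(size=10)
txs = [f"tx{i}".encode() for i in range(100)]
for tx in txs:
    cache.push(tx)

# The cache only remembers the last 10 txs, so re-gossiping the earliest
# 50 gets all of them accepted again as if they were new.
readmitted = sum(cache.push(tx) for tx in txs[:50])  # readmitted == 50
```

This is why bumping the cache size (as done for the 200-node testnets) makes the duplicates disappear.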

Contributor Author

@sergio-mena sergio-mena Oct 17, 2022


I think so. I added an extra sentence to line 204



Update the latency vs throughput script to rather generate plots from
the "raw" CSV output from the loadtime reporting tool as opposed to the
separated CSV files from the experimental method.

Also update the relevant documentation, and regenerate the images from
the raw CSV data (resulting in pretty much the same plots as the
previous ones).

Signed-off-by: Thane Thomson <connect@thanethomson.com>
Signed-off-by: Thane Thomson <connect@thanethomson.com>
…vs throughput

Signed-off-by: Thane Thomson <connect@thanethomson.com>
The plot of latencies can be used as a baseline to compare with other releases.

## Prometheus Metrics on the Chosen Experiment
The following plot summarizes average latencies versus overall throughputs
Contributor Author


@thanethomson, do you agree with moving line 88 below your paragraph (and making it plural)? I believe the plot you have introduced should also be used in the comparison against the baseline.

* (non-critical, not fixed) [\#9537] - With the default mempool cache setting,
duplicated transactions are not rejected when gossiped and eventually flood
all mempools. The 200 node testnets were thus run with a value of 200000 (as
opposed to the default 10000).
Contributor Author


Thanks for tidying this up :-)

avg_latency = exp['total_latencies'] / exp['tx_count']
exp_start_time = exp['block_time_min'] - exp['block_time_min_duration']
exp_duration = exp['block_time_max'] - exp_start_time
throughput_rate = exp['tx_count'] / exp_duration
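Filled out with hypothetical sample values (field names follow the hunk above, numbers are made up), the computation in this hunk is:

```python
# Hypothetical experiment record mirroring the script's fields.
exp = {
    'total_latencies': 500.0,        # seconds, summed over all txs
    'tx_count': 1000,
    'block_time_min': 120.0,         # timestamp of the first block (s)
    'block_time_min_duration': 5.0,  # time taken to produce that block (s)
    'block_time_max': 180.0,         # timestamp of the last block (s)
}

avg_latency = exp['total_latencies'] / exp['tx_count']                   # 0.5 s
exp_start_time = exp['block_time_min'] - exp['block_time_min_duration']  # 115.0
exp_duration = exp['block_time_max'] - exp_start_time                    # 65.0 s
throughput_rate = exp['tx_count'] / exp_duration                         # ~15.4 tx/s
```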
Contributor Author


The rest of the text uses "transactions per minute". Shall we change this to tx/min, or the rest of the text to txs/s?

Contributor


What's the usual convention when measuring throughput for such systems? I thought it was always txs/s? (cc @cason)


@cason cason Oct 17, 2022


There is really no convention here, although in our papers, and in most papers I am aware of, we use transactions per second, with base-10 multipliers if the numbers get too high (which is not the case here). If the convention is to adopt the international SI units, we should use seconds.

I saw that the text uses blocks per minute, which makes more sense since we don't produce more than a block, actually not even one block, per second. The text uses transactions per minute (e.g., 4154) to keep the correspondence with the blocks (e.g., 45), but I think we have enough transactions per second (e.g., 69.2) for that metric to make sense.

In summary, it is always about readability: we can reason better with 45 block/min than with 0.75 block/s, and with 69.2 tx/s than with 4154 tx/min. In particular, I think we reason better about 4.154 Ktx/min than about 4154 tx/min. The problem is that it is not natural or straightforward to divide by 60 to see the correspondence between the two metrics...
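Using the figures quoted above, the conversion between the two metrics is a single division by 60:

```python
tx_per_min = 4154
blocks_per_min = 45

tx_per_sec = tx_per_min / 60          # ~69.2 tx/s
blocks_per_sec = blocks_per_min / 60  # 0.75 block/s
```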

Contributor


Yes, the timescales here do make more intuitive sense when thinking in minutes as opposed to seconds. Should I regenerate the latency vs throughput graphs using txs/min instead?

# Sort stats for each number of connections in order of increasing
# throughput rate, and then extract average latencies and throughput rates
# as separate data series.
conns = sorted(stats.keys())
Contributor Author


Aren't we doing the same in line 52?

Contributor


Yes, but returning the sorted connection numbers would be a micro optimization that may hamper readability. Right now we're sorting an array of 3 integers here.

Contributor


I'm actually not sure why we need to sort these at all. We already have the data for throughput vs latency correlated, so what does doing this add?

Contributor Author


@thanethomson's script plots with lines, unlike the latencies plot, so I imagine you need to sort so that the lines are drawn properly. In the case of the connections loop, I understand it's so the legend shows entries in the proper order.
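The sort-before-plot rationale can be shown without matplotlib: a line plot connects consecutive points, so the (throughput, latency) series needs to be in increasing-throughput order before plotting (data values below are hypothetical):

```python
# Hypothetical per-experiment results for one connection count:
# (throughput_rate, avg_latency) pairs in arbitrary completion order.
points = [(120.0, 1.9), (40.0, 0.6), (80.0, 1.1)]

# Sorting by throughput makes the plotted line sweep left to right
# instead of zig-zagging back on itself.
points.sort(key=lambda p: p[0])
throughputs = [p[0] for p in points]  # [40.0, 80.0, 120.0]
latencies = [p[1] for p in points]    # [0.6, 1.1, 1.9]
```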

Contributor Author

@sergio-mena sergio-mena left a comment


I just reviewed @thanethomson's part.
I can't hit Approve as I would be approving my own commits, but I "Approve" Thane's part.

sergio-mena and others added 2 commits October 17, 2022 12:15
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
#
# This will generate a plot in a PNG file called 'tm034.png' in the current
# directory based on the reporting tool CSV output in the "raw.csv" file. The
# default title of the plot is overridden below.
Contributor


Not sure I understand what this line means, @thanethomson?

In the example below, is the -t flag used to override the default name of the plot?

Contributor


The default title text at the top of the plot is Tendermint latency vs throughput (defined here).

If you want to override that and customize the title text (mostly a cosmetic thing), then use the -t flag.

Contributor


"Title" is a specific term as defined by matplotlib, the plotting library I'm using: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html

Contributor


Right, it's not clear to me which thing in the text below is overriding the title. I think clarifying that the -t flag is doing that would be helpful.

Contributor


It should hopefully be clearer now.

Contributor

@thanethomson thanethomson Oct 17, 2022


Also, one could just run ./latency_throughput.py --help to get more information on script usage.

# Compute average latency and throughput rate for each experiment
for exp_id, exp in experiments.items():
conns = exp['connections']
avg_latency = exp['total_latencies'] / exp['tx_count']
Contributor


avg_latency is helpful for understanding how the majority of transactions fared when executed on the network. I think 95th percentile latency would also be useful to plot to understand how the least lucky transactions did in these scenarios.


I think percentiles are useful information too, but they are not as trivial to extract as the average.

Relatedly, I see very little use for the plots presenting all latencies. I would replace them with a CDF of latencies, which provides very useful information about the latency distribution. To produce a CDF we need to order all latencies by value and normalize the transaction count into the 1-100 range.

Contributor


This is true. We have the full data set, and it should be straightforward enough to find a Python library to perform the computation. I'm fine leaving this as the average; I think we've made enough alterations to this report, and it's helpful enough in its current form.

Contributor


Yeah grouping the data into percentiles should be easy enough. We can look at doing that for the next round of QA perhaps. Speaking of which, we should probably put together a "laundry list" issue keeping track of all the ways in which we'd want to improve the next round of QA.

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Contributor

@williambanfield williambanfield left a comment


What's presented here looks good. We definitely have room for improvement in future releases, but for a first go at generating this report, I think we have a great baseline.

It looks like @cason has a few comments left as well.

@cason

cason commented Oct 17, 2022

It looks like @cason has a few comments left as well.

Yes, but none of them should prevent merging this PR.

@sergio-mena sergio-mena merged commit b06e1ce into main Oct 17, 2022
@sergio-mena sergio-mena deleted the sergio/qa-report branch October 17, 2022 20:08
mergify bot pushed a commit that referenced this pull request Oct 17, 2022
* 1st version. 200 nodes. Missing rotating node

* Small fixes

* Addressed @jmalicevic's comment

* Explain in method how to set the tmint version to test. Improve result section

* 1st version of how to run the 'rotating node' testnet

* Apply suggestions from @williambanfield

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

* Addressed @williambanfield's comments

* Added reference to Unix load metric

* Added total TXs

* Fixed some 'png's that got swapped. Excluded '.*-node-exporter' processes from memory plots

* Report for rotating node

* Adressed remaining comments from @williambanfield

* Cosmetic

* Addressed some of @thanethomson's comments

* Re-executed the 200 node tests and updated the corresponding sections of the report

* Ignore Python virtualenv directories

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Add latency vs throughput script

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Add README for latency vs throughput script

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix local links to folders

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* v034: only have one level-1 heading

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Adjust headings

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* v0.37.x: add links to issues/PRs

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* v0.37.x: add note about bug being present in v0.34

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* method: adjust heading depths

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Show data points on latency vs throughput plot

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Add latency vs throughput plots

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Correct mentioning of v0.34.21 and add heading

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Refactor latency vs throughput script

Update the latency vs throughput script to rather generate plots from
the "raw" CSV output from the loadtime reporting tool as opposed to the
separated CSV files from the experimental method.

Also update the relevant documentation, and regenerate the images from
the raw CSV data (resulting in pretty much the same plots as the
previous ones).

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove unused default duration const

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Adjust experiment start time to be more accurate and re-plot latency vs throughput

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Addressed @williambanfield's comments

* Apply suggestions from code review

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

* scripts: Update latency vs throughput readme for clarity

Signed-off-by: Thane Thomson <connect@thanethomson.com>

Signed-off-by: Thane Thomson <connect@thanethomson.com>
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: Thane Thomson <connect@thanethomson.com>
(cherry picked from commit b06e1ce)
mergify bot pushed a commit that referenced this pull request Oct 17, 2022
(cherry picked from commit b06e1ce)
sergio-mena added a commit that referenced this pull request Oct 17, 2022
(cherry picked from commit b06e1ce)

Co-authored-by: Sergio Mena <sergio@informal.systems>
thanethomson pushed a commit that referenced this pull request Oct 18, 2022
QA Process report for v0.37.x (and baseline for v0.34.x) (#9499) (#9578)


* Remove v037 dir

* Removed reference to v0.37 testnets

Co-authored-by: Sergio Mena <sergio@informal.systems>
james-chf added a commit to heliaxdev/tendermint that referenced this pull request Nov 25, 2022
…x-rc1

* release/v0.37.0-rc1:
  QA Process report for v0.37.x (and baseline for v0.34.x) (tendermint#9499) (tendermint#9577)
  Fix TX payload for DO testnets (tendermint#9540) (tendermint#9542)
  blocksync: retry requests after timeout (backport tendermint#9518) (tendermint#9533)
  Extend the load report tool to include transactions' hashes (tendermint#9509) (tendermint#9513)
  build(deps): Bump styfle/cancel-workflow-action from 0.10.0 to 0.10.1 (tendermint#9502)
  build(deps): Bump actions/stale from 5 to 6 (tendermint#9494)
  loadtime: add block time to the data point (tendermint#9484) (tendermint#9489)
  config: Add missing storage section when generating config (tendermint#9483) (tendermint#9487)
  Sync Vote.Verify() in spec with implementation (tendermint#9466) (tendermint#9476)
  fix spec (tendermint#9467) (tendermint#9469)
  metrics: fix panic because of absent prometheus label (tendermint#9455) (tendermint#9474)
  Ensure Dockerfile stages use consistent Go version (backport tendermint#9462) (tendermint#9472)
  build(deps): Bump slackapi/slack-github-action from 1.21.0 to 1.22.0 (tendermint#9432)
  build(deps): Bump bufbuild/buf-setup-action from 1.7.0 to 1.8.0 (tendermint#9453)
  state: restore previous error message (tendermint#9435) (tendermint#9440)
  build(deps): Bump gonum.org/v1/gonum from 0.11.0 to 0.12.0 (tendermint#9411)
  docs: Update ADRs for v0.37 (tendermint#9399) (tendermint#9418)
  build(deps): Bump github.com/spf13/viper from 1.12.0 to 1.13.0 (tendermint#9410)
  build(deps): Bump github.com/lib/pq from 1.10.6 to 1.10.7 (tendermint#9409)
  feat: support HTTPS inside websocket (tendermint#9416) (tendermint#9422)
  Removed unused param (tendermint#9394)
  test: generate uuid on startup for load tool (tendermint#9383) (tendermint#9392)
  add redirect links (tendermint#9385) (tendermint#9389)
  refactor: migrate to cosmos/gogoproto (backport tendermint#9356) (tendermint#9381)
  cmd: print all versions of tendermint and its sub protocols  (tendermint#9329) (tendermint#9386)
  Add missing changes changelog files (backport tendermint#9376) (tendermint#9382)
  add separated runs by UUID (backport tendermint#9367) (tendermint#9379)
  spec: abci++ cleanup for v0.37 (backport tendermint#9288) (tendermint#9374)
  ci: Remove "(WARNING: BETA SOFTWARE)" tagline from all upcoming releases (tendermint#9371) (tendermint#9372)
  Update rpc client header (tendermint#9276) (tendermint#9349)
  ci: Pre-release workflows (backport tendermint#9366) (tendermint#9368)
  test: add the loadtime report tool (tendermint#9351) (tendermint#9364)
  Update Tendermint version to v0.37.0 (tendermint#9354)
  test: add the loadtime tool (tendermint#9342) (tendermint#9357)

# Conflicts:
#	version/version.go

Labels

S:backport-to-v0.34.x Tell mergify to backport the PR to v0.34.x S:backport-to-v0.37.x Tell mergify to backport the PR to v0.37.x

Projects

Status: Done/Merged

Development

Successfully merging this pull request may close these issues.

Capture v0.34 baseline metrics

6 participants