test: add the loadtime report tool by williambanfield · Pull Request #9351 · tendermint/tendermint

williambanfield · 2022-09-01T21:59:56Z

This pull request adds the report tool and modifies the loadtime libraries to better support its use.
It moves the previously named loadtime tool to be called just load, built under test/loadtime/cmd/load/. The new report tool is built from test/loadtime/cmd/report. These commands cannot easily be made subcommands in the same binary because load makes use of the tm-load-test framework, which provides flags and subcommands of its own. Making both subcommands of the same binary would require surgery to that code base and likely is not worth it.

The report tool works by opening a handle to the Tendermint blockstore and reading all of the transactions out of the block. It calculates the latency of each transaction by differencing the timestamp listed in the transaction Payload and the timestamp of the block. It uses these latencies to calculate a set of metrics including the minimum, maximum, average and standard deviation value.

The tool currently treats all of the transactions it finds as being part of the same experimental run, but minor additional work is needed to separate transactions by experimental variables (Connections / Size / Rate as listed in the Payload data). See: #9352

Sample Output

./build/report --database-type goleveldb --data-dir ~/.tendermint/data
Total Valid Tx: 9000
Total Invalid Tx: 0
Minimum Latency: 351.103223ms
Maximum Latency: 1.040886707s
Average Latency: 703.895594ms
Standard Deviation: 214.400818ms

closes: #9331

PR checklist

Tests written/updated, or no tests needed
CHANGELOG_PENDING.md updated, or no changelog entry needed
Updated relevant documentation (docs/) and code comments, or no
documentation updates needed

test/loadtime/report/report.go

test/loadtime/README.md

test/loadtime/cmd/report/main.go

sergio-mena · 2022-09-02T08:34:14Z

test/loadtime/report/report.go

+					pdc <- payloadData{err: err}
+					continue
+				}
+				l := b.bt.Sub(p.Time.AsTime())


We agreed that, for this scheme to work well, we'll need to configure NTP on nodes involved.
In order to (try to) catch the fact that clocks are not synchronized, could you add some (rough) sanity checks here? For instance:

the resulting duration should be positive

the resulting duration should be less than, say, 120 seconds

These could also be done by inspecting the results provided by the report (say, the script that calls report). Not sure what's best here.

the resulting duration should be positive

I think it may make sense to just report a count of negative durations. I doubt we want to error in those cases since the data is still perhaps interesting. Does this seem reasonable?

Thinking a bit more about this, probably a single flag in the report (set to true if any delta is negative or ridiculously big) should suffice.
If we have evidence the the clocks aren't synchronized (because we forgot, or NTP daemon crashed, or was accidentally stopped in one of the machines), the simplest course of action to me would be to repeat the test from scratch with a good NTP setup, rather than trying to extract info from a report with skewed measurements.

In short, a count of negative (or very big) durations would be good, but we can also have just a general flag ("clocks likely not in sync").

ridiculously big

We have the Min and Max so users can determine this on their own.

but we can also have just a general flag ("clocks likely not in sync").

I'd prefer to not try to be much smarter than we need to be at the possible expense of clarity. Adding such a flag is doing the work of interpreting the data for the user which I don't think this tool should be doing. IMO, it should report the data in a manner that helps the user draw conclusions about what happened during the experimental run. It should be clear enough from the outputs given (large number of negative durations, or huge values for Max and Min) that the clocks are not in sync.

sergio-mena

Cool job!

Co-authored-by: Sergio Mena <sergio@informal.systems>

This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080)

This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080) Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

* test: add the loadtime report tool (#9351) This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080) * add nolint Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> Co-authored-by: William Banfield <wbanfield@gmail.com>

…x-rc1 * release/v0.37.0-rc1: QA Process report for v0.37.x (and baseline for v0.34.x) (tendermint#9499) (tendermint#9577) Fix TX payload for DO testnets (tendermint#9540) (tendermint#9542) blocksync: retry requests after timeout (backport tendermint#9518) (tendermint#9533) Extend the load report tool to include transactions' hashes (tendermint#9509) (tendermint#9513) build(deps): Bump styfle/cancel-workflow-action from 0.10.0 to 0.10.1 (tendermint#9502) build(deps): Bump actions/stale from 5 to 6 (tendermint#9494) loadtime: add block time to the data point (tendermint#9484) (tendermint#9489) config: Add missing storage section when generating config (tendermint#9483) (tendermint#9487) Sync Vote.Verify() in spec with implementation (tendermint#9466) (tendermint#9476) fix spec (tendermint#9467) (tendermint#9469) metrics: fix panic because of absent prometheus label (tendermint#9455) (tendermint#9474) Ensure Dockerfile stages use consistent Go version (backport tendermint#9462) (tendermint#9472) build(deps): Bump slackapi/slack-github-action from 1.21.0 to 1.22.0 (tendermint#9432) build(deps): Bump bufbuild/buf-setup-action from 1.7.0 to 1.8.0 (tendermint#9453) state: restore previous error message (tendermint#9435) (tendermint#9440) build(deps): Bump gonum.org/v1/gonum from 0.11.0 to 0.12.0 (tendermint#9411) docs: Update ADRs for v0.37 (tendermint#9399) (tendermint#9418) build(deps): Bump github.com/spf13/viper from 1.12.0 to 1.13.0 (tendermint#9410) build(deps): Bump github.com/lib/pq from 1.10.6 to 1.10.7 (tendermint#9409) feat: support HTTPS inside websocket (tendermint#9416) (tendermint#9422) Removed unused param (tendermint#9394) test: generate uuid on startup for load tool (tendermint#9383) (tendermint#9392) add redirect links (tendermint#9385) (tendermint#9389) refactor: migrate to cosmos/gogoproto (backport tendermint#9356) (tendermint#9381) cmd: print all versions of tendermint and its sub protocols (tendermint#9329) (tendermint#9386) Add missing changes changelog files (backport tendermint#9376) (tendermint#9382) add separated runs by UUID (backport tendermint#9367) (tendermint#9379) spec: abci++ cleanup for v0.37 (backport tendermint#9288) (tendermint#9374) ci: Remove "(WARNING: BETA SOFTWARE)" tagline from all upcoming releases (tendermint#9371) (tendermint#9372) Update rpc client header (tendermint#9276) (tendermint#9349) ci: Pre-release workflows (backport tendermint#9366) (tendermint#9368) test: add the loadtime report tool (tendermint#9351) (tendermint#9364) Update Tendermint version to v0.37.0 (tendermint#9354) test: add the loadtime tool (tendermint#9342) (tendermint#9357) # Conflicts: # version/version.go

williambanfield added 9 commits September 1, 2022 17:49

working basic report generator

7775dfa

begin separating commands

9df9615

add min, max, and error count

d6b8a8c

add basic report main

8960a19

use worker pool for parallel parsing

729d3e1

initial working version of all

f02d22c

support csv

006e14e

update readme

e768dee

fix test

9ad3132

williambanfield marked this pull request as ready for review September 1, 2022 22:02

williambanfield requested a review from ebuchman as a code owner September 1, 2022 22:02

williambanfield requested a review from a team September 1, 2022 22:02

williambanfield added 2 commits September 1, 2022 18:03

spell fix

7351714

fix test cases to exercise code correctly

0ba6562

williambanfield commented Sep 1, 2022

View reviewed changes

test/loadtime/report/report.go Show resolved Hide resolved

sergio-mena reviewed Sep 2, 2022

View reviewed changes

test/loadtime/README.md Outdated Show resolved Hide resolved

sergio-mena reviewed Sep 2, 2022

View reviewed changes

test/loadtime/cmd/report/main.go Outdated Show resolved Hide resolved

sergio-mena reviewed Sep 2, 2022

View reviewed changes

test/loadtime/cmd/report/main.go Show resolved Hide resolved

sergio-mena reviewed Sep 2, 2022

View reviewed changes

sergio-mena approved these changes Sep 2, 2022

View reviewed changes

williambanfield and others added 5 commits September 2, 2022 09:56

Update test/loadtime/README.md

63181ba

Co-authored-by: Sergio Mena <sergio@informal.systems>

fix latency calculation to take off-by-one into account

b490437

calculate the number of negative durations

114e521

remove data dir default

8b57f6b

comment regarding parallel ordering

bf1a8e1

williambanfield added S:backport-to-v0.34.x Tell mergify to backport the PR to v0.34.x S:backport-to-v0.37.x Tell mergify to backport the PR to v0.37.x labels Sep 2, 2022

Merge branch 'main' into wb/issue-9331

7aef399

williambanfield merged commit 8655080 into main Sep 2, 2022

williambanfield deleted the wb/issue-9331 branch September 2, 2022 17:32

mergify bot pushed a commit that referenced this pull request Sep 2, 2022

test: add the loadtime report tool (#9351)

7983035

This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080)

This was referenced Sep 2, 2022

test: add the loadtime report tool (backport #9351) #9364

Merged

test: add the loadtime report tool (backport #9351) #9365

Merged

mergify bot pushed a commit that referenced this pull request Sep 2, 2022

test: add the loadtime report tool (#9351)

2f7e541

This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080)

hvanz mentioned this pull request Apr 17, 2024

CometBFT E2E Latency GeoDecConsensus/geodec#13

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: add the loadtime report tool#9351

test: add the loadtime report tool#9351
williambanfield merged 17 commits intomainfrom
wb/issue-9331

williambanfield commented Sep 1, 2022 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sergio-mena Sep 2, 2022 •

edited

Loading

Uh oh!

williambanfield Sep 2, 2022

Uh oh!

sergio-mena Sep 2, 2022

Uh oh!

williambanfield Sep 2, 2022

Uh oh!

sergio-mena left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

williambanfield commented Sep 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR checklist

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sergio-mena Sep 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

williambanfield Sep 2, 2022

Choose a reason for hiding this comment

Uh oh!

sergio-mena Sep 2, 2022

Choose a reason for hiding this comment

Uh oh!

williambanfield Sep 2, 2022

Choose a reason for hiding this comment

Uh oh!

sergio-mena left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

williambanfield commented Sep 1, 2022 •

edited

Loading

sergio-mena Sep 2, 2022 •

edited

Loading