test: add the loadtime report tool#9351
Conversation
| pdc <- payloadData{err: err} | ||
| continue | ||
| } | ||
| l := b.bt.Sub(p.Time.AsTime()) |
There was a problem hiding this comment.
We agreed that, for this scheme to work well, we'll need to configure NTP on nodes involved.
In order to (try to) catch the fact that clocks are not synchronized, could you add some (rough) sanity checks here? For instance:
- the resulting duration should be positive
- the resulting duration should be less than, say, 120 seconds
These could also be done by inspecting the results provided by the report (say, the script that calls report). Not sure what's best here.
There was a problem hiding this comment.
the resulting duration should be positive
I think it may make sense to just report a count of negative durations. I doubt we want to error in those cases since the data is still perhaps interesting. Does this seem reasonable?
There was a problem hiding this comment.
Thinking a bit more about this, probably a single flag in the report (set to true if any delta is negative or ridiculously big) should suffice.
If we have evidence the the clocks aren't synchronized (because we forgot, or NTP daemon crashed, or was accidentally stopped in one of the machines), the simplest course of action to me would be to repeat the test from scratch with a good NTP setup, rather than trying to extract info from a report with skewed measurements.
In short, a count of negative (or very big) durations would be good, but we can also have just a general flag ("clocks likely not in sync").
There was a problem hiding this comment.
ridiculously big
We have the Min and Max so users can determine this on their own.
but we can also have just a general flag ("clocks likely not in sync").
I'd prefer to not try to be much smarter than we need to be at the possible expense of clarity. Adding such a flag is doing the work of interpreting the data for the user which I don't think this tool should be doing. IMO, it should report the data in a manner that helps the user draw conclusions about what happened during the experimental run. It should be clear enough from the outputs given (large number of negative durations, or huge values for Max and Min) that the clocks are not in sync.
Co-authored-by: Sergio Mena <sergio@informal.systems>
This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080)
This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080)
* test: add the loadtime report tool (#9351) This pull request adds the report tool and modifies the loadtime libraries to better support its use. (cherry picked from commit 8655080) * add nolint Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> Co-authored-by: William Banfield <wbanfield@gmail.com>
…x-rc1 * release/v0.37.0-rc1: QA Process report for v0.37.x (and baseline for v0.34.x) (tendermint#9499) (tendermint#9577) Fix TX payload for DO testnets (tendermint#9540) (tendermint#9542) blocksync: retry requests after timeout (backport tendermint#9518) (tendermint#9533) Extend the load report tool to include transactions' hashes (tendermint#9509) (tendermint#9513) build(deps): Bump styfle/cancel-workflow-action from 0.10.0 to 0.10.1 (tendermint#9502) build(deps): Bump actions/stale from 5 to 6 (tendermint#9494) loadtime: add block time to the data point (tendermint#9484) (tendermint#9489) config: Add missing storage section when generating config (tendermint#9483) (tendermint#9487) Sync Vote.Verify() in spec with implementation (tendermint#9466) (tendermint#9476) fix spec (tendermint#9467) (tendermint#9469) metrics: fix panic because of absent prometheus label (tendermint#9455) (tendermint#9474) Ensure Dockerfile stages use consistent Go version (backport tendermint#9462) (tendermint#9472) build(deps): Bump slackapi/slack-github-action from 1.21.0 to 1.22.0 (tendermint#9432) build(deps): Bump bufbuild/buf-setup-action from 1.7.0 to 1.8.0 (tendermint#9453) state: restore previous error message (tendermint#9435) (tendermint#9440) build(deps): Bump gonum.org/v1/gonum from 0.11.0 to 0.12.0 (tendermint#9411) docs: Update ADRs for v0.37 (tendermint#9399) (tendermint#9418) build(deps): Bump github.com/spf13/viper from 1.12.0 to 1.13.0 (tendermint#9410) build(deps): Bump github.com/lib/pq from 1.10.6 to 1.10.7 (tendermint#9409) feat: support HTTPS inside websocket (tendermint#9416) (tendermint#9422) Removed unused param (tendermint#9394) test: generate uuid on startup for load tool (tendermint#9383) (tendermint#9392) add redirect links (tendermint#9385) (tendermint#9389) refactor: migrate to cosmos/gogoproto (backport tendermint#9356) (tendermint#9381) cmd: print all versions of tendermint and its sub protocols (tendermint#9329) (tendermint#9386) Add missing changes changelog files (backport tendermint#9376) (tendermint#9382) add separated runs by UUID (backport tendermint#9367) (tendermint#9379) spec: abci++ cleanup for v0.37 (backport tendermint#9288) (tendermint#9374) ci: Remove "(WARNING: BETA SOFTWARE)" tagline from all upcoming releases (tendermint#9371) (tendermint#9372) Update rpc client header (tendermint#9276) (tendermint#9349) ci: Pre-release workflows (backport tendermint#9366) (tendermint#9368) test: add the loadtime report tool (tendermint#9351) (tendermint#9364) Update Tendermint version to v0.37.0 (tendermint#9354) test: add the loadtime tool (tendermint#9342) (tendermint#9357) # Conflicts: # version/version.go
This pull request adds the
reporttool and modifies theloadtimelibraries to better support its use.It moves the previously named
loadtimetool to be called justload, built undertest/loadtime/cmd/load/. The new report tool is built fromtest/loadtime/cmd/report. These commands cannot easily be made subcommands in the same binary becauseloadmakes use of thetm-load-testframework, which provides flags and subcommands of its own. Making both subcommands of the same binary would require surgery to that code base and likely is not worth it.The
reporttool works by opening a handle to the Tendermint blockstore and reading all of the transactions out of the block. It calculates the latency of each transaction by differencing the timestamp listed in the transactionPayloadand the timestamp of the block. It uses these latencies to calculate a set of metrics including the minimum, maximum, average and standard deviation value.The tool currently treats all of the transactions it finds as being part of the same experimental run, but minor additional work is needed to separate transactions by experimental variables (Connections / Size / Rate as listed in the
Payloaddata). See: #9352Sample Output
closes: #9331
PR checklist
CHANGELOG_PENDING.mdupdated, or no changelog entry neededdocs/) and code comments, or nodocumentation updates needed