Skip to content

e2e: add defer to ensure waitgroup is finished.#9822

Merged
williambanfield merged 3 commits intomainfrom
wb/defer-in-e2e-load
Dec 2, 2022
Merged

e2e: add defer to ensure waitgroup is finished.#9822
williambanfield merged 3 commits intomainfrom
wb/defer-in-e2e-load

Conversation

@williambanfield
Copy link
Contributor

@williambanfield williambanfield commented Dec 2, 2022

Aimed at fixing: https://github.com/tendermint/tendermint/actions/runs/3598326181/jobs/6060978982

The issue is that if the context in the createBatch function is canceled before all of the transactions are generated, we will hit the return in the ctx.Done() case of the select and the wg.Done() will never be called, causing the createBatch function to never return and therefore never allow more transactions to be generated.

To test this, I passed a context with very short timeout into the createBatch function and observed that, after the initial set of transactions fail because the connection is not yet established to the node, the ctx.Done() case is hit and no more transactions are ever created, leading to the same failure we see in the e2e nightly.

A sample of output from this experiment is below:

I[2022-12-02|09:43:02.357] Removing Docker containers and networks
I[2022-12-02|09:43:04.122] cleanup dir                                  msg="Removing testnet directory \"networks/nightly/gen-0000\""
I[2022-12-02|09:43:06.128] setup                                        msg="Generating testnet files in \"networks/nightly/gen-0000\""
I[2022-12-02|09:43:06.137] Starting initial network nodes...
I[2022-12-02|09:43:06.137] load                                         msg="Starting transaction load (16 workers)..."
broadcast fail
broadcast fail
...
broadcast fail
ctx done
broadcast fail
ctx done
ctx done
... 
ctx done

I[2022-12-02|09:43:08.785] start                                        msg="Node validator01 up on http://127.0.0.1:5701"
I[2022-12-02|09:43:08.785] Waiting for initial height                   height=1 nodes=1 pending=0
I[2022-12-02|09:43:10.788] wait until                                   msg="Waiting for all nodes to reach height 6..."
I[2022-12-02|09:43:15.014] Injecting evidence through validator01 (amount: 1)...
I[2022-12-02|09:43:18.036] Finished sending evidence (height 8)
I[2022-12-02|09:43:18.039] wait until                                   msg="Waiting for all nodes to reach height 14..."
E[2022-12-02|09:43:23.173] Transaction load failed: failed to submit any transactions
E[2022-12-02|09:43:23.173] failed to submit any transactions

This pull request also contains a fix to the number of transactions generated per batch. The purpose of the LoadTxBatchSize parameter was to generate a total of LoadTxBatchSize tx each second. However, the code as written generates workerPoolSize * LoadTxBatchSize transactions since each worker creates LoadTxBatchSize. This pull request places another loop outside the worker pool that instructs the worker pools to generate new transactions for a total of LoadTxBatchSize.

A full nightly run of the e2e test has been run with this code in place: https://github.com/tendermint/tendermint/actions/runs/3602704988

PR checklist

  • Tests written/updated, or no tests needed
  • CHANGELOG_PENDING.md updated, or no changelog entry needed
  • Updated relevant documentation (docs/) and code comments, or no
    documentation updates needed

@williambanfield williambanfield changed the title defer wait group completion e2e: add defer to ensure waitgroup is finished. Dec 2, 2022
@williambanfield williambanfield marked this pull request as ready for review December 2, 2022 15:04
@williambanfield williambanfield requested a review from a team December 2, 2022 15:04
Copy link
Contributor

@thanethomson thanethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be backported too right?

@williambanfield williambanfield added S:backport-to-v0.34.x Tell mergify to backport the PR to v0.34.x S:backport-to-v0.37.x Tell mergify to backport the PR to v0.37.x labels Dec 2, 2022
@williambanfield
Copy link
Contributor Author

@thanethomson Yup. I've been in the habit of adding the tags once everything looks right, but I'll just add those beforehand to make it clearer.

@williambanfield williambanfield merged commit 654e565 into main Dec 2, 2022
@williambanfield williambanfield deleted the wb/defer-in-e2e-load branch December 2, 2022 18:19
mergify bot pushed a commit that referenced this pull request Dec 2, 2022
* defer wait group completion

* generate correct number of tx

(cherry picked from commit 654e565)
mergify bot pushed a commit that referenced this pull request Dec 2, 2022
* defer wait group completion

* generate correct number of tx

(cherry picked from commit 654e565)
williambanfield added a commit that referenced this pull request Dec 2, 2022
* defer wait group completion

* generate correct number of tx

(cherry picked from commit 654e565)

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
williambanfield added a commit that referenced this pull request Dec 2, 2022
* defer wait group completion

* generate correct number of tx

(cherry picked from commit 654e565)

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
@moul moul mentioned this pull request Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S:backport-to-v0.34.x Tell mergify to backport the PR to v0.34.x S:backport-to-v0.37.x Tell mergify to backport the PR to v0.37.x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants