Improve wait for node in e2e runner for long catch-up times#375
Merged
sergio-mena merged 16 commits intomainfrom Mar 1, 2023
Merged
Improve wait for node in e2e runner for long catch-up times#375sergio-mena merged 16 commits intomainfrom
sergio-mena merged 16 commits intomainfrom
Conversation
Closed
8 tasks
lasarojc
reviewed
Feb 23, 2023
test/e2e/networks/long.toml
Outdated
| [node] | ||
| [node.full01] | ||
| mode = "full" | ||
| version = "ghcr.io/informalsystems/tendermint-e2e-node:v0.34.x" |
Contributor
There was a problem hiding this comment.
How can we make this general for all branches?
Maybe the update should be from cometbft/e2e-node:latest to cometbft/e2e-node:local?
Contributor
There was a problem hiding this comment.
Would this not actually break if merging into main? We'd be running a v0.34-based node alongside v0.38-based ones, right?
Collaborator
Author
There was a problem hiding this comment.
Hi, sorry for delay coming back to this. I looked at @lasarojc's commits below (thanks for them!) and I think they address this concern.
I have also removed backport-to-v0.34x as that will need to be a special case.
Are we still missing something here?
jmalicevic
reviewed
Feb 24, 2023
lasarojc
approved these changes
Feb 24, 2023
lasarojc
reviewed
Mar 1, 2023
3 tasks
…thup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`.
mergify bot
pushed a commit
that referenced
this pull request
Mar 1, 2023
* e2e: Flexible timeout while waiting for a node * QA: added big e2e network manifest * Don't build latest locally, but rely on what has been released * Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`. * CatchingUp shouldn't be checked when waiting for height 0 * Adjust retention to support evidence gathering * fix typo * More adjestments for long-running e2e manifests * Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`. --------- Co-authored-by: lasarojc <lasaro@informal.systems> (cherry picked from commit 167a845) # Conflicts: # state/txindex/indexer_service.go
sergio-mena
added a commit
that referenced
this pull request
Mar 1, 2023
sergio-mena
added a commit
that referenced
this pull request
Mar 1, 2023
* e2e: Flexible timeout while waiting for a node * QA: added big e2e network manifest * Don't build latest locally, but rely on what has been released * Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`. * CatchingUp shouldn't be checked when waiting for height 0 * Adjust retention to support evidence gathering * fix typo * More adjestments for long-running e2e manifests * Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`. --------- Co-authored-by: lasarojc <lasaro@informal.systems>
sergio-mena
added a commit
that referenced
this pull request
Mar 1, 2023
…375) (#435) * Improve wait for node in e2e runner for long catch-up times (#375) * e2e: Flexible timeout while waiting for a node * QA: added big e2e network manifest * Don't build latest locally, but rely on what has been released * Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`. * CatchingUp shouldn't be checked when waiting for height 0 * Adjust retention to support evidence gathering * fix typo * More adjestments for long-running e2e manifests * Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`. --------- Co-authored-by: lasarojc <lasaro@informal.systems> (cherry picked from commit 167a845) # Conflicts: # state/txindex/indexer_service.go * Revert "Improve wait for node in e2e runner for long catch-up times (#375)" * Improve wait for node in e2e runner for long catch-up times (#375) * e2e: Flexible timeout while waiting for a node * QA: added big e2e network manifest * Don't build latest locally, but rely on what has been released * Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`. * CatchingUp shouldn't be checked when waiting for height 0 * Adjust retention to support evidence gathering * fix typo * More adjestments for long-running e2e manifests * Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`. --------- Co-authored-by: lasarojc <lasaro@informal.systems> * Remove trailing spaces --------- Co-authored-by: Sergio Mena <sergio@informal.systems> Co-authored-by: lasarojc <lasaro@informal.systems>
3 tasks
cason
pushed a commit
that referenced
this pull request
Jan 5, 2024
* PBTS: second version of system model * PBTS: new model referred in algorithm spec * PBTS: removed model discussion from algorithm spec * PBTS: corrections on the ystem model * PBTS: a pretty complex problem statement * PBTS: minor fixes on the problem spefication * PBTS: liveness part of problem specification * PBTS: link updated, outdated note on sysmodel_v1 * Update spec/consensus/proposer-based-timestamp/pbts-algorithm_002_draft.md Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> * Apply William's suggestions from code review Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> * PBTS: new discussion and definition for accuracy * Apply Josef's suggestion from code review Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com> * PBTS: some tags added to sysmodel * PBTS: motivation and link to Issue #371 * PBTS: fixing lint error Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Contributes to #283
The e2e runner has a fix timeout when waiting for a process to reach a particular height.
In longer e2e tests (with complex manifests) this hard-code timeout value may not be well adapted (too short).
This PR renders the logic more flexible in that it only applies the timeout from the time it sees the node is not advancing heights.
This PR also adds the long-running manifest that motivated the improvement (thanks @lasarojc)
PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments