Skip to content

Improve wait for node in e2e runner for long catch-up times#375

Merged
sergio-mena merged 16 commits intomainfrom
sergio/e2e-improve-wait-for-node
Mar 1, 2023
Merged

Improve wait for node in e2e runner for long catch-up times#375
sergio-mena merged 16 commits intomainfrom
sergio/e2e-improve-wait-for-node

Conversation

@sergio-mena
Copy link
Collaborator

Contributes to #283

The e2e runner has a fix timeout when waiting for a process to reach a particular height.
In longer e2e tests (with complex manifests) this hard-code timeout value may not be well adapted (too short).
This PR renders the logic more flexible in that it only applies the timeout from the time it sees the node is not advancing heights.

This PR also adds the long-running manifest that motivated the improvement (thanks @lasarojc)


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments

@sergio-mena sergio-mena requested a review from a team as a code owner February 23, 2023 10:03
@sergio-mena sergio-mena added the rename Renaming our fork label Feb 23, 2023
@sergio-mena sergio-mena mentioned this pull request Feb 23, 2023
8 tasks
@lasarojc lasarojc linked an issue Feb 23, 2023 that may be closed by this pull request
8 tasks
[node]
[node.full01]
mode = "full"
version = "ghcr.io/informalsystems/tendermint-e2e-node:v0.34.x"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we make this general for all branches?
Maybe the update should be from cometbft/e2e-node:latest to cometbft/e2e-node:local?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not actually break if merging into main? We'd be running a v0.34-based node alongside v0.38-based ones, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for delay coming back to this. I looked at @lasarojc's commits below (thanks for them!) and I think they address this concern.
I have also removed backport-to-v0.34x as that will need to be a special case.
Are we still missing something here?

@lasarojc lasarojc added the qa Quality assurance label Feb 24, 2023
…thup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`.
@sergio-mena sergio-mena merged commit 167a845 into main Mar 1, 2023
@sergio-mena sergio-mena deleted the sergio/e2e-improve-wait-for-node branch March 1, 2023 17:47
mergify bot pushed a commit that referenced this pull request Mar 1, 2023
* e2e: Flexible timeout while waiting for a node

* QA: added big e2e network manifest

* Don't build latest locally, but rely on what has been released

* Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`.

* CatchingUp shouldn't be checked when waiting for height 0

* Adjust retention to support evidence gathering

* fix typo

* More adjestments for long-running e2e manifests

* Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`.

---------

Co-authored-by: lasarojc <lasaro@informal.systems>
(cherry picked from commit 167a845)

# Conflicts:
#	state/txindex/indexer_service.go
sergio-mena added a commit that referenced this pull request Mar 1, 2023
* e2e: Flexible timeout while waiting for a node

* QA: added big e2e network manifest

* Don't build latest locally, but rely on what has been released

* Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`.

* CatchingUp shouldn't be checked when waiting for height 0

* Adjust retention to support evidence gathering

* fix typo

* More adjestments for long-running e2e manifests

* Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`.

---------

Co-authored-by: lasarojc <lasaro@informal.systems>
sergio-mena added a commit that referenced this pull request Mar 1, 2023
…375) (#435)

* Improve wait for node in e2e runner for long catch-up times (#375)

* e2e: Flexible timeout while waiting for a node

* QA: added big e2e network manifest

* Don't build latest locally, but rely on what has been released

* Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`.

* CatchingUp shouldn't be checked when waiting for height 0

* Adjust retention to support evidence gathering

* fix typo

* More adjestments for long-running e2e manifests

* Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`.

---------

Co-authored-by: lasarojc <lasaro@informal.systems>
(cherry picked from commit 167a845)

# Conflicts:
#	state/txindex/indexer_service.go

* Revert "Improve wait for node in e2e runner for long catch-up times (#375)"

* Improve wait for node in e2e runner for long catch-up times (#375)

* e2e: Flexible timeout while waiting for a node

* QA: added big e2e network manifest

* Don't build latest locally, but rely on what has been released

* Do not build `latest`=`local` locally. Long test upgrades from `latest` to `local`.

* CatchingUp shouldn't be checked when waiting for height 0

* Adjust retention to support evidence gathering

* fix typo

* More adjestments for long-running e2e manifests

* Makes wait proportional to the height, so nodes have more time to cacthup on long runs. Waits for all nodes for the duration of `timeout`, not each node for the duration of `timeout`.

---------

Co-authored-by: lasarojc <lasaro@informal.systems>

* Remove trailing spaces

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: lasarojc <lasaro@informal.systems>
cason pushed a commit that referenced this pull request Jan 5, 2024
* PBTS: second version of system model

* PBTS: new model referred in algorithm spec

* PBTS: removed model discussion from algorithm spec

* PBTS: corrections on the ystem model

* PBTS: a pretty complex problem statement

* PBTS: minor fixes on the problem spefication

* PBTS: liveness part of problem specification

* PBTS: link updated, outdated note on sysmodel_v1

* Update spec/consensus/proposer-based-timestamp/pbts-algorithm_002_draft.md

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

* Apply William's suggestions from code review

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

* PBTS: new discussion and definition for accuracy

* Apply Josef's suggestion from code review

Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>

* PBTS: some tags added to sysmodel

* PBTS: motivation and link to Issue #371

* PBTS: fixing lint error

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qa Quality assurance rename Renaming our fork

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

QA of CMT 0.34.x

4 participants