Skip to content

Implement uncoordinated upgrades on e2e#238

Merged
sergio-mena merged 14 commits intomainfrom
sergio/e2e-upgrade
Jan 31, 2023
Merged

Implement uncoordinated upgrades on e2e#238
sergio-mena merged 14 commits intomainfrom
sergio/e2e-upgrade

Conversation

@sergio-mena
Copy link
Collaborator

@sergio-mena sergio-mena commented Jan 30, 2023

Closes #56 and #167

This PR builds on existing "multiversion" infra to change a node's docker image on the fly (stop, change image, restart).

I spent some time researching solutions, with the intention of finding solutions as surgical as possible. Please review the two approaches included in this PR:

  • Approach 1: please review corresponding commit in isolation.
    • This approach changes the docker compose file just after having started all nodes
    • Then, upon a node upgrade, we call "docker-compose up", which recreates the container with the new image, but keeps the old data and config (since they are stored in a volume)
    • This is more surgical than Approach 2, but has a BIG disadvantage: we lose the docker logs for the original container
    • This commit has been reverted, because Approach2 was preferred
  • Approach 2: please review corresponding commit in isolation.
    • In this approach, when the upgrade version and the node's version don't match, we create an extra container for that node in docker compose, which shares the same volumes (same config and data dirs)
    • Upgrades are modeled as perturbations (surgical), which allows for uncoordinated (minor) upgrade
    • When it is time to upgrade, we stop the node's initial container, and start the extra container
    • This is less surgical than Approach 1, but it is preferred because we can keep all logs

The first commit of this PR is also fixing the e2e infra for tendermint-based releases (<= v0.34.24)

With this PR, we can define testnet manifests that can:

  • Test uncoordinated upgrades for minor versions. Many nodes can be defined with a mix of initial versions
  • Test upgrades for major versions with a single node
  • The logic can be easily extended to test coordinated (major) upgrades (not done in this PR)

I still need to update the changelog and the doc, but first, I'd like to have the two approaches reviewed.


PR checklist

  • Tests written/updated, or no tests needed
  • CHANGELOG_PENDING.md updated, or no changelog entry needed
  • Updated relevant documentation (docs/) and code comments, or no
    documentation updates needed

@sergio-mena sergio-mena requested a review from a team as a code owner January 30, 2023 10:10
@sergio-mena sergio-mena self-assigned this Jan 30, 2023
@sergio-mena sergio-mena added the rename Renaming our fork label Jan 30, 2023
Copy link
Contributor

@thanethomson thanethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I tried running this locally but got the same kinds of errors that were seen in the nightlies, so I assume that #237 will fix that.

Comment on lines 73 to +74
- ./{{ .Name }}:/cometbft
- ./{{ .Name }}:/tendermint
Copy link
Contributor

@thanethomson thanethomson Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work as expected? (i.e. the same host folder being mounted in two different places)

Never tried this before with Docker 🙂

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do this, since older images ( < v0.34.24) use /tendermint as home path, and we can no longer change them.

In normal conditions, I agree it's dangerous to mount the same volume in two different places, but in this case, I think we are safe because:

  • Old images will only use /tendermint
  • New images will only use /cometbft

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the docker-compose didn't work for old images

{{- if eq .ABCIProtocol "builtin" }}
entrypoint: /usr/bin/entrypoint-builtin
{{- else }}{{ if eq .ABCIProtocol "builtin_unsync" }}
{{- if or (eq .ABCIProtocol "builtin") (eq .ABCIProtocol "builtin_unsync") }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Didn't know one could do this in templates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :-)
I needed to get familiar with templates while working on this, and found this trick by chance

@sergio-mena
Copy link
Collaborator Author

so I assume that #237 will fix that.

Let me know if that's not the case. I'll keep this PR open until tomorrow

@sergio-mena sergio-mena merged commit b27bb3a into main Jan 31, 2023
@sergio-mena sergio-mena deleted the sergio/e2e-upgrade branch January 31, 2023 09:46
mergify bot pushed a commit that referenced this pull request Jan 31, 2023
* Fix multiversion e2e when `version <= v0.34.24`

* Changes we want no matter what we do

* Approach 1: With docker-compose up recreating image

* Revert "Approach 1: With docker-compose up recreating image"

This reverts commit 068f5b78d3b7cf99fb21296707117e4f4e2ef3e0.

* Approach 2: With distinct docker-compose service

* Fix the case when 'upgrade' is not the last perturbation

* Update generator

* Simplify template

* bump

* Update test/e2e/pkg/manifest.go

* changelog

* doc

* fix doc

(cherry picked from commit b27bb3a)
sergio-mena added a commit that referenced this pull request Jan 31, 2023
* Fix multiversion e2e when `version <= v0.34.24`

* Changes we want no matter what we do

* Approach 1: With docker-compose up recreating image

* Revert "Approach 1: With docker-compose up recreating image"

This reverts commit 068f5b78d3b7cf99fb21296707117e4f4e2ef3e0.

* Approach 2: With distinct docker-compose service

* Fix the case when 'upgrade' is not the last perturbation

* Update generator

* Simplify template

* bump

* Update test/e2e/pkg/manifest.go

* changelog

* doc

* fix doc

(cherry picked from commit b27bb3a)

Co-authored-by: Sergio Mena <sergio@informal.systems>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rename Renaming our fork

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

Improve QA infra to test upgrades on e2e/testnets

2 participants