docker: release Linux/ARM64 image#5925

Merged
tac0turtle merged 3 commits into tendermint:master from odidev:tendermint_arm64
Jan 25, 2021

@odidev
Contributor

@odidev odidev commented Jan 19, 2021

Description

  1. Moved the Tendermint build into the Dockerfile. GitHub Actions does not support direct AArch64 builds, and the Docker image for each platform needs a tendermint binary built for that platform.
  2. Installed QEMU in docker.yml and set the buildx platforms to "linux/amd64" and "linux/arm64", so that an architecture-specific Docker image is released for each. Changed the build context to the root of the repository so that the Dockerfile can build Tendermint.
  3. Added the Tendermint build as stage 1 of the Dockerfile and copied the generated binary to the desired location in stage 2. Changed the base image to "golang:1.15-alpine" to provide Go 1.15.

Signed-off-by: odidev <odidev@puresoftware.com>

@tessr
Contributor

tessr commented Jan 19, 2021

Hi odidev,

Thanks for this contribution. We are meticulous about understanding the motivation behind any build/tooling changes. Can you zoom out and first describe the problem you have, and then explain how this PR will fix it?

@odidev
Contributor Author

odidev commented Jan 20, 2021

@tessr
Thanks for the response.

I am working on adding Linux/ARM64 support to a project that uses both the Tendermint binary and its Docker image.

I previously opened a PR to release Linux/ARM64 binaries; thank you for merging it, and those binaries have now been released. The Docker image, however, is still only available for Linux/AMD64. This PR therefore releases Tendermint Docker images for both Linux/AMD64 and Linux/ARM64 from the workflows, using buildx.

Tendermint currently uses GitHub Actions to build the tendermint binary first, copy it into the image through the Dockerfile, and then release the Docker image. However, GitHub Actions does not support direct AArch64 builds, so the workflow cannot generate the Linux/ARM64 tendermint binary itself.

To work around this, I have moved the ‘build tendermint’ step into the Dockerfile. Buildx uses QEMU to emulate a Linux/ARM64 environment, so stage 1 of the Dockerfile builds the tendermint binary from source for each target platform. Stage 2 then just copies the binary to the desired location. Splitting the Dockerfile into multiple stages also reduces the size of the Docker image.

The base image has been replaced with ‘golang:1.15-alpine’ because the build requires Go 1.15 or above, and this image provides Go 1.15.6 on top of the Alpine environment.
The ‘context’ clause in the workflows has been set to the root of the project so that the ‘make build-linux’ command can run. As a result, the COPY clause in the Dockerfile has been changed from “./docker-entrypoint.sh /usr/local/bin/” to “./DOCKER/docker-entrypoint.sh /usr/local/bin/”.

Please let me know if you need more information.

@tessr
Contributor

tessr commented Jan 20, 2021

I see - I think that all sounds reasonable. @marbar3778: while you're looking at Docker stuff, can you take a look at this PR, as well?

Contributor

@tac0turtle tac0turtle left a comment

LGTM. Thank you for the contribution.

@tac0turtle tac0turtle changed the title from "Release Linux/ARM64 Tendermint Docker Image" to "docker: release Linux/ARM64 image" Jan 21, 2021
@tac0turtle tac0turtle merged commit cd3ebe8 into tendermint:master Jan 25, 2021
lovincyrus added a commit that referenced this pull request Jan 26, 2021
* Makefile: always pull image in proto-gen-docker. (#5953)

The `proto-gen-docker` target didn't pull an updated Docker image, and would use a local image if present which could be outdated and produce wrong results.

* test: fix TestPEXReactorRunning data race (#5955)

Fixes #5941.

Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway.

* test/fuzz: move fuzz tests into this repo (#5918)

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>

Closes #5907

- add init-corpus to blockchain reactor
- remove validator-set FromBytes test
  now that we have proto, we don't need to test it! bye amino
- simplify mempool test
  do we want to test remote ABCI app?
- do not recreate mux on every crash in jsonrpc test
- update p2p pex reactor test
- remove p2p/listener test
  the API has changed + I did not understand what it tested anyway
- update secretconnection test
- add readme and makefile
- list inputs in readme
- add nightly workflow
- remove blockchain fuzz test
  EncodeMsg / DecodeMsg no longer exist

* docker: dont login when in PR (#5961)

* docker: release Linux/ARM64 image (#5925)

Co-authored-by: Marko <marbar3778@yahoo.com>

* p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background.

The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.

* libs/log: format []byte as hexidecimal string (uppercased) (#5960)

Closes: #5806 

Co-authored-by: Lanie Hei <heixx011@umn.edu>
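
As a rough illustration of the formatting change named in the title above, Go's `%X` verb renders a byte slice as uppercase hexadecimal. `hexUpper` and `formatKeyvals` are hypothetical helpers for this sketch, not the actual `libs/log` code.

```go
package main

import (
	"fmt"
	"strings"
)

// hexUpper renders b as an uppercase hexadecimal string.
func hexUpper(b []byte) string {
	return fmt.Sprintf("%X", b)
}

// formatKeyvals is a hypothetical helper: it formats key/value pairs and
// replaces any []byte value with its uppercase hex form.
func formatKeyvals(keyvals ...interface{}) string {
	parts := make([]string, 0, len(keyvals)/2)
	for i := 0; i+1 < len(keyvals); i += 2 {
		val := keyvals[i+1]
		if b, ok := val.([]byte); ok {
			val = hexUpper(b)
		}
		parts = append(parts, fmt.Sprintf("%v=%v", keyvals[i], val))
	}
	return strings.Join(parts, " ")
}

func main() {
	// prints "hash=DEADBEEF height=10"
	fmt.Println(formatKeyvals("hash", []byte{0xde, 0xad, 0xbe, 0xef}, "height", 10))
}
```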

* docs: log level docs (#5945)

## Description

add section on configuring log levels

Closes: #XXX

* .github: fix fuzz-nightly job (#5965)

outputs is a property of the job, not an individual step.

* e2e: add control over the log level of nodes (#5958)

* mempool: fix reactor tests (#5967)

## Description

Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader.

Closes: #5956

* p2p: improve peerStore prototype (#5954)

This improves the `peerStore` prototype by e.g.:

* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.
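
The rank-based pruning mentioned in the list above could look something like the sketch below. `peerInfo`, `Score`, and `prune` are hypothetical names, and treating a non-positive limit as "unlimited" is an assumption of this example rather than documented behavior of `PeerManagerOptions.MaxPeers`.

```go
package main

import (
	"fmt"
	"sort"
)

// peerInfo is a hypothetical peer store entry.
type peerInfo struct {
	ID    string
	Score int
}

// prune ranks peers by score (highest first) and keeps at most maxPeers.
// It returns the kept peers and the IDs that were pruned.
// Assumption: maxPeers <= 0 means "no limit".
func prune(peers []peerInfo, maxPeers int) (kept []peerInfo, removed []string) {
	ranked := make([]peerInfo, len(peers))
	copy(ranked, peers) // work on a copy so the caller's slice is untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	if maxPeers <= 0 || len(ranked) <= maxPeers {
		return ranked, nil
	}
	for _, p := range ranked[maxPeers:] {
		removed = append(removed, p.ID)
	}
	return ranked[:maxPeers], removed
}

func main() {
	peers := []peerInfo{{"a", 5}, {"b", 1}, {"c", 3}}
	kept, removed := prune(peers, 2)
	fmt.Println(kept, removed)
}
```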

* p2p: simplify PeerManager upgrade logic (#5962)

Follow-up from #5947, branched off of #5954.

This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:

* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.

This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simpler, and it should generally do the right thing in all cases I can think of.
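
The `EvictNext` selection rule in the list above can be sketched as follows. This is a simplified, non-blocking, single-goroutine illustration: the `manager` type, its fields, and `Disconnected` are hypothetical and do not mirror the real `PeerManager`.

```go
package main

import "fmt"

type peerID string

// manager is a hypothetical, simplified peer manager.
type manager struct {
	maxConnected int
	connected    map[peerID]int  // connected peers and their scores
	evict        map[peerID]bool // peers explicitly queued for eviction
}

// EvictNext returns a peer to evict, if any: first a peer queued in evict,
// otherwise the lowest-scored peer when beyond capacity.
func (m *manager) EvictNext() (peerID, bool) {
	for id := range m.evict {
		delete(m.evict, id)
		if _, ok := m.connected[id]; ok {
			return id, true
		}
	}
	if len(m.connected) <= m.maxConnected {
		return "", false
	}
	var lowest peerID
	found := false
	for id, score := range m.connected {
		if !found || score < m.connected[lowest] {
			lowest, found = id, true
		}
	}
	return lowest, found
}

// Disconnected records that a peer's connection has been closed.
func (m *manager) Disconnected(id peerID) {
	delete(m.connected, id)
}

func main() {
	m := &manager{
		maxConnected: 2,
		connected:    map[peerID]int{"a": 3, "b": 1, "c": 2},
		evict:        map[peerID]bool{"a": true},
	}
	id, ok := m.EvictNext()
	fmt.Println(id, ok) // a true (explicit request wins over score)
	m.Disconnected(id)
	_, ok = m.EvictNext()
	fmt.Println(ok) // false (back within capacity)
}
```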

* p2p: add PeerManager.Advertise() (#5957)

Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.

* blockchain v0: fix waitgroup data race (#5970)

## Description

Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.

Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).

Regardless, this is the appropriate fix.
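
For readers unfamiliar with this class of bug, here is a minimal sketch of the safe ordering, assuming a hypothetical `pool` type rather than the actual blockchain reactor: `Add` runs synchronously in `Start`, so it always happens before any later `Wait`.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical reactor-like type that owns worker goroutines.
type pool struct {
	wg      sync.WaitGroup
	mtx     sync.Mutex
	results []int
}

// Start raises the WaitGroup counter before spawning any goroutine.
// Calling wg.Add inside the spawned goroutine instead would let a
// concurrent Wait observe a zero counter and race with that Add.
func (p *pool) Start(n int) {
	p.wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer p.wg.Done()
			p.mtx.Lock()
			p.results = append(p.results, i)
			p.mtx.Unlock()
		}(i)
	}
}

// Stop waits for all workers and reports how many results were recorded.
// It must only be called after Start has returned.
func (p *pool) Stop() int {
	p.wg.Wait()
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return len(p.results)
}

func main() {
	p := &pool{}
	p.Start(4)
	fmt.Println(p.Stop()) // 4
}
```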

closes: #5968

* tests: fix `make test` (#5966)

## Description
 
- bump deadlock dep to master
  - fixes `make test` since we now use `deadlock.Once`

Closes: #XXX

* terminate go-fuzz gracefully (w/ SIGINT) (#5973)

and preserve exit code.

```
2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s
make: *** [fuzz-mempool] Terminated
Makefile:5: recipe for target 'fuzz-mempool' failed
Error: Process completed with exit code 124.
```

https://github.com/tendermint/tendermint/runs/1766661614

`continue-on-error` should make GH ignore any error codes.

* p2p: add prototype PEX reactor for new stack (#5971)

This adds a prototype PEX reactor for the new P2P stack.

* proto/p2p: rename PEX messages and fields (#5974)

Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):

* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).

* p2p: resolve PEX addresses in PEX reactor (#5980)

This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.

I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:

1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.

2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.

I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.

* test/p2p: close transports to avoid goroutine leak failures (#5982)

* mempool: fix TestReactorNoBroadcastToSender (#5984)

## Description

Looks like I missed a test in the original PR when fixing the tests.

Closes: #5956

* mempool: fix mempool tests timeout (#5988)

* p2p: use stopCtx when dialing peers in Router (#5983)

This ensures we don't leak dial goroutines when shutting down the router.

* docs: fix typo in state sync example (#5989)

Co-authored-by: Erik Grinaker <erik@interchain.berlin>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: odidev <odidev@puresoftware.com>
Co-authored-by: Lanie Hei <heixx011@umn.edu>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Sergey <52304443+c29r3@users.noreply.github.com>
tac0turtle added a commit that referenced this pull request Feb 11, 2021
Co-authored-by: Marko <marbar3778@yahoo.com>
tessr pushed a commit that referenced this pull request Feb 11, 2021
Co-authored-by: Marko <marbar3778@yahoo.com>