Skip to content

multi-platform-build#4754

Merged
thrau merged 5 commits intolocalstack:masterfrom
chezsmithy:multi-platform-build
Nov 15, 2021
Merged

multi-platform-build#4754
thrau merged 5 commits intolocalstack:masterfrom
chezsmithy:multi-platform-build

Conversation

@chezsmithy
Copy link
Contributor

Multi-platform changes for Dockerfile-base and Dockerfile.
Preparation for changes to circleci.yml to perform multi-arch builds using docker buildx build.

Confirmed tests run with locally integrated tests harnesses.
Confirmed working sns, s3, dynamodb, sqs, secretsmanager, ssm

TODO:

Need to run full test suite locally.
Need to modify make file and circle ci configuration to include docker buildx build with multiple platforms and qemu for arm builds in circle ci and local.

@localstack localstack deleted a comment Oct 19, 2021
@localstack localstack deleted a comment Oct 19, 2021
@chezsmithy
Copy link
Contributor Author

@whummer @thrau I am having a heck of a time with two things. 1) getting the elastic search plugins to install without hanging for 20+ minutes. 2) getting the tests to run on arm in circleCI.

I'm wondering if the choice of alpine as base is causing us some pain with so much python and musl. Maybe it's causing compile issues on Arm? What's interesting is the tests will run locally on my M1 mac build so maybe an available CPU/memory issue? I'm open to thoughts.

@chezsmithy chezsmithy force-pushed the multi-platform-build branch from 9d7d647 to f63d375 Compare October 25, 2021 06:07
@whummer
Copy link
Member

whummer commented Oct 25, 2021

Hi @chezsmithy - thanks again so much for your hard work on this PR. We'd be happy to jump in and help out with the remaining issues (e.g., test failures). Do you think you could give us push access to your fork, so we can collaborate on your branch? Thanks! 🙏

@chezsmithy
Copy link
Contributor Author

Hi @chezsmithy - thanks again so much for your hard work on this PR. We'd be happy to jump in and help out with the remaining issues (e.g., test failures). Do you think you could give us push access to your fork, so we can collaborate on your branch? Thanks! 🙏

You should have access now. I've added in a new Dockerfile called Dockerfile.complete-debian. It's based off of Debian buster python 3.8. Build times are greatly improved for arm. Also, it has stabilized my test runs. Likely this is due to having glibc included. I wonder if it might make sense to use this rather than alpine as the base? Thoughts?

@chezsmithy
Copy link
Contributor Author

@whummer I've setup the full run. After the move to Debian it's all green locally and in ci, except the same Elasticsearch tests still failing only in ci. They run just fine locally so I could use some help debugging those. I'm getting close to needing to integration test an actual push to docker. Going to need to simulate that by changing the image tags perhaps to avoid actually deploying images.

@whummer
Copy link
Member

whummer commented Oct 27, 2021

Thank you so much for crunching through this PR @chezsmithy , 🙏 and apologies for the delay on our end - unfortunately we got swamped with a few tasks the last couple of days. Will carve out some time later today and tomorrow to review your changes and see how we can best integrate them into our e2e pipeline.

Please do let us know if we can help out with testing the Docker push - if we could ensure that we're using a different testing tag for now, that would be great.. 👍 Thanks!

@whummer
Copy link
Member

whummer commented Oct 28, 2021

@chezsmithy - can we please also give @alexrashed push access to your branch, he'll also help out with reviewing and incorporating the changes into our e2e flow. Thanks! 🙏

@chezsmithy
Copy link
Contributor Author

chezsmithy commented Nov 2, 2021

I've got the Elasticsearch plug-ins to install now, but the tests still seem to fail. Root cause looks like this in the test logs:

tests/integration/test_elasticsearch.py::ElasticsearchTest::test_create_existing_domain_causes_exception
-------------------------------- live log setup --------------------------------
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:53 waiting for initialization lock
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:55 initialization lock acquired in 0.00 seconds
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:206 creating elasticsearch domain test_es_domain_1
INFO localstack.services.es.es_api:es_api.py:107 starting <class 'localstack.services.es.cluster.ProxiedElasticsearchCluster'> on localhost:4571
[2021-11-02 05:08:08 +0000] [17] [INFO] Running on http://0.0.0.0:4571 (CTRL + C to quit)
INFO hypercorn.error:logging.py:90 Running on http://0.0.0.0:4571 (CTRL + C to quit)
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:214 asserting created state of domain test_es_domain_1 (state = False)
INFO localstack.services.es.cluster:cluster.py:163 starting elasticsearch: su localstack -c '/opt/code/localstack/localstack/infra/elasticsearch/bin/elasticsearch -E http.port=44339 -E http.publish_port=44339 -E transport.port=0 -E network.host=127.0.0.1 -E http.compression=false -E path.data="/opt/code/localstack/localstack/infra/elasticsearch/data" -E path.repo="/tmp/localstack/es_backup" -E xpack.ml.enabled=false' with env {'ES_JAVA_OPTS': '-Xms200m -Xmx600m', 'ES_TMPDIR': '/opt/code/localstack/localstack/infra/elasticsearch/tmp'}
INFO localstack.services.es.cluster:cluster.py:172 elasticsearch process ended <--- Looks like the process died
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:214 asserting created state of domain test_es_domain_1 (state = False)
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:214 asserting created state of domain test_es_domain_1 (state = False)
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:214 asserting created state of domain test_es_domain_1 (state = False)
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:214 asserting created state of domain test_es_domain_1 (state = False)
INFO tests.integration.test_elasticsearch:test_elasticsearch.py:214 asserting created

What's odd is if I try the same test using the same docker file locally on my M1 Mac, the same tests pass. Must be something environmental on the circleci runners?

@chezsmithy chezsmithy force-pushed the multi-platform-build branch from adb6eff to 585ac14 Compare November 2, 2021 05:35
@alexrashed
Copy link
Member

Hi @chezsmithy!
Thanks for giving me the permissions to push on your fork and thanks for all the great work! That's quite some heavy lifting with the different archs, qemu, buildx,...

I started working on this on Friday. First thing was to fix the base image build's ES plugin installations (which was broken due to changes to their webserver's cipher suite, which in turn weren't supported in our stripped down JRE). This actually had nothing to do with this PR, but was basically a prerequisite to get started.

Currently I'm working on getting the multiarch alpine base image build working for me, because we would actually try to stick with alpine if that's possible. We really want to keep things as small and fast as possible, and therefore we really want to avoid the additional 310 MB of the Debian image.

We could also short-circuit in our community Slack to join forces if you would like to.

@chezsmithy
Copy link
Contributor Author

Hi @chezsmithy!

Thanks for giving me the permissions to push on your fork and thanks for all the great work! That's quite some heavy lifting with the different archs, qemu, buildx,...

I started working on this on Friday. First thing was to fix the base image build's ES plugin installations (which was broken due to changes to their webserver's cipher suite, which in turn weren't supported in our stripped down JRE). This actually had nothing to do with this PR, but was basically a prerequisite to get started.

Currently I'm working on getting the multiarch alpine base image build working for me, because we would actually try to stick with alpine if that's possible. We really want to keep things as small and fast as possible, and therefore we really want to avoid the additional 310 MB of the Debian image.

We could also short-circuit in our community Slack to join forces if you would like to.

Thanks! Not many changes in the end, but it's got a few moving parts to it. It might be worth considering Debian, just for the benefit of your python code build times. On alpine, with musl it appears may of the wheels need to be built from scratch extending the docker build time. With Debian and glibc only a few need to build making it much faster. I have both docker files so you can compare and see. Love to work with you on slack.

@chezsmithy chezsmithy force-pushed the multi-platform-build branch from ab64a0f to e4ab2a5 Compare November 3, 2021 05:02
mkdir(dirs.tmp)
chmod_r(dirs.tmp, 0o777)

rm_rf(dirs.backup)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whummer @alexrashed @thrau I added this cleanup for the backup directory here to stabilize my tests. Not sure why it's working without the backup directory being pre-created in the current docker runs, but this fixes things for me now and all elastic search tests pass. Thoughts?

@thrau
Copy link
Member

thrau commented Nov 12, 2021

Hey, please see my comment here for an update: #2550 (comment)

we cleaned up the history in this PR, and there are a few kinks which i'm currently fixing. We should then be good to prepare the release for monday!

@thrau thrau force-pushed the multi-platform-build branch from 4d422b7 to 80bc2fc Compare November 12, 2021 13:47

SHELL [ "/bin/bash", "-c" ]

# install Docker (CLI only)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can look into removing the Docker CLI from here, actually (now that we have the Docker Python SDK fully integrated). Would probably save us another ~50MB in the final image 💪 - we can look into that post initial release.. /cc @thrau @dfangl @alexrashed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! some folks do rely on the legacy docker client AFAIK, but we can maybe make that part of the full image.

@thrau
Copy link
Member

thrau commented Nov 12, 2021

build is green :-) we will prepare the 0.13 release now, and will merge the PR manually hopefully on monday

pass
if system not in ["linux"]:
raise ValueError("unsupported os %s for awslambda-go-runtime" % system)
if arch not in ["amd64", "arm32"]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be arm32 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thrau not sure if you noticed this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jep good catch! thanks for pointing it out. will fix

chezsmithy and others added 4 commits November 15, 2021 18:46
Co-authored-by: Alexander Rashed <alexander.rashed@localstack.cloud>
Co-authored-by: Shawn Smith <chezsmithy@me.com>
Co-authored-by: Daniel Fangl <daniel.fangl@localstack.cloud>
Co-authored-by: Daniel Fangl <daniel.fangl@localstack.cloud>
Co-authored-by: Thomas Rausch <thomas@thrau.at>
@thrau thrau force-pushed the multi-platform-build branch from 2263526 to 990da3e Compare November 15, 2021 17:58
@thrau thrau force-pushed the multi-platform-build branch from 990da3e to d295169 Compare November 15, 2021 19:10
@thrau thrau merged commit d295169 into localstack:master Nov 15, 2021
@thrau thrau temporarily deployed to localstack-ext-tests November 15, 2021 19:12 Inactive
@github-pages github-pages bot temporarily deployed to github-pages November 15, 2021 19:12 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants