Description
Our build process uses a few "private" or "personal" Docker images. It would be better if every image were either officially released and maintained by a community or migrated to be managed by our community.
Depending on external images, binaries, and repositories over which the community has no control or influence might be acceptable in certain cases. But whenever the community releases code, we have to make sure our users are not implicitly depending on components that are outside of the community's control.
PyPI packages
Proposed approach
No more changes are needed here.
Motivation
In the case of PyPI packages, the mechanism of PyPI provides sufficient protection. In the past, releases of packages that we depend on (explicitly or implicitly) caused problems with installing Apache Airflow. Unfortunately, we cannot pin the dependencies of Airflow to specific versions. You can find more information on why, and on how we solved it, in airflow-dependencies. The problem has been mitigated by the recommended installation procedure and by a mechanism to update and test the "recommended" constraints for PyPI, which are stored in the requirements folder.
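The recommended installation procedure mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual tooling: the helper name `pip_install_command` and the file name `requirements/constraints.txt` are hypothetical. It relies only on pip's `--constraint` option, which limits the versions pip may choose without adding new requirements.

```python
from typing import List


def pip_install_command(package: str, constraints_file: str) -> List[str]:
    """Build a pip command that installs `package` under a constraints file."""
    # --constraint restricts the versions pip may select for any dependency
    # that gets installed; it does not add packages to the install set.
    return ["pip", "install", package, "--constraint", constraints_file]


# Hypothetical file name, used only for illustration.
command = pip_install_command("apache-airflow", "requirements/constraints.txt")
print(" ".join(command))
```

Because the constraints file is tested and updated separately, a new release of a transitive dependency does not affect users who install this way until the constraints are refreshed.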
Static check images and repositories
Proposed approach
Those repositories and images can likely stay as they are once we fix several unpinned dependencies.
Motivation
Static checks can be easily disabled or switched to another image/repository with only a minor inconvenience for people developing actively for Apache Airflow. Those images and repositories are configured in .pre-commit-config.yaml.
Pre-commit has a built-in mechanism to upgrade hooks: the pre-commit autoupdate command. All our pre-commit hooks are pinned to a specific revision or version of the hooks released via GitHub.
More information can be found in STATIC_CODE_CHECKS.rst.
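To illustrate what "pinned to a specific revision" means in practice, here is a small sketch that inspects an already-parsed pre-commit configuration and reports repositories whose `rev` is missing or looks like a moving reference. The function name `unpinned_repos`, the set `MUTABLE_REVS`, and the example URLs are assumptions made for this example; they are not part of pre-commit or of the Airflow repository.

```python
from typing import Any, Dict, List

# Revision names that can move over time; an assumed, non-exhaustive set.
MUTABLE_REVS = {"master", "main", "latest", "stable", "HEAD"}


def unpinned_repos(config: Dict[str, Any]) -> List[str]:
    """Return repo URLs whose `rev` is missing or looks like a moving reference."""
    unpinned = []
    for repo in config.get("repos", []):
        url = repo.get("repo", "")
        # "local" and "meta" repos are defined in place and carry no rev.
        if url in ("local", "meta"):
            continue
        rev = repo.get("rev")
        if not rev or rev in MUTABLE_REVS:
            unpinned.append(url)
    return unpinned


# Hypothetical configuration, shaped like a parsed .pre-commit-config.yaml.
example = {
    "repos": [
        {"repo": "https://example.com/hooks-a", "rev": "v1.2.3", "hooks": [{"id": "a"}]},
        {"repo": "https://example.com/hooks-b", "rev": "master", "hooks": [{"id": "b"}]},
        {"repo": "local", "hooks": [{"id": "c"}]},
    ]
}
print(unpinned_repos(example))
```

A check like this only covers what the configuration file declares. Images referenced from custom shell scripts, such as the exceptions listed in the current status, would not be seen by it.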
Current status:
The following are exceptions to the pre-commit auto-update maintenance:
- The hadolint image is pinned in ci_lint_dockerfile.sh. We might want to switch it to one of the existing pre-commit hadolint hooks.
- The bats image is set to bats/bats:latest in ci_bat_tests.sh.
- stylelint uses a fixed, released version of stylelint as a node dependency and needs to be updated manually.
- shellcheck uses the koalaman/shellcheck:stable image.
- lint-openapi uses the wework/speccy image.
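The difference between a pinned image and one that follows a moving tag can be checked mechanically. The sketch below is illustrative only: the helper names and the `FLOATING_TAGS` set are assumptions, and digest references (`name@sha256:...`) are deliberately not handled. It does rely on one Docker fact: a reference without a tag resolves to `latest`.

```python
from typing import Tuple

# Tags that follow new releases instead of naming one; an assumed set.
FLOATING_TAGS = {"latest", "stable"}


def split_image(reference: str) -> Tuple[str, str]:
    """Split 'name:tag' into (name, tag); a missing tag means 'latest'."""
    # Look for the tag only in the last path segment, so that a registry
    # port such as 'localhost:5000/image' is not mistaken for a tag.
    last_segment = reference.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = reference.rsplit(":", 1)
        return name, tag
    return reference, "latest"


def is_floating(reference: str) -> bool:
    """True when the image reference follows a moving tag."""
    return split_image(reference)[1] in FLOATING_TAGS


for image in ("koalaman/shellcheck:stable", "bats/bats:latest",
              "wework/speccy", "ashb/apache-rat:0.13-1"):
    print(image, "floating" if is_floating(image) else "pinned")
```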
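Actions
- Switch hadolint to an existing pre-commit updateable hook rather than a custom shell script, if possible.
- Switch stylelint to an existing pre-commit updateable hook rather than custom dependencies, if possible.
- Switch shellcheck to an existing pre-commit updateable hook rather than a direct image, if possible.
- Switch lint-openapi to an existing pre-commit updateable hook rather than a direct image, if possible.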
Test dependencies
Proposed approach
We should move all images, binaries, and apt packages used during testing to community-controlled images wherever they are currently created and controlled by individuals or companies. The proposal is to create a "custom-images" folder where we keep the Dockerfiles for those images, and to publish them in the apache/airflow Docker registry with tags corresponding to the image names.
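One possible reading of "tags corresponding to the image names" is sketched below. The naming scheme is an assumption made for illustration, not a decision recorded in this proposal: the owner prefix is dropped, and any version is appended to the image name with a dash. The function name `community_tag` is hypothetical.

```python
REGISTRY = "apache/airflow"


def community_tag(reference: str) -> str:
    """Map an external image reference to a tag in the community registry."""
    # Drop the owner (and any registry host), keeping 'name' or 'name:version'.
    name_and_version = reference.rsplit("/", 1)[-1]
    name, _, version = name_and_version.partition(":")
    # Assumed scheme: append the version, when present, after a dash.
    suffix = f"{name}-{version}" if version else name
    return f"{REGISTRY}:{suffix}"


for image in ("godatadriven/krb5-kdc-server", "ashb/apache-rat:0.13-1"):
    print(image, "->", community_tag(image))
```

Keeping the original version in the tag would let several versions of the same tool coexist in one registry, at the cost of longer tag names.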
Motivation
We are currently using externally downloaded images, binaries, and apt-installed packages during the tests. Those images can easily be incorporated into the Apache Airflow code. Usually we need Dockerfiles of only a few lines to build and push the images. Those images are very static - i.e. they can be used without changes for a long time, but it would be great if the community had a way to update them occasionally.
Where the images, binaries, or apt sources are officially supported by the owning organization (Docker or gcloud, for example), we can rely on those images being maintained and download them from the official source. All apt packages should be installed from the original sources.
Current status
We use the following images during the tests. They are either "private" user images or not "official" images maintained and released by the organizations that own the software:
- aneeshkj/helm-unittest
- ashb/apache-rat:0.13-1
- godatadriven/krb5-kdc-server
- polinux/stress
- osixia/openldap:1.2.0 (?)
We use the following officially supported images. For those images we should make sure that we are using the latest stable versions.
- mysql:${VERSION}
- postgresql:${VERSION}
- cassandra:3.0
- mongo:3
- prestosql/presto:330
- rabbitmq:3.7
- redis:5.0.1
- kindest/node:${VERSION}
We use the following officially supported images for the CLI tools in the CI image. They always point to the "latest" version, which is fine because those tools should always use the latest stable version.
- amazon/aws-cli:latest
- mcr.microsoft.com/azure-cli:latest
- gcr.io/google.com/cloudsdktool/cloud-sdk:latest
- hashicorp/terraform:latest
We are using the official openjdk Java image that points to JRE 8 (which is a requirement for DataFlow).
We also have the following binaries downloaded during CI image build or for Kubernetes testing:
We use the following external APT repositories during CI image build:
Actions
Migrate those images to "apache/airflow":
Review and possibly update to the latest versions:
Helm Chart and Production image dependencies
Proposed approach
Move all the dependencies in the Helm Chart and Production image to either officially supported images or ones managed by the Airflow community.
Motivation
Production Images and Helm Charts will soon be released officially and our users will use them to build their own images of Apache Airflow and install it via Helm Charts. We should make sure that all the dependencies are either officially released and supported by the organizations owning the code or that we manage them via the community.
Current status
We use the following external APT repositories during CI image build:
We use the following binaries in the Helm Chart:
- postgresql:6.3.12 (stable repo) Helm installation of Postgres.
We use the following images in the Helm Chart:
- astronomerinc/ap-statsd-exporter:0.11.0
- astronomerinc/ap-pgbouncer:1.8.1
- astronomerinc/ap-pgbouncer-exporter:0.5.0-1
- redis:6-buster
Actions:
Move the following images to Airflow managed "apache/airflow" registry: