
All "private" images and binaries used by Airflow should be managed by the community #9401

@potiuk

Description

We have a few Docker images in our build process that use "private" (personal) Docker images. It would be great to have all the images either officially released and maintained by a community, or migrated so that they are managed by our community.

Depending on external images, binaries, and repositories over which the community has no control or influence might be acceptable in certain cases, but whenever the community releases code, we have to make sure our users do not implicitly depend on components that are outside of the community's control.

PyPI packages

Proposed approach

No more changes are needed here.

Motivation

In the case of PyPI packages, PyPI's own mechanisms provide sufficient protection. In the past, new releases of packages that we depend on (explicitly or implicitly) caused problems with installing Apache Airflow. Unfortunately, we cannot pin Airflow's dependencies to specific versions. More information on why, and on how we solved it, can be found in airflow-dependencies. The problem has been mitigated by the recommended installation procedure and by the mechanism that updates and tests the "recommended" PyPI constraints stored in the requirements folder.
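pip applies such a constraints file at install time through its `--constraint` (`-c`) option. As a rough illustration of what the "recommended" constraints pin, here is a minimal Python sketch. It assumes a pip-style file with one `name==version` line per package; `parse_constraints` and `find_mismatches` are hypothetical helpers, not part of Airflow's tooling.

```python
import re
from typing import Dict, List


def normalize(name: str) -> str:
    """Normalize a project name as PyPI does (PEP 503): case-insensitive, '-', '_' and '.' equivalent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_constraints(text: str) -> Dict[str, str]:
    """Map each normalized project name to the exact version pinned with '=='."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]      # drop comments
        line = line.split(";", 1)[0]     # drop environment markers
        line = line.strip()
        if "==" not in line:
            continue  # skip blank lines and non-exact specifiers
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def find_mismatches(installed: Dict[str, str], pins: Dict[str, str]) -> List[str]:
    """Describe every installed package whose version differs from its pin."""
    problems = []
    for name, version in sorted(installed.items()):
        expected = pins.get(normalize(name))
        if expected is not None and expected != version:
            problems.append(f"{name}: installed {version}, constrained {expected}")
    return problems
```

Packages that are not pinned at all are simply ignored here; whether that should be an error is a policy choice the sketch does not make.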

Static check images and repositories

Proposed approach

Those repositories and images can likely stay as they are once we fix several unpinned dependencies.

Motivation

Static checks can be easily disabled or switched to another image/repository with only a minor inconvenience for people developing actively for Apache Airflow. Those images and repositories are configured in .pre-commit-config.yaml.

Pre-commit has a built-in mechanism to upgrade the requirements via the pre-commit autoupdate mechanism. All our pre-commit hooks are pinned to a specific revision or version of the hooks released via GitHub.
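`pre-commit autoupdate` rewrites each repository's `rev` to its latest tag, so the pinning only holds if every `rev` is a fixed tag or commit. A minimal sketch of a check for that, assuming the config has already been loaded (for example with PyYAML's `yaml.safe_load`) into pre-commit's `repos` / `repo` / `rev` / `hooks` structure; the set of "floating" revision names is an illustrative assumption:

```python
from typing import Any, Dict, List

# pre-commit's special repositories have no remote revision to pin
SPECIAL_REPOS = {"local", "meta"}
# Revision names treated as unpinned because they move over time (assumption)
FLOATING_REVS = {"master", "main", "HEAD", "latest", "stable"}


def unpinned_repos(config: Dict[str, Any]) -> List[str]:
    """Return repo URLs whose `rev` is missing or names a moving branch/tag."""
    result = []
    for entry in config.get("repos", []):
        repo = entry.get("repo", "")
        if repo in SPECIAL_REPOS:
            continue
        rev = entry.get("rev")
        if not rev or rev in FLOATING_REVS:
            result.append(repo)
    return result
```

For example, a config with one repo at `rev: v1.2.3`, one at `rev: master`, and one `local` repo would report only the `master` one.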

More information can be found in the STATIC_CODE_CHECKS.rst.

Current status

There are the following exceptions to the pre-commit autoupdate maintenance:

  • the hadolint image is pinned in ci_lint_dockerfile.sh; we might want to switch it to one of the existing pre-commit hadolint hooks
  • the bats image is pinned in ci_bat_tests.sh to bats/bats:latest
  • stylelint uses a fixed, released version of stylelint installed as a Node dependency, so it needs to be updated manually
  • shellcheck uses the koalaman/shellcheck:stable image
  • lint-openapi uses the wework/speccy image
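Several of these exceptions reference a moving tag (`latest`, `stable`) or no tag at all. A minimal sketch of how such references could be detected, assuming that a reference without a tag or tagged `latest`/`stable` counts as floating, while a digest or any other explicit tag counts as pinned (the function names are hypothetical):

```python
from typing import Optional, Tuple

# Tags that move over time (assumption: extend as needed)
FLOATING_TAGS = {"latest", "stable"}


def split_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split `name[:tag][@digest]` into (name, tag, digest)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    last_slash, last_colon = ref.rfind("/"), ref.rfind(":")
    # A colon before the last "/" belongs to a registry port (e.g. host:5000/img)
    if last_colon > last_slash:
        ref, tag = ref[:last_colon], ref[last_colon + 1:]
    return ref, tag, digest


def is_floating(ref: str) -> bool:
    """True if the reference may resolve to a different image over time."""
    _name, tag, digest = split_reference(ref)
    if digest:
        return False  # a content digest always resolves to the same image
    return tag is None or tag in FLOATING_TAGS
```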

Actions

  • Update to the latest versions of pre-commit hooks via `pre-commit autoupdate`
  • Convert hadolint to an existing, auto-updatable pre-commit hook rather than a custom shell script, if possible
  • Convert stylelint to an existing, auto-updatable pre-commit hook rather than custom dependencies, if possible
  • Convert shellcheck to an existing, auto-updatable pre-commit hook rather than a direct image, if possible
  • Convert lint-openapi to an existing, auto-updatable pre-commit hook rather than a direct image, if possible

Test dependencies

Proposed approach

We should move all images, binaries, and APT packages used during testing that are created and controlled by individuals or third-party companies to community-controlled images. The proposal is to create a "custom-images" folder where we will keep the Dockerfiles for those images, and to publish them in the apache/airflow Docker registry with tags corresponding to the image names.
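The proposed "custom-images" workflow could be scripted along these lines. This is only a sketch under stated assumptions: it assumes a hypothetical layout of `custom-images/<image-name>/Dockerfile` and a hypothetical tag scheme `apache/airflow:<image-name>-<version>`; the issue does not define either. It only produces `docker build` / `docker push` argument lists and executes nothing.

```python
from pathlib import Path
from typing import List

# Hypothetical target repository and folder, following the proposal above
REGISTRY_REPO = "apache/airflow"
CUSTOM_IMAGES_DIR = Path("custom-images")


def discover_images(root: Path = CUSTOM_IMAGES_DIR) -> List[str]:
    """Return the names of all sub-folders of `root` that contain a Dockerfile."""
    return sorted(path.parent.name for path in root.glob("*/Dockerfile"))


def image_tag(name: str, version: str) -> str:
    """Hypothetical tag scheme: apache/airflow:<image-name>-<version>."""
    return f"{REGISTRY_REPO}:{name}-{version}"


def build_commands(names: List[str], version: str,
                   root: Path = CUSTOM_IMAGES_DIR) -> List[List[str]]:
    """Return `docker build` and `docker push` argument lists for each image."""
    commands: List[List[str]] = []
    for name in names:
        tag = image_tag(name, version)
        context = (root / name).as_posix()  # build context is the image's folder
        commands.append(["docker", "build", "--tag", tag, context])
        commands.append(["docker", "push", tag])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`. Keeping command generation separate from execution makes the tag scheme easy to review and test without Docker.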

Motivation

We are currently using externally downloaded images, binaries, and APT-installed packages during the tests. Those images can easily be incorporated into the Apache Airflow code: we usually need only a few lines of Dockerfile to build and push each image. The images are very static, i.e. they can be used without changes for a long time, but it would be great to have a way for the community to update them occasionally.

Where the images, binaries, or APT sources are officially supported by the organization that owns them (Docker or gcloud, for example), we can rely on those images being maintained and download them from the official source. All APT packages should be installed from their original sources.
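The distinction drawn above (official, community-managed, or "private") can be expressed as a small classifier. A minimal sketch, assuming that a Docker Hub reference without a namespace is an official ("library") image, and using a hypothetical allowlist built from the images this issue describes as officially supported:

```python
# Hypothetical allowlist of organization-owned namespaces/registries,
# taken from the images this issue lists as officially supported.
OFFICIAL_PREFIXES = (
    "amazon/",
    "hashicorp/",
    "kindest/",
    "prestosql/",
    "mcr.microsoft.com/",
    "gcr.io/google.com/cloudsdktool/",
)
COMMUNITY_REPOSITORY = "apache/airflow"


def repository(ref: str) -> str:
    """Drop any @digest and :tag, keeping registry/namespace/name."""
    ref = ref.split("@", 1)[0]
    slash, colon = ref.rfind("/"), ref.rfind(":")
    # A colon before the last "/" belongs to a registry port, not a tag
    return ref[:colon] if colon > slash else ref


def classify(ref: str) -> str:
    """Return "official", "community", or "private" for an image reference."""
    repo = repository(ref)
    if repo == COMMUNITY_REPOSITORY:
        return "community"
    if "/" not in repo or repo.startswith(OFFICIAL_PREFIXES):
        return "official"  # no namespace means a Docker Hub official image
    return "private"
```

Anything classified as "private" would be a candidate for migration to apache/airflow.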

Current status

We use the following images during the tests that are not "official" images maintained and released by the organizations they belong to, but "private" user images:

  • aneeshkj/helm-unittest
  • ashb/apache-rat:0.13-1
  • godatadriven/krb5-kdc-server
  • polinux/stress
  • osixia/openldap:1.2.0 (?)

We use the following officially supported images. For those images, we should make sure that we are using the latest stable versions.

  • mysql:${VERSION}
  • postgresql:${VERSION}
  • cassandra:3.0
  • mongo:3
  • prestosql/presto:330
  • rabbitmq:3.7
  • redis:5.0.1
  • kindest/node:${VERSION}

We use the following officially supported images for CLI tools in the CI image. They always point to the "latest" version, which is fine because these tools should always use the latest stable version.

  • amazon/aws-cli:latest
  • mcr.microsoft.com/azure-cli:latest
  • gcr.io/google.com/cloudsdktool/cloud-sdk:latest
  • hashicorp/terraform:latest

We use the official openjdk Java image pointing to JRE 8 (which is a requirement for Dataflow):

  • openjdk:8-jre-slim

We also have the following binaries downloaded during CI image build or for Kubernetes testing:

We use the following external APT repositories during CI image build:

Actions

Migrate those images to "apache/airflow":

  • aneeshkj/helm-unittest
  • ashb/apache-rat:0.13-1
  • godatadriven/krb5-kdc-server
  • polinux/stress
  • osixia/openldap:1.2.0 (?)

Review and possibly update to the latest versions:

  • cassandra
  • mongo
  • prestosql/presto
  • rabbitmq
  • redis
  • kindest/node
  • docker
  • kind
  • helm
  • kubectl
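Reviewing these versions means comparing each current tag with the newest stable tag published upstream. The upstream lookup is not shown; the minimal sketch below only compares two tags, assuming that the leading (optionally `v`-prefixed) dotted number of a tag such as `5.0.1`, `6-buster`, or `v1.18.2` is its version and that any suffix can be ignored:

```python
import re
from typing import Tuple


def version_key(tag: str) -> Tuple[int, ...]:
    """Turn the leading numeric part of a tag into a comparable tuple."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"no numeric version at the start of tag {tag!r}")
    parts = [int(part) for part in match.group(1).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # treat "3.7" and "3.7.0" as the same version
    return tuple(parts)


def is_behind(current_tag: str, candidate_tag: str) -> bool:
    """True if the candidate tag names a newer numeric version than the current one."""
    return version_key(current_tag) < version_key(candidate_tag)
```

Suffixes such as `-buster` or `-jre-slim` often select a different base image, so a real review would also need to check that the suffix still matches.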

Helm Chart and Production image dependencies

Proposed approach

Move all the dependencies in the Helm Chart and Production image to use either officially supported images or Airflow Community-managed ones.

Motivation

Production Images and Helm Charts will soon be released officially and our users will use them to build their own images of Apache Airflow and install it via Helm Charts. We should make sure that all the dependencies are either officially released and supported by the organizations owning the code or that we manage them via the community.

Current status

We use the following external APT repositories during CI image build:

We use the following binaries in the Helm Chart:

  • postgresql:6.3.12 (stable repo) Helm installation of Postgres.

We use the following images in the Helm Chart:

  • astronomerinc/ap-statsd-exporter:0.11.0
  • astronomerinc/ap-pgbouncer:1.8.1
  • astronomerinc/ap-pgbouncer-exporter:0.5.0-1
  • redis:6-buster
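To track which chart images still need to move, the chart's values can be scanned for image entries. A minimal sketch, assuming the values (for example loaded with `yaml.safe_load`) store images as nested mappings with `repository` and `tag` keys, and assuming that any namespaced image outside apache/airflow should be migrated; the example values in the test are illustrative, loosely based on the images listed above, not copied from the chart:

```python
from typing import Any, Iterator, Optional, Tuple


def iter_images(values: Any, path: str = "") -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (dotted path, repository, tag) for every mapping with a "repository" key."""
    if isinstance(values, dict):
        if "repository" in values:
            tag = values.get("tag")
            yield path, str(values["repository"]), None if tag is None else str(tag)
        for key, child in values.items():
            yield from iter_images(child, f"{path}.{key}" if path else str(key))
    elif isinstance(values, list):
        for index, child in enumerate(values):
            yield from iter_images(child, f"{path}[{index}]")


def needs_migration(repository: str) -> bool:
    """Flag namespaced images not published under apache/airflow (assumed policy)."""
    return "/" in repository and not repository.startswith("apache/airflow")
```

Un-namespaced images such as `redis` are Docker Hub official images and are left alone by this policy.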

Actions

Move the following images to Airflow managed "apache/airflow" registry:

  • astronomerinc/ap-statsd-exporter:0.11.0
  • astronomerinc/ap-pgbouncer:1.8.1
  • astronomerinc/ap-pgbouncer-exporter:0.5.0-1

Metadata

Assignees

No one assigned

Labels

  • area:CI (Airflow's tests and continuous integration)
  • area:helm-chart (Airflow Helm Chart)
  • area:production-image (Production image improvements and fixes)
  • kind:feature (Feature Requests)
