
Can't add an additional kill command to airflow.providers.apache.spark.hooks.spark_submit #50958

@patrickmoch

Description


Apache Airflow Provider(s)

apache-spark

Versions of Apache Airflow Providers

pip freeze for providers:

apache-airflow==2.10.5
apache-airflow-providers-amazon==9.2.0
apache-airflow-providers-apache-spark==5.2.1
apache-airflow-providers-celery==3.10.0
apache-airflow-providers-cncf-kubernetes==10.1.0
apache-airflow-providers-common-compat==1.6.1
apache-airflow-providers-common-io==1.5.0
apache-airflow-providers-common-sql==1.21.0
apache-airflow-providers-docker==4.0.0
apache-airflow-providers-elasticsearch==6.0.0
apache-airflow-providers-fab==1.5.2
apache-airflow-providers-ftp==3.12.0
apache-airflow-providers-google==12.0.0
apache-airflow-providers-grpc==3.7.0
apache-airflow-providers-hashicorp==4.0.0
apache-airflow-providers-http==5.0.0
apache-airflow-providers-imap==3.8.0
apache-airflow-providers-microsoft-azure==12.0.0
apache-airflow-providers-mysql==6.0.0
apache-airflow-providers-odbc==4.9.0
apache-airflow-providers-openlineage==2.0.0
apache-airflow-providers-postgres==6.0.0
apache-airflow-providers-redis==4.0.0
apache-airflow-providers-sendgrid==4.0.0
apache-airflow-providers-sftp==5.0.0
apache-airflow-providers-slack==9.0.0
apache-airflow-providers-smtp==1.9.0
apache-airflow-providers-snowflake==6.0.0
apache-airflow-providers-sqlite==4.0.0
apache-airflow-providers-ssh==4.0.0
apache-airflow-providers-trino==4.3.1

Apache Airflow version

2.10.5

Operating System

Linux (OpenShift / Kubernetes)

Deployment

Official Apache Airflow Helm Chart

Deployment details

Kubernetes (OpenShift) with Argo CD. We use the Celery executor to run DAGs.

What happened

Hi there,

We are running Airflow on Kubernetes and using apache-airflow-providers-apache-spark to submit Spark jobs to the cluster, and this works fine. Now we are working on introducing Istio to our artifacts.

With Spark config defaults such as
spark.kubernetes.driver.label.sidecar.istio.io/inject: "true"
spark.kubernetes.executor.label.sidecar.istio.io/inject: "true"
we can spin up pods with Istio labels, and the Istio sidecar is injected.
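As a sketch of how these defaults reach Spark: `SparkSubmitOperator` accepts a `conf` dictionary, and each entry is passed to spark-submit as a `--conf key=value` pair. The helper below only builds that argument list in plain Python so the labels can be checked without a cluster; `build_conf_args` and `ISTIO_SPARK_CONF` are hypothetical names, not part of the provider.

```python
# Istio injection labels from the issue, expressed as Spark conf entries.
ISTIO_SPARK_CONF = {
    "spark.kubernetes.driver.label.sidecar.istio.io/inject": "true",
    "spark.kubernetes.executor.label.sidecar.istio.io/inject": "true",
}


def build_conf_args(conf: dict[str, str]) -> list[str]:
    """Flatten a conf dict into spark-submit style ``--conf key=value`` args.

    Hypothetical helper for illustration; not part of the Airflow provider.
    """
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(build_conf_args(ISTIO_SPARK_CONF)))
```

In a DAG, the same dictionary could be passed as `conf=ISTIO_SPARK_CONF` to `SparkSubmitOperator`.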

When we submit a Spark job from Airflow, the driver and executor pods come up and the Istio sidecar is injected into each pod. The Spark job completes successfully in its own container, but the Istio sidecar in its separate container is never terminated. As a result, the DAG task never reaches the success state. The only workaround at the moment is to add a curl command to the application image (the one called from Airflow) that sends a kill signal to Istio.
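A minimal sketch of that workaround in Python instead of curl, assuming the application's entry point can be wrapped: run the job body, then always ask the Istio sidecar to exit. It assumes the sidecar's pilot-agent listens on `localhost:15020` and accepts `POST /quitquitquit`, as recent Istio releases do; check this against the Istio version in use. The names `stop_istio_sidecar` and `run_then_stop_sidecar` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: Istio's pilot-agent accepts POST /quitquitquit on port 15020.
ISTIO_QUIT_URL = "http://localhost:15020/quitquitquit"


def build_quit_request(url: str = ISTIO_QUIT_URL) -> urllib.request.Request:
    """Build the POST request that asks the sidecar to shut down."""
    return urllib.request.Request(url, method="POST")


def stop_istio_sidecar(url: str = ISTIO_QUIT_URL, timeout: float = 5.0) -> int:
    """Send the shutdown request and return the HTTP status code."""
    with urllib.request.urlopen(build_quit_request(url), timeout=timeout) as resp:
        return resp.status


def run_then_stop_sidecar(
    job: Callable[[], T], stop: Callable[[], object] = stop_istio_sidecar
) -> T:
    """Run the Spark application body, then always try to stop the sidecar."""
    try:
        return job()
    finally:
        try:
            stop()
        except OSError:
            # URLError subclasses OSError; a missing or unreachable sidecar
            # should not mask the job's own result or exception.
            pass
```

The `stop` parameter is injectable so the wrapper can be exercised without a running sidecar.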

Perhaps a custom "on kill" step could be added to the spark-submit command, so that Istio or other sidecars can be killed from Airflow.
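One possible shape for such a hook, sketched without Airflow so it runs on its own: keep a handle to the submit process and, in `on_kill`, terminate it and then run any user-supplied extra commands (for example, the sidecar shutdown call). The class `SubmitWithExtraKill` and its `extra_kill_cmds` parameter are hypothetical and do not exist in `apache-airflow-providers-apache-spark`; an actual change would presumably extend `SparkSubmitHook.on_kill`.

```python
import subprocess
from typing import Optional, Sequence


class SubmitWithExtraKill:
    """Hypothetical sketch: run a submit command and, on kill, run extra commands.

    Not part of the Airflow Spark provider; names are illustrative only.
    """

    def __init__(
        self,
        submit_cmd: Sequence[str],
        extra_kill_cmds: Sequence[Sequence[str]] = (),
        kill_timeout: float = 30.0,
    ) -> None:
        self.submit_cmd = list(submit_cmd)
        self.extra_kill_cmds = [list(cmd) for cmd in extra_kill_cmds]
        self.kill_timeout = kill_timeout
        self._proc: Optional[subprocess.Popen] = None

    def submit(self) -> subprocess.Popen:
        """Start the submit command without waiting for it to finish."""
        self._proc = subprocess.Popen(self.submit_cmd)
        return self._proc

    def on_kill(self) -> list[int]:
        """Stop the submit process, then run each extra kill command in order."""
        if self._proc is not None and self._proc.poll() is None:
            self._proc.kill()
            self._proc.wait()
        return_codes = []
        for cmd in self.extra_kill_cmds:
            # check=False: a failing cleanup command should not stop the others.
            result = subprocess.run(cmd, check=False, timeout=self.kill_timeout)
            return_codes.append(result.returncode)
        return return_codes
```

Running the extra commands only after the submit process is gone keeps the ordering predictable; whether that ordering suits sidecar shutdown in a real deployment is an open design question.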

What you think should happen instead

The spark-submit task should reach the success state, and the pods should shut down completely.
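The hang can be illustrated with the Kubernetes pod lifecycle: a pod with regular (non-native) sidecar containers only leaves the `Running` phase once every container has terminated, so a driver pod whose Spark container exited but whose `istio-proxy` container is still running stays `Running`. In cluster mode, spark-submit by default (`spark.kubernetes.submission.waitAppCompletion=true`) waits on the driver pod, which is presumably why the task never finishes. The sketch below checks simplified container statuses shaped like a pod's `status.containerStatuses`; the function names are hypothetical, and `istio-proxy` and `spark-kubernetes-driver` are the usual default container names, assumed here.

```python
from typing import Any


def blocking_containers(container_statuses: list[dict[str, Any]]) -> list[str]:
    """Return names of containers that have not terminated yet.

    Each entry mimics a simplified Kubernetes containerStatus, e.g.
    {"name": "spark-kubernetes-driver", "state": {"terminated": {"exitCode": 0}}}.
    """
    return [
        status["name"]
        for status in container_statuses
        if "terminated" not in status.get("state", {})
    ]


def pod_can_complete(container_statuses: list[dict[str, Any]]) -> bool:
    """A pod can only finish once every one of its containers has terminated."""
    return not blocking_containers(container_statuses)
```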

How to reproduce

Run a DAG with SparkSubmitOperator in an environment that uses Istio, where all pods are labeled for Istio injection so that the Istio sidecar is injected into the driver and executor pods.

Anything else

The problem occurs every time a Spark job is submitted while Istio is active.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata


Assignees

No one assigned

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
