Apache Airflow Provider(s)
apache-spark
Versions of Apache Airflow Providers
pip freeze for providers:
apache-airflow==2.10.5
apache-airflow-providers-amazon==9.2.0
apache-airflow-providers-apache-spark==5.2.1
apache-airflow-providers-celery==3.10.0
apache-airflow-providers-cncf-kubernetes==10.1.0
apache-airflow-providers-common-compat==1.6.1
apache-airflow-providers-common-io==1.5.0
apache-airflow-providers-common-sql==1.21.0
apache-airflow-providers-docker==4.0.0
apache-airflow-providers-elasticsearch==6.0.0
apache-airflow-providers-fab==1.5.2
apache-airflow-providers-ftp==3.12.0
apache-airflow-providers-google==12.0.0
apache-airflow-providers-grpc==3.7.0
apache-airflow-providers-hashicorp==4.0.0
apache-airflow-providers-http==5.0.0
apache-airflow-providers-imap==3.8.0
apache-airflow-providers-microsoft-azure==12.0.0
apache-airflow-providers-mysql==6.0.0
apache-airflow-providers-odbc==4.9.0
apache-airflow-providers-openlineage==2.0.0
apache-airflow-providers-postgres==6.0.0
apache-airflow-providers-redis==4.0.0
apache-airflow-providers-sendgrid==4.0.0
apache-airflow-providers-sftp==5.0.0
apache-airflow-providers-slack==9.0.0
apache-airflow-providers-smtp==1.9.0
apache-airflow-providers-snowflake==6.0.0
apache-airflow-providers-sqlite==4.0.0
apache-airflow-providers-ssh==4.0.0
apache-airflow-providers-trino==4.3.1
Apache Airflow version
2.10.5
Operating System
Linux - OpenShift (Kubernetes)
Deployment
Official Apache Airflow Helm Chart
Deployment details
Kubernetes on OpenShift, deployed with Argo CD. We use the Celery executor to run DAGs.
What happened
Hi there,
We are running Airflow on Kubernetes and using apache-airflow-providers-apache-spark to submit Spark jobs to the cluster, which works fine. We are now working on introducing Istio for our artifacts.
With Spark config defaults like:
spark.kubernetes.driver.label.sidecar.istio.io/inject: "true"
spark.kubernetes.executor.label.sidecar.istio.io/inject: "true"
we are able to spin up pods with the Istio labels, and the sidecar is injected.
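For illustration, the two label properties above could be built in Python and handed to the operator. This is a minimal sketch, not our actual setup: the helper name `istio_spark_conf` is hypothetical, and only the two property keys come from the config quoted above.

```python
# Hypothetical helper: builds the Spark properties that label the driver and
# executor pods for Istio sidecar injection (keys taken from the config above).
ISTIO_INJECT_LABEL = "sidecar.istio.io/inject"


def istio_spark_conf(inject: bool = True) -> dict[str, str]:
    """Return the driver/executor label properties as a Spark conf dict."""
    value = "true" if inject else "false"
    return {
        f"spark.kubernetes.{role}.label.{ISTIO_INJECT_LABEL}": value
        for role in ("driver", "executor")
    }


if __name__ == "__main__":
    # Print the properties in key=value form for inspection.
    for key, value in istio_spark_conf().items():
        print(f"{key}={value}")
```

The resulting dict could be passed as the `conf` argument of `SparkSubmitOperator` instead of relying on cluster-wide defaults.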
When a Spark job is submitted from Airflow to the cluster, the driver and executor pods come up and the Istio sidecar is injected into each pod. The Spark job runs successfully in its container inside the pod, but the Istio sidecar, which runs in its own container, is not terminated. As a result, the task in the DAG never reaches the success state. The only workaround at the moment is to add a curl command to the image running the application called from Airflow, which sends a kill signal to Istio.
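As a rough Python equivalent of that curl workaround, the application could ask the sidecar to exit once its own work is done. This is a sketch under assumptions: it targets the Istio agent's `/quitquitquit` endpoint on its default port 15020, which may differ from the exact call in our image, and the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: the Istio agent listens on its default port inside the pod.
ISTIO_AGENT_PORT = 15020


def build_quit_request(port: int = ISTIO_AGENT_PORT) -> urllib.request.Request:
    """Build the POST request that asks the sidecar to shut down."""
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/quitquitquit", data=b"", method="POST"
    )


def stop_istio_sidecar(port: int = ISTIO_AGENT_PORT, timeout: float = 2.0) -> bool:
    """Send the shutdown request; return False if no sidecar answered."""
    try:
        with urllib.request.urlopen(build_quit_request(port), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # No sidecar in this pod, or it has already exited.
        return False
```

Calling `stop_istio_sidecar()` in a `finally` block at the end of the Spark application's entry point would cover the driver pod; whether executor pods need the same treatment depends on the setup.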
Maybe a custom "on kill" hook could be added to the spark-submit command so that Airflow can kill Istio or other sidecars.
What you think should happen instead
The Spark submit task reaches the success state and the pod shuts down completely.
How to reproduce
Run a DAG with the SparkSubmitOperator in an environment that uses Istio, where all pods are labeled for Istio so that the Istio sidecar is injected into the driver and executor pods.
Anything else
The problem occurs every time a Spark job is submitted while Istio is active.
Are you willing to submit PR?
Code of Conduct