Airflow + Kubernetes VS Airflow + Spark


Hi All,

Like some article that I previously read. It said that in new Kubernetes version, already include Spark capabilities. But with some different ways such as using KubernetesPodOperator instead of using BashOperator / PythonOperator to do SparkSubmit.

Is that the best practice to Combine Airflow + Kubernetes is to remove Spark and using KubernetesPodOperator to execute the task?

Which is have a better performance since Kubernetes have AutoScaling that Spark doesn’t have.

Need someone expert in Kubernetes to help me explain this. I’m still newbie with this Kubernetes, Spark, and Airflow things. :slight_smile:

Thank You.