Like some article that I previously read. It said that in new Kubernetes version, already include Spark capabilities. But with some different ways such as using KubernetesPodOperator instead of using BashOperator / PythonOperator to do SparkSubmit.
Is that the best practice to Combine Airflow + Kubernetes is to remove Spark and using KubernetesPodOperator to execute the task?
Which is have a better performance since Kubernetes have AutoScaling that Spark doesn’t have.
Need someone expert in Kubernetes to help me explain this. I’m still newbie with this Kubernetes, Spark, and Airflow things.