I am configuring a Kubernetes cluster to support a microservice with the following workflow:
Current Setup:
- Ingress: Manages external traffic.
- FastAPI Service: Routes traffic to the pods of a FastAPI Deployment.
- Redis Pod: Backs the task queue (Celery broker).
- Celery Worker Pods: Consume tasks from the Redis queue and scale with queue length.
To manage scaling, I am using KEDA to autoscale based on the length of the Redis queue.
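For reference, the current scaling is driven by a KEDA ScaledObject roughly like the following (deployment name, Redis address, and thresholds are illustrative; `celery` is the list Celery uses by default as its Redis queue):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: celery-worker-scaler
spec:
  scaleTargetRef:
    name: celery-worker          # the Celery worker Deployment
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis.default.svc.cluster.local:6379
        listName: celery         # Celery's default queue key in Redis
        listLength: "10"         # target pending tasks per worker replica
```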
This architecture works, but I am exploring whether a Kubernetes Job-based workflow would be more resilient under high traffic and better suited to batch processing. For example, replacing the long-running Celery workers with Kubernetes Jobs that each process one task (or one batch of tasks) and then exit.
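The Job-based alternative I have in mind would be something like KEDA's ScaledJob, which creates a Kubernetes Job per pending work item instead of scaling a long-running Deployment (image name, queue details, and limits below are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: task-processor
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    backoffLimit: 3              # Kubernetes-level retries on Job failure
    template:
      spec:
        containers:
          - name: worker
            image: myorg/task-worker:latest   # processes one task, then exits
        restartPolicy: Never
  pollingInterval: 10            # seconds between queue-length checks
  maxReplicaCount: 50            # cap on concurrent Jobs
  triggers:
    - type: redis
      metadata:
        address: redis.default.svc.cluster.local:6379
        listName: tasks
        listLength: "1"          # one Job per queued task
```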
Specific Questions:
- Performance: How does the performance of Celery workers compare to Kubernetes Jobs in handling high-throughput scenarios?
- Scalability: Are Kubernetes Jobs more effective at scaling during traffic spikes compared to a KEDA+Celery worker setup?
- Failure Recovery and Observability: Which approach provides better fault tolerance and task-retry mechanisms in the event of pod or node failures? And which is easier to monitor when failures occur?
- Resource Efficiency: Does using Kubernetes Jobs result in better resource utilization, considering overhead and runtime behavior?
I want to understand the trade-offs between my current workflow and a Job-based architecture in terms of scalability, resilience, and performance.
What are the key factors to consider when deciding between these two architectures? Are there benchmarks or real-world examples that highlight the differences?