Cloud being used: Google Cloud

I’m developing a Python project that will be hosted on Kubernetes on Google Cloud. The idea is to read a file of millions of rows, where each row is the input key for a query against an API.

I want to run my application on several Kubernetes pods for scalability, that is, multiple queries running at the same time. However, with this code structure (reading lines from a txt file), each pod would end up iterating over the file from the beginning, re-reading lines that have already been queried, which is not what I want. Two ideas came up:

  1. Split the file and distribute the splits among the pods (example: for a 100-line file with 10 pods, each pod would read 10 lines).
  2. Before running the application, load the lines of the file into a consumption queue, so that all pods read from the queue rather than from the file directly.

Option 2 seems more scalable and faster to me, but I would like suggestions on the best way to run queries that use a file as their source. I may want to run, for example, 1 million queries in 24 hours.

Hi @Guilherme_Duarte

I prefer the second option: since you are running on GCP, you can have one pod that reads the file and publishes each line to a Pub/Sub topic, and other pods that consume lines from that Pub/Sub subscription.
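
As a rough sketch of the publisher side, assuming a hypothetical project id `my-project`, topic `query-keys`, and input file `keys.txt` (one key per line) — not names from your setup — the loader pod could look something like this with the `google-cloud-pubsub` client:

```python
from concurrent import futures
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"   # assumed project id
TOPIC_ID = "query-keys"     # assumed topic name
INPUT_FILE = "keys.txt"     # assumed input file, one key per line

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

publish_futures = []
with open(INPUT_FILE) as infile:
    for line in infile:
        key = line.strip()
        if not key:
            continue
        # Each line becomes one Pub/Sub message; the client batches sends internally.
        publish_futures.append(publisher.publish(topic_path, key.encode("utf-8")))

# Block until every message has been accepted by Pub/Sub.
futures.wait(publish_futures, return_when=futures.ALL_COMPLETED)
print(f"Published {len(publish_futures)} keys to {topic_path}")
```

For millions of lines you would probably want to wait on the futures in chunks instead of keeping them all in memory, but the overall flow is the same.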

You can also run the consumer pods as Kubernetes Jobs, which is a typical use case for them: Jobs | Kubernetes. A sketch of what such a worker could do is below.
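
A minimal sketch of a worker that fits the Job model, assuming a hypothetical subscription `keys-sub` attached to the topic above and a placeholder `query_api` function standing in for your real API call: it pulls batches synchronously, acknowledges them after processing, and exits once the backlog appears drained, so the Job completes on its own.

```python
from google.api_core import exceptions
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"     # assumed project id
SUBSCRIPTION_ID = "keys-sub"  # assumed subscription name

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)


def query_api(key: str) -> None:
    """Placeholder for the real API call made with each key."""
    ...


with subscriber:
    while True:
        try:
            # Pull up to 100 messages at a time.
            response = subscriber.pull(
                request={"subscription": subscription_path, "max_messages": 100},
                timeout=30,
            )
        except exceptions.DeadlineExceeded:
            break  # nothing arrived within the timeout: assume the queue is drained

        if not response.received_messages:
            break

        ack_ids = []
        for received in response.received_messages:
            query_api(received.message.data.decode("utf-8"))
            ack_ids.append(received.ack_id)

        # Acknowledge only after the keys were processed successfully,
        # so unprocessed keys are redelivered if the pod dies mid-batch.
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
```

With the Job's `parallelism` set to the number of workers you want, each pod runs this loop independently and Pub/Sub takes care of distributing the keys among them.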
