Hi. I’m new to Kubernetes (but solid with Docker). I have to set up a cluster and have done a CKAD crash course, but it did not answer my questions. So here they are; I hope you can guide me, or at least point me to the relevant documentation.
I have a variable number of producers, and that number changes every minute (I might have 150 now and 120 in five minutes). Each one produces text, which must be processed by a pipeline (I already have the containers running fine with docker-compose), and the result is sent to a database. This is essentially the structure of a pipeline:
[producer122] - [transform122.1] - [transform122.2] - [transform122.3] - DB
Communication between stages goes through Kafka; no problem there. I have tested my pipeline with docker-compose, and it runs fine. The pipelines are linear: each one is always a fixed sequence, so there is no possibility of [transform122.1] receiving something from [producer34]. The structure of each pipeline does not change.
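To make the wiring concrete, here is a minimal Python sketch of how the inputs and outputs of one pipeline chain together. The topic names, the `INPUT_TOPIC`/`OUTPUT_TOPIC`/`SINK` variable names, and the `pipeline_wiring` helper are placeholders invented for illustration, not my actual configuration.

```python
from typing import Dict, List


def pipeline_wiring(producer_id: int, n_transforms: int = 3) -> List[Dict[str, str]]:
    """Return the per-stage wiring for one linear pipeline.

    All topic and variable names here are hypothetical placeholders.
    """
    stages: List[Dict[str, str]] = []
    # The first transform reads what the producer writes.
    upstream = f"producer{producer_id}.out"
    for step in range(1, n_transforms + 1):
        name = f"transform{producer_id}.{step}"
        stage = {"name": name, "INPUT_TOPIC": upstream}
        if step < n_transforms:
            # Intermediate stages publish to their own topic.
            stage["OUTPUT_TOPIC"] = f"{name}.out"
            # The next stage consumes this output and nothing else.
            upstream = stage["OUTPUT_TOPIC"]
        else:
            # The final stage writes its result to the database.
            stage["SINK"] = "db"
        stages.append(stage)
    return stages


if __name__ == "__main__":
    for stage in pipeline_wiring(122):
        print(stage)
```

So every stage needs values that depend only on the producer ID and its position in the sequence, which is why writing them by hand for each pod feels wrong.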
My questions are:
- How should I deploy this? Since each process always consumes a unique, predefined upstream output, I don’t know how to assign the environment variables for that, other than by writing a YAML definition for each pod (so I would have hundreds of YAML files, which seems weird).
- Should I put a full pipeline (multiple containers) inside each pod? This might simplify the piping and give me control over the pipelines, given that the number of producers varies every minute, but all the docs recommend just one container per pod.
- Some containers need config files. How do I inject the right configuration files into the right volumes, again without writing one YAML definition per pipeline/container?
Thanks in advance.