I am working on containerizing a bunch of applications that have the following structure at a high level:
1. Read from a DB/file system.
2. Extract the data or do some parsing (business logic).
3. Write the crunched data back to a datastore.
Let's name these three steps s1, s2, and s3.
What is the best way to strike a balance between code reuse and making the solution overly complex? In other words, what is the industry-accepted design pattern/practice for implementing this kind of solution?
A few approaches that I could pen down are as follows:
Option 1: A separate pod for each application, with one container each, having s1, s2, and s3 as part of the same code.
Benefits: Simple and compact codebase; no inter-process/pod communication.
Limitation: No code reuse.
Option 2: A separate pod for each application, with each pod having three containers doing the separate functionalities s1, s2, and s3.
Benefits: Code reuse.
Limitation: Inter-process communication may increase processing latency.
Option 3: Separate groups of pods for s1, s2, and s3 (say sg1, sg2, and sg3 respectively), each running independently. From an application perspective, we create a new pod that talks to the three pod groups to get the work done.
Benefits: Code reuse.
Limitation: Inter-process communication may increase processing latency. Also, maintaining the pod groups is an add-on overhead and increases complexity.
Please suggest any other alternative if one is more suitable.
Code reuse (in particular, of your s1 and s3 parts) is more related to your source layout than to the runtime characteristics of the job. Your option #2 is likely a poor choice.
You should probably have one container image for each s2 (or for each distinct s1+s2+s3 combination), with the three steps all bundled together.
Then, you can submit Jobs or create a CronJob to run the Pod when you need to, instead of leaving idle Pods running in your cluster.
Benefits: Code reuse; no inter-process/pod communication (except to storage).
Limitation: Increased container image storage size, as you are storing multiple copies of the compiled s1 and s3 code.
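For concreteness, here is a minimal sketch of what such a bundled entrypoint might look like in Python. All of the names (shared_etl, s2_logic, etc.) are hypothetical; the point is that s1 and s3 come from a shared library that each image pulls in at build time, while s2 stays application-specific:

    # main.py - the entrypoint baked into each application's image
    from shared_etl.readers import read_records   # s1: hypothetical shared module
    from shared_etl.writers import write_records  # s3: hypothetical shared module
    from app.s2_logic import transform            # s2: this application's logic

    def main():
        records = read_records()       # s1: read from DB/file system
        crunched = transform(records)  # s2: business logic
        write_records(crunched)        # s3: write back to the datastore

    if __name__ == "__main__":
        main()

A Job or CronJob then just runs this image's default command on whatever schedule you need.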
If I understand correctly, you are suggesting option #1 as the proposed solution. In that scenario, how will we be able to reuse the code for s1 and s3? Won't we need to copy-paste the code for s1 and s3 into every application image?
Please let me know if my interpretation is correct.
Sorry, maybe I'm missing something, but I don't see why code can't be reused if the language is interpreted.
I tend to agree with your previous answer that it depends on code layout. In both compiled and interpreted languages you can share code (modules, libraries/packages, etc.). This doesn't even need to be in different repositories (although it can be): you can create different executables (for example, in Python) in the repo root that execute s1, s2, or s3, plus modules within the repo to share the code you want; see the sketch below.
That is only one option; as I said, there are several others.
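To make that concrete, one possible single-repo layout (all names are made up for illustration) could be:

    repo/
        etl_common/      # shared code: s1 readers and s3 writers live here
            readers.py
            writers.py
        app_a/
            run.py       # executable: wires s1 -> app A's s2 -> s3
            s2.py        # app A's business logic
        app_b/
            run.py
            s2.py

Each app's run.py imports etl_common for s1/s3 and its own s2 module, so the shared code is written once even though it is copied into each image at build time.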
The container needs to have everything needed to run your application. If you can get away with having all the different variations of your s2 segment (business logic) in one codebase and switching between them with CLI flags or ConfigMaps - do it!
This would mean you only need a single container image, and single-container Pods, to run all of your Jobs.
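As a rough sketch of that idea (the variant names and the stubbed s1/s3 steps are invented for illustration), the single image's entrypoint could pick the business logic from a flag, supplied via the Job's container args or an environment variable populated from a ConfigMap:

    # entrypoint.py - one image, with the s2 variant selected at run time
    import argparse

    def s2_invoices(records):
        return [r.upper() for r in records]  # one hypothetical variant

    def s2_orders(records):
        return [r.lower() for r in records]  # another hypothetical variant

    VARIANTS = {"invoices": s2_invoices, "orders": s2_orders}

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--logic", choices=VARIANTS, required=True)
        args = parser.parse_args()
        records = ["Record-1", "Record-2"]    # s1 stubbed out for brevity
        print(VARIANTS[args.logic](records))  # s2 chosen by flag; s3 stubbed

    if __name__ == "__main__":
        main()

The Job specs then differ only in the container args (e.g. --logic=invoices) or in the ConfigMap they reference.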
Hmm, not sure I follow. Of course you can have it all in a single container; you can do that.
But why does an interpreted language mean code can't be reused and needs to be copy-pasted? You can use packages (like gems in Ruby, etc.), and you can even import code from different paths (the monorepo approach is an extreme of this), etc.
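For instance, with the shared s1/s3 code published as an installable package (the package and function names below are hypothetical), each application just declares a dependency instead of copy-pasting anything:

    # Assumes a hypothetical internal package, installed in the image with
    # something like: pip install company-etl-common
    from company_etl_common import read_source, write_sink  # shared s1/s3
    from my_app.logic import transform                      # this app's s2

    write_sink(transform(read_source()))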