In short: our application consists of 20 modules, each of which contains its own deployments, services, etc. We use Helm to start the modules. Starting the modules in sequence takes a very long time (>25 minutes), even though the underlying containers are lightweight. Starting the modules in parallel leads to timeouts and an unresponsive Kubernetes cluster. Either we are doing something terribly wrong, or Kubernetes might not be the right tool for the job.
Any insight to speed up the deployment process is greatly appreciated. Tips to find the bottleneck are also more than welcome. Thank you!
That is pretty much our question; we are not really able to pinpoint the bottleneck.
Starting and running a simple Job regularly takes more than 10 seconds through Kubernetes. Running the same container manually through Docker takes far less, say 3 seconds. The overhead of Kubernetes seems remarkably high, but we have no clue where to look for improvements.
The logs (as far as we could find them) don't seem to point in any specific direction. The only thing that stood out to us was high I/O when starting and running the Job, even though we aren't doing anything I/O-intensive in the underlying container.
There are a slew of reasons that things can take a long time to start. @rata hinted at one: if you're pulling your images every time, remember each layer must be untarred and expanded. By default Kubernetes will also serialize the image pulls instead of doing them in parallel; this can be controlled through the --serialize-image-pulls kubelet flag. If you flip that to false you'll probably also want to adjust --image-pull-progress-deadline and the Docker daemon's max-concurrent-downloads.
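To make that concrete, the combination might look roughly like the sketch below. This is illustrative only; the flag names are the ones mentioned above, but the values and the way you pass flags to the kubelet depend entirely on how your cluster is run.

```shell
# Sketch only; how kubelet flags are set depends on your cluster setup.
kubelet \
  --serialize-image-pulls=false \
  --image-pull-progress-deadline=2m

# Matching setting in /etc/docker/daemon.json so Docker itself
# downloads layers concurrently (value illustrative):
#   { "max-concurrent-downloads": 10 }
```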
As far as general scheduling goes, the kubelet checks back in with the API server every 20 seconds (by default) to see if it has anything it should schedule locally. When you're talking to Docker, you are talking directly to the thing that is going to start the container. When talking to Kubernetes (or Mesos, or Swarm, or any scheduler) there is going to be SOME overhead for coming to a placement decision and waiting for nodes to check in. It should not be expected to be instantaneous.
The Docker images are already available on the host machine. The message "Container image X already present on machine" is present in the event log of all Pods.
We understand that Kubernetes will have some overhead compared to using vanilla Docker. However, the overhead is so substantial that we suspect there is more going on. We are probably doing something wrong, or our Kubernetes installation is misconfigured or malfunctioning, but we are having a hard time finding the exact cause.
Is the delay you describe related to the Pod states "PodScheduled", "Initialized" and "Ready"? All Pods (almost) instantly change from "PodScheduled" to "Initialized", but take quite some time to transition into a "Ready" state. Is there logging somewhere about the kubelet scheduling the container execution after the master has scheduled it?
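For reference, the per-condition transition times can be read from the pod status (`kubectl get pod <name> -o json`, under `.status.conditions[].lastTransitionTime`). A minimal sketch of the arithmetic, with made-up timestamps:

```python
from datetime import datetime

# Illustrative timestamps; in practice take them from
# .status.conditions[].lastTransitionTime in the pod's JSON.
conditions = {
    "PodScheduled": "2018-11-09T15:11:57Z",
    "Initialized": "2018-11-09T15:11:58Z",
    "Ready": "2018-11-09T15:12:05Z",
}

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

scheduled_to_init = (parse(conditions["Initialized"]) - parse(conditions["PodScheduled"])).total_seconds()
init_to_ready = (parse(conditions["Ready"]) - parse(conditions["Initialized"])).total_seconds()
print(f"PodScheduled -> Initialized: {scheduled_to_init:.0f}s")
print(f"Initialized  -> Ready:       {init_to_ready:.0f}s")
```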
Are you using init containers to do any preprocessing, by chance? Initialized signifies that any init containers have run to completion; after that the main containers start and the pod should flip over to Ready.
We are not using any init containers. All of our Deployments (and by extension our Pods) contain a single container. The container image differs from Deployment to Deployment, but most images are fairly simple.
Hmm… that is quite odd… is there anything in the events on the pods? Have you turned up the verbosity on the kubelet and the API server to try and get a better picture of what's going on?
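For anyone else following along, a rough sketch of the knobs involved; where exactly these flags live depends on how the components are started in your cluster:

```shell
# kubelet and kube-apiserver log verbosity (higher = noisier), e.g.:
#   --v=4
# The kubectl client can also dump its API traffic, which needs no
# cluster-side changes at all:
kubectl get pods -v=8
```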
A simplified version of the events for a specific Job can be found below:
2018-11-09T15:11:57Z - Successfully assigned ZZZ to docker-for-desktop
2018-11-09T15:11:58Z - MountVolume.SetUp succeeded for volume "repository"
2018-11-09T15:11:58Z - MountVolume.SetUp succeeded for volume "default-token-xqbbh"
2018-11-09T15:12:02Z - Container image YYY already present on machine
2018-11-09T15:12:03Z - Created container
2018-11-09T15:12:05Z - Started container
The time between the volume mounts and the container image check seems strange: running docker images -q YYY by hand is almost instant, nowhere near the reported 4 seconds.
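To make the gaps explicit, here is a small sketch that computes the time between consecutive events from the log above (labels shortened):

```python
from datetime import datetime

# Timestamps copied from the pod's event log above.
events = [
    ("2018-11-09T15:11:57Z", "Scheduled"),
    ("2018-11-09T15:11:58Z", "Volumes mounted"),
    ("2018-11-09T15:12:02Z", "Image check finished"),
    ("2018-11-09T15:12:03Z", "Container created"),
    ("2018-11-09T15:12:05Z", "Container started"),
]

parse = lambda ts: datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
for (prev_ts, _), (ts, label) in zip(events, events[1:]):
    gap = (parse(ts) - parse(prev_ts)).total_seconds()
    print(f"{gap:>4.0f}s  {label}")
```

The 4-second gap before "Image check finished" is the one that stands out.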
We haven't tried changing the verbosity of the kubelet and API server. As we are working with Docker for Windows (or Mac), changing settings of Kubernetes components is not straightforward. The default verbosity didn't reveal much useful information.
I will give it a try and report back with any findings, probably next Monday.