Minimal restricted Kubernetes Cluster on Ubuntu in production

If you take a look at the Pods that are stuck in the Pending state (kubectl describe pod <podname>), what are the events showing? If nothing interesting is there, you may want to check at the Deployment level.
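For example:

kubectl describe pod <podname>
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl describe deployment <deploymentname>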

With Calico 3.6 my pods (kubectl get pods -n kube-system) show this status:

NAME                                       READY   STATUS     RESTARTS   AGE
calico-kube-controllers-55df754b5d-h6hbq   0/1     Pending    0          6m3s
calico-node-rb8ht                          0/1     Init:0/2   0          6m3s
coredns-86c58d9df4-4nlz4                   0/1     Pending    0          7m33s
coredns-86c58d9df4-tzfnk                   0/1     Pending    0          7m33s
etcd-cherokee                              1/1     Running    0          7m1s
kube-apiserver-cherokee                    1/1     Running    0          6m34s
kube-controller-manager-cherokee           1/1     Running    0          7m1s
kube-proxy-9psnk                           1/1     Running    0          7m33s
kube-scheduler-cherokee                    1/1     Running    0          6m51s

All of the Pending ones have this warning message:

   Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  36s (x14 over 9m35s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

And the one with status Init:0/2 has the following describe output (kubectl describe pods calico-node-rb8ht -n kube-system):

Name: calico-node-rb8ht
Namespace: kube-system
Priority: 0
PriorityClassName:
Node: cherokee/150.164.7.70
Start Time: Mon, 15 Apr 2019 22:13:25 -0300
Labels: controller-revision-hash=6d7cd85bcc
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP: 150.164.7.70
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://1d5ffdc2aca63d2b6a71d76e635d6d0394ee3c76ed33608aa9c559ac002ed0ee
Image: calico/cni:v3.6.1
Image ID: docker-pullable://calico/cni@sha256:285b8409910c72d410807a346c339a203ecbc38c39333567666066bc167a4b82
Port:
Host Port:
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Running
Started: Mon, 15 Apr 2019 22:13:30 -0300
Ready: False
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-jcc85 (ro)
install-cni:
Container ID:
Image: calico/cni:v3.6.1
Image ID:
Port:
Host Port:
Command:
/install-cni.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-jcc85 (ro)
Containers:
calico-node:
Container ID:
Image: calico/node:v3.6.1
Image ID:
Port:
Host Port:
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
Liveness: http-get http://localhost:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -bird-ready -felix-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-jcc85 (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
calico-node-token-jcc85:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-jcc85
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
disk-pressure:NoSchedule
memory-pressure:NoSchedule
network-unavailable:NoSchedule
not-ready:NoExecute
unreachable:NoExecute
unschedulable:NoSchedule
Events:
Type Reason Age From Message


Normal Scheduled 9m11s default-scheduler Successfully assigned kube-system/calico-node-rb8ht to cherokee
Normal Pulled 9m8s kubelet, cherokee Container image "calico/cni:v3.6.1" already present on machine
Normal Created 9m7s kubelet, cherokee Created container
Normal Started 9m6s kubelet, cherokee Started container

I am new to k8s and couldn’t find any useful info from those outputs.

Ahh, it's a single-node cluster. Try removing the master node taint: kubectl taint nodes --all node-role.kubernetes.io/master-. That should allow pods to be scheduled on it and bring Calico up.
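Something along these lines (the node name is whatever kubectl get nodes shows for you):

# remove the master taint so regular pods can be scheduled on the single node
kubectl taint nodes --all node-role.kubernetes.io/master-

# confirm the taint is gone
kubectl describe node <nodename> | grep -i taints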

Thanks a lot @macintoshprime… that definitely got me one step further, but now I get another strange warning in my mongodb pod:

Name: mongo-5d89cc6f7f-t7cph
Namespace: default
Priority: 0
PriorityClassName:
Node:
Labels: name=mongo
pod-template-hash=5d89cc6f7f
Annotations:
Status: Pending
IP:
Controlled By: ReplicaSet/mongo-5d89cc6f7f
Containers:
mongo:
Image: mongo
Port: 27017/TCP
Host Port: 0/TCP
Environment:
Mounts:
/data/db from mongo-claim0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8bngw (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
mongo-claim0:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongo-claim0
ReadOnly: false
default-token-8bngw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-8bngw
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Warning FailedScheduling 3m59s (x2 over 3m59s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Warning FailedScheduling 29s (x7 over 3m38s) default-scheduler pod has unbound immediate PersistentVolumeClaims

I will now try to see if I can find something wrong in the manifests generated by kompose.

So mongo is trying to provision some storage via the PVC, and my guess is there is no StorageClass to handle the provisioning. Are you using the Helm chart or did you build it on your own?
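A quick way to check is something like:

kubectl get storageclass
kubectl describe pvc mongo-claim0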

Here are a few of the docs you may find helpful

Wow! There are a lot of k8s related concepts to digest. I’ve spent more than an hour reading your links to understand a bit.

I did not use Helm, just ran kompose convert on my previous docker-compose.yml:

version: "3.3"
services:
  mongo:
    image: mongo
    restart: always
    container_name: mongo
    volumes:
      - /data/mongodb:/data/db
    ports:
     - "30001:27017"

Then it just created a Deployment, a Service and a PVC from it. The documentation didn't tell me I needed to create a PersistentVolume as in Configure a Pod to Use a PersistentVolume for Storage. There is also this Claims As Volumes; I am trying to understand whether it is a replacement for a PV, but I think this is what kompose did:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  labels:
    name: mongo-claim0
  name: mongo-claim0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: mongo
  name: mongo
spec:
  type: NodePort
  ports:
    - port: 27017
      nodePort: 30001
      targetPort: 27017
  selector:
    name: mongo
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    name: mongo
  name: mongo
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: mongo
    spec:
      containers:
      - image: mongo
        name: mongo
        ports:
        - containerPort: 27017
        resources: {}
        volumeMounts:
        - mountPath: /data/db
          name: mongo-claim0
      restartPolicy: Always
      volumes:
      - name: mongo-claim0
        persistentVolumeClaim:
          claimName: mongo-claim0
status: {}

What is even MORE strange is that I used kompose convert on an almost identical Elasticsearch service and it is Running on my single node without problems.

Trying to figure out the magic in these PVC-PV bindings…
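From what I understand so far, I may just need to create a PV by hand for the claim above to bind to. A minimal sketch (the PV name is mine, and I am guessing the hostPath from my compose file):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv0
spec:
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/mongodb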

Ya, so I looked at what you provided from the docker-compose.yml, and my guess would be that kompose tried to handle the following line by creating a PVC and mounting it at /data/db:

volumes:
  - /data/mongodb:/data/db

Persistent Volumes can be a little weird at first; I think I did that tutorial a few times before I started getting the hang of them. The docs from GKE have a nice explanation of what they are: https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes .

Saad Ali had a really good talk about Storage which you can find here, https://www.youtube.com/watch?v=uSxlgK1bCuA

Very clarifying video and docs @macintoshprime, thanks again. Now I see there are no StorageClasses or PersistentVolumes on my single node, so the PVCs are stuck in the Pending state.

I saw that hostPath is not recommended if I am going for a cloud provider later (AWS, Google) because it is not "workload portability friendly". Should I use a local PersistentVolume, or is there a recommended out-of-tree storage plugin for this case?

Happy to hear!

If you're already on AWS you can take advantage of their EBS storage (see Storage Classes - Kubernetes).

If you’re looking for other solutions here are a few that are pretty awesome.

https://rook.io/
https://storageos.com/

@macintoshprime there is no way I can get StorageClasses to work on my single-node kubeadm cluster. I tried the AWS-EBS StorageClass, but it doesn't work locally without an actual AWS account. I also tried three different approaches for the StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: docker.io/hostpath
reclaimPolicy: Retain

After creating the StorageClass I tried creating 2 PVCs, but they do not get bound and no PV is created. I also tried the class used in Minikube (which works in my development environment):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  namespace: kube-system
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
provisioner: k8s.io/minikube-hostpath

Same result… and lastly I tried a class suggested in the Kubernetes Slack:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
  name: standard
provisioner: kubernetes.io/host-path
reclaimPolicy: Delete
volumeBindingMode: Immediate

Again, no PVs, no binding. I thought it could be some taint-related issue, but I've already removed all the taints from the master using kubectl taint nodes --all node-role.kubernetes.io/master-. Any idea?

If you look at the state of the PVCs that are not bound, what errors do they give (kubectl describe pvc <brokenpvc>)?

Have you tried either Portworx or StorageOS? Both of them have nice Helm installers for a quick install.

Just as a general FYI - the storage classes you tried:

  • provisioner: docker.io/hostpath - Does not exist in vanilla deployments. It is only found in docker managed instances.
  • provisioner: k8s.io/minikube-hostpath is only available with minikube.
  • kubernetes.io/host-path - IIRC it has been superseded by local volumes.

If you need something that is just tied to a single host, the local volume route might not be a bad option. It will prevent migration etc., but should be okay for a single-node instance.
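A rough sketch of what the local volume route looks like (the class name, path and node name below are placeholders, adjust them to your setup):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1   # directory must already exist on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <your-node-name>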


It seems to be a problem with all these provisioners, as @mrbobbytables said. Is this something I need to deploy into the cluster? Is there a way to get Minikube's provisioner, for example? I also found out that the kubernetes.io/host-path provisioner needs to be enabled, as it is disabled by default. I did not find any announcement or documentation about it being superseded by local volumes…

I got this:

Name:          esdata
Namespace:     default
StorageClass:  standard
Status:        Pending
Volume:        
Labels:        name=esdata
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/host-path
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Events:
  Type       Reason              Age                 From                         Message
  ----       ------              ----                ----                         -------
  Warning    ProvisioningFailed  36s (x6 over 2m5s)  persistentvolume-controller  Failed to create provisioner: Provisioning in volume plugin "kubernetes.io/host-path" is disabled
Mounted By:  elasticsearch-deployment-56fbc8698-6p4jg

The local volumes option is a good one if you're just testing things out, but if you need auto-provisioning then using something else would probably be best.

Again, StorageOS and Portworx are good options, as they have Helm charts you can use to quickly spin things up. OpenEBS also just got accepted into the CNCF; it's worth looking at as well: https://openebs.io/

I will take a look at them too! Thanks again =)

I also tried using the local storage approach (kubernetes.io/no-provisioner), and since it does not support dynamic provisioning, my PVCs are also not being bound =/

kubectl get storageclass
NAME            PROVISIONER                    AGE
local-storage   kubernetes.io/no-provisioner   5m24s

kubectl describe pvc esdata
Name:          esdata
Namespace:     default
StorageClass:  
Status:        Pending
Volume:        
Labels:        name=esdata
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Events:
  Type       Reason         Age               From                         Message
  ----       ------         ----              ----                         -------
  Normal     FailedBinding  5s (x7 over 78s)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
Mounted By:  elasticsearch-deployment-56fbc8698-4dblq
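Looking at that output again, the claim's StorageClass field is empty, so (if I understand the docs right) I would need to set storageClassName: local-storage on the claim and also pre-create a matching local PV myself, since there is no dynamic provisioning. The claim would look roughly like this (the size is just a guess for the example):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: esdata
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi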

Happy to help. Hope one of those options works out for you. Let us know how it goes.

I gave up on the no-provisioner approach and decided to give StorageOS a try using Helm. I installed Helm 2.14.0 (linux-amd64) and followed the documentation. I was getting an error when running helm init && helm repo add storageos https://charts.storageos.com && helm repo update && helm install storageos/storageos --namespace storageos --set cluster.join=singlenode --set csi.enable=true:

Error: no available release name found

After more reading I realized that kubeadm enables RBAC by default, so I had to create a ServiceAccount for Tiller (commands at the end of this post). After that, the StorageOS chart seemed to hit a validation error:

Error: validation failed: error validating "": error validating data: [ValidationError(CustomResourceDefinition.status): missing required field "conditions" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceDefinitionStatus, ValidationError(CustomResourceDefinition.status): missing required field "storedVersions" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.CustomResourceDefinitionStatus]

I am trying to see if I am doing something dumb or if I should go for another solution like Portworx.

Sidenote: I read a lot about StorageClasses and dynamic provisioning; that's why I went for a CSI driver, which is now at API version 1.1.0. In the official list of CSI drivers, StorageOS has a 1.0.0 implementation while Portworx has an older (and less-featured) 0.3.
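For reference, the Tiller ServiceAccount step mentioned above was roughly the standard RBAC setup (the account and binding names are just the usual convention):

kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade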

I don't have access to my dev cluster until tomorrow. I did try deploying some things in kind and I got different errors (assuming that's due to some missing feature flags, though).

If you're having trouble, it might be worth checking out one of the other solutions to see if you get the same error or not. If you do, it might be that there is a config error in the cluster.

I've created an issue in the storageos/chart project and they recommended using another chart, storageos-operator (which also deploys the StorageOS CSI driver). After installing it, the StorageClass is set and the Pods are running. BUT I get the same problem as with the no-provisioner approach! No dynamic provisioning of PVs… the PVCs also have these events:

  Normal     FailedBinding  5s (x7 over 78s)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
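From what I have read, that event means the claim is not requesting any StorageClass and none is marked as default. I still need to check whether marking the StorageOS class as default (or referencing it explicitly via storageClassName in the PVC) helps; something like this, with <storageos-class> being whatever kubectl get storageclass shows:

kubectl get storageclass
kubectl patch storageclass <storageos-class> \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'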

I'm heading into three months of trying to run this not-that-complex application on kubeadm. Despite learning a lot, I still can't see the light at the end of the tunnel.