I am using an EKS cluster on AWS.
I have created a Docker registry as a deployment, then created a Service and an Ingress in front of it.
In the Ingress, I have configured TLS secrets for the ingress host:
spec:
  rules:
  - host: xxxxxxxxx.com
    http:
      paths:
      - backend:
          serviceName: docker-registry
          servicePort: 5000
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
I have 4 worker nodes and a jump server.
The issue I am facing is that I am able to access the docker registry at the ingress address from the jump host, but from the worker nodes it fails with the error:
request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
What am I doing wrong here?
I have tried placing the service IP and the registry ingress host in /etc/hosts, and copying the certs to /etc/docker/certs.d/registryname.
Any hint would be great.
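For what it's worth, the per-registry cert layout Docker expects is /etc/docker/certs.d/&lt;host&gt;/ca.crt; a sketch of what I did, with example.com standing in for my real ingress host:

```shell
# Hypothetical host; substitute the actual ingress hostname.
REGISTRY_HOST=example.com

# Docker looks for the CA at /etc/docker/certs.d/<host>/ca.crt
sudo mkdir -p "/etc/docker/certs.d/${REGISTRY_HOST}"
sudo cp ca.crt "/etc/docker/certs.d/${REGISTRY_HOST}/ca.crt"

# Restarting the daemon is a safe way to make sure it is picked up
sudo systemctl restart docker
```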
Cluster information:
kubectl version output:
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:35:50Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Cloud being used: AWS
Installation method: EKS
Host OS: amazon linux ami arm64
CNI and version: Not known
CRI and version: Not known
I ran into this myself recently. Does EKS use containerd? From what I know, containerd doesn't pick up newly added CA certs until it's restarted, and won't do so live until version 1.5 propagates out.
Edit: when you see that error, you need to dig further into the container engine logs to confirm it's the cert.
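For example, on a node (assuming the systemd units used by the Amazon Linux AMI):

```shell
# Tail the container engine logs and filter for TLS/cert failures
sudo journalctl -u docker --since "1 hour ago" | grep -iE 'x509|certificate|tls'
sudo journalctl -u containerd --since "1 hour ago" | grep -iE 'x509|certificate|tls'
```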
I checked on one worker node to find the CRI. The kubelet process is as below, so I think the CRI is Docker:
/usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime docker
But I did see both dockerd and containerd processes running on the worker node.
Also, on checking the docker service logs, I got the same error.
Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-06-14 08:31:57 UTC; 4 days ago
Docs: https://docs.docker.com
Process: 12574 ExecStartPre=/usr/libexec/docker/docker-setup-runtimes.sh (code=exited, status=0/SUCCESS)
Process: 12571 ExecStartPre=/bin/mkdir -p /run/docker (code=exited, status=0/SUCCESS)
Main PID: 12579 (dockerd)
Tasks: 23
Memory: 116.5M
CGroup: /system.slice/docker.service
└─12579 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Jun 19 02:23:45 ip-xxxxx dockerd[12579]: time="2021-06-19T02:23:45.876987774Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get https://xxxx: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
After thinking about this a bit and the experiences I went through, there are a few areas you can focus on:
- Does the registry work as it's deployed?
- Is the problem with the certificate being installed on all the nodes that pull from the registry?
- Can the nodes actually reach the registry?
You could do testing locally with kubectl port-forward. If you're using Docker for Desktop, you will need the --address flag so the forward listens on a network interface that the Docker VM can reach for push/pull. It's also easiest to configure the endpoint as an insecure registry in the Docker settings.
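For example, a local test might look like this (service name, namespace, and ports are from my own setup; adjust to yours):

```shell
# Forward a local port to the registry Service; --address lets the
# forward listen beyond loopback (e.g. for the Docker Desktop VM).
kubectl port-forward --address 0.0.0.0 svc/registry 5000:443

# In another terminal, poke the registry API through the tunnel
curl -k https://127.0.0.1:5000/v2/
```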
On the nodes, you can curl /v2/ and /v2/_catalog, just to confirm that your registry image is working. If you have to add -k or --insecure to curl, I would work on the assumption the SSL isn't installed correctly yet.
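Concretely, the checks look like this (example.com standing in for your registry host; add credentials if auth is enabled):

```shell
REGISTRY=https://example.com

# Base endpoint: expect 200 (or 401 if auth is on) plus the
# Docker-Distribution-Api-Version header
curl -sS -o /dev/null -w '%{http_code}\n' "${REGISTRY}/v2/"

# List repositories; use -u user:pass if htpasswd auth is enabled
curl -sS "${REGISTRY}/v2/_catalog"

# If these only succeed with -k/--insecure, suspect the CA trust,
# not the network path.
```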
Also, here's my registry deployment yaml. It's a bit custom: you'll see that I maintain a CA cert/key pair on my nodes and generate SSL certificates ad hoc in containers. The real takeaway is probably how I have things mounted.
---
apiVersion: v1
kind: Service
metadata:
  name: registry
spec:
  selector:
    app: registry
  ports:
  - protocol: TCP
    port: 443
    targetPort: 5000
---
# registry.ci.svc.cluster.local
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
  labels:
    app: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      labels:
        app: registry
    spec:
      volumes:
      - name: registry-vol
        hostPath:
          path: /var/lib/data/registry
          type: DirectoryOrCreate
      - name: cluster-shared-ca-vol
        hostPath:
          path: /etc/ssl/k8s
          type: Directory
      - name: cert-vol
        emptyDir: {}
      initContainers:
      # Utility that generates the SSL certificate
      - name: generate-ssl-certificate
        image: alpine:latest
        imagePullPolicy: Always
        env:
        - name: K8S_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        workingDir: /certs/
        command:
        - /bin/sh
        - -c
        - |
          set -x -e
          date -u
          apk add openssl
          openssl req -subj '/C=K8/ST=Cluster/L=Pod/O=SnakeOil/OU=SnakeOil/CN=registry.'"${K8S_NAMESPACE}"'.svc.cluster.local' -nodes -newkey rsa:4096 -keyout /certs/registry.key -out snakeoil.csr
          openssl x509 -req -in snakeoil.csr -CA /etc/ssl/k8s/cluster-shared-ca.crt -CAkey /etc/ssl/k8s/cluster-shared-ca.key -CAcreateserial -out /certs/registry.crt -days 365 -sha256
        volumeMounts:
        - name: cert-vol
          mountPath: /certs/
        - name: cluster-shared-ca-vol
          mountPath: /etc/ssl/k8s
      containers:
      - image: registry:2
        name: registry
        imagePullPolicy: IfNotPresent
        env: # Ref: https://docs.docker.com/registry/configuration/
        - name: REGISTRY_HTTP_ADDR
          value: 0.0.0.0:5000
        - name: REGISTRY_HTTP_SECRET
          value: SNAKE-OIL-SECRET
        - name: REGISTRY_HTTP_TLS_CERTIFICATE
          value: "/certs/registry.crt"
        - name: REGISTRY_HTTP_TLS_KEY
          value: "/certs/registry.key"
        - name: REGISTRY_LOG_LEVEL
          value: debug
        ports:
        - containerPort: 5000
        volumeMounts:
        - name: registry-vol
          mountPath: /var/lib/registry
        - name: cert-vol
          mountPath: /certs/
Will try the certs part; I have installed the certs at the ingress level.
I think it's some issue with an SG or NACL, because the registry is accessible through the jump host, which is in the same subnet as the worker nodes.
It is only from within the k8s cluster, on pods and workers, that the registry is not accessible.
My deployment definition is like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry-deployment
  namespace: devops
  labels:
    app: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      namespace: registry
      labels:
        app: registry
    spec:
      containers:
      - name: registry
        image: registry:2.6.2
        volumeMounts:
        - name: repo-vol
          mountPath: "/var/lib/registry"
        - name: certs-vol
          mountPath: "/certs"
          readOnly: true
        - name: auth-vol
          mountPath: "/auth"
          readOnly: true
        env:
        - name: REGISTRY_AUTH
          value: "htpasswd"
        - name: REGISTRY_AUTH_HTPASSWD_REALM
          value: "Registry Realm"
        - name: REGISTRY_AUTH_HTPASSWD_PATH
          value: "/auth/htpasswd"
      volumes:
      - name: repo-vol
        persistentVolumeClaim:
          claimName: docker-repo-pvc
      - name: certs-vol
        secret:
          secretName: certs-secret
      - name: auth-vol
        secret:
          secretName: auth-secret
SVC
apiVersion: v1
kind: Service
metadata:
  name: docker-registry
  namespace: devops
spec:
  selector:
    app: registry
  ports:
  - port: 5000
    targetPort: 5000
INGRESS
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: registry-ingress
  namespace: devops
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    kubernetes.io/ingress.class: nginx
spec:
  tls:
  - hosts:
    - example.com
    secretName: tls-registry
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: docker-registry
          servicePort: 5000
I have created the certs for example.com and placed them in both the deployment secret and the secret used in the ingress; I also tried with the same secretName in both.
I suspect that if it were a TLS issue then docker login should not work from the jump server either, but it works from the jump server and not from the worker nodes.
Or maybe it has something to do with DNS routing?
The ingress host is not accessible from inside the worker nodes.
I'll give it a try the way you are doing it.
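To separate DNS from network reachability, I can test from a worker node like this (the ingress IP is hypothetical):

```shell
HOST=example.com          # ingress host
INGRESS_IP=203.0.113.10   # hypothetical ingress / load-balancer address

# 1) Does the node resolve the host at all?
getent hosts "$HOST"

# 2) Can it open a TCP connection to the ingress?
timeout 5 bash -c "cat < /dev/null > /dev/tcp/${INGRESS_IP}/443" && echo "tcp ok"

# 3) Bypass DNS entirely; if this succeeds, the problem is resolution,
#    not an SG/NACL blocking the path
curl -sv --resolve "${HOST}:443:${INGRESS_IP}" "https://${HOST}/v2/" -o /dev/null
```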
I tried with port-forward and it works on the worker node,
but with it I cannot access the registry from the jump server via workerip:forwardedport.
docker login 127.0.0.1:49999
Username: myuser
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
Login Succeeded
Still can't figure out why I can't use the ingress.
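One thing I noticed about the port-forward: kubectl port-forward binds to 127.0.0.1 by default, which would explain why workerip:forwardedport times out from the jump server. Binding all interfaces should make it reachable (sketch; worker IP is a placeholder):

```shell
# Expose the forward on the node's own IP, not just loopback
kubectl -n devops port-forward --address 0.0.0.0 svc/docker-registry 49999:5000

# Then, from the jump server (substitute the real worker IP):
docker login <worker-ip>:49999
```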