Kubernetes dashboard

Cluster information:

Kubernetes version: v1.20.1
Cloud being used: bare-metal (VirtualBox)
Installation method: modeled after kubernetes-the-hard-way
Host OS: CentOS 8
CNI and version: kube-router 1.1.1
CRI and version: containerd 1.4.3

The problem

I’m hoping someone can help me resolve an issue with kubernetes-dashboard in a test cluster I’m running. The cluster has one controller and two workers, and I’ve installed metrics-server and kubernetes-dashboard. Following the instructions, I run kubectl proxy and then access the dashboard at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard/proxy/. Instead of the dashboard UI, I get:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "error trying to reach service: dial tcp 10.200.1.109:8443: i/o timeout",
  "code": 500
}
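
(For reference, the access path is the standard one - kubectl proxy serves on localhost:8001 by default:)

# in one terminal
$ kubectl proxy
# in another terminal
$ curl http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard/proxy/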

There are many posts with similar errors - I believe I’ve read and tried everything to no avail!

The cluster looks like:

$ kubectl get po -A -owide
NAMESPACE              NAME                                         READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
kube-system            coredns-75c4c67f76-nh2zr                     1/1     Running   1          46h     10.200.1.103    monk   <none>           <none>
kube-system            kube-router-dq54r                            1/1     Running   3          4d20h   192.168.0.198   monk   <none>           <none>
kube-system            kube-router-hw2zc                            1/1     Running   3          4d20h   192.168.0.197   ham    <none>           <none>
kube-system            metrics-server-649cb58d9b-m9q6k              1/1     Running   0          26h     192.168.0.197   ham    <none>           <none>
kubernetes-dashboard   alpine                                       1/1     Running   0          14h     10.200.1.108    monk   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-7b59f7d4df-rx248   1/1     Running   0          27h     10.200.1.105    monk   <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-7fd8dbbd79-ndr87        1/1     Running   0          13h     10.200.1.110    monk   <none>           <none>
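
(As a cross-check of the service wiring, the dashboard and scraper service endpoints can be compared against the pod IPs above, for example:)

$ kubectl -n kubernetes-dashboard get svc,endpoints -o wide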

And the dashboard logs:

$ kubectl -n kubernetes-dashboard logs kubernetes-dashboard-7fd8dbbd79-ndr87
2021/02/20 02:54:57 Starting overwatch
2021/02/20 02:54:57 Using namespace: kubernetes-dashboard
2021/02/20 02:54:57 Using in-cluster config to connect to apiserver
2021/02/20 02:54:57 Using secret token for csrf signing
2021/02/20 02:54:57 Initializing csrf token from kubernetes-dashboard-csrf secret
2021/02/20 02:54:57 Successful initial request to the apiserver, version: v1.20.1
2021/02/20 02:54:57 Generating JWE encryption key
2021/02/20 02:54:57 New synchronizer has been registered: kubernetes-dashboard-key-holder-kubernetes-dashboard. Starting
2021/02/20 02:54:57 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kubernetes-dashboard
2021/02/20 02:54:57 Initializing JWE encryption key from synchronized object
2021/02/20 02:54:57 Creating in-cluster Sidecar client
2021/02/20 02:54:57 Auto-generating certificates
2021/02/20 02:54:57 Successfully created certificates
2021/02/20 02:54:57 Serving securely on HTTPS port: 8443
2021/02/20 02:55:27 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 

(the health check error repeats every 30 seconds indefinitely)

I’m interpreting this as either:

  1. the metrics scraper health check endpoint itself is not responding, or
  2. the dashboard pod can’t reach the metrics scraper health check endpoint over the pod network

To rule out (1):

$ kubectl -n kubernetes-dashboard port-forward\
   dashboard-metrics-scraper-7b59f7d4df-rx248 8000
Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000

$ curl -i localhost:8000/proxy/healthz
HTTP/1.1 200 OK
Date: Sat, 20 Feb 2021 16:47:59 GMT
Content-Length: 19
Content-Type: text/plain; charset=utf-8

URL: /proxy/healthz

So that tells me that the metrics scraper health check is responding.

Next, verify it in-cluster (via the service):

kubectl -n kubernetes-dashboard run alpine\
  --image=alpine:latest --serviceaccount kubernetes-dashboard\
  --command -- sleep 60000
kubectl -n kubernetes-dashboard exec -it alpine -- sh

/ # apk update && apk add curl
/ # curl -i http://dashboard-metrics-scraper:8000/proxy/healthz
HTTP/1.1 200 OK
Date: Sat, 20 Feb 2021 16:50:15 GMT
Content-Length: 19
Content-Type: text/plain; charset=utf-8

URL: /proxy/healthz/

So it looks like the metrics scraper health endpoint is up and accessible in-cluster.
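
(Since the error message dials the scraper pod IP directly rather than the service IP, a further check from the same alpine pod - using the pod IP from the listing above - would be:)

/ # curl -i http://10.200.1.105:8000/proxy/healthz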

Following guidance others have provided in various posts about this issue:

This works:

$ kubectl --as system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard\
   get --raw /api/v1/namespaces/kubernetes-dashboard/services/dashboard-metrics-scraper\
  | json_pp
{
   "apiVersion" : "v1",
   "kind" : "Service",
   "metadata" : {
      "annotations" : {
         "kubectl.kubernetes.io/last-applied-configuration" : "{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"metadata\":{\"annotations\":{},\"labels\":{\"k8s-app\":\"dashboard-metrics-scraper\"},\"name\":\"dashboard-metrics-scraper\",\"namespace\":\"kubernetes-dashboard\"},\"spec\":{\"ports\":[{\"port\":8000,\"targetPort\":8000}],\"selector\":{\"k8s-app\":\"dashboard-metrics-scraper\"}}}\n"
      },
      "creationTimestamp" : "2021-02-19T12:46:52Z",
      "labels" : {
         "k8s-app" : "dashboard-metrics-scraper"
      },
      "managedFields" : (deleted for brevity)
      "name" : "dashboard-metrics-scraper",
      "namespace" : "kubernetes-dashboard",
      "resourceVersion" : "76761",
      "uid" : "3d94e749-09c2-499a-a311-face6943f523"
   },
   "spec" : {
      "clusterIP" : "10.32.0.81",
      "clusterIPs" : [
         "10.32.0.81"
      ],
      "ports" : [
         {
            "port" : 8000,
            "protocol" : "TCP",
            "targetPort" : 8000
         }
      ],
      "selector" : {
         "k8s-app" : "dashboard-metrics-scraper"
      },
      "sessionAffinity" : "None",
      "type" : "ClusterIP"
   },
   "status" : {
      "loadBalancer" : {}
   }
}

But the following does not work (and it has been recommended as a troubleshooting step):

$ kubectl --as system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard\
   get --raw /api/v1/namespaces/kubernetes-dashboard/services/dashboard-metrics-scraper/proxy
Error from server: error trying to reach service: dial tcp 10.200.1.105:8000: i/o timeout

So something seems to be preventing the dashboard pod from reaching the metrics scraper pod, and, if I’m interpreting this correctly, it only affects the dashboard pod.
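
(In hindsight, it’s worth noting that a get --raw .../services/.../proxy request is dialed by the kube-apiserver itself, so that connection originates on the controller. Assuming shell access to the controller host, the same path can be checked directly, for example:)

# on the controller host: can it reach the scraper pod IP at all?
$ curl -m 5 -i http://10.200.1.105:8000/proxy/healthz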

The scraper logs look healthy, and I can see where my curl test commands are logged:

$ kubectl -n kubernetes-dashboard logs dashboard-metrics-scraper-7b59f7d4df-rx248

...
127.0.0.1 - - [20/Feb/2021:16:47:59 +0000] "GET /proxy/healthz HTTP/1.1" 200 19 "" "curl/7.68.0"
...
10.200.1.108 - - [20/Feb/2021:16:50:15 +0000] "GET /proxy/healthz HTTP/1.1" 200 19 "" "curl/7.74.0"
...
{"level":"info","msg":"Database updated: 2 nodes, 8 pods","time":"2021-02-20T16:58:56Z"}
10.200.1.1 - - [20/Feb/2021:16:58:58 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.20"
10.200.1.1 - - [20/Feb/2021:16:59:08 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.20"
10.200.1.1 - - [20/Feb/2021:16:59:18 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.20"

In the logs above, alongside the kubelet health probes, I can see my troubleshooting curl calls. But the dashboard pod’s calls to the endpoint every 30 seconds are never logged. This supports the idea that the traffic is not making it from the dashboard pod to the metrics scraper pod.

Normally that would send me down a network troubleshooting path, but you can see that the alpine pod had no issues reaching the scraper pod’s endpoint in-cluster. And I do many other things in this test cluster that lead me to believe the networking is working correctly.
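
(Since kube-router enforces NetworkPolicy, one more thing worth ruling out is a policy in the namespace blocking pod-to-pod traffic; a quick check, for example:)

$ kubectl -n kubernetes-dashboard get networkpolicy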

The metrics server itself also appears to be functioning:

$ kubectl top nodes
NAME   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ham    55m          2%     734Mi           20%       
monk   46m          1%     796Mi           10%       

$ kubectl top pods -A
NAMESPACE              NAME                                         CPU(cores)   MEMORY(bytes)   
kube-system            coredns-75c4c67f76-nh2zr                     5m           17Mi            
kube-system            kube-router-dq54r                            8m           18Mi            
kube-system            kube-router-hw2zc                            7m           18Mi            
kube-system            metrics-server-649cb58d9b-m9q6k              5m           18Mi            
kubernetes-dashboard   alpine                                       0m           5Mi             
kubernetes-dashboard   dashboard-metrics-scraper-7b59f7d4df-rx248   1m           11Mi            
kubernetes-dashboard   kubernetes-dashboard-7fd8dbbd79-ndr87        1m           9Mi             

I’ve scaled the dashboard deployment up and down multiple times, but it is never able to reach the scraper. I’m not asking for help troubleshooting the network itself - I realize that’s on me - I’m just about out of ideas. Any recommended troubleshooting steps to isolate the problem would be very much appreciated. Thanks.

This is resolved. When standing up a cluster based on Kelsey Hightower’s Kubernetes the Hard Way, the controller doesn’t get a kubelet installed, so it is not a node in the cluster.

Therefore, the controller doesn’t have networking to the pod network on the nodes. Until now I didn’t fully understand that access to the dashboard goes through the API server’s proxy, and - because the controller can’t reach the pod network - the API server proxy couldn’t reach the dashboard pod.

I changed the cluster configuration so that the controller was also a node and instantly everything worked.
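
For anyone who hits the same wall with a kubernetes-the-hard-way style layout, the failure mode can be confirmed from the controller host before changing the topology (rough checks; the pod CIDR and node IP below are the ones from my cluster):

# the controller runs no kubelet, so it never shows up here
$ kubectl get nodes

# and it has no route into the workers' pod CIDRs, so apiserver proxying times out
$ ip route | grep 10.200
$ curl -m 5 -k https://10.200.1.110:8443/

An alternative to joining the controller as a node would likely be adding a static route on the controller to each worker’s pod CIDR (e.g. 10.200.1.0/24 via monk’s node IP 192.168.0.198), but making the controller a node was the simpler change for me.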