Error enabling metallb: internal error, context deadline exceeded

Hello,

MicroK8s v1.25.4 revision 4221
two nodes (nuc and nuc2): Ubuntu 22.10 (Kinetic), NUCs with 8 GB of RAM

Plain fresh installation with:

  • sudo snap install microk8s --classic on both nodes
  • microk8s add-node on node nuc
  • microk8s join .... --worker on node nuc2
  • microk8s enable dns on node nuc
  • microk8s enable metallb with 10.10.18.10-10.10.18.250 when prompted, on node nuc

The enable metallb command on nuc failed with the following output:

$ microk8s enable metallb
Infer repository core for addon metallb
Enabling MetalLB
Enter each IP address range delimited by comma (e.g. '10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111'): 10.10.18.10-10.10.18.250
Applying Metallb manifest
customresourcedefinition.apiextensions.k8s.io/addresspools.metallb.io created
customresourcedefinition.apiextensions.k8s.io/bfdprofiles.metallb.io created
customresourcedefinition.apiextensions.k8s.io/bgpadvertisements.metallb.io created
customresourcedefinition.apiextensions.k8s.io/bgppeers.metallb.io created
customresourcedefinition.apiextensions.k8s.io/communities.metallb.io created
customresourcedefinition.apiextensions.k8s.io/ipaddresspools.metallb.io created
customresourcedefinition.apiextensions.k8s.io/l2advertisements.metallb.io created
namespace/metallb-system created
serviceaccount/controller created
serviceaccount/speaker created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
role.rbac.authorization.k8s.io/controller created
role.rbac.authorization.k8s.io/pod-lister created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/controller created
secret/webhook-server-cert created
service/webhook-service created
rolebinding.rbac.authorization.k8s.io/pod-lister created
daemonset.apps/speaker created
deployment.apps/controller created
validatingwebhookconfiguration.admissionregistration.k8s.io/validating-webhook-configuration created
Waiting for Metallb controller to be ready.
deployment.apps/controller condition met
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
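For anyone hitting the same thing: since the "context deadline exceeded" means the Post to the webhook never got a reply, the first thing I'd check is whether the webhook service has an endpoint at all, then retry the enable. A sketch (the inline address-range syntax avoids the interactive prompt; assumes the controller pod eventually became ready):

```shell
# Does webhook-service have a backing endpoint? If the ENDPOINTS column
# is empty, the controller pod never registered behind the service.
microk8s kubectl -n metallb-system get endpoints webhook-service

# If an endpoint exists, re-running enable re-applies the manifests:
microk8s disable metallb
microk8s enable metallb:10.10.18.10-10.10.18.250
```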

It might have worked in the end, though. All pods seem to be running and have no restarts:

$ kubectl get pods -A -o wide -w
NAMESPACE        NAME                                      READY   STATUS    RESTARTS   AGE     IP             NODE   NOMINATED NODE   READINESS GATES
kube-system      calico-node-4mrj9                         1/1     Running   0          11m     10.10.10.251   nuc    <none>           <none>
kube-system      calico-kube-controllers-fd5fccb79-wbtvn   1/1     Running   0          13m     10.1.119.129   nuc    <none>           <none>
kube-system      coredns-d489fb88-kpxp7                    1/1     Running   0          7m37s   10.1.40.193    nuc2   <none>           <none>
kube-system      calico-node-46pfv                         1/1     Running   0          11m     10.10.10.31    nuc2   <none>           <none>
metallb-system   controller-56c4696b5-fv5gw                1/1     Running   0          5m23s   10.1.40.194    nuc2   <none>           <none>
metallb-system   speaker-fxbpt                             1/1     Running   0          5m23s   10.10.10.31    nuc2   <none>           <none>
metallb-system   speaker-blrnp                             1/1     Running   0          5m23s   10.10.10.251   nuc    <none>           <none>

Current microk8s status on the primary (nuc) node:

andreas@nuc:~$ microk8s status
microk8s is running
high-availability: no
  datastore master nodes: 10.10.10.251:19001
  datastore standby nodes: none
addons:
  enabled:
    dns                  # (core) CoreDNS
    ha-cluster           # (core) Configure high availability on the current node
    helm                 # (core) Helm - the package manager for Kubernetes
    helm3                # (core) Helm 3 - the package manager for Kubernetes
    metallb              # (core) Loadbalancer for your Kubernetes cluster
  disabled:
    cert-manager         # (core) Cloud native certificate management
    community            # (core) The community addons repository
    dashboard            # (core) The Kubernetes dashboard
    gpu                  # (core) Automatic enablement of Nvidia CUDA
    host-access          # (core) Allow Pods connecting to Host services smoothly
    hostpath-storage     # (core) Storage class; allocates storage from host directory
    ingress              # (core) Ingress controller for external access
    kube-ovn             # (core) An advanced network fabric for Kubernetes
    mayastor             # (core) OpenEBS MayaStor
    metrics-server       # (core) K8s Metrics Server for API access to service metrics
    observability        # (core) A lightweight observability stack for logs, traces and metrics
    prometheus           # (core) Prometheus operator for monitoring and logging
    rbac                 # (core) Role-Based Access Control for authorisation
    registry             # (core) Private image registry exposed on localhost:32000
    storage              # (core) Alias to hostpath-storage add-on, deprecated

I’m about to exercise this cluster to confirm it’s working correctly, but I wanted to share this error message, which may point to a lingering bug somewhere, and see whether others have hit it too.

Cheers!

Yeah, I don’t think that metallb enablement worked. I don’t have the custom resources:

$ kubectl get -A IPAddressPool
No resources found

$ kubectl get -A L2Advertisement
No resources found

But the CRDs themselves are registered at least:

$ kubectl api-resources  | grep -E -i "(IPAddressPool|L2Advertisement)"
ipaddresspools                                 metallb.io/v1beta1                     true         IPAddressPool
l2advertisements                               metallb.io/v1beta1                     true         L2Advertisement
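Since the CRDs are registered and it was only the two webhook-validated objects that failed to apply, they could in principle be created by hand. And given that nuc2 can reach the webhook service (see below), applying from nuc2 might succeed where the enable script on nuc did not. A sketch with the range from the original prompt; the object names here are just illustrative, not necessarily what the addon itself would use:

```shell
# Manually create the IPAddressPool and L2Advertisement that the
# enable script failed to apply (run from a node that can reach the
# webhook service, e.g. nuc2 in my case).
cat <<EOF | microk8s kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-addresspool
  namespace: metallb-system
spec:
  addresses:
  - 10.10.18.10-10.10.18.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advertise-all-pools
  namespace: metallb-system
EOF
```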

Something doesn’t seem to be coping with the fact that I have two nodes. I see different behavior with respect to cluster IPs depending on which node I connect from.

I’ve since removed nuc2 from the cluster and re-added it without the --worker option, so both nodes run the control plane, but that didn’t help.

On nuc I cannot connect to the webhook-service cluster IP:

andreas@nuc:~$ microk8s kubectl get svc -n metallb-system
NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
webhook-service   ClusterIP   10.152.183.167   <none>        443/TCP   20m
andreas@nuc:~$ telnet 10.152.183.167 443
Trying 10.152.183.167...
^C

But on node nuc2, I can:

andreas@nuc2:~$ microk8s kubectl get svc -n metallb-system
NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
webhook-service   ClusterIP   10.152.183.167   <none>        443/TCP   20m
andreas@nuc2:~$ telnet 10.152.183.167 443
Trying 10.152.183.167...
Connected to 10.152.183.167.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
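A non-interactive version of that check, so it can be repeated quickly on both nodes without telnet hanging (assumes netcat is installed):

```shell
# Probe the webhook-service ClusterIP with a 3-second timeout.
# Exit status 0 means the TCP connect succeeded; in my case it
# succeeds on nuc2 and times out on nuc.
nc -zvw3 10.152.183.167 443
```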

Looks like someone reported something similar: Error Enabling Addon "metallb" · Issue #3530 · canonical/microk8s · GitHub

I repeated my same steps on an Ubuntu 22.04 LTS install, and this time it all worked.

More specifically, I first retried a simpler case in VMs, without involving metallb, and found that connections to a service IP were flaky and only completed promptly when the endpoint behind the service happened to be on the same node. I retested that scenario on Ubuntu 22.10 and 22.04, and it consistently failed only when the OS was Ubuntu 22.10.
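One speculative lead, in case anyone wants to dig further on 22.10: flaky cross-node ClusterIP traffic with Calico over VXLAN has been reported elsewhere as a kernel checksum-offload problem, and the commonly cited workaround is to disable checksum offload on the VXLAN interface. I have not confirmed this is the cause here; the interface name assumes MicroK8s's default Calico VXLAN setup:

```shell
# Speculative workaround (not verified as the root cause in my case):
# disable TX checksum offload on Calico's VXLAN interface on each node.
# Note this does not persist across reboots.
sudo ethtool -K vxlan.calico tx-checksum-ip-generic off
```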