Hi everyone!
I want to install Kubeflow with GPU support, so I adapted the Canonical tutorial and installed MicroK8s (v1.29.4, revision 6809, classic) with the nvidia addon.
When I try to deploy Kubeflow with:
juju deploy kubeflow --trust --channel=1.8/stable
the deployment gets stuck because of an error in the unit "istio-pilot/0" (see below):
juju status
Model     Controller  Cloud/Region      Version  SLA          Timestamp
kubeflow  uk8sx       my-k8s/localhost  3.4.5    unsupported  16:36:15+02:00

App                        Version                  Status   Scale  Charm                    Channel         Rev   Address         Exposed  Message
admission-webhook                                   active   1      admission-webhook        1.8/stable      301   10.152.183.52   no
argo-controller                                     active   1      argo-controller          3.3.10/stable   424   10.152.183.46   no
dex-auth                                            active   1      dex-auth                 2.36/stable     422   10.152.183.22   no
envoy                      res:oci-image@cc06b3e    active   1      envoy                    2.0/stable      194   10.152.183.103  no
istio-ingressgateway                                active   1      istio-gateway            1.17/stable     1000  10.152.183.154  no
istio-pilot                                         waiting  1      istio-pilot              1.17/stable     1011  10.152.183.125  no       installing agent
jupyter-controller                                  active   1      jupyter-controller       1.8/stable      849   10.152.183.63   no
jupyter-ui                                          active   1      jupyter-ui               1.8/stable      858   10.152.183.244  no
katib-controller           res:oci-image@31ccd70    active   1      katib-controller         0.16/stable     576   10.152.183.226  no
katib-db                   8.0.36-0ubuntu0.22.04.1  active   1      mysql-k8s                8.0/stable      153   10.152.183.223  no
katib-db-manager                                    active   1      katib-db-manager         0.16/stable     539   10.152.183.76   no
katib-ui                                            active   1      katib-ui                 0.16/stable     422   10.152.183.87   no
kfp-api                                             active   1      kfp-api                  2.0/stable      1283  10.152.183.55   no
kfp-db                     8.0.36-0ubuntu0.22.04.1  active   1      mysql-k8s                8.0/stable      153   10.152.183.84   no
kfp-metadata-writer                                 active   1      kfp-metadata-writer      2.0/stable      334   10.152.183.91   no
kfp-persistence                                     active   1      kfp-persistence          2.0/stable      1291  10.152.183.85   no
kfp-profile-controller                              active   1      kfp-profile-controller   2.0/stable      1315  10.152.183.142  no
kfp-schedwf                                         active   1      kfp-schedwf              2.0/stable      1466  10.152.183.119  no
kfp-ui                                              active   1      kfp-ui                   2.0/stable      1285  10.152.183.184  no
kfp-viewer                                          active   1      kfp-viewer               2.0/stable      1317  10.152.183.56   no
kfp-viz                                             active   1      kfp-viz                  2.0/stable      1235  10.152.183.31   no
knative-eventing                                    active   1      knative-eventing         1.10/stable     353   10.152.183.47   no
knative-operator                                    active   1      knative-operator         1.10/stable     328   10.152.183.111  no
knative-serving                                     active   1      knative-serving          1.10/stable     409   10.152.183.237  no
kserve-controller                                   active   1      kserve-controller        0.11/stable     573   10.152.183.218  no
kubeflow-dashboard                                  active   1      kubeflow-dashboard       1.8/stable      582   10.152.183.194  no
kubeflow-profiles                                   active   1      kubeflow-profiles        1.8/stable      355   10.152.183.232  no
kubeflow-roles                                      active   1      kubeflow-roles           1.8/stable      187   10.152.183.239  no
kubeflow-volumes           res:oci-image@2261827    active   1      kubeflow-volumes         1.8/stable      260   10.152.183.251  no
metacontroller-operator                             active   1      metacontroller-operator  3.0/stable      252   10.152.183.143  no
minio                      res:oci-image@1755999    active   1      minio                    ckf-1.8/stable  278   10.152.183.53   no
mlmd                       res:oci-image@44abc5d    active   1      mlmd                     1.14/stable     127   10.152.183.157  no
oidc-gatekeeper                                     active   1      oidc-gatekeeper          ckf-1.8/stable  350   10.152.183.57   no
pvcviewer-operator                                  active   1      pvcviewer-operator       1.8/stable      30    10.152.183.188  no
seldon-controller-manager                           active   1      seldon-core              1.17/stable     664   10.152.183.24   no
tensorboard-controller                              active   1      tensorboard-controller   1.8/stable      257   10.152.183.89   no
tensorboards-web-app                                active   1      tensorboards-web-app     1.8/stable      245   10.152.183.28   no
training-operator                                   active   1      training-operator        1.7/stable      347   10.152.183.248  no

Unit                          Workload  Agent  Address      Ports          Message
admission-webhook/0*          active    idle   10.1.14.122
argo-controller/0*            active    idle   10.1.14.97
dex-auth/0*                   active    idle   10.1.14.82
envoy/0*                      active    idle   10.1.14.136  9090,9901/TCP
istio-ingressgateway/0*       active    idle   10.1.14.118
istio-pilot/0*                error     idle   10.1.14.65                  hook failed: "ingress-relation-created"
jupyter-controller/0*         active    idle   10.1.14.66
jupyter-ui/0*                 active    idle   10.1.14.123
katib-controller/0*           active    idle   10.1.14.140  443,8080/TCP
katib-db-manager/0*           active    idle   10.1.14.91
katib-db/0*                   active    idle   10.1.14.79                  Primary
katib-ui/0*                   active    idle   10.1.14.100
kfp-api/0*                    active    idle   10.1.14.119
kfp-db/0*                     active    idle   10.1.14.86                  Primary
kfp-metadata-writer/0*        active    idle   10.1.14.121
kfp-persistence/0*            active    idle   10.1.14.99
kfp-profile-controller/0*     active    idle   10.1.14.93
kfp-schedwf/0*                active    idle   10.1.14.101
kfp-ui/0*                     active    idle   10.1.14.67
kfp-viewer/0*                 active    idle   10.1.14.90
kfp-viz/0*                    active    idle   10.1.14.68
knative-eventing/0*           active    idle   10.1.14.89
knative-operator/0*           active    idle   10.1.14.109
knative-serving/0*            active    idle   10.1.14.102
kserve-controller/0*          active    idle   10.1.14.107
kubeflow-dashboard/0*         active    idle   10.1.14.95
kubeflow-profiles/0*          active    idle   10.1.14.106
kubeflow-roles/0*             active    idle   10.1.14.114
kubeflow-volumes/0*           active    idle   10.1.14.134  5000/TCP
metacontroller-operator/0*    active    idle   10.1.14.103
minio/0*                      active    idle   10.1.14.142  9000-9001/TCP
mlmd/0*                       active    idle   10.1.14.139  8080/TCP
oidc-gatekeeper/0*            active    idle   10.1.14.84
pvcviewer-operator/0*         active    idle   10.1.14.98
seldon-controller-manager/0*  active    idle   10.1.14.110
tensorboard-controller/0*     active    idle   10.1.14.117
tensorboards-web-app/0*       active    idle   10.1.14.96
training-operator/0*          active    idle   10.1.14.112

Here is some additional information:
juju debug-log --replay --include=istio-pilot
unit-istio-pilot-0: 16:33:29 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:34:10 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:34:10 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:34:10 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:35:13 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:35:51 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:36:54 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:38:29 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:38:30 INFO unit.istio-pilot/0.juju-log ingress:20: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/istio-ingressgateway-workload "HTTP/1.1 200 OK"
unit-istio-pilot-0: 16:38:30 ERROR unit.istio-pilot/0.juju-log ingress:20: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1203, in <module>
    main(Operator)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 540, in main
    manager = _Manager(
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 424, in __init__
    self.charm = self._make_charm(self.framework, self.dispatcher)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 427, in _make_charm
    charm = self._charm_class(framework)
  File "./src/charm.py", line 116, in __init__
    cert_subject=self._cert_subject,
  File "./src/charm.py", line 474, in _cert_subject
    svc_address = _get_gateway_address_from_svc(svc)
  File "./src/charm.py", line 1057, in _get_gateway_address_from_svc
    gateway_address = _get_address_from_loadbalancer(svc)
  File "./src/charm.py", line 1072, in _get_address_from_loadbalancer
    if len(ingresses) != 1:
TypeError: object of type 'NoneType' has no len()
unit-istio-pilot-0: 16:38:30 ERROR juju.worker.uniter.operation hook "ingress-relation-created" (via hook dispatching script: dispatch) failed: exit status 1
unit-istio-pilot-0: 16:38:30 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-istio-pilot-0: 16:41:04 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
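Reading the traceback, the failure is len() being called on None inside _get_address_from_loadbalancer. Below is a minimal sketch of that failure mode. It assumes (this is a guess, since the charm source is not shown here) that `ingresses` is read from the Service's status.loadBalancer.ingress field, which is unset until a load balancer assigns an address. The function get_address_from_loadbalancer, the helper make_service, the SimpleNamespace objects and the address 192.0.2.10 are hypothetical stand-ins for illustration, not the charm's actual code.

```python
from types import SimpleNamespace


def get_address_from_loadbalancer(svc):
    """Hypothetical None-safe lookup; NOT the istio-pilot charm's implementation."""
    ingresses = svc.status.loadBalancer.ingress
    if not ingresses:
        # None or empty list: no address has been assigned to the Service yet.
        return None
    if len(ingresses) != 1:
        raise ValueError(f"expected exactly one ingress entry, got {len(ingresses)}")
    entry = ingresses[0]
    return entry.ip or entry.hostname


def make_service(ingress):
    """Build a stand-in object shaped like a Service with a load-balancer status."""
    return SimpleNamespace(
        status=SimpleNamespace(loadBalancer=SimpleNamespace(ingress=ingress))
    )


# Assumed state at the time of the error: ingress is unset (None).
pending_svc = make_service(None)
# Assumed state once an address exists (192.0.2.10 is a documentation address).
ready_svc = make_service([SimpleNamespace(ip="192.0.2.10", hostname=None)])

# Calling len() directly on the unset field raises the TypeError from the log.
try:
    len(pending_svc.status.loadBalancer.ingress)
except TypeError as exc:
    print(exc)

print(get_address_from_loadbalancer(pending_svc))
print(get_address_from_loadbalancer(ready_svc))
```

If that assumption holds, the first thing to verify would be whether the istio-ingressgateway-workload Service in the kubeflow namespace has an external address assigned.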
I could not find a solution on any other forum. What can I do to get the istio-pilot unit to start?