Cluster information:
Kubernetes version: v1.17.1
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Ubuntu 18.04 LTS Server
CNI and version: weave/flannel/kube-router/calico (latest releases of each)
CRI and version: docker/containerd (19.03.5/1.2.10)
Problem:
I am attempting to bring up a ROS 2 installation on Kubernetes, ideally using multiple containers in a single pod. Under the hood, ROS 2 relies upon DDS for communication, which is based upon UDP multicast.
When I bring up a simple pod deployment with two containers in a producer-consumer configuration, the consumer rarely (if ever) receives a message from the producer. When I bring up two pods, each with a single container, in the same producer-consumer configuration, the consumer always receives the messages.
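To separate the DDS layer from the network layer, one option is a small multicast probe run in each container in place of the talker and listener. The sketch below is not part of the original setup: the file name mcast_probe.py, the group 239.255.0.1, and the port 17400 are all assumptions chosen for illustration. Because containers in a pod share one network namespace, the sender enables IP_MULTICAST_LOOP so that a receiver on the same host can see the datagrams.

```python
import socket
import sys
import time

GROUP = "239.255.0.1"  # assumption: administratively scoped group used only for this test
PORT = 17400           # assumption: arbitrary UDP port, presumed unused


def membership_request(group, iface="0.0.0.0"):
    """Pack the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def make_receiver(group=GROUP, port=PORT):
    """Return a UDP socket bound to `port` and joined to `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def make_sender(ttl=1):
    """Return a UDP socket configured for multicast on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 1 keeps the datagrams on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    # Loopback delivery lets a receiver on the same host (or in the
    # same pod network namespace) see what this socket sends.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    return sock


def main(argv):
    mode = argv[1] if len(argv) > 1 else ""
    if mode == "recv":
        sock = make_receiver()
        while True:
            data, addr = sock.recvfrom(1024)
            print("received %r from %s" % (data, addr[0]), flush=True)
    elif mode == "send":
        sock = make_sender()
        count = 0
        while True:
            sock.sendto(b"probe %d" % count, (GROUP, PORT))
            count += 1
            time.sleep(1.0)
    else:
        print("usage: mcast_probe.py send|recv")


if __name__ == "__main__":
    main(sys.argv)
```

Running `python3 mcast_probe.py recv` in one container and `python3 mcast_probe.py send` in the other, first in the single-pod layout and then in the two-pod layout, would show whether plain multicast behaves differently between the two cases. If the probe works in both, the problem more likely sits in DDS discovery than in the CNI.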
Surprises
Every once in a while, the consumer will start up and receive messages as expected.
Furthermore, if one logs into the consumer with kubectl exec -it ros2-1 -c consumer /bin/bash and then runs /ros_entrypoint.sh ros2 run demo_nodes_cpp listener, messages are sometimes received from the producer in the single-pod scenario.
Expected Behavior
Successful messages appear in the logs of the consumer container as:
[INFO] [1579805884.017171859] [listener]: I heard: [Hello World: 1]
[INFO] [1579805885.017168023] [listener]: I heard: [Hello World: 2]
[INFO] [1579805886.017025092] [listener]: I heard: [Hello World: 3]
Actual Behavior
No such log messages are observed from the consumer.
Steps to Reproduce:
Failure within same pod
- Bring up a Kubernetes cluster.
- Apply the following pod definition, ros2-1.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ros2-1
    spec:
      containers:
      - name: producer
        image: osrf/ros2:nightly
        args: ["ros2", "run", "demo_nodes_cpp", "talker"]
      - name: consumer
        image: osrf/ros2:nightly
        args: ["ros2", "run", "demo_nodes_cpp", "listener"]
      restartPolicy: Never

- Watch for messages from the consumer with kubectl logs --follow ros2-1 consumer
Success in different pods
- Bring up a Kubernetes cluster.
- Apply the following pod definitions, ros2-2.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ros2-2-producer
    spec:
      containers:
      - name: producer
        image: osrf/ros2:nightly
        args: ["ros2", "run", "demo_nodes_cpp", "talker"]
      restartPolicy: Never
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: ros2-2-consumer
    spec:
      containers:
      - name: consumer
        image: osrf/ros2:nightly
        args: ["ros2", "run", "demo_nodes_cpp", "listener"]
      restartPolicy: Never

- Watch for messages from the consumer with kubectl logs --follow ros2-2-consumer
Questions:
- What causes the single-pod deployment to fail while the multi-pod deployment succeeds?
- I am fairly experienced with debugging networking issues on bare metal, but unfamiliar with doing so in a Kubernetes environment. How should I go about investigating this issue under flannel, weave, or kube-router?
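One possible first step, offered as a sketch and not as a diagnosis, is to check which interfaces inside the pod advertise the MULTICAST flag. The script below reads the flags that Linux exposes under /sys/class/net; it assumes the container's sysfs reflects the pod's network namespace, and the helper names are hypothetical. On many Linux systems the loopback interface does not carry the MULTICAST flag by default, which may or may not be relevant here.

```python
import os

# Interface flag bits from the Linux <linux/if.h> header.
IFF_UP = 0x1
IFF_LOOPBACK = 0x8
IFF_MULTICAST = 0x1000

SYS_NET = "/sys/class/net"


def describe_flags(flags):
    """Return the names of the flag bits this sketch cares about."""
    names = []
    if flags & IFF_UP:
        names.append("UP")
    if flags & IFF_LOOPBACK:
        names.append("LOOPBACK")
    if flags & IFF_MULTICAST:
        names.append("MULTICAST")
    return names


def read_flags(ifname, root=SYS_NET):
    """Read the hexadecimal flags value the kernel exposes for `ifname`."""
    with open(os.path.join(root, ifname, "flags")) as fh:
        return int(fh.read().strip(), 16)


def list_interfaces(root=SYS_NET):
    """Map each interface name to its decoded flags, skipping unreadable entries."""
    result = {}
    if not os.path.isdir(root):
        return result
    for name in sorted(os.listdir(root)):
        try:
            result[name] = describe_flags(read_flags(name, root))
        except (OSError, ValueError):
            # Some entries under /sys/class/net are not interfaces.
            continue
    return result


if __name__ == "__main__":
    for name, flags in list_interfaces().items():
        print("%s: %s" % (name, ", ".join(flags) or "(none)"))
```

Running this in the consumer container of both the single-pod and two-pod layouts gives a quick comparison of the interfaces each container sees, which can then be followed up with familiar bare-metal tools inside the pod.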