Fast IPC between multiple pods on same node

Hello,

I have a single-node cluster that has 2 pods for now, but it may grow to 5 pods. As a hard business requirement, I have to run each of my apps in its own dedicated pod (i.e. not allowed to stuff multiple containers into the same pod, or multiple apps into the same container/image). I need to put in place a high-speed, low-latency means of inter-process communication between 2 pods initially, but I might need to support up to 5 pods each talking directly to each other. The idea is to have one of the pods act as a “re-assembler” that takes data from all the other pods, plus its own data, and re-assembles the data a specific way into a stream.

What methods could I use to provide something faster than IP-based connectivity between these pods, given that they’re all guaranteed to live on the same node? I’ve tried packet-based connectivity, and it’s just too slow for the amount of data I have to process (given the hard requirements I have).

Thoughts so far:

  • Create a pair of named pipes (i.e. mkfifo) on the bare-metal OS and expose them as a volume mount to the two pods, so they can talk to each other via the pipes. Should be fast, and not too hard to synchronize. It becomes ugly with 5 pods though, as the number of pipes grows quadratically (i.e. n(n-1)/2 == (5)(4)/2 == 10 pipes), and I have to figure out a sane way for pods to know which pipes to use for reading versus writing.
  • Shared memory?
  • Deploy Redis or Memcached, but I don’t know how the performance/throughput would compare to pipes or shared memory.
  • Some other mechanism I haven’t considered?
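To make the named-pipe idea concrete, here’s a minimal single-process sketch (the directory and fifo name are invented; in a real deployment the fifo would live on a hostPath volume mounted into both pods, with one thread here standing in for each pod):

```python
import os
import tempfile
import threading

# Stand-in for the hostPath volume shared by both pods, e.g. /mnt/ipc (hypothetical path).
shared_dir = tempfile.mkdtemp()
fifo_path = os.path.join(shared_dir, "podA_to_podB.fifo")

os.mkfifo(fifo_path)  # create the named pipe on the shared mount

def producer():
    # "Pod A": open() blocks until a reader appears, then the bytes stream through.
    with open(fifo_path, "wb") as pipe:
        pipe.write(b"chunk-0001")

t = threading.Thread(target=producer)
t.start()

# "Pod B": reads the stream from the shared fifo until the writer closes its end.
with open(fifo_path, "rb") as pipe:
    data = pipe.read()
t.join()
```

Note that fifo opens block until both ends are present, which is also what makes the “who reads which pipe” bookkeeping grow awkward as the pod count rises.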

Thank you!

Cluster information:

Kubernetes version: 1.17.2
Cloud being used: bare-metal (kubeadm)
Installation method: apt-get
Host OS: Ubuntu Server 18.04 LTS x86_64
CNI and version: Flannel 0.3.1

Disclaimer: I have no experience doing what you are trying to do :slight_smile:

  • Named pipes - This seems like awkward complexity outside of K8s to get a worker node into a state where it can run the pods.
  • Shared memory - containers within a pod can share memory based on what I’ve read, but not between different pods.
  • Redis/memcache - I assume you would need IP connectivity to these (even if running in another pod), which you have already stated does not meet your performance requirements.

Is the business requirement some kind of company wide policy or specific to your project?
Because it seems to me like you have a valid technical requirement for multiple containers per pod which is a common K8s pattern.

Kind regards,
Stephen

@stephendotcarter It’s a legal + customer hard requirement (i.e. already raised the topic of multiple containers in the same pod, or moving all the apps into a single container: answer was no).

I don’t believe shared memory will work between pods.

As for redis/memcache, I’m not overly familiar with them, but a colleague suggested it as an alternative to the trivial/simplistic buffering client/server code I’ve written (i.e. much more efficient data-over-IP implementation).

Understood about the hard requirements :+1:

As far as Redis and Memcached are concerned, I would have thought they are more suited to storing and retrieving data, whereas your requirement sounds like it needs a direct stream between processes for performance.

Are these streams something like video, where it’s one continuous stream of binary data? Or are they more like a stream of individual events?

Some requirements are not easily fulfillable. :frowning:

In this case you might be able to use the host IPC namespace to SHM across pods, but that’s a fairly privileged operation which puts the stability of the node in jeopardy. Maybe hostPath mounting a tmpfs? I have not tried that.
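For reference, a pod that shares the host IPC namespace and mounts the node’s /dev/shm might look roughly like this (untested sketch; the pod name and image are placeholders, and both hostIPC and hostPath typically require a permissive security policy, which is exactly the node-stability risk mentioned above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: decrypt-worker          # placeholder name
spec:
  hostIPC: true                 # pod sees the node's System V / POSIX IPC objects
  containers:
    - name: worker
      image: example/worker:latest   # placeholder image
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
  volumes:
    - name: shm
      hostPath:
        path: /dev/shm          # the node's tmpfs, visible to every pod that mounts it
        type: Directory
```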

It’s a set of independent streams of encrypted data being decrypted by different pods (each one offloads to a different GPU/ASIC/FPGA to do a different type of decryption). The data needs to be recombined (basically a glorified interleaving algorithm) in an odd/proprietary manner. It can be treated as a giant binary stream with sequence numbers.
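As a toy model of that kind of reassembly, assuming each pod emits (sequence_number, chunk) pairs in order, the interleaving is essentially a k-way merge (the pod streams and payloads below are invented for illustration):

```python
import heapq

# Each upstream pod yields (sequence_number, chunk) pairs, already ordered
# within its own stream. heapq.merge interleaves the sorted streams into one
# globally ordered sequence by comparing the leading sequence numbers.
pod_a = [(0, b"a0"), (3, b"a3"), (5, b"a5")]
pod_b = [(1, b"b1"), (2, b"b2"), (4, b"b4")]

def reassemble(*streams):
    # Merge by sequence number, then concatenate the payloads into one stream.
    return b"".join(chunk for _, chunk in heapq.merge(*streams))

combined = reassemble(pod_a, pod_b)
```

The hard part in practice is not the merge itself but getting the per-pod streams to the re-assembler quickly, which is what the transport question above is about.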

I considered using a tmpfs via a hostPath mount, but I couldn’t figure out how to properly guarantee atomic reads/writes between pods. Maybe something like Boost::NamedMutex or similar (i.e. a filesystem-backed mutex), but I was hoping for something more generic (some of my apps are written in C, C++, Python, Go, and node.js, and I need a solution that works for all of them), so Boost isn’t a magic wand in my case.
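One language-agnostic option for the atomicity problem is flock(2) advisory locking on a lock file in the shared tmpfs: C, C++, Python, Go, and node.js can all issue the same syscall, so every app can cooperate on one lock file without Boost. A rough Python sketch, with made-up paths standing in for the shared mount:

```python
import fcntl
import os
import tempfile

# Hypothetical paths on the shared tmpfs hostPath mount, e.g. /mnt/shm.
shared_dir = tempfile.mkdtemp()
data_path = os.path.join(shared_dir, "segment.bin")
lock_path = os.path.join(shared_dir, "segment.lock")

def write_segment(payload: bytes) -> None:
    # Exclusive flock(2) lock: no reader or other writer may hold the lock
    # while the data file is being rewritten.
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(data_path, "wb") as f:
            f.write(payload)
        fcntl.flock(lock, fcntl.LOCK_UN)

def read_segment() -> bytes:
    # Shared lock: many readers may proceed at once, but never alongside a writer.
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)
        with open(data_path, "rb") as f:
            data = f.read()
        fcntl.flock(lock, fcntl.LOCK_UN)
        return data

write_segment(b"\x00\x01\x02")
```

Caveat: flock locks are advisory, so this only works if every app agrees to take the lock before touching the data file.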

How would you do this without Kubernetes?

Random reply here (came across this thread when Googling something similar-ish), but as an answer to the “how to use filesystems” approach:

File renames on Linux are atomic within the same mount point. This means that if you write to a new file and then rename the new file onto the old one, a given process will only ever see the old contents or the new contents, never a mixed state between the two. You then only need to choose when the readers (re-)open the file.
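A minimal sketch of that write-then-rename pattern in Python (paths invented for the example; the temp file must be created on the same mount as the target, or the rename stops being atomic):

```python
import os
import tempfile

# Stand-in for the shared mount, e.g. a hostPath tmpfs (hypothetical path).
shared_dir = tempfile.mkdtemp()
final_path = os.path.join(shared_dir, "current.bin")

def publish(payload: bytes) -> None:
    # Write to a temp file on the SAME mount, then atomically swap it into place.
    # rename(2) guarantees readers see either the old file or the new one, whole.
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())        # ensure the bytes are durable before the swap
    os.rename(tmp_path, final_path)

publish(b"old")
publish(b"new")

# A reader that (re-)opens the path always gets a complete version.
with open(final_path, "rb") as f:
    latest = f.read()
```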