suddenly in-kernel communication between lxc containers gets erratic

we’re using LXC containers to host multiple workloads on the same physical servers, e.g. a few instances of database servers running side by side.

once in a while we end up in a strange situation where tcp connections between containers running on the same physical server get torn down abruptly. in our case this manifested in errors such as "Caused by: Connection reset" or "The last packet successfully received from the server was 415 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago. Communications link failure".

it seems there’s an intermittent issue where a container’s network stack is not properly cleaned up after the container reboots or shuts down. mentions of similar cases: 1, 2.

as a result we have multiple IP stacks with the same IP and the same MAC address. this shows up as 2 ARP replies sent in response to a single ARP who-has request:

root@someserver:~# arping -c 1 somelxccontainer
42 bytes from 00:ff:1a:1e:ba:21 ( index=0 time=7.881 usec
42 bytes from 00:ff:1a:1e:ba:21 ( index=1 time=22.879 usec

--- statistics ---
1 packets transmitted, 2 packets received,   0% unanswered (1 extra)
rtt min/avg/max/std-dev = 0.008/0.015/0.023/0.007 ms
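
the check above can be scripted. this is only a sketch: count_arp_replies is a made-up helper name, and it assumes arping prints one "N bytes from ..." line per reply, as in the output shown above (other arping implementations format replies differently).

```shell
# count_arp_replies: read arping output on stdin and print the number of
# reply lines. assumes the "42 bytes from <mac> ..." format shown above.
count_arp_replies() {
    # grep -c exits non-zero when nothing matches; it still prints 0
    grep -c ' bytes from ' || true
}

# warn_if_duplicate_arp HOST: send one who-has request and complain
# when more than one reply comes back
warn_if_duplicate_arp() {
    replies=$(arping -c 1 "$1" | count_arp_replies)
    if [ "$replies" -gt 1 ]; then
        echo "duplicate ARP replies for $1: $replies"
    fi
}
```

usage would be something like warn_if_duplicate_arp somelxccontainer, run as root on the lxc host.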

once diagnosed, remediation is simple: either reboot the whole lxc server, or identify and delete the no-longer-needed vethXXXXX interfaces. the interfaces attached to the bridge can be listed with:

brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fe0023c26675       no              veth1KCRWX

how to find out which interface is orphaned?

  • within each container run cat /sys/class/net/eth0/iflink and write down the numerical interface id (this is the ifindex of the veth peer on the host side)
  • determine which veths are not referred to by any of the containers, by reading /sys/class/net/vethXXXXX/ifindex on the server hosting the lxc guests for each of the veths: for veth in $(ls -1 /sys/devices/virtual/net/br0/brif); do echo -n "$veth "; cat /sys/class/net/$veth/ifindex; done
  • cross-check both lists and find which veth does not belong to any lxc guest. then delete the no-longer-needed interface with ip link del vethXXXXXX
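
the cross-check from the steps above could be sketched roughly like this. list_orphan_veths is a hypothetical helper, not part of lxc; it takes the sysfs root (normally /sys), the bridge name, and the iflink values collected from the guests, and it only looks at bridge ports whose names start with veth.

```shell
# list_orphan_veths SYSFS_ROOT BRIDGE IFLINK...
# print every veth* port of BRIDGE whose ifindex matches none of the
# IFLINK values gathered from the guests
list_orphan_veths() {
    sysfs=$1
    bridge=$2
    shift 2
    for port in "$sysfs/devices/virtual/net/$bridge/brif"/veth*; do
        [ -e "$port" ] || continue          # glob matched nothing
        name=${port##*/}
        idx=$(cat "$sysfs/class/net/$name/ifindex") || continue
        used=no
        for link in "$@"; do
            [ "$link" = "$idx" ] && used=yes
        done
        [ "$used" = no ] && echo "$name"
    done
    return 0
}
```

for example, if the guests reported iflink values 7 and 12, list_orphan_veths /sys br0 7 12 prints the candidates for ip link del. the values still have to be collected from every running guest first (e.g. lxc-attach -n guest -- cat /sys/class/net/eth0/iflink, assuming lxc-attach works on your setup), and it is worth reviewing the list by hand before deleting anything, since a guest that was missed will show up as orphaned.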

how to avoid it? hopefully by explicitly naming network interfaces for guests created on the host – in /var/lib/lxc/guest/config: = veth0-nameOfOurGuest
