{"id":3231,"date":"2021-08-17T21:09:10","date_gmt":"2021-08-17T20:09:10","guid":{"rendered":"https:\/\/kudzia.eu\/b\/?p=3231"},"modified":"2021-08-20T18:47:16","modified_gmt":"2021-08-20T17:47:16","slug":"suddenly-in-kernel-communication-between-lxc-containers-gets-erratic","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2021\/08\/suddenly-in-kernel-communication-between-lxc-containers-gets-erratic\/","title":{"rendered":"suddenly in-kernel communication between lxc containers gets erratic"},"content":{"rendered":"<p>we&#8217;re using LXC containers to host multiple workloads on the same physical servers &#8211; e.g. a few instances of database servers running side-by-side.<\/p>\n<p>once in a while we end up with a strange situation where tcp connections between containers running on the same physical server get torn down abruptly. in our case this manifested e.g. in errors like: <i>Caused by: java.net.SocketException: Connection reset<\/i> or <i>The last packet successfully received from the server was 415 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure<\/i>.<\/p>\n<p>it seems there&#8217;s a random issue where a container&#8217;s network stack is not properly cleaned up after a container reboot or shutdown. mentions of similar cases: <a href=\"https:\/\/discuss.linuxcontainers.org\/t\/vethxxxxx-interfaces-are-not-removed-when-lxc-container-is-stopped\/4816\">1<\/a>, <a href=\"https:\/\/marc.info\/?t=158523586000001&#038;r=1&#038;w=2\">2<\/a>.<\/p>\n<p>as a result we end up with multiple IP stacks having the same IP and the same MAC address. 
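<\/p>\n<p>a quick way to spot affected addresses is to sweep the containers&#8217; IPs and count the ARP replies per probe &#8211; more than one reply to a single who-has suggests a leftover network stack is still answering. a minimal sketch, assuming iputils arping, bridge br0 and a made-up IP list &#8211; adjust both to your setup:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n#!\/bin\/sh\r\n\r\n# count the reply lines (NN bytes from ...) in arping output read from stdin\r\ncount_arp_replies() {\r\n  grep -c 'bytes from'\r\n}\r\n\r\n# hypothetical list - replace with your containers' addresses\r\nfor ip in 172.16.1.21 172.16.1.22; do\r\n  replies=$(arping -c 1 -I br0 \"$ip\" 2&gt;\/dev\/null | count_arp_replies)\r\n  if [ \"$replies\" -gt 1 ]; then\r\n    echo \"$ip sent $replies replies - possible leftover network stack\"\r\n  fi\r\ndone\r\n<\/pre>\n<p>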
this manifests as two ARP replies to a single ARP who-has request:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nroot@someserver:~# arping -c 1 somelxccontainer\r\nARPING 172.16.1.21\r\n42 bytes from 00:ff:1a:1e:ba:21 (172.16.1.21): index=0 time=7.881 usec\r\n42 bytes from 00:ff:1a:1e:ba:21 (172.16.1.21): index=1 time=22.879 usec\r\n\r\n--- 172.16.1.21 statistics ---\r\n1 packets transmitted, 2 packets received,   0% unanswered (1 extra)\r\nrtt min\/avg\/max\/std-dev = 0.008\/0.015\/0.023\/0.007 ms\r\n<\/pre>\n<p>once diagnosed, remediation is simple: either reboot the whole lxc server or identify and delete the no-longer-needed vethXXXXX interfaces.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nbrctl show\r\nbridge name     bridge id               STP enabled     interfaces\r\nbr0             8000.fe0023c26675       no              veth1KCRWX\r\n                                                        veth7G72IS\r\n                                                        vethBH3Q2G\r\n                                                        vethBMW0L7\r\n                                                        vethEUGFKO\r\n                                                        vethGRH1MH\r\n                                                        vethJ7RPXB\r\n                                                        vethJGTBAH\r\n                                                        vethPE6HWW\r\n                                                        vethPUKEU5\r\n                                                        vethTCVY3G\r\n                                                        vethVH9TQH\r\n<\/pre>\n<p>how to find out which interface is orphaned?<\/p>\n<ul>\n<li>within each container run <i>cat \/sys\/class\/net\/eth0\/iflink<\/i> and write down the numerical interface id<\/li>\n<li>determine which veths are not referenced by any of the containers &#8211; by reading <i>\/sys\/class\/net\/vethXXXXX\/ifindex<\/i> on the 
server hosting the lxc guests, for each of the veths: <i>for veth in $(ls -1 \/sys\/devices\/virtual\/net\/br0\/brif); do echo -n \"$veth \"; cat \/sys\/class\/net\/$veth\/ifindex; done<\/i><\/li>\n<li>cross-check both lists to find which veth does not belong to any lxc guest, then delete the no-longer-needed interface with <i>ip link del vethXXXXX<\/i><\/li>\n<\/ul>\n<p>how to avoid it? hopefully by explicitly naming the network interfaces of guests created on the host &#8211; in <i>\/var\/lib\/lxc\/guest\/config<\/i>:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nlxc.net.1.veth.pair = veth0-nameOfOurGuest\r\n<\/pre>\n<p>resources that were helpful:<\/p>\n<ul>\n<li>https:\/\/superuser.com\/a\/1183520\/1674<\/li>\n<li>https:\/\/lxc-users.linuxcontainers.narkive.com\/EaQUUpe8\/determine-which-veth-interface-belongs-to-which-container<\/li>\n<li>https:\/\/superuser.com\/questions\/1183454\/finding-out-the-veth-interface-of-a-docker-container<\/li>\n<li><a href=\"https:\/\/www.mail-archive.com\/lxc-users@lists.sourceforge.net\/msg05316.html\">https:\/\/www.mail-archive.com\/lxc-users@lists.sourceforge.net\/msg05316.html<\/a><\/li>\n<li>https:\/\/serverfault.com\/questions\/765789\/where-are-network-namespaces-in-lxc-lxd<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>we&#8217;re using LXC containers to host multiple workloads on the same physical servers. e.g. a few instances of database servers running side-by-side. once in a while we end up with a strange situation where tcp connections between containers running on the same physical server get torn down abruptly. in our case &#8211; this manifested e.g. 
by those [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,51],"tags":[47,85],"class_list":["post-3231","post","type-post","status-publish","format-standard","hentry","category-tech","category-unimportant","tag-linux-networking","tag-lxc"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3231","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=3231"}],"version-history":[{"count":7,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3231\/revisions"}],"predecessor-version":[{"id":3240,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3231\/revisions\/3240"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=3231"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/categories?post=3231"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/tags?post=3231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}