{"id":2368,"date":"2014-10-23T15:07:36","date_gmt":"2014-10-23T14:07:36","guid":{"rendered":"http:\/\/kudzia.eu\/b\/?p=2368"},"modified":"2017-08-21T17:49:24","modified_gmt":"2017-08-21T16:49:24","slug":"bridge-on-vlans-on-active-backup-bonding-under-debian","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2014\/10\/bridge-on-vlans-on-active-backup-bonding-under-debian\/","title":{"rendered":"bridge on vlans on active-backup bonding under debian"},"content":{"rendered":"<p>the idea: i&#8217;d like to run kvm\/lxc on debian, have guests bridged to a couple of vlans and handle network failover at the host level. network failure should be detected using arp probes, not just the link [ mii ] status. after a few attempts i got it working in the test environment.<br \/>\n<!--more--><br \/>\ni happily use <a href=\"https:\/\/kudzia.eu\/b\/2009\/12\/interface-teaming-bonding-vland-under-linux-debian\/\" title=\"interface teaming \/ bonding + vlan under linux \/ debian\">interface bonding to achieve ha and vlans on the bonded interface<\/a> &#8211; it works fine in production. we&#8217;ll soon put a few new servers into production; they&#8217;ll provide lxc\/kvm hosting, and it would be nice to keep HA\/failover while also bridging guests. <a href=\"http:\/\/vk5fj.blogspot.se\/2012\/04\/vm-on-kvm-on-vlan-on-bridge-interface.html\">this<\/a> or <a href=\"http:\/\/forum.proxmox.com\/threads\/848-Bonding-Bridging-and-vLANS?p=4367#post4367\">this<\/a> describe how to set up a similar configuration, but in both examples failover is triggered by a link-down event &#8211; caused by pulling a network cable or by the death of a network interface or ethernet switch. unfortunately i&#8217;ve experienced switch hangs that left all ports in the &#8216;up&#8217; state, so i prefer to rely on &#8216;empirical&#8217; monitoring done with the help of arp probes sent to known targets. 
after a bit of poking i&#8217;ve found that the following configuration works on debian wheezy with 3.16.3-2~bpo70+1 kernel from backports, but not using stock 3.2.60-1+deb7u3 image:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nauto lo\r\niface lo inet loopback\r\n\r\nauto eth0\r\niface eth0 inet manual\r\n    pre-up ethtool -K eth0 tso off gso off gro off tx off rx off sg off rxvlan off txvlan off rxhash off\r\n\r\nauto eth1\r\niface eth1 inet manual\r\n    pre-up ethtool -K eth1 tso off gso off gro off tx off rx off sg off rxvlan off txvlan off rxhash off\r\n\r\nauto bond0\r\niface bond0 inet manual\r\n    bond-arp_interval 500\r\n    bond-arp_ip_target 10.0.0.1\r\n    bond-arp-validate all\r\n    bond_mode active-backup\r\n    bond-slaves eth0 eth1\r\n\r\nauto br.201\r\niface br.201 inet static\r\n    address 10.0.0.250\/24\r\n    gateway 10.0.0.1\r\n    bridge_ports bond0.201\r\n    vlan-raw-device bond0\r\n    bridge_stp off\r\n\r\nauto br.203\r\niface br.203 inet static\r\n    address 10.0.1.250\/24\r\n    bridge_ports bond0.203\r\n    vlan-raw-device bond0\r\n    bridge_stp off\r\n<\/pre>\n<p>without the ethtool lines i was getting plenty of kernel errors:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.638975] WARNING: CPU: 3 PID: 2578 at \/build\/linux-nBoDV9\/linux-3.16.3\/net\/core\/dev.c:2246 skb_warn_bad_offload+0xc4\/0xcd()\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.638977] : caps=(0x0000000004197ba9, 0x000000801fdb78a9) len=1466 data_len=0 gso_size=1368 gso_type=1 ip_summed=3\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.638978] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc bonding loop radeon ttm i5000_edac drm_kms_helper edac_core d\r\nrm psmouse shpchp ipmi_si iTCO_wdt iTCO_vendor_support lpc_ich mfd_core coretemp i5k_amb i2c_algo_bit i2c_core dcdbas 
ipmi_msghandler joydev evdev serio_raw pcspkr rng_core processor tpm_tis tpm thermal_sys kvm button ext4 crc16 mbcache j\r\nbd2 dm_mod usb_storage hid_generic usbhid hid sg sd_mod ses crc_t10dif crct10dif_common enclosure sr_mod cdrom ata_generic ata_piix bnx2 uhci_hcd ehci_pci ehci_hcd libata megaraid_sas scsi_mod usbcore usb_common\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639010] CPU: 3 PID: 2578 Comm: sshd Tainted: G        W     3.16-0.bpo.2-amd64 #1 Debian 3.16.3-2~bpo70+1\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639011] Hardware name: Dell Inc. PowerEdge 1950\/0H723K, BIOS 2.7.0 10\/30\/2010\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639012]  0000000000000000 ffffffff81773e70 ffffffff8153ff96 ffff88022536f858\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639014]  ffffffff8106be4c ffff8802250f1ce8 ffff880225bb0000 0000000000000003\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639017]  00000000000010c9 0000000000000000 ffffffff8106bf3a ffffffff81773f00\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639019] Call Trace:\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639021]  &#x5B;&lt;ffffffff8153ff96&gt;] ? dump_stack+0x41\/0x51\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639023]  &#x5B;&lt;ffffffff8106be4c&gt;] ? warn_slowpath_common+0x8c\/0xc0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639026]  &#x5B;&lt;ffffffff8106bf3a&gt;] ? warn_slowpath_fmt+0x4a\/0x50\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639028]  &#x5B;&lt;ffffffff812cc679&gt;] ? ___ratelimit+0xa9\/0x120\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639030]  &#x5B;&lt;ffffffff81541709&gt;] ? skb_warn_bad_offload+0xc4\/0xcd\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639033]  &#x5B;&lt;ffffffff81444c65&gt;] ? skb_checksum_help+0x1a5\/0x1c0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639035]  &#x5B;&lt;ffffffff8144a8fc&gt;] ? dev_hard_start_xmit+0x4bc\/0x5f0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639038]  &#x5B;&lt;ffffffff8144ad65&gt;] ? 
__dev_queue_xmit+0x335\/0x4d0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639041]  &#x5B;&lt;ffffffffa037b895&gt;] ? vlan_dev_hard_start_xmit+0x95\/0x120 &#x5B;8021q]\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639043]  &#x5B;&lt;ffffffff8144a77e&gt;] ? dev_hard_start_xmit+0x33e\/0x5f0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639045]  &#x5B;&lt;ffffffff8144ad65&gt;] ? __dev_queue_xmit+0x335\/0x4d0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639049]  &#x5B;&lt;ffffffffa0584c99&gt;] ? br_dev_queue_push_xmit+0x79\/0xa0 &#x5B;bridge]\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639052]  &#x5B;&lt;ffffffffa0582835&gt;] ? br_dev_xmit+0x1d5\/0x280 &#x5B;bridge]\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639055]  &#x5B;&lt;ffffffff8144a77e&gt;] ? dev_hard_start_xmit+0x33e\/0x5f0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639057]  &#x5B;&lt;ffffffff81486110&gt;] ? ip_forward_options+0x210\/0x210\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639060]  &#x5B;&lt;ffffffff8144ad65&gt;] ? __dev_queue_xmit+0x335\/0x4d0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639062]  &#x5B;&lt;ffffffff81488029&gt;] ? ip_finish_output+0x4b9\/0x8d0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639064]  &#x5B;&lt;ffffffff8148864a&gt;] ? ip_queue_xmit+0x12a\/0x3b0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639067]  &#x5B;&lt;ffffffff8149f30e&gt;] ? tcp_transmit_skb+0x41e\/0x900\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639069]  &#x5B;&lt;ffffffff814a0330&gt;] ? tcp_write_xmit+0x140\/0xc40\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639071]  &#x5B;&lt;ffffffff814a0e9a&gt;] ? __tcp_push_pending_frames+0x2a\/0xc0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639073]  &#x5B;&lt;ffffffff81492051&gt;] ? tcp_sendmsg+0xc1\/0xcc0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639076]  &#x5B;&lt;ffffffff8142f1fe&gt;] ? sock_aio_write+0xfe\/0x120\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639078]  &#x5B;&lt;ffffffff81388e27&gt;] ? 
tty_ioctl+0x327\/0xba0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639081]  &#x5B;&lt;ffffffff811b9e5f&gt;] ? do_sync_write+0x5f\/0x90\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639083]  &#x5B;&lt;ffffffff811bac35&gt;] ? vfs_write+0x1b5\/0x1f0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639085]  &#x5B;&lt;ffffffff811bb050&gt;] ? SyS_write+0x50\/0xb0\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639087]  &#x5B;&lt;ffffffff8154646d&gt;] ? system_call_fast_compare_end+0x10\/0x15\r\nOct 21 22:07:56 hostname kernel: &#x5B;  787.639089] ---&#x5B; end trace a2b7546b39be7b76 ]---\r\n<\/pre>\n<p>it seems to be the same issue as described at https:\/\/bugzilla.kernel.org\/show_bug.cgi?id=82471 and can be mitigated by disabling [at least] scatter-gather offloading &#8211; the <i>sg off<\/i> flag in the ethtool lines above.<\/p>\n<p>we&#8217;ll test this setup for a few more weeks and put it into production if all works fine.<\/p>\n<p><b>2017-08-20<\/b><\/p>\n<p>the setup above worked fine for me for the past 3 years. after the upgrade to debian stretch it stopped working &#8211; despite <i>bond_mode active-backup<\/i> in \/etc\/network\/interfaces, bond0 was configured as round robin. looks like <a href=\"https:\/\/bugs.debian.org\/cgi-bin\/bugreport.cgi?bug=870633\">this<\/a> bug report describes a similar situation.<\/p>\n<p>after some experimentation i <a href=\"\/b\/2017\/08\/bridge-on-vlans-on-active-backup-bonding-under-debian-stretch\/\">got it working<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>the idea: i&#8217;d like to run kvm\/lxc on debian, have guests bridged to a couple of vlans and handle network failover at the host level. network failure should be detected using arp probes, not just the link [ mii ] status. 
after a few attempts i got it working in the test environment.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[47],"class_list":["post-2368","post","type-post","status-publish","format-standard","hentry","category-tech","tag-linux-networking"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/2368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=2368"}],"version-history":[{"count":9,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/2368\/revisions"}],"predecessor-version":[{"id":2779,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/2368\/revisions\/2779"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=2368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/categories?post=2368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/tags?post=2368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}