the idea: i’d like to run kvm/lxc on debian, have guests bridged to a couple of vlans and handle network failover at the host level. network failure should be detected using arp probes, not just the link [mii] status. after a few attempts i got it working in the test environment.
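to see which of the two monitors a running bond actually uses, you can read its `arp_interval` and `miimon` values from sysfs. both are intervals in milliseconds, and 0 means that monitor is off. below is a minimal sketch that assumes the standard `/sys/class/net/<bond>/bonding/` layout. the function name `bond_monitor` is hypothetical, and so is the `BOND_SYSFS` override, which exists only so the function can be pointed at a scratch directory:

```shell
#!/bin/sh
# Sketch: report which link monitor a bond is using, from its sysfs settings.
# BOND_SYSFS is a hypothetical override for testing; normally /sys/class/net.
BOND_SYSFS=${BOND_SYSFS:-/sys/class/net}

# usage: bond_monitor [BOND]
bond_monitor() {
    dir="$BOND_SYSFS/${1:-bond0}/bonding"
    read -r arp < "$dir/arp_interval" || return 2
    read -r mii < "$dir/miimon" || return 2
    # both values are intervals in ms; 0 means that monitor is disabled
    if [ "$arp" -gt 0 ]; then
        echo "arp every $arp ms"
    elif [ "$mii" -gt 0 ]; then
        echo "mii every $mii ms"
    else
        echo "none"
    fi
}
```

with the configuration below, `bond_monitor bond0` should report arp probing every 500 ms.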
i happily use interface bonding to achieve ha, with vlans on top of the bonded interface, and it works fine in production. we’ll soon put a few new servers into production. they’ll provide lxc/kvm hosting, and it would be nice if we could still have HA/failover and bridging. this or this describe how to set up a similar configuration, but in both examples failover is triggered by a link-down event, caused by pulling a network cable or by the death of a network interface or ethernet switch. unfortunately i’ve experienced switch hangs that left all ports in the ‘up’ state, so i prefer to rely on ‘empirical’ monitoring done with arp probes sent to known targets. after a bit of poking i’ve found that the following configuration works on debian wheezy with the 3.16.3-2~bpo70+1 kernel from backports, but not with the stock 3.2.60-1+deb7u3 image:
    auto lo
    iface lo inet loopback

    auto eth0
    iface eth0 inet manual
        pre-up ethtool -K eth0 tso off gso off gro off tx off rx off sg off rxvlan off txvlan off rxhash off

    auto eth1
    iface eth1 inet manual
        pre-up ethtool -K eth1 tso off gso off gro off tx off rx off sg off rxvlan off txvlan off rxhash off

    auto bond0
    iface bond0 inet manual
        bond-arp_interval 500
        bond-arp_ip_target 10.0.0.1
        bond-arp-validate all
        bond_mode active-backup
        bond-slaves eth0 eth1

    auto br.201
    iface br.201 inet static
        address 10.0.0.250/24
        gateway 10.0.0.1
        bridge_ports bond0.201
        vlan-raw-device bond0
        bridge_stp off

    auto br.203
    iface br.203 inet static
        address 10.0.1.250/24
        bridge_ports bond0.203
        vlan-raw-device bond0
        bridge_stp off
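for reference, the bond0 stanza mostly comes down to a few writes to the bonding driver's sysfs files. the sketch below is a rough, hypothetical equivalent and not what ifenslave literally runs. it assumes bond0 already exists (loading the bonding module creates it by default), is down and has no slaves yet. bringing links up and down is left out. `setup_arp_bond` and the `SYSFS` override are made-up names. the mode is written first because the kernel only accepts a mode change while the bond is down and has no slaves:

```shell
#!/bin/sh
# Rough sysfs equivalent of the bond0 stanza (sketch, not what ifenslave runs).
# SYSFS is a hypothetical override so the function can be pointed at a
# scratch directory; on a real host it stays /sys and must run as root.
SYSFS=${SYSFS:-/sys}

# usage: setup_arp_bond BOND TARGET_IP SLAVE...
setup_arp_bond() {
    bond=$1
    target=$2
    shift 2
    dir="$SYSFS/class/net/$bond/bonding"
    # mode first: it is only accepted while the bond is down and has no slaves
    echo active-backup > "$dir/mode" || return 1
    echo 500 > "$dir/arp_interval" || return 1         # ARP probe every 500 ms
    echo "+$target" > "$dir/arp_ip_target" || return 1 # add one probe target
    echo all > "$dir/arp_validate" || return 1         # validate on all slaves
    for slave in "$@"; do
        echo "+$slave" > "$dir/slaves" || return 1     # enslave in given order
    done
}
```

on a real host this would be called as root with `setup_arp_bond bond0 10.0.0.1 eth0 eth1`. writing `+eth0` to `slaves` enslaves eth0, and `-eth0` would release it.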
without the ethtool lines i was getting plenty of kernel errors:
    Oct 21 22:07:56 hostname kernel: [ 787.638975] WARNING: CPU: 3 PID: 2578 at /build/linux-nBoDV9/linux-3.16.3/net/core/dev.c:2246 skb_warn_bad_offload+0xc4/0xcd()
    Oct 21 22:07:56 hostname kernel: [ 787.638977] : caps=(0x0000000004197ba9, 0x000000801fdb78a9) len=1466 data_len=0 gso_size=1368 gso_type=1 ip_summed=3
    Oct 21 22:07:56 hostname kernel: [ 787.638978] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc bonding loop radeon ttm i5000_edac drm_kms_helper edac_core drm psmouse shpchp ipmi_si iTCO_wdt iTCO_vendor_support lpc_ich mfd_core coretemp i5k_amb i2c_algo_bit i2c_core dcdbas ipmi_msghandler joydev evdev serio_raw pcspkr rng_core processor tpm_tis tpm thermal_sys kvm button ext4 crc16 mbcache jbd2 dm_mod usb_storage hid_generic usbhid hid sg sd_mod ses crc_t10dif crct10dif_common enclosure sr_mod cdrom ata_generic ata_piix bnx2 uhci_hcd ehci_pci ehci_hcd libata megaraid_sas scsi_mod usbcore usb_common
    Oct 21 22:07:56 hostname kernel: [ 787.639010] CPU: 3 PID: 2578 Comm: sshd Tainted: G W 3.16-0.bpo.2-amd64 #1 Debian 3.16.3-2~bpo70+1
    Oct 21 22:07:56 hostname kernel: [ 787.639011] Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
    Oct 21 22:07:56 hostname kernel: [ 787.639012] 0000000000000000 ffffffff81773e70 ffffffff8153ff96 ffff88022536f858
    Oct 21 22:07:56 hostname kernel: [ 787.639014] ffffffff8106be4c ffff8802250f1ce8 ffff880225bb0000 0000000000000003
    Oct 21 22:07:56 hostname kernel: [ 787.639017] 00000000000010c9 0000000000000000 ffffffff8106bf3a ffffffff81773f00
    Oct 21 22:07:56 hostname kernel: [ 787.639019] Call Trace:
    Oct 21 22:07:56 hostname kernel: [ 787.639021] [<ffffffff8153ff96>] ? dump_stack+0x41/0x51
    Oct 21 22:07:56 hostname kernel: [ 787.639023] [<ffffffff8106be4c>] ? warn_slowpath_common+0x8c/0xc0
    Oct 21 22:07:56 hostname kernel: [ 787.639026] [<ffffffff8106bf3a>] ? warn_slowpath_fmt+0x4a/0x50
    Oct 21 22:07:56 hostname kernel: [ 787.639028] [<ffffffff812cc679>] ? ___ratelimit+0xa9/0x120
    Oct 21 22:07:56 hostname kernel: [ 787.639030] [<ffffffff81541709>] ? skb_warn_bad_offload+0xc4/0xcd
    Oct 21 22:07:56 hostname kernel: [ 787.639033] [<ffffffff81444c65>] ? skb_checksum_help+0x1a5/0x1c0
    Oct 21 22:07:56 hostname kernel: [ 787.639035] [<ffffffff8144a8fc>] ? dev_hard_start_xmit+0x4bc/0x5f0
    Oct 21 22:07:56 hostname kernel: [ 787.639038] [<ffffffff8144ad65>] ? __dev_queue_xmit+0x335/0x4d0
    Oct 21 22:07:56 hostname kernel: [ 787.639041] [<ffffffffa037b895>] ? vlan_dev_hard_start_xmit+0x95/0x120 [8021q]
    Oct 21 22:07:56 hostname kernel: [ 787.639043] [<ffffffff8144a77e>] ? dev_hard_start_xmit+0x33e/0x5f0
    Oct 21 22:07:56 hostname kernel: [ 787.639045] [<ffffffff8144ad65>] ? __dev_queue_xmit+0x335/0x4d0
    Oct 21 22:07:56 hostname kernel: [ 787.639049] [<ffffffffa0584c99>] ? br_dev_queue_push_xmit+0x79/0xa0 [bridge]
    Oct 21 22:07:56 hostname kernel: [ 787.639052] [<ffffffffa0582835>] ? br_dev_xmit+0x1d5/0x280 [bridge]
    Oct 21 22:07:56 hostname kernel: [ 787.639055] [<ffffffff8144a77e>] ? dev_hard_start_xmit+0x33e/0x5f0
    Oct 21 22:07:56 hostname kernel: [ 787.639057] [<ffffffff81486110>] ? ip_forward_options+0x210/0x210
    Oct 21 22:07:56 hostname kernel: [ 787.639060] [<ffffffff8144ad65>] ? __dev_queue_xmit+0x335/0x4d0
    Oct 21 22:07:56 hostname kernel: [ 787.639062] [<ffffffff81488029>] ? ip_finish_output+0x4b9/0x8d0
    Oct 21 22:07:56 hostname kernel: [ 787.639064] [<ffffffff8148864a>] ? ip_queue_xmit+0x12a/0x3b0
    Oct 21 22:07:56 hostname kernel: [ 787.639067] [<ffffffff8149f30e>] ? tcp_transmit_skb+0x41e/0x900
    Oct 21 22:07:56 hostname kernel: [ 787.639069] [<ffffffff814a0330>] ? tcp_write_xmit+0x140/0xc40
    Oct 21 22:07:56 hostname kernel: [ 787.639071] [<ffffffff814a0e9a>] ? __tcp_push_pending_frames+0x2a/0xc0
    Oct 21 22:07:56 hostname kernel: [ 787.639073] [<ffffffff81492051>] ? tcp_sendmsg+0xc1/0xcc0
    Oct 21 22:07:56 hostname kernel: [ 787.639076] [<ffffffff8142f1fe>] ? sock_aio_write+0xfe/0x120
    Oct 21 22:07:56 hostname kernel: [ 787.639078] [<ffffffff81388e27>] ? tty_ioctl+0x327/0xba0
    Oct 21 22:07:56 hostname kernel: [ 787.639081] [<ffffffff811b9e5f>] ? do_sync_write+0x5f/0x90
    Oct 21 22:07:56 hostname kernel: [ 787.639083] [<ffffffff811bac35>] ? vfs_write+0x1b5/0x1f0
    Oct 21 22:07:56 hostname kernel: [ 787.639085] [<ffffffff811bb050>] ? SyS_write+0x50/0xb0
    Oct 21 22:07:56 hostname kernel: [ 787.639087] [<ffffffff8154646d>] ? system_call_fast_compare_end+0x10/0x15
    Oct 21 22:07:56 hostname kernel: [ 787.639089] ---[ end trace a2b7546b39be7b76 ]---
it seems to be the same issue as the one described at https://bugzilla.kernel.org/show_bug.cgi?id=82471, and it can be mitigated by disabling [at least] scatter-gather offloading.
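if scatter-gather really is the minimum, the long ethtool lines could probably be cut down to `sg off`. the kernel also drops features that depend on sg, such as tso. whether that alone is enough on a given nic and kernel still needs testing. below is a sketch that does this for several nics and skips any where sg is already off. `disable_sg` and `sg_state` are hypothetical helper names. the check assumes the `scatter-gather: on|off` line that `ethtool -k` prints:

```shell
#!/bin/sh
# Sketch: switch off only scatter-gather offload on the given NICs.

# Print the top-level scatter-gather state ("on" or "off") from `ethtool -k`
# output read on stdin. Indented sub-features such as "tx-scatter-gather"
# have a different first field and are ignored.
sg_state() {
    awk -F': ' '$1 == "scatter-gather" { split($2, v, " "); print v[1]; exit }'
}

# usage: disable_sg IFACE...
disable_sg() {
    for ifc in "$@"; do
        if [ "$(ethtool -k "$ifc" | sg_state)" = "on" ]; then
            ethtool -K "$ifc" sg off || return 1
        fi
    done
}
```

in /etc/network/interfaces the equivalent per-nic line would be `pre-up ethtool -K eth0 sg off`.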
we’ll test this setup for a few more weeks and put it into production if all works fine.
2017-08-20
the setup above worked fine for me for the past 3 years. after the upgrade to debian stretch it stopped working: despite bond_mode active-backup in /etc/network/interfaces, bond0 was configured as round robin. it looks like this bug report describes a similar situation.
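bond0 silently came up in round robin mode, so a boot-time sanity check costs very little. below is a minimal sketch that reads the mode from sysfs. the file holds the mode name and its number, e.g. `active-backup 1` or `balance-rr 0`. `check_bond_mode` and the `BOND_SYSFS` override are hypothetical names. the sketch only detects the problem and does not fix it:

```shell
#!/bin/sh
# Sketch: fail loudly if a bond is not running in active-backup mode.
# BOND_SYSFS is a hypothetical override for testing; normally /sys/class/net.
BOND_SYSFS=${BOND_SYSFS:-/sys/class/net}

# usage: check_bond_mode [BOND]   (0 if active-backup, 1 if not, 2 if unreadable)
check_bond_mode() {
    bond=${1:-bond0}
    # the file holds the mode name and its number, e.g. "active-backup 1"
    read -r mode num < "$BOND_SYSFS/$bond/bonding/mode" || return 2
    if [ "$mode" != "active-backup" ]; then
        echo "$bond: expected active-backup, found $mode ($num)" >&2
        return 1
    fi
}
```

it could run from a boot script or a monitoring job, so a regression like this one shows up before the first real failover.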
after some experimentation i got it working.
Thanks! I was looking for this! The rxvlan, txvlan and rxhash settings did not need to change on my end, but ymmv.
you’re welcome.
i’d love to get an active-active setup [where more than one network card is used in the ‘normal state’] together with arp-based failover [as opposed to just mii-based], with vlans and bridges on top of it. if you have any suggestions on how to get that done, i’m eager to hear them.
We never really got that working on the switches we have (Ciscos). I think your best bet here is to go for switches that support 802.3ad (LACP) and use that.
thanks. ideally i’d like it to be ‘switch agnostic’ [e.g. working even with ‘dumb switches’]. fortunately, extra bandwidth beyond a single 1 gbit/s link is nice to have but not essential in my case.