bridge on vlans on active-backup bonding under debian

the idea: i’d like to run kvm/lxc on debian, have the guests bridged to a couple of vlans and handle network failover at the host level. network failure should be detected using arp probes, not just the link [ mii ] status. after a few attempts i got it working in a test environment.

i happily use interface bonding to achieve ha, with vlans on the bonded interface, and it works fine in production. we’ll soon put a few new servers into production. they’ll provide lxc/kvm hosting, and it would be nice to keep ha/failover while also making bridging possible. this or this guide describes how to set up a similar configuration, but in both examples failover is triggered by a link-down event, caused by pulling a network cable or by the death of a network interface or an ethernet switch. unfortunately i’ve seen switches hang and leave all ports in the ‘up’ state, so i prefer to rely on ‘empirical’ monitoring done with arp probes sent to known targets. after a bit of poking i’ve found that the following configuration works on debian wheezy with the 3.16.3-2~bpo70+1 kernel from backports, but not with the stock 3.2.60-1+deb7u3 image:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    pre-up ethtool -K eth0 tso off gso off gro off tx off rx off sg off rxvlan off txvlan off rxhash off

auto eth1
iface eth1 inet manual
    pre-up ethtool -K eth1 tso off gso off gro off tx off rx off sg off rxvlan off txvlan off rxhash off

auto bond0
iface bond0 inet manual
    bond-arp_interval 500
    bond-arp_ip_target 10.0.0.1
    bond-arp-validate all
    bond_mode active-backup
    bond-slaves eth0 eth1

auto br.201
iface br.201 inet static
    address 10.0.0.250/24
    gateway 10.0.0.1
    bridge_ports bond0.201
    vlan-raw-device bond0
    bridge_stp off

auto br.203
iface br.203 inet static
    address 10.0.1.250/24
    bridge_ports bond0.203
    vlan-raw-device bond0
    bridge_stp off
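
the two br.* stanzas above follow one pattern: a bridge whose only port is the vlan sub-interface bond0.<vid>. if more vlans are needed later, a small generator keeps the stanzas consistent. this is a minimal sketch and not part of the original setup. the VlanBridge structure and the bridge_stanza/render helpers are hypothetical names. the script only prints text for /etc/network/interfaces and does not touch the system.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class VlanBridge:
    """One bridge on top of a vlan sub-interface of the bond (hypothetical helper)."""
    vid: int                       # 802.1q vlan id, 1..4094
    address: str                   # host address in cidr form, e.g. "10.0.1.250/24"
    gateway: Optional[str] = None  # the default gateway belongs on one bridge only


def bridge_stanza(b: VlanBridge, bond: str = "bond0") -> str:
    """Render an ifupdown stanza in the same layout as br.201 / br.203."""
    if not 1 <= b.vid <= 4094:
        raise ValueError(f"invalid vlan id {b.vid}")
    ipaddress.ip_interface(b.address)  # raises ValueError on a malformed address
    lines = [
        f"auto br.{b.vid}",
        f"iface br.{b.vid} inet static",
        f"    address {b.address}",
    ]
    if b.gateway:
        lines.append(f"    gateway {b.gateway}")
    lines += [
        f"    bridge_ports {bond}.{b.vid}",  # the vlan sub-interface of the bond
        f"    vlan-raw-device {bond}",
        "    bridge_stp off",
    ]
    return "\n".join(lines) + "\n"


def render(bridges, bond: str = "bond0") -> str:
    """Render all stanzas separated by blank lines."""
    if sum(1 for b in bridges if b.gateway) > 1:
        raise ValueError("only one bridge should carry the default gateway")
    return "\n".join(bridge_stanza(b, bond) for b in bridges)


if __name__ == "__main__":
    print(render([
        VlanBridge(201, "10.0.0.250/24", gateway="10.0.0.1"),
        VlanBridge(203, "10.0.1.250/24"),
    ]))
```

run as-is, it prints the br.201 and br.203 stanzas in the same layout as the config above. adding a vlan is then a one-line change in the list.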

without the ethtool lines i was getting plenty of kernel warnings like this one:

Oct 21 22:07:56 hostname kernel: [  787.638975] WARNING: CPU: 3 PID: 2578 at /build/linux-nBoDV9/linux-3.16.3/net/core/dev.c:2246 skb_warn_bad_offload+0xc4/0xcd()
Oct 21 22:07:56 hostname kernel: [  787.638977] : caps=(0x0000000004197ba9, 0x000000801fdb78a9) len=1466 data_len=0 gso_size=1368 gso_type=1 ip_summed=3
Oct 21 22:07:56 hostname kernel: [  787.638978] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc bonding loop radeon ttm i5000_edac drm_kms_helper edac_core drm psmouse shpchp ipmi_si iTCO_wdt iTCO_vendor_support lpc_ich mfd_core coretemp i5k_amb i2c_algo_bit i2c_core dcdbas ipmi_msghandler joydev evdev serio_raw pcspkr rng_core processor tpm_tis tpm thermal_sys kvm button ext4 crc16 mbcache jbd2 dm_mod usb_storage hid_generic usbhid hid sg sd_mod ses crc_t10dif crct10dif_common enclosure sr_mod cdrom ata_generic ata_piix bnx2 uhci_hcd ehci_pci ehci_hcd libata megaraid_sas scsi_mod usbcore usb_common
Oct 21 22:07:56 hostname kernel: [  787.639010] CPU: 3 PID: 2578 Comm: sshd Tainted: G        W     3.16-0.bpo.2-amd64 #1 Debian 3.16.3-2~bpo70+1
Oct 21 22:07:56 hostname kernel: [  787.639011] Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
Oct 21 22:07:56 hostname kernel: [  787.639012]  0000000000000000 ffffffff81773e70 ffffffff8153ff96 ffff88022536f858
Oct 21 22:07:56 hostname kernel: [  787.639014]  ffffffff8106be4c ffff8802250f1ce8 ffff880225bb0000 0000000000000003
Oct 21 22:07:56 hostname kernel: [  787.639017]  00000000000010c9 0000000000000000 ffffffff8106bf3a ffffffff81773f00
Oct 21 22:07:56 hostname kernel: [  787.639019] Call Trace:
Oct 21 22:07:56 hostname kernel: [  787.639021]  [<ffffffff8153ff96>] ? dump_stack+0x41/0x51
Oct 21 22:07:56 hostname kernel: [  787.639023]  [<ffffffff8106be4c>] ? warn_slowpath_common+0x8c/0xc0
Oct 21 22:07:56 hostname kernel: [  787.639026]  [<ffffffff8106bf3a>] ? warn_slowpath_fmt+0x4a/0x50
Oct 21 22:07:56 hostname kernel: [  787.639028]  [<ffffffff812cc679>] ? ___ratelimit+0xa9/0x120
Oct 21 22:07:56 hostname kernel: [  787.639030]  [<ffffffff81541709>] ? skb_warn_bad_offload+0xc4/0xcd
Oct 21 22:07:56 hostname kernel: [  787.639033]  [<ffffffff81444c65>] ? skb_checksum_help+0x1a5/0x1c0
Oct 21 22:07:56 hostname kernel: [  787.639035]  [<ffffffff8144a8fc>] ? dev_hard_start_xmit+0x4bc/0x5f0
Oct 21 22:07:56 hostname kernel: [  787.639038]  [<ffffffff8144ad65>] ? __dev_queue_xmit+0x335/0x4d0
Oct 21 22:07:56 hostname kernel: [  787.639041]  [<ffffffffa037b895>] ? vlan_dev_hard_start_xmit+0x95/0x120 [8021q]
Oct 21 22:07:56 hostname kernel: [  787.639043]  [<ffffffff8144a77e>] ? dev_hard_start_xmit+0x33e/0x5f0
Oct 21 22:07:56 hostname kernel: [  787.639045]  [<ffffffff8144ad65>] ? __dev_queue_xmit+0x335/0x4d0
Oct 21 22:07:56 hostname kernel: [  787.639049]  [<ffffffffa0584c99>] ? br_dev_queue_push_xmit+0x79/0xa0 [bridge]
Oct 21 22:07:56 hostname kernel: [  787.639052]  [<ffffffffa0582835>] ? br_dev_xmit+0x1d5/0x280 [bridge]
Oct 21 22:07:56 hostname kernel: [  787.639055]  [<ffffffff8144a77e>] ? dev_hard_start_xmit+0x33e/0x5f0
Oct 21 22:07:56 hostname kernel: [  787.639057]  [<ffffffff81486110>] ? ip_forward_options+0x210/0x210
Oct 21 22:07:56 hostname kernel: [  787.639060]  [<ffffffff8144ad65>] ? __dev_queue_xmit+0x335/0x4d0
Oct 21 22:07:56 hostname kernel: [  787.639062]  [<ffffffff81488029>] ? ip_finish_output+0x4b9/0x8d0
Oct 21 22:07:56 hostname kernel: [  787.639064]  [<ffffffff8148864a>] ? ip_queue_xmit+0x12a/0x3b0
Oct 21 22:07:56 hostname kernel: [  787.639067]  [<ffffffff8149f30e>] ? tcp_transmit_skb+0x41e/0x900
Oct 21 22:07:56 hostname kernel: [  787.639069]  [<ffffffff814a0330>] ? tcp_write_xmit+0x140/0xc40
Oct 21 22:07:56 hostname kernel: [  787.639071]  [<ffffffff814a0e9a>] ? __tcp_push_pending_frames+0x2a/0xc0
Oct 21 22:07:56 hostname kernel: [  787.639073]  [<ffffffff81492051>] ? tcp_sendmsg+0xc1/0xcc0
Oct 21 22:07:56 hostname kernel: [  787.639076]  [<ffffffff8142f1fe>] ? sock_aio_write+0xfe/0x120
Oct 21 22:07:56 hostname kernel: [  787.639078]  [<ffffffff81388e27>] ? tty_ioctl+0x327/0xba0
Oct 21 22:07:56 hostname kernel: [  787.639081]  [<ffffffff811b9e5f>] ? do_sync_write+0x5f/0x90
Oct 21 22:07:56 hostname kernel: [  787.639083]  [<ffffffff811bac35>] ? vfs_write+0x1b5/0x1f0
Oct 21 22:07:56 hostname kernel: [  787.639085]  [<ffffffff811bb050>] ? SyS_write+0x50/0xb0
Oct 21 22:07:56 hostname kernel: [  787.639087]  [<ffffffff8154646d>] ? system_call_fast_compare_end+0x10/0x15
Oct 21 22:07:56 hostname kernel: [  787.639089] ---[ end trace a2b7546b39be7b76 ]---

it seems to be the same issue as the one described at https://bugzilla.kernel.org/show_bug.cgi?id=82471, and it can be mitigated by disabling [at least] scatter-gather offloading.
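
because the bug report points at scatter-gather offloading, it is worth confirming after boot that the pre-up ethtool lines actually took effect on both slaves. this is a minimal sketch. it assumes the long feature names that `ethtool -k` prints (for example "scatter-gather: off" or "large-receive-offload: off [fixed]"). check them against the output of your ethtool version. the FEATURES table and the function names are my own, not part of ethtool.

```python
import shutil
import subprocess

# short names used with `ethtool -K` in the pre-up lines, mapped to the long
# names printed by `ethtool -k` (assumed mapping; verify on your ethtool version)
FEATURES = {
    "sg": "scatter-gather",
    "tso": "tcp-segmentation-offload",
    "gso": "generic-segmentation-offload",
    "gro": "generic-receive-offload",
    "rx": "rx-checksumming",
    "tx": "tx-checksumming",
    "rxvlan": "rx-vlan-offload",
    "txvlan": "tx-vlan-offload",
    "rxhash": "receive-hashing",
}


def parse_features(text: str) -> dict:
    """Map feature name -> 'on'/'off' from `ethtool -k` output.

    Lines look like 'scatter-gather: off' or 'large-receive-offload: off [fixed]';
    the 'Features for ethX:' header has no value after the colon and is skipped.
    """
    state = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        words = value.split()
        if sep and words:
            state[name] = words[0]
    return state


def offloads_still_on(text: str) -> list:
    """Return the short names of features that should be off but are still on."""
    state = parse_features(text)
    return sorted(short for short, long_name in FEATURES.items()
                  if state.get(long_name) == "on")


if __name__ == "__main__" and shutil.which("ethtool"):
    for iface in ("eth0", "eth1"):
        res = subprocess.run(["ethtool", "-k", iface],
                             capture_output=True, text=True, check=False)
        if res.returncode == 0:
            print(iface, "still on:", offloads_still_on(res.stdout) or "none")
```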

we’ll test this setup for a few more weeks and put it into production if all works fine.
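
during such a test period, one quick way to see whether the skb_warn_bad_offload warnings come back is to count them per day in the kernel log. this is a minimal sketch. it assumes the syslog line layout of the trace above (month, day, time, host, "kernel:"). it also assumes /var/log/kern.log as the default path, which is the usual debian location but may differ on your system. a different path can be passed as the first argument.

```python
import re
import sys
from collections import Counter

# matches lines like:
# "Oct 21 22:07:56 hostname kernel: [  787.638975] WARNING: CPU: 3 PID: 2578 at .../dev.c:2246 skb_warn_bad_offload+0xc4/0xcd()"
WARN_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) \S+ \S+ kernel: .*"
    r"WARNING: CPU: \d+ PID: \d+ at \S+ (?P<func>\w+)\+0x"
)


def warnings_per_day(lines, func="skb_warn_bad_offload"):
    """Count kernel WARNING lines raised in `func`, keyed by 'Mon DD'.

    Keys keep the order in which days first appear in the log.
    """
    counts = Counter()
    for line in lines:
        m = WARN_RE.match(line)
        if m and m.group("func") == func:
            counts[f"{m.group('month')} {int(m.group('day'))}"] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    try:
        with open(path, errors="replace") as f:
            for day, n in warnings_per_day(f).items():
                print(day, n)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
```

only the first line of each trace is counted. the follow-up lines (caps=..., Call Trace, and so on) do not match, so each warning is counted once.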

2017-08-20

the setup above worked fine for me for the past 3 years. after the upgrade to debian stretch it stopped working: despite bond_mode active-backup in /etc/network/interfaces, bond0 was configured as round robin. it looks like this bug report describes a similar situation.

after some experimentation i got it working.

4 Comments

    • you’re welcome.

      i’d love to get an active-active setup [ where more than one network card is used in the ‘normal state’ ] together with arp-based failover [ as opposed to just mii-based ], with vlans and bridges on top of it. if you have any suggestions on how to get that done, i’m eager to hear them.
