Bond Interoperability with Cisco and Arista Switches

This knowledge base article discusses interoperability and troubleshooting in a mixed switch topology, featuring switches running Cumulus Linux on the same network as switches running Cisco and Arista network OSes. The article uses the syntax of ifupdown2.

Environment

  • Cumulus Linux 2.1 and later

Example Mixed Switch Topology

The three examples described below use the following diagram, with all testing performed on actual equipment. Both sides use the same interfaces; for example, swp19 (switch port 19) on the Cumulus Linux switch connects to g0/19 (Gigabit Ethernet 0/19) on the Cisco 3560.

Cumulus Linux and Cisco IOS

The following example utilizes slow LACPDUs (that is, bond-lacp-rate is set to 0):

Quanta LY2 w/Cumulus Linux 4.2.0:

auto bond1
iface bond1
    bond-slaves glob swp19-20
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 0

auto vlan10
iface vlan10
    bridge-ports bond1.10
    address 10.10.10.11/24
    bridge-stp on

Cisco WS-C3560X-24 12.2(55)SE5:

vlan 10

interface GigabitEthernet0/19
    switchport trunk encapsulation dot1q
    switchport mode trunk
    channel-group 1 mode active

interface GigabitEthernet0/20
    switchport trunk encapsulation dot1q
    switchport mode trunk
    channel-group 1 mode active

interface Port-channel1
    switchport trunk encapsulation dot1q
    switchport mode trunk

interface Vlan10
    ip address 10.10.10.10 255.2

Cumulus Linux and Arista EOS

The following example utilizes fast LACPDUs (where bond-lacp-rate is set to 1):

Quanta LY2 w/Cumulus Linux 4.2.0:

auto bond2
iface bond2
    bond-slaves glob swp37-38
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 1

auto vlan12
iface vlan12
    bridge-ports bond2.12
    address 12.12.12.11/24
    bridge-stp on

Arista DCS-7148S-R 4.13.5F:

interface Ethernet37
    switchport mode trunk
    channel-group 2 mode active

interface Ethernet38
    switchport mode trunk
    channel-group 2 mode active

interface Port-Channel2
    switchport trunk allowed vlan 12
    switchport mode trunk

interface Vlan12
    ip address 12.12.12.12/24

Cumulus Linux and Cisco NX-OS

The following example utilizes fast LACPDUs (where bond-lacp-rate is set to 1):

Quanta LY2 w/Cumulus Linux 4.2.0:

auto bond3
iface bond3
    bond-slaves glob swp39-40
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 1

auto vlan14
iface vlan14
    bridge-ports bond3.14
    address 14.14.14.11/24
    bridge-stp on

Cisco Nexus3064 5.0(3)U2(2c):

feature interface-vlan
feature lacp

vlan 14

interface Ethernet1/39
    switchport mode trunk
    channel-group 3 mode active

interface Ethernet1/40
    switchport mode trunk
    channel-group 3 mode active

interface port-channel3
    switchport mode trunk

interface Vlan14
    no shutdown
    ip address 14.14.14.14/24

The three most common problems with EtherChannels are:

  • VLAN mismatches with layer 2 bonds
  • Fast vs slow LACPDU rates
  • Both sides using passive LACP mode instead of active LACP mode

Because Cumulus Linux is Linux, it utilizes the same kernel syntax for bonds that you can find in the kernel.org documentation. The Cumulus Linux bonding documentation contains specific examples. The following guide compares the Cisco 3560 to the Quanta LY2 in the diagram and configuration above.

Bond Parameters

Here is the recommended way to configure a bond in Cumulus Linux:

auto bond0
iface bond0
    bond-slaves swp1 swp2
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
  • bond-slaves lists the members of the bond. In this case, swp1 and swp2 are members of bond0.
  • bond-mode 802.3ad is required for LACP bonds.
  • bond-miimon 100 is how often, in milliseconds, the link state is inspected for failures. The default value is 0, but NVIDIA recommends 100.
  • bond-lacp-rate 1 means fast LACP; see Fast vs Slow LACP Rates below. NVIDIA recommends using fast LACP.
  • bond-min-links is an integer indicating the number of links that must be up for the bond to become active.
  • bond-xmit-hash-policy must be set to layer3+4 so that traffic is distributed evenly across the bond members.

For more information about the bond parameters, see the kernel.org documentation.
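The recommended parameters above can be checked mechanically. The following is a minimal sketch, not a Cumulus Linux tool: the names RECOMMENDED, parse_stanza and check_bond are hypothetical, the recommended values are copied from the bond0 example in this article, and the parser assumes a single stanza with one attribute per line and no comments.

```python
# Recommended values copied from the bond0 example in this article.
RECOMMENDED = {
    "bond-mode": "802.3ad",
    "bond-miimon": "100",
    "bond-lacp-rate": "1",
    "bond-min-links": "1",
    "bond-xmit-hash-policy": "layer3+4",
}


def parse_stanza(text):
    """Return {attribute: value} for the attribute lines of one iface stanza."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and the "auto"/"iface" lines that open the stanza.
        if len(parts) < 2 or parts[0] in ("auto", "iface"):
            continue
        attrs[parts[0]] = parts[1].strip()
    return attrs


def check_bond(text):
    """List the attributes that are missing or differ from RECOMMENDED."""
    attrs = parse_stanza(text)
    problems = []
    for key, want in RECOMMENDED.items():
        have = attrs.get(key)
        if have != want:
            problems.append(f"{key}: expected {want}, found {have}")
    return problems


# Hypothetical input: the bond0 example with the LACP rate changed to slow.
STANZA = """\
auto bond0
iface bond0
    bond-slaves swp1 swp2
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 0
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
"""

for problem in check_bond(STANZA):
    print(problem)
```

A reported difference is not necessarily an error. For example, bond-lacp-rate 0 is a deliberate choice when the peer switch requires slow LACPDUs.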

VLAN Mismatch

The following configuration has a VLAN mismatch. Can you find it?

Quanta LY2 w/Cumulus Linux 4.2.0:

auto bond1
iface bond1
    bond-slaves glob swp19-20
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 0

auto vlan10
iface vlan10
    bridge-ports bond1.100
    address 10.10.10.11/24
    bridge-stp on

Cisco WS-C3560X-24 12.2(55)SE5:

vlan 10

interface GigabitEthernet0/19
    switchport trunk encapsulation dot1q
    switchport mode trunk
    channel-group 1 mode active

interface GigabitEthernet0/20
    switchport trunk encapsulation dot1q
    switchport mode trunk
    channel-group 1 mode active

interface Port-channel1
    switchport trunk encapsulation dot1q
    switchport mode trunk

interface Vlan10
    ip address 10.10.10.10 255.255.255.0

As illustrated above, the bridge named vlan10 has a single member, bond1.100. The bridge name has no bearing on which 802.1q tags the bridge carries: naming a bridge vlan10 does not tag its members with VLAN 10, and you could name the bridge anything, such as mgmt-bridge or outofband. The subinterface suffix .100 (bond1.100) means that ingress packets tagged with VLAN 100 join this bridge, whereas the Cisco side uses VLAN 10. This syntax is valid but might not produce the result you want.
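Because the tag comes from the subinterface suffix rather than the bridge name, a mismatch like this one can be spotted by comparing that suffix with the VLAN configured on the peer switch. The sketch below illustrates the idea; port_vlan_tags and mismatched_ports are hypothetical helper names, and the code assumes plain port names (it does not expand glob expressions) and treats an untagged port as a mismatch.

```python
def port_vlan_tags(bridge_ports):
    """Map each port in a bridge-ports value to its 802.1q tag (None if untagged)."""
    tags = {}
    for port in bridge_ports.split():
        name, dot, suffix = port.rpartition(".")
        # "bond1.100" -> tag 100; a name without ".<digits>" is an untagged port.
        tags[port] = int(suffix) if dot and suffix.isdigit() else None
    return tags


def mismatched_ports(bridge_ports, expected_vlan):
    """Return the ports whose tag differs from the VLAN the peer switch uses."""
    return [port for port, tag in port_vlan_tags(bridge_ports).items()
            if tag != expected_vlan]


print(mismatched_ports("bond1.100", 10))  # the mismatched configuration above
print(mismatched_ports("bond1.10", 10))   # the working configuration
```

The first call reports bond1.100 and the second reports nothing, which matches the difference between the mismatched and the working vlan10 stanzas in this article.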

Unlike Cisco IOS, Cumulus Linux drops packets unless you join the tagged subinterface to a bridge or layer 3 interface. Many IOS and IOS-look-alikes do something like this:

switchport trunk allowed vlan 5

This restricts the trunk to VLAN 5; without such a statement, the trunk carries every VLAN. Cumulus Linux takes the opposite approach: it drops all tagged traffic except the VLANs you explicitly configure.

You can find more information on configuring VLAN tagging in the Cumulus Linux user guide.

Fast vs Slow LACP Rates

The Cumulus Linux documentation recommends:

bond-lacp-rate 1

This means fast; according to the kernel.org documentation, it means “Request partner to transmit LACPDUs every 1 second.”

In some cases, the other vendor's switch cannot send fast LACPDUs, or some other requirement calls for slow LACP. To configure the slow rate, use:

bond-lacp-rate 0

According to kernel.org, this means “Request partner to transmit LACPDUs every 30 seconds.”

Troubleshooting Fast vs Slow

To see the running configuration and state of a bond, use this command:

cat /proc/net/bonding/bond1

The following output is a snippet of the information received:

cumulus@switch:~$ cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
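
When you check many bonds, it can help to read this file programmatically. The following sketch turns the "Key: value" lines into a dictionary; parse_bonding_status is a hypothetical helper name, and the sample text is the snippet shown above rather than a complete file.

```python
def parse_bonding_status(text):
    """Collect the "Key: value" lines of /proc/net/bonding/<bond> into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep the first occurrence of a key; the complete file repeats
            # some keys (such as "MII Status") once per bond member.
            status.setdefault(key.strip(), value.strip())
    return status


SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
"""

status = parse_bonding_status(SAMPLE)
print(status["LACP rate"])
print(status["MII Polling Interval (ms)"])
```

On a switch, you would pass the contents of /proc/net/bonding/bond1 instead of SAMPLE, for example by reading the file with open().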

On the Cisco switch, you check the port channel like this:

show etherchannel summary

The following output is a snippet of the information received:

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
1      Po1(SU)         LACP      Gi0/19(P)   Gi0/20(P)

Notice that the port channel is in use (SU) and both ports are bundled (P). To see the LACPDU rate on the Cisco side, run the following command:

show etherchannel detail

The following output is a snippet of the information received:

Local information:
                            LACP port     Admin     Oper    Port        Port
Port      Flags   State     Priority      Key       Key     Number      State
Gi0/19    SA      bndl      32768         0x1       0x1     0x114       0x3D

Partner's information:

                  LACP port                        Admin  Oper   Port    Port
Port      Flags   Priority  Dev ID          Age    key    Key    Number  State
Gi0/19    SA      255       089e.01ce.e216   3s    0x0    0x11   0x1     0x3D

Where the SA flags mean:

A - Device is in active mode
S - Device is sending Slow LACPDUs

Making sure both sides match is imperative for traffic to pass and the bond to stay up and be stable. In the case above, they were both utilizing slow LACPDUs. The following table helps you match:

Cumulus Linux      Cisco   Rate
------------------+-------+------------------
LACP rate: slow    S       every 30 seconds
LACP rate: fast    F       every second
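
The same lookup can be expressed in a few lines of code. This is only an illustration of the table: RATE_TO_CISCO_FLAG and rates_match are hypothetical names, and the flag string is assumed to be copied by hand from the Flags column of show etherchannel detail.

```python
# Rows of the table above: bonding driver wording -> Cisco flag.
RATE_TO_CISCO_FLAG = {"slow": "S", "fast": "F"}


def rates_match(cumulus_rate, cisco_flags):
    """True if the Cisco Flags column shows the rate Cumulus Linux reports."""
    return RATE_TO_CISCO_FLAG[cumulus_rate] in cisco_flags


print(rates_match("slow", "SA"))  # the case shown in this article
print(rates_match("fast", "SA"))  # a fast bond facing a slow Cisco port
```

The first call returns True and the second returns False, which flags the second pairing as the one to correct.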

Active vs Passive Modes

Cumulus Linux does not currently support passive mode. Because active mode works with active and passive configurations, and Cumulus Linux does not have a knob to change it, there is no interoperability issue between switches running Cumulus Linux and switches from other network OS vendors.