VXLAN Active-Active Mode
VXLAN active-active mode allows a pair of MLAG switches to act as a single VTEP, providing active-active VXLAN termination for bare metal as well as virtualized workloads.
The configuration differs depending on whether you deploy active-active mode with EVPN or LNV. This chapter outlines the configurations for both options.
Terminology
Term | Definition |
---|---|
VTEP | The virtual tunnel endpoint. This is an encapsulation and decapsulation point for VXLANs. |
active-active VTEP | A pair of switches acting as a single VTEP. |
ToR | The top of rack switch; also referred to as a leaf or access switch. |
spine | The aggregation switch for multiple leafs. Specifically used when a data center is using a Clos network architecture. Read more about spine-leaf architecture in this white paper. |
exit leaf | A switch dedicated to peering the Clos network to an outside network; also referred to as a border leaf, service leaf, or edge leaf. |
anycast | An IP address that is advertised from multiple locations. Anycast enables multiple devices to share the same IP address and effectively load balance traffic across them. With VXLAN, anycast is used to share a VTEP IP address between a pair of MLAG switches. |
RIOT | Routing in and out of tunnels. A Broadcom feature that allows a VXLAN bridge to have a switch VLAN interface associated with it, so that traffic can exit a VXLAN into the layer 3 fabric. Also called VXLAN routing. |
VXLAN routing | The industry standard term for the ability to route in and out of a VXLAN. Equivalent to the Broadcom RIOT feature. |
clagd-vxlan-anycast-ip | The anycast address for the MLAG pair to share and bind to when MLAG is up and running. |
Configure VXLAN Active-active Mode
VXLAN active-active mode requires the following underlying technologies to work correctly.
Technology | More Information |
---|---|
MLAG | Refer to the MLAG chapter for more detailed configuration information. Configurations for the demonstration are provided below. |
OSPF or BGP | Refer to the OSPF chapter or the BGP chapter for more detailed configuration information. Configurations for the BGP demonstration are provided below. |
STP | If STP is enabled on the bridge that is connected to the VXLAN, you must enable BPDU filter and BPDU guard on the VXLAN interfaces. Configurations for the demonstration are provided below. |
Active-active VTEP Anycast IP Behavior
You must provision each individual switch within an MLAG pair with a virtual IP address in the form of an anycast IP address for VXLAN data-path termination. The VXLAN termination address is an anycast IP address that you configure as a clagd parameter (clagd-vxlan-anycast-ip) under the loopback interface. clagd dynamically adds and removes this address as the loopback interface address as follows:
- When the switches boot up, ifupdown2 places all VXLAN interfaces in a PROTO_DOWN state. The anycast address from the configuration is not yet applied.
- MLAG peering takes place and a successful VXLAN interface consistency check between the switches occurs.
- clagd (the daemon responsible for MLAG) adds the anycast address to the loopback interface as a second address. It then changes the local IP address of the VXLAN interface from a unique address to the anycast virtual IP address and puts the interface in an UP state.
For the anycast address to activate, you must configure a VXLAN interface on each switch in the MLAG pair.
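The sequence above can be sketched as a small state model. This is only an illustration of the described behavior, not how clagd is implemented; the class name, method names, and the example addresses passed in are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class MlagSwitch:
    """Toy model of one switch in an MLAG pair (hypothetical, for illustration)."""
    unique_ip: str
    anycast_ip: str
    loopback_addresses: list = field(default_factory=list)
    vxlan_local_ip: str = ""
    vxlan_state: str = "PROTO_DOWN"

    def boot(self):
        # At boot, VXLAN interfaces are PROTO_DOWN and only the unique
        # address is present; the anycast address is not applied yet.
        self.loopback_addresses = [self.unique_ip]
        self.vxlan_local_ip = self.unique_ip
        self.vxlan_state = "PROTO_DOWN"

    def peering_established(self, consistency_check_passed):
        # The anycast address is activated only after MLAG peering and a
        # successful VXLAN interface consistency check.
        if not consistency_check_passed:
            return
        self.loopback_addresses.append(self.anycast_ip)
        self.vxlan_local_ip = self.anycast_ip
        self.vxlan_state = "UP"


leaf = MlagSwitch(unique_ip="10.0.0.11", anycast_ip="10.10.10.20")
leaf.boot()
leaf.peering_established(consistency_check_passed=True)
print(leaf.vxlan_state, leaf.vxlan_local_ip, leaf.loopback_addresses)
```

The model keeps the unique address on the loopback and adds the anycast address as a second address, mirroring the third step above.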
Failure Scenario Behaviors
Scenario | Behavior |
---|---|
The peer link goes down. | The primary MLAG switch continues to keep all VXLAN interfaces up with the anycast IP address while the secondary switch brings down all VXLAN interfaces and places them in a PROTO_DOWN state. The secondary MLAG switch removes the anycast IP address from the loopback interface. |
One of the switches goes down. | The other operational switch continues to use the anycast IP address. |
clagd is stopped. | All VXLAN interfaces are put in a PROTO_DOWN state. The anycast IP address is removed from the loopback interface and the local IP addresses of the VXLAN interfaces are changed from the anycast IP address to unique non-virtual IP addresses. |
MLAG peering could not be established between the switches. | After the reload timer expires, clagd brings up all the VXLAN interfaces with the configured anycast IP address. This allows the VXLAN interface to be up and running on both switches even though peering is not established. |
The peer link goes down but the peer switch is up (the backup link is active). | All VXLAN interfaces are put into a PROTO_DOWN state on the secondary switch. |
The anycast IP address is different on the MLAG peers. | The VXLAN interface is placed into a PROTO_DOWN state on the secondary switch. |
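For monitoring or lab validation, the table above can be encoded as a lookup. The sketch below is a hypothetical helper, not part of any Cumulus tool; the scenario labels are invented names for the table rows, and only outcomes the table states explicitly are included.

```python
# (scenario, role) -> expected VXLAN interface state, per the table above.
# Scenario labels are hypothetical names for the table rows. Combinations
# the table does not describe are deliberately left out.
EXPECTED_STATE = {
    ("peer_link_down", "primary"): "UP",
    ("peer_link_down", "secondary"): "PROTO_DOWN",
    ("clagd_stopped", "primary"): "PROTO_DOWN",
    ("clagd_stopped", "secondary"): "PROTO_DOWN",
    # After the reload timer expires, both switches bring the interfaces up.
    ("peering_not_established", "primary"): "UP",
    ("peering_not_established", "secondary"): "UP",
    ("peer_link_down_backup_active", "secondary"): "PROTO_DOWN",
    ("anycast_ip_mismatch", "secondary"): "PROTO_DOWN",
}


def expected_vxlan_state(scenario, role):
    """Return the documented state, or None if the table does not say."""
    return EXPECTED_STATE.get((scenario, role))


print(expected_vxlan_state("peer_link_down", "secondary"))
print(expected_vxlan_state("anycast_ip_mismatch", "primary"))
```

Returning None for undocumented combinations keeps the helper from asserting behavior that the table does not cover.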
Check VXLAN Interface Configuration Consistency
The active-active configuration for a given VXLAN interface must be consistent between the MLAG switches for correct traffic behavior. MLAG verifies that the configuration is consistent before bringing up the VXLAN interfaces.
The consistency checks include:
- The anycast virtual IP address for VXLAN termination must be the same on both switches in the pair.
- A VXLAN interface with the same VXLAN ID must be configured and administratively up on both switches.
You can use the clagctl command to check if any VXLAN interfaces are in a PROTO_DOWN state.
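The two checks listed above can be expressed as a short pre-deployment validation. This is a minimal sketch that assumes you have already collected each switch's settings into a dictionary; the dictionary layout and function name are hypothetical and do not reflect how MLAG performs the check internally.

```python
def consistency_problems(local, peer):
    """Compare two switch descriptions and list any mismatches.

    Each argument is a dict with (hypothetical layout):
      "anycast_ip": str
      "vxlans": {vxlan_id: admin_up_bool}
    """
    problems = []
    # Check 1: the anycast IP must be identical on both switches.
    if local["anycast_ip"] != peer["anycast_ip"]:
        problems.append(
            "anycast IP differs: %s vs %s" % (local["anycast_ip"], peer["anycast_ip"])
        )
    # Check 2: every VXLAN ID must exist and be admin up on both switches.
    for vxlan_id in sorted(set(local["vxlans"]) | set(peer["vxlans"])):
        if not (local["vxlans"].get(vxlan_id) and peer["vxlans"].get(vxlan_id)):
            problems.append(
                "VXLAN ID %d is not configured and admin up on both switches" % vxlan_id
            )
    return problems


first = {"anycast_ip": "10.10.10.20", "vxlans": {1: True, 10: True, 20: True}}
second = {"anycast_ip": "10.10.10.20", "vxlans": {1: True, 20: True, 2000: True}}
for problem in consistency_problems(first, second):
    print(problem)
```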
Configure the Anycast IP Address
With MLAG peering, both switches use an anycast IP address for VXLAN encapsulation and decapsulation. This allows remote VTEPs to learn the host MAC addresses attached to the MLAG switches against one logical VTEP, even though the switches independently encapsulate and decapsulate layer 2 traffic originating from the host. You can configure the anycast address under the loopback interface, as shown below.
# First switch in the MLAG pair
auto lo
iface lo inet loopback
    address 10.0.0.11/32
    clagd-vxlan-anycast-ip 10.10.10.20
# Second switch in the MLAG pair
auto lo
iface lo inet loopback
    address 10.0.0.12/32
    clagd-vxlan-anycast-ip 10.10.10.20
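Because the anycast address must match on both switches, it can be useful to compare the two loopback stanzas before applying them. The following sketch parses stanza text like the example above; the function name and the idea of holding the stanzas in strings are assumptions for illustration only.

```python
def anycast_ip(interfaces_text):
    """Return the clagd-vxlan-anycast-ip value found in the text, or None."""
    for line in interfaces_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "clagd-vxlan-anycast-ip":
            return fields[1]
    return None


FIRST_SWITCH = """\
auto lo
iface lo inet loopback
    address 10.0.0.11/32
    clagd-vxlan-anycast-ip 10.10.10.20
"""

SECOND_SWITCH = """\
auto lo
iface lo inet loopback
    address 10.0.0.12/32
    clagd-vxlan-anycast-ip 10.10.10.20
"""

if anycast_ip(FIRST_SWITCH) == anycast_ip(SECOND_SWITCH):
    print("anycast IP matches:", anycast_ip(FIRST_SWITCH))
else:
    print("anycast IP mismatch")
```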
Example VXLAN Active-Active Configuration
Note the local IP address configured on the VXLAN interfaces below. Each interface is configured with an individual IP address, which clagd changes to the anycast address upon MLAG peering.
FRRouting Configuration
You can configure the layer 3 fabric using BGP or OSPF. The following example uses BGP unnumbered. The MLAG switch configuration for the topology above is shown below.
Layer 3 IP Addressing
The IP address configuration for this example:
Host Configuration
In this example, the servers are running Ubuntu 14.04. A layer 2 bond must be mapped from server01 and server03 to the respective switch. In Ubuntu, this is done with subinterfaces.
Using Active-active Mode with LNV
When using VXLAN active-active mode with lightweight network virtualization (LNV), follow the steps outlined above. In addition, the following configuration steps are needed:
- Configuring the loopback interface for active-active mode
- Enabling the registration daemon
- Configuring a VTEP
- Enabling the service node daemon
- Configuring the service node
Terminology
Term | Definition |
---|---|
vxrd | The VXLAN registration daemon. The daemon runs on the switch that is mapping VLANs to VXLANs. You must configure the |
vxsnd | The VXLAN service node daemon that you can run to register multiple VTEPs. |
vxrd IP address | The unique IP address to which the |
service node IP address | The service node anycast IP address in the topology. In this demonstration, this is an anycast IP address shared by both spine switches. |
anycast | When an IP address is advertised from multiple locations. Allows multiple devices to share the same IP and effectively load balance traffic across them. With VXLAN, anycast is used in two places: |
Configure the Loopback Interface for Active-active Mode
You configure active-active mode as you would for EVPN, as described above, adding two more configuration options to the loopback interface: the vxrd IP address and the service node IP address.
Continuing with the example configuration above, the loopback interface configuration on the leaf switches would look like this:
Enable the Registration Daemon
You must enable the registration daemon (vxrd) on each ToR switch acting as a VTEP that is participating in the VXLAN. The daemon is installed by default.
- Open the /etc/default/vxrd configuration file in a text editor.
- Enable the daemon, then save the file.
    START=yes
- Restart the vxrd daemon.
    cumulus@leaf0X:~$ sudo systemctl restart vxrd.service
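If you manage many switches, the file edit in the second step can be scripted. The sketch below only transforms the file contents held in a string; it assumes the defaults file uses a single START= line, and it leaves reading the file, writing it back, and restarting the service to your own tooling. The function name is hypothetical.

```python
def enable_daemon(defaults_text):
    """Return the defaults file text with the daemon enabled (START=yes)."""
    output = []
    found = False
    for line in defaults_text.splitlines():
        if line.strip().startswith("START="):
            # Replace whatever value is set with "yes".
            output.append("START=yes")
            found = True
        else:
            output.append(line)
    if not found:
        # Assumption: appending the setting is acceptable if it is absent.
        output.append("START=yes")
    return "\n".join(output) + "\n"


print(enable_daemon("# example defaults file\nSTART=no\n"), end="")
```

The same helper applies to any defaults file that follows this START= convention, including the one for the service node daemon.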
Configure a VTEP
The registration node is already configured in /etc/network/interfaces; no additional configuration is typically needed. However, you can configure the VTEP in the /etc/vxrd.conf file instead, which has additional configuration knobs available.
Enable the Service Node Daemon
- Open the /etc/default/vxsnd configuration file in a text editor.
- Enable the daemon, then save the file:
    START=yes
- Restart the daemon.
    cumulus@spine0X:~$ sudo systemctl restart vxsnd.service
Configure the Service Node
To configure the service node daemon, edit the /etc/vxsnd.conf configuration file.
Troubleshooting
In addition to troubleshooting single-attached configurations, you now also need to consider the MLAG daemon (clagd). The clagctl command shows the MLAG state and any inconsistencies between the switches in an MLAG pair.
cumulus@leaf01$ clagctl
The peer is alive
Our Priority, ID, and Role: 32768 44:38:39:00:00:35 primary
Peer Priority, ID, and Role: 32768 44:38:39:00:00:36 secondary
Peer Interface and IP: peerlink.4094 169.254.1.2
VxLAN Anycast IP: 10.10.10.30
Backup IP: 10.0.0.14 (inactive)
System MAC: 44:38:39:ff:40:95
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
bond0              bond0              1         -                      -
vxlan20            vxlan20            -         -                      -
vxlan1             vxlan1             -         -                      -
vxlan10            vxlan10            -         -                      -
Compared with the normal MLAG output, the VXLAN-related fields are the following:
Output | Explanation |
---|---|
VxLAN Anycast IP: 10.10.10.30 | The anycast IP address being shared by the MLAG pair for VTEP termination is in use and is 10.10.10.30. |
Conflicts: - | There are no conflicts for this MLAG interface. |
Proto-Down Reason: - | The VXLAN is up and running (there is no Proto-Down). |
In the next example, the vxlan-id on VXLAN10 is changed to an incorrect value. When you run the clagctl command, you see that VXLAN10 goes down because this switch is the secondary switch and the peer switch takes control of the VXLAN. The reason code is vxlan-single, indicating a vxlan-id mismatch on VXLAN10.
cumulus@leaf02$ clagctl
The peer is alive
Peer Priority, ID, and Role: 32768 44:38:39:00:00:11 primary
Our Priority, ID, and Role: 32768 44:38:39:00:00:12 secondary
Peer Interface and IP: peerlink.4094 169.254.1.1
VxLAN Anycast IP: 10.10.10.20
Backup IP: 10.0.0.11 (inactive)
System MAC: 44:38:39:ff:40:94
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
bond0              bond0              1         -                      -
vxlan20            vxlan20            -         -                      -
vxlan1             vxlan1             -         -                      -
vxlan10            -                  -         -                      vxlan-single
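To spot proto-down interfaces without reading the table by eye, you can parse the clagctl output. This is a rough sketch that assumes the five-column layout shown in the examples above, with no whitespace inside any column value; the function name is hypothetical, and real output with multi-word conflict text would need a more careful parser.

```python
def proto_down_interfaces(clagctl_output):
    """Map interface name to Proto-Down reason for rows whose reason is set."""
    result = {}
    in_table = False
    for line in clagctl_output.splitlines():
        fields = line.split()
        if not in_table:
            # The row of dashes under the column headers starts the data rows.
            if fields and all(len(f) > 1 and set(f) == {"-"} for f in fields):
                in_table = True
            continue
        if len(fields) != 5:
            continue
        our_interface, _peer, _clag_id, _conflicts, reason = fields
        if reason != "-":
            result[our_interface] = reason
    return result


SAMPLE = """\
The peer is alive
VxLAN Anycast IP: 10.10.10.20
CLAG Interfaces
Our Interface Peer Interface CLAG Id Conflicts Proto-Down Reason
---------------- ---------------- ------- -------------------- -----------------
bond0 bond0 1 - -
vxlan20 vxlan20 - - -
vxlan1 vxlan1 - - -
vxlan10 - - - vxlan-single
"""

print(proto_down_interfaces(SAMPLE))
```

For the sample above, the only interface reported is vxlan10, with the reason vxlan-single.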
Caveats and Errata
Use VLAN for Peer Link Only Once
Do not reuse the VLAN used for the peer link layer 3 subinterface for any other interface in the system. A high VLAN ID value is recommended. For more information on VLAN ID ranges, refer to the VLAN-aware bridge chapter.
Bonds with Vagrant in Cumulus VX
Bonds (or LACP EtherChannels) fail to work in a Vagrant setup unless the link is set to promiscuous mode. This limitation applies to virtual topologies only; the setting is not needed on real hardware.
auto swp49
iface swp49
    #for vagrant so bonds work correctly
    post-up ip link set $IFACE promisc on
auto swp50
iface swp50
    #for vagrant so bonds work correctly
    post-up ip link set $IFACE promisc on
For more information on using Cumulus VX and Vagrant, refer to the Cumulus VX documentation.
With LNV, Unique Node ID Required for vxrd in Cumulus VX
vxrd requires a unique node_id for each individual switch. This node_id is based on the first interface’s MAC address; in certain virtual topologies, such as Vagrant, both leaf switches within an MLAG pair can end up with the same node_id. When that happens, you must configure one of the node_ids manually, or make sure the first interface always has a unique MAC address.
To verify the node_id that your switch is using, run the vxrdctl get config command:
cumulus@leaf01$ vxrdctl get config
{
    "concurrency": 1000,
    "config_check_rate": 60,
    "debug": false,
    "eventlet_backdoor_port": 9000,
    "head_rep": true,
    "holdtime": 90,
    "logbackupcount": 14,
    "logdest": "syslog",
    "logfilesize": 512000,
    "loglevel": "INFO",
    "max_packet_size": 1500,
    "node_id": 13,
    "pidfile": "/var/run/vxrd.pid",
    "refresh_rate": 3,
    "src_ip": "10.2.1.50",
    "svcnode_ip": "10.10.10.10",
    "udsfile": "/var/run/vxrd.sock",
    "vxfld_port": 10001
}
To set the node_id manually:
- Open /etc/vxrd.conf in a text editor.
- Set the node_id value within the common section, then save the file:
    [common]
    node_id = 13
Ensure that each leaf has a unique node_id so that active-active mode can function correctly.
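To check for duplicate node_id values across several switches, you can compare the output of vxrdctl get config collected from each one. The sketch below assumes that output is valid JSON, as it appears in the example above, and that you have gathered it into a dictionary keyed by switch name; the function name and the second sample value are hypothetical.

```python
import json
from collections import defaultdict


def duplicate_node_ids(configs):
    """Return {node_id: [switch names]} for every node_id used more than once.

    configs maps a switch name to the raw text of its vxrdctl get config output.
    """
    switches_by_id = defaultdict(list)
    for switch, raw in configs.items():
        switches_by_id[json.loads(raw)["node_id"]].append(switch)
    return {
        node_id: sorted(names)
        for node_id, names in switches_by_id.items()
        if len(names) > 1
    }


collected = {
    "leaf01": '{"node_id": 13, "src_ip": "10.2.1.50"}',
    "leaf02": '{"node_id": 13, "src_ip": "10.2.1.51"}',
}
print(duplicate_node_ids(collected))
```

An empty result means every collected switch reports a distinct node_id.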