Data Center Host to ToR Architecture
This chapter discusses the various architectures and strategies available from the top of rack (ToR) switches all the way down to the server hosts.
Layer 2 - Traditional Spanning Tree - Single Attached
| Example | Summary | 
|---|---|
|  | The host does not bond (EtherChannel) its links across multiple switches; a bond can still exist, but only to one switch at a time. As a result, leaf01 and leaf02 see two different MAC addresses. | 
| Benefits | Considerations | 
|---|---|
|  |  | 
| Active-Active Mode | Active-Passive Mode | Layer 2 to layer 3 Demarcation | 
|---|---|---|
| None (not possible with traditional spanning tree) | VRR | You can configure VRR on a pair of switches at any level in the network. However, the higher up the network, the larger the layer 2 domain becomes. The benefit is layer 2 reachability. The drawback is that the layer 2 domain is more difficult to troubleshoot, does not scale as well, and the pair of switches running VRR needs to carry the entire MAC address table of everything below it in the network. Cumulus Professional Services recommends minimizing the layer 2 domain as much as possible. For more information, see this presentation. | 
Example Configuration
auto bridge
iface bridge
  bridge-vlan-aware yes
  bridge-ports swp1 peerlink
  bridge-vids 1-2000
  bridge-stp on
auto bridge.10
iface bridge.10
  address 10.1.10.2/24
auto peerlink
iface peerlink
    bond-slaves glob swp49-50
auto swp1
iface swp1
  mstpctl-portadminedge yes
  mstpctl-bpduguard yes
auto eth1
iface eth1 inet manual
auto eth1.10
iface eth1.10 inet manual
auto eth2
iface eth2 inet manual
auto eth2.20
iface eth2.20 inet manual
auto br-10
iface br-10 inet manual
  bridge-ports eth1.10 vnet0
auto br-20
iface br-20 inet manual
  bridge-ports eth2.20 vnet1
Layer 2 - MLAG
| Example | Summary | 
|---|---|
|  | MLAG (multi-chassis link aggregation) uses both uplinks at the same time. VRR enables both spines to act as gateways simultaneously for HA (high availability) and active-active mode. | 
| Benefits | Considerations | 
|---|---|
| 100% of links utilized |  | 
| Active-Active Mode | Active-Passive Mode | Layer 2 to layer 3 Demarcation | More Information | 
|---|---|---|---|
| VRR | None |  |  | 
Example Configuration
auto bridge
iface bridge
  bridge-vlan-aware yes
  bridge-ports host-01 peerlink
  bridge-vids 1-2000
  bridge-stp on
auto bridge.10
iface bridge.10
  address 172.16.1.2/24
  address-virtual 44:38:39:00:00:10 172.16.1.1/24
auto peerlink
iface peerlink
    bond-slaves glob swp49-50
auto peerlink.4094
iface peerlink.4094
    address 169.254.1.1/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.2
    clagd-sys-mac 44:38:39:FF:40:94
auto host-01
iface host-01
  bond-slaves swp1
  clag-id 1
  {bond-defaults removed for brevity}
auto bond0
iface bond0 inet manual
  bond-slaves eth0 eth1
  {bond-defaults removed for brevity}
auto bond0.10
iface bond0.10 inet manual
auto vm-br10
iface vm-br10 inet manual
  bridge-ports bond0.10 vnet0
Layer 3 - Single-attached Hosts
| Example | Summary | 
|---|---|
|  | The server (physical host) has one link to one ToR switch. | 
| Benefits | Considerations | 
|---|---|
|  |  | 
| FHR (First Hop Redundancy) | More Information | 
|---|---|
| No redundancy for ToR, uses single ToR as gateway. | For additional bandwidth, bond links between the host and leaf. | 
Example Configuration
/etc/network/interfaces file
auto swp1
iface swp1
  address 172.16.1.1/30
/etc/frr/frr.conf file
router ospf
  router-id 10.0.0.11
interface swp1
  ip ospf area 0
/etc/network/interfaces file
auto swp1
iface swp1
  address 172.16.2.1/30
/etc/frr/frr.conf file
router ospf
  router-id 10.0.0.12
interface swp1
  ip ospf area 0
auto eth1
iface eth1 inet static
  address 172.16.1.2/30
  up ip route add 0.0.0.0/0 nexthop via 172.16.1.1
auto eth1
iface eth1 inet static
  address 172.16.2.2/30
  up ip route add 0.0.0.0/0 nexthop via 172.16.2.1
Layer 3 - Redistribute Neighbor
| Example | Summary | 
|---|---|
|  | The redistribute neighbor daemon dynamically learns ARP entries and writes them to the redistribute table, from which FRRouting takes the entries and redistributes them into the fabric. | 
| Benefits | Considerations | 
|---|---|
| Configuration in FRRouting is simple (route map plus redistribute table) |  | 
| FHR (First Hop Redundancy) | More Information | 
|---|---|
|  |  | 
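The route map plus redistribute table pairing noted under Benefits might look like the following frr.conf fragment on a leaf. This is a minimal sketch rather than a verified configuration: table 10, the route map name REDIST_NEIGHBOR, the host-facing port swp1, and AS 65011 are all assumptions, and BGP is used only for illustration.

```
! Assumption: the redistribute neighbor daemon writes host routes to table 10
ip import-table 10
!
! Assumption: swp1 is the host-facing port; accept only routes learned there
route-map REDIST_NEIGHBOR permit 10
 match interface swp1
!
! Assumption: AS 65011 is illustrative; the router ID matches the leaf used elsewhere in this chapter
router bgp 65011
 bgp router-id 10.0.0.11
 address-family ipv4 unicast
  redistribute table 10 route-map REDIST_NEIGHBOR
 exit-address-family
```

Restricting the route map to host-facing interfaces keeps ARP entries learned on fabric links from being redistributed.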
Layer 3 - Routing on the Host
| Example | Summary | 
|---|---|
|  | Routing on the host means a routing application (such as FRRouting) runs either on the bare metal host (no VMs or containers) or on the hypervisor (for example, Ubuntu with KVM). The Professional Services team highly recommends this approach. | 
| Benefits | Considerations | 
|---|---|
|  |  | 
| FHR (First Hop Redundancy) | More Information | 
|---|---|
|  |  | 
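As an illustration of routing on the host, the frr.conf fragment below sketches a host that peers with both ToRs and advertises only its own loopback. Everything specific in it is an assumption made for the example: the AS number 65031, the host loopback 10.0.0.31/32, the uplink names eth1 and eth2, and the choice of BGP unnumbered sessions.

```
! Assumptions: AS 65031, loopback 10.0.0.31/32, uplinks eth1 and eth2
! One BGP unnumbered session per uplink, each to a different ToR
router bgp 65031
 bgp router-id 10.0.0.31
 neighbor eth1 interface remote-as external
 neighbor eth2 interface remote-as external
 address-family ipv4 unicast
  network 10.0.0.31/32
 exit-address-family
```

With a session on each uplink, the host's /32 is withdrawn from a ToR when that link or session fails, so the fabric stops sending traffic down the failed path without any first hop redundancy protocol.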
Layer 3 - Routing on the VM
| Example | Summary | 
|---|---|
|  | Instead of routing on the hypervisor, each virtual machine uses its own routing stack. | 
| Benefits | Considerations | 
|---|---|
| In addition to routing on the host: |  | 
| FHR (First Hop Redundancy) | More Information | 
|---|---|
|  |  | 
Layer 3 - Virtual Router
| Example | Summary | 
|---|---|
|  | Virtual router (vRouter) runs as a VM on the hypervisor or host and sends routes to the ToR using BGP or OSPF. | 
| Benefits | Considerations | 
|---|---|
| In addition to routing on the host: |  | 
| FHR (First Hop Redundancy) | More Information | 
|---|---|
|  |  | 
Layer 3 - Anycast with Manual Redistribution
| Example | Summary | 
|---|---|
|  | In contrast to routing on the host (preferred), this method lets you route to the host. As with redistribute neighbor, the ToRs are the gateway; however, because no daemon is running, you must manually configure the networks under the routing process. Traffic can be black holed unless you run a script that removes the routes when the host no longer responds. | 
| Benefits | Considerations | 
|---|---|
|  |  | 
| FHR (First Hop Redundancy) | 
|---|
| As with redistribute neighbor, the ToRs are the gateways, and the host installs an equal-cost route to them. | 
Example Configuration
/etc/network/interfaces file
auto swp1
iface swp1
  address 172.16.1.1/30
/etc/frr/frr.conf file
router ospf
  router-id 10.0.0.11
interface swp1
  ip ospf area 0
/etc/network/interfaces file
auto swp2
iface swp2
  address 172.16.1.1/30
/etc/frr/frr.conf file
router ospf
  router-id 10.0.0.12
interface swp2
  ip ospf area 0
auto lo
iface lo inet loopback
auto lo:1
iface lo:1 inet static
  address 172.16.1.2/32
  up ip route add 0.0.0.0/0 nexthop via 172.16.1.1 dev eth1 onlink nexthop via 172.16.1.1 dev eth2 onlink
auto eth1
iface eth1 inet static
  address 172.16.1.2/32
auto eth2
iface eth2 inet static
  address 172.16.1.2/32
Layer 3 - EVPN with Symmetric VXLAN Routing
This architecture uses symmetric VXLAN routing on the ToR with EVPN, which provides both VLAN and VXLAN bridging as well as VXLAN and external routing.
 
Each server is on a VLAN, for a total of two VLANs in the setup. MLAG is also configured between the servers and the leafs. Each leaf has an anycast gateway, and each server's default gateway points to the gateway IP address of the corresponding leaf. Two tenant VNIs (corresponding to the two VLANs) bridge to their corresponding VLANs.
| Benefits | Considerations | 
|---|---|
|  | Needs MLAG (with the same considerations as the MLAG section above). | 
| Active-Active Mode | Active-Passive Mode | Demarcation | More Information | 
|---|---|---|---|
| VRR | None | ToR layer |  | 
Example /etc/network/interfaces File Configuration
leaf01
# Loopback interface
auto lo
iface lo inet loopback
  address 10.0.0.11/32
  clagd-vxlan-anycast-ip 10.0.0.112
  alias loopback interface
# Management interface
auto eth0
iface eth0 inet dhcp
    vrf mgmt
auto mgmt
iface mgmt
    address 127.0.0.1/8
    address ::1/128
    vrf-table auto
# Port to Server01
auto swp1
iface swp1
  alias to Server01
  # This is required for Vagrant only
  post-up ip link set swp1 promisc on
# Port to Server02
auto swp2
iface swp2
  alias to Server02
  # This is required for Vagrant only
  post-up ip link set swp2 promisc on
# Port to Leaf02
auto swp49
iface swp49
  alias to Leaf02
  # This is required for Vagrant only
  post-up ip link set swp49 promisc on
# Port to Leaf02
auto swp50
iface swp50
  alias to Leaf02
  # This is required for Vagrant only
  post-up ip link set swp50 promisc on
# Port to Spine01
auto swp51
iface swp51
  mtu 9216
  alias to Spine01
# Port to Spine02
auto swp52
iface swp52
  mtu 9216
  alias to Spine02
# MLAG Peerlink bond
auto peerlink
iface peerlink
  mtu 9000
  bond-slaves swp49 swp50
# MLAG Peerlink layer 2 interface.
# This creates VLAN 4094 that only lives on the peerlink bond
# No other interface will be aware of VLAN 4094
auto peerlink.4094
iface peerlink.4094
  address 169.254.1.1/30
  clagd-peer-ip 169.254.1.2
  clagd-backup-ip 10.0.0.12
  clagd-sys-mac 44:39:39:ff:40:94
  clagd-priority 100
# Bond to Server01
auto bond01
iface bond01
  mtu 9000
  bond-slaves swp1
  bridge-access 13
  clag-id 1
# Bond to Server02
auto bond02
iface bond02
  mtu 9000
  bond-slaves swp2
  bridge-access 24
  clag-id 2
# Define the bridge for STP
auto bridge
iface bridge
  bridge-vlan-aware yes
  # bridge-ports includes all ports related to VxLAN and CLAG.
  # does not include the Peerlink.4094 subinterface
  bridge-ports bond01 bond02 peerlink vni13 vni24 vxlan4001
  bridge-vids 13 24
  bridge-pvid 1
# VXLAN Tunnel for Server1-Server3 (Vlan 13)
auto vni13
iface vni13
  mtu 9000
  vxlan-id 13
  vxlan-local-tunnelip 10.0.0.11
  bridge-access 13
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
#VXLAN Tunnel for Server2-Server4 (Vlan 24)
auto vni24
iface vni24
  mtu 9000
  vxlan-id 24
  vxlan-local-tunnelip 10.0.0.11
  bridge-access 24
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
auto vxlan4001
iface vxlan4001
    vxlan-id 104001
    vxlan-local-tunnelip 10.0.0.11
    bridge-access 4001
auto vrf1
iface vrf1
   vrf-table auto
#Tenant SVIs - anycast GW
auto vlan13
iface vlan13
    address 10.1.3.11/24
    address-virtual 44:39:39:ff:00:13 10.1.3.1/24
    vlan-id 13
    vlan-raw-device bridge
    vrf vrf1
auto vlan24
iface vlan24
    address 10.2.4.11/24
    address-virtual 44:39:39:ff:00:24 10.2.4.1/24
    vlan-id 24
    vlan-raw-device bridge
    vrf vrf1
#Layer 3 VLAN interface per tenant (for layer 3 VNI)
auto vlan4001
iface vlan4001
    hwaddress 44:39:39:FF:40:94
    vlan-id 4001
    vlan-raw-device bridge
    vrf vrf1
leaf02
# Loopback interface
auto lo
iface lo inet loopback
  address 10.0.0.12/32
  clagd-vxlan-anycast-ip 10.0.0.112
  alias loopback interface
# Management interface
auto eth0
iface eth0 inet dhcp
    vrf mgmt
auto mgmt
iface mgmt
    address 127.0.0.1/8
    address ::1/128
    vrf-table auto
# Port to Server01
auto swp1
iface swp1
  alias to Server01
  # This is required for Vagrant only
  post-up ip link set swp1 promisc on
# Port to Server02
auto swp2
iface swp2
  alias to Server02
  # This is required for Vagrant only
  post-up ip link set swp2 promisc on
# Port to Leaf01
auto swp49
iface swp49
  alias to Leaf01
  # This is required for Vagrant only
  post-up ip link set swp49 promisc on
# Port to Leaf01
auto swp50
iface swp50
  alias to Leaf01
  # This is required for Vagrant only
  post-up ip link set swp50 promisc on
# Port to Spine01
auto swp51
iface swp51
  mtu 9216
  alias to Spine01
# Port to Spine02
auto swp52
iface swp52
  mtu 9216
  alias to Spine02
# MLAG Peerlink bond
auto peerlink
iface peerlink
  mtu 9000
  bond-slaves swp49 swp50
# MLAG Peerlink layer 2 interface.
# This creates VLAN 4094 that only lives on the peerlink bond
# No other interface will be aware of VLAN 4094
auto peerlink.4094
iface peerlink.4094
  address 169.254.1.2/30
  clagd-peer-ip 169.254.1.1
  clagd-backup-ip 10.0.0.11
  clagd-sys-mac 44:39:39:ff:40:94
  clagd-priority 200
# Bond to Server01
auto bond01
iface bond01
  mtu 9000
  bond-slaves swp1
  bridge-access 13
  clag-id 1
# Bond to Server02
auto bond02
iface bond02
  mtu 9000
  bond-slaves swp2
  bridge-access 24
  clag-id 2
# Define the bridge for STP
auto bridge
iface bridge
  bridge-vlan-aware yes
  # bridge-ports includes all ports related to VxLAN and CLAG.
  # does not include the Peerlink.4094 subinterface
  bridge-ports bond01 bond02 peerlink vni13 vni24 vxlan4001
  bridge-vids 13 24
  bridge-pvid 1
auto vxlan4001
iface vxlan4001
     vxlan-id 104001
     vxlan-local-tunnelip 10.0.0.12
     bridge-access 4001
# VXLAN Tunnel for Server1-Server3 (Vlan 13)
auto vni13
iface vni13
  mtu 9000
  vxlan-id 13
  vxlan-local-tunnelip 10.0.0.12
  bridge-access 13
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
#VXLAN Tunnel for Server2-Server4 (Vlan 24)
auto vni24
iface vni24
  mtu 9000
  vxlan-id 24
  vxlan-local-tunnelip 10.0.0.12
  bridge-access 24
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
auto vrf1
iface vrf1
   vrf-table auto
auto vlan13
iface vlan13
    address 10.1.3.12/24
    address-virtual 44:39:39:ff:00:13 10.1.3.1/24
    vlan-id 13
    vlan-raw-device bridge
    vrf vrf1
auto vlan24
iface vlan24
    address 10.2.4.12/24
    address-virtual 44:39:39:ff:00:24 10.2.4.1/24
    vlan-id 24
    vlan-raw-device bridge
    vrf vrf1
#Layer 3 VLAN interface per tenant (for layer 3 VNI)
auto vlan4001
iface vlan4001
    hwaddress 44:39:39:FF:40:94
    vlan-id 4001
    vlan-raw-device bridge
    vrf vrf1
server01
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet manual
  bond-master uplink
  # Required for Vagrant
  post-up ip link set promisc on dev eth1
auto eth2
iface eth2 inet manual
  bond-master uplink
  # Required for Vagrant
  post-up ip link set promisc on dev eth2
auto uplink
iface uplink inet static
  mtu 9000
  bond-slaves none
  bond-mode 802.3ad
  bond-miimon 100
  bond-lacp-rate 1
  bond-min-links 1
  bond-xmit-hash-policy layer3+4
  address 10.1.3.101
  netmask 255.255.255.0
  post-up ip route add default via 10.1.3.1
server02
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet manual
  bond-master uplink
  # Required for Vagrant
  post-up ip link set promisc on dev eth1
auto eth2
iface eth2 inet manual
  bond-master uplink
  # Required for Vagrant
  post-up ip link set promisc on dev eth2
auto uplink
iface uplink inet static
  mtu 9000
  bond-slaves none
  bond-mode 802.3ad
  bond-miimon 100
  bond-lacp-rate 1
  bond-min-links 1
  bond-xmit-hash-policy layer3+4
  address 10.2.4.102
  netmask 255.255.255.0
  post-up ip route add default via 10.2.4.1
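The interface files above cover only the data plane. The EVPN control plane is configured in FRRouting, which this section does not show. The frr.conf fragment below is a hedged sketch of what that might look like on the leaf with loopback 10.0.0.11. The loopback, anycast address 10.0.0.112, layer 3 VNI 104001, VRF name vrf1, and spine-facing ports swp51 and swp52 come from the interface file; the AS number 65011 and the use of BGP unnumbered sessions to the spines are assumptions.

```
! Map the tenant VRF to the layer 3 VNI defined on vxlan4001
vrf vrf1
 vni 104001
exit-vrf
!
! Assumption: AS 65011 and BGP unnumbered peering to both spines
router bgp 65011
 bgp router-id 10.0.0.11
 neighbor swp51 interface remote-as external
 neighbor swp52 interface remote-as external
 address-family ipv4 unicast
  network 10.0.0.11/32
  network 10.0.0.112/32
 exit-address-family
 address-family l2vpn evpn
  neighbor swp51 activate
  neighbor swp52 activate
  advertise-all-vni
 exit-address-family
```

Advertising both the unique loopback and the shared anycast address in the underlay lets remote VTEPs reach the MLAG pair as a single tunnel endpoint. The second leaf would use the same structure with its own router ID and loopback.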