EVPN BUM Traffic with PIM-SM
Without EVPN and PIM-SM, head end replication (HER) is the default way to replicate BUM traffic to remote VTEPs: the ingress VTEP generates one copy of each overlay BUM packet for every remote VTEP. This might not be optimal in certain deployments, because the replication load on the ingress VTEP grows with the number of remote VTEPs.
The following example shows an EVPN-PIM configuration, where underlay multicast is used to distribute BUM traffic. A multicast distribution tree (MDT) optimizes the flow of overlay BUM in the underlay network.
In the above example, host01 sends an ARP request to resolve host03. leaf01 (in addition to flooding the packet to host02) sends an encapsulated packet over the underlay network, which is forwarded using the MDT to leaf02 and leaf03.
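The difference between the two replication modes can be sketched with simple arithmetic. The function names below are hypothetical and the numbers are illustrative only; the sketch counts copies transmitted by the ingress VTEP into the underlay and ignores local flooding to attached hosts.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Cumulus Linux.

# her_copies REMOTE_VTEPS PACKETS
# With HER, the ingress VTEP sends one copy per remote VTEP for each BUM packet.
her_copies() {
    echo $(( $1 * $2 ))
}

# pim_copies PACKETS
# With EVPN-PIM, the ingress VTEP sends one encapsulated copy per BUM packet;
# the MDT in the underlay replicates it toward the remote VTEPs.
pim_copies() {
    echo "$1"
}

# Example: leaf01 with two remote VTEPs (leaf02 and leaf03) and 100 ARP requests:
#   her_copies 2 100
#   pim_copies 100
```

With two remote VTEPs the gap is small; it widens linearly as VTEPs are added to the VNI.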
For PIM-SM, type-3 routes do not result in any forwarding entries. Cumulus Linux does not advertise type-3 routes for a layer 2 VNI when the BUM mode for that VNI is PIM-SM.
EVPN-PIM is supported on Broadcom Trident3 and Trident2+ switches, and Mellanox Spectrum, Spectrum-2, and Spectrum-3 switches.
Configure Multicast VXLAN Tunnels
To configure multicast VXLAN tunnels, you need to configure PIM-SM in the underlay:
- Enable PIM-SM on the appropriate layer 3 interfaces.
- Configure static RP on all the PIM routers.
- Configure MSDP on the RPs for RP redundancy.
The steps to configure PIM-SM in the underlay are provided in Protocol Independent Multicast - PIM.
In addition to the PIM-SM configuration, you need to run the following commands on each VTEP to provide the layer 2 VNI to MDT mapping.
With NCLU, run the net add vxlan <interface> vxlan mcastgrp <ip-address> command, then commit the change. For example:
cumulus@switch:~$ net add vxlan vxlan1000111 vxlan mcastgrp 239.1.1.111
cumulus@switch:~$ net commit
Alternatively, edit the /etc/network/interfaces file and add vxlan-mcastgrp <ip-address> to the interface stanza. For example:
cumulus@switch:~$ sudo vi /etc/network/interfaces
...
auto vxlan1000111
iface vxlan1000111
vxlan-id 1000111
vxlan-local-tunnelip 10.0.0.28
vxlan-mcastgrp 239.1.1.111
Run the ifreload -a command to load the new configuration:
cumulus@switch:~$ sudo ifreload -a
One multicast group per layer 2 VNI is the optimal configuration for underlay bandwidth utilization. However, you can specify the same multicast group for more than one layer 2 VNI.
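To see whether any multicast group is shared between layer 2 VNIs, you can scan the interface configuration. The following is a minimal sketch; the function name shared_mcast_groups is hypothetical, and it assumes each VXLAN stanza starts with an iface line and carries a single vxlan-mcastgrp line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cumulus Linux.
# Reads an /etc/network/interfaces-style file on stdin and prints
#   <group> <count> <iface,iface,...>
# for every vxlan-mcastgrp value used by more than one interface.
shared_mcast_groups() {
    awk '
        $1 == "iface"          { ifname = $2 }          # remember current stanza
        $1 == "vxlan-mcastgrp" {
            count[$2]++
            members[$2] = (members[$2] == "" ? ifname : members[$2] "," ifname)
        }
        END {
            for (g in count)
                if (count[g] > 1)
                    print g, count[g], members[g]
        }
    ' | sort
}

# Usage on a switch:
#   shared_mcast_groups < /etc/network/interfaces
```

No output means every layer 2 VNI has its own group.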
Verify EVPN-PIM
Run the NCLU net show mroute command or the vtysh show ip mroute command to review the multicast route information in FRRouting. When using EVPN-PIM, every VTEP acts as both source and destination for a VNI-MDT group; therefore, mroute entries on each VTEP should look like this:
cumulus@switch:~$ net show mroute
Source      Group         Proto   Input   Output    TTL   Uptime
*           239.1.1.111   IGMP    swp2    pimreg    1     21:37:36
                          PIM             ipmr-lo   1     21:37:36
10.0.0.28   239.1.1.111   STAR    lo      ipmr-lo   1     21:36:41
                          PIM             swp2      1     21:36:41
*           239.1.1.112   IGMP    swp2    pimreg    1     21:37:36
                          PIM             ipmr-lo   1     21:37:36
10.0.0.28   239.1.1.112   STAR    lo      ipmr-lo   1     21:36:41
                          PIM             swp2      1     21:36:41
(*,G) entries should show ipmr-lo in the OIL (Outgoing Interface List), and (S,G) entries should show lo as the source interface or incoming interface and ipmr-lo in the OIL.
Run the ip mroute command to review the multicast route information in the kernel. The kernel information should match the FRR information.
cumulus@switch:~$ ip mroute
(10.0.0.28,239.1.1.112) Iif: lo Oifs: swp2 State: resolved
(10.0.0.28,239.1.1.111) Iif: lo Oifs: swp2 State: resolved
(0.0.0.0,239.1.1.111) Iif: swp2 Oifs: pimreg ipmr-lo swp2 State: resolved
(0.0.0.0,239.1.1.112) Iif: swp2 Oifs: pimreg ipmr-lo swp2 State: resolved
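The (*,G) check can be scripted against the kernel output. The sketch below assumes the line format shown above, where the kernel lists (*,G) entries with source 0.0.0.0 followed by Iif:, Oifs:, and State: fields. The function name check_star_g_oil is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cumulus Linux.
# Reads `ip mroute` output on stdin. Prints every (*,G) entry whose Oifs list
# does not contain ipmr-lo, and returns non-zero if any such entry exists.
check_star_g_oil() {
    awk '
        BEGIN { bad = 0 }
        /^\(0\.0\.0\.0,/ {
            oifs = $0
            sub(/.*Oifs:/, "", oifs)     # drop everything up to the Oifs list
            sub(/State:.*/, "", oifs)    # drop the trailing State field
            n = split(oifs, oif, " ")    # split the list on whitespace
            found = 0
            for (i = 1; i <= n; i++)
                if (oif[i] == "ipmr-lo") found = 1
            if (!found) {
                print "missing ipmr-lo: " $0
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Usage on a switch:
#   ip mroute | check_star_g_oil && echo "all (*,G) entries include ipmr-lo"
```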
Run the bridge fdb show | grep 00:00:00:00:00:00 command to verify that the all-zero MAC address entry for every VXLAN device points to the correct multicast group destination.
cumulus@switch:~$ bridge fdb show | grep 00:00:00:00:00:00
00:00:00:00:00:00 dev vxlan1000112 dst 239.1.1.112 self permanent
00:00:00:00:00:00 dev vxlan1000111 dst 239.1.1.111 self permanent
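To compare one VXLAN device against the group you configured for it, you can extract the dst field from the matching entry. This is a minimal sketch that assumes the dev and dst keywords appear as in the output above; the function name fdb_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cumulus Linux.
# fdb_group DEVICE
# Reads `bridge fdb show` output on stdin and prints the dst of each all-zero
# MAC entry that belongs to DEVICE (prints nothing if there is no such entry).
fdb_group() {
    awk -v dev="$1" '
        $1 == "00:00:00:00:00:00" {
            d = ""; dst = ""
            for (i = 2; i < NF; i++) {
                if ($i == "dev") d = $(i + 1)
                if ($i == "dst") dst = $(i + 1)
            }
            if (d == dev) print dst
        }
    '
}

# Usage on a switch:
#   [ "$(bridge fdb show | fdb_group vxlan1000111)" = "239.1.1.111" ] \
#       && echo "vxlan1000111 floods to 239.1.1.111"
```

If the device prints several destinations, or a unicast address instead of the multicast group, the VNI is probably not using the multicast group for flooding.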
The show ip mroute count command, often used to check multicast packet counts, does not update for encapsulated BUM traffic originating or terminating on the VTEPs.
Run the NCLU net show evpn vni <vni> command or the vtysh show evpn vni <vni> command to ensure that your layer 2 VNI has the correct flooding information:
cumulus@switch:~$ net show evpn vni 10
VNI: 10
Type: L2
Tenant VRF: default
VxLAN interface: vni10
VxLAN ifIndex: 18
Local VTEP IP: 10.0.0.28
Mcast group: 239.1.1.112 <<<<<<<
Remote VTEPs for this VNI:
10.0.0.26 flood: -
10.0.0.27 flood: -
Number of MACs (local and remote) known for this VNI: 9
Number of ARPs (IPv4 and IPv6, local and remote) known for this VNI: 14
Advertise-gw-macip: No
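The Mcast group field can also be checked from a script. The sketch below assumes the field label is printed exactly as in the output above; the function name vni_mcast_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cumulus Linux.
# Reads `net show evpn vni <vni>` output on stdin and prints the address
# that follows the "Mcast group:" label (nothing if the label is absent).
vni_mcast_group() {
    awk '$1 == "Mcast" && $2 == "group:" { print $3 }'
}

# Usage on a switch:
#   [ "$(net show evpn vni 10 | vni_mcast_group)" = "239.1.1.112" ] \
#       && echo "VNI 10 uses 239.1.1.112"
```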
Configure EVPN-PIM in VXLAN Active-active Mode
To configure EVPN-PIM in VXLAN active-active mode, enable PIM on the peer link on each MLAG peer switch (in addition to the configuration described in Configure Multicast VXLAN Tunnels, above).
With NCLU, run the net add interface <peerlink> pim command. For example:
cumulus@switch:~$ net add interface peerlink.4094 pim
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
In the vtysh shell, run the following commands:
cumulus@switch:~$ sudo vtysh
switch# configure terminal
switch(config)# interface peerlink.4094
switch(config-if)# ip pim
switch(config-if)# end
switch# write memory
switch# exit
cumulus@switch:~$
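After saving the configuration, you can confirm that the peer link stanza carries ip pim. The following is a minimal sketch that parses FRR configuration text; it assumes stanzas are separated by ! lines, as in /etc/frr/frr.conf, and the function name has_pim is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cumulus Linux.
# has_pim IFACE
# Reads frr.conf-style text on stdin. Succeeds (exit 0) if the
# "interface IFACE" stanza contains a bare "ip pim" line, fails otherwise.
has_pim() {
    awk -v want="$1" '
        BEGIN { found = 0 }
        $1 == "interface" { in_stanza = ($2 == want); next }
        # a separator or a new top-level section ends the interface stanza
        $1 == "!" || $1 == "router" || $1 == "vrf" { in_stanza = 0 }
        in_stanza && $1 == "ip" && $2 == "pim" && NF == 2 { found = 1 }
        END { exit !found }
    '
}

# Usage on each MLAG peer:
#   sudo cat /etc/frr/frr.conf | has_pim peerlink.4094 \
#       && echo "PIM is enabled on the peer link"
```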
Example Configuration
The following example shows an EVPN-PIM configuration on the VTEP, where:
- PIM is enabled on swp1, swp2, and the loopback interface (shown in the example /etc/frr/frr.conf file below).
- The static RP is configured as 192.168.0.1 (shown at the top of the example /etc/frr/frr.conf file below).
- Multicast group 239.1.1.111 is mapped to vxlan1000111 and multicast group 239.1.1.112 is mapped to vxlan1000112 (shown in the example /etc/network/interfaces file below).
cumulus@switch:~$ sudo cat /etc/frr/frr.conf
...
ip pim rp 192.168.0.1
ip pim keep-alive-timer 3600
...
vrf vrf1
vni 104001
exit-vrf
!
vrf vrf2
vni 104002
exit-vrf
!
interface swp1
description swp1 > leaf-11's swp3
ip ospf network point-to-point
ip pim
!
interface swp2
description swp2 > leaf-12's swp3
ip ospf network point-to-point
ip pim
!
interface swp3
description swp3 > host-111's swp1
!
interface swp6
description swp6 > host-112's swp1
!
interface lo
ip pim
!
router bgp 650000
bgp router-id 10.0.0.28
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor RR peer-group
neighbor RR remote-as internal
neighbor RR advertisement-interval 0
neighbor RR timers 3 10
neighbor RR timers connect 5
neighbor 10.0.0.26 peer-group RR
neighbor 10.0.0.26 update-source lo
neighbor 10.0.0.27 peer-group RR
neighbor 10.0.0.27 update-source lo
!
address-family ipv4 unicast
redistribute connected
maximum-paths ibgp 16
exit-address-family
!
address-family l2vpn evpn
neighbor RR activate
advertise-all-vni
exit-address-family
!
router ospf
ospf router-id 10.0.0.28
network 10.0.0.28/32 area 0.0.0.0
!
line vty
exec-timeout 0 0
!
end
cumulus@switch:~$ sudo cat /etc/network/interfaces
auto lo
iface lo
address 10.0.0.28/32
# The primary network interface
auto eth0
iface eth0 inet dhcp
auto swp1
iface swp1
link-speed 10000
link-duplex full
link-autoneg off
address 10.0.0.28/32
auto swp2
iface swp2
link-speed 10000
link-duplex full
link-autoneg off
address 10.0.0.28/32
auto swp3
iface swp3
link-speed 10000
link-duplex full
link-autoneg off
bridge-access 111
auto swp6
iface swp6
link-speed 10000
link-duplex full
link-autoneg off
bridge-access 112
auto vxlan1000111
iface vxlan1000111
vxlan-id 1000111
vxlan-local-tunnelip 10.0.0.28
bridge-access 111
vxlan-mcastgrp 239.1.1.111
auto vxlan1000112
iface vxlan1000112
vxlan-id 1000112
vxlan-local-tunnelip 10.0.0.28
bridge-access 112
vxlan-mcastgrp 239.1.1.112
auto vrf1
iface vrf1
vrf-table auto
auto vrf2
iface vrf2
vrf-table auto
auto vxlan104001
iface vxlan104001
vxlan-id 104001
vxlan-local-tunnelip 10.0.0.28
bridge-access 4001
auto vxlan104002
iface vxlan104002
vxlan-id 104002
vxlan-local-tunnelip 10.0.0.28
bridge-access 4002
auto bridge
iface bridge
bridge-ports swp3 swp6 swp56s0 swp56s1 vxlan1000111 vxlan1000112 vxlan104001 vxlan104002
bridge-vlan-aware yes
bridge-vids 111 112 4001 4002
auto vlan111
iface vlan111
address 10.1.1.11/24
address 2060:1:1:1::11/64
vlan-id 111
vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 10.1.1.250/24 2060:1:1:1::250/64
vrf vrf2
auto vlan112
iface vlan112
address 50.1.1.11/24
address 2050:1:1:1::11/64
vlan-id 112
vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 10.10.1.250/24 2050:1:1:1::250/64
vrf vrf1
auto vlan4001
iface vlan4001
vlan-id 4001
vlan-raw-device bridge
vrf vrf1
auto vlan4002
iface vlan4002
vlan-id 4002
vlan-raw-device bridge
vrf vrf2