TX Drop Counter Counts Twice for VXLAN Traffic
Switches with a Broadcom ASIC reflect BUM traffic as TX drops on a VXLAN-enabled bridge. This behavior stems from the ASIC itself when you configure the switch with a VXLAN-enabled bridge.
Symptom
When a VXLAN-enabled bridge receives a broadcast, unknown unicast, or multicast (BUM) frame, the ASIC avoids sending the frame back on the same port on which it is originally received to avoid loops. This split-horizon correction causes the HwIfOutNonQDrops
transmit drop counter to increment twice for each count.
You see the effects of this double count when you run these commands:
sudo ethtool -S swpX
net show counters
cl-netstat
This behavior does not occur on a bridge configured with non-VXLAN ports. However, when you configure VNIs as bridge-ports, the switch no longer treats switch ports in that bridge as regular ports. These switch ports are susceptible to this behavior, and the traffic statistics reflect this split-horizon correction.
You also see the HwIfOutNonQDrops
counter counting twice on routed uplink ports, which receives VXLAN-encapsulated packets.
Example
Host11 connects directly to VTEP1 on port swp11 and communicates with Host2, which connects directly to VTEP2 on port swp22. An IP-routed network separates VTEP1 and VTEP2.
For a unidirectional traffic flow from Host11 towards Host22:
- The ingress VTEP counts TX drops on the edge port where it receives traffic (VTEP1, swp11) at a rate equal to the number of BUM packets that the ingress port receives (from the source of the traffic flow).
- The egress VTEP counts TX drops on the routed uplink port (VTEP2, swp22) at a rate equal to number of BUM packets multiplied by number of remote VTEPs in the flood list (per VNI).
In the following example, the host sends 10000 multicast packets:
- The
TX_DRP
column for swp5 shows 10001, which indicates where theTX_DRP
counter increments because of the split-horizon correction behavior. - The
RX_OK
column for swp5 shows 10005 and theTX_OK
column for swp1 shows 20049. These indicate the physical network path for the traffic flow relevant to this example. - The
TX_OK
column for vni-1010 shows 20007, which indicates the network path across logical ports that are relevant to this example.
On the ingress VTEP:
- The edge port (swp5) receives 10005 packets. 10001 packets show as TX drops on the ingress port because of split-horizon correction.
- The routed uplink (swp1) sends 20049 packets toward the two remote VTEPs (VTEP2 and VTEP3). vni-1010 has two remote VTEPs in the flood list, so 20007 packets show as
TX_OK
; this is normal behavior.
cumulus@vtep1:~$ net show counters
Kernel Interface table
Iface MTU Met RX_OK RX_ERR RX_DRP RX_OVR TX_OK TX_ERR TX_DRP TX_OVR Flg
-------- ----- ----- ------- -------- -------- -------- ------- -------- -------- -------- -----
bridge 1500 0 10011 0 0 0 50023 0 0 0 BMRU
eth0 1500 0 95 0 0 0 14 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU
swp1 1500 0 80 0 0 0 20049 0 6 0 BMRU
swp5 1500 0 10005 0 0 0 54 0 10001 0 BMRU
swp15 1500 0 13 0 0 0 23 0 0 0 BMRU
swp16 1500 0 3 0 0 0 20058 0 0 0 BMRU
swp51 1500 0 0 0 0 0 0 0 0 0 BMU
vni-1010 1500 0 3 0 0 0 20007 0 0 0 BMRU
For the egress VTEP:
- The routed uplink port (swp5) receives 10055 packets. 20009 packets show as TX drops on the ingress port (and 20011 packets for the corresponding VNI) because of split-horizon correction.
- The edge port (swp49) sends 10061 packets towards the destination.
cumulus@vtep2:~$ net show counters
Kernel Interface table
Iface MTU Met RX_OK RX_ERR RX_DRP RX_OVR TX_OK TX_ERR TX_DRP TX_OVR Flg
-------- ----- ----- ------- -------- -------- -------- ------- -------- -------- -------- -----
bridge 1500 0 10014 0 0 0 40027 0 0 0 BMRU
eth0 1500 0 133 0 0 0 25 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU
swp5 1500 0 10055 0 0 0 90 0 20009 0 BMRU
swp15 1500 0 3 0 0 0 14 0 0 0 BMRU
swp16 1500 0 3 0 0 0 10060 0 0 0 BMRU
swp49 1500 0 6 0 0 0 10061 0 0 0 BMRU
vni-1010 1500 0 10004 0 0 0 20011 0 0 0 BMRU
vni-1020 1500 0 1 0 0 0 1 0 0 0 BMRU