
Monitor Network Elements

In addition to network performance monitoring, the Cumulus NetQ UI provides tabular, network-wide views of the current status and configuration of your network elements. These views are helpful when you want to see all data for a particular type of element across your network for troubleshooting, or when you want to export a list view.

Some of these views provide data that is also available through the card workflows, but these views are not treated like cards. They only provide the current status; you cannot change the time period of the views, or graph the data within the UI.

Access these tables through the Main Menu, under the Network heading.

Tables can be manipulated using the settings above each table, described in Table Settings.

Pagination options are shown when there are more than 25 results.

View All NetQ Agents

The Agents view provides all available parameter data about all NetQ Agents in the system.

View All Events

The Events view provides all available parameter data about all events in the system.

View All MACs

The MACs (media access control addresses) view provides all available parameter data about all MAC addresses in the system.

View All VLANs

The VLANs (virtual local area networks) view provides all available parameter data about all VLANs in the system.

View All IP Routes

The IP Routes view provides all available parameter data about all IP routes, all IPv4 routes, and all IPv6 routes in the system.

View All IP Neighbors

The IP Neighbors view provides all available parameter data about all IP neighbors, all IPv4 neighbors, and all IPv6 neighbors in the system.

View All IP Addresses

The IP Addresses view provides all available parameter data about all IP addresses, all IPv4 addresses, and all IPv6 addresses in the system.

View All Sensors

The Sensors view provides all available parameter data about the power supply units (PSUs), fans, and temperature sensors in the system.

View What Just Happened

The What Just Happened (WJH) feature, available on Mellanox switches, streams detailed and contextual telemetry data for analysis. This provides real-time visibility into problems in the network, such as hardware packet drops due to buffer congestion, incorrect routing, and ACL or layer 1 problems. You must have Cumulus Linux 4.0.0 or later and NetQ 2.4.0 or later to take advantage of this feature.

If your switches are sourced from a vendor other than Mellanox, this view is blank as no data is collected.

Combining WJH capabilities with Cumulus NetQ lets you home in on losses anywhere in the fabric from a single management console. You can:

  • View any current or historic drop information, including the reason for the drop
  • Identify problematic flows or endpoints, and pinpoint exactly where communication is failing in the network

By default, Cumulus Linux 4.0.0 provides the NetQ 2.3.1 Agent and CLI. If you installed Cumulus Linux 4.0.0 on your Mellanox switch, you need to upgrade the NetQ Agent, and optionally the CLI, to release 2.4.0 or later:

cumulus@<hostname>:~$ sudo apt-get update
cumulus@<hostname>:~$ sudo apt-get install -y netq-agent
cumulus@<hostname>:~$ netq config restart agent
cumulus@<hostname>:~$ sudo apt-get install -y netq-apps
cumulus@<hostname>:~$ netq config restart cli

Configure the WJH Feature

WJH is enabled by default on Mellanox switches and no configuration is required in Cumulus Linux 4.0.0; however, you must enable the NetQ Agent to collect the data in NetQ 2.4.0 or later.

To enable WJH in NetQ:

  1. Configure the NetQ Agent on the Mellanox switch.

    cumulus@switch:~$ netq config add agent wjh
    
  2. Restart the NetQ Agent to start collecting the WJH data.

    cumulus@switch:~$ netq config restart agent
    

When you are finished viewing the WJH metrics, you might want to disable the NetQ Agent to reduce network traffic. Use netq config del agent wjh followed by netq config restart agent to disable the WJH feature on the given switch.
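
For example, to disable WJH collection on a switch and apply the change:

cumulus@switch:~$ netq config del agent wjh
cumulus@switch:~$ netq config restart agent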

Using wjh_dump.py on a Mellanox platform that is running Cumulus Linux 4.0 and the NetQ 2.4.0 agent causes the NetQ WJH client to stop receiving packet drop callbacks. To prevent this issue, run wjh_dump.py on a different system than the one where the NetQ Agent has WJH enabled, or disable wjh_dump.py and restart the NetQ Agent (run netq config restart agent).

View What Just Happened Metrics

The What Just Happened view displays events based on conditions detected in the data plane. The most recent 1000 events from the last 24 hours are presented for each drop category.

Item Description
Title What Just Happened
Close button Closes full screen card and returns to workbench
Results Number of results found for the selected tab
L1 Drops tab Displays the reason why a port is in the down state. By default, the listing is sorted by Last Timestamp. The tab provides the following additional data about each drop event:
  • Hostname: Name of the Mellanox server
  • Port Down Reason: Reason why the port is down
    • Port admin down: Port has been purposely set down by user
    • Auto-negotiation failure: Negotiation of port speed with peer has failed
    • Logical mismatch with peer link: Logical mismatch with peer link
    • Link training failure: Link cannot go operationally up due to a link training failure
    • Peer is sending remote faults: Peer node is not operating correctly
    • Bad signal integrity: Integrity of the signal on port is not sufficient for good communication
    • Cable/transceiver is not supported: The attached cable or transceiver is not supported by this port
    • Cable/transceiver is unplugged: A cable or transceiver is missing or not fully plugged into the port
    • Calibration failure: Calibration failure
    • Port state changes counter: Cumulative number of state changes
    • Symbol error counter: Cumulative number of symbol errors
    • CRC error counter: Cumulative number of CRC errors
  • Corrective Action: Provides recommended action(s) to take to resolve the port down state
  • First Timestamp: Date and time this port was marked as down for the first time
  • Ingress Port: Port accepting incoming traffic
  • CRC Error Count: Number of CRC errors generated by this port
  • Symbol Error Count: Number of Symbol errors generated by this port
  • State Change Count: Number of state changes that have occurred on this port
  • OPID: Operation identifier; used for internal purposes
  • Is Port Up: Indicates whether the port is in an Up (true) or Down (false) state
L2 Drops tab Displays the reason why a link dropped packets. By default, the listing is sorted by Last Timestamp. The tab provides the following additional data about each drop event:
  • Hostname: Name of the Mellanox server
  • Source Port: Port ID where the link originates
  • Source IP: Port IP address where the link originates
  • Source MAC: Port MAC address where the link originates
  • Destination Port: Port ID where the link terminates
  • Destination IP: Port IP address where the link terminates
  • Destination MAC: Port MAC address where the link terminates
  • Reason: Reason why the link dropped packets
    • MLAG port isolation: Not supported for port isolation implemented with system ACL
    • Destination MAC is reserved (DMAC=01-80-C2-00-00-0x): The address cannot be used by this link
    • VLAN tagging mismatch: VLAN tags on the source and destination do not match
    • Ingress VLAN filtering: Frames arriving on a port that is not a member of the VLAN are discarded
    • Ingress spanning tree filter: Port is in Spanning Tree blocking state
    • Unicast MAC table action discard: Currently not supported
    • Multicast egress port list is empty: No ports are defined for multicast egress
    • Port loopback filter: Port is operating in loopback mode; packets are being sent to itself (source MAC address is the same as the destination MAC address)
    • Source MAC is multicast: Packets have multicast source MAC address
    • Source MAC equals destination MAC: Source MAC address is the same as the destination MAC address
  • First Timestamp: Date and time this link was marked as down for the first time
  • Aggregate Count: Total number of dropped packets
  • Protocol: ID of the communication protocol running on this link
  • Ingress Port: Port accepting incoming traffic
  • OPID: Operation identifier; used for internal purposes
Router Drops tab Displays the reason why the server is unable to route a packet. By default, the listing is sorted by Last Timestamp. The tab provides the following additional data about each drop event:
  • Hostname: Name of the Mellanox server
  • Reason: Reason why the server is unable to route a packet
    • Non-routable packet: Packet has no route in routing table
    • Blackhole route: Packet received with action equal to discard
    • Unresolved next-hop: The next hop in the route is unknown
    • Blackhole ARP/neighbor: Packet received with blackhole adjacency
    • IPv6 destination in multicast scope FFx0:/16: Packet received with multicast destination address in FFx0:/16 address range
    • IPv6 destination in multicast scope FFx1:/16: Packet received with multicast destination address in FFx1:/16 address range
    • Non-IP packet: Cannot read packet header because it is not an IP packet
    • Unicast destination IP but non-unicast destination MAC: Cannot read packet with IP unicast address when destination MAC address is not unicast (FF:FF:FF:FF:FF:FF)
    • Destination IP is loopback address: Cannot read packet as destination IP address is a loopback address (dip=>127.0.0.0/8)
    • Source IP is multicast: Cannot read packet as source IP address is a multicast address (ipv4 SIP => 224.0.0.0/4)
    • Source IP is in class E: Cannot read packet as source IP address is a Class E address
    • Source IP is loopback address: Cannot read packet as source IP address is a loopback address (ipv4 => 127.0.0.0/8; for ipv6 => ::1/128)
    • Source IP is unspecified: Cannot read packet as source IP address is unspecified (ipv4 = 0.0.0.0/32; for ipv6 = ::0)
    • Checksum or IP ver or IPv4 IHL too short: Cannot read packet due to a header checksum error, an IP version mismatch, or an IPv4 header length (IHL) that is too short
    • Multicast MAC mismatch: For IPv4, destination MAC address is not equal to {0x01-00-5E-0 (25 bits), DIP[22:0]} and DIP is multicast. For IPv6, destination MAC address is not equal to {0x3333, DIP[31:0]} and DIP is multicast.
    • Source IP equals destination IP: Packet has a source IP address equal to the destination IP address
    • IPv4 source IP is limited broadcast: Packet has broadcast source IP address
    • IPv4 destination IP is local network (destination = 0.0.0.0/8): Packet has IPv4 destination address that is a local network (destination=0.0.0.0/8)
    • IPv4 destination IP is link local: Packet has IPv4 destination address that is a local link
    • Ingress router interface is disabled: Packet destined to a different subnet cannot be routed because ingress router interface is disabled
    • Egress router interface is disabled: Packet destined to a different subnet cannot be routed because egress router interface is disabled
    • IPv4 routing table (LPM) unicast miss: No route available in routing table for packet
    • IPv6 routing table (LPM) unicast miss: No route available in routing table for packet
    • Router interface loopback: Packet has destination IP address that is local. For example, SIP = 1.1.1.1, DIP = 1.1.1.128.
    • Packet size is larger than MTU: Packet is larger than the MTU configured on the VLAN
    • TTL value is too small: Packet has TTL value of 1
Tunnel Drops tab Displays the reason why a tunnel dropped packets. By default, the listing is sorted by Last Timestamp. The tab provides the following additional data about each drop event:
  • Hostname: Name of the Mellanox server
  • Reason: Reason why the tunnel dropped the packet
    • Overlay switch - source MAC is multicast: Overlay packet's source MAC address is multicast
    • Overlay switch - source MAC equals destination MAC: Overlay packet's source MAC address is the same as the destination MAC address
    • Decapsulation error: Decapsulation produced incorrect format of packet. For example, encapsulation of packet with many VLANs or IP options on the underlay can cause decapsulation to result in a short packet.
Buffer Drops tab Displays the reason why the server buffer dropped packets. By default, the listing is sorted by Last Timestamp. The tab provides the following additional data about each drop event:
  • Hostname: Name of the Mellanox server
  • Reason: Reason why the buffer dropped the packet
    • Tail drop: Tail drop is enabled, and buffer queue is filled to maximum capacity
    • WRED: Weighted Random Early Detection is enabled, and the buffer queue is filled to maximum capacity or the RED engine dropped the packet as part of random congestion prevention
ACL Drops tab Displays the reason for an ACL to drop packets. By default, the listing is sorted by Last Timestamp. The tab provides the following additional data about each drop event:
  • Hostname: Name of the Mellanox server
  • Reason: Reason why ACL dropped packets
    • Ingress port ACL: ACL action set to deny on the ingress port
    • Ingress router ACL: ACL action set to deny on the ingress router interface
    • Egress port ACL: ACL action set to deny on the egress port
    • Egress router ACL: ACL action set to deny on the egress router interface
Table Actions Select, export, or filter the list. Refer to Table Settings.
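
If the NetQ CLI is also configured on the switch, you can retrieve the same drop events at the command line. This is a hedged example; it assumes your NetQ CLI release (2.4.0 or later) includes the show wjh-drop command and the l1 drop-type keyword shown here, with other tabs in the table above exposed under similar keywords:

cumulus@switch:~$ netq show wjh-drop l1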