NVIDIA® NetQ™ is a scalable, modern network operations tool set that provides visibility into your overlay and underlay networks, enabling troubleshooting in real time. NetQ delivers data and statistics about the health of your data center, from the container, virtual machine, or host all the way to the switch and port. NetQ correlates configuration and operational status, and tracks state changes while simplifying management for the entire Linux-based data center. With NetQ, network operations change from a manual, reactive, node-by-node approach to an automated, informed, and agile one. Visit Network Operations with NetQ to learn more.
This user guide provides documentation for network administrators who are responsible for deploying, configuring, monitoring, and troubleshooting the network in their data center or campus environment.
For a list of the new features in this release, see What's New. For bug fixes and known issues, refer to the release notes.
What's New
This page summarizes new features and improvements for the NetQ 4.5 release. For a complete list of open and fixed issues, see the release notes.
What’s New in NetQ 4.5.0
NetQ 4.5.0 includes the following new features and improvements:
Access credentials that can be applied to individual switches for better security and increased flexibility
RoCE check and show commands that display priority code point (PCP) and switch priority (SP) mapping misconfigurations and recommendations
NetQ 4.5.0 images have been upgraded to Ubuntu 20.04.
To upgrade to NetQ 4.5.0, you must back up your current NetQ data and perform a new installation of NetQ 4.5.0. This process is supported when upgrading from NetQ 4.3.0 or above.
Upgrades from releases earlier than NetQ 4.3.0 require an incremental upgrade to version 4.3.0 before you back up your data and perform a new installation of NetQ 4.5.0.
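The upgrade rules above can be expressed as a small decision helper. This is a minimal sketch, not part of NetQ: the function names and the step wording are illustrative assumptions, and only the version thresholds (4.3.0 and 4.5.0) come from the text above.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.3.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def upgrade_steps(current, target="4.5.0", minimum_direct="4.3.0"):
    """Return the ordered steps needed to move from `current` to `target`.

    Hypothetical helper: releases older than `minimum_direct` first need an
    incremental upgrade; every path then backs up data and installs fresh.
    """
    steps = []
    if parse_version(current) < parse_version(minimum_direct):
        steps.append(f"Upgrade incrementally to NetQ {minimum_direct}")
    steps.append("Back up current NetQ data")
    steps.append(f"Perform a new installation of NetQ {target}")
    return steps


if __name__ == "__main__":
    for release in ("4.2.0", "4.3.0", "4.4.1"):
        print(release, "->", upgrade_steps(release))
```

Comparing versions as integer tuples rather than strings keeps a release such as 4.10.0 ordered after 4.3.0.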
The NetQ Hardware Appliance is no longer available for purchase. For existing customers, contact NVIDIA support for assistance upgrading to 4.5.
Compatible Agent Versions
NetQ 4.5.0 is compatible with NetQ Agent versions 4.4.0 and above. You can install NetQ Agents on switches and servers running:
Cumulus Linux 3.7.16 and later
SONiC 202012
CentOS 7
RHEL 7.1
Ubuntu 18.04
You must upgrade to the latest agent version to enable 4.5 features.
NetQ Overview
This section describes NetQ components and deployment models. It also outlines how to get started with the NetQ user interface and command line.
NetQ Basics
This section provides an overview of the NetQ hardware, software, and deployment models.
NetQ Components
NetQ contains the following applications and key components:
Telemetry data collection and aggregation via
NetQ switch agents
NetQ host agents
Database
Data streaming
Network services
User interfaces
While these functions apply to both the on-premises and cloud solutions, they are configured differently, as shown in the following diagrams.
NetQ Agents
NetQ Agents are installed via software and run on every monitored node in the network—including Cumulus® Linux® switches, Linux bare metal hosts, and virtual machines. The NetQ Agents push network data regularly and event information immediately to the NetQ Platform.
Switch Agents
The NetQ Agents running on Cumulus Linux or SONiC switches gather the following network data via Netlink:
Interfaces
IP addresses (v4 and v6)
IP routes (v4 and v6)
IP nexthops (v4 and v6)
Links
Bridge FDB (MAC address table)
ARP Entries/Neighbors (IPv4 and IPv6)
for the following protocols:
Bridging protocols: LLDP, STP, MLAG
Routing protocols: BGP, OSPF
Network virtualization: EVPN, VXLAN
Host Agents
The NetQ Agents running on hosts gather the same information as that for switches, plus the following network data:
Network IP and MAC addresses
Container IP and MAC addresses
The NetQ Agent obtains container information by listening to the Kubernetes orchestration tool.
The NetQ Agent is supported on hosts running Ubuntu 18.04, Red Hat® Enterprise Linux 7, and CentOS 7.
NetQ Core
The NetQ core performs the data collection, storage, and processing for delivery to various user interfaces. It consists of a collection of scalable components running entirely within a single server. The NetQ software queries this server, rather than individual devices, enabling greater system scalability.
Data Aggregation
The data aggregation component collects data coming from all of the NetQ Agents. It then filters, compresses, and forwards the data to the streaming component. The server monitors for missing messages and also monitors the NetQ Agents themselves, sending notifications about events when appropriate. In addition to the telemetry data collected from the NetQ Agents, the aggregation component collects information from the switches and hosts, such as vendor, model, version, and basic operational state.
Data Stores
NetQ uses two types of data stores. The first stores the raw data, data aggregations, and discrete events needed for quick response to data requests. The second stores data based on correlations, transformations, and raw-data processing.
Real-time Streaming
The streaming component processes the incoming raw data from the aggregation server in real time. It reads the metrics and stores them as a time series, and triggers alarms based on anomaly detection, thresholds, and events.
Network Services
The network services component monitors the operation of protocols and services, both individually and across the network, and stores status details.
User Interfaces
NetQ data is available through several interfaces:
NetQ CLI (command-line interface)
NetQ GUI (graphical user interface)
NetQ RESTful API (representational state transfer application programming interface)
The CLI and UI query the RESTful API to present data. NetQ can integrate with event notification applications and third-party analytics tools.
Data Center Network Deployments
This section describes three common data center deployment types for network management:
Out-of-band management (recommended)
In-band management
High availability
NetQ operates over layer 3, and can operate in both layer 2 bridged and layer 3 routed environments. NVIDIA recommends a layer 3 routed environment whenever possible.
Out-of-band Management Deployment
NVIDIA recommends deploying NetQ on an out-of-band (OOB) management network to separate network management traffic from standard network data traffic.
The physical network hardware includes:
Spine switches: aggregate and distribute data; also known as an aggregation switch, end-of-row (EOR) switch or distribution switch
Leaf switches: where servers connect to the network; also known as a top-of-rack (TOR) or access switch
Server hosts: host applications and data served to the user through the network
Exit switch: where connections to outside the data center occur; also known as Border Leaf or Service Leaf
Edge server (optional): where the firewall is the demarcation point; peering can occur through the exit switch layer to Internet (PE) devices
Internet device: where provider edge (PE) equipment communicates at layer 3 with the network fabric
The following figure shows an example of a Clos network fabric design for a data center using an OOB management network overlaid on top, where NetQ resides. The physical connections (shown as gray lines) run between Spine 01 and four Leaf devices and two Exit devices, and between Spine 02 and the same four Leaf devices and two Exit devices. Leaf 01 and Leaf 02 connect to each other over a peerlink and act as an MLAG pair for Server 01 and Server 02. Leaf 03 and Leaf 04 connect to each other over a peerlink and act as an MLAG pair for Server 03 and Server 04. The Edge connects to both Exit devices, and the Internet node connects to Exit 01.
The physical management hardware includes:
OOB management switch: aggregation switch that connects to all network devices through communications with the NetQ Agent on each node
NetQ Platform: hosts the telemetry software, database, and user interfaces
These switches connect to each physical network device through a virtual network overlay, shown with purple lines.
In-band Management Deployment
While not recommended, you can implement NetQ within your data network. In this scenario, there is no overlay and all traffic to and from the NetQ Agents and the NetQ Platform traverses the data paths along with your regular network traffic. The roles of the switches in the Clos network are the same, except that the NetQ Platform performs the aggregation function that the OOB management switch performed. If your network goes down, you might not have access to the NetQ Platform for troubleshooting.
High Availability Deployment
NetQ supports a high availability deployment for users who prefer a solution in which the collected data and processing provided by the NetQ Platform remain available through alternate equipment should the platform fail for any reason. In this configuration, three NetQ Platforms are deployed, with one as the master and two as workers (or replicas). NetQ Agents send data to all three platforms so that if the master NetQ Platform fails, one of the replicas automatically becomes the master and continues to store and provide the telemetry data. The following example is based on an OOB-management configuration, and modified to support high availability for NetQ.
NetQ Operation
In either in-band or out-of-band deployments, NetQ offers networkwide configuration and device management, proactive monitoring capabilities, and network performance diagnostics.
The NetQ Agent
From a software perspective, a network switch has software associated with the hardware platform, the operating system, and communications. For data centers, the software on a network switch is similar to the following diagram:
The NetQ Agent interacts with the various components and software on switches and hosts and provides the gathered information to the NetQ Platform. You can view the data using the NetQ CLI or UI.
The NetQ Agent polls the user space applications for information about the performance of the various routing protocols and services that are running on the switch. Cumulus Linux supports BGP and OSPF routing protocols as well as static addressing through FRRouting (FRR). Cumulus Linux also supports LLDP and MSTP among other protocols, and a variety of services such as systemd and sensors. SONiC supports BGP and LLDP.
For hosts, the NetQ Agent also polls for performance of containers managed with Kubernetes. This information is used to calculate the network’s health and check if the network is configured and operating correctly.
The NetQ Agent interacts with the Netlink communications between the Linux kernel and the user space, listening for changes to the network state, configurations, routes, and MAC addresses. NetQ sends notifications about these changes so that network operators and administrators can respond quickly when changes are not expected or favorable.
The NetQ Agent also interacts with the hardware platform to obtain performance information about various physical components, such as fans and power supplies, on the switch. The agent measures operational states and temperatures, along with cabling information to allow for proactive maintenance.
The NetQ Platform
After the collected data is sent to and stored in the NetQ database, you can:
Validate configurations and identify misconfigurations in your current network or in a previous deployment.
Monitor communication paths throughout the network.
Notify users of network issues.
Anticipate the impact of connectivity changes.
Validate Configurations
The NetQ CLI lets you validate your network’s health through two sets of commands: netq check and netq show. They extract the information from the network service component and event service. The network service component continually validates the connectivity and configuration of the devices and protocols running on the network. The netq check and netq show commands display the status of the various components and services across the network and the complete software stack. They are available for the following components and services (X means available, - means not available):
Component or Service    Check    Show
--------------------    -----    ----
Agents                  X        X
BGP                     X        X
MLAG (CLAG)             X        X
Events                  -        X
EVPN                    X        X
Interfaces              X        X
Inventory               -        X
IPv4/v6                 -        X
Kubernetes              -        X
LLDP                    -        X
MACs                    -        X
MTU                     X        -
NTP                     X        X
OSPF                    X        X
Sensors                 X        X
Services                -        X
VLAN                    X        X
VXLAN                   X        X
Monitor Communication Paths
The trace engine validates the available communication paths between two network devices. The corresponding netq trace command enables you to view all of the paths between the two devices and if there are any breaks in the paths. For more information about trace requests, refer to Verify Network Connectivity.
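To illustrate what "all of the paths between two devices" means in a Clos fabric, the following sketch enumerates every shortest path between two nodes with a breadth-first search. It is not how the NetQ trace engine is implemented; the topology is an assumption modeled on the example fabric described earlier (two spines, four leaves, two exits, four dual-homed servers), and all function names are hypothetical.

```python
from collections import deque


def build_fabric():
    """Build an adjacency map for the assumed example fabric."""
    leaves = ["leaf01", "leaf02", "leaf03", "leaf04"]
    exits = ["exit01", "exit02"]
    links = [(spine, dev) for spine in ("spine01", "spine02") for dev in leaves + exits]
    links += [("leaf01", "leaf02"), ("leaf03", "leaf04")]  # MLAG peerlinks
    links += [(srv, leaf) for srv in ("server01", "server02") for leaf in ("leaf01", "leaf02")]
    links += [(srv, leaf) for srv in ("server03", "server04") for leaf in ("leaf03", "leaf04")]
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_paths(graph, src, dst):
    """Return every shortest path from src to dst, or [] if none exists."""
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                preds[nbr].append(node)  # another equally short way to reach nbr

    if dst not in dist:
        return []

    def unwind(node):
        # Walk predecessor lists back to the source to rebuild each path.
        if node == src:
            return [[src]]
        return [path + [node] for pred in preds[node] for path in unwind(pred)]

    return unwind(dst)


if __name__ == "__main__":
    for path in shortest_paths(build_fabric(), "server01", "server03"):
        print(" -> ".join(path))
```

In this assumed topology, a server in one MLAG pair reaches a server in the other through either local leaf, either spine, and either remote leaf, so removing a link reduces the number of paths returned rather than breaking connectivity outright.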
View Historical State and Configuration Info
You can run all check, show, and trace commands for current and past statuses. To investigate past issues, use the netq check command and look for configuration or operational issues around the time of the timestamped event messages. Then use the netq show commands to view information about device configurations. You can also use the netq trace command to see what the connectivity looked like between any problematic nodes at a particular time.
For example, the following diagram shows issues on spine01, leaf04, and server03:
An administrator can run the following commands from any switch in the network to determine the cause of a BGP error on spine01:
cumulus@switch:~$ netq check bgp around 30m
Total Nodes: 25, Failed Nodes: 3, Total Sessions: 220 , Failed Sessions: 24,
Hostname VRF Peer Name Peer Hostname Reason Last Changed
----------------- --------------- ----------------- ----------------- --------------------------------------------- -------------------------
exit-1 DataVrf1080 swp6.2 firewall-1 BGP session with peer firewall-1 swp6.2: AFI/ 1d:2h:6m:21s
SAFI evpn not activated on peer
exit-1 DataVrf1080 swp7.2 firewall-2 BGP session with peer firewall-2 (swp7.2 vrf 1d:1h:59m:43s
DataVrf1080) failed,
reason: Peer not configured
exit-1 DataVrf1081 swp6.3 firewall-1 BGP session with peer firewall-1 swp6.3: AFI/ 1d:2h:6m:21s
SAFI evpn not activated on peer
exit-1 DataVrf1081 swp7.3 firewall-2 BGP session with peer firewall-2 (swp7.3 vrf 1d:1h:59m:43s
DataVrf1081) failed,
reason: Peer not configured
exit-1 DataVrf1082 swp6.4 firewall-1 BGP session with peer firewall-1 swp6.4: AFI/ 1d:2h:6m:21s
SAFI evpn not activated on peer
exit-1 DataVrf1082 swp7.4 firewall-2 BGP session with peer firewall-2 (swp7.4 vrf 1d:1h:59m:43s
DataVrf1082) failed,
reason: Peer not configured
exit-1 default swp6 firewall-1 BGP session with peer firewall-1 swp6: AFI/SA 1d:2h:6m:21s
FI evpn not activated on peer
exit-1 default swp7 firewall-2 BGP session with peer firewall-2 (swp7 vrf de 1d:1h:59m:43s
...
cumulus@switch:~$ netq exit-1 show bgp
Matching bgp records:
Hostname Neighbor VRF ASN Peer ASN PfxRx Last Changed
----------------- ---------------------------- --------------- ---------- ---------- ------------ -------------------------
exit-1 swp3(spine-1) default 655537 655435 27/24/412 Fri Feb 15 17:20:00 2019
exit-1 swp3.2(spine-1) DataVrf1080 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp3.3(spine-1) DataVrf1081 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp3.4(spine-1) DataVrf1082 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp4(spine-2) default 655537 655435 27/24/412 Fri Feb 15 17:20:00 2019
exit-1 swp4.2(spine-2) DataVrf1080 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp4.3(spine-2) DataVrf1081 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp4.4(spine-2) DataVrf1082 655537 655435 13/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp5(spine-3) default 655537 655435 28/24/412 Fri Feb 15 17:20:00 2019
exit-1 swp5.2(spine-3) DataVrf1080 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp5.3(spine-3) DataVrf1081 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp5.4(spine-3) DataVrf1082 655537 655435 14/12/0 Fri Feb 15 17:20:00 2019
exit-1 swp6(firewall-1) default 655537 655539 73/69/- Fri Feb 15 17:22:10 2019
exit-1 swp6.2(firewall-1) DataVrf1080 655537 655539 73/69/- Fri Feb 15 17:22:10 2019
exit-1 swp6.3(firewall-1) DataVrf1081 655537 655539 73/69/- Fri Feb 15 17:22:10 2019
exit-1 swp6.4(firewall-1) DataVrf1082 655537 655539 73/69/- Fri Feb 15 17:22:10 2019
exit-1 swp7 default 655537 - NotEstd Fri Feb 15 17:28:48 2019
exit-1 swp7.2 DataVrf1080 655537 - NotEstd Fri Feb 15 17:28:48 2019
exit-1 swp7.3 DataVrf1081 655537 - NotEstd Fri Feb 15 17:28:48 2019
exit-1 swp7.4 DataVrf1082 655537 - NotEstd Fri Feb 15 17:28:48 2019
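When reviewing output like this, the sessions of interest are the rows whose PfxRx column reads NotEstd. The following sketch picks those rows out of captured text. It assumes the whitespace-separated column layout shown above (Hostname, Neighbor, VRF, ASN, Peer ASN, PfxRx); the function name is hypothetical, and for anything beyond a quick check the CLI's json option is likely a more robust input than scraped text.

```python
SAMPLE_OUTPUT = """\
Hostname Neighbor VRF ASN Peer ASN PfxRx Last Changed
exit-1 swp6.4(firewall-1) DataVrf1082 655537 655539 73/69/- Fri Feb 15 17:22:10 2019
exit-1 swp7 default 655537 - NotEstd Fri Feb 15 17:28:48 2019
exit-1 swp7.2 DataVrf1080 655537 - NotEstd Fri Feb 15 17:28:48 2019
"""


def not_established(output):
    """Return (hostname, neighbor, vrf) for each row whose PfxRx is NotEstd."""
    sessions = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank lines and anything too short to be a data row
        hostname, neighbor, vrf, _asn, _peer_asn, pfx_rx = fields[:6]
        if pfx_rx == "NotEstd":
            sessions.append((hostname, neighbor, vrf))
    return sessions


if __name__ == "__main__":
    for hostname, neighbor, vrf in not_established(SAMPLE_OUTPUT):
        print(f"{hostname}: session on {neighbor} (vrf {vrf}) is not established")
```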
Manage Network Events
The NetQ notifier lets you capture and filter events for devices, components, protocols, and services. This is especially useful when an interface or routing protocol goes down and you want to get them back up and running as quickly as possible. You can improve resolution time significantly by creating filters that focus on topics appropriate for a particular group of users. You can create filters for events related to BGP and MLAG session states, interfaces, links, NTP and other services, fans, power supplies, and physical sensor measurements.
The following is an example of a Slack message received on a netq-notifier channel indicating that the BGP session on switch leaf04 interface swp2 has gone down:
Every event or entry in the NetQ database is stored with a timestamp that reports when the NetQ Agent captured an event on the switch or server. This timestamp is based on the switch or server time where the NetQ Agent is running, and is pushed in UTC format.
Interface state, IP addresses, routes, ARP/ND table (IP neighbor) entries and MAC table entries carry a timestamp that represents the time an event occurred (such as when a route is deleted or an interface comes up).
Data that is captured and saved based on polling has a timestamp according to when the information was captured rather than when the event actually happened, though NetQ compensates for this if the data extracted provides additional information to compute a more precise time of the event. For example, BGP uptime can be used to determine when the event actually happened in conjunction with the timestamp.
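The compensation described above amounts to subtracting the reported uptime from the time of the poll. The following is a minimal sketch of that arithmetic, not NetQ's actual implementation; the function name and the sample values are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def estimate_event_time(captured_at, uptime):
    """Estimate when a session came up, given the poll time and its uptime.

    captured_at: timezone-aware datetime of the poll (UTC in this sketch)
    uptime:      how long the session had been up when it was polled
    """
    return captured_at - uptime


if __name__ == "__main__":
    # Hypothetical values: a poll at 17:22:10 UTC that sees 2m10s of uptime.
    polled = datetime(2019, 2, 15, 17, 22, 10, tzinfo=timezone.utc)
    came_up = estimate_event_time(polled, timedelta(minutes=2, seconds=10))
    print(came_up.isoformat())  # 2019-02-15T17:20:00+00:00
```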
Restarting a NetQ Agent on a device does not update the timestamps for existing objects to reflect this new restart time. NetQ preserves their timestamps relative to the original start time of the Agent. A rare exception is if you reboot the device between the time it takes the Agent to stop and restart; in this case, the time is still relative to the start time of the Agent.
Exporting NetQ Data
You can export data from the NetQ Platform in the CLI or UI:
In the CLI, use the json option to output command results to JSON format for parsing in other applications
In the UI, expand the cards to a full-screen, tabular view and select export
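Once command results are in JSON, another application can load and filter them with a few lines of code. The sketch below shows one way to do that. The payload is hypothetical: the "bgp" key and the record field names are illustrative assumptions rather than the documented NetQ schema, so inspect the real output before relying on any field.

```python
import json

# Hypothetical payload; key and field names are assumptions for illustration.
SAMPLE_JSON = """
{
  "bgp": [
    {"hostname": "exit-1", "neighbor": "swp6(firewall-1)", "vrf": "default", "state": "Established"},
    {"hostname": "exit-1", "neighbor": "swp7", "vrf": "default", "state": "NotEstd"}
  ]
}
"""


def load_records(json_text, list_key):
    """Parse JSON text and return the list of records stored under list_key."""
    payload = json.loads(json_text)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    records = payload.get(list_key, [])
    if not isinstance(records, list):
        raise ValueError(f"expected a list under {list_key!r}")
    return records


def select(records, **criteria):
    """Return the records whose fields match every keyword criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    records = load_records(SAMPLE_JSON, "bgp")
    for record in select(records, state="NotEstd"):
        print(record["hostname"], record["neighbor"])
```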
Important File Locations
The following configuration and log files can help with troubleshooting:
/etc/netq/netq.yml: The NetQ configuration file. This file appears only if you installed either the netq-apps package or the NetQ Agent on the system.
/var/log/netqd.log: The NetQ daemon log file for the NetQ CLI. This log file appears only if you installed the netq-apps package on the system.
/var/log/netq-agent.log: The NetQ Agent log file. This log file appears only if you installed the NetQ Agent on the system.
NetQ User Interface Overview
The NetQ user interface (UI) lets you access NetQ through a web browser, where you can visualize your network and interact with the display using a keyboard and mouse.
The NetQ UI is supported on Google Chrome and Mozilla Firefox. It is designed to be viewed on a display with a minimum resolution of 1920 × 1080 pixels.
The following are the default usernames and passwords for UI access:
NetQ On-premises: admin, admin
NetQ Cloud: Use the credentials you created during setup. You should receive an email from NVIDIA titled NetQ Access Link.
Enter your username and password to log in. You can also log in with SSO if your company has enabled it.
Username and Password
Locate the email you received from NVIDIA titled NetQ Access Link. Select Create Password.
Enter a new password, then enter it again to confirm it.
Log in using your email address and new password.
Accept the Terms of Use after reading them.
The default workbench opens, with your username and premises shown in the top-right corner of NetQ.
SSO
Follow the steps above until you reach the NetQ login screen.
Select Sign up for SSO and enter your organization’s name.
Enter your username and password.
Create a new password and enter the new password again to confirm it.
Click Update and Accept after reading the Terms of Use.
The default workbench opens, with your username shown in the top-right corner of NetQ.
Enter your username.
Enter your password.
The user-specified home workbench is displayed. If a home workbench is not specified, then the default workbench is displayed.
Any workbench can be set as the home workbench. Select User Settings > Profile & Preferences, then on the Workbenches card select the workbench you'd like to designate as your home workbench.
Log Out of NetQ
Select User Settings in the top-right corner of NetQ.
Application Header: Contains the main menu, NetQ version, search, validation summary, local time zone, premises list, and account information.
Workbench: Contains a task bar and content cards (with status and configuration information about your network and its various components).
Main Menu
Click Menu in the application header to navigate to:
Search: a search bar to quickly find an item on the main menu
Favorites: contains links to the user-defined favorite workbenches; Home points to the NetQ Workbench until reset by a user
Workbenches: contains links to all workbenches
Network: contains links to tabular data about various network elements and the What Just Happened feature
Notifications: contains links to threshold-based event rules and notification channel specifications
Admin: contains links to application management and lifecycle management features (only visible to users with Admin access role)
Search
You can search for devices and cards in the Global Search field in the header. It behaves like most searches and can help you quickly find device information.
NVIDIA Logo
Clicking the NVIDIA logo takes you to your favorite workbench. For details about specifying your favorite workbench, refer to Set User Preferences.
Validation Summary
Found in the header, the validation summary displays the overall health of your network.
On initial startup of the application, it can take up to an hour to reach an accurate health indication because some processes only run every 30 minutes.
Workbenches
A workbench comprises a given set of cards. A pre-configured default workbench, NetQ Workbench, is available to get you started. You can customize your workbenches by adding or removing cards. For more detail about managing your data using workbenches, refer to Focus Your Monitoring Using Workbenches.
Cards
Cards display information about your network. Each card describes a particular aspect of the network and can be expanded to display information and statistics at increasingly granular levels. You can add and remove cards from a workbench, move between cards and card sizes, and make copies of cards to show different levels of data at the same time. For details about working with cards, refer to Access Data with Cards.
User Settings
Each user can customize the NetQ application display, time zone and date format; change their account password; and manage their workbenches. This is all performed from User Settings > Profile & Preferences. For details, refer to Set User Preferences.
Focus Your Monitoring Using Workbenches
Workbenches are dashboards where you collect and view data. Two types of workbenches are available:
Default: Provided by NVIDIA; you cannot save changes you make to these workbenches
Custom: Created by the user; changes made to these workbenches are saved automatically
Both types of workbenches display a set of cards. Default workbenches are public (accessible to all users), whereas custom workbenches are private (viewing is restricted to the user who created them).
Default Workbenches
The default workbench contains Device Inventory, Switch Inventory, Events, and Validation Summary cards, giving you an overview of how your network is operating.
Upon initial login, the NetQ Workbench opens. Upon subsequent logins, the last workbench you viewed opens.
Custom Workbenches
People with either administrative or user roles can create and save an unlimited number of custom workbenches. For example, you might create a workbench that:
Shows network statistics for the past week alongside network statistics for the past 24 hours.
Only displays data about virtual overlays.
Displays switches that you are troubleshooting.
Is focused on application or account management.
Create a Workbench
Select New in the workbench header.
Enter a name for the workbench and choose whether to set it as your default home workbench.
Select the cards you would like displayed on your new workbench.
Click Create.
Refer to Access Data with Cards for information about interacting with cards on your workbenches.
Clone a Workbench
To create a duplicate of an existing workbench:
Select Clone in the workbench header.
Name the cloned workbench and select Clone.
Remove a Workbench
Admins can remove any workbench, except for the default NetQ Workbench. User accounts can only remove workbenches they have created.
To remove a workbench:
Select User Settings in the top-right corner.
Select Profile & Preferences.
Locate the Workbenches card.
Hover over the workbench you want to remove, and click Delete.
Open an Existing Workbench
There are several options for opening workbenches:
Open through the Workbench header
Click next to the current workbench name and locate the workbench
Under My Home, click the name of your favorite workbench
Under My Most Recent, click the workbench if it appears in the list
Search by workbench name
Click All My WB to open all workbenches and select it from the list
Open through the main menu
Expand the Menu and select the workbench from the Favorites or Workbenches sections
Open through the NVIDIA logo
Click the logo in the header to open your favorite workbench
Manage Auto-refresh
You can specify how often to update the data displayed on your workbenches. Three refresh rates are available:
Analyze: updates every 30 seconds
Debug: updates every minute
Monitor: updates every 2 minutes
By default, auto-refresh is configured to update every 30 seconds.
To modify the auto-refresh setting:
Select the dropdown next to Refresh.
Select the refresh rate. A check mark indicates the current selection. The new refresh rate is applied immediately.
To disable auto-refresh, select Pause. When you’re ready for the data to refresh, select Play.
Access Data with Cards
Cards present information about your network for monitoring and troubleshooting; each card describes a particular aspect of the network. Cards are collected onto a workbench where all data relevant to a task or set of tasks is visible. You can add and remove cards from a workbench, increase or decrease their sizes, change the time period of the data shown on a card, and make copies of cards to show different levels of data at the same time.
Available Cards
Each card focuses on a particular aspect of your network. They include:
Validation summary: overview of your network’s health
Events: system events and anomalies
What Just Happened: network issues and packet drops
Device groups: distribution of device components
Trace request: discovery workflow for paths between two devices in the network fabric
MAC move commentary: info about changes to a MAC address on a specific VLAN
Network services cards: BGP, MLAG, EVPN, OSPF, and LLDP
Inventory cards: Devices, Switches, DPUs, and Hosts
Card Sizes
Cards are available in four sizes. The granularity of the content varies with the size of the card: the smallest card shows the highest-level information, and the full-screen card shows the most detailed information.
Card Size Summary
Primary purpose of each card size:
Small: Quick view of status, typically at the level of good or bad
Medium: View key performance parameters or statistics; perform quick actions; monitor for potential issues
Large: View detailed performance and statistics; perform actions; compare and review related information
Full Screen: View all attributes for given network aspect; analyze and visualize detailed data; export and filter data
Card Actions
Add Cards to Your Workbench
Click Add card in the header.
Select the card(s) you want to add to your workbench.
After you have selected the cards, select Open cards.
The cards are placed at the end of the set of cards currently on the workbench. You might need to scroll down to see them. Drag and drop the cards on the workbench to rearrange them.
Add Switch Cards to Your Workbench
You can add switch cards to a workbench through the Devices icon on the header or by searching for it in the Global Search field. To add a switch card from the header:
Click Devices, then select Open a device card.
Select the device from the suggestions that appear.
Choose the card’s size, then select Add.
Remove Cards from Your Workbench
To remove all the cards from your workbench, click the Clear icon in the header. To remove an individual card:
Hover over the card you want to remove.
Click the More Actions menu.
Select Remove.
The card is removed from the workbench, but not from the application.
Change the Size of the Card
Hover over the top portion of the card until you see a rectangular box divided into four segments.
Move your cursor over the box until the desired size option is highlighted.
One-quarter width opens a small card. One-half width opens a medium card. Three-quarters width opens a large card. Full width opens a full-screen card.
Select the size. When the card changes to the selected size, it might move to a different area on the workbench.
Change the Time Period for the Card Data
All cards have a default time period for the data shown on the card, typically the last 24 hours. You can change the time period to view the data during a different time range to aid analysis of previous or existing issues.
To change the time period for a card:
Hover over the top portion of the card and select the clock icon.
Select a time period from the dropdown list.
Changing the time period in this manner only changes the time period for the given card.
Table Settings
You can manipulate the tabular data displayed in a full-screen card by filtering and sorting the columns. Hover over the column header and select it to sort the column. The data is sorted in ascending or descending order: A-Z, Z-A, 1-n, or n-1. The number of rows that can be sorted is limited to 10,000.
To reposition the columns, drag and drop them using your mouse. You can also export the data presented in the table by selecting Export.
The following icons are common in the full-screen card view:
Select All: Selects all items in the list.
Clear All: Clears all existing selections in the list.
Add Item: Adds item to the list.
Edit: Edits the selected item.
Delete: Removes the selected items.
Filter: Filters the list using available parameters.
Generate/Delete AuthKeys: Creates or removes NetQ CLI authorization keys.
Open Cards: Opens the corresponding validation or trace card(s).
Assign role: Opens role assignment options for switches.
Export: Exports selected data into either a .csv or JSON-formatted file.
When there are many items in a table, NetQ loads up to 25 rows by default and provides the rest in additional table pages, accessible through the pagination controls. Pagination is displayed under the table.
Set User Preferences
Each user can customize the NetQ application display, change their account password, and manage their workbenches.
Configure Display Settings
The Display card contains the options for setting the application theme (light or dark), language, time zone, and date formats.
To configure the display settings:
Select User Settings in the top-right corner.
Select Profile & Preferences.
Locate the Display card:
In the Theme field, click to select either dark or light theme. The following figure shows the light theme:
In the Time Zone field, click to change the time zone from the default.
By default, the time zone is set to the user’s local time zone. If no time zone has been selected, NetQ defaults to the local time zone where NetQ is installed. All time values are based on this setting. The setting appears in the application header, where you can also change it, and is based on Greenwich Mean Time (GMT). If your deployment is not local to you (for example, you want to view the data from the perspective of a data center in another time zone), you can change the display to a different time zone.
In the Date Format field, select the date and time format you want displayed on the cards.
Change Your Password
Click User Settings in the top-right corner.
Click Profile & Preferences.
In the Basic Account Info card, select Change password.
Enter your current password, followed by your new password.
A workbench is similar to a dashboard. This is where you collect and view the data that is important to you. You can have more than one workbench and manage them with the Workbenches card located in Profile & Preferences. From the Workbenches card, you can view, sort, and delete workbenches. For a detailed overview of workbenches, see Focus Your Monitoring Using Workbenches.
NetQ Command Line Overview
The NetQ CLI provides access to all network state and event information collected by NetQ Agents. It behaves similarly to typical CLIs, with groups of commands that display related information, and help commands that provide additional information. See the command line reference for a comprehensive list of NetQ commands, including examples, options, and definitions.
The NetQ command line interface only runs on switches and server hosts implemented with Intel x86 or ARM-based architectures.
CLI Access
When you install or upgrade NetQ, you can also install and enable the CLI on your NetQ server or appliance and hosts.
To access the CLI from a switch or server:
Log in to the device. The following example uses the default username of cumulus and a hostname of switch:
<computer>:~<username>$ ssh cumulus@switch
Enter your password to reach the command prompt. The default password is CumulusLinux!
You can now run commands:
cumulus@switch:~$ netq show agents
cumulus@switch:~$ netq check bgp
Command Line Basics
This section describes the core structure and behavior of the NetQ CLI.
Command Line Structure
The NetQ command line has a flat structure as opposed to a modal structure: you can run all commands at the same level from the standard command prompt, rather than only in a specific mode.
Command Syntax
All NetQ CLI commands begin with netq. NetQ commands fall into one of four syntax categories: validation (check), monitoring (show), configuration, and trace.
netq check <network-protocol-or-service> [options]
netq show <network-protocol-or-service> [options]
netq config <action> <object> [options]
netq trace <destination> from <source> [options]
Symbols: Meaning
Parentheses ( ): Grouping of required parameters. Choose one.
Square brackets [ ]: Single or group of optional parameters. If more than one object or keyword is available, choose one.
Angle brackets < >: Required variable. Value for a keyword or option; enter according to your deployment nomenclature.
Pipe |: Separates object and keyword options, also separates value options; enter one object or keyword and zero or one value.
For example, in the netq check command:
[<hostname>] is an optional parameter with a variable value named hostname
<network-protocol-or-service> represents a number of possible keywords, such as agents, bgp, evpn, and so forth
<options> represents a number of possible conditions for the given object, such as around, vrf, or json
Examples of valid commands include:
netq show bgp
netq config restart cli
netq trace 10.0.0.5 from 10.0.0.35
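As an illustration of the four syntax categories, the following Python sketch maps a command string to its category. It is not part of NetQ. The function name categorize and the "other" label are hypothetical, and the sketch assumes that a single token before the keyword is an optional hostname.

```python
# Keywords for the four syntax categories described above.
CATEGORIES = {
    "check": "validation",
    "show": "monitoring",
    "config": "configuration",
    "trace": "trace",
}

# Command families named elsewhere in this guide that fall outside the four
# categories; this sketch labels them "other" (an illustrative choice).
OTHER_FAMILIES = {"lcm", "notification", "install", "bootstrap",
                  "decommission", "upgrade"}


def categorize(command):
    """Return the syntax category for a NetQ command string."""
    tokens = command.split()
    if not tokens or tokens[0] != "netq":
        raise ValueError("NetQ CLI commands begin with netq")
    if len(tokens) > 1 and tokens[1] in OTHER_FAMILIES:
        return "other"
    # Look at the first two tokens after netq so that an optional
    # hostname (netq leaf01 show agents) does not hide the keyword.
    for token in tokens[1:3]:
        if token in CATEGORIES:
            return CATEGORIES[token]
    return "other"


for example in ("netq show bgp",
                "netq config restart cli",
                "netq trace 10.0.0.5 from 10.0.0.35"):
    print(example, "->", categorize(example))
```

Because the CLI is flat, the keyword alone determines the category; no mode has to be tracked between commands.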
Command Output
The command output presents results in color for many commands. Results with errors appear in red, and warnings appear in yellow. Results without errors or warnings appear in either black or green. VTEPs appear in blue. A node in the pretty output appears in bold, and angle brackets (< >) wrap around a router interface. To view the output with only black text, run the netq config del color command. You can view output with colors again by running netq config add color.
All check and show commands have a default timeframe of now to one hour ago, unless you specify an approximate time using the around keyword or a range using the between keyword. For example, running netq check bgp shows the status of BGP over the last hour. Running netq show bgp around 3h shows the status of BGP three hours ago.
When entering a time value, you must include a numeric value and the unit of measure:
w: weeks
d: days
h: hours
m: minutes
s: seconds
now
When using the between option, you can enter the start time (text-time) and end time (text-endtime) values as most recent first and least recent second, or vice versa. The values do not have to have the same unit of measure. Use the around option to view information for a particular time.
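The time-value rules above can be sketched in Python. This is an illustration only, not how the NetQ CLI parses its input. The names to_seconds and normalize_between are hypothetical, and the sketch assumes a value is either now or a whole number followed by one unit letter.

```python
import re

# Unit suffixes listed above, expressed in seconds.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(value):
    """Convert a relative time value such as '3h' to seconds before now."""
    if value == "now":
        return 0
    match = re.fullmatch(r"(\d+)([wdhms])", value)
    if match is None:
        raise ValueError(
            f"expected a number followed by w, d, h, m, or s: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


def normalize_between(first, second):
    """Return (most_recent, least_recent), whichever order was entered."""
    return tuple(sorted((first, second), key=to_seconds))


print(to_seconds("3h"))
print(normalize_between("7d", "30m"))
```

The second call mixes units and passes the older value first, which mirrors the statement that between accepts either order and does not require matching units.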
Command Prompts
NetQ code examples use the following prompts:
cumulus@switch:~$ indicates the user cumulus is logged in to a switch to run the example command
cumulus@host:~$ indicates the user cumulus is logged in to a host to run the example command
cumulus@netq-appliance:~$ indicates the user cumulus is logged in to either the NetQ Appliance or NetQ Cloud Appliance to run the command
cumulus@hostname:~$ indicates the user cumulus is logged in to a switch, host or appliance to run the example command
To use the NetQ CLI, the switches must be running the Cumulus Linux or SONiC operating system, NetQ Platform or NetQ Collector software, the NetQ Agent, and the NetQ CLI. The hosts must be running CentOS, RHEL, or Ubuntu OS, the NetQ Agent, and the NetQ CLI. Refer to Install NetQ for additional information.
Command Completion
As you enter commands, you can get help with the valid keywords or options using the tab key. For example, using tab completion with netq check displays the possible objects for the command, and returns you to the command prompt to complete the command:
cumulus@switch:~$ netq check <<press Tab>>
agents : Netq agent
bgp : BGP info
cl-version : Cumulus Linux version
clag : Cumulus Multi-chassis LAG
evpn : EVPN
interfaces : network interface port
mlag : Multi-chassis LAG (alias of clag)
mtu : Link MTU
ntp : NTP
ospf : OSPF info
sensors : Temperature/Fan/PSU sensors
vlan : VLAN
vxlan : VXLAN data path
cumulus@switch:~$ netq check
Command Help
As you enter commands, you can get help with command syntax by entering help at various points within a command entry. For example, to find out which options are available for a BGP check, enter help after entering part of the netq check command. In the following example, you can see that there are no additional required parameters and that you can use three optional parameters (hostnames, vrf, and around) with a BGP check:
The CLI stores commands issued within a session, which lets you review and rerun commands that you already ran. At the command prompt, press the Up Arrow and Down Arrow keys to move back and forth through the list of commands previously entered. When you have found a given command, you can run the command by pressing Enter, just as you would if you had entered it manually. You can also modify the command before you run it.
Command Categories
While the CLI has a flat structure, NetQ commands are conceptually grouped into the following functional categories:
The netq check commands validate the current or historical state of the network by looking for errors and misconfigurations in the network. The commands run fabric-wide validations against various configured protocols and services to determine how well the network is operating. You can perform validation checks for the following:
addresses: IPv4 and IPv6 address duplicates across devices
agents: NetQ Agents operation on all switches and hosts
bgp: BGP (Border Gateway Protocol) operation across the network fabric
clag: Cumulus Linux MLAG (multi-chassis LAG/link aggregation) operation
mtu: Link MTU (maximum transmission unit) consistency across paths
ntp: NTP (Network Time Protocol) operation
ospf: OSPF (Open Shortest Path First) operation
roce: RoCE (RDMA over Converged Ethernet) configurations
sensors: Temperature/Fan/PSU sensor operation
vlan: VLAN (Virtual Local Area Network) operation
vxlan: VXLAN (Virtual Extensible LAN) data path operation
The commands take the form of netq check <network-protocol-or-service> [options], where the options vary according to the protocol or service.
Example check command
The following example shows the output for the netq check bgp command. Any failures would appear below the summary results or in the failedNodes section.
cumulus@switch:~$ netq check bgp
bgp check result summary:
Checked nodes : 8
Total nodes : 8
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 30
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : passed
Router ID Test : passed
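If you capture this text output in a script, a short parser can turn the summary into key-value pairs. The following Python sketch is an illustration, not a NetQ tool. The function names parse_check_summary and has_failures are hypothetical, and the sample text is copied from the example above. Where the json option is available, structured output is likely a more robust starting point than parsing text.

```python
import re

# Sample text copied from the netq check bgp example above.
SAMPLE_OUTPUT = """\
bgp check result summary:
Checked nodes : 8
Total nodes : 8
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 30
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : passed
Router ID Test : passed
"""


def parse_check_summary(text):
    """Collect 'name : value' lines into a dict, converting counts to int."""
    summary = {}
    for line in text.splitlines():
        parts = re.split(r"\s*:\s*", line.strip(), maxsplit=1)
        # Section titles end with a colon and have no value; skip them.
        if len(parts) != 2 or not parts[1]:
            continue
        key, value = parts
        summary[key] = int(value) if value.isdigit() else value
    return summary


def has_failures(summary):
    """Report whether any failed nodes or failed sessions were counted."""
    return (summary.get("Failed nodes", 0) > 0
            or summary.get("Failed Sessions", 0) > 0)


summary = parse_check_summary(SAMPLE_OUTPUT)
print(summary["Checked nodes"], has_failures(summary))
```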
The netq show commands let you view details about the current or historical configuration and status of various protocols and services. You can view the configuration and status for the following:
address-history: Address history info for an IP address/prefix
agents: NetQ Agents status on switches and hosts
bgp: BGP status across the network fabric
cl-btrfs-info: BTRFS file system data for monitored Cumulus Linux switches
cl-manifest: Information about the versions of Cumulus Linux available on monitored switches
cl-pkg-info: Information about software packages installed on monitored switches
cl-resource: ACL and forwarding information
cl-ssd-util: SSD utilization information
clag: CLAG/MLAG status
dom: Digital Optical Monitoring information
ecmp: Equal-cost multi-path routing
ethtool-stats: Interface statistics
events: Display changes over time
events-config: Event suppression configuration
evpn: EVPN status
interfaces: Interface information
interface-stats: Interface performance statistics
interface-utilization: Interface statistics plus utilization
interfaces: network interface port status
inventory: hardware component information
ip: IPv4 status
ipv6: IPv6 status
job-status: status of upgrade jobs running on the appliance or VM
kubernetes: Kubernetes cluster, daemon, pod, node, service, and replication status
lldp: LLDP status
mac-commentary: MAC commentary info for a MAC address
mac-history: Historical information for a MAC address
macs: MAC table or address information
mlag: MLAG status (an alias for CLAG)
neighbor-history: Neighbor history info for an IP address
notification: Notifications sent to various channels
ntp: NTP status
opta-health: Display health of apps on the OPTA
opta-platform: NetQ Appliance version information and uptime
ospf: OSPF status
ptp: Precision Time Protocol status
recommended-pkg-version: Current host information to be considered
resource-util: Display usage of memory, CPU and disk resources
roce-config: Display RoCE configuration
roce-counters: Displays RDMA over Converged Ethernet counters for a given switch
sensors: Temperature/Fan/PSU sensor status
services: System services status
stp topology: Spanning Tree Protocol topology
tca: Threshold crossing alerts
trace: Control plane trace path across fabric
unit-tests: Show list of unit tests for netq check
validation: Scheduled validation check
vlan: VLAN status
vxlan: VXLAN data path status
wjh-drop: dropped packet data from NVIDIA® Mellanox® What Just Happened®
The commands take the form of netq [<hostname>] show <network-protocol-or-service> [options], where the options vary according to the protocol or service. You can restrict the commands from showing the information for all devices to showing information only for a selected device using the hostname option.
Example show command
The following example shows the standard output for the netq show agents command:
The following example shows the filtered output for the netq show agents command:
cumulus@switch:~$ netq leaf01 show agents
Matching agents records:
Hostname Status NTP Sync Version Sys Uptime Agent Uptime Reinitialize Time Last Changed
----------------- ---------------- -------- ------------------------------------ ------------------------- ------------------------- -------------------------- -------------------------
leaf01 Fresh yes 3.2.0-cl4u30~1601410518.104fb9ed Mon Sep 21 16:49:04 2020 Tue Sep 29 21:24:49 2020 Tue Sep 29 21:24:49 2020 Thu Oct 1 16:26:33 2020
Configuration Commands
Various commands—including netq config, netq notification, and netq install—allow you to manage NetQ Agent and CLI server configurations, configure lifecycle management, set up container monitoring, and manage notifications.
NetQ Agent Configuration
The agent commands configure individual NetQ Agents.
The agent configuration commands can add and remove agents from switches and hosts, start and stop agent operations, debug the agent, specify default commands, and enable or disable a variety of monitoring features (including Kubernetes, sensors, FRR (FRRouting), CPU usage limit, and What Just Happened).
Commands apply to one agent at a time. Run them from the switch or host where the NetQ Agent resides.
The following example shows how to view the NetQ Agent configuration:
cumulus@switch:~$ netq config show agent
netq-agent value default
--------------------- --------- ---------
enable-opta-discovery True True
exhibitport
agenturl
server 127.0.0.1 127.0.0.1
exhibiturl
vrf default default
agentport 8981 8981
port 31980 31980
After making configuration changes to your agents, you must restart the agent for the changes to take effect. Use the netq config restart agent command.
The netq config cli commands configure and manage the CLI component. You can add or remove the CLI (essentially enabling or disabling the service), start and restart it, and view the configuration of the service.
Commands apply to one device at a time, and you run them from the switch or host where you run the CLI.
The CLI configuration commands include:
netq config add cli server
netq config del cli server
netq config show cli premises [json]
netq config show (cli|all) [json]
netq config (status|restart) cli
netq config select cli premise
The following example shows how to restart the CLI instance:
cumulus@switch:~$ netq config restart cli
The following example shows how to enable the CLI on a NetQ on-premises appliance or virtual machine (VM):
cumulus@switch:~$ netq config add cli server 10.1.3.101
The following example shows how to enable the CLI on a NetQ Cloud Appliance or VM for the Chicago premises and the default port:
cumulus@switch:~$ netq config add cli server api.netq.cumulusnetworks.com access-key <user-access-key> secret-key <user-secret-key> premises chicago port 443
NetQ System Configuration Commands
Use the following commands to manage the NetQ system itself:
bootstrap: Loads the installation program onto the network switches and hosts in either a single server or server cluster arrangement.
decommission: Decommissions a switch or host.
install: Installs NetQ in standalone or cluster deployments; also used to install patch software.
upgrade bundle: Upgrades NetQ on NetQ On-premises Appliances or VMs.
The following example shows how to bootstrap a single server or master server in a server cluster:
For information and examples on installing and upgrading the NetQ system, see Install NetQ and Upgrade NetQ.
Event Notification Commands
The notification configuration commands can add, remove, and show notification application integrations. These commands create the channels, filters, and rules needed to control event messaging. The commands include:
NetQ supports TCA events, a set of events that are triggered by crossing a user-defined threshold. Configure and manage TCA events using the following commands:
The netq lcm (lifecycle management) commands help you efficiently manage the deployment of NVIDIA product software onto your network devices (servers, appliances, and switches).
LCM commands allow you to:
Manage network OS and NetQ images in a local repository
Configure switch access credentials for installations and upgrades
Manage switch inventory and roles
Upgrade NetQ (Agents and CLI) on switches with NetQ Agents
Install or upgrade NetQ Agents and CLI on switches with or without NetQ Agents
Upgrade the network OS on switches with NetQ Agents
View a result history of upgrade attempts
The following example shows the NetQ configuration profiles:
cumulus@switch:~$ netq lcm show netq-config
ID Name Default Profile VRF WJH CPU Limit Log Level Last Changed
------------------------- --------------- ------------------------------ --------------- --------- --------- --------- -------------------------
config_profile_3289efda36 NetQ default co Yes mgmt Disable Disable info Tue Apr 27 22:42:05 2021
db4065d56f91ebbd34a523b45 nfig
944fbfd10c5d75f9134d42023
eb2b
The following example shows how to add a Cumulus Linux installation image to the NetQ repository on the switch:
The netq trace commands let you view the available paths between two nodes on the network, either currently or at a time in the past. You can perform a layer 2 or layer 3 trace, and view the output in one of three formats: JSON, pretty, and detail. JSON output provides the output in a JSON file format for ease of importing to other applications or software. Pretty output lines up the paths in a pseudo-graphical manner to help visualize multiple paths. Detail output is useful for traces with higher hop counts where the pretty output wraps lines, making it harder to interpret the results. The detail output displays a table with a row for each path.
This section describes how to install, configure, and upgrade NetQ.
Before you begin, review the release notes for this version.
Before You Install
This overview is designed to help you understand the various NetQ deployment and installation options.
Installation Overview
Consider the following before you install the NetQ system:
Determine whether to deploy the solution fully on premises or as a remote solution.
Decide whether to deploy a virtual machine on your own hardware or use one of the NetQ appliances.
Choose whether to install the software on a single server or as a server cluster.
The following decision tree reflects these steps:
Deployment Type: On Premises or Remote
You can deploy NetQ in one of two ways.
Hosted on premises: Choose this deployment if you want to host all required hardware and software at your location, and you have the in-house skill set to install, configure, and maintain it—including performing data backups, acquiring and maintaining hardware and software, and integration management. This model is also a good choice if you want very limited or no access to the internet from switches and hosts in your network or you have data residency requirements like GDPR.
Hosted remotely: Choose this deployment to host a multi-site, on-premises deployment or use the NetQ Cloud service. In the multi-site deployment, you host multiple small servers at each site and a large server and database at another site. In the cloud service deployment, you host only a small local server on your premises that connects to the NetQ Cloud service over selected ports or through a proxy server. The cloud service supports only data aggregation and forwarding locally, and the majority of the NetQ applications use a hosted deployment strategy, storing data in the cloud. NVIDIA handles the backups and maintenance of the application and storage. This remote cloud service model is often chosen when it is untenable to support deployment in-house or if you need the flexibility to scale quickly, while also reducing capital expenses.
With either deployment model, the NetQ Agents reside on the switches and hosts they monitor in your network.
System: Virtual Machine or NetQ Appliances
The next installation consideration is whether you plan to use NetQ Cloud Appliances or your own servers with VMs. Both options provide the same services and features. The difference is in the implementation. When you install NetQ software on your own hardware, you create and maintain a KVM or VMware VM, and the software runs from there. This requires you to scope and order an appropriate hardware server to support the NetQ requirements, but might allow you to reuse an existing server in your stock.
When you choose to purchase and install NetQ Cloud Appliances, the initial configuration of the server with Ubuntu OS is already done for you, and the NetQ software components are pre-loaded, saving you time during the physical deployment.
Data Flow
The flow of data differs based on your deployment model.
For the on-premises deployment, the NetQ Agents collect and transmit data from the switches and hosts back to the NetQ On-premises Appliance or virtual machine running the NetQ Platform software, which in turn processes and stores the data in its database. This data is then displayed through the user interface.
For the remote, multi-site NetQ implementation, the NetQ Agents at each premises collect and transmit data from the switches and hosts at that premises to its NetQ Cloud Appliance or virtual machine running the NetQ Collector software. The NetQ Collectors then transmit this data to the common NetQ Cloud Appliance or virtual machine and database at one of your premises for processing and storage.
For the remote, cloud-service implementation, the NetQ Agents collect and transmit data from the switches and hosts to the NetQ Cloud Appliance or virtual machine running the NetQ Collector software. The NetQ Collector then transmits this data to the NVIDIA cloud-based infrastructure for further processing and storage.
For either remote solution, telemetry data is displayed through the same user interfaces as the on-premises solution. When using the cloud service implementation of the remote solution, the browser interface can be pointed to the local NetQ Cloud Appliance or VM, or directly to netq.nvidia.com.
Server Arrangement: Single or Cluster
The next installation step is deciding whether to deploy a single server or a server cluster. Both options provide the same services and features. The biggest difference is the number of servers deployed and the continued availability of services running on those servers should hardware failures occur.
A single server is easier to set up, configure and manage, but can limit your ability to scale your network monitoring quickly. Deploying multiple servers is a bit more complicated, but you limit potential downtime and increase availability by having more than one server that can run the software and store the data. Select the standalone single-server arrangements for smaller, simpler deployments. Be sure to consider the capabilities and resources needed on this server to support the size of your final deployment.
Select the server cluster arrangement to obtain scalability and high availability for your network. The default clustering implementation has three servers: 1 master and 2 workers. However, NetQ supports up to 10 worker nodes in a cluster. When you configure the cluster, configure the NetQ Agents to connect to these three nodes in the cluster first by providing the IP addresses as a comma-separated list. If you decide to add additional nodes to the cluster, you do not need to configure these nodes again.
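Before you hand the node addresses to the NetQ Agents, you might check them and build the comma-separated list with a small script. The following Python sketch is one way to do that. The function name cluster_server_list and the example addresses are hypothetical, and the sketch does not run any NetQ command.

```python
import ipaddress


def cluster_server_list(addresses):
    """Validate node IP addresses and join them with commas."""
    cleaned = []
    for address in addresses:
        # ip_address raises ValueError for anything that is not a
        # well-formed IPv4 or IPv6 address.
        cleaned.append(str(ipaddress.ip_address(address.strip())))
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("each cluster node needs a distinct IP address")
    return ",".join(cleaned)


# Hypothetical addresses for one master and two worker nodes.
print(cluster_server_list(["10.0.0.11", "10.0.0.12", "10.0.0.13"]))
```

Rejecting malformed or duplicate entries up front avoids configuring an agent with a list that silently covers fewer than three nodes.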
Cluster Deployments and Kubernetes
NetQ also monitors Kubernetes containers. If the master node ever goes down, all NetQ services should continue to work. However, keep in mind that the master hosts the Kubernetes control plane so anything that requires connectivity with the Kubernetes cluster—such as upgrading NetQ or rescheduling pods to other workers if a worker goes down—will not work.
Cluster Deployments and Load Balancers
You need a load balancer for high availability for the NetQ API and the NetQ UI.
However, you need to be mindful of where you install the certificates for the NetQ UI (port 443); otherwise, you cannot access the NetQ UI.
If you are using a load balancer in your deployment, we recommend you install the certificates directly on the load balancer for SSL offloading. However, if you install the certificates on the master node, then configure the load balancer to allow for SSL passthrough.
Where to Go Next
After you’ve decided on your deployment type, you’re ready to install NetQ.
Install NetQ
The following sections provide installation instructions for the NetQ system and software. To install NetQ:
Set Up Your VMware Virtual Machine for a Single On-premises Server
Follow these steps to set up and configure your VM on a single server in an on-premises deployment:
Verify that your system meets the VM requirements.
Resource: Minimum Requirements
Processor: Sixteen (16) virtual CPUs
Memory: 64 GB RAM
Local disk storage: 500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
Network interface speed: 1 Gb NIC
Hypervisor: VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and Red Hat operating systems
Confirm that the required ports are open for communications.
You must open the following ports on your NetQ on-premises server:
VMware Example Configuration
This example shows the VM setup process using an OVA file with VMware ESXi.
Enter the address of the hardware in your browser.
Log in to VMware using credentials with root access.
Click Storage in the Navigator to verify you have an SSD installed.
Click Create/Register VM at the top of the right pane.
Select Deploy a virtual machine from an OVF or OVA file, and click Next.
Provide a name for the VM, for example NetQ.
Tip: Make note of the name used during install as this is needed in a later step.
Drag and drop the NetQ Platform image file you downloaded in Step 1 above.
Click Next.
Select the storage type and data store for the image to use, then click Next. In this example, only one is available.
Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.
Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.
The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.
Once completed, view the full details of the VM and hardware.
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
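The naming rules above can be checked before you change the hostname. The following Python sketch tests only the rules stated here (label length, total length, and allowed characters). The function name is_valid_hostname is hypothetical, and the sketch does not cover any additional restrictions your environment might impose.

```python
import re

# One label: 1 to 63 characters drawn from a-z, 0-9, and the hyphen.
LABEL = re.compile(r"[a-z0-9-]{1,63}")


def is_valid_hostname(name):
    """Check a hostname against the length and character rules above."""
    if not 1 <= len(name) <= 253:
        return False
    # Labels are separated by dots; an empty label fails the pattern.
    return all(LABEL.fullmatch(label) for label in name.split("."))


for candidate in ("netq-server-01", "en.wikipedia.org", "NetQ_Server"):
    print(candidate, is_valid_hostname(candidate))
```

The last candidate fails because it contains uppercase letters and an underscore, neither of which is in the permitted character set.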
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
cumulus@hostname:~$ netq install standalone full interface eno1 bundle /mnt/installables/NetQ-4.5.0.tgz
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your VMware Virtual Machine for a Single Cloud Server
Follow these steps to set up and configure your VM for a cloud deployment:
Verify that your system meets the VM requirements.
Resource: Minimum Requirements
Processor: Four (4) virtual CPUs
Memory: 8 GB RAM
Local disk storage: 64 GB
Network interface speed: 1 Gb NIC
Hypervisor: VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and Red Hat operating systems
Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:
VMware Example Configuration
This example shows the VM setup process using an OVA file with VMware ESXi.
Enter the address of the hardware in your browser.
Log in to VMware using credentials with root access.
Click Storage in the Navigator to verify you have an SSD installed.
Click Create/Register VM at the top of the right pane.
Select Deploy a virtual machine from an OVF or OVA file, and click Next.
Provide a name for the VM, for example NetQ.
Tip: Make note of the name used during install as this is needed in a later step.
Drag and drop the NetQ Platform image file you downloaded in Step 1 above.
Click Next.
Select the storage type and data store for the image to use, then click Next. In this example, only one is available.
Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.
Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.
The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.
Once completed, view the full details of the VM and hardware.
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:
127.0.0.1 localhost NEW_HOSTNAME
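The naming rules above can be checked before you apply a new hostname. The following sketch encodes only the rules stated in this step (labels of 1 to 63 characters, 253 characters overall, and the characters a-z, 0-9, and hyphen). The function name is_valid_hostname is an illustrative assumption, not a NetQ or Kubernetes API.

```python
import re

# One label: 1 to 63 characters drawn from a-z, 0-9, and the hyphen.
_LABEL = re.compile(r"[a-z0-9-]{1,63}")


def is_valid_hostname(name):
    """Check a hostname against the label and length rules described above."""
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must match; an empty label (e.g. "a..b") fails.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ("netq-ts-01", "NetQ_Server", "en.wikipedia.org"):
        print(candidate, is_valid_hostname(candidate))
```

Uppercase letters and underscores are rejected because the rules allow lowercase letters, digits, and hyphens only.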
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your NetQ Cloud Appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:
Reset the VM:
cumulus@hostname:~$ netq bootstrap reset
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Consider the following for container environments, and make adjustments as needed.
Calico Networking
NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the default address range with the pod-ip-range option. For example:
The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.
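One plausible reason to change the Calico range is that 10.244.0.0/16 collides with addresses already used in your network; this motivation is an assumption, as the text above does not state one. Under that assumption, the sketch below uses the Python standard library ipaddress module to list which of your existing networks overlap a candidate pod range. The function name and the sample networks are hypothetical.

```python
import ipaddress

NETQ_POD_RANGE = "10.244.0.0/16"  # range NetQ assigns to Calico, per the text above


def overlapping_networks(pod_range, existing_networks):
    """Return the existing networks that overlap the given pod range."""
    pods = ipaddress.ip_network(pod_range)
    return [
        net for net in existing_networks
        if ipaddress.ip_network(net).overlaps(pods)
    ]


if __name__ == "__main__":
    in_use = ["10.188.44.0/24", "10.244.8.0/24"]  # made-up example networks
    print(overlapping_networks(NETQ_POD_RANGE, in_use))
```

If the list is non-empty, a different range passed with the pod-ip-range option would avoid the clash.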
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
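If you want to check the status output in a script, the "Key: value" lines of the example above are easy to parse. This is a minimal sketch that assumes the output keeps the layout shown in the example; the parse_status helper is a hypothetical name, and the sample text is abbreviated from that example.

```python
def parse_status(text):
    """Parse 'Key: value' lines from status output into a dict.

    Lines without a colon, or with nothing after it (such as the
    'Cluster Status:' table heading), are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = """State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Is Cloud: False
Cluster Status:
"""
status = parse_status(sample)

if __name__ == "__main__":
    print(status["State"], status["Version"])
```

A script could then compare status["State"] with Active and status["Version"] with the release you installed.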
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your VMware Virtual Machine for an On-premises Server Cluster
First configure the VM on the master node, and then configure the VM on each worker node.
Follow these steps to set up and configure your VM cluster for an on-premises deployment:
Verify that your master node meets the VM requirements.
Resource                 Minimum Requirements
-----------------------  --------------------
Processor                Sixteen (16) virtual CPUs
Memory                   64 GB RAM
Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
Network interface speed  1 Gb NIC
Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems
Confirm that the required ports are open for communications.
You must open the following ports on your NetQ on-premises servers:
Port or Protocol Number  Protocol     Component Access
-----------------------  -----------  -------------------------------------
4                        IP Protocol  Calico networking (IP-in-IP Protocol)
22                       TCP          SSH
80                       TCP          Nginx
179                      TCP          Calico networking (BGP)
443                      TCP          NetQ UI
2379                     TCP          etcd datastore
4789                     UDP          Calico networking (VxLAN)
5000                     TCP          Docker registry
6443                     TCP          kube-apiserver
30001                    TCP          DPU communication
31980                    TCP          NetQ Agent communication
31982                    TCP          NetQ Agent SSL communication
32708                    TCP          API Gateway
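The TCP entries in the port table above can be spot-checked from another machine once services are listening. The sketch below is an illustration only: tcp_port_open is a hypothetical helper built on the standard library socket module, and it covers TCP ports only (UDP 4789 and IP protocol 4 need other tools). A failed connection does not distinguish between a blocked port and a service that has not started, so this check is most useful after installation.

```python
import socket

# TCP ports copied from the table above.
ONPREM_TCP_PORTS = [22, 80, 179, 443, 2379, 5000, 6443, 30001, 31980, 31982, 32708]


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    server = "10.188.44.147"  # example address taken from the status output in this guide
    for port in ONPREM_TCP_PORTS:
        print(port, "open" if tcp_port_open(server, port) else "closed or filtered")
```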
Additionally, for internal cluster communication, you must open these ports:
VMware Example Configuration
This example shows the VM setup process using an OVA file with VMware ESXi.
Enter the address of the hardware in your browser.
Log in to VMware using credentials with root access.
Click Storage in the Navigator to verify you have an SSD installed.
Click Create/Register VM at the top of the right pane.
Select Deploy a virtual machine from an OVF or OVA file, and click Next.
Provide a name for the VM, for example NetQ.
Tip: Make note of the name used during install as this is needed in a later step.
Drag and drop the NetQ Platform image file you downloaded in Step 1 above.
Click Next.
Select the storage type and data store for the image to use, then click Next. In this example, only one is available.
Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.
Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.
The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.
Once completed, view the full details of the VM and hardware.
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:
127.0.0.1 localhost NEW_HOSTNAME
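When you prepare several nodes, the /etc/hosts change can be generated rather than typed by hand. The following sketch works on the file contents as text and leaves reading and writing the file to you; add_hostname_to_localhost is a hypothetical helper, and it assumes the localhost entry starts with 127.0.0.1 and has no trailing comment.

```python
def add_hostname_to_localhost(hosts_text, new_hostname):
    """Return hosts-file text with new_hostname appended to the 127.0.0.1 line."""
    updated = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Only touch the IPv4 localhost entry, and only if the name is missing.
        if fields and fields[0] == "127.0.0.1" and new_hostname not in fields[1:]:
            line = f"{line.rstrip()} {new_hostname}"
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    original = "127.0.0.1 localhost\n::1 ip6-localhost\n"
    print(add_hostname_to_localhost(original, "netq-master-01"), end="")
```

Because the name is added only when it is absent, running the function twice gives the same result as running it once.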
Verify that your first worker node meets the VM requirements, as described in Step 1.
Confirm that the needed ports are open for communications, as described in Step 2.
Open your hypervisor and set up the VM in the same manner as for the master node.
Make a note of the private IP address you assign to the worker node. You need it for later installation steps.
Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Repeat Steps 8 through 11 for each additional worker node you want in your cluster.
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:
cumulus@<hostname>:~$ netq install cluster master-init
Please run the following command on all worker nodes:
netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
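If you capture the master-init output in a script, the worker command can be pulled out of it instead of copied by hand. This sketch assumes the output contains a line that begins with netq install cluster worker-init, as in the example above; extract_worker_init is a hypothetical helper, and the key in the sample is a short made-up placeholder, not a real key.

```python
def extract_worker_init(master_init_output):
    """Return the 'netq install cluster worker-init <ssh-key>' line, or None."""
    prefix = "netq install cluster worker-init "
    for line in master_init_output.splitlines():
        line = line.strip()
        # After stripping, a match is guaranteed to have text after the prefix.
        if line.startswith(prefix):
            return line
    return None


if __name__ == "__main__":
    captured = (
        "Please run the following command on all worker nodes:\n"
        "netq install cluster worker-init c3NoLXJzYQ==\n"  # placeholder key
    )
    print(extract_worker_init(captured))
```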
Run the following commands on your master node, using the IP addresses of your worker nodes:
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
cumulus@hostname:~$ netq install standalone full interface eno1 bundle /mnt/installables/NetQ-4.5.0.tgz
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your VMware Virtual Machine for a Cloud Server Cluster
First configure the VM on the master node, and then configure the VM on each worker node.
Follow these steps to set up and configure your VM on a cluster of servers in a cloud deployment:
Verify that your master node meets the VM requirements.
Resource                 Minimum Requirements
-----------------------  --------------------
Processor                Four (4) virtual CPUs
Memory                   8 GB RAM
Local disk storage       64 GB
Network interface speed  1 Gb NIC
Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems
Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:
Port or Protocol Number  Protocol     Component Access
-----------------------  -----------  -------------------------------------
4                        IP Protocol  Calico networking (IP-in-IP Protocol)
22                       TCP          SSH
80                       TCP          Nginx
179                      TCP          Calico networking (BGP)
443                      TCP          Nginx
2379                     TCP          etcd datastore
4789                     UDP          Calico networking (VxLAN)
5000                     TCP          Docker registry
6443                     TCP          kube-apiserver
31980                    TCP          NetQ Agent communication
31982                    TCP          NetQ Agent SSL communication
32708                    TCP          API Gateway
The following ports are used for internal cluster communication and must also be open between servers in your cluster:
VMware Example Configuration
This example shows the VM setup process using an OVA file with VMware ESXi.
Enter the address of the hardware in your browser.
Log in to VMware using credentials with root access.
Click Storage in the Navigator to verify you have an SSD installed.
Click Create/Register VM at the top of the right pane.
Select Deploy a virtual machine from an OVF or OVA file, and click Next.
Provide a name for the VM, for example NetQ.
Tip: Make note of the name used during install as this is needed in a later step.
Drag and drop the NetQ Platform image file you downloaded in Step 1 above.
Click Next.
Select the storage type and data store for the image to use, then click Next. In this example, only one is available.
Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.
Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.
The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.
Once completed, view the full details of the VM and hardware.
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:
127.0.0.1 localhost NEW_HOSTNAME
Verify that your first worker node meets the VM requirements, as described in Step 1.
Confirm that the needed ports are open for communications, as described in Step 2.
Open your hypervisor and set up the VM in the same manner as for the master node.
Make a note of the private IP address you assign to the worker node. You need it for later installation steps.
Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Repeat Steps 8 through 11 for each additional worker node you want in your cluster.
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:
cumulus@<hostname>:~$ netq install cluster master-init
Please run the following command on all worker nodes:
netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
Run the following command on your NetQ Cloud Appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI in the premise management configuration.
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:
Reset the VM:
cumulus@hostname:~$ netq bootstrap reset
Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Consider the following for container environments, and make adjustments as needed.
Calico Networking
NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the default address range with the pod-ip-range option. For example:
The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your KVM Virtual Machine for a Single On-premises Server
Follow these steps to set up and configure your VM on a single server in an on-premises deployment:
Verify that your system meets the VM requirements.
Resource                 Minimum Requirements
-----------------------  --------------------
Processor                Sixteen (16) virtual CPUs
Memory                   64 GB RAM
Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
Network interface speed  1 Gb NIC
Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and RedHat operating systems
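To relate the disk requirement above to benchmark results that report throughput rather than IOPS, multiply the IOPS figure by the block size. The sketch below works through that arithmetic, assuming that the 4 KB block means 4096 bytes; the function name is illustrative.

```python
def min_throughput_bytes_per_s(iops, block_size_bytes):
    """Sustained throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_size_bytes


required = min_throughput_bytes_per_s(1000, 4 * 1024)  # 4 KB taken as 4096 bytes

if __name__ == "__main__":
    # 1000 operations per second x 4096 bytes each
    print(f"{required} B/s = {required / 1_000_000:.3f} MB/s")  # 4096000 B/s = 4.096 MB/s
```

A disk that sustains about 4.1 MB/s of 4 KB operations therefore meets the stated IOPS minimum; SSDs typically exceed this by a wide margin.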
Confirm that the required ports are open for communications.
You must open the following ports on your NetQ on-premises server:
Copy the QCOW2 image to a directory where you want to run it.
Tip: Copy the downloaded QCOW2 image rather than moving it, so that you do not have to download it again if you need to repeat this process.
Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network model value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.
Or, for a bridged VM, where the VM attaches to a bridge that has already been set up to allow external access:
Replace the network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.
Make note of the name used during install as this is needed in a later step.
Watch the boot process in another terminal window.
$ virsh console netq_ts
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
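If your existing naming convention uses uppercase letters, spaces, or underscores, a name can be normalized to fit the rules above before you set it. This is a rough sketch under the stated rules only; suggest_hostname is a hypothetical helper, and trimming leading and trailing hyphens is a stylistic choice that goes beyond what this guide requires.

```python
import re


def suggest_hostname(raw):
    """Turn an arbitrary name into one that uses only a-z, 0-9, '-' and '.'."""
    labels = []
    for label in raw.lower().split("."):
        # Replace each run of disallowed characters with a single hyphen,
        # trim hyphens at the ends, and cap the label at 63 characters.
        label = re.sub(r"[^a-z0-9-]+", "-", label).strip("-")[:63]
        if label:
            labels.append(label)
    # Cap the whole name at 253 characters and avoid ending on a dot.
    return ".".join(labels)[:253].rstrip(".")


if __name__ == "__main__":
    print(suggest_hostname("NetQ_TS 01"))
    print(suggest_hostname("Lab.Example.COM"))
```

Review the suggestion before using it, because two different inputs can normalize to the same hostname.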
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
cumulus@hostname:~$ netq install standalone full interface eno1 bundle /mnt/installables/NetQ-4.5.0.tgz
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
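The timing guidance above (allow 10-15 minutes, escalate after 30) maps naturally onto a polling loop with a deadline. The sketch below shows only the loop: check is a hypothetical callable that you would write to run netq show opta-health and decide whether everything is up, because the format of that output is not shown in this section. The clock and sleep parameters exist so the loop can be tested without waiting.

```python
import time


def wait_until(check, timeout_s=30 * 60, interval_s=60,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True on success and False if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in check that succeeds immediately; replace it with a real one.
    print(wait_until(lambda: True))
```

A False result after the 30-minute default corresponds to the point where this guide recommends opening a support ticket.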
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your KVM Virtual Machine for a Single Cloud Server
Follow these steps to set up and configure your VM on a single server in a cloud deployment:
Verify that your system meets the VM requirements.
Resource                 Minimum Requirements
-----------------------  --------------------
Processor                Four (4) virtual CPUs
Memory                   8 GB RAM
Local disk storage       64 GB
Network interface speed  1 Gb NIC
Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and RedHat operating systems
Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:
Copy the QCOW2 image to a directory where you want to run it.
Tip: Copy the downloaded QCOW2 image rather than moving it, so that you do not have to download it again if you need to repeat this process.
Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network model value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.
Or, for a bridged VM, where the VM attaches to a bridge that has already been set up to allow external access:
Replace the network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.
Make note of the name used during install as this is needed in a later step.
Watch the boot process in another terminal window.
$ virsh console netq_ts
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:
127.0.0.1 localhost NEW_HOSTNAME
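The naming rules above can be checked before you commit to a hostname. The following is a minimal sketch, not a NetQ command: is_valid_hostname is a hypothetical helper name. It also rejects labels that begin or end with a hyphen, a restriction that comes from RFC 1123 and the Kubernetes DNS naming rules rather than from the text above.

```shell
#!/bin/sh
# Sketch: validate a candidate hostname (hypothetical helper, not part of NetQ).
is_valid_hostname() {
    # Whole name: at most 253 characters, including the delimiting dots.
    [ "${#1}" -le 253 ] || return 1
    # Each label: 1 to 63 characters from a-z, 0-9, and '-',
    # starting and ending with a letter or digit (assumed RFC 1123 rule).
    printf '%s\n' "$1" | LC_ALL=C grep -Eq \
        '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*$'
}
```

For example, is_valid_hostname netq-ts-01 && echo ok prints ok, while a name containing upper-case letters or underscores returns a non-zero status.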
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your NetQ Cloud Appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:
Reset the VM:
cumulus@hostname:~$ netq bootstrap reset
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Consider the following for container environments, and make adjustments as needed.
Calico Networking
NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the default address range with the pod-ip-range option. For example:
The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.
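One likely reason to change the Calico range is an overlap between 10.244.0.0/16 and addresses already in use on your network; this motivation is an assumption, as the text does not state one. The sketch below tests whether an IPv4 address falls inside a CIDR block. The function names ip_to_int and ip_in_cidr are hypothetical, and the code assumes octets without leading zeros.

```shell
#!/bin/sh
# Sketch: test whether an IPv4 address lies inside a CIDR range.
# Hypothetical helpers; not part of the NetQ CLI.
ip_to_int() {
    # Split the dotted quad on '.' and pack it into one integer.
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

ip_in_cidr() {
    cidr_net=${2%/*}     # network part, for example 10.244.0.0
    cidr_bits=${2#*/}    # prefix length, for example 16
    cidr_mask=$(( (0xFFFFFFFF << (32 - cidr_bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & cidr_mask )) -eq $(( $(ip_to_int "$cidr_net") & cidr_mask )) ]
}
```

For example, ip_in_cidr 10.244.3.7 10.244.0.0/16 succeeds, which would indicate that an existing host address collides with the default pod range.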
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address     Hostname       Role    Status
-------------  -------------  ------  --------
10.188.44.147  10.188.44.147  Role    Ready
NetQ... Active
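The status output above uses simple "Name: value" lines, so a script can read a single field from it. This is a minimal sketch; netq_status_field is a hypothetical helper, and it assumes the output keeps the layout shown in the example.

```shell
#!/bin/sh
# Sketch: print the value of one field from `netq show status` output on stdin.
# Hypothetical helper; assumes "Name: value" lines as in the example output.
netq_status_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}
```

A possible use is netq show status | netq_status_field State, which would print Active for the example above.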
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your KVM Virtual Machine for an On-premises Server Cluster
First configure the VM on the master node, and then configure the VM on each worker node.
Follow these steps to set up and configure your VM on a cluster of servers in an on-premises deployment:
Verify that your master node meets the VM requirements.
Resource                 Minimum Requirements
-----------------------  --------------------
Processor                Sixteen (16) virtual CPUs
Memory                   64 GB RAM
Local disk storage       500 GB SSD with a minimum disk IOPS of 1000 for a standard 4 KB block size (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
Network interface speed  1 Gb NIC
Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and Red Hat operating systems
Confirm that the required ports are open for communications.
You must open the following ports on your NetQ on-premises servers:
Port or Protocol Number  Protocol     Component Access
-----------------------  -----------  -------------------------------------
4                        IP Protocol  Calico networking (IP-in-IP Protocol)
22                       TCP          SSH
80                       TCP          Nginx
179                      TCP          Calico networking (BGP)
443                      TCP          NetQ UI
2379                     TCP          etcd datastore
4789                     UDP          Calico networking (VXLAN)
5000                     TCP          Docker registry
6443                     TCP          kube-apiserver
30001                    TCP          DPU communication
31980                    TCP          NetQ Agent communication
31982                    TCP          NetQ Agent SSL communication
32708                    TCP          API Gateway
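The TCP ports listed above can be probed from another machine with a short script. The following is a sketch under stated assumptions: it relies on the nc (netcat) utility being installed, the helper names are hypothetical, and it skips UDP port 4789 and IP protocol 4. A port reports as open only when a service is already listening on it, so a closed result before NetQ is installed does not by itself prove that a firewall is blocking the port.

```shell
#!/bin/sh
# Sketch: probe the TCP ports from the table above (hypothetical helpers).
NETQ_TCP_PORTS="22 80 179 443 2379 5000 6443 30001 31980 31982 32708"

# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
tcp_port_open() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Print one status line per port for the given host.
report_netq_ports() {
    for port in $NETQ_TCP_PORTS; do
        if tcp_port_open "$1" "$port"; then
            echo "${port}/tcp open"
        else
            echo "${port}/tcp closed-or-filtered"
        fi
    done
}
```

Run report_netq_ports followed by the server address to list the state of each port.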
Additionally, for internal cluster communication, you must open these ports:
Copy the QCOW2 image to a directory where you want to run it.
Tip: Copy the original QCOW2 image instead of moving it, so that you do not need to download it again if you repeat this process.
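The copy step can be scripted so that the original download is left in place and the copy is verified. This is a minimal sketch: stage_qcow2_image is a hypothetical helper, and the source and destination paths are placeholders that you supply.

```shell
#!/bin/sh
# Sketch: copy (not move) a QCOW2 image into a working directory and
# confirm the copy is byte-identical. Hypothetical helper; paths are placeholders.
stage_qcow2_image() {
    img_src=$1
    img_dir=$2
    [ -f "$img_src" ] || { echo "source image not found: $img_src" >&2; return 1; }
    mkdir -p -- "$img_dir" || return 1
    img_dest="$img_dir/$(basename -- "$img_src")"
    cp -- "$img_src" "$img_dest" || return 1
    cmp -s -- "$img_src" "$img_dest" || { echo "copy differs from original" >&2; return 1; }
    # Print the path of the copy for use in later steps.
    echo "$img_dest"
}
```

Writing to a system directory might require running the function with elevated privileges.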
Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network model value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.
Or, for a bridged VM, where the VM attaches to a bridge that has already been set up to allow external access:
Replace the network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.
Make note of the name used during install as this is needed in a later step.
Watch the boot process in another terminal window.
$ virsh console netq_ts
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:
127.0.0.1 localhost NEW_HOSTNAME
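The /etc/hosts change can be made idempotent so that running it twice does not append the name twice. The following is a sketch only: both function names are hypothetical, the second argument lets you try it on a copy of the file first, and it assumes sed supports the -i and -E options (as GNU sed on Ubuntu does) and that the name contains only valid hostname characters.

```shell
#!/bin/sh
# Sketch: append a hostname to the 127.0.0.1 localhost entry if it is missing.
# Hypothetical helpers; pass a file path as the second argument to test safely.
hosts_has_name() {
    awk -v name="$1" '
        $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$2"
}

add_hostname_to_localhost_entry() {
    hosts_file=${2:-/etc/hosts}
    # Nothing to do when the name is already on the 127.0.0.1 line.
    hosts_has_name "$1" "$hosts_file" && return 0
    # Keep a .bak copy of the file, then append the name to the localhost line.
    sed -i.bak -E "s/^(127\.0\.0\.1[[:space:]]+localhost.*)\$/\1 $1/" "$hosts_file" || return 1
    # Fail if the file had no matching localhost line to extend.
    hosts_has_name "$1" "$hosts_file"
}
```

Editing /etc/hosts itself requires root privileges.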
Verify that your first worker node meets the VM requirements, as described in Step 1.
Confirm that the needed ports are open for communications, as described in Step 2.
Open your hypervisor and set up the VM in the same manner as for the master node.
Make a note of the private IP address you assign to the worker node. You need it for later installation steps.
Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
Repeat Steps 8 through 11 for each additional worker node you want in your cluster.
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:
cumulus@<hostname>:~$ netq install cluster master-init
Please run the following command on all worker nodes:
netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
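Because the key is long, copying it by hand is error-prone. The sketch below pulls the worker-init command out of saved master-init output. It assumes the command appears on a single line with a base64-style key, as in the example output above; extract_worker_init is a hypothetical helper, not a NetQ command.

```shell
#!/bin/sh
# Sketch: extract the worker-init command line from master-init output on stdin.
# Hypothetical helper; assumes the one-line layout shown in the example output.
extract_worker_init() {
    sed -n -E 's#^[[:space:]]*(netq install cluster worker-init [A-Za-z0-9+/=]+)[[:space:]]*$#\1#p' \
        | head -n 1
}
```

For example, save the master-init output to a file and run extract_worker_init < file to print only the command for the worker nodes.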
Run the following commands on your master node, using the IP addresses of your worker nodes:
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
cumulus@hostname:~$ netq install standalone full interface eno1 bundle /mnt/installables/NetQ-4.5.0.tgz
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address     Hostname       Role    Status
-------------  -------------  ------  --------
10.188.44.147  10.188.44.147  Role    Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Set Up Your KVM Virtual Machine for a Cloud Server Cluster
First configure the VM on the master node, and then configure the VM on each worker node.
Follow these steps to set up and configure your VM on a cluster of servers in a cloud deployment:
Verify that your master node meets the VM requirements.
Resource                 Minimum Requirements
-----------------------  --------------------
Processor                Four (4) virtual CPUs
Memory                   8 GB RAM
Local disk storage       64 GB
Network interface speed  1 Gb NIC
Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and Red Hat operating systems
Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:
Port or Protocol Number  Protocol     Component Access
-----------------------  -----------  -------------------------------------
4                        IP Protocol  Calico networking (IP-in-IP Protocol)
22                       TCP          SSH
80                       TCP          Nginx
179                      TCP          Calico networking (BGP)
443                      TCP          Nginx
2379                     TCP          etcd datastore
4789                     UDP          Calico networking (VXLAN)
5000                     TCP          Docker registry
6443                     TCP          kube-apiserver
31980                    TCP          NetQ Agent communication
31982                    TCP          NetQ Agent SSL communication
32708                    TCP          API Gateway
The following ports are used for internal cluster communication and must also be open between servers in your cluster:
Copy the QCOW2 image to a directory where you want to run it.
Tip: Copy the original QCOW2 image instead of moving it, so that you do not need to download it again if you repeat this process.
Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network model value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.
Or, for a bridged VM, where the VM attaches to a bridge that has already been set up to allow external access:
Replace the network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.
Make note of the name used during install as this is needed in a later step.
Watch the boot process in another terminal window.
$ virsh console netq_ts
Log in to the VM and change the password.
Use the default credentials to log in the first time:
Username: cumulus
Password: cumulus
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
You are required to change your password immediately (root enforced)
System information as of Thu Dec 3 21:35:42 UTC 2020
System load: 0.09 Processes: 120
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Connection to <ipaddr> closed.
Log in again with your new password.
$ ssh cumulus@<ipaddr>
Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
Ubuntu 20.04 LTS
cumulus@<ipaddr>'s password:
System information as of Thu Dec 3 21:35:59 UTC 2020
System load: 0.07 Processes: 121
Usage of /: 8.1% of 61.86GB Users logged in: 0
Memory usage: 5% IP address for eth0: <ipaddr>
Swap usage: 0%
Last login: Thu Dec 3 21:35:43 2020 from <local-ipaddr>
cumulus@ubuntu:~$
Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Change the hostname for the VM from the default value.
The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:
127.0.0.1 localhost NEW_HOSTNAME
Verify that your first worker node meets the VM requirements, as described in Step 1.
Confirm that the needed ports are open for communications, as described in Step 2.
Open your hypervisor and set up the VM in the same manner as for the master node.
Make a note of the private IP address you assign to the worker node. You need it for later installation steps.
Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Repeat Steps 8 through 11 for each additional worker node you want in your cluster.
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:
cumulus@<hostname>:~$ netq install cluster master-init
Please run the following command on all worker nodes:
netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
Run the following command on your NetQ Cloud Appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI in the premise management configuration.
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:
Reset the VM:
cumulus@hostname:~$ netq bootstrap reset
Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Consider the following for container environments, and make adjustments as needed.
Calico Networking
NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the default address range with the pod-ip-range option. For example:
The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address     Hostname       Role    Status
-------------  -------------  ------  --------
10.188.44.147  10.188.44.147  Role    Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Install the NetQ On-premises Appliance
This topic describes how to prepare your single NetQ On-premises Appliance for installation of the NetQ Platform software.
Each system shipped to you contains:
Your NVIDIA NetQ On-premises Appliance (a Supermicro 6019P-WTR server)
Hardware accessories, such as power cables and rack mounting gear (note that network cables and optics ship separately)
Information regarding your order
For more detail about hardware specifications (including LED layouts and FRUs like the power supply or fans, and accessories like included cables) or safety and environmental information, refer to the user manual and quick reference guide.
Install the Appliance
After you unbox the appliance:
Mount the appliance in the rack.
Connect it to power following the procedures described in your appliance's user manual.
Connect the Ethernet cable to the 1G management port (eno1).
Power on the appliance.
If your network runs DHCP, you can configure NetQ over the network. If DHCP is not enabled, then you configure the appliance using the console cable provided.
Configure the Password, Hostname, and IP Address
Change the password and specify the hostname and IP address for the appliance before installing the NetQ software.
Log in to the appliance using the default login credentials:
Username: cumulus
Password: cumulus
Change the password using the passwd command:
cumulus@hostname:~$ passwd
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
The default hostname for the NetQ On-premises Appliance is netq-appliance. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames comprise a sequence of labels concatenated with dots. For example, en.wikipedia.org is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels can contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
The appliance contains two Ethernet ports. It uses port eno1 for out-of-band management. This is where NetQ Agents should send the telemetry data collected from your monitored switches and hosts. By default, eno1 uses DHCPv4 to get its IP address. You can view the assigned IP address using the following command:
cumulus@hostname:~$ ip -4 -brief addr show eno1
eno1 UP 10.20.16.248/24
Alternatively, you can configure the interface with a static IP address by editing the /etc/netplan/01-ethernet.yaml Ubuntu Netplan configuration file.
For example, to set your network interface eno1 to a static IP address of 192.168.1.222 with gateway 192.168.1.1 and DNS servers 8.8.8.8 and 8.8.4.4:
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
Apply the settings.
cumulus@hostname:~$ sudo netplan apply
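Netplan files are indentation-sensitive YAML, so generating the static configuration from parameters can avoid spacing mistakes. The following is a minimal sketch that reproduces the keys from the example above; render_static_netplan is a hypothetical helper, and the values you pass are your own.

```shell
#!/bin/sh
# Sketch: print a static-address Netplan configuration to stdout.
# Arguments: interface, address/prefix, gateway, comma-separated DNS servers.
# Hypothetical helper; mirrors the keys used in the example above.
render_static_netplan() {
    cat <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    $1:
      dhcp4: no
      addresses: [$2]
      gateway4: $3
      nameservers:
        addresses: [$4]
EOF
}
```

For example, render_static_netplan eno1 192.168.1.222/24 192.168.1.1 8.8.8.8,8.8.4.4 prints a configuration equivalent to the one above. Review the output before writing it to /etc/netplan/01-ethernet.yaml. If you are connected over the same interface, sudo netplan try (where available) is a safer alternative to apply because it reverts the change unless you confirm it.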
Verify NetQ Software and Appliance Readiness
Now that the appliance is up and running, verify that the software is available and the appliance is ready for installation.
Verify that the needed packages are present and of the correct release, version 4.5.0 and update 41.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
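The version check can also be done by a script instead of by eye. This sketch reads dpkg -l output and confirms that every installed netq package starts with the expected version string. netq_packages_match is a hypothetical helper and assumes the column layout shown above.

```shell
#!/bin/sh
# Sketch: succeed only if at least one netq-* package is installed and every
# one of them has a version beginning with "<expected>-".
# Hypothetical helper; reads `dpkg -l` output on stdin.
netq_packages_match() {
    awk -v want="$1" '
        $1 == "ii" && $2 ~ /^netq-/ {
            seen++
            if (index($3, want "-") != 1) { print "mismatch: " $2 " " $3; bad++ }
        }
        END { exit (seen == 0 || bad > 0) }
    '
}
```

A possible use is dpkg -l | netq_packages_match 4.5.0, which returns a non-zero status and prints the offending package if a version differs.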
Verify the installation images are present and of the correct release, version 4.5.
cumulus@hostname:~$ cd /mnt/installables/
cumulus@hostname:/mnt/installables$ ls
NetQ-4.5.0.tgz
Verify the appliance is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your NetQ platform server or NetQ Appliance:
cumulus@hostname:~$ netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.5.0.tgz
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ on-premises VM after this step, you need to re-register this address with NetQ as follows:
Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
cumulus@hostname:~$ netq install standalone full interface eno1 bundle /mnt/installables/NetQ-4.5.0.tgz
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address     Hostname       Role    Status
-------------  -------------  ------  --------
10.188.44.147  10.188.44.147  Role    Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Install the NetQ Cloud Appliance
This topic describes how to prepare your single NetQ Cloud Appliance for installation of the NetQ Collector software.
Each system shipped to you contains:
Your NVIDIA NetQ Cloud Appliance (a Supermicro SuperServer E300-9D)
Hardware accessories, such as power cables and rack mounting gear (note that network cables and optics ship separately)
Information regarding your order
For more detail about hardware specifications (including LED layouts and FRUs like the power supply or fans, and accessories like included cables) or safety and environmental information, refer to the appliance's user manual.
Install the Appliance
After you unbox the appliance:
Mount the appliance in the rack.
Connect it to power following the procedures described in your appliance's user manual.
Connect the Ethernet cable to the 1G management port (eno1).
Power on the appliance.
If your network runs DHCP, you can configure NetQ over the network. If DHCP is not enabled, then you configure the appliance using the console cable provided.
Configure the Password, Hostname, and IP Address
Log in to the appliance using the default login credentials:
Username: cumulus
Password: cumulus
Change the password using the passwd command:
cumulus@hostname:~$ passwd
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
The default hostname for the NetQ Cloud Appliance is netq-appliance. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames comprise a sequence of labels concatenated with dots. For example, en.wikipedia.org is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels can contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
The appliance contains two Ethernet ports. It uses port eno1 for out-of-band management. This is where NetQ Agents should send the telemetry data collected from your monitored switches and hosts. By default, eno1 uses DHCPv4 to get its IP address. You can view the assigned IP address using the following command:
cumulus@hostname:~$ ip -4 -brief addr show eno1
eno1 UP 10.20.16.248/24
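Because NetQ Agents send their data to this address, it can be useful to capture it in a variable for later steps. The following sketch extracts the address without the prefix length from the brief output shown above; ipv4_of_interface_line is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: read one line of `ip -4 -brief addr show <ifname>` output on stdin
# and print the first IPv4 address without its prefix length.
# Hypothetical helper; prints nothing if the interface has no address.
ipv4_of_interface_line() {
    awk 'NF >= 3 { split($3, parts, "/"); print parts[1]; exit }'
}
```

A possible use is ip -4 -brief addr show eno1 | ipv4_of_interface_line, which would print 10.20.16.248 for the example above.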
Alternatively, you can configure the interface with a static IP address by editing the /etc/netplan/01-ethernet.yaml Ubuntu Netplan configuration file.
For example, to set your network interface eno1 to a static IP address of 192.168.1.222 with gateway 192.168.1.1 and DNS servers 8.8.8.8 and 8.8.4.4:
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
Apply the settings.
cumulus@hostname:~$ sudo netplan apply
Verify NetQ Software and Appliance Readiness
Now that the appliance is up and running, verify that the software is available and the appliance is ready for installation.
Verify that the required packages are present and reflect the most current version.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
Verify the installation images are present and reflect the most current version.
cumulus@hostname:~$ cd /mnt/installables/
cumulus@hostname:/mnt/installables$ ls
NetQ-4.5.0-opta.tgz
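Listing the directory confirms that the bundle exists but not that it is intact. As a sketch, the check below also tests that the file is a readable gzip archive; check_install_bundle is a hypothetical helper, and passing this check does not guarantee the bundle contents are correct.

```shell
#!/bin/sh
# Sketch: confirm an installer bundle exists, is non-empty, and passes a
# gzip integrity test. Hypothetical helper; not part of the NetQ CLI.
check_install_bundle() {
    [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
    gzip -t -- "$1" 2>/dev/null || { echo "not a valid gzip archive: $1" >&2; return 1; }
    echo "ok: $1"
}
```

For example, check_install_bundle /mnt/installables/NetQ-4.5.0-opta.tgz prints an ok line when the file listed above is present and readable.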
Verify the appliance is ready for installation. Fix any errors before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Install and activate the NetQ software using the CLI:
Run the following command on your NetQ Cloud Appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:
Reset the VM:
cumulus@hostname:~$ netq bootstrap reset
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Consider the following for container environments, and make adjustments as needed.
Calico Networking
NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the default address range with the pod-ip-range option. For example:
The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
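If you collect this output from several appliances, a small script can pick out the fields of interest. The sketch below assumes the output keeps the "Key: Value" layout shown in the example above; the function name is hypothetical and the parsing is not an official NetQ interface.

```python
def parse_status(text):
    """Collect the 'Key: Value' lines of the status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and headings such as "Cluster Status:".
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

sample = """State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Is Cloud: False
Cluster Status:
"""

status = parse_status(sample)
healthy = status.get("State") == "Active"
versions_match = status.get("Version") == status.get("Installer Version")
print(healthy, versions_match)
```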
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Install a NetQ On-premises Appliance Cluster
This topic describes how to prepare your cluster of NetQ On-premises Appliances for installation of the NetQ Platform software.
Each system shipped to you contains:
Your NVIDIA NetQ On-premises Appliance (a Supermicro 6019P-WTR server)
Hardware accessories, such as power cables and rack mounting gear (note that network cables and optics ship separately)
Information regarding your order
For more detail about hardware specifications (including LED layouts and FRUs like the power supply or fans, and accessories like included cables) or safety and environmental information, refer to the user manual and quick reference guide.
Install Each Appliance
After you unbox the appliance:
Mount the appliance in the rack.
Connect it to power following the procedures described in your appliance's user manual.
Connect the Ethernet cable to the 1G management port (eno1).
Power on the appliance.
If your network runs DHCP, you can configure NetQ over the network. If DHCP is not enabled, then you configure the appliance using the console cable provided.
Configure the Password, Hostname, and IP Address
Change the password and specify the hostname and IP address for each appliance before installing the NetQ software.
Log in to the appliance that you intend to use as your master node using the default login credentials:
Username: cumulus
Password: cumulus
Change the password using the passwd command:
cumulus@hostname:~$ passwd
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
The default hostname for the NetQ On-premises Appliance is netq-appliance. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames comprise a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels can contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
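The naming rules above can be checked before you commit to a hostname. This is a minimal sketch that encodes only the rules stated here (label length, total length, and permitted characters); the function name is hypothetical, and any further restrictions in the underlying standards are not checked.

```python
import re

# One label: 1 to 63 characters drawn from a-z, 0-9, and the hyphen.
LABEL = re.compile(r"[a-z0-9-]{1,63}")

def is_valid_hostname(name):
    """Apply the length and character rules listed above."""
    if not name or len(name) > 253:
        return False
    return all(LABEL.fullmatch(label) for label in name.split("."))

for candidate in ("netq-appliance", "en.wikipedia.org", "NetQ_Appliance"):
    print(candidate, is_valid_hostname(candidate))
```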
The appliance contains two Ethernet ports. It uses port eno1 for out-of-band management. This is where NetQ Agents should send the telemetry data collected from your monitored switches and hosts. By default, eno1 uses DHCPv4 to get its IP address. You can view the assigned IP address using the following command:
cumulus@hostname:~$ ip -4 -brief addr show eno1
eno1 UP 10.20.16.248/24
Alternatively, you can configure the interface with a static IP address by editing the /etc/netplan/01-ethernet.yaml Ubuntu Netplan configuration file.
For example, to set your network interface eno1 to a static IP address of 192.168.1.222 with gateway 192.168.1.1 and DNS servers 8.8.8.8 and 8.8.4.4:
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
Apply the settings.
cumulus@hostname:~$ sudo netplan apply
Repeat these steps for each of the worker node appliances.
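Because you repeat the static addressing on every appliance, a quick sanity check of the values can catch typing mistakes before you apply them. The sketch below uses Python's standard ipaddress module; the function name and the specific checks are assumptions chosen for illustration, not a NetQ or Netplan tool.

```python
import ipaddress

def check_static_config(address, gateway, nameservers):
    """Return a list of problems found in a static IPv4 configuration."""
    problems = []
    iface = ipaddress.ip_interface(address)   # for example 192.168.1.222/24
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")
    if gw == iface.ip:
        problems.append("gateway equals the interface address")
    for ns in nameservers:
        ipaddress.ip_address(ns)              # raises ValueError if malformed
    return problems

# Values from the Netplan example above.
print(check_static_config("192.168.1.222/24", "192.168.1.1", ["8.8.8.8", "8.8.4.4"]))
```

An empty list indicates that none of the checked problems were found.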
Verify NetQ Software and Appliance Readiness
Now that the appliances are up and running, verify that the software is available and the appliance is ready for installation.
On the master node, verify that the needed packages are present and of the correct release, version 4.5.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
Verify the installation images are present and of the correct release, version 4.5.
cumulus@hostname:~$ cd /mnt/installables/
cumulus@hostname:/mnt/installables$ ls
NetQ-4.5.0.tgz
Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
On one of your worker nodes, verify that the needed packages are present and of the correct release, version 4.5 and update 38 or later.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
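When you check many nodes, the release and update can be read from the package version string programmatically. This sketch assumes that the number after "u" in a string such as 4.5.0-ub20.04u41 is the update number referred to above; that interpretation, the function names, and the threshold argument are assumptions made for illustration.

```python
import re

# Matches, for example, "4.5.0-ub20.04u41~1677251815.f5b57862_amd64".
VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-ub[\d.]+u(\d+)~")

def parse_netq_package(line):
    """Split a `dpkg -l` line into (package, (major, minor, patch), update)."""
    fields = line.split()
    name, version = fields[1], fields[2]
    match = VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version}")
    major, minor, patch, update = map(int, match.groups())
    return name, (major, minor, patch), update

def meets_minimum(line, release=(4, 5), min_update=38):
    """True if the package is from the given release at or above min_update."""
    _, version, update = parse_netq_package(line)
    return version[:2] == release and update >= min_update

row = ("ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 "
       "Cumulus NetQ Telemetry Agent for Ubuntu")
print(parse_netq_package(row), meets_minimum(row))
```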
Make a note of the private IP addresses you assign to the master and worker nodes. You need them for later installation steps.
Verify that the needed packages are present and of the correct release, version 4.5 and update 38.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
Verify that the needed files are present and of the correct release.
cumulus@hostname:~$ cd /mnt/installables/
cumulus@hostname:/mnt/installables$ ls
NetQ-4.5.0.tgz
Verify the appliance is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check
Repeat Steps 4-9 for each additional worker node (NetQ On-premises Appliance).
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:
cumulus@<hostname>:~$ netq install cluster master-init
Please run the following command on all worker nodes:
netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
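If you capture the master-init output in a script instead of copying it by hand, the worker command can be extracted with a few lines of code. This sketch assumes the output has the form shown above, with the command on its own line; the function name is hypothetical and the key in the sample is a placeholder.

```python
def extract_worker_init(output):
    """Return the worker-init command line from master-init output, or None."""
    prefix = "netq install cluster worker-init "
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line
    return None

# Placeholder output; a real key is a long base64 string.
sample = (
    "Please run the following command on all worker nodes:\n"
    "netq install cluster worker-init <ssh-key>\n"
)
print(extract_worker_init(sample))
```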
Run the following commands on your master node, using the IP addresses of your worker nodes:
Re-run the install CLI on the appliance. This example uses interface eno1. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
cumulus@hostname:~$ netq install standalone full interface eno1 bundle /mnt/installables/NetQ-4.5.0.tgz
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Install a NetQ Cloud Appliance Cluster
This topic describes how to prepare your cluster of NetQ Cloud Appliances for installation of the NetQ Collector software.
Each system shipped to you contains:
Your NVIDIA NetQ Cloud Appliance (a Supermicro SuperServer E300-9D)
Hardware accessories, such as power cables and rack mounting gear (note that network cables and optics ship separately)
Information regarding your order
For more detail about hardware specifications (including LED layouts and FRUs like the power supply or fans, and accessories like included cables) or safety and environmental information, refer to the user manual.
Install Each Appliance
After you unbox the appliance:
Mount the appliance in the rack.
Connect it to power following the procedures described in your appliance's user manual.
Connect the Ethernet cable to the 1G management port (eno1).
Power on the appliance.
If your network runs DHCP, you can configure NetQ over the network. If DHCP is not enabled, then you configure the appliance using the console cable provided.
Configure the Password, Hostname, and IP Address
Change the password and specify the hostname and IP address for each appliance before installing the NetQ software.
Log in to the appliance that you intend to use as your master node using the default login credentials:
Username: cumulus
Password: cumulus
Change the password using the passwd command:
cumulus@hostname:~$ passwd
Changing password for cumulus.
(current) UNIX password: cumulus
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
The default hostname for the NetQ Cloud Appliance is netq-appliance. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.
Kubernetes requires that hostnames comprise a sequence of labels concatenated with dots. For example, en.wikipedia.org is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.
The Internet standards (RFCs) for protocols specify that labels can contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').
The appliance contains two Ethernet ports. It uses port eno1 for out-of-band management. This is where NetQ Agents should send the telemetry data collected from your monitored switches and hosts. By default, eno1 uses DHCPv4 to get its IP address. You can view the assigned IP address using the following command:
cumulus@hostname:~$ ip -4 -brief addr show eno1
eno1 UP 10.20.16.248/24
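To record the address that DHCP assigned to each appliance, you can parse the brief output shown above. The sketch assumes the whitespace-separated layout of interface name, state, and one or more addresses; the function name is hypothetical.

```python
def parse_ip_brief(line):
    """Split one line of `ip -brief addr` output into its fields."""
    fields = line.split()
    return {"name": fields[0], "state": fields[1], "addresses": fields[2:]}

info = parse_ip_brief("eno1 UP 10.20.16.248/24")
print(info["name"], info["state"], info["addresses"])
```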
Alternatively, you can configure the interface with a static IP address by editing the /etc/netplan/01-ethernet.yaml Ubuntu Netplan configuration file.
For example, to set your network interface eno1 to a static IP address of 192.168.1.222 with gateway 192.168.1.1 and DNS servers 8.8.8.8 and 8.8.4.4:
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
Apply the settings.
cumulus@hostname:~$ sudo netplan apply
Repeat these steps for each of the worker node appliances.
Verify NetQ Software and Appliance Readiness
Now that the appliances are up and running, verify that the software is available and each appliance is ready for installation.
On the master NetQ Cloud Appliance, verify that the needed packages are present and of the correct release, version 4.5.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
Verify the installation images are present and of the correct release, version 4.5.
cumulus@hostname:~$ cd /mnt/installables/
cumulus@hostname:/mnt/installables$ ls
NetQ-4.5.0-opta.tgz
Verify the master NetQ Cloud Appliance is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
On one of your worker NetQ Cloud Appliances, verify that the needed packages are present and of the correct release, version 4.5 and update 34.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
Make a note of the private IP addresses you assign to the master and worker nodes. You need them for later installation steps.
Verify that the needed packages are present and of the correct release, version 4.5.
cumulus@hostname:~$ dpkg -l | grep netq
ii netq-agent 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Telemetry Agent for Ubuntu
ii netq-apps 4.5.0-ub20.04u41~1677251815.f5b57862_amd64 Cumulus NetQ Fabric Validation Application for Ubuntu
Verify that the needed files are present and of the correct release.
cumulus@hostname:~$ cd /mnt/installables/
cumulus@hostname:/mnt/installables$ ls
NetQ-4.5.0-opta.tgz
Verify the appliance is ready for installation. Fix any errors indicated before installing the NetQ software.
cumulus@hostname:~$ sudo opta-check-cloud
Repeat Steps 4-8 for each additional worker NetQ Cloud Appliance.
The final step is to install and activate the NetQ software using the CLI:
Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:
cumulus@<hostname>:~$ netq install cluster master-init
Please run the following command on all worker nodes:
netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
Run the following command on your NetQ Cloud Appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI in the premise management configuration.
You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:
Reset the VM:
cumulus@hostname:~$ netq bootstrap reset
Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.
If this step fails for any reason, you can run netq bootstrap reset and then try again.
Consider the following for container environments, and make adjustments as needed.
Calico Networking
NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the default address range with the pod-ip-range option. For example:
The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.
Verify Installation Status
To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:
State: Active
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
Is Cloud: False
Cluster Status:
IP Address Hostname Role Status
------------- ------------- ------ --------
10.188.44.147 10.188.44.147 Role Ready
NetQ... Active
Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.
If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
After NetQ is installed, you can log in to NetQ from your browser.
Install NetQ Agents
After installing the NetQ software, you should install the NetQ Agents on each switch you want to monitor. You can install NetQ Agents on switches and servers running:
Cumulus Linux 3.7.16 and later
SONiC 202012
CentOS 7
RHEL 7.1
Ubuntu 18.04
Prepare for NetQ Agent Installation
For switches running Cumulus Linux and SONiC, you need to:
Install and configure NTP, if needed
Obtain NetQ software packages
For servers running RHEL, CentOS, or Ubuntu, you need to:
Verify you installed the minimum package versions
Verify the server is running lldpd
Install and configure NTP, if needed
Obtain NetQ software packages
If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the NVIDIA networking repository.
Verify NTP Is Installed and Configured
Verify that NTP is running on the switch. The switch must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
cumulus@switch:~$ sudo systemctl status ntp
[sudo] password for cumulus:
● ntp.service - LSB: Start NTP daemon
Loaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)
Active: active (running) since Fri 2018-06-01 13:49:11 EDT; 2 weeks 6 days ago
Docs: man:systemd-sysv-generator(8)
CGroup: /system.slice/ntp.service
└─2873 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -c /var/lib/ntp/ntp.conf.dhcp -u 109:114
If NTP is not installed, install and configure it before continuing.
If NTP is not running:
Verify the IP address or hostname of the NTP server in the /etc/ntp.conf file, and then
Reenable and start the NTP service using the systemctl [enable|start] ntp commands
If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.
Obtain NetQ Agent Software Package
To install the NetQ Agent you need to install netq-agent on each switch or host. This is available from the NVIDIA networking repository.
To obtain the NetQ Agent package:
Edit the /etc/apt/sources.list file to add the repository for NetQ.
Note that NetQ has a separate repository from Cumulus Linux.
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-4.5
...
You can use the deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-latest repository if you want to always retrieve the latest posted version of NetQ.
Cumulus Linux 4.4 and later includes the netq-agent package by default.
To add the repository, uncomment or add the following line in /etc/apt/sources.list:
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-4.5
...
You can use the deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest repository if you want to always retrieve the latest posted version of NetQ.
Add the apps3.cumulusnetworks.com authentication key to Cumulus Linux:
Verify that NTP is running on the switch. The switch must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
admin@switch:~$ sudo systemctl status ntp
● ntp.service - Network Time Service
Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-06-08 14:56:16 UTC; 2min 18s ago
Docs: man:ntpd(8)
Process: 1444909 ExecStart=/usr/lib/ntp/ntp-systemd-wrapper (code=exited, status=0/SUCCESS)
Main PID: 1444921 (ntpd)
Tasks: 2 (limit: 9485)
Memory: 1.9M
CGroup: /system.slice/ntp.service
└─1444921 /usr/sbin/ntpd -p /var/run/ntpd.pid -x -u 106:112
If NTP is not installed, install and configure it before continuing.
If NTP is not running:
Verify the IP address or hostname of the NTP server in the /etc/sonic/config_db.json file, and then
Reenable and start the NTP service using the sudo config reload -n command
Verify NTP is operating correctly. Look for an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.
admin@switch:~$ show ntp
MGMT_VRF_CONFIG is not present.
synchronised to NTP server (104.194.8.227) at stratum 3
time correct to within 2014 ms
polling server every 64 s
remote refid st t when poll reach delay offset jitter
==============================================================================
-144.172.118.20 139.78.97.128 2 u 26 64 377 47.023 -1798.1 120.803
+208.67.75.242 128.227.205.3 2 u 32 64 377 72.050 -1939.3 97.869
+216.229.4.66 69.89.207.99 2 u 160 64 374 41.223 -1965.9 83.585
*104.194.8.227 164.67.62.212 2 u 33 64 377 9.180 -1934.4 97.376
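The asterisk check can be automated when you verify many switches. This minimal sketch scans peer-table output of the kind shown above and returns the peer marked with an asterisk. The function name is hypothetical, and the sample repeats only a few rows of the example.

```python
def find_sync_peer(output):
    """Return the address of the peer marked '*' in the peer table, or None."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("*"):
            parts = line[1:].split()
            if parts:
                return parts[0]
    return None

sample = """remote refid st t when poll reach delay offset jitter
==============================================================================
-144.172.118.20 139.78.97.128 2 u 26 64 377 47.023 -1798.1 120.803
+208.67.75.242 128.227.205.3 2 u 32 64 377 72.050 -1939.3 97.869
*104.194.8.227 164.67.62.212 2 u 33 64 377 9.180 -1934.4 97.376
"""
print(find_sync_peer(sample))
```

A result of None means that no peer in the table carries the asterisk.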
Obtain NetQ Agent Software Package
To install the NetQ Agent you need to install netq-agent on each switch or host. This is available from the NVIDIA networking repository.
Note that NetQ has a separate repository from SONiC.
To obtain the NetQ Agent package:
Install the wget utility so you can install the GPG keys in step 3.
Before you install the NetQ Agent on a Red Hat or CentOS server, make sure you install and run at least the minimum versions of the following packages:
iproute-3.10.0-54.el7_2.1.x86_64
lldpd-0.9.7-5.el7.x86_64
ntp-4.2.6p5-25.el7.centos.2.x86_64
ntpdate-4.2.6p5-25.el7.centos.2.x86_64
Verify the Server is Running lldpd and wget
Make sure you are running lldpd, not lldpad. CentOS does not include lldpd or wget by default; however, the installation requires both.
To install these packages, run the following commands:
If NTP is not already installed and configured, follow these steps:
Install NTP on the server, if not already installed. Servers must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
root@ubuntu:~# sudo apt-get install ntp
Configure the network time server.
Open the /etc/ntp.conf file in your text editor of choice.
Under the Server section, specify the NTP server IP address or hostname.
Create the file /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list and add the following line:
root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
...
deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
...
The use of netq-latest in these examples means that fetching from the repository always retrieves the latest version of NetQ, even for a major version update. If you want to keep the repository on a specific version, such as netq-4.4, use that instead.
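To audit which hosts track netq-latest and which are pinned to a specific version, you can inspect the last field of each repository line. The sketch below assumes the one-line deb format used in the examples above; the function names are hypothetical.

```python
def repo_component(deb_line):
    """Return the last field (the component) of a one-line `deb` entry."""
    fields = deb_line.split()
    if not fields or fields[0] != "deb":
        raise ValueError("not a deb entry")
    return fields[-1]

def is_pinned(deb_line):
    """True when the entry names a specific version instead of netq-latest."""
    return repo_component(deb_line) != "netq-latest"

latest = "deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest"
pinned = "deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-4.5"
print(is_pinned(latest), is_pinned(pinned))
```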
Install NetQ Agent
After completing the preparation steps, install the agent onto your switch or host.
Cumulus Linux 4.4 and later includes the netq-agent package by default. To install the NetQ Agent on earlier versions of Cumulus Linux:
Update the local apt repository, then install the NetQ software on the switch.
Continue with NetQ Agent Configuration in the next section.
Configure NetQ Agent
After you install the NetQ Agents on the switches you want to monitor, you must configure them to obtain useful and relevant data.
The NetQ Agent is aware of and communicates through the designated VRF. If you do not specify one, it uses the default VRF (named default). If you later change the VRF configured for the NetQ Agent (using a lifecycle management configuration profile, for example), you might cause the NetQ Agent to lose communication.
If you configure the NetQ Agent to communicate in a VRF that is not default or mgmt, the following line must be added to /etc/netq/netq.yml in the netq-agent section:
netq-agent:
  netq_stream_address: 0.0.0.0
Two methods are available for configuring a NetQ Agent:
Edit the configuration file on the switch, or
Use the NetQ CLI
Configure NetQ Agents Using a Configuration File
You can configure the NetQ Agent in the netq.yml configuration file contained in the /etc/netq/ directory.
Open the netq.yml file using your text editor of choice. For example:
sudo nano /etc/netq/netq.yml
Locate the netq-agent section, or add it.
Set the parameters for the agent as follows:
port: 31980 (default configuration)
server: IP address of the NetQ Appliance or VM where the agent should send its collected data
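If you generate the configuration for many switches, the two parameters above can be rendered from a template. This is a sketch only: it assumes the netq-agent section consists of exactly the port and server keys listed above with two-space indentation, and the function name is hypothetical. Merge the result into netq.yml by hand or with your own tooling, because the file can contain other sections.

```python
import ipaddress

def render_agent_section(server, port=31980):
    """Return the netq-agent section text for the given server and port."""
    ipaddress.ip_address(server)   # raises ValueError for a malformed address
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"netq-agent:\n  port: {port}\n  server: {server}\n"

# 192.168.1.254 is an example address; use your NetQ Appliance or VM address.
print(render_agent_section("192.168.1.254"))
```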
If you configured the NetQ CLI, you can use it to configure the NetQ Agent to send telemetry data to the NetQ Appliance or VM. To configure the NetQ CLI, refer to Install NetQ CLI.
A couple of additional options are available for configuring the NetQ Agent. If you are using VRFs, you can configure the agent to communicate over a specific VRF. You can also configure the agent to use a particular port.
Configure the Agent to Use a VRF
By default, NetQ uses the default VRF for communication between the NetQ Appliance or VM and NetQ Agents. While optional, NVIDIA strongly recommends that you configure NetQ Agents to communicate with the NetQ Appliance or VM only via a VRF, including a management VRF. To do so, you need to specify the VRF name when configuring the NetQ Agent. For example, if you configured the management VRF and you want the agent to communicate with the NetQ Appliance or VM over it, configure the agent like this:
If you later change the VRF configured for the NetQ Agent (using a lifecycle management configuration profile, for example), you might cause the NetQ Agent to lose communication.
Configure the Agent to Communicate over a Specific Port
By default, NetQ uses port 31980 for communication between the NetQ Appliance or VM and NetQ Agents. If you want the NetQ Agent to communicate with the NetQ Appliance or VM via a different port, you need to specify the port number when configuring the NetQ Agent, like this:
sudo netq config add agent server 192.168.1.254 port 7379
sudo netq config restart agent
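When you apply the same setting to many switches from an automation script, it is safer to build the two commands above as argument lists than to concatenate strings. The sketch below only constructs the lists and does not run anything. It uses only the server and port options shown above, and the function name is hypothetical.

```python
def agent_server_commands(server, port=31980):
    """Build the argument lists for the two commands shown above."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return [
        ["sudo", "netq", "config", "add", "agent", "server", server, "port", str(port)],
        ["sudo", "netq", "config", "restart", "agent"],
    ]

for argv in agent_server_commands("192.168.1.254", 7379):
    print(" ".join(argv))
```

Each list can be passed to a process runner of your choice, such as subprocess.run, on the target switch.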
Configure the On-switch OPTA
On-switch OPTA functionality is an early access feature, and it does not support Flow Analysis or LCM.
On-switch OPTA is intended for use in small NetQ Cloud deployments where a dedicated OPTA might not be necessary. If you need help assessing the correct OPTA configuration for your deployment, contact your NVIDIA sales team.
Instead of installing a dedicated OPTA appliance, you can enable the OPTA service on every switch in your environment that will send data to the NetQ Cloud. To configure a switch for OPTA functionality, install the netq-opta package.
After the netq-opta package is installed, add your OPTA configuration key. Run the following command with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI in the premises management configuration.
The final step is configuring the local NetQ Agent on the switch to connect to the local OPTA service. Configure the agent on the switch to connect to localhost with the following command:
Installing the NetQ CLI on your NetQ Appliances, VMs, switches, or hosts gives you access to new features and bug fixes, and allows you to manage your network from multiple points in the network.
After installing the NetQ software and agent on each switch you want to monitor, you can also install the NetQ CLI on switches running:
Cumulus Linux 3.7.16 and later
SONiC 202012
CentOS 7
RHEL 7.1
Ubuntu 18.04
If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the NetQ repository.
Prepare for NetQ CLI Installation on a RHEL, CentOS, or Ubuntu Server
For servers running RHEL 7, CentOS or Ubuntu OS, you need to:
Verify you installed the minimum service package versions
Verify the server is running lldpd
Install and configure NTP, if needed
Obtain NetQ software packages
These steps are not required for Cumulus Linux or SONiC.
Verify Service Package Versions
iproute-3.10.0-54.el7_2.1.x86_64
lldpd-0.9.7-5.el7.x86_64
ntp-4.2.6p5-25.el7.centos.2.x86_64
ntpdate-4.2.6p5-25.el7.centos.2.x86_64
iproute 1:4.3.0-1ubuntu3.16.04.1 all
iproute2 4.3.0-1ubuntu3 amd64
lldpd 0.7.19-1 amd64
ntp 1:4.2.8p4+dfsg-3ubuntu5.6 amd64
Verify What CentOS and Ubuntu Are Running
For CentOS and Ubuntu, make sure you are running lldpd, not lldpad. CentOS and Ubuntu do not include lldpd by default, even though the installation requires it. In addition, CentOS does not include wget, even though the installation requires it.
To install these packages, run the following commands:
If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.
Verify NTP is operating correctly. Look for an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.
root@rhel7:~# ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
+173.255.206.154 132.163.96.3 2 u 86 128 377 41.354 2.834 0.602
+12.167.151.2 198.148.79.209 3 u 103 128 377 13.395 -4.025 0.198
2a00:7600::41 .STEP. 16 u - 1024 0 0.000 0.000 0.000
*129.250.35.250 249.224.99.213 2 u 101 128 377 14.588 -0.299 0.243
Install NTP on the server, if not already installed. Servers must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
root@ubuntu:~# sudo apt-get install ntp
Configure the network time server.
Open the /etc/ntp.conf file in your text editor of choice.
Under the Server section, specify the NTP server IP address or hostname.
Create the file /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list and add the following line:
root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
...
deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
...
The use of netq-latest in these examples means that fetching from the repository always retrieves the latest version of NetQ, even for a major version update. If you want to keep the repository on a specific version, such as netq-4.4, use that instead.
Install NetQ CLI
Follow these steps to install the NetQ CLI on a switch or host.
To install the NetQ CLI you need to install netq-apps on each switch. This is available from the NVIDIA networking repository.
Cumulus Linux 4.4 and later includes the netq-apps package by default.
If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the NVIDIA networking repository.
To obtain the NetQ CLI package:
Edit the /etc/apt/sources.list file to add the repository for NetQ.
Note that NetQ has a separate repository from Cumulus Linux.
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-4.5
...
You can use the deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-latest repository to always retrieve the latest version of NetQ.
Cumulus Linux 4.4 and later includes the netq-apps package by default.
To add the repository, uncomment or add the following line in /etc/apt/sources.list:
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-4.5
...
You can use the deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest repository if you want to always retrieve the latest posted version of NetQ.
Update the local apt repository and install the software on the switch.
Continue with NetQ CLI configuration in the next section.
Configure the NetQ CLI
By default, you do not configure the NetQ CLI during the NetQ installation. The configuration resides in the /etc/netq/netq.yml file. Until the CLI is configured on a device, you can only run netq config and netq help commands, and you must use sudo to run them.
At minimum, you need to configure the NetQ CLI and NetQ Agent to communicate with the telemetry server. To do so, configure the NetQ Agent and the NetQ CLI so that they are running in the VRF where the routing tables have connectivity to the telemetry server (typically the management VRF).
To access and configure the CLI for your on-premises NetQ deployment, you must generate AuthKeys. You’ll need your username and password to generate them. These keys provide authorized access (access key) and user authentication (secret key).
To generate AuthKeys:
Enter your on-premises NetQ appliance hostname or IP address into your browser to open the NetQ UI login page.
Enter your username and password.
Expand the Menu, and under Admin, select Management.
Select Manage on the User Accounts card.
Select your user and click above the table.
Copy these keys to a safe place. Select Copy to obtain the CLI configuration command to use on your devices.
The secret key is only shown once. If you do not copy these, you will need to regenerate them and reconfigure CLI access.
You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:
store the file wherever you like, for example in /home/cumulus/ or /etc/netq
name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml
The following example uses the individual access key, a premises of datacenterwest, and the default Cloud address, port and VRF. Replace the key values with your generated keys if you are using this example on your server.
This example uses an optional keys file. Replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.
If you have multiple premises and want to query data from a different premises than you originally configured, rerun the netq config add cli server command with the desired premises name. You can only view the data for one premises at a time with the CLI.
To access and configure the CLI for your NetQ cloud deployment, you must generate AuthKeys. You need your username and password to generate them. These keys provide authorized access (access key) and user authentication (secret key). You obtained your credentials and NetQ Cloud addresses during your first login to the NetQ Cloud and premises activation.
To generate AuthKeys:
Enter netq.nvidia.com into your browser to open the NetQ UI login page.
Enter your username and password.
Expand the Menu, and under Admin, select Management.
Select Manage on the User Accounts card.
Select your user and click above the table.
Copy these keys to a safe place. Select Copy to obtain the CLI configuration command to use on your devices.
The secret key is only shown once. If you do not copy these, you will need to regenerate them and reconfigure CLI access.
You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:
store the file wherever you like, for example in /home/cumulus/ or /etc/netq
name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml
The following example uses the individual access key, a premises of datacenterwest, and the default Cloud address, port and VRF. Replace the key values with your generated keys if you are using this example on your server.
sudo netq config add cli server api.netq.cumulusnetworks.com access-key 123452d9bc2850a1726f55534279dd3c8b3ec55e8b25144d4739dfddabe8149e secret-key /vAGywae2E4xVZg8F+HtS6h6yHliZbBP6HXU3J98765= premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)
sudo netq config restart cli
Restarting NetQ CLI... Success!
The following example uses an optional keys file. Replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.
sudo netq config add cli server api.netq.cumulusnetworks.com cli-keys-file /home/netq/nq-cld-creds.yml premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)
sudo netq config restart cli
Restarting NetQ CLI... Success!
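If you configure many devices, you might assemble the command in a script. The following is a minimal sketch that only builds the command string in the two forms shown above (individual keys or a keys file). The helper name `cli_server_command` and its parameters are hypothetical; it does not run anything or validate the keys.

```python
import shlex


def cli_server_command(server, premises, *, access_key=None,
                       secret_key=None, keys_file=None):
    """Build the 'netq config add cli server' command line as a string."""
    args = ["sudo", "netq", "config", "add", "cli", "server", server]
    if keys_file is not None:
        # Form 2: point the CLI at a keys file.
        args += ["cli-keys-file", keys_file]
    elif access_key and secret_key:
        # Form 1: pass the individual access and secret keys.
        args += ["access-key", access_key, "secret-key", secret_key]
    else:
        raise ValueError("provide keys_file, or both access_key and secret_key")
    args += ["premises", premises]
    # shlex.join quotes any argument that is not shell-safe.
    return shlex.join(args)


print(cli_server_command(
    "api.netq.cumulusnetworks.com",
    "datacenterwest",
    keys_file="/home/netq/nq-cld-creds.yml",
))
```

After running the generated command on a device, restart the CLI with netq config restart cli, as shown in the examples above.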
If you have multiple premises and want to query data from a different premises than you originally configured, rerun the netq config add cli server command with the desired premises name. You can only view the data for one premises at a time with the CLI.
Add More Nodes to Your Server Cluster
You can add nodes to your server cluster in both on-premises and cloud deployments using the CLI:
Run the following CLI command to add a new worker node for on-premises deployments:
netq install cluster add-worker <text-worker-01>
Run the following CLI command to add a new worker node for cloud deployments:
The NetQ UI ships with a self-signed certificate that is sufficient for non-production environments or cloud deployments. For on-premises deployments, however, you receive a warning from your browser that this default certificate is not trusted when you first log in to the NetQ UI. You can avoid this by installing your own signed certificate.
If you already have a certificate installed and want to change or update it, run the kubectl delete secret netq-gui-ingress-tls [name] --namespace default command.
You need the following items to perform the certificate installation:
A valid X509 certificate.
A private key file for the certificate.
A DNS record name configured to access the NetQ UI.
The FQDN should match the common name of the certificate. If you use a wild card in the common name — for example, if the common name of the certificate is *.example.com — then the NetQ telemetry server should reside on a subdomain of that domain, accessible via a URL like netq.example.com.
A functioning and healthy NetQ instance.
You can verify this by running the netq show opta-health command.
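The relationship between the FQDN and the certificate common name can be illustrated with a small check. This is a simplified sketch, assuming a wildcard covers exactly one leftmost label, as in the *.example.com example above; the function name is hypothetical, and real TLS clients apply additional rules (for example, subject alternative names).

```python
def hostname_matches_common_name(hostname: str, common_name: str) -> bool:
    """Return True if hostname matches the certificate common name."""
    host_labels = hostname.lower().rstrip(".").split(".")
    cn_labels = common_name.lower().rstrip(".").split(".")
    # A wildcard stands in for one label, so the label counts must agree.
    if len(host_labels) != len(cn_labels):
        return False
    if cn_labels[0] == "*":
        # Compare everything to the right of the wildcard label.
        return host_labels[0] != "" and host_labels[1:] == cn_labels[1:]
    return host_labels == cn_labels


print(hostname_matches_common_name("netq.example.com", "*.example.com"))
print(hostname_matches_common_name("example.com", "*.example.com"))
```

In this sketch, netq.example.com matches the wildcard name while the bare domain example.com does not.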
Install a Certificate using the NetQ CLI
Log in to the NetQ On-premises Appliance or VM via SSH and copy your certificate and key file there.
Generate a Kubernetes secret called netq-gui-ingress-tls.
cumulus@netq-ts:~$ kubectl create secret tls netq-gui-ingress-tls \
--namespace default \
--key <name of your key file>.key \
--cert <name of your cert file>.crt
Verify that you created the secret successfully.
cumulus@netq-ts:~$ kubectl get secret
NAME TYPE DATA AGE
netq-gui-ingress-tls kubernetes.io/tls 2 5s
Update the ingress rule file to install self-signed certificates.
After saving your changes, delete the current swagger-ui pod to restart the service:
cumulus@netq-ts:~$ kubectl delete pod -l app=swagger-ui
pod "swagger-ui-deploy-69cfff7b45-cj6r6" deleted
Your custom certificate should now be working. Verify this by opening the NetQ UI at https://<your-hostname-or-ipaddr> in your browser.
Update Cloud Activation Key
NVIDIA provides a cloud activation key when you set up your premises. You use the cloud activation key (called the config-key) to access the cloud services. Note that these authorization keys are different from the ones you use to configure the CLI.
On occasion, you might want to update your cloud service activation key—for example, if you mistyped the key during installation and now your existing key does not work, or you received a new key for your premises from NVIDIA.
Update the activation key using the NetQ CLI:
Run the following command on your standalone or master NetQ Cloud Appliance or VM replacing text-opta-key with your new key.
This section describes how to upgrade from your current installation to NetQ 4.5. Refer to the release notes before you upgrade.
You must upgrade your NetQ On-premises or Cloud Appliances or virtual machines (VMs). While there is some backwards compatibility with the previous NetQ release for any version, upgrading NetQ Agents is always recommended. If you want access to new and updated commands, you can upgrade the CLI on your physical servers or VMs, and monitored switches and hosts as well.
To complete the upgrade for either an on-premises or a cloud deployment:
NetQ accounts are assigned one of two roles: admin or user.
Accounts with admin privileges can perform the same actions as user accounts. Additionally, admins can access a management dashboard in the UI. From this dashboard, admins can:
Create, edit, and delete NetQ accounts.
Manage login policies, including SSO and LDAP authentication.
Review account activity.
Create, edit, and delete system events, channels, and notifications.
Manage premises.
Schedule network traces and validations.
Manage switches' lifecycles.
The following image displays the management dashboard. Accounts with user privileges cannot perform the functions described above and do not have access to the management dashboard.
Sign in to NetQ as an admin to view and manage accounts. If you want to change individual preferences, visit Set User Preferences.
Navigate to the NetQ management dashboard to complete the tasks outlined in this section. To get there, expand the Menu on the NetQ dashboard and select Management.
Add an Account
This section outlines the steps to add a local user account. To add an LDAP account, refer to LDAP Authentication.
To create a new account:
On the User Accounts card, select Manage to open a table listing all accounts.
Above the table, select Add to add an account.
Enter the fields and select Save.
Be especially careful entering the email address; you cannot change it once you save the account. If you save a mistyped email address, you must delete the account and create a new one.
Edit an Account
As an admin, you can:
Edit the first or last name associated with an account
Reset an account’s password
Change an account’s role (user or admin)
You cannot edit the email address associated with an account, because this is the identifier the system uses for authentication. If you need to change an email address, delete the account and create a new one.
To edit an account:
On the User Accounts card, select Manage to open a table listing all accounts.
Select the account you’d like to edit. Above the table, click Edit to edit the account’s information.
Reset an Admin Password
If your account is assigned an admin role, reset your password by restoring the default password, then changing the password:
Run the following command on your on-premises server’s CLI:
Click Forgot Password? and enter an email address. Look for a message with the subject NetQ Password Reset Link from netq-sre@cumulusnetworks.com.
Select the link in the email and follow the instructions to create a new password.
Delete an Account
To delete one or more accounts:
On the User Accounts card, select Manage to open a table listing all accounts.
Select one or more accounts. Above the table, click Delete to delete the selected account(s).
View Account Activity
Administrators can view account activity in the activity log.
To view the log, expand the Menu on the NetQ dashboard and select Management. Select Activity Log to open a table listing account activity. Use the controls above the table to filter or export the data.
Manage Login Policies
Administrators can configure a session expiration time and the number of times users can refresh before requiring them to log in again to NetQ.
To configure these login policies:
On the Login Management card, select Manage.
Select how long an account can be logged in before requiring a user to log in again:
Click Update to save the changes.
The Login Management card reflects the updated configuration.
Premises Management
The NetQ management dashboard lets you configure a single NetQ UI and CLI for monitoring data from multiple premises. This means you do not need to log in to each premises individually to view the data.
Configure Multiple Premises
There are two ways to implement a multi-site, on-premises deployment: (1) as a full deployment at the primary premises and each of the external premises or (2) as a full deployment at the primary premises with smaller deployments at the secondary premises.
The primary premises is called OPID0 by default in the UI.
Full NetQ Deployment at Each Premises
In this implementation, there is a NetQ appliance or VM running the NetQ Platform software with a database. Each premises operates independently as an external premises, with its own NetQ UI and CLI. The NetQ appliance or VM at one of the deployments acts as the primary premises. A list of external premises is stored with the primary deployment.
To configure a single UI to monitor multiple premises:
From the UI of the primary premises (OPID0), select the Premises dropdown in the top-right corner of the screen.
Select Manage premises, then select the External premises tab.
Select Add external premises.
Enter the IP address for the external server.
Enter the username and password for the external server, then click Next. These are the same credentials used to log in to the UI for the external server.
Select the premises you want to connect, then click Finish.
You can also reduce the number of premises that can be displayed in the UI by hovering over a deployment and selecting Delete.
To view the premises you just added, return to the home workbench and select the Premises dropdown in the top-right corner of the screen.
Full NetQ Deployment at Primary Premises and Smaller Deployments at Secondary Premises
In this implementation, there is a NetQ appliance or VM at one of the deployments acting as the primary premises for the other deployments. The primary premises runs the NetQ software (including the NetQ UI and CLI) and houses the database. All other deployments are secondary premises; they run the NetQ Collector software and send their data to the primary premises for storage and processing. A list of these secondary premises is stored with the primary deployment.
After the multiple premises are configured, you can view this list of premises in the NetQ UI at the primary premises, change the name of premises on the list, and delete premises from the list.
In this deployment model, the data is stored and can be viewed only from the NetQ UI at the primary premises.
The primary NetQ premises must be installed and operational before the secondary premises can be added.
To create and add secondary premises:
In the workbench header, select the Premises dropdown.
Click Manage premises. Your primary premises (OPID0) is shown by default.
Click Add premises.
Enter the name of a secondary premises you’d like to add, then click Done.
From the confirmation dialog, select View config key.
Click the copy icon, then save the key to a safe place, or click e-mail to send it to yourself or others. Then click Confirm activation.
To view the premises you just added, return to the home workbench and select the Premises dropdown at the top-right corner of the screen.
Rename a Premises
To rename an existing premises:
In the workbench header, select the Premises dropdown, then Manage premises.
Select a premises to rename, then click Edit.
Enter the new name for the premises, then click Done.
Back Up and Restore NetQ
The following sections describe how to back up and restore your NetQ data and VMs.
These procedures do not apply to your NetQ Cloud Appliance or VM. The NetQ cloud service handles data backups automatically.
You must run backup and restore scripts with sudo privileges.
Back Up Your NetQ Data
NetQ stores its data in a Cassandra database. You perform backups by running scripts provided with the software and located in the /usr/sbin directory. When you run a backup, the script creates a single tar file in the /opt/backuprestore/ directory.
To create a backup, refer to the following steps for your NetQ version.
In the directory where you copied the vm-backuprestore.sh script, run:
cumulus@netq-appliance:~$ sudo ./vm-backuprestore.sh --backup
[sudo] password for cumulus:
Mon Feb 6 12:37:18 2023 - Please find detailed logs at: /var/log/vm-backuprestore.log
Mon Feb 6 12:37:18 2023 - Starting backup of data, the backup might take time based on the size of the data
Mon Feb 6 12:37:19 2023 - Scaling static pods to replica 0
Mon Feb 6 12:37:19 2023 - Scaling all pods to replica 0
Mon Feb 6 12:37:28 2023 - Scaling all daemonsets to replica 0
Mon Feb 6 12:37:29 2023 - Waiting for all pods to go down
Mon Feb 6 12:37:29 2023 - All pods are down
Mon Feb 6 12:37:29 2023 - Creating backup tar /opt/backuprestore/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
Backup is successful, please scp it to the master node the below command:
sudo scp /opt/backuprestore/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar cumulus@<ip_addr>:/home/cumulus
Restore the backup file using the below command:
./vm-backuprestore.sh --restore --backupfile /opt/backuprestore/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
cumulus@netq-appliance:~$
Verify the backup file creation was successful:
cumulus@netq-appliance:~$ cd /opt/backuprestore/
cumulus@netq-appliance:/opt/backuprestore$ ls
backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
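When several backups accumulate, it can help to pick the most recent one programmatically. The following is a minimal sketch that assumes every backup file follows the naming pattern of the single example above (deployment type, version, then a UTC timestamp); the pattern is inferred from that one filename, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example filename; treat it as an assumption.
BACKUP_NAME = re.compile(
    r"^backup-netq-(?P<mode>.+)-(?P<version>\d+\.\d+\.\d+)-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2})_UTC\.tar$"
)


def parse_backup_name(name):
    """Return (mode, version, created) or None if the name does not match."""
    match = BACKUP_NAME.match(name)
    if match is None:
        return None
    created = datetime.strptime(match["stamp"], "%Y-%m-%d_%H_%M_%S")
    return match["mode"], match["version"], created


def newest_backup(names):
    """Return the matching filename with the latest timestamp, or None."""
    parsed = [(parse_backup_name(n), n) for n in names]
    valid = [(info[2], n) for info, n in parsed if info is not None]
    return max(valid)[1] if valid else None


example = "backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar"
print(parse_backup_name(example))
```

You could pass the output of os.listdir("/opt/backuprestore") to newest_backup to select the file to copy or restore.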
Back Up NetQ 4.5.0
Run the backup script /usr/sbin/vm-backuprestore.sh:
cumulus@netq-appliance:~$ cd /opt/backuprestore/
cumulus@netq-appliance:/opt/backuprestore$ ls
Restore Your NetQ Data
Restore NetQ data with the backup file you created in the steps above. The restore option of the backup script copies the data from the backup file to the database, decompresses it, verifies the restoration, and starts all necessary services. You should not see any data loss as a result of a restore operation.
Run the restore script, referencing the directory where the backup file resides.
If you restore NetQ data to a server with an IP address that is different from the one used to back up the data, you must reconfigure the agents on each switch as a final step.
cumulus@netq-appliance:~$ sudo vm-backuprestore.sh --restore --backupfile /home/cumulus/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
Mon Feb 6 12:39:57 2023 - Please find detailed logs at: /var/log/vm-backuprestore.log
Mon Feb 6 12:39:57 2023 - Starting restore of data
Mon Feb 6 12:39:57 2023 - Extracting release file from backup tar
Mon Feb 6 12:39:57 2023 - Cleaning the system
Mon Feb 6 12:39:57 2023 - Restoring data from tarball /home/cumulus/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
Data restored successfully
Please follow the below instructions to bootstrap the cluster
The config key restored is EhVuZXRxLWVuZHBvaW50LWdhdGVfYXkYsagDIix2OUJhMUpyekMwSHBBaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==, alternately the config key is available in file /tmp/config-key
Pass the config key while bootstrapping:
Example(standalone): netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.5.0.tgz config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
Example(cluster): netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.5.0.tgz config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
Alternately you can setup config-key post bootstrap in case you missed to pass it during bootstrap
Example(standalone): netq install standalone activate-job config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
Example(cluster): netq install cluster activate-job config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
In case the IP of the restore machine is different from the backup machine, please reconfigure the agents using: https://docs.nvidia.com/networking-ethernet-software/cumulus-netq-44/Installation-Management/Install-NetQ/Install-NetQ-Agents/#configure-netq-agents-using-a-configuration-file
cumulus@netq-appliance:~$
Post-installation Configurations
This section describes the various integrations you can configure after installing NetQ.
LDAP Authentication
As an administrator, you can integrate the NetQ role-based access control (RBAC) with your lightweight directory access protocol (LDAP) server in on-premises deployments. NetQ maintains control over role-based permissions for the NetQ application. There are two roles, admin and user. With the RBAC integration, your LDAP directory service (such as Microsoft Active Directory, Kerberos, OpenLDAP, or Red Hat Directory Service) handles account authentication. A copy of each account from LDAP is stored in the local NetQ database.
Integrating with an LDAP server does not prevent you from configuring local accounts (stored and managed in the NetQ database) as well.
Get Started
LDAP integration requires information about how to connect to your LDAP server, the type of authentication you plan to use, bind credentials, and, optionally, search attributes.
Provide Your LDAP Server Information
To connect to your LDAP server, you need the URI and bind credentials. The URI identifies the location of the LDAP server. It comprises an FQDN (fully qualified domain name) or IP address, and the port of the LDAP server where the LDAP client can connect. For example: myldap.mycompany.com or 192.168.10.2. Typically you use port 389 for connection over TCP or UDP. In production environments, you deploy a secure connection with SSL. In this case, the port used is typically 636. Setting the Enable SSL toggle automatically sets the server port to 636.
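The host and port rules above can be summarized in a short sketch. It assumes the conventional ldap:// and ldaps:// URL schemes, which this guide does not mention and which the NetQ form might not require; the function name `ldap_uri` is hypothetical.

```python
from typing import Optional


def ldap_uri(host: str, *, ssl: bool = False, port: Optional[int] = None) -> str:
    """Combine host and port into an LDAP URI, using the usual default ports."""
    scheme = "ldaps" if ssl else "ldap"
    if port is None:
        # 389 for plain connections, 636 when SSL is enabled.
        port = 636 if ssl else 389
    return f"{scheme}://{host}:{port}"


print(ldap_uri("myldap.mycompany.com"))
print(ldap_uri("192.168.10.2", ssl=True))
```

The sketch does not handle IPv6 literals, which would need square brackets around the address.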
Specify Your Authentication Method
There are two types of user authentication: anonymous and basic.
Anonymous: LDAP client does not require any authentication. The user can access all resources anonymously. This is not commonly used for production environments.
Basic: (Also called Simple) LDAP client must provide a bind DN and password to authenticate the connection. When selected, the Admin credentials appear: Bind DN and Bind Password. You define the distinguished name (DN) using a string of variables. Some common variables include:
| Syntax | Description or Usage |
| --- | --- |
| cn | Common name |
| ou | Organizational unit or group |
| dc | Domain name |
| dc | Domain extension |
Bind DN: DN of the user with administrator access to query the LDAP server; used for binding with the server. For example, uid=admin,ou=ntwkops,dc=mycompany,dc=com.
Bind Password: Password associated with Bind DN.
The Bind DN and password get sent as clear text. Only users with these credentials can perform LDAP operations.
If you are unfamiliar with the configuration of your LDAP server, contact your administrator to ensure you select the appropriate authentication method and credentials.
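To show how a bind DN is assembled from its components, here is a minimal sketch. It assumes that none of the values contain characters that need DN escaping (such as commas), and the helper name `bind_dn` is hypothetical.

```python
def bind_dn(uid, org_units, domain):
    """Assemble a DN string from a user ID, organizational units, and a domain."""
    parts = [f"uid={uid}"]
    # One ou= component per organizational unit, most specific first.
    parts += [f"ou={ou}" for ou in org_units]
    # Each label of the DNS domain becomes its own dc= component.
    parts += [f"dc={label}" for label in domain.split(".")]
    return ",".join(parts)


print(bind_dn("admin", ["ntwkops"], "mycompany.com"))
```

For the inputs shown, the sketch reproduces the example bind DN given above.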
Define User Attributes
You need the following two attributes to define a user entry in a directory:
Base DN: Location in directory structure where search begins. For example, dc=mycompany,dc=com.
User ID: Type of identifier used to specify an LDAP user. This can vary depending on the authentication service you are using. For example, you can use the user ID (UID) or email address with OpenLDAP, whereas you might use the sAMAccountName with Active Directory.
Optionally, you can specify the first name, last name, and email address of the user.
Set Search Attributes
While optional, specifying search scope indicates where to start and how deep a given user can search within the directory. You specify the data to search for in the search query.
Search scope options include:
Subtree: Search for users from base, subordinates at any depth (default)
Base: Search for users at the base level only; no subordinates
One level: Search for immediate children of user; not at base or for any descendants
Subordinate: Search for subordinates at any depth of user; but not at base
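The four scope options can be compared with a small model. This sketch interprets each scope relative to the base DN, following common LDAP usage, and splits DNs naively on commas (it assumes no escaped commas). The function names are hypothetical, and the sketch does not query a real directory.

```python
def rdn_list(dn):
    """Split a DN into lowercase components (naive: no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def in_scope(entry_dn, base_dn, scope):
    """Return True if entry_dn falls within the given scope of base_dn."""
    entry, base = rdn_list(entry_dn), rdn_list(base_dn)
    # The entry must sit at or below the base in the directory tree.
    if len(entry) < len(base) or entry[len(entry) - len(base):] != base:
        return False
    depth = len(entry) - len(base)  # 0 = the base entry itself
    if scope == "base":
        return depth == 0
    if scope == "one level":
        return depth == 1
    if scope == "subordinate":
        return depth >= 1
    if scope == "subtree":
        return True
    raise ValueError(f"unknown scope: {scope}")


BASE = "dc=mycompany,dc=com"
USER = "uid=admin,ou=netops,dc=mycompany,dc=com"
for scope in ("subtree", "base", "one level", "subordinate"):
    print(scope, in_scope(USER, BASE, scope))
```

In this model, a user two levels below the base is found by the subtree and subordinate scopes, but not by base or one level.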
A typical search query for users could be {userIdAttribute}={userId}.
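To see what the query template expands to, consider the following sketch. It assumes the two placeholders are filled by plain substitution, which is an assumption about how NetQ processes the template. It also escapes special characters in the user ID as described in RFC 4515; the function names are hypothetical.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that have special meaning in an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    # Replace character by character so an escape is never escaped twice.
    return "".join(replacements.get(ch, ch) for ch in value)


def render_query(template: str, user_id_attribute: str, user_id: str) -> str:
    """Fill the {userIdAttribute} and {userId} placeholders in the template."""
    return template.format(
        userIdAttribute=user_id_attribute,
        userId=escape_filter_value(user_id),
    )


print(render_query("{userIdAttribute}={userId}", "uid", "jdoe"))
```

With a user ID attribute of uid and a user ID of jdoe, the template expands to uid=jdoe.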
Create an LDAP Configuration
You can configure one LDAP server per bind DN (distinguished name). After you configure LDAP, you can verify the connectivity and save the configuration.
To create an LDAP configuration:
Expand the Menu and select Management.
Locate the LDAP Server Info card, and click Configure LDAP.
Fill out the LDAP server configuration form according to your particular configuration.
Click Save to complete the configuration, or click Cancel to discard the configuration.
You cannot change the LDAP configuration after you save it. If you need to change the configuration, you must delete the current LDAP configuration and create a new one. Note that if you change the LDAP server configuration, all users created against that LDAP server remain in the NetQ database and continue to be visible, but are no longer viable. You must manually delete those users if you do not want to see them.
Example LDAP Configurations
This section lists a variety of example configurations. Scenarios 1-3 are based on using an OpenLDAP or similar authentication service. Scenario 4 is based on using the Active Directory service for authentication.
Scenario 1: Base Configuration
In this scenario, we are configuring the LDAP server with anonymous authentication, a user ID based on an email address, and a search scope of base.
| Parameter | Value |
| --- | --- |
| Host Server URL | ldap1.mycompany.com |
| Host Server Port | 389 |
| Authentication | Anonymous |
| Base DN | dc=mycompany,dc=com |
| User ID | email |
| Search Scope | Base |
| Search Query | {userIdAttribute}={userId} |
Scenario 2: Basic Authentication and Subset of Users
In this scenario, we are configuring the LDAP server with basic authentication, accessible only to users in the network operators group, and with a limited search scope.
| Parameter | Value |
| --- | --- |
| Host Server URL | ldap1.mycompany.com |
| Host Server Port | 389 |
| Authentication | Basic |
| Admin Bind DN | uid=admin,ou=netops,dc=mycompany,dc=com |
| Admin Bind Password | nqldap! |
| Base DN | dc=mycompany,dc=com |
| User ID | UID |
| Search Scope | One Level |
| Search Query | {userIdAttribute}={userId} |
Scenario 3: Scenario 2 with Widest Search Capability
In this scenario, we are configuring the LDAP server with basic authentication, accessible only to users in the network administrators group, and with an unlimited search scope.
| Parameter | Value |
| --- | --- |
| Host Server URL | 192.168.10.2 |
| Host Server Port | 389 |
| Authentication | Basic |
| Admin Bind DN | uid=admin,ou=netadmin,dc=mycompany,dc=com |
| Admin Bind Password | 1dap*netq |
| Base DN | dc=mycompany, dc=net |
| User ID | UID |
| Search Scope | Subtree |
| Search Query | {userIdAttribute}={userId} |
Scenario 4: Scenario 3 with Active Directory Service
In this scenario, we are configuring the LDAP server with basic authentication, accessible only to users in the given Active Directory group, and with an unlimited search scope.
| Parameter | Value |
| --- | --- |
| Host Server URL | 192.168.10.2 |
| Host Server Port | 389 |
| Authentication | Basic |
| Admin Bind DN | cn=netq,ou=45,dc=mycompany,dc=com |
| Admin Bind Password | nq&4mAd! |
| Base DN | dc=mycompany, dc=net |
| User ID | sAMAccountName |
| Search Scope | Subtree |
| Search Query | {userIdAttribute}={userId} |
Add LDAP Users to NetQ
Click Menu and select Management.
Locate the User Accounts card, and click Manage.
From the User accounts tab, select Add user above the table.
Select LDAP User, then enter the user’s ID.
Enter your administrator password, then select Search.
If the user is found, the email address, first, and last name fields are automatically populated. If searching is not enabled on the LDAP server, you must enter the information manually.
If the fields are not automatically filled in, and searching is enabled on the LDAP server, you might need to edit the mapping file.
LDAP user passwords are not stored in the NetQ database and are always authenticated against LDAP.
Repeat these steps to add additional LDAP users.
Remove LDAP Users from NetQ
You can remove LDAP users in the same manner as local users.
Expand the Menu and select Management.
Locate the User Accounts card, and click Manage.
Select the user(s) you want to remove, then select Delete.
If you delete a user from your LDAP server, NetQ does not automatically delete the corresponding account; however, the login credentials for that user stop working immediately.
Integrate NetQ with Grafana
Switches collect statistics about the performance of their interfaces. The NetQ Agent on each switch collects these statistics every 15 seconds and then sends them to your NetQ Appliance or Virtual Machine.
NetQ collects statistics for physical interfaces; it does not collect statistics for virtual interfaces, such as bonds, bridges, and VXLANs.
NetQ displays:
Transmit with tx_ prefix: bytes, carrier, colls, drop, errs, packets
Receive with rx_ prefix: bytes, drop, errs, frame, multicast, packets
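The full set of statistic names follows directly from the two lists above. This short sketch only combines the prefixes with the counter names; the assumption is that each name is the prefix joined directly to the counter (for example, tx_bytes), and the variable and function names are illustrative.

```python
# Counter names as listed above for each direction.
TX_COUNTERS = ["bytes", "carrier", "colls", "drop", "errs", "packets"]
RX_COUNTERS = ["bytes", "drop", "errs", "frame", "multicast", "packets"]


def metric_names():
    """Return every transmit and receive statistic name."""
    transmit = [f"tx_{name}" for name in TX_COUNTERS]
    receive = [f"rx_{name}" for name in RX_COUNTERS]
    return transmit + receive


for name in metric_names():
    print(name)
```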
You can use Grafana, an open source analytics and monitoring tool, to view these statistics. The fastest way to achieve this is by installing Grafana on an application server or locally per user, and then installing the NetQ plugin.
If you do not have Grafana installed already, refer to grafana.com for instructions on installing and configuring the Grafana tool.
Install NetQ Plugin for Grafana
Use the Grafana CLI to install the NetQ plugin. For more detail about this command, refer to the Grafana CLI documentation.
The Grafana plugin comes unsigned. Before you can install it, you need to update the grafana.ini file then restart the Grafana service:
Edit the /etc/grafana/grafana.ini file and add allow_loading_unsigned_plugins = netq-dashboard under plugins:
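Before restarting Grafana, you might confirm that the setting is present in the plugins section. The following is a minimal sketch using the Python standard library. It assumes the value is a comma-separated list of plugin IDs, and the function name `unsigned_plugin_allowed` is hypothetical.

```python
import configparser


def unsigned_plugin_allowed(ini_text: str, plugin_id: str = "netq-dashboard") -> bool:
    """Return True if plugin_id is listed under [plugins] in the INI text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Prepend a dummy section so keys that appear before the first
    # section header do not raise a parsing error.
    parser.read_string("[__top__]\n" + ini_text)
    raw = parser.get("plugins", "allow_loading_unsigned_plugins", fallback="")
    return plugin_id in [item.strip() for item in raw.split(",")]


sample = """\
[plugins]
allow_loading_unsigned_plugins = netq-dashboard
"""
print(unsigned_plugin_allowed(sample))
```

To check a real installation, read the contents of /etc/grafana/grafana.ini and pass the text to the function. A line that is still commented out with a leading semicolon is reported as not set.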
Cumulus in the Cloud (CITC): plugin.air.netq.nvidia.com
Select procdevstats from the Module dropdown.
Enter your credentials (the ones used to log in).
For NetQ cloud deployments only, if you have more than one premises configured, you can select the premises you want to view, as follows:
If you leave the Premises field blank, the first premises name is selected by default
If you enter a premises name, that premises is selected for viewing
Note: If multiple premises are configured with the same name, then the first premises of that name is selected for viewing
Click Save & Test.
Create Your NetQ Dashboard
With the data source configured, you can create a dashboard with the transmit and receive statistics of interest to you.
Create a Dashboard
Click to open a blank dashboard.
Click (Dashboard Settings) at the top of the dashboard.
Add Variables
Click Variables.
Enter hostname into the Name field.
Enter hostname into the Label field.
Select Net-Q from the Data source list.
Select On Dashboard Load from the Refresh list.
Enter hostname into the Query field.
Click Add.
A preview of the hostname values should appear at the bottom.
Click Variables to add another variable for the interface name.
Enter ifname into the Name field.
Enter ifname into the Label field.
Select Net-Q from the Data source list.
Select On Dashboard Load from the Refresh list.
Enter ifname into the Query field.
Click Add.
A preview of the ifname values should appear at the bottom.
Click Variables to add another variable for metrics.
Enter metrics into the Name field.
Enter metrics into the Label field.
Select Net-Q from the Data source list.
Select On Dashboard Load from the Refresh list.
Enter metrics into the Query field.
Click Add.
A preview of the metrics values should appear at the bottom.
Add Charts
Now that the variables are defined, click to return to the new dashboard.
Click Add Query.
Select Net-Q from the Query source list.
Select the interface statistic you want to view from the Metric list.
Click the General icon.
Select hostname from the Repeat list.
Set any other parameters around how to display the data.
Return to the dashboard.
Select one or more hostnames from the hostname list.
Select one or more interface names from the ifname list.
Select one or more metrics to display for these hostnames and interfaces from the metrics list.
The following example shows a dashboard with two hostnames, two interfaces, and one metric selected. The more values you select from the variable options, the more charts appear on your dashboard.
Analyze the Data
When you have configured the dashboard, you can start analyzing the data. You can explore the data by modifying the viewing parameters in one of several ways using the dashboard tool set:
Select a different time period for the data by clicking the forward or back arrows. The default time range is dependent on the width of your browser window.
Zoom in on the dashboard by clicking the magnifying glass.
Manually refresh the dashboard data, or set an automatic refresh rate for the dashboard from the down arrow.
Add additional panels.
Click any chart title to edit or remove it from the dashboard.
Rename the dashboard by clicking the cog wheel and entering the new name.
SSO Authentication
You can integrate your NetQ Cloud deployment with a Microsoft Azure Active Directory (AD) or Google Cloud authentication server to support single sign-on (SSO) to NetQ. NetQ supports integration with SAML (Security Assertion Markup Language), OAuth (Open Authorization), and multi-factor authentication (MFA). You can have only one SSO configuration at a time.
You can create local accounts with default access roles by enabling SSO. After enabling SSO, users logging in for the first time can sign up for SSO through the NetQ login screen or with a link provided by an admin.
Add SSO Configuration and Accounts
To integrate your authentication server:
Expand the Menu and select Management.
Locate the SSO Configuration card and select Manage.
Select either SAML or OpenID (which uses OAuth with OpenID Connect).
Specify the parameters:
You need several pieces of data from your Microsoft Azure or Google account and authentication server to complete the integration.
SSO Organization is typically a company’s name or a department. The name entered in this field will appear in the SSO signup URL.
Role (either user or admin) is automatically assigned when the account is initialized via SSO login.
Name is a unique name for the SSO configuration.
Client ID is the identifier for your resource server.
Client Secret is the secret key for your resource server.
Authorization Endpoint is the URL of the authorization application.
Token Endpoint is the URL of the authorization token.
Select Test to verify the configuration and ensure that you can log in. If it is not working, you are logged out. Check your specification and retest the configuration until it is working properly.
Select Close. The card reflects the configuration:
To require users to log in using this SSO configuration, select Change under the “Disabled” status and confirm. The card updates to reflect that SSO is enabled.
After an admin has configured and enabled SSO, users logging in for the first time can sign up for SSO.
Select Test to verify the configuration and ensure that you can log in. If it is not working, you are logged out. Check your specification and retest the configuration until it is working properly.
Select Close. The card reflects the configuration:
To require users to log in using this SSO configuration, select Change under the “Disabled” status and confirm. The card updates to reflect that SSO is enabled.
Select Submit to enable the configuration. The SSO card reflects the “enabled” status.
After an admin has configured and enabled SSO, users logging in for the first time can sign up for SSO.
The SSO organization you entered during the configuration will replace SSO_Organization in the URL.
Modify Configuration
You can change the specifications for SSO integration with your authentication server at any time, including changing to an alternate SSO type, disabling the existing configuration, or reconfiguring SSO.
Change SSO Type
From the SSO Configuration card:
Select Disable, then Yes.
Select Manage then select the desired SSO type and complete the form.
Copy the redirect URL on the success dialog into your identity provider configuration.
Select Test to verify that the login is working. Modify your specification and retest the configuration until it is working properly.
Select Update.
Disable SSO Configuration
From the SSO Configuration card:
Select Disable.
Select Yes to disable the configuration, or Cancel to keep it enabled.
Uninstall NetQ
This page outlines how to remove the NetQ software from your system server and switches.
Remove the NetQ Agent and CLI
Use the apt-get purge command to remove the NetQ Agent or CLI package from a Cumulus Linux switch or an Ubuntu host:
cumulus@switch:~$ sudo apt-get update
cumulus@switch:~$ sudo apt-get purge netq-agent netq-apps
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
netq-agent* netq-apps*
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 310 MB disk space will be freed.
Do you want to continue? [Y/n] Y
Creating pre-apt snapshot... 2 done.
(Reading database ... 42026 files and directories currently installed.)
Removing netq-agent (3.0.0-cl3u27~1587646213.c5bc079) ...
/usr/sbin/policy-rc.d returned 101, not running 'stop netq-agent.service'
Purging configuration files for netq-agent (3.0.0-cl3u27~1587646213.c5bc079) ...
dpkg: warning: while removing netq-agent, directory '/etc/netq/config.d' not empty so not removed
Removing netq-apps (3.0.0-cl3u27~1587646213.c5bc079) ...
/usr/sbin/policy-rc.d returned 101, not running 'stop netqd.service'
Purging configuration files for netq-apps (3.0.0-cl3u27~1587646213.c5bc079) ...
dpkg: warning: while removing netq-apps, directory '/etc/netq' not empty so not removed
Processing triggers for man-db (2.7.0.2-5) ...
grep: extra.services.enabled: No such file or directory
Creating post-apt snapshot... 3 done.
If you only want to remove the agent or the CLI, but not both, specify just the relevant package in the apt-get purge command.
To verify the removal of the packages from the switch, run:
cumulus@switch:~$ dpkg-query -l netq-agent
dpkg-query: no packages found matching netq-agent
cumulus@switch:~$ dpkg-query -l netq-apps
dpkg-query: no packages found matching netq-apps
Use the yum remove command to remove the NetQ agent or CLI package from a RHEL7 or CentOS host:
Verify the removal of the packages from the switch:
cumulus@switch:~$ dpkg-query -l netq-agent
dpkg-query: no packages found matching netq-agent
cumulus@switch:~$ dpkg-query -l netq-apps
dpkg-query: no packages found matching netq-apps
Delete the virtual machine according to the usual VMware or KVM practice.
Delete a virtual machine from the host computer using one of the following methods:
Right-click the name of the virtual machine in the Favorites list, then select Delete from Disk.
Select the virtual machine and choose VM > Delete from disk.
Delete a virtual machine from the host computer using one of the following methods:
Run virsh undefine <vm-domain> --remove-all-storage
Run virsh undefine <vm-domain> --wipe-storage
Configuration Management
The topics in this section provide instructions for admins responsible for managing user accounts, physical and software inventory, events and notifications, and lifecycle management.
User Management
As an admin, you can manage users and authentication settings from the NetQ management dashboard.
Lifecycle management is enabled for on-premises deployments by default and disabled for cloud deployments by default. Contact your local NVIDIA sales representative or submit a support ticket to activate LCM on cloud deployments.
Only administrative users can perform the tasks described in this topic.
Using the NetQ UI or CLI, lifecycle management (LCM) allows you to:
Click Devices in a workbench header, then select Manage switches
Access Lifecycle Management with the CLI
Lifecycle management workflows use the netq lcm command set. Refer to the command line reference for a comprehensive list of options and definitions.
NetQ and Network OS Images
NetQ and network operating system images (Cumulus Linux and SONiC) are managed with LCM. This section explains how to check for missing images, upgrade images, and specify default images.
View and Upload Missing Images
You should upload images for each network OS and NetQ version currently installed in your inventory so you can support rolling back to a known good version should an installation or upgrade fail. If you have specified a default network OS and/or NetQ version, the NetQ UI also verifies that the necessary versions of the default image are available based on the known switch inventory, and if not, lists those that are missing.
To upload missing network OS images:
Expand the Menu and select Manage switches. Select the Image management tab.
On the Cumulus Linux Images card, select View # missing CL images to see which images you need.
If you have already specified a default image, you must click Manage and then Missing to see the missing images.
Select one or more of the missing images and take note of the version, ASIC vendor, and CPU architecture for each.
Download the network OS disk images (.bin files) from the NVIDIA Enterprise Support Portal. Log in to the portal and from the Downloads tab, select Switches and Gateways. Under Switch Software, click All downloads next to Cumulus Linux for Mellanox Switches. Select the current version and the target version, then click Show Downloads Path. Download the file.
In the UI, select Add image above the table.
Provide the .bin file from an external drive that matches the criteria for the selected image(s).
Click Import.
If the upload was unsuccessful, an Image Import Failed message appears. Close the dialog and try uploading the file again.
Click Done.
(Optional) Click the Uploaded tab to verify the image is in the repository.
Click Close to return to the LCM dashboard.
The Cumulus Linux Images card reflects the number of images you uploaded.
(Optional) Display a summary of Cumulus Linux images uploaded to the LCM repo on the NetQ appliance or VM:
netq lcm show cl-images
Download the network OS disk images (.bin files) from the NVIDIA Enterprise Support Portal. Log in to the portal and from the Downloads tab, select Switches and Gateways. Under Switch Software, click All downloads next to Cumulus Linux for Mellanox Switches. Select the current version and the target version, then click Show Downloads Path. Download the file.
Upload the images to the LCM repository. The following example uses a Cumulus Linux 4.2.0 disk image.
Repeat step 2 for each image you need to upload to the LCM repository.
To upload missing NetQ images:
Expand the Menu and select Manage switches. Select the Image management tab.
On the NetQ Images card, select View # missing NetQ images to see which images you need.
If you have already specified a default image, you must click Manage and then Missing to see the missing images.
Select one or all of the missing images and make note of the OS version, CPU architecture, and image type. Remember that you need both netq-apps and netq-agent for NetQ to perform the installation or upgrade.
Download the NetQ Debian packages needed for upgrade from the NetQ repository, selecting the appropriate OS version and architecture. Place the files in an accessible part of your local network.
In the UI, click Add image above the table.
Provide the .deb file(s) from an external drive that matches the criteria for the selected image.
Click Import.
If the upload was unsuccessful, an Image Import Failed message appears. Close the dialog and try uploading the file again.
Click Done.
(Optional) Click the Uploaded tab to verify that the image is in the repository.
Click Close to return to the LCM dashboard.
The NetQ Images card reflects the number of images you uploaded.
(Optional) Display a summary of NetQ images uploaded to the LCM repo on the NetQ appliance or VM:
netq lcm show netq-images
Download the NetQ Debian packages needed for upgrade from the NetQ repository, selecting the appropriate version and hypervisor/platform. Place them in an accessible part of your local network.
Upload the images to the LCM repository. This example uploads the two packages (netq-agent and netq-apps) needed for NetQ version 4.4.0 for a NetQ appliance or VM running Ubuntu 18.04 with an x86 architecture.
To upload the network OS or NetQ images that you want to use for upgrade, first download the Cumulus Linux or SONiC disk images (.bin files) and NetQ Debian packages from the NVIDIA Enterprise Support Portal and NetQ repository, respectively. Place them in an accessible part of your local network.
If you are upgrading the network OS on switches with different ASIC vendors or CPU architectures, you need more than one image. For NetQ, you need both the netq-apps and netq-agent packages for each variant.
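To estimate how many files to download, you can count the distinct variants in your inventory. The Python sketch below assumes a variant is any tuple of attributes that distinguishes one image from another (here, ASIC vendor and CPU architecture). The inventory values and function names are hypothetical sample data, not output from NetQ.

```python
# Both packages are needed for each NetQ variant, as noted above.
NETQ_PACKAGES = ("netq-agent", "netq-apps")


def distinct_variants(inventory):
    """Return the sorted, de-duplicated variants found in the inventory."""
    return sorted(set(inventory))


def netq_packages_needed(variants):
    """Pair every variant with both NetQ packages."""
    return [(package, variant) for variant in variants for package in NETQ_PACKAGES]


if __name__ == "__main__":
    # Hypothetical inventory: one (ASIC vendor, CPU architecture) pair per switch.
    switches = [("Mellanox", "x86_64"), ("Mellanox", "x86_64"), ("Broadcom", "x86_64")]
    variants = distinct_variants(switches)
    print(len(variants), "network OS images needed")
    print(len(netq_packages_needed(variants)), "NetQ packages needed")
```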
After obtaining the images, upload them to NetQ with the UI or CLI:
From the LCM dashboard, select the Image management tab.
Select Add image on the appropriate card:
Provide one or more images from an external drive.
Click Import.
Monitor the progress until it completes. Click Done.
Use the netq lcm add cl-image <text-cl-image-path> and netq lcm add netq-image <text-image-path> commands to upload the images. Run the relevant command for each image that needs to be uploaded.
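When you have several files to upload, it can help to generate one command per image. The following Python sketch builds the command strings from the two syntaxes named above. It only prints the commands and does not run them. The file paths are hypothetical placeholders.

```python
import shlex


def upload_commands(cl_images=(), netq_images=()):
    """Build one 'netq lcm add ...' command string per image path."""
    commands = [f"netq lcm add cl-image {shlex.quote(path)}" for path in cl_images]
    commands += [f"netq lcm add netq-image {shlex.quote(path)}" for path in netq_images]
    return commands


if __name__ == "__main__":
    # Hypothetical paths; substitute the files you downloaded.
    for command in upload_commands(
        cl_images=["/tmp/cumulus-linux.bin"],
        netq_images=["/tmp/netq-agent.deb", "/tmp/netq-apps.deb"],
    ):
        print(command)
```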
Specifying a default upgrade version is optional, but recommended. You can assign a specific OS or NetQ version as the default version to use when installing or upgrading switches. The default is typically the newest version that you intend to install or upgrade on all, or the majority, of your switches. If necessary, you can override the default selection during the installation or upgrade process if an alternate version is needed for a given set of switches.
To specify a default version in the NetQ UI:
From the LCM dashboard, select the Image management tab.
Select Click here to set default x version on the relevant card.
Select the version you want to use as the default for switch upgrades.
Click Save. The default version is now displayed on the relevant Images card.
cumulus@switch:~$ netq lcm show default-version netq-images
Remove Images from Local Repository
After you upgrade all your switches beyond a particular release, you can remove images from the LCM repository to save space on the server. To remove images:
From the LCM dashboard, select the Image management tab.
Click Manage on the Cumulus Linux Images or NetQ Images card.
On the Uploaded tab, select the images you want to remove.
Click Delete.
To remove Cumulus Linux images, run:
netq lcm show cl-images [json]
netq lcm del cl-image <text-cl-image-id>
Authentication credentials are stored in access profiles, which can be assigned to individual switches. You can create credentials with either basic (SSH username/password) or SSH (public/private key) authentication. This section describes how to create, edit, and delete access profiles. After you create a profile, attach it to individual switches so that you can perform upgrades on those switches.
By default, NVIDIA supplies two access profiles: Netq-Default and Nvl4-Default (for NVLink devices). NVIDIA strongly recommends creating new access profiles or updating the default profiles with unique credentials. When you upgrade to NetQ 4.5 from 4.4, NetQ saves your 4.4 global access credentials to the Netq-Default profile.
You cannot delete default profiles.
Create Access Profiles
Expand the Menu and select Manage switches.
On the Access Profiles card, select Add profile.
Enter a name for the profile, then select the authentication method you want to use: SSH or Basic.
You must have sudoer permission to configure switches when using the SSH key method.
Create a pair of SSH private and public keys:
ssh-keygen -t rsa -C "<USER>"
Copy the SSH public key to each switch that you want to upgrade using one of the following methods:
Manually copy the SSH public key to the /home/<USER>/.ssh/authorized_keys file on each switch, or
Run ssh-copy-id <USER>@<switch_ip> on the server where you generated the SSH key pair for each switch
Copy the SSH private key into the entry field:
For security, your private key is stored in an encrypted format, and only provided to internal processes while encrypted.
(Optional) To verify that the new profile is listed among available profiles, select View profiles from the Access Profiles card.
Be sure to use credentials for an account that has permission to configure switches.
The default credentials for Cumulus Linux have changed from cumulus/CumulusLinux! to cumulus/cumulus for releases 4.2 and later. For details, read Cumulus Linux User Accounts.
Enter a username and password.
Click Create, then confirm.
(Optional) To verify that the new profile is listed among available profiles, select View profiles from the Access Profiles card.
Specify a unique name for the configuration after profile_name.
The default credentials for Cumulus Linux have changed from cumulus/CumulusLinux! to cumulus/cumulus for releases 4.2 and later. For details, read Cumulus Linux User Accounts.
To configure SSH authentication using a public/private key:
You must have sudoer permission to properly configure switches when using the SSH key method.
If the keys do not yet exist, create a pair of SSH private and public keys.
ssh-keygen -t rsa -C "<USER>"
Copy the SSH public key to each switch that you want to upgrade using one of the following methods:
Manually copy the SSH public key to the /home/<USER>/.ssh/authorized_keys file on each switch, or
Run ssh-copy-id <USER>@<switch_ip> on the server where you generated the SSH key pair for each switch
Add these credentials to the switch. Specify a unique name for the configuration after profile_name.
Any profile that is assigned to a switch can’t be deleted. You must attach a different profile to the switch first. Note that Netq-Default and Nvl4-Default can’t be deleted.
On the Access Profiles card, select View profiles.
From the list of profiles, select Delete in the profile’s row.
The delete icon only appears next to custom profiles that are not already attached to a switch.
Select Remove.
Run netq lcm show credentials. Identify the profile you’d like to delete and copy its identifier from the Profile ID column. The following example deletes the n-1000 profile:
cumulus@switch:~$ netq lcm show credentials
Profile ID           Profile Name             Type             SSH Key        Username         Password         Number of switches                   Last Changed
-------------------- ------------------------ ---------------- -------------- ---------------- ---------------- ------------------------------------ -------------------------
credential_profile_d Netq-Default             BASIC                           cumulus          **************   11                                   Fri Feb 3 18:20:33 2023
9e875bd2e6784617b304
c20090ce28ff2bb46a4b
9bf23cda98f1bdf91128
5c9
credential_profile_3 Nvl4-Default             BASIC                           admin            **************   1                                    Fri Feb 3 19:18:26 2023
5a2eead7344fb91218bc
dec29b12c66ebef0d806
659b20e8805e4ff629bc
23e
credential_profile_3 n-1000                   BASIC                           admin            **************   0                                    Fri Feb 3 21:49:10 2023
eddab251bddea9653df7
cd1be0fc123c5d7a42f8
18b68134e42858e54a9c
289
Run netq lcm del credentials profile_id <text-credential-profile-id>:
cumulus@switch:~$ netq lcm del credentials profile_id credential_profile_3eddab251bddea9653df7cd1be0fc123c5d7a42f818b68134e42858e54a9c289
Verify that the profile is deleted with netq lcm show credentials.
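Because the Profile ID column wraps across several lines, copying the identifier by hand is error-prone. The following Python sketch shows one way to rejoin the wrapped fragments from saved command output. It assumes that each row starts with a token beginning with credential_profile_, that profile names contain no spaces, and that continuation lines hold a single fragment. The function name is hypothetical.

```python
SAMPLE_OUTPUT = """\
credential_profile_3 n-1000 BASIC admin ************** 0 Fri Feb 3 21:49:10 2023
eddab251bddea9653df7
cd1be0fc123c5d7a42f8
18b68134e42858e54a9c
289
"""


def profile_ids(output_text):
    """Map each profile name to its full, rejoined profile ID."""
    ids = {}
    current = None
    for line in output_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("credential_profile_") and len(fields) > 1:
            # First line of a row: ID prefix followed by the profile name.
            current = fields[1]
            ids[current] = fields[0]
        elif current is not None and len(fields) == 1:
            # Continuation line: next fragment of the wrapped ID.
            ids[current] += fields[0]
    return ids


if __name__ == "__main__":
    print(profile_ids(SAMPLE_OUTPUT)["n-1000"])
```

You can then pass the rejoined value to netq lcm del credentials profile_id.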
View Access Profiles
You can view the type of credentials used to access your switches in the NetQ UI. You can view the details of the credentials using the NetQ CLI.
Open the LCM dashboard.
On the Access Profiles card, select View profiles.
To view a list of access profiles and their associated credentials, run netq lcm show credentials.
If you use an SSH key for the credentials, the public key appears in the command output.
If you use a username and password for the credentials, the username appears in the command output with the password masked.
Upon installation, lifecycle management displays an inventory of switches that are available for software installation or upgrade through NetQ. This includes all switches running Cumulus Linux 3.7.12 or later, SONiC 202012 and 202106, and NetQ Agent 4.1.0 or later in your network. From this list, you can assign access profiles and roles to switches, and select switches for software installation and upgrades.
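The inventory criteria above can be expressed as a simple check. The Python sketch below reads them as two conditions that must both hold: a supported network OS version and a NetQ Agent version of 4.1.0 or later. That reading, the os_name strings, and the function names are assumptions made for illustration.

```python
SUPPORTED_SONIC = {"202012", "202106"}


def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of integers for comparison."""
    return tuple(int(part) for part in text.split("."))


def appears_in_lcm_inventory(os_name, os_version, agent_version):
    """Return True if a switch meets the version criteria described above."""
    if parse_version(agent_version) < (4, 1, 0):
        return False
    if os_name == "cumulus":
        return parse_version(os_version) >= (3, 7, 12)
    if os_name == "sonic":
        return os_version in SUPPORTED_SONIC
    return False


if __name__ == "__main__":
    print(appears_in_lcm_inventory("cumulus", "4.2.1", "4.1.0"))
    print(appears_in_lcm_inventory("cumulus", "3.7.11", "4.5.0"))
```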
View the LCM Switch Inventory
From the LCM dashboard, select the Switch management tab. The Switches card displays the number of switches that NetQ discovered and the network OS versions that are running on those switches:
To view a table of all discovered switches and their attributes, select Manage on the Switches card.
If you have more than one network OS version running on your switches, you can click a version segment on the Switches card graph to open a list of switches pre-filtered by that version.
To view a list of all switches discovered by lifecycle management, run:
netq lcm show switches
[cl-version <text-cumulus-linux-version>]
[netq-version <text-netq-version>]
[json]
Use the version options to display switches with a given OS version, X.Y.Z.
This list is the starting point for network OS upgrades or NetQ installations and upgrades. If the switches you want to upgrade are not present in the list, you can:
Verify the missing switches are reachable using ping
Verify the NetQ Agent is fresh and version 4.1.0 or later for switches that already have the agent installed (click Menu, then click Agents or run netq show agents)
After creating access profiles from your credentials, you can attach a profile to one or more switches.
On the Switches card, select Manage.
The table displays a list of switches. The Access type column specifies whether the type of authentication is basic or SSH. The Profile name column displays the access profile that is assigned to the switch.
Select the switches to which you want to assign an access profile, then select Manage access profile above the table:
Select the profile from the list, then click Done.
Run netq lcm show switches and verify the change in the credential profile column.
Reassign or Detach an Access Profile
Detaching a profile from a switch restores it to the default access profile, Netq-Default.
On the Switches card, click Manage.
The table displays a list of switches. In the profile name column, locate the access profile. Hover over the access type column and select Manage access:
To assign a different access profile to the switch, select it from the list. To detach the access profile, select Detach.
After you detach the profile from the switch, NetQ reassigns it to the Netq-Default profile.
The syntax for the detach command is netq lcm detach credentials hostname <text-switch-hostname>.
To obtain a list of hostnames, run netq lcm show switches.
Detach the access profile and specify the hostname. The following example detaches spine-1 from its assigned access profile:
Run netq lcm show switches and verify the change in the credential profile column.
Role Management
You can assign switches one of four roles: superspine, spine, leaf, and exit.
Switch roles identify switch dependencies and determine the order in which switches are upgraded. The upgrade process begins with switches assigned the superspine role, then continues with the spine switches, leaf switches, exit switches, and finally, switches with no role assigned. Upgrades for all switches with a given role must be successful before the upgrade proceeds to the switches with the closest dependent role.
Role assignment is optional, but recommended. Assigning roles can prevent switches from becoming unreachable due to dependencies between switches or single attachments. Additionally, when you deploy MLAG pairs, assigned roles avoid upgrade conflicts.
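To preview the order in which switches would be upgraded, you can group them by role. The following Python sketch models the ordering described above: superspine, spine, leaf, exit, then switches with no role. It is an illustration only and does not query NetQ. The hostnames and function name are hypothetical.

```python
# Upgrade order described above; None stands for "no role assigned".
ROLE_ORDER = ["superspine", "spine", "leaf", "exit", None]


def upgrade_batches(switch_roles):
    """Group hostnames into batches in role order.

    switch_roles -- dict mapping hostname to role name (or None)
    Returns a list of (role, [hostnames]) tuples, skipping empty roles.
    """
    batches = []
    for role in ROLE_ORDER:
        members = sorted(h for h, r in switch_roles.items() if r == role)
        if members:
            batches.append((role, members))
    return batches


if __name__ == "__main__":
    roles = {"leaf01": "leaf", "spine01": "spine", "border01": "exit", "oob01": None}
    for role, members in upgrade_batches(roles):
        print(role or "no role", members)
```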
Assign Roles to Switches
On the Switches card, click Manage.
Select one switch or multiple switches to assign to the same role.
Above the table, select Assign role.
Select the role (superspine, leaf, spine, or exit) that applies to the selected switch(es).
Click Assign.
Note that the Role column is updated with the role assigned to the selected switch(es). To return to the full list of switches, click All.
Continue selecting switches and assigning roles until most or all switches have roles assigned.
To assign multiple switches to the same role, separate the hostnames with commas (no spaces). This example configures leaf01 through leaf04 switches with the leaf role:
netq lcm add role leaf switches leaf01,leaf02,leaf03,leaf04
To view all switch roles, run:
netq lcm show switches [cl-version <text-cumulus-linux-version>] [netq-version <text-netq-version>] [json]
Use the version options to only show switches with a given version, X.Y.Z.
Select the switches with the incorrect role from the list.
Click Assign role.
Select the correct role. To leave a switch unassigned, select No Role.
Click Assign.
You use the same command to both assign a role and change a role.
For a single switch, run:
netq lcm add role exit switches border01
To assign multiple switches to the same role, separate the hostnames with commas (no spaces). For example:
cumulus@switch:~$ netq lcm add role exit switches border01,border02
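If you assign roles from a script, the hostname list must be joined with commas and no spaces. The Python sketch below builds the command string in the form shown in the examples above and rejects hostnames that would break that format. It only returns the string and does not run it. The function name is hypothetical.

```python
VALID_ROLES = {"superspine", "spine", "leaf", "exit"}


def add_role_command(role, hostnames):
    """Build a 'netq lcm add role' command for one or more switches."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if not hostnames:
        raise ValueError("at least one hostname is required")
    for hostname in hostnames:
        if "," in hostname or any(ch.isspace() for ch in hostname):
            raise ValueError(f"hostname must not contain commas or spaces: {hostname!r}")
    return f"netq lcm add role {role} switches {','.join(hostnames)}"


if __name__ == "__main__":
    print(add_role_command("exit", ["border01", "border02"]))
```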
Upgrade NetQ Agent Using LCM
Lifecycle management lets you upgrade to the latest agent version on switches with an existing NetQ Agent. You can upgrade only the NetQ Agent or both the NetQ Agent and NetQ CLI simultaneously. You can run up to five jobs at the same time; however, a given switch can only appear in one running job at a time.
Upgrades can be performed with LCM for NetQ Agent versions 2.4.0 and later. For earlier versions, perform a new installation.
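Before submitting a new job, you might want to check it against the two limits mentioned above: at most five jobs at a time, and no switch in more than one running job. The following Python sketch models that check on data you supply yourself. How NetQ itself responds when a limit is exceeded is not described here, so the return values are an assumption, and the job and switch names are hypothetical.

```python
MAX_RUNNING_JOBS = 5


def can_start_job(running_jobs, new_switches):
    """Check a proposed job against the documented limits.

    running_jobs -- dict mapping job name to the set of hostnames in that job
    new_switches -- hostnames for the proposed job
    Returns (allowed, reason).
    """
    if len(running_jobs) >= MAX_RUNNING_JOBS:
        return False, "five jobs are already running"
    busy = set().union(*running_jobs.values()) & set(new_switches)
    if busy:
        return False, f"already in a running job: {sorted(busy)}"
    return True, ""


if __name__ == "__main__":
    running = {"job-a": {"leaf01", "leaf02"}}
    print(can_start_job(running, ["leaf02", "spine01"]))
```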
Prepare for a NetQ Agent Upgrade
Before you upgrade, make sure you have the appropriate files and credentials:
After you complete the preparation steps, upgrade the NetQ Agents:
From the LCM dashboard, select the Switch management tab. Locate the Switches card and click Manage.
Select the switches you want to upgrade.
Click Upgrade NetQ above the table and follow the steps in the UI.
Verify that the number of switches selected for upgrade matches your expectation.
Enter a name for the upgrade job. The name can contain a maximum of 22 characters (including spaces).
Review each switch:
Is the NetQ Agent version 2.4.0 or later? If not, this switch can only be upgraded through the switch discovery process.
Is the configuration profile the one you want to apply? If not, click Change config, then select an alternate profile to apply to all selected switches.
You can apply different profiles to switches in a single upgrade job by selecting a subset of switches then choosing a different profile. You can also change the profile on a per-switch basis by clicking the current profile link and selecting an alternate one.
Review the summary indicating the number of switches and the configuration profile to be used. If either is incorrect, click Back and review your selections.
Select the version of NetQ Agent for upgrade. If you have designated a default version, keep the Default selection. Otherwise, select an alternate version by clicking Custom and selecting it from the list.
By default, the NetQ Agent and CLI are upgraded on the selected switches. If you do not want to upgrade the NetQ CLI, click Advanced and change the selection to No.
NetQ performs several checks to eliminate preventable problems during the upgrade process. When all of the pre-checks pass, click Upgrade to initiate the upgrade.
To upgrade the NetQ Agent on one or more switches, run:
The following example creates a NetQ Agent upgrade job called upgrade-cl530-nq450. It upgrades the spine01 and spine02 switches to NetQ Agent version 4.5.0.
After starting the upgrade, you can monitor the progress in the NetQ UI. Successful upgrades are indicated by a green icon. Failed upgrades display error messages indicating the cause of failure.
To view the progress of upgrade jobs using the CLI, run:
netq lcm show upgrade-jobs netq-image [json]
netq lcm show status <text-lcm-job-id> [json]
Example netq lcm show upgrade-jobs
You can view the progress of one upgrade job at a time. This requires the job identifier.
The following example shows all upgrade jobs that are currently running or have completed, and then shows the status of the job with a job identifier of job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7.
cumulus@switch:~$ netq lcm show upgrade-jobs netq-image json
[
{
"jobId": "job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7",
"name": "Leaf01-02 to NetQ330",
"netqVersion": "4.1.0",
"overallStatus": "FAILED",
"pre-checkStatus": "COMPLETED",
"warnings": [],
"errors": [],
"startTime": 1611863290557.0
}
]
cumulus@switch:~$ netq lcm show status netq-image job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7
NetQ Upgrade FAILED
Upgrade Summary
---------------
Start Time: 2021-01-28 19:48:10.557000
End Time: 2021-01-28 19:48:17.972000
Upgrade CLI: True
NetQ Version: 4.1.0
Pre Check Status COMPLETED
Precheck Task switch_precheck COMPLETED
Warnings: []
Errors: []
Precheck Task version_precheck COMPLETED
Warnings: []
Errors: []
Precheck Task config_precheck COMPLETED
Warnings: []
Errors: []
Hostname CL Version NetQ Version Prev NetQ Ver Config Profile Status Warnings Errors Start Time
sion
----------------- ----------- ------------- ------------- ---------------------------- ---------------- ---------------- ------------ --------------------------
leaf01 4.2.1 4.1.0 3.2.1 ['NetQ default config'] FAILED [] ["Unreachabl Thu Jan 28 19:48:10 2021
e at Invalid
/incorrect u
sername/pass
word. Skippi
ng remaining
10 retries t
o prevent ac
count lockou
t: Warning:
Permanently
added '192.1
68.200.11' (
ECDSA) to th
e list of kn
own hosts.\r
\nPermission
denied,
please try a
gain."]
leaf02 4.2.1 4.1.0 3.2.1 ['NetQ default config'] FAILED [] ["Unreachabl Thu Jan 28 19:48:10 2021
e at Invalid
/incorrect u
sername/pass
word. Skippi
ng remaining
10 retries t
o prevent ac
count lockou
t: Warning:
Permanently
added '192.1
68.200.12' (
ECDSA) to th
e list of kn
own hosts.\r
\nPermission
denied,
please try a
gain."]
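If you collect the JSON output for reporting, the following Python sketch shows one way to summarize it. It assumes that startTime is the number of milliseconds since the Unix epoch in UTC, which is consistent with the Start Time shown in the status output above but is not stated explicitly. The function name is hypothetical, and the sample is the JSON from the example.

```python
import json
from datetime import datetime, timedelta, timezone

SAMPLE = """
[
  {
    "jobId": "job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7",
    "name": "Leaf01-02 to NetQ330",
    "netqVersion": "4.1.0",
    "overallStatus": "FAILED",
    "pre-checkStatus": "COMPLETED",
    "warnings": [],
    "errors": [],
    "startTime": 1611863290557.0
  }
]
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_jobs(json_text):
    """Return (name, overall status, start time) for each upgrade job."""
    rows = []
    for job in json.loads(json_text):
        # Assumption: startTime is milliseconds since the Unix epoch (UTC).
        started = EPOCH + timedelta(milliseconds=int(job["startTime"]))
        rows.append((job["name"], job["overallStatus"], started))
    return rows


if __name__ == "__main__":
    for name, status, started in summarize_jobs(SAMPLE):
        print(name, status, started.isoformat())
```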
Upgrade Cumulus Linux Using LCM
LCM lets you upgrade Cumulus Linux on one or more switches in your network via the NetQ UI or the CLI. You can run up to five upgrade jobs simultaneously; however, a given switch can only appear in one running job at a time. Upgrading Cumulus Linux on a switch typically takes around 45 minutes.
You can upgrade Cumulus Linux from:
3.7.16 to later versions of Cumulus Linux 3
3.7.16 or later to 4.2.0 or later versions of Cumulus Linux 4
4.2 to later versions of Cumulus Linux 4
4.4.0 or later to Cumulus Linux 5 releases
5.0.0 or later to 5.1.0 or later versions of Cumulus Linux 5
When upgrading to Cumulus Linux 5.0.0 or later, LCM backs up and restores flat file configurations in Cumulus Linux. After you upgrade to Cumulus Linux 5, running NVUE configuration commands replaces any configuration restored by NetQ LCM. See Upgrading Cumulus Linux for additional information.
When NVUE is enabled, LCM supports upgrades from Cumulus Linux 5.0.0 to later versions of Cumulus Linux 5. Upgrading from earlier versions of Cumulus Linux is not supported when NVUE is enabled.
Prepare for a Cumulus Linux Upgrade
If the NetQ Agent is already installed on the switches you’d like to upgrade, follow the steps below. If the NetQ Agent is not installed on the switches you’d like to upgrade, run a switch discovery, then proceed with the upgrade.
Before you upgrade, make sure you have the appropriate files and credentials:
After you complete the preparation steps, upgrade Cumulus Linux:
Click Devices in any workbench header, then select Manage switches.
Locate the Switches card and click Manage.
Select the switches you want to upgrade.
Click Upgrade OS above the table.
Follow the steps in the UI. Create a name for the upgrade and review the switches that you selected to upgrade:
If you accidentally included a switch that you do not want to upgrade, hover over the switch information card and click Delete to remove it from the upgrade.
If the role is incorrect or missing, click Edit, then select a role for that switch from the dropdown. Click Cancel to discard the change.
By default, NetQ rolls back to the original Cumulus Linux version on any switch that fails to upgrade. It also takes network snapshots before and after the upgrade.
You can exclude selected services and protocols from the snapshots by clicking them. Nodes and services are always included.
Click Next.
NetQ performs several checks to eliminate preventable problems during the upgrade process. When all of the pre-checks pass, click Preview.
NetQ directs you to a screen where you can review the upgrade. After reviewing, select Start upgrade and confirm.
Perform the upgrade using the netq lcm upgrade cl-image command, providing a name for the upgrade job, the Cumulus Linux and NetQ versions, and a comma-separated list of the hostnames to upgrade:
(Recommended) You can restore the previous version of Cumulus Linux if the upgrade job fails by adding the run-restore-on-failure option to the command.
cumulus@switch:~$ netq lcm upgrade cl-image name upgrade-530 cl-version 5.3.0 netq-version 4.5.0 hostnames spine01,spine02,leaf01,leaf02 order spine,leaf run-restore-on-failure
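When the list of switches comes from a script or an inventory file, it can help to assemble the command and review it before you run it. The following is a minimal sketch rather than part of NetQ: build_cl_upgrade_cmd is a hypothetical helper that joins the hostnames with commas and prints a command in the same form as the example above, without running it.

```shell
# Hypothetical helper (not shipped with NetQ): prints the upgrade command
# for review instead of executing it.
# Usage: build_cl_upgrade_cmd <job-name> <cl-version> <netq-version> <order> <host>...
build_cl_upgrade_cmd() {
    if [ "$#" -lt 5 ]; then
        echo "usage: build_cl_upgrade_cmd <job-name> <cl-version> <netq-version> <order> <host>..." >&2
        return 1
    fi
    job_name=$1
    cl_version=$2
    netq_version=$3
    upgrade_order=$4
    shift 4

    # Join the remaining arguments with commas (no spaces).
    hostnames=""
    for host in "$@"; do
        hostnames="${hostnames:+$hostnames,}$host"
    done

    printf 'netq lcm upgrade cl-image name %s cl-version %s netq-version %s hostnames %s order %s run-restore-on-failure\n' \
        "$job_name" "$cl_version" "$netq_version" "$hostnames" "$upgrade_order"
}

build_cl_upgrade_cmd upgrade-530 5.3.0 4.5.0 spine,leaf spine01 spine02 leaf01 leaf02
```

Printing the command first lets you confirm the hostname list and versions before starting an upgrade job.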
Pre-check Failures
If one or more of the pre-checks fail, resolve the related issue and start the upgrade again. In the NetQ UI, these failures appear on the Upgrade Preview page. In the NetQ CLI, they appear as error messages in the netq lcm show upgrade-jobs cl-image command output.
Analyze Results
After starting the upgrade, you can monitor its progress in the NetQ UI. A green icon indicates a successful upgrade. Failed upgrades display error messages indicating the cause of the failure.
To view the progress of current upgrade jobs and the history of previous upgrade jobs using the CLI, run netq lcm show upgrade-jobs cl-image.
To see details of a particular upgrade job, run netq lcm show status job-ID.
To see only Cumulus Linux upgrade jobs, run netq lcm show status cl-image job-ID.
To download details about the upgrade as a JSON-formatted file, click Download report.
Post-check Failures
A successful upgrade can still have post-check warnings. For example, you updated the OS, but not all services are fully up and running after the upgrade. If one or more of the post-checks fail, warning messages appear in the Post-Upgrade Tasks section of the preview. Click the warning category to view the detailed messages.
Upgrade Cumulus Linux on Switches Without NetQ Agent Installed
To upgrade Cumulus Linux on switches without NetQ installed, create a switch discovery. The discovery searches your network for all Cumulus Linux switches (with and without NetQ currently installed) and determines the versions of Cumulus Linux and NetQ installed. These results are then used to install or upgrade Cumulus Linux and NetQ on all discovered switches in a single procedure rather than in two steps. You can run up to five jobs simultaneously; however, a given switch can only appear in one running job at a time.
To discover switches running Cumulus Linux and upgrade Cumulus Linux and NetQ on those switches:
Click Devices in the workbench header, then click Manage switches.
On the Switches card, click Discover.
Enter a name for the scan.
Choose whether you want to look for switches by entering IP address ranges or import switches using a comma-separated values (CSV) file.
If you do not have a switch listing, then you can manually add the address ranges where your switches are located in the network. This has the advantage of catching switches that might have been missed in a file.
A maximum of 50 addresses can be included in an address range. If necessary, break the range into smaller ranges.
To discover switches using address ranges:
Enter an IP address range in the IP Range field.
Ranges can be contiguous, for example 192.168.0.24-64, or non-contiguous, for example 192.168.0.24-64,128-190,235, but they must be contained within a single subnet.
Optionally, click the add icon to enter another IP address range (in a different subnet).
For example, 198.51.100.0-128 or 198.51.100.0-128,190,200-253.
Add additional ranges as needed. Click the remove icon to remove a range.
If you switch to a CSV file instead, NetQ keeps the ranges you entered in case you return to using IP ranges.
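To compare a range against the address limit before you enter it, you can count the addresses it covers. The following is a minimal sketch under stated assumptions: count_range_addresses is a hypothetical helper, it only understands the last-octet notation shown in the examples above, and it does not validate the addresses themselves.

```shell
# Hypothetical helper: counts the addresses covered by a range expression
# such as 192.168.0.24-64 or 192.168.0.24-64,128-190,235.
# Assumes only the last octet varies, as in the examples above.
count_range_addresses() {
    # Drop the first three octets; keep the comma-separated last-octet parts.
    last_octet_spec=${1##*.}
    total=0
    saved_ifs=$IFS
    IFS=,
    for part in $last_octet_spec; do
        case $part in
            *-*)
                # A span such as 24-64 covers both endpoints.
                first=${part%-*}
                last=${part#*-}
                total=$((total + last - first + 1))
                ;;
            *)
                # A single address such as 235.
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$saved_ifs
    echo "$total"
}

count_range_addresses 192.168.0.24-64
```

If the count is higher than the limit, split the expression into smaller ranges.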
To import switches through a CSV file:
Click Browse.
Select the CSV file containing the list of switches.
The CSV file must include a header row containing hostname, ip, and port. The columns can be in any order, but the data in each row must follow the order of the header.
You must have an IP address in your file, but the hostname is optional. If the port is blank, NetQ uses switch port 22 by default.
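As an illustration of the format described above, the following sketch prints a small switch list that you could redirect to a file. It is an assumption-laden example, not a file from a real deployment: write_switch_csv is a hypothetical helper, and the hostnames and addresses are borrowed from the sample upgrade output earlier in this guide. The second row leaves the port blank, which relies on the default of port 22.

```shell
# Hypothetical helper: prints an example switch list in the
# hostname,ip,port layout. Redirect the output to a file to import it,
# for example: write_switch_csv > switches.csv
write_switch_csv() {
    cat <<'EOF'
hostname,ip,port
leaf01,192.168.200.11,22
leaf02,192.168.200.12,
EOF
}

write_switch_csv
```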
Click Remove if you decide to use a different file or want to use IP address ranges instead. If you entered ranges before selecting the CSV file option, they remain.
Select an access profile from the dropdown menu. If you use Netq-Default, you will see a message requesting that you create or update your credentials.
Click Next.
When the network discovery is complete, NetQ presents the number of Cumulus Linux switches it found. Each switch can be in one of the following categories:
Discovered without NetQ: Switches found without NetQ installed
Discovered with NetQ: Switches found with some version of NetQ installed
Discovered but Rotten: Switches found that are unreachable
Incorrect Credentials: Switches found that are unreachable because the provided access credentials do not match those for the switches
OS not Supported: Switches found that are running a Cumulus Linux version not supported by the LCM upgrade feature
Not Discovered: IP addresses which did not have an associated Cumulus Linux switch
If the discovery process does not find any switches for a particular category, then it does not display that category.
Select which switches you want to upgrade from each category by clicking the checkbox on each switch card.
Click Next.
Accept the default NetQ version or click Custom and select an alternate version.
By default, the NetQ Agent and CLI are upgraded on the selected switches. If you do not want to upgrade the NetQ CLI, click Advanced and change the selection to No.
Click Next.
NetQ performs several checks to eliminate preventable problems during the upgrade process. When all of the pre-checks pass, select Install.
After starting the upgrade you can monitor the progress from the preview page or the Upgrade History page.
Use the netq lcm discover command, specifying a single IP address, a range of IP addresses where your switches are located in the network, or a CSV file containing the IP addresses.
You must also specify the access profile ID, which you can obtain with the netq lcm show credentials command.
cumulus@switch:~$ netq lcm discover ip-range 10.0.1.12 profile_id credential_profile_3eddab251bddea9653df7cd1be0fc123c5d7a42f818b68134e42858e54a9c289
NetQ Discovery Started with job id: job_scan_4f3873b0-5526-11eb-97a2-5b3ed2e556db
When the network discovery is complete, NetQ presents the number of Cumulus Linux switches it has found. The output displays their discovery status, which can be one of the following:
Discovered without NetQ: Switches found without NetQ installed
Discovered with NetQ: Switches found with some version of NetQ installed
Discovered but Rotten: Switches found that are unreachable
Incorrect Credentials: Switches found that are unreachable because the provided access credentials do not match those for the switches
OS not Supported: Switches found that are running a Cumulus Linux version not supported by the LCM upgrade feature
NOT_FOUND: IP addresses which did not have an associated Cumulus Linux switch
After you determine which switches you need to upgrade, run the upgrade process as described above.
Note that if you previously ran a switch discovery, you can display its results with netq lcm show discovery-job:
cumulus@switch:~$ netq lcm show discovery-job job_scan_921f0a40-5440-11eb-97a2-5b3ed2e556db
Scan COMPLETED
Summary
-------
Start Time: 2021-01-11 19:09:47.441000
End Time: 2021-01-11 19:09:59.890000
Total IPs: 1
Completed IPs: 1
Discovered without NetQ: 0
Discovered with NetQ: 0
Incorrect Credentials: 0
OS Not Supported: 0
Not Discovered: 1
Hostname IP Address MAC Address CPU CL Version NetQ Version Config Profile Discovery Status Upgrade Status
----------------- ------------------------- ------------------ -------- ----------- ------------- ---------------------------- ---------------- --------------
N/A 10.0.1.12 N/A N/A N/A N/A [] NOT_FOUND NOT_UPGRADING
cumulus@switch:~$
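The two commands above can be chained in a script. The following is a minimal sketch that works on the text output shown in these examples. Both helpers are hypothetical, and they assume the output keeps this layout: the job ID follows the "job id:" label, Discovery Status is the second-to-last column, and the hostname is a single word so that the IP address is the second field.

```shell
# Hypothetical helper: reads `netq lcm discover` output on stdin and
# prints only the job ID.
extract_discovery_job_id() {
    sed -n 's/^NetQ Discovery Started with job id: //p'
}

# Hypothetical helper: reads `netq lcm show discovery-job` output on stdin
# and prints the IP address of every row with the given discovery status.
# Usage: ips_with_discovery_status <status>
ips_with_discovery_status() {
    awk -v wanted="$1" 'NF >= 2 && $(NF - 1) == wanted { print $2 }'
}

# Example wiring on a system with the NetQ CLI installed (not run here):
#   job_id=$(netq lcm discover ip-range 10.0.1.12 profile_id <profile-id> | extract_discovery_job_id)
#   netq lcm show discovery-job "$job_id" | ips_with_discovery_status NOT_FOUND
```

Parsing text output is fragile; if the column layout changes, adjust the field positions accordingly.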
Network Snapshots
Snapshots capture a network’s state—including the services running on the network—at a particular point in time. Comparing snapshots lets you check what (if anything) changed in the network, which can be helpful when upgrading a switch or modifying its configuration. This section outlines how to create, compare, and interpret snapshots.
Create a Network Snapshot
To create a snapshot:
From the workbench header, select Snapshot, then Create Snapshot:
Next, enter the snapshot’s name, time frame, and the elements you’d like included in the snapshot:
To capture the network’s current state, click Now. To capture the network’s state at a previous date and time, click Past, then in the Start Time field, select the calendar icon.
The Choose options field includes all the elements and services that may run on the network. All are selected by default. Click any element to remove it from the snapshot. Nodes and services are included in all snapshots.
The Notes field is optional. You can add a note to remind you of the snapshot’s purpose.
Select Finish. The card now appears on your workbench.
When you are finished viewing the snapshot, click Dismiss to remove it from your workbench. You can add it back by selecting Snapshot in the header and navigating to the option to view snapshots.
Compare Network Snapshots
You can compare the state of your network before and after an upgrade or other configuration change to help avoid unwanted changes to your network’s state.
To compare network snapshots:
From the workbench header, select Snapshot.
Select Compare snapshots, then select the two snapshots you want to compare.
Click Finish.
If the snapshot cards are already on your workbench, place the cards side-by-side for a high-level comparison. For a more detailed comparison, click Compare on one of the cards and select a snapshot for comparison from the list.
Interpreting the Comparison Data
For each network element with changes, a visualization displays the differences between the two snapshots. Green represents additions, red represents subtractions, and orange represents updates.
In the following example, Snapshot 3 and Snapshot 4 are being compared. Snapshot 3 has a BGP count of 212 and Snapshot 4 has a BGP count of 186. The comparison also shows 98 BGP updates.
From this view, you can dismiss the snapshots or select View Details for additional information and to filter and export the data as a JSON file.
The following table describes the information provided for each element type when changes are present:
Element
Data Descriptions
BGP
Hostname: Name of the host running the BGP session
VRF: Virtual routing and forwarding instance, if used
BGP Session: Session that was removed or added
ASN: Autonomous system number
Interface
Hostname: Name of the host where the interface resides
IF Name: Name of the interface that was removed or added
IP Address
Hostname: Name of the host where address was removed or added
Prefix: IP address prefix
Mask: IP address mask
IF Name: Name of the interface that owns the address
Links
Hostname: Name of the host where the link was removed or added
Change the hostname of the monitored switch or host
Move the monitored switch or host from one data center to another
RMA the monitored switch or host
Decommissioning the switch or host removes information about the switch or host from the NetQ database. When the NetQ Agent restarts at a later date, it sends a connection request back to the database, so NetQ can monitor the switch or host again.
Decommission a Switch
From the LCM dashboard, navigate to the Switch management tab.
On the Switches card, select Manage.
Select the devices to decommission, then select the decommission icon above the table:
If you attempt to decommission a switch that is assigned a default, unmodified access profile, the process will fail. Create a unique access profile (or update the default with unique credentials), then attach the profile to the switch you want to decommission.
Confirm the devices you want to decommission.
Wait for the decommission process to complete, then select Done.
To decommission a switch or host:
On the given switch or host, stop and disable the NetQ Agent service:
Run the following commands to view the status of an agent, disable an agent, manage logging, and configure the events the agent collects.
View NetQ Agent Status
The syntax for the NetQ Agent status command is:
netq [<hostname>] show agents
[fresh | dead | rotten | opta]
[around <text-time>]
[json]
You can view the status for a given switch, host or NetQ Appliance or Virtual Machine. You can also filter by the status and view the status at a time in the past.
To view the current status of all NetQ Agents, run:
cumulus@switch:~$ netq show agents
To view NetQ Agents that are not communicating, run:
cumulus@switch:~$ netq show agents rotten
No matching agents records found
To view NetQ Agent status on the NetQ appliance or VM, run:
cumulus@switch:~$ netq show agents opta
Matching agents records:
Hostname Status NTP Sync Version Sys Uptime Agent Uptime Reinitialize Time Last Changed
----------------- ---------------- -------- ------------------------------------ ------------------------- ------------------------- -------------------------- -------------------------
netq-ts Fresh yes 3.2.0-ub18.04u30~1601393774.104fb9e Mon Sep 21 16:46:53 2020 Tue Sep 29 21:13:07 2020 Tue Sep 29 21:13:07 2020 Thu Oct 1 16:29:51 2020
View NetQ Agent Configuration
You can view the current configuration of a NetQ Agent to determine what data it collects and where it sends that data. The syntax for this command is:
netq config show agent
[cpu-limit|frr-monitor|kubernetes-monitor|loglevel|sensors|ssl|stats|wjh|wjh-threshold]
[json]
The following example shows a NetQ Agent in an on-premises deployment, talking to an appliance or VM at 127.0.0.1 using the default ports and VRF. There is no special configuration to monitor Kubernetes, FRR, interface statistics, sensors, or WJH, and there are no limits on CPU usage or change to the default logging level.
cumulus@switch:~$ netq config show agent
netq-agent value default
--------------------- --------- ---------
exhibitport
exhibiturl
server 127.0.0.1 127.0.0.1
cpu-limit 100 100
agenturl
enable-opta-discovery True True
agentport 8981 8981
port 31980 31980
vrf default default
()
To view the configuration of a particular aspect of a NetQ Agent, use the various options.
This example shows a NetQ Agent configured with a CPU limit of 60%.
cumulus@switch:~$ netq config show agent cpu-limit
CPU Quota
-----------
60%
()
Modify the Configuration of the NetQ Agent on a Node
The agent configuration commands let you:
Add, disable, and remove a NetQ Agent
Start and stop a NetQ Agent
Configure a NetQ Agent to collect selected data (CPU usage limit, FRR, Kubernetes, sensors, WJH)
Configure a NetQ Agent to send data to a server cluster
Troubleshoot the NetQ Agent
Commands apply to one agent at a time, and you run them on the switch or host where the NetQ Agent resides.
Add and Remove a NetQ Agent
To add or remove a NetQ Agent, you must add or remove the IP address (and port and VRF when specified) from the NetQ configuration file (at /etc/netq/netq.yml). This adds or removes the information about the appliance or VM where the agent sends the data it collects.
To use the NetQ CLI to add or remove a NetQ Agent on a switch or host, run:
netq config add agent server <text-opta-ip> [port <text-opta-port>] [vrf <text-vrf-name>]
netq config del agent server
If you want to use a specific port on the appliance or VM, use the port option. If you want the data sent over a particular virtual routing and forwarding (VRF) interface, use the vrf option.
This example shows how to add a NetQ Agent and tell it to send the data it collects to the NetQ Appliance or VM at the IPv4 address of 10.0.0.23 using the default port (on-premises = 31980; cloud = 443) and vrf (default).
You can temporarily disable the NetQ Agent on a node. Disabling the NetQ Agent maintains the data already collected in the NetQ database, but stops the NetQ Agent from collecting new data until you reenable it.
To disable a NetQ Agent, run:
cumulus@switch:~$ netq config stop agent
To reenable a NetQ Agent, run:
cumulus@switch:~$ netq config restart agent
Configure a NetQ Agent to Limit Switch CPU Usage
While not typically an issue, you can restrict the NetQ Agent from using more than a configurable amount of the CPU resources. This setting requires Cumulus Linux versions 3.6.x, 3.7.x or 4.1.0 or later to be running on the switch.
You must separate the list of IP addresses by commas (not spaces). You can optionally specify a port or VRF.
This example configures the NetQ Agent on a switch to send the data to three servers located at 10.0.0.21, 10.0.0.22, and 10.0.0.23 using the rocket VRF.
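Because the server list must be comma-separated, a small wrapper can build it from separate arguments. The following is a minimal sketch, not a definitive implementation. build_cluster_servers_cmd is a hypothetical helper, and the netq config add agent cluster-servers form it prints is an assumption inferred from the matching del command in this section; confirm the exact syntax in the command line reference before you use it.

```shell
# Hypothetical helper: prints (does not run) a command that points the
# agent at a server cluster. The "add agent cluster-servers" syntax is an
# assumption based on the "del agent cluster-servers" command in this guide.
# Usage: build_cluster_servers_cmd <vrf> <server-ip>...
build_cluster_servers_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_cluster_servers_cmd <vrf> <server-ip>..." >&2
        return 1
    fi
    vrf_name=$1
    shift

    # Join the addresses with commas, never spaces.
    servers=""
    for ip in "$@"; do
        servers="${servers:+$servers,}$ip"
    done

    printf 'netq config add agent cluster-servers %s vrf %s\n' "$servers" "$vrf_name"
}

build_cluster_servers_cmd rocket 10.0.0.21 10.0.0.22 10.0.0.23
```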
To stop a NetQ Agent from sending data to a server cluster, run:
cumulus@switch:~$ netq config del agent cluster-servers
Configure Logging to Troubleshoot a NetQ Agent
The logging level used for a NetQ Agent determines what types of events get logged about the NetQ Agent on the switch or host.
First, you need to decide what level of logging you want to configure. You can configure the logging level to be the same for every NetQ Agent, or selectively increase or decrease the logging level for a NetQ Agent on a problematic node.
Logging Level
Description
debug
Sends notifications for all debug, info, warning, and error messages.
info
Sends notifications for info, warning, and error messages (default).
warning
Sends notifications for warning and error messages.
error
Sends notifications for error messages.
You can view the NetQ Agent log directly. Messages have the following structure:
(Optional) Verify connection to the NetQ appliance or VM by viewing the netq-agent.log messages.
Disable Agent Logging
If you set the logging level to debug for troubleshooting, NVIDIA recommends that you either change the logging level to a less verbose mode or disable agent logging when you finish troubleshooting.
To change the logging level from debug to another level, run:
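A script that changes the level can first check the requested name against the four levels in the table above. The following is a minimal sketch under stated assumptions: build_loglevel_cmd is a hypothetical helper, and the netq config add agent loglevel form it prints is assumed from the loglevel option shown for netq config show agent; verify it in the command line reference.

```shell
# Hypothetical helper: validates a logging level and prints (does not run)
# the command to apply it. The "add agent loglevel" syntax is an assumption.
# Usage: build_loglevel_cmd <debug|info|warning|error>
build_loglevel_cmd() {
    case $1 in
        debug|info|warning|error)
            printf 'netq config add agent loglevel %s\n' "$1"
            ;;
        *)
            echo "unknown logging level: $1" >&2
            return 1
            ;;
    esac
}

build_loglevel_cmd info
```

The agent likely needs a restart (netq config restart agent, shown earlier in this section) before a new level takes effect.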
The NetQ Agent contains a pre-configured set of modular commands that run periodically and send event and resource data to the NetQ appliance or VM. You can fine-tune which events the agent polls for and vary the polling frequency using the NetQ CLI.
For example, if your network is not running OSPF, you can disable the command that polls for OSPF events. Or you can poll for LLDP less often by lengthening its polling interval beyond the default of 120 seconds. By not polling for selected data or polling less frequently, you can reduce switch CPU usage by the NetQ Agent.
Depending on the switch platform, the NetQ Agent might not execute some supported protocol commands. For example, if a switch has no VXLAN capability, then the agent skips all VXLAN-related commands.
Supported Commands
To see the list of supported modular commands, run:
agent_stats: Collects statistics about the NetQ Agent every 5 minutes.
agent_util_stats: Collects switch CPU and memory utilization by the NetQ Agent every 30 seconds.
cl-support-json: Polls the switch every 3 minutes to determine if an agent generated a cl-support file.
config-mon-json: Polls the /etc/network/interfaces, /etc/frr/frr.conf, /etc/lldpd.d/README.conf, and /etc/ptm.d/topology.dot files every 2 minutes to determine if the contents of any of these files have changed. If a change occurred, the agent transmits the contents of the file and its modification time to the NetQ appliance or VM.
ports: Polls for optics plugged into the switch every hour.
proc-net-dev: Polls for network statistics on the switch every 30 seconds.
running-config-mon-json: Polls the clagctl parameters every 30 seconds and sends a diff of any changes to the NetQ appliance or VM.
Modify the Polling Frequency
You can change the polling frequency (in seconds) of a modular command. For example, to change the polling frequency of the lldp-json command to 60 seconds from its default of 120 seconds, run:
You can disable unnecessary commands. This can help reduce the compute resources the NetQ Agent consumes on the switch. For example, if your network does not run OSPF, you can disable the two OSPF commands:
This section describes how to use the NetQ UI and CLI to monitor your inventory from networkwide and device-specific perspectives.
You can monitor all hardware and software components installed and running on the switches and hosts across the entire network. This is useful for understanding dependencies on various vendors and versions and can help when planning upgrades.
Networkwide Inventory
Use the UI or CLI to monitor your inventory of switches, hosts, and DPUs at the networkwide level. The inventory includes a count for each device and its operating system and information about the hardware and software components on individual switches, such as the motherboard, ASIC, microprocessor, disk, memory, fan, and power supply information.
Several forms of this command are available based on the inventory component you’d like to view. See the command line reference for additional options, definitions, and examples.
netq show inventory (brief | asic | board | cpu | disk | memory | os)
View Networkwide Inventory in the UI
To view the quantity of devices in your network, open the Inventory/Devices card. The medium-sized card displays operating system distribution across the network and the total number of devices in the network. Hover over the chart’s outer circle to view operating system distribution; hover over the chart’s inner circle to view device counts.
Expand to the large card for additional distribution info. By default, the Switches tab shows the total number of switches, ASIC vendors, OS versions, NetQ Agent versions, and specific platforms deployed across all your switches. You can hover over and select any of the segments in a component distribution chart to highlight and filter data, including:
Name or value of the component type, such as the version number or status
Total number of switches with a particular type of component deployed compared to the total number of switches
Percentage of this type as compared to all component types
Expand the Inventory/Devices card to full-screen to view information for all switches, hosts, and DPUs in your network in a table where you can filter and export data:
Switch Inventory
With the NetQ UI and NetQ CLI, you can monitor your inventory of switches across the network or individually. You can view operating system, motherboard, ASIC, microprocessor, disk, memory, fan, and power supply information.
Add the Inventory/Switches card to your workbench to monitor the hardware and software component inventory on switches running NetQ in your network. Select the dropdown to view additional inventory information.
Use the netq <hostname> show inventory command to view switch inventory information with the CLI.
View Switch Inventory Summary
View the Number of Types of Any Component Deployed
For each of the components monitored on a switch, NetQ displays a unique count.
To view this count for all of the components on the switch:
Open the large Switch Inventory card.
Note the number in the Unique column for each component.
By default, the card displays data for fresh switches. Select Rotten switches from the dropdown to display information for switches that are in a down state. Hover over any of the segments in the distribution chart to highlight a specific component.
When you hover, a tooltip appears displaying:
Name or value of the component type, such as the version number or status
Total number of switches with that type of component deployed compared to the total number of switches
Percentage of this type with respect to all component types
To view the hardware and software components for a switch, run:
netq <hostname> show inventory brief
This example shows the type of switch (Cumulus VX), operating system (Cumulus Linux), CPU (x86_64), and ASIC (virtual) for the spine01 switch.
cumulus@switch:~$ netq spine01 show inventory brief
Matching inventory records:
Hostname Switch OS CPU ASIC Ports
----------------- -------------------- --------------- -------- --------------- -----------------------------------
spine01 VX CL x86_64 VX N/A
This example shows the components on the NetQ On-premises or Cloud Appliance.
cumulus@switch:~$ netq show inventory brief opta
Matching inventory records:
Hostname Switch OS CPU ASIC Ports
----------------- -------------------- --------------- -------- --------------- -----------------------------------
netq-ts N/A Ubuntu x86_64 N/A N/A
View Switch Hardware Inventory
You can view hardware components deployed on each switch in your network.
View ASIC Information for a Switch
You can view the ASIC information for a switch from either the NetQ CLI or NetQ UI.
Locate the medium Inventory/Switches card on your workbench.
Change to the full-screen card and click ASIC.
Note that if you are running Cumulus VX switches, no detailed ASIC information is available because the hardware is virtualized.
Click the filter icon to quickly locate a switch that does not appear on the first page of the switch list.
Select hostname from the Field dropdown.
Enter the hostname of the switch you want to view, and click Apply.
To view information about the ASIC on a switch, run:
netq [<hostname>] show inventory asic [opta] [json]
This example shows the ASIC information for the leaf02 switch.
cumulus@switch:~$ netq leaf02 show inventory asic
Matching inventory records:
Hostname Vendor Model Model ID Core BW Ports
----------------- -------------------- ------------------------------ ------------------------- -------------- -----------------------------------
leaf02 Mellanox Spectrum MT52132 N/A 32 x 100G-QSFP28
This example shows the ASIC information for the NetQ On-premises or Cloud Appliance.
cumulus@switch:~$ netq show inventory asic opta
Matching inventory records:
Hostname Vendor Model Model ID Core BW Ports
----------------- -------------------- ------------------------------ ------------------------- -------------- -----------------------------------
netq-ts Mellanox Spectrum MT52132 N/A 32 x 100G-QSFP28
View Motherboard Information for a Switch
Locate the medium Inventory/Switches card on your workbench.
Hover over the card, and change to the full-screen card using the size picker.
Click Platform.
Note that if you are running Cumulus VX switches, no detailed platform information is available because the hardware is virtualized.
Click the filter icon to quickly locate a switch that does not appear on the first page of the switch list.
Select hostname from the Field dropdown.
Enter the hostname of the switch you want to view, and click Apply.
To view a list of motherboards installed in a switch, run:
netq [<hostname>] show inventory board [opta] [json]
This example shows all motherboard data for the spine01 switch.
cumulus@switch:~$ netq spine01 show inventory board
Matching inventory records:
Hostname Vendor Model Base MAC Serial No Part No Rev Mfg Date
----------------- -------------------- ------------------------------ ------------------ ------------------------- ---------------- ------ ----------
spine01 Dell S6000-ON 44:38:39:00:80:00 N/A N/A N/A N/A
Use the opta option without the hostname option to view the motherboard data for the NetQ On-premises or Cloud Appliance. No motherboard data is available for NetQ On-premises or Cloud VMs.
View CPU Information for a Switch
Locate the Inventory/Switches card on your workbench.
Hover over the card, and change to the full-screen card using the size picker.
Click CPU.
Click the filter icon to quickly locate a switch that does not appear on the first page of the switch list.
Select hostname from the Field dropdown. Then enter the hostname of the switch you want to view.
To view CPU information for a switch in your network, run:
netq [<hostname>] show inventory cpu [arch <cpu-arch>] [opta] [json]
This example shows CPU information for the server02 switch.
cumulus@switch:~$ netq server02 show inventory cpu
Matching inventory records:
Hostname Arch Model Freq Cores
----------------- -------- ------------------------------ ---------- -----
server02 x86_64 Intel Core i7 9xx (Nehalem Cla N/A 1
ss Core i7)
This example shows the CPU information for the NetQ On-premises or Cloud Appliance.
cumulus@switch:~$ netq show inventory cpu opta
Matching inventory records:
Hostname Arch Model Freq Cores
----------------- -------- ------------------------------ ---------- -----
netq-ts x86_64 Intel Xeon Processor (Skylake, N/A 8
IBRS)
View Disk Information for a Switch
Locate the Inventory/Switches card on your workbench.
Hover over the card, and change to the full-screen card using the size picker.
Click Disk.
Note that if you are running Cumulus VX switches, no detailed disk information is available because the hardware is virtualized.
Click the filter icon to quickly locate a switch that does not appear on the first page of the switch list.
Select hostname from the Field dropdown. Then enter the hostname of the switch you want to view.
To view disk information for a switch in your network, run:
netq [<hostname>] show inventory disk [opta] [json]
This example shows the disk information for the leaf03 switch.
cumulus@switch:~$ netq leaf03 show inventory disk
Matching inventory records:
Hostname Name Type Transport Size Vendor Model
----------------- --------------- ---------------- ------------------ ---------- -------------------- ------------------------------
leaf03 vda disk N/A 6G 0x1af4 N/A
This example shows the disk information for the NetQ On-premises or Cloud Appliance.
cumulus@switch:~$ netq show inventory disk opta
Matching inventory records:
Hostname Name Type Transport Size Vendor Model
----------------- --------------- ---------------- ------------------ ---------- -------------------- ------------------------------
netq-ts vda disk N/A 265G 0x1af4 N/A
View Memory Information for a Switch
Memory information is available from the NetQ UI and NetQ CLI.
Inventory/Switches card: view memory chip vendor, name, serial number, size, speed, and type on a switch (table)
netq show inventory memory: view memory chip name, type, size, speed, vendor, and serial number on all devices
Locate the medium Inventory/Switches card on your workbench.
Hover over the card, and change to the full-screen card using the size picker.
Click Memory.
Click the filter icon to quickly locate a switch that does not appear on the first page of the switch list.
Select hostname from the Field dropdown. Then enter the hostname of the switch you want to view.
To view memory information for your switches and host servers, run:
netq [<hostname>] show inventory memory [opta] [json]
This example shows all the memory characteristics for the leaf01 switch.
cumulus@switch:~$ netq leaf01 show inventory memory
Matching inventory records:
Hostname Name Type Size Speed Vendor Serial No
----------------- --------------- ---------------- ---------- ---------- -------------------- -------------------------
leaf01 DIMM 0 RAM 768 MB Unknown QEMU Not Specified
This example shows the memory information for the NetQ On-premises or Cloud Appliance.
cumulus@switch:~$ netq show inventory memory opta
Matching inventory records:
Hostname Name Type Size Speed Vendor Serial No
----------------- --------------- ---------------- ---------- ---------- -------------------- -------------------------
netq-ts DIMM 0 RAM 16384 MB Unknown QEMU Not Specified
netq-ts DIMM 1 RAM 16384 MB Unknown QEMU Not Specified
netq-ts DIMM 2 RAM 16384 MB Unknown QEMU Not Specified
netq-ts DIMM 3 RAM 16384 MB Unknown QEMU Not Specified
View Switch Software Inventory
View Operating System Information for a Switch
Locate the Inventory/Switches card on your workbench.
Hover over the card, and change to the full-screen card using the size picker.
Click OS.
Click the filter icon to quickly locate a switch that does not appear on the first page of the switch list.
Enter a hostname, then click Apply.
To view OS information for a switch, run:
netq [<hostname>] show inventory os [opta] [json]
This example shows the OS information for the leaf02 switch.
cumulus@switch:~$ netq leaf02 show inventory os
Matching inventory records:
Hostname Name Version Last Changed
----------------- --------------- ------------------------------------ -------------------------
leaf02 CL 3.7.5 Fri Apr 19 16:01:46 2019
This example shows the OS information for the NetQ On-premises or Cloud Appliance.
cumulus@switch:~$ netq show inventory os opta
Matching inventory records:
Hostname Name Version Last Changed
----------------- --------------- ------------------------------------ -------------------------
netq-ts Ubuntu 18.04 Tue Jul 14 19:27:39 2020
View the Cumulus Linux Packages on a Switch
When you are troubleshooting an issue with a switch, you might want to know which supported versions of the Cumulus Linux operating system are available for that switch, and compare them with the versions available for a switch that is not having the same issue.
To view package information for your switches, run:
netq <hostname> show cl-manifest [json]
This example shows the Cumulus Linux OS versions supported for the leaf01 switch, using the vx ASIC vendor (virtual, so simulated) and x86_64 CPU architecture.
If you are having an issue with a particular switch, you should verify all the installed software and whether it needs updating.
To view package information for a switch, run:
netq <hostname> show cl-pkg-info [<text-package-name>] [around <text-time>] [json]
Use the text-package-name option to narrow the results to a particular package or the around option to narrow the output to a particular time range.
This example shows all installed software packages for spine01.
cumulus@switch:~$ netq spine01 show cl-pkg-info
Matching package_info records:
Hostname Package Name Version CL Version Package Status Last Changed
----------------- ------------------------ -------------------- -------------------- -------------------- -------------------------
spine01 libfile-fnmatch-perl 0.02-2+b1 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 screen 4.2.1-3+deb8u1 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 libudev1 215-17+deb8u13 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 libjson-c2 0.11-4 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 atftp 0.7.git20120829-1+de Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
b8u1
spine01 isc-dhcp-relay 4.3.1-6-cl3u14 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 iputils-ping 3:20121221-5+b2 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 base-files 8+deb8u11 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 libx11-data 2:1.6.2-3+deb8u2 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 onie-tools 3.2-cl3u6 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 python-cumulus-restapi 0.1-cl3u10 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 tasksel 3.31+deb8u1 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 ncurses-base 5.9+20140913-1+deb8u Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
3
spine01 libmnl0 1.0.3-5-cl3u2 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
spine01 xz-utils 5.1.1alpha+20120614- Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
...
This example shows the ntp package on the spine01 switch.
cumulus@switch:~$ netq spine01 show cl-pkg-info ntp
Matching package_info records:
Hostname Package Name Version CL Version Package Status Last Changed
----------------- ------------------------ -------------------- -------------------- -------------------- -------------------------
spine01 ntp 1:4.2.8p10-cl3u2 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
View Recommended Software Packages
If you have a software manifest, you can determine the recommended packages and versions for a particular Cumulus Linux release. You can then compare that to the software already installed on your switch(es) to determine if it differs from the manifest. Such a difference might occur if you upgraded one or more packages separately from the Cumulus Linux software itself.
To view recommended package information for a switch, run:
netq <hostname> show recommended-pkg-version [release-id <text-release-id>] [package-name <text-package-name>] [json]
This example shows the recommended packages for upgrading the leaf12 switch, namely switchd.
cumulus@switch:~$ netq leaf12 show recommended-pkg-version
Matching manifest records:
Hostname Release ID ASIC Vendor CPU Arch Package Name Version Last Changed
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------------
leaf12 3.7.1 vx x86_64 switchd 1.0-cl3u30 Wed Feb 5 04:36:30 2020
This example shows the recommended packages for upgrading the server01 switch, namely lldpd.
cumulus@switch:~$ netq server01 show recommended-pkg-version
Matching manifest records:
Hostname Release ID ASIC Vendor CPU Arch Package Name Version Last Changed
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------------
server01 3.7.1 vx x86_64 lldpd 0.9.8-0-cl3u11 Wed Feb 5 04:36:30 2020
This example shows the recommended version of the switchd package for use with Cumulus Linux 3.7.2.
cumulus@switch:~$ netq act-5712-09 show recommended-pkg-version release-id 3.7.2 package-name switchd
Matching manifest records:
Hostname Release ID ASIC Vendor CPU Arch Package Name Version Last Changed
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------------
act-5712-09 3.7.2 bcm x86_64 switchd 1.0-cl3u31 Wed Feb 5 04:36:30 2020
This example shows the recommended version of the switchd package for use with Cumulus Linux 3.1.0. Note the version difference from the example for Cumulus Linux 3.7.2.
cumulus@noc-pr:~$ netq act-5712-09 show recommended-pkg-version release-id 3.1.0 package-name switchd
Matching manifest records:
Hostname Release ID ASIC Vendor CPU Arch Package Name Version Last Changed
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------------
act-5712-09 3.1.0 bcm x86_64 switchd 1.0-cl3u4 Wed Feb 5 04:36:30 2020
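The comparison described in this section, between the versions a manifest recommends and the versions installed on a switch, can be sketched in a few lines of Python. This is a minimal illustration, not part of NetQ: the function name find_version_mismatches is hypothetical, the recommended version is taken from the leaf12 example above, and the installed versions are made-up values that you would normally collect from netq show cl-pkg-info.

```python
def find_version_mismatches(installed, recommended):
    """Return {package: (installed_version, recommended_version)} for every
    recommended package whose installed version differs (or is missing)."""
    mismatches = {}
    for package, wanted in recommended.items():
        have = installed.get(package)  # None when the package is not installed
        if have != wanted:
            mismatches[package] = (have, wanted)
    return mismatches


# Recommended version from the leaf12 example; installed versions are
# hypothetical values used only for illustration.
recommended = {"switchd": "1.0-cl3u30"}
installed = {"switchd": "1.0-cl3u31", "ntp": "1:4.2.8p10-cl3u2"}

print(find_version_mismatches(installed, recommended))
```

Packages that are installed but absent from the recommended set (ntp here) are ignored, because the manifest makes no statement about them.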
Validate NetQ Agents are Running
You can confirm that NetQ Agents are running on switches and hosts (if installed) using the netq show agents command. The Status indicates whether the agent is up and current, labelled Fresh, or down and stale, labelled Rotten. Additional information includes whether the agent is time synchronized, how long it has been up, and the last time its state changed.
This example shows NetQ Agent state on all devices.
View the state of the NetQ Agent on a given device using the hostname keyword.
View only the NetQ Agents that are fresh or rotten using the fresh or rotten keyword.
View the state of NetQ Agents at an earlier time using the around keyword.
Monitor Software Services
Cumulus Linux, SONiC, and NetQ run many services to deliver the various features of these products. You can monitor their status using the netq show services command. This section describes services related to system-level operation. For monitoring other services, such as those related to routing, see those topics. NetQ automatically monitors the following services:
aclinit: aclinit service
acltool: acltool service
bgp: BGP (Border Gateway Protocol) service
bgpd: BGP daemon
chrony: chrony service
clagd: MLAG (Multi-chassis Link Aggregation) daemon
cumulus-chassis-ssh: cumulus-chassis-ssh
cumulus-chassisd: cumulus-chassisd
database: database
dhcp_relay: DHCP relay service
docker: Docker container service
ledmgrd: Switch LED manager daemon
lldp: LLDP (Link Layer Discovery Protocol) service
lldpd: LLDP daemon
mstpd: MSTP (Multiple Spanning Tree Protocol) daemon
neighmgrd: Neighbor manager daemon for BGP and OSPF
netq-agent: NetQ Agent service
netqd: NetQ application daemon
ntp: Network Time Protocol (NTP) service
pmon: Process monitor service
portwd: Port watch daemon
ptmd: PTM (Prescriptive Topology Manager) daemon
pwmd: Password manager daemon
radv: Route advertiser service
rsyslog: Rocket-fast system event logging processing service
smond: System monitor daemon
ssh: Secure shell service for switches and servers
status: Show services with a given status (ok, error, warning, fail)
switchd: Cumulus Linux switchd service for hardware acceleration
swss: SONiC switch state service daemon
sx_sdk: Spectrum ASIC SDK service
syncd: Synchronization service
syslog: System event logging service
teamd: Network team service
vrf: VRF (Virtual Routing and Forwarding) service
wd_keepalive: Software watchdog service
zebra: GNU Zebra routing daemon
The CLI syntax for viewing the status of services is:
netq [<hostname>] show services [<service-name>] [vrf <vrf>] [active|monitored] [around <text-time>] [json]
netq [<hostname>] show services [<service-name>] [vrf <vrf>] status (ok|warning|error|fail) [around <text-time>] [json]
netq [<hostname>] show events [severity info | severity error ] message_type services [between <text-time> and <text-endtime>] [json]
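If you generate these commands from scripts, it can help to assemble the arguments in the same order as the syntax above. The following Python sketch does that for the show services forms only. The function name build_show_services and its parameters are hypothetical, and the sketch only builds the argument list; it does not run the command.

```python
VALID_STATUSES = ("ok", "warning", "error", "fail")


def build_show_services(hostname=None, service=None, vrf=None,
                        status=None, around=None, as_json=False):
    """Assemble the argument list for a `netq show services` invocation,
    following the option order shown in the CLI syntax."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}")
    args = ["netq"]
    if hostname:
        args.append(hostname)
    args += ["show", "services"]
    if service:
        args.append(service)
    if vrf:
        args += ["vrf", vrf]
    if status:
        args += ["status", status]
    if around:
        args += ["around", around]
    if as_json:
        args.append("json")
    return args


print(" ".join(build_show_services(service="ntp", status="ok")))
```

Returning a list rather than a single string makes the result safe to pass to a process-launching call without shell quoting.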
View All Services on All Devices
This example shows all available services on each device and whether each is enabled, active, and monitored, along with how long the service has been running and the last time it changed.
It is useful to have colored output for this show command. To configure colored output, run the netq config add color command.
cumulus@switch:~$ netq show services
Hostname Service PID VRF Enabled Active Monitored Status Uptime Last Changed
----------------- -------------------- ----- --------------- ------- ------ --------- ---------------- ------------------------- -------------------------
leaf01 bgpd 2872 default yes yes yes ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 clagd n/a default yes no yes n/a 1d:6h:43m:35s Fri Feb 15 17:28:48 2019
leaf01 ledmgrd 1850 default yes yes no ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 lldpd 2651 default yes yes yes ok 1d:6h:43m:27s Fri Feb 15 17:28:56 2019
leaf01 mstpd 1746 default yes yes yes ok 1d:6h:43m:35s Fri Feb 15 17:28:48 2019
leaf01 neighmgrd 1986 default yes yes no ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 netq-agent 8654 mgmt yes yes yes ok 1d:6h:43m:29s Fri Feb 15 17:28:54 2019
leaf01 netqd 8848 mgmt yes yes yes ok 1d:6h:43m:29s Fri Feb 15 17:28:54 2019
leaf01 ntp 8478 mgmt yes yes yes ok 1d:6h:43m:29s Fri Feb 15 17:28:54 2019
leaf01 ptmd 2743 default yes yes no ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 pwmd 1852 default yes yes no ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 smond 1826 default yes yes yes ok 1d:6h:43m:27s Fri Feb 15 17:28:56 2019
leaf01 ssh 2106 default yes yes no ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 syslog 8254 default yes yes no ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf01 zebra 2856 default yes yes yes ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019
leaf02 bgpd 2867 default yes yes yes ok 1d:6h:43m:55s Fri Feb 15 17:28:28 2019
leaf02 clagd n/a default yes no yes n/a 1d:6h:43m:31s Fri Feb 15 17:28:53 2019
leaf02 ledmgrd 1856 default yes yes no ok 1d:6h:43m:55s Fri Feb 15 17:28:28 2019
leaf02 lldpd 2646 default yes yes yes ok 1d:6h:43m:30s Fri Feb 15 17:28:53 2019
...
If you want to view the service information for a given device, use the hostname option when running the command.
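When you need to scan this output for problems, a short script can pick out the services that are not active. The Python sketch below parses data rows in the text layout shown above. It assumes that the first nine columns never contain spaces and that header and separator lines have already been removed; the function names are hypothetical.

```python
COLUMNS = ("hostname", "service", "pid", "vrf", "enabled",
           "active", "monitored", "status", "uptime", "last_changed")


def parse_service_row(line):
    """Split one data row of `netq show services` text output into a dict.

    Assumes the first nine columns contain no spaces; the final column
    (Last Changed) keeps its internal spaces.
    """
    parts = line.split(None, len(COLUMNS) - 1)
    if len(parts) != len(COLUMNS):
        raise ValueError(f"unexpected row format: {line!r}")
    return dict(zip(COLUMNS, (part.strip() for part in parts)))


def inactive_services(lines):
    """Return (hostname, service) pairs for rows whose Active column is 'no'."""
    rows = (parse_service_row(line) for line in lines)
    return [(row["hostname"], row["service"]) for row in rows if row["active"] == "no"]


# Two rows copied from the example output above.
sample = [
    "leaf01 bgpd 2872 default yes yes yes ok 1d:6h:43m:59s Fri Feb 15 17:28:24 2019",
    "leaf01 clagd n/a default yes no yes n/a 1d:6h:43m:35s Fri Feb 15 17:28:48 2019",
]
print(inactive_services(sample))
```

For the two sample rows, only clagd on leaf01 is reported, because its Active column is no.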
View Information about a Given Service on All Devices
You can view the status of a given service at the current time, at a prior point in time, or view the changes that have occurred for the service during a specified timeframe.
This example shows how to view the status of the NTP service across the network. In this case, the NTP service runs in both the default and management VRFs. You can run the same command for other services, such as bgpd, lldpd, and clagd.
cumulus@switch:~$ netq show services ntp
Matching services records:
Hostname Service PID VRF Enabled Active Monitored Status Uptime Last Changed
----------------- -------------------- ----- --------------- ------- ------ --------- ---------------- ------------------------- -------------------------
exit01 ntp 8478 mgmt yes yes yes ok 1d:6h:52m:41s Fri Feb 15 17:28:54 2019
exit02 ntp 8497 mgmt yes yes yes ok 1d:6h:52m:36s Fri Feb 15 17:28:59 2019
firewall01 ntp n/a default yes yes yes ok 1d:6h:53m:4s Fri Feb 15 17:28:31 2019
hostd-11 ntp n/a default yes yes yes ok 1d:6h:52m:46s Fri Feb 15 17:28:49 2019
hostd-21 ntp n/a default yes yes yes ok 1d:6h:52m:37s Fri Feb 15 17:28:58 2019
hosts-11 ntp n/a default yes yes yes ok 1d:6h:52m:28s Fri Feb 15 17:29:07 2019
hosts-13 ntp n/a default yes yes yes ok 1d:6h:52m:19s Fri Feb 15 17:29:16 2019
hosts-21 ntp n/a default yes yes yes ok 1d:6h:52m:14s Fri Feb 15 17:29:21 2019
hosts-23 ntp n/a default yes yes yes ok 1d:6h:52m:4s Fri Feb 15 17:29:31 2019
noc-pr ntp 2148 default yes yes yes ok 1d:6h:53m:43s Fri Feb 15 17:27:52 2019
noc-se ntp 2148 default yes yes yes ok 1d:6h:53m:38s Fri Feb 15 17:27:57 2019
spine01 ntp 8414 mgmt yes yes yes ok 1d:6h:53m:30s Fri Feb 15 17:28:05 2019
spine02 ntp 8419 mgmt yes yes yes ok 1d:6h:53m:27s Fri Feb 15 17:28:08 2019
spine03 ntp 8443 mgmt yes yes yes ok 1d:6h:53m:22s Fri Feb 15 17:28:13 2019
leaf01 ntp 8765 mgmt yes yes yes ok 1d:6h:52m:52s Fri Feb 15 17:28:43 2019
leaf02 ntp 8737 mgmt yes yes yes ok 1d:6h:52m:46s Fri Feb 15 17:28:49 2019
leaf11 ntp 9305 mgmt yes yes yes ok 1d:6h:49m:22s Fri Feb 15 17:32:13 2019
leaf12 ntp 9339 mgmt yes yes yes ok 1d:6h:49m:9s Fri Feb 15 17:32:26 2019
leaf21 ntp 9367 mgmt yes yes yes ok 1d:6h:49m:5s Fri Feb 15 17:32:30 2019
leaf22 ntp 9403 mgmt yes yes yes ok 1d:6h:52m:57s Fri Feb 15 17:28:38 2019
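The Uptime column uses a compact days, hours, minutes, seconds notation, which is awkward to compare directly. The following Python sketch converts a value such as 1d:6h:52m:41s to seconds. It assumes that shorter uptimes simply omit the leading units, which this section does not confirm; the function name is hypothetical.

```python
import re

# Leading units are treated as optional (an assumption); seconds are required.
UPTIME_RE = re.compile(r"^(?:(\d+)d:)?(?:(\d+)h:)?(?:(\d+)m:)?(\d+)s$")


def uptime_to_seconds(text):
    """Convert an uptime such as '1d:6h:52m:41s' to a number of seconds."""
    match = UPTIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Uptime of the ntp service on exit01 in the example output above.
print(uptime_to_seconds("1d:6h:52m:41s"))
```

With uptimes expressed in seconds, you can sort devices by uptime or flag a service that restarted more recently than its peers.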
View Events Related to a Given Service
To view changes over a given time period, use the netq show events command. For more detailed information about events, refer to Events and Notifications.
This example shows changes to the bgpd service in the last 48 hours.
cumulus@switch:/$ netq show events message_type bgp between now and 48h
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------ -------- ----------------------------------- -------------------------
leaf01 bgp info BGP session with peer spine-1 swp3. 1d:6h:55m:37s
3 vrf DataVrf1081 state changed fro
m failed to Established
leaf01 bgp info BGP session with peer spine-2 swp4. 1d:6h:55m:37s
3 vrf DataVrf1081 state changed fro
m failed to Established
leaf01 bgp info BGP session with peer spine-3 swp5. 1d:6h:55m:37s
3 vrf DataVrf1081 state changed fro
m failed to Established
leaf01 bgp info BGP session with peer spine-1 swp3. 1d:6h:55m:37s
2 vrf DataVrf1080 state changed fro
m failed to Established
leaf01 bgp info BGP session with peer spine-3 swp5. 1d:6h:55m:37s
2 vrf DataVrf1080 state changed fro
m failed to Established
leaf01 bgp info BGP session with peer spine-2 swp4. 1d:6h:55m:37s
2 vrf DataVrf1080 state changed fro
m failed to Established
leaf01 bgp info BGP session with peer spine-3 swp5. 1d:6h:55m:37s
4 vrf DataVrf1082 state changed fro
m failed to Established
In the UI, you can view your inventory of hosts across the network or individually, including a host’s operating system, ASIC, CPU model, disk, platform, and memory information.
To monitor host hardware resource utilization, see Host Monitoring.
Access and View Host Inventory Data
The Inventory/Hosts card monitors the hardware- and software-component inventory on hosts running NetQ in your network. To add this card to your workbench, select Add card > Inventory > Inventory/Hosts card > Open cards.
Hover over the chart in the default card view to view component details. To view the distribution of components, hover over the card header and increase the card’s size. Select the corresponding icon to view a detailed chart for ASIC, platform, or software components.
To display detailed information as a table, expand the card to its largest size.
DPU Inventory
DPU monitoring is an early access feature.
Use the UI to view your data processing unit (DPU) inventory. The Inventory/DPU card includes the DPU’s operating system, ASIC, CPU model, disk, platform, and memory information.
For DPU performance information, refer to DPU Monitoring.
Access and View DPU Inventory Data
The Inventory/DPU card displays the hardware- and software-component inventory on DPUs running NetQ in your network.
Hover over the chart in the default card view to view component details. To view the distribution of components, hover over the card header and increase the card’s size. Select the corresponding icon to view a detailed chart for ASIC, platform, or software components.
To display detailed information as a table, expand the card to its largest size.
Related Information
To read more about NVIDIA BlueField DPUs and the DOCA Telemetry Service, refer to the DOCA SDK Documentation.
Device Groups
Device groups allow you to create a label for a subset of devices in the inventory. You can configure validation checks to run on select devices by referencing group names.
Create a Device Group
To create a device group, add the Device Groups card to your workbench. In the header, click Open card, then select the Device Groups card.
The Device Groups card appears on your workbench. Select Create new group to create a new device group.
Follow the instructions in the UI to create a new group:
Enter a name for the group.
Create a hostname-based rule to define which devices in the inventory should be added to the group.
Confirm the expected matched devices appear in the inventory, and click Create device group.
The following example shows a group name of “exit group” matching any device in the inventory with “exit” in the hostname:
Update a Device Group
When new devices that match existing group rules are added to the inventory, NetQ flags the matching devices for review. The following example shows the switch “exit-2” detected in the inventory after the group was configured:
To add the new device to the group inventory, click Add device and then click Update device group.
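The behavior of a hostname-based rule can be pictured with a small Python sketch: a rule selects every device whose hostname contains a given string, and devices that match later are flagged for review until you add them. This is a simplified model under that "contains" assumption, not NetQ's implementation. The function names and the inventory entries other than exit-2 are hypothetical.

```python
def match_group(hostnames, pattern):
    """Return the hostnames that contain `pattern`, in sorted order."""
    return sorted(name for name in hostnames if pattern in name)


def devices_to_review(inventory, group_members, pattern):
    """Return devices that match the rule but are not yet in the group."""
    return sorted(set(match_group(inventory, pattern)) - set(group_members))


inventory = ["exit-1", "leaf01", "spine01"]   # hypothetical inventory
exit_group = match_group(inventory, "exit")   # group created from the rule

inventory.append("exit-2")                    # a new switch is detected later
print(devices_to_review(inventory, exit_group, "exit"))
```

In this model, exit-2 is the only device returned for review, which mirrors the update flow described above.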
Delete a Device Group
To delete a device group:
Expand the Device Groups card:
Click Menu on the desired group and select Delete.
Events and Notifications
Events provide information about how a network and its devices are operating during a given time period. Event notifications are available via Slack, PagerDuty, syslog, and email channels to aid troubleshooting and help resolve network problems before they become critical.
NetQ captures three types of events:
System events: a wide range of events generated by the system about network protocols and services operation, hardware and software status, and system services
Threshold-crossing events: a user-specified set of system related events based on threshold values
What Just Happened events: network hardware events for NVIDIA Spectrum™ switches
You can track events in the NetQ UI with the Events and What Just Happened cards:
Events card: displays system and threshold-crossing events
What Just Happened card: displays network hardware events on NVIDIA Spectrum™ switches
You can monitor system and threshold-crossing events in the CLI with the netq show events command. The netq show wjh-drop command lists all What Just Happened events or those with a selected drop type.
Configure System Event Notifications
To receive the event messages generated and processed by NetQ, you must integrate a third-party event notification application into your workflow. You can integrate NetQ with syslog, PagerDuty, Slack, and/or email. Alternatively, you can send notifications to other third-party applications via a generic webhook channel.
In an on-premises deployment, the NetQ On-premises Appliance or VM receives the raw data stream from the NetQ Agents, processes the data, then stores and delivers events to the Notification function. The Notification function filters and sends messages to any configured notification applications. In a cloud deployment, the NetQ Cloud Appliance or VM passes the raw data stream to the NetQ Cloud service for processing and delivery.
You can implement a proxy server (that sits between the NetQ Appliance or VM and the integration channels) that receives, processes, and distributes the notifications rather than having them sent directly to the integration channel. If you use such a proxy, you must configure NetQ with the proxy information.
Notifications are generated for the following types of events:
Category: Events
Network Protocols: BGP status and session state; MLAG (CLAG) status and session state; EVPN status and session state; LLDP status; OSPF status and session state; PTP status and session state; VLAN status and session state; VXLAN status and session state
Interfaces: Link status; Ports and cables status; MTU status
Services: NetQ Agent status; PTM; SSH *; NTP status
Traces: On-demand trace status; Scheduled trace status
Sensors: Fan status; PSU (power supply unit) status; Temperature status
System Software: Configuration file changes; Running configuration file changes; Cumulus Linux support status; Software package status; Operating system version; Lifecycle management status
System Hardware: Physical resources status; BTRFS status; SSD utilization status
* CLI only
Event filters are based on rules you create. You must have at least one rule per filter. A select set of events can be triggered by a user-configured threshold. Refer to the System Event Messages Reference for descriptions and examples of these events.
Event Message Format
Messages have the following structure:
<message-type><timestamp><opid><hostname><severity><message>
Element: Description
message type: Category of event
timestamp: Date and time event occurred
opid: Identifier of the service or process that generated the event
hostname: Hostname of network device where event occurred
severity: Severity classification, either error or info
message: Text description of event
For example:
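One way to picture this structure in code is a small record type with one field per element. The Python sketch below is illustrative only: the class name EventMessage, the timestamp format, and the opid value are assumptions, and the sketch does not reproduce how NetQ serializes or delimits messages. The message text is taken from the BGP events shown earlier in this guide.

```python
from dataclasses import dataclass

SEVERITIES = ("info", "error")


@dataclass(frozen=True)
class EventMessage:
    """One field per element of the message structure."""
    message_type: str
    timestamp: str
    opid: str
    hostname: str
    severity: str
    message: str

    def __post_init__(self):
        # The severity classification is limited to error or info.
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")


event = EventMessage(
    message_type="bgp",
    timestamp="2019-02-15T17:28:24",  # hypothetical timestamp format
    opid="1234",                      # hypothetical identifier
    hostname="leaf01",
    severity="info",
    message=("BGP session with peer spine-1 swp3.3 vrf DataVrf1081 "
             "state changed from failed to Established"),
)
print(event.message_type, event.severity, event.hostname)
```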
To set up the integrations, you must configure NetQ with at least one channel, one rule, and one filter. To refine what messages you want to view and where to send them, you can add additional rules and filters and set thresholds on supported event types. You can also configure a proxy server to receive, process, and forward the messages. This is accomplished in the following order:
Configure Basic NetQ Event Notifications
The simplest configuration you can create is one that sends all events generated by all interfaces to a single notification application. This is described here. For more granular configurations and examples, refer to Configure Advanced NetQ Event Notifications.
A notification configuration must contain one channel, one rule, and one filter. Creation of the configuration follows this same path:
Add a channel.
Add a rule that accepts a selected set of events.
Add a filter that associates this rule with the newly created channel.
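The relationship between the three pieces can be sketched as a small model: a rule decides whether an event is accepted, and a filter ties that rule to a channel. The Python below is a simplified mental model, not NetQ's matching logic. In particular, it assumes an event is a dictionary of keys and values and that a rule value of ALL accepts any value for its key; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    key: str
    value: str

    def accepts(self, event):
        """Accept an event that carries the key with a matching value (or ALL)."""
        if self.key not in event:
            return False
        return self.value == "ALL" or event[self.key] == self.value


@dataclass
class Filter:
    name: str
    rule: Rule
    channel: str


def channels_for(event, filters):
    """Return the channels that should receive the event."""
    return [f.channel for f in filters if f.rule.accepts(event)]


rule = Rule("all-interfaces", key="ifname", value="ALL")
filters = [Filter("notify-all-ifs", rule, channel="pd-netq-events")]

print(channels_for({"ifname": "swp1", "message_type": "link"}, filters))
print(channels_for({"message_type": "ntp"}, filters))
```

In this model, the first event is routed to pd-netq-events because it carries an ifname key, while the second event matches no rule and is sent nowhere.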
Create a Channel
The first step is to create a Slack, PagerDuty, syslog, email, or generic channel to receive the notifications.
You can use the NetQ UI or the NetQ CLI to create a Slack channel.
Expand the Menu and select Notification channels.
The Slack tab is displayed by default.
Add a channel.
When no channels have been specified, click Add Slack channel.
When at least one channel has been specified, click Add above the table.
Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.
Create an incoming webhook as described in the Slack documentation, then copy and paste it into the Webhook URL field.
Click Add.
(Optional) To verify the channel configuration, click Test.
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack info webhook:https://hooks.s
lack.com/services/text/
moretext/evenmoretext
You can use the NetQ UI or the NetQ CLI to create a PagerDuty channel.
Expand the Menu and select Notification channels.
Click PagerDuty.
Add a channel.
When no channels have been specified, click Add PagerDuty channel.
When at least one channel has been specified, click Add above the table.
Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.
Obtain and enter an integration key (also called a service key or routing key).
Click Add.
(Optional) To verify the channel configuration, click Test.
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events pagerduty info integration-key: c6d666e
210a8425298ef7abde0d1998
You can use the NetQ UI or the NetQ CLI to create a syslog channel.
Expand the Menu and select Notification channels.
Click Syslog.
Add a channel.
When no channels have been specified, click Add syslog channel.
When at least one channel has been specified, click Add above the table.
Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.
Enter the IP address and port of the syslog server.
Click Add.
(Optional) To verify the channel configuration, click Test.
To create and verify a syslog channel, run:
netq add notification channel syslog <text-channel-name> hostname <text-syslog-hostname> port <text-syslog-port> [severity info | severity error ]
netq show notification channel [json]
This example shows the creation of a syslog-netq-events channel and verifies the configuration.
Obtain the syslog server hostname (or IP address) and port.
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- -------- ----------------------
syslog-netq-eve syslog info host:syslog-server
nts port: 514
You can use the NetQ UI or the NetQ CLI to create an email channel.
Expand the Menu and select Notification channels.
Click Email.
Add a channel.
When no channels have been specified, click Add email channel.
When at least one channel has been specified, click Add above the table.
Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.
Enter the email addresses of the people who should receive notifications from this channel.
Enter the addresses separated by commas, with no spaces. For example: user1@domain.com,user2@domain.com,user3@domain.com
The first time you configure an email channel, you must also specify the SMTP server information:
Host: hostname or IP address of the SMTP server
Port: port of the SMTP server (typically 587)
User ID/Password: your administrative credentials
From: email address that indicates who sent the notifications
After the first time, any additional email channels you create can use this configuration, by clicking Existing.
Click Add.
(Optional) To verify the channel configuration, click Test.
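The two input constraints in these steps (no spaces in the channel name, and recipients separated by commas with no spaces) are easy to check before you submit the form or build a CLI command. The following Python sketch shows one way to do so. The function names are hypothetical, and the address check is deliberately loose: it only confirms that an @ sign is present and that no spaces remain.

```python
def check_channel_name(name):
    """Reject empty names and names that contain whitespace."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("channel names must be non-empty and contain no spaces")
    return name


def format_recipients(addresses):
    """Join addresses with commas and no spaces."""
    cleaned = [address.strip() for address in addresses if address.strip()]
    for address in cleaned:
        if "@" not in address or " " in address:
            raise ValueError(f"not a plausible email address: {address!r}")
    return ",".join(cleaned)


print(check_channel_name("onprem-email"))
print(format_recipients(["user1@domain.com", " user2@domain.com", "user3@domain.com "]))
```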
To create and verify the specification of an email channel, run:
netq add notification channel email <text-channel-name> to <text-email-toids> [smtpserver <text-email-hostname>] [smtpport <text-email-port>] [login <text-email-id>] [password <text-email-password>] [severity info | severity error ]
netq add notification channel email <text-channel-name> to <text-email-toids>
netq show notification channel [json]
The configuration is different depending on whether you are using the on-premises or cloud version of NetQ. Do not configure SMTP for cloud deployments as the NetQ cloud service uses the NetQ SMTP server to push email notifications.
For an on-premises deployment:
Set up an SMTP server. The server can be internal or public.
Create a user account (login and password) on the SMTP server. NetQ sends notifications to this address.
Create the notification channel using this form of the CLI command:
This example creates a rule named all-interfaces, using the key ifname and the value ALL, which sends all events from all interfaces to any channel with this rule.
cumulus@switch:~$ netq add notification rule all-interfaces key ifname value ALL
Successfully added/updated rule all-interfaces
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
all-interfaces ifname ALL
Create a Filter
The final step is to create a filter to tie the rule to the channel. You create filters for system events using the NetQ CLI.
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel pd-netq-events
Successfully added/updated filter notify-all-ifs
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs 1 info pd-netq-events all-interfaces
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel slk-netq-events
Successfully added/updated filter notify-all-ifs
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs 1 info slk-netq-events all-interfaces
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel syslog-netq-events
Successfully added/updated filter notify-all-ifs
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs 1 info syslog-netq-events all-interfaces
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel onprem-email
Successfully added/updated filter notify-all-ifs
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs 1 info onprem-email all-interfaces
NetQ is now configured to send all interface events to your selected channel.
Configure Advanced NetQ Event Notifications
If you want to create more granular notifications based on such items as selected devices, characteristics of devices, or protocols, or you want to use a proxy server, you need more than the basic notification configuration. The following section includes details for creating these more complex notification configurations.
Configure a Proxy Server
To send notification messages through a proxy server instead of directly to a notification channel, you configure NetQ with the hostname and optionally a port of a proxy server. If you do not specify a port, NetQ defaults to port 80. Only one proxy server is currently supported. To simplify deployment, configure your proxy server before configuring channels, rules, or filters.
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events pagerduty info integration-key: c6d666e
210a8425298ef7abde0d1998
NetQ Notifier sends notifications to Slack as incoming webhooks for a Slack channel you configure.
WebHook URL for the desired channel. For example: https://hooks.slack.com/services/text/moretext/evenmoretext
severity <level>
The log level, either info or error. The severity defaults to info if unspecified.
tag <text-slack-tag>
Optional tag appended to the Slack notification to highlight particular channels or people. An @ sign must precede the tag value. For example, @netq-info.
This example shows the creation of a slk-netq-events channel and verifies the configuration.
Create an incoming webhook as described in the documentation for your version of Slack.
This example creates an email channel named onprem-email that uses the smtpserver on port 587 to send messages to those persons with access to the smtphostlogin account.
Set up an SMTP server. The server can be internal or public.
Create a user account (login and password) on the SMTP server. NetQ sends notifications to this address.
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
onprem-email email error password: MyPassword123,
port: 587,
isEncrypted: True,
host: smtp.domain.com,
from: smtphostlogin@doma
in.com,
id: smtphostlogin@domain
.com,
to: netq-notifications@d
omain.com
In cloud deployments, the NetQ cloud service uses the NetQ SMTP server to push email notifications.
To create an email notification channel for a cloud deployment, run:
netq add notification channel email <text-channel-name> to <text-email-toids> [severity info | severity error]
netq show notification channel [json]
This example creates an email channel named cloud-email that uses the NetQ SMTP server to send messages to those persons with access to the netq-cloud-notifications account.
URL of the remote application to receive notifications
severity <level>
The log level, either info or error. The severity defaults to info if unspecified.
use-ssl [True | False]
Enable or disable SSL
auth-type [basic-auth | api-key]
Set authentication parameters. Either basic-auth with generic-username and generic-password or api-key with a key-name and key-value
Create Rules
Each rule comprises a single key-value pair. The key-value pair indicates what messages to include or drop from event information sent to a notification channel. You can create more than one rule for a single filter. Combining multiple rules in one filter lets you define the filter precisely. For example, you can specify rules around hostnames or interface names, enabling you to filter messages specific to those hosts or interfaces. You can only create rules after you have set up your notification channels.
NetQ includes a predefined fixed set of valid rule keys. You enter values as regular expressions, which vary according to your deployment.
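The way a rule's key-value pair selects events can be sketched in Python. This is an illustration only, not NetQ's implementation: the event is modeled as a plain dictionary, rule_matches is a hypothetical helper, and treating the rule value as a case-sensitive regular expression that must match the whole field is an assumption.

```python
import re

def rule_matches(rule_key: str, rule_value: str, event: dict) -> bool:
    """Return True when the event has the rule key and its value matches the pattern.

    Assumption: the rule value is a regular expression that must match the
    whole field. NetQ's actual matching behavior may differ.
    """
    field = event.get(rule_key)
    if field is None:
        return False
    return re.fullmatch(rule_value, str(field)) is not None

# Hypothetical link event built from keys listed for the Link service.
event = {"message_type": "link", "hostname": "leaf11", "ifname": "swp53"}

print(rule_matches("hostname", r"leaf\d+", event))  # hostname matches the pattern
print(rule_matches("ifname", "swp52", event))       # a different interface
print(rule_matches("vni", "42", event))             # key absent from this event
```

Only the first call matches. The other two show that a rule does not match when the value differs or when the event does not carry the key at all.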
Rule Keys and Values
Service | Rule Key | Description | Example Rule Values
BGP | message_type | Network protocol or service identifier | bgp
 | hostname | User-defined, text-based name for a switch or host | server02, leaf11, exit01, spine-4
 | peer | User-defined, text-based name for a peer switch or host | server4, leaf-3, exit02, spine06
 | desc | Text description |
 | vrf | Name of VRF interface | mgmt, default
 | old_state | Previous state of the BGP service | Established, Failed
 | new_state | Current state of the BGP service | Established, Failed
 | old_last_reset_time | Previous time that BGP service was reset | Apr 3, 2019, 4:17 PM
 | new_last_reset_time | Most recent time that BGP service was reset | Apr 8, 2019, 11:38 AM
ConfigDiff | message_type | Network protocol or service identifier | configdiff
 | hostname | User-defined, text-based name for a switch or host | server02, leaf11, exit01, spine-4
 | vni | Virtual Network Instance identifier | 12, 23
 | old_state | Previous state of the configuration file | created, modified
 | new_state | Current state of the configuration file | created, modified
EVPN | message_type | Network protocol or service identifier | evpn
 | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
 | vni | Virtual Network Instance identifier | 12, 23
 | old_in_kernel_state | Previous VNI state, in kernel or not | true, false
 | new_in_kernel_state | Current VNI state, in kernel or not | true, false
 | old_adv_all_vni_state | Previous VNI advertising state, advertising all or not | true, false
 | new_adv_all_vni_state | Current VNI advertising state, advertising all or not | true, false
LCM | message_type | Network protocol or service identifier | clag
 | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
 | old_conflicted_bonds | Previous pair of interfaces in a conflicted bond | swp7 swp8, swp3 swp4
 | new_conflicted_bonds | Current pair of interfaces in a conflicted bond | swp11 swp12, swp23 swp24
 | old_state_protodownbond | Previous state of the bond | protodown, up
 | new_state_protodownbond | Current state of the bond | protodown, up
Link | message_type | Network protocol or service identifier | link
 | hostname | User-defined, text-based name for a switch or host | server02, leaf-6, exit01, spine7
 | ifname | Software interface name | eth0, swp53
LLDP | message_type | Network protocol or service identifier | lldp
 | hostname | User-defined, text-based name for a switch or host | server02, leaf41, exit01, spine-5, tor-36
 | ifname | Software interface name | eth1, swp12
 | old_peer_ifname | Previous software interface name | eth1, swp12, swp27
 | new_peer_ifname | Current software interface name | eth1, swp12, swp27
 | old_peer_hostname | Previous user-defined, text-based name for a peer switch or host | server02, leaf41, exit01, spine-5, tor-36
 | new_peer_hostname | Current user-defined, text-based name for a peer switch or host | server02, leaf41, exit01, spine-5, tor-36
MLAG (CLAG) | message_type | Network protocol or service identifier | clag
 | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
 | old_conflicted_bonds | Previous pair of interfaces in a conflicted bond | swp7 swp8, swp3 swp4
 | new_conflicted_bonds | Current pair of interfaces in a conflicted bond | swp11 swp12, swp23 swp24
 | old_state_protodownbond | Previous state of the bond | protodown, up
 | new_state_protodownbond | Current state of the bond | protodown, up
Node | message_type | Network protocol or service identifier | node
 | hostname | User-defined, text-based name for a switch or host | server02, leaf41, exit01, spine-5, tor-36
 | ntp_state | Current state of NTP service | in sync, not sync
 | db_state | Current state of DB | Add, Update, Del, Dead
NTP | message_type | Network protocol or service identifier | ntp
 | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
 | old_state | Previous state of service | in sync, not sync
 | new_state | Current state of service | in sync, not sync
Port | message_type | Network protocol or service identifier | port
 | hostname | User-defined, text-based name for a switch or host | server02, leaf13, exit01, spine-8, tor-36
 | ifname | Interface name | eth0, swp14
 | old_speed | Previous speed rating of port | 10 G, 25 G, 40 G, unknown
 | old_transreceiver | Previous transceiver | 40G Base-CR4, 25G Base-CR
 | old_vendor_name | Previous vendor name of installed port module | Amphenol, OEM, NVIDIA, Fiberstore, Finisar
 | old_serial_number | Previous serial number of installed port module | MT1507VS05177, AVE1823402U, PTN1VH2
 | old_supported_fec | Previous forward error correction (FEC) support status |
 |  | User-defined, text-based name for a switch or host | server02, leaf-26, exit01, spine2-4
 | old_state | Previous state of a fan, power supply unit, or thermal sensor | Fan: ok, absent, bad; PSU: ok, absent, bad; Temp: ok, busted, bad, critical
 | new_state | Current state of a fan, power supply unit, or thermal sensor | Fan: ok, absent, bad; PSU: ok, absent, bad; Temp: ok, busted, bad, critical
 | old_s_state | Previous state of a fan or power supply unit. | Fan: up, down; PSU: up, down
 | new_s_state | Current state of a fan or power supply unit. | Fan: up, down; PSU: up, down
 | new_s_max | Current maximum temperature threshold value | Temp: 110
 | new_s_crit | Current critical high temperature threshold value | Temp: 85
 | new_s_lcrit | Current critical low temperature threshold value | Temp: -25
 | new_s_min | Current minimum temperature threshold value | Temp: -50
Services | message_type | Network protocol or service identifier | services
 | hostname | User-defined, text-based name for a switch or host | server02, leaf03, exit01, spine-8
 | name | Name of service | clagd, lldpd, ssh, ntp, netqd, netq-agent
 | old_pid | Previous process or service identifier | 12323, 52941
 | new_pid | Current process or service identifier | 12323, 52941
 | old_status | Previous status of service | up, down
 | new_status | Current status of service | up, down
Rule names are case sensitive, and you cannot use wildcards. Rule names can contain spaces, but you must enclose them with single quotes in commands. It is easier to use dashes in place of spaces or mixed case for better readability. For example, use 'bgpSessionChanges', 'BGP-session-changes', or 'BGPsessions', instead of 'BGP Session Changes'. Use tab completion to view the command options syntax.
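If you generate rule commands from a script, the quoting requirement for names with spaces can be handled automatically. The following Python sketch builds the command text with the standard library's shlex.quote. The add_rule_command helper is hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex

def add_rule_command(name: str, key: str, value: str) -> str:
    """Build a `netq add notification rule` command line, quoting each argument for the shell."""
    args = ["netq", "add", "notification", "rule", name, "key", key, "value", value]
    return " ".join(shlex.quote(arg) for arg in args)

# A name with spaces is wrapped in single quotes; a dashed name needs none.
print(add_rule_command("BGP Session Changes", "message_type", "bgp"))
print(add_rule_command("BGP-session-changes", "message_type", "bgp"))
```

The second form is the easier one to type and read, which is why dashes or mixed case are recommended over spaces.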
cumulus@switch:~$ netq add notification rule swp52 key port value swp52
Successfully added/updated rule swp52
View Rule Configurations
Use the netq show notification command to view the rules on your platform.
Create Filters
You can limit or direct event messages using filters. Filters are created based on rules you define and each filter contains one or more rules. When a message matches the rule, it is sent to the indicated destination. Before you can create filters, you need to have already defined rules and configured channels.
As you create filters, they are added to the bottom of a list of filters. By default, NetQ processes event messages against filters starting at the top of the filter list and works its way down until it finds a match. NetQ applies the first filter that matches an event message, ignoring the other filters. Then it moves to the next event message and reruns the process, starting at the top of the list of filters. NetQ ignores events that do not match any filter.
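The first-match processing described above can be sketched in a few lines of Python. This only illustrates the ordering logic and is not NetQ code: the filter list, the dictionary layout, and the route function are hypothetical, and rules are compared by exact equality rather than by regular expression to keep the sketch short.

```python
from typing import List, Optional

# Each filter has a name, the key-value pairs it requires, and a channel.
# A channel of None models a drop-style filter.
FILTERS = [
    {"name": "swp52Drop", "rules": {"ifname": "swp52"}, "channel": None},
    {"name": "bgpSpine", "rules": {"hostname": "spine-01"}, "channel": "pd-netq-events"},
    {"name": "configChange", "rules": {"message_type": "configdiff"}, "channel": "slk-netq-events"},
]

def route(event: dict, filters: List[dict]) -> Optional[str]:
    """Return the channel of the first matching filter, or None if dropped or unmatched."""
    for flt in filters:  # start at the top of the list
        if all(event.get(key) == value for key, value in flt["rules"].items()):
            return flt["channel"]  # first match wins, later filters are ignored
    return None  # no filter matched, so the event is ignored

print(route({"hostname": "spine-01", "ifname": "swp52"}, FILTERS))  # caught by the drop filter
print(route({"hostname": "spine-01", "ifname": "swp1"}, FILTERS))   # reaches bgpSpine
print(route({"hostname": "leaf01", "message_type": "ntp"}, FILTERS))  # matches nothing
```

The first event would also satisfy bgpSpine, but the drop filter sits higher in the list, so the event never reaches a channel.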
You might have to change the order of filters in the list to ensure you capture the events you want and drop the events you do not want. To do so, use the before or after keywords to ensure NetQ processes one filter before or after another.
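As a rough illustration of how before and after change the processing order, the following Python sketch models the filter list as a list of names. The place_filter function is hypothetical and only mimics the positioning behavior described here.

```python
from typing import List, Optional

def place_filter(order: List[str], name: str,
                 before: Optional[str] = None,
                 after: Optional[str] = None) -> List[str]:
    """Return a new filter order with `name` positioned relative to another filter.

    With neither keyword, the filter goes to the bottom of the list.
    """
    if before is not None and after is not None:
        raise ValueError("specify only one of before or after")
    result = [n for n in order if n != name]  # drop any existing position
    if before is not None:
        result.insert(result.index(before), name)
    elif after is not None:
        result.insert(result.index(after) + 1, name)
    else:
        result.append(name)
    return result

order = ["bgpSpine", "vni42", "configChange", "svcDown"]
order = place_filter(order, "swp52Drop", before="bgpSpine")
print(order)  # swp52Drop is now processed first
order = place_filter(order, "svcDown", after="swp52Drop")
print(order)  # svcDown moves up to second place
```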
This diagram shows an example with four defined filters with sample output results.
Filter names can contain spaces, but must be enclosed with single quotes in commands. It is easier to use dashes in place of spaces or mixed case for better readability. For example, use bgpSessionChanges or BGP-session-changes or BGPsessions, instead of 'BGP Session Changes'. Filter names are also case sensitive.
Example Filters
Create a filter for BGP events on a particular device:
Create a filter to drop messages from a given interface, and match against this filter before any other filters. To create a drop-style filter, do not specify a channel. To list the filter first, use the before option.
You do not need to reenter all the severity, channel, and rule information for existing filters if you only want to change their processing order.
Run the netq show notification command again to verify the changes.
Suppress Events
Suppressing events reduces the number of event notifications NetQ displays. You can create rules to suppress events attributable to known issues or false alarms. In addition to the rules you create to suppress events, NetQ suppresses some events by default.
You can suppress events for the following types of messages:
agent: NetQ Agent messages
bgp: BGP-related messages
btrfsinfo: Messages related to the BTRFS file system in Cumulus Linux
clag: MLAG-related messages
clsupport: Messages generated when creating the cl-support script
configdiff: Messages related to the difference between two configurations
evpn: EVPN-related messages
link: Messages related to links, including state and interface name
lldp: LLDP-related messages
ntp: NTP-related messages
ospf: OSPF-related messages
sensor: Messages related to various sensors
services: Service-related information, including whether a service is active or inactive
ssdutil: Messages related to the storage on the switch
NetQ suppresses BGP, EVPN, link, and sensor-related events with a severity level of "info" by default in the UI. You can disable this rule if you'd prefer to receive these notifications.
Create an Event Suppression Configuration
To suppress events using the NetQ UI:
Click Menu, then Events.
In the top-right corner, select Show suppression rules.
Select Add rule. You can configure individual suppression rules or you can create a group rule that suppresses events for all message types.
Enter the suppression rule parameters and click Create.
When you add a new configuration using the CLI, you can specify a scope, which limits the suppression in the following order:
Hostname.
Severity.
Message type-specific filters. For example, the target VNI for EVPN messages, or the interface name for a link message.
NetQ has a predefined set of filter conditions. To see these conditions, run netq show events-config show-filter-conditions:
cumulus@switch:~$ netq show events-config show-filter-conditions
Matching config_events records:
Message Name Filter Condition Name Filter Condition Hierarchy Filter Condition Description
------------------------ ------------------------------------------ ---------------------------------------------------- --------------------------------------------------------
evpn vni 3 Target VNI
evpn severity 2 Severity error/info
evpn hostname 1 Target Hostname
clsupport fileAbsName 3 Target File Absolute Name
clsupport severity 2 Severity error/info
clsupport hostname 1 Target Hostname
link new_state 4 up / down
link ifname 3 Target Ifname
link severity 2 Severity error/info
link hostname 1 Target Hostname
ospf ifname 3 Target Ifname
ospf severity 2 Severity error/info
ospf hostname 1 Target Hostname
sensor new_s_state 4 New Sensor State Eg. ok
sensor sensor 3 Target Sensor Name Eg. Fan, Temp
sensor severity 2 Severity error/info
sensor hostname 1 Target Hostname
configdiff old_state 5 Old State
configdiff new_state 4 New State
configdiff type 3 File Name
configdiff severity 2 Severity error/info
configdiff hostname 1 Target Hostname
ssdutil info 3 low health / significant health drop
ssdutil severity 2 Severity error/info
ssdutil hostname 1 Target Hostname
agent db_state 3 Database State
agent severity 2 Severity error/info
agent hostname 1 Target Hostname
ntp new_state 3 yes / no
ntp severity 2 Severity error/info
ntp hostname 1 Target Hostname
bgp vrf 4 Target VRF
bgp peer 3 Target Peer
bgp severity 2 Severity error/info
bgp hostname 1 Target Hostname
services new_status 4 active / inactive
services name 3 Target Service Name Eg.netqd, mstpd, zebra
services severity 2 Severity error/info
services hostname 1 Target Hostname
btrfsinfo info 3 high btrfs allocation space / data storage efficiency
btrfsinfo severity 2 Severity error/info
btrfsinfo hostname 1 Target Hostname
clag severity 2 Severity error/info
clag hostname 1 Target Hostname
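To make the hierarchy column more concrete, the following Python sketch checks an event against a suppression scope in hierarchy order. It is a simplified model rather than NetQ's implementation: the is_suppressed function, the event dictionaries, and the scope are hypothetical, and requiring every scope condition to match by exact equality is an assumption.

```python
# Hierarchy values for the link message type, taken from the
# show-filter-conditions output: lower numbers are checked first.
LINK_HIERARCHY = {"hostname": 1, "severity": 2, "ifname": 3, "new_state": 4}

def is_suppressed(event: dict, scope: dict, hierarchy: dict) -> bool:
    """Return True when every condition in the scope matches the event.

    Conditions are checked in hierarchy order. Requiring all of them to
    match, and matching by exact equality, are assumptions of this sketch.
    """
    for name in sorted(scope, key=lambda n: hierarchy[n]):
        if event.get(name) != scope[name]:
            return False
    return True

# Hypothetical scope: suppress link events for swp1 on leaf01.
scope = {"ifname": "swp1", "hostname": "leaf01"}

event_a = {"hostname": "leaf01", "severity": "info", "ifname": "swp1", "new_state": "down"}
event_b = {"hostname": "leaf02", "severity": "info", "ifname": "swp1", "new_state": "down"}

print(is_suppressed(event_a, scope, LINK_HIERARCHY))  # both conditions match
print(is_suppressed(event_b, scope, LINK_HIERARCHY))  # hostname differs
```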
For example, to create a configuration called mybtrfs that suppresses OSPF-related events on leaf01 for the next 10 minutes, run:
You can delete or disable suppression rules. After you delete a rule, event notifications will resume. Disabling suppression rules pauses those rules, allowing you to receive event notifications temporarily.
To remove suppressed event configurations:
Click Menu, then Events.
Select Show suppression rules at the top of the page.
Toggle between the Single and All tabs to view the suppression rules. Navigate to the rule you would like to delete or disable.
Click the three-dot menu and select Delete. If you’d like to pause the rule instead of deleting it, click Disable.
To remove an event suppression configuration, run netq del events-config events_config_id <text-events-config-id-anchor>.
When you filter for a message type, you must include the show-filter-conditions keyword to show the conditions associated with that message type and the hierarchy in which they get processed.
The following section lists examples of advanced notification configurations.
Create a Notification for BGP Events from a Selected Switch
This example creates a notification integration with a PagerDuty channel called pd-netq-events. It then creates a rule called bgpHostname and a filter called bgpSpine for any notifications from spine-01. The result is that any event messages from spine-01 with a severity level of info are sent to the pd-netq-events channel.
cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
Successfully added/updated channel pd-netq-events
cumulus@switch:~$ netq add notification rule bgpHostname key hostname value spine-01
Successfully added/updated rule bgpHostname
cumulus@switch:~$ netq add notification filter bgpSpine rule bgpHostname channel pd-netq-events
Successfully added/updated filter bgpSpine
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events pagerduty info integration-key: 1234567
890
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine 1 info pd-netq-events bgpHostnam
e
Create a Notification for Errors on a Given EVPN VNI
This example creates a notification integration with a PagerDuty channel called pd-netq-events. It then creates a rule called evpnVni and a filter called vni42 for any error messages from VNI 42 on the EVPN overlay network. The result is that any event messages from VNI 42 with a severity level of ‘error’ are sent to the pd-netq-events channel.
cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
Successfully added/updated channel pd-netq-events
cumulus@switch:~$ netq add notification rule evpnVni key vni value 42
Successfully added/updated rule evpnVni
cumulus@switch:~$ netq add notification filter vni42 severity error rule evpnVni channel pd-netq-events
Successfully added/updated filter vni42
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events pagerduty info integration-key: 1234567
890
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
evpnVni vni 42
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine 1 info pd-netq-events bgpHostnam
e
vni42 2 error pd-netq-events evpnVni
Create a Notification for Configuration File Changes
This example creates a notification integration with a Slack channel called slk-netq-events. It then creates a rule sysconf and a filter called configChange for any configuration file update messages. The result is that any configuration update messages are filtered to the slk-netq-events channel.
cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events
cumulus@switch:~$ netq add notification rule sysconf key message_type value configdiff
Successfully added/updated rule sysconf
cumulus@switch:~$ netq add notification filter configChange severity info rule sysconf channel slk-netq-events
Successfully added/updated filter configChange
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack info webhook:https://hooks.s
lack.com/services/text/
moretext/evenmoretext
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
evpnVni vni 42
sysconf message_type configdiff
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine 1 info pd-netq-events bgpHostnam
e
vni42 2 error pd-netq-events evpnVni
configChange 3 info slk-netq-events sysconf
Create a Notification for When a Service Goes Down
This example creates a notification integration with a Slack channel called slk-netq-events. It then creates a rule svcStatus and a filter called svcDown for any services state messages indicating a service is no longer operational. The result is that any service down messages are filtered to the slk-netq-events channel.
cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events
cumulus@switch:~$ netq add notification rule svcStatus key new_status value down
Successfully added/updated rule svcStatus
cumulus@switch:~$ netq add notification filter svcDown severity error rule svcStatus channel slk-netq-events
Successfully added/updated filter svcDown
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack info webhook:https://hooks.s
lack.com/services/text/
moretext/evenmoretext
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
evpnVni vni 42
svcStatus new_status down
sysconf message_type configdiff
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine 1 info pd-netq-events bgpHostnam
e
vni42 2 error pd-netq-events evpnVni
configChange 3 info slk-netq-events sysconf
svcDown 4 error slk-netq-events svcStatus
Create a Filter to Drop Notifications from a Given Interface
This example creates a notification integration with a Slack channel called slk-netq-events. It then creates a rule swp52 and a filter called swp52Drop that drops all notifications for events from interface swp52.
cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events
cumulus@switch:~$ netq add notification rule swp52 key port value swp52
Successfully added/updated rule swp52
cumulus@switch:~$ netq add notification filter swp52Drop severity error rule swp52 before bgpSpine
Successfully added/updated filter swp52Drop
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack info webhook:https://hooks.s
lack.com/services/text/
moretext/evenmoretext
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
evpnVni vni 42
svcStatus new_status down
swp52 port swp52
sysconf message_type configdiff
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop 1 error NetqDefaultChann swp52
el
bgpSpine 2 info pd-netq-events bgpHostnam
e
vni42 3 error pd-netq-events evpnVni
configChange 4 info slk-netq-events sysconf
svcDown 5 error slk-netq-events svcStatus
Create a Notification for a Given Device that Has a Tendency to Overheat (Using Multiple Rules)
This example creates a notification when switch leaf04 has passed over the high temperature threshold. Two rules were necessary to create this notification, one to identify the specific device and one to identify the temperature trigger. NetQ then sends the message to the pd-netq-events channel.
cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
Successfully added/updated channel pd-netq-events
cumulus@switch:~$ netq add notification rule switchLeaf04 key hostname value leaf04
Successfully added/updated rule switchLeaf04
cumulus@switch:~$ netq add notification rule overTemp key new_s_crit value 24
Successfully added/updated rule overTemp
cumulus@switch:~$ netq add notification filter critTemp rule switchLeaf04 channel pd-netq-events
Successfully added/updated filter critTemp
cumulus@switch:~$ netq add notification filter critTemp severity critical rule overTemp channel pd-netq-events
Successfully added/updated filter critTemp
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events pagerduty info integration-key: 1234567
890
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
evpnVni vni 42
overTemp new_s_crit 24
svcStatus new_status down
switchLeaf04 hostname leaf04
swp52 port swp52
sysconf message_type configdiff
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name Order Severity Channels Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop 1 error NetqDefaultChann swp52
el
bgpSpine 2 info pd-netq-events bgpHostnam
e
vni42 3 error pd-netq-events evpnVni
configChange 4 info slk-netq-events sysconf
svcDown 5 error slk-netq-events svcStatus
critTemp 6 error pd-netq-events switchLeaf
04
overTemp
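The effect of attaching two rules to one filter can be sketched as a logical AND. The following Python snippet is a simplified model, not NetQ code: filter_matches is a hypothetical helper, the events are made up, and the assumption is that a filter with several rules matches only when every rule matches.

```python
def filter_matches(event: dict, rules: dict) -> bool:
    """Return True only if every rule key is present with the expected value."""
    return all(key in event and str(event[key]) == value
               for key, value in rules.items())

# Key-value pairs of the switchLeaf04 and overTemp rules from this example.
CRIT_TEMP_RULES = {"hostname": "leaf04", "new_s_crit": "24"}

print(filter_matches({"hostname": "leaf04", "new_s_crit": 24}, CRIT_TEMP_RULES))  # both rules match
print(filter_matches({"hostname": "leaf03", "new_s_crit": 24}, CRIT_TEMP_RULES))  # wrong switch
print(filter_matches({"hostname": "leaf04"}, CRIT_TEMP_RULES))                    # no temperature key
```

Under this model, only the first event would be sent to the pd-netq-events channel.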
Manage NetQ Event Notification Integrations
You might need to modify event notification configurations at some point in the lifecycle of your deployment. You can add channels, rules, filters, and a proxy at any time. You can remove channels, rules, and filters if they are not part of an existing notification configuration.
Remove an Event Notification Channel
You can remove channels if they are not part of an existing notification configuration.
To remove notification channels:
Expand the Menu and select Notification channels.
Select the tab for the type of channel you want to remove.
Select one or more channels.
Click Delete.
To remove notification channels, run:
netq del notification channel <text-channel-name-anchor>
This example removes a Slack integration and verifies it is no longer in the configuration:
cumulus@switch:~$ netq del notification channel slk-netq-events
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name Type Severity Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events pagerduty info integration-key: 1234567
890
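Because a channel can only be removed when no notification configuration uses it, it can help to check which filters still reference the channel before deleting it. The following Python sketch shows one way to do that check. The filter records are a hypothetical in-memory copy of the filter list, and channel_in_use is not a NetQ function.

```python
from typing import List

def channel_in_use(channel: str, filters: List[dict]) -> List[str]:
    """Return the names of the filters that still reference the channel."""
    return [flt["name"] for flt in filters if channel in flt["channels"]]

# Hypothetical records mirroring the filter list from the earlier examples.
filters = [
    {"name": "bgpSpine", "channels": ["pd-netq-events"]},
    {"name": "configChange", "channels": ["slk-netq-events"]},
    {"name": "svcDown", "channels": ["slk-netq-events"]},
]

blocking = channel_in_use("slk-netq-events", filters)
print(blocking)  # filters to delete or repoint before removing the channel
```

An empty list means that no filter references the channel.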
Delete an Event Notification Rule
You might find after some experience with a given rule that you want to edit or remove the rule to better meet your needs. You can remove rules if they are not part of an existing notification configuration using the NetQ CLI.
To remove notification rules, run:
netq del notification rule <text-rule-name-anchor>
This example removes a rule named swp52 and verifies it is no longer in the configuration:
cumulus@switch:~$ netq del notification rule swp52
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name Rule Key Rule Value
--------------- ---------------- --------------------
bgpHostname hostname spine-01
evpnVni vni 42
overTemp new_s_crit 24
svcStatus new_status down
switchLeaf04 hostname leaf04
sysconf message_type configdiff
Delete an Event Notification Filter
To delete notification filters, run:
netq del notification filter <text-filter-name-anchor>
Delete an Event Notification Proxy
You can remove the proxy server by running the netq del notification proxy command. This changes the NetQ behavior to send events directly to the notification channels.
Monitor Container Environments Using Kubernetes API Server
The NetQ Agent monitors many aspects of containers on your network by integrating with the Kubernetes API server. In particular, the NetQ Agent tracks:
Identity: Every container’s IP and MAC address, name, image, and more. NetQ can locate containers across the fabric based on a container’s name, image, IP or MAC address, and protocol and port pair.
Port mapping on a network: Protocol and ports exposed by a container. NetQ can identify containers exposing a specific protocol and port pair on a network.
Connectivity: Information about network connectivity for a container, including adjacency and identifying a top of rack switch’s effects on containers.
This topic assumes a reasonable familiarity with Kubernetes terminology and architecture.
Use NetQ with Kubernetes Clusters
The NetQ Agent interfaces with the Kubernetes API server and listens to Kubernetes events. The NetQ Agent monitors network identity and physical network connectivity of Kubernetes resources like pods, daemon sets, services, and so forth. NetQ works with any container network interface (CNI), such as Calico or Flannel.
The NetQ Kubernetes integration enables network administrators to:
Identify and locate pods, deployments, replica sets, and services deployed within the network using IP, name, label, and so forth.
Track network connectivity of all pods of a service, deployment, and replica set.
Locate what pods have been deployed adjacent to a top of rack (ToR) switch.
Check the impact of a specific ToR switch on a pod, service, replica set, or deployment.
NetQ also helps network administrators identify changes within a Kubernetes cluster and determine if such changes had an adverse effect on the network performance (caused by a noisy neighbor for example). Additionally, NetQ helps the infrastructure administrator determine the distribution of Kubernetes workloads within a network.
Requirements
The NetQ Agent supports Kubernetes version 1.9.2 or later.
Command Summary
A large set of commands is available to monitor Kubernetes configurations, including the ability to monitor clusters, nodes, daemon sets, deployments, pods, replication, and services. Run netq show kubernetes help to view the commands. Refer to the command line reference for additional details.
Enable Kubernetes Monitoring
For Kubernetes monitoring, the NetQ Agent must be installed, running, and enabled on the hosts providing the Kubernetes service.
To enable NetQ Agent monitoring of the containers using the Kubernetes API, you must configure the following on the Kubernetes master node:
Install and configure the NetQ Agent and CLI on the master node.
After waiting for a minute, run the show command to view the cluster:
cumulus@host:~$ netq show kubernetes cluster
Next, you must enable the NetQ Agent on every worker node for complete insight into your container network. Repeat steps 2 and 3 on each worker node.
View Status of Kubernetes Clusters
Run the netq show kubernetes cluster command to view the status of all Kubernetes clusters in the fabric. The following example shows two clusters: one with server11 as the master server and the other with server12 as the master server. Both are healthy and both list their associated worker nodes.
cumulus@host:~$ netq show kubernetes cluster
Matching kube_cluster records:
Master Cluster Name Controller Status Scheduler Status Nodes
------------------------ ---------------- -------------------- ---------------- --------------------
server11:3.0.0.68 default Healthy Healthy server11 server13 se
rver22 server11 serv
er12 server23 server
24
server12:3.0.0.69 default Healthy Healthy server12 server21 se
rver23 server13 serv
er14 server21 server
22
For deployments with multiple clusters, you can use the hostname option to filter the output. This example shows filtering of the list by server11:
cumulus@host:~$ netq server11 show kubernetes cluster
Matching kube_cluster records:
Master Cluster Name Controller Status Scheduler Status Nodes
------------------------ ---------------- -------------------- ---------------- --------------------
server11:3.0.0.68 default Healthy Healthy server11 server13 se
rver22 server11 serv
er12 server23 server
24
View Changes to a Cluster
If data collection from the NetQ Agents is not occurring as it did previously, verify that no changes were made to the Kubernetes cluster configuration. Use the around option to check for changes. Be sure to include the unit of measure with the around value. Valid units include:
w: weeks
d: days
h: hours
m: minutes
s: seconds
now
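The unit list above can be sketched as a small conversion routine. This is a minimal illustration, not part of NetQ: the helper name around_to_seconds is hypothetical, and treating now as a zero offset is an assumption.

```python
import re

# Seconds per unit, following the unit list above.
UNIT_SECONDS = {"w": 7 * 24 * 3600, "d": 24 * 3600, "h": 3600, "m": 60, "s": 1}


def around_to_seconds(value: str) -> int:
    """Convert an around value such as '1h' or '30m' to seconds (hypothetical helper)."""
    if value == "now":
        return 0  # assumption: 'now' means no offset from the present
    match = re.fullmatch(r"(\d+)([wdhms])", value)
    if match is None:
        # A bare number such as '10' is rejected because the unit is required.
        raise ValueError(f"missing or unknown unit of measure in {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


print(around_to_seconds("1h"))  # 3600
```

A value without a unit raises an error in this sketch, which mirrors the requirement to include the unit of measure.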
This example shows changes made to the cluster in the last hour, including the addition of the two master nodes and the various worker nodes for each cluster.
cumulus@host:~$ netq show kubernetes cluster around 1h
Matching kube_cluster records:
Master Cluster Name Controller Status Scheduler Status Nodes DBState Last changed
------------------------ ---------------- -------------------- ---------------- ---------------------------------------- -------- -------------------------
server11:3.0.0.68 default Healthy Healthy server11 server13 server22 server11 serv Add Fri Feb 8 01:50:50 2019
er12 server23 server24
server12:3.0.0.69 default Healthy Healthy server12 server21 server23 server13 serv Add Fri Feb 8 01:50:50 2019
er14 server21 server22
server12:3.0.0.69 default Healthy Healthy server12 server21 server23 server13 Add Fri Feb 8 01:50:50 2019
server11:3.0.0.68 default Healthy Healthy server11 Add Fri Feb 8 01:50:50 2019
server12:3.0.0.69 default Healthy Healthy server12 Add Fri Feb 8 01:50:50 2019
View Kubernetes Pod Information
You can show configuration and status of the pods in a cluster, including the names, labels, addresses, associated cluster and containers, and whether the pod is running. This example shows pods for FRR, nginx, Calico, and various Kubernetes components sorted by master node.
You can view detailed information about a node, including its role in the cluster, pod CIDR, and kubelet status. This example shows all the nodes in the cluster with server11 as the master. Note that server11 acts as a worker node along with the other nodes in the cluster: server12, server13, server22, server23, and server24.
To display the kubelet or Docker version, use the components option with the show command. This example lists the kubelet version, a proxy address if used, and the status of the container for server11 master and worker nodes.
To view only the details for a selected node, use the name option with the hostname of that node following the components option:
cumulus@host:~$ netq server11 show kubernetes node components name server13
Matching kube_cluster records:
Master Cluster Name Node Name Kubelet KubeProxy Container Runt
ime
------------------------ ---------------- -------------------- ------------ ------------ ----------------- --------------
server11:3.0.0.68 default server13 v1.9.2 v1.9.2 docker://17.3.2 KubeletReady
View Kubernetes Replica Set on a Node
You can view information about the replica set, including the name, labels, and number of replicas present for each application. This example shows the number of replicas for each application in the server11 cluster:
You can view information about the daemon set running on the node. This example shows that six copies of the cumulus-frr daemon are running on the server11 node:
cumulus@host:~$ netq server11 show kubernetes daemon-set namespace default
Matching kube_daemonset records:
Master Cluster Name Namespace Daemon Set Name Labels Desired Count Ready Count Last Changed
------------------------ ------------ ---------------- ------------------------------ -------------------- ------------- ----------- ----------------
server11:3.0.0.68 default default cumulus-frr k8s-app:cumulus-frr 6 6 14h:25m:37s
View Pods on a Node
You can view information about the pods on the node. The first example shows all pods running nginx in the default namespace for the server11 cluster. The second example shows all pods running any application in the default namespace for the server11 cluster.
cumulus@host:~$ netq server11 show kubernetes pod namespace default label nginx
Matching kube_pod records:
Master Namespace Name IP Node Labels Status Containers Last Changed
------------------------ ------------ -------------------- ---------------- ------------ -------------------- -------- ------------------------ ----------------
server11:3.0.0.68 default nginx-8586cf59-26pj5 10.244.9.193 server24 run:nginx Running nginx:6e2b65070c86 14h:25m:24s
server11:3.0.0.68 default nginx-8586cf59-c82ns 10.244.40.128 server12 run:nginx Running nginx:01b017c26725 14h:25m:24s
server11:3.0.0.68 default nginx-8586cf59-wjwgp 10.244.49.64 server22 run:nginx Running nginx:ed2b4254e328 14h:25m:24s
cumulus@host:~$ netq server11 show kubernetes pod namespace default label app
Matching kube_pod records:
Master Namespace Name IP Node Labels Status Containers Last Changed
------------------------ ------------ -------------------- ---------------- ------------ -------------------- -------- ------------------------ ----------------
server11:3.0.0.68 default httpd-5456469bfd-bq9 10.244.49.65 server22 app:httpd Running httpd:79b7f532be2d 14h:20m:34s
zm
server11:3.0.0.68 default influxdb-6cdb566dd-8 10.244.162.128 server13 app:influx Running influxdb:15dce703cdec 14h:20m:34s
9lwn
View Status of the Replication Controller on a Node
After you create the replicas, you can then view information about the replication controller:
cumulus@host:~$ netq server11 show kubernetes replication-controller
No matching kube_replica records found
View Kubernetes Deployment Information
For each deployment, you can view the number of replicas associated with an application. This example shows information for a deployment of the nginx application:
cumulus@host:~$ netq server11 show kubernetes deployment name nginx
Matching kube_deployment records:
Master Namespace Name Replicas Ready Replicas Labels Last Changed
------------------------ --------------- -------------------- ---------------------------------- -------------- ------------------------------ ----------------
server11:3.0.0.68 default nginx 3 3 run:nginx 14h:27m:20s
Search Using Labels
You can search for information about your Kubernetes clusters using labels. A label search is similar to a “contains” regular expression search. The following example looks for all nodes that contain kube in the replication set name or label:
You can view the connectivity graph of a Kubernetes pod, including its replica set, deployment, or service level. The connectivity graph starts with the server where you deployed the pod, and shows the peer for each server interface. This data appears in a similar manner to the output of the netq trace command, showing the interface name, the outbound port on that interface, and the inbound port on the peer.
This example shows connectivity at the deployment level, where the nginx-8586cf59-wjwgp replica is in a pod on the server22 node. It has four possible communication paths, through interfaces swp1-4, out varying ports to peer interfaces swp7 and swp20 on the torc-21, torc-22, edge01, and edge02 nodes. Similarly, it shows the connections for two additional nginx replicas.
You can show details about the Kubernetes services in a cluster, including service name, labels associated with the service, type of service, associated IP address, an external address if a public service, and ports used. This example shows the services available in the Kubernetes cluster:
You can filter the list to view details about a particular Kubernetes service using the name option, as shown here:
cumulus@host:~$ netq show kubernetes service name calico-etcd
Matching kube_service records:
Master Namespace Service Name Labels Type Cluster IP External IP Ports Last Changed
------------------------ ---------------- -------------------- ------------ ---------- ---------------- ---------------- ----------------------------------- ----------------
server11:3.0.0.68 kube-system calico-etcd k8s-app:cali ClusterIP 10.96.232.136 TCP:6666 2d:13h:48m:10s
co-etcd
server12:3.0.0.69 kube-system calico-etcd k8s-app:cali ClusterIP 10.96.232.136 TCP:6666 2d:13h:49m:3s
co-etcd
View Kubernetes Service Connectivity
To see the connectivity of a given Kubernetes service, include the connectivity option. This example shows the connectivity of the calico-etcd service:
View the Impact of Connectivity Loss for a Service
You can preview the impact on service availability of losing a particular node using the impact option. The output is color coded (not shown in the example below) so you can clearly see the impact: green shows no impact, yellow shows partial impact, and red shows full impact.
cumulus@host:~$ netq server11 show impact kubernetes service name calico-etcd
calico-etcd -- calico-etcd-pfg9r -- server11:swp1:torbond1 -- swp6:hostbond2:torc-11
-- server11:swp2:torbond1 -- swp6:hostbond2:torc-12
-- server11:swp3:NetQBond-2 -- swp16:NetQBond-16:edge01
-- server11:swp4:NetQBond-2 -- swp16:NetQBond-16:edge02
View Kubernetes Cluster Configuration in the Past
You can use the around option to go back in time to check the network status and identify any changes that occurred on the network.
This example shows the current state of the network. Notice there is a node named server23. server23 is there because the node server22 went down and Kubernetes spun up a third replica on a different host to satisfy the deployment requirement.
View the Impact of Connectivity Loss for a Deployment
You can determine the impact on the Kubernetes deployment in the event a host or switch goes down. The output is color coded (not shown in the example below) so you can clearly see the impact: green shows no impact, yellow shows partial impact, and red shows full impact.
If you need to perform maintenance on the Kubernetes cluster itself, use the following commands to bring the cluster down and then back up.
Display the list of all the nodes in the Kubernetes cluster:
cumulus@host:~$ kubectl get nodes
Tell Kubernetes to drain the node so that the pods running on it are gracefully scheduled elsewhere:
cumulus@host:~$ kubectl drain <node name>
After the maintenance window is over, put the node back into the cluster so that Kubernetes can start scheduling pods on it again:
cumulus@host:~$ kubectl uncordon <node name>
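The maintenance steps above can be sketched as a helper that builds the three kubectl command lines for a given node. This is an illustration only: the function name and the ignore_daemonsets parameter are hypothetical, and the sketch builds the commands without running them.

```python
def maintenance_commands(node_name, ignore_daemonsets=False):
    """Build the kubectl argument lists for one node's maintenance window.

    Hypothetical helper: it only assembles the commands shown above and
    leaves running them (for example with subprocess.run) to the caller.
    """
    drain = ["kubectl", "drain", node_name]
    if ignore_daemonsets:
        # Assumption: the node runs pods managed by a daemon set.
        drain.append("--ignore-daemonsets")
    return {
        "list": ["kubectl", "get", "nodes"],
        "drain": drain,
        "uncordon": ["kubectl", "uncordon", node_name],
    }


commands = maintenance_commands("server22")
print(" ".join(commands["drain"]))  # kubectl drain server22
```

Keeping the drain and uncordon commands separate reflects that the maintenance work happens between them. Depending on the cluster, kubectl drain might decline to proceed on a node that runs daemon set pods unless --ignore-daemonsets is passed, which is why the sketch exposes it as an option.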
Configure Threshold-Crossing Event Notifications
Threshold-crossing events are user-defined events that detect and prevent network failures for ACL resources, digital optics, forwarding resources, interface errors and statistics, link flaps, resource utilization, and sensor events. You can find a complete list in the Threshold-crossing Events Reference.
A notification configuration must contain one rule. Each rule must contain a scope and a threshold. If you want to deliver events to one or more notification channels (for example, email or Slack), create them by following the instructions in Create a Channel, and then return here to define your rule.
If a rule is not associated with a channel, the event information is only reachable from the database.
Define a Scope
Scope parameters filter the events that a given rule generates. You can filter every rule by hostname; some rules can also be filtered by interface or by event-specific parameters.
Select Scope Parameters
For each event type, you can filter rules according to the following parameters:
Event ID | Scope Parameters
TCA_TCAM_IN_ACL_V4_FILTER_UPPER | Hostname
TCA_TCAM_EG_ACL_V4_FILTER_UPPER | Hostname
TCA_TCAM_IN_ACL_V4_MANGLE_UPPER | Hostname
TCA_TCAM_EG_ACL_V4_MANGLE_UPPER | Hostname
TCA_TCAM_IN_ACL_V6_FILTER_UPPER | Hostname
TCA_TCAM_EG_ACL_V6_FILTER_UPPER | Hostname
TCA_TCAM_IN_ACL_V6_MANGLE_UPPER | Hostname
TCA_TCAM_EG_ACL_V6_MANGLE_UPPER | Hostname
TCA_TCAM_IN_ACL_8021x_FILTER_UPPER | Hostname
TCA_TCAM_ACL_L4_PORT_CHECKERS_UPPER | Hostname
TCA_TCAM_ACL_REGIONS_UPPER | Hostname
TCA_TCAM_IN_ACL_MIRROR_UPPER | Hostname
TCA_TCAM_ACL_18B_RULES_UPPER | Hostname
TCA_TCAM_ACL_32B_RULES_UPPER | Hostname
TCA_TCAM_ACL_54B_RULES_UPPER | Hostname
TCA_TCAM_IN_PBR_V4_FILTER_UPPER | Hostname
TCA_TCAM_IN_PBR_V6_FILTER_UPPER | Hostname
Event ID | Scope Parameters
TCA_DOM_RX_POWER_ALARM_UPPER | Hostname, Interface
TCA_DOM_RX_POWER_ALARM_LOWER | Hostname, Interface
TCA_DOM_RX_POWER_WARNING_UPPER | Hostname, Interface
TCA_DOM_RX_POWER_WARNING_LOWER | Hostname, Interface
TCA_DOM_BIAS_CURRENT_ALARM_UPPER | Hostname, Interface
TCA_DOM_BIAS_CURRENT_ALARM_LOWER | Hostname, Interface
TCA_DOM_BIAS_CURRENT_WARNING_UPPER | Hostname, Interface
TCA_DOM_BIAS_CURRENT_WARNING_LOWER | Hostname, Interface
TCA_DOM_OUTPUT_POWER_ALARM_UPPER | Hostname, Interface
TCA_DOM_OUTPUT_POWER_ALARM_LOWER | Hostname, Interface
TCA_DOM_OUTPUT_POWER_WARNING_UPPER | Hostname, Interface
TCA_DOM_OUTPUT_POWER_WARNING_LOWER | Hostname, Interface
TCA_DOM_MODULE_TEMPERATURE_ALARM_UPPER | Hostname, Interface
TCA_DOM_MODULE_TEMPERATURE_ALARM_LOWER | Hostname, Interface
TCA_DOM_MODULE_TEMPERATURE_WARNING_UPPER | Hostname, Interface
TCA_DOM_MODULE_TEMPERATURE_WARNING_LOWER | Hostname, Interface
TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER | Hostname, Interface
TCA_DOM_MODULE_VOLTAGE_ALARM_LOWER | Hostname, Interface
TCA_DOM_MODULE_VOLTAGE_WARNING_UPPER | Hostname, Interface
TCA_DOM_MODULE_VOLTAGE_WARNING_LOWER | Hostname, Interface
Event ID | Scope Parameters
TCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPER | Hostname
TCA_TCAM_TOTAL_MCAST_ROUTES_UPPER | Hostname
TCA_TCAM_MAC_ENTRIES_UPPER | Hostname
TCA_TCAM_ECMP_NEXTHOPS_UPPER | Hostname
TCA_TCAM_IPV4_ROUTE_UPPER | Hostname
TCA_TCAM_IPV4_HOST_UPPER | Hostname
TCA_TCAM_IPV6_ROUTE_UPPER | Hostname
TCA_TCAM_IPV6_HOST_UPPER | Hostname
Event ID | Scope Parameters
TCA_HW_IF_OVERSIZE_ERRORS | Hostname, Interface
TCA_HW_IF_UNDERSIZE_ERRORS | Hostname, Interface
TCA_HW_IF_ALIGNMENT_ERRORS | Hostname, Interface
TCA_HW_IF_JABBER_ERRORS | Hostname, Interface
TCA_HW_IF_SYMBOL_ERRORS | Hostname, Interface
Event ID | Scope Parameters
TCA_RXBROADCAST_UPPER | Hostname, Interface
TCA_RXBYTES_UPPER | Hostname, Interface
TCA_RXMULTICAST_UPPER | Hostname, Interface
TCA_TXBROADCAST_UPPER | Hostname, Interface
TCA_TXBYTES_UPPER | Hostname, Interface
TCA_TXMULTICAST_UPPER | Hostname, Interface
Event ID | Scope Parameters
TCA_LINK | Hostname, Interface
Event ID | Scope Parameters
TCA_CPU_UTILIZATION_UPPER | Hostname
TCA_DISK_UTILIZATION_UPPER | Hostname
TCA_MEMORY_UTILIZATION_UPPER | Hostname
Event ID | Scope Parameters
Tx CNP Unicast No Buffer Discard | Hostname, Interface
Rx RoCE PFC Pause Duration | Hostname
Rx RoCE PG Usage Cells | Hostname, Interface
Tx RoCE TC Usage Cells | Hostname, Interface
Rx RoCE No Buffer Discard | Hostname, Interface
Tx RoCE PFC Pause Duration | Hostname, Interface
Tx CNP Buffer Usage Cells | Hostname, Interface
Tx ECN Marked Packets | Hostname, Interface
Tx RoCE PFC Pause Packets | Hostname, Interface
Rx CNP No Buffer Discard | Hostname, Interface
Rx CNP PG Usage Cells | Hostname, Interface
Tx CNP TC Usage Cells | Hostname, Interface
Rx RoCE Buffer Usage Cells | Hostname, Interface
Tx RoCE Unicast No Buffer Discard | Hostname, Interface
Rx CNP Buffer Usage Cells | Hostname, Interface
Rx RoCE PFC Pause Packets | Hostname, Interface
Tx RoCE Buffer Usage Cells | Hostname, Interface
Event ID | Scope Parameters
TCA_SENSOR_FAN_UPPER | Hostname, Sensor Name
TCA_SENSOR_POWER_UPPER | Hostname, Sensor Name
TCA_SENSOR_TEMPERATURE_UPPER | Hostname, Sensor Name
TCA_SENSOR_VOLTAGE_UPPER | Hostname, Sensor Name
Event ID | Scope Parameters
TCA_WJH_DROP_AGG_UPPER | Hostname, Reason
TCA_WJH_ACL_DROP_AGG_UPPER | Hostname, Reason, Ingress port
TCA_WJH_BUFFER_DROP_AGG_UPPER | Hostname, Reason
TCA_WJH_SYMBOL_ERROR_UPPER | Hostname, Port down reason
TCA_WJH_CRC_ERROR_UPPER | Hostname, Port down reason
Specify the Scope
A rule’s scope can include all monitored devices or a subset. You define scopes as regular expressions, which is how they appear in NetQ. Each event has a set of attributes you can use to apply the rule to a subset of all devices. The definition and display differ slightly between the NetQ UI and the NetQ CLI, but the results are the same.
You define the scope in the Choose Attributes step when creating an event rule. You can choose to apply the rule to all devices or narrow the scope using attributes. If you choose to narrow the scope, but then do not enter any values for the available attributes, the result is all devices and attributes.
Scopes appear in threshold-crossing rule cards using the following format: Attribute, Operation, Value.
In this example, three attributes are available. For one or more of these attributes, select the operation (equals or starts with) and enter a value. For drop reasons, click in the value field to open a list of reasons, and select one from the list.
Note that you should leave the drop type attribute blank.
Create rule to show events from a … | Attribute | Operation | Value
Single device | hostname | Equals | <hostname> such as spine01
Single interface | ifname | Equals | <interface-name> such as swp6
Single sensor | s_name | Equals | <sensor-name> such as fan2
Single WJH drop reason | reason or port_down_reason | Equals | <drop-reason> such as WRED
Single WJH ingress port | ingress_port | Equals | <port-name> such as 47
Set of devices | hostname | Starts with | <partial-hostname> such as leaf
Set of interfaces | ifname | Starts with | <partial-interface-name> such as swp or eth
Set of sensors | s_name | Starts with | <partial-sensor-name> such as fan, temp, or psu
Refer to WJH Event Messages Reference for WJH drop types and reasons. Leaving an attribute value blank defaults to all: all hostnames, interfaces, sensors, forwarding resources, ACL resources, and so forth.
Each attribute is displayed on the rule card as a regular expression equivalent to your choices above:
Equals is displayed as an equals sign (=)
Starts with is displayed as a caret (^)
Blank (all) is displayed as an asterisk (*)
Scopes are defined with regular expressions. When more than one scoping parameter is available, they must be separated by a comma (without spaces), and all parameters must be defined in order. When an asterisk (*) is used alone, it must be entered inside either single or double quotes. Single quotes are used here.
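The formatting rules above (comma-separated values, no spaces, every parameter present and in order, quoted asterisk) can be sketched as a small parser. This is an illustration rather than NetQ's own validation: the parse_scope function is hypothetical, and the mapping lists only three event IDs from the tables above with their attribute names (hostname, ifname, s_name).

```python
# Scope parameter order for a few event IDs, taken from the tables above.
SCOPE_PARAMETERS = {
    "TCA_CPU_UTILIZATION_UPPER": ["hostname"],
    "TCA_TXBYTES_UPPER": ["hostname", "ifname"],
    "TCA_SENSOR_TEMPERATURE_UPPER": ["hostname", "s_name"],
}


def parse_scope(event_id, scope):
    """Split a CLI-style scope string into named parameters (hypothetical helper)."""
    names = SCOPE_PARAMETERS[event_id]
    if " " in scope:
        raise ValueError("scope parameters are separated by commas without spaces")
    # Remove the quotes that protect a lone asterisk, for example '*'.
    values = [value.strip("'\"") for value in scope.split(",")]
    if len(values) != len(names):
        raise ValueError(f"{event_id} expects {len(names)} scope parameter(s), in order")
    return dict(zip(names, values))


print(parse_scope("TCA_TXBYTES_UPPER", "leaf12,'*'"))
# {'hostname': 'leaf12', 'ifname': '*'}
```

The resulting dictionary resembles the JSON form of the scope that netq show tca displays, such as {"hostname":"leaf*","ifname":"*"}.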
The single hostname scope parameter is used by the ACL resources, forwarding resources, and resource utilization events.
Scope Value | Example | Result
<hostname> | leaf01 | Deliver events for the specified device
<partial-hostname>* | leaf* | Deliver events for devices with hostnames starting with specified text (leaf)
The hostname and interface scope parameters are used by the digital optics, interface errors, interface statistics, and link flaps events.
Scope Value | Example | Result
<hostname>,<interface> | leaf01,swp9 | Deliver events for the specified interface (swp9) on the specified device (leaf01)
<hostname>,'*' | leaf01,'*' | Deliver events for all interfaces on the specified device (leaf01)
'*',<interface> | '*',swp9 | Deliver events for the specified interface (swp9) on all devices
<partial-hostname>*,<interface> | leaf*,swp9 | Deliver events for the specified interface (swp9) on all devices with hostnames starting with the specified text (leaf)
<hostname>,<partial-interface>* | leaf01,swp* | Deliver events for all interfaces with names starting with the specified text (swp) on the specified device (leaf01)
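The hostname and interface examples above can be checked with a short matching sketch. It is a simplified reading of those examples using Python's fnmatch module, not NetQ's actual matching logic, and the scope_matches function is hypothetical.

```python
from fnmatch import fnmatchcase


def scope_matches(scope, hostname, interface):
    """Return True if an event's hostname and interface fall inside the scope.

    Assumption: a trailing asterisk means "starts with" and a lone asterisk
    means "all", as in the examples above.
    """
    host_pattern, if_pattern = (part.strip("'\"") for part in scope.split(","))
    return fnmatchcase(hostname, host_pattern) and fnmatchcase(interface, if_pattern)


print(scope_matches("leaf*,swp9", "leaf01", "swp9"))   # True
print(scope_matches("leaf*,swp9", "spine01", "swp9"))  # False
```

The same approach would extend to the hostname and sensor name pair by substituting the sensor name for the interface.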
The hostname and sensor name scope parameters are used by the sensor events.
Scope Value | Example | Result
<hostname>,<sensorname> | leaf01,fan1 | Deliver events for the specified sensor (fan1) on the specified device (leaf01)
'*',<sensorname> | '*',fan1 | Deliver events for the specified sensor (fan1) for all devices
<hostname>,'*' | leaf01,'*' | Deliver events for all sensors on the specified device (leaf01)
<partial-hostname>*,<sensorname> | leaf*,fan1 | Deliver events for the specified sensor (fan1) on all devices with hostnames starting with the specified text (leaf)
<hostname>,<partial-sensorname>* | leaf01,fan* | Deliver events for all sensors with names starting with the specified text (fan) on the specified device (leaf01)
The hostname, reason/port down reason, ingress port, and drop type scope parameters are used by the What Just Happened events.
Scope Value
Example
Result
<hostname>,<reason>,<ingress_port>,<drop_type>
leaf01,ingress-port-acl,'*','*'
Deliver WJH events for all ports on the specified device (leaf01) with the specified reason triggered (ingress-port-acl exceeded the threshold)
'*',<reason>,'*'
'*',tail-drop,'*'
Deliver WJH events for the specified reason (tail-drop) for all devices
Deliver WJH events for the specified reason (calibration-failure) on all devices with hostnames starting with the specified text (leaf)
<hostname>,<partial-reason>*,<drop_type>
leaf01,blackhole,'*'
Deliver WJH events for reasons starting with the specified text (blackhole [route]) on the specified device (leaf01)
Create a Threshold-crossing Rule
Click Menu and navigate to Threshold crossing rules.
Select the tab that reflects the event type for the rule.
Click Create a rule. Enter a name for the rule and assign a severity, then click Next.
Select the attribute you want to monitor. The listed attributes change depending on the type of event you chose in the previous step.
Click Next.
On the Set threshold step, enter a threshold value.
For digital optics, you can choose to use the thresholds defined by the optics vendor (default) or specify your own.
Define the scope of the rule.
If you want to restrict the rule based on a particular parameter, enter values for one or more of the available attributes. For What Just Happened rules, select a reason from the available list.
If you want the rule to apply across the network, select the Apply rule to entire network toggle.
Click Next.
(Optional) Select a notification channel where you want the events to be sent.
Only previously created channels are available for selection. If no channel is available or selected, the notifications can only be retrieved from the database. You can add a channel at a later time and then add it to the rule.
Click Finish. The rules may take several minutes to appear in the UI.
The simplest configuration you can create is one that sends a TCA event generated by all devices and all interfaces to a single notification application. Use the netq add tca command to configure the event. Its syntax is:
Note that the event ID is case sensitive and must be in all uppercase.
For example, this rule tells NetQ to deliver an event notification to the tca_slack_ifstats pre-configured Slack channel when the CPU utilization exceeds 95% of its capacity on any monitored switch:
This rule tells NetQ to deliver an event notification to the tca_pd_ifstats PagerDuty channel when the number of transmit bytes per second (Bps) on the leaf12 switch exceeds 20,000 Bps on any interface:
This rule tells NetQ to deliver an event notification to the syslog-netq syslog channel when the temperature on sensor temp1 on the leaf12 switch exceeds 32 degrees Celsius:
This rule tells NetQ to deliver an event notification to the tca-slack channel when the total number of ACL drops on the leaf04 switch exceeds 20,000 for any reason, ingress port, or drop type.
For a Slack channel, the event messages should be similar to this:
Set the Severity of a Threshold-crossing Event
In addition to defining a scope for a TCA rule, you can also set a severity of either info or error. To add a severity to a rule, use the severity option.
For example, if you want to add an error severity to the CPU utilization rule you created earlier:
Digital optics have the additional option of applying user- or vendor-defined thresholds, using the threshold_type and threshold options.
This example shows how to send an error to channel ch1 when the upper threshold for module voltage exceeds the vendor-defined thresholds for interface swp31 on the mlx-2700-04 switch.
This example shows how to send an error to channel ch1 when the upper threshold for module voltage exceeds the user-defined threshold of 3V for interface swp31 on the mlx-2700-04 switch.
Now you have four rules created (the original one, plus these three new ones) all based on the TCA_SENSOR_TEMPERATURE_UPPER event. To identify the various rules, NetQ automatically generates a TCA name for each rule. As you create each rule, NetQ adds an _# to the event name. The TCA Name for the first rule created is then TCA_SENSOR_TEMPERATURE_UPPER_1, the second rule created for this event is TCA_SENSOR_TEMPERATURE_UPPER_2, and so forth.
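The naming scheme described above can be sketched as a per-event counter. The TcaNamer class is hypothetical and only illustrates how the _# suffix increments for each event ID; it does not reflect how NetQ stores rules internally.

```python
from collections import defaultdict


class TcaNamer:
    """Generate TCA names by appending a per-event counter (illustrative only)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def next_name(self, event_id):
        # Each event ID keeps its own counter, starting at 1.
        self._counts[event_id] += 1
        return f"{event_id}_{self._counts[event_id]}"


namer = TcaNamer()
print(namer.next_name("TCA_SENSOR_TEMPERATURE_UPPER"))  # TCA_SENSOR_TEMPERATURE_UPPER_1
print(namer.next_name("TCA_SENSOR_TEMPERATURE_UPPER"))  # TCA_SENSOR_TEMPERATURE_UPPER_2
```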
Manage Threshold-crossing Event Notifications
View Threshold-crossing Rules
Click Menu and navigate to Threshold crossing rules.
Select the relevant tab. The UI displays each rule and its parameters as a card.
After creating a rule, you can use the filters that appear above the rule cards to filter by status, severity, channel, and/or events.
To view TCA rules, run:
netq show tca [tca_id <text-tca-id-anchor>] [json]
This example displays all TCA rules:
cumulus@switch:~$ netq show tca
Matching config_tca records:
TCA Name Event Name Scope Severity Channel/s Active Threshold Unit Threshold Type Suppress Until
---------------------------- -------------------- -------------------------- -------- ------------------ ------ ------------------ -------- -------------- ----------------------------
TCA_CPU_UTILIZATION_UPPER_1 TCA_CPU_UTILIZATION_ {"hostname":"leaf01"} info pd-netq-events,slk True 87 % user_set Fri Oct 9 15:39:35 2020
UPPER -netq-events
TCA_CPU_UTILIZATION_UPPER_2 TCA_CPU_UTILIZATION_ {"hostname":"*"} error slk-netq-events True 93 % user_set Fri Oct 9 15:39:56 2020
UPPER
TCA_DOM_BIAS_CURRENT_ALARM_U TCA_DOM_BIAS_CURRENT {"hostname":"leaf*","ifnam error slk-netq-events True 0 mA vendor_set Fri Oct 9 16:02:37 2020
PPER_1 _ALARM_UPPER e":"*"}
TCA_DOM_RX_POWER_ALARM_UPPER TCA_DOM_RX_POWER_ALA {"hostname":"*","ifname":" info slk-netq-events True 0 mW vendor_set Fri Oct 9 15:25:26 2020
_1 RM_UPPER *"}
TCA_SENSOR_TEMPERATURE_UPPER TCA_SENSOR_TEMPERATU {"hostname":"leaf","s_name error slk-netq-events True 32 degreeC user_set Fri Oct 9 15:40:18 2020
_1 RE_UPPER ":"temp1"}
TCA_TCAM_IPV4_ROUTE_UPPER_1 TCA_TCAM_IPV4_ROUTE_ {"hostname":"*"} error pd-netq-events True 20000 % user_set Fri Oct 9 16:13:39 2020
UPPER
This example displays a specific TCA rule:
cumulus@switch:~$ netq show tca tca_id TCA_TXMULTICAST_UPPER_1
Matching config_tca records:
TCA Name Event Name Scope Severity Channel/s Active Threshold Suppress Until
---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
TCA_TXMULTICAST_UPPER_1 TCA_TXMULTICAST_UPPE {"ifname":"swp3","hostname info tca-tx-bytes-slack True 0 Sun Dec 8 16:40:14 2269
R ":"leaf01"}
Change the Threshold on a Rule
After receiving notifications based on a rule, you might want to increase or decrease the threshold value to limit or increase the number of events you receive.
To modify the threshold:
Locate the rule you want to modify and hover over the top of the card.
Click Edit.
Enter a new threshold value, then select Update rule.
After receiving notifications based on a rule, you might find that you want to narrow or widen the scope value to limit or increase the number of events you receive.
To modify the scope:
Locate the rule you want to modify and hover over the top of the card.
Click Edit.
Select the toggle to either apply the rule to the entire network or individual hosts.
This example changes the scope for the rule TCA_CPU_UTILIZATION_UPPER to apply only to switches with hostnames beginning with leaf. You must also provide a threshold value. This example uses a value of 95 percent. Note that this overwrites the existing scope and threshold values.
cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope hostname^leaf threshold 95
Successfully added/updated tca
cumulus@switch:~$ netq show tca
Matching config_tca records:
TCA Name Event Name Scope Severity Channel/s Active Threshold Suppress Until
---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
TCA_CPU_UTILIZATION_UPPER_1 TCA_CPU_UTILIZATION_ {"hostname":"*"} error onprem-email True 93 Mon Aug 31 20:59:57 2020
UPPER
TCA_CPU_UTILIZATION_UPPER_2 TCA_CPU_UTILIZATION_ {"hostname":"hostname^leaf info True 95 Tue Sep 1 18:47:24 2020
UPPER "}
Change, Add, or Remove Channels
Locate the rule you want to modify and hover over the top of the card.
You cannot change the name of a threshold-crossing rule using the NetQ CLI because the rules do not have names. They receive identifiers (the tca_id) automatically. In the NetQ UI, to change a rule name, you must delete the rule and re-create it with the new name.
Change the Severity of a Rule
Threshold-crossing rules are categorized as either info or error.
In the NetQ UI, you must delete the rule and re-create it, specifying the new severity.
In the NetQ CLI, to change the severity, run:
netq add tca tca_id <text-tca-id-anchor> (severity info | severity error)
This example changes the severity of the maximum CPU utilization 1 rule from error to info:
During troubleshooting or switch maintenance, you might want to suppress a rule to prevent erroneous or excessive notifications. This effectively pauses notifications for a specified time period.
Locate the rule you want to disable and click Disable.
Select the Date/Time field to set when you want the rule to be reenabled.
Click Disable.
Note the changes in the card:
The state changes to Snoozed
The Suppressed field displays the date and time at which the rule will be reenabled.
The Disable button changes to Disable forever.
Using the suppress_until option allows you to prevent the rule from being applied for a designated amount of time (in seconds). When this time has passed, the rule is automatically reenabled.
To reenable the rule, set the is_active option to true.
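Because the suppression period is expressed in seconds, it can help to compute the value from a more familiar duration. The helper below is a hypothetical convenience and only performs the arithmetic.

```python
from datetime import timedelta


def suppress_seconds(hours=0, minutes=0):
    """Convert a maintenance window into a number of seconds."""
    return int(timedelta(hours=hours, minutes=minutes).total_seconds())


print(suppress_seconds(hours=2))              # 7200
print(suppress_seconds(hours=1, minutes=30))  # 5400
```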
Delete a Rule
To delete a rule:
Locate the rule you want to remove and hover over the card.
In the card’s top-right corner, select Delete.
To remove a rule altogether, run:
netq del tca tca_id <text-tca-id-anchor>
This example deletes the maximum receive bytes rule:
cumulus@switch:~$ netq del tca tca_id TCA_RXBYTES_UPPER_1
Successfully deleted TCA TCA_RXBYTES_UPPER_1
Resolve Scope Conflicts
There might be occasions where the scopes defined by multiple threshold-crossing rules overlap. In such cases, NetQ uses the rule with the most specific scope that is still true to generate the event.
To clarify this, consider this example. Three events occurred:
First event on switch leaf01, interface swp1
Second event on switch leaf01, interface swp3
Third event on switch spine01, interface swp1
NetQ attempts to match the hostname and interface name of each threshold-crossing event against three threshold-crossing rules with different scopes:
Scope 1: send events for the swp1 interface on switch leaf01 (very specific)
Scope 2: send events for all interfaces on switches that start with leaf (moderately specific)
Scope 3: send events for all switches and interfaces (very broad)
The result is:
For the first event, NetQ applies the scope from rule 1 because it matches scope 1 exactly
For the second event, NetQ applies the scope from rule 2 because it does not match scope 1, but does match scope 2
For the third event, NetQ applies the scope from rule 3 because it does not match either scope 1 or scope 2
In summary:
Input Event    Scope Parameters      TCA Scope 1    TCA Scope 2    TCA Scope 3    Scope Applied
-------------- --------------------- -------------- -------------- -------------- --------------
leaf01,swp1    Hostname, Interface   leaf01,swp1    leaf*,'*'      '*','*'        Scope 1
leaf01,swp3    Hostname, Interface   leaf01,swp1    leaf*,'*'      '*','*'        Scope 2
spine01,swp1   Hostname, Interface   leaf01,swp1    leaf*,'*'      '*','*'        Scope 3
You can modify threshold-crossing rules to remove conflicts.
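The scope resolution described in this section can be sketched in a few lines of Python. This is a minimal illustration, not NetQ's implementation: the scoring in specificity (exact value beats a prefix wildcard, which beats a match-all wildcard) is an assumption chosen to reproduce the three results in the example, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Rules from the example: (label, hostname pattern, interface pattern).
RULES = [
    ("Scope 1", "leaf01", "swp1"),
    ("Scope 2", "leaf*", "*"),
    ("Scope 3", "*", "*"),
]


def specificity(pattern):
    """Assumed scoring: exact value (2) > prefix wildcard (1) > match-all (0)."""
    if "*" not in pattern:
        return 2
    return 0 if pattern == "*" else 1


def applied_scope(hostname, interface, rules=RULES):
    """Return the label of the most specific rule that matches the event."""
    matches = [
        (specificity(host_pat) + specificity(if_pat), label)
        for label, host_pat, if_pat in rules
        if fnmatchcase(hostname, host_pat) and fnmatchcase(interface, if_pat)
    ]
    return max(matches)[1] if matches else None


for event in [("leaf01", "swp1"), ("leaf01", "swp3"), ("spine01", "swp1")]:
    print(event, "->", applied_scope(*event))
```

Running the sketch against the three input events selects Scope 1, Scope 2, and Scope 3 respectively, which matches the results listed above.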
BGP
Use the UI or CLI to monitor Border Gateway Protocol (BGP) on a networkwide or per-session basis.
BGP Commands
Monitor BGP with the following commands. See the command line reference for additional options, definitions, and examples.
netq show bgp
netq show events message_type bgp
netq show events-config message_type bgp
View BGP in the UI
To add the BGP card to your workbench, navigate to the header and select Add card > Network services > All BGP Sessions card > Open cards. In this example, there are 13 nodes running the BGP protocol, 0 open events (from the last 24 hours), and 10 nodes with unestablished sessions.
Expand to the large card for additional BGP info. By default, the card displays the Sessions summary tab. From here you can see which devices are handling the most BGP sessions, or select the dropdown to view nodes with the most unestablished BGP sessions. You can view BGP-related events by selecting the Events tab.
Expand the BGP card to full-screen to view, filter, or export:
Virtual routing and forwarding (VRF) information
Autonomous system number (ASN) assignments
Peer ASNs
The received address prefix for IPv4/IPv6/EVPN when the session is established
From this table, you can select a row, then click Add card above the table.
NetQ adds a new, BGP ‘single-session’ card to your workbench. From this card, you can view session state changes and compare them with events, and monitor the running BGP configuration and changes to the configuration file.
Before adding a BGP single-session card, verify that both the peer hostname and peer ASN are valid. This ensures the information presented is reliable.
Monitor a Single BGP Session
The BGP single-session card displays the node, its peer, its status (established or unestablished), and its router ID. This information can help you determine the stability of the BGP session between two devices. The heat map indicates the status of the session over the designated time period. In this example, the session has been established throughout the entire time period:
Understanding the Heat Map
On the medium and large single BGP session cards, vertically stacked heat maps represent the status of the sessions: one for established sessions and one for unestablished sessions. The number of time blocks in each map varies with the time period of data on the card. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block indicates the results. If the session was established for the entire time block, the top block is 100% saturated (white) and the unestablished block is 0% saturated (gray). As the unestablished block increases in saturation, the established block is proportionally reduced in saturation. An example heat map for a time period of 24 hours appears here; the table lists the most common time periods and the resulting time blocks.
Time Period   Number of Runs   Number of Time Blocks   Amount of Time in Each Block
------------- ---------------- ----------------------- ----------------------------
6 hours       18               6                       1 hour
12 hours      36               12                      1 hour
24 hours      72               24                      1 hour
1 week        504              7                       1 day
1 month       2,086            30                      1 day
1 quarter     7,000            13                      1 week
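To make the saturation rule concrete, the following Python sketch computes the saturation of the two stacked blocks from the check results in one time block. It assumes a simple linear proportion, which is one reading of "proportionally reduced"; the function names are hypothetical and the NetQ UI might compute saturation differently.

```python
def block_saturation(established, unestablished):
    """Return (established %, unestablished %) saturation for one time block."""
    total = established + unestablished
    if total == 0:
        return (0.0, 0.0)  # no checks ran during this block
    est = 100.0 * established / total
    return (est, 100.0 - est)


def runs_per_block(total_runs, blocks):
    """Average number of checks that feed one time block."""
    return total_runs / blocks


print(block_saturation(3, 0))  # (100.0, 0.0): established for the whole block
print(block_saturation(2, 1))  # mostly established, partly unestablished
print(runs_per_block(72, 24))  # 3.0 checks per 1-hour block in the 24-hour view
```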
View Changes to the BGP Service Configuration File
Each time a change is made to the configuration file for the BGP service, NetQ logs the change and lets you compare it with the previous version. This can be useful when you are troubleshooting potential causes for events or sessions losing their connections.
From the large single-session card, select the BGP Configuration File Evolution tab.
Select the time.
Choose between the File view and the Diff view.
The File view displays the content of the file:
The Diff view highlights the changes between this version (on left) and the most recent version (on right) side by side:
You can monitor both system and threshold-crossing events with the UI or CLI. You can view all events across the entire network or all events on a device, then filter your view of events based on event type, severity, and timeframe.
Note that in the UI, it can take several minutes for NetQ to process and accurately display network events. The delay is caused by events with multiple network dependencies. It takes between 5 and 10 minutes for NetQ to consolidate and display these events.
Monitor All System and TCA Events Networkwide
Click Menu.
In the side navigation under Network, click Events.
The dashboard presents a timeline of events alongside the devices that are causing the most events. You can filter events by type, including interface, network services, system, and threshold crossing events. The filter controls are located at the top of the screen.
If you are receiving too many event notifications, you can create rules to suppress events. Select Show suppression rules in the top-right corner to view rules that prevent NetQ from displaying an event message. Refer to Configure System Event Notifications for information about event suppression.
Events are also generated when streaming validation checks detect a failure. If an event is generated from a failed validation check, it will be marked resolved automatically the next time the check runs successfully.
To view all system and all TCA events, run:
netq show events [between <text-time> and <text-endtime>] [json]
This example shows all system and TCA events between now and an hour ago.
cumulus@switch:~$ netq show events
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 20:04:30 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:55:26 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:34:29 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:25:24 2020
t after allocation greater than chu
nk size 0.57 GB
This example shows all events between now and 24 hours ago.
cumulus@switch:~$ netq show events between now and 24hr
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 20:04:30 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:55:26 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:34:29 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:25:24 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:04:22 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:55:17 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:34:21 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:25:16 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:04:19 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 17:55:15 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 17:34:18 2020
t after allocation greater than chu
nk size 0.57 GB
...
Monitor All System and TCA Events on a Device
Click Menu.
In the side navigation under Network, click Events.
At the top of the screen, click the Hostname field and select a device.
Click Apply.
To view all system and TCA events on a switch, run:
netq <hostname> show events [between <text-time> and <text-endtime>] [json]
This example shows all system and TCA events that have occurred on the leaf01 switch between now and an hour ago.
cumulus@switch:~$ netq leaf01 show events
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 20:34:31 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 20:04:30 2020
t after allocation greater than chu
nk size 0.57 GB
This example shows that no events have occurred on the spine01 switch in the last hour.
cumulus@switch:~$ netq spine01 show events
No matching event records found
Monitor System and TCA Events Networkwide by Type
Click Menu.
In the side navigation under Network, click Events.
At the top of the screen, click the Type field and select a network protocol or service.
Click Apply.
To view all system events for a given network protocol or service, run:
Monitor System and TCA Events Networkwide by Severity
System event severities include info, error, warning, and debug. TCA event severities include info and error.
Click Menu.
In the side navigation under Network, click Events.
At the top of the screen, click the Severity field and select a level.
Click Apply.
To view all system events of a given severity, run:
netq show events [severity info | severity error] [between <text-time> and <text-endtime>] [json]
Monitor System and TCA Events on a Device by Severity
Click Menu.
In the side navigation under Network, click Events.
At the top of the screen, click the Hostname field and select a device.
In the same row, click the Severity field and select a level.
Click Apply.
To view all system events for a given severity on a device, run:
netq <hostname> show events [severity info | severity error] [between <text-time> and <text-endtime>] [json]
Monitor System and TCA Events Networkwide by Time
Click Menu.
In the side navigation under Network, click Events.
At the top of the screen, use the first two fields to filter either over a time range or by recent events.
Click Apply.
The NetQ CLI uses a default of one hour unless otherwise specified. To view all system and all TCA events for a time beyond an hour in the past, run:
netq show events [between <text-time> and <text-endtime>] [json]
This example shows all system and TCA events between now and 24 hours ago.
cumulus@switch:~$ netq show events between now and 24hr
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 20:04:30 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:55:26 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:34:29 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:25:24 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 19:04:22 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:55:17 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:34:21 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:25:16 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 18:04:19 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 2 17:55:15 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 2 17:34:18 2020
t after allocation greater than chu
nk size 0.57 GB
...
This example shows all system and TCA events between one and three days ago.
cumulus@switch:~$ netq show events between 1d and 3d
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 9 16:14:37 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 9 16:03:31 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 9 15:44:36 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 9 15:33:30 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 9 15:14:35 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 9 15:03:28 2020
t after allocation greater than chu
nk size 0.57 GB
leaf01 btrfsinfo error data storage efficiency : space lef Wed Sep 9 14:44:34 2020
t after allocation greater than chu
nk size 0.57 GB
leaf02 btrfsinfo error data storage efficiency : space lef Wed Sep 9 14:33:21 2020
t after allocation greater than chu
nk size 0.57 GB
...
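The netq show events forms in the preceding sections share one shape: an optional hostname, then show events, then the optional severity, between, and json keywords. If you script these queries, a small helper can assemble the arguments in that order. The Python sketch below is illustrative: show_events_cmd is a hypothetical helper, it only builds the argument list (it does not run NetQ), and it covers only the options shown in this section.

```python
import shlex


def show_events_cmd(hostname=None, severity=None, between=None, as_json=False):
    """Build a `netq ... show events` argument list from the documented options."""
    args = ["netq"]
    if hostname:
        args.append(hostname)
    args += ["show", "events"]
    if severity:
        if severity not in ("info", "error"):
            raise ValueError("the documented CLI severities are info and error")
        args += ["severity", severity]
    if between:
        start, end = between  # for example ("now", "24hr") or ("1d", "3d")
        args += ["between", start, "and", end]
    if as_json:
        args.append("json")
    return args


print(shlex.join(show_events_cmd(between=("now", "24hr"))))
# netq show events between now and 24hr
print(shlex.join(show_events_cmd("leaf01", severity="error",
                                 between=("1d", "3d"), as_json=True)))
# netq leaf01 show events severity error between 1d and 3d json
```

Returning a list rather than a string keeps the result ready for subprocess.run without shell quoting concerns.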
Configure and Monitor What Just Happened
The What Just Happened (WJH) feature, available on NVIDIA Spectrum switches, streams detailed and contextual telemetry data for analysis. This provides real-time visibility into problems in the network, such as hardware packet drops due to buffer congestion, incorrect routing, and ACL or layer 1 problems.
WJH is only supported on NVIDIA Spectrum switches. WJH latency and congestion monitoring is supported on NVIDIA Spectrum 2 switches and above. WJH requires Cumulus Linux 4.4.0 or later. SONiC only supports collection of WJH data with gNMI.
Using WJH in combination with NetQ helps you identify losses anywhere in the fabric. From a single management console you can:
View any current or historic drop information, including the reason for the drop
Identify problematic flows or endpoints, and pinpoint where communication is failing in the network
By default, Cumulus Linux 4.4.0 and later provides the NetQ Agent and CLI. Depending on the version of Cumulus Linux running on your NVIDIA switch, you might need to upgrade the NetQ Agent and CLI to the latest release:
WJH is enabled by default on NVIDIA switches and Cumulus Linux 4.4.0 requires no configuration; however, you must enable the NetQ Agent to collect the data.
To enable WJH in NetQ on any switch or server:
Configure the NetQ Agent on the NVIDIA switch.
cumulus@switch:~$ sudo netq config add agent wjh
Restart the NetQ Agent to start collecting the WJH data.
cumulus@switch:~$ sudo netq config restart agent
When you finish viewing the WJH metrics, you might want to stop the NetQ Agent from collecting WJH data to reduce network traffic. Use netq config del agent wjh followed by netq config restart agent to disable the WJH feature on the given switch.
Using wjh_dump.py on an NVIDIA platform that is running Cumulus Linux and the NetQ agent causes the NetQ WJH client to stop receiving packet drop callbacks. To prevent this issue, run wjh_dump.py on a different system than the one where the NetQ Agent has WJH enabled, or disable wjh_dump.py and restart the NetQ Agent (run netq config restart agent).
Configure Latency and Congestion Thresholds
WJH latency and congestion metrics depend on threshold settings to trigger the events. WJH measures packet latency as the time spent inside a single system (switch). When you specify thresholds, WJH triggers events when measured values cross the high threshold and suppresses events when values are below the low threshold.
You can specify multiple traffic classes and multiple ports by separating the classes or ports by a comma (no spaces).
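The interaction between the high and low thresholds can be pictured as a trigger with hysteresis. The Python sketch below is one possible reading of that behavior and not the WJH implementation: it assumes an event fires when a value rises above the high threshold and that no further event fires until a value drops below the low threshold. The function names and sample values are hypothetical; the 144-byte cell size is the one this section gives for congestion thresholds.

```python
CELL_BYTES = 144  # cell size given in this section for congestion thresholds


def cells_to_bytes(cells):
    """Convert a congestion threshold expressed in cells to bytes."""
    return cells * CELL_BYTES


def threshold_events(samples, low, high):
    """Yield (index, value) for each sample that raises an event.

    Assumed model: an event fires when a value exceeds `high`; the trigger
    then stays quiet until a value falls below `low` and re-arms it.
    """
    armed = True
    for index, value in enumerate(samples):
        if armed and value > high:
            armed = False
            yield index, value
        elif not armed and value < low:
            armed = True


# Latency samples in microseconds with a 10 usec high and 1 usec low threshold.
latency_usecs = [0.5, 4, 12, 15, 6, 0.8, 11]
print(list(threshold_events(latency_usecs, low=1, high=10)))  # [(2, 12), (6, 11)]
print(cells_to_bytes(200))  # 28800
```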
The following example creates latency thresholds for Class 3 traffic on port swp1 where the upper threshold is 10 usecs and the lower threshold is 1 usec:
This example creates congestion thresholds for Class 4 traffic on port swp1 where the upper threshold is 200 cells and the lower threshold is 10 cells, where a cell is a unit of 144 bytes:
You can filter WJH events by drop type at the NetQ Agent before the NetQ system processes them. You can filter the drop type further by specifying one or more drop reasons or severities. Filter events by creating a NetQ configuration profile in the NetQ UI or using the netq config add agent wjh-drop-filter command in the NetQ CLI.
On the NetQ Configurations card, click Add Config.
Click Enable to enable WJH, then click Customize:
By default, WJH includes all drop reasons and severities. Uncheck any drop reason or severity that should not generate WJH events, then click Done.
Click Add to save the configuration profile, or click Close to discard it.
To configure the NetQ Agent to filter WJH drops, run:
You can view the WJH metrics from the NetQ UI or the NetQ CLI. WJH metrics are visible on the WJH card and the Events card. To view the metrics on the Events card, open the medium-sized card and hover over most-active devices. For a more detailed view, open the WJH card.
Open the What Just Happened card on your workbench:
You can expand the card to see a detailed summary of WJH data:
Expanding the card to its largest size will open the advanced WJH dashboard. You can also access this dashboard by clicking Menu and selecting What Just Happened under the Network column:
Hover over the color-coded chart to view and expand individual WJH event categories:
Click on a category in the chart for a detailed view:
Use the various options to restrict the output accordingly.
This example uses the first form of the command to show drops on switch leaf03 for the past week.
cumulus@switch:~$ netq leaf03 show wjh-drop between now and 7d
Matching wjh records:
Drop type Aggregate Count
------------------ ------------------------------
L1 560
Buffer 224
Router 144
L2 0
ACL 0
Tunnel 0
This example uses the second form of the command to show drops on switch leaf03 for the past week including the drop reasons.
cumulus@switch:~$ netq leaf03 show wjh-drop details between now and 7d
Matching wjh records:
Drop type Aggregate Count Reason
------------------ ------------------------------ ---------------------------------------------
L1 556 None
Buffer 196 WRED
Router 144 Blackhole route
Buffer 14 Packet Latency Threshold Crossed
Buffer 14 Port TC Congestion Threshold
L1 4 Oper down
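The aggregate counts in the first form of the command are the per-reason counts from the details form summed by drop type. The Python sketch below reproduces that relationship using the rows from the details output above; totals_by_drop_type is a hypothetical helper and does not call NetQ.

```python
from collections import Counter

# (drop type, aggregate count, reason) rows copied from the details output above.
DETAIL_ROWS = [
    ("L1", 556, "None"),
    ("Buffer", 196, "WRED"),
    ("Router", 144, "Blackhole route"),
    ("Buffer", 14, "Packet Latency Threshold Crossed"),
    ("Buffer", 14, "Port TC Congestion Threshold"),
    ("L1", 4, "Oper down"),
]


def totals_by_drop_type(rows):
    """Sum the per-reason counts into one total per drop type."""
    totals = Counter()
    for drop_type, count, _reason in rows:
        totals[drop_type] += count
    return totals


# Prints L1 560, Buffer 224, and Router 144, matching the summary output.
for drop_type, total in totals_by_drop_type(DETAIL_ROWS).most_common():
    print(f"{drop_type:<10}{total}")
```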
This example shows the drops seen at layer 2 across the network.
cumulus@mlx-2700-03:mgmt:~$ netq show wjh-drop l2
Matching wjh records:
Hostname Ingress Port Reason Agg Count Src Ip Dst Ip Proto Src Port Dst Port Src Mac Dst Mac First Timestamp Last Timestamp
----------------- ------------------------ --------------------------------------------- ------------------ ---------------- ---------------- ------ ---------------- ---------------- ------------------ ------------------ ------------------------------ ----------------------------
mlx-2700-03 swp1s2 Port loopback filter 10 27.0.0.19 27.0.0.22 0 0 0 00:02:00:00:00:73 0c:ff:ff:ff:ff:ff Mon Dec 16 11:54:15 2019 Mon Dec 16 11:54:15 2019
mlx-2700-03 swp1s2 Source MAC equals destination MAC 10 27.0.0.19 27.0.0.22 0 0 0 00:02:00:00:00:73 00:02:00:00:00:73 Mon Dec 16 11:53:17 2019 Mon Dec 16 11:53:17 2019
mlx-2700-03 swp1s2 Source MAC equals destination MAC 10 0.0.0.0 0.0.0.0 0 0 0 00:02:00:00:00:73 00:02:00:00:00:73 Mon Dec 16 11:40:44 2019 Mon Dec 16 11:40:44 2019
The following two examples include the severity of a drop event (error, warning or notice) for ACLs and routers.
cumulus@switch:~$ netq show wjh-drop acl
Matching wjh records:
Hostname Ingress Port Reason Severity Agg Count Src Ip Dst Ip Proto Src Port Dst Port Src Mac Dst Mac Acl Rule Id Acl Bind Point Acl Name Acl Rule First Timestamp Last Timestamp
----------------- ------------------------ --------------------------------------------- ---------------- ------------------ ---------------- ---------------- ------ ---------------- ---------------- ------------------ ------------------ ---------------------- ---------------------------- ---------------- ---------------- ------------------------------ ----------------------------
leaf01 swp2 Ingress router ACL Error 49 55.0.0.1 55.0.0.2 17 8492 21423 00:32:10:45:76:89 00:ab:05:d4:1b:13 0x0 0 Tue Oct 6 15:29:13 2020 Tue Oct 6 15:29:39 2020
cumulus@switch:~$ netq show wjh-drop router
Matching wjh records:
Hostname Ingress Port Reason Severity Agg Count Src Ip Dst Ip Proto Src Port Dst Port Src Mac Dst Mac First Timestamp Last Timestamp
----------------- ------------------------ --------------------------------------------- ---------------- ------------------ ---------------- ---------------- ------ ---------------- ---------------- ------------------ ------------------ ------------------------------ ----------------------------
leaf01 swp1 Blackhole route Notice 36 46.0.1.2 47.0.2.3 6 1235 43523 00:01:02:03:04:05 00:06:07:08:09:0a Tue Oct 6 15:29:13 2020 Tue Oct 6 15:29:47 2020
DPUs
DPU monitoring is an early access feature.
With the NetQ UI, you can monitor hardware resources of individual data processing units (DPUs), including CPU utilization, disk usage, and memory utilization.
For DPU inventory information, refer to DPU Inventory.
View Overall Health of a DPU
For an overview of the current or past health of DPU hardware resources, open the DPU device card. To open a DPU device card:
Click Devices in the header, then click Open a device card.
Select a DPU from the dropdown.
Click Add. This example shows that the r-netq-bf2-01 DPU has low utilization across CPU, memory, and disks:
View DPU Attributes
For a quick look at the key attributes of a particular DPU, expand the DPU card.
Attributes are displayed as the default tab on the large DPU card. You can view the static information about the DPU, including its hostname, ASIC vendor and model, CPU information, OS version, and agent version.
To view a larger display of hardware resource utilization, select Utilization.
View Installed Packages
To view the list of installed packages on a particular DPU, expand the card to its largest size:
Related Information
To read more about NVIDIA BlueField DPUs and the DOCA Telemetry Service, refer to the DOCA SDK Documentation.
gNMI Streaming
You can use gRPC Network Management Interface (gNMI) to collect system resource, interface, and counter information from Cumulus Linux and export it to your own gNMI client.
Configure the gNMI Agent
The gNMI agent is disabled by default. To enable it, run:
The gNMI agent listens on port 9339. You can change the default port if another application already uses it. The /etc/netq/netq.yml file stores the configuration.
Use the following commands to adjust the settings:
Restart the NetQ agent to incorporate the configuration changes:
cumulus@switch:~$ netq config restart agent
Use the gNMI Agent Only
NVIDIA recommends collecting data with both the gNMI and NetQ agents. However, if you do not want to collect data with both agents, you can disable the NetQ agent. Data is then sent exclusively to the gNMI agent.
To disable the NetQ agent, use the following command:
You cannot disable both the NetQ and gNMI agents. If both agents are enabled on Cumulus Linux and a NetQ server is unreachable, the data from the following models is not sent to gNMI:
openconfig-interfaces
openconfig-if-ethernet
openconfig-if-ethernet-ext
openconfig-system
nvidia-if-ethernet-ext
WJH, openconfig-platform, and openconfig-lldp data continue streaming to gNMI in this state. If you are only using gNMI and a NetQ telemetry server does not exist, you should disable the NetQ agent by setting opta-enable to false.
Supported Models
Cumulus Linux supports the following OpenConfig models:
The client should use the following YANG models as a reference:
nvidia-if-ethernet-ext
module nvidia-if-ethernet-counters-ext {
    // xPath --> /interfaces/interface[name=*]/ethernet/counters/state/
    namespace "http://nvidia.com/yang/nvidia-ethernet-counters";
    prefix "nvidia-if-ethernet-counters-ext";
    // import some basic types
    import openconfig-interfaces { prefix oc-if; }
    import openconfig-if-ethernet { prefix oc-eth; }
    import openconfig-yang-types { prefix oc-yang; }
    revision "2021-10-12" {
        description
            "Initial revision";
        reference "1.0.0.";
    }
    grouping ethernet-counters-ext {
        leaf alignment-error {
            type oc-yang:counter64;
        }
        leaf in-acl-drops {
            type oc-yang:counter64;
        }
        leaf in-buffer-drops {
            type oc-yang:counter64;
        }
        leaf in-dot3-frame-errors {
            type oc-yang:counter64;
        }
        leaf in-dot3-length-errors {
            type oc-yang:counter64;
        }
        leaf in-l3-drops {
            type oc-yang:counter64;
        }
        leaf in-pfc0-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc1-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc2-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc3-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc4-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc5-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc6-packets {
            type oc-yang:counter64;
        }
        leaf in-pfc7-packets {
            type oc-yang:counter64;
        }
        leaf out-non-q-drops {
            type oc-yang:counter64;
        }
        leaf out-pfc0-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc1-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc2-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc3-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc4-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc5-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc6-packets {
            type oc-yang:counter64;
        }
        leaf out-pfc7-packets {
            type oc-yang:counter64;
        }
        leaf out-q0-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q1-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q2-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q3-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q4-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q5-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q6-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q7-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q8-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q9-wred-drops {
            type oc-yang:counter64;
        }
        leaf out-q-drops {
            type oc-yang:counter64;
        }
        leaf out-q-length {
            type oc-yang:counter64;
        }
        leaf out-wred-drops {
            type oc-yang:counter64;
        }
        leaf symbol-errors {
            type oc-yang:counter64;
        }
        leaf out-tx-fifo-full {
            type oc-yang:counter64;
        }
    }
    augment "/oc-if:interfaces/oc-if:interface/oc-eth:ethernet/" +
            "oc-eth:state/oc-eth:counters" {
        uses ethernet-counters-ext;
    }
}
nvidia-if-wjh-drop-aggregate
module nvidia-wjh {
    // Entrypoint /oc-if:interfaces/oc-if:interface
    //
    // xPath L1 --> /interfaces/interface[name=*]/wjh/aggregate/l1
    // xPath L2 --> /interfaces/interface[name=*]/wjh/aggregate/l2/reasons/reason[id=*][severity=*]
    // xPath Router --> /interfaces/interface[name=*]/wjh/aggregate/router/reasons/reason[id=*][severity=*]
    // xPath Tunnel --> /interfaces/interface[name=*]/wjh/aggregate/tunnel/reasons/reason[id=*][severity=*]
    // xPath Buffer --> /interfaces/interface[name=*]/wjh/aggregate/buffer/reasons/reason[id=*][severity=*]
    // xPath ACL --> /interfaces/interface[name=*]/wjh/aggregate/acl/reasons/reason[id=*][severity=*]
    // namespace and prefix precede import, as the YANG module statement order requires
    namespace "http://nvidia.com/yang/what-just-happened-config";
    prefix "nvidia-wjh";
    import openconfig-interfaces { prefix oc-if; }
    revision "2021-10-12" {
        description
            "Initial revision";
        reference "1.0.0.";
    }
    augment "/oc-if:interfaces/oc-if:interface" {
        uses interfaces-wjh;
    }
    grouping interfaces-wjh {
        description "Top-level grouping for What-just happened data.";
        container wjh {
            container aggregate {
                container l1 {
                    container state {
                        leaf drop {
                            type string;
                            description "Drop list based on wjh-drop-types module encoded in JSON";
                        }
                    }
                }
                container l2 {
                    uses reason-drops;
                }
                container router {
                    uses reason-drops;
                }
                container tunnel {
                    uses reason-drops;
                }
                container acl {
                    uses reason-drops;
                }
                container buffer {
                    uses reason-drops;
                }
            }
        }
    }
    grouping reason-drops {
        container reasons {
            list reason {
                key "id severity";
                leaf id {
                    type leafref {
                        path "../state/id";
                    }
                    description "reason ID";
                }
                leaf severity {
                    type leafref {
                        path "../state/severity";
                    }
                    description "Reason severity";
                }
                container state {
                    leaf id {
                        type uint32;
                        description "Reason ID";
                    }
                    leaf name {
                        type string;
                        description "Reason name";
                    }
                    leaf severity {
                        type string;
                        mandatory "true";
                        description "Reason severity";
                    }
                    leaf drop {
                        type string;
                        description "Drop list based on wjh-drop-types module encoded in JSON";
                    }
                }
            }
        }
    }
}
module wjh-drop-types {
  namespace "http://nvidia.com/yang/what-just-happened-config-types";
  prefix "wjh-drop-types";

  container l1-aggregated {
    uses l1-drops;
  }
  container l2-aggregated {
    uses l2-drops;
  }
  container router-aggregated {
    uses router-drops;
  }
  container tunnel-aggregated {
    uses tunnel-drops;
  }
  container acl-aggregated {
    uses acl-drops;
  }
  container buffer-aggregated {
    uses buffer-drops;
  }

  grouping reason-key {
    leaf id {
      type uint32;
      mandatory "true";
      description "reason ID";
    }
    leaf severity {
      type string;
      mandatory "true";
      description "Severity";
    }
  }

  grouping reason_info {
    leaf reason {
      type string;
      mandatory "true";
      description "Reason name";
    }
    leaf drop_type {
      type string;
      mandatory "true";
      description "reason drop type";
    }
    leaf ingress_port {
      type string;
      mandatory "true";
      description "Ingress port name";
    }
    leaf ingress_lag {
      type string;
      description "Ingress LAG name";
    }
    leaf egress_port {
      type string;
      description "Egress port name";
    }
    leaf agg_count {
      type uint64;
      description "Aggregation count";
    }
    leaf severity {
      type string;
      description "Severity";
    }
    leaf first_timestamp {
      type uint64;
      description "First timestamp";
    }
    leaf end_timestamp {
      type uint64;
      description "End timestamp";
    }
  }

  grouping packet_info {
    leaf smac {
      type string;
      description "Source MAC";
    }
    leaf dmac {
      type string;
      description "Destination MAC";
    }
    leaf sip {
      type string;
      description "Source IP";
    }
    leaf dip {
      type string;
      description "Destination IP";
    }
    leaf proto {
      type uint32;
      description "Protocol";
    }
    leaf sport {
      type uint32;
      description "Source port";
    }
    leaf dport {
      type uint32;
      description "Destination port";
    }
  }

  grouping l1-drops {
    description "What-just happened drops.";
    leaf ingress_port {
      type string;
      description "Ingress port";
    }
    leaf is_port_up {
      type boolean;
      description "Is port up";
    }
    leaf port_down_reason {
      type string;
      description "Port down reason";
    }
    leaf description {
      type string;
      description "Description";
    }
    leaf state_change_count {
      type uint64;
      description "State change count";
    }
    leaf symbol_error_count {
      type uint64;
      description "Symbol error count";
    }
    leaf crc_error_count {
      type uint64;
      description "CRC error count";
    }
    leaf first_timestamp {
      type uint64;
      description "First timestamp";
    }
    leaf end_timestamp {
      type uint64;
      description "End timestamp";
    }
    leaf timestamp {
      type uint64;
      description "Timestamp";
    }
  }

  grouping l2-drops {
    description "What-just happened drops.";
    uses reason_info;
    uses packet_info;
  }

  grouping router-drops {
    description "What-just happened drops.";
    uses reason_info;
    uses packet_info;
  }

  grouping tunnel-drops {
    description "What-just happened drops.";
    uses reason_info;
    uses packet_info;
  }

  grouping acl-drops {
    description "What-just happened drops.";
    uses reason_info;
    uses packet_info;
    leaf acl_rule_id {
      type uint64;
      description "ACL rule ID";
    }
    leaf acl_bind_point {
      type uint32;
      description "ACL bind point";
    }
    leaf acl_name {
      type string;
      description "ACL name";
    }
    leaf acl_rule {
      type string;
      description "ACL rule";
    }
  }

  grouping buffer-drops {
    description "What-just happened drops.";
    uses reason_info;
    uses packet_info;
    leaf traffic_class {
      type uint32;
      description "Traffic Class";
    }
    leaf original_occupancy {
      type uint32;
      description "Original occupancy";
    }
    leaf original_latency {
      type uint64;
      description "Original latency";
    }
  }
}
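The drop leaf in the aggregate model carries a drop list, encoded in JSON, that follows the wjh-drop-types groupings. As a minimal sketch, assuming a client receives one L2 drop entry as a JSON object whose keys match the reason_info and packet_info leaf names, the following Python decodes it into a typed record. The L2Drop class, the parse_l2_drop helper, and the sample payload are hypothetical and only for illustration; this guide does not show the exact wire encoding.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class L2Drop:
    """Subset of the reason_info and packet_info leaves (names from the YANG model)."""
    # Leaves marked mandatory in reason_info
    reason: str
    drop_type: str
    ingress_port: str
    # Optional leaves
    severity: Optional[str] = None
    agg_count: Optional[int] = None
    smac: Optional[str] = None
    dmac: Optional[str] = None
    sip: Optional[str] = None
    dip: Optional[str] = None


def parse_l2_drop(payload: str) -> L2Drop:
    """Decode one JSON object; ignore unknown keys, reject missing mandatory leaves."""
    raw = json.loads(payload)
    for name in ("reason", "drop_type", "ingress_port"):
        if name not in raw:
            raise KeyError(f"mandatory leaf missing: {name}")
    known = {f.name for f in fields(L2Drop)}
    return L2Drop(**{k: v for k, v in raw.items() if k in known})


# Hypothetical sample payload; "proto" is not modeled above and is ignored.
sample = json.dumps({
    "reason": "Source MAC is multicast",
    "drop_type": "L2",
    "ingress_port": "swp1",
    "severity": "Error",
    "agg_count": 3,
    "proto": 6,
})
drop = parse_l2_drop(sample)
print(drop.reason, drop.ingress_port, drop.agg_count)
```

Rejecting entries that lack a mandatory leaf mirrors the mandatory "true" statements in the model, while ignoring unknown keys keeps the sketch tolerant of leaves it does not model.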
Collect WJH Data Using gNMI
You can export What Just Happened data from the NetQ agent to your own gNMI client. Refer to the previous section for the nvidia-if-wjh-drop-aggregate reference YANG model.
Supported Features
The gNMI agent supports capability and stream subscribe requests for WJH events.
If you are using SONiC, WJH data can only be collected using gNMI.
WJH Drop Reasons
The data NetQ sends to the gNMI agent is in the form of WJH drop reasons. The reasons are generated by the SDK and are stored in the /usr/etc/wjh_lib_conf.xml file on the switch. Use this file as a guide to filter for specific reason types (L1, ACL, and so forth), reason IDs, or event severities.
L1 Drop Reasons
Reason ID
Reason
Description
10021
Port admin down
Validate port configuration
10022
Auto-negotiation failure
Set port speed manually, disable auto-negotiation
10023
Logical mismatch with peer link
Check cable/transceiver
10024
Link training failure
Check cable/transceiver
10025
Peer is sending remote faults
Replace cable/transceiver
10026
Bad signal integrity
Replace cable/transceiver
10027
Cable/transceiver is not supported
Use supported cable/transceiver
10028
Cable/transceiver is unplugged
Plug cable/transceiver
10029
Calibration failure
Check cable/transceiver
10030
Cable/transceiver bad status
Check cable/transceiver
10031
Other reason
Other L1 drop reason
L2 Drop Reasons
Reason ID
Reason
Severity
Description
201
MLAG port isolation
Notice
Expected behavior
202
Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)
Error
Bad packet was received from the peer
203
VLAN tagging mismatch
Error
Validate the VLAN tag configuration on both ends of the link
204
Ingress VLAN filtering
Error
Validate the VLAN membership configuration on both ends of the link
205
Ingress spanning tree filter
Notice
Expected behavior
206
Unicast MAC table action discard
Error
Validate MAC table for this destination MAC
207
Multicast egress port list is empty
Warning
Validate why IGMP join or multicast router port does not exist
208
Port loopback filter
Error
Validate MAC table for this destination MAC
209
Source MAC is multicast
Error
Bad packet was received from peer
210
Source MAC equals destination MAC
Error
Bad packet was received from peer
Router Drop Reasons
Reason ID
Reason
Severity
Description
301
Non-routable packet
Notice
Expected behavior
302
Blackhole route
Warning
Validate routing table for this destination IP
303
Unresolved neighbor/next hop
Warning
Validate ARP table for the neighbor/next hop
304
Blackhole ARP/neighbor
Warning
Validate ARP table for the next hop
305
IPv6 destination in multicast scope FFx0:/16
Notice
Expected behavior - packet is not routable
306
IPv6 destination in multicast scope FFx1:/16
Notice
Expected behavior - packet is not routable
307
Non-IP packet
Notice
Destination MAC is the router, packet is not routable
308
Unicast destination IP but multicast destination MAC
Error
Bad packet was received from the peer
309
Destination IP is loopback address
Error
Bad packet was received from the peer
310
Source IP is multicast
Error
Bad packet was received from the peer
311
Source IP is in class E
Error
Bad packet was received from the peer
312
Source IP is loopback address
Error
Bad packet was received from the peer
313
Source IP is unspecified
Error
Bad packet was received from the peer
314
Checksum or IPver or IPv4 IHL too short
Error
Bad cable or bad packet was received from the peer
315
Multicast MAC mismatch
Error
Bad packet was received from the peer
316
Source IP equals destination IP
Error
Bad packet was received from the peer
317
IPv4 source IP is limited broadcast
Error
Bad packet was received from the peer
318
IPv4 destination IP is local network (destination=0.0.0.0/8)
Error
Bad packet was received from the peer
320
Ingress router interface is disabled
Warning
Validate your configuration
321
Egress router interface is disabled
Warning
Validate your configuration
323
IPv4 routing table (LPM) unicast miss
Warning
Validate routing table for this destination IP
324
IPv6 routing table (LPM) unicast miss
Warning
Validate routing table for this destination IP
325
Router interface loopback
Warning
Validate the interface configuration
326
Packet size is larger than router interface MTU
Warning
Validate the router interface MTU configuration
327
TTL value is too small
Warning
Actual path is longer than the TTL
Tunnel Drop Reasons
Reason ID
Reason
Severity
Description
402
Overlay switch - Source MAC is multicast
Error
The peer sent a bad packet
403
Overlay switch - Source MAC equals destination MAC
Error
The peer sent a bad packet
404
Decapsulation error
Error
The peer sent a bad packet
ACL Drop Reasons
Reason ID
Reason
Severity
Description
601
Ingress port ACL
Notice
Validate ACL configuration
602
Ingress router ACL
Notice
Validate ACL configuration
603
Egress router ACL
Notice
Validate ACL configuration
604
Egress port ACL
Notice
Validate ACL configuration
Buffer Drop Reasons
Reason ID
Reason
Severity
Description
503
Tail drop
Warning
Monitor network congestion
504
WRED
Warning
Monitor network congestion
505
Port TC congestion threshold crossed
Notice
Monitor network congestion
506
Packet latency threshold crossed
Notice
Monitor network congestion
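The reason IDs in the tables above fall into distinct numeric ranges per drop type, which makes client-side filtering straightforward. The following Python is a minimal sketch of that idea, not part of NetQ: the function names and the event dictionaries are hypothetical, and the ranges are simply the lowest and highest IDs listed in each table.

```python
# Reason ID ranges taken from the drop reason tables in this section.
# Some IDs inside a range are not listed in the tables (for example, 319),
# so a range match identifies the table, not necessarily a documented reason.
REASON_RANGES = {
    "L1": (10021, 10031),
    "L2": (201, 210),
    "Router": (301, 327),
    "Tunnel": (402, 404),
    "Buffer": (503, 506),
    "ACL": (601, 604),
}


def drop_category(reason_id: int) -> str:
    """Return the drop reason table that a reason ID falls into."""
    for category, (low, high) in REASON_RANGES.items():
        if low <= reason_id <= high:
            return category
    return "Unknown"


def filter_events(events, category=None, severity=None):
    """Keep events that match the requested category and/or severity."""
    selected = []
    for event in events:
        if category and drop_category(event["id"]) != category:
            continue
        if severity and event.get("severity") != severity:
            continue
        selected.append(event)
    return selected


# Hypothetical events; the IDs and severities come from the tables above.
events = [
    {"id": 203, "severity": "Error"},
    {"id": 302, "severity": "Warning"},
    {"id": 601, "severity": "Notice"},
    {"id": 10022},  # L1 reasons have no severity column
]
print(filter_events(events, category="L2"))
print(filter_events(events, severity="Warning"))
```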
gNMI Client Requests
You can use your gNMI client on a host server to request capabilities and data that the agent is subscribed to.
The following example shows a gNMI client request for interface speed:
System Event Messages Reference
The following tables list all system event messages, organized by type. You can view these messages through third-party notification applications. For details about configuring notifications for these events, refer to Configure System Event Notifications.
Agent Events
Type
Trigger
Severity
Message Format
Example
agent
NetQ Agent state changed to Rotten (not heard from in over 15 seconds)
Error
Agent state changed to rotten
Agent state changed to rotten
agent
NetQ Agent rebooted
Error
Netq-agent rebooted at (@last_boot)
Netq-agent rebooted at 1573166417
agent
Node running NetQ Agent rebooted
Error
Switch rebooted at (@sys_uptime)
Switch rebooted at 1573166131
agent
NetQ Agent state changed to Fresh
Info
Agent state changed to fresh
Agent state changed to fresh
agent
NetQ Agent state was reset
Info
Agent state was paused and resumed at (@last_reinit)
Agent state was paused and resumed at 1573166125
agent
Version of NetQ Agent has changed
Info
Agent version has been changed old_version:@old_version and new_version:@new_version. Agent reset at @sys_uptime
Agent version has been changed old_version:2.1.2 and new_version:2.3.1. Agent reset at 1573079725
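The numeric values in the agent examples (such as 1573166417) look like Unix epoch timestamps in seconds. Assuming that interpretation, which this reference does not state explicitly, the following Python sketch extracts the value from a reboot message and converts it to an ISO 8601 UTC string. The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def epoch_to_utc(seconds: int) -> str:
    """Convert epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def reboot_time(message: str) -> Optional[str]:
    """Return the reboot time from a 'rebooted at <epoch>' message, if present."""
    match = re.search(r"rebooted at (\d+)$", message)
    return epoch_to_utc(int(match.group(1))) if match else None


print(reboot_time("Netq-agent rebooted at 1573166417"))
print(reboot_time("Agent state changed to rotten"))
```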
BGP Events
Type
Trigger
Severity
Message Format
Example
bgp
BGP Session state changed
Error
BGP session with peer @peer @neighbor vrf @vrf state changed from @old_state to @new_state
BGP session with peer leaf03 leaf04 vrf mgmt state changed from Established to Failed
bgp
BGP Session state changed from Failed to Established
Info
BGP session with peer @peer @peerhost @neighbor vrf @vrf session state changed from Failed to Established
BGP session with peer swp5 spine02 spine03 vrf default session state changed from Failed to Established
bgp
BGP Session state changed from Established to Failed
Info
BGP session with peer @peer @neighbor vrf @vrf state changed from established to failed
BGP session with peer leaf03 leaf04 vrf mgmt state changed from established to failed
bgp
The reset time for a BGP session changed
Info
BGP session with peer @peer @neighbor vrf @vrf reset time changed from @old_last_reset_time to @new_last_reset_time
BGP session with peer spine03 swp9 vrf vrf2 reset time changed from 1559427694 to 1559837484
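The Message Format column uses @name placeholders that are replaced with event values to produce the text in the Example column. The following Python sketch illustrates that substitution for a BGP message. The render function is hypothetical and is not how NetQ itself builds messages; placeholders without a supplied value are left unchanged.

```python
import re


def render(template: str, values: dict) -> str:
    """Replace each @name placeholder with its value; leave unknown names as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"@(\w+)", substitute, template)


template = (
    "BGP session with peer @peer @neighbor vrf @vrf "
    "state changed from @old_state to @new_state"
)
message = render(template, {
    "peer": "leaf03",
    "neighbor": "leaf04",
    "vrf": "mgmt",
    "old_state": "Established",
    "new_state": "Failed",
})
print(message)
```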
BTRFS Events
Type
Trigger
Severity
Message Format
Example
btrfsinfo
BTRFS allocation exceeds 80% of the partition size, or only 2 GB of disk space remain
Error
@info : @details
high btrfs allocation space : greater than 80% of partition size, 61708420
btrfsinfo
Indicates if a rebalance operation can free up space on the disk
Error
@info : @details
data storage efficiency : space left after allocation greater than chunk size 6170849.2
Cable Events
Type
Trigger
Severity
Message Format
Example
cable
Link speed is not the same on both ends of the link
Error
@ifname speed @speed, mismatched with peer @peer @peer_if speed @peer_speed
swp2 speed 10, mismatched with peer server02 swp8 speed 40
cable
The speed setting for a given port changed
Info
@ifname speed changed from @old_speed to @new_speed
swp9 speed changed from 10 to 40
cable
The transceiver status for a given port changed
Info
@ifname transceiver changed from @old_transceiver to @new_transceiver
swp4 transceiver changed from disabled to enabled
cable
The vendor of a given transceiver changed
Info
@ifname vendor name changed from @old_vendor_name to @new_vendor_name
swp23 vendor name changed from Broadcom to NVIDIA
cable
The part number of a given transceiver changed
Info
@ifname part number changed from @old_part_number to @new_part_number
swp7 part number changed from FP1ZZ5654002A to MSN2700-CS2F0
cable
The serial number of a given transceiver changed
Info
@ifname serial number changed from @old_serial_number to @new_serial_number
swp4 serial number changed from 571254X1507020 to MT1552X12041
cable
The status of forward error correction (FEC) support for a given port changed
Info
@ifname supported fec changed from @old_supported_fec to @new_supported_fec
swp12 supported fec changed from supported to unsupported
swp12 supported fec changed from unsupported to supported
cable
The advertised support for FEC for a given port changed
Info
@ifname supported fec changed from @old_advertised_fec to @new_advertised_fec
swp24 supported FEC changed from advertised to not advertised
cable
The FEC status for a given port changed
Info
@ifname fec changed from @old_fec to @new_fec
swp15 fec changed from disabled to enabled
CLAG/MLAG Events
Type
Trigger
Severity
Message Format
Example
clag
CLAG remote peer state changed from up to down
Error
Peer state changed to down
Peer state changed to down
clag
Local CLAG host MTU does not match its remote peer MTU
Error
SVI @svi1 on vlan @vlan mtu @mtu1 mismatched with peer mtu @mtu2
SVI svi7 on vlan 4 mtu 1592 mismatched with peer mtu 1680
clag
CLAG SVI on VLAN is missing from remote peer state
Warning
SVI on vlan @vlan is missing from peer
SVI on vlan vlan4 is missing from peer
clag
CLAG peerlink is not operating at full capacity. At least one link is down.
Warning
Clag peerlink not at full redundancy, member link @slave is down
Clag peerlink not at full redundancy, member link swp40 is down
clag
CLAG remote peer state changed from down to up
Info
Peer state changed to up
Peer state changed to up
clag
Local CLAG host state changed from down to up
Info
Clag state changed from down to up
Clag state changed from down to up
clag
CLAG bond in Conflicted state updated with new bonds
Info
Clag conflicted bond changed from @old_conflicted_bonds to @new_conflicted_bonds
Clag conflicted bond changed from swp7 swp8 to swp9 swp10
clag
CLAG bond changed state from protodown to up state
Info
Clag conflicted bond changed from @old_state_protodownbond to @new_state_protodownbond
Clag conflicted bond changed from protodown to up
CL Support Events
Type
Trigger
Severity
Message Format
Example
clsupport
A new CL Support file has been created for the given node
Error
HostName @hostname has new CL SUPPORT file
HostName leaf01 has new CL SUPPORT file
Config Diff Events
Type
Trigger
Severity
Message Format
Example
configdiff
Configuration file deleted on a device
Error
@hostname config file @type was deleted
spine03 config file /etc/frr/frr.conf was deleted
configdiff
Configuration file has been created
Info
@hostname config file @type was created
leaf12 config file /etc/lldp.d/README.conf was created
configdiff
Configuration file has been modified
Info
@hostname config file @type was modified
spine03 config file /etc/frr/frr.conf was modified
EVPN Events
Type
Trigger
Severity
Message Format
Example
evpn
A VNI was configured and moved from the up state to the down state
Error
VNI @vni state changed from up to down
VNI 36 state changed from up to down
evpn
A VNI was configured and moved from the down state to the up state
Info
VNI @vni state changed from down to up
VNI 36 state changed from down to up
evpn
The kernel state changed on a VNI
Info
VNI @vni kernel state changed from @old_in_kernel_state to @new_in_kernel_state
VNI 3 kernel state changed from down to up
evpn
A VNI state changed from not advertising all VNIs to advertising all VNIs
Info
VNI @vni vni state changed from @old_adv_all_vni_state to @new_adv_all_vni_state
VNI 11 vni state changed from false to true
Lifecycle Management Events
Type
Trigger
Severity
Message Format
Example
lcm
Cumulus Linux backup started for a switch or host
Info
CL configuration backup started for hostname @hostname
CL configuration backup started for hostname spine01
lcm
Cumulus Linux backup completed for a switch or host
Info
CL configuration backup completed for hostname @hostname
CL configuration backup completed for hostname spine01
lcm
Cumulus Linux backup failed for a switch or host
Error
CL configuration backup failed for hostname @hostname
CL configuration backup failed for hostname spine01
lcm
Cumulus Linux upgrade from one version to a newer version has started for a switch or host
Error
CL Image upgrade from version @old_cl_version to version @new_cl_version started for hostname @hostname
CL Image upgrade from version 4.1.0 to version 4.2.1 started for hostname server01
lcm
Cumulus Linux upgrade from one version to a newer version has completed successfully for a switch or host
Info
CL Image upgrade from version @old_cl_version to version @new_cl_version completed for hostname @hostname
CL Image upgrade from version 4.1.0 to version 4.2.1 completed for hostname server01
lcm
Cumulus Linux upgrade from one version to a newer version has failed for a switch or host
Error
CL Image upgrade from version @old_cl_version to version @new_cl_version failed for hostname @hostname
CL Image upgrade from version 4.1.0 to version 4.2.1 failed for hostname server01
lcm
Restoration of a Cumulus Linux configuration started for a switch or host
Info
CL configuration restore started for hostname @hostname
CL configuration restore started for hostname leaf01
lcm
Restoration of a Cumulus Linux configuration completed successfully for a switch or host
Info
CL configuration restore completed for hostname @hostname
CL configuration restore completed for hostname leaf01
lcm
Restoration of a Cumulus Linux configuration failed for a switch or host
Error
CL configuration restore failed for hostname @hostname
CL configuration restore failed for hostname leaf01
lcm
Rollback of a Cumulus Linux image has started for a switch or host
Error
CL Image rollback from version @old_cl_version to version @new_cl_version started for hostname @hostname
CL Image rollback from version 4.2.1 to version 4.1.0 started for hostname leaf01
lcm
Rollback of a Cumulus Linux image has completed successfully for a switch or host
Info
CL Image rollback from version @old_cl_version to version @new_cl_version completed for hostname @hostname
CL Image rollback from version 4.2.1 to version 4.1.0 completed for hostname leaf01
lcm
Rollback of a Cumulus Linux image has failed for a switch or host
Error
CL Image rollback from version @old_cl_version to version @new_cl_version failed for hostname @hostname
CL Image rollback from version 4.2.1 to version 4.1.0 failed for hostname leaf01
lcm
Installation of a NetQ image has started for a switch or host
Info
NetQ Image version @netq_version installation started for hostname @hostname
NetQ Image version 3.2.0 installation started for hostname spine02
lcm
Installation of a NetQ image has completed successfully for a switch or host
Info
NetQ Image version @netq_version installation completed for hostname @hostname
NetQ Image version 3.2.0 installation completed for hostname spine02
lcm
Installation of a NetQ image has failed for a switch or host
Error
NetQ Image version @netq_version installation failed for hostname @hostname
NetQ Image version 3.2.0 installation failed for hostname spine02
lcm
Upgrade of a NetQ image has started for a switch or host
Info
NetQ Image upgrade from version @old_netq_version to version @netq_version started for hostname @hostname
NetQ Image upgrade from version 3.1.0 to version 3.2.0 started for hostname spine02
lcm
Upgrade of a NetQ image has completed successfully for a switch or host
Info
NetQ Image upgrade from version @old_netq_version to version @netq_version completed for hostname @hostname
NetQ Image upgrade from version 3.1.0 to version 3.2.0 completed for hostname spine02
lcm
Upgrade of a NetQ image has failed for a switch or host
Error
NetQ Image upgrade from version @old_netq_version to version @netq_version failed for hostname @hostname
NetQ Image upgrade from version 3.1.0 to version 3.2.0 failed for hostname spine02
Link Events
Type
Trigger
Severity
Message Format
Example
link
Link operational state changed from up to down
Error
HostName @hostname changed state from @old_state to @new_state Interface:@ifname
HostName leaf01 changed state from up to down Interface:swp34
link
Link operational state changed from down to up
Info
HostName @hostname changed state from @old_state to @new_state Interface:@ifname
HostName leaf04 changed state from down to up Interface:swp11
LLDP Events
Type
Trigger
Severity
Message Format
Example
lldp
Local LLDP host has new neighbor information
Info
LLDP Session with host @hostname and @ifname modified fields @changed_fields
LLDP Session with host leaf02 swp6 modified fields leaf06 swp21
lldp
Local LLDP host has new peer interface name
Info
LLDP Session with host @hostname and @ifname @old_peer_ifname changed to @new_peer_ifname
LLDP Session with host spine01 and swp5 swp12 changed to port12
lldp
Local LLDP host has new peer hostname
Info
LLDP Session with host @hostname and @ifname @old_peer_hostname changed to @new_peer_hostname
LLDP Session with host leaf03 and swp2 leaf07 changed to exit01
MTU Events
Type
Trigger
Severity
Message Format
Example
mtu
VLAN interface link MTU is smaller than that of its parent MTU
Warning
vlan interface @link mtu @mtu is smaller than parent @parent mtu @parent_mtu
vlan interface swp3 mtu 1500 is smaller than parent peerlink-1 mtu 1690
mtu
Bridge interface MTU is smaller than the member interface with the smallest MTU
Warning
bridge @link mtu @mtu is smaller than least of member interface mtu @min
bridge swp0 mtu 1280 is smaller than least of member interface mtu 1500
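The two MTU triggers are simple comparisons: a VLAN interface MTU against its parent, and a bridge MTU against the smallest MTU among its member interfaces. The following Python is an illustrative sketch of those comparisons that reuses the message formats from the table. The function names are hypothetical and do not correspond to NetQ internals.

```python
from typing import Iterable, Optional


def vlan_mtu_warning(link: str, mtu: int, parent: str, parent_mtu: int) -> Optional[str]:
    """Warn when a VLAN interface MTU is smaller than its parent MTU."""
    if mtu < parent_mtu:
        return (f"vlan interface {link} mtu {mtu} is smaller than "
                f"parent {parent} mtu {parent_mtu}")
    return None


def bridge_mtu_warning(link: str, mtu: int, member_mtus: Iterable[int]) -> Optional[str]:
    """Warn when a bridge MTU is smaller than the smallest member interface MTU."""
    smallest = min(member_mtus)
    if mtu < smallest:
        return (f"bridge {link} mtu {mtu} is smaller than "
                f"least of member interface mtu {smallest}")
    return None


print(vlan_mtu_warning("swp3", 1500, "peerlink-1", 1690))
print(bridge_mtu_warning("swp0", 1280, [1500, 9000]))
```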
NTP Events
Type
Trigger
Severity
Message Format
Example
ntp
NTP sync state changed from in sync to not in sync
Error
Sync state changed from @old_state to @new_state for @hostname
Sync state changed from in sync to not sync for leaf06
ntp
NTP sync state changed from not in sync to in sync
Info
Sync state changed from @old_state to @new_state for @hostname
Sync state changed from not sync to in sync for leaf06
OSPF Events
Type
Trigger
Severity
Message Format
Example
ospf
OSPF session state on a given interface changed from Full to a down state
Error
OSPF session @ifname with @peer_address changed from Full to @down_state
OSPF session swp7 with 27.0.0.18 state changed from Full to Fail
OSPF session swp7 with 27.0.0.18 state changed from Full to ExStart
ospf
OSPF session state on a given interface changed from a down state to full
Info
OSPF session @ifname with @peer_address changed from @down_state to Full
OSPF session swp7 with 27.0.0.18 state changed from Down to Full
OSPF session swp7 with 27.0.0.18 state changed from Init to Full
OSPF session swp7 with 27.0.0.18 state changed from Fail to Full
Package Information Events
Type
Trigger
Severity
Message Format
Example
packageinfo
Package version on device does not match the version identified in the existing manifest
Error
@package_name manifest version mismatch
netq-apps manifest version mismatch
PTM Events
Type
Trigger
Severity
Message Format
Example
ptm
Physical interface cabling does not match configuration specified in topology.dot file
Error
PTM cable status failed
PTM cable status failed
ptm
Physical interface cabling matches configuration specified in topology.dot file
Error
PTM cable status passed
PTM cable status passed
Resource Events
Type
Trigger
Severity
Message Format
Example
resource
A physical resource has been deleted from a device
Error
Resource Utils deleted for @hostname
Resource Utils deleted for spine02
resource
Root file system access on a device has changed from Read/Write to Read Only
Error
@hostname root file system access mode set to Read Only
server03 root file system access mode set to Read Only
resource
Root file system access on a device has changed from Read Only to Read/Write
Info
@hostname root file system access mode set to Read/Write
leaf11 root file system access mode set to Read/Write
resource
A physical resource has been added to a device
Info
Resource Utils added for @hostname
Resource Utils added for spine04
Running Config Diff Events
Type
Trigger
Severity
Message Format
Example
runningconfigdiff
Running configuration file has been modified
Info
@commandname config result was modified
@commandname config result was modified
Sensor Events
Type
Trigger
Severity
Message Format
Example
sensor
A fan or power supply unit sensor has changed state
Error
Sensor @sensor state changed from @old_s_state to @new_s_state
Sensor fan state changed from up to down
sensor
A temperature sensor has crossed the maximum threshold for that sensor
Error
Sensor @sensor max value @new_s_max exceeds threshold @new_s_crit
Sensor temp max value 110 exceeds the threshold 95
sensor
A temperature sensor has crossed the minimum threshold for that sensor
Error
Sensor @sensor min value @new_s_lcrit fall behind threshold @new_s_min
Sensor psu min value 10 fell below threshold 25
sensor
A temperature, fan, or power supply sensor state changed
Info
Sensor @sensor state changed from @old_state to @new_state
Sensor temperature state changed from Error to ok
Sensor fan state changed from absent to ok
Sensor psu state changed from bad to ok
sensor
A fan or power supply sensor state changed
Info
Sensor @sensor state changed from @old_s_state to @new_s_state
Sensor fan state changed from down to up
Sensor psu state changed from down to up
Services Events
Type
Trigger
Severity
Message Format
Example
services
A service status changed from down to up
Error
Service @name status changed from @old_status to @new_status
Service bgp status changed from down to up
services
A service status changed from up to down
Error
Service @name status changed from @old_status to @new_status
Service lldp status changed from up to down
services
A service changed state from inactive to active
Info
Service @name changed state from inactive to active
Service bgp changed state from inactive to active
Service lldp changed state from inactive to active
SSD Utilization Events
Type
Trigger
Severity
Message Format
Example
ssdutil
3ME3 disk health has dropped below 10%
Error
@info: @details
low health : 5.0%
ssdutil
A dip in 3ME3 disk health of more than 2% has occurred within the last 24 hours
Error
@info: @details
significant health drop : 3.0%
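The two SSD utilization triggers can be expressed as threshold checks: health below 10%, and a drop of more than 2% within the last 24 hours. The following Python sketch assumes you already have a current health reading and one from 24 hours earlier. The function name and inputs are hypothetical, and the message text follows the examples in the table.

```python
from typing import List


def ssd_events(health_now: float, health_24h_ago: float) -> List[str]:
    """Return the SSD utilization messages triggered by two health readings (in percent)."""
    events = []
    if health_now < 10.0:
        events.append(f"low health : {health_now:.1f}%")
    drop = health_24h_ago - health_now
    if drop > 2.0:
        events.append(f"significant health drop : {drop:.1f}%")
    return events


print(ssd_events(5.0, 5.5))
print(ssd_events(50.0, 53.0))
```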
Version Events
Type
Trigger
Severity
Message Format
Example
version
An unknown version of the operating system was detected
Error
unexpected os version @my_ver
unexpected os version cl3.2
version
Desired version of the operating system is not available
Error
os version @ver
os version cl3.7.9
version
An unknown version of a software package was detected
Error
expected release version @ver
expected release version cl3.6.2
version
Desired version of a software package is not available
Error
different from version @ver
different from version cl4.0
VXLAN Events
Type
Trigger
Severity
Message Format
Example
vxlan
Replication list contains an inconsistent set of nodes
Error
VNI @vni replication list inconsistent with @conflicts diff:@diff
VNI 14 replication list inconsistent with ["leaf03","leaf04"] diff:+:["leaf03","leaf04"] -:["leaf07","leaf08"]
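The diff portion of the VXLAN example lists one set of nodes after + and another after -. This reference does not define those markers, so the following Python sketch assumes that + holds nodes present in the conflicting list but not in the expected list, and - holds the reverse. The replication_diff function is hypothetical.

```python
import json
from typing import Iterable


def replication_diff(expected: Iterable[str], actual: Iterable[str]) -> str:
    """Format the nodes added to and removed from a replication list."""
    added = sorted(set(actual) - set(expected))
    removed = sorted(set(expected) - set(actual))

    def compact(items):
        # Compact JSON without spaces, matching the style of the example message
        return json.dumps(items, separators=(",", ":"))

    return f"+:{compact(added)} -:{compact(removed)}"


diff = replication_diff(expected=["leaf07", "leaf08"], actual=["leaf03", "leaf04"])
print(f'VNI 14 replication list inconsistent with ["leaf03","leaf04"] diff:{diff}')
```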
TCA Event Messages Reference
This reference lists the threshold-based events that NetQ supports. You can view these messages through third-party notification applications. For details about configuring notifications for these events, refer to Configure Threshold-Crossing Event Notifications.
ACL Resources
NetQ UI Name
NetQ CLI Event ID
Description
Ingress ACL IPv4 %
TCA_TCAM_IN_ACL_V4_FILTER_UPPER
Number of ingress ACL filters for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
Egress ACL IPv4 %
TCA_TCAM_EG_ACL_V4_FILTER_UPPER
Number of egress ACL filters for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
Ingress ACL IPv4 Mangle %
TCA_TCAM_IN_ACL_V4_MANGLE_UPPER
Number of ingress ACL mangles for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
Egress ACL IPv4 Mangle %
TCA_TCAM_EG_ACL_V4_MANGLE_UPPER
Number of egress ACL mangles for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
Ingress ACL IPv6 %
TCA_TCAM_IN_ACL_V6_FILTER_UPPER
Number of ingress ACL filters for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
Egress ACL IPv6 %
TCA_TCAM_EG_ACL_V6_FILTER_UPPER
Number of egress ACL filters for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
Ingress ACL IPv6 Mangle %
TCA_TCAM_IN_ACL_V6_MANGLE_UPPER
Number of ingress ACL mangles for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
Egress ACL IPv6 Mangle %
TCA_TCAM_EG_ACL_V6_MANGLE_UPPER
Number of egress ACL mangles for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
Ingress ACL 8021x %
TCA_TCAM_IN_ACL_8021x_FILTER_UPPER
Number of ingress ACL 802.1X filters on a given switch or host exceeded user-defined maximum threshold
ACL L4 port %
TCA_TCAM_ACL_L4_PORT_CHECKERS_UPPER
Number of ACL port range checkers on a given switch or host exceeded user-defined maximum threshold
ACL Regions %
TCA_TCAM_ACL_REGIONS_UPPER
Number of ACL regions on a given switch or host exceeded user-defined maximum threshold
Ingress ACL Mirror %
TCA_TCAM_IN_ACL_MIRROR_UPPER
Number of ingress ACL mirrors on a given switch or host exceeded user-defined maximum threshold
ACL 18B Rules %
TCA_TCAM_ACL_18B_RULES_UPPER
Number of ACL 18B rules on a given switch or host exceeded user-defined maximum threshold
ACL 32B %
TCA_TCAM_ACL_32B_RULES_UPPER
Number of ACL 32B rules on a given switch or host exceeded user-defined maximum threshold
ACL 54B %
TCA_TCAM_ACL_54B_RULES_UPPER
Number of ACL 54B rules on a given switch or host exceeded user-defined maximum threshold
Ingress PBR IPv4 %
TCA_TCAM_IN_PBR_V4_FILTER_UPPER
Number of ingress policy-based routing (PBR) filters for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
Ingress PBR IPv6 %
TCA_TCAM_IN_PBR_V6_FILTER_UPPER
Number of ingress policy-based routing (PBR) filters for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
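Each ACL resource event above fires when usage crosses a user-defined maximum threshold. Because the UI names end in %, the following Python sketch assumes the threshold is a percentage of capacity; that interpretation, the function name, and the sample numbers are assumptions for illustration only.

```python
def upper_threshold_crossed(used: int, capacity: int, threshold_pct: float) -> bool:
    """Return True when utilization, as a percentage of capacity, exceeds the threshold."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (used / capacity) * 100.0 > threshold_pct


# Hypothetical sample: 450 of 512 ingress IPv4 ACL filter entries in use, threshold 85%.
event_id = "TCA_TCAM_IN_ACL_V4_FILTER_UPPER"
if upper_threshold_crossed(used=450, capacity=512, threshold_pct=85.0):
    print(f"{event_id}: threshold crossed")
```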
Digital Optics
Some of the event IDs have changed. If you have TCA rules configured for digital optics for a NetQ 3.1.0 deployment or earlier, verify that they are using the correct event IDs. You might need to remove and recreate some of the events.
NetQ UI Name
NetQ CLI Event ID
Description
Laser RX Power Alarm Upper
TCA_DOM_RX_POWER_ALARM_UPPER
Transceiver Input power (mW) for the digital optical module on a given switch or host interface exceeded user-defined maximum alarm threshold
Laser RX Power Alarm Lower
TCA_DOM_RX_POWER_ALARM_LOWER
Transceiver Input power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum alarm threshold
Laser RX Power Warning Upper
TCA_DOM_RX_POWER_WARNING_UPPER
Transceiver Input power (mW) for the digital optical module on a given switch or host exceeded user-defined maximum warning threshold
Laser RX Power Warning Lower
TCA_DOM_RX_POWER_WARNING_LOWER
Transceiver Input power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum warning threshold
Laser Bias Current Alarm Upper
TCA_DOM_BIAS_CURRENT_ALARM_UPPER
Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined maximum alarm threshold
Laser Bias Current Alarm Lower
TCA_DOM_BIAS__CURRENT_ALARM_LOWER
Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined minimum alarm threshold
Laser Bias Current Warning Upper
TCA_DOM_BIAS_CURRENT_WARNING_UPPER
Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined maximum warning threshold
Laser Bias Current Warning Lower
TCA_DOM_BIAS__CURRENT_WARNING_LOWER
Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined minimum warning threshold
Laser Output Power Alarm Upper
TCA_DOM_OUTPUT_POWER_ALARM_UPPER
Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined maximum alarm threshold
Laser Output Power Alarm Lower
TCA_DOM_OUTPUT_POWER_ALARM_LOWER
Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum alarm threshold
Laser Output Power Warning Upper
TCA_DOM_OUTPUT_POWER_WARNING_UPPER
Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined maximum warning threshold
Laser Output Power Warning Lower
TCA_DOM_OUTPUT_POWER_WARNING_LOWER
Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum warning threshold
Laser Module Temperature Alarm Upper
TCA_DOM_MODULE_TEMPERATURE_ALARM_UPPER
Digital optical module temperature (°C) on a given switch or host exceeded user-defined maximum alarm threshold
Laser Module Temperature Alarm Lower
TCA_DOM_MODULE_TEMPERATURE_ALARM_LOWER
Digital optical module temperature (°C) on a given switch or host exceeded user-defined minimum alarm threshold
Laser Module Temperature Warning Upper
TCA_DOM_MODULE_TEMPERATURE_WARNING_UPPER
Digital optical module temperature (°C) on a given switch or host exceeded user-defined maximum warning threshold
Laser Module Temperature Warning Lower
TCA_DOM_MODULE_TEMPERATURE_WARNING_LOWER
Digital optical module temperature (°C) on a given switch or host dropped below user-defined minimum warning threshold
Laser Module Voltage Alarm Upper
TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER
Transceiver voltage (V) on a given switch or host exceeded user-defined maximum alarm threshold
Laser Module Voltage Alarm Lower
TCA_DOM_MODULE_VOLTAGE_ALARM_LOWER
Transceiver voltage (V) on a given switch or host dropped below user-defined minimum alarm threshold
Laser Module Voltage Warning Upper
TCA_DOM_MODULE_VOLTAGE_WARNING_UPPER
Transceiver voltage (V) on a given switch or host exceeded user-defined maximum warning threshold
Laser Module Voltage Warning Lower
TCA_DOM_MODULE_VOLTAGE_WARNING_LOWER
Transceiver voltage (V) on a given switch or host dropped below user-defined minimum warning threshold
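Each digital optics metric in the table above has four events: alarm and warning, each with an upper and a lower bound. The following Python sketch shows how a single reading could map to those event IDs. It is only an illustration: the function name dom_events and the threshold dictionary layout are hypothetical, and this is not a description of how NetQ evaluates thresholds internally.

```python
from typing import Dict, List

def dom_events(metric: str, value: float, thresholds: Dict[str, float]) -> List[str]:
    """Return the TCA event IDs that a single reading would cross.

    metric     -- ID fragment from the table, such as "OUTPUT_POWER"
    thresholds -- hypothetical layout with keys "ALARM_UPPER", "ALARM_LOWER",
                  "WARNING_UPPER", "WARNING_LOWER"; missing keys are skipped
    """
    fired = []
    for key, limit in thresholds.items():
        level, bound = key.split("_")  # for example "ALARM", "UPPER"
        # Upper bounds are crossed from below, lower bounds from above.
        crossed = value > limit if bound == "UPPER" else value < limit
        if crossed:
            fired.append(f"TCA_DOM_{metric}_{level}_{bound}")
    return fired

# Made-up module temperature limits in degrees Celsius.
limits = {"ALARM_UPPER": 75.0, "WARNING_UPPER": 70.0,
          "WARNING_LOWER": 0.0, "ALARM_LOWER": -5.0}
events = dom_events("MODULE_TEMPERATURE", 72.5, limits)
print(events)
```

A reading of 72.5 crosses only the upper warning limit in this example, so a warning fires before the alarm does.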
Forwarding Resources
NetQ UI Name
NetQ CLI Event ID
Description
Total Route Entries %
TCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPER
Number of routes on a given switch or host exceeded user-defined maximum threshold
Mcast Routes %
TCA_TCAM_TOTAL_MCAST_ROUTES_UPPER
Number of multicast routes on a given switch or host exceeded user-defined maximum threshold
MAC entries %
TCA_TCAM_MAC_ENTRIES_UPPER
Number of MAC addresses on a given switch or host exceeded user-defined maximum threshold
IPv4 Routes %
TCA_TCAM_IPV4_ROUTE_UPPER
Number of IPv4 routes on a given switch or host exceeded user-defined maximum threshold
IPv4 Hosts %
TCA_TCAM_IPV4_HOST_UPPER
Number of IPv4 hosts on a given switch or host exceeded user-defined maximum threshold
Exceeding IPV6 Routes %
TCA_TCAM_IPV6_ROUTE_UPPER
Number of IPv6 routes on a given switch or host exceeded user-defined maximum threshold
IPv6 Hosts %
TCA_TCAM_IPV6_HOST_UPPER
Number of IPv6 hosts on a given switch or host exceeded user-defined maximum threshold
ECMP Next Hop %
TCA_TCAM_ECMP_NEXTHOPS_UPPER
Number of equal cost multi-path (ECMP) next hop entries on a given switch or host exceeded user-defined maximum threshold
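The UI names in this table end in %, which suggests that the thresholds are expressed as a percentage of table capacity. Under that assumption, the comparison can be sketched in Python as follows. The function names and the sample numbers are hypothetical; check the capacity of your own switch before choosing a threshold.

```python
def tcam_utilization(used: int, capacity: int) -> float:
    """Return the percentage of a forwarding table that is in use."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity

def threshold_crossed(used: int, capacity: int, threshold_pct: float) -> bool:
    """True when utilization is above the user-defined maximum threshold."""
    return tcam_utilization(used, capacity) > threshold_pct

# Made-up example: 28000 IPv4 routes in a table that holds 32768 entries.
pct = tcam_utilization(28000, 32768)
print(f"{pct:.2f}% used, crossed 80%: {threshold_crossed(28000, 32768, 80)}")
```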
Interface Errors
NetQ UI Name
NetQ CLI Event ID
Description
Oversize Errors
TCA_HW_IF_OVERSIZE_ERRORS
Number of times a frame longer than maximum size (1518 bytes) exceeded user-defined threshold
Undersize Errors
TCA_HW_IF_UNDERSIZE_ERRORS
Number of times a frame shorter than minimum size (64 bytes) exceeded user-defined threshold
Alignment Errors
TCA_HW_IF_ALIGNMENT_ERRORS
Number of times a frame with an uneven byte count and a CRC error exceeded user-defined threshold
Jabber Errors
TCA_HW_IF_JABBER_ERRORS
Number of times a frame longer than maximum size (1518 bytes) and with a CRC error exceeded user-defined threshold
Symbol Errors
TCA_HW_IF_SYMBOL_ERRORS
Number of undefined or invalid symbols detected exceeded user-defined threshold
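To make the frame-size conditions in this table concrete, the following Python sketch reports which error counters a frame would match, reading the descriptions literally. It makes two assumptions: "uneven byte count" is interpreted as a frame whose length in bits is not a multiple of 8, and a frame may match more than one description. The function name classify_frame is hypothetical, and symbol errors are not modeled.

```python
from typing import List

MAX_FRAME_BYTES = 1518  # maximum frame size quoted in the table
MIN_FRAME_BYTES = 64    # minimum frame size quoted in the table

def classify_frame(length_bits: int, crc_ok: bool) -> List[str]:
    """Return the error counters whose description matches this frame."""
    whole_bytes, stray_bits = divmod(length_bits, 8)
    matches = []
    if whole_bytes > MAX_FRAME_BYTES:
        matches.append("TCA_HW_IF_OVERSIZE_ERRORS")
        if not crc_ok:
            # Too long and failing the CRC check matches the jabber description.
            matches.append("TCA_HW_IF_JABBER_ERRORS")
    if whole_bytes < MIN_FRAME_BYTES:
        matches.append("TCA_HW_IF_UNDERSIZE_ERRORS")
    if stray_bits and not crc_ok:
        # Assumption: "uneven byte count" means leftover bits after the last byte.
        matches.append("TCA_HW_IF_ALIGNMENT_ERRORS")
    return matches

print(classify_frame(1600 * 8, crc_ok=False))
print(classify_frame(60 * 8, crc_ok=True))
```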
Interface Statistics
NetQ UI Name
NetQ CLI Event ID
Description
Example Message
Broadcast Received Bytes
TCA_RXBROADCAST_UPPER
Number of broadcast receive bytes per second exceeded user-defined maximum threshold on a switch interface
Number of link flaps exceeded user-defined maximum threshold
Resource Utilization
NetQ UI Name
NetQ CLI Event ID
Description
Example Message
CPU Utilization
TCA_CPU_UTILIZATION_UPPER
Percentage of CPU utilization exceeded user-defined maximum threshold on a switch or host
CPU Utilization for host leaf11 exceed configured mark 85
Disk Utilization
TCA_DISK_UTILIZATION_UPPER
Percentage of disk utilization exceeded user-defined maximum threshold on a switch or host
Disk Utilization for host leaf11 exceed configured mark 90
Memory Utilization
TCA_MEMORY_UTILIZATION_UPPER
Percentage of memory utilization exceeded user-defined maximum threshold on a switch or host
Memory Utilization for host leaf11 exceed configured mark 95
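If you forward these messages to another tool, you might want to extract the resource, hostname, and configured threshold from the text. The following Python sketch does this with a regular expression written against the three example messages in this table only. The function name is hypothetical, and the pattern is an assumption about the message format rather than a documented contract.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the example messages in the table above.
UTILIZATION_MESSAGE = re.compile(
    r"^(?P<resource>CPU|Disk|Memory) Utilization for host (?P<host>\S+) "
    r"exceed configured mark (?P<mark>\d+)$"
)

def parse_utilization_message(message: str) -> Optional[Tuple[str, str, int]]:
    """Return (resource, hostname, threshold), or None if the text does not match."""
    match = UTILIZATION_MESSAGE.match(message.strip())
    if match is None:
        return None
    return match["resource"], match["host"], int(match["mark"])

print(parse_utilization_message("CPU Utilization for host leaf11 exceed configured mark 85"))
```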
RoCE
NetQ UI Name
NetQ CLI Event ID
Description
Rx CNP Buffer Usage Cells
TCA_RX_CNP_BUFFER_USAGE_CELLS
Percentage of Rx General+CNP buffer usage exceeded user-defined maximum threshold on a switch interface
Rx CNP No Buffer Discard
TCA_RX_CNP_NO_BUFFER_DISCARD
Rate of Rx General+CNP no buffer discard exceeded user-defined maximum threshold on a switch interface
Rx CNP PG Usage Cells
TCA_RX_CNP_PG_USAGE_CELLS
Percentage of Rx General+CNP PG usage exceeded user-defined maximum threshold on a switch interface
Rx RoCE Buffer Usage Cells
TCA_RX_ROCE_BUFFER_USAGE_CELLS
Percentage of Rx RoCE buffer usage exceeded user-defined maximum threshold on a switch interface
Rx RoCE No Buffer Discard
TCA_RX_ROCE_NO_BUFFER_DISCARD
Rate of Rx RoCE no buffer discard exceeded user-defined maximum threshold on a switch interface
Rx RoCE PG Usage Cells
TCA_RX_ROCE_PG_USAGE_CELLS
Percentage of Rx RoCE PG usage exceeded user-defined maximum threshold on a switch interface
Rx RoCE PFC Pause Duration
TCA_RX_ROCE_PFC_PAUSE_DURATION
Duration of Rx RoCE PFC pause exceeded user-defined maximum threshold on a switch interface
Rx RoCE PFC Pause Packets
TCA_RX_ROCE_PFC_PAUSE_PACKETS
Rate of Rx RoCE PFC pause packets exceeded user-defined maximum threshold on a switch interface
Tx CNP Buffer Usage Cells
TCA_TX_CNP_BUFFER_USAGE_CELLS
Percentage of Tx General+CNP buffer usage exceeded user-defined maximum threshold on a switch interface
Tx CNP TC Usage Cells
TCA_TX_CNP_TC_USAGE_CELLS
Percentage of Tx CNP TC usage exceeded user-defined maximum threshold on a switch interface
Tx CNP Unicast No Buffer Discard
TCA_TX_CNP_UNICAST_NO_BUFFER_DISCARD
Rate of Tx CNP unicast no buffer discard exceeded user-defined maximum threshold on a switch interface
Tx ECN Marked Packets
TCA_TX_ECN_MARKED_PACKETS
Rate of Tx Port ECN marked packets exceeded user-defined maximum threshold on a switch interface
Tx RoCE Buffer Usage Cells
TCA_TX_ROCE_BUFFER_USAGE_CELLS
Percentage of Tx RoCE buffer usage exceeded user-defined maximum threshold on a switch interface
Tx RoCE PFC Pause Duration
TCA_TX_ROCE_PFC_PAUSE_DURATION
Duration of Tx RoCE PFC pause exceeded user-defined maximum threshold on a switch interface
Tx RoCE PFC Pause Packets
TCA_TX_ROCE_PFC_PAUSE_PACKETS
Rate of Tx RoCE PFC pause packets exceeded user-defined maximum threshold on a switch interface
Tx RoCE TC Usage Cells
TCA_TX_ROCE_TC_USAGE_CELLS
Percentage of Tx RoCE TC usage exceeded user-defined maximum threshold on a switch interface
Tx RoCE Unicast No Buffer Discard
TCA_TX_ROCE_UNICAST_NO_BUFFER_DISCARD
Rate of Tx RoCE unicast no buffer discard exceeded user-defined maximum threshold on a switch interface
Sensors
NetQ UI Name
NetQ CLI Event ID
Description
Example Message
Fan Speed
TCA_SENSOR_FAN_UPPER
Fan speed exceeded user-defined maximum threshold on a switch
Sensor for spine03 exceeded threshold fan speed 700 for sensor fan2
Power Supply Watts
TCA_SENSOR_POWER_UPPER
Power supply output exceeded user-defined maximum threshold on a switch
Sensor for leaf14 exceeded threshold power 120 watts for sensor psu1
Power Supply Volts
TCA_SENSOR_VOLTAGE_UPPER
Power supply voltage exceeded user-defined maximum threshold on a switch
Sensor for leaf14 exceeded threshold voltage 12 volts for sensor psu2
Switch Temperature
TCA_SENSOR_TEMPERATURE_UPPER
Temperature (°C) exceeded user-defined maximum threshold on a switch
Sensor for leaf14 exceeded threshold temperature 90 for sensor temp1
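The sensor messages follow a slightly different shape from the resource utilization messages: the unit (watts, volts) is optional and the sensor name comes last. The Python sketch below parses the four example messages in this table and attaches the matching event ID. The pattern and the function name are assumptions based only on those examples.

```python
import re
from typing import Dict, Optional

# Event ID for each sensor kind, taken from the table above.
SENSOR_EVENT_IDS = {
    "fan speed": "TCA_SENSOR_FAN_UPPER",
    "power": "TCA_SENSOR_POWER_UPPER",
    "voltage": "TCA_SENSOR_VOLTAGE_UPPER",
    "temperature": "TCA_SENSOR_TEMPERATURE_UPPER",
}

SENSOR_MESSAGE = re.compile(
    r"^Sensor for (?P<host>\S+) exceeded threshold "
    r"(?P<kind>fan speed|power|voltage|temperature) "
    r"(?P<value>\d+(?:\.\d+)?)(?: (?P<unit>\w+))? for sensor (?P<sensor>\S+)$"
)

def parse_sensor_message(message: str) -> Optional[Dict[str, object]]:
    """Split a sensor message into its fields, or return None if it does not match."""
    match = SENSOR_MESSAGE.match(message.strip())
    if match is None:
        return None
    return {
        "event_id": SENSOR_EVENT_IDS[match["kind"]],
        "host": match["host"],
        "sensor": match["sensor"],
        "threshold": float(match["value"]),
        "unit": match["unit"],  # None when the message carries no unit
    }

print(parse_sensor_message("Sensor for leaf14 exceeded threshold power 120 watts for sensor psu1"))
```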
What Just Happened
NetQ UI Name
NetQ CLI Event ID
Drop Type
Reason/Port Down Reason
Description
ACL Drop Aggregate Upper
TCA_WJH_ACL_DROP_AGG_UPPER
ACL
Egress port ACL
ACL action set to deny on the physical egress port or bond
ACL Drop Aggregate Upper
TCA_WJH_ACL_DROP_AGG_UPPER
ACL
Egress router ACL
ACL action set to deny on the egress switch virtual interfaces (SVIs)
ACL Drop Aggregate Upper
TCA_WJH_ACL_DROP_AGG_UPPER
ACL
Ingress port ACL
ACL action set to deny on the physical ingress port or bond
ACL Drop Aggregate Upper
TCA_WJH_ACL_DROP_AGG_UPPER
ACL
Ingress router ACL
ACL action set to deny on the ingress switch virtual interfaces (SVIs)
Buffer Drop Aggregate Upper
TCA_WJH_BUFFER_DROP_AGG_UPPER
Buffer
Packet Latency Threshold Crossed
Time a packet spent within the switch exceeded or dropped below the specified high or low threshold
Buffer Drop Aggregate Upper
TCA_WJH_BUFFER_DROP_AGG_UPPER
Buffer
Port TC Congestion Threshold Crossed
Percentage of the occupancy buffer exceeded or dropped below the specified high or low threshold
Buffer Drop Aggregate Upper
TCA_WJH_BUFFER_DROP_AGG_UPPER
Buffer
Tail drop
Tail drop is enabled, and buffer queue is filled to maximum capacity
Buffer Drop Aggregate Upper
TCA_WJH_BUFFER_DROP_AGG_UPPER
Buffer
WRED
Weighted Random Early Detection is enabled, and buffer queue is filled to maximum capacity or the RED engine dropped the packet as part of random congestion prevention
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Auto-negotiation failure
Negotiation of port speed with peer has failed
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Bad signal integrity
Integrity of the signal on port is not sufficient for good communication
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Cable/transceiver is not supported
The attached cable or transceiver is not supported by this port
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Cable/transceiver is unplugged
A cable or transceiver is missing or not fully inserted into the port
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Calibration failure
Calibration failure
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Link training failure
Link is not able to go operational up due to link training failure
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Peer is sending remote faults
Peer node is not operating correctly
CRC Error Upper
TCA_WJH_CRC_ERROR_UPPER
L1
Port admin down
Port has been purposely set down by user
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)
The address cannot be used by this link
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
Ingress spanning tree filter
Port is in Spanning Tree blocking state
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
Ingress VLAN filtering
Frames whose port is not a member of the VLAN are discarded
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
MLAG port isolation
Not supported for port isolation implemented with system ACL
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
Multicast egress port list is empty
No ports are defined for multicast egress
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
Port loopback filter
Port is operating in loopback mode; packets are being sent to itself (source MAC address is the same as the destination MAC address)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
Unicast MAC table action discard
Currently not supported
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
L2
VLAN tagging mismatch
VLAN tags on the source and destination do not match
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Blackhole ARP/neighbor
Packet received with blackhole adjacency
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Blackhole route
Packet received with action equal to discard
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Checksum or IPver or IPv4 IHL too short
Cannot read packet due to header checksum error, IP version mismatch, or IPv4 header length is too short
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Destination IP is loopback address
Cannot read packet as destination IP address is a loopback address (dip=>127.0.0.0/8)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Egress router interface is disabled
Packet destined to a different subnet cannot be routed because egress router interface is disabled
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Ingress router interface is disabled
Packet destined to a different subnet cannot be routed because ingress router interface is disabled
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv4 destination IP is link local
Packet has IPv4 destination address that is link-local (destination in 169.254.0.0/16)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv4 destination IP is local network (destination=0.0.0.0/8)
Packet has IPv4 destination address that is a local network (destination=0.0.0.0/8)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv4 routing table (LPM) unicast miss
No route available in routing table for packet
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv4 source IP is limited broadcast
Packet has broadcast source IP address
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv6 destination in multicast scope FFx0:/16
Packet received with multicast destination address in FFx0:/16 address range
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv6 destination in multicast scope FFx1:/16
Packet received with multicast destination address in FFx1:/16 address range
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
IPv6 routing table (LPM) unicast miss
No route available in routing table for packet
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Multicast MAC mismatch
For IPv4, destination MAC address is not equal to {0x01-00-5E-0 (25 bits), DIP[22:0]} and DIP is multicast. For IPv6, destination MAC address is not equal to {0x3333, DIP[31:0]} and DIP is multicast
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Non IP packet
Cannot read packet header because it is not an IP packet
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Non-routable packet
Packet has no route in routing table
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Packet size is larger than router interface MTU
Packet size is larger than the MTU configured on the VLAN
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Router interface loopback
Packet has destination IP address that is local. For example, SIP = 1.1.1.1, DIP = 1.1.1.128.
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Source IP equals destination IP
Packet has a source IP address equal to the destination IP address
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Source IP is in class E
Cannot read packet as source IP address is a Class E address
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Source IP is loopback address
Cannot read packet as source IP address is a loopback address (ipv4 => 127.0.0.0/8 for ipv6 => ::1/128)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Source IP is multicast
Cannot read packet as source IP address is a multicast address (ipv4 SIP => 224.0.0.0/4)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Source IP is unspecified
Cannot read packet as source IP address is unspecified (ipv4 = 0.0.0.0/32; for ipv6 = ::0)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
TTL value is too small
Packet has TTL value of 1
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Unicast destination IP but multicast destination MAC
Cannot read packet with IP unicast address when destination MAC address is not unicast (FF:FF:FF:FF:FF:FF)
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Router
Unresolved neighbor/next-hop
The next hop in the route is unknown
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Tunnel
Decapsulation error
Decapsulation produced an incorrectly formatted packet. For example, encapsulation of a packet with many VLANs or IP options on the underlay can cause decapsulation to result in a short packet.
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Tunnel
Overlay switch - Source MAC equals destination MAC
Overlay packet’s source MAC address is the same as the destination MAC address
Drop Aggregate Upper
TCA_WJH_DROP_AGG_UPPER
Tunnel
Overlay switch - Source MAC is multicast
Overlay packet’s source MAC address is multicast
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Auto-negotiation failure
Negotiation of port speed with peer has failed
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Bad signal integrity
Integrity of the signal on port is not sufficient for good communication
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Cable/transceiver is not supported
The attached cable or transceiver is not supported by this port
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Cable/transceiver is unplugged
A cable or transceiver is missing or not fully inserted into the port
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Calibration failure
Calibration failure
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Link training failure
Link is not able to go operational up due to link training failure
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Peer is sending remote faults
Peer node is not operating correctly
Symbol Error Upper
TCA_WJH_SYMBOL_ERROR_UPPER
L1
Port admin down
Port has been purposely set down by user
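The table above pairs each What Just Happened drop type with one or two event IDs. The following Python sketch captures that pairing as a lookup so that you can go from a drop type to its event IDs and back. The dictionary contents come from the table; the variable and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Drop type -> event IDs, as listed in the table above.
WJH_EVENTS_BY_DROP_TYPE: Dict[str, Tuple[str, ...]] = {
    "ACL": ("TCA_WJH_ACL_DROP_AGG_UPPER",),
    "Buffer": ("TCA_WJH_BUFFER_DROP_AGG_UPPER",),
    "L1": ("TCA_WJH_CRC_ERROR_UPPER", "TCA_WJH_SYMBOL_ERROR_UPPER"),
    "L2": ("TCA_WJH_DROP_AGG_UPPER",),
    "Router": ("TCA_WJH_DROP_AGG_UPPER",),
    "Tunnel": ("TCA_WJH_DROP_AGG_UPPER",),
}

def events_for(drop_type: str) -> Tuple[str, ...]:
    """Return the event IDs listed for a drop type (empty if unknown)."""
    return WJH_EVENTS_BY_DROP_TYPE.get(drop_type, ())

def drop_types_for(event_id: str) -> List[str]:
    """Return the drop types that can raise a given event ID."""
    return sorted(t for t, ids in WJH_EVENTS_BY_DROP_TYPE.items() if event_id in ids)

print(events_for("L1"))
print(drop_types_for("TCA_WJH_DROP_AGG_UPPER"))
```

Note that TCA_WJH_DROP_AGG_UPPER is shared by layer 2, router, and tunnel drops, so the drop type and reason are needed to tell them apart.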
WJH Event Messages Reference
This reference lists all the NetQ-supported WJH metrics and provides a brief description of each. The full outputs vary slightly based on the type of drop and whether you are viewing the results in the NetQ UI or through one of the NetQ CLI commands.
Link is not able to go operational up due to link training failure
Peer is sending remote faults
Peer node is not operating correctly
Bad signal integrity
Integrity of the signal on port is not sufficient for good communication
Cable/transceiver is not supported
The attached cable or transceiver is not supported by this port
Cable/transceiver is unplugged
A cable or transceiver is missing or not fully inserted into the port
Calibration failure
Calibration failure
Port state changes counter
Cumulative number of state changes
Symbol error counter
Cumulative number of symbol errors
CRC error counter
Cumulative number of CRC errors
In addition to the reason, the information provided for these drops includes:
Parameter
Description
Corrective Action
Provides recommended actions to take to resolve the port down state
First Timestamp
Date and time this port was marked as down for the first time
Ingress Port
Port accepting incoming traffic
CRC Error Count
Number of CRC errors generated by this port
Symbol Error Count
Number of symbol errors generated by this port
State Change Count
Number of state changes that have occurred on this port
OPID
Operation identifier; used for internal purposes
Is Port Up
Indicates whether the port is in an Up (true) or Down (false) state
Layer 2 Drops
Displays the reason for a link to be down.
Reason
Severity
Description
MLAG port isolation
Notice
Not supported for port isolation implemented with system ACL
Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)
Error
The address cannot be used by this link
VLAN tagging mismatch
Error
VLAN tags on the source and destination do not match
Ingress VLAN filtering
Error
Frames whose port is not a member of the VLAN are discarded
Ingress spanning tree filter
Notice
Port is in Spanning Tree blocking state
Unicast MAC table action discard
Notice
Packet dropped due to a MAC table configuration rule
Multicast egress port list is empty
Warning
No ports are defined for multicast egress
Port loopback filter
Error
Port is operating in loopback mode; packets are being sent to itself (source MAC address is the same as the destination MAC address)
Source MAC is multicast
Error
Packets have multicast source MAC address
Source MAC equals destination MAC
Error
Source MAC address is the same as the destination MAC address
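Three of the layer 2 reasons above depend only on the MAC addresses in the frame, so they are easy to reproduce offline. The following Python sketch checks a source and destination MAC pair against those three reasons. It relies on standard Ethernet facts (the multicast bit is the least significant bit of the first octet, and the reserved range is 01-80-C2-00-00-00 through 01-80-C2-00-00-0F). The function names are hypothetical, and this is not the switch's own logic.

```python
from typing import List

def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' or 'AA-BB-CC-DD-EE-FF' to six raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)

RESERVED_PREFIX = bytes.fromhex("0180c20000")  # 01-80-C2-00-00-0x

def l2_drop_reasons(src_mac: str, dst_mac: str) -> List[str]:
    """Return the MAC-based layer 2 drop reasons that apply to a frame."""
    src, dst = parse_mac(src_mac), parse_mac(dst_mac)
    reasons = []
    if dst[:5] == RESERVED_PREFIX and dst[5] <= 0x0F:
        reasons.append("Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)")
    if src[0] & 0x01:  # multicast bit set on the source address
        reasons.append("Source MAC is multicast")
    if src == dst:
        reasons.append("Source MAC equals destination MAC")
    return reasons

print(l2_drop_reasons("00:11:22:33:44:55", "01-80-C2-00-00-0E"))
```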
In addition to the reason, the information provided for these drops includes:
Parameter
Description
Source Port
Port ID where the link originates
Source IP
Port IP address where the link originates
Source MAC
Port MAC address where the link originates
Destination Port
Port ID where the link terminates
Destination IP
Port IP address where the link terminates
Destination MAC
Port MAC address where the link terminates
First Timestamp
Date and time this link was marked as down for the first time
Aggregate Count
Total number of dropped packets
Protocol
ID of the communication protocol running on this link
Ingress Port
Port accepting incoming traffic
OPID
Operation identifier; used for internal purposes
Router Drops
Displays the reason why the server is unable to route a packet.
Reason
Severity
Description
Non-routable packet
Notice
Packet has no route in routing table
Blackhole route
Warning
Packet received with action equal to discard
Unresolved next hop
Warning
The next hop in the route is unknown
Blackhole ARP/neighbor
Warning
Packet received with blackhole adjacency
IPv6 destination in multicast scope FFx0:/16
Notice
Packet received with multicast destination address in FFx0:/16 address range
IPv6 destination in multicast scope FFx1:/16
Notice
Packet received with multicast destination address in FFx1:/16 address range
Non-IP packet
Notice
Cannot read packet header because it is not an IP packet
Unicast destination IP but non-unicast destination MAC
Error
Cannot read packet with IP unicast address when destination MAC address is not unicast (FF:FF:FF:FF:FF:FF)
Destination IP is loopback address
Error
Cannot read packet as destination IP address is a loopback address (dip=>127.0.0.0/8)
Source IP is multicast
Error
Cannot read packet as source IP address is a multicast address (ipv4 SIP => 224.0.0.0/4)
Source IP is in class E
Error
Cannot read packet as source IP address is a Class E address
Source IP is loopback address
Error
Cannot read packet as source IP address is a loopback address (ipv4 => 127.0.0.0/8 for ipv6 => ::1/128)
Source IP is unspecified
Error
Cannot read packet as source IP address is unspecified (ipv4 = 0.0.0.0/32; for ipv6 = ::0)
Checksum or IP ver or IPv4 IHL too short
Error
Cannot read packet due to header checksum error, IP version mismatch, or IPv4 header length is too short
Multicast MAC mismatch
Error
For IPv4, destination MAC address is not equal to {0x01-00-5E-0 (25 bits), DIP[22:0]} and DIP is multicast. For IPv6, destination MAC address is not equal to {0x3333, DIP[31:0]} and DIP is multicast
Source IP equals destination IP
Error
Packet has a source IP address equal to the destination IP address
IPv4 source IP is limited broadcast
Error
Packet has broadcast source IP address
IPv4 destination IP is local network (destination = 0.0.0.0/8)
Error
Packet has IPv4 destination address that is a local network (destination=0.0.0.0/8)
IPv4 destination IP is link-local (destination in 169.254.0.0/16)
Error
Packet has IPv4 destination address that is link-local
Ingress router interface is disabled
Warning
Packet destined to a different subnet cannot be routed because ingress router interface is disabled
Egress router interface is disabled
Warning
Packet destined to a different subnet cannot be routed because egress router interface is disabled
IPv4 routing table (LPM) unicast miss
Warning
No route available in routing table for packet
IPv6 routing table (LPM) unicast miss
Warning
No route available in routing table for packet
Router interface loopback
Warning
Packet has destination IP address that is local. For example, SIP = 1.1.1.1, DIP = 1.1.1.128.
Packet size is larger than MTU
Warning
Packet size is larger than the MTU configured on the VLAN
TTL value is too small
Warning
Packet has TTL value of 1
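Several router drop reasons depend only on the source and destination IP addresses. The following Python sketch uses the standard ipaddress module to test an address pair against the address ranges quoted in the table. The reason labels are shortened versions of the table entries, the function name is hypothetical, and the checks illustrate the address rules rather than the switch's actual pipeline.

```python
import ipaddress
from typing import List

CLASS_E = ipaddress.ip_network("240.0.0.0/4")
LOCAL_NETWORK = ipaddress.ip_network("0.0.0.0/8")
LINK_LOCAL_V4 = ipaddress.ip_network("169.254.0.0/16")
LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")

def router_drop_reasons(src: str, dst: str) -> List[str]:
    """Return the address-based router drop reasons that apply to a packet."""
    sip = ipaddress.ip_address(src)
    dip = ipaddress.ip_address(dst)
    reasons = []
    if dip.is_loopback:
        reasons.append("Destination IP is loopback address")
    if sip.is_multicast:
        reasons.append("Source IP is multicast")
    if sip == LIMITED_BROADCAST:
        reasons.append("IPv4 source IP is limited broadcast")
    elif sip.version == 4 and sip in CLASS_E:
        reasons.append("Source IP is in class E")
    if sip.is_loopback:
        reasons.append("Source IP is loopback address")
    if sip.is_unspecified:
        reasons.append("Source IP is unspecified")
    if sip == dip:
        reasons.append("Source IP equals destination IP")
    if dip.version == 4 and dip in LOCAL_NETWORK:
        reasons.append("IPv4 destination IP is local network")
    if dip.version == 4 and dip in LINK_LOCAL_V4:
        reasons.append("IPv4 destination IP is link-local")
    return reasons

print(router_drop_reasons("224.0.0.5", "10.0.0.2"))
print(router_drop_reasons("10.0.0.1", "169.254.1.1"))
```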
Tunnel Drops
Displays the reason for a tunnel to be down.
Reason
Severity
Description
Overlay switch - source MAC is multicast
Error
Overlay packet’s source MAC address is multicast
Overlay switch - source MAC equals destination MAC
Error
Overlay packet’s source MAC address is the same as the destination MAC address
Decapsulation error
Error
Decapsulation produced an incorrectly formatted packet. For example, encapsulation of a packet with many VLANs or IP options on the underlay can cause decapsulation to result in a short packet.
Tunnel interface is disabled
Error
Packet cannot be decapsulated because the tunnel interface is disabled
Buffer Drops
Displays the reason why the server buffer has dropped packets.
Reason
Severity
Description
Tail drop
Warning
Tail drop is enabled, and buffer queue is filled to maximum capacity
WRED
Warning
Weighted Random Early Detection is enabled, and buffer queue is filled to maximum capacity or the RED engine dropped the packet as part of random congestion prevention
Port TC Congestion Threshold Crossed
Warning
Percentage of the occupancy buffer exceeded or dropped below the specified high or low threshold
Packet Latency Threshold Crossed
Warning
Time a packet spent within the switch exceeded or dropped below the specified high or low threshold
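To illustrate the difference between the tail drop and WRED reasons above, the following Python sketch implements the textbook form of both: tail drop discards only when the queue is full, while RED starts dropping with a rising probability once the average queue depth passes a minimum threshold. This is the generic algorithm, not the switch ASIC's implementation, and all parameter names (min_th, max_th, max_p) are hypothetical.

```python
def tail_drop(queue_len: int, capacity: int) -> bool:
    """Tail drop: discard the arriving packet only when the queue is full."""
    return queue_len >= capacity

def red_drop_probability(avg_queue: float, min_th: float,
                         max_th: float, max_p: float) -> float:
    """Textbook RED drop probability for a given average queue depth.

    Below min_th nothing is dropped, at or above max_th everything is
    dropped, and in between the probability rises linearly up to max_p.
    """
    if not 0 <= min_th < max_th:
        raise ValueError("thresholds must satisfy 0 <= min_th < max_th")
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for depth in (10, 50, 90):
    print(depth, red_drop_probability(depth, min_th=20, max_th=80, max_p=0.1))
```

Weighted RED applies this same curve with different thresholds for each traffic class.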
ACL Drops
Displays the reason why an ACL has dropped packets.
Reason
Severity
Description
Ingress port ACL
Notice
ACL action set to deny on the physical ingress port or bond
Ingress router ACL
Notice
ACL action set to deny on the ingress switch virtual interfaces (SVIs)
Egress port ACL
Notice
ACL action set to deny on the physical egress port or bond
Use the UI or CLI to monitor Ethernet VPN (EVPN) on a networkwide or per-session basis.
EVPN Commands
Monitor EVPN with the following commands. See the command line reference for additional options, definitions, and examples.
netq show evpn
netq show events message_type evpn
netq show events-config message_type evpn
View EVPN in the UI
To add the EVPN card to your workbench, navigate to the header and select Add card > Network services > All EVPN Sessions card > Open cards. In this example, there are 6 nodes running the EVPN service, 0 open events (from the last 24 hours), and 48 VNIs.
View the Distribution of Layer-2 and -3 VNIs and Sessions
To view the number of sessions between devices and Virtual Network Identifiers (VNIs) that occur over layer 3, open the large EVPN Sessions card. In this example, there are 18 layer-3 VNIs.
Select the dropdown to display the switches with the most EVPN sessions, as well as the switches with the most layer-2 and layer-3 EVPN sessions.
You can view EVPN-related events by selecting the Events tab.
Expand the EVPN card to full-screen to view, filter, or export:
A list of switches and their associated VNIs
The address of the VNI endpoint
Whether the session is part of a layer 2 or layer 3 configuration
The associated VRF or VLAN (when defined)
The export and import route targets used for filtering
From this table, you can select a row, then click Add card above the table.
NetQ adds a new EVPN ‘single-session’ card to your workbench. From this card, you can view the number of VTEPs (VXLAN Tunnel Endpoints) for a given EVPN session as well as the attributes of all EVPN sessions for a given VNI.
Monitor a Single EVPN Session
The EVPN single-session card displays the number of VTEPs for a given EVPN session (in this case, 48).
Expand the card to display the associated VRF (layer 3) or VLAN (layer 2) on each device participating in this session. The full-screen card displays all stored attributes of all EVPN sessions running networkwide.
Using NetQ on a Linux host is the same as using it on a Cumulus Linux switch. For example, if you want to check LLDP neighbor information about a given host, run:
Use the CLI to monitor OSI Layer 1 physical components on network devices, including interfaces, ports, links, and peers. You can monitor transceivers and cabling deployed per port (interface), per vendor, per part number, and so forth.
This information can help you:
Determine which ports are empty versus which ones have cables plugged in to help validate expected connectivity.
Audit transceiver and cable components by vendor, helping you estimate replacement costs, repair costs, and overall maintenance costs.
Identify mismatched links.
Identify when physical layer changes (for example, bonds and links going down or flapping) occurred.
NetQ uses LLDP (Link Layer Discovery Protocol) to collect port information. NetQ can also identify peer ports connected to DACs (Direct Attached Cables) and AOCs (Active Optical Cables) without using LLDP, even if the link is not UP.
Physical Interfaces
View performance and status information about cables, transceiver modules, and interfaces with netq show interfaces physical:
View which cables connect to each interface port for all devices, including the module type, vendor, part number and performance characteristics.
View the cable information for a given device by adding the device hostname before show in the command.
show interfaces physical
The following example shows cable information and status for all interface ports on all devices:
cumulus@switch:~$ netq show interfaces physical
Matching cables records:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
----------------- ------------------------- ---------- ---------- ------- --------- -------------------- ---------------- -------------------------
border01 vagrant down Unknown off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp54 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp49 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp2 down Unknown off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp3 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp52 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp1 down Unknown off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp53 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp4 down Unknown off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp50 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 eth0 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp51 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border02 swp49 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp54 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp52 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp53 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp4 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp3 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 vagrant down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp1 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp2 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp51 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 swp50 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
border02 eth0 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:54 2020
fw1 swp49 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
fw1 eth0 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
fw1 swp1 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
fw1 swp2 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
fw1 vagrant down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
fw2 vagrant down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:38 2020
fw2 eth0 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:38 2020
fw2 swp49 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:38 2020
fw2 swp2 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:38 2020
fw2 swp1 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:38 2020
...
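If you save this output to a file, you can summarize it with a short script. The following Python sketch parses rows in the format shown above and lists the interfaces that are down. It assumes whitespace-separated columns in which Vendor and Part No are single tokens, as they are in this example; vendor names that contain spaces would need a different approach. The field and function names are hypothetical.

```python
from typing import Dict, List

FIELDS = ["hostname", "interface", "state", "speed", "autoneg",
          "module", "vendor", "part_no", "last_changed"]

def parse_rows(lines: List[str]) -> List[Dict[str, str]]:
    """Parse data rows; skip the header, separator, and any other lines."""
    rows = []
    for line in lines:
        # The final field (Last Changed) keeps its internal spaces.
        parts = line.strip().split(None, len(FIELDS) - 1)
        if len(parts) != len(FIELDS) or parts[2] not in ("up", "down"):
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows

sample = """\
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
border01 vagrant down Unknown off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
border01 swp54 up 1G off RJ45 n/a n/a Fri Sep 18 20:08:05 2020
fw1 swp49 down Unknown off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
fw1 eth0 up 1G off RJ45 n/a n/a Thu Sep 17 21:07:37 2020
"""

rows = parse_rows(sample.splitlines())
down = [(row["hostname"], row["interface"]) for row in rows if row["state"] == "down"]
print(down)
```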
View Detailed Module Information for a Given Device
View detailed information about the transceiver modules on each interface port, including serial number, transceiver type, connector, and attached cable length.
View the module information for a given device by adding the device hostname before show in the command.
show interfaces physical module
The following example shows detailed module information for the interface ports on leaf02 switch:
cumulus@switch:~$ netq leaf02 show interfaces physical module
Matching cables records are:
Hostname          Interface                 Module    Vendor               Part No          Serial No                 Transceiver      Connector        Length Last Changed
----------------- ------------------------- --------- -------------------- ---------------- ------------------------- ---------------- ---------------- ------ -------------------------
leaf02            swp1                      RJ45      n/a                  n/a              n/a                       n/a              n/a              n/a    Thu Feb 7 22:49:37 2019
leaf02            swp2                      SFP       Mellanox             MC2609130-003    MT1507VS05177             1000Base-CX,Copp Copper pigtail   3m     Thu Feb 7 22:49:37 2019
                                                                                                                      er Passive,Twin
                                                                                                                      Axial Pair (TW)
leaf02            swp47                     QSFP+     CISCO                AFBR-7IER05Z-CS1 AVE1823402U               n/a              n/a              5m     Thu Feb 7 22:49:37 2019
leaf02            swp48                     QSFP28    TE Connectivity      2231368-1        15250052                  100G Base-CR4 or n/a              3m     Thu Feb 7 22:49:37 2019
                                                                                                                      25G Base-CR CA-L
                                                                                                                      ,40G Base-CR4
leaf02            swp49                     SFP       OEM                  SFP-10GB-LR      ACSLR130408               10G Base-LR      LC               10km,  Thu Feb 7 22:49:37 2019
                                                                                                                                                        10000m
leaf02            swp50                     SFP       JDSU                 PLRXPLSCS4322N   CG03UF45M                 10G Base-SR,Mult LC               80m,   Thu Feb 7 22:49:37 2019
                                                                                                                      imode,                            30m,
                                                                                                                      50um (M5),Multim                  300m
                                                                                                                      ode,
                                                                                                                      62.5um (M6),Shor
                                                                                                                      twave laser w/o
                                                                                                                      OFC (SN),interme
                                                                                                                      diate distance (
                                                                                                                      I)
leaf02            swp51                     SFP       Mellanox             MC2609130-003    MT1507VS05177             1000Base-CX,Copp Copper pigtail   3m     Thu Feb 7 22:49:37 2019
                                                                                                                      er Passive,Twin
                                                                                                                      Axial Pair (TW)
leaf02            swp52                     SFP       FINISAR CORP.        FCLF8522P2BTL    PTN1VH2                   1000Base-T       RJ45             100m   Thu Feb 7 22:49:37 2019
View Ports without Cables Connected for a Given Device
Check for empty ports and compare expected versus actual deployment.
View the cable information for a given device by adding a hostname to the show command.
show interfaces physical empty
The following example shows the ports that are empty on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show interfaces physical empty
Matching cables records are:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
---------------- --------- ----- ---------- ------- --------- ---------------- ---------------- ------------------------
leaf01 swp49 down Unknown on empty n/a n/a Thu Feb 7 22:49:37 2019
leaf01 swp52 down Unknown on empty n/a n/a Thu Feb 7 22:49:37 2019
View Ports with Cables Connected for a Given Device
Check for ports that have cables connected, and compare expected versus actual deployment.
View the cable information for a given device by adding a hostname to the show command.
show interfaces physical plugged
The following example shows the ports of the leaf01 switch that have attached cables:
cumulus@switch:~$ netq leaf01 show interfaces physical plugged
Matching cables records:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
----------------- ------------------------- ---------- ---------- ------- --------- -------------------- ---------------- -------------------------
leaf01 eth0 up 1G on RJ45 n/a n/a Thu Feb 7 22:49:37 2019
leaf01 swp1 up 10G off SFP Amphenol 610640005 Thu Feb 7 22:49:37 2019
leaf01 swp2 up 10G off SFP Amphenol 610640005 Thu Feb 7 22:49:37 2019
leaf01 swp3 down 10G off SFP Mellanox MC3309130-001 Thu Feb 7 22:49:37 2019
leaf01 swp33 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp34 down 10G off SFP Amphenol 571540007 Thu Feb 7 22:49:37 2019
leaf01 swp35 down 10G off SFP Amphenol 571540007 Thu Feb 7 22:49:37 2019
leaf01 swp36 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp37 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp38 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp39 down 10G off SFP Amphenol 571540007 Thu Feb 7 22:49:37 2019
leaf01 swp40 down 10G off SFP Amphenol 571540007 Thu Feb 7 22:49:37 2019
leaf01 swp49 up 40G off QSFP+ Amphenol 624410001 Thu Feb 7 22:49:37 2019
leaf01 swp5 down 10G off SFP Amphenol 571540007 Thu Feb 7 22:49:37 2019
leaf01 swp50 down 40G off QSFP+ Amphenol 624410001 Thu Feb 7 22:49:37 2019
leaf01 swp51 down 40G off QSFP+ Amphenol 603020003 Thu Feb 7 22:49:37 2019
leaf01 swp52 up 40G off QSFP+ Amphenol 603020003 Thu Feb 7 22:49:37 2019
leaf01 swp54 down 40G off QSFP+ Amphenol 624410002 Thu Feb 7 22:49:37 2019
View Components from a Given Vendor
Filter for a specific cable vendor to collect information such as how many ports use components from that vendor and when they were last updated.
show interfaces physical vendor
The following example shows all the ports on the leaf01 switch that use components from an OEM vendor:
cumulus@switch:~$ netq leaf01 show interfaces physical vendor OEM
Matching cables records:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
----------------- ------------------------- ---------- ---------- ------- --------- -------------------- ---------------- -------------------------
leaf01 swp33 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp36 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp37 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
leaf01 swp38 down 10G off SFP OEM SFP-H10GB-CU1M Thu Feb 7 22:49:37 2019
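To answer the "how many ports use this vendor" question from saved output, you can tally rows by vendor. The following is a minimal Python sketch, not part of NetQ: the `rows` list is copied by hand from the leaf01 examples above, and `count_ports_by_vendor` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample: (hostname, interface, vendor) tuples copied by hand
# from the leaf01 example output. They are not produced by a NetQ API.
rows = [
    ("leaf01", "swp1", "Amphenol"),
    ("leaf01", "swp3", "Mellanox"),
    ("leaf01", "swp33", "OEM"),
    ("leaf01", "swp36", "OEM"),
    ("leaf01", "swp37", "OEM"),
    ("leaf01", "swp38", "OEM"),
]


def count_ports_by_vendor(records):
    """Return a Counter mapping each vendor name to its number of ports."""
    return Counter(vendor for _host, _iface, vendor in records)


counts = count_ports_by_vendor(rows)
for vendor, total in counts.most_common():
    print(f"{vendor}: {total}")
```

A `Counter` keeps the tally code short and returns zero for vendors that do not appear in the sample.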
View All Devices Using a Given Component
View all devices with ports using a particular component.
show interfaces physical model
The following example first determines which models (part numbers) exist on all the devices and then displays devices with a part number of QSFP-H40G-CU1M installed:
cumulus@switch:~$ netq show interfaces physical model
2231368-1 : 2231368-1
624400001 : 624400001
QSFP-H40G-CU1M : QSFP-H40G-CU1M
QSFP-H40G-CU1MUS : QSFP-H40G-CU1MUS
n/a : n/a
cumulus@switch:~$ netq show interfaces physical model QSFP-H40G-CU1M
Matching cables records:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
----------------- ------------------------- ---------- ---------- ------- --------- -------------------- ---------------- -------------------------
leaf01 swp50 up 1G off QSFP+ OEM QSFP-H40G-CU1M Thu Feb 7 18:31:20 2019
leaf02 swp52 up 1G off QSFP+ OEM QSFP-H40G-CU1M Thu Feb 7 18:31:20 2019
View Changes to Physical Components
View changes to the physical components on your devices.
show events type interfaces-physical with time constraints
The following examples show changes to physical components for all devices in the network over three different time ranges:
cumulus@switch:~$ netq show events message_type interfaces-physical between now and 30d
Matching cables records:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
----------------- ------------------------- ---------- ---------- ------- --------- -------------------- ---------------- -------------------------
leaf01 swp1 up 1G off SFP AVAGO AFBR-5715PZ-JU1 Thu Feb 7 18:34:20 2019
leaf01 swp2 up 10G off SFP OEM SFP-10GB-LR Thu Feb 7 18:34:20 2019
leaf01 swp47 up 10G off SFP JDSU PLRXPLSCS4322N Thu Feb 7 18:34:20 2019
leaf01 swp48 up 40G off QSFP+ Mellanox MC2210130-002 Thu Feb 7 18:34:20 2019
leaf01 swp49 down 10G off empty n/a n/a Thu Feb 7 18:34:20 2019
leaf01 swp50 up 1G off SFP FINISAR CORP. FCLF8522P2BTL Thu Feb 7 18:34:20 2019
leaf01 swp51 up 1G off SFP FINISAR CORP. FTLF1318P3BTL Thu Feb 7 18:34:20 2019
leaf01 swp52 down 1G off SFP CISCO-AGILENT QFBR-5766LP Thu Feb 7 18:34:20 2019
leaf02 swp1 up 1G on RJ45 n/a n/a Thu Feb 7 18:34:20 2019
leaf02 swp2 up 10G off SFP Mellanox MC2609130-003 Thu Feb 7 18:34:20 2019
leaf02 swp47 up 10G off QSFP+ CISCO AFBR-7IER05Z-CS1 Thu Feb 7 18:34:20 2019
leaf02 swp48 up 10G off QSFP+ Mellanox MC2609130-003 Thu Feb 7 18:34:20 2019
leaf02 swp49 up 10G off SFP FIBERSTORE SFP-10GLR-31 Thu Feb 7 18:34:20 2019
leaf02 swp50 up 1G off SFP OEM SFP-GLC-T Thu Feb 7 18:34:20 2019
leaf02 swp51 up 10G off SFP Mellanox MC2609130-003 Thu Feb 7 18:34:20 2019
leaf02 swp52 up 1G off SFP FINISAR CORP. FCLF8522P2BTL Thu Feb 7 18:34:20 2019
leaf03 swp1 up 10G off SFP Mellanox MC2609130-003 Thu Feb 7 18:34:20 2019
leaf03 swp2 up 10G off SFP Mellanox MC3309130-001 Thu Feb 7 18:34:20 2019
leaf03 swp47 up 10G off SFP CISCO-AVAGO AFBR-7IER05Z-CS1 Thu Feb 7 18:34:20 2019
leaf03 swp48 up 10G off SFP Mellanox MC3309130-001 Thu Feb 7 18:34:20 2019
leaf03 swp49 down 1G off SFP FINISAR CORP. FCLF8520P2BTL Thu Feb 7 18:34:20 2019
leaf03 swp50 up 1G off SFP FINISAR CORP. FCLF8522P2BTL Thu Feb 7 18:34:20 2019
leaf03 swp51 up 10G off QSFP+ Mellanox MC2609130-003 Thu Feb 7 18:34:20 2019
...
oob-mgmt-server swp1 up 1G off RJ45 n/a n/a Thu Feb 7 18:34:20 2019
oob-mgmt-server swp2 up 1G off RJ45 n/a n/a Thu Feb 7 18:34:20 2019
cumulus@switch:~$ netq show events message_type interfaces-physical between 6d and 16d
Matching cables records:
Hostname Interface State Speed AutoNeg Module Vendor Part No Last Changed
----------------- ------------------------- ---------- ---------- ------- --------- -------------------- ---------------- -------------------------
leaf01 swp1 up 1G off SFP AVAGO AFBR-5715PZ-JU1 Thu Feb 7 18:34:20 2019
leaf01 swp2 up 10G off SFP OEM SFP-10GB-LR Thu Feb 7 18:34:20 2019
leaf01 swp47 up 10G off SFP JDSU PLRXPLSCS4322N Thu Feb 7 18:34:20 2019
leaf01 swp48 up 40G off QSFP+ Mellanox MC2210130-002 Thu Feb 7 18:34:20 2019
leaf01 swp49 down 10G off empty n/a n/a Thu Feb 7 18:34:20 2019
leaf01 swp50 up 1G off SFP FINISAR CORP. FCLF8522P2BTL Thu Feb 7 18:34:20 2019
leaf01 swp51 up 1G off SFP FINISAR CORP. FTLF1318P3BTL Thu Feb 7 18:34:20 2019
leaf01 swp52 down 1G off SFP CISCO-AGILENT QFBR-5766LP Thu Feb 7 18:34:20 2019
...
cumulus@switch:~$ netq show events message_type interfaces-physical between 0s and 5h
No matching cables records found
View Utilization Statistics Networkwide
Utilization statistics can indicate whether resources are becoming dangerously close to their maximum capacity or other, user-defined thresholds. Depending on the function of the switch, the acceptable thresholds can vary.
View Compute Resources Utilization
View how many compute resources—CPU, disk, and memory—the switches on your network consume:
netq [<hostname>] show resource-util [cpu | memory] [around <text-time>] [json]
netq [<hostname>] show resource-util disk [<text-diskname>] [around <text-time>] [json]
If you do not specify options, the output shows the percentage of CPU and memory the switch consumed as well as the amount and percentage of disk space it consumed.
show resource-util
The following example shows the CPU, memory, and disk utilization for all devices:
cumulus@switch:~$ netq show resource-util
Matching resource_util records:
Hostname CPU Utilization Memory Utilization Disk Name Total Used Disk Utilization Last Updated
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- ------------------------
exit01 9.2 48 /dev/vda4 6170849280 1524920320 26.8 Wed Feb 12 03:54:10 2020
exit02 9.6 47.6 /dev/vda4 6170849280 1539346432 27.1 Wed Feb 12 03:54:22 2020
leaf01 9.8 50.5 /dev/vda4 6170849280 1523818496 26.8 Wed Feb 12 03:54:25 2020
leaf02 10.9 49.4 /dev/vda4 6170849280 1535246336 27 Wed Feb 12 03:54:11 2020
leaf03 11.4 49.4 /dev/vda4 6170849280 1536798720 27 Wed Feb 12 03:54:10 2020
leaf04 11.4 49.4 /dev/vda4 6170849280 1522495488 26.8 Wed Feb 12 03:54:03 2020
spine01 8.4 50.3 /dev/vda4 6170849280 1522249728 26.8 Wed Feb 12 03:54:19 2020
spine02 9.8 49 /dev/vda4 6170849280 1522003968 26.8 Wed Feb 12 03:54:25 2020
The following example shows only the CPU utilization for all devices:
cumulus@switch:~$ netq show resource-util cpu
Matching resource_util records:
Hostname CPU Utilization Last Updated
----------------- -------------------- ------------------------
exit01 8.9 Wed Feb 12 04:29:29 2020
exit02 8.3 Wed Feb 12 04:29:22 2020
leaf01 10.9 Wed Feb 12 04:29:24 2020
leaf02 11.6 Wed Feb 12 04:29:10 2020
leaf03 9.8 Wed Feb 12 04:29:33 2020
leaf04 11.7 Wed Feb 12 04:29:29 2020
spine01 10.4 Wed Feb 12 04:29:38 2020
spine02 9.7 Wed Feb 12 04:29:15 2020
The following example shows only the memory utilization for all devices:
cumulus@switch:~$ netq show resource-util memory
Matching resource_util records:
Hostname Memory Utilization Last Updated
----------------- -------------------- ------------------------
exit01 48.8 Wed Feb 12 04:29:29 2020
exit02 49.7 Wed Feb 12 04:29:22 2020
leaf01 49.8 Wed Feb 12 04:29:24 2020
leaf02 49.5 Wed Feb 12 04:29:10 2020
leaf03 50.7 Wed Feb 12 04:29:33 2020
leaf04 49.3 Wed Feb 12 04:29:29 2020
spine01 47.5 Wed Feb 12 04:29:07 2020
spine02 49.2 Wed Feb 12 04:29:15 2020
The following example shows only the disk utilization for all devices:
cumulus@switch:~$ netq show resource-util disk
Matching resource_util records:
Hostname Disk Name Total Used Disk Utilization Last Updated
----------------- -------------------- -------------------- -------------------- -------------------- ------------------------
exit01 /dev/vda4 6170849280 1525309440 26.8 Wed Feb 12 04:29:29 2020
exit02 /dev/vda4 6170849280 1539776512 27.1 Wed Feb 12 04:29:22 2020
leaf01 /dev/vda4 6170849280 1524203520 26.8 Wed Feb 12 04:29:24 2020
leaf02 /dev/vda4 6170849280 1535631360 27 Wed Feb 12 04:29:41 2020
leaf03 /dev/vda4 6170849280 1537191936 27.1 Wed Feb 12 04:29:33 2020
leaf04 /dev/vda4 6170849280 1522864128 26.8 Wed Feb 12 04:29:29 2020
spine01 /dev/vda4 6170849280 1522688000 26.8 Wed Feb 12 04:29:38 2020
spine02 /dev/vda4 6170849280 1522409472 26.8 Wed Feb 12 04:29:46 2020
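To compare utilization against user-defined thresholds, you can post-process the values outside of NetQ. The following is a minimal Python sketch under stated assumptions: the threshold values are hypothetical, the sample rows are copied by hand from the first example output above, and `over_threshold` is an illustrative helper, not a NetQ function.

```python
# Hypothetical thresholds; acceptable values depend on the role of each switch.
CPU_MAX = 80.0
MEMORY_MAX = 50.0

# (hostname, CPU utilization %, memory utilization %) copied by hand from
# the example output of "netq show resource-util".
samples = [
    ("exit01", 9.2, 48.0),
    ("leaf01", 9.8, 50.5),
    ("leaf02", 10.9, 49.4),
    ("spine01", 8.4, 50.3),
]


def over_threshold(records, cpu_max, memory_max):
    """Return hostnames whose CPU or memory utilization exceeds a limit."""
    return [
        host
        for host, cpu, memory in records
        if cpu > cpu_max or memory > memory_max
    ]


flagged = over_threshold(samples, CPU_MAX, MEMORY_MAX)
print(flagged)
```

With these sample thresholds, only the switches whose memory utilization is above 50% are reported.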
View Port Statistics
View statistics about a given node and interface, including frame errors, ACL drops, and buffer drops, with ethtool:
netq [<hostname>] show ethtool-stats port <physical-port> (rx | tx) [extended] [around <text-time>] [json]
If there are no changes, a “No matching ethtool_stats records found” message appears.
show ethtool-stats port
The following example shows the transmit statistics for switch port swp50 on the leaf01 switch in the network:
NetQ Agents collect performance statistics every 30 seconds for the physical interfaces on switches in your network. The NetQ Agent does not collect statistics for non-physical interfaces, such as bonds, bridges, and VXLANs. The NetQ Agent collects the following statistics:
View SSD Utilization
For NetQ Appliances that have 3ME3 solid state drives (SSDs) installed (primarily in on-premises deployments), you can view the utilization of the drive on demand. A warning is generated when a drive drops below 10% health, or has more than a 2% loss of health in 24 hours, indicating the need to rebalance the drive. Tracking SSD utilization over time lets you see any downward trend or drive instability before you receive a warning message.
To view SSD utilization, run:
netq show cl-ssd-util [around <text-time>] [json]
show cl-ssd-util
The following example shows the utilization for all devices that have this type of SSD:
cumulus@switch:~$ netq show cl-ssd-util
Hostname Remaining PE Cycle (%) Current PE Cycles executed Total PE Cycles supported SSD Model Last Changed
spine02 80 576 2880 M.2 (S42) 3ME3 Thu Oct 31 00:15:06 2019
This output indicates that the one drive found of this type, on the spine02 switch, is in a good state overall with 80% of its PE cycles remaining.
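The 80% figure follows from the two cycle counts in the output: 576 of 2880 supported PE (program/erase) cycles have been executed. The following Python sketch reproduces that arithmetic and the two warning conditions described earlier. It assumes that drive "health" can be read as the remaining PE cycle percentage; the function names are hypothetical and are not part of NetQ.

```python
def remaining_pe_percent(executed, supported):
    """Percentage of program/erase (PE) cycles still available."""
    return 100.0 * (supported - executed) / supported


def needs_warning(health_now, health_24h_ago):
    """Apply the two conditions from the text: health below 10%, or a
    loss of more than 2 percentage points within 24 hours.

    Assumption: health is expressed as a percentage, as in the
    "Remaining PE Cycle (%)" column.
    """
    return health_now < 10.0 or (health_24h_ago - health_now) > 2.0


# Values from the spine02 row in the example output.
print(remaining_pe_percent(576, 2880))
```

Recording the remaining percentage at regular intervals gives you the trend data needed to evaluate the 24-hour condition.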
View Disk Storage After BTRFS Allocation Networkwide
Customers running Cumulus Linux 3, which uses BTRFS (b-tree file system), might experience issues with disk space management. This is a known limitation of BTRFS: it does not perform periodic garbage collection, or rebalancing. If left unattended, the resulting errors can make it impossible to rebalance the partitions on the disk. To avoid this issue, NVIDIA recommends rebalancing the BTRFS partitions preemptively, but only when absolutely needed, to avoid reducing the lifetime of the disk. By tracking disk space usage, you can determine when to rebalance.
netq show cl-btrfs-info [around <text-time>] [json]
show cl-btrfs-info
The following example shows the utilization on all devices:
cumulus@switch:~$ netq show cl-btrfs-info
Matching btrfs_info records:
Hostname Device Allocated Unallocated Space Largest Chunk Size Unused Data Chunks Space Rebalance Recommended Last Changed
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------------
leaf01 37.79 % 3.58 GB 588.5 MB 771.91 MB yes Wed Sep 16 21:25:17 2020
Check the Rebalance Recommended column. If the value is yes, you are strongly encouraged to rebalance the BTRFS partitions. If the value is no, review the other values in the output to judge how close you are to needing a rebalance, and check this data again later.
Interfaces
Use the netq show interfaces command to monitor interface (link) health.
The syntax for interface commands is:
netq show interfaces type (bond|bridge|eth|loopback|macvlan|swp|vlan|vrf|vxlan) [state <remote-interface-state>] [around <text-time>] [json]
netq <hostname> show interfaces type (bond|bridge|eth|loopback|macvlan|swp|vlan|vrf|vxlan) [state <remote-interface-state>] [around <text-time>] [count] [json]
netq [<hostname>] show events [severity info | severity error ] message_type interfaces [between <text-time> and <text-endtime>] [json]
View Status for All Interfaces
Viewing the status of all interfaces at one time can be helpful when you are trying to compare the configuration or status of a set of links.
show interfaces
The following example shows all interfaces networkwide:
cumulus@switch:~$ netq show interfaces
Matching link records:
Hostname Interface Type State VRF Details Last Changed
----------------- ------------------------- ---------------- ---------- --------------- ----------------------------------- -------------------------
exit01 bridge bridge up default , Root bridge: exit01, Mon Apr 29 20:57:59 2019
Root port: , Members: vxlan4001,
bridge,
exit01 eth0 eth up mgmt MTU: 1500 Mon Apr 29 20:57:59 2019
exit01 lo loopback up default MTU: 65536 Mon Apr 29 20:57:58 2019
exit01 mgmt vrf up table: 1001, MTU: 65536, Mon Apr 29 20:57:58 2019
Members: mgmt, eth0,
exit01 swp1 swp down default VLANs: , PVID: 0 MTU: 1500 Mon Apr 29 20:57:59 2019
exit01 swp44 swp up vrf1 VLANs: , Mon Apr 29 20:57:58 2019
PVID: 0 MTU: 1500 LLDP: internet:sw
p1
exit01 swp45 swp down default VLANs: , PVID: 0 MTU: 1500 Mon Apr 29 20:57:59 2019
exit01 swp46 swp down default VLANs: , PVID: 0 MTU: 1500 Mon Apr 29 20:57:59 2019
exit01 swp47 swp down default VLANs: , PVID: 0 MTU: 1500 Mon Apr 29 20:57:59 2019
...
leaf01 bond01 bond up default Slave:swp1 LLDP: server01:eth1 Mon Apr 29 20:57:59 2019
leaf01 bond02 bond up default Slave:swp2 LLDP: server02:eth1 Mon Apr 29 20:57:59 2019
leaf01 bridge bridge up default , Root bridge: leaf01, Mon Apr 29 20:57:59 2019
Root port: , Members: vxlan4001,
bond02, vni24, vni13, bond01,
bridge, peerlink,
leaf01 eth0 eth up mgmt MTU: 1500 Mon Apr 29 20:58:00 2019
leaf01 lo loopback up default MTU: 65536 Mon Apr 29 20:57:59 2019
leaf01 mgmt vrf up table: 1001, MTU: 65536, Mon Apr 29 20:57:59 2019
Members: mgmt, eth0,
leaf01 peerlink bond up default Slave:swp50 LLDP: leaf02:swp49 LLDP Mon Apr 29 20:58:00 2019
: leaf02:swp50
...
View Interface Status for a Given Device
View the status of interfaces on a specific device.
spine01 show interfaces
The following example shows all interfaces on spine01:
cumulus@switch:~$ netq spine01 show interfaces
Matching link records:
Hostname Interface Type State VRF Details Last Changed
----------------- ------------------------- ---------------- ---------- --------------- ----------------------------------- -------------------------
spine01 swp5 swp up default VLANs: , Mon Jan 11 05:56:54 2021
PVID: 0 MTU: 9216 LLDP: border01:sw
p51
spine01 swp6 swp up default VLANs: , Mon Jan 11 05:56:54 2021
PVID: 0 MTU: 9216 LLDP: border02:sw
p51
spine01 lo loopback up default MTU: 65536 Mon Jan 11 05:56:54 2021
spine01 eth0 eth up mgmt MTU: 1500 Mon Jan 11 05:56:54 2021
spine01 vagrant swp down default VLANs: , PVID: 0 MTU: 1500 Mon Jan 11 05:56:54 2021
spine01 mgmt vrf up mgmt table: 1001, MTU: 65536, Mon Jan 11 05:56:54 2021
Members: eth0, mgmt,
spine01 swp1 swp up default VLANs: , Mon Jan 11 05:56:54 2021
PVID: 0 MTU: 9216 LLDP: leaf01:swp5
1
spine01 swp2 swp up default VLANs: , Mon Jan 11 05:56:54 2021
PVID: 0 MTU: 9216 LLDP: leaf02:swp5
1
spine01 swp3 swp up default VLANs: , Mon Jan 11 05:56:54 2021
PVID: 0 MTU: 9216 LLDP: leaf03:swp5
1
spine01 swp4 swp up default VLANs: , Mon Jan 11 05:56:54 2021
PVID: 0 MTU: 9216 LLDP: leaf04:swp5
1
cumulus@switch:~$
View All Interfaces of a Given Type
View the status of a particular type of interface.
show interfaces type bond state
The following examples show all bond interfaces that are down, and then all bond interfaces that are up:
cumulus@switch:~$ netq show interfaces type bond state down
No matching link records found
cumulus@switch:~$ netq show interfaces type bond state up
Matching link records:
Hostname Interface Type State VRF Details Last Changed
----------------- ------------------------- ---------------- ---------- --------------- ----------------------------------- -------------------------
border01 peerlink bond up default Slave: swp49 (LLDP: border02:swp49) Mon Jan 11 05:56:35 2021
,
Slave: swp50 (LLDP: border02:swp50)
border01 bond1 bond up default Slave: swp3 (LLDP: fw1:swp1) Mon Jan 11 05:56:36 2021
border02 peerlink bond up default Slave: swp49 (LLDP: border01:swp49) Mon Jan 11 05:56:38 2021
,
Slave: swp50 (LLDP: border01:swp50)
border02 bond1 bond up default Slave: swp3 (LLDP: fw1:swp2) Mon Jan 11 05:56:38 2021
fw1 borderBond bond up default Slave: swp1 (LLDP: border01:swp3), Mon Jan 11 05:56:36 2021
Slave: swp2 (LLDP: border02:swp3)
leaf01 bond2 bond up default Slave: swp2 (LLDP: server02:mac:44: Mon Jan 11 05:56:39 2021
38:39:00:00:34)
leaf01 peerlink bond up default Slave: swp49 (LLDP: leaf02:swp49), Mon Jan 11 05:56:39 2021
Slave: swp50 (LLDP: leaf02:swp50)
leaf01 bond3 bond up default Slave: swp3 (LLDP: server03:mac:44: Mon Jan 11 05:56:39 2021
38:39:00:00:36)
leaf01 bond1 bond up default Slave: swp1 (LLDP: server01:mac:44: Mon Jan 11 05:56:39 2021
38:39:00:00:32)
leaf02 bond2 bond up default Slave: swp2 (LLDP: server02:mac:44: Mon Jan 11 05:56:31 2021
38:39:00:00:3a)
leaf02 peerlink bond up default Slave: swp49 (LLDP: leaf01:swp49), Mon Jan 11 05:56:31 2021
Slave: swp50 (LLDP: leaf01:swp50)
leaf02 bond3 bond up default Slave: swp3 (LLDP: server03:mac:44: Mon Jan 11 05:56:31 2021
38:39:00:00:3c)
leaf02 bond1 bond up default Slave: swp1 (LLDP: server01:mac:44: Mon Jan 11 05:56:31 2021
38:39:00:00:38)
leaf03 bond2 bond up default Slave: swp2 (LLDP: server05:mac:44: Mon Jan 11 05:56:37 2021
38:39:00:00:40)
leaf03 peerlink bond up default Slave: swp49 (LLDP: leaf04:swp49), Mon Jan 11 05:56:37 2021
Slave: swp50 (LLDP: leaf04:swp50)
leaf03 bond3 bond up default Slave: swp3 (LLDP: server06:mac:44: Mon Jan 11 05:56:37 2021
38:39:00:00:42)
leaf03 bond1 bond up default Slave: swp1 (LLDP: server04:mac:44: Mon Jan 11 05:56:37 2021
38:39:00:00:3e)
leaf04 bond2 bond up default Slave: swp2 (LLDP: server05:mac:44: Mon Jan 11 05:56:43 2021
38:39:00:00:46)
leaf04 peerlink bond up default Slave: swp49 (LLDP: leaf03:swp49), Mon Jan 11 05:56:43 2021
Slave: swp50 (LLDP: leaf03:swp50)
leaf04 bond3 bond up default Slave: swp3 (LLDP: server06:mac:44: Mon Jan 11 05:56:43 2021
38:39:00:00:48)
leaf04 bond1 bond up default Slave: swp1 (LLDP: server04:mac:44: Mon Jan 11 05:56:43 2021
38:39:00:00:44)
server01 uplink bond up default Slave: eth2 (LLDP: leaf02:swp1), Mon Jan 11 05:35:22 2021
Slave: eth1 (LLDP: leaf01:swp1)
server02 uplink bond up default Slave: eth2 (LLDP: leaf02:swp2), Mon Jan 11 05:34:52 2021
Slave: eth1 (LLDP: leaf01:swp2)
server03 uplink bond up default Slave: eth2 (LLDP: leaf02:swp3), Mon Jan 11 05:34:47 2021
Slave: eth1 (LLDP: leaf01:swp3)
server04 uplink bond up default Slave: eth2 (LLDP: leaf04:swp1), Mon Jan 11 05:34:52 2021
Slave: eth1 (LLDP: leaf03:swp1)
server05 uplink bond up default Slave: eth2 (LLDP: leaf04:swp2), Mon Jan 11 05:34:41 2021
Slave: eth1 (LLDP: leaf03:swp2)
server06 uplink bond up default Slave: eth2 (LLDP: leaf04:swp3), Mon Jan 11 05:35:03 2021
Slave: eth1 (LLDP: leaf03:swp3)
View the Total Number of Interfaces
To display the number of interfaces currently operating on a device, use the hostname and count options together.
leaf03 show interfaces count
The following example shows the count of interfaces on the leaf03 switch:
cumulus@switch:~$ netq leaf03 show interfaces count
Count of matching link records: 28
View the Total Number of a Given Interface Type
View the number of interfaces of a particular type on a given device.
leaf03 show interfaces type swp count
The following example shows the count of swp interfaces on the leaf03 switch:
cumulus@switch:~$ netq leaf03 show interfaces type swp count
Count of matching link records: 11
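If you are working from saved output instead of the live CLI, you can reproduce the same kind of count by tallying link records by type. The following is a minimal Python sketch; the sample pairs are copied by hand from the spine01 example output earlier in this section, and `count_by_type` is a hypothetical helper, not a NetQ function.

```python
from collections import Counter

# (interface, type) pairs copied by hand from the example output of
# "netq spine01 show interfaces".
spine01_links = [
    ("swp5", "swp"), ("swp6", "swp"), ("lo", "loopback"), ("eth0", "eth"),
    ("vagrant", "swp"), ("mgmt", "vrf"), ("swp1", "swp"), ("swp2", "swp"),
    ("swp3", "swp"), ("swp4", "swp"),
]


def count_by_type(links):
    """Return a Counter mapping each interface type to its record count."""
    return Counter(link_type for _name, link_type in links)


type_counts = count_by_type(spine01_links)
print("all interfaces:", sum(type_counts.values()))
print("swp interfaces:", type_counts["swp"])
```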
View Aliases for Interfaces
View which interfaces have aliases.
show interfaces alias swp2
If you do not specify a switch port or host, the command returns all configured aliases.
cumulus@switch:~$ netq show interfaces alias swp2
Matching link records:
Hostname Interface Alias State Last Changed
----------------- ------------------------- ------------------------------ ------- -------------------------
border01 swp2 down Mon Jan 11 05:56:35 2021
border02 swp2 down Mon Jan 11 05:56:38 2021
fw1 swp2 up Mon Jan 11 05:56:36 2021
fw2 swp2 rocket down Mon Jan 11 05:56:34 2021
leaf01 swp2 up Mon Jan 11 23:16:42 2021
leaf02 swp2 turtle up Mon Jan 11 05:56:30 2021
leaf03 swp2 up Mon Jan 11 05:56:37 2021
leaf04 swp2 up Mon Jan 11 05:56:43 2021
spine01 swp2 up Mon Jan 11 05:56:54 2021
spine02 swp2 up Mon Jan 11 05:56:35 2021
spine03 swp2 up Mon Jan 11 05:56:35 2021
spine04 swp2 up Mon Jan 11 05:56:35 2021
Check for MTU Inconsistencies
The maximum transmission unit (MTU) determines the largest size packet or frame that can be transmitted across a given communication link. When the MTU is not configured to the same value on both ends of the link, communication problems can occur. Use the netq check mtu command to verify that the MTU is correctly specified for each link.
check mtu
The following example shows that four switches have inconsistently specified link MTUs. The network administrator or operator can reconfigure the switches and eliminate the communication issues associated with this misconfiguration.
cumulus@switch:~$ netq check mtu
Checked Nodes: 15, Checked Links: 215, Failed Nodes: 4, Failed Links: 7
MTU mismatch found on following links
Hostname Interface MTU Peer Peer Interface Peer MTU Error
----------------- ------------------------- ------ ----------------- ------------------------- -------- ---------------
spine01 swp30 9216 exit01 swp51 1500 MTU Mismatch
exit01 swp51 1500 spine01 swp30 9216 MTU Mismatch
spine01 swp29 9216 exit02 swp51 1500 MTU Mismatch
exit02 - - - - - Rotten Agent
exit01 swp52 1500 spine02 swp30 9216 MTU Mismatch
spine02 swp30 9216 exit01 swp52 1500 MTU Mismatch
spine02 swp29 9216 exit02 swp52 1500 MTU Mismatch
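The comparison behind an MTU mismatch is simple: a link fails when its two ends report different MTU values. The following Python sketch illustrates that idea only and does not represent how NetQ implements the check. The tuple layout and the `mtu_mismatches` helper are assumptions; the first two links come from the example output above, and the third is a hypothetical matching link added for contrast.

```python
# Each record: (hostname, interface, MTU, peer hostname, peer interface, peer MTU).
links = [
    ("spine01", "swp30", 9216, "exit01", "swp51", 1500),
    ("spine02", "swp30", 9216, "exit01", "swp52", 1500),
    ("spine01", "swp1", 9216, "leaf01", "swp51", 9216),  # hypothetical match
]


def mtu_mismatches(records):
    """Return the link records whose two ends report different MTUs."""
    return [rec for rec in records if rec[2] != rec[5]]


mismatched = mtu_mismatches(links)
for host, iface, mtu, peer, peer_iface, peer_mtu in mismatched:
    print(f"{host}:{iface} ({mtu}) != {peer}:{peer_iface} ({peer_mtu})")
```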
IP Addresses
Use the CLI to monitor IP (Internet Protocol) addresses, neighbors, and routes.
This information can help you:
Determine the IP neighbors for each switch.
Calculate the total number of IPv4 and IPv6 addresses and their corresponding interfaces.
Identify which routes are owned by which switches.
Pinpoint when changes occurred to an IP configuration.
Run netq show ip to display address, neighbor, and route information for your devices:
netq <hostname> show ip addresses [<remote-interface>] [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [around <text-time>] [count] [json]
netq [<hostname>] show ip addresses [<remote-interface>] [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [around <text-time>] [json]
netq show ip addresses [<remote-interface>] [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [subnet|supernet|gateway] [around <text-time>] [json]
netq <hostname> show ip neighbors [<remote-interface>] [<ipv4>|<ipv4> vrf <vrf>|vrf <vrf>] [<mac>] [around <text-time>] [json]
netq [<hostname>] show ip neighbors [<remote-interface>] [<ipv4>|<ipv4> vrf <vrf>|vrf <vrf>] [<mac>] [around <text-time>] [count] [json]
netq <hostname> show ip routes [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [origin] [around <text-time>] [count] [json]
netq [<hostname>] show ip routes [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [origin] [around <text-time>] [json]
netq <hostname> show ipv6 addresses [<remote-interface>] [<ipv6>|<ipv6/prefixlen>] [vrf <vrf>] [around <text-time>] [count] [json]
netq [<hostname>] show ipv6 addresses [<remote-interface>] [<ipv6>|<ipv6/prefixlen>] [vrf <vrf>] [around <text-time>] [json]
netq show ipv6 addresses [<remote-interface>] [<ipv6>|<ipv6/prefixlen>] [vrf <vrf>] [subnet|supernet|gateway] [around <text-time>] [json]
netq <hostname> show ipv6 neighbors [<remote-interface>] [<ipv6>|<ipv6> vrf <vrf>|vrf <vrf>] [<mac>] [around <text-time>] [count] [json]
netq [<hostname>] show ipv6 neighbors [<remote-interface>] [<ipv6>|<ipv6> vrf <vrf>|vrf <vrf>] [<mac>] [around <text-time>] [json]
netq <hostname> show ipv6 routes [<ipv6>|<ipv6/prefixlen>] [vrf <vrf>] [origin] [around <text-time>] [count] [json]
netq [<hostname>] show ipv6 routes [<ipv6>|<ipv6/prefixlen>] [vrf <vrf>] [origin] [around <text-time>] [json]
View IP Address Information
You can view IPv4 and IPv6 address information for all devices, including the interface and VRF for each device.
View IPv4 Address Information for All Devices
To view only IPv4 addresses, run netq show ip addresses.
show ip addresses
The following example shows all IPv4 addresses in the reference topology:
cumulus@switch:~$ netq show ip addresses
Matching address records:
Address Hostname Interface VRF Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
10.10.10.104/32 spine04 lo default Mon Oct 19 22:28:23 2020
192.168.200.24/24 spine04 eth0 Tue Oct 20 15:46:20 2020
10.10.10.103/32 spine03 lo default Mon Oct 19 22:29:01 2020
192.168.200.23/24 spine03 eth0 Tue Oct 20 15:19:24 2020
192.168.200.22/24 spine02 eth0 Tue Oct 20 15:40:03 2020
10.10.10.102/32 spine02 lo default Mon Oct 19 22:28:45 2020
192.168.200.21/24 spine01 eth0 Tue Oct 20 15:59:36 2020
10.10.10.101/32 spine01 lo default Mon Oct 19 22:28:48 2020
192.168.200.38/24 server08 eth0 default Mon Oct 19 22:28:50 2020
192.168.200.37/24 server07 eth0 default Mon Oct 19 22:28:43 2020
192.168.200.36/24 server06 eth0 default Mon Oct 19 22:40:52 2020
10.1.20.105/24 server05 uplink default Mon Oct 19 22:41:08 2020
10.1.10.104/24 server04 uplink default Mon Oct 19 22:40:45 2020
192.168.200.33/24 server03 eth0 default Mon Oct 19 22:41:04 2020
192.168.200.32/24 server02 eth0 default Mon Oct 19 22:41:00 2020
10.1.10.101/24 server01 uplink default Mon Oct 19 22:40:36 2020
10.255.1.228/24 oob-mgmt-server vagrant default Mon Oct 19 22:28:20 2020
192.168.200.1/24 oob-mgmt-server eth1 default Mon Oct 19 22:28:20 2020
10.1.20.3/24 leaf04 vlan20 RED Mon Oct 19 22:28:47 2020
10.1.10.1/24 leaf04 vlan10-v0 RED Mon Oct 19 22:28:47 2020
192.168.200.14/24 leaf04 eth0 Tue Oct 20 15:56:40 2020
10.10.10.4/32 leaf04 lo default Mon Oct 19 22:28:47 2020
10.1.20.1/24 leaf04 vlan20-v0 RED Mon Oct 19 22:28:47 2020
...
View IPv6 Address Information for All Devices
To view only IPv6 addresses, run netq show ipv6 addresses.
show ipv6 addresses
The following example shows all IPv6 addresses in the reference topology:
cumulus@switch:~$ netq show ipv6 addresses
Matching address records:
Address Hostname Interface VRF Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
fe80::4638:39ff:fe00:16c/ spine04 eth0 Mon Oct 19 22:28:23 2020
64
fe80::4638:39ff:fe00:27/6 spine04 swp5 default Mon Oct 19 22:28:23 2020
4
fe80::4638:39ff:fe00:2f/6 spine04 swp6 default Mon Oct 19 22:28:23 2020
4
fe80::4638:39ff:fe00:17/6 spine04 swp3 default Mon Oct 19 22:28:23 2020
4
fe80::4638:39ff:fe00:1f/6 spine04 swp4 default Mon Oct 19 22:28:23 2020
4
fe80::4638:39ff:fe00:7/64 spine04 swp1 default Mon Oct 19 22:28:23 2020
fe80::4638:39ff:fe00:f/64 spine04 swp2 default Mon Oct 19 22:28:23 2020
fe80::4638:39ff:fe00:2d/6 spine03 swp6 default Mon Oct 19 22:29:01 2020
4
fe80::4638:39ff:fe00:25/6 spine03 swp5 default Mon Oct 19 22:29:01 2020
4
fe80::4638:39ff:fe00:170/ spine03 eth0 Mon Oct 19 22:29:01 2020
64
fe80::4638:39ff:fe00:15/6 spine03 swp3 default Mon Oct 19 22:29:01 2020
4
...
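When you have address data saved from both commands, you can total the IPv4 and IPv6 entries with the Python standard library. The following is a minimal sketch; the sample addresses are copied by hand from the two example outputs above, and `count_by_version` is a hypothetical helper name.

```python
import ipaddress
from collections import Counter

# Addresses copied by hand from the IPv4 and IPv6 example outputs.
addresses = [
    "10.10.10.104/32",
    "192.168.200.24/24",
    "fe80::4638:39ff:fe00:16c/64",
    "fe80::4638:39ff:fe00:7/64",
]


def count_by_version(values):
    """Return a Counter keyed by IP version (4 or 6)."""
    return Counter(ipaddress.ip_interface(value).version for value in values)


totals = count_by_version(addresses)
print("IPv4:", totals[4], "IPv6:", totals[6])
```

`ipaddress.ip_interface` accepts the address/prefix-length form shown in the Address column and raises `ValueError` on malformed input, which also makes it a quick sanity check for copied values.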
Filter IP Address Information
You can filter IP address information by hostname, interface, or VRF.
show ip addresses eth0
The following example shows the IPv4 address information for the eth0 interface on all devices:
cumulus@switch:~$ netq show ip addresses eth0
Matching address records:
Address Hostname Interface VRF Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
192.168.200.24/24 spine04 eth0 Tue Oct 20 15:46:20 2020
192.168.200.23/24 spine03 eth0 Tue Oct 20 15:19:24 2020
192.168.200.22/24 spine02 eth0 Tue Oct 20 15:40:03 2020
192.168.200.21/24 spine01 eth0 Tue Oct 20 15:59:36 2020
192.168.200.38/24 server08 eth0 default Mon Oct 19 22:28:50 2020
192.168.200.37/24 server07 eth0 default Mon Oct 19 22:28:43 2020
192.168.200.36/24 server06 eth0 default Mon Oct 19 22:40:52 2020
192.168.200.35/24 server05 eth0 default Mon Oct 19 22:41:08 2020
192.168.200.34/24 server04 eth0 default Mon Oct 19 22:40:45 2020
192.168.200.33/24 server03 eth0 default Mon Oct 19 22:41:04 2020
192.168.200.32/24 server02 eth0 default Mon Oct 19 22:41:00 2020
192.168.200.31/24 server01 eth0 default Mon Oct 19 22:40:36 2020
192.168.200.14/24 leaf04 eth0 Tue Oct 20 15:56:40 2020
192.168.200.13/24 leaf03 eth0 Tue Oct 20 15:40:56 2020
192.168.200.12/24 leaf02 eth0 Tue Oct 20 15:43:24 2020
192.168.200.11/24 leaf01 eth0 Tue Oct 20 16:12:00 2020
192.168.200.62/24 fw2 eth0 Tue Oct 20 15:31:29 2020
192.168.200.61/24 fw1 eth0 Tue Oct 20 15:56:03 2020
192.168.200.64/24 border02 eth0 Tue Oct 20 15:20:23 2020
192.168.200.63/24 border01 eth0 Tue Oct 20 15:46:57 2020
leaf01 show ipv6 addresses
The following example shows the IPv6 address information for the leaf01 switch:
cumulus@switch:~$ netq leaf01 show ipv6 addresses
Matching address records:
Address Hostname Interface VRF Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
fe80::4638:39ff:febe:efaa leaf01 vlan4002 BLUE Mon Oct 19 22:28:22 2020
/64
fe80::4638:39ff:fe00:8/64 leaf01 swp54 default Mon Oct 19 22:28:22 2020
fe80::4638:39ff:fe00:59/6 leaf01 vlan10 RED Mon Oct 19 22:28:22 2020
4
fe80::4638:39ff:fe00:59/6 leaf01 vlan20 RED Mon Oct 19 22:28:22 2020
4
fe80::4638:39ff:fe00:59/6 leaf01 vlan30 BLUE Mon Oct 19 22:28:22 2020
4
fe80::4638:39ff:fe00:2/64 leaf01 swp51 default Mon Oct 19 22:28:22 2020
fe80::4638:39ff:fe00:4/64 leaf01 swp52 default Mon Oct 19 22:28:22 2020
fe80::4638:39ff:febe:efaa leaf01 vlan4001 RED Mon Oct 19 22:28:22 2020
/64
fe80::4638:39ff:fe00:6/64 leaf01 swp53 default Mon Oct 19 22:28:22 2020
fe80::200:ff:fe00:1c/64 leaf01 vlan30-v0 BLUE Mon Oct 19 22:28:22 2020
fe80::200:ff:fe00:1b/64 leaf01 vlan20-v0 RED Mon Oct 19 22:28:22 2020
fe80::200:ff:fe00:1a/64 leaf01 vlan10-v0 RED Mon Oct 19 22:28:22 2020
fe80::4638:39ff:fe00:59/6 leaf01 peerlink.4094 default Mon Oct 19 22:28:22 2020
4
fe80::4638:39ff:fe00:59/6 leaf01 bridge default Mon Oct 19 22:28:22 2020
4
fe80::4638:39ff:fe00:17a/ leaf01 eth0 Mon Oct 19 22:28:22 2020
64
Obtain a Count of IP Addresses Used on a Device
Use the count option to view the number of IP addresses on a device.
show ip addresses count
The following example shows the number of IPv4 and IPv6 addresses on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show ip addresses count
Count of matching address records: 9
cumulus@switch:~$ netq leaf01 show ipv6 addresses count
Count of matching address records: 17
View IP Neighbor Information
You can view the IPv4 and IPv6 neighbor information for all devices, including the interface port, MAC address, VRF assignment, and whether it learns the MAC address from the peer (remote=yes).
View IP Neighbor Information for All Devices
To view neighbor information for all devices running IPv4 or IPv6, run netq show ip neighbors or netq show ipv6 neighbors.
show ip neighbors
The following example shows all neighbors for devices running IPv4:
cumulus@switch:~$ netq show ip neighbors
Matching neighbor records:
IP Address Hostname Interface MAC Address VRF Remote Last Changed
------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
169.254.0.1 spine04 swp1 44:38:39:00:00:08 default no Mon Oct 19 22:28:23 2020
169.254.0.1 spine04 swp6 44:38:39:00:00:30 default no Mon Oct 19 22:28:23 2020
169.254.0.1 spine04 swp5 44:38:39:00:00:28 default no Mon Oct 19 22:28:23 2020
192.168.200.1 spine04 eth0 44:38:39:00:00:6d no Tue Oct 20 17:39:25 2020
169.254.0.1 spine04 swp4 44:38:39:00:00:20 default no Mon Oct 19 22:28:23 2020
169.254.0.1 spine04 swp3 44:38:39:00:00:18 default no Mon Oct 19 22:28:23 2020
169.254.0.1 spine04 swp2 44:38:39:00:00:10 default no Mon Oct 19 22:28:23 2020
192.168.200.24 spine04 mgmt c6:b3:15:1d:84:c4 no Mon Oct 19 22:28:23 2020
192.168.200.250 spine04 eth0 44:38:39:00:01:80 no Mon Oct 19 22:28:23 2020
169.254.0.1 spine03 swp1 44:38:39:00:00:06 default no Mon Oct 19 22:29:01 2020
169.254.0.1 spine03 swp6 44:38:39:00:00:2e default no Mon Oct 19 22:29:01 2020
169.254.0.1 spine03 swp5 44:38:39:00:00:26 default no Mon Oct 19 22:29:01 2020
192.168.200.1 spine03 eth0 44:38:39:00:00:6d no Tue Oct 20 17:25:19 2020
169.254.0.1 spine03 swp4 44:38:39:00:00:1e default no Mon Oct 19 22:29:01 2020
169.254.0.1 spine03 swp3 44:38:39:00:00:16 default no Mon Oct 19 22:29:01 2020
169.254.0.1 spine03 swp2 44:38:39:00:00:0e default no Mon Oct 19 22:29:01 2020
192.168.200.250 spine03 eth0 44:38:39:00:01:80 no Mon Oct 19 22:29:01 2020
169.254.0.1 spine02 swp1 44:38:39:00:00:04 default no Mon Oct 19 22:28:46 2020
169.254.0.1 spine02 swp6 44:38:39:00:00:2c default no Mon Oct 19 22:28:46 2020
169.254.0.1 spine02 swp5 44:38:39:00:00:24 default no Mon Oct 19 22:28:46 2020
...
Filter IP Neighbor Information
You can filter the list of IP neighbors to show only neighbors for a particular device, interface, address, or VRF assignment.
leaf02 show ipv6 neighbors
The following example shows IPv6 neighbors for the leaf02 switch:
cumulus@switch:~$ netq leaf02 show ipv6 neighbors
Matching neighbor records:
IP Address Hostname Interface MAC Address VRF Remote Last Changed
------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
ff02::16 leaf02 eth0 33:33:00:00:00:16 no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:32 leaf02 vlan10-v0 44:38:39:00:00:32 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:febe:efaa leaf02 vlan4001 44:38:39:be:ef:aa RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:3a leaf02 vlan20-v0 44:38:39:00:00:34 RED no Mon Oct 19 22:28:30 2020
ff02::1 leaf02 mgmt 33:33:00:00:00:01 no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:3c leaf02 vlan30 44:38:39:00:00:36 BLUE no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:59 leaf02 peerlink.4094 44:38:39:00:00:59 default no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:59 leaf02 vlan20 44:38:39:00:00:59 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:42 leaf02 vlan30-v0 44:38:39:00:00:42 BLUE no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:9 leaf02 swp51 44:38:39:00:00:09 default no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:44 leaf02 vlan10 44:38:39:00:00:3e RED yes Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:3c leaf02 vlan30-v0 44:38:39:00:00:36 BLUE no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:32 leaf02 vlan10 44:38:39:00:00:32 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:59 leaf02 vlan30 44:38:39:00:00:59 BLUE no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:190 leaf02 eth0 44:38:39:00:01:90 no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:40 leaf02 vlan20-v0 44:38:39:00:00:40 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:44 leaf02 vlan10-v0 44:38:39:00:00:3e RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:3a leaf02 vlan20 44:38:39:00:00:34 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:180 leaf02 eth0 44:38:39:00:01:80 no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:40 leaf02 vlan20 44:38:39:00:00:40 RED yes Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:f leaf02 swp54 44:38:39:00:00:0f default no Mon Oct 19 22:28:30 2020
show ip neighbors vrf RED
The following example shows all IPv4 neighbors using the RED VRF. Note that the VRF name is case sensitive.
cumulus@switch:~$ netq show ip neighbors vrf RED
Matching neighbor records:
IP Address Hostname Interface MAC Address VRF Remote Last Changed
------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
10.1.10.2 leaf04 vlan10 44:38:39:00:00:5d RED no Mon Oct 19 22:28:47 2020
10.1.20.2 leaf04 vlan20 44:38:39:00:00:5d RED no Mon Oct 19 22:28:47 2020
10.1.10.3 leaf03 vlan10 44:38:39:00:00:5e RED no Mon Oct 19 22:28:18 2020
10.1.20.3 leaf03 vlan20 44:38:39:00:00:5e RED no Mon Oct 19 22:28:18 2020
10.1.10.2 leaf02 vlan10 44:38:39:00:00:59 RED no Mon Oct 19 22:28:30 2020
10.1.20.2 leaf02 vlan20 44:38:39:00:00:59 RED no Mon Oct 19 22:28:30 2020
10.1.10.3 leaf01 vlan10 44:38:39:00:00:37 RED no Mon Oct 19 22:28:22 2020
10.1.20.3 leaf01 vlan20 44:38:39:00:00:37 RED no Mon Oct 19 22:28:22 2020
show ipv6 neighbors vlan10
The following example shows all IPv6 neighbors using the vlan10 interface:
cumulus@netq-ts:~$ netq show ipv6 neighbors vlan10
Matching neighbor records:
IP Address Hostname Interface MAC Address VRF Remote Last Changed
------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
fe80::4638:39ff:fe00:44 leaf04 vlan10 44:38:39:00:00:3e RED no Mon Oct 19 22:28:47 2020
fe80::4638:39ff:fe00:5d leaf04 vlan10 44:38:39:00:00:5d RED no Mon Oct 19 22:28:47 2020
fe80::4638:39ff:fe00:32 leaf04 vlan10 44:38:39:00:00:32 RED yes Mon Oct 19 22:28:47 2020
fe80::4638:39ff:fe00:44 leaf03 vlan10 44:38:39:00:00:3e RED no Mon Oct 19 22:28:18 2020
fe80::4638:39ff:fe00:5e leaf03 vlan10 44:38:39:00:00:5e RED no Mon Oct 19 22:28:18 2020
fe80::4638:39ff:fe00:32 leaf03 vlan10 44:38:39:00:00:32 RED yes Mon Oct 19 22:28:18 2020
fe80::4638:39ff:fe00:44 leaf02 vlan10 44:38:39:00:00:3e RED yes Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:32 leaf02 vlan10 44:38:39:00:00:32 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:59 leaf02 vlan10 44:38:39:00:00:59 RED no Mon Oct 19 22:28:30 2020
fe80::4638:39ff:fe00:44 leaf01 vlan10 44:38:39:00:00:3e RED yes Mon Oct 19 22:28:22 2020
fe80::4638:39ff:fe00:32 leaf01 vlan10 44:38:39:00:00:32 RED no Mon Oct 19 22:28:22 2020
fe80::4638:39ff:fe00:37 leaf01 vlan10 44:38:39:00:00:37 RED no Mon Oct 19 22:28:22 2020
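The Remote column indicates whether the entry was learned from the peer (remote=yes). As a hedged illustration, the following Python sketch tallies remote entries per device from whitespace-separated rows copied from the text output above. The helper name remote_neighbors_by_host is hypothetical and is not part of NetQ.

```python
from collections import Counter


def remote_neighbors_by_host(lines):
    """Count neighbor entries marked Remote=yes for each hostname."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 10:
            continue  # not a data row (separator, title, or blank line)
        hostname = tokens[1]
        # Count from the right to avoid the optional VRF column: the
        # timestamp takes the last five tokens and Remote precedes it.
        if tokens[-6] == "yes":
            counts[hostname] += 1
    return counts


# Rows copied from the vlan10 example above.
sample = [
    "fe80::4638:39ff:fe00:44 leaf04 vlan10 44:38:39:00:00:3e RED no Mon Oct 19 22:28:47 2020",
    "fe80::4638:39ff:fe00:5d leaf04 vlan10 44:38:39:00:00:5d RED no Mon Oct 19 22:28:47 2020",
    "fe80::4638:39ff:fe00:32 leaf04 vlan10 44:38:39:00:00:32 RED yes Mon Oct 19 22:28:47 2020",
    "fe80::4638:39ff:fe00:44 leaf03 vlan10 44:38:39:00:00:3e RED no Mon Oct 19 22:28:18 2020",
    "fe80::4638:39ff:fe00:5e leaf03 vlan10 44:38:39:00:00:5e RED no Mon Oct 19 22:28:18 2020",
    "fe80::4638:39ff:fe00:32 leaf03 vlan10 44:38:39:00:00:32 RED yes Mon Oct 19 22:28:18 2020",
]
for host, count in sorted(remote_neighbors_by_host(sample).items()):
    print(host, count)
```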
View IP Routes Information
You can view the IPv4 and IPv6 routes for all devices, including the IP address (with or without mask), the destination (by hostname) of the route, next hops available, VRF assignment, and whether a host is the owner of the route or MAC address.
View IP Routes for All Devices
To view all IP routes, run netq show ip routes.
show ip routes
The following example shows the IPv4 routes for all devices in the network:
cumulus@switch:~$ netq show ip routes
Matching routes records:
Origin VRF Prefix Hostname Nexthops Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
no default 10.0.1.2/32 spine04 169.254.0.1: swp3, Mon Oct 19 22:28:23 2020
169.254.0.1: swp4
no default 10.10.10.4/32 spine04 169.254.0.1: swp3, Mon Oct 19 22:28:23 2020
169.254.0.1: swp4
no default 10.10.10.3/32 spine04 169.254.0.1: swp3, Mon Oct 19 22:28:23 2020
169.254.0.1: swp4
no default 10.10.10.2/32 spine04 169.254.0.1: swp1, Mon Oct 19 22:28:23 2020
169.254.0.1: swp2
no default 10.10.10.1/32 spine04 169.254.0.1: swp1, Mon Oct 19 22:28:23 2020
169.254.0.1: swp2
yes 192.168.200.0/24 spine04 eth0 Mon Oct 19 22:28:23 2020
yes 192.168.200.24/32 spine04 eth0 Mon Oct 19 22:28:23 2020
no default 10.0.1.1/32 spine04 169.254.0.1: swp1, Mon Oct 19 22:28:23 2020
169.254.0.1: swp2
yes default 10.10.10.104/32 spine04 lo Mon Oct 19 22:28:23 2020
no 0.0.0.0/0 spine04 Blackhole Mon Oct 19 22:28:23 2020
no default 10.10.10.64/32 spine04 169.254.0.1: swp5, Mon Oct 19 22:28:23 2020
169.254.0.1: swp6
no default 10.10.10.63/32 spine04 169.254.0.1: swp5, Mon Oct 19 22:28:23 2020
169.254.0.1: swp6
...
show ipv6 routes
The following example shows the IPv6 routes for all devices in the network:
cumulus@switch:~$ netq show ipv6 routes
Matching routes records:
Origin VRF Prefix Hostname Nexthops Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
no ::/0 spine04 Blackhole Mon Oct 19 22:28:23 2020
no ::/0 spine03 Blackhole Mon Oct 19 22:29:01 2020
no ::/0 spine02 Blackhole Mon Oct 19 22:28:46 2020
no ::/0 spine01 Blackhole Mon Oct 19 22:28:48 2020
no RED ::/0 leaf04 Blackhole Mon Oct 19 22:28:47 2020
no ::/0 leaf04 Blackhole Mon Oct 19 22:28:47 2020
no BLUE ::/0 leaf04 Blackhole Mon Oct 19 22:28:47 2020
no RED ::/0 leaf03 Blackhole Mon Oct 19 22:28:18 2020
no ::/0 leaf03 Blackhole Mon Oct 19 22:28:18 2020
no BLUE ::/0 leaf03 Blackhole Mon Oct 19 22:28:18 2020
no RED ::/0 leaf02 Blackhole Mon Oct 19 22:28:30 2020
no ::/0 leaf02 Blackhole Mon Oct 19 22:28:30 2020
no BLUE ::/0 leaf02 Blackhole Mon Oct 19 22:28:30 2020
no RED ::/0 leaf01 Blackhole Mon Oct 19 22:28:22 2020
no ::/0 leaf01 Blackhole Mon Oct 19 22:28:22 2020
no BLUE ::/0 leaf01 Blackhole Mon Oct 19 22:28:22 2020
no ::/0 fw2 Blackhole Mon Oct 19 22:28:22 2020
no ::/0 fw1 Blackhole Mon Oct 19 22:28:10 2020
no RED ::/0 border02 Blackhole Mon Oct 19 22:28:38 2020
no ::/0 border02 Blackhole Mon Oct 19 22:28:38 2020
no BLUE ::/0 border02 Blackhole Mon Oct 19 22:28:38 2020
no RED ::/0 border01 Blackhole Mon Oct 19 22:28:34 2020
no ::/0 border01 Blackhole Mon Oct 19 22:28:34 2020
no BLUE ::/0 border01 Blackhole Mon Oct 19 22:28:34 2020
Filter IP Route Information
You can filter the IP route information for a particular device, interface address, VRF assignment, or route origination.
show ip routes 10.0.0.12
The following example shows the routes on each device that match the IP address 10.0.0.12:
cumulus@switch:~$ netq show ip routes 10.0.0.12
Matching routes records:
Origin VRF Prefix Hostname Nexthops Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
no 0.0.0.0/0 spine04 Blackhole Mon Oct 19 22:28:23 2020
no 0.0.0.0/0 spine03 Blackhole Mon Oct 19 22:29:01 2020
no 0.0.0.0/0 spine02 Blackhole Mon Oct 19 22:28:46 2020
no 0.0.0.0/0 spine01 Blackhole Mon Oct 19 22:28:48 2020
no default 0.0.0.0/0 server08 192.168.200.1: eth0 Mon Oct 19 22:28:50 2020
no default 0.0.0.0/0 server07 192.168.200.1: eth0 Mon Oct 19 22:28:43 2020
no default 10.0.0.0/8 server06 10.1.30.1: uplink Mon Oct 19 22:40:52 2020
no default 10.0.0.0/8 server05 10.1.20.1: uplink Mon Oct 19 22:41:08 2020
no default 10.0.0.0/8 server04 10.1.10.1: uplink Mon Oct 19 22:40:45 2020
no default 10.0.0.0/8 server03 10.1.30.1: uplink Mon Oct 19 22:41:04 2020
no default 10.0.0.0/8 server02 10.1.20.1: uplink Mon Oct 19 22:41:00 2020
no default 10.0.0.0/8 server01 10.1.10.1: uplink Mon Oct 19 22:40:36 2020
no default 0.0.0.0/0 oob-mgmt-server 10.255.1.1: vagrant Mon Oct 19 22:28:20 2020
no BLUE 0.0.0.0/0 leaf04 Blackhole Mon Oct 19 22:28:47 2020
no 0.0.0.0/0 leaf04 Blackhole Mon Oct 19 22:28:47 2020
no RED 0.0.0.0/0 leaf04 Blackhole Mon Oct 19 22:28:47 2020
no BLUE 0.0.0.0/0 leaf03 Blackhole Mon Oct 19 22:28:18 2020
no 0.0.0.0/0 leaf03 Blackhole Mon Oct 19 22:28:18 2020
no RED 0.0.0.0/0 leaf03 Blackhole Mon Oct 19 22:28:18 2020
no BLUE 0.0.0.0/0 leaf02 Blackhole Mon Oct 19 22:28:30 2020
no 0.0.0.0/0 leaf02 Blackhole Mon Oct 19 22:28:30 2020
...
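The output lists routes whose prefix contains 10.0.0.12, which is why both 0.0.0.0/0 and 10.0.0.0/8 appear. The following Python sketch illustrates that containment test with the standard ipaddress module, using prefixes taken from the routing tables shown in this section. It illustrates prefix matching in general and is not a description of how NetQ implements the filter; the function name matching_prefixes is hypothetical.

```python
import ipaddress


def matching_prefixes(address, prefixes):
    """Return the prefixes that contain address, most specific first."""
    addr = ipaddress.ip_address(address)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in prefixes
        if addr in ipaddress.ip_network(prefix)
    ]
    # A longer prefix length means a more specific route.
    return sorted(matches, key=lambda net: net.prefixlen, reverse=True)


prefixes = ["0.0.0.0/0", "10.0.0.0/8", "10.10.10.4/32", "192.168.200.0/24"]
for net in matching_prefixes("10.0.0.12", prefixes):
    print(net)
```

With these inputs, the sketch prints 10.0.0.0/8 followed by 0.0.0.0/0; the /32 and 192.168.200.0/24 prefixes do not contain the address and are omitted.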
spine01 show ip routes origin
The following example shows all IPv4 routes owned by the spine01 switch:
cumulus@switch:~$ netq spine01 show ip routes origin
Matching routes records:
Origin VRF Prefix Hostname Nexthops Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
yes 192.168.200.0/24 spine01 eth0 Mon Oct 19 22:28:48 2020
yes 192.168.200.21/32 spine01 eth0 Mon Oct 19 22:28:48 2020
yes default 10.10.10.101/32 spine01 lo Mon Oct 19 22:28:48 2020
View IP Routes for a Given Device at a Prior Time
As with most NetQ CLI commands, you can view IP routes as they were at a time in the past.
spine01 show ip routes around 24h
The following example shows the IPv4 routes for the spine01 switch approximately 24 hours ago:
cumulus@switch:~$ netq spine01 show ip routes around 24h
Matching routes records:
Origin VRF Prefix Hostname Nexthops Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
no default 10.0.1.2/32 spine01 169.254.0.1: swp3, Sun Oct 18 22:28:41 2020
169.254.0.1: swp4
no default 10.10.10.4/32 spine01 169.254.0.1: swp3, Sun Oct 18 22:28:41 2020
169.254.0.1: swp4
no default 10.10.10.3/32 spine01 169.254.0.1: swp3, Sun Oct 18 22:28:41 2020
169.254.0.1: swp4
no default 10.10.10.2/32 spine01 169.254.0.1: swp1, Sun Oct 18 22:28:41 2020
169.254.0.1: swp2
no default 10.10.10.1/32 spine01 169.254.0.1: swp1, Sun Oct 18 22:28:41 2020
169.254.0.1: swp2
yes 192.168.200.0/24 spine01 eth0 Sun Oct 18 22:28:41 2020
yes 192.168.200.21/32 spine01 eth0 Sun Oct 18 22:28:41 2020
no default 10.0.1.1/32 spine01 169.254.0.1: swp1, Sun Oct 18 22:28:41 2020
169.254.0.1: swp2
yes default 10.10.10.101/32 spine01 lo Sun Oct 18 22:28:41 2020
no 0.0.0.0/0 spine01 Blackhole Sun Oct 18 22:28:41 2020
no default 10.10.10.64/32 spine01 169.254.0.1: swp5, Sun Oct 18 22:28:41 2020
169.254.0.1: swp6
no default 10.10.10.63/32 spine01 169.254.0.1: swp5, Sun Oct 18 22:28:41 2020
169.254.0.1: swp6
no default 10.0.1.254/32 spine01 169.254.0.1: swp5, Sun Oct 18 22:28:41 2020
169.254.0.1: swp6
View the Number of IP Routes
You can view the total number of IP routes on all devices or on a particular device.
leaf01 show ip routes count
This example shows the total number of IPv4 and IPv6 routes on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show ip routes count
Count of matching routes records: 27
cumulus@switch:~$ netq leaf01 show ipv6 routes count
Count of matching routes records: 3
View the History of an IP Address
The netq show address-history command displays when an IP address configuration changed for an interface. Add options to the command to show:
Changes made between two points in time, using the between option.
Only the difference between two points in time, using the diff option.
The selected output order, using the listby option.
Each change made for the IP address on a particular interface, using the ifname option.
Changes listed chronologically.
The syntax of the command is:
netq [<hostname>] show address-history <text-prefix> [ifname <text-ifname>] [vrf <text-vrf>] [diff] [between <text-time> and <text-endtime>] [listby <text-list-by>] [json]
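If you script this command, the optional keywords must appear in the order given by the syntax. The following Python sketch assembles the argument list from keyword arguments; it uses only the keywords shown in the syntax above. The function name address_history_command and its parameter names are hypothetical and are not part of NetQ.

```python
def address_history_command(prefix, hostname=None, ifname=None, vrf=None,
                            diff=False, between=None, listby=None,
                            as_json=False):
    """Build the argument list for netq show address-history."""
    cmd = ["netq"]
    if hostname:
        cmd.append(hostname)          # optional [<hostname>] scope
    cmd += ["show", "address-history", prefix]
    if ifname:
        cmd += ["ifname", ifname]
    if vrf:
        cmd += ["vrf", vrf]
    if diff:
        cmd.append("diff")
    if between:
        start, end = between          # for example ("2h", "now")
        cmd += ["between", start, "and", end]
    if listby:
        cmd += ["listby", listby]
    if as_json:
        cmd.append("json")
    return cmd


print(" ".join(address_history_command("10.1.10.2/24", between=("2h", "now"))))
print(" ".join(address_history_command("10.1.10.2/24", listby="hostname")))
```

On a system where the NetQ CLI is installed, you could pass the returned list to Python's subprocess.run to execute the command.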
show address-history 10.1.10.2/24
The following example displays a full chronology of changes for an IP address. If a caret (^) notation appeared, it would indicate that there was no change in value from the row above.
show address-history 10.1.10.2/24 listby hostname
The following example displays the history of an IP address by hostname. If a caret (^) notation appeared, it would indicate that there was no change in this value from the row above.
show address-history 10.1.10.2/24 between 2h and now
The following example displays the history of an IP address between now and two hours ago. If a caret (^) notation appeared, it would indicate that there was no change in this value from the row above.
cumulus@switch:~$ netq show address-history 10.1.10.2/24 between 2h and now
Matching addresshistory records:
Last Changed Hostname Ifname Prefix Mask Vrf
------------------------- ----------------- ------------ ------------------------------ -------- ---------------
Tue Sep 29 15:35:21 2020 leaf03 vlan10 10.1.10.2 24 RED
Tue Sep 29 15:35:24 2020 leaf01 vlan10 10.1.10.2 24 RED
Tue Sep 29 17:24:59 2020 leaf03 vlan10 10.1.10.2 24 RED
Tue Sep 29 17:24:59 2020 leaf01 vlan10 10.1.10.2 24 RED
Tue Sep 29 17:25:05 2020 leaf03 vlan10 10.1.10.2 24 RED
Tue Sep 29 17:25:05 2020 leaf01 vlan10 10.1.10.2 24 RED
Tue Sep 29 17:25:07 2020 leaf03 vlan10 10.1.10.2 24 RED
Tue Sep 29 17:25:08 2020 leaf01 vlan10 10.1.10.2 24 RED
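The caret (^) notation means that a value is unchanged from the row above. If you parse history output into rows of fields, a sketch like the following could replace each caret with the value it stands for. The sample rows are hypothetical: they reuse values from the example above, with carets added for illustration only, and the function name expand_carets is not part of NetQ.

```python
def expand_carets(rows):
    """Replace each "^" cell with the value from the same column one row up."""
    expanded = []
    for row in rows:
        if expanded:
            above = expanded[-1]  # already expanded, so chains of carets resolve
            row = [above[i] if cell == "^" else cell
                   for i, cell in enumerate(row)]
        expanded.append(list(row))
    return expanded


# Hypothetical rows: Last Changed, Hostname, Ifname, Prefix, Mask, Vrf
rows = [
    ["Tue Sep 29 15:35:21 2020", "leaf03", "vlan10", "10.1.10.2", "24", "RED"],
    ["Tue Sep 29 17:24:59 2020", "^", "^", "^", "^", "^"],
    ["Tue Sep 29 17:25:05 2020", "^", "^", "^", "^", "^"],
]
for row in expand_carets(rows):
    print(row)
```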
View the Neighbor History for an IP Address
The netq show neighbor-history command displays when the neighbor configuration changed for an IP address.
The syntax of the command is:
netq [<hostname>] show neighbor-history <text-ipaddress> [ifname <text-ifname>] [diff] [between <text-time> and <text-endtime>] [listby <text-list-by>] [json]
show neighbor-history 10.1.10.2
The following example displays a full chronology of changes for an IP address neighbor. If a caret (^) notation appeared, it would indicate that there was no change in this value from the row above.
cumulus@switch:~$ netq show neighbor-history 10.1.10.2
Matching neighborhistory records:
Last Changed Hostname Ifname Vrf Remote Ifindex Mac Address Ipv6 Ip Address
------------------------- ----------------- ------------ --------------- ------ -------------- ------------------ -------- -------------------------
Tue Sep 29 17:25:08 2020 leaf02 vlan10 RED no 24 44:38:39:00:00:59 no 10.1.10.2
Tue Sep 29 17:25:17 2020 leaf04 vlan10 RED no 24 44:38:39:00:00:5d no 10.1.10.2
show neighbor-history 10.1.10.2 listby hostname
The following example displays the history of an IP address neighbor by hostname. If a caret (^) notation appeared, it would indicate that there was no change in this value from the row above.
cumulus@switch:~$ netq show neighbor-history 10.1.10.2 listby hostname
Matching neighborhistory records:
Last Changed Hostname Ifname Vrf Remote Ifindex Mac Address Ipv6 Ip Address
------------------------- ----------------- ------------ --------------- ------ -------------- ------------------ -------- -------------------------
Tue Sep 29 17:25:08 2020 leaf02 vlan10 RED no 24 44:38:39:00:00:59 no 10.1.10.2
Tue Sep 29 17:25:17 2020 leaf04 vlan10 RED no 24 44:38:39:00:00:5d no 10.1.10.2
show neighbor-history 10.1.10.2 between 2h and now
The following example displays the history of an IP address neighbor between now and two hours ago. If a caret (^) notation appeared, it would indicate that there was no change in this value from the row above.
cumulus@switch:~$ netq show neighbor-history 10.1.10.2 between 2h and now
Matching neighborhistory records:
Last Changed Hostname Ifname Vrf Remote Ifindex Mac Address Ipv6 Ip Address
------------------------- ----------------- ------------ --------------- ------ -------------- ------------------ -------- -------------------------
Tue Sep 29 15:35:18 2020 leaf02 vlan10 RED no 24 44:38:39:00:00:59 no 10.1.10.2
Tue Sep 29 15:35:22 2020 leaf04 vlan10 RED no 24 44:38:39:00:00:5d no 10.1.10.2
Tue Sep 29 17:25:00 2020 leaf02 vlan10 RED no 24 44:38:39:00:00:59 no 10.1.10.2
Tue Sep 29 17:25:08 2020 leaf04 vlan10 RED no 24 44:38:39:00:00:5d no 10.1.10.2
Tue Sep 29 17:25:08 2020 leaf02 vlan10 RED no 24 44:38:39:00:00:59 no 10.1.10.2
Tue Sep 29 17:25:14 2020 leaf04 vlan10 RED no 24 44:38:39:00:00:5d no 10.1.10.2
LLDP
Network devices use LLDP to advertise their identity, capabilities, and neighbors on a LAN. You can view this information for one or more devices. You can also view the information at an earlier point in time or view changes that have occurred to the information during a specified time period. For an overview and how to configure LLDP in your network, refer to Link Layer Discovery Protocol.
NetQ enables operators to view the overall health of the LLDP service on a networkwide and a per-session basis, giving greater insight into all aspects of the service. You accomplish this in the NetQ UI through two card workflows, one for the service and one for the session, and in the NetQ CLI with the netq show lldp command.
Monitor the LLDP Service Networkwide
You can monitor LLDP performance across the network with a card or at the command line.
View Service Status Summary
You can view a summary of the LLDP service from the NetQ UI or the NetQ CLI.
Open the small Network Services/All LLDP Sessions card. In this example, the number of devices running the LLDP service is 14 and no alarms are present.
To view LLDP service status, run netq show lldp.
This example shows the Cumulus reference topology, where LLDP runs on all border, firewall, leaf, and spine switches and on all servers, including the out-of-band management server.
cumulus@switch:~$ netq show lldp
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
border01 swp3 fw1 swp1 Mon Oct 26 04:13:29 2020
border01 swp49 border02 swp49 Mon Oct 26 04:13:29 2020
border01 swp51 spine01 swp5 Mon Oct 26 04:13:29 2020
border01 swp52 spine02 swp5 Mon Oct 26 04:13:29 2020
border01 eth0 oob-mgmt-switch swp20 Mon Oct 26 04:13:29 2020
border01 swp53 spine03 swp5 Mon Oct 26 04:13:29 2020
border01 swp50 border02 swp50 Mon Oct 26 04:13:29 2020
border01 swp54 spine04 swp5 Mon Oct 26 04:13:29 2020
border02 swp49 border01 swp49 Mon Oct 26 04:13:11 2020
border02 swp3 fw1 swp2 Mon Oct 26 04:13:11 2020
border02 swp51 spine01 swp6 Mon Oct 26 04:13:11 2020
border02 swp54 spine04 swp6 Mon Oct 26 04:13:11 2020
border02 swp52 spine02 swp6 Mon Oct 26 04:13:11 2020
border02 eth0 oob-mgmt-switch swp21 Mon Oct 26 04:13:11 2020
border02 swp53 spine03 swp6 Mon Oct 26 04:13:11 2020
border02 swp50 border01 swp50 Mon Oct 26 04:13:11 2020
fw1 eth0 oob-mgmt-switch swp18 Mon Oct 26 04:38:03 2020
fw1 swp1 border01 swp3 Mon Oct 26 04:38:03 2020
fw1 swp2 border02 swp3 Mon Oct 26 04:38:03 2020
fw2 eth0 oob-mgmt-switch swp19 Mon Oct 26 04:46:54 2020
leaf01 swp1 server01 mac:44:38:39:00:00:32 Mon Oct 26 04:13:57 2020
leaf01 swp2 server02 mac:44:38:39:00:00:34 Mon Oct 26 04:13:57 2020
leaf01 swp52 spine02 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp49 leaf02 swp49 Mon Oct 26 04:13:57 2020
leaf01 eth0 oob-mgmt-switch swp10 Mon Oct 26 04:13:57 2020
leaf01 swp3 server03 mac:44:38:39:00:00:36 Mon Oct 26 04:13:57 2020
leaf01 swp53 spine03 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp50 leaf02 swp50 Mon Oct 26 04:13:57 2020
leaf01 swp54 spine04 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp51 spine01 swp1 Mon Oct 26 04:13:57 2020
...
View the Distribution of Nodes, Alarms, and Sessions
It is useful to know the number of network nodes running the LLDP protocol over a period of time and the number of established sessions on a given node, as it gives you insight into the amount of traffic associated with and breadth of use of the protocol. Additionally, if there are many alarms, it is worth investigating either the service or particular devices.
Nodes that have a large number of unestablished sessions might be misconfigured or experiencing communication issues. This is visible with the NetQ UI.
To view the distribution, open the medium Network Services/All LLDP Sessions card.
In this example, we see that 13 nodes are running the LLDP protocol, that there are 52 sessions established, and that no LLDP-related alarms have occurred in the last 24 hours. If there were a visual correlation between the alarms and sessions, you could dig a little deeper by expanding the card to view more data.
To view the number of switches running the LLDP service, run:
netq show lldp
Count the switches in the output.
This example shows that two border, two firewall, four leaf, and four spine switches, one out-of-band management switch, and eight host servers are all running the LLDP service, for a total of 21 devices.
cumulus@switch:~$ netq show lldp
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
border01 swp3 fw1 swp1 Mon Oct 26 04:13:29 2020
border01 swp49 border02 swp49 Mon Oct 26 04:13:29 2020
border01 swp51 spine01 swp5 Mon Oct 26 04:13:29 2020
border01 swp52 spine02 swp5 Mon Oct 26 04:13:29 2020
border01 eth0 oob-mgmt-switch swp20 Mon Oct 26 04:13:29 2020
border01 swp53 spine03 swp5 Mon Oct 26 04:13:29 2020
border01 swp50 border02 swp50 Mon Oct 26 04:13:29 2020
border01 swp54 spine04 swp5 Mon Oct 26 04:13:29 2020
border02 swp49 border01 swp49 Mon Oct 26 04:13:11 2020
border02 swp3 fw1 swp2 Mon Oct 26 04:13:11 2020
border02 swp51 spine01 swp6 Mon Oct 26 04:13:11 2020
border02 swp54 spine04 swp6 Mon Oct 26 04:13:11 2020
border02 swp52 spine02 swp6 Mon Oct 26 04:13:11 2020
border02 eth0 oob-mgmt-switch swp21 Mon Oct 26 04:13:11 2020
border02 swp53 spine03 swp6 Mon Oct 26 04:13:11 2020
border02 swp50 border01 swp50 Mon Oct 26 04:13:11 2020
fw1 eth0 oob-mgmt-switch swp18 Mon Oct 26 04:38:03 2020
fw1 swp1 border01 swp3 Mon Oct 26 04:38:03 2020
fw1 swp2 border02 swp3 Mon Oct 26 04:38:03 2020
fw2 eth0 oob-mgmt-switch swp19 Mon Oct 26 04:46:54 2020
leaf01 swp1 server01 mac:44:38:39:00:00:32 Mon Oct 26 04:13:57 2020
leaf01 swp2 server02 mac:44:38:39:00:00:34 Mon Oct 26 04:13:57 2020
leaf01 swp52 spine02 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp49 leaf02 swp49 Mon Oct 26 04:13:57 2020
leaf01 eth0 oob-mgmt-switch swp10 Mon Oct 26 04:13:57 2020
leaf01 swp3 server03 mac:44:38:39:00:00:36 Mon Oct 26 04:13:57 2020
leaf01 swp53 spine03 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp50 leaf02 swp50 Mon Oct 26 04:13:57 2020
leaf01 swp54 spine04 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp51 spine01 swp1 Mon Oct 26 04:13:57 2020
leaf02 swp52 spine02 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp54 spine04 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp2 server02 mac:44:38:39:00:00:3a Mon Oct 26 04:14:57 2020
leaf02 swp3 server03 mac:44:38:39:00:00:3c Mon Oct 26 04:14:57 2020
leaf02 swp53 spine03 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp50 leaf01 swp50 Mon Oct 26 04:14:57 2020
leaf02 swp51 spine01 swp2 Mon Oct 26 04:14:57 2020
leaf02 eth0 oob-mgmt-switch swp11 Mon Oct 26 04:14:57 2020
leaf02 swp49 leaf01 swp49 Mon Oct 26 04:14:57 2020
leaf02 swp1 server01 mac:44:38:39:00:00:38 Mon Oct 26 04:14:57 2020
leaf03 swp2 server05 mac:44:38:39:00:00:40 Mon Oct 26 04:16:09 2020
leaf03 swp49 leaf04 swp49 Mon Oct 26 04:16:09 2020
leaf03 swp51 spine01 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp50 leaf04 swp50 Mon Oct 26 04:16:09 2020
leaf03 swp54 spine04 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp1 server04 mac:44:38:39:00:00:3e Mon Oct 26 04:16:09 2020
leaf03 swp52 spine02 swp3 Mon Oct 26 04:16:09 2020
leaf03 eth0 oob-mgmt-switch swp12 Mon Oct 26 04:16:09 2020
leaf03 swp53 spine03 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp3 server06 mac:44:38:39:00:00:42 Mon Oct 26 04:16:09 2020
leaf04 swp1 server04 mac:44:38:39:00:00:44 Mon Oct 26 04:15:57 2020
leaf04 swp49 leaf03 swp49 Mon Oct 26 04:15:57 2020
leaf04 swp54 spine04 swp4 Mon Oct 26 04:15:57 2020
leaf04 swp52 spine02 swp4 Mon Oct 26 04:15:57 2020
leaf04 swp2 server05 mac:44:38:39:00:00:46 Mon Oct 26 04:15:57 2020
leaf04 swp50 leaf03 swp50 Mon Oct 26 04:15:57 2020
leaf04 swp51 spine01 swp4 Mon Oct 26 04:15:57 2020
leaf04 eth0 oob-mgmt-switch swp13 Mon Oct 26 04:15:57 2020
leaf04 swp3 server06 mac:44:38:39:00:00:48 Mon Oct 26 04:15:57 2020
leaf04 swp53 spine03 swp4 Mon Oct 26 04:15:57 2020
oob-mgmt-server eth1 oob-mgmt-switch swp1 Sun Oct 25 22:46:24 2020
server01 eth0 oob-mgmt-switch swp2 Sun Oct 25 22:51:17 2020
server01 eth1 leaf01 swp1 Sun Oct 25 22:51:17 2020
server01 eth2 leaf02 swp1 Sun Oct 25 22:51:17 2020
server02 eth0 oob-mgmt-switch swp3 Sun Oct 25 22:49:41 2020
server02 eth1 leaf01 swp2 Sun Oct 25 22:49:41 2020
server02 eth2 leaf02 swp2 Sun Oct 25 22:49:41 2020
server03 eth2 leaf02 swp3 Sun Oct 25 22:50:08 2020
server03 eth1 leaf01 swp3 Sun Oct 25 22:50:08 2020
server03 eth0 oob-mgmt-switch swp4 Sun Oct 25 22:50:08 2020
server04 eth0 oob-mgmt-switch swp5 Sun Oct 25 22:50:27 2020
server04 eth1 leaf03 swp1 Sun Oct 25 22:50:27 2020
server04 eth2 leaf04 swp1 Sun Oct 25 22:50:27 2020
server05 eth0 oob-mgmt-switch swp6 Sun Oct 25 22:49:12 2020
server05 eth1 leaf03 swp2 Sun Oct 25 22:49:12 2020
server05 eth2 leaf04 swp2 Sun Oct 25 22:49:12 2020
server06 eth0 oob-mgmt-switch swp7 Sun Oct 25 22:49:22 2020
server06 eth1 leaf03 swp3 Sun Oct 25 22:49:22 2020
server06 eth2 leaf04 swp3 Sun Oct 25 22:49:22 2020
server07 eth0 oob-mgmt-switch swp8 Sun Oct 25 22:29:58 2020
server08 eth0 oob-mgmt-switch swp9 Sun Oct 25 22:34:12 2020
spine01 swp1 leaf01 swp51 Mon Oct 26 04:13:20 2020
spine01 swp3 leaf03 swp51 Mon Oct 26 04:13:20 2020
spine01 swp2 leaf02 swp51 Mon Oct 26 04:13:20 2020
spine01 swp5 border01 swp51 Mon Oct 26 04:13:20 2020
spine01 eth0 oob-mgmt-switch swp14 Mon Oct 26 04:13:20 2020
spine01 swp4 leaf04 swp51 Mon Oct 26 04:13:20 2020
spine01 swp6 border02 swp51 Mon Oct 26 04:13:20 2020
spine02 swp4 leaf04 swp52 Mon Oct 26 04:16:26 2020
spine02 swp3 leaf03 swp52 Mon Oct 26 04:16:26 2020
spine02 swp6 border02 swp52 Mon Oct 26 04:16:26 2020
spine02 eth0 oob-mgmt-switch swp15 Mon Oct 26 04:16:26 2020
spine02 swp5 border01 swp52 Mon Oct 26 04:16:26 2020
spine02 swp2 leaf02 swp52 Mon Oct 26 04:16:26 2020
spine02 swp1 leaf01 swp52 Mon Oct 26 04:16:26 2020
spine03 swp2 leaf02 swp53 Mon Oct 26 04:13:48 2020
spine03 swp6 border02 swp53 Mon Oct 26 04:13:48 2020
spine03 swp1 leaf01 swp53 Mon Oct 26 04:13:48 2020
spine03 swp3 leaf03 swp53 Mon Oct 26 04:13:48 2020
spine03 swp4 leaf04 swp53 Mon Oct 26 04:13:48 2020
spine03 eth0 oob-mgmt-switch swp16 Mon Oct 26 04:13:48 2020
spine03 swp5 border01 swp53 Mon Oct 26 04:13:48 2020
spine04 eth0 oob-mgmt-switch swp17 Mon Oct 26 04:11:23 2020
spine04 swp3 leaf03 swp54 Mon Oct 26 04:11:23 2020
spine04 swp2 leaf02 swp54 Mon Oct 26 04:11:23 2020
spine04 swp4 leaf04 swp54 Mon Oct 26 04:11:23 2020
spine04 swp1 leaf01 swp54 Mon Oct 26 04:11:23 2020
spine04 swp5 border01 swp54 Mon Oct 26 04:11:23 2020
spine04 swp6 border02 swp54 Mon Oct 26 04:11:23 2020
View the Distribution of Missing Neighbors
You can view the number of missing neighbors in any given time period and how that number has changed over time. This is a good indicator of link communication issues.
To view the distribution, open the large Network Services/All LLDP Sessions card and view the bottom chart on the left, Total Sessions with No Nbr.
In this example, we see that 16 of the 52 sessions are consistently missing the neighbor (peer) device over the last 24 hours.
View Devices with the Most LLDP Sessions
You can view the load from LLDP on your switches using the large Network Services/All LLDP Sessions card or the NetQ CLI. This data lets you see which switches are currently handling the most LLDP traffic, validate that this matches what you expect from your network design, and compare it with data from an earlier time to look for differences.
To view switches and hosts with the most LLDP sessions:
Open the large Network Services/All LLDP Sessions card.
Select Switches with Most Sessions.
The table lists nodes running the most LLDP sessions at the top. Scroll down to view those with the fewest sessions.
To compare this data with the same data at a previous time:
Open another large Network Services/All LLDP Sessions card.
Move the new card next to the original card if needed.
Change the time period for the data on the new card by hovering over the card and clicking the time period control.
Select the time period that you want to compare with the current time. You can now see whether there are significant differences between this time period and the previous time period.
In this case, notice that there are fewer nodes running the protocol, but the total number of sessions running has nearly doubled. If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running LLDP than previously, looking for changes in the topology, and so forth.
To determine the devices with the most sessions, run netq show lldp. Then count the sessions on each device.
In this example, border01 and border02 each have eight sessions, fw1 has three sessions and fw2 has one, leaf01 through leaf04 each have 10 sessions, spine01 through spine04 each have seven sessions, server01 through server06 each have three sessions, and server07, server08, and oob-mgmt-server each have one session. Therefore the leaf switches have the most sessions.
cumulus@switch:~$ netq show lldp
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
border01 swp3 fw1 swp1 Mon Oct 26 04:13:29 2020
border01 swp49 border02 swp49 Mon Oct 26 04:13:29 2020
border01 swp51 spine01 swp5 Mon Oct 26 04:13:29 2020
border01 swp52 spine02 swp5 Mon Oct 26 04:13:29 2020
border01 eth0 oob-mgmt-switch swp20 Mon Oct 26 04:13:29 2020
border01 swp53 spine03 swp5 Mon Oct 26 04:13:29 2020
border01 swp50 border02 swp50 Mon Oct 26 04:13:29 2020
border01 swp54 spine04 swp5 Mon Oct 26 04:13:29 2020
border02 swp49 border01 swp49 Mon Oct 26 04:13:11 2020
border02 swp3 fw1 swp2 Mon Oct 26 04:13:11 2020
border02 swp51 spine01 swp6 Mon Oct 26 04:13:11 2020
border02 swp54 spine04 swp6 Mon Oct 26 04:13:11 2020
border02 swp52 spine02 swp6 Mon Oct 26 04:13:11 2020
border02 eth0 oob-mgmt-switch swp21 Mon Oct 26 04:13:11 2020
border02 swp53 spine03 swp6 Mon Oct 26 04:13:11 2020
border02 swp50 border01 swp50 Mon Oct 26 04:13:11 2020
fw1 eth0 oob-mgmt-switch swp18 Mon Oct 26 04:38:03 2020
fw1 swp1 border01 swp3 Mon Oct 26 04:38:03 2020
fw1 swp2 border02 swp3 Mon Oct 26 04:38:03 2020
fw2 eth0 oob-mgmt-switch swp19 Mon Oct 26 04:46:54 2020
leaf01 swp1 server01 mac:44:38:39:00:00:32 Mon Oct 26 04:13:57 2020
leaf01 swp2 server02 mac:44:38:39:00:00:34 Mon Oct 26 04:13:57 2020
leaf01 swp52 spine02 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp49 leaf02 swp49 Mon Oct 26 04:13:57 2020
leaf01 eth0 oob-mgmt-switch swp10 Mon Oct 26 04:13:57 2020
leaf01 swp3 server03 mac:44:38:39:00:00:36 Mon Oct 26 04:13:57 2020
leaf01 swp53 spine03 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp50 leaf02 swp50 Mon Oct 26 04:13:57 2020
leaf01 swp54 spine04 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp51 spine01 swp1 Mon Oct 26 04:13:57 2020
leaf02 swp52 spine02 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp54 spine04 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp2 server02 mac:44:38:39:00:00:3a Mon Oct 26 04:14:57 2020
leaf02 swp3 server03 mac:44:38:39:00:00:3c Mon Oct 26 04:14:57 2020
leaf02 swp53 spine03 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp50 leaf01 swp50 Mon Oct 26 04:14:57 2020
leaf02 swp51 spine01 swp2 Mon Oct 26 04:14:57 2020
leaf02 eth0 oob-mgmt-switch swp11 Mon Oct 26 04:14:57 2020
leaf02 swp49 leaf01 swp49 Mon Oct 26 04:14:57 2020
leaf02 swp1 server01 mac:44:38:39:00:00:38 Mon Oct 26 04:14:57 2020
leaf03 swp2 server05 mac:44:38:39:00:00:40 Mon Oct 26 04:16:09 2020
leaf03 swp49 leaf04 swp49 Mon Oct 26 04:16:09 2020
leaf03 swp51 spine01 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp50 leaf04 swp50 Mon Oct 26 04:16:09 2020
leaf03 swp54 spine04 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp1 server04 mac:44:38:39:00:00:3e Mon Oct 26 04:16:09 2020
leaf03 swp52 spine02 swp3 Mon Oct 26 04:16:09 2020
leaf03 eth0 oob-mgmt-switch swp12 Mon Oct 26 04:16:09 2020
leaf03 swp53 spine03 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp3 server06 mac:44:38:39:00:00:42 Mon Oct 26 04:16:09 2020
leaf04 swp1 server04 mac:44:38:39:00:00:44 Mon Oct 26 04:15:57 2020
leaf04 swp49 leaf03 swp49 Mon Oct 26 04:15:57 2020
leaf04 swp54 spine04 swp4 Mon Oct 26 04:15:57 2020
leaf04 swp52 spine02 swp4 Mon Oct 26 04:15:57 2020
leaf04 swp2 server05 mac:44:38:39:00:00:46 Mon Oct 26 04:15:57 2020
leaf04 swp50 leaf03 swp50 Mon Oct 26 04:15:57 2020
leaf04 swp51 spine01 swp4 Mon Oct 26 04:15:57 2020
leaf04 eth0 oob-mgmt-switch swp13 Mon Oct 26 04:15:57 2020
leaf04 swp3 server06 mac:44:38:39:00:00:48 Mon Oct 26 04:15:57 2020
leaf04 swp53 spine03 swp4 Mon Oct 26 04:15:57 2020
oob-mgmt-server eth1 oob-mgmt-switch swp1 Sun Oct 25 22:46:24 2020
server01 eth0 oob-mgmt-switch swp2 Sun Oct 25 22:51:17 2020
server01 eth1 leaf01 swp1 Sun Oct 25 22:51:17 2020
server01 eth2 leaf02 swp1 Sun Oct 25 22:51:17 2020
server02 eth0 oob-mgmt-switch swp3 Sun Oct 25 22:49:41 2020
server02 eth1 leaf01 swp2 Sun Oct 25 22:49:41 2020
server02 eth2 leaf02 swp2 Sun Oct 25 22:49:41 2020
server03 eth2 leaf02 swp3 Sun Oct 25 22:50:08 2020
server03 eth1 leaf01 swp3 Sun Oct 25 22:50:08 2020
server03 eth0 oob-mgmt-switch swp4 Sun Oct 25 22:50:08 2020
server04 eth0 oob-mgmt-switch swp5 Sun Oct 25 22:50:27 2020
server04 eth1 leaf03 swp1 Sun Oct 25 22:50:27 2020
server04 eth2 leaf04 swp1 Sun Oct 25 22:50:27 2020
server05 eth0 oob-mgmt-switch swp6 Sun Oct 25 22:49:12 2020
server05 eth1 leaf03 swp2 Sun Oct 25 22:49:12 2020
server05 eth2 leaf04 swp2 Sun Oct 25 22:49:12 2020
server06 eth0 oob-mgmt-switch swp7 Sun Oct 25 22:49:22 2020
server06 eth1 leaf03 swp3 Sun Oct 25 22:49:22 2020
server06 eth2 leaf04 swp3 Sun Oct 25 22:49:22 2020
server07 eth0 oob-mgmt-switch swp8 Sun Oct 25 22:29:58 2020
server08 eth0 oob-mgmt-switch swp9 Sun Oct 25 22:34:12 2020
spine01 swp1 leaf01 swp51 Mon Oct 26 04:13:20 2020
spine01 swp3 leaf03 swp51 Mon Oct 26 04:13:20 2020
spine01 swp2 leaf02 swp51 Mon Oct 26 04:13:20 2020
spine01 swp5 border01 swp51 Mon Oct 26 04:13:20 2020
spine01 eth0 oob-mgmt-switch swp14 Mon Oct 26 04:13:20 2020
spine01 swp4 leaf04 swp51 Mon Oct 26 04:13:20 2020
spine01 swp6 border02 swp51 Mon Oct 26 04:13:20 2020
spine02 swp4 leaf04 swp52 Mon Oct 26 04:16:26 2020
spine02 swp3 leaf03 swp52 Mon Oct 26 04:16:26 2020
spine02 swp6 border02 swp52 Mon Oct 26 04:16:26 2020
spine02 eth0 oob-mgmt-switch swp15 Mon Oct 26 04:16:26 2020
spine02 swp5 border01 swp52 Mon Oct 26 04:16:26 2020
spine02 swp2 leaf02 swp52 Mon Oct 26 04:16:26 2020
spine02 swp1 leaf01 swp52 Mon Oct 26 04:16:26 2020
spine03 swp2 leaf02 swp53 Mon Oct 26 04:13:48 2020
spine03 swp6 border02 swp53 Mon Oct 26 04:13:48 2020
spine03 swp1 leaf01 swp53 Mon Oct 26 04:13:48 2020
spine03 swp3 leaf03 swp53 Mon Oct 26 04:13:48 2020
spine03 swp4 leaf04 swp53 Mon Oct 26 04:13:48 2020
spine03 eth0 oob-mgmt-switch swp16 Mon Oct 26 04:13:48 2020
spine03 swp5 border01 swp53 Mon Oct 26 04:13:48 2020
spine04 eth0 oob-mgmt-switch swp17 Mon Oct 26 04:11:23 2020
spine04 swp3 leaf03 swp54 Mon Oct 26 04:11:23 2020
spine04 swp2 leaf02 swp54 Mon Oct 26 04:11:23 2020
spine04 swp4 leaf04 swp54 Mon Oct 26 04:11:23 2020
spine04 swp1 leaf01 swp54 Mon Oct 26 04:11:23 2020
spine04 swp5 border01 swp54 Mon Oct 26 04:11:23 2020
spine04 swp6 border02 swp54 Mon Oct 26 04:11:23 2020
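Counting rows by hand gets tedious on larger networks. The following is a minimal sketch, not part of NetQ, that tallies sessions per device from the plain-text output of netq show lldp. It assumes each data row has exactly nine whitespace-separated tokens (hostname, interface, peer hostname, peer interface, and a five-token timestamp), as in the output above. The function name count_lldp_sessions and the SAMPLE text are illustrative only.

```python
from collections import Counter

# A short excerpt in the same layout as the output shown above.
SAMPLE = """\
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
fw1 eth0 oob-mgmt-switch swp18 Mon Oct 26 04:38:03 2020
fw1 swp1 border01 swp3 Mon Oct 26 04:38:03 2020
fw1 swp2 border02 swp3 Mon Oct 26 04:38:03 2020
fw2 eth0 oob-mgmt-switch swp19 Mon Oct 26 04:46:54 2020
"""


def count_lldp_sessions(output: str) -> Counter:
    """Count LLDP sessions per hostname in plain-text `netq show lldp` output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Assumption: data rows have 9 tokens and end with a 4-digit year.
        # The banner, column header, and dashed separator do not match.
        if len(fields) != 9 or not fields[-1].isdigit():
            continue
        counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # List devices from most to fewest sessions.
    for host, sessions in count_lldp_sessions(SAMPLE).most_common():
        print(f"{host}: {sessions}")
```

To use it on a live system, you could capture the command output to a file and pass the file contents to the function.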
View Devices with the Most Unestablished LLDP Sessions
You can identify switches and hosts experiencing difficulties establishing LLDP sessions—both currently and in the past—using the NetQ UI.
To view switches with the most unestablished LLDP sessions:
Open the large Network Services/All LLDP Sessions card.
Select Switches with Most Unestablished Sessions.
The table lists nodes with the most unestablished LLDP sessions at the top. Scroll down to view those with the fewest unestablished sessions.
Where to go next depends on what data you see, but a few options include:
Changing the time period and comparing the data with a prior time. If the same switches are consistently indicating the most unestablished sessions, you might want to look more carefully at those switches using the Switches card workflow to determine probable causes. Refer to Switches.
Selecting Show All Sessions to investigate all LLDP sessions with events in the full screen card.
View LLDP Configuration Information for a Given Device
You can view the LLDP configuration information for a given device from the NetQ UI or the NetQ CLI.
Open the full-screen Network Services/All LLDP Sessions card.
Click to filter by hostname.
Click Apply.
Run the netq show lldp command with the hostname option.
This example shows the LLDP configuration information for the leaf01 switch. The switch has a session between its swp1 interface and host server01 on the mac:44:38:39:00:00:32 interface. It also has a session between its swp2 interface and host server02 on the mac:44:38:39:00:00:34 interface, and so on.
cumulus@netq-ts:~$ netq leaf01 show lldp
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
leaf01 swp1 server01 mac:44:38:39:00:00:32 Mon Oct 26 04:13:57 2020
leaf01 swp2 server02 mac:44:38:39:00:00:34 Mon Oct 26 04:13:57 2020
leaf01 swp52 spine02 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp49 leaf02 swp49 Mon Oct 26 04:13:57 2020
leaf01 eth0 oob-mgmt-switch swp10 Mon Oct 26 04:13:57 2020
leaf01 swp3 server03 mac:44:38:39:00:00:36 Mon Oct 26 04:13:57 2020
leaf01 swp53 spine03 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp50 leaf02 swp50 Mon Oct 26 04:13:57 2020
leaf01 swp54 spine04 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp51 spine01 swp1 Mon Oct 26 04:13:57 2020
View Switches with the Most LLDP-related Alarms
Switches or hosts experiencing a large number of LLDP alarms might indicate a configuration or performance issue that needs further investigation. You can view this information using the NetQ UI or NetQ CLI.
With the NetQ UI, you can view the switches sorted by the number of LLDP alarms and then use the Switches or Events cards to gather more information about possible causes for the alarms.
To view switches with the most LLDP alarms:
Open the large Network Services/All LLDP Sessions card.
Hover over the header and click .
Select Events by Most Active Device.
The table lists nodes with the most LLDP alarms at the top. Scroll down to view those with the fewest alarms.
Where to go next depends on what data you see, but a few options include:
Changing the time period and comparing the data with a prior time. If the same switches are consistently indicating the most alarms, you might want to look more carefully at those switches in the Switches card.
Selecting Show All Sessions to investigate all switches running LLDP sessions in the full-screen card.
To view the switches and hosts with the most LLDP events, run the netq show events command with the message_type option set to lldp, and optionally the between option set to display the events within a given time range. Count the events associated with each switch.
This example shows that no LLDP events have occurred in the last 24 hours.
cumulus@switch:~$ netq show events message_type lldp
No matching event records found
This example shows all LLDP events between now and 30 days ago, a total of 21 info events.
cumulus@switch:~$ netq show events message_type lldp between now and 30d
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
spine02 lldp info LLDP Session with hostname spine02 Fri Oct 2 22:28:57 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
leaf04 lldp info LLDP Session with hostname leaf04 a Fri Oct 2 22:28:39 2020
nd eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
border02 lldp info LLDP Session with hostname border02 Fri Oct 2 22:28:35 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
spine04 lldp info LLDP Session with hostname spine04 Fri Oct 2 22:28:35 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
server07 lldp info LLDP Session with hostname server07 Fri Oct 2 22:28:34 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
server08 lldp info LLDP Session with hostname server08 Fri Oct 2 22:28:33 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
fw2 lldp info LLDP Session with hostname fw2 and Fri Oct 2 22:28:32 2020
eth0 modified fields {"new lldp pee
r osv":"4.2.1","old lldp peer osv":
"3.7.12"}
server02 lldp info LLDP Session with hostname server02 Fri Oct 2 22:28:31 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
server03 lldp info LLDP Session with hostname server03 Fri Oct 2 22:28:28 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
border01 lldp info LLDP Session with hostname border01 Fri Oct 2 22:28:28 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
leaf03 lldp info LLDP Session with hostname leaf03 a Fri Oct 2 22:28:27 2020
nd eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
fw1 lldp info LLDP Session with hostname fw1 and Fri Oct 2 22:28:23 2020
eth0 modified fields {"new lldp pee
r osv":"4.2.1","old lldp peer osv":
"3.7.12"}
server05 lldp info LLDP Session with hostname server05 Fri Oct 2 22:28:22 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
server06 lldp info LLDP Session with hostname server06 Fri Oct 2 22:28:21 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
spine03 lldp info LLDP Session with hostname spine03 Fri Oct 2 22:28:20 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
server01 lldp info LLDP Session with hostname server01 Fri Oct 2 22:28:15 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
server04 lldp info LLDP Session with hostname server04 Fri Oct 2 22:28:13 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
leaf01 lldp info LLDP Session with hostname leaf01 a Fri Oct 2 22:28:05 2020
nd eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
spine01 lldp info LLDP Session with hostname spine01 Fri Oct 2 22:28:05 2020
and eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
oob-mgmt-server lldp info LLDP Session with hostname oob-mgmt Fri Oct 2 22:27:54 2020
-server and eth1 modified fields {"
new lldp peer osv":"4.2.1","old lld
p peer osv":"3.7.12"}
leaf02 lldp info LLDP Session with hostname leaf02 a Fri Oct 2 22:27:39 2020
nd eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
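Counting events per switch by eye is error-prone because the Message column wraps across several lines. The sketch below, which is not a NetQ utility, counts only the lines that start a record. It assumes a record line begins with a hostname, then the message type, then a severity of info or error (the two severities the netq show events syntax accepts). The names count_events_by_host and SAMPLE are hypothetical.

```python
from collections import Counter

# Assumption: only these two severities appear in the text output.
SEVERITIES = {"info", "error"}

# Two records copied from the output above, including wrapped lines.
SAMPLE = '''\
Matching events records:
Hostname Message Type Severity Message Timestamp
leaf04 lldp info LLDP Session with hostname leaf04 a Fri Oct 2 22:28:39 2020
nd eth0 modified fields {"new lldp
peer osv":"4.2.1","old lldp peer os
v":"3.7.12"}
oob-mgmt-server lldp info LLDP Session with hostname oob-mgmt Fri Oct 2 22:27:54 2020
-server and eth1 modified fields {"
new lldp peer osv":"4.2.1","old lld
p peer osv":"3.7.12"}
'''


def count_events_by_host(output: str, message_type: str = "lldp") -> Counter:
    """Count event records per hostname, ignoring wrapped message lines."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # A record starts with: <hostname> <message type> <severity> ...
        # Checking the severity too avoids matching a wrapped line such as
        # 'new lldp peer ...', whose second token is also "lldp".
        if len(fields) >= 3 and fields[1] == message_type and fields[2] in SEVERITIES:
            counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    for host, events in count_events_by_host(SAMPLE).most_common():
        print(f"{host}: {events}")
```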
View All LLDP Events
The Network Services/All LLDP Sessions card and the netq show events message_type lldp command let you view all LLDP events in a designated time period.
To view all LLDP events:
Open the Network Services/All LLDP Sessions card.
Change to the full-screen card using the card size picker.
Select Events.
By default, events sort by time, with the most recent listed first.
To view all LLDP events, run:
netq show events [severity info | severity error ] message_type lldp [between <text-time> and <text-endtime>] [json]
This example shows that no LLDP events have occurred in the last three days.
cumulus@switch:~$ netq show events message_type lldp between now and 3d
No matching event records found
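If you script these queries, it can help to assemble the command from its optional parts in the order the syntax above requires. The following sketch builds an argument list only and does not run anything. The helper name build_events_command is hypothetical, and the severity check mirrors the two values shown in the syntax.

```python
from typing import List, Optional, Tuple


def build_events_command(
    message_type: str = "lldp",
    severity: Optional[str] = None,
    between: Optional[Tuple[str, str]] = None,
    as_json: bool = False,
) -> List[str]:
    """Assemble a `netq show events` argument list in the documented order."""
    cmd = ["netq", "show", "events"]
    if severity is not None:
        # The syntax above lists only these two severities.
        if severity not in ("info", "error"):
            raise ValueError("severity must be 'info' or 'error'")
        cmd += ["severity", severity]
    cmd += ["message_type", message_type]
    if between is not None:
        start, end = between
        cmd += ["between", start, "and", end]
    if as_json:
        cmd.append("json")
    return cmd


if __name__ == "__main__":
    # Reproduces the command used in the example above.
    print(" ".join(build_events_command(between=("now", "3d"))))
```

On a host where the NetQ CLI is installed, the returned list could be passed to subprocess.run.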
View Details About All Switches Running LLDP
You can view attributes of all switches running LLDP in your network in the full-screen card.
To view all switch details, open the Network Services/All LLDP Sessions card, and select All Switches.
Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail.
View Details for All LLDP Sessions
You can view attributes of all LLDP sessions in your network with the NetQ UI or NetQ CLI.
To view all session details:
Open the Network Services/All LLDP Sessions card.
Change to the full-screen card using the card size picker.
Select All Sessions.
Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail.
To view session details, run netq show lldp.
This example shows all current sessions (one per row) and the attributes associated with them.
cumulus@netq-ts:~$ netq show lldp
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
border01 swp3 fw1 swp1 Mon Oct 26 04:13:29 2020
border01 swp49 border02 swp49 Mon Oct 26 04:13:29 2020
border01 swp51 spine01 swp5 Mon Oct 26 04:13:29 2020
border01 swp52 spine02 swp5 Mon Oct 26 04:13:29 2020
border01 eth0 oob-mgmt-switch swp20 Mon Oct 26 04:13:29 2020
border01 swp53 spine03 swp5 Mon Oct 26 04:13:29 2020
border01 swp50 border02 swp50 Mon Oct 26 04:13:29 2020
border01 swp54 spine04 swp5 Mon Oct 26 04:13:29 2020
border02 swp49 border01 swp49 Mon Oct 26 04:13:11 2020
border02 swp3 fw1 swp2 Mon Oct 26 04:13:11 2020
border02 swp51 spine01 swp6 Mon Oct 26 04:13:11 2020
border02 swp54 spine04 swp6 Mon Oct 26 04:13:11 2020
border02 swp52 spine02 swp6 Mon Oct 26 04:13:11 2020
border02 eth0 oob-mgmt-switch swp21 Mon Oct 26 04:13:11 2020
border02 swp53 spine03 swp6 Mon Oct 26 04:13:11 2020
border02 swp50 border01 swp50 Mon Oct 26 04:13:11 2020
fw1 eth0 oob-mgmt-switch swp18 Mon Oct 26 04:38:03 2020
fw1 swp1 border01 swp3 Mon Oct 26 04:38:03 2020
fw1 swp2 border02 swp3 Mon Oct 26 04:38:03 2020
fw2 eth0 oob-mgmt-switch swp19 Mon Oct 26 04:46:54 2020
leaf01 swp1 server01 mac:44:38:39:00:00:32 Mon Oct 26 04:13:57 2020
leaf01 swp2 server02 mac:44:38:39:00:00:34 Mon Oct 26 04:13:57 2020
leaf01 swp52 spine02 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp49 leaf02 swp49 Mon Oct 26 04:13:57 2020
leaf01 eth0 oob-mgmt-switch swp10 Mon Oct 26 04:13:57 2020
leaf01 swp3 server03 mac:44:38:39:00:00:36 Mon Oct 26 04:13:57 2020
leaf01 swp53 spine03 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp50 leaf02 swp50 Mon Oct 26 04:13:57 2020
leaf01 swp54 spine04 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp51 spine01 swp1 Mon Oct 26 04:13:57 2020
leaf02 swp52 spine02 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp54 spine04 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp2 server02 mac:44:38:39:00:00:3a Mon Oct 26 04:14:57 2020
leaf02 swp3 server03 mac:44:38:39:00:00:3c Mon Oct 26 04:14:57 2020
...
Monitor a Single LLDP Session
With NetQ, you can monitor the number of nodes running the LLDP service, view neighbor state changes, and compare with events occurring at the same time, as well as monitor the running LLDP configuration and changes to the configuration file. For an overview and how to configure LLDP in your data center network, refer to Link Layer Discovery Protocol.
To access the single session cards, you must open the full-screen Network Services/All LLDP Sessions card, click the All Sessions tab, select the desired session, then click (Open Card).
Granularity of Data Shown Based on Time Period
On the medium and large single LLDP session cards, vertically stacked heat maps represent the status of the neighboring peers: one for peers that are reachable (neighbor detected) and one for peers that are unreachable (neighbor not detected). The number of smaller time blocks used to indicate the status depends on the time period of data on the card. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block reflects the results. If LLDP detected all peers for the entire time block, then the neighbor-detected block is 100% saturated (white) and the neighbor-not-detected block is 0% saturated (gray). As more peers become reachable, the neighbor-detected block increases in saturation and the neighbor-not-detected block decreases proportionally. The following example shows a heat map for a time period of 24 hours, and the table lists the most common time periods with the resulting time blocks.
Time Period    Number of Runs    Number of Time Blocks    Amount of Time in Each Block
6 hours        18                6                        1 hour
12 hours       36                12                       1 hour
24 hours       72                24                       1 hour
1 week         504               7                        1 day
1 month        2,086             30                       1 day
1 quarter      7,000             13                       1 week
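To make the relationship between runs, time blocks, and saturation concrete, here is a small sketch. It assumes a simple linear mapping: the saturation of the neighbor-detected block is the percentage of checks in that block that detected the neighbor, and the neighbor-not-detected block gets the remainder. The exact formula NetQ uses is not stated here, so treat this as an illustration. The function name block_saturation is hypothetical.

```python
from typing import List, Tuple


def block_saturation(checks: List[bool], num_blocks: int) -> List[Tuple[float, float]]:
    """Return (detected %, not-detected %) for each time block.

    checks: one entry per run, in time order; True means neighbor detected.
    num_blocks: number of time blocks on the card (for example, 24).
    """
    if num_blocks <= 0 or not checks or len(checks) % num_blocks != 0:
        raise ValueError("checks must be non-empty and divide evenly into num_blocks")
    per_block = len(checks) // num_blocks
    result = []
    for i in range(num_blocks):
        block = checks[i * per_block:(i + 1) * per_block]
        # Assumed linear mapping from check results to saturation.
        detected = 100.0 * sum(block) / per_block
        result.append((detected, 100.0 - detected))
    return result


if __name__ == "__main__":
    # 24-hour card: 72 runs in 24 blocks, so 3 runs per block.
    # The neighbor is missed once, in the final run.
    runs = [True] * 71 + [False]
    blocks = block_saturation(runs, 24)
    print(blocks[0])
    print(blocks[-1])
```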
View Session Status Summary
You can view information about a given LLDP session using the NetQ UI or NetQ CLI.
A summary of the LLDP session is available from the Network Services/LLDP Session card, showing the node and its peer as well as current status.
To view the summary:
Open or add the Network Services/All LLDP Sessions card.
Change to the full-screen card using the card size picker.
Click All Sessions.
Select the session of interest, then click (Open Card).
Locate the medium Network Services/LLDP Session card.
Optionally, open the small Network Services/LLDP Session card to keep track of the session health.
Run the netq show lldp command with the hostname and remote-physical-interface options.
This example shows the session information for the leaf02 switch on the swp49 interface of the leaf01 peer.
You can view the neighbor state for a given LLDP session from the medium and large LLDP Session cards. For a given time period, you can determine the stability of the LLDP session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the neighbor. If the neighbor was not alive more than it was alive, you can then investigate further into possible causes.
You can view the neighbor states on either the medium or large Network Services/LLDP Session card. To view the neighbor availability for a given LLDP session on the large LLDP Session card:
Open a Network Services/LLDP Session card.
Hover over the card, and change to the large card using the card size picker.
From this card, you can also view the alarm and info event counts, host interface name, peer hostname, and peer interface identifying the session in more detail.
View Changes to the LLDP Service Configuration File
Each time a change is made to the configuration file for the LLDP service, NetQ logs the change and lets you compare it with the last version using the NetQ UI. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.
To view the configuration file changes:
Open or add the full-screen Network Services/All LLDP Sessions card.
Click All Sessions.
Select the session of interest, then click (Open Card).
Locate the Network Services/LLDP Session card and expand it to large.
Hover over the card and click to open the Configuration File Evolution tab.
Select the time of interest on the left, such as a time when a change might have impacted performance. Scroll down if needed.
Choose between the File view and the Diff view.
The File view displays the file’s content.
The Diff view displays the changes between the two versions, side by side. The changes are highlighted in red and green. In the following example, there are no differences between the files, and thus no highlighted lines.
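If you want to reproduce this kind of comparison outside the UI, Python's standard difflib module can compare two saved versions of a file. The sketch below produces a unified diff rather than the side-by-side view the UI shows. Removed lines start with a minus sign and added lines start with a plus sign. The two configuration strings are made-up sample content, not taken from a real switch, and config_diff is a hypothetical helper name.

```python
import difflib
from typing import List

# Hypothetical sample content standing in for two saved file versions.
OLD_CONFIG = """\
configure lldp tx-interval 30
configure lldp tx-hold 4
"""

NEW_CONFIG = """\
configure lldp tx-interval 40
configure lldp tx-hold 4
"""


def config_diff(old: str, new: str) -> List[str]:
    """Return unified-diff lines between two versions of a file."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


if __name__ == "__main__":
    changes = config_diff(OLD_CONFIG, NEW_CONFIG)
    if not changes:
        # Matches the case described above: identical files, nothing to show.
        print("No differences")
    for line in changes:
        print(line)
```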
View All LLDP Session Details
You can view attributes of all of the LLDP sessions for the devices participating in a given session with the NetQ UI and the NetQ CLI.
To view all session details:
Open or add the full-screen Network Services/All LLDP Sessions card.
Click All Sessions.
Select the session of interest, then click (Open Card).
Locate the Network Services/LLDP Session card and change to the full-screen card using the card size picker. The All LLDP Sessions tab is displayed by default.
Run the netq show lldp command.
This example shows all LLDP sessions in the last 24 hours.
cumulus@netq-ts:~$ netq show lldp
Matching lldp records:
Hostname Interface Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ----------------- ------------------------- -------------------------
border01 swp3 fw1 swp1 Mon Oct 26 04:13:29 2020
border01 swp49 border02 swp49 Mon Oct 26 04:13:29 2020
border01 swp51 spine01 swp5 Mon Oct 26 04:13:29 2020
border01 swp52 spine02 swp5 Mon Oct 26 04:13:29 2020
border01 eth0 oob-mgmt-switch swp20 Mon Oct 26 04:13:29 2020
border01 swp53 spine03 swp5 Mon Oct 26 04:13:29 2020
border01 swp50 border02 swp50 Mon Oct 26 04:13:29 2020
border01 swp54 spine04 swp5 Mon Oct 26 04:13:29 2020
border02 swp49 border01 swp49 Mon Oct 26 04:13:11 2020
border02 swp3 fw1 swp2 Mon Oct 26 04:13:11 2020
border02 swp51 spine01 swp6 Mon Oct 26 04:13:11 2020
border02 swp54 spine04 swp6 Mon Oct 26 04:13:11 2020
border02 swp52 spine02 swp6 Mon Oct 26 04:13:11 2020
border02 eth0 oob-mgmt-switch swp21 Mon Oct 26 04:13:11 2020
border02 swp53 spine03 swp6 Mon Oct 26 04:13:11 2020
border02 swp50 border01 swp50 Mon Oct 26 04:13:11 2020
fw1 eth0 oob-mgmt-switch swp18 Mon Oct 26 04:38:03 2020
fw1 swp1 border01 swp3 Mon Oct 26 04:38:03 2020
fw1 swp2 border02 swp3 Mon Oct 26 04:38:03 2020
fw2 eth0 oob-mgmt-switch swp19 Mon Oct 26 04:46:54 2020
leaf01 swp1 server01 mac:44:38:39:00:00:32 Mon Oct 26 04:13:57 2020
leaf01 swp2 server02 mac:44:38:39:00:00:34 Mon Oct 26 04:13:57 2020
leaf01 swp52 spine02 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp49 leaf02 swp49 Mon Oct 26 04:13:57 2020
leaf01 eth0 oob-mgmt-switch swp10 Mon Oct 26 04:13:57 2020
leaf01 swp3 server03 mac:44:38:39:00:00:36 Mon Oct 26 04:13:57 2020
leaf01 swp53 spine03 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp50 leaf02 swp50 Mon Oct 26 04:13:57 2020
leaf01 swp54 spine04 swp1 Mon Oct 26 04:13:57 2020
leaf01 swp51 spine01 swp1 Mon Oct 26 04:13:57 2020
leaf02 swp52 spine02 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp54 spine04 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp2 server02 mac:44:38:39:00:00:3a Mon Oct 26 04:14:57 2020
leaf02 swp3 server03 mac:44:38:39:00:00:3c Mon Oct 26 04:14:57 2020
leaf02 swp53 spine03 swp2 Mon Oct 26 04:14:57 2020
leaf02 swp50 leaf01 swp50 Mon Oct 26 04:14:57 2020
leaf02 swp51 spine01 swp2 Mon Oct 26 04:14:57 2020
leaf02 eth0 oob-mgmt-switch swp11 Mon Oct 26 04:14:57 2020
leaf02 swp49 leaf01 swp49 Mon Oct 26 04:14:57 2020
leaf02 swp1 server01 mac:44:38:39:00:00:38 Mon Oct 26 04:14:57 2020
leaf03 swp2 server05 mac:44:38:39:00:00:40 Mon Oct 26 04:16:09 2020
leaf03 swp49 leaf04 swp49 Mon Oct 26 04:16:09 2020
leaf03 swp51 spine01 swp3 Mon Oct 26 04:16:09 2020
leaf03 swp50 leaf04 swp50 Mon Oct 26 04:16:09 2020
leaf03 swp54 spine04 swp3 Mon Oct 26 04:16:09 2020
...
spine04 swp3 leaf03 swp54 Mon Oct 26 04:11:23 2020
spine04 swp2 leaf02 swp54 Mon Oct 26 04:11:23 2020
spine04 swp4 leaf04 swp54 Mon Oct 26 04:11:23 2020
spine04 swp1 leaf01 swp54 Mon Oct 26 04:11:23 2020
spine04 swp5 border01 swp54 Mon Oct 26 04:11:23 2020
spine04 swp6 border02 swp54 Mon Oct 26 04:11:23 2020
MAC Addresses
A MAC (media access control) address is a layer 2 construct that uses 48 bits to uniquely identify a network interface controller (NIC) for communication within a network.
With NetQ, you can:
View MAC addresses across the network and for a given device, VLAN, egress port on a VLAN, and VRR
View a count of MAC addresses on a given device
View where MAC addresses have lived in the network (MAC history)
View commentary on changes to MAC addresses (MAC commentary)
View events related to MAC addresses
MAC addresses are associated with switch interfaces. They are classified as:
Origin: MAC address is owned by a particular switch, on one or more interfaces. A MAC address typically has only one origin node. There are two exceptions: when MLAG is configured, the MAC on the VRR interfaces for the MLAG pair is the same; and when EVPN is configured, the MAC is distributed across the layer 3 gateways.
Remote: MAC address is learned or distributed by the control plane on a tunnel interface pointing to a particular remote location. For a given MAC address and VLAN there is only one first-hop switch (or switch pair), but multiple nodes can have the same remote MAC address.
Local (not origin and not remote): MAC address is learned on a bridge and points to an interface on another switch. If the LLDP neighbor of the interface is a host, then this switch is the first-hop switch where the MAC address is learned. For a given MAC address and VLAN there is only one first-hop switch, except if the switches are part of an MLAG pair, and the interfaces on both switches form a dually or singly connected bond.
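The three classes above can be expressed as a small decision rule. This sketch assumes the yes/no values in the Origin and Remote columns of the netq show macs output map directly onto the classification, with local being the case where both are no. The function name classify_mac is hypothetical.

```python
def classify_mac(origin: str, remote: str) -> str:
    """Classify a MAC entry from its Origin and Remote column values.

    Both arguments are the literal "yes" or "no" strings from the CLI output.
    """
    if origin == "yes":
        return "origin"
    if remote == "yes":
        return "remote"
    # Not origin and not remote: learned on a bridge.
    return "local"


if __name__ == "__main__":
    # (origin, remote) pairs as they would appear in a row of output.
    for origin, remote in [("yes", "no"), ("no", "yes"), ("no", "no")]:
        print(origin, remote, "->", classify_mac(origin, remote))
```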
The NetQ UI provides a listing of current MAC addresses that you can filter by hostname, timestamp, MAC address, VLAN, and origin. You can sort the list by these parameters and also by remote, static, and next hop.
The NetQ CLI provides the following commands:
netq show macs [<mac>] [vlan <1-4096>] [origin] [around <text-time>] [json]
netq <hostname> show macs [<mac>] [vlan <1-4096>] [origin | count] [around <text-time>] [json]
netq <hostname> show macs egress-port <egress-port> [<mac>] [vlan <1-4096>] [origin] [around <text-time>] [json]
netq [<hostname>] show mac-history <mac> [vlan <1-4096>] [diff] [between <text-time> and <text-endtime>] [listby <text-list-by>] [json]
netq [<hostname>] show mac-commentary <mac> vlan <1-4096> [between <text-time> and <text-endtime>] [json]
netq [<hostname>] show events [severity info | severity error ] message_type macs [between <text-time> and <text-endtime>] [json]
View MAC Addresses Networkwide
You can view all MAC addresses across your network with the NetQ UI or the NetQ CLI.
Select the Menu.
Under the Network heading, select MACs.
You can filter and sort the table entries.
Use the netq show macs command to view all MAC addresses.
This example shows all MAC addresses in the Cumulus Networks reference topology.
cumulus@switch:~$ netq show macs
Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
no 46:38:39:00:00:46 20 leaf04 bond2 no Tue Oct 27 22:29:07 2020
yes 44:38:39:00:00:5e 20 leaf04 bridge no Tue Oct 27 22:29:07 2020
yes 00:00:00:00:00:1a 10 leaf04 bridge no Tue Oct 27 22:29:07 2020
yes 44:38:39:00:00:5e 4002 leaf04 bridge no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:5d 30 leaf04 peerlink no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:37 30 leaf04 vni30 no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:59 30 leaf04 vni30 no Tue Oct 27 22:29:07 2020
yes 7e:1a:b3:4f:05:b8 20 leaf04 vni20 no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:36 30 leaf04 vni30 yes Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:59 20 leaf04 vni20 no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:37 20 leaf04 vni20 no Tue Oct 27 22:29:07 2020
...
yes 7a:4a:c7:bb:48:27 4001 border01 vniRED no Tue Oct 27 22:28:48 2020
yes ce:93:1d:e3:08:1b 4002 border01 vniBLUE no Tue Oct 27 22:28:48 2020
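For quick scripting against the text output, the rows above can be split into fields. This is a sketch under stated assumptions: it relies on the seven columns shown in this example, on egress port names containing no spaces, and on the five-token timestamp format shown. The function name parse_macs_output is hypothetical. Where available, the json option of the command is likely a more robust starting point.

```python
from typing import Dict, List

SAMPLE = """Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------ ------ -------------------------
no 46:38:39:00:00:46 20 leaf04 bond2 no Tue Oct 27 22:29:07 2020
yes 44:38:39:00:00:5e 20 leaf04 bridge no Tue Oct 27 22:29:07 2020
"""


def parse_macs_output(text: str) -> List[Dict[str, object]]:
    """Turn data rows of the text output into dictionaries."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has 6 single-token fields plus a 5-token timestamp.
        if len(parts) != 11 or parts[0] not in ("yes", "no"):
            continue  # skip the title, header, separator, and "..." lines
        records.append({
            "origin": parts[0],
            "mac": parts[1],
            "vlan": int(parts[2]),
            "hostname": parts[3],
            "egress_port": parts[4],
            "remote": parts[5],
            "last_changed": " ".join(parts[6:]),
        })
    return records


for record in parse_macs_output(SAMPLE):
    print(record["hostname"], record["mac"], record["egress_port"])
```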
View MAC Addresses for a Given Device
Select the Menu.
Under the Network heading, select MACs.
Open the filter and enter a hostname.
Click Apply.
Use the netq <hostname> show macs command to view the MAC addresses on a given device.
This example shows all MAC addresses on the leaf03 switch.
cumulus@switch:~$ netq leaf03 show macs
Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
yes 2e:3d:b4:55:40:ba 4002 leaf03 vniBLUE no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:5e 20 leaf03 peerlink no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:46 20 leaf03 bond2 no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 4001 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1a 10 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 30 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 26:6e:54:35:3b:28 4001 leaf03 vniRED no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:37 30 leaf03 vni30 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:59 30 leaf03 vni30 no Tue Oct 27 22:28:24 2020
yes 72:78:e6:4e:3d:4c 20 leaf03 vni20 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:36 30 leaf03 vni30 yes Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:59 20 leaf03 vni20 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:37 20 leaf03 vni20 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:59 10 leaf03 vni10 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:37 10 leaf03 vni10 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:48 30 leaf03 bond3 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:38 10 leaf03 vni10 yes Tue Oct 27 22:28:24 2020
yes 36:99:0d:48:51:41 10 leaf03 vni10 no Tue Oct 27 22:28:24 2020
yes 1a:6e:d8:ed:d2:04 30 leaf03 vni30 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:36 30 leaf03 vni30 yes Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:5e 30 leaf03 peerlink no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:3e 10 leaf03 bond1 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:34 20 leaf03 vni20 yes Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:5e 10 leaf03 peerlink no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:3c 30 leaf03 vni30 yes Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:3e 10 leaf03 bond1 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:34 20 leaf03 vni20 yes Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:42 30 leaf03 bond3 no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 4002 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 20 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:be:ef:bb 4002 leaf03 bridge no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:32 10 leaf03 vni10 yes Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 10 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1b 20 leaf03 bridge no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:44 10 leaf03 bond1 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:42 30 leaf03 bond3 no Tue Oct 27 22:28:24 2020
yes 44:38:39:be:ef:bb 4001 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1c 30 leaf03 bridge no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:32 10 leaf03 vni10 yes Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:40 20 leaf03 bond2 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:3a 20 leaf03 vni20 yes Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:40 20 leaf03 bond2 no Tue Oct 27 22:28:24 2020
View MAC Addresses Associated with a VLAN
Select the Menu.
Under the Network heading, select MACs.
Open the filter and enter a VLAN ID.
Click Apply.
(Optional) Add a hostname filter to view the MAC addresses for a VLAN on a particular device.
Use the netq show macs command with the vlan option to view the MAC addresses for a given VLAN.
This example shows the MAC addresses associated with VLAN 10.
cumulus@switch:~$ netq show macs vlan 10
Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
yes 00:00:00:00:00:1a 10 leaf04 bridge no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:37 10 leaf04 vni10 no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:59 10 leaf04 vni10 no Tue Oct 27 22:29:07 2020
no 46:38:39:00:00:38 10 leaf04 vni10 yes Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:3e 10 leaf04 bond1 no Tue Oct 27 22:29:07 2020
no 46:38:39:00:00:3e 10 leaf04 bond1 no Tue Oct 27 22:29:07 2020
yes 44:38:39:00:00:5e 10 leaf04 bridge no Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:32 10 leaf04 vni10 yes Tue Oct 27 22:29:07 2020
no 44:38:39:00:00:5d 10 leaf04 peerlink no Tue Oct 27 22:29:07 2020
no 46:38:39:00:00:44 10 leaf04 bond1 no Tue Oct 27 22:29:07 2020
no 46:38:39:00:00:32 10 leaf04 vni10 yes Tue Oct 27 22:29:07 2020
yes 36:ae:d2:23:1d:8c 10 leaf04 vni10 no Tue Oct 27 22:29:07 2020
yes 00:00:00:00:00:1a 10 leaf03 bridge no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:59 10 leaf03 vni10 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:37 10 leaf03 vni10 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:38 10 leaf03 vni10 yes Tue Oct 27 22:28:24 2020
yes 36:99:0d:48:51:41 10 leaf03 vni10 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:3e 10 leaf03 bond1 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:5e 10 leaf03 peerlink no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:3e 10 leaf03 bond1 no Tue Oct 27 22:28:24 2020
no 44:38:39:00:00:32 10 leaf03 vni10 yes Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 10 leaf03 bridge no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:44 10 leaf03 bond1 no Tue Oct 27 22:28:24 2020
no 46:38:39:00:00:32 10 leaf03 vni10 yes Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1a 10 leaf02 bridge no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:59 10 leaf02 peerlink no Tue Oct 27 22:28:51 2020
yes 44:38:39:00:00:37 10 leaf02 bridge no Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:38 10 leaf02 bond1 no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:3e 10 leaf02 vni10 yes Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:3e 10 leaf02 vni10 yes Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:5e 10 leaf02 vni10 no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:5d 10 leaf02 vni10 no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:32 10 leaf02 bond1 no Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:44 10 leaf02 vni10 yes Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:32 10 leaf02 bond1 no Tue Oct 27 22:28:51 2020
yes 4a:32:30:8c:13:08 10 leaf02 vni10 no Tue Oct 27 22:28:51 2020
yes 00:00:00:00:00:1a 10 leaf01 bridge no Tue Oct 27 22:28:42 2020
no 44:38:39:00:00:37 10 leaf01 peerlink no Tue Oct 27 22:28:42 2020
yes 44:38:39:00:00:59 10 leaf01 bridge no Tue Oct 27 22:28:42 2020
no 46:38:39:00:00:38 10 leaf01 bond1 no Tue Oct 27 22:28:42 2020
no 44:38:39:00:00:3e 10 leaf01 vni10 yes Tue Oct 27 22:28:43 2020
no 46:38:39:00:00:3e 10 leaf01 vni10 yes Tue Oct 27 22:28:42 2020
no 44:38:39:00:00:5e 10 leaf01 vni10 no Tue Oct 27 22:28:42 2020
no 44:38:39:00:00:5d 10 leaf01 vni10 no Tue Oct 27 22:28:42 2020
no 44:38:39:00:00:32 10 leaf01 bond1 no Tue Oct 27 22:28:43 2020
no 46:38:39:00:00:44 10 leaf01 vni10 yes Tue Oct 27 22:28:43 2020
no 46:38:39:00:00:32 10 leaf01 bond1 no Tue Oct 27 22:28:42 2020
yes 52:37:ca:35:d3:70 10 leaf01 vni10 no Tue Oct 27 22:28:42 2020
Use the netq show macs command with the hostname and vlan options to view the MAC addresses for a given VLAN on a particular device.
This example shows the MAC addresses associated with VLAN 10 on the leaf02 switch.
cumulus@switch:~$ netq leaf02 show macs vlan 10
Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
yes 00:00:00:00:00:1a 10 leaf02 bridge no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:59 10 leaf02 peerlink no Tue Oct 27 22:28:51 2020
yes 44:38:39:00:00:37 10 leaf02 bridge no Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:38 10 leaf02 bond1 no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:3e 10 leaf02 vni10 yes Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:3e 10 leaf02 vni10 yes Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:5e 10 leaf02 vni10 no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:5d 10 leaf02 vni10 no Tue Oct 27 22:28:51 2020
no 44:38:39:00:00:32 10 leaf02 bond1 no Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:44 10 leaf02 vni10 yes Tue Oct 27 22:28:51 2020
no 46:38:39:00:00:32 10 leaf02 bond1 no Tue Oct 27 22:28:51 2020
yes 4a:32:30:8c:13:08 10 leaf02 vni10 no Tue Oct 27 22:28:51 2020
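Once you have per-device results for a VLAN, one possible follow-up is to compare what two switches know. The sketch below assumes the rows have already been reduced to (hostname, MAC address, VLAN) tuples; the helper names macs_on and only_on_first are hypothetical. The sample values are a subset of the VLAN 10 rows shown above.

```python
from typing import Iterable, Set, Tuple

Record = Tuple[str, str, int]  # (hostname, mac, vlan)

RECORDS = [
    ("leaf01", "00:00:00:00:00:1a", 10),
    ("leaf01", "44:38:39:00:00:37", 10),
    ("leaf01", "52:37:ca:35:d3:70", 10),
    ("leaf02", "00:00:00:00:00:1a", 10),
    ("leaf02", "44:38:39:00:00:37", 10),
    ("leaf02", "4a:32:30:8c:13:08", 10),
]


def macs_on(records: Iterable[Record], hostname: str, vlan: int) -> Set[str]:
    """Return the set of MAC addresses a switch reports for a VLAN."""
    return {mac.lower() for host, mac, v in records
            if host == hostname and v == vlan}


def only_on_first(records: Iterable[Record], first: str, second: str,
                  vlan: int) -> Set[str]:
    """Return MAC addresses seen on 'first' but not on 'second' for a VLAN."""
    records = list(records)  # allow the iterable to be scanned twice
    return macs_on(records, first, vlan) - macs_on(records, second, vlan)


print(only_on_first(RECORDS, "leaf01", "leaf02", 10))
```

A difference is not necessarily a problem; in the example output each switch has one origin entry of its own on the vni10 interface.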
View MAC Addresses Associated with an Egress Port
Select the Menu.
Under the Network heading, select MACs.
Locate the Egress Port column. Hover over the column header and select it to sort the MAC addresses by egress port in A-Z or Z-A order.
(Optional) Open the filter and enter a hostname to view the MAC addresses on a particular device.
Use the netq <hostname> show macs egress-port <egress-port> command to view the MAC addresses on a given device that use a given egress port. Note that you cannot view this information across all devices.
This example shows MAC addresses associated with the leaf03 switch that use the bridge port for egress.
cumulus@switch:~$ netq leaf03 show macs egress-port bridge
Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
yes 44:38:39:00:00:5d 4001 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1a 10 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 30 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 4002 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 20 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:be:ef:bb 4002 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:00:00:5d 10 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1b 20 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 44:38:39:be:ef:bb 4001 leaf03 bridge no Tue Oct 27 22:28:24 2020
yes 00:00:00:00:00:1c 30 leaf03 bridge no Tue Oct 27 22:28:24 2020
View MAC Addresses Associated with VRR Configurations
You can view all MAC addresses associated with your VRR (virtual router redundancy) interface configuration using the netq show interfaces type macvlan command. This is useful for determining whether the MAC address for a given VLAN is the same or different across your VRR configuration.
cumulus@switch:~$ netq show interfaces type macvlan
Matching link records:
Hostname Interface Type State VRF Details Last Changed
----------------- ------------------------- ---------------- ---------- --------------- ----------------------------------- -------------------------
leaf01 vlan10-v0 macvlan up RED MAC: 00:00:00:00:00:1a, Tue Oct 27 22:28:42 2020
Mode: Private
leaf01 vlan20-v0 macvlan up RED MAC: 00:00:00:00:00:1b, Tue Oct 27 22:28:42 2020
Mode: Private
leaf01 vlan30-v0 macvlan up BLUE MAC: 00:00:00:00:00:1c, Tue Oct 27 22:28:42 2020
Mode: Private
leaf02 vlan10-v0 macvlan up RED MAC: 00:00:00:00:00:1a, Tue Oct 27 22:28:51 2020
Mode: Private
leaf02 vlan20-v0 macvlan up RED MAC: 00:00:00:00:00:1b, Tue Oct 27 22:28:51 2020
Mode: Private
leaf02 vlan30-v0 macvlan up BLUE MAC: 00:00:00:00:00:1c, Tue Oct 27 22:28:51 2020
Mode: Private
leaf03 vlan10-v0 macvlan up RED MAC: 00:00:00:00:00:1a, Tue Oct 27 22:28:23 2020
Mode: Private
leaf03 vlan20-v0 macvlan up RED MAC: 00:00:00:00:00:1b, Tue Oct 27 22:28:23 2020
Mode: Private
leaf03 vlan30-v0 macvlan up BLUE MAC: 00:00:00:00:00:1c, Tue Oct 27 22:28:23 2020
Mode: Private
leaf04 vlan10-v0 macvlan up RED MAC: 00:00:00:00:00:1a, Tue Oct 27 22:29:06 2020
Mode: Private
leaf04 vlan20-v0 macvlan up RED MAC: 00:00:00:00:00:1b, Tue Oct 27 22:29:06 2020
Mode: Private
leaf04 vlan30-v0 macvlan up BLUE MAC: 00:00:00:00:00:1c, Tue Oct 27 22:29:06 2020
Mode: Private
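The same-or-different check described above can be sketched as follows. The code assumes the output has already been reduced to (hostname, interface, MAC address) tuples; the function name vrr_mac_mismatches is hypothetical. The sample values come from the vlan10-v0 and vlan20-v0 rows of the output above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

ENTRIES = [
    ("leaf01", "vlan10-v0", "00:00:00:00:00:1a"),
    ("leaf02", "vlan10-v0", "00:00:00:00:00:1a"),
    ("leaf03", "vlan10-v0", "00:00:00:00:00:1a"),
    ("leaf04", "vlan10-v0", "00:00:00:00:00:1a"),
    ("leaf01", "vlan20-v0", "00:00:00:00:00:1b"),
    ("leaf02", "vlan20-v0", "00:00:00:00:00:1b"),
]


def vrr_mac_mismatches(
        entries: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Return interfaces that carry more than one distinct MAC address."""
    macs_by_interface = defaultdict(set)
    for _hostname, interface, mac in entries:
        macs_by_interface[interface].add(mac.lower())
    return {iface: macs for iface, macs in macs_by_interface.items()
            if len(macs) > 1}


# An empty result means every switch uses the same MAC per VRR interface.
print(vrr_mac_mismatches(ENTRIES))
```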
View the History of a MAC Address
It is useful when debugging to be able to see whether a MAC address is learned, where it moved in the network after that, if there was a duplicate at any time, and so forth. The netq show mac-history command makes this information available. It enables you to see:
Each change made chronologically.
Changes made between two points in time, using the between option.
Only the differences in the changes between two points in time using the diff option.
The output ordered by selected output fields using the listby option.
Each change made for the MAC address on a particular VLAN, using the vlan option.
By default, the command covers the time range from one hour ago to now. You can also view the output in JSON format.
View MAC Address Changes in Chronological Order
View the full listing of changes for a MAC address for the last hour in chronological order using the netq show mac-history command.
This example shows how to view a full chronology of changes for a MAC address of 44:38:39:00:00:5d. When shown, the caret (^) notation indicates no change in this value from the row above.
cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d
Matching machistory records:
Last Changed Hostname VLAN Origin Link Destination Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Tue Oct 27 22:28:24 2020 leaf03 10 yes bridge no no
Tue Oct 27 22:28:42 2020 leaf01 10 no vni10 10.0.1.2 no yes
Tue Oct 27 22:28:51 2020 leaf02 10 no vni10 10.0.1.2 no yes
Tue Oct 27 22:29:07 2020 leaf04 10 no peerlink no yes
Tue Oct 27 22:28:24 2020 leaf03 4002 yes bridge no no
Tue Oct 27 22:28:24 2020 leaf03 0 yes peerlink no no
Tue Oct 27 22:28:24 2020 leaf03 20 yes bridge no no
Tue Oct 27 22:28:42 2020 leaf01 20 no vni20 10.0.1.2 no yes
Tue Oct 27 22:28:51 2020 leaf02 20 no vni20 10.0.1.2 no yes
Tue Oct 27 22:29:07 2020 leaf04 20 no peerlink no yes
Tue Oct 27 22:28:24 2020 leaf03 4001 yes bridge no no
Tue Oct 27 22:28:24 2020 leaf03 30 yes bridge no no
Tue Oct 27 22:28:42 2020 leaf01 30 no vni30 10.0.1.2 no yes
Tue Oct 27 22:28:51 2020 leaf02 30 no vni30 10.0.1.2 no yes
Tue Oct 27 22:29:07 2020 leaf04 30 no peerlink no yes
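One way to look for a possible duplicate in a history like the one above is to list the VLANs where more than one switch reports the address with Origin set to yes. This is a sketch, not a NetQ feature: it assumes the rows have been reduced to (hostname, VLAN, origin) tuples, and the function name is hypothetical. Keep in mind that multiple origins can be legitimate with MLAG or EVPN, as described at the beginning of this topic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (hostname, vlan, origin) taken from the VLAN 10 and VLAN 20 rows above.
HISTORY = [
    ("leaf03", 10, "yes"),
    ("leaf01", 10, "no"),
    ("leaf02", 10, "no"),
    ("leaf04", 10, "no"),
    ("leaf03", 20, "yes"),
    ("leaf01", 20, "no"),
]


def vlans_with_multiple_origins(
        history: Iterable[Tuple[str, int, str]]) -> Dict[int, Set[str]]:
    """Return VLANs where more than one host claims to be the origin."""
    owners = defaultdict(set)
    for hostname, vlan, origin in history:
        if origin == "yes":
            owners[vlan].add(hostname)
    return {vlan: hosts for vlan, hosts in owners.items() if len(hosts) > 1}


# Empty here: leaf03 is the only origin in the sample rows.
print(vlans_with_multiple_origins(HISTORY))
```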
View MAC Address Changes for a Given Time Frame
View a listing of changes for a MAC address for a given timeframe using the netq show mac-history command with the between option. When shown, the caret (^) notation indicates no change in this value from the row above.
This example shows changes for a MAC address of 44:38:39:00:00:5d between three and seven days ago.
cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d between 3d and 7d
Matching machistory records:
Last Changed Hostname VLAN Origin Link Destination Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Tue Oct 20 22:28:19 2020 leaf03 10 yes bridge no no
Tue Oct 20 22:28:24 2020 leaf01 10 no vni10 10.0.1.2 no yes
Tue Oct 20 22:28:37 2020 leaf02 10 no vni10 10.0.1.2 no yes
Tue Oct 20 22:28:53 2020 leaf04 10 no peerlink no yes
Wed Oct 21 22:28:19 2020 leaf03 10 yes bridge no no
Wed Oct 21 22:28:26 2020 leaf01 10 no vni10 10.0.1.2 no yes
Wed Oct 21 22:28:44 2020 leaf02 10 no vni10 10.0.1.2 no yes
Wed Oct 21 22:28:55 2020 leaf04 10 no peerlink no yes
Thu Oct 22 22:28:20 2020 leaf03 10 yes bridge no no
Thu Oct 22 22:28:28 2020 leaf01 10 no vni10 10.0.1.2 no yes
Thu Oct 22 22:28:45 2020 leaf02 10 no vni10 10.0.1.2 no yes
Thu Oct 22 22:28:57 2020 leaf04 10 no peerlink no yes
Fri Oct 23 22:28:21 2020 leaf03 10 yes bridge no no
Fri Oct 23 22:28:29 2020 leaf01 10 no vni10 10.0.1.2 no yes
Fri Oct 23 22:28:45 2020 leaf02 10 no vni10 10.0.1.2 no yes
Fri Oct 23 22:28:58 2020 leaf04 10 no peerlink no yes
Sat Oct 24 22:28:28 2020 leaf03 10 yes bridge no no
Sat Oct 24 22:28:29 2020 leaf01 10 no vni10 10.0.1.2 no yes
Sat Oct 24 22:28:45 2020 leaf02 10 no vni10 10.0.1.2 no yes
Sat Oct 24 22:28:59 2020 leaf04 10 no peerlink no yes
Tue Oct 20 22:28:19 2020 leaf03 4002 yes bridge no no
Tue Oct 20 22:28:19 2020 leaf03 0 yes peerlink no no
Tue Oct 20 22:28:19 2020 leaf03 20 yes bridge no no
Tue Oct 20 22:28:24 2020 leaf01 20 no vni20 10.0.1.2 no yes
Tue Oct 20 22:28:37 2020 leaf02 20 no vni20 10.0.1.2 no yes
Tue Oct 20 22:28:53 2020 leaf04 20 no peerlink no yes
Wed Oct 21 22:28:19 2020 leaf03 20 yes bridge no no
Wed Oct 21 22:28:26 2020 leaf01 20 no vni20 10.0.1.2 no yes
Wed Oct 21 22:28:44 2020 leaf02 20 no vni20 10.0.1.2 no yes
Wed Oct 21 22:28:55 2020 leaf04 20 no peerlink no yes
Thu Oct 22 22:28:20 2020 leaf03 20 yes bridge no no
Thu Oct 22 22:28:28 2020 leaf01 20 no vni20 10.0.1.2 no yes
Thu Oct 22 22:28:45 2020 leaf02 20 no vni20 10.0.1.2 no yes
Thu Oct 22 22:28:57 2020 leaf04 20 no peerlink no yes
Fri Oct 23 22:28:21 2020 leaf03 20 yes bridge no no
Fri Oct 23 22:28:29 2020 leaf01 20 no vni20 10.0.1.2 no yes
Fri Oct 23 22:28:45 2020 leaf02 20 no vni20 10.0.1.2 no yes
Fri Oct 23 22:28:58 2020 leaf04 20 no peerlink no yes
Sat Oct 24 22:28:28 2020 leaf03 20 yes bridge no no
Sat Oct 24 22:28:29 2020 leaf01 20 no vni20 10.0.1.2 no yes
Sat Oct 24 22:28:45 2020 leaf02 20 no vni20 10.0.1.2 no yes
Sat Oct 24 22:28:59 2020 leaf04 20 no peerlink no yes
Tue Oct 20 22:28:19 2020 leaf03 4001 yes bridge no no
Tue Oct 20 22:28:19 2020 leaf03 30 yes bridge no no
Tue Oct 20 22:28:24 2020 leaf01 30 no vni30 10.0.1.2 no yes
Tue Oct 20 22:28:37 2020 leaf02 30 no vni30 10.0.1.2 no yes
Tue Oct 20 22:28:53 2020 leaf04 30 no peerlink no yes
Wed Oct 21 22:28:19 2020 leaf03 30 yes bridge no no
Wed Oct 21 22:28:26 2020 leaf01 30 no vni30 10.0.1.2 no yes
Wed Oct 21 22:28:44 2020 leaf02 30 no vni30 10.0.1.2 no yes
Wed Oct 21 22:28:55 2020 leaf04 30 no peerlink no yes
Thu Oct 22 22:28:20 2020 leaf03 30 yes bridge no no
Thu Oct 22 22:28:28 2020 leaf01 30 no vni30 10.0.1.2 no yes
Thu Oct 22 22:28:45 2020 leaf02 30 no vni30 10.0.1.2 no yes
Thu Oct 22 22:28:57 2020 leaf04 30 no peerlink no yes
Fri Oct 23 22:28:21 2020 leaf03 30 yes bridge no no
Fri Oct 23 22:28:29 2020 leaf01 30 no vni30 10.0.1.2 no yes
Fri Oct 23 22:28:45 2020 leaf02 30 no vni30 10.0.1.2 no yes
Fri Oct 23 22:28:58 2020 leaf04 30 no peerlink no yes
Sat Oct 24 22:28:28 2020 leaf03 30 yes bridge no no
Sat Oct 24 22:28:29 2020 leaf01 30 no vni30 10.0.1.2 no yes
Sat Oct 24 22:28:45 2020 leaf02 30 no vni30 10.0.1.2 no yes
Sat Oct 24 22:28:59 2020 leaf04 30 no peerlink no yes
View Only the Differences in MAC Address Changes
Instead of viewing the full chronology of changes made for a MAC address within a given timeframe, you can view only the differences between two snapshots using the netq show mac-history command with the diff option. When shown, the caret (^) notation indicates no change in this value from the row above.
This example shows only the differences in the changes for a MAC address of 44:38:39:00:00:5d between now and an hour ago.
cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d diff
Matching machistory records:
Last Changed Hostname VLAN Origin Link Destination Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Tue Oct 27 22:29:07 2020 leaf04 30 no peerlink no yes
This example shows only the differences in the changes for a MAC address of 44:38:39:00:00:5d between now and 30 days ago.
cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d diff between now and 30d
Matching machistory records:
Last Changed Hostname VLAN Origin Link Destination Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Mon Sep 28 00:02:26 2020 leaf04 30 no peerlink no no
Tue Oct 27 22:29:07 2020 leaf04 ^ ^ ^ ^ ^ yes
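When you post-process output like this, you might want to replace each caret with the value it stands for. The sketch below assumes each row has already been split into a list with one cell per column, using an empty string for a blank cell such as Destination here; the function name expand_carets is hypothetical.

```python
from typing import List

# Columns: Last Changed, Hostname, VLAN, Origin, Link, Destination, Remote, Static
ROWS = [
    ["Mon Sep 28 00:02:26 2020", "leaf04", "30", "no", "peerlink", "", "no", "no"],
    ["Tue Oct 27 22:29:07 2020", "leaf04", "^", "^", "^", "^", "^", "yes"],
]


def expand_carets(rows: List[List[str]]) -> List[List[str]]:
    """Replace each '^' cell with the value from the same column one row up."""
    expanded = []
    previous = None
    for row in rows:
        if previous is None:
            current = list(row)
        else:
            current = [prev if cell == "^" else cell
                       for cell, prev in zip(row, previous)]
        expanded.append(current)
        previous = current  # carry forward already-expanded values
    return expanded


for row in expand_carets(ROWS):
    print(row)
```

Because each row is expanded against the already-expanded row above it, a run of carets over several rows resolves to the last explicit value.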
View MAC Address Changes by a Given Attribute
You can order the MAC address changes by many of their attributes using the netq show mac-history command with the listby option. For example, you can order the output by hostname, link, destination, and so forth.
This example shows the history of MAC address 44:38:39:00:00:5d ordered by hostname. When shown, the caret (^) notation indicates no change in this value from the row above.
cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d listby hostname
Matching machistory records:
Last Changed Hostname VLAN Origin Link Destination Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Tue Oct 27 22:28:51 2020 leaf02 20 no vni20 10.0.1.2 no yes
Tue Oct 27 22:28:24 2020 leaf03 4001 yes bridge no no
Tue Oct 27 22:28:24 2020 leaf03 0 yes peerlink no no
Tue Oct 27 22:28:24 2020 leaf03 4002 yes bridge no no
Tue Oct 27 22:28:42 2020 leaf01 10 no vni10 10.0.1.2 no yes
Tue Oct 27 22:29:07 2020 leaf04 10 no peerlink no yes
Tue Oct 27 22:29:07 2020 leaf04 30 no peerlink no yes
Tue Oct 27 22:28:42 2020 leaf01 30 no vni30 10.0.1.2 no yes
Tue Oct 27 22:28:42 2020 leaf01 20 no vni20 10.0.1.2 no yes
Tue Oct 27 22:28:51 2020 leaf02 10 no vni10 10.0.1.2 no yes
Tue Oct 27 22:29:07 2020 leaf04 20 no peerlink no yes
Tue Oct 27 22:28:51 2020 leaf02 30 no vni30 10.0.1.2 no yes
Tue Oct 27 22:28:24 2020 leaf03 10 yes bridge no no
Tue Oct 27 22:28:24 2020 leaf03 20 yes bridge no no
Tue Oct 27 22:28:24 2020 leaf03 30 yes bridge no no
View MAC Address Changes for a Given VLAN
View a listing of changes for a MAC address for a given VLAN using the netq show mac-history command with the vlan option. When shown, the caret (^) notation indicates no change in this value from the row above.
This example shows changes for a MAC address of 44:38:39:00:00:5d and VLAN 10.
cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d vlan 10
Matching machistory records:
Last Changed Hostname VLAN Origin Link Destination Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Tue Oct 27 22:28:24 2020 leaf03 10 yes bridge no no
Tue Oct 27 22:28:42 2020 leaf01 10 no vni10 10.0.1.2 no yes
Tue Oct 27 22:28:51 2020 leaf02 10 no vni10 10.0.1.2 no yes
Tue Oct 27 22:29:07 2020 leaf04 10 no peerlink no yes
View MAC Address Commentary
You can get more descriptive information about changes to a given MAC address on a specific VLAN. Commentary is available for the following MAC address-related events based on their classification (refer to the definition of these at the beginning of this topic):
| Event Triggers | Example Commentary |
| --- | --- |
| A MAC address is created, or the MAC address on the interface is changed via the hwaddress option in /etc/network/interfaces | leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0 |
| An interface becomes a slave in, or is removed from, a bond | leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0 |
| An interface is a bridge and it inherits a different MAC address due to a membership change | leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0 |
| A remote MAC address is learned or installed by control plane on a tunnel interface | 44:38:39:00:00:5d learned/installed on vni vni10 pointing to remote dest 10.0.1.34 |
| A remote MAC address is flushed or expires | leaf01 44:38:39:00:00:5d is flushed or expired |
| A remote MAC address moves from behind one remote switch to another remote switch or becomes a local MAC address | leaf02: 00:08:00:00:aa:13 moved from remote dest 27.0.0.22 to remote dest 27.0.0.34; 00:08:00:00:aa:13 moved from remote dest 27.0.0.22 to local interface hostbond2 |
| A MAC address is learned at the first-hop switch (or MLAG switch pair) | leaf04 (and MLAG peer leaf05): 44:38:39:00:00:5d learned on first hop switch, pointing to local interface bond4 |
| A local MAC address is flushed or expires | leaf04 (and MLAG peer leaf05) 44:38:39:00:00:5d is flushed or expires from bond4 |
| A local MAC address moves from one interface to another interface or to another switch | leaf04: 00:08:00:00:aa:13 moved from hostbond2 to hostbond3; 00:08:00:00:aa:13 moved from hostbond2 to remote dest 27.0.0.13 |
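If you collect commentary text for reporting, the example strings above suggest a simple keyword-based grouping. The sketch below is an assumption-laden illustration: the category names and the classify_commentary function are hypothetical, and the patterns are derived only from the example commentary in this topic, so real commentary might use wording that they do not cover.

```python
import re

# Checked in order; the first matching pattern wins.
PATTERNS = [
    ("moved", re.compile(r"\bmoved from\b")),
    ("flushed_or_expired", re.compile(r"\bis flushed or expire")),
    ("learned_remote", re.compile(r"\blearned/installed on vni\b")),
    ("learned_first_hop", re.compile(r"\blearned on first hop switch\b")),
    ("configured", re.compile(r"\bconfigured on interface\b")),
]


def classify_commentary(text: str) -> str:
    """Return a coarse category for one commentary string, or 'other'."""
    for name, pattern in PATTERNS:
        if pattern.search(text):
            return name
    return "other"


examples = [
    "leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0",
    "44:38:39:00:00:5d learned/installed on vni vni10 pointing to remote dest 10.0.1.34",
    "leaf01 44:38:39:00:00:5d is flushed or expired",
]
for text in examples:
    print(classify_commentary(text), "<-", text)
```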
To view MAC address commentary:
Select the Menu.
Under the Network heading, select MACs.
Select the checkbox next to one of the entries, then select Open card above the table.
Choose a time range, then click Continue.
You can scroll through the list to see comments related to the MAC address moves and changes.
(Optional) From here, you can filter the list by a given device by selecting Filters.
A red dot on the filter icon indicates that filtering is active. To remove the filter, click the filter icon again, then click Clear Filter.
To see MAC address commentary, use the netq show mac-commentary command. The following examples show the commentary seen in common situations.
MAC Address Configured Locally
In this example, the 46:38:39:00:00:44 MAC address was configured on the VlanA-1 interface of multiple switches, so we see the MAC configured commentary on each of them.
cumulus@server-01:~$ netq show mac-commentary 46:38:39:00:00:44 between now and 1hr
Matching mac_commentary records:
Last Updated Hostname VLAN Commentary
------------------------- ---------------- ------ --------------------------------------------------------------------------------
Mon Aug 24 2020 14:14:33 leaf11 100 leaf11: 46:38:39:00:00:44 configured on interface VlanA-1
Mon Aug 24 2020 14:15:03 leaf12 100 leaf12: 46:38:39:00:00:44 configured on interface VlanA-1
Mon Aug 24 2020 14:15:19 leaf21 100 leaf21: 46:38:39:00:00:44 configured on interface VlanA-1
Mon Aug 24 2020 14:15:40 leaf22 100 leaf22: 46:38:39:00:00:44 configured on interface VlanA-1
Mon Aug 24 2020 14:15:19 leaf21 1003 leaf21: 46:38:39:00:00:44 configured on interface VlanA-1
Mon Aug 24 2020 14:15:40 leaf22 1003 leaf22: 46:38:39:00:00:44 configured on interface VlanA-1
Mon Aug 24 2020 14:16:32 leaf02 1003 leaf02: 00:00:5e:00:01:01 configured on interface VlanA-1
MAC Address Configured on Server and Learned from a Peer
In this example, the 00:08:00:00:aa:13 MAC address was configured on server01. As a result, both leaf11 and leaf12 learned this address locally on the next hop interface serv01bond2, whereas the leaf01 switch learned this address remotely on vx-34.
cumulus@server11:~$ netq show mac-commentary 00:08:00:00:aa:13 vlan 1000 between now and 5hr
Matching mac_commentary records:
Last Updated Hostname VLAN Commentary
------------------------- ---------------- ------ --------------------------------------------------------------------------------
Tue Aug 25 2020 10:29:23 leaf12 1000 leaf12: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
Tue Aug 25 2020 10:29:23 leaf11 1000 leaf11: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
Tue Aug 25 2020 10:29:23 leaf01 1000 leaf01: 00:08:00:00:aa:13 learned/installed on vni vx-34 pointing to remote dest 36.0.0.24
MAC Address Removed
In this example, the bridge FDB entry for the 00:02:00:00:00:a0 MAC address, interface VlanA-1, and VLAN 100 was deleted, impacting leaf11 and leaf12.
cumulus@server11:~$ netq show mac-commentary 00:02:00:00:00:a0 vlan 100 between now and 5hr
Matching mac_commentary records:
Last Updated Hostname VLAN Commentary
------------------------- ---------------- ------ --------------------------------------------------------------------------------
Mon Aug 24 2020 14:14:33 leaf11 100 leaf11: 00:02:00:00:00:a0 configured on interface VlanA-1
Mon Aug 24 2020 14:15:03 leaf12 100 leaf12: 00:02:00:00:00:a0 learned on first hop switch interface peerlink-1
Tue Aug 25 2020 13:06:52 leaf11 100 leaf11: 00:02:00:00:00:a0 unconfigured on interface VlanA-1
MAC Address Moved on Server and Learned from a Peer
The MAC address on server11 changed from 00:08:00:00:aa:13. In this example, the MAC learned remotely on leaf01 is now a locally learned MAC address from its local interface swp6. Similarly, the locally learned MAC addresses on leaf11 and leaf12 are now learned from remote dest 27.0.0.22.
cumulus@server11:~$ netq show mac-commentary 00:08:00:00:aa:13 vlan 1000 between now and 5hr
Matching mac_commentary records:
Last Updated Hostname VLAN Commentary
------------------------- ---------------- ------ --------------------------------------------------------------------------------
Tue Aug 25 2020 10:29:23 leaf12 1000 leaf12: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
Tue Aug 25 2020 10:29:23 leaf11 1000 leaf11: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
Tue Aug 25 2020 10:29:23 leaf01 1000 leaf01: 00:08:00:00:aa:13 learned/installed on vni vx-34 pointing to remote dest 36.0.0.24
Tue Aug 25 2020 10:33:06 leaf01 1000 leaf01: 00:08:00:00:aa:13 moved from remote dest 36.0.0.24 to local interface swp6
Tue Aug 25 2020 10:33:06 leaf12 1000 leaf12: 00:08:00:00:aa:13 moved from local interface serv01bond2 to remote dest 27.0.0.22
Tue Aug 25 2020 10:33:06 leaf11 1000 leaf11: 00:08:00:00:aa:13 moved from local interface serv01bond2 to remote dest 27.0.0.22
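To follow moves like these programmatically, you can pull the source and destination out of each "moved from ... to ..." commentary string. This sketch assumes the commentary wording shown in the output above ("<hostname>: <mac> moved from <source> to <destination>"); the regular expression and the parse_move function are hypothetical and might need adjusting for other wording.

```python
import re
from typing import Dict, Optional

MOVE_RE = re.compile(
    r"^(?P<host>\S+): (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) moved from "
    r"(?P<src>.+?) to (?P<dst>.+)$",
    re.IGNORECASE,
)


def parse_move(commentary: str) -> Optional[Dict[str, str]]:
    """Return host, mac, src, and dst for a move comment, else None."""
    match = MOVE_RE.match(commentary.strip())
    return match.groupdict() if match else None


lines = [
    "leaf01: 00:08:00:00:aa:13 moved from remote dest 36.0.0.24 to local interface swp6",
    "leaf12: 00:08:00:00:aa:13 moved from local interface serv01bond2 to remote dest 27.0.0.22",
    "leaf11: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2",
]
for line in lines:
    print(parse_move(line))
```

Comments that do not describe a move, such as the last sample line, yield None, so you can filter them out before building a timeline.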
MAC Address Learned from MLAG Pair
In this example, after the local first hop learning of the 00:02:00:00:00:1c MAC address on leaf11 and leaf12, the MLAG exchanged the learning on the dually connected interface serv01bond3.
cumulus@server11:~$ netq show mac-commentary 00:02:00:00:00:1c vlan 105 between now and 2d
Matching mac_commentary records:
Last Updated Hostname VLAN Commentary
------------------------- ---------------- ------ --------------------------------------------------------------------------------
Sun Aug 23 2020 14:13:39 leaf11 105 leaf11: 00:02:00:00:00:1c learned on first hop switch interface serv01bond3
Sun Aug 23 2020 14:14:02 leaf12 105 leaf12: 00:02:00:00:00:1c learned on first hop switch interface serv01bond3
Sun Aug 23 2020 14:14:16 leaf11 105 leaf11: 00:02:00:00:00:1c moved from interface serv01bond3 to interface serv01bond3
Sun Aug 23 2020 14:14:23 leaf12 105 leaf12: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
Sun Aug 23 2020 14:14:37 leaf11 105 leaf11: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
Sun Aug 23 2020 14:14:39 leaf12 105 leaf12: 00:02:00:00:00:1c moved from interface serv01bond3 to interface serv01bond3
Sun Aug 23 2020 14:53:31 leaf11 105 leaf11: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
Mon Aug 24 2020 14:15:03 leaf12 105 leaf12: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
MAC Address Flushed
In this example, the interface VlanA-1 associated with the 00:02:00:00:00:2d MAC address and VLAN 1008 is deleted, impacting leaf11 and leaf12.
cumulus@server11:~$ netq show mac-commentary 00:02:00:00:00:2d vlan 1008 between now and 5hr
Matching mac_commentary records:
Last Updated Hostname VLAN Commentary
------------------------- ---------------- ------ --------------------------------------------------------------------------------
Mon Aug 24 2020 14:14:33 leaf11 1008 leaf11: 00:02:00:00:00:2d learned/installed on vni vx-42 pointing to remote dest 27.0.0.22
Mon Aug 24 2020 14:15:03 leaf12 1008 leaf12: 00:02:00:00:00:2d learned/installed on vni vx-42 pointing to remote dest 27.0.0.22
Mon Aug 24 2020 14:16:03 leaf01 1008 leaf01: 00:02:00:00:00:2d learned on MLAG peer dually connected interface swp8
Tue Aug 25 2020 11:36:06 leaf11 1008 leaf11: 00:02:00:00:00:2d is flushed or expired
Tue Aug 25 2020 11:36:06 leaf11 1008 leaf11: 00:02:00:00:00:2d on vni 1008 remote dest changed to 27.0.0.22
MLAG
You use Multi-Chassis Link Aggregation (MLAG) to enable a server or switch with a two-port bond (such as a link aggregation group/LAG, EtherChannel, port group, or trunk) to connect those ports to different switches and operate as if they have a connection to a single, logical switch. This provides greater redundancy and greater system throughput. Dual-connected devices can create LACP bonds that contain links to each physical switch. Therefore, NetQ supports active-active links from the dual-connected devices even though each link connects to a different physical switch. For an overview and how to configure MLAG in your network, refer to Multi-Chassis Link Aggregation - MLAG.
MLAG or CLAG?
Other vendors refer to the Cumulus Linux implementation of MLAG as MLAG, MC-LAG, or VPC. The NetQ UI uses the MLAG terminology predominantly. However, the management daemon, named clagd, and other options in the code, such as clag-id, keep the older CLAG naming for historical reasons.
NetQ enables operators to view the health of the MLAG service on a networkwide and a per-session basis, giving greater insight into all aspects of the service. You accomplish this in the NetQ UI through two card workflows, one for the service and one for the session, and in the NetQ CLI with the netq show mlag command.
Any prior scripts or automation that use the older netq show clag command continue to work as the command still exists in the operating system.
Monitor the MLAG Service Networkwide
View Service Status Summary
You can view a summary of the MLAG service from the NetQ UI or the NetQ CLI.
To view the summary, open the small Network Services/All MLAG Sessions card. In this example, the number of devices running the MLAG service is 4 and no alarms are present.
To view MLAG service status, run netq show mlag.
This example shows the Cumulus reference topology, where MLAG is configured on the border and leaf switches. You can view host, peer, system MAC address, state, information about the bonds, and last time a change was made for each MLAG session.
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
View the Distribution of Sessions and Alarms
It is useful to know the number of network nodes running the MLAG protocol over a period of time, as it gives you insight into the amount of traffic associated with and breadth of use of the protocol. It is also useful to compare the number of nodes running MLAG with the alarms present at the same time to determine if there is any correlation between the issues and the ability to establish an MLAG session.
Nodes with a large number of unestablished sessions might have a misconfiguration or might be experiencing communication issues. This is visible with the NetQ UI.
To view the distribution, open the medium Network Services/All MLAG Sessions card.
This example shows the following for the last 24 hours:
Four nodes have been running the MLAG protocol with no changes in that number
Four sessions were established and remained so
No MLAG-related alarms have occurred
If there was a visual correlation between the alarms and sessions, you could dig a little deeper with the large Network Services/All MLAG Sessions card.
To view the number of switches running the MLAG service, run:
netq show mlag
Count the switches in the output.
This example shows two border and four leaf switches, for a total of six switches running the protocol. NetQ marks the device in each session acting in the primary role with (P).
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
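The manual count described above can be scripted. The following is a minimal sketch, not an official NetQ tool: it assumes you have captured the text output of netq show mlag, and that every data row has the shape shown in the example (seven column values followed by a five-token timestamp). The function name summarize_mlag is hypothetical.

```python
# Rows copied from the example output above; header lines are omitted.
SAMPLE = """\
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
"""


def summarize_mlag(text):
    """Return (sorted switch names, sorted primary switch names)."""
    switches, primaries = set(), set()
    for line in text.splitlines():
        fields = line.split()
        # Assumed row shape: 7 column values followed by a 5-token timestamp.
        # Header, separator, and banner lines do not match and are skipped.
        if len(fields) != 12 or not fields[5].isdigit():
            continue
        host = fields[0]
        if host.endswith("(P)"):
            host = host[: -len("(P)")]
            primaries.add(host)
        switches.add(host)
    return sorted(switches), sorted(primaries)


switches, primaries = summarize_mlag(SAMPLE)
print(f"{len(switches)} switches run MLAG; primary: {', '.join(primaries)}")
# prints: 6 switches run MLAG; primary: border01, leaf01, leaf03
```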
View Bonds with Only a Single Link
You can determine whether there are any bonds in your MLAG configuration with only a single link, instead of the usual two, using the NetQ UI or the NetQ CLI.
Open the medium Network Services/All MLAG Sessions card.
This example shows that four bonds have single links.
Hover over the card and change to the full-screen card using the card size picker.
Click the All Sessions tab.
Browse the sessions, looking for either a blank value in the Dual Bonds column or one or more bonds listed in the Single Bonds column, to determine whether the devices participating in these sessions are incorrectly configured.
Optionally, change the time period of the data on either size card to determine when the configuration might have changed from a dual to a single bond.
Run the netq show mlag command to view bonds with single links in the last 24 hours. Use the around option to view bonds with single links for a time in the past.
This example shows that no bonds have single links, because the #Bonds value equals the #Dual value for all sessions.
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
This example shows that you configured more bonds 30 days ago than in the last 24 hours, but in either case, none of those bonds had single links.
cumulus@switch:~$ netq show mlag around 30d
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 6 6 Sun Sep 27 03:41:52 2020
border02 border01(P) 44:38:39:be:ef:ff up up 6 6 Sun Sep 27 03:34:57 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Sun Sep 27 03:59:25 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Sun Sep 27 03:38:39 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Sun Sep 27 03:36:40 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Sun Sep 27 03:37:59 2020
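The comparison of the #Bonds and #Dual values can also be automated. This is a rough sketch under stated assumptions: it reads captured text output of netq show mlag, it assumes the row layout shown in the examples, and it treats any difference between the two counts as bonds that are not dual-connected. The function name non_dual_bonds and the leaf05 row are hypothetical and exist only to illustrate a mismatch.

```python
# Two rows from the example output above, plus one HYPOTHETICAL row (leaf05)
# invented here to show what a mismatch looks like.
SAMPLE = """\
border01(P) border02 44:38:39:be:ef:ff up up 6 6 Sun Sep 27 03:41:52 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Sun Sep 27 03:59:25 2020
leaf05(P) leaf06 44:38:39:be:ef:cc up up 8 7 Sun Sep 27 03:40:00 2020
"""


def non_dual_bonds(text):
    """Map hostname -> count of bonds that are not dual-connected."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed row shape: 7 column values followed by a 5-token timestamp.
        if len(fields) != 12 or not (fields[5].isdigit() and fields[6].isdigit()):
            continue
        bonds, dual = int(fields[5]), int(fields[6])
        if bonds != dual:
            result[fields[0].replace("(P)", "")] = bonds - dual
    return result


print(non_dual_bonds(SAMPLE))  # prints: {'leaf05': 1}
```

An empty result corresponds to the case described above, where the two values match for every session.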
View Sessions with No Backup IP Addresses Assigned
You can determine whether MLAG sessions have a backup IP address assigned and ready using the NetQ UI or NetQ CLI.
Open the medium Network Services/All MLAG Sessions card.
This example shows that none of the bonds have single links.
Hover over the card and change to the full-screen card using the card size picker.
Click the All Sessions tab.
Check the Backup IP column to confirm whether an IP address is assigned and, if so, which one.
Optionally, change the time period of the data on either size card to determine when a backup IP address was added or removed.
Run netq show mlag to view the status of backup IP addresses for sessions.
This example shows that a backup IP has been configured and is currently reachable for all MLAG sessions because the Backup column indicates up.
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
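To check the Backup column across many sessions, a short script can flag every row whose value is not up. This sketch works on captured text output and assumes the row layout shown in the example. The function name backup_not_up, the leaf07 row, and the value down are assumptions made for illustration; the source only shows the value up. Rows that do not match the assumed 12-token shape, such as a row with an empty column, are ignored, so treat the result as a hint and not as proof.

```python
# Two rows from the example output above, plus one HYPOTHETICAL row (leaf07)
# whose Backup value "down" is an assumed value, not taken from the source.
SAMPLE = """\
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf07(P) leaf08 44:38:39:be:ef:dd up down 4 4 Tue Oct 27 10:49:00 2020
"""


def backup_not_up(text):
    """Return (hostname, backup value) for rows whose Backup column is not 'up'."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed row shape: 7 column values followed by a 5-token timestamp.
        if len(fields) != 12 or not fields[5].isdigit():
            continue
        backup = fields[4]  # fifth column in the example output
        if backup != "up":
            flagged.append((fields[0].replace("(P)", ""), backup))
    return flagged


print(backup_not_up(SAMPLE))  # prints: [('leaf07', 'down')]
```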
View Sessions with Conflicted Bonds
You can view sessions with conflicted bonds (bonds that conflict with existing bond relationships) in the NetQ UI.
To view these sessions:
Open the Network Services/All MLAG Sessions card.
Hover over the card and change to the full-screen card using the card size picker.
Click the All Sessions tab.
Scroll to the right to view the Conflicted Bonds column. Based on the values in that field, reconfigure MLAG accordingly, either by using the net add bond NCLU command or by editing the /etc/network/interfaces file. Refer to Basic Configuration in the Cumulus Linux MLAG topic.
View Devices with the Most MLAG Sessions
You can view the load from MLAG on your switches using the large Network Services/All MLAG Sessions card. This data enables you to see which switches are handling the most MLAG traffic currently, validate that this is what you expect based on your network design, and compare that with data from an earlier time to look for any differences.
To view switches and hosts with the most MLAG sessions:
Open the large Network Services/All MLAG Sessions card.
Select Switches with Most Sessions from the filter above the table.
The table content sorts by this characteristic, listing nodes running the most MLAG sessions at the top. Scroll down to view those with the fewest sessions.
To compare this data with the same data at a previous time:
Open another large Network Services/All MLAG Sessions card.
Move the new card next to the original card if needed.
Change the time period for the data on the new card by hovering over the card and clicking .
Select the time period that you want to compare with the current time. You can now see whether there are significant differences between this time period and the previous time period.
If the changes are unexpected, you can investigate further by looking at another timeframe, determining if more nodes are now running MLAG than previously, looking for changes in the topology, and so forth.
To determine the devices with the most sessions, run netq show mlag. Then count the sessions on each device.
In this example, there are two sessions between border01 and border02, two sessions between leaf01 and leaf02, and two sessions between leaf03 and leaf04. Therefore, no device has more sessions than any other.
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
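Counting the sessions on each device by hand becomes tedious on larger networks. As a minimal sketch, assuming captured text output in the layout shown above, the rows can be tallied per hostname with the standard library's Counter. The function name sessions_per_device is hypothetical.

```python
from collections import Counter

# Rows copied from the example output above; header lines are omitted.
SAMPLE = """\
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
"""


def sessions_per_device(text):
    """Count output rows (one per session) for each hostname."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Assumed row shape: 7 column values followed by a 5-token timestamp.
        if len(fields) != 12 or not fields[5].isdigit():
            continue
        counts[fields[0].replace("(P)", "")] += 1
    return counts


counts = sessions_per_device(SAMPLE)
# most_common() lists the devices with the most sessions first.
for host, total in counts.most_common():
    print(host, total)
```

With the sample rows, every device has a count of 1, which matches the observation that no device has more sessions than any other.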
View Devices with the Most Unestablished MLAG Sessions
You can identify switches that are experiencing difficulties establishing MLAG sessions, both currently and in the past, using the NetQ UI.
To view switches with the most unestablished MLAG sessions:
Open the large Network Services/All MLAG Sessions card.
Select Switches with Most Unestablished Sessions from the filter above the table.
The table content sorts by this characteristic, listing nodes with the most unestablished MLAG sessions at the top. Scroll down to view those with the fewest unestablished sessions.
Where to go next depends on what data you see, but a few options include:
Change the time period for the data to compare with a prior time. If the same switches are consistently indicating the most unestablished sessions, you might want to look more carefully at those switches using the Switches card workflow to determine probable causes. Refer to Switches.
Click Show All Sessions to investigate all MLAG sessions with events in the full-screen card.
View MLAG Configuration Information for a Given Device
You can view the MLAG configuration information for a given device from the NetQ UI or the NetQ CLI.
Open the full-screen Network Services/All MLAG Sessions card.
Click to filter by hostname.
Click Apply.
The sessions with the identified device as the primary, or host device in the MLAG pair, are listed. This example shows the sessions for the leaf01 switch.
Run the netq show mlag command with the hostname option.
This example shows all sessions in which the leaf01 switch is the primary node.
cumulus@switch:~$ netq leaf01 show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
View Switches with the Most MLAG-related Alarms
Switches experiencing a large number of MLAG alarms might indicate a configuration or performance issue that needs further investigation. You can view this information using the NetQ UI or NetQ CLI.
With the NetQ UI, you can view the switches sorted by the number of MLAG alarms and then use the Switches card workflow or the Events|Alarms card workflow to gather more information about possible causes for the alarms.
To view switches with most MLAG alarms:
Open the large Network Services/All MLAG Sessions card.
Hover over the header and click .
Select Events by Most Active Device from the filter above the table.
The table content sorts by this characteristic, listing nodes with the most MLAG alarms at the top. Scroll down to view those with the fewest alarms.
Where to go next depends on what data you see, but a few options include:
Change the time period for the data to compare with a prior time. If the same switches are consistently indicating the most alarms, you might want to look more carefully at those switches using the Switches card workflow.
Click Show All Sessions to investigate all MLAG sessions with alarms in the full-screen card.
To view the switches and hosts with the most MLAG alarms and informational events, run the netq show events command with the message_type option set to clag, and optionally the between option set to display the events within a given time range. Count the events associated with each switch.
This example shows that no MLAG events have occurred in the last 24 hours. Note that this command still uses the clag nomenclature.
cumulus@switch:~$ netq show events message_type clag
No matching event records found
This example shows all MLAG events between now and 30 days ago, a total of 1 info event.
cumulus@switch:~$ netq show events message_type clag between now and 30d
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
border02 clag info Peer state changed to up Fri Oct 2 22:39:28 2020
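Counting the events associated with each switch can be scripted as well. The sketch below assumes captured text output of netq show events message_type clag in the layout shown above, where the first token of a data row is the hostname and the second is the message type clag. The function name events_per_switch is hypothetical.

```python
from collections import Counter

# Output copied from the 30-day example above.
SAMPLE_EVENTS = """\
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
border02 clag info Peer state changed to up Fri Oct 2 22:39:28 2020
"""


def events_per_switch(text):
    """Count clag event rows for each hostname."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Assumed data row: hostname, "clag", severity, message words,
        # then a 5-token timestamp. Other lines are skipped.
        if len(fields) < 9 or fields[1] != "clag":
            continue
        counts[fields[0]] += 1
    return counts


print(dict(events_per_switch(SAMPLE_EVENTS)))  # prints: {'border02': 1}
```

When the command returns No matching event records found, the function returns an empty counter.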
View All MLAG Events
The Network Services/All MLAG Sessions card workflow and the netq show events message_type clag command enable you to view all MLAG events in a designated time period.
To view all MLAG events:
Open the Network Services/All MLAG Sessions card.
Change to the full-screen card using the card size picker.
Click the All Alarms tab.
By default, events sort by most recent to least recent.
Where to go next depends on what data you see, but a few options include:
Sort on various parameters:
By Message to determine the frequency of particular events.
By Severity to determine the most critical events.
By Time to find events that might have occurred at a particular time to try to correlate them with other system events.
Export the data to a file for use in another analytics tool by clicking .
To view all MLAG alarms, run:
netq show events [severity info | severity error ] message_type clag [between <text-time> and <text-endtime>] [json]
This example shows that no MLAG events have occurred in the last three days.
cumulus@switch:~$ netq show events message_type clag between now and 3d
No matching event records found
This example shows that one MLAG event occurred in the last 30 days.
cumulus@switch:~$ netq show events message_type clag between now and 30d
Matching events records:
Hostname Message Type Severity Message Timestamp
----------------- ------------------------ ---------------- ----------------------------------- -------------------------
border02 clag info Peer state changed to up Fri Oct 2 22:39:28 2020
View Details About All Switches Running MLAG
You can view attributes of all switches running MLAG in your network in the full-screen card.
To view all switch details:
Open the Network Services/All MLAG Sessions card.
Change to the full-screen card using the card size picker.
Click the All Switches tab.
Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail.
View Details for All MLAG Sessions
You can view attributes of all MLAG sessions in your network with the NetQ UI or NetQ CLI.
To view all session details:
Open the Network Services/All MLAG Sessions card.
Change to the full-screen card using the card size picker.
Click the All Sessions tab.
Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail.
To view session details, run netq show mlag.
This example shows all current sessions (one per row) and the attributes associated with them.
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
Monitor a Single MLAG Session
With NetQ, you can monitor the number of nodes running the MLAG service, view switches with the most peers alive and not alive, and view alarms triggered by the MLAG service. For an overview and how to configure MLAG in your data center network, refer to Multi-Chassis Link Aggregation - MLAG.
To access the single session cards, you must open the full-screen Network Services/All MLAG Sessions card, click the All Sessions tab, select the desired session, then click (Open Card).
Granularity of Data Shown Based on Time Period
On the medium and large single MLAG session cards, vertically stacked heat maps represent the status of the peers; one for peers that are reachable (alive), and one for peers that are unreachable (not alive). Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The amount of saturation for each block indicates how many peers were alive. If all peers during that time period were alive for the entire time block, then the top block is 100% saturated (white) and the not alive block is zero percent saturated (gray). As peers that are not alive increase in saturation, the amount of saturation diminishes proportionally for peers that are in the alive block. The example below shows a heat map for a time period of 24 hours with the most common time periods in the table showing the resulting time blocks.
Time Period   Number of Runs   Number of Time Blocks   Amount of Time in Each Block
-----------   --------------   ---------------------   ----------------------------
6 hours       18               6                       1 hour
12 hours      36               12                      1 hour
24 hours      72               24                      1 hour
1 week        504              7                       1 day
1 month       2,086            30                      1 day
1 quarter     7,000            13                      1 week
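The saturation rule described above can be illustrated with a small calculation. This is only one reading of the description, not the NetQ UI's actual implementation: it assumes that each check in a time block records whether the peer was alive, that the alive saturation is the share of checks that found the peer alive, and that the not alive saturation is the remainder. The function name block_saturation is hypothetical.

```python
def block_saturation(checks):
    """Return (alive %, not-alive %) for one time block.

    checks: list of booleans, True when the peer was alive for that check.
    """
    if not checks:
        raise ValueError("a time block needs at least one check")
    alive = 100.0 * sum(checks) / len(checks)
    # The two maps are complementary: what one gains, the other loses.
    return alive, 100.0 - alive


# 24-hour card: 72 runs spread over 24 blocks gives 3 checks per 1-hour block.
checks_per_block = 72 // 24

print(block_saturation([True] * checks_per_block))  # prints: (100.0, 0.0)
print(block_saturation([True, True, False]))        # roughly two thirds alive
```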
View Session Status Summary
You can view a summary of a given MLAG session using the NetQ UI or NetQ CLI.
A summary of the MLAG session is available from the Network Services/MLAG Session card workflow, showing the host and peer devices participating in the session, node role, peer role and state, the associated system MAC address, and the distribution of the MLAG session state.
To view the summary:
Open or add the Network Services/All MLAG Sessions card.
Change to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest, then click (Open Card).
Locate the medium Network Services/MLAG Session card.
In the left example, we see that the tor1 switch plays the secondary role in this session with the switch at 44:38:39:ff:01:01 and that there is an issue with this session. In the right example, we see that the leaf03 switch plays the primary role in this session with leaf04 and this session is in good health.
Optionally, open the small Network Services/MLAG Session card to keep track of the session health.
Run the netq show mlag command with the hostname option.
This example shows the session information when the leaf01 switch is acting in the primary role in the session.
cumulus@switch:~$ netq leaf01 show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
View MLAG Session Peering State Changes
You can view the peering state for a given MLAG session from the medium and large MLAG Session cards. For a given time period, you can determine the stability of the MLAG session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the peer. If the peer was not alive more than it was alive, you can then investigate further into possible causes.
To view the state transitions for a given MLAG session on the medium card:
Open or add the Network Services/All MLAG Sessions card.
Change to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest, then click (Open Card).
Locate the medium Network Services/MLAG Session card.
In this example, the heat map tells us that the peer switch has been alive for the entire 24-hour period.
From this card, you can also view the node role, peer role and state, and MLAG system MAC address, which identify the session in more detail.
To view the peering state transitions for a given MLAG session on the large Network Services/MLAG Session card:
Open a Network Services/MLAG Session card.
Hover over the card, and change to the large card using the card size picker.
From this card, you can also view the alarm and info event counts; the node role; the peer role, state, and interface; the MLAG system MAC address; the active backup IP address; the single, dual, conflicted, and protocol down bonds; and the VXLAN anycast address, which identify the session in more detail.
View Changes to the MLAG Service Configuration File
Each time a change is made to the configuration file for the MLAG service, NetQ logs the change and enables you to compare it with the last version using the NetQ UI. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.
To view the configuration file changes:
Open or add the Network Services/All MLAG Sessions card.
Switch to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest, then click (Open Card).
Locate the medium Network Services/MLAG Session card.
Hover over the card, and change to the large card using the card size picker.
Hover over the card and click to open the Configuration File Evolution tab.
Select the time of interest on the left, such as a time when a change might have impacted performance. Scroll down if needed.
Choose between the File view and the Diff view (selected option is dark; File by default).
The File view displays the content of the file for you to review.
The Diff view displays the changes between this version (on left) and the most recent version (on right) side by side. The changes are highlighted in red and green. In this example, we don’t have any changes after this first creation, so the same file is shown on both sides and no highlighting is present.
All MLAG Session Details
You can view attributes of all of the MLAG sessions for the devices participating in a given session with the NetQ UI and the NetQ CLI.
To view all session details:
Open or add the Network Services/All MLAG Sessions card.
Switch to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest, then click (Open Card).
Locate the medium Network Services/MLAG Session card.
Hover over the card, and change to the full-screen card using the card size picker. The All MLAG Sessions tab is displayed by default.
Where to go next depends on what data you see, but a few options include:
Open the All Events tab to look more closely at the alarm and info events in the network.
Sort on other parameters:
By Single Bonds to determine which interface sets are only connected to one of the switches.
By Backup IP and Backup IP Active to determine if the correct backup IP address is specified for the service.
Export the data to a file by clicking .
Run the netq show mlag command.
This example shows all MLAG sessions in the last 24 hours.
cumulus@switch:~$ netq show mlag
Matching clag records:
Hostname Peer SysMac State Backup #Bonds #Dual Last Changed
----------------- ----------------- ------------------ ---------- ------ ----- ----- -------------------------
border01(P) border02 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:50:26 2020
border02 border01(P) 44:38:39:be:ef:ff up up 3 3 Tue Oct 27 10:46:38 2020
leaf01(P) leaf02 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:44:39 2020
leaf02 leaf01(P) 44:38:39:be:ef:aa up up 8 8 Tue Oct 27 10:52:15 2020
leaf03(P) leaf04 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:07 2020
leaf04 leaf03(P) 44:38:39:be:ef:bb up up 8 8 Tue Oct 27 10:48:18 2020
View All MLAG Session Events
You can view all alarm and info events for the two devices on this card.
Open or add the Network Services/All MLAG Sessions card.
Switch to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest, then click (Open Card).
Locate the medium Network Services/MLAG Session card.
Hover over the card, and change to the full-screen card using the card size picker.
Click the All Events tab.
Where to go next depends on what data you see, but a few options include:
Sort on other parameters:
By Message to determine the frequency of particular events.
By Severity to determine the most critical events.
By Time to find events that might have occurred at a particular time to try to correlate them with other system events.
Export the data to a file by clicking .
Network Topology
NetQ lets you monitor your network by viewing performance and configuration data for individual network devices and the entire fabric networkwide. This section describes monitoring tasks you can perform from a topology view in the NetQ UI.
Access the Topology View
To open the topology view, click Topology in any workbench header. This opens the full-screen view of your network topology.
Topology Overview
The topology view provides a visual representation of your Linux network, showing the connections and device information for all monitored nodes. The topology view uses a number of icons and elements to represent the nodes and their connections:
Symbol   Usage
------   -----
(icon)   Switch running Cumulus Linux OS
(icon)   Switch running RedHat, Ubuntu, or CentOS
(icon)   Host with unknown operating system
(icon)   Host running Ubuntu
Lines    Physical links or connections
Interact with the Topology
There are a number of ways in which you can interact with the topology.
Move the Topology Focus
You can move the focus on the topology closer to view a smaller number of nodes, or further out to view a larger number of nodes. As with mapping applications, the node labels appear and disappear as you move in and out on the diagram for better readability. To zoom, you can use:
The zoom controls in the bottom right corner of the screen: the ‘+’ zooms you in closer, the ‘-’ moves you further out, and the ‘o’ resets to the default size.
A scrolling motion on your mouse.
Your trackpad.
You can also click anywhere on the topology, and drag it left, right, up, or down to view a different portion of the network diagram.
View Data About the Network
You can hover over the various elements to view data about them. Select an element to open a side panel with additional statistics.
Hovering over a line highlights each end of the connection. Select the line to open a side panel with additional configuration data.
From the side panel, you can view the following data about nodes and links:
Node Data              Description
---------------------  -----------
ASIC                   Name of the ASIC used in the switch. A value of Cumulus Networks VX indicates a virtual machine.
NetQ Agent Status      Operational status of the NetQ Agent on the switch (fresh or rotten).
NetQ Agent Version     Version ID of the NetQ Agent on the switch.
OS Name                Operating system running on the switch.
Platform               Vendor and name of the switch hardware.
Interface Statistics   Transmit and receive data.
Resource Utilization   CPU, memory, and disk utilization.
Events                 Warning and info events.
Link Data          Description
-----------------  -----------
Source             Switch where the connection originates
Source Interface   Port on the source switch used by the connection
Target             Switch where the connection ends
Target Interface   Port on the destination switch used by the connection
Change the time period by selecting the timestamp box in the topology header. Adjust the time to view historic network configurations.
Click Export in the header to export your topology information as a JSON file.
Rearrange the Topology Layout
NetQ generates the network topology and positions the nodes automatically. In large topologies, the position of the nodes might not be suitable for easy viewing. You can move the components of the topology by dragging and dropping them with your mouse. You can save the new layout so other users can see it by selecting the Save icon in the header.
NTP
Use the CLI to view Network Time Protocol (NTP) status. The command output displays the time synchronization status for all devices. You can filter for devices that are either in synchronization or out of synchronization, currently or at a time in the past.
Monitor NTP with the following command. See the command line reference for additional options, definitions, and examples.
netq show ntp
OSPF
If you have OSPF running on your switches and hosts, NetQ enables you to view the health of the OSPF service on a networkwide and a per-session basis, giving greater insight into all aspects of the service. For each device, you can view its associated interfaces, areas, peers, state, and type of OSPF running (numbered or unnumbered). Additionally, you can view the information at an earlier point in time and filter against a particular device, interface, or area.
You accomplish this in the NetQ UI through two card workflows, one for the service and one for the session, and in the NetQ CLI with the netq show ospf command.
Monitor the OSPF Service Networkwide
View Service Status Summary
You can view a summary of the OSPF service from the NetQ UI or the NetQ CLI.
To view the summary, open the Network Services/All OSPF Sessions card. In this example, the number of devices running the OSPF service is nine (9) and the number and distribution of related critical severity alarms is zero (0).
To view OSPF service status, run:
netq show ospf
This example shows all devices included in OSPF unnumbered routing, the assigned areas, state, peer and interface, and the last time this information changed.
cumulus@switch:~$ netq show ospf
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019
leaf02 swp52 0.0.0.0 Unnumbered Full spine02 swp2 Thu Feb 7 14:42:16 2019
leaf03 swp51 0.0.0.0 Unnumbered Full spine01 swp3 Thu Feb 7 14:42:16 2019
leaf03 swp52 0.0.0.0 Unnumbered Full spine02 swp3 Thu Feb 7 14:42:16 2019
leaf04 swp51 0.0.0.0 Unnumbered Full spine01 swp4 Thu Feb 7 14:42:16 2019
leaf04 swp52 0.0.0.0 Unnumbered Full spine02 swp4 Thu Feb 7 14:42:16 2019
spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019
spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019
spine01 swp3 0.0.0.0 Unnumbered Full leaf03 swp51 Thu Feb 7 14:42:16 2019
spine01 swp4 0.0.0.0 Unnumbered Full leaf04 swp51 Thu Feb 7 14:42:16 2019
spine02 swp1 0.0.0.0 Unnumbered Full leaf01 swp52 Thu Feb 7 14:42:16 2019
spine02 swp2 0.0.0.0 Unnumbered Full leaf02 swp52 Thu Feb 7 14:42:16 2019
spine02 swp3 0.0.0.0 Unnumbered Full leaf03 swp52 Thu Feb 7 14:42:16 2019
spine02 swp4 0.0.0.0 Unnumbered Full leaf04 swp52 Thu Feb 7 14:42:16 2019
View the Distribution of Sessions
It is useful to know the number of network nodes running the OSPF protocol over a period of time, as it gives you insight into the amount of traffic associated with and breadth of use of the protocol. It is also useful to view the health of the sessions.
To view these distributions, open the medium Network Services/All OSPF Sessions card. In this example, there are nine nodes running the service with a total of 40 sessions. This has not changed over the past 24 hours.
To view the number of switches running the OSPF service, run:
netq show ospf
Count the switches in the output.
This example shows four leaf switches and two spine switches are running the OSPF service, for a total of six switches.
cumulus@switch:~$ netq show ospf
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019
leaf02 swp52 0.0.0.0 Unnumbered Full spine02 swp2 Thu Feb 7 14:42:16 2019
leaf03 swp51 0.0.0.0 Unnumbered Full spine01 swp3 Thu Feb 7 14:42:16 2019
leaf03 swp52 0.0.0.0 Unnumbered Full spine02 swp3 Thu Feb 7 14:42:16 2019
leaf04 swp51 0.0.0.0 Unnumbered Full spine01 swp4 Thu Feb 7 14:42:16 2019
leaf04 swp52 0.0.0.0 Unnumbered Full spine02 swp4 Thu Feb 7 14:42:16 2019
spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019
spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019
spine01 swp3 0.0.0.0 Unnumbered Full leaf03 swp51 Thu Feb 7 14:42:16 2019
spine01 swp4 0.0.0.0 Unnumbered Full leaf04 swp51 Thu Feb 7 14:42:16 2019
spine02 swp1 0.0.0.0 Unnumbered Full leaf01 swp52 Thu Feb 7 14:42:16 2019
spine02 swp2 0.0.0.0 Unnumbered Full leaf02 swp52 Thu Feb 7 14:42:16 2019
spine02 swp3 0.0.0.0 Unnumbered Full leaf03 swp52 Thu Feb 7 14:42:16 2019
spine02 swp4 0.0.0.0 Unnumbered Full leaf04 swp52 Thu Feb 7 14:42:16 2019
To compare this count with the count at another time, run the netq show ospf command with the around option. Count the devices running OSPF at that time. Repeat with another time to collect a picture of changes over time.
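The counting and comparison steps above can be scripted. The following is a minimal Python sketch, assuming you have captured the text output of netq show ospf (once for the current state and once with the around option) as strings. The function name ospf_hosts and the abbreviated sample captures are illustrative assumptions and are not part of NetQ.

```python
def ospf_hosts(output):
    """Return the set of hostnames listed in `netq show ospf` text output."""
    hosts = set()
    in_table = False
    for line in output.splitlines():
        if line.startswith("---"):
            # The dashed rule separates the column headers from the data rows.
            in_table = True
        elif in_table and line.strip():
            hosts.add(line.split()[0])  # Hostname is the first column.
    return hosts


# Abbreviated, illustrative captures; real output has one row per session.
HEADER = (
    "Matching ospf records:\n"
    "Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed\n"
    "----------------- ------------------------- ------------\n"
)
current = HEADER + (
    "leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019\n"
    "leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019\n"
    "spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019\n"
    "spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019\n"
)
# Hypothetical earlier capture (as if taken with the around option).
earlier = HEADER + (
    "leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019\n"
    "spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019\n"
)

print(len(ospf_hosts(current)))                            # number of devices now
print(sorted(ospf_hosts(current) - ospf_hosts(earlier)))   # devices added since then
```

Comparing sets rather than raw counts also shows which devices started or stopped running OSPF, not only how many.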
View Devices with the Most OSPF Sessions
You can view the load from OSPF on your switches and hosts using the large Network Services card. This data enables you to see which switches are handling the most OSPF traffic currently, validate that this is what you expect based on your network design, and compare that with data from an earlier time to look for any differences.
To view switches and hosts with the most OSPF sessions:
Open the large Network Services/All OSPF Sessions card.
Select Switches with Most Sessions from the filter above the table.
The table content sorts by this characteristic, listing nodes running the most OSPF sessions at the top. Scroll down to view those with the fewest sessions.
To compare this data with the same data at a previous time:
Open another large OSPF Service card.
Move the new card next to the original card if needed.
Hover over the new card and change the time period for its data.
Select the time period that you want to compare with the original time. We chose Past Week for this example.
You can now see whether there are significant differences between this time and the original time. If the changes are unexpected, you can investigate further by looking at another timeframe, determining if more nodes are now running OSPF than previously, looking for changes in the topology, and so forth.
To determine the devices with the most sessions, run netq show ospf. Then count the sessions on each device.
In this example, the leaf01-04 switches each have two sessions and the spine01-02 switches each have four sessions. Therefore, the spine switches have the most sessions.
cumulus@switch:~$ netq show ospf
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019
leaf02 swp52 0.0.0.0 Unnumbered Full spine02 swp2 Thu Feb 7 14:42:16 2019
leaf03 swp51 0.0.0.0 Unnumbered Full spine01 swp3 Thu Feb 7 14:42:16 2019
leaf03 swp52 0.0.0.0 Unnumbered Full spine02 swp3 Thu Feb 7 14:42:16 2019
leaf04 swp51 0.0.0.0 Unnumbered Full spine01 swp4 Thu Feb 7 14:42:16 2019
leaf04 swp52 0.0.0.0 Unnumbered Full spine02 swp4 Thu Feb 7 14:42:16 2019
spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019
spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019
spine01 swp3 0.0.0.0 Unnumbered Full leaf03 swp51 Thu Feb 7 14:42:16 2019
spine01 swp4 0.0.0.0 Unnumbered Full leaf04 swp51 Thu Feb 7 14:42:16 2019
spine02 swp1 0.0.0.0 Unnumbered Full leaf01 swp52 Thu Feb 7 14:42:16 2019
spine02 swp2 0.0.0.0 Unnumbered Full leaf02 swp52 Thu Feb 7 14:42:16 2019
spine02 swp3 0.0.0.0 Unnumbered Full leaf03 swp52 Thu Feb 7 14:42:16 2019
spine02 swp4 0.0.0.0 Unnumbered Full leaf04 swp52 Thu Feb 7 14:42:16 2019
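Counting sessions by hand becomes tedious in larger fabrics. As a rough sketch, assuming the text output of netq show ospf has been captured as a string, the per-device tally can be computed in Python. The function name sessions_per_host is a hypothetical helper, and the sample is an excerpt of the rows shown above.

```python
from collections import Counter


def sessions_per_host(output):
    """Count OSPF sessions (rows) per hostname in `netq show ospf` text output."""
    counts = Counter()
    in_table = False
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True  # Data rows follow the dashed rule.
        elif in_table and line.strip():
            counts[line.split()[0]] += 1  # Hostname is the first column.
    return counts


# Excerpt of the example output: all rows for leaf01 and spine01.
sample = """Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019
spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019
spine01 swp3 0.0.0.0 Unnumbered Full leaf03 swp51 Thu Feb 7 14:42:16 2019
spine01 swp4 0.0.0.0 Unnumbered Full leaf04 swp51 Thu Feb 7 14:42:16 2019
"""

counts = sessions_per_host(sample)
for host, total in counts.most_common():
    print(host, total)  # Devices with the most sessions are listed first.
```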
View Devices with the Most Unestablished OSPF Sessions
Using the NetQ UI, you can identify switches and hosts that are experiencing difficulties establishing OSPF sessions, both currently and in the past.
To view switches with the most unestablished OSPF sessions:
Open the large Network Services/All OSPF Sessions card.
Select Switches with Most Unestablished Sessions from the filter above the table.
The table content sorts by this characteristic, listing nodes with the most unestablished OSPF sessions at the top. Scroll down to view those with the fewest unestablished sessions.
Where to go next depends on what data you see, but a couple of options include:
Change the time period for the data to compare with a prior time.
If the same switches are consistently indicating the most unestablished sessions, you might want to look more carefully at those switches using the Switches card workflow to determine probable causes. Refer to Switches.
Click Show All Sessions to investigate all OSPF sessions with events in the full screen card.
View Devices with the Most OSPF-related Alarms
Switches or hosts experiencing a large number of OSPF alarms might indicate a configuration or performance issue that needs further investigation. You can view the devices sorted by the number of OSPF alarms and then use the Switches card workflow or the Alarms card workflow to gather more information about possible causes for the alarms. Compare the number of nodes running OSPF with unestablished sessions with the alarms present at the same time to determine if there is any correlation between the issues and the ability to establish an OSPF session.
To view switches with the most OSPF alarms:
Open the large OSPF Service card.
Hover over the header and click .
Select Switches with Most Alarms from the filter above the table.
The table content is sorted by this characteristic, listing nodes with the most OSPF alarms at the top. Scroll down to view those with the fewest alarms.
Where to go next depends on what data you see, but a few options include:
Change the time period for the data to compare with a prior time. If the same switches are consistently indicating the most alarms, you might want to look more carefully at those switches using the Switches card workflow.
Click Show All Sessions to investigate all OSPF sessions with events in the full screen card.
View All OSPF Events
You can view all of the OSPF-related events in the network using the NetQ UI or the NetQ CLI.
The Network Services/All OSPF Sessions card enables you to view all of the OSPF events in the designated time period.
To view all OSPF events:
Open the full-screen Network Services/All OSPF Sessions card.
Click All Alarms in the navigation panel. By default, events are listed in most recent to least recent order.
Where to go next depends on what data you see, but a couple of options include:
Open one of the other full-screen tabs in this flow to focus on devices or sessions.
Export the data for use in another analytics tool by clicking the export icon and providing a name for the data file.
To view OSPF events, run:
netq [<hostname>] show events [severity info | severity error ] message_type ospf [between <text-time> and <text-endtime>] [json]
For example:
To view all OSPF events, run netq show events message_type ospf.
To view all OSPF events in the past three days, run netq show events message_type ospf between now and 3d.
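If you generate these commands from a script, it can help to assemble them from the documented options instead of concatenating strings by hand. The Python sketch below is one possible approach; the function name ospf_events_cmd and its parameters are hypothetical, and it only builds the command line following the syntax shown above (it does not run NetQ).

```python
def ospf_events_cmd(hostname=None, severity=None, between=None, as_json=False):
    """Build the argument list for `netq show events message_type ospf`.

    between is an optional (text-time, text-endtime) pair such as ("now", "3d").
    """
    cmd = ["netq"]
    if hostname:
        cmd.append(hostname)
    cmd += ["show", "events"]
    if severity is not None:
        # The documented syntax lists only these two severity values.
        if severity not in ("info", "error"):
            raise ValueError("severity must be 'info' or 'error'")
        cmd += ["severity", severity]
    cmd += ["message_type", "ospf"]
    if between is not None:
        start, end = between
        cmd += ["between", start, "and", end]
    if as_json:
        cmd.append("json")
    return cmd


print(" ".join(ospf_events_cmd()))
print(" ".join(ospf_events_cmd(between=("now", "3d"))))
```

The returned list can be passed to a process-launching call on a host where the NetQ CLI is installed.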
View Details for All Devices Running OSPF
You can view all stored attributes of all switches and hosts running OSPF in your network in the full screen card.
To view all device details, open the full screen OSPF Service card and click the All Switches tab.
View Details for All OSPF Sessions
You can view all stored attributes of all OSPF sessions in your network with the NetQ UI or the NetQ CLI.
To view all session details, open the full screen Network Services/All OSPF Sessions card and click the All Sessions tab.
To view session details, run netq show ospf.
This example shows all current sessions and the attributes associated with them.
cumulus@switch:~$ netq show ospf
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019
leaf02 swp52 0.0.0.0 Unnumbered Full spine02 swp2 Thu Feb 7 14:42:16 2019
leaf03 swp51 0.0.0.0 Unnumbered Full spine01 swp3 Thu Feb 7 14:42:16 2019
leaf03 swp52 0.0.0.0 Unnumbered Full spine02 swp3 Thu Feb 7 14:42:16 2019
leaf04 swp51 0.0.0.0 Unnumbered Full spine01 swp4 Thu Feb 7 14:42:16 2019
leaf04 swp52 0.0.0.0 Unnumbered Full spine02 swp4 Thu Feb 7 14:42:16 2019
spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019
spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019
spine01 swp3 0.0.0.0 Unnumbered Full leaf03 swp51 Thu Feb 7 14:42:16 2019
spine01 swp4 0.0.0.0 Unnumbered Full leaf04 swp51 Thu Feb 7 14:42:16 2019
spine02 swp1 0.0.0.0 Unnumbered Full leaf01 swp52 Thu Feb 7 14:42:16 2019
spine02 swp2 0.0.0.0 Unnumbered Full leaf02 swp52 Thu Feb 7 14:42:16 2019
spine02 swp3 0.0.0.0 Unnumbered Full leaf03 swp52 Thu Feb 7 14:42:16 2019
spine02 swp4 0.0.0.0 Unnumbered Full leaf04 swp52 Thu Feb 7 14:42:16 2019
Monitor a Single OSPF Session
With NetQ, you can monitor the performance of a single OSPF session using the NetQ UI or the NetQ CLI.
Network Services/OSPF Session
Small: view devices participating in the session and summary status
Medium: view devices participating in the session, summary status, session state changes, and key identifiers of the session
Large: view devices participating in the session, summary status, session state changes, event distribution and counts, attributes of the session, and the running OSPF configuration and changes to the configuration file
Full-screen: view all session attributes and all events
netq <hostname> show ospf command: view configuration and status for session by hostname, including interface, area, type, state, peer hostname, peer interface, and the last time this information changed
To access the single session cards, you must open the full screen Network Services/All OSPF Sessions card, click the All Sessions tab, select the desired session, then click (Open Card).
Granularity of Data Shown Based on Time Period
On the medium and large single OSPF session cards, vertically stacked heat maps represent the status of the sessions: one for established sessions and one for unestablished sessions. Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block indicates the results. If all sessions during that time period were established for the entire time block, then the top block is 100% saturated (white) and the unestablished block is zero percent saturated (gray). As the proportion of unestablished sessions increases, the unestablished block becomes more saturated and the established block becomes proportionally less saturated. The following example heat map is for a time period of 24 hours; the table lists the most common time periods and the resulting time blocks.
| Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block |
| --- | --- | --- | --- |
| 6 hours | 18 | 6 | 1 hour |
| 12 hours | 36 | 12 | 1 hour |
| 24 hours | 72 | 24 | 1 hour |
| 1 week | 504 | 7 | 1 day |
| 1 month | 2,086 | 30 | 1 day |
| 1 quarter | 7,000 | 13 | 1 week |
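To see how many check results are aggregated into a single block, divide the number of runs by the number of time blocks. The short Python sketch below does this arithmetic with the values from the table above; the names TIME_PERIODS and runs_per_block are illustrative only.

```python
# (number of runs, number of time blocks) for each time period in the table.
TIME_PERIODS = {
    "6 hours": (18, 6),
    "12 hours": (36, 12),
    "24 hours": (72, 24),
    "1 week": (504, 7),
    "1 month": (2086, 30),
    "1 quarter": (7000, 13),
}


def runs_per_block(runs, blocks):
    """Average number of check results summarized by one heat map block."""
    return runs / blocks


for period, (runs, blocks) in TIME_PERIODS.items():
    print(f"{period}: {runs_per_block(runs, blocks):.1f} runs per block")
```

For the hourly blocks this works out to three runs per block, and for one week to 72 runs per daily block. The month and quarter values do not divide evenly, so treat those results as approximate averages.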
View Session Status Summary
You can view a summary of a given OSPF session from the NetQ UI or NetQ CLI.
To view the summary:
Open the Network Services/All OSPF Sessions card.
Switch to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest, then click (Open Card).
Optionally, switch to the small OSPF Session card.
To view a session summary, run:
netq [<hostname>] show ospf [<remote-interface>] [area <area-id>] [around <text-time>] [json]
Where:
remote-interface specifies the interface on the host node
area filters for sessions occurring in a designated OSPF area
around shows status at a time in the past
json outputs the results in JSON format
This example shows OSPF sessions on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show ospf
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
This example shows OSPF sessions for all devices using the swp51 interface on the host node.
cumulus@switch:~$ netq show ospf swp51
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019
leaf03 swp51 0.0.0.0 Unnumbered Full spine01 swp3 Thu Feb 7 14:42:16 2019
leaf04 swp51 0.0.0.0 Unnumbered Full spine01 swp4 Thu Feb 7 14:42:16 2019
View OSPF Session State Changes
You can view the state of a given OSPF session from the medium and large Network Services/OSPF Session cards. For a given time period, you can determine the stability of the OSPF session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the session. If the session was unestablished more often than it was established, you can then investigate possible causes.
To view the state transitions for a given OSPF session, on the medium OSPF Session card:
Open the Network Services/All OSPF Sessions card.
Switch to the full-screen card using the card size picker.
Click the All Sessions tab.
Select the session of interest. The full-screen card closes automatically.
The heat map indicates the status of the session over the designated time period. In this example, the session has been established for the entire time period.
From this card, you can also view the interface name, peer address, and peer ID identifying the session in more detail.
To view the state transitions for a given OSPF session on the large OSPF Session card:
Open a Network Services/OSPF Session card.
Hover over the card, and change to the large card using the card size picker.
From this card, you can view the alarm and info event counts, interface name, peer address and peer ID, state, and several other parameters identifying the session in more detail.
View Changes to the OSPF Service Configuration File
Each time a change is made to the configuration file for the OSPF service, NetQ logs the change and enables you to compare it with the last version using the NetQ UI. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.
To view the configuration file changes:
Open or add the Network Services/All OSPF Sessions card.
Switch to the full-screen card.
Click the All Sessions tab.
Select the session of interest. The full-screen card closes automatically.
Hover over the card, and change to the large card using the card size picker.
Hover over the card and click to open the Configuration File Evolution tab.
Select the time of interest on the left, such as a time when a change might have impacted performance. Scroll down if needed.
Choose between the File view and the Diff view (selected option is dark; File by default).
The File view displays the content of the file for you to review.
The Diff view displays the changes between this version (on left) and the most recent version (on right) side by side. The changes are highlighted in red and green. In this example, we don’t have a change to highlight, so it shows the same file on both sides.
View All OSPF Session Details
You can view attributes of all of the OSPF sessions for the devices participating in a given session with the NetQ UI and the NetQ CLI.
To view all session details:
Open or add the Network Services/All OSPF Sessions card.
Switch to the full-screen card.
Click the All Sessions tab.
Select the session of interest. The full-screen card closes automatically.
Hover over the card, and change to the full-screen card using the card size picker.
Run the netq show ospf command.
This example shows all OSPF sessions. Filter by remote interface or area to narrow the listing. Scroll until you find the session of interest.
cumulus@switch:~$ netq show ospf
Matching ospf records:
Hostname Interface Area Type State Peer Hostname Peer Interface Last Changed
----------------- ------------------------- ------------ ---------------- ---------- ----------------- ------------------------- -------------------------
leaf01 swp51 0.0.0.0 Unnumbered Full spine01 swp1 Thu Feb 7 14:42:16 2019
leaf01 swp52 0.0.0.0 Unnumbered Full spine02 swp1 Thu Feb 7 14:42:16 2019
leaf02 swp51 0.0.0.0 Unnumbered Full spine01 swp2 Thu Feb 7 14:42:16 2019
leaf02 swp52 0.0.0.0 Unnumbered Full spine02 swp2 Thu Feb 7 14:42:16 2019
leaf03 swp51 0.0.0.0 Unnumbered Full spine01 swp3 Thu Feb 7 14:42:16 2019
leaf03 swp52 0.0.0.0 Unnumbered Full spine02 swp3 Thu Feb 7 14:42:16 2019
leaf04 swp51 0.0.0.0 Unnumbered Full spine01 swp4 Thu Feb 7 14:42:16 2019
leaf04 swp52 0.0.0.0 Unnumbered Full spine02 swp4 Thu Feb 7 14:42:16 2019
spine01 swp1 0.0.0.0 Unnumbered Full leaf01 swp51 Thu Feb 7 14:42:16 2019
spine01 swp2 0.0.0.0 Unnumbered Full leaf02 swp51 Thu Feb 7 14:42:16 2019
spine01 swp3 0.0.0.0 Unnumbered Full leaf03 swp51 Thu Feb 7 14:42:16 2019
spine01 swp4 0.0.0.0 Unnumbered Full leaf04 swp51 Thu Feb 7 14:42:16 2019
spine02 swp1 0.0.0.0 Unnumbered Full leaf01 swp52 Thu Feb 7 14:42:16 2019
spine02 swp2 0.0.0.0 Unnumbered Full leaf02 swp52 Thu Feb 7 14:42:16 2019
spine02 swp3 0.0.0.0 Unnumbered Full leaf03 swp52 Thu Feb 7 14:42:16 2019
spine02 swp4 0.0.0.0 Unnumbered Full leaf04 swp52 Thu Feb 7 14:42:16 2019
View All Events for a Given Session
You can view all alarm and info events for the devices participating in a given session with the NetQ UI.
To view all events:
Open or add the Network Services/All OSPF Sessions card.
Switch to the full-screen card.
Click the All Sessions tab.
Select the session of interest. The full-screen card closes automatically.
Hover over the card, and change to the full-screen card using the card size picker.
Click the All Events tab.
PTP
PTP monitoring is an early access feature. It requires a switch fabric running Cumulus Linux version 5.0 or later and NetQ Agent 4.5.
Use the UI or CLI to monitor Precision Time Protocol, including clock hierarchies and priorities, synchronization thresholds, and accuracy rates.
PTP Commands
Monitor PTP with the following commands. See the command line reference for additional options, definitions, and examples.
netq show ptp clock-details
netq show ptp counters (tx | rx)
netq show ptp global-config
netq show ptp port-status
Access the PTP Dashboard
Select Menu.
Under the Network section, select PTP.
The PTP summary dashboard displays:
clock count, type, and distribution
an overview of PTP-related events
a summary of PTP violations (mean path delay and offset from master)
Navigate to the Events tab to view, filter, and sort PTP-related events:
View PTP on a Switch
Select Devices in the workbench header, then click Open a device card.
Select a switch from the dropdown and specify the large card.
Hover over the top of the card and select the PTP icon :
For more granular data, expand the card to full-size and navigate to PTP:
Hover over the chart at any point to display timestamped mean-path-delay and offset-from-master data. You can drag the bottom bar to expand and compress the period of time displayed in the graph.
Select the tabs above the chart to display information about domains, clocks, ports, and configurations:
To view RoCE counter pools, open the large switch card, then click the RoCE icon.
Switch to the full-screen card, then click RoCE Counters. Look for these columns: Lossy Default Ingress Size, RoCE Reserved Ingress Size, Lossy Default Egress Size, and RoCE Reserved Egress Size.
To view the RoCE counter pools, run netq show roce-counters pool:
cumulus@switch:~$ netq show roce-counters pool
Matching roce records:
Hostname Lossy Default Ingress Size Roce Reserved Ingress Size Lossy Default Egress Size Roce Reserved Egress Size
----------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------
switch 104823 104823 104823 104823
View Counters for a Specific Switch Port
Open the large switch card, then click the RoCE icon.
Select a port from the list on the left:
To view counters for a specific switch port, include the port name in the command:
cumulus@switch:~$ netq show roce-counters swp1s1 rx general
Matching roce records:
Hostname Interface PG packets PG bytes no buffer discard buffer usage buffer max usage PG usage PG max usage
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
switch swp1s1 1643392 154094520 0 0 1 0 1
View Results from a Time in the Past
To view counters for a time period in the past:
Open the large switch card, then click the RoCE icon.
Click in the header and select a different time period.
Use around with any RoCE-related command to view counters from a previous time period:
cumulus@switch:~$ netq show roce-counters swp1s1 rx general around 1h
Matching roce records:
Hostname Interface PG packets PG bytes no buffer discard buffer usage buffer max usage PG usage PG max usage
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
switch swp1s1 661 61856 0 0 1 0 1
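Comparing a current reading with one taken using around gives a rough sense of how much traffic a port handled in between. The Python sketch below subtracts the two example rows shown in this section. It assumes the PG packets and PG bytes counters are cumulative and that both rows describe the same port; the field names and the helper parse_row are hypothetical and chosen only to mirror the column headings.

```python
# Field names mirror the column headings of the example output.
FIELDS = [
    "hostname", "interface", "pg_packets", "pg_bytes", "no_buffer_discard",
    "buffer_usage", "buffer_max_usage", "pg_usage", "pg_max_usage",
]


def parse_row(row):
    """Convert one whitespace-separated data row into a dict with integer counters."""
    record = dict(zip(FIELDS, row.split()))
    for name in FIELDS[2:]:
        record[name] = int(record[name])
    return record


current = parse_row("switch swp1s1 1643392 154094520 0 0 1 0 1")
earlier = parse_row("switch swp1s1 661 61856 0 0 1 0 1")  # from "around 1h"

delta_packets = current["pg_packets"] - earlier["pg_packets"]
delta_bytes = current["pg_bytes"] - earlier["pg_bytes"]
print(f"{delta_packets} packets, {delta_bytes} bytes since the earlier reading")
```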
Disable RoCE Monitoring
To disable RoCE monitoring:
Edit /etc/netq/commands/cl4-netq-commands.yml and comment out the following lines:
STP
Use the CLI to view the Spanning Tree Protocol (STP) topology on a bridge or switch.
Monitor STP with the following command. If you do not have a bridge in your configuration, the output indicates such. See the command line reference for additional options, definitions, and examples.
netq show stp topology
Switches
With the NetQ UI and NetQ CLI, you can monitor the health of individual switches, including interface performance and resource utilization.
NetQ reports switch performance metrics for three categories:
System configuration: events, interfaces, IP and MAC addresses, VLANs, IP routes, IP neighbors, and installed software packages
Utilization statistics: CPU, memory, disk, ACL and forwarding resources, SSD, and BTRFS
Physical sensing: digital optics and chassis sensors
For switch inventory information (ASIC, platform, CPU, memory, disk, and OS), refer to Switch Inventory.
View Switch Metrics and Attributes
To view events, metrics, and attributes per switch, open the Switch card:
In the header, select Devices, then click Open a device card.
Select a switch from the list:
Click Add.
Adjust the card’s size to view information at different levels of granularity.
Attributes are displayed as the default tab on the large Switch card. You can view the static information about the switch, including its hostname, addresses, server and ASIC vendors and models, OS and NetQ software information. You can also view the state of the interfaces and NetQ Agent on the switch.
Hover over the top of the card, then select the appropriate icon to view utilization info, interface statistics, digital optics info, and RoCE metrics as graphs. The following card shows interface statistics:
View System Configuration in the UI
To view additional information in the NetQ UI, open a Switch card, then expand it to the full-screen view. From here, you can filter, sort, and view information about events, interfaces, MAC addresses, VLANs, IP routes, IP neighbors, IP addresses, BTRFS utilization, software packages, SSD utilization, forwarding resources, ACL resources, What Just Happened events, sensors, RoCE counters, and digital optics.
View System Configuration in the CLI
View All Switch Events
To view all events on the switch, run:
netq [<hostname>] show events [severity info | severity error ] [between <text-time> and <text-endtime>] [json]
View Compute Resources Utilization
You can view the current utilization of CPU, memory, and disk resources to determine whether a switch is reaching its maximum load and compare its performance with other switches.
To determine how many compute resources the switches on your network consume, run:
netq <hostname> show resource-util [cpu | memory] [around <text-time>] [json]
netq <hostname> show resource-util disk [<text-diskname>] [around <text-time>] [json]
When no options are included, the output shows the percentage of CPU and memory being consumed as well as the amount and percentage of disk space being consumed.
This example shows only the disk utilization for the leaf01 switch. If you have more than one disk in your switch, the output displays utilization data for all disks. If you want to view the data for only one of the disks, you must specify a disk name.
cumulus@switch:~$ netq leaf01 show resource-util disk
Matching resource_util records:
Hostname Disk Name Total Used Disk Utilization Last Updated
----------------- -------------------- -------------------- -------------------- -------------------- ------------------------
leaf01 /dev/vda4 6170849280 1230393344 20.9 Wed Sep 16 20:54:14 2020
View Status of All Interfaces
You can view all interfaces or filter by the interface type.
netq <hostname> show interfaces type (bond|bridge|eth|loopback|macvlan|swp|vlan|vrf|vxlan) [state <remote-interface-state>] [around <text-time>] [count] [json]
View Interface Statistics and Utilization
NetQ Agents collect performance statistics every 30 seconds for the physical interfaces on switches in your network. The NetQ Agent does not collect statistics for non-physical interfaces, such as bonds, bridges, and VXLANs. The NetQ Agent collects:
To view the interface statistics and utilization, run:
netq <hostname> show interface-stats [errors | all] [<physical-port>] [around <text-time>] [json]
netq <hostname> show interface-utilization [<text-port>] [tx|rx] [around <text-time>] [json]
Where the various options are:
hostname limits the output to a particular switch
errors limits the output to only the transmit and receive errors found on the designated interfaces
physical-port limits the output to a particular port
around enables viewing of the data at a time in the past
json outputs results in JSON format
text-port limits output to a particular host and port; this option requires a hostname
tx, rx limit output to the transmit or receive values, respectively
View All MAC Addresses on a Switch
You can view all MAC addresses on a switch, or filter the list to view a particular address, only the addresses on the egress port, a particular VLAN, or those that are owned by the switch. You can also view the number of addresses.
Use the following commands to obtain this MAC address information:
This example shows the total number of MAC addresses on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show macs count
Count of matching mac records: 55
This example shows the addresses on the bridge egress port on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show macs egress-port bridge
Matching mac records:
Origin MAC Address VLAN Hostname Egress Port Remote Last Changed
------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
yes 00:00:00:00:00:1a 10 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:00:00:59 4001 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:00:00:59 30 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:00:00:59 20 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:00:00:59 4002 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:00:00:59 10 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:be:ef:aa 4001 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 44:38:39:be:ef:aa 4002 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 00:00:00:00:00:1b 20 leaf01 bridge no Thu Sep 17 16:16:11 2020
yes 00:00:00:00:00:1c 30 leaf01 bridge no Thu Sep 17 16:16:11 2020
View All VLANs on a Switch
To view all VLANs on a switch, run:
netq <hostname> show interfaces type vlan [state <remote-interface-state>] [around <text-time>] [count] [json]
Filter the output with the state option to view VLANs that are up or down, the around option to view VLAN information for a time in the past, or the count option to view the total number of VLANs on the device.
This example shows the total number of VLANs on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show interfaces type vlan count
Count of matching link records: 6
This example shows the VLANs on the leaf01 switch that are down:
cumulus@switch:~$ netq leaf01 show interfaces type vlan state down
No matching link records found
View All IP Routes on a Switch
To view all IPv4 and IPv6 routes or only IPv4 routes on a switch, run:
netq show ip routes [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [origin] [around <text-time>] [json]
You can filter the output with the following options:
ipv4 or ipv4/prefixlen to view a particular IPv4 route on the switch
vrf to view routes using a given VRF
origin to view routes that the switch owns
around to view routes at a time in the past
The following example shows information for the IPv4 route at 10.10.10.1 on the spine01 switch:
cumulus@switch:~$ netq spine01 show ip routes 10.10.10.1
Matching routes records:
Origin VRF Prefix Hostname Nexthops Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
no default 10.10.10.1/32 spine01 169.254.0.1: swp1, Wed Sep 16 19:57:26 2020
169.254.0.1: swp2
View All IP Neighbors on a Switch
To view all IP neighbors on a switch, run:
netq <hostname> show ip neighbors [<remote-interface>] [<ipv4>|<ipv4> vrf <vrf>|vrf <vrf>] [<mac>] [around <text-time>] [count] [json]
You can filter the output with the following options:
ipv4, ipv4 vrf, or vrf to view the neighbor with a given IPv4 address, the neighbor with a given IPv4 address and VRF, or all neighbors using a given VRF on the switch
mac to view the neighbor with a given MAC address
count to view the total number of known IP neighbors
around to view neighbors at a time in the past
The following example shows the neighbor with a MAC address of 44:38:39:00:00:0b on the leaf02 switch:
cumulus@switch:~$ netq leaf02 show ip neighbors 44:38:39:00:00:0b
Matching neighbor records:
IP Address Hostname Interface MAC Address VRF Remote Last Changed
------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
169.254.0.1 leaf02 swp52 44:38:39:00:00:0b default no Thu Sep 17 20:25:16 2020
This example shows the neighbor with an IP address of 10.1.10.2 on the leaf02 switch:
cumulus@switch:~$ netq leaf02 show ip neighbors 10.1.10.2
Matching neighbor records:
IP Address Hostname Interface MAC Address VRF Remote Last Changed
------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
10.1.10.2 leaf02 vlan10 44:38:39:00:00:59 RED no Thu Sep 17 20:25:14 2020
View All IP Addresses on a Switch
To view all IP addresses on a switch, run:
netq <hostname> show ip addresses [<remote-interface>] [<ipv4>|<ipv4/prefixlen>] [vrf <vrf>] [around <text-time>] [count] [json]
You can filter the output with the following options:
ipv4 or ipv4/prefixlen to view a particular IPv4 address on the switch
vrf to view addresses using a given VRF
count to view the total number of known IP addresses
around to view addresses at a time in the past
This example shows all IP addresses using the BLUE VRF on the leaf03 switch:
cumulus@switch:~$ netq leaf03 show ip addresses vrf BLUE
Matching address records:
Address Hostname Interface VRF Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
10.1.30.1/24 leaf03 vlan30-v0 BLUE Thu Sep 17 20:25:09 2020
10.1.30.2/24 leaf03 vlan30 BLUE Thu Sep 17 20:25:08 2020
View Disk Storage After BTRFS Allocation
Customers running Cumulus Linux 3, which uses the b-tree file system (BTRFS), might experience issues with disk space management. This is a known BTRFS problem: the file system does not perform periodic garbage collection, or rebalancing. If left unattended, these issues can make it impossible to rebalance the partitions on the disk. NVIDIA recommends rebalancing the BTRFS partitions preemptively, but only when absolutely needed, because rebalancing reduces the lifetime of the disk. By tracking disk space usage, you can determine when to rebalance.
To view the disk utilization and check whether a rebalance is recommended, run:
netq show cl-btrfs-info [around <text-time>] [json]
This example shows the utilization on the leaf01 switch:
cumulus@switch:~$ netq leaf01 show cl-btrfs-info
Matching btrfs_info records:
Hostname Device Allocated Unallocated Space Largest Chunk Size Unused Data Chunks Space Rebalance Recommended Last Changed
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------------
leaf01 37.79 % 3.58 GB 588.5 MB 771.91 MB yes Wed Sep 16 21:25:17 2020
Look for the Rebalance Recommended column. If the value in that column is yes, NVIDIA strongly encourages you to rebalance the BTRFS partitions. If the value is no, review the other values in the output to determine whether you are getting close to needing a rebalance, and check this data again later.
View All Software Packages
To view software package information for a switch, run:
netq <hostname> show cl-pkg-info [<text-package-name>] [around <text-time>] [json]
Use the text-package-name option to narrow the results to a particular package or the around option to narrow the output to a particular time range.
This example shows the ntp package on the spine01 switch.
cumulus@switch:~$ netq spine01 show cl-pkg-info ntp
Matching package_info records:
Hostname Package Name Version CL Version Package Status Last Changed
----------------- ------------------------ -------------------- -------------------- -------------------- -------------------------
spine01 ntp 1:4.2.8p10-cl3u2 Cumulus Linux 3.7.12 installed Wed Aug 26 19:58:45 2020
View SSD Utilization
For NetQ Appliances that have 3ME3 solid state drives (SSDs) installed (primarily in on-premises deployments), you can view the utilization of the drive on-demand. NetQ generates an event for drives that drop below 10% health, or have more than a 2% loss of health in 24 hours, indicating the need to rebalance the drive.
To view SSD utilization, run:
netq <hostname> show cl-ssd-util [around <text-time>] [json]
This example shows the utilization for spine02 which has this type of SSD.
cumulus@switch:~$ netq spine02 show cl-ssd-util
Hostname Remaining PE Cycle (%) Current PE Cycles executed Total PE Cycles supported SSD Model Last Changed
spine02 80 576 2880 M.2 (S42) 3ME3 Thu Oct 31 00:15:06 2019
This output indicates that this drive is in a good state overall with 80% of its PE cycles remaining.
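The 80% figure follows from the two cycle counts in the output: 576 of the 2880 supported program/erase (PE) cycles have been used, which leaves 1 - 576/2880 = 80%. The following Python sketch reproduces that arithmetic and applies the two event thresholds described above. The function names are hypothetical, the 2% loss is assumed to mean two percentage points, and this is not NetQ's implementation.

```python
def remaining_pe_percent(executed: int, supported: int) -> float:
    """Percentage of program/erase (PE) cycles still available."""
    if supported <= 0:
        raise ValueError("supported PE cycles must be positive")
    return 100.0 * (supported - executed) / supported


def ssd_event_expected(remaining_now: float, remaining_24h_ago: float) -> bool:
    """Apply the two thresholds from the text: health below 10%, or
    more than a 2% loss of health in 24 hours (assumed to mean two
    percentage points)."""
    return remaining_now < 10.0 or (remaining_24h_ago - remaining_now) > 2.0


# Values taken from the spine02 example output.
print(remaining_pe_percent(576, 2880))  # 80.0
print(ssd_event_expected(80.0, 80.5))   # False
```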
View Forwarding Resources
To view forwarding resources utilization on a switch, run:
netq <hostname> show cl-resource forwarding [around <text-time>] [json]
This example shows the forwarding resources used by the spine02 switch:
View ACL Resources
You can monitor the incoming and outgoing access control lists (ACLs) configured on a switch.
Both the Switch card and the netq show cl-resource acl command display the ingress/egress IPv4/IPv6 filter/mangle, ingress 802.1x filter, ingress mirror, ingress/egress PBR IPv4/IPv6 filter/mangle, ACL Regions, 18B/32B/54B Rules Key, and layer 4 port range checker.
To view ACL resource utilization on a switch, run:
Use the egress or ingress options to show only the outgoing or incoming ACLs.
This example shows the ACL resources available and currently used by the leaf01 switch.
cumulus@switch:~$ netq leaf01 show cl-resource acl
Matching cl_resource records:
Hostname In IPv4 filter In IPv4 Mangle In IPv6 filter In IPv6 Mangle In 8021x filter In Mirror In PBR IPv4 filter In PBR IPv6 filter Eg IPv4 filter Eg IPv4 Mangle Eg IPv6 filter Eg IPv6 Mangle ACL Regions 18B Rules Key 32B Rules Key 54B Rules Key L4 Port range Checkers Last Updated
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- ------------------------
leaf01 36,512(7%) 0,0(0%) 30,768(3%) 0,0(0%) 0,0(0%) 0,0(0%) 0,0(0%) 0,0(0%) 29,256(11%) 0,0(0%) 0,0(0%) 0,0(0%) 0,0(0%) 0,0(0%) 0,0(0%) 0,0(0%) 2,24(8%) Mon Jan 13 03:34:11 2020
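Each resource cell in this output packs three numbers into one string, for example 36,512(7%). The following Python sketch splits such a cell into its parts. It assumes the first number is the count in use and the second is the maximum, which is consistent with the percentages in the example row (36/512 is about 7%). The function name is hypothetical.

```python
import re

# Matches cells of the form "<number>,<number>(<percent>%)".
CELL = re.compile(r"^(\d+),(\d+)\((\d+)%\)$")


def parse_acl_cell(cell: str) -> tuple[int, int, int]:
    """Split a cell such as '36,512(7%)' into three integers."""
    m = CELL.match(cell.strip())
    if m is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    first, second, percent = (int(g) for g in m.groups())
    return first, second, percent


# Assumption: first value is the used count, second is the maximum.
used, maximum, pct = parse_acl_cell("36,512(7%)")
print(used, maximum, pct)         # 36 512 7
print(int(100 * used / maximum))  # 7
```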
View Chassis Health with Sensors
Fan, power supply unit (PSU), and temperature sensors provide additional data about the switch operation.
View All Sensor Information for a Switch
To view information for power supplies, fans, and temperature sensors on a switch, run:
netq <hostname> show sensors all [around <text-time>] [json]
View Only Power Supply Health
To view information from all PSU sensors or PSU sensors with a given name on a given switch, run:
netq <hostname> show sensors psu [<psu-name>] [around <text-time>] [json]
Use the psu-name option to view all PSU sensors with a particular name.
Use Tab completion to determine the names of the PSUs in your switches.
cumulus@switch:~$ netq <hostname> show sensors psu <press tab>
around : Go back in time to around ...
json : Provide output in JSON
psu1 : Power Supply
psu2 : Power Supply
<ENTER>
View Only Fan Health
To view information from all fan sensors or fan sensors with a given name on your switch, run:
netq <hostname> show sensors fan [<fan-name>] [around <text-time>] [json]
Use the fan-name option to view all fan sensors with a particular name.
This example shows the state of all fans with the name fan1 on the leaf02 switch.
cumulus@switch:~$ netq leaf02 show sensors fan fan1
Hostname Name Description State Speed Max Min Message Last Changed
----------------- --------------- ----------------------------------- ---------- ---------- -------- -------- ----------------------------------- -------------------------
leaf02 fan1 fan tray 1, fan 1 ok 2500 29000 2500 Fri Apr 19 16:01:41 2019
View Only Temperature Information
To view information from all temperature sensors or temperature sensors with a given name on a switch, run:
netq <hostname> show sensors temp [<temp-name>] [around <text-time>] [json]
Use the temp-name option to view all temperature sensors with a particular name.
This example shows the state of the psu2temp1 temperature sensor on the leaf01 switch.
cumulus@switch:~$ netq leaf01 show sensors temp psu2temp1
Matching sensors records:
Hostname Name Description State Temp Critical Max Min Message Last Changed
----------------- --------------- ----------------------------------- ---------- -------- -------- -------- -------- ----------------------------------- -------------------------
leaf01 psu2temp1 psu2 temp sensor ok 25 85 80 5 Wed Aug 26 16:14:41 2020
View Digital Optics Health
NetQ provides digital optics module information that can help you detect performance degradation or a complete outage of any digital optics module on a switch.
To view digital optics information for a switch, run one of the following:
netq <hostname> show dom type (laser_rx_power|laser_output_power|laser_bias_current) [interface <text-dom-port-anchor>] [channel_id <text-channel-id>] [around <text-time>] [json]
netq <hostname> show dom type (module_temperature|module_voltage) [interface <text-dom-port-anchor>] [around <text-time>] [json]
VLAN
Use the CLI to view Virtual Local Area Network (VLAN) information.
Monitor VLAN with the following commands. See the command line reference for additional options, definitions, and examples.
netq show vlan
netq show interfaces type macvlan
netq show interfaces type vlan
netq show macs
netq show events message_type vlan
VXLAN
Use the CLI to monitor Virtual Extensible LAN (VXLAN) and validate overlay communication paths. See the command line reference for additional options, definitions, and examples.
netq show vxlan
netq show interfaces type vxlan
netq show events message_type vxlan
Validation Checks
When you discover operational anomalies, you can check whether the devices, hosts, network protocols, and services are operating as expected. NetQ lets you see when changes have occurred to the network, devices, and interfaces by viewing their operation, configuration, and status at an earlier point in time.
Validation support is available in the NetQ UI and the NetQ CLI for the following:
Item                   NetQ UI  NetQ CLI
---------------------  -------  --------
Addresses              Yes      Yes
Agents                 Yes      Yes
BGP                    Yes      Yes
Cumulus Linux version  No       Yes
EVPN                   Yes      Yes
Interfaces             Yes      Yes
MLAG (CLAG)            Yes      Yes
MTU                    Yes      Yes
NTP                    Yes      Yes
OSPF                   Yes      Yes
RoCE                   Yes      Yes
Sensors                Yes      Yes
VLAN                   Yes      Yes
VXLAN                  Yes      Yes
Validation with the NetQ UI
The NetQ UI uses the following cards to create validations and view results for these protocols and services:
Network Health
Validation Request
On-demand and Scheduled Validation Results
For a general understanding of how well your network is operating, the Network Health card workflow is the best place to start as it contains the highest-level view and performance roll-ups.
Validation with the NetQ CLI
The NetQ CLI uses the netq check commands to validate the various elements of your network fabric, looking for inconsistencies in configuration across your fabric, connectivity faults, missing configurations, and so forth. You can run commands from any node in the network.
View Default Validation Tests
To view the list of tests run for a given protocol or service by default, use either netq show unit-tests <protocol/service> or perform a tab completion on netq check <protocol/service> [include|exclude]. Refer to Validation Tests Reference for a description of the individual tests.
Select the Tests to Run
You can include or exclude one or more of the various tests performed during the validation. Each test is assigned a number, which identifies the test to run. By default, all tests are run. Use the <protocol-number-range-list> value with the include and exclude options to indicate which tests to include or exclude. The value is a comma-separated list of numbers, a range using a dash, or a combination of these. Do not use spaces after commas. For example:
include 1,3,5
include 1-5
include 1,3-5
exclude 6,7
exclude 6-7
exclude 3,4-7,9
The output indicates whether a given test passed, failed, or was skipped.
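To make the list format concrete, the following Python sketch expands a number list such as 1,3-5 into the individual test numbers it selects. It illustrates the syntax only and is not how NetQ parses the option. The function name is hypothetical.

```python
def parse_test_numbers(spec: str) -> list[int]:
    """Expand a list such as '1,3-5' into [1, 3, 4, 5]."""
    numbers: set[int] = set()
    for part in spec.split(","):
        # Reject empty entries and entries with surrounding spaces.
        if not part or part != part.strip():
            raise ValueError(f"invalid entry {part!r}; do not use spaces after commas")
        if "-" in part:
            low, high = part.split("-", 1)
            start, end = int(low), int(high)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


print(parse_test_numbers("1,3-5"))    # [1, 3, 4, 5]
print(parse_test_numbers("3,4-7,9"))  # [3, 4, 5, 6, 7, 9]
```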
Validation Check Result Filtering
You can create filters to suppress false alarms or uninteresting errors and warnings that can be a nuisance in CI workflows. For example, certain configurations permit a singly connected MLAG bond, which generates a standard error that is not useful.
Filtered errors and warnings related to validation checks do NOT generate notifications and do not get counted in the alarm and info event totals. They do get counted as part of suppressed notifications instead.
You define these filters in the /etc/netq/check-filter.yml file. You can create a rule for individual check commands or you can create a global rule that applies to all tests run by the check command. Additionally, you can create a rule specific to a particular test run by the check command.
Each rule must contain at least one match criterion and an action response. The only action currently available is filter. The match can comprise multiple criteria, one per line, creating a logical AND. You can match against any column in the validation check output. The match criteria values must match the case and spacing of the column names in the corresponding netq check output and are parsed as regular expressions.
This example shows a global rule for the BGP checks that suppresses any events generated by the DataVrf virtual routing and forwarding (VRF) interface coming from swp3 or swp7. It also shows a test-specific rule that filters all Address Families events from devices with hostnames starting with exit-1 or firewall.
You can also use filters to downgrade to warnings the validation errors that result from the default expectations of the netq check commands. This applies to all protocols and services except agents. For example, if you provision BGP with configurations where a BGP peer is not expected or desired, the check reports a missing BGP peer as an error. By creating a filter, you can replace the error with a warning.
To create a validation filter:
Navigate to the /etc/netq directory.
Create or open the check_filter.yml file using your text editor of choice.
This file contains the syntax to follow to create one or more rules for one or more protocols or services. Create your own rules, and/or edit and un-comment any example rules you would like to use.
# Netq check result filter rule definition file. This is for filtering
# results based on regex match on one or more columns of each test result.
# Currently, only action 'filter' is supported. Each test can have one or
# more rules, and each rule can match on one or more columns. In addition,
# rules can also be optionally defined under the 'global' section and will
# apply to all tests of a check.
#
# syntax:
#
# <check name>:
#   tests:
#     <test name, as shown in test list when using the include/exclude and tab>:
#       - rule:
#           match:
#             <column name>: regex
#             <more columns and regex.., result is AND>
#           action:
#             filter
#       - <more rules..>
#   global:
#     - rule:
#         . . .
#     - rule:
#         . . .
#
# <another check name>:
#   . . .
#
# e.g.
#
# bgp:
#   tests:
#     Address Families:
#       - rule:
#           match:
#             Hostname: (^exit*|^firewall)
#             VRF: DataVrf1080
#             Reason: AFI/SAFI evpn not activated on peer
#           action:
#             filter
#       - rule:
#           match:
#             Hostname: exit-2
#             Reason: SAFI evpn not activated on peer
#           action:
#             filter
#     Router ID:
#       - rule:
#           match:
#             Hostname: exit-2
#           action:
#             filter
#
# evpn:
#   tests:
#     EVPN Type 2:
#       - rule:
#           match:
#             Hostname: exit-1
#           action:
#             filter
#
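The matching behavior described earlier (one regular expression per column, with all criteria combined as a logical AND) can be sketched in Python as follows. This is an illustration only: the function name and the sample result row are hypothetical, and the use of re.search (a match anywhere in the value unless the pattern is anchored) is an assumption about how the expressions are applied.

```python
import re


def rule_matches(match: dict[str, str], row: dict[str, str]) -> bool:
    """Return True when every column pattern matches the row (logical AND)."""
    for column, pattern in match.items():
        value = row.get(column)
        # Assumption: unanchored search, as with re.search.
        if value is None or re.search(pattern, value) is None:
            return False
    return True


# First Address Families rule from the example file above.
rule = {
    "Hostname": r"(^exit*|^firewall)",
    "VRF": "DataVrf1080",
    "Reason": "AFI/SAFI evpn not activated on peer",
}
# Hypothetical row from a check result.
row = {
    "Hostname": "exit-1",
    "VRF": "DataVrf1080",
    "Reason": "AFI/SAFI evpn not activated on peer",
}
print(rule_matches(rule, row))                            # True
print(rule_matches(rule, {**row, "Hostname": "leaf01"}))  # False
```

A row is filtered only when every criterion in the rule matches, so adding criteria to a rule narrows what it suppresses.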
Use Validation Commands in Scripts
If you are running scripts based on the older version of the netq check commands and want to stay with the old output, edit the netq.yml file to include old-check: true in the netq-cli section of the file. For example:
Then run netq config restart cli to apply the change.
If you update your scripts to work with the new version of the commands, change the old-check value to false or remove it. Then restart the CLI.
Use netq check mlag in place of netq check clag from NetQ 2.4 onward. netq check clag remains available for automation scripts, but you should begin migrating to netq check mlag to maintain compatibility with future NetQ releases.
Validation Tests Reference
NetQ collects data that validates the health of your network fabric, devices, and interfaces. You can create and run validations with either the NetQ UI or the NetQ CLI. The number of checks and the type of checks are tailored to the particular protocol or element being validated.
Use the value in the Test Number column in the tables below with the NetQ CLI when you want to include or exclude specific tests with the netq check command. You can get the test numbers by running the netq show unit-tests command.
Addresses Validation Tests
The duplicate address detection tests look for duplicate IPv4 and IPv6 addresses assigned to interfaces across devices in the inventory. They also check for duplicate /32 host routes in each VRF.
Test Number  Test Name                 Description
-----------  ------------------------  -----------------------------------
0            IPv4 Duplicate Addresses  Checks for duplicate IPv4 addresses
1            IPv6 Duplicate Addresses  Checks for duplicate IPv6 addresses
Agent Validation Tests
NetQ Agent validation looks for an agent status of rotten for each node in the network. A fresh status indicates the agent is running as expected. The agent sends a ‘heartbeat’ every 30 seconds, and if it does not send three consecutive heartbeats, its status changes to rotten.
Test Number  Test Name     Description
-----------  ------------  -------------------------------------------------------
0            Agent Health  Checks for nodes that have failed or lost communication
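With a heartbeat every 30 seconds and a limit of three missed heartbeats, an agent becomes rotten roughly 3 x 30 = 90 seconds after its last heartbeat. The following Python sketch shows that rule. The function name is hypothetical, and the way missed heartbeats are counted at exact interval boundaries is an assumption rather than documented NetQ behavior.

```python
HEARTBEAT_INTERVAL_S = 30  # heartbeat period, from the text
MISSED_BEFORE_ROTTEN = 3   # consecutive missed heartbeats, from the text


def agent_status(seconds_since_last_heartbeat: float) -> str:
    """Classify an agent as 'fresh' or 'rotten' from heartbeat age."""
    # Assumption: one heartbeat counts as missed per full interval elapsed.
    missed = int(seconds_since_last_heartbeat // HEARTBEAT_INTERVAL_S)
    return "rotten" if missed >= MISSED_BEFORE_ROTTEN else "fresh"


print(agent_status(45))  # fresh
print(agent_status(95))  # rotten
```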
BGP Validation Tests
The BGP validation tests look for status and configuration anomalies.
Test Number  Test Name              Description
-----------  ---------------------  ----------------------------------------
0            Session Establishment  Checks that BGP sessions are in an established state
1            Address Families       Checks if transmit and receive address family advertisement is consistent between peers of a BGP session
2            Router ID              Checks for BGP router ID conflict in the network
3            Hold Time              Checks for mismatch of hold time between peers of a BGP session
4            Keep Alive Interval    Checks for mismatch of keep alive interval between peers of a BGP session
5            IPv4 Stale Path Time   Checks for mismatch of IPv4 stale path timer between peers of a BGP session
6            IPv6 Stale Path Time   Checks for mismatch of IPv6 stale path timer between peers of a BGP session
7            Interface MTU          Checks for consistency of interface MTU for BGP peers
Cumulus Linux Version Tests
The Cumulus Linux version test looks for version consistency.
Test Number  Test Name                    Description
-----------  ---------------------------  ----------------------------------------
0            Cumulus Linux Image Version  Checks the following:
                                            No version specified, checks that all switches in the network have a consistent version
                                            match-version specified, checks that a switch’s OS version equals the specified version
                                            min-version specified, checks that a switch’s OS version is equal to or greater than the specified version
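The three modes of this test can be sketched in Python as follows. The sketch compares versions as tuples of integers so that 3.7.12 sorts above 3.7.2, which a plain string comparison would get wrong. All function names are hypothetical, and the sketch assumes purely numeric, dot-separated version strings. It does not reflect how NetQ implements the check.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.7.12' into (3, 7, 12) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def version_ok(actual: str, match_version: Optional[str] = None,
               min_version: Optional[str] = None) -> bool:
    """Check one switch against an exact version or a minimum version."""
    have = parse_version(actual)
    if match_version is not None and have != parse_version(match_version):
        return False
    if min_version is not None and have < parse_version(min_version):
        return False
    return True


def versions_consistent(versions: list[str]) -> bool:
    """With no version specified, all switches must report the same version."""
    return len({parse_version(v) for v in versions}) <= 1


print(version_ok("3.7.12", min_version="3.7.2"))  # True
print(versions_consistent(["3.7.12", "4.2.0"]))   # False
```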
EVPN Validation Tests
The EVPN validation tests look for status and configuration anomalies.
Test Number  Test Name                  Description
-----------  -------------------------  ----------------------------------------
0            EVPN BGP Session           Checks if:
                                          BGP EVPN sessions are established
                                          The EVPN address family advertisement is consistent
1            EVPN VNI Type Consistency  Because a VNI can be of type L2 or L3, checks that for a given VNI, its type is consistent across the network
2            EVPN Type 2                Checks for consistency of IP-MAC binding and the location of a given IP-MAC across all VTEPs
3            EVPN Type 3                Checks for consistency of replication group across all VTEPs
4            EVPN Session               For each EVPN session, checks if:
                                          adv_all_vni is enabled
                                          FDB learning is disabled on tunnel interface
5            VLAN Consistency           Checks for consistency of VLAN to VNI mapping across the network
6            VRF Consistency            Checks for consistency of VRF to L3 VNI mapping across the network
Interface Validation Tests
The interface validation tests look for consistent configuration between two nodes.
Test Number  Test Name    Description
-----------  -----------  ----------------------------------------
0            Admin State  Checks for consistency of administrative state on two sides of a physical interface
1            Oper State   Checks for consistency of operational state on two sides of a physical interface
2            Speed        Checks for consistency of the speed setting on two sides of a physical interface
3            Autoneg      Checks for consistency of the auto-negotiation setting on two sides of a physical interface
Link MTU Validation Tests
The link MTU validation tests look for consistency across an interface and appropriate size MTU for VLAN and bridge interfaces.
Test Number  Test Name             Description
-----------  --------------------  ----------------------------------------
0            Link MTU Consistency  Checks for consistency of MTU setting on two sides of a physical interface
1            VLAN interface        Checks if the MTU of an SVI is no smaller than the parent interface, subtracting the VLAN tag size
2            Bridge interface      Checks if the MTU on a bridge is not arbitrarily smaller than the smallest MTU among its members
MLAG Validation Tests
The MLAG validation tests look for misconfigurations, peering status, and bond error states.
Test Number  Test Name          Description
-----------  -----------------  ----------------------------------------
0            Peering            Checks if:
                                  MLAG peerlink is up
                                  MLAG peerlink bond slaves are down (not in full capacity and redundancy)
                                  Peering is established between two nodes in an MLAG pair
1            Backup IP          Checks if:
                                  MLAG backup IP configuration is missing on an MLAG node
                                  MLAG backup IP is correctly pointing to the MLAG peer and its connectivity is available
2            CLAG Sysmac        Checks if:
                                  MLAG Sysmac is consistently configured on both nodes in an MLAG pair
                                  Any duplication of an MLAG sysmac exists within a bridge domain
3            VXLAN Anycast IP   Checks if the VXLAN anycast IP address is consistently configured on both nodes in an MLAG pair
4            Bridge Membership  Checks if the MLAG peerlink is part of a bridge
5            Spanning Tree      Checks if:
                                  STP is enabled and running on the MLAG nodes
                                  MLAG peerlink role is correct from STP perspective
                                  The bridge ID is consistent between two nodes of an MLAG pair
                                  The VNI in the bridge has BPDU guard and BPDU filter enabled
6            Dual Home          Checks for:
                                  MLAG bonds that are not in dually connected state
                                  Dually connected bonds have consistent VLAN and MTU configuration on both sides
                                  STP has consistent view of bonds' dual connectedness
7            Single Home        Checks for:
                                  Singly connected bonds
                                  STP has consistent view of bond’s single connectedness
8            Conflicted Bonds   Checks for bonds in MLAG conflicted state and shows the reason
9            ProtoDown Bonds    Checks for bonds in protodown state and shows the reason
10           SVI                Checks if:
                                  Both sides of an MLAG pair have an SVI configured
                                  SVI on both sides have consistent MTU setting
NTP Validation Tests
The NTP validation test looks for poor operational status of the NTP service.
Test Number  Test Name  Description
-----------  ---------  ------------------------------------------------------
0            NTP Sync   Checks if the NTP service is running and in sync state
OSPF Validation Tests
The OSPF validation tests look for indications of the service health and configuration consistency.
Test Number  Test Name       Description
-----------  --------------  ----------------------------------------
0            Router ID       Checks for OSPF router ID conflicts in the network
1            Adjacency       Checks for OSPF adjacencies in a down or unknown state
2            Timers          Checks for consistency of OSPF timer values in an OSPF adjacency
3            Network Type    Checks for consistency of network type configuration in an OSPF adjacency
4            Area ID         Checks for consistency of area ID configuration in an OSPF adjacency
5            Interface MTU   Checks for MTU consistency in an OSPF adjacency
6            Service Status  Checks for OSPF service health in an OSPF adjacency
RoCE Validation Tests
The RoCE validation tests look for consistent RoCE and QoS configurations across nodes.
Test Number  Test Name           Description
-----------  ------------------  ----------------------------------------
0            RoCE Mode           Checks whether RoCE is configured for lossy or lossless mode
1            Classification      Checks for consistency of DSCP, service pool, port group, and traffic class settings
2            Congestion Control  Checks for consistency of ECN and RED threshold settings
3            Flow Control        Checks for consistency of PFC configuration for RoCE lossless mode
4            ETS                 Checks for consistency of Enhanced Transmission Selection settings
Sensor Validation Tests
The sensor validation tests look for chassis power supply, fan, and temperature sensors that are not operating as expected.
Test Number  Test Name            Description
-----------  -------------------  ----------------------------------------
0            PSU sensors          Checks for power supply unit sensors that are not in ok state
1            Fan sensors          Checks for fan sensors that are not in ok state
2            Temperature sensors  Checks for temperature sensors that are not in ok state
VLAN Validation Tests
The VLAN validation tests look for configuration consistency between two nodes.
Test Number  Test Name                       Description
-----------  ------------------------------  ----------------------------------------
0            Link Neighbor VLAN Consistency  Checks for consistency of VLAN configuration on two sides of a port or a bond
1            CLAG Bond VLAN Consistency      Checks for consistent VLAN membership of a CLAG (MLAG) bond on each side of the CLAG (MLAG) pair
VXLAN Validation Tests
The VXLAN validation tests look for configuration consistency across all VTEPs.
Test Number  Test Name         Description
-----------  ----------------  ----------------------------------------
0            VLAN Consistency  Checks for consistent VLAN to VXLAN mapping across all VTEPs
1            BUM replication   Checks for consistent replication group membership across all VTEPs
Flow Analysis
Create a flow analysis to sample data from TCP and UDP flows in your environment and to review latency and buffer utilization statistics across network paths.
Flow analysis is supported on NVIDIA Spectrum-2 switches and above. It requires a switch fabric running Cumulus Linux version 5.0 or above.
You must enable Lifecycle Management (LCM) to use flow analysis. If LCM is disabled, the flow analysis icon does not appear in the UI. LCM is enabled by default for on-premises deployments and disabled by default for cloud deployments. Contact your local NVIDIA sales representative or submit a support ticket to activate LCM on cloud deployments.
Create a New Flow Analysis
To start a new flow analysis, click the Flow analysis icon and select Create new flow analysis.
In the dialog, enter the application parameters, including the source IP address, destination IP address, source port, and destination port of the flow you wish to analyze. Select the protocol and VRF for the flow from the dropdown menus.
After you enter the application parameters, enter the monitor settings, including the sampling rate and time parameters.
If you attempt to run a flow analysis that includes switches assigned a default, unmodified access profile, the process will fail. Create a unique access profile (or update the default profile with unique credentials), then assign the profile to the switches you want to include in the flow analysis.
Running a flow analysis will affect switch CPU performance. For high-volume flows, set a lower sampling rate to limit switch CPU impact.
View Flow Analysis Data
After you start the flow analysis, a flow analysis card appears on the NetQ Workbench.
View a previous flow analysis by selecting Flow analysis and View previous flow analysis.
Select View details next to the name of the flow analysis to display the analysis dashboard. You can use this dashboard to view latency and buffer statistics for the monitored flow. If bi-directional monitoring was enabled, you can view the reverse direction of the flow by selecting the icon. The following example shows flow data across a single path:
The dashboard header shows the monitored flow settings:
Flow Settings              Description
-------------------------  ----------------------------------------
Lifetime                   The lifetime of the flow analysis. This example completed in 11 minutes.
Source IP                  The source IP address of the flow. In this example it is 10.1.100.125.
Destination IP             The destination IP address of the flow. In this example it is 10.1.10.105.
Source Port                The source port of the flow. In this example it displays N/A because it was not set.
Destination Port           The destination port of the flow. In this example it is 2222.
Protocol                   The protocol of the monitored flow. In this example it is UDP.
Sampling Rate              The sampling rate of the flow. In this example it is low.
VRF                        The VRF the flow is present in. In this example it is the default VRF.
Bi-directional Monitoring  This determines if the flow is monitored in both directions between the source IP address and the destination IP address. In this example it is enabled. Click to change the direction that is displayed.
Understanding the Flow Analysis Graph
The flow analysis graph is color coded relative to the values measured across devices. Lower values are displayed in green, and higher values are displayed in orange. The color gradient is displayed below the graph along with the low and high values from the collected flow data. Each hop in the path is represented in the graph with a vertical, gray-striped line labeled by hostname. The following example shows a single path:
The flow graph panel on the right side of the dashboard displays the devices along the selected path.
View Flow Latency
The latency measured by the flow analysis is the total transit time of the sampled packets through individual devices. A summary of measured latency for each device is displayed above the main flow analysis graph.
The average latency for packets in the flow is displayed under the hostname of each device, along with the minimum and maximum latencies observed during the analysis lifetime. The 95th percentile (P95) latency value for sampled packets is also displayed. A P95 value means that 95% of the sampled packets have a latency less than or equal to that value.
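As an illustration of what a P95 value means, the following minimal Python sketch computes a percentile with the nearest-rank method. The source does not state how NetQ calculates P95, so the method, the function name, and the sample latencies are all assumptions made for this example.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it.

    Nearest-rank is one common percentile definition; it is an assumption
    here, not a statement of how NetQ computes P95.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Rank is 1-based: ceil(pct/100 * N), clamped to at least 1.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-packet latencies (nanoseconds) sampled on one device.
latencies_ns = [410, 395, 402, 980, 415, 399, 405, 420, 408, 1250]
p95 = percentile_nearest_rank(latencies_ns, 95)
print(f"P95 latency: {p95} ns")
```

With only ten samples, the P95 lands on the largest value, which is why a P95 figure is more informative for flows with many sampled packets.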
Use your cursor to hover over sections of the main analysis graph to view average latency values for each device in a path.
The left panel of the flow analysis dashboard also displays a timeline of measured latency for each device on that path. Use your cursor to hover over the plotted data points on the timeline for each device to view the latency measured at each time interval.
View Buffer Occupancy
The main flow analysis dashboard also displays the buffer occupancy of each device along the path. To change the graph view to display buffer occupancy for the flow, click next to Avg. flow latency and select Avg. buffer occupancy. You can view an overview graph of buffer occupancy or select each device to see the buffer occupancy for the analyzed flow:
The percentages represent the amount of buffer space on the switch that the analyzed flow occupied while the analysis was running.
View Multiple Paths
When packets matching the flow settings traverse multiple paths in the topology, the flow graph displays latency and buffer occupancy for each path:
You can switch between paths by clicking on an alternate path in the Flow graph panel, or by clicking on an unselected path on the main analysis graph:
In the detail panel on the left side of the dashboard, you can select a path to view the percent of packets distributed over each path.
Partial Path Support
Some flows can still be analyzed if they traverse a network path that includes switches lacking flow analysis support. Partial-path flow analysis is supported in the following conditions:
The unsupported device cannot be the initial ingress or terminating egress device in the path of the analyzed flow.
If there is more than one consecutive transit device in the path that lacks flow analysis support, the path discovery will terminate at that point in the topology and some devices will not be displayed in the flow graph.
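The two conditions above can be sketched in Python as follows. This is only one reading of the rules: the function name, the hostnames, and the choice to stop at the first of two consecutive unsupported devices are assumptions, not a description of NetQ's path discovery.

```python
def visible_hops(path, supported):
    """Return the hops that would be displayed under the partial-path rules.

    path: ordered hostnames from ingress to egress (hypothetical names).
    supported: set of hostnames that support flow analysis.
    """
    if not path:
        return []
    # Rule 1: the ingress and egress devices must support flow analysis.
    if path[0] not in supported or path[-1] not in supported:
        raise ValueError("ingress and egress devices must support flow analysis")
    shown = []
    for i, hop in enumerate(path):
        next_unsupported = i + 1 < len(path) and path[i + 1] not in supported
        # Rule 2: two consecutive unsupported transit devices end discovery.
        if hop not in supported and next_unsupported:
            break
        shown.append(hop)
    return shown

supported = {"leaf01", "leaf02"}
print(visible_hops(["leaf01", "spine01", "leaf02"], supported))
print(visible_hops(["leaf01", "spine01", "spine02", "leaf02"], supported))
```

In the first call, the single unsupported transit device is kept in the path; in the second, discovery stops before the pair of unsupported devices.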
An unsupported device is represented in the flow analysis graph as a black bar lined with red x’s. Flow statistics are not displayed for that device.
Unsupported devices are also designated in the flow graph panel:
Selecting the unsupported device displays device statistics in the left panel if they are available to NetQ. Otherwise, the display will indicate why the device is not supported:
Path discovery will terminate if multiple consecutive switches do not support flow analysis. When additional data is available from switches outside of discovered paths, you can view data from those devices from the menu at the top of the page:
The left panel displays the data, along with ingress and egress ports.
View Device Statistics
You can view latency, buffer occupancy, interface statistics, resource utilization, and WJH events for each device by clicking on a device in the Flow Graph panel, or by clicking on the line associated with a device in the main flow analysis graph. The left panel will then update to reflect statistics for the respective device.
After selecting a device, click to expand the statistics chart:
In this view, you can select additional categories to add to the chart:
The Flow Graph panel allows you to access the topology view, where you can also click the paths and devices to view statistics. Click View in topology to switch to the topology view.
View WJH Events
Flow analysis monitors the path for WJH events and records any drops for the flow. Switches with WJH events recorded are represented in the flow analysis graph as a red bar with white stripes. Hover over the device to see a WJH event summary:
You can also view devices with WJH events in the flow graph panel:
Click on a device with WJH events to see the statistics in the left panel. Hover over the data to reveal the type of drops over time:
WJH drops can also be viewed from the expanded device chart by selecting the WJH category:
Select Show all drops to display a list of all WJH drops for the device:
Validate Overall Network Health
The Validation Summary card in the NetQ UI lets you view the overall health of your network at a glance, giving you a high-level understanding of how well your network is operating. Successful validation results determine overall network health shown in this card.
View Key Metrics of Network Health
Overall network health in the NetQ UI is a calculated average of several key health metrics: system, network services, and interface health.
System health represents the NetQ Agent and sensor health validations. In all cases, validation is performed on the agents. If you are monitoring platform sensors, the validation checks include these as well.
Network service health represents the individual network protocol and services validations. In all cases, validation is performed on NTP. If you are running BGP, EVPN, MLAG, OSPF, or VXLAN protocols, the validation checks include these as well.
Interface health represents the interfaces, VLAN, and link MTU validations.
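To make the idea of a calculated average concrete, the following Python sketch averages the pass rate of the three health categories. The source does not give the exact formula, so the unweighted mean, the function names, and the counts are assumptions for illustration only.

```python
def pass_rate(passed, total):
    """Percentage of validation checks that passed in one category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total

def overall_health(category_rates):
    """Unweighted mean of the per-category pass rates (an assumed formula)."""
    return sum(category_rates.values()) / len(category_rates)

# Hypothetical validation counts for each health category.
rates = {
    "system": pass_rate(20, 20),
    "network_services": pass_rate(45, 50),
    "interface": pass_rate(30, 40),
}
overall = overall_health(rates)
print(f"Overall network health: {overall:.1f}%")
```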
To view network health metrics:
Open or locate the medium Validation Summary card on your workbench.
Each metric displays a distribution of the validation results for each category. Hover over the individual categories to view detailed metrics for specific validation checks.
In this example, system health is good, but network services and interface health display validation failures:
View Detailed Network Health
To view details about your network’s health, open or locate the large Validation Summary card on your workbench.
By default, the System health tab is displayed.
The health of agents and sensors is represented on the left side of the card. Hover over the chart for each type of validation to see detailed results. The right side of the card displays devices with failures related to agents and sensors.
Click the Network service health tab.
The health of each network protocol or service is represented on the left side of the card. Hover over the chart for each type of validation to see detailed results. The right side of the card displays devices with failures related to these protocols and services.
Click the Interface health tab.
The health of interfaces, VLANs, and link MTUs is represented on the left side of the card. Hover over the chart for each type of validation to see detailed results. The right side of the card displays devices with failures related to interfaces, VLANs, and link MTUs.
View Devices with the Most Issues
To view devices with the most issues, select Most failures from the filter above the table on the right.
Devices with the most issues are listed at the top. To further investigate critical devices, click on the hostname to open the device card, or use the Events card and filter on the indicated switches.
View Devices with Recent Issues
To view devices with recent issues, select Recent failures from the filter above the table on the right. The devices with the most-recent failures are listed at the top. To further investigate critical devices, click on the hostname to open the device card, or use the Events card and filter on the indicated switches.
Filter Results by Service
You can focus the data in the table on the right by unselecting one or more services. Clear the checkbox next to the service you want to remove from the data. In this example, MTU is unchecked.
Unselecting the service temporarily removes the data related to that service from the table.
View Details of a Particular Service
From the relevant tab (System Health, Network Service Health, or Interface Health) on the large Validation Summary card, you can select a chart to open a full-screen view of the validation data for that service.
The following example shows the EVPN chart:
View All Network Protocol and Service Validation Results
The Validation Summary card workflow lets you view all of the results of all validations run on the network protocols and services during a designated time period.
To view all the validation results:
Open or locate the full-screen Validation Summary card on your workbench.
Look for patterns in the data. For example, when did nodes, sessions, links, ports, or devices start failing validation? Was it at a specific time? Was it when you started running the service on more nodes? Did sessions fail while nodes were fine?
Where to go next depends on what data you see, but a few options include:
Look for matching event information for the failure points in a given protocol or service.
When you find failures in one protocol, compare with higher level protocols to see if they fail at a similar time (or vice versa with supporting services).
Export the data for use in another analytics tool, by clicking and providing a name for the data file.
Validate Network Protocol and Service Operations
NetQ lets you validate the operation of the protocols and services running in your network either on demand or according to a schedule. For a general understanding of how well your network is operating, refer to Validate Overall Network Health.
On-demand Validations
When you want to validate the operation of one or more network protocols and services right now, you can create and run on-demand validations using the NetQ UI or the NetQ CLI.
Create an On-demand Validation
Using the NetQ UI, you can create an on-demand validation for multiple protocols or services at the same time. This is handy when the protocols are strongly related regarding a possible issue or if you only want to create one validation request.
To create and run a request containing checks on one or more protocols or services within the NetQ UI:
In the workbench header, click Validation, then click Create a validation. Choose whether the on-demand validation should run on all devices or on specific device groups.
On the left side of the card, select the protocols or services you want to validate by clicking on their names, then click Next.
This example shows BGP:
Select Now and specify a workbench:
Click Run to start the check. It might take a few minutes for results to appear if the load on the NetQ system is heavy at the time of the run.
The respective On-demand Validation Result card opens on your workbench. If you selected more than one protocol or service, a card opens for each selection. To view additional information about the errors reported, hover over a check and click View details. To view all data for all on-demand validation results for a given protocol, click Show all results.
To create and run a request containing checks on a single protocol or service all within the NetQ CLI, run the relevant netq check command:
All netq check commands have a summary and test results section. Some have additional summary information.
Using the include <bgp-number-range-list> and exclude <bgp-number-range-list> options of the netq check command, you can include or exclude one or more of the various checks performed during the validation.
First determine the number of the tests you want to include or exclude. Refer to Validation Tests Reference for a description of these tests and to get the test numbers for the tests to include or exclude. You can also get the test numbers and descriptions when you run the netq show unit-tests command.
Then run the netq check command.
The following example shows a BGP validation that includes only the session establishment and router ID tests. Note that you can obtain the same results using either of the include or exclude options and that the test that is not run is marked as skipped.
cumulus@switch:~$ netq show unit-tests bgp
0 : Session Establishment - check if BGP session is in established state
1 : Address Families - check if tx and rx address family advertisement is consistent between peers of a BGP session
2 : Router ID - check for BGP router id conflict in the network
Configured global result filters:
Configured per test result filters:
cumulus@switch:~$ netq check bgp include 0,2
bgp check result summary:
Total nodes : 10
Checked nodes : 10
Failed nodes : 0
Rotten nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 54
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : skipped
Router ID Test : passed
cumulus@switch:~$ netq check bgp exclude 1
bgp check result summary:
Total nodes : 10
Checked nodes : 10
Failed nodes : 0
Rotten nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 54
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : skipped
Router ID Test : passed
To create a request containing checks on a single protocol or service in the NetQ CLI, run:
The associated Validation Result card is accessible from the full-screen Validate Network card.
Run an Existing Scheduled Validation On Demand
To run a scheduled validation now:
Click Validation, then click Existing validations.
Select one or more validations you want to run by clicking their names, then click View results.
The associated Validation Result cards open on your workbench.
On-Demand CLI Validation Examples
This section provides CLI validation examples for a variety of protocols and elements. Refer to Validation Tests Reference for descriptions of these tests.
The duplicate address detection validation tests look for duplicate IPv4 and IPv6 addresses assigned to interfaces across devices in the inventory, and check for duplicate /32 host routes in each VRF.
The default validation confirms that the NetQ Agent is running on all monitored nodes and provides a summary of the validation results. This example shows the results of a fully successful validation.
cumulus@switch:~$ netq check agents
agent check result summary:
Checked nodes : 13
Total nodes : 13
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Agent Health Test : passed
The default validation runs a networkwide BGP connectivity and configuration check on all nodes running the BGP service:
cumulus@switch:~$ netq check bgp
bgp check result summary:
Checked nodes : 8
Total nodes : 8
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 30
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : passed
Router ID Test : passed
This example indicates that all nodes running BGP and all BGP sessions are running properly. If there were issues with any of the nodes, NetQ would provide information about each node to aid in resolving the issues.
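If you capture this output in a script, a small parser can turn it into structured data. The following Python sketch assumes the text layout shown in the example above (one "label : value" pair per line); the function name is hypothetical and the layout might differ between NetQ releases.

```python
def parse_check_output(text):
    """Split netq check text output into numeric summary fields and test results."""
    summary, tests = {}, {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip headings such as "bgp check result summary:"
        key, value = (s.strip() for s in line.split(" : ", 1))
        value = value.rstrip(",").strip()  # some outputs end results with a comma
        if key.endswith(" Test"):
            tests[key] = value
        elif value.isdigit():
            summary[key] = int(value)
    return summary, tests

# Text copied from the BGP example above.
sample = """bgp check result summary:
Checked nodes : 8
Total nodes : 8
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 30
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : passed
Router ID Test : passed
"""

summary, tests = parse_check_output(sample)
print(summary["Total Sessions"], tests["Router ID Test"])
```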
Perform a BGP Validation for a Particular VRF
Using the vrf <vrf> option of the netq check bgp command, you can validate the BGP service where communication is occurring through a particular virtual routing and forwarding (VRF) instance. In this example, the name of the VRF of interest is vrf1.
cumulus@switch:~$ netq check bgp vrf vrf1
bgp check result summary:
Checked nodes : 2
Total nodes : 2
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Total Sessions : 2
Failed Sessions : 0
Session Establishment Test : passed
Address Families Test : passed
Router ID Test : passed
Perform a BGP Validation with Selected Tests
Using the include <bgp-number-range-list> and exclude <bgp-number-range-list> options, you can include or exclude one or more of the various checks performed during the validation. You can select from the following BGP validation tests:
Test Number  Test Name
-----------  ---------------------
0            Session Establishment
1            Address Families
2            Router ID
To include only the session establishment and router ID tests during a validation, run either of these commands:
cumulus@switch:~$ netq check bgp include 0,2
cumulus@switch:~$ netq check bgp exclude 1
The default validation (using no options) checks that all switches in the network have a consistent version.
cumulus@switch:~$ netq check cl-version
version check result summary:
Checked nodes : 12
Total nodes : 12
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Cumulus Linux Image Version Test : passed
The default validation runs a networkwide EVPN connectivity and configuration check on all nodes running the EVPN service. This example shows results for a fully successful validation.
cumulus@switch:~$ netq check evpn
evpn check result summary:
Checked nodes : 6
Total nodes : 6
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Failed BGP Sessions : 0
Total Sessions : 16
Total VNIs : 3
EVPN BGP Session Test : passed,
EVPN VNI Type Consistency Test : passed,
EVPN Type 2 Test : passed,
EVPN Type 3 Test : passed,
EVPN Session Test : passed,
Vlan Consistency Test : passed,
Vrf Consistency Test : passed,
Perform an EVPN Validation with Selected Tests
Using the include <evpn-number-range-list> and exclude <evpn-number-range-list> options, you can include or exclude one or more of the various checks performed during the validation. You can select from the following EVPN validation tests:
Test Number  Test Name
-----------  -------------------------
0            EVPN BGP Session
1            EVPN VNI Type Consistency
2            EVPN Type 2
3            EVPN Type 3
4            EVPN Session
5            L3 VNI RMAC
6            VLAN Consistency
7            VRF Consistency
To run only the EVPN Type 2 test:
cumulus@switch:~$ netq check evpn include 2
evpn check result summary:
Checked nodes : 6
Total nodes : 6
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Failed BGP Sessions : 0
Total Sessions : 0
Total VNIs : 3
EVPN BGP Session Test : skipped
EVPN VNI Type Consistency Test : skipped
EVPN Type 2 Test : passed,
EVPN Type 3 Test : skipped
EVPN Session Test : skipped
Vlan Consistency Test : skipped
Vrf Consistency Test : skipped
To exclude the BGP session and VRF consistency tests:
cumulus@switch:~$ netq check evpn exclude 0,6
evpn check result summary:
Checked nodes : 6
Total nodes : 6
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Failed BGP Sessions : 0
Total Sessions : 0
Total VNIs : 3
EVPN BGP Session Test : skipped
EVPN VNI Type Consistency Test : passed,
EVPN Type 2 Test : passed,
EVPN Type 3 Test : passed,
EVPN Session Test : passed,
Vlan Consistency Test : passed,
Vrf Consistency Test : skipped
To run only the first five tests:
cumulus@switch:~$ netq check evpn include 0-4
evpn check result summary:
Checked nodes : 6
Total nodes : 6
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Failed BGP Sessions : 0
Total Sessions : 16
Total VNIs : 3
EVPN BGP Session Test : passed,
EVPN VNI Type Consistency Test : passed,
EVPN Type 2 Test : passed,
EVPN Type 3 Test : passed,
EVPN Session Test : passed,
Vlan Consistency Test : skipped
Vrf Consistency Test : skipped
The default validation runs a networkwide connectivity and configuration check on all interfaces. This example shows results for a fully successful validation.
cumulus@switch:~$ netq check interfaces
interface check result summary:
Checked nodes : 12
Total nodes : 12
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Unverified Ports : 56
Checked Ports : 108
Failed Ports : 0
Admin State Test : passed,
Oper State Test : passed,
Speed Test : passed,
Autoneg Test : passed,
Perform an Interfaces Validation with Selected Tests
Using the include <interface-number-range-list> and exclude <interface-number-range-list> options, you can include or exclude one or more of the various checks performed during the validation. You can select from the following interface validation tests:
Test Number  Test Name
-----------  -----------
0            Admin State
1            Oper State
2            Speed
3            Autoneg
The default validation verifies that all corresponding interface links have matching MTUs. This example shows no mismatches.
cumulus@switch:~$ netq check mtu
mtu check result summary:
Checked nodes : 12
Total nodes : 12
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Warn Links : 0
Failed Links : 0
Checked Links : 196
Link MTU Consistency Test : passed,
VLAN interface Test : passed,
Bridge interface Test : passed,
The default validation runs a networkwide MLAG connectivity and configuration check on all nodes running the MLAG service. This example shows results for a fully successful validation.
cumulus@switch:~$ netq check mlag
mlag check result summary:
Checked nodes : 4
Total nodes : 4
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Peering Test : passed,
Backup IP Test : passed,
Clag SysMac Test : passed,
VXLAN Anycast IP Test : passed,
Bridge Membership Test : passed,
Spanning Tree Test : passed,
Dual Home Test : passed,
Single Home Test : passed,
Conflicted Bonds Test : passed,
ProtoDown Bonds Test : passed,
SVI Test : passed,
You can also run this check using netq check clag and get the same results.
This example shows representative results for one or more failures, warnings, or errors. In particular, you can see that you have duplicate system MAC addresses.
cumulus@switch:~$ netq check mlag
mlag check result summary:
Checked nodes : 4
Total nodes : 4
Rotten nodes : 0
Failed nodes : 2
Warning nodes : 0
Peering Test : passed,
Backup IP Test : passed,
Clag SysMac Test : 0 warnings, 2 errors,
VXLAN Anycast IP Test : passed,
Bridge Membership Test : passed,
Spanning Tree Test : passed,
Dual Home Test : passed,
Single Home Test : passed,
Conflicted Bonds Test : passed,
ProtoDown Bonds Test : passed,
SVI Test : passed,
Clag SysMac Test details:
Hostname Reason
----------------- ---------------------------------------------
leaf01 Duplicate sysmac with leaf02/None
leaf03 Duplicate sysmac with leaf04/None
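When you process results like these in a script, it helps to normalize each test's status string. The following Python sketch interprets the status formats that appear in the examples in this section (passed, skipped, and "N warnings, M errors"); the function name and the returned field names are hypothetical.

```python
import re

def parse_test_status(status):
    """Convert a netq check test status string into a small dictionary."""
    status = status.strip().rstrip(",").strip()  # drop any trailing comma
    if status in ("passed", "skipped"):
        return {"state": status, "warnings": 0, "errors": 0}
    match = re.fullmatch(r"(\d+) warnings?, (\d+) errors?", status)
    if match is None:
        raise ValueError(f"unrecognized status: {status!r}")
    warnings, errors = int(match.group(1)), int(match.group(2))
    if errors:
        state = "failed"
    elif warnings:
        state = "warning"
    else:
        state = "passed"
    return {"state": state, "warnings": warnings, "errors": errors}

print(parse_test_status("0 warnings, 2 errors,"))
print(parse_test_status("passed,"))
```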
Perform an MLAG Validation with Selected Tests
Using the include <mlag-number-range-list> and exclude <mlag-number-range-list> options, you can include or exclude one or more of the various checks performed during the validation. You can select from the following MLAG validation tests:
Test Number  Test Name
-----------  -----------------
0            Peering
1            Backup IP
2            Clag Sysmac
3            VXLAN Anycast IP
4            Bridge Membership
5            Spanning Tree
6            Dual Home
7            Single Home
8            Conflicted Bonds
9            ProtoDown Bonds
10           SVI
To include only the CLAG SysMAC test during a validation:
cumulus@switch:~$ netq check mlag include 2
mlag check result summary:
Checked nodes : 4
Total nodes : 4
Rotten nodes : 0
Failed nodes : 2
Warning nodes : 0
Peering Test : skipped
Backup IP Test : skipped
Clag SysMac Test : 0 warnings, 2 errors,
VXLAN Anycast IP Test : skipped
Bridge Membership Test : skipped
Spanning Tree Test : skipped
Dual Home Test : skipped
Single Home Test : skipped
Conflicted Bonds Test : skipped
ProtoDown Bonds Test : skipped
SVI Test : skipped
Clag SysMac Test details:
Hostname Reason
----------------- ---------------------------------------------
leaf01 Duplicate sysmac with leaf02/None
leaf03 Duplicate sysmac with leaf04/None
To exclude the backup IP, CLAG SysMAC, and VXLAN anycast IP tests during a validation:
cumulus@switch:~$ netq check mlag exclude 1-3
mlag check result summary:
Checked nodes : 4
Total nodes : 4
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Peering Test : passed,
Backup IP Test : skipped
Clag SysMac Test : skipped
VXLAN Anycast IP Test : skipped
Bridge Membership Test : passed,
Spanning Tree Test : passed,
Dual Home Test : passed,
Single Home Test : passed,
Conflicted Bonds Test : passed,
ProtoDown Bonds Test : passed,
SVI Test : passed,
The default validation checks for synchronization of the NTP server with all nodes in the network. It is always important to have your devices in time synchronization to ensure that you can track configuration and management events and can make correlations between events.
The default validation runs a networkwide OSPF connectivity and configuration check on all nodes running the OSPF service. This example shows errors in the timers and interface MTU tests.
cumulus@switch:~# netq check ospf
Checked nodes: 8, Total nodes: 8, Rotten nodes: 0, Failed nodes: 4, Warning nodes: 0, Failed Adjacencies: 4, Total Adjacencies: 24
Router ID Test : passed
Adjacency Test : passed
Timers Test : 0 warnings, 4 errors
Network Type Test : passed
Area ID Test : passed
Interface Mtu Test : 0 warnings, 2 errors
Service Status Test : passed
Timers Test details:
Hostname Interface PeerID Peer IP Reason Last Changed
----------------- ------------------------- ------------------------- ------------------------- --------------------------------------------- -------------------------
spine-1 downlink-4 torc-22 uplink-1 dead time mismatch Mon Jul 1 16:18:33 2019
spine-1 downlink-4 torc-22 uplink-1 hello time mismatch Mon Jul 1 16:18:33 2019
torc-22 uplink-1 spine-1 downlink-4 dead time mismatch Mon Jul 1 16:19:21 2019
torc-22 uplink-1 spine-1 downlink-4 hello time mismatch Mon Jul 1 16:19:21 2019
Interface Mtu Test details:
Hostname Interface PeerID Peer IP Reason Last Changed
----------------- ------------------------- ------------------------- ------------------------- --------------------------------------------- -------------------------
spine-2 downlink-6 0.0.0.22 27.0.0.22 mtu mismatch Mon Jul 1 16:19:02 2019
tor-2 uplink-2 0.0.0.20 27.0.0.20 mtu mismatch Mon Jul 1 16:19:37 2019
The RoCE validation tests look for consistent RoCE and QoS configurations across nodes.
cumulus@switch:mgmt:~$ netq check roce
roce check result summary:
Total nodes : 12
Checked nodes : 12
Failed nodes : 0
Rotten nodes : 0
Warning nodes : 0
Skipped nodes : 0
RoCE mode Test : passed
RoCE Classification Test : passed
RoCE Congestion Control Test : passed
RoCE Flow Control Test : passed
RoCE ETS mode Test : passed
Hardware platforms have a number of sensors that provide environmental data about the switches. Confirming that these are all within range is a good checkpoint for maintenance.
For example, if you had a temporary HVAC failure and you have concerns that some of your nodes are beginning to overheat, you can run this validation to determine if any switches have already reached the maximum temperature threshold.
cumulus@switch:~$ netq check sensors
sensors check result summary:
Checked nodes : 8
Total nodes : 8
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Checked Sensors : 136
Failed Sensors : 0
PSU sensors Test : passed,
Fan sensors Test : passed,
Temperature sensors Test : passed,
Validate that VLANs are configured consistently and operating properly:
cumulus@switch:~$ netq check vlan
vlan check result summary:
Checked nodes : 12
Total nodes : 12
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Additional summary:
Failed Link Count : 0
Total Link Count : 196
Link Neighbor VLAN Consistency Test : passed,
Clag Bond VLAN Consistency Test : passed,
Validate that VXLANs are configured consistently and operating properly:
cumulus@switch:~$ netq check vxlan
vxlan check result summary:
Checked nodes : 6
Total nodes : 6
Rotten nodes : 0
Failed nodes : 0
Warning nodes : 0
Vlan Consistency Test : passed,
BUM replication Test : passed,
This command validates both asymmetric and symmetric VXLAN configurations.
Scheduled Validations
When you want to see validation results on a regular basis, it is useful to configure a scheduled validation request to avoid re-creating the request each time. You can create up to 15 scheduled validations for a given NetQ system.
By default, a scheduled validation for each protocol and service runs every hour. You do not need to create a scheduled validation for these unless you want it to run at a different interval. You cannot remove the default validations, but they do not count as part of the 15-validation limit.
Schedule a Validation
You might want to create a scheduled validation that runs more often than the default validation if you are investigating an issue with a protocol or service. You might also want to create a scheduled validation that runs less often than the default validation if you are interested in a longer-term performance trend.
Sometimes it is useful to run validations on more than one protocol simultaneously. This gives a view into any potential relationship between the status of the protocols or services. For example, you might want to compare NTP with Agent validations if NetQ Agents are losing connectivity or the data appears to be collected at the wrong time. Comparing the two helps determine whether loss of time synchronization is causing the issue. Simultaneous validations are displayed in the NetQ UI.
Click Validation, then click Create a validation. Choose whether the scheduled validation should run on all devices or on specific device groups.
On the left side of the card, select the protocols or services you want to validate by clicking on their names, then click Next.
Click Later then choose when to start the check and how frequently to repeat the check (every 30 minutes, 1 hour, 3 hours, 6 hours, 12 hours, or 1 day).
Click Schedule.
To see the card with the other network validations, click View. If you selected more than one protocol or service, a card opens for each selection. To view the card on your workbench, click Open card.
To create a scheduled request containing checks on a single protocol or service in the NetQ CLI, run:
netq add validation name <text-new-validation-name> type (ntp | interfaces | license | sensors | evpn | vxlan | agents | mlag | vlan | bgp | mtu | ospf | roce | addr) interval <text-time-min> [alert-on-failure]
This example shows the creation of a BGP validation run every 15 minutes for debugging.
cumulus@switch:~$ netq add validation name Bgp15m type bgp interval 15m
Successfully added Bgp15m running every 15m
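If you create scheduled validations from a script, you can assemble the command from the syntax shown above. The following Python sketch only builds the command string and does not run it. The helper name is hypothetical, and the interval check (digits followed by m) is an assumption based on the 15m example.

```python
import re

# Validation types taken from the command syntax shown above.
VALIDATION_TYPES = {
    "ntp", "interfaces", "license", "sensors", "evpn", "vxlan", "agents",
    "mlag", "vlan", "bgp", "mtu", "ospf", "roce", "addr",
}

def build_add_validation(name, vtype, interval, alert_on_failure=False):
    """Return a netq add validation command line as a string."""
    if vtype not in VALIDATION_TYPES:
        raise ValueError(f"unsupported validation type: {vtype}")
    # Assumed interval format: a number of minutes followed by "m", such as 15m.
    if not re.fullmatch(r"\d+m", interval):
        raise ValueError(f"unexpected interval format: {interval}")
    parts = ["netq", "add", "validation", "name", name,
             "type", vtype, "interval", interval]
    if alert_on_failure:
        parts.append("alert-on-failure")
    return " ".join(parts)

print(build_add_validation("Bgp15m", "bgp", "15m"))
```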
The associated Validation Result card is accessible from the full-screen Scheduled Validation Result card.
View Scheduled Validation Results
After creating scheduled validations with either the NetQ UI or the NetQ CLI, the results appear in the Scheduled Validation Result card. When a request has completed processing, you can access the Validation Result card from the full-screen Validations card. Each protocol and service has its own validation result card, but the content is similar on each.
Granularity of Data Shown Based on Time Period
On the medium and large Validation Result cards, vertically stacked heat maps represent the status of the runs: one for passing runs, one for runs with warnings, and one for runs with failures. The number of time blocks in each heat map depends on the time period of data shown on the card. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. Results are indicated by how saturated the color is for each block. If all validations during that time period pass, then the middle block is 100% saturated (white) and the warning and failure blocks are 0% saturated (gray). As warnings and errors increase in saturation, the passing block is proportionally reduced in saturation. The example heat map shown here covers a time period of 24 hours. The following table lists the most common time periods along with the resulting number of runs and time blocks.
Time Period  Number of Runs  Number of Time Blocks  Amount of Time in Each Block
-----------  --------------  ---------------------  ----------------------------
6 hours      18              6                      1 hour
12 hours     36              12                     1 hour
24 hours     72              24                     1 hour
1 week       504             7                      1 day
1 month      2,086           30                     1 day
1 quarter    7,000           13                     1 week
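The relationship between runs, time blocks, and color saturation can be sketched in Python. The run and block counts come from the table above; treating the saturation of each block as the fraction of runs with that status is an assumption for illustration, and the function names are hypothetical.

```python
def runs_per_block(num_runs, num_blocks):
    """Average number of validation runs summarized by one time block."""
    return num_runs / num_blocks

def block_saturation(passed, warned, failed):
    """Fraction of runs in a block with each status (assumed saturation model)."""
    total = passed + warned + failed
    if total == 0:
        raise ValueError("block contains no runs")
    return {
        "pass": passed / total,
        "warning": warned / total,
        "failure": failed / total,
    }

# 24-hour period from the table: 72 runs across 24 one-hour blocks.
print(runs_per_block(72, 24))
# A block where every run passed: the passing block is fully saturated.
print(block_saturation(3, 0, 0))
# A block with one failed run: the passing block loses saturation in proportion.
print(block_saturation(2, 0, 1))
```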
Access and Analyze the Scheduled Validation Results
After a scheduled validation request has completed, the results are available in the corresponding Validation Result card.
To access the results:
In the workbench header, select Validation, then click Existing validations.
Select the validation results you want to view, then click View results.
The medium Scheduled Validation Result card(s) for the selected items appear on your workbench.
To analyze the results:
Note the distribution of results. Are there many failures? Are they concentrated together in time? Has the protocol or service recovered after the failures?
Hover over the heat maps to view the status numbers and what percentage of the total results that represents for a given region. The tooltip also shows the number of devices included in the validation and the number with warnings and/or failures. This is useful when you see the failures occurring on a small set of devices, as it might point to an issue with the devices rather than the network service.
Switch to the large Scheduled Validation card using the card size picker.
The card displays a chart alongside events metrics. Click to expand or collapse the chart.
You can view the configuration of the request that produced the results shown on this card by hovering over the card and clicking the icon that appears.
To view all data available for all scheduled validation results for the given protocol or service, switch to the full-screen card.
In the Checks box, hover over an individual check and select View details for additional information.
Manage Scheduled Validations
You can edit or delete any scheduled validation that you created. Default validations cannot be edited or deleted, but can be disabled.
Edit a Scheduled Validation
You can change the schedule or the validation types specified in a scheduled validation request. Editing creates a new validation request; the original validation has the (old) label applied to its name and can no longer be edited.
When you update a scheduled request, the results for all future runs of the validation will be different from the results of previous runs of the validation.
To edit a scheduled validation:
Click Validation, then click Scheduled validations.
Hover over the validation then click Edit.
Select which checks to add or remove from the validation request, then click Update.
Change the schedule for the validation, then click Update.
You can run the modified validation immediately or wait for it to run according to the schedule you specified.
Delete a Scheduled Validation
You can remove a user-defined scheduled validation using the NetQ UI or the NetQ CLI. Default validations cannot be deleted, but they can be disabled.
Click Validation, then click Scheduled validations.
Hover over the validation you want to remove.
Click the icon to delete the validation, then click Yes to confirm.
To disable a default validation, select the icon on the card for the desired validation and select Disable validation. Validation checks can be enabled from the same menu.
To remove a scheduled validation using the NetQ CLI, first determine the name of the scheduled validation you want to remove:
netq show validation summary [name <text-validation-name>] type (ntp | interfaces | license | sensors | evpn | vxlan | agents | mlag | vlan | bgp | mtu | ospf | roce | addr) [around <text-time-hr>] [json]
This example shows all scheduled validations for BGP.
cumulus@switch:~$ netq show validation summary type bgp
Name Type Job ID Checked Nodes Failed Nodes Total Nodes Timestamp
--------------- ---------------- ------------ -------------------------- ------------------------ ---------------------- -------------------------
Bgp30m scheduled 4c78cdf3-24a 0 0 0 Thu Nov 12 20:38:20 2020
6-4ecb-a39d-
0c2ec265505f
Bgp15m scheduled 2e891464-637 10 0 10 Thu Nov 12 20:28:58 2020
a-4e89-a692-
3bf5f7c8fd2a
Bgp30m scheduled 4c78cdf3-24a 0 0 0 Thu Nov 12 20:24:14 2020
6-4ecb-a39d-
0c2ec265505f
Bgp30m scheduled 4c78cdf3-24a 0 0 0 Thu Nov 12 20:15:20 2020
6-4ecb-a39d-
0c2ec265505f
Bgp15m scheduled 2e891464-637 10 0 10 Thu Nov 12 20:13:57 2020
a-4e89-a692-
3bf5f7c8fd2a
...
To remove the validation, run:
netq del validation <text-validation-name>
This example removes the scheduled validation named Bgp15m.
cumulus@switch:~$ netq del validation Bgp15m
Successfully deleted validation Bgp15m
Repeat these steps to remove additional scheduled validations.
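When the summary contains many runs, it can help to reduce it to the distinct validation names before choosing one to delete. The following is a minimal sketch, not part of NetQ: the function name list_scheduled_validations is hypothetical, and it assumes the text layout shown in the example above (name in the first column, the word scheduled in the second, and no spaces in the name).

```shell
# Sketch: list the distinct scheduled validation names found in the text
# output of "netq show validation summary". Assumes the column layout shown
# in the example above: name in column 1, the word "scheduled" in column 2.
# Wrapped job ID fragments on continuation lines are ignored because their
# second column is not "scheduled".
list_scheduled_validations() {
    awk '$2 == "scheduled" { print $1 }' | sort -u
}

# Example use on a live system (not run here):
#   netq show validation summary type bgp | list_scheduled_validations
```

For the BGP example above, this would print Bgp15m and Bgp30m, either of which can then be passed to netq del validation.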
Validate Device Groups
Both on-demand and scheduled validations can run on specific device groups. To create a validation for a device group rather than all devices:
Click Validation, then click Create a validation. Choose Run on group of switches.
Select which group to run the validation on.
Select the protocols or services you want to validate, then click Next.
Select which individual validations to run for each service. You can disable individual checks by clicking the corresponding icon.
Choose whether to run the validation now or schedule it for another time, then click Run.
Verify Network Connectivity
You can verify the connectivity between two devices either on demand or by defining connectivity checks that run on a schedule.
Specifying Source and Destination Values
When specifying traces, the following options are available for the source and destination values:
Trace Type | Source | Destination
Layer 2 | Hostname | MAC address plus VLAN
Layer 2 | IPv4/IPv6 address plus VRF (if not default) | MAC address plus VLAN
Layer 2 | MAC address | MAC address plus VLAN
Layer 3 | Hostname | IPv4/IPv6 address
Layer 3 | IPv4/IPv6 address plus VRF (if not default) | IPv4/IPv6 address
If you use an IPv6 address, you must enter the complete, non-truncated address.
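A quick pre-check can catch a shortened IPv6 address before you submit a trace. The sketch below is illustrative only: the function name is_full_ipv6 is hypothetical, and it interprets "complete, non-truncated" as eight colon-separated hexadecimal groups with no :: shorthand. Whether NetQ also requires leading zeros in each group is not stated here, so the check accepts one to four digits per group.

```shell
# Returns success (0) only for an IPv6 address written as eight
# colon-separated hexadecimal groups with no "::" shorthand.
# Assumption: this is what "complete, non-truncated" means in practice.
is_full_ipv6() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}$'
}

# Example use:
#   is_full_ipv6 "2001:db8:0:0:0:0:0:1" && echo "address is written in full"
```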
Known Addresses
The tracing function only knows about previously learned addresses. If you find that a path is invalid or incomplete, ping the identified device so that its address becomes known.
Create On-demand Traces
You can view the current connectivity between two devices in your network by creating an on-demand trace. You can perform these traces at layer 2 or layer 3 using the NetQ UI or the NetQ CLI.
Create a Layer 3 On-demand Trace Request
It is helpful to verify the connectivity between two devices when you suspect an issue is preventing proper communication between them. If you cannot find a layer 3 path, you might also try checking connectivity through a layer 2 path.
Determine the IP addresses of the two devices you want to trace.
Click Menu, then select IP addresses.
Select Filter and enter a hostname.
From the list of results, note the relevant address.
Filter the list again for the other hostname, and note its address.
Open the Trace Request card.
On a new workbench: Type trace in the Global search field and select the card.
On a current workbench: Click Add card, then select the Trace card.
In the Source field, enter the hostname or IP address of the device where you want to start the trace.
In the Destination field, enter the IP address of the device where you want to end the trace.
If you mistype an address, you must double-click it, or backspace over the error, and retype the address. You cannot select the address by dragging over it as this action attempts to move the card to another location.
Click Run now. A corresponding Trace Results card is opened on your workbench.
Use the netq trace command to view the results in the terminal window. Use the netq add trace command to view the results in the NetQ UI.
To create a layer 3 on-demand trace and see the results in the terminal window, run:
Note the syntax requires the destination device address first and then the source device address or hostname.
This example shows a trace from 10.10.10.1 (source, leaf01) to 10.10.10.63 (destination, border01) on the underlay in pretty output. You could have used leaf01 as the source instead of its IP address. The example first identifies the addresses for the source and destination devices using netq show ip addresses then runs the trace.
cumulus@switch:~$ netq border01 show ip addresses
Matching address records:
Address Hostname Interface VRF Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
192.168.200.63/24 border01 eth0 Tue Nov 3 15:45:31 2020
10.0.1.254/32 border01 lo default Mon Nov 2 22:28:54 2020
10.10.10.63/32 border01 lo default Mon Nov 2 22:28:54 2020
cumulus@switch:~$ netq trace 10.10.10.63 from 10.10.10.1 pretty
Number of Paths: 12
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9216
leaf01 swp54 -- swp1 spine04 swp6 -- swp54 border02 peerlink.4094 -- peerlink.4094 border01 lo
peerlink.4094 -- peerlink.4094 border01 lo
leaf01 swp53 -- swp1 spine03 swp6 -- swp53 border02 peerlink.4094 -- peerlink.4094 border01 lo
peerlink.4094 -- peerlink.4094 border01 lo
leaf01 swp52 -- swp1 spine02 swp6 -- swp52 border02 peerlink.4094 -- peerlink.4094 border01 lo
peerlink.4094 -- peerlink.4094 border01 lo
leaf01 swp51 -- swp1 spine01 swp6 -- swp51 border02 peerlink.4094 -- peerlink.4094 border01 lo
peerlink.4094 -- peerlink.4094 border01 lo
leaf01 swp54 -- swp1 spine04 swp5 -- swp54 border01 lo
leaf01 swp53 -- swp1 spine03 swp5 -- swp53 border01 lo
leaf01 swp52 -- swp1 spine02 swp5 -- swp52 border01 lo
leaf01 swp51 -- swp1 spine01 swp5 -- swp51 border01 lo
Each row of the pretty output shows one of the 12 available paths, with each path described by hops using the following format:
source hostname and source egress port – ingress port of first hop and device hostname and egress port – n*(ingress port of next hop and device hostname and egress port) – ingress port of destination device hostname
In this example, eight of the 12 paths use four hops to get to the destination and the other four use three hops. The overall MTU for all paths is 9216. No errors or warnings are present on any of the paths.
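If you run traces from a script, the summary lines at the top of the pretty output are enough to decide whether any path needs attention. The following sketch assumes the summary wording shown in the example above; the function name trace_is_clean is hypothetical and not part of the NetQ CLI.

```shell
# Sketch: succeed only when the summary lines of "netq trace ... pretty"
# report zero paths with errors and zero paths with warnings.
# Fails if either summary line is missing from the input.
trace_is_clean() {
    awk -F': *' '
        /Number of Paths with Errors:/   { errors = $2; seen++ }
        /Number of Paths with Warnings:/ { warnings = $2; seen++ }
        END { exit !(seen == 2 && errors == 0 && warnings == 0) }
    '
}

# Example use on a live system (not run here):
#   netq trace 10.10.10.63 from 10.10.10.1 pretty | trace_is_clean \
#       || echo "trace reported errors or warnings"
```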
To create a layer 3 on-demand trace and see the results in the On-demand Trace Results card, run:
netq add trace <ip> from (<src-hostname> | <ip-src>) [alert-on-failure]
This example shows a trace from 10.10.10.1 (source, leaf01) to 10.10.10.63 (destination, border01).
cumulus@switch:~$ netq add trace 10.10.10.63 from 10.10.10.1
Note the syntax requires the destination device address first and then the source device address or hostname.
This example shows a trace from 10.1.10.101 (source, server01) to 10.1.10.104 (destination, server04) through VRF RED in detail output. It first identifies the addresses for the source and destination devices and a VRF between them using netq show ip addresses then runs the trace. Note that the VRF name is case sensitive. The trace job might take some time to compile all the available paths, especially if there are many of them.
To create a layer 3 on-demand trace through a given VRF and see the results in the On-demand Trace Results card, run:
netq add trace <ip> from (<src-hostname> | <ip-src>) vrf <vrf>
This example shows a trace from 10.1.10.101 (source, server01) to 10.1.10.104 (destination, server04) through VRF RED.
cumulus@switch:~$ netq add trace 10.1.10.104 from 10.1.10.101 vrf RED
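Because the syntax puts the destination before the source, it is easy to swap the two by accident. One way to guard against that is a small wrapper that takes the arguments in source-first order and prints the command in the order NetQ expects. This is a sketch under that assumption; l3_add_trace_cmd is a hypothetical helper name, and it only prints the command rather than running it.

```shell
# Sketch: build a layer 3 "netq add trace" command from a source and a
# destination given in natural (source-first) order. The optional third
# argument is a VRF name. The helper only prints the command.
l3_add_trace_cmd() {
    src=$1
    dst=$2
    vrf=${3:-}
    cmd="netq add trace $dst from $src"
    if [ -n "$vrf" ]; then
        cmd="$cmd vrf $vrf"
    fi
    printf '%s\n' "$cmd"
}

# Example use:
#   l3_add_trace_cmd 10.1.10.101 10.1.10.104 RED
```

Called with the addresses from the example above, the helper prints the same command line shown there.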
Create a Layer 2 On-demand Trace
It is helpful to verify the connectivity between two devices when you suspect an issue is preventing proper communication between them. If you cannot find a layer 2 path, you might also try checking connectivity through a layer 3 path.
Note the syntax requires the destination device address first and then the source device address or hostname.
This example shows a trace from 44:38:39:00:00:32 (source, server01) to 44:38:39:00:00:3e (destination, server04) through VLAN 10 in detail output. It first identifies the MAC addresses for the two devices using netq show ip neighbors. Then it determines the VLAN using netq show macs. Then it runs the trace.
Use the netq add trace command to view on-demand trace results in the NetQ UI.
To create a layer 2 on-demand trace and see the results in the On-demand Trace Results card, run:
netq add trace <mac> vlan <1-4096> from <mac-src>
This example shows a trace from 44:38:39:00:00:32 (source, server01) to 44:38:39:00:00:3e (destination, server04) through VLAN 10.
cumulus@switch:~$ netq add trace 44:38:39:00:00:3e vlan 10 from 44:38:39:00:00:32
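A layer 2 trace needs a VLAN ID in addition to the two MAC addresses, and the syntax above gives the accepted range as 1-4096. The sketch below checks that range before printing the command. The helper name l2_add_trace_cmd is hypothetical; it takes the source first and the destination second, then emits them in the destination-first order the CLI expects.

```shell
# Sketch: validate the VLAN ID against the 1-4096 range given in the
# command syntax, then print a layer 2 "netq add trace" command.
# Arguments: source MAC, destination MAC, VLAN ID.
l2_add_trace_cmd() {
    src=$1
    dst_mac=$2
    vlan=$3
    case $vlan in
        ''|*[!0-9]*) echo "VLAN must be a number" >&2; return 1 ;;
    esac
    if [ "$vlan" -lt 1 ] || [ "$vlan" -gt 4096 ]; then
        echo "VLAN must be in the range 1-4096" >&2
        return 1
    fi
    printf 'netq add trace %s vlan %s from %s\n' "$dst_mac" "$vlan" "$src"
}

# Example use:
#   l2_add_trace_cmd 44:38:39:00:00:32 44:38:39:00:00:3e 10
```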
View On-demand Trace Results
After you have started an on-demand trace or run the netq add trace command, the results appear in either the UI or CLI. In the CLI, run the netq show trace results command. In the UI, locate the On-demand Trace Result card:
After you click Run Now, the corresponding results card opens on your workbench. While it is working on the trace, a notice appears on the card indicating it is running.
After it is finished, the results are displayed. The following results use the example previously outlined:
To view additional information:
Expand the card to its largest size and double-click a trace to open the detailed view:
This view displays:
Configuration details for the trace:
Errors and warnings for all paths (visible above the table). If the trace was run on a Mellanox switch and What Just Happened drops were detected, they are also included here.
Path details: walk through the path, host by host, viewing the interfaces, ports, tunnels, VLANs, and so forth used to traverse the network from the source to the destination. Scroll down to view all paths.
Note that in our example, paths 9-12 have only three hops because they do not traverse the border02 switch, but go directly from a spine switch to border01. Routing would likely choose these paths over the four-hop paths.
Create Scheduled Traces
There might be paths through your network that you consider critical or particularly important to your everyday operations. In these cases, it might be useful to create one or more traces to periodically confirm that at least one path is available between the relevant two devices. You can create scheduled traces at layer 2 or layer 3 in your network, from the NetQ UI and the NetQ CLI.
Select a timeframe under Schedule to specify how often you want to run the trace.
Accept the default starting time, or click in the Starting field to specify the day you want the trace to run for the first time.
Verify your entries are correct, then click Save As new.
Provide a name for the trace. Note: This name must be unique for a given user.
Click Save.
You can now run this trace on demand by selecting it from the dropdown list, or wait for it to run on its defined schedule.
To create a layer 3 scheduled trace and see the results in the Scheduled Trace Results card, run:
netq add trace name <text-new-trace-name> <ip> from (<src-hostname>|<ip-src>) interval <text-time-min>
This example shows the creation of a scheduled trace between leaf01 (source, 10.10.10.1) and border01 (destination, 10.10.10.63) with a name of Lf01toBor01Daily that runs on a daily basis. The interval option value is 1440 minutes, as denoted by the units indicator (m).
cumulus@switch:~$ netq add trace name Lf01toBor01Daily 10.10.10.63 from 10.10.10.1 interval 1440m
Successfully added/updated Lf01toBor01Daily running every 1440m
View the results in the NetQ UI.
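The interval option takes a value in minutes, so a daily schedule is 24 x 60 = 1440m and a three-hour schedule is 3 x 60 = 180m. The following sketch does that conversion; to_interval is a hypothetical helper name, and the h and d unit letters are conventions of this helper only, not NetQ options.

```shell
# Sketch: convert a count of minutes (m), hours (h), or days (d) into the
# minute-based value used by the interval option, for example 1 d -> 1440m.
to_interval() {
    count=$1
    unit=$2
    case $unit in
        m) minutes=$count ;;
        h) minutes=$((count * 60)) ;;
        d) minutes=$((count * 1440)) ;;
        *) echo "unit must be m, h, or d" >&2; return 1 ;;
    esac
    printf '%sm\n' "$minutes"
}

# Example use on a live system (not run here):
#   netq add trace name Lf01toBor01Daily 10.10.10.63 from 10.10.10.1 \
#       interval "$(to_interval 1 d)"
```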
Create a Layer 3 Scheduled Trace through a Given VRF
Enter a VRF interface if you are using anything other than the default VRF.
Select a timeframe under Schedule to specify how often you want to run the trace.
Accept the default starting time, or click in the Starting field to specify the day you want the trace to run for the first time.
Verify your entries are correct, then click Save As new.
Provide a name for the trace. Note: This name must be unique for a given user.
Click Save.
You can now run this trace on demand by selecting it from the dropdown list, or wait for it to run on its defined schedule.
To create a layer 3 scheduled trace that uses a VRF other than default and then see the results in the Scheduled Trace Results card, run:
netq add trace name <text-new-trace-name> <ip> from (<src-hostname>|<ip-src>) vrf <vrf> interval <text-time-min>
This example shows the creation of a scheduled trace between server01 (source, 10.1.10.101) and server04 (destination, 10.1.10.104) with a name of Svr01toSvr04Hrly that runs on an hourly basis. The interval option value is 60 minutes, as denoted by the units indicator (m).
cumulus@switch:~$ netq add trace name Svr01toSvr04Hrly 10.1.10.104 from 10.1.10.101 interval 60m
Successfully added/updated Svr01toSvr04Hrly running every 60m
In the VLAN field, enter the VLAN ID associated with the destination device.
Select a timeframe under Schedule to specify how often you want to run the trace.
Accept the default starting time, or click in the Starting field to specify the day you want the trace to run for the first time.
Verify your entries are correct, then click Save As new.
Provide a name for the trace. Note: This name must be unique for a given user.
Click Save.
You can now run this trace on demand by selecting it from the dropdown list, or wait for it to run on its defined schedule.
To create a layer 2 scheduled trace and then see the results in the Scheduled Trace Result card, run:
netq add trace name <text-new-trace-name> <mac> vlan <1-4096> from (<src-hostname> | <ip-src>) [vrf <vrf>] interval <text-time-min>
This example shows the creation of a scheduled trace between server01 (source, 10.1.10.101) and server04 (destination, 44:38:39:00:00:3e) on VLAN 10 with a name of Svr01toSvr04x3Hrs that runs every three hours. The interval option value is 180 minutes, as denoted by the units indicator (m).
cumulus@switch:~$ netq add trace name Svr01toSvr04x3Hrs 44:38:39:00:00:3e vlan 10 from 10.1.10.101 interval 180m
Successfully added/updated Svr01toSvr04x3Hrs running every 180m
View the results in the NetQ UI.
Run a Scheduled Trace On-demand
To run a scheduled trace now:
Open the Trace Request card.
Select the scheduled trace from the Select Trace or New Trace Request list. Note: In the medium and large cards, the trace details are filled in on selection of the scheduled trace.
Click Go or Run Now. A corresponding Trace Results card is opened on your workbench.
View Scheduled Trace Results
The results of scheduled traces are displayed on the Scheduled Trace Result card.
Granularity of Data Shown Based on Time Period
On the medium and large Trace Result cards, the status of the runs is represented in heat maps stacked vertically; one for runs with warnings and one for runs with failures. Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The results are shown by how saturated the color is for each block. If all traces run during that time period pass, then both blocks are 100% gray. If there are only failures, the associated lower blocks are 100% saturated white and the warning blocks are 100% saturated gray. As warnings and failures increase, the blocks increase their white saturation. As warnings or failures decrease, the blocks increase their gray saturation. An example heat map for a time period of 24 hours is shown here with the most common time periods in the table showing the resulting time blocks.
Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block
6 hours | 18 | 6 | 1 hour
12 hours | 36 | 12 | 1 hour
24 hours | 72 | 24 | 1 hour
1 week | 504 | 7 | 1 day
1 month | 2,086 | 30 | 1 day
1 quarter | 7,000 | 13 | 1 week
View Detailed Scheduled Trace Results
After a scheduled trace request has completed, the results are available in the corresponding Trace Results card.
To view the results:
Open the Trace Request card.
Click Add card and select the Trace card.
Change to the full-screen card using the card size picker to view all scheduled traces.
Select the scheduled trace results you want to view.
Click Open card. This opens the medium Scheduled Trace Results card(s) for the selected items.
Note the distribution of results. Are there many failures? Are they concentrated together in time? Has the trace begun passing again?
Hover over the heat maps to view the status numbers and what percentage of the total results that represents for a given region.
Switch to the large Scheduled Trace Result card.
If there are a large number of warnings or failures, view the associated messages by selecting Failures or Warning in the filter above the table. This might help narrow the failures down to a particular device or small set of devices that you can investigate further.
Look for a consistent number of paths, MTU, and hops in the small charts under the heat map. Changes over time here might correlate with events. Note if the number of bad nodes changes over time. Devices that become unreachable are often the cause of trace failures.
View the available paths for each run, by selecting Paths in the filter above the table.
You can view the configuration of the request that produced the results shown on this card by hovering over the card and clicking the icon that appears. If you want to change the configuration, click Edit to open the large Trace Request card, pre-populated with the current configuration. Follow the instructions in Create a Scheduled Trace Request to make your changes in the same way you created a new scheduled trace.
To view a summary of all scheduled trace results, switch to the full screen card.
Look for changes and patterns in the results for additional clues to isolate root causes of trace failures. Select and view related traces using the Edit menu.
View the details of any specific trace result by clicking on the trace. A new window opens similar to the following:
Scroll to the right to view the information for a given hop. Scroll down to view additional paths. This display shows each of the hosts and detailed steps the trace takes to validate a given path between two devices. Using Path 1 as an example, each path can be interpreted as follows:
Hop 1 is from the source device, server02 in this case.
It exits this device at switch port bond0 with an MTU of 9000 and over the default VRF to get to leaf02.
The trace enters leaf02 at swp2 with an MTU of 9216 over the vrf1 interface.
It exits leaf02 through switch port 52 and so on.
View a Summary of All Scheduled Traces
You can view a summary of all scheduled traces using the netq show trace summary command. The summary displays the name of the trace, a job ID, status, and timestamps for when the trace started and when it completed.
This example shows all scheduled traces run in the last 24 hours.
cumulus@switch:~$ netq show trace summary
Name Job ID Status Status Details Start Time End Time
--------------- ------------ ---------------- ---------------------------- -------------------- ----------------
leaf01toborder0 f8d6a2c5-54d Complete 0 Fri Nov 6 15:04:54 Fri Nov 6 15:05
1 b-44a8-9a5d- 2020 :21 2020
9d31f4e4701d
New Trace 0e65e196-ac0 Complete 1 Fri Nov 6 15:04:48 Fri Nov 6 15:05
5-49d7-8c81- 2020 :03 2020
6e6691e191ae
Svr01toSvr04Hrl 4c580c97-8af Complete 0 Fri Nov 6 15:01:16 Fri Nov 6 15:01
y 8-4ea2-8c09- 2020 :44 2020
038cde9e196c
Abc c7174fad-71c Complete 1 Fri Nov 6 14:57:18 Fri Nov 6 14:58
a-49d3-8c1d- 2020 :11 2020
67962039ebf9
Lf01toBor01Dail f501f9b0-cca Complete 0 Fri Nov 6 14:52:35 Fri Nov 6 14:57
y 3-4fa1-a60d- 2020 :55 2020
fb6f495b7a0e
L01toB01Daily 38a75e0e-7f9 Complete 0 Fri Nov 6 14:50:23 Fri Nov 6 14:57
9-4e0c-8449- 2020 :38 2020
f63def1ab726
leaf01toborder0 f8d6a2c5-54d Complete 0 Fri Nov 6 14:34:54 Fri Nov 6 14:57
1 b-44a8-9a5d- 2020 :20 2020
9d31f4e4701d
leaf01toborder0 f8d6a2c5-54d Complete 0 Fri Nov 6 14:04:54 Fri Nov 6 14:05
1 b-44a8-9a5d- 2020 :20 2020
9d31f4e4701d
New Trace 0e65e196-ac0 Complete 1 Fri Nov 6 14:04:48 Fri Nov 6 14:05
5-49d7-8c81- 2020 :02 2020
6e6691e191ae
Svr01toSvr04Hrl 4c580c97-8af Complete 0 Fri Nov 6 14:01:16 Fri Nov 6 14:01
y 8-4ea2-8c09- 2020 :43 2020
038cde9e196c
...
L01toB01Daily 38a75e0e-7f9 Complete 0 Thu Nov 5 15:50:23 Thu Nov 5 15:58
9-4e0c-8449- 2020 :22 2020
f63def1ab726
leaf01toborder0 f8d6a2c5-54d Complete 0 Thu Nov 5 15:34:54 Thu Nov 5 15:58
1 b-44a8-9a5d- 2020 :03 2020
9d31f4e4701d
View Scheduled Trace Settings for a Given Trace
You can view the configuration settings used by a given scheduled trace using the netq show trace settings command.
This example shows the settings for the scheduled trace named Lf01toBor01Daily.
cumulus@switch:~$ netq show trace settings name Lf01toBor01Daily
View Scheduled Trace Results for a Given Trace
You can view the results for a given scheduled trace using the netq show trace results command.
This example obtains the job ID for the trace named Lf01toBor01Daily, then shows the results.
cumulus@switch:~$ netq show trace summary name Lf01toBor01Daily json
cumulus@switch:~$ netq show trace results f501f9b0-cca3-4fa1-a60d-fb6f495b7a0e
Modify a Scheduled Trace
You can modify scheduled traces at any time as described below. An administrator can also manage scheduled traces through the NetQ management dashboard.
Be aware that changing the configuration of a trace can cause the results to be inconsistent with prior runs of the trace. If this is an unacceptable result, create a new scheduled trace. Optionally you can remove the original trace.
To modify a scheduled trace:
Open the Trace Request card.
Select the trace from the New trace request dropdown.
Edit the schedule, VLAN, or VRF and select Update.
Click Yes to complete the changes, or change the name of the previous version of this scheduled trace.
Click the change name link.
Edit the name, then click Update.
Click Yes to complete the changes, or repeat these steps until you have the name you want.
You can now select the trace from the New Trace listing and run it immediately using Go or Run Now, or you can wait for it to run the first time according to the schedule you specified.
Remove Scheduled Traces
If you have reached the maximum of 15 scheduled traces for your premises, you must remove traces before you can create additional ones.
Both a standard user and an administrative user can remove scheduled traces. No notification is generated on removal. Be sure to communicate with other users before removing a scheduled trace to avoid confusion and support issues.
Open the Trace Request card and expand the card to the largest size.
Select one or more traces.
Above the table, select Delete.
To remove a scheduled trace using the NetQ CLI, first find the name of the scheduled trace you want to remove:
netq show trace summary [name <text-trace-name>] [around <text-time-hr>] [json]
The following example shows all scheduled traces in JSON format:
To remove the trace, run the netq del trace command with the trace name. This example removes the scheduled trace named leaf01toborder01:
cumulus@switch:~$ netq del trace leaf01toborder01
Successfully deleted schedule trace leaf01toborder01
Repeat these steps to remove additional traces.
Troubleshoot NetQ
This page describes how to generate a support file for the NVIDIA support team to help troubleshoot issues with NetQ itself.
Browse Configuration and Log Files
The following configuration and log files contain information that can help with troubleshooting:
File | Description
/etc/netq/netq.yml | The NetQ configuration file. This file appears only if you installed either the netq-apps package or the NetQ Agent on the system.
/var/log/netqd.log | The NetQ daemon log file for the NetQ CLI. This log file appears only if you installed the netq-apps package on the system.
/var/log/netq-agent.log | The NetQ Agent log file. This log file appears only if you installed the NetQ Agent on the system.
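Because each of these files appears only when the corresponding package is installed, checking which ones exist is a quick way to see what a system has. The sketch below reports the presence of the three paths listed above. The function name netq_files_report and its optional root-prefix argument are assumptions made for illustration; the prefix is useful when inspecting an extracted support archive instead of the live system.

```shell
# Sketch: report which of the NetQ configuration and log files listed
# above exist. The optional first argument is a root prefix; it defaults
# to the live filesystem.
netq_files_report() {
    root=${1:-}
    for f in /etc/netq/netq.yml /var/log/netqd.log /var/log/netq-agent.log; do
        if [ -e "$root$f" ]; then
            printf 'present %s\n' "$f"
        else
            printf 'missing %s\n' "$f"
        fi
    done
}

# Example use:
#   netq_files_report
```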
Check NetQ System Installation Status
The netq show status verbose command shows the status of NetQ components after installation. Use this command to validate NetQ system readiness:
cumulus@netq:~$ netq show status verbose
NetQ Live State: Active
Installation Status: FINISHED
Version: 4.5.0
Installer Version: 4.5.0
Installation Type: Standalone
Activation Key: EhVuZXRxLWasdW50LWdhdGV3YXkYsagDIixkWUNmVmhVV2dWelVUOVF3bXozSk8vb2lSNGFCaE1FR2FVU2dHK1k3RzJVPQ==
Master SSH Public Key: c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCfdsaHpjKzcwNmJiNVROOExRRXdLL3l5RVNLSHRhUE5sZS9FRjN0cTNzaHh1NmRtMkZpYmg3WWxKUE9lZTd5bnVlV2huaTZxZ0xxV3ZMYkpLMGdkc3RQcGdzNUlqanNMR3RzRTFpaEdNa3RZNlJYenQxLzh4Z3pVRXp3WTBWZDB4aWJrdDF3RGQwSjhnbExlbVk1RDM4VUdBVFVkMWQwcndLQ3gxZEhRdEM5L1UzZUs5cHFlOVdBYmE0ZHdiUFlaazZXLzM0ZmFsdFJxaG8rNUJia0pkTkFnWHdkZGZ5RXA1Vjc3Z2I1TUU3Q1BxOXp2Q1lXZW84cGtXVS9Wc0gxWklNWnhsa2crYlZ4MDRWUnN4ZnNIVVJHVmZvckNLMHRJL0FrQnd1N2FtUGxObW9ERHg2cHNHaU1EQkM0WHdud1lmSlNleUpmdTUvaDFKQ2NuRXpOVnVWRjUgcm9vdEBhbmlscmVzdG9yZQ==
Is Cloud: False
Kubernetes Cluster Nodes Status:
IP Address Hostname Role NodeStatus
------------- ------------- ------ ------------
10.188.46.243 10.188.46.243 Role Ready
Task Status
------------------------------------------------------------------ --------
Prepared for download and extraction FINISHED
Completed setting up python virtual environment FINISHED
Checked connectivity from master node FINISHED
Installed Kubernetes control plane services FINISHED
Installed Calico CNI FINISHED
Installed K8 Certificates FINISHED
Updated etc host file with master node IP address FINISHED
Stored master node hostname FINISHED
Generated and copied master node configuration FINISHED
Updated cluster information FINISHED
Plugged in release bundle FINISHED
Downloaded, installed, and started node service FINISHED
Downloaded, installed, and started port service FINISHED
Patched Kubernetes infrastructure FINISHED
Removed unsupported conditions from master node FINISHED
Installed NetQ Custom Resource Definitions FINISHED
Installed Master Operator FINISHED
Updated Master Custom Resources FINISHED
Updated NetQ cluster manager custom resource FINISHED
Installed Cassandra FINISHED
Created new database FINISHED
Updated Master Custom Resources FINISHED
Updated Kafka Custom Resources FINISHED
Read Config Key ConfigMap FINISHED
Backed up ConfigKey FINISHED
Read ConfigKey FINISHED
Created Keys FINISHED
Verified installer version FINISHED
...
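The task list is long, so scanning it by eye for a step that did not finish is error prone. The following sketch filters the output down to task rows whose status is anything other than FINISHED. It assumes the layout shown above (a Task/Status header followed by rows ending in an uppercase status word); the function name unfinished_tasks is hypothetical, and this section does not list which other status values can occur.

```shell
# Sketch: print task rows from "netq show status verbose" whose final
# column is an uppercase status word other than FINISHED. Rows before the
# "Task  Status" header, separator lines, and "..." are ignored.
unfinished_tasks() {
    awk '
        $1 == "Task" && $2 == "Status" { in_tasks = 1; next }
        in_tasks && NF > 1 && $NF ~ /^[A-Z_]+$/ && $NF != "FINISHED"
    '
}

# Example use on a live system (not run here):
#   netq show status verbose | unfinished_tasks
```

An empty result means every listed task reported FINISHED.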
Verify Connectivity between Agents and Appliances
The sudo opta-info.py command displays the status of and connectivity between agents and appliances. This command is typically used when debugging NetQ.
In the output below, the Opta Health Status column displays a healthy status, which indicates that the appliance is functioning properly. The Opta-Gateway Channel Status column displays the connectivity status between the appliance and cloud endpoint. The Agent ID column displays the switches connected to the appliance.
cumulus@netq-appliance:~$ sudo opta-info.py
[sudo] password for cumulus:
Service IP: 10.102.57.27
Opta Health Status Opta-Gateway Channel Status
-------------------- -----------------------------
Healthy READY
Agent ID Remote Address Status Messages Exchanged Time Since Last Communicated
---------- ---------------- -------- -------------------- ------------------------------
switch1 /20.1.1.10:46420 UP 906 2023-02-14 00:32:43.920000
netq-appliance /20.1.1.10:44717 UP 1234 2023-02-14 00:32:31.757000
cumulus@sm-telem-06:~$ sudo opta-info.py
Service IP: 10.97.49.106
Agent ID Remote Address Status Messages Exchanged Time Since Last Communicated
----------------------------------------- --------------------- -------- -------------------- ------------------------------
netq-lcm-executor-deploy-65c984fc7c-x97bl /10.244.207.135:52314 UP 1340 2023-02-13 19:31:37.311000
sm-telem-06 /10.188.47.228:2414 UP 1449 2023-02-14 06:42:12.215000
mlx-2010a1-14 /10.188.47.228:12888 UP 15 2023-02-14 06:42:27.003000
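When many agents are connected, you may only care about the ones that are not reporting UP. The sketch below filters the agent table accordingly. It assumes the column order shown above (agent ID, remote address, status) and input from a single opta-info.py run; the function name agents_not_up is hypothetical, and the output above shows only the UP status, so any other value is simply reported as found.

```shell
# Sketch: from the output of one opta-info.py run, print the agent ID and
# status of every agent whose Status column is not UP. Lines before the
# "Agent ID" header and the dashed separator line are skipped.
agents_not_up() {
    awk '
        $1 == "Agent" && $2 == "ID" { in_agents = 1; next }
        in_agents && $1 ~ /^-+$/ { next }
        in_agents && NF >= 3 && $3 != "UP" { print $1, $3 }
    '
}

# Example use on a live system (not run here):
#   sudo opta-info.py | agents_not_up
```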
Generate a Support File on the NetQ System
The opta-support command generates information for troubleshooting issues with NetQ. It provides information about the NetQ Platform configuration and runtime statistics as well as output from the docker ps command.
cumulus@server:~$ sudo opta-support
Please send /var/support/opta_support_server_2021119_165552.txz to Nvidia support.
To export network validation check data in addition to OPTA health data to the support bundle, the NetQ CLI must be activated with AuthKeys. If the CLI access key is not activated, the command output displays a notification and data collection excludes netq show output:
cumulus@server:~$ sudo opta-support
Access key is not found. Please check the access key entered or generate a fresh access_key,secret_key pair and add it to the CLI configuration
Proceeding with opta-support generation without netq show outputs
Please send /var/support/opta_support_server_20211122_22259.txz to Nvidia support.
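If you collect support files from a script, you can pick the archive path out of the final message instead of copying it by hand. This sketch assumes the message wording shown in the examples above and that the message is written to standard output; the function name support_archive_path is hypothetical.

```shell
# Sketch: extract the archive path from the "Please send ... to Nvidia
# support." line printed by opta-support. Other lines are ignored.
support_archive_path() {
    sed -n 's/^Please send \(.*\) to Nvidia support\.$/\1/p'
}

# Example use on a live system (not run here):
#   archive=$(sudo opta-support | support_archive_path)
#   ls -l "$archive"
```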
Generate a Support File on Switches and Hosts
The netq-support command generates information for troubleshooting NetQ issues on a host or switch. Similar to collecting a support bundle on the NetQ system, the NVIDIA support team might request this output to gather more information about switch and host status.
When you run the netq-support command on a switch running Cumulus Linux, the command also creates a cl-support file and bundles it within the NetQ support archive.
The following sections contain NetQ reference materials.
NetQ CLI Reference
This reference provides details about each of the NetQ CLI commands. For an overview of the CLI structure and usage, read the NetQ Command Line Overview.
The commands appear alphabetically by command name.
Because all commands begin with netq, the next required keyword determines the order.
Punctuation and numbers appear before letters.
When options are available, you should use them in the order listed.
Integrate NetQ API with Your Applications
The NetQ API provides access to key telemetry and system monitoring data gathered about the performance and operation of your network and devices so that you can view that data in your internal or third-party analytic tools. The API gives you access to the health of individual switches, network protocols and services, trace and validation results, and views of networkwide inventory and events.
This guide provides an overview of the NetQ API framework and the basics of using Swagger UI 2.0 or bash plus curl to view and test the APIs. Descriptions of each endpoint and model parameter are in individual API JSON files.
Inventory and Devices: Address, Inventory, MAC Address tables, Node, Sensors
Events: Events
Each endpoint has its own API. You can make requests for all data and all devices or you can filter the request by a given hostname. Each API returns a predetermined set of data as defined in the API models.
The Swagger interface displays both public and internal APIs. Public APIs do not have "internal" in their name. Internal APIs are not supported for public use and are subject to change without notice.
Get Started
You can access the API gateway and execute requests from the Swagger UI or a terminal interface:
Open a new browser tab or window, and enter one of the following in the address bar:
Select auth from the Select a definition dropdown at the top right of the window. This opens the authorization API.
Open a terminal window.
Continue to Log In instructions.
Log In
While you can view the API endpoints without authorization, you can only execute the API endpoints if you have been authorized.
You must first obtain an access token and then use that token to authorize your access to the API.
Click POST/login.
Click Try it out.
Enter the username and password you used to install NetQ. For this release, the default is username admin and password admin. Do not change the access-key value.
Click Execute.
Scroll down to view the Responses. In the Server response section, in the Response body of the 200 code response, copy the access token in the top line.
Click Authorize.
Paste the access token into the Value field, and click Authorize.
Click Close.
To log in and obtain authorization:
Open a terminal window.
Log in to obtain the access token. You need the following information:
Hostname or IP address, and port (443 for Cloud deployments, 32708 for on-premises deployments) of your API gateway
Your login credentials that were provided as part of the NetQ installation process. For this release, the default is username admin and password admin.
This example uses an IP address of 192.168.0.10, port of 443, and the default credentials:
The output provides the access token as the first parameter.
{"access_token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9....","customer_id":0,"expires_at":1597200346504,"id":"admin","is_cloud":true,"premises":[{"name":"OPID0","namespace":"NAN","opid":0},{"name":"ea-demo-dc-1","namespace":"ea1","opid":30000},{"name":"ea-demo-dc-2","namespace":"ea1","opid":30001},{"name":"ea-demo-dc-3","namespace":"ea1","opid":30002},{"name":"ea-demo-dc-4","namespace":"ea1","opid":30003},{"name":"ea-demo-dc-5","namespace":"ea1","opid":30004},{"name":"ea-demo-dc-6","namespace":"ea1","opid":30005},{"name":"ea-demo-dc-7","namespace":"ea1","opid":80006},{"name":"Cumulus Data Center","namespace":"NAN","opid":1568962206}],"reset_password":false,"terms_of_use_accepted":true}
Copy the access token to a text file for use in making API data requests.
You are now able to create and execute API requests against the endpoints.
By default, authorization is valid for 24 hours, after which users must sign in again and reauthorize their account.
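As a rough illustration of handling the login response, the following Python sketch extracts the access token and checks whether it has expired. It assumes that the expires_at field is an epoch timestamp in milliseconds (the 13-digit value in the sample output suggests this, but the guide does not state it). The helper names parse_login_response and token_is_expired are hypothetical, and the sample response is abbreviated.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple

# Abbreviated copy of the sample login response shown above.
SAMPLE_RESPONSE = (
    '{"access_token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9....",'
    '"customer_id":0,"expires_at":1597200346504,"id":"admin",'
    '"is_cloud":true,"reset_password":false,"terms_of_use_accepted":true}'
)


def parse_login_response(body: str) -> Tuple[str, datetime]:
    """Return the access token and its expiry time (UTC)."""
    data = json.loads(body)
    # Assumption: expires_at is epoch time in milliseconds.
    expires = datetime.fromtimestamp(data["expires_at"] / 1000, tz=timezone.utc)
    return data["access_token"], expires


def token_is_expired(expires: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= expires


token, expires = parse_login_response(SAMPLE_RESPONSE)
print("token starts with:", token[:12])
print("expires at:", expires.isoformat())
print("expired:", token_is_expired(expires))
```

A script that runs longer than the authorization period could call token_is_expired before each request and log in again when it returns True.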
API Requests
You can use either the Swagger UI or a terminal window with bash and curl commands to create and execute API requests.
API requests are easy to execute in the Swagger UI. Just select the endpoint of interest and try it out.
Select the endpoint from the definition dropdown at the top right of the application.
This example shows the BGP endpoint selected:
Select the endpoint object.
This example shows the results of selecting the GET bgp object:
A description is provided for each object and the various parameters that can be specified. In the Responses section, you can see the data that is returned when the request is successful.
Click Try it out.
Enter values for the required parameters.
Click Execute.
In a terminal window, use bash plus curl to execute requests. Each request contains an API method (GET, POST, etc.), the address and API endpoint object to query, a variety of headers, and sometimes a body. For example, in the log in step above:
API method = POST
Address and API object = "https://<netq.domain>:443/netq/auth/v1/login"
Headers = -H "accept: application/json" and -H "Content-Type: application/json"
Body = -d '{ "username": "admin", "password": "admin", "access_key": "string"}'
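The same four parts can be assembled in Python. This is a minimal sketch using only the standard library, not an official client. The host name netq.example.com is a placeholder, the function name build_login_request is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Placeholder address: replace netq.example.com with your API gateway.
LOGIN_URL = "https://netq.example.com:443/netq/auth/v1/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Assemble (but do not send) the login request from its four parts."""
    body = {"username": username, "password": password, "access_key": "string"}
    return urllib.request.Request(
        url=LOGIN_URL,                          # address and API object
        data=json.dumps(body).encode("utf-8"),  # body
        headers={                               # headers
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",                          # API method
    )


req = build_login_request("admin", "admin")
print(req.get_method(), req.full_url)
# Sending is left to the caller, for example: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it possible to inspect the method, address, headers, and body before contacting the gateway.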
API Responses
A NetQ API response consists of a status code, any relevant error codes (if unsuccessful), and the collected data (if successful).
The following HTTP status codes might be presented in the API responses:
| Code | Name | Description | Action |
| --- | --- | --- | --- |
| 200 | Success | Request was successfully processed. | Review response. |
| 400 | Bad Request | Invalid input was detected in request. | Check the syntax of your request and make sure it matches the schema. |
| 401 | Unauthorized | Authentication has failed or credentials were not provided. | Provide or verify your credentials, or request access from your administrator. |
| 403 | Forbidden | Request was valid, but user might not have the needed permissions. | Verify your credentials or request an account from your administrator. |
| 404 | Not Found | Requested resource could not be found. | Try the request again after a period of time or verify status of resource. |
| 409 | Conflict | Request cannot be processed due to conflict in current state of the resource. | Verify status of resource and remove conflict. |
| 500 | Internal Server Error | Unexpected condition has occurred. | Perform general troubleshooting and try the request again. |
| 503 | Service Unavailable | The service being requested is currently unavailable. | Verify the status of the NetQ Platform or Appliance, and the associated service. |
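A script that calls the API might map each status code to its suggested action. The following Python sketch transcribes the codes listed above into a lookup table; the names STATUS_ACTIONS and describe_status are hypothetical, and the fallback text for unlisted codes is an assumption.

```python
# Status codes, names, and suggested actions as listed in this guide.
STATUS_ACTIONS = {
    200: ("Success", "Review response."),
    400: ("Bad Request", "Check the syntax of your request and make sure it matches the schema."),
    401: ("Unauthorized", "Provide or verify your credentials, or request access from your administrator."),
    403: ("Forbidden", "Verify your credentials or request an account from your administrator."),
    404: ("Not Found", "Try the request again after a period of time or verify status of resource."),
    409: ("Conflict", "Verify status of resource and remove conflict."),
    500: ("Internal Server Error", "Perform general troubleshooting and try the request again."),
    503: ("Service Unavailable", "Verify the status of the NetQ Platform or Appliance, and the associated service."),
}


def describe_status(code: int) -> str:
    """Return a one-line summary of a status code and its suggested action."""
    # Assumption: codes not listed in the guide get a generic fallback.
    name, action = STATUS_ACTIONS.get(code, ("Unknown", "Not listed in this guide."))
    return f"{code} {name}: {action}"


for code in (200, 401, 503):
    print(describe_status(code))
```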
Example Requests and Responses
Some command requests and their responses are shown here, but feel free to run your own requests. To run a request, you need your authorization token. In the curl examples, the responses are piped through a Python tool to make them more readable. You can choose to do so as well.
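The guide does not name the Python tool it uses. One option is the standard library's json module, which is also available from the command line as python -m json.tool. The following sketch shows the equivalent in a script; the function name pretty and the sample input are illustrative only.

```python
import json


def pretty(raw: str) -> str:
    """Re-indent a compact JSON string so it is easier to read."""
    return json.dumps(json.loads(raw), indent=4, sort_keys=True)


# Illustrative input modeled on fields in the login response.
raw = '{"id":"admin","is_cloud":true,"customer_id":0}'
print(pretty(raw))
```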
Validate Networkwide Status of the BGP Service
Make your request to the bgp endpoint to validate the operation of the BGP service on all nodes running the service.
Open the check endpoint.
Open the check object.
Click Try it out.
Enter values for time, duration, by, and proto parameters.
In this example, time=1597256560, duration=24, by=scheduled, and proto=bgp.
Click Execute, then scroll down to see the results under Server response.
Run the following curl command, entering values for the various parameters. In this example, time=1597256560, duration=24 (hours), by=scheduled, and proto=bgp.
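To illustrate how the four parameter values combine into a request, the following Python sketch assembles the query string with the standard library. The base address and path in CHECK_URL are placeholders, not the real path of the check object, and the function name build_check_url is hypothetical. Authorization headers are omitted because the guide does not specify them here.

```python
from urllib.parse import urlencode

# Placeholder: substitute the address and path of the check object.
CHECK_URL = "https://netq.example.com:443/PATH-TO-CHECK-OBJECT"


def build_check_url(time: int, duration: int, by: str, proto: str) -> str:
    """Append the validation parameters to the check URL as a query string."""
    query = urlencode({"time": time, "duration": duration, "by": by, "proto": proto})
    return f"{CHECK_URL}?{query}"


# Values from the example: duration is in hours.
print(build_check_url(1597256560, 24, "scheduled", "bgp"))
```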
Make your request to the interfaces endpoint to view the status of all interfaces. By default, the request returns data from the last hour. To retrieve data for a specific time instead, specify the eq-timestamp option with a date and time in epoch format, as follows:
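Epoch values are not easy to write by hand. The following Python sketch converts a UTC date and time to an epoch value suitable for an option such as eq-timestamp. It assumes the API expects whole seconds, as in the time value used in the BGP example; the function name to_epoch_seconds is hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(year: int, month: int, day: int,
                     hour: int = 0, minute: int = 0, second: int = 0) -> int:
    """Convert a UTC date and time to epoch seconds."""
    # Assumption: the timestamp is interpreted as UTC and expressed in seconds.
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


print(to_epoch_seconds(2020, 8, 12, 18, 22, 40))
```

Passing an explicit UTC time zone avoids results that depend on the local time zone of the machine running the script.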
Several NetQ features function exclusively on NVIDIA Spectrum switches. The following table summarizes supported features:
| Spectrum Switch | What Just Happened | Flow Analysis | ECMP Monitoring | RoCE Monitoring |
| --- | --- | --- | --- | --- |
| Spectrum-1 | Partial support; no latency and congestion monitoring | No | Yes | Yes |
| Spectrum-2 | Yes | Yes | Yes | Yes |
| Spectrum-3 | Yes | Yes | Yes | Yes |
| Spectrum-4 POC | Yes | Yes | Yes | Yes |
Glossary
Common Cumulus Linux and NetQ Terminology
The following table covers some basic terms used throughout the NetQ user documentation.
| Term | Definition |
| --- | --- |
| Agent | NetQ software that resides on a host server that provides metrics about the host to the NetQ Telemetry Server for network health analysis. |
| Bridge | Device that connects two communication networks or network segments. Occurs at OSI Model Layer 2, Data Link Layer. |
| Clos | Multistage circuit switching network used by the telecommunications industry, first formalized by Charles Clos in 1952. |
| Device | UI term referring to a switch, host, or chassis or combination of these. Typically used when describing hardware and components versus a software or network topology. See also Node. |
| Event | Change or occurrence in network or component that can trigger a notification. Events are categorized by severity: error or info. |
| Fabric | Network topology where a set of network nodes interconnects through one or more network switches. |
| Fresh | Node that has been communicative for the last 120 seconds. |
| High Availability | Software used to provide a high percentage of uptime (running and available) for network devices. |
| Host | A device connected to a TCP/IP network. It can run one or more virtual machines. |
| Hypervisor | Software which creates and runs virtual machines. Also called a virtual machine monitor. |
| IP Address | An Internet Protocol address comprises a series of numbers assigned to a network device to uniquely identify it on a given network. Version 4 addresses are 32 bits and written in dotted decimal notation with 8-bit binary numbers separated by decimal points. Example: 10.10.10.255. Version 6 addresses are 128 bits and written in 16-bit hexadecimal numbers separated by colons. Example: 2018:3468:1B5F::6482:D673. |
| Leaf | An access layer switch in a Spine-Leaf or Clos topology. An Exit-Leaf is a switch that connects to services outside of the data center such as firewalls, load balancers, and internet routers. See also Spine, Clos, Top of Rack, and Access Switch. |
| Linux | Set of free and open-source software operating systems built around the Linux kernel. Cumulus Linux is one of the available distribution packages. |
| Node | UI term referring to a switch, host, or chassis in a topology. |
| Notification | Item that informs a user of an event. Notifications are received through third-party applications, such as email or Slack. |
| Peer link | Link, or bonded links, used to connect two switches in an MLAG pair. |
| Rotten | Node that has been silent for 120 seconds or more. |
| Router | Device that forwards data packets (directs traffic) from nodes on one communication network to nodes on another network. Occurs at the OSI Model Layer 3, Network Layer. |
| Spine | Used to describe the role of a switch in a Spine-Leaf or Clos topology. See also Aggregation switch, End of Row switch, and distribution switch. |
| Switch | High-speed device that receives data packets from one device or node and redirects them to other devices or nodes on a network. |
| Telemetry server | NetQ server that receives metrics and other data from NetQ agents on leaf and spine switches and hosts. |
| Top of Rack | Switch that connects to the network (versus internally); also known as a ToR switch. |
| Virtual Machine | Emulation of a computer system that provides all the functions of a particular architecture. |
| Web-scale | A network architecture designed to deliver capabilities of large cloud service providers within an enterprise IT environment. |
| Whitebox | Generic, off-the-shelf, switch or router hardware used in Software Defined Networks (SDN). |
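The Fresh and Rotten definitions share a 120-second threshold. The following Python sketch expresses that rule as a function. It only illustrates the glossary definitions, not how NetQ computes node status internally, and the names node_status and THRESHOLD_SECONDS are hypothetical.

```python
# Threshold taken from the Fresh and Rotten glossary entries.
THRESHOLD_SECONDS = 120


def node_status(seconds_since_last_contact: float) -> str:
    """Classify a node by how long ago it last communicated."""
    # Rotten: silent for 120 seconds or more. Fresh: heard from more recently.
    if seconds_since_last_contact >= THRESHOLD_SECONDS:
        return "Rotten"
    return "Fresh"


for silence in (5, 119.9, 120, 3600):
    print(silence, node_status(silence))
```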
Common Cumulus Linux and NetQ Acronyms
The following table covers some common acronyms used throughout the NetQ user documentation.
Beyond knowing what physical components are in the deployment, it is valuable to know that their configurations are correct and they operate correctly. NetQ enables you to confirm that peer connections are present, discover any misconfigured ports, peers, or unsupported modules, and monitor for link flaps.
NetQ uses LLDP (Link Layer Discovery Protocol) to collect port information. NetQ can also identify peer ports connected to DACs (Direct Attached Cables) and AOCs (Active Optical Cables) without using LLDP, even if the link is not UP.
Confirm Peer Connections
You can validate peer connections for all devices in your network or for a specific device or port. This example shows the peer hosts and their status for the leaf03 switch.
cumulus@switch:~$ netq leaf03 show interfaces physical peer
Matching cables records:
Hostname          Interface                 Peer Hostname     Peer Interface            State      Message
----------------- ------------------------- ----------------- ------------------------- ---------- -----------------------------------
leaf03            swp1                      oob-mgmt-switch   swp7                      up
leaf03            swp2                                                                  down       Peer port unknown
leaf03            swp47                     leaf04            swp47                     up
leaf03            swp48                     leaf04            swp48                     up
leaf03            swp49                     leaf04            swp49                     up
leaf03            swp50                     leaf04            swp50                     up
leaf03            swp51                     exit01            swp51                     up
leaf03            swp52                                                                 down       Port cage empty
Discover Misconfigurations
You can verify that the following configurations are the same on both sides of a peer interface:
Admin state
Operational state
Link speed
Auto-negotiation setting
Use the netq check interfaces command to determine if any of the interfaces have continuity errors. This command only checks the physical interfaces; it does not check bridges, bonds, or other software constructs. The command syntax is:
netq check interfaces [around <text-time>] [json]
If NetQ cannot determine a peer for a given device, the port shows as unverified.
If you find a misconfiguration, use the netq show interfaces physical command for clues about the cause.
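The comparison of the four settings on both sides of a link can be sketched in Python as follows. This only illustrates the idea of the check, not how NetQ implements it. The PortConfig class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class PortConfig:
    """The four settings compared on each side of a peer interface."""
    admin_state: str
    oper_state: str
    link_speed: str
    autoneg: str


def find_mismatches(local: PortConfig, peer: PortConfig) -> List[Tuple[str, str, str]]:
    """Return (setting, local value, peer value) for each setting that differs."""
    return [
        (f.name, getattr(local, f.name), getattr(peer, f.name))
        for f in fields(PortConfig)
        if getattr(local, f.name) != getattr(peer, f.name)
    ]


# Hypothetical values: the two sides agree except for operational state.
local_port = PortConfig(admin_state="up", oper_state="up", link_speed="40G", autoneg="off")
peer_port = PortConfig(admin_state="up", oper_state="down", link_speed="40G", autoneg="off")
print(find_mismatches(local_port, peer_port))
```

An empty list means that the two sides agree on all four settings.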
Find Mismatched Operational States
This example checks every interface for misconfigurations and finds an error on one interface port. The physical interface details show that the operational states do not match on the connection between leaf03 and leaf04: leaf03 is up, but leaf04 is down. If the misconfiguration were due to a mismatch in the administrative state, the message would be Admin state mismatch (up, down) or Admin state mismatch (down, up).
This example uses the and keyword to check the connections between two peers. You can see an error, so you check the physical peer information and discover that someone specified an incorrect peer. After fixing it, run the check again, and see that there are no longer any interface errors.
This example checks for configuration mismatches and finds a link speed mismatch on server03. The link speed on swp49 is 40G and the peer port swp50 shows as unknown.
This example checks for configuration mismatches and finds auto-negotiation setting mismatches between the servers and leafs. Auto-negotiation is off for the leafs, but on for the servers.