
NVIDIA NetQ 4.8 User Guide

NVIDIA® NetQ™ is a scalable, modern network operations tool set that provides visibility into your overlay and underlay networks, enabling troubleshooting in real time. NetQ delivers data and statistics about the health of your data center, from the container, virtual machine, or host all the way to the switch and port. NetQ correlates configuration and operational status, and tracks state changes while simplifying management for the entire Linux-based data center. With NetQ, network operations change from a manual, reactive, node-by-node approach to an automated, informed, and agile one. Visit Network Operations with NetQ to learn more.

This user guide provides documentation for network administrators who are responsible for deploying, configuring, monitoring, and troubleshooting the network in their data center or campus environment.

For a list of the new features in this release, see What's New. For bug fixes and known issues, refer to the release notes.

What's New

This page summarizes new features and improvements for the NetQ 4.8 release. For a complete list of open and fixed issues, see the release notes.

What’s New in NetQ 4.8.0

NetQ 4.8.0 includes the following new features and improvements:

Upgrade Paths

For deployments running:

Enabling high availability for the NetQ control plane and UI requires a new installation of your server cluster deployment. Database migration is not supported for new HA server cluster installations.

Compatible Agent Versions

The NetQ 4.8.0 server is compatible with NetQ Agents 4.7.0 and 4.6.0. You can install NetQ Agents on switches and servers running:

You must upgrade to the latest agent version to enable 4.8 features.
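The compatibility statement above can be sketched as a small check. This Python example is illustrative only: the function name and the returned labels are hypothetical, and treating a 4.8.0 agent as the "latest" version that enables all 4.8 features is an assumption inferred from the text, not something taken from a NetQ API.

```python
# Agent versions that the text lists as compatible with a NetQ 4.8.0 server.
COMPATIBLE_OLDER_AGENTS = {"4.7.0", "4.6.0"}
# Assumption: the agent matching the server release enables all 4.8 features.
LATEST_AGENT = "4.8.0"


def agent_status(agent_version: str) -> str:
    """Classify an agent version against a NetQ 4.8.0 server (hypothetical helper)."""
    if agent_version == LATEST_AGENT:
        return "full"
    if agent_version in COMPATIBLE_OLDER_AGENTS:
        return "compatible, upgrade for 4.8 features"
    return "not listed as compatible"


for version in ("4.8.0", "4.7.0", "4.5.0"):
    print(version, "->", agent_status(version))
```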

NetQ Overview

This section describes NetQ components and deployment models. It also outlines how to get started with the NetQ user interface and command line.

NetQ Basics

This section provides an overview of the NetQ hardware, software, and deployment models.

NetQ Components

NetQ contains the following applications and key components:

While these functions apply to both the on-premises and cloud solutions, they are configured differently, as shown in the following diagrams.

diagram of NetQ on-premises configuration
diagram of NetQ cloud configuration

NetQ Agents

NetQ Agents are installed via software and run on every monitored node in the network—including Cumulus® Linux® switches, Linux bare metal hosts, and virtual machines. The NetQ Agents push network data regularly and event information immediately to the NetQ Platform.

Switch Agents

The NetQ Agents running on Cumulus Linux or SONiC switches gather the following network data via Netlink:

for the following protocols:

Host Agents

The NetQ Agents running on hosts gather the same information as that for switches, plus the following network data:

The NetQ Agent obtains container information by listening to the Kubernetes orchestration tool.

NetQ Core

The NetQ core performs the data collection, storage, and processing for delivery to various user interfaces. It consists of a collection of scalable components running entirely within a single server. The NetQ software queries this server, rather than individual devices, enabling greater system scalability.

Data Aggregation

The data aggregation component collects data coming from all of the NetQ Agents. It then filters, compresses, and forwards the data to the streaming component. The server monitors for missing messages and also monitors the NetQ Agents themselves, sending notifications about events when appropriate. In addition to the telemetry data collected from the NetQ Agents, the aggregation component collects information from the switches and hosts, such as vendor, model, version, and basic operational state.

Data Stores

NetQ uses two types of data stores. The first stores the raw data, data aggregations, and discrete events needed for quick response to data requests. The second stores data based on correlations, transformations, and raw-data processing.

Real-time Streaming

The streaming component processes the incoming raw data from the aggregation server in real time. It reads the metrics and stores them as a time series, and triggers alarms based on anomaly detection, thresholds, and events.

Network Services

The network services component monitors the operation of protocols and services, both individually and networkwide, and stores status details.

User Interfaces

NetQ data is available through several interfaces:

The CLI and UI query the RESTful API to present data. NetQ can integrate with event notification applications and third-party analytics tools.

Data Center Network Deployments

This section describes three common data center deployment types for network management:

NetQ operates over layer 3, and can operate in both layer-2 bridged and layer-3 routed environments. NVIDIA recommends a layer-3 routed environment whenever possible.

Out-of-band Management Deployment

NVIDIA recommends deploying NetQ on an out-of-band (OOB) management network to separate network management traffic from standard network data traffic.

The physical network hardware includes:

The following figure shows an example of a Clos network fabric design for a data center using an OOB management network overlaid on top, where NetQ resides. The physical connections are displayed as gray lines, connecting Spine01 to four leaf and two exit devices; Spine02 is connected to the same leaf and exit devices. Leaf01 and Leaf02 connect to each other over a peerlink and act as an MLAG pair for Server01 and Server02, as do Leaf03 and Leaf04 for Server03 and Server04. The edge connects to both exit devices, and the Internet node connects to Exit01.

diagram of a Clos network displaying connections between spine switches, leafs, servers, and exit switches.

The physical management hardware includes:

These switches connect to each physical network device through a virtual network overlay, as shown below.

diagram displaying connections between physical network hardware and physical management hardware with a virtual network overlay

In-band Management Deployment

While not recommended, you can implement NetQ within your data network. In this scenario, there is no overlay and all traffic to and from the NetQ Agents and the NetQ Platform traverses the data paths along with your regular network traffic. The roles of the switches in the Clos network are the same, except that the NetQ Platform performs the aggregation function that the OOB management switch performed. If your network goes down, you might not have access to the NetQ Platform for troubleshooting. Certain features—such as lifecycle management—require additional configurations for in-band deployments.

diagram of an in-band management deployment.

Server Cluster Deployments

NetQ supports a server cluster deployment for users who prefer a solution with increased scalability, in which the data collected by the NetQ Platform remains available through additional servers if one fails. In this configuration, three NetQ Platforms are deployed, with one as the master and two as workers (or replicas). NetQ Agents send data to all three servers so that if the master NetQ Platform fails, one of the replicas automatically becomes the master and continues to store the telemetry data. The following example is based on an OOB management configuration, modified to support higher scalability for NetQ.

diagram of a server cluster deployment with one master and two worker NetQ platforms.

High Availability

You can configure a server cluster with a high-availability virtual IP address that load balances control plane processing and UI access across all nodes of a cluster deployment. This deployment model requires an additional IP address allocated in the same subnet as the master and worker nodes. The virtual IP address also enables UI access if the master node fails. You must specify the virtual IP address during a new high-availability server cluster installation, using the cluster-vip option in the install command.

High availability is only supported for on-premises deployments.
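Before installation, it can help to confirm that the planned virtual IP address meets the subnet requirement described above. The following Python sketch uses only the standard ipaddress module. The function name, the example addresses (taken from the 192.0.2.0/24 documentation range), and the check that the virtual IP differs from every node address are assumptions for illustration, not part of the NetQ installer.

```python
import ipaddress


def vip_fits_cluster(vip: str, subnet: str, node_ips: list[str]) -> bool:
    """Return True if the virtual IP and all node IPs share the given subnet.

    Hypothetical pre-installation check; NetQ itself does not expose this function.
    """
    network = ipaddress.ip_network(subnet, strict=False)
    vip_addr = ipaddress.ip_address(vip)
    nodes = [ipaddress.ip_address(ip) for ip in node_ips]
    # Assumption: the virtual IP is an additional address, so it must not
    # collide with the master or worker addresses.
    if vip_addr in nodes:
        return False
    return all(addr in network for addr in [vip_addr, *nodes])


# Example with one master and two workers (addresses are placeholders).
cluster_nodes = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(vip_fits_cluster("192.0.2.100", "192.0.2.0/24", cluster_nodes))
print(vip_fits_cluster("198.51.100.5", "192.0.2.0/24", cluster_nodes))
```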

NetQ Operation

In either in-band or out-of-band deployments, NetQ offers networkwide configuration and device management, proactive monitoring capabilities, and network performance diagnostics.

The NetQ Agent

From a software perspective, a network switch has software associated with the hardware platform, the operating system, and communications. For data centers, the software on a network switch is similar to the following diagram:

diagram illustrating how the NetQ Agent interacts with a switch or host.

The NetQ Agent interacts with the various components and software on switches and hosts and provides the gathered information to the NetQ Platform. You can view the data using the NetQ CLI or UI.

The NetQ Agent polls the user space applications for information about the performance of the various routing protocols and services that are running on the switch. Cumulus Linux supports BGP and OSPF routing protocols as well as static addressing through FRRouting (FRR). Cumulus Linux also supports LLDP and MSTP among other protocols, and a variety of services such as systemd and sensors. SONiC supports BGP and LLDP.

For hosts, the NetQ Agent also polls for performance of containers managed with Kubernetes. This information is used to calculate the network’s health and check if the network is configured and operating correctly.

The NetQ Agent interacts with the Netlink communications between the Linux kernel and the user space, listening for changes to the network state, configurations, routes, and MAC addresses. NetQ sends notifications about these changes so that network operators and administrators can respond quickly when changes are not expected or favorable.

The NetQ Agent also interacts with the hardware platform to obtain performance information about various physical components, such as fans and power supplies, on the switch. The agent measures operational states and temperatures, along with cabling information to allow for proactive maintenance.

The NetQ Platform

After the collected data is sent to and stored in the NetQ database, you can:

Validate Configurations

You can monitor and validate your network’s health in the UI or through two sets of commands: netq check and netq show. These commands extract information from the network service component and the event service. The network service component continually validates the connectivity and configuration of the devices and protocols running on the network. The netq check and netq show commands display the status of the various components and services, both networkwide and across the complete software stack. See the command line reference for an exhaustive list of netq check and netq show commands.

Monitor Communication Paths

The trace engine validates the available communication paths between two network devices. The corresponding netq trace command lets you view all of the paths between the two devices and see whether any of them are broken. For more information about trace requests, refer to Verify Network Connectivity.

View Historical State and Configuration Info

You can run all check, show, and trace commands for current and past statuses. To investigate past issues, use the netq check command and look for configuration or operational issues around the time that NetQ timestamped event messages. Then use the netq show commands to view information about device configurations. You can also use the netq trace command to see what the connectivity looked like between any problematic nodes at a particular time.

For example, the following diagram shows issues on spine01, leaf04, and server03:

network diagram displaying issues on spine01, leaf04, and server03

An administrator can run the following commands from any switch in the network to determine the cause of a BGP error on spine01:

cumulus@switch:~$ netq check bgp around 30m
Total Nodes: 25, Failed Nodes: 3, Total Sessions: 220 , Failed Sessions: 24,
Hostname          VRF             Peer Name         Peer Hostname     Reason                                        Last Changed
----------------- --------------- ----------------- ----------------- --------------------------------------------- -------------------------
exit-1            DataVrf1080     swp6.2            firewall-1        BGP session with peer firewall-1 swp6.2: AFI/ 1d:2h:6m:21s
                                                                      SAFI evpn not activated on peer              
exit-1            DataVrf1080     swp7.2            firewall-2        BGP session with peer firewall-2 (swp7.2 vrf  1d:1h:59m:43s
                                                                      DataVrf1080) failed,                         
                                                                      reason: Peer not configured                  
exit-1            DataVrf1081     swp6.3            firewall-1        BGP session with peer firewall-1 swp6.3: AFI/ 1d:2h:6m:21s
                                                                      SAFI evpn not activated on peer              
exit-1            DataVrf1081     swp7.3            firewall-2        BGP session with peer firewall-2 (swp7.3 vrf  1d:1h:59m:43s
                                                                      DataVrf1081) failed,                         
                                                                      reason: Peer not configured                  
exit-1            DataVrf1082     swp6.4            firewall-1        BGP session with peer firewall-1 swp6.4: AFI/ 1d:2h:6m:21s
                                                                      SAFI evpn not activated on peer              
exit-1            DataVrf1082     swp7.4            firewall-2        BGP session with peer firewall-2 (swp7.4 vrf  1d:1h:59m:43s
                                                                      DataVrf1082) failed,                         
                                                                      reason: Peer not configured                  
exit-1            default         swp6              firewall-1        BGP session with peer firewall-1 swp6: AFI/SA 1d:2h:6m:21s
                                                                      FI evpn not activated on peer                
exit-1            default         swp7              firewall-2        BGP session with peer firewall-2 (swp7 vrf de 1d:1h:59m:43s
...
 
cumulus@switch:~$ netq exit-1 show bgp
Matching bgp records:
Hostname          Neighbor                     VRF             ASN        Peer ASN   PfxRx        Last Changed
----------------- ---------------------------- --------------- ---------- ---------- ------------ -------------------------
exit-1            swp3(spine-1)                default         655537     655435     27/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp3.2(spine-1)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp3.3(spine-1)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp3.4(spine-1)              DataVrf1082     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4(spine-2)                default         655537     655435     27/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp4.2(spine-2)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4.3(spine-2)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4.4(spine-2)              DataVrf1082     655537     655435     13/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp5(spine-3)                default         655537     655435     28/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp5.2(spine-3)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp5.3(spine-3)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp5.4(spine-3)              DataVrf1082     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp6(firewall-1)             default         655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp6.2(firewall-1)           DataVrf1080     655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp6.3(firewall-1)           DataVrf1081     655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp6.4(firewall-1)           DataVrf1082     655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp7                         default         655537     -          NotEstd      Fri Feb 15 17:28:48 2019
exit-1            swp7.2                       DataVrf1080     655537     -          NotEstd      Fri Feb 15 17:28:48 2019
exit-1            swp7.3                       DataVrf1081     655537     -          NotEstd      Fri Feb 15 17:28:48 2019
exit-1            swp7.4                       DataVrf1082     655537     -          NotEstd      Fri Feb 15 17:28:48 2019
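When you collect check results in a script, the summary line is often the first thing worth extracting. The following Python sketch parses the summary line from the netq check bgp output shown above into integer counts. It assumes the line keeps the "Label: number" layout seen in this example; the function name is hypothetical, and NetQ output formats can differ between releases.

```python
import re


def parse_check_summary(line: str) -> dict[str, int]:
    """Extract 'Label: number' pairs from a netq check summary line."""
    pairs = re.findall(r"([A-Za-z ]+?):\s*(\d+)", line)
    return {label.strip(): int(value) for label, value in pairs}


# Summary line copied from the example output above.
summary = "Total Nodes: 25, Failed Nodes: 3, Total Sessions: 220 , Failed Sessions: 24,"
counts = parse_check_summary(summary)

for label, value in counts.items():
    print(f"{label}: {value}")

# Share of sessions that failed, as a fraction between 0 and 1.
print(counts["Failed Sessions"] / counts["Total Sessions"])
```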

Manage Network Events

The NetQ notifier lets you capture and filter events for devices, components, protocols, and services. This is especially useful when an interface or routing protocol goes down and you want to get it back up and running as quickly as possible. You can improve resolution time significantly by creating filters that focus on topics appropriate for a particular group of users. You can create filters for events related to BGP and MLAG session states, interfaces, links, NTP and other services, fans, power supplies, and physical sensor measurements.

The following is an example of a Slack message received on a netq-notifier channel indicating that the BGP session on switch leaf04 interface swp2 has gone down:

example Slack message from netq notifier indicating session failures

For more information, refer to Events and Notifications.

Timestamps in NetQ

Every event or entry in the NetQ database is stored with a timestamp that reports when the NetQ Agent captured an event on the switch or server. This timestamp is based on the switch or server time where the NetQ Agent is running, and is pushed in UTC format.

Interface state, IP addresses, routes, ARP/ND table (IP neighbor) entries and MAC table entries carry a timestamp that represents the time an event occurred (such as when a route is deleted or an interface comes up).

Data that is captured and saved based on polling has a timestamp according to when the information was captured rather than when the event actually happened, though NetQ compensates for this if the data extracted provides additional information to compute a more precise time of the event. For example, BGP uptime can be used to determine when the event actually happened in conjunction with the timestamp.
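The arithmetic behind this compensation can be sketched as follows: subtract the reported uptime from the time the data was captured. This Python example is a sketch under stated assumptions. It borrows the duration format (for example, 1d:2h:6m:21s) from the Last Changed column in the earlier CLI output, the function names are hypothetical, and the exact method NetQ uses internally is not described in this guide.

```python
import re
from datetime import datetime, timedelta, timezone

# Map the unit letters used in durations such as '1d:2h:6m:21s'
# to the keyword arguments accepted by timedelta.
UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_uptime(text: str) -> timedelta:
    """Parse a duration such as '1d:2h:6m:21s' into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])", text)
    return timedelta(**{UNITS[unit]: int(value) for value, unit in parts})


def estimate_event_time(captured_at: datetime, uptime: str) -> datetime:
    """Estimate when a session came up: capture time minus reported uptime."""
    return captured_at - parse_uptime(uptime)


# Capture time in UTC, matching how NetQ Agents push timestamps.
captured = datetime(2019, 2, 15, 17, 20, 0, tzinfo=timezone.utc)
print(estimate_event_time(captured, "1d:2h:6m:21s").isoformat())
```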

Restarting a NetQ Agent on a device does not update the timestamps for existing objects to reflect this new restart time. NetQ preserves their timestamps relative to the original start time of the Agent. A rare exception is if you reboot the device between the time it takes the Agent to stop and restart; in this case, the time is still relative to the start time of the Agent.

Exporting NetQ Data

You can export data from the NetQ Platform in the CLI or UI:

Important File Locations

The following configuration and log files can help with troubleshooting. See Troubleshoot NetQ for more information.

File | Description
/etc/netq/netq.yml | The NetQ configuration file. This file appears only if you installed either the netq-apps package or the NetQ Agent on the system.
/var/log/netqd.log | The NetQ daemon log file for the NetQ CLI. This log file appears only if you installed the netq-apps package on the system.
/var/log/netq-agent.log | The NetQ Agent log file. This log file appears only if you installed the NetQ Agent on the system.
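Because each of these files appears only when the corresponding package is installed, a quick presence check shows what a given system has. The following Python sketch reports which of the three files exist. The paths come from the table above; the function name and output layout are hypothetical.

```python
from pathlib import Path

# Paths and descriptions taken from the file table above.
NETQ_FILES = {
    "/etc/netq/netq.yml": "NetQ configuration file",
    "/var/log/netqd.log": "NetQ daemon log for the NetQ CLI",
    "/var/log/netq-agent.log": "NetQ Agent log",
}


def report_netq_files(files: dict[str, str] = NETQ_FILES) -> dict[str, bool]:
    """Return a mapping of each path to whether it exists as a regular file."""
    return {path: Path(path).is_file() for path in files}


for path, present in report_netq_files().items():
    status = "found" if present else "missing"
    print(f"{status:8}{path}  ({NETQ_FILES[path]})")
```

A missing file is not necessarily an error: it can simply mean that the netq-apps package or the NetQ Agent is not installed on that system.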

NetQ User Interface Overview

The NetQ user interface (UI) lets you access NetQ through a web browser, where you can visualize your network and interact with the display using a keyboard and mouse.

The NetQ UI is supported on Google Chrome and Mozilla Firefox. It is designed to be viewed on a display with a minimum resolution of 1920 × 1080 pixels.

Access the NetQ UI

This page describes how to log in and out of NetQ.

Log In to NetQ

  1. Open a new Chrome or Firefox browser window or tab.

  2. Enter the following URL into the address bar:

    • NetQ on-premises appliance or VM: https://<hostname-or-ipaddress>
    • NetQ cloud appliance or VM: https://netq.nvidia.com
    NetQ login screen
  3. Log in.

    The following are the default usernames and passwords for UI access:

    • NetQ on-premises: admin, admin
    • NetQ cloud: Use the credentials you created during setup. You should receive an email from NVIDIA titled NetQ Access Link.

Enter your username and password to log in. You can also log in with SSO if your company has enabled it.

Username and Password

  1. Locate the email you received from NVIDIA titled NetQ Access Link. Select Create Password.

  2. Enter a new password, then enter it again to confirm it.

  3. Log in using your email address and new password.

  4. Accept the Terms of Use after reading them.

    The default workbench opens, with your username and premises shown in the top-right corner of NetQ.

SSO

  1. Follow the steps above until you reach the NetQ login screen.

  2. Select Sign up for SSO and enter your organization’s name.

  1. Enter your username and password.

  2. Create a new password and enter the new password again to confirm it.

  3. Click Update and Accept after reading the Terms of Use.

    The default workbench opens, with your username shown in the top-right corner of NetQ.

  1. Enter your username.

  2. Enter your password.

    The user-specified home workbench is displayed. If a home workbench is not specified, then the default workbench is displayed.

Any workbench can be set as the home workbench. Select User Settings > Profile & Preferences, then on the Workbenches card select the workbench you'd like to designate as your home workbench.

Log Out of NetQ

  1. Select User Settings in the top-right corner of NetQ.

  2. Select Log Out.

Application Layout

The NetQ UI contains two main areas:

workbench displaying task bar and 5 cards

Select the Menu in the top-left corner to navigate to:

Description Menu
  • Search: searches items listed under the main menu
  • Favorites: lists a user’s favorite workbench
  • Workbenches: lists all workbenches
  • Network: lists various network elements which you can select to monitor your network’s state
  • RoCE counters: lists performance counters for devices running RoCE
  • Traffic histograms: lists types of network traffic that can be visualized with histograms
  • Notifications: lets you set up notification channels and create rules for threshold-crossing events
  • Admin: lets administrators manage NetQ itself and access lifecycle management

You can search for devices and cards in the Global Search field in the header. It behaves like most searches and provides suggestions to help you quickly find device information or populate your workbench with sets of cards.

Selecting the NVIDIA logo takes you to your favorite workbench. For details about specifying your favorite workbench, refer to Set User Preferences.

Validation Summary

Found in the header, the validation summary displays the overall health of your network.

On initial startup, it can take up to an hour to reach an accurate health indication because some processes run only every 30 minutes.

Workbenches

A workbench comprises a given set of cards. A pre-configured default workbench, NetQ Workbench, is available to get you started. You can customize your workbenches by adding or removing cards. For more detail about managing your data using workbenches, refer to Focus Your Monitoring Using Workbenches.

Cards

Cards display information about your network. Each card describes a particular aspect of the network and can be expanded to display information and statistics at increasingly granular levels. You can add or remove cards from a workbench, move between cards and card sizes, and make copies of cards that display different levels of data for a given time period. For details about working with cards, refer to Access Data with Cards.

User Settings

Each user can customize the NetQ display, time zone, and date format; change their account password; and manage their workbenches. Navigate to User Settings > Profile & Preferences. For details, refer to Set User Preferences.

Focus Your Monitoring Using Workbenches

Workbenches are dashboards where you can visualize and curate data representing different aspects of your network. For example, you might create a workbench that:

NVIDIA provides an example workbench, called NetQ Workbench, that opens when you first log in to NetQ. It includes cards displaying your network’s device inventory, switch inventory, validation summary, What Just Happened events, host inventory, DPU inventory, and system events. This workbench is visible to all users within an organization, and changes to it are not saved.

default netq workbench

Create a Custom Workbench

You can create an unlimited number of custom workbenches. These workbenches are only visible to the user who created them and changes are saved automatically. To create a new workbench:

  1. Select New in the workbench header and give the workbench a name.

  2. Choose whether to restrict access to this workbench to a single premises (local) or make it available across all premises (global). You can modify this setting later if you change your mind.

Refer to the premises management chapter for more information about setting up and managing data between multiple premises.

  3. (Optional) Set the workbench as your home workbench, which opens when you log in to NetQ from the same premises.

  4. Select the cards you want to display on your new workbench.

    interface displaying the cards a user can select to add to their workbench
  5. Click Create.

You can clone a workbench to quickly create a new workbench with the same cards as the one you're viewing. In the header, select Clone, modify the workbench settings, then click Clone.

Switch Between Workbenches

There are several ways to access workbenches:

Edit a Workbench

The changes you make to a workbench are saved automatically. To change a workbench from local to global (or global to local) availability, select the icon next to the current workbench and select Manage my WB. Locate the workbench whose availability you’d like to change and select Local or Global.

To change your home workbench, select the icon next to the current workbench and select Manage my WB. On the Workbenches card, hover over the workbench you’d like to set as your home workbench and select Home. The next time you log in from this premises, the workbench you selected is displayed.

Delete a Workbench

You can only delete workbenches that you created. The NVIDIA-supplied NetQ Workbench cannot be deleted. When you delete a workbench that you have designated as your home workbench, the NetQ Workbench will replace it as the home workbench. To delete a workbench:

  1. Select User Settings in the top-right corner.

  2. Select Profile & Preferences.

  3. Locate the Workbenches card.

  4. Hover over the workbench you want to remove, and click Delete.

Manage Auto-refresh

You can specify how often to update the data displayed on your workbenches. Three refresh rates are available: 30 seconds, 1 minute, and 2 minutes.

To modify the auto-refresh setting:

  1. In the header, select the dropdown next to Refresh.

  2. Select the refresh rate. A check mark indicates the current selection. The new refresh rate is applied immediately.

    refresh rate dropdown listing rate options of 30 seconds, 1 minute, and 2 minutes

To disable auto-refresh, select Pause. When you’re ready for the data to refresh, select Play.

Access Data with Cards

Cards present information about your network for monitoring and troubleshooting; each card describes a particular aspect of the network. Cards are collected onto a workbench where all data relevant to a task or set of tasks is visible. You can add and remove cards from a workbench, increase or decrease their sizes, change the time period of the data shown on a card, and make copies of cards to show different levels of data at the same time.

Available Cards

Each card focuses on a particular aspect of your network. They include:

Card Sizes

Cards are available in four sizes. The granularity of the content varies with the size of the card, from the highest-level information on the smallest card to the most detailed information on the full-screen card.

Card Size Summary

Primary purpose by card size:

Small
  • Quick view of status, typically at the level of good or bad

Medium
  • View key performance parameters or statistics
  • Perform quick actions
  • Monitor for potential issues

Large
  • View detailed performance and statistics
  • Perform actions
  • Compare and review related information

Full Screen
  • View all attributes for given network aspect
  • Analyze and visualize detailed data
  • Export and filter data

Card Actions

Add Cards to Your Workbench

  1. Click Add card in the header.

  2. Select the card(s) you want to add to your workbench.

  3. When you have selected the cards you want to add to your workbench, select Open cards.

The cards are placed at the end of the set of cards currently on the workbench. You might need to scroll down to see them. Drag and drop the cards on the workbench to rearrange them.

Add Switch Cards to Your Workbench

You can add switch cards to a workbench by selecting Devices in the header or by searching for them in the Global Search field. To add a switch card from the header:

  1. Click Devices, then select Open a device card.

  2. Select the device from the suggestions that appear:

    dropdown displaying switches
  3. Choose the card’s size, then select Add.

Remove Cards from Your Workbench

To remove all the cards from your workbench, click the Clear icon in the header. To remove an individual card:

  1. Hover over the card you want to remove.

  2. Click the More Actions menu.

  3. Select Remove.

The card is removed from the workbench, but not from the application.

Change the Size of the Card

  1. Hover over the top portion of the card until you see a rectangular box divided into four segments.

  2. Move your cursor over the box until the desired size option is highlighted.

    One-quarter width opens a small card. One-half width opens a medium card. Three-quarters width opens a large card. Full width opens a full-screen card.

  3. Select the size. When the card changes to the selected size, it might move to a different area on the workbench.

Change the Time Period for the Card Data

All cards have a default time period for the data shown on the card, typically the last 24 hours. You can change the time period to view the data during a different time range to aid analysis of previous or existing issues.

To change the time period for a card:

  1. Hover over the top portion of the card and select the clock icon.

  2. Select a time period from the dropdown list.

    time options

Changing the time period in this manner only changes the time period for the given card.

Table Settings

You can manipulate the tabular data displayed in a full-screen card by filtering and sorting the columns. Hover over the column header and select it to sort the column. The data is sorted in ascending or descending order: A-Z, Z-A, 1-n, or n-1. The number of rows that can be sorted via the UI is limited to 10,000. To reposition the columns, drag and drop them using your mouse.

Select Export to download and export the tabular data. You can sort and filter tables that exceed 10,000 rows by exporting the data as a CSV file and opening it in a spreadsheet program.
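As a rough illustration of working with an exported file outside the UI, the following Python sketch filters and sorts CSV rows using only the standard library. The function name, the column names (hostname, status), and the sample rows are assumptions for illustration; an actual export uses the column headers of the table you exported.

```python
import csv
import io


def filter_and_sort(csv_text, filter_column, filter_value, sort_column, descending=False):
    """Return rows whose filter_column equals filter_value, sorted by sort_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = [row for row in reader if row[filter_column] == filter_value]
    return sorted(matches, key=lambda row: row[sort_column], reverse=descending)


# Hypothetical export contents; a real export would be read from the downloaded file.
SAMPLE_EXPORT = """hostname,status
spine01,Fresh
leaf02,Rotten
leaf01,Fresh
"""

if __name__ == "__main__":
    for row in filter_and_sort(SAMPLE_EXPORT, "status", "Fresh", "hostname"):
        print(row["hostname"], row["status"])
```

Because the rows are read into memory and sorted there, this approach is not subject to the 10,000-row sorting limit of the UI, although very large files might call for a different tool.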

The following icons are common in the full-screen card view:

Action                    Description
Select All                Selects all items in the list.
Clear All                 Clears all existing selections in the list.
Add Item                  Adds item to the list.
Edit                      Edits the selected item.
Delete                    Removes the selected items.
Filter                    Filters the list using available parameters.
Generate/Delete AuthKeys  Creates or removes NetQ CLI authorization keys.
Open Cards                Opens the corresponding validation or trace card(s).
Assign role               Opens role assignment options for switches.
Export                    Exports selected data into either a .csv or JSON-formatted file.

When there are many items in a table, NetQ loads up to 25 rows by default and provides the rest in additional table pages, accessible through the pagination controls. Pagination is displayed under the table.

Set User Preferences

This section describes how to customize your NetQ display, change your password, and manage your workbenches.

Configure Display Settings

The Display card contains the options for setting the application theme (light or dark), language, time zone, and date formats.

To configure the display settings:

  1. Select User Settings in the top-right corner.

  2. Select Profile & Preferences.

  3. Locate the Display card:

    [Figure: display card with fields specifying theme, language, time zone, and date format]
  4. Select the Theme field and choose either dark or light. The following figure shows the light theme:

    [Figure: NetQ workbench displayed in light theme]
  5. Select the Time zone field to adjust the time zone.

    By default, the time zone is set to the user’s local time zone. If a time zone has not been selected, NetQ defaults to the local time zone where NetQ is installed. All time values are based on this setting. The application header displays the current time zone, which is based on Greenwich Mean Time (GMT), and you can also change it there. If your deployment is not local to you (for example, you want to view the data from the perspective of a data center in another time zone), you can change the display to a different time zone.

  6. In the Date format field, select the date and time format you want displayed on the cards.

Change Your Password

  1. Click User Settings in the top-right corner.

  2. Click Profile & Preferences.

  3. In the Basic Account Info card, select Change password.

  4. Enter your current password, followed by your new password. Then select Save.

To reset the password for an admin account, follow these instructions.

Manage Your Workbenches

A workbench is similar to a dashboard. This is where you collect and view the data that is important to you. You can have more than one workbench and manage them with the Workbenches card located in Profile & Preferences. From the Workbenches card, you can view, sort, and delete workbenches. For a detailed overview of workbenches, see Focus Your Monitoring Using Workbenches.

NetQ Command Line Overview

The NetQ CLI provides access to all network state and event information collected by NetQ Agents. It behaves similarly to typical CLIs, with groups of commands that display related information, and help commands that provide additional information. See the command line reference for a comprehensive list of NetQ commands, including examples, options, and definitions.

The NetQ command line interface only runs on switches and server hosts implemented with Intel x86 or ARM-based architectures.

CLI Access

When you install or upgrade NetQ, you can also install and enable the CLI on your NetQ server or appliance and hosts.

To access the CLI from a switch or server:

  1. Log in to the device. The following example uses the default username of cumulus and a hostname of switch:

    <computer>:~<username>$ ssh cumulus@switch
    
  2. Enter your password to reach the command prompt. The default password is CumulusLinux!

  3. You can now run commands:

    cumulus@switch:~$ netq show agents
    cumulus@switch:~$ netq check bgp
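
If you want to call the CLI from a script on the same device, one possible approach is a thin Python wrapper around the standard subprocess module. The sketch below is an illustration only: the function name run_netq_json and its runner parameter are hypothetical, and it assumes that the command you pass accepts the json option that appears in the command syntax shown later in this guide.

```python
import json
import subprocess


def run_netq_json(*arguments, runner=subprocess.run):
    """Run a netq command with the json option and return the decoded output.

    The runner parameter exists only so the call can be replaced in tests;
    by default the command is executed with subprocess.run.
    """
    command = ["netq", *arguments, "json"]
    # check=True raises CalledProcessError if the command exits with an error.
    completed = runner(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Requires a device where the NetQ CLI is installed and configured.
    print(run_netq_json("check", "bgp"))
```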
    

Command Line Basics

This section describes the core structure and behavior of the NetQ CLI.

Command Line Structure

The NetQ command line has a flat structure as opposed to a modal structure: you can run all commands at the same level from the standard command prompt, rather than only from within a specific mode.

Command Syntax

All NetQ CLI commands begin with netq. NetQ commands fall into one of four syntax categories: validation (check), monitoring (show), configuration, and trace.

netq check <network-protocol-or-service> [options]
netq show <network-protocol-or-service> [options]
netq config <action> <object> [options]
netq trace <destination> from <source> [options]

Symbols              Meaning
Parentheses ( )      Grouping of required parameters. Choose one.
Square brackets [ ]  Single or group of optional parameters. If more than one object or keyword is available, choose one.
Angle brackets < >   Required variable. Value for a keyword or option; enter according to your deployment nomenclature.
Pipe |               Separates object and keyword options, also separates value options; enter one object or keyword and zero or one value.
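
To make the parentheses and pipe notation concrete, the following Python sketch expands each required group into the individual commands it stands for. It is a reading aid rather than part of NetQ: the function name is hypothetical, and the sketch deliberately leaves square brackets and angle brackets untouched. The syntax string used in the demonstration is taken from the agent configuration commands listed later in this guide.

```python
import itertools
import re

# Matches one parenthesized group that contains no nested parentheses.
GROUP = re.compile(r"\(([^()]+)\)")


def expand_required_groups(syntax):
    """Expand every (a|b|c) group into one concrete string per alternative."""
    # re.split with a capture group alternates literal text (even indexes)
    # with the contents of each group (odd indexes).
    parts = GROUP.split(syntax)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for command in expand_required_groups("netq config (start|stop|status|restart) agent"):
        print(command)
```

Each printed line is one valid choice from the group, which mirrors the "Choose one" rule for parentheses.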

Command Output

The command output presents results in color for many commands. Results with errors appear in red, and warnings appear in yellow. Results without errors or warnings appear in either black or green. VTEPs appear in blue. A node in the pretty output appears in bold, and angle brackets (< >) wrap around a router interface. To view the output with only black text, run the netq config del color command. You can view output with colors again by running netq config add color.

All check and show commands have a default timeframe of now to one hour ago, unless you specify an approximate time using the around keyword or a range using the between keyword. For example, running netq check bgp shows the status of BGP over the last hour. Running netq show bgp around 3h shows the status of BGP three hours ago.

When entering a time value, you must include a numeric value and the unit of measure:

  • w: weeks
  • d: days
  • h: hours
  • m: minutes
  • s: seconds
  • now

When using the between option, you can enter the start time (text-time) and end time (text-endtime) values as most recent first and least recent second, or vice versa. The values do not have to have the same unit of measure. Use the around option to view information for a particular time.
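
The time value format described above can be sketched in a few lines of Python. This is an illustration of the number-plus-unit convention only, not NetQ's own parser, which might accept forms not covered here; the function names are hypothetical.

```python
import re
from datetime import timedelta

# Units listed in this section, mapped to timedelta keyword arguments.
UNIT_TO_KEYWORD = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
TIME_VALUE = re.compile(r"(\d+)([wdhms])")


def parse_time_value(text):
    """Convert a value such as '3h' or 'now' into how far back from the present it points."""
    if text == "now":
        return timedelta(0)
    match = TIME_VALUE.fullmatch(text)
    if match is None:
        raise ValueError(f"expected a number followed by w, d, h, m, or s: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{UNIT_TO_KEYWORD[unit]: int(amount)})


def order_between(first, second):
    """Return (most recent, least recent), accepting the two values in either order."""
    return tuple(sorted((first, second), key=parse_time_value))


if __name__ == "__main__":
    print(parse_time_value("3h"))
    print(order_between("7d", "1h"))
```

The order_between helper reflects the rule that the two between values can be given in either order and do not need to share a unit of measure.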

Command Prompts

NetQ code examples use the following prompts:

To use the NetQ CLI, the switches must be running the Cumulus Linux or SONiC operating system, NetQ Platform or NetQ Collector software, the NetQ Agent, and the NetQ CLI. The hosts must be running CentOS, RHEL, or Ubuntu OS, the NetQ Agent, and the NetQ CLI. Refer to Install NetQ for additional information.

Command Completion

As you enter commands, you can get help with the valid keywords or options using the tab key. For example, using tab completion with netq check displays the possible objects for the command, and returns you to the command prompt to complete the command:

cumulus@switch:~$ netq check <<press Tab>>
    agents      :  Netq agent
    bgp         :  BGP info
    cl-version  :  Cumulus Linux version
    clag        :  Cumulus Multi-chassis LAG
    evpn        :  EVPN
    interfaces  :  network interface port
    mlag        :  Multi-chassis LAG (alias of clag)
    mtu         :  Link MTU
    ntp         :  NTP
    ospf        :  OSPF info
    sensors     :  Temperature/Fan/PSU sensors
    vlan        :  VLAN
    vxlan       :  VXLAN data path
cumulus@switch:~$ netq check

Command Help

As you enter commands, you can get help with command syntax by entering help at various points within a command entry. For example, to find out which options are available for a BGP check, enter help after netq check bgp. In the following example, the output shows that a BGP check has no additional required parameters and accepts several optional parameters, including hostnames, vrf, and around:

cumulus@switch:~$ netq check bgp help
Commands:
    netq check bgp [label <text-label-name> | hostnames <text-list-hostnames>] [vrf <vrf>] [check_filter_id <text-check-filter-id>] [include <bgp-number-range-list> | exclude <bgp-number-range-list>] [around <text-time>] [json | summary]
    netq show unit-tests bgp [check_filter_id <text-check-filter-id>] [json]

To see an exhaustive list of commands, run:

cumulus@switch:~$ netq help list

To get usage information for NetQ, run:

cumulus@switch:~$ netq help verbose

Command History

The CLI stores commands issued within a session, which lets you review and rerun commands that you already ran. At the command prompt, press the Up Arrow and Down Arrow keys to move back and forth through the list of commands previously entered. When you have found a given command, you can run the command by pressing Enter, just as you would if you had entered it manually. You can also modify the command before you run it.

Command Categories

While the CLI has a flat structure, NetQ commands are conceptually grouped into the following functional categories:

Validation Commands

The netq check commands validate the current or historical state of the network by looking for errors and misconfigurations in the network. The commands run fabric-wide validations against various configured protocols and services to determine how well the network is operating. You can perform validation checks for the following:

The commands take the form of netq check <network-protocol-or-service> [options], where the options vary according to the protocol or service.

Example check command

The following example shows the output for the netq check bgp command. Failed checks appear in the summary results or in the failedNodes section.

cumulus@switch:~$ netq check bgp
bgp check result summary:

Checked nodes       : 8
Total nodes         : 8
Rotten nodes        : 0
Failed nodes        : 0
Warning nodes       : 0

Additional summary:
Total Sessions      : 30
Failed Sessions     : 0

Session Establishment Test   : passed
Address Families Test        : passed
Router ID Test               : passed

Example check command in JSON format

cumulus@switch:~$ netq check bgp json
{
    "tests":{
        "Session Establishment":{
            "suppressed_warnings":0,
            "errors":[

            ],
            "suppressed_errors":0,
            "passed":true,
            "warnings":[

            ],
            "duration":0.0000853539,
            "enabled":true,
            "suppressed_unverified":0,
            "unverified":[

            ]
        },
        "Address Families":{
            "suppressed_warnings":0,
            "errors":[

            ],
            "suppressed_errors":0,
            "passed":true,
            "warnings":[

            ],
            "duration":0.0002634525,
            "enabled":true,
            "suppressed_unverified":0,
            "unverified":[

            ]
        },
        "Router ID":{
            "suppressed_warnings":0,
            "errors":[

            ],
            "suppressed_errors":0,
            "passed":true,
            "warnings":[

            ],
            "duration":0.0001821518,
            "enabled":true,
            "suppressed_unverified":0,
            "unverified":[

            ]
        }
    },
    "failed_node_set":[

    ],
    "summary":{
        "checked_cnt":8,
        "total_cnt":8,
        "rotten_node_cnt":0,
        "failed_node_cnt":0,
        "warn_node_cnt":0
    },
    "rotten_node_set":[

    ],
    "warn_node_set":[

    ],
    "additional_summary":{
        "total_sessions":30,
        "failed_sessions":0
    },
    "validation":"bgp"
}
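
The JSON form is convenient for scripts. The following Python sketch condenses a result like the one above into a short pass or fail summary. It reads only the keys visible in the example output; the function name is hypothetical, and the sketch assumes that a failing test reports "passed": false, which is inferred from the passing example rather than confirmed by it.

```python
import json


def summarize_check(raw_json):
    """Condense netq check JSON output into a small pass/fail summary."""
    result = json.loads(raw_json)
    # Assumption: a test that fails has "passed" set to false.
    failed_tests = sorted(
        name
        for name, test in result["tests"].items()
        if test["enabled"] and not test["passed"]
    )
    failed_node_count = result["summary"]["failed_node_cnt"]
    return {
        "validation": result["validation"],
        "failed_tests": failed_tests,
        "failed_node_count": failed_node_count,
        "healthy": not failed_tests and failed_node_count == 0,
    }


# Trimmed copy of the example output, keeping only the fields this sketch reads.
SAMPLE = """
{
    "tests": {
        "Session Establishment": {"passed": true, "enabled": true},
        "Router ID": {"passed": true, "enabled": true}
    },
    "summary": {"checked_cnt": 8, "total_cnt": 8, "failed_node_cnt": 0},
    "validation": "bgp"
}
"""

if __name__ == "__main__":
    print(summarize_check(SAMPLE))
```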

Monitoring Commands

The netq show commands let you view details about the current or historical configuration and status of various protocols and services. You can view the configuration and status for the following:

The commands take the form of netq [<hostname>] show <network-protocol-or-service> [options], where the options vary according to the protocol or service. By default, the commands show information for all devices. To show information for a single device only, use the hostname option.

Example show command

The following example shows the standard output for the netq show agents command:

cumulus@switch:~$ netq show agents
Matching agents records:
Hostname          Status           NTP Sync Version                              Sys Uptime                Agent Uptime              Reinitialize Time          Last Changed
----------------- ---------------- -------- ------------------------------------ ------------------------- ------------------------- -------------------------- -------------------------
border01          Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:04:54 2020  Tue Sep 29 21:24:58 2020  Tue Sep 29 21:24:58 2020   Thu Oct  1 16:07:38 2020
border02          Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:04:57 2020  Tue Sep 29 21:24:58 2020  Tue Sep 29 21:24:58 2020   Thu Oct  1 16:07:33 2020
fw1               Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:04:44 2020  Tue Sep 29 21:24:48 2020  Tue Sep 29 21:24:48 2020   Thu Oct  1 16:07:26 2020
fw2               Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:04:42 2020  Tue Sep 29 21:24:48 2020  Tue Sep 29 21:24:48 2020   Thu Oct  1 16:07:22 2020
leaf01            Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 16:49:04 2020  Tue Sep 29 21:24:49 2020  Tue Sep 29 21:24:49 2020   Thu Oct  1 16:07:10 2020
leaf02            Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:14 2020  Tue Sep 29 21:24:49 2020  Tue Sep 29 21:24:49 2020   Thu Oct  1 16:07:30 2020
leaf03            Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:37 2020  Tue Sep 29 21:24:49 2020  Tue Sep 29 21:24:49 2020   Thu Oct  1 16:07:24 2020
leaf04            Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:35 2020  Tue Sep 29 21:24:58 2020  Tue Sep 29 21:24:58 2020   Thu Oct  1 16:07:13 2020
oob-mgmt-server   Fresh            yes      3.1.1-ub18.04u29~1599111022.78b9e43  Mon Sep 21 16:43:58 2020  Mon Sep 21 17:55:00 2020  Mon Sep 21 17:55:00 2020   Thu Oct  1 16:07:31 2020
server01          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:19:57 2020  Tue Sep 29 21:13:07 2020  Tue Sep 29 21:13:07 2020   Thu Oct  1 16:07:16 2020
server02          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:19:57 2020  Tue Sep 29 21:13:07 2020  Tue Sep 29 21:13:07 2020   Thu Oct  1 16:07:24 2020
server03          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:19:56 2020  Tue Sep 29 21:13:07 2020  Tue Sep 29 21:13:07 2020   Thu Oct  1 16:07:12 2020
server04          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:19:57 2020  Tue Sep 29 21:13:07 2020  Tue Sep 29 21:13:07 2020   Thu Oct  1 16:07:17 2020
server05          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:19:57 2020  Tue Sep 29 21:13:10 2020  Tue Sep 29 21:13:10 2020   Thu Oct  1 16:07:25 2020
server06          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:19:57 2020  Tue Sep 29 21:13:10 2020  Tue Sep 29 21:13:10 2020   Thu Oct  1 16:07:21 2020
server07          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:06:48 2020  Tue Sep 29 21:13:10 2020  Tue Sep 29 21:13:10 2020   Thu Oct  1 16:07:28 2020
server08          Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 17:06:45 2020  Tue Sep 29 21:13:10 2020  Tue Sep 29 21:13:10 2020   Thu Oct  1 16:07:31 2020
spine01           Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:34 2020  Tue Sep 29 21:24:58 2020  Tue Sep 29 21:24:58 2020   Thu Oct  1 16:07:20 2020
spine02           Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:33 2020  Tue Sep 29 21:24:58 2020  Tue Sep 29 21:24:58 2020   Thu Oct  1 16:07:16 2020
spine03           Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:34 2020  Tue Sep 29 21:25:07 2020  Tue Sep 29 21:25:07 2020   Thu Oct  1 16:07:20 2020
spine04           Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 17:03:32 2020  Tue Sep 29 21:25:07 2020  Tue Sep 29 21:25:07 2020   Thu Oct  1 16:07:33 2020

Example show command with filtered output

The following example shows the filtered output for the netq show agents command:

cumulus@switch:~$ netq leaf01 show agents
Matching agents records:
Hostname          Status           NTP Sync Version                              Sys Uptime                Agent Uptime              Reinitialize Time          Last Changed
----------------- ---------------- -------- ------------------------------------ ------------------------- ------------------------- -------------------------- -------------------------
leaf01            Fresh            yes      3.2.0-cl4u30~1601410518.104fb9ed     Mon Sep 21 16:49:04 2020  Tue Sep 29 21:24:49 2020  Tue Sep 29 21:24:49 2020   Thu Oct  1 16:26:33 2020
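
When JSON output is not available to you, the standard tabular output can still be read by a script. The Python sketch below is one possible approach, not a NetQ feature: it uses the row of dashes under the header to work out where each column starts, then slices every data row at those positions. The function name is hypothetical, and the sample is trimmed to the first three columns of the output above.

```python
import re


def parse_table(output):
    """Turn fixed-width CLI table output into a list of dictionaries."""
    lines = output.splitlines()
    # The separator row of dashes marks where each column starts.
    separator_index = next(i for i, line in enumerate(lines) if line.startswith("---"))
    starts = [match.start() for match in re.finditer(r"-+", lines[separator_index])]
    # Each column runs up to the start of the next; the last runs to the end of the line.
    bounds = list(zip(starts, starts[1:] + [None]))
    headers = [lines[separator_index - 1][begin:end].strip() for begin, end in bounds]
    rows = []
    for line in lines[separator_index + 1:]:
        if line.strip():
            rows.append({name: line[begin:end].strip() for name, (begin, end) in zip(headers, bounds)})
    return rows


# First three columns of the example output.
SAMPLE = """Matching agents records:
Hostname          Status           NTP Sync
----------------- ---------------- --------
leaf01            Fresh            yes
spine01           Fresh            yes
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE):
        print(row)
```

Slicing by column position, rather than splitting on whitespace, keeps multi-word values such as the timestamps in the Sys Uptime column together.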

Configuration Commands

Various commands—including netq config, netq notification, and netq install—allow you to manage NetQ Agent and CLI server configurations, configure lifecycle management, set up container monitoring, and manage notifications.

NetQ Agent Configuration

The agent commands configure individual NetQ Agents.

The agent configuration commands can add and remove agents from switches and hosts, start and stop agent operations, debug the agent, specify default commands, and enable or disable a variety of monitoring features (including Kubernetes, sensors, FRR (FRRouting), CPU usage limit, and What Just Happened).

Commands apply to one agent at a time. Run them from the switch or host where the NetQ Agent resides.

The agent configuration commands include:

netq config (add|del|show) agent
netq config (start|stop|status|restart) agent

The following example shows how to configure the agent to send sensor data:

cumulus@switch:~$ netq config add agent sensors

The following example shows how to start monitoring with Kubernetes:

cumulus@switch:~$ netq config add agent kubernetes-monitor poll-period 15

The following example shows how to view the NetQ Agent configuration:

cumulus@switch:~$ netq config show agent
netq-agent             value      default
---------------------  ---------  ---------
enable-opta-discovery  True       True
exhibitport
agenturl
server                 127.0.0.1  127.0.0.1
exhibiturl
vrf                    default    default
agentport              8981       8981
port                   31980      31980

After making configuration changes to your agents, you must restart the agent for the changes to take effect. Use the netq config restart agent command.

Refer to Manage NetQ Agents and Install NetQ Agents for additional examples.

CLI Configuration

The netq config cli commands configure and manage the CLI component. You can add or remove the CLI (essentially enabling or disabling the service), start and restart it, and view the configuration of the service.

Commands apply to one device at a time, and you run them from the switch or host where you run the CLI.

The CLI configuration commands include:

netq config add cli server
netq config del cli server
netq config show cli premises [json]
netq config show (cli|all) [json]
netq config (status|restart) cli
netq config select cli premise

The following example shows how to restart the CLI instance:

cumulus@switch:~$ netq config restart cli

The following example shows how to enable the CLI on a NetQ on-premises appliance or virtual machine:

cumulus@switch:~$ netq config add cli server 10.1.3.101

The following example shows how to enable the CLI on a NetQ Cloud Appliance or VM for the Chicago premises and the default port:

netq config add cli server api.netq.cumulusnetworks.com access-key <user-access-key> secret-key <user-secret-key> premises chicago port 443

NetQ System Configuration Commands

Use the following commands to manage the NetQ system itself:

The following example shows how to decommission a switch named leaf01:

cumulus@netq-appliance:~$ netq decommission leaf01

For information and examples on installing and upgrading the NetQ system, see Install NetQ and Upgrade NetQ.

Event Notification Commands

The notification configuration commands can add, remove, and show notifications sent through third-party integrations. These commands create the channels, filters, and rules that display event messages. Refer to Configure System Event Notifications for step-by-step instructions and examples.

Threshold-based Event Notification Commands

NetQ supports TCA events, a set of events that are triggered by crossing a user-defined threshold. Configure and manage TCA events using the following commands:

netq add tca
netq del tca tca_id <text-tca-id-anchor>
netq show tca

Lifecycle Management Commands

The lifecycle management commands help you efficiently manage the deployment of NVIDIA product software onto your network devices (servers, appliances, and switches).

LCM commands allow you to:

The following example shows the NetQ configuration profiles:

cumulus@switch:~$ netq lcm show netq-config
ID                        Name            Default Profile                VRF             WJH       CPU Limit Log Level Last Changed
------------------------- --------------- ------------------------------ --------------- --------- --------- --------- -------------------------
config_profile_3289efda36 NetQ default co Yes                            mgmt            Disable   Disable   info      Tue Apr 27 22:42:05 2021
db4065d56f91ebbd34a523b45 nfig
944fbfd10c5d75f9134d42023
eb2b

The following example shows how to add a Cumulus Linux installation image to the NetQ repository on the switch:

netq lcm add cl-image /path/to/download/cumulus-linux-4.3.0-mlnx-amd64.bin

Trace Commands

The netq trace commands let you view the available paths between two nodes on the network, either currently or at a time in the past. You can perform a layer 2 or layer 3 trace, and view the output in one of three formats: JSON, pretty, or detail. JSON output provides the output in a JSON file format for ease of importing to other applications or software. Pretty output lines up the paths in a pseudo-graphical manner to help visualize multiple paths. Detail output is useful for traces with higher hop counts where the pretty output wraps lines, making it harder to interpret the results. The detail output displays a table with a row for each path.

The trace command syntax is:

netq trace (<mac> vlan <1-4096>) from (<src-hostname>|<ip-src>) [vrf <vrf>] [around <text-time>] [json|detail|pretty] [debug]
netq trace <ip> from (<src-hostname>|<ip-src>) [vrf <vrf>] [around <text-time>] [json|detail|pretty] [debug]
netq trace (<mac> vlan <1-4096>) from <mac-src> [around <text-time>] [json|detail|pretty] [debug]

Example trace command with pretty output

The following example shows how to run a trace based on the destination IP address, in pretty output with a small number of resulting paths:

cumulus@switch:~$ netq trace 10.0.0.11 from 10.0.0.14 pretty
Number of Paths: 6
    Inconsistent PMTU among paths
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9000
    leaf04 swp52 -- swp4 spine02 swp2 -- swp52 leaf02 peerlink.4094 -- peerlink.4094 leaf01 lo
                                                    peerlink.4094 -- peerlink.4094 leaf01 lo
    leaf04 swp51 -- swp4 spine01 swp2 -- swp51 leaf02 peerlink.4094 -- peerlink.4094 leaf01 lo
                                                    peerlink.4094 -- peerlink.4094 leaf01 lo
    leaf04 swp52 -- swp4 spine02 swp1 -- swp52 leaf01 lo
    leaf04 swp51 -- swp4 spine01 swp1 -- swp51 leaf01 lo

Example trace command with detail output

This example shows how to run a trace based on the destination IP address, in detail output with a small number of resulting paths:

cumulus@switch:~$ netq trace 10.0.0.11 from 10.0.0.14 detail
Number of Paths: 6
    Inconsistent PMTU among paths
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9000
Id  Hop Hostname        InPort          InVlan InTunnel              InRtrIf         InVRF           OutRtrIf        OutVRF          OutTunnel             OutPort         OutVlan
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------
1   1   leaf04                                                                                       swp52           default                               swp52
    2   spine02         swp4                                         swp4            default         swp2            default                               swp2
    3   leaf02          swp52                                        swp52           default         peerlink.4094   default                               peerlink.4094
    4   leaf01          peerlink.4094                                peerlink.4094   default                                                               lo
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------
2   1   leaf04                                                                                       swp52           default                               swp52
    2   spine02         swp4                                         swp4            default         swp2            default                               swp2
    3   leaf02          swp52                                        swp52           default         peerlink.4094   default                               peerlink.4094
    4   leaf01          peerlink.4094                                peerlink.4094   default                                                               lo
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------
3   1   leaf04                                                                                       swp51           default                               swp51
    2   spine01         swp4                                         swp4            default         swp2            default                               swp2
    3   leaf02          swp51                                        swp51           default         peerlink.4094   default                               peerlink.4094
    4   leaf01          peerlink.4094                                peerlink.4094   default                                                               lo
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------
4   1   leaf04                                                                                       swp51           default                               swp51
    2   spine01         swp4                                         swp4            default         swp2            default                               swp2
    3   leaf02          swp51                                        swp51           default         peerlink.4094   default                               peerlink.4094
    4   leaf01          peerlink.4094                                peerlink.4094   default                                                               lo
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------
5   1   leaf04                                                                                       swp52           default                               swp52
    2   spine02         swp4                                         swp4            default         swp1            default                               swp1
    3   leaf01          swp52                                        swp52           default                                                               lo
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------
6   1   leaf04                                                                                       swp51           default                               swp51
    2   spine01         swp4                                         swp4            default         swp1            default                               swp1
    3   leaf01          swp51                                        swp51           default                                                               lo
--- --- --------------- --------------- ------ --------------------- --------------- --------------- --------------- --------------- --------------------- --------------- -------

Example trace command on destination MAC address

This example shows how to run a trace based on the destination MAC address, in pretty output:

cumulus@switch:~$ netq trace A0:00:00:00:00:11 vlan 1001 from Server03 pretty
Number of Paths: 6
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9152
    
    Server03 bond1.1001 -- swp7 <vlan1001> Leaf02 vni: 34 swp5 -- swp4 Spine03 swp7 -- swp5 vni: 34 Leaf04 swp6 -- swp1.1001 Server03 <swp1.1001>
                                                        swp4 -- swp4 Spine02 swp7 -- swp4 vni: 34 Leaf04 swp6 -- swp1.1001 Server03 <swp1.1001>
                                                        swp3 -- swp4 Spine01 swp7 -- swp3 vni: 34 Leaf04 swp6 -- swp1.1001 Server03 <swp1.1001>
            bond1.1001 -- swp7 <vlan1001> Leaf01 vni: 34 swp5 -- swp3 Spine03 swp7 -- swp5 vni: 34 Leaf04 swp6 -- swp1.1001 Server03 <swp1.1001>
                                                        swp4 -- swp3 Spine02 swp7 -- swp4 vni: 34 Leaf04 swp6 -- swp1.1001 Server03 <swp1.1001>
                                                        swp3 -- swp3 Spine01 swp7 -- swp3 vni: 34 Leaf04 swp6 -- swp1.1001 Server03 <swp1.1001>
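
The summary lines at the top of the pretty and detail outputs are easy to pick out in a script. The following Python sketch collects them into a dictionary, which could be used, for example, to flag a trace that reports paths with errors. The function name is hypothetical, and the sketch relies only on the layout visible in the examples above: summary lines start in the first column and end in a number, while notes and path lines are indented.

```python
def parse_trace_summary(output):
    """Collect the 'label: number' summary lines from netq trace output."""
    summary = {}
    for line in output.splitlines():
        # Summary lines start in the first column; notes and path lines are indented.
        if line.startswith(" ") or ":" not in line:
            continue
        label, _, value = line.partition(":")
        # Skips the prompt line, whose text after the first colon is not a number.
        if value.strip().isdigit():
            summary[label.strip()] = int(value)
    return summary


# Beginning of the pretty output example, shortened to one path.
SAMPLE = """cumulus@switch:~$ netq trace 10.0.0.11 from 10.0.0.14 pretty
Number of Paths: 6
    Inconsistent PMTU among paths
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9000
    leaf04 swp52 -- swp4 spine02 swp1 -- swp52 leaf01 lo
"""

if __name__ == "__main__":
    print(parse_trace_summary(SAMPLE))
```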

Installation Guide

This section describes how to install, configure, and upgrade NetQ.

Before you begin, review the release notes for this version.

Before You Install

This overview is designed to help you understand the various NetQ deployment and installation options.

Installation Overview

Consider the following before you install the NetQ system:

  1. Determine whether to deploy fully on-premises or as a remote solution.
  2. Choose whether to install the software on a single server or as a server cluster.

Deployment Type: On-premises or Remote

You can deploy NetQ in one of two ways:

In all deployment models, the NetQ Agents reside on the switches and hosts they monitor in your network. Refer to Install the NetQ System for a comprehensive list of deployment types and their respective requirements.

Data Flow

The flow of data differs based on your deployment model.

For the on-premises deployment, the NetQ Agents collect and transmit data from the switches and hosts back to the NetQ on-premises appliance running the NetQ software. The software processes and stores the data, which is then displayed through the user interface.

[Figure: on-premises deployment type displaying data transmission between the agents, the platform, and the user interface]

For the remote, multi-site NetQ implementation, the NetQ Agents at each secondary premises collect data from the switches and hosts at that premises and transmit it to the NetQ cloud appliance. The cloud appliance transmits this data to the primary NetQ on-premises appliance for processing and storage. This deployment is a good choice when you want to store all the data from multiple premises on one NetQ on-premises appliance.

For the remote, cloud-service implementation, the NetQ Agents collect and transmit data from the switches and hosts to the NetQ cloud appliance. The NetQ cloud appliance then transmits this data to the NVIDIA cloud-based infrastructure for further processing and storage.

To access the NetQ UI from the cloud-service implementation, visit https://netq.nvidia.com.

Server Arrangement: Single or Cluster

Both single-server and server-cluster deployments provide identical services and features. The biggest difference is the number of servers deployed and the continued availability of services running on those servers should hardware failures occur.

A single server is easier to set up, configure, and manage, but can limit your ability to scale your network monitoring quickly. Deploying multiple servers is more complicated, but you limit potential downtime and increase availability by having more than one server that can run the software and store the data. Select the standalone, single-server arrangement for smaller, simpler deployments. Be sure to consider the capabilities and resources needed on this server to support the size of your final deployment.

Select the server-cluster arrangement to obtain scalability and high availability for your network. The clustering implementation comprises three servers: one master and two workers. Part of the cluster configuration includes configuring the NetQ Agents to connect to the three servers. If you later add nodes to the cluster, you do not need to configure these nodes again.

You can enable high availability (HA) of NetQ control plane processing and UI access with the use of an additional virtual IP address assigned to the cluster nodes.

Cluster Deployments and Kubernetes

NetQ also monitors Kubernetes containers. Even if the master node fails, NetQ services remain operational. However, keep in mind that the master hosts the Kubernetes control plane so anything that requires connectivity with the Kubernetes cluster—such as upgrading NetQ or rescheduling pods to other workers if a worker goes down—will not work.

To enable redundancy for the Kubernetes control plane, install your server cluster with the high availability virtual IP address. In this configuration, the majority of nodes must be operational for NetQ to function. For example, a three-node cluster can tolerate a one-node failure, but not a two-node failure.
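
The majority rule described above can be worked through as simple arithmetic. The following Python sketch is illustrative only and is not part of NetQ. The function name max_tolerated_failures is hypothetical, and the sketch assumes that "majority" means strictly more than half of the nodes.

```python
def max_tolerated_failures(node_count: int) -> int:
    """Return how many nodes can fail while a strict majority stays up.

    Assumption: a majority is floor(node_count / 2) + 1 nodes.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    majority = node_count // 2 + 1
    return node_count - majority


# Print the tolerated failures for a few cluster sizes.
for nodes in (1, 3, 5):
    print(f"{nodes}-node cluster tolerates {max_tolerated_failures(nodes)} failed node(s)")
```

For the three-node cluster in the example above, the function returns 1, which matches the statement that a one-node failure is tolerated but a two-node failure is not.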

Cluster Deployments and Load Balancers

As an alternative to the high availability server-cluster deployment with a VIP, you can use an external load balancer to provide high availability for the NetQ API and the NetQ UI.

Be mindful of where you install the certificates for the NetQ UI (port 443); otherwise, you cannot access the NetQ UI. If you are using a load balancer in your deployment, NVIDIA recommends that you install the certificates directly on the load balancer for SSL offloading. If you install the certificates on the master node instead, configure the load balancer to allow SSL passthrough.

Next Steps

After you’ve decided on your deployment type, you’re ready to install NetQ.

Install NetQ

To install NetQ:

  1. Visit Before You Install to understand the various NetQ deployments.

  2. After deciding your deployment model, prepare your devices and install NetQ.

  3. Next, install and configure the NetQ Agents on switches and hosts.

  4. Finally, install and configure the NetQ CLI on switches and hosts.

Install the NetQ System

You can install NetQ either on your premises or as a remote, cloud solution. If you are unsure which option is best for your network, refer to Before You Install.

On-Premises

Deployment Type  Server Arrangement                Hypervisor  Requirements & Installation
On-premises      Single server                     KVM         Start install
On-premises      Single server                     VMware      Start install
On-premises      Server cluster                    KVM         Start install
On-premises      Server cluster                    VMware      Start install
On-premises      High availability server cluster  KVM         Start install
On-premises      High availability server cluster  VMware      Start install

Cloud (On-Premises Telemetry Aggregator)

Deployment Type  Server Arrangement  Hypervisor  Requirements & Installation
OPTA             Single server       KVM         Start install
OPTA             Single server       VMware      Start install
OPTA             Server cluster      KVM         Start install
OPTA             Server cluster      VMware      Start install

Set Up Your VMware Virtual Machine for a Single On-premises Server

Follow these steps to set up and configure your VM on a single server in an on-premises deployment:

  1. Verify that your system meets the VM requirements.

    Resource                 Minimum Requirements
    Processor                Sixteen (16) virtual CPUs
    Memory                   64 GB RAM
    Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                             (Note: This must be an SSD; use of other storage options can lead to system instability and is not supported.)
    Network interface speed  1 Gb NIC
    Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems
  2. Confirm that the required ports are open for communications.

    You must open the following ports on your NetQ on-premises server:
    Port or Protocol Number  Protocol     Component Access
    4                        IP Protocol  Calico networking (IP-in-IP Protocol)
    22                       TCP          SSH
    80                       TCP          Nginx
    179                      TCP          Calico networking (BGP)
    443                      TCP          NetQ UI
    2379                     TCP          etcd datastore
    4789                     UDP          Calico networking (VxLAN)
    5000                     TCP          Docker registry
    6443                     TCP          kube-apiserver
    30001                    TCP          DPU communication
    31980                    TCP          NetQ Agent communication
    31982                    TCP          NetQ Agent SSL communication
    32708                    TCP          API Gateway
  3. Download the NetQ Platform image.

    1. On the NVIDIA Application Hub, log in to your account.
    2. Select NVIDIA Licensing Portal.
    3. Select Software Downloads from the menu.
    4. Click Product Family and select NetQ.
    5. Locate the NetQ SW 4.8 VMWare image and select Download.
    6. If prompted, read the license agreement and proceed with the download.

    For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


    For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

  4. Set up and configure your VM.

    VMware Example Configuration

    This example shows the VM setup process using an OVA file with VMware ESXi.

    1. Enter the address of the hardware in your browser.

    2. Log in to VMware using credentials with root access.

    3. Click Storage in the Navigator to verify you have an SSD installed.

    4. Click Create/Register VM at the top of the right pane.

    5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

    6. Provide a name for the VM, for example NetQ.

      Tip: Make note of the name used during install as this is needed in a later step.

    7. Drag and drop the NetQ Platform image file you downloaded in step 3 above.

    8. Click Next.

    9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

    10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

    11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

      The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

    12. After the upload completes, view the full details of the VM and hardware.

  • Log in to the VM and change the password.

    Use the default credentials to log in the first time:

    • Username: cumulus
    • Password: cumulus
    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
    You are required to change your password immediately (root enforced)
    System information as of Thu Dec  3 21:35:42 UTC 2020
    System load:  0.09              Processes:           120
    Usage of /:   8.1% of 61.86GB   Users logged in:     0
    Memory usage: 5%                IP address for eth0: <ipaddr>
    Swap usage:   0%
    WARNING: Your password has expired.
    You must change your password now and login again!
    Changing password for cumulus.
    (current) UNIX password: cumulus
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Connection to <ipaddr> closed.
    

    Log in again with your new password.

    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
      System information as of Thu Dec  3 21:35:59 UTC 2020
      System load:  0.07              Processes:           121
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
    Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
    cumulus@ubuntu:~$
    
  • Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check
  • Change the hostname for the VM from the default value.

    The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

    Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

    The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

    Use the following command:

    cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

    Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

    127.0.0.1 localhost NEW_HOSTNAME
  • Install and activate the NetQ software using the CLI:

  • Run the following command on your NetQ platform server:

    cumulus@hostname:~$ netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you have changed the IP address or hostname of the NetQ on-premises VM after this step, you need to re-register this address with NetQ as follows:

    Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.

    cumulus@hostname:~$ netq bootstrap reset [purge-db|keep-db]

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
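
When the application list is long, it can help to filter the netq show opta-health output for rows that are not READY. The following Python sketch is illustrative only and is not part of NetQ. The function name find_not_ready is hypothetical, the sample text is modeled on the output shown above, and the DOWN row was altered for illustration. The sketch assumes that data rows are whitespace-separated and carry a numeric restart count in the fourth column.

```python
from typing import List, Tuple

# Sample text modeled on the output above. The DOWN row (and its restart
# count) is hypothetical and was altered for illustration only.
SAMPLE_OUTPUT = """\
Application                                            Status    Namespace      Restarts    Timestamp
-----------------------------------------------------  --------  -------------  ----------  ------------------------
cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
kafka-broker-rc-0-p9r2l                                DOWN      default        3           Fri Apr 10 16:08:38 2020
netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
...
"""


def find_not_ready(health_output: str) -> List[Tuple[str, str]]:
    """Return (application, status) pairs for rows whose status is not READY."""
    not_ready = []
    for line in health_output.splitlines():
        fields = line.split()
        # Data rows carry a numeric restart count in the fourth column;
        # the header, separator, prompt, and "..." lines do not.
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        name, status = fields[0], fields[1]
        if status != "READY":
            not_ready.append((name, status))
    return not_ready


print(find_not_ready(SAMPLE_OUTPUT))
```

To run this against a live system, one option is to capture the command output with Python's subprocess.run(["netq", "show", "opta-health"], capture_output=True, text=True) and pass its stdout to the function.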

    After NetQ is installed, you can log in to NetQ from your browser.
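
The hostname rules described in the procedure above can be checked before you run hostnamectl. The following Python sketch is illustrative only and is not part of NetQ. The function name is_valid_hostname is hypothetical. It applies the rules stated above (dot-separated labels of 1 to 63 characters, at most 253 characters overall, and only lowercase letters, digits, and hyphens). It additionally rejects labels that begin or end with a hyphen, which is an assumption taken from RFC 1123 rather than from the text above.

```python
import re

# One label: lowercase letters, digits, and hyphens, with no hyphen at
# either end (the hyphen placement rule is an added assumption).
_LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Check a candidate hostname against the naming rules described above."""
    if not name or len(name) > 253:
        return False
    return all(
        len(label) <= 63 and _LABEL_RE.fullmatch(label) is not None
        for label in name.split(".")
    )


# Hypothetical candidate names, for illustration only.
for candidate in ("netq-server-01", "en.wikipedia.org", "NetQ_Server", "-netq"):
    print(candidate, is_valid_hostname(candidate))
```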

    Set Up Your VMware Virtual Machine for a Single Cloud Server

    Follow these steps to set up and configure your VM for a cloud deployment:

    1. Verify that your system meets the VM requirements.

      Resource                 Minimum Requirements
      Processor                Four (4) virtual CPUs
      Memory                   8 GB RAM
      Local disk storage       64 GB
      Network interface speed  1 Gb NIC
      Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems
    2. Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:

      Port or Protocol Number  Protocol     Component Access
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          Nginx
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway

    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 VMWare Cloud image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      VMware Example Configuration

      This example shows the VM setup process using an OVA file with VMware ESXi.

      1. Enter the address of the hardware in your browser.

      2. Log in to VMware using credentials with root access.

      3. Click Storage in the Navigator to verify you have an SSD installed.

      4. Click Create/Register VM at the top of the right pane.

      5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

      6. Provide a name for the VM, for example NetQ.

        Tip: Make note of the name used during install as this is needed in a later step.

      7. Drag and drop the NetQ Platform image file you downloaded in step 3 above.

      8. Click Next.

      9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

      10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

      11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

        The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

      12. After the upload completes, view the full details of the VM and hardware.

  • Log in to the VM and change the password.

    Use the default credentials to log in the first time:

    • Username: cumulus
    • Password: cumulus
    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
    You are required to change your password immediately (root enforced)
    System information as of Thu Dec  3 21:35:42 UTC 2020
    System load:  0.09              Processes:           120
    Usage of /:   8.1% of 61.86GB   Users logged in:     0
    Memory usage: 5%                IP address for eth0: <ipaddr>
    Swap usage:   0%
    WARNING: Your password has expired.
    You must change your password now and login again!
    Changing password for cumulus.
    (current) UNIX password: cumulus
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Connection to <ipaddr> closed.
    

    Log in again with your new password.

    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
      System information as of Thu Dec  3 21:35:59 UTC 2020
      System load:  0.07              Processes:           121
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
    Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
    cumulus@ubuntu:~$
    
  • Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check-cloud
  • Change the hostname for the VM from the default value.

    The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

    Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

    The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

    Use the following command:

    cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

    Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

    127.0.0.1 localhost NEW_HOSTNAME
  • Install and activate the NetQ software using the CLI:

  • Run the following command on your NetQ cloud appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.

    cumulus@<hostname>:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> [proxy-host <proxy-hostname> proxy-port <proxy-port>]
    

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:

    Reset the VM:

    cumulus@hostname:~$ netq bootstrap reset

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> [proxy-host <proxy-hostname> proxy-port <proxy-port>]

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Consider the following for container environments, and make adjustments as needed.

    Calico Networking

    NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the new address range with the pod-ip-range option. For example:

    cumulus@hostname:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> pod-ip-range 10.255.0.0/16
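
Before choosing a pod-ip-range value, you may want to confirm that it does not overlap with networks already in use at your site. The following Python sketch is illustrative only and is not part of NetQ. The function name pod_range_conflicts and the site networks listed are hypothetical. Only the two pod ranges (10.244.0.0/16 and 10.255.0.0/16) come from the text above.

```python
import ipaddress
from typing import List


def pod_range_conflicts(pod_range: str, site_networks: List[str]) -> List[str]:
    """Return the site networks that overlap with the candidate pod range."""
    pod_net = ipaddress.ip_network(pod_range)
    return [
        net for net in site_networks
        if pod_net.overlaps(ipaddress.ip_network(net))
    ]


# Hypothetical networks already in use at a site, for illustration only.
SITE_NETWORKS = ["10.244.12.0/24", "192.168.200.0/24"]

for candidate in ("10.244.0.0/16", "10.255.0.0/16"):
    print(candidate, pod_range_conflicts(candidate, SITE_NETWORKS))
```

An empty list means that no listed network overlaps with the candidate range. In this hypothetical site, the default range collides with 10.244.12.0/24, while 10.255.0.0/16 does not collide with either network.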

    Docker Default Bridge Interface

    The default Docker bridge interface is disabled in NetQ. If you need to re-enable the interface, contact support.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.
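
After installation, you can spot-check from another machine whether the TCP ports listed in step 2 of the procedure above accept connections. The following Python sketch is illustrative only and is not part of NetQ. The function names tcp_port_open and check_required_ports are hypothetical. The port list is copied from the OPTA port table above. A TCP connection succeeds only when a service is listening and no firewall blocks the path, so a failure does not by itself identify the cause. IP protocol 4 and UDP port 4789 cannot be tested this way and are omitted.

```python
import socket
from typing import Dict

# TCP ports copied from the OPTA port table above.
REQUIRED_TCP_PORTS = {
    22: "SSH",
    80: "Nginx",
    179: "Calico networking (BGP)",
    443: "Nginx",
    2379: "etcd datastore",
    5000: "Docker registry",
    6443: "kube-apiserver",
    31980: "NetQ Agent communication",
    31982: "NetQ Agent SSL communication",
    32708: "API Gateway",
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_required_ports(host: str) -> Dict[int, bool]:
    """Map each required TCP port to whether a connection succeeded."""
    return {port: tcp_port_open(host, port) for port in REQUIRED_TCP_PORTS}
```

For example, calling check_required_ports with the IP address of your OPTA returns a dictionary such as {22: True, 80: True, ...}, where any False entry points to a port worth investigating.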

    Set Up Your VMware Virtual Machine for an On-premises HA Server Cluster

    First configure the VM on the master node, and then configure the VM on each worker node.

    Follow these steps to set up and configure your VM cluster for an on-premises deployment:

    1. Verify that each node in your cluster—the master node and two worker nodes—meets the VM requirements.

      Resource                 Minimum Requirements
      Processor                Sixteen (16) virtual CPUs
      Memory                   64 GB RAM
      Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                               (Note: This must be an SSD; use of other storage options can lead to system instability and is not supported.)
      Network interface speed  1 Gb NIC
      Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems
    2. Confirm that the required ports are open for communications.

      You must open the following ports on your NetQ on-premises servers:
      Port or Protocol Number  Protocol     Component Access
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          NetQ UI
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      30001                    TCP          DPU communication
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway
      Additionally, for internal cluster communication, you must open these ports:
      Port   Protocol  Component Access
      8080   TCP       Admin API
      5000   TCP       Docker registry
      6443   TCP       Kubernetes API server
      10250  TCP       kubelet health probe
      2379   TCP       etcd
      2380   TCP       etcd
      7072   TCP       Kafka JMX monitoring
      9092   TCP       Kafka client
      7071   TCP       Cassandra JMX monitoring
      7000   TCP       Cassandra cluster communication
      9042   TCP       Cassandra client
      7073   TCP       Zookeeper JMX monitoring
      2888   TCP       Zookeeper cluster communication
      3888   TCP       Zookeeper cluster communication
      2181   TCP       Zookeeper client
      36443  TCP       Kubernetes control plane
    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 VMWare image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      VMware Example Configuration

      This example shows the VM setup process using an OVA file with VMware ESXi.

      1. Enter the address of the hardware in your browser.

      2. Log in to VMware using credentials with root access.

      3. Click Storage in the Navigator to verify you have an SSD installed.

      4. Click Create/Register VM at the top of the right pane.

      5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

      6. Provide a name for the VM, for example NetQ.

        Tip: Make note of the name used during install as this is needed in a later step.

      7. Drag and drop the NetQ Platform image file you downloaded in step 3 above.

      8. Click Next.

      9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

      10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

      11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

        The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

      12. After the upload completes, view the full details of the VM and hardware.

  • Log in to the VM and change the password.

    Use the default credentials to log in the first time:

    • Username: cumulus
    • Password: cumulus
    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
    You are required to change your password immediately (root enforced)
    System information as of Thu Dec  3 21:35:42 UTC 2020
    System load:  0.09              Processes:           120
    Usage of /:   8.1% of 61.86GB   Users logged in:     0
    Memory usage: 5%                IP address for eth0: <ipaddr>
    Swap usage:   0%
    WARNING: Your password has expired.
    You must change your password now and login again!
    Changing password for cumulus.
    (current) UNIX password: cumulus
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Connection to <ipaddr> closed.
    

    Log in again with your new password.

    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
      System information as of Thu Dec  3 21:35:59 UTC 2020
      System load:  0.07              Processes:           121
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
    Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
    cumulus@ubuntu:~$
    
  • Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check
  • Change the hostname for the VM from the default value.

    The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

    Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

    The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

    Use the following command:

    cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

    Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

    127.0.0.1 localhost NEW_HOSTNAME
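
The naming rules above can be checked before you run hostnamectl. The following is a minimal Python sketch and is not part of NetQ. The function name is_valid_hostname is hypothetical, and the rule that a label cannot begin or end with a hyphen is an assumption based on common RFC 1123 practice rather than on the text above.

```python
import re

# One label: 1 to 63 characters drawn from a-z, 0-9, and '-'.
# Assumption: a label may not start or end with a hyphen.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if name meets the length and character rules above."""
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ("netq-master-01", "en.wikipedia.org", "NetQ_Master"):
        print(candidate, is_valid_hostname(candidate))
```

Upper-case letters and underscores are rejected here because the rules above allow only lower-case letters, digits, and hyphens.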
  • Verify that your first worker node meets the VM requirements, as described in step 1.

  • Confirm that the required ports are open for communications, as described in step 2.

  • Open your hypervisor and set up the VM in the same manner as the master node.

    Make a note of the private IP address you assign to the worker node. You need it for later installation steps.

  • Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check-cloud
  • Repeat the previous four steps for each additional worker node in your cluster.

  • Install and activate the NetQ software using the CLI:

  • Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

    cumulus@<hostname>:~$ netq install cluster master-init
        Please run the following command on all worker nodes:
        netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
    

    Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.

    Run the following command on your master node, using the IP addresses of your worker nodes and the HA cluster virtual IP address (VIP):

    The HA cluster virtual IP must be allocated from the same subnet used for your master and worker nodes.

    cumulus@<hostname>:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip> cluster-vip <vip-ip>

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
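
One way to sanity-check the virtual IP address before you run the install command is to compare it against the node addresses. This is a minimal Python sketch that uses the standard ipaddress module. The function name is hypothetical, and the prefix length is an input you supply from your own network design; the install command does not report it.

```python
import ipaddress
from typing import List


def vip_in_node_subnet(vip: str, node_ips: List[str], prefix_len: int) -> bool:
    """Return True if the VIP and every node IP fall within one subnet."""
    # Derive the subnet from the first node address and the prefix length.
    network = ipaddress.ip_network(f"{node_ips[0]}/{prefix_len}", strict=False)
    addresses = [ipaddress.ip_address(ip) for ip in [vip, *node_ips]]
    return all(address in network for address in addresses)
```

For example, vip_in_node_subnet("10.213.7.53", ["10.213.7.49", "10.213.7.51", "10.213.7.52"], 24) evaluates to True, assuming those nodes share a /24 subnet.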

    If you have changed the IP address or hostname of the NetQ On-premises VM after this step, you need to re-register this address with NetQ as follows:

    Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.

    cumulus@hostname:~$ netq bootstrap reset [purge-db|keep-db]

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip> cluster-vip <vip-ip>

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        NetQ Live State: Active
        Installation Status: FINISHED
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Cluster
        Activation Key: EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixPSUJCOHBPWUFnWXI2dGlGY2hTRzExR2E5aSt6ZnpjOUvpVVTaDdpZEhFPQ==
        Master SSH Public Key: c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCZ1FDNW9iVXB6RkczNkRC
        Is Cloud: False
        
        Kubernetes Cluster Nodes Status:
        IP Address    Hostname     Role    NodeStatus    Virtual IP
        ------------  -----------  ------  ------------  ------------
        10.213.7.52   10.213.7.52  Worker  Ready         10.213.7.53
        10.213.7.51   10.213.7.51  Worker  Ready         10.213.7.53
        10.213.7.49   10.213.7.49  Master  Ready         10.213.7.53
        
        In Summary, Live state of the NetQ is... Active
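
If you want to check this output from a script, the key and value lines can be read into a dictionary. The following Python sketch assumes the output format matches the example above. The function names are hypothetical, and the two fields it tests (State and Installation Status) are simply the ones shown in that example.

```python
def parse_status_fields(status_output: str) -> dict:
    """Collect the 'Key: value' lines of netq show status output."""
    fields = {}
    for line in status_output.splitlines():
        key, separator, value = line.partition(":")
        # Skip table rows, section titles, and lines with no value.
        if separator and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def installation_finished(fields: dict) -> bool:
    """Return True if the parsed fields match the successful example."""
    return (
        fields.get("State") == "Active"
        and fields.get("Installation Status") == "FINISHED"
    )
```

Section titles such as the node status heading end in a colon with no value, so they are ignored rather than stored with an empty string.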

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.

    Set Up Your VMware Virtual Machine for an On-premises Server Cluster

    First configure the VM on the master node, and then configure the VM on each worker node.

    Follow these steps to set up and configure your VM cluster for an on-premises deployment:

    1. Verify that each node in your cluster—the master node and two worker nodes—meets the VM requirements.

      Resource                 Minimum Requirements
      Processor                Sixteen (16) virtual CPUs
      Memory                   64 GB RAM
      Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                               (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
      Network interface speed  1 Gb NIC
      Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and Red Hat operating systems
    2. Confirm that the required ports are open for communications.

      You must open the following ports on your NetQ on-premises servers:
      Port or Protocol Number  Protocol     Component Access
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          NetQ UI
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      30001                    TCP          DPU communication
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway
      Additionally, for internal cluster communication, you must open these ports:
      Port   Protocol  Component Access
      8080   TCP       Admin API
      5000   TCP       Docker registry
      6443   TCP       Kubernetes API server
      10250  TCP       kubelet health probe
      2379   TCP       etcd
      2380   TCP       etcd
      7072   TCP       Kafka JMX monitoring
      9092   TCP       Kafka client
      7071   TCP       Cassandra JMX monitoring
      7000   TCP       Cassandra cluster communication
      9042   TCP       Cassandra client
      7073   TCP       Zookeeper JMX monitoring
      2888   TCP       Zookeeper cluster communication
      3888   TCP       Zookeeper cluster communication
      2181   TCP       Zookeeper client
      36443  TCP       Kubernetes control plane
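
A quick way to probe the TCP ports listed above from another machine is a plain connection test. This is a minimal Python sketch using the standard socket module; the function names are hypothetical. A connection succeeds only when a service is already listening on the port, so a failure before NetQ is installed does not by itself prove that a firewall is blocking the port. The sketch does not cover the UDP port or IP protocol 4.

```python
import socket

# TCP ports for internal cluster communication, taken from the table above.
INTERNAL_TCP_PORTS = [
    8080, 5000, 6443, 10250, 2379, 2380, 7072, 9092,
    7071, 7000, 9042, 7073, 2888, 3888, 2181, 36443,
]


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def closed_tcp_ports(host: str, ports, timeout: float = 2.0) -> list:
    """Return the subset of ports that did not accept a connection."""
    return [port for port in ports if not tcp_port_open(host, port, timeout)]
```

For example, closed_tcp_ports("<worker-1-ip>", INTERNAL_TCP_PORTS) returns the ports on that worker node that did not accept a connection.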
    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 VMware image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      VMware Example Configuration

      This example shows the VM setup process using an OVA file with VMware ESXi.
      1. Enter the address of the hardware in your browser.

      2. Log in to VMware using credentials with root access.

      3. Click Storage in the Navigator to verify you have an SSD installed.

      4. Click Create/Register VM at the top of the right pane.

      5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

      6. Provide a name for the VM, for example NetQ.

        Tip: Make note of the name used during install as this is needed in a later step.

      7. Drag and drop the NetQ Platform image file you downloaded in step 3 above.

      8. Click Next.

      9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

      10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

      11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

        The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

      12. Once completed, view the full details of the VM and hardware.

  • Log in to the VM and change the password.

    Use the default credentials to log in the first time:

    • Username: cumulus
    • Password: cumulus
    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
    You are required to change your password immediately (root enforced)
    System information as of Thu Dec  3 21:35:42 UTC 2020
    System load:  0.09              Processes:           120
    Usage of /:   8.1% of 61.86GB   Users logged in:     0
    Memory usage: 5%                IP address for eth0: <ipaddr>
    Swap usage:   0%
    WARNING: Your password has expired.
    You must change your password now and login again!
    Changing password for cumulus.
    (current) UNIX password: cumulus
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Connection to <ipaddr> closed.
    

    Log in again with your new password.

    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
      System information as of Thu Dec  3 21:35:59 UTC 2020
      System load:  0.07              Processes:           121
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
    Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
    cumulus@ubuntu:~$
    
  • Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check
  • Change the hostname for the VM from the default value.

    The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

    Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

    The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

    Use the following command:

    cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

    Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

    127.0.0.1 localhost NEW_HOSTNAME
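
If you script the /etc/hosts change, the edit can be expressed as a small text transformation. The following Python sketch is illustrative only. The function name is hypothetical, and it works on the file contents as a string, leaving the reading and writing of /etc/hosts (which requires root privileges) to you.

```python
def add_hostname_to_localhost(hosts_text: str, new_hostname: str) -> str:
    """Return hosts_text with new_hostname appended to the 127.0.0.1 entry."""
    updated = []
    for line in hosts_text.splitlines():
        # Keep any trailing comment separate from the address and names.
        entry, marker, comment = line.partition("#")
        names = entry.split()
        if names and names[0] == "127.0.0.1" and new_hostname not in names[1:]:
            entry = f"{entry.rstrip()} {new_hostname}"
            line = f"{entry} {marker}{comment}" if marker else entry
        updated.append(line)
    return "\n".join(updated) + "\n"
```

Running the function a second time leaves the text unchanged, because the hostname is added only when it is missing from the entry.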
  • Verify that your first worker node meets the VM requirements, as described in step 1.

  • Confirm that the required ports are open for communications, as described in step 2.

  • Open your hypervisor and set up the VM in the same manner as the master node.

    Make a note of the private IP address you assign to the worker node. You need it for later installation steps.

  • Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check-cloud
  • Repeat the previous four steps for each additional worker node in your cluster.

  • Install and activate the NetQ software using the CLI:

  • Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

    cumulus@<hostname>:~$ netq install cluster master-init
        Please run the following command on all worker nodes:
        netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
    

    Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.

    Run the following command on your master node, using the IP addresses of your worker nodes:

    cumulus@<hostname>:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip>

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you have changed the IP address or hostname of the NetQ On-premises VM after this step, you need to re-register this address with NetQ as follows:

    Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.

    cumulus@hostname:~$ netq bootstrap reset [purge-db|keep-db]

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip>

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
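
To pick out the applications that are not ready without reading the whole table, you can filter the command output. This Python sketch assumes the column layout shown in the example above, with Application first and Status second. The function name is hypothetical.

```python
def apps_not_ready(health_output: str) -> list:
    """Return (application, status) pairs for rows whose Status is not READY."""
    not_ready = []
    seen_rule = False
    for line in health_output.splitlines():
        columns = line.split()
        if not columns:
            continue
        if set(columns[0]) == {"-"}:
            seen_rule = True  # the dashed rule separates header and rows
            continue
        # Skip everything before the rule and the "..." elision marker.
        if not seen_rule or len(columns) < 2:
            continue
        application, status = columns[0], columns[1]
        if status != "READY":
            not_ready.append((application, status))
    return not_ready
```

An empty list means every application row in the supplied text reported READY.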

    After NetQ is installed, you can log in to NetQ from your browser.

    Set Up Your VMware Virtual Machine for a Cloud Server Cluster

    First configure the VM on the master node, and then configure the VM on each worker node.

    Follow these steps to set up and configure your VM on a cluster of servers in a cloud deployment:

    1. Verify that each node in your cluster—the master node and two worker nodes—meets the VM requirements.

      Resource                 Minimum Requirements
      Processor                Four (4) virtual CPUs
      Memory                   8 GB RAM
      Local disk storage       64 GB
      Network interface speed  1 Gb NIC
      Hypervisor               VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems
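
When you provision several nodes, it can help to compare each planned VM against these minimums in one pass. The following Python sketch is illustrative. The dictionary keys and the function name are hypothetical, and the values for each VM are ones you enter yourself from your hypervisor settings.

```python
# Minimums for a cloud server cluster node, taken from the table above.
CLOUD_MINIMUMS = {"vcpus": 4, "memory_gb": 8, "disk_gb": 64}


def unmet_requirements(actual: dict, minimums: dict) -> dict:
    """Return {resource: (actual, required)} for resources below minimum."""
    return {
        name: (actual.get(name, 0), required)
        for name, required in minimums.items()
        if actual.get(name, 0) < required  # a missing value counts as 0
    }
```

For example, unmet_requirements({"vcpus": 4, "memory_gb": 8, "disk_gb": 32}, CLOUD_MINIMUMS) reports only the disk shortfall.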
    2. Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:

      Port or Protocol Number  Protocol     Component Access
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          Nginx
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway
      The following ports are used for internal cluster communication and must also be open between servers in your cluster:

      Port   Protocol  Component Access
      8080   TCP       Admin API
      5000   TCP       Docker registry
      6443   TCP       Kubernetes API server
      10250  TCP       kubelet health probe
      2379   TCP       etcd
      2380   TCP       etcd
      36443  TCP       Kubernetes control plane

    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 VMware Cloud image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      VMware Example Configuration

      This example shows the VM setup process using an OVA file with VMware ESXi.
      1. Enter the address of the hardware in your browser.

      2. Log in to VMware using credentials with root access.

      3. Click Storage in the Navigator to verify you have an SSD installed.

      4. Click Create/Register VM at the top of the right pane.

      5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

      6. Provide a name for the VM, for example NetQ.

        Tip: Make note of the name used during install as this is needed in a later step.

      7. Drag and drop the NetQ Platform image file you downloaded in step 3 above.

      8. Click Next.

      9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

      10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

      11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

        The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

      12. Once completed, view the full details of the VM and hardware.

  • Log in to the VM and change the password.

    Use the default credentials to log in the first time:

    • Username: cumulus
    • Password: cumulus
    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
    You are required to change your password immediately (root enforced)
    System information as of Thu Dec  3 21:35:42 UTC 2020
    System load:  0.09              Processes:           120
    Usage of /:   8.1% of 61.86GB   Users logged in:     0
    Memory usage: 5%                IP address for eth0: <ipaddr>
    Swap usage:   0%
    WARNING: Your password has expired.
    You must change your password now and login again!
    Changing password for cumulus.
    (current) UNIX password: cumulus
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Connection to <ipaddr> closed.
    

    Log in again with your new password.

    $ ssh cumulus@<ipaddr>
    Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
    Ubuntu 20.04 LTS
    cumulus@<ipaddr>'s password:
      System information as of Thu Dec  3 21:35:59 UTC 2020
      System load:  0.07              Processes:           121
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
    Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
    cumulus@ubuntu:~$
    
  • Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check-cloud
  • Change the hostname for the VM from the default value.

    The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

    Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

    The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

    Use the following command:

    cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

    Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

    127.0.0.1 localhost NEW_HOSTNAME
  • Verify that your first worker node meets the VM requirements, as described in step 1.

  • Confirm that the required ports are open for communications, as described in step 2.

  • Open your hypervisor and set up the VM in the same manner as the master node.

    Make a note of the private IP address you assign to the worker node. You will need it at a later point in the installation process.

  • Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@hostname:~$ sudo opta-check-cloud
  • Repeat the previous four steps for each additional worker node in your cluster.

  • Install and activate the NetQ software using the CLI:

  • Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

    cumulus@<hostname>:~$ netq install cluster master-init
        Please run the following command on all worker nodes:
        netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
        

    Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.

    Run the following command on your NetQ cloud appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.

    cumulus@<hostname>:~$ netq install opta cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> workers <worker-1-ip> <worker-2-ip> [proxy-host <proxy-hostname> proxy-port <proxy-port>]
        

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.
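
Because the command takes either interface or ip-addr and an optional proxy pair, it can be convenient to assemble it programmatically. The following Python sketch builds the command string from the options shown above and nothing more. The function and parameter names are hypothetical, and treating proxy-host and proxy-port as a pair that must appear together is an assumption based on the bracketed syntax.

```python
import shlex


def build_opta_cluster_install(bundle, config_key, workers, interface=None,
                               ip_addr=None, proxy_host=None, proxy_port=None):
    """Return the netq install opta cluster full command as one string."""
    if (interface is None) == (ip_addr is None):
        raise ValueError("specify exactly one of interface or ip_addr")
    if (proxy_host is None) != (proxy_port is None):
        raise ValueError("proxy_host and proxy_port must be given together")
    args = ["netq", "install", "opta", "cluster", "full"]
    # Use the interface name if given, otherwise the IP address form.
    args += ["interface", interface] if interface else ["ip-addr", ip_addr]
    args += ["bundle", bundle, "config-key", config_key, "workers", *workers]
    if proxy_host is not None:
        args += ["proxy-host", proxy_host, "proxy-port", str(proxy_port)]
    return shlex.join(args)
```

The function only returns the string; review it before running it on the master node.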

    If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:

    Reset the VM:

    cumulus@hostname:~$ netq bootstrap reset

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install opta cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> workers <worker-1-ip> <worker-2-ip> [proxy-host <proxy-hostname> proxy-port <proxy-port>]

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Consider the following for container environments, and make adjustments as needed.

    Calico Networking

    NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To modify this range, use the netq install opta command, specifying the new address range with the pod-ip-range option. For example:

    cumulus@hostname:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> pod-ip-range 10.255.0.0/16
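
A common reason to change the pod address range is a clash with a network already in use at your site; that motivation is an assumption here, not a statement from this guide. The following Python sketch uses the standard ipaddress module to list the networks that overlap a candidate range. The function name and the example networks are hypothetical.

```python
import ipaddress


def overlapping_networks(pod_range: str, other_networks) -> list:
    """Return the entries of other_networks that overlap pod_range."""
    pod_network = ipaddress.ip_network(pod_range)
    return [
        network
        for network in other_networks
        if pod_network.overlaps(ipaddress.ip_network(network))
    ]
```

For example, overlapping_networks("10.244.0.0/16", ["10.244.8.0/24", "192.168.0.0/24"]) returns ["10.244.8.0/24"], which would suggest choosing a different value for pod-ip-range.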

    Docker Default Bridge Interface

    The default Docker bridge interface is disabled in NetQ. If you need to re-enable the interface, contact support.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.

    Set Up Your KVM Virtual Machine for a Single On-premises Server

    Follow these steps to set up and configure your VM on a single server in an on-premises deployment:

    1. Verify that your system meets the VM requirements.

      Resource                 Minimum Requirements
      -----------------------  --------------------
      Processor                Sixteen (16) virtual CPUs
      Memory                   64 GB RAM
      Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                               (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
      Network interface speed  1 Gb NIC
      Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and Red Hat operating systems
    2. Confirm that the required ports are open for communications.

      You must open the following ports on your NetQ on-premises server:
      Port or Protocol Number  Protocol     Component Access
      -----------------------  -----------  -------------------------------------
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          NetQ UI
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      30001                    TCP          DPU communication
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway
    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 KVM image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      KVM Example Configuration

      This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

      1. Confirm that the SHA256 checksum matches the one posted on the NVIDIA Application Hub to ensure the image download has not been corrupted.

        $ sha256sum ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2
        0A00383666376471A8190E2367B27068B81D6EE00FDE885C68F4E3B3025A00B6 ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2
      2. Copy the QCOW2 image to a directory where you want to run it.

        Tip: Copy, rather than move, the downloaded QCOW2 image so that you do not need to download it again if you repeat this process.

        $ sudo mkdir /vms
        $ sudo cp ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2 /vms/ts.qcow2
      3. Create the VM.

        For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

        $ virt-install --name=netq_ts --vcpus=16 --memory=65536 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole

        Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network source value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.

        Or, for a Bridged VM, where the VM attaches to a bridge that has already been set up to allow for external access:

        $ virt-install --name=netq_ts --vcpus=16 --memory=65536 --os-type=linux --os-variant=generic \
          --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole

        Replace network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.

        Make note of the name used during install as this is needed in a later step.

      4. Watch the boot process in another terminal window.
        $ virsh console netq_ts
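
      The checksum comparison at the start of step 4 can be scripted. This is a minimal sketch, assuming GNU sha256sum, which prints the digest in lowercase while the posted value in the example is uppercase, so the comparison ignores case. The function names same_digest and verify_image are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two SHA256 digests match, ignoring case.
same_digest() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$a" = "$b" ]
}

# Hypothetical helper: computes the digest of the file in $1 and compares
# it with the expected digest in $2 (copied from the download page).
verify_image() {
    image=$1
    expected=$2
    actual=$(sha256sum "$image" | awk '{ print $1 }')
    if same_digest "$actual" "$expected"; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH" >&2
        return 1
    fi
}
```

      A call such as verify_image ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2 <posted-checksum> returns a non-zero status on a mismatch, so it can gate the copy in the next sub-step.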
    5. Log in to the VM and change the password.

      Use the default credentials to log in the first time:

      • Username: cumulus
      • Password: cumulus
      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
      You are required to change your password immediately (root enforced)
      System information as of Thu Dec  3 21:35:42 UTC 2020
      System load:  0.09              Processes:           120
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
      WARNING: Your password has expired.
      You must change your password now and login again!
      Changing password for cumulus.
      (current) UNIX password: cumulus
      Enter new UNIX password:
      Retype new UNIX password:
      passwd: password updated successfully
      Connection to <ipaddr> closed.
      

      Log in again with your new password.

      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
        System information as of Thu Dec  3 21:35:59 UTC 2020
        System load:  0.07              Processes:           121
        Usage of /:   8.1% of 61.86GB   Users logged in:     0
        Memory usage: 5%                IP address for eth0: <ipaddr>
        Swap usage:   0%
      Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
      cumulus@ubuntu:~$
      
    6. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check
    7. Change the hostname for the VM from the default value.

      The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

      Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

      The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

      Use the following command:

      cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

      Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

      127.0.0.1 localhost NEW_HOSTNAME
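
      The naming rules in this step can be checked before you run hostnamectl. This is a minimal sketch: the function name is_valid_hostname is hypothetical, and the check that each label starts and ends with a letter or digit is an additional assumption taken from the Kubernetes DNS subdomain rules rather than from the text above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 satisfies the hostname rules above.
is_valid_hostname() {
    name=$1
    # Entire hostname, including dots: 1 to 253 characters.
    [ -n "$name" ] && [ "${#name}" -le 253 ] || return 1
    # Each dot-separated label: 1 to 63 characters from a-z, 0-9, and '-'.
    # Assumption (not stated above): a label starts and ends with a letter or digit.
    printf '%s\n' "$name" | LC_ALL=C grep -Eq \
        '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*$'
}
```

      One way to use it is is_valid_hostname "$NEW_HOSTNAME" && sudo hostnamectl set-hostname "$NEW_HOSTNAME", which changes the hostname only when the check passes.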
    8. Install and activate the NetQ software:

    Run the following command on your NetQ platform server:

    cumulus@hostname:~$ netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you have changed the IP address or hostname of the NetQ on-premises VM after this step, you need to re-register this address with NetQ as follows:

    Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.

    cumulus@hostname:~$ netq bootstrap reset [purge-db|keep-db]

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.

    Set Up Your KVM Virtual Machine for a Single Cloud Server

    Follow these steps to set up and configure your VM on a single server in a cloud deployment:

    1. Verify that your system meets the VM requirements.

      Resource                 Minimum Requirements
      -----------------------  --------------------
      Processor                Four (4) virtual CPUs
      Memory                   8 GB RAM
      Local disk storage       64 GB
      Network interface speed  1 Gb NIC
      Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and Red Hat operating systems
    2. Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:

      Port or Protocol Number  Protocol     Component Access
      -----------------------  -----------  -------------------------------------
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          Nginx
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway

    3. Download the NetQ images.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 KVM Cloud image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      KVM Example Configuration

      This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

      1. Confirm that the SHA256 checksum matches the one posted on the NVIDIA Application Hub to ensure the image download has not been corrupted.

        $ sha256sum ./Downloads/netq-4.8.0-ubuntu-20.04-tscloud-qemu.qcow2
        FE353FC06D3F843F4041D74C853D38B0A56036C5886F6233A3ED1A9464AEB783 ./Downloads/netq-4.8.0-ubuntu-20.04-tscloud-qemu.qcow2
      2. Copy the QCOW2 image to a directory where you want to run it.

        Tip: Copy, rather than move, the downloaded QCOW2 image so that you do not need to download it again if you repeat this process.

        $ sudo mkdir /vms
        $ sudo cp ./Downloads/netq-4.8.0-ubuntu-20.04-tscloud-qemu.qcow2 /vms/ts.qcow2
      3. Create the VM.

        For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

        $ virt-install --name=netq_ts --vcpus=4 --memory=8192 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole

        Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network source value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.

        Or, for a Bridged VM, where the VM attaches to a bridge that has already been set up to allow for external access:

        $ virt-install --name=netq_ts --vcpus=4 --memory=8192 --os-type=linux --os-variant=generic \
          --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole

        Replace network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.

        Make note of the name used during install as this is needed in a later step.

      4. Watch the boot process in another terminal window.
        $ virsh console netq_ts
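
      The --memory value passed to virt-install is expressed in MiB, so the 8 GB requirement from step 1 becomes 8192. As a small illustration, the hypothetical helper below performs that conversion for any whole number of gigabytes.

```shell
#!/bin/sh
# Hypothetical helper: converts a whole number of GiB to MiB,
# the unit that virt-install expects for --memory.
gib_to_mib() {
    echo $(( $1 * 1024 ))
}
```

      For example, gib_to_mib 8 prints 8192, the value used in the commands above.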
    5. Log in to the VM and change the password.

      Use the default credentials to log in the first time:

      • Username: cumulus
      • Password: cumulus
      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
      You are required to change your password immediately (root enforced)
      System information as of Thu Dec  3 21:35:42 UTC 2020
      System load:  0.09              Processes:           120
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
      WARNING: Your password has expired.
      You must change your password now and login again!
      Changing password for cumulus.
      (current) UNIX password: cumulus
      Enter new UNIX password:
      Retype new UNIX password:
      passwd: password updated successfully
      Connection to <ipaddr> closed.
      

      Log in again with your new password.

      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
        System information as of Thu Dec  3 21:35:59 UTC 2020
        System load:  0.07              Processes:           121
        Usage of /:   8.1% of 61.86GB   Users logged in:     0
        Memory usage: 5%                IP address for eth0: <ipaddr>
        Swap usage:   0%
      Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
      cumulus@ubuntu:~$
      
    6. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check-cloud
    7. Change the hostname for the VM from the default value.

      The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

      Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

      The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

      Use the following command:

      cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

      Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

      127.0.0.1 localhost NEW_HOSTNAME
    8. Install and activate the NetQ software using the CLI:

    Run the following command on your NetQ cloud appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.

    cumulus@<hostname>:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> [proxy-host <proxy-hostname> proxy-port <proxy-port>]
    

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you have changed the IP address or hostname of the NetQ OPTA after this step, you need to re-register this address with NetQ as follows:

    Reset the VM:

    cumulus@hostname:~$ netq bootstrap reset

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> [proxy-host <proxy-hostname> proxy-port <proxy-port>]

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Consider the following for container environments, and make adjustments as needed.

    Calico Networking

    NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To use a different range, specify it with the pod-ip-range option of the netq install opta command. For example:

    cumulus@hostname:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> pod-ip-range 10.255.0.0/16

    Docker Default Bridge Interface

    The default Docker bridge interface is disabled in NetQ. If you need to re-enable the interface, contact support.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.

    Set Up Your KVM Virtual Machine for an On-premises HA Server Cluster

    First configure the VM on the master node, and then configure the VM on each worker node.

    Follow these steps to set up and configure your VM on a cluster of servers in an on-premises deployment:

    1. Verify that each node in your cluster—the master node and two worker nodes—meets the VM requirements.

      Resource                 Minimum Requirements
      -----------------------  --------------------
      Processor                Sixteen (16) virtual CPUs
      Memory                   64 GB RAM
      Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                               (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
      Network interface speed  1 Gb NIC
      Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and Red Hat operating systems
    2. Confirm that the required ports are open for communications.

      You must open the following ports on your NetQ on-premises servers:
      Port or Protocol Number  Protocol     Component Access
      -----------------------  -----------  -------------------------------------
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          NetQ UI
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      30001                    TCP          DPU communication
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway

      Additionally, for internal cluster communication, you must open these ports:

      Port   Protocol  Component Access
      -----  --------  -------------------------------
      8080   TCP       Admin API
      5000   TCP       Docker registry
      6443   TCP       Kubernetes API server
      10250  TCP       kubelet health probe
      2379   TCP       etcd
      2380   TCP       etcd
      7072   TCP       Kafka JMX monitoring
      9092   TCP       Kafka client
      7071   TCP       Cassandra JMX monitoring
      7000   TCP       Cassandra cluster communication
      9042   TCP       Cassandra client
      7073   TCP       Zookeeper JMX monitoring
      2888   TCP       Zookeeper cluster communication
      3888   TCP       Zookeeper cluster communication
      2181   TCP       Zookeeper client
      36443  TCP       Kubernetes control plane
    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 KVM image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      KVM Example Configuration

      This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

      1. Confirm that the SHA256 checksum matches the one posted on the NVIDIA Application Hub to ensure the image download has not been corrupted.

        $ sha256sum ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2
        0A00383666376471A8190E2367B27068B81D6EE00FDE885C68F4E3B3025A00B6 ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2
      2. Copy the QCOW2 image to a directory where you want to run it.

        Tip: Copy, rather than move, the downloaded QCOW2 image so that you do not need to download it again if you repeat this process.

        $ sudo mkdir /vms
        $ sudo cp ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2 /vms/ts.qcow2
      3. Create the VM.

        For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

        $ virt-install --name=netq_ts --vcpus=16 --memory=65536 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole

        Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network source value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.

        Or, for a Bridged VM, where the VM attaches to a bridge that has already been set up to allow for external access:

        $ virt-install --name=netq_ts --vcpus=16 --memory=65536 --os-type=linux --os-variant=generic \
          --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole

        Replace network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.

        Make note of the name used during install as this is needed in a later step.

      4. Watch the boot process in another terminal window.
        $ virsh console netq_ts
    5. Log in to the VM and change the password.

      Use the default credentials to log in the first time:

      • Username: cumulus
      • Password: cumulus
      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
      You are required to change your password immediately (root enforced)
      System information as of Thu Dec  3 21:35:42 UTC 2020
      System load:  0.09              Processes:           120
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
      WARNING: Your password has expired.
      You must change your password now and login again!
      Changing password for cumulus.
      (current) UNIX password: cumulus
      Enter new UNIX password:
      Retype new UNIX password:
      passwd: password updated successfully
      Connection to <ipaddr> closed.
      

      Log in again with your new password.

      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
        System information as of Thu Dec  3 21:35:59 UTC 2020
        System load:  0.07              Processes:           121
        Usage of /:   8.1% of 61.86GB   Users logged in:     0
        Memory usage: 5%                IP address for eth0: <ipaddr>
        Swap usage:   0%
      Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
      cumulus@ubuntu:~$
      
    6. Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check
    7. Change the hostname for the VM from the default value.

      The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

      Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

      The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

      Use the following command:

      cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

      Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

      127.0.0.1 localhost NEW_HOSTNAME
    8. Verify that your first worker node meets the VM requirements, as described in step 1.

    9. Confirm that the required ports are open for communications, as described in step 2.

    10. Open your hypervisor and set up the VM in the same manner as for the master node.

      Make a note of the private IP address you assign to the worker node. You need it for later installation steps.

    11. Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check
    12. Repeat steps 8 through 11 for each additional worker node in your cluster.

    13. Install and activate the NetQ software using the CLI:

    Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

    cumulus@<hostname>:~$ netq install cluster master-init
        Please run the following command on all worker nodes:
        netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
    

    Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.
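
    Copying the long key by hand is error-prone. As a minimal sketch, assuming the master-init output is written to standard output in the form shown above, the hypothetical helper below extracts only the worker-init command line. It is not part of NetQ.

```shell
#!/bin/sh
# Hypothetical helper: reads `netq install cluster master-init` output on
# stdin and prints only the worker-init command, with indentation removed.
# Assumption: the command appears on a line of its own, as in the example.
extract_worker_init() {
    sed -n 's/^[[:space:]]*\(netq install cluster worker-init .*\)$/\1/p'
}
```

    For example, netq install cluster master-init | extract_worker_init > worker-init.txt saves the command so that you can paste it on each worker node.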

    Run the following command on your master node, using the IP addresses of your worker nodes and the HA cluster virtual IP address (VIP):

    The HA cluster virtual IP must be allocated from the same subnet used for your master and worker nodes.

    cumulus@<hostname>:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip> cluster-vip <vip-ip>

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you change the IP address or hostname of the NetQ on-premises VM after this step, you need to re-register the address with NetQ as follows:

    Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.

    cumulus@hostname:~$ netq bootstrap reset [purge-db|keep-db]

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip> cluster-vip <vip-ip>

    If this step fails for any reason, you can run netq bootstrap reset and then try again.
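
    The requirement above that the HA cluster virtual IP come from the same subnet as the master and worker nodes can be checked before you run the install command. The following Python sketch is one way to do that with the standard ipaddress module. The function name vip_in_node_subnet and the /24 prefix length are assumptions made for illustration; substitute the prefix length your network actually uses.

```python
import ipaddress


def vip_in_node_subnet(vip, node_ips, prefix_len):
    """Return True if the VIP and every node IP fall within one subnet.

    prefix_len is the subnet prefix length (for example, 24). It is an
    input you supply; the procedure above does not state one.
    """
    # Derive the subnet from the first node address and the prefix length.
    network = ipaddress.ip_network(f"{node_ips[0]}/{prefix_len}", strict=False)
    addresses = [ipaddress.ip_address(ip) for ip in node_ips]
    addresses.append(ipaddress.ip_address(vip))
    return all(addr in network for addr in addresses)


if __name__ == "__main__":
    # Node and VIP addresses as they appear in the example status output;
    # the /24 prefix length is an assumption.
    nodes = ["10.213.7.49", "10.213.7.51", "10.213.7.52"]
    print(vip_in_node_subnet("10.213.7.53", nodes, 24))
```

    A False result suggests choosing a different VIP (or rechecking the prefix length) before running the install command.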

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        NetQ Live State: Active
        Installation Status: FINISHED
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Cluster
        Activation Key: EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixPSUJCOHBPWUFnWXI2dGlGY2hTRzExR2E5aSt6ZnpjOUvpVVTaDdpZEhFPQ==
        Master SSH Public Key: c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCZ1FDNW9iVXB6RkczNkRC
        Is Cloud: False
        
        Kubernetes Cluster Nodes Status:
        IP Address    Hostname     Role    NodeStatus    Virtual IP
        ------------  -----------  ------  ------------  ------------
        10.213.7.52   10.213.7.52  Worker  Ready         10.213.7.53
        10.213.7.51   10.213.7.51  Worker  Ready         10.213.7.53
        10.213.7.49   10.213.7.49  Master  Ready         10.213.7.53
        
        In Summary, Live state of the NetQ is... Active

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.
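
    When the opta-health listing is long, it can help to filter it for applications that are not yet READY. The Python sketch below parses text in the layout shown above. It assumes the status column holds upper-case words such as READY or DOWN; the function name not_ready_apps and the DOWN row in the sample are hypothetical and not taken from real output.

```python
def not_ready_apps(health_output):
    """Return (application, status) pairs whose status is not READY."""
    problems = []
    for line in health_output.splitlines():
        fields = line.split()
        # Data rows have at least: application, status, namespace, restarts.
        if len(fields) < 4:
            continue
        name, status = fields[0], fields[1]
        # Assumption: status values are upper-case words (READY, DOWN).
        # This also skips the prompt, header, and dashed separator lines.
        if not status.isupper():
            continue
        if status != "READY":
            problems.append((name, status))
    return problems


if __name__ == "__main__":
    # Hypothetical sample: the second row is edited to DOWN for illustration.
    sample = """\
Application                 Status    Namespace      Restarts    Timestamp
--------------------------  --------  -------------  ----------  ------------------------
cassandra-rc-0-w7h4z        READY     default        0           Fri Apr 10 16:08:38 2020
kafka-broker-rc-0-p9r2l     DOWN      default        0           Fri Apr 10 16:08:38 2020
"""
    for app, status in not_ready_apps(sample):
        print(app, status)
```

    One way to use this is to save the output of netq show opta-health to a file and pass the file contents to the function.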

    After NetQ is installed, you can log in to NetQ from your browser.

    Set Up Your KVM Virtual Machine for an On-premises Server Cluster

    First configure the VM on the master node, and then configure the VM on each worker node.

    Follow these steps to set up and configure your VM on a cluster of servers in an on-premises deployment:

    1. Verify that each node in your cluster—the master node and two worker nodes—meets the VM requirements.

      Resource                 Minimum Requirements
      -----------------------  --------------------
      Processor                Sixteen (16) virtual CPUs
      Memory                   64 GB RAM
      Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                               (Note: This must be an SSD; use of other storage options can lead to system instability and is not supported.)
      Network interface speed  1 Gb NIC
      Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and Red Hat operating systems
    2. Confirm that the required ports are open for communications.

      You must open the following ports on your NetQ on-premises servers:
      Port or Protocol Number  Protocol     Component Access
      -----------------------  -----------  -------------------------------------
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          NetQ UI
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      30001                    TCP          DPU communication
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway
      Additionally, for internal cluster communication, you must open these ports:
      Port   Protocol  Component Access
      -----  --------  -------------------------------
      8080   TCP       Admin API
      5000   TCP       Docker registry
      6443   TCP       Kubernetes API server
      10250  TCP       kubelet health probe
      2379   TCP       etcd
      2380   TCP       etcd
      7072   TCP       Kafka JMX monitoring
      9092   TCP       Kafka client
      7071   TCP       Cassandra JMX monitoring
      7000   TCP       Cassandra cluster communication
      9042   TCP       Cassandra client
      7073   TCP       Zookeeper JMX monitoring
      2888   TCP       Zookeeper cluster communication
      3888   TCP       Zookeeper cluster communication
      2181   TCP       Zookeeper client
      36443  TCP       Kubernetes control plane
    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 KVM image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      KVM Example Configuration

      This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

      1. Confirm that the SHA256 checksum matches the one posted on the NVIDIA Application Hub to ensure the image download has not been corrupted.

        $ sha256sum ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2
        0A00383666376471A8190E2367B27068B81D6EE00FDE885C68F4E3B3025A00B6 ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2
      2. Copy the QCOW2 image to a directory where you want to run it.

        Tip: Copy the original QCOW2 image instead of moving it, so that you do not need to download it again if you repeat this process.

        $ sudo mkdir /vms
        $ sudo cp ./Downloads/netq-4.8.0-ubuntu-20.04-ts-qemu.qcow2 /vms/ts.qcow2
      3. Create the VM.

        For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

        $ virt-install --name=netq_ts --vcpus=16 --memory=65536 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole

        Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network source value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.

        Or, for a Bridged VM, where the VM attaches to a bridge that has already been set up to allow for external access:

        $ virt-install --name=netq_ts --vcpus=16 --memory=65536 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole

        Replace the network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.

        Make a note of the VM name used during installation (netq_ts in these examples); you need it in a later step.

      4. Watch the boot process in another terminal window.
        $ virsh console netq_ts
    5. Log in to the VM and change the password.

      Use the default credentials to log in the first time:

      • Username: cumulus
      • Password: cumulus
      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
      You are required to change your password immediately (root enforced)
      System information as of Thu Dec  3 21:35:42 UTC 2020
      System load:  0.09              Processes:           120
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
      WARNING: Your password has expired.
      You must change your password now and login again!
      Changing password for cumulus.
      (current) UNIX password: cumulus
      Enter new UNIX password:
      Retype new UNIX password:
      passwd: password updated successfully
      Connection to <ipaddr> closed.
      

      Log in again with your new password.

      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
        System information as of Thu Dec  3 21:35:59 UTC 2020
        System load:  0.07              Processes:           121
        Usage of /:   8.1% of 61.86GB   Users logged in:     0
        Memory usage: 5%                IP address for eth0: <ipaddr>
        Swap usage:   0%
      Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
      cumulus@ubuntu:~$
      
    6. Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check
    7. Change the hostname for the VM from the default value.

      The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

      Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

      The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

      Use the following command:

      cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

      Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

      127.0.0.1 localhost NEW_HOSTNAME
    8. Verify that your first worker node meets the VM requirements, as described in step 1.

    9. Confirm that the required ports are open for communications, as described in step 2.

    10. Open your hypervisor and set up the VM in the same manner as for the master node.

      Make a note of the private IP address you assign to the worker node. You need it for later installation steps.

    11. Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check
    12. Repeat steps 8 through 11 for each additional worker node in your cluster.

    13. Install and activate the NetQ software using the CLI:

    Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

    cumulus@<hostname>:~$ netq install cluster master-init
        Please run the following command on all worker nodes:
        netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
    

    Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.

    Run the following command on your master node, using the IP addresses of your worker nodes:

    cumulus@<hostname>:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip>

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you change the IP address or hostname of the NetQ on-premises VM after this step, you need to re-register the address with NetQ as follows:

    Reset the VM, indicating whether you want to purge any NetQ DB data or keep it.

    cumulus@hostname:~$ netq bootstrap reset [purge-db|keep-db]

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip>

    If this step fails for any reason, you can run netq bootstrap reset and then try again.
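
    Before retrying, it may be worth confirming that the TCP ports listed in step 2 are reachable between your nodes. The Python sketch below attempts a TCP connection to each port using the standard socket module. The function names tcp_port_open and closed_ports are hypothetical. Note the limits of this approach: it covers TCP only (not UDP port 4789 or IP protocol 4), and a failed connection can mean either that a firewall blocks the port or that no service is listening on it yet.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def closed_ports(host, ports, timeout=2.0):
    """Return the subset of ports on host that did not accept a connection."""
    return [port for port in ports if not tcp_port_open(host, port, timeout)]


# Example usage (run from one node against another; replace the placeholder
# with a worker node IP address):
#   closed_ports("<worker-1-ip>", [6443, 2379, 2380, 10250])
```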

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.
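
    The hostname rules in step 7 of this procedure can be checked before you run hostnamectl. The Python sketch below tests a candidate name against those rules: dot-separated labels of 1 to 63 characters, at most 253 characters overall, and only lower-case letters, digits, and hyphens. It also requires each label to start and end with a letter or digit, which is part of the RFC 1123 convention Kubernetes follows but is not stated in step 7. The function name is_valid_hostname is hypothetical.

```python
import re

# One label: 1 to 63 characters, lower-case letters, digits, and hyphens,
# beginning and ending with a letter or digit.
LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_hostname(name):
    """Return True if name satisfies the hostname rules described above."""
    if not 1 <= len(name) <= 253:
        return False
    # Empty labels (from a leading, trailing, or doubled dot) fail the match.
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    # Hypothetical candidate names for illustration.
    for candidate in ("netq-master-01", "NetQ_Master"):
        print(candidate, is_valid_hostname(candidate))
```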

    Set Up Your KVM Virtual Machine for a Cloud Server Cluster

    First configure the VM on the master node, and then configure the VM on each worker node.

    Follow these steps to set up and configure your VM on a cluster of servers in a cloud deployment:

    1. Verify that each node in your cluster—the master node and two worker nodes—meets the VM requirements.

      Resource                 Minimum Requirements
      -----------------------  --------------------
      Processor                Four (4) virtual CPUs
      Memory                   8 GB RAM
      Local disk storage       64 GB
      Network interface speed  1 Gb NIC
      Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and RedHat operating systems
    2. Confirm that the required ports are open for communications. The OPTA must be able to initiate HTTPS connections (destination TCP port 443) to the netq.nvidia.com domain (*.netq.nvidia.com). You must also open the following ports on your NetQ OPTA:

      Port or Protocol Number  Protocol     Component Access
      -----------------------  -----------  -------------------------------------
      4                        IP Protocol  Calico networking (IP-in-IP Protocol)
      22                       TCP          SSH
      80                       TCP          Nginx
      179                      TCP          Calico networking (BGP)
      443                      TCP          Nginx
      2379                     TCP          etcd datastore
      4789                     UDP          Calico networking (VxLAN)
      5000                     TCP          Docker registry
      6443                     TCP          kube-apiserver
      31980                    TCP          NetQ Agent communication
      31982                    TCP          NetQ Agent SSL communication
      32708                    TCP          API Gateway
      The following ports are used for internal cluster communication and must also be open between servers in your cluster:

      Port   Protocol  Component Access
      -----  --------  ------------------------
      8080   TCP       Admin API
      5000   TCP       Docker registry
      6443   TCP       Kubernetes API server
      10250  TCP       kubelet health probe
      2379   TCP       etcd
      2380   TCP       etcd
      36443  TCP       Kubernetes control plane

    3. Download the NetQ Platform image.

      1. On the NVIDIA Application Hub, log in to your account.
      2. Select NVIDIA Licensing Portal.
      3. Select Software Downloads from the menu.
      4. Click Product Family and select NetQ.
      5. Locate the NetQ SW 4.8 KVM Cloud image and select Download.
      6. If prompted, read the license agreement and proceed with the download.

      For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


      For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

    4. Set up and configure your VM.

      KVM Example Configuration

      This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

      1. Confirm that the SHA256 checksum matches the one posted on the NVIDIA Application Hub to ensure the image download has not been corrupted.

        $ sha256sum ./Downloads/netq-4.8.0-ubuntu-20.04-tscloud-qemu.qcow2
        FE353FC06D3F843F4041D74C853D38B0A56036C5886F6233A3ED1A9464AEB783 ./Downloads/netq-4.8.0-ubuntu-20.04-tscloud-qemu.qcow2
      2. Copy the QCOW2 image to a directory where you want to run it.

        Tip: Copy the original QCOW2 image instead of moving it, so that you do not need to download it again if you repeat this process.

        $ sudo mkdir /vms
        $ sudo cp ./Downloads/netq-4.8.0-ubuntu-20.04-tscloud-qemu.qcow2 /vms/ts.qcow2
      3. Create the VM.

        For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

        $ virt-install --name=netq_ts --vcpus=4 --memory=8192 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole

        Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network source value (eth0 in the above example) with the name of the interface where the VM is connected to the external network.

        Or, for a Bridged VM, where the VM attaches to a bridge that has already been set up to allow for external access:

        $ virt-install --name=netq_ts --vcpus=4 --memory=8192 --os-type=linux --os-variant=generic --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole

        Replace the network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.

        Make a note of the VM name used during installation (netq_ts in these examples); you need it in a later step.

      4. Watch the boot process in another terminal window.
        $ virsh console netq_ts
    5. Log in to the VM and change the password.

      Use the default credentials to log in the first time:

      • Username: cumulus
      • Password: cumulus
      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
      You are required to change your password immediately (root enforced)
      System information as of Thu Dec  3 21:35:42 UTC 2020
      System load:  0.09              Processes:           120
      Usage of /:   8.1% of 61.86GB   Users logged in:     0
      Memory usage: 5%                IP address for eth0: <ipaddr>
      Swap usage:   0%
      WARNING: Your password has expired.
      You must change your password now and login again!
      Changing password for cumulus.
      (current) UNIX password: cumulus
      Enter new UNIX password:
      Retype new UNIX password:
      passwd: password updated successfully
      Connection to <ipaddr> closed.
      

      Log in again with your new password.

      $ ssh cumulus@<ipaddr>
      Warning: Permanently added '<ipaddr>' (ECDSA) to the list of known hosts.
      Ubuntu 20.04 LTS
      cumulus@<ipaddr>'s password:
        System information as of Thu Dec  3 21:35:59 UTC 2020
        System load:  0.07              Processes:           121
        Usage of /:   8.1% of 61.86GB   Users logged in:     0
        Memory usage: 5%                IP address for eth0: <ipaddr>
        Swap usage:   0%
      Last login: Thu Dec  3 21:35:43 2020 from <local-ipaddr>
      cumulus@ubuntu:~$
      
    6. Verify the master node is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check-cloud
    7. Change the hostname for the VM from the default value.

      The default hostname for the NetQ Virtual Machines is ubuntu. Change the hostname to fit your naming conventions while meeting Internet and Kubernetes naming standards.

      Kubernetes requires that hostnames are composed of a sequence of labels concatenated with dots. For example, “en.wikipedia.org” is a hostname. Each label must be from 1 to 63 characters long. The entire hostname, including the delimiting dots, has a maximum of 253 ASCII characters.

      The Internet standards (RFCs) for protocols specify that labels may contain only the ASCII letters a through z (in lower case), the digits 0 through 9, and the hyphen-minus character ('-').

      Use the following command:

      cumulus@hostname:~$ sudo hostnamectl set-hostname NEW_HOSTNAME

      Add the same NEW_HOSTNAME value to /etc/hosts on your VM for the localhost entry. Example:

      127.0.0.1 localhost NEW_HOSTNAME
    8. Verify that your first worker node meets the VM requirements, as described in step 1.

    9. Confirm that the required ports are open for communications, as described in step 2.

    10. Open your hypervisor and set up the VM in the same manner as for the master node.

      Make a note of the private IP address you assign to the worker node. You need it for later installation steps.

    11. Verify the worker node is ready for installation. Fix any errors indicated before installing the NetQ software.

      cumulus@hostname:~$ sudo opta-check-cloud
    12. Repeat steps 8 through 11 for each additional worker node in your cluster.

    13. Install and activate the NetQ software using the CLI:

    Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

    cumulus@<hostname>:~$ netq install cluster master-init
        Please run the following command on all worker nodes:
        netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
        

    Run the netq install cluster worker-init <ssh-key> command on each of your worker nodes.

    Run the following command on your NetQ cloud appliance with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI.

    cumulus@<hostname>:~$ netq install opta cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> workers <worker-1-ip> <worker-2-ip> [proxy-host <proxy-hostname> proxy-port <proxy-port>]
        

    You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface <ifname> above.

    If you change the IP address or hostname of the NetQ OPTA after this step, you need to re-register the address with NetQ as follows:

    Reset the VM:

    cumulus@hostname:~$ netq bootstrap reset

    Re-run the install CLI on the appliance. This example uses interface eth0. Replace this with your updated IP address, hostname or interface using the interface or ip-addr option.

    cumulus@hostname:~$ netq install opta cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> workers <worker-1-ip> <worker-2-ip> [proxy-host <proxy-hostname> proxy-port <proxy-port>]

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

    Consider the following for container environments, and make adjustments as needed.

    Calico Networking

    NetQ overrides the Calico default address range and changes it to 10.244.0.0/16. To use a different range, run the netq install opta command and specify the range with the pod-ip-range option. For example:

    cumulus@hostname:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> pod-ip-range 10.255.0.0/16
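
    When choosing a value for pod-ip-range, one thing you may want to rule out is overlap with subnets already in use on your network. That concern is an assumption of this sketch rather than a requirement stated above. The Python example below uses the standard ipaddress module to list which of your existing networks a candidate range overlaps. The function name overlapping_networks and the example subnets are hypothetical.

```python
import ipaddress


def overlapping_networks(pod_range, other_networks):
    """Return the entries of other_networks that overlap pod_range."""
    pod = ipaddress.ip_network(pod_range)
    return [
        net
        for net in other_networks
        if pod.overlaps(ipaddress.ip_network(net, strict=False))
    ]


if __name__ == "__main__":
    # Hypothetical subnets already in use on the network.
    in_use = ["10.244.7.0/24", "192.168.0.0/24"]
    print(overlapping_networks("10.244.0.0/16", in_use))
    print(overlapping_networks("10.255.0.0/16", in_use))
```

    An empty list means the candidate range does not collide with any of the networks you supplied.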

    Docker Default Bridge Interface

    The default Docker bridge interface is disabled in NetQ. If you need to reenable the interface, contact support.

    Verify Installation Status

    To view the status of the installation, use the netq show status [verbose] command. The following example shows a successful on-premises installation:

    State: Active
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: PKrgipMGEhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIixUQmFLTUhzZU80RUdTL3pOT01uQ2lnRnrrUhTbXNPUGRXdnUwTVo5SEpBPTIHZGVmYXVsdDoHbmV0cWRldgz=
        Master SSH Public Key: a3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFEazliekZDblJUajkvQVhOZ0hteXByTzZIb3Y2cVZBWFdsNVNtKzVrTXo3dmMrcFNZTGlOdWl1bEhZeUZZVDhSNmU3bFdqS3NrSE10bzArNFJsQVd6cnRvbVVzLzlLMzQ4M3pUMjVZQXpIU2N1ZVhBSE1TdTZHZ0JyUkpXYUpTNjJ2RTkzcHBDVjBxWWJvUFo3aGpCY3ozb0VVWnRsU1lqQlZVdjhsVjBNN3JEWW52TXNGSURWLzJ2eks3K0x2N01XTG5aT054S09hdWZKZnVOT0R4YjFLbk1mN0JWK3hURUpLWW1mbTY1ckoyS1ArOEtFUllrr5TkF3bFVRTUdmT3daVHF2RWNoZnpQajMwQ29CWDZZMzVST2hDNmhVVnN5OEkwdjVSV0tCbktrWk81MWlMSDAyZUpJbXJHUGdQa2s1SzhJdGRrQXZISVlTZ0RwRlpRb3Igcm9vdEBucXRzLTEwLTE4OC00NC0xNDc=
        Is Cloud: False
        
        Cluster Status:
        IP Address     Hostname       Role    Status
        -------------  -------------  ------  --------
        10.188.44.147  10.188.44.147  Role    Ready
        
        NetQ... Active
        

    Run the netq show opta-health command to verify all applications are operating properly. Allow 10-15 minutes for all applications to come up and report their status.

    cumulus@hostname:~$ netq show opta-health
        Application                                            Status    Namespace      Restarts    Timestamp
        -----------------------------------------------------  --------  -------------  ----------  ------------------------
        cassandra-rc-0-w7h4z                                   READY     default        0           Fri Apr 10 16:08:38 2020
        cp-schema-registry-deploy-6bf5cbc8cc-vwcsx             READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-broker-rc-0-p9r2l                                READY     default        0           Fri Apr 10 16:08:38 2020
        kafka-connect-deploy-7799bcb7b4-xdm5l                  READY     default        0           Fri Apr 10 16:08:38 2020
        netq-api-gateway-deploy-55996ff7c8-w4hrs               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-address-deploy-66776ccc67-phpqk               READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-admin-oob-mgmt-server                         READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-bgp-deploy-7dd4c9d45b-j9bfr                   READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-clagsession-deploy-69564895b4-qhcpr           READY     default        0           Fri Apr 10 16:08:38 2020
        netq-app-configdiff-deploy-ff54c4cc4-7rz66             READY     default        0           Fri Apr 10 16:08:38 2020
        ...
        

    If any of the applications or services display Status as DOWN after 30 minutes, open a support ticket and attach the output of the opta-support command.

    After NetQ is installed, you can log in to NetQ from your browser.
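
    The checksum comparison in step 4 of this procedure can also be scripted, which avoids comparing 64 hexadecimal characters by eye. The Python sketch below computes the SHA256 digest of a file with the standard hashlib module and compares it to an expected value without regard to letter case, since sha256sum prints lower-case digits and the published value may be upper case. The function names sha256_of_file and checksum_matches are hypothetical.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA256 digest of the file at path as lower-case hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte image is not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the file digest to expected_hex, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 <script-name> <image-path> <expected-sha256>
    print(checksum_matches(sys.argv[1], sys.argv[2]))
```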

    Install NetQ Agents

    After installing the NetQ software, you should install the NetQ Agents on each switch you want to monitor. You can install NetQ Agents on switches and servers running:

    Prepare for NetQ Agent Installation

    For switches running Cumulus Linux and SONiC, you need to:

    For servers running RHEL, CentOS, or Ubuntu, you need to:

    If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the NVIDIA networking repository.

    Verify NTP Is Installed and Configured

    Verify that NTP is running on the switch as outlined in the steps below. The switch system clock must be synchronized with the NetQ appliance to enable useful statistical analysis. Alternatively, you can configure PTP for time synchronization.

    cumulus@switch:~$ sudo systemctl status ntp
    [sudo] password for cumulus:
    ● ntp.service - LSB: Start NTP daemon
            Loaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)
            Active: active (running) since Fri 2018-06-01 13:49:11 EDT; 2 weeks 6 days ago
              Docs: man:systemd-sysv-generator(8)
            CGroup: /system.slice/ntp.service
                    └─2873 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -c /var/lib/ntp/ntp.conf.dhcp -u 109:114
    

    If NTP is not installed, install and configure it before continuing.

    If NTP is not running:

    • Verify the IP address or hostname of the NTP server in the /etc/ntp.conf file, and then
    • Reenable and start the NTP service using the systemctl [enable|start] ntp commands

    If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

    Obtain NetQ Agent Software Package

    Cumulus Linux 4.4 and later includes the netq-agent package by default. To upgrade the NetQ Agent to the latest version:

    1. Add the repository by uncommenting or adding the following line in /etc/apt/sources.list:
    cumulus@switch:~$ sudo nano /etc/apt/sources.list
    ...
    deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest
    ...
    

    You can specify a NetQ Agent version in the repository configuration. The following example shows the repository configuration to retrieve NetQ Agent 4.3:

    deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-4.3

    2. Add the apps3.cumulusnetworks.com authentication key to Cumulus Linux:
    cumulus@switch:~$ wget -qO - https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | sudo apt-key add -
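
If you script the repository step, it helps to make it idempotent so that repeated runs do not duplicate the line. The following is a minimal sketch under that assumption; the function name ensure_repo_line is hypothetical, and the demonstration writes to a scratch file rather than /etc/apt/sources.list (editing the real file requires root privileges).

```shell
# Append the NetQ repository line to a sources file only if it is missing.
# $1 = file, $2 = distribution (for example CumulusLinux-4),
# $3 = repository component (defaults to netq-latest).
ensure_repo_line() {
  local file="$1" dist="$2" component="${3:-netq-latest}"
  local line="deb https://apps3.cumulusnetworks.com/repos/deb ${dist} ${component}"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrate against a scratch file.
scratch="$(mktemp)"
ensure_repo_line "$scratch" CumulusLinux-4 netq-4.3
ensure_repo_line "$scratch" CumulusLinux-4 netq-4.3   # second call adds nothing
cat "$scratch"
rm -f "$scratch"
```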
    

    Verify NTP Is Installed and Configured

    Verify that NTP is running on the switch as outlined in the steps below. The switch must be synchronized with the NetQ appliance to enable useful statistical analysis. Alternatively, you can configure PTP for time synchronization.

    admin@switch:~$ sudo systemctl status ntp
    ● ntp.service - Network Time Service
         Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
         Active: active (running) since Tue 2021-06-08 14:56:16 UTC; 2min 18s ago
           Docs: man:ntpd(8)
        Process: 1444909 ExecStart=/usr/lib/ntp/ntp-systemd-wrapper (code=exited, status=0/SUCCESS)
       Main PID: 1444921 (ntpd)
          Tasks: 2 (limit: 9485)
         Memory: 1.9M
         CGroup: /system.slice/ntp.service
                 └─1444921 /usr/sbin/ntpd -p /var/run/ntpd.pid -x -u 106:112
    

    If NTP is not installed, install and configure it before continuing.

    If NTP is not running:

    • Verify the IP address or hostname of the NTP server in the /etc/sonic/config_db.json file, and then
    • Reenable and start the NTP service using the sudo config reload -n command

    Verify that NTP is operating correctly. In the output, look for a peer marked with an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.

    admin@switch:~$ show ntp
    MGMT_VRF_CONFIG is not present.
    synchronised to NTP server (104.194.8.227) at stratum 3
       time correct to within 2014 ms
       polling server every 64 s
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    -144.172.118.20  139.78.97.128    2 u   26   64  377   47.023  -1798.1 120.803
    +208.67.75.242   128.227.205.3    2 u   32   64  377   72.050  -1939.3  97.869
    +216.229.4.66    69.89.207.99     2 u  160   64  374   41.223  -1965.9  83.585
    *104.194.8.227   164.67.62.212    2 u   33   64  377    9.180  -1934.4  97.376
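
The check for the asterisk can also be automated. The following is a minimal sketch that reads peer rows in the format shown above from standard input and prints the address of the peer marked with an asterisk; the function name ntp_selected_peer is hypothetical.

```shell
# Print the address of the peer whose row starts with "*".
# Returns non-zero when no row carries that marker.
ntp_selected_peer() {
  awk '
    /^[ \t]*\*/ { sub(/^[ \t]*\*/, ""); print $1; found = 1 }
    END { if (!found) exit 1 }'
}

# Two rows copied from the sample output above.
peers='+216.229.4.66    69.89.207.99     2 u  160   64  374   41.223  -1965.9  83.585
*104.194.8.227   164.67.62.212    2 u   33   64  377    9.180  -1934.4  97.376'

printf '%s\n' "$peers" | ntp_selected_peer
```

A non-zero exit status means that no peer is currently selected, which is the case to investigate.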
    

    Obtain NetQ Agent Software Package

    To install the NetQ Agent you need to install netq-agent on each switch or host. This is available from the NVIDIA networking repository.

    Note that NetQ has a separate repository from SONiC.

    To obtain the NetQ Agent package:

    1. Install the wget utility so that you can install the GPG keys in step 3.

      admin@switch:~$ sudo apt-get update
      admin@switch:~$ sudo apt-get install wget -y
      
    2. Edit the /etc/apt/sources.list file to add the NetQ repository for SONiC:

      admin@switch:~$ sudo vi /etc/apt/sources.list
      ...
      deb https://apps3.cumulusnetworks.com/repos/deb buster netq-latest
      ...
      
    3. Add the NetQ repository key:

      admin@switch:~$ sudo wget -qO - https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | sudo apt-key add -
      

    Verify Service Package Versions

    Before you install the NetQ Agent on a Red Hat or CentOS server, make sure you install and run at least the minimum versions of the following packages:

    • iproute-3.10.0-54.el7_2.1.x86_64
    • lldpd-0.9.7-5.el7.x86_64
    • ntp-4.2.6p5-25.el7.centos.2.x86_64
    • ntpdate-4.2.6p5-25.el7.centos.2.x86_64
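
A quick way to compare an installed version against a minimum is a version-aware sort. The following is a minimal sketch that assumes GNU sort with the -V option; the function name version_at_least is hypothetical. It compares plain dotted version numbers only and does not reproduce the full RPM comparison rules, so treat release suffixes such as 5.el7 with care.

```shell
# Succeed when the installed version is equal to or newer than the minimum.
# $1 = installed version, $2 = minimum version (dotted numbers).
version_at_least() {
  local installed="$1" minimum="$2"
  [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)" = "$minimum" ]
}

# Compare a few lldpd and iproute version numbers from the lists in this guide.
if version_at_least 0.9.7 0.9.7; then echo "lldpd 0.9.7 meets the 0.9.7 minimum"; fi
if ! version_at_least 0.7.19 0.9.7; then echo "lldpd 0.7.19 is older than 0.9.7"; fi
if version_at_least 3.10.0 3.9.0; then echo "3.10.0 is newer than 3.9.0"; fi
```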

    Verify the Server is Running lldpd and wget

    Make sure you are running lldpd, not lldpad. CentOS does not include lldpd or wget by default; however, the installation requires both.

    To install these packages, run the following commands:

    root@rhel7:~# sudo yum -y install epel-release
    root@rhel7:~# sudo yum -y install lldpd
    root@rhel7:~# sudo systemctl enable lldpd.service
    root@rhel7:~# sudo systemctl start lldpd.service
    root@rhel7:~# sudo yum install wget
    

    Install and Configure NTP

    If NTP is not already installed and configured, follow the steps outlined below. Alternatively, you can configure PTP for time synchronization.

    1. Install NTP on the server. Servers must be synchronized with the NetQ appliance to enable useful statistical analysis.

      root@rhel7:~# sudo yum install ntp
      
    2. Configure the NTP server.

      1. Open the /etc/ntp.conf file in your text editor of choice.

      2. Under the Server section, specify the NTP server IP address or hostname.

    3. Enable and start the NTP service.

      root@rhel7:~# sudo systemctl enable ntp
      root@rhel7:~# sudo systemctl start ntp
      

    If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

    4. Verify that NTP is operating correctly. In the output, look for a peer marked with an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.

      root@rhel7:~# ntpq -pn
      remote           refid            st t when poll reach   delay   offset  jitter
      ==============================================================================
      +173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
      +12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
      2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
      *129.250.35.250  249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
      

    Obtain NetQ Agent Software Package

    To install the NetQ Agent you need to install netq-agent on each switch or host. This is available from the NVIDIA networking repository.

    To obtain the NetQ Agent package:

    1. Reference and update the local yum repository.

      root@rhel7:~# sudo rpm --import https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm.pubkey
      root@rhel7:~# sudo wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm-el7.repo > /etc/yum.repos.d/cumulus-host-el.repo
      
    2. Edit /etc/yum.repos.d/cumulus-host-el.repo to set the enabled=1 flag for the two NetQ repositories.

      root@rhel7:~# vi /etc/yum.repos.d/cumulus-host-el.repo
      ...
      [cumulus-arch-netq-4.0]
      name=Cumulus netq packages
      baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-4.0/$basearch
      gpgcheck=1
      enabled=1
      [cumulus-noarch-netq-4.0]
      name=Cumulus netq architecture-independent packages
      baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-4.0/noarch
      gpgcheck=1
      enabled=1
      ...
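
If you prefer not to edit the file by hand, the same change can be scripted. The following is a minimal sketch that sets enabled=1 only inside repository sections whose name contains netq and leaves other sections untouched. The function name enable_netq_repos and the example-other-repo section are hypothetical, and the demonstration uses a scratch file rather than the real file under /etc/yum.repos.d/.

```shell
# Rewrite any "enabled=" line to "enabled=1" inside sections whose
# header contains "netq". $1 = path to the .repo file.
enable_netq_repos() {
  local file="$1"
  awk '
    /^[ \t]*\[/ { in_netq = ($0 ~ /netq/) }   # a new section header starts
    in_netq && /^[ \t]*enabled[ \t]*=/ { print "enabled=1"; next }
    { print }
  ' "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Demonstrate against a scratch file.
scratch="$(mktemp)"
cat > "$scratch" <<'EOF'
[cumulus-arch-netq-4.0]
name=Cumulus netq packages
gpgcheck=1
enabled=0
[example-other-repo]
enabled=0
EOF
enable_netq_repos "$scratch"
cat "$scratch"
rm -f "$scratch"
```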
      

    Verify Service Package Versions

    Before you install the NetQ Agent on an Ubuntu server, make sure you install and run at least the minimum versions of the following packages:

    • iproute 1:4.3.0-1ubuntu3.16.04.1 all
    • iproute2 4.3.0-1ubuntu3 amd64
    • lldpd 0.7.19-1 amd64
    • ntp 1:4.2.8p4+dfsg-3ubuntu5.6 amd64

    Verify the Server is Running lldpd

    Make sure you are running lldpd, not lldpad. Ubuntu does not include lldpd by default; however, the installation requires it.

    To install this package, run the following commands:

    root@ubuntu:~# sudo apt-get update
    root@ubuntu:~# sudo apt-get install lldpd
    root@ubuntu:~# sudo systemctl enable lldpd.service
    root@ubuntu:~# sudo systemctl start lldpd.service
    

    Install and Configure Network Time Server

    If NTP is not already installed and configured, follow the steps below. Alternatively, you can configure PTP for time synchronization.

    1. Install NTP on the server, if not already installed. Servers must be synchronized with the NetQ appliance to enable useful statistical analysis.

      root@ubuntu:~# sudo apt-get install ntp
      
    2. Configure the network time server.

      1. Open the /etc/ntp.conf file in your text editor of choice.

      2. Under the Server section, specify the NTP server IP address or hostname.

    3. Enable and start the NTP service.

      root@ubuntu:~# sudo systemctl enable ntp
      root@ubuntu:~# sudo systemctl start ntp
      

    If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

    4. Verify that NTP is operating correctly. In the output, look for a peer marked with an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.

      root@ubuntu:~# ntpq -pn
      remote           refid            st t when poll reach   delay   offset  jitter
      ==============================================================================
      +173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
      +12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
      2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
      *129.250.35.250  249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
      
    1. Install chrony if needed.
    root@ubuntu:~# sudo apt install chrony
    
    2. Start the chrony service.
    root@ubuntu:~# sudo /usr/local/sbin/chronyd
    
    3. Verify it installed successfully.
    root@ubuntu:~# chronyc activity
    200 OK
    8 sources online
    0 sources offline
    0 sources doing burst (return to online)
    0 sources doing burst (return to offline)
    0 sources with unknown address
    
    4. View the time servers that chrony is using.
    root@ubuntu:~# chronyc sources
    210 Number of sources = 8
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    ^+ golem.canonical.com           2   6   377    39  -1135us[-1135us] +/-   98ms
    ^* clock.xmission.com            2   6   377    41  -4641ns[ +144us] +/-   41ms
    ^+ ntp.ubuntu.net              2   7   377   106   -746us[ -573us] +/-   41ms
    ...
    

    Open the chrony.conf configuration file (by default at /etc/chrony/) and edit if needed.

    Example with individual servers specified:

    server golem.canonical.com iburst
    server clock.xmission.com iburst
    server ntp.ubuntu.com iburst
    driftfile /var/lib/chrony/drift
    makestep 1.0 3
    rtcsync
    

    Example when using a pool of servers:

    pool pool.ntp.org iburst
    driftfile /var/lib/chrony/drift
    makestep 1.0 3
    rtcsync
    
    5. View the server chrony is currently tracking.
    root@ubuntu:~# chronyc tracking
    Reference ID    : 5BBD59C7 (golem.canonical.com)
    Stratum         : 3
    Ref time (UTC)  : Mon Feb 10 14:35:18 2020
    System time     : 0.0000046340 seconds slow of NTP time
    Last offset     : -0.000123459 seconds
    RMS offset      : 0.007654410 seconds
    Frequency       : 8.342 ppm slow
    Residual freq   : -0.000 ppm
    Skew            : 26.846 ppm
    Root delay      : 0.031207654 seconds
    Root dispersion : 0.001234590 seconds
    Update interval : 115.2 seconds
    Leap status     : Normal
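
To turn the tracking output into a pass or fail check, you can compare the reported System time offset against a limit. The following is a minimal sketch that reads chronyc tracking output on standard input; the function name chrony_offset_within and the 0.1 second limit are hypothetical choices, not NetQ requirements.

```shell
# Succeed when the "System time" offset (in seconds) is at or below the limit.
# $1 = limit in seconds. Reads chronyc tracking output from stdin.
chrony_offset_within() {
  awk -v limit="$1" '
    /^[ \t]*System time/ { offset = $4; found = 1 }
    END { if (found && offset + 0 <= limit + 0) exit 0; exit 1 }'
}

# Line copied from the sample output above.
line='System time     : 0.0000046340 seconds slow of NTP time'

if printf '%s\n' "$line" | chrony_offset_within 0.1; then
  echo "clock offset is within 0.1 seconds"
fi
```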
    

    Obtain NetQ Agent Software Package

    To install the NetQ Agent you need to install netq-agent on each server. This is available from the NVIDIA networking repository.

    To obtain the NetQ Agent package:

    1. Reference and update the local apt repository.
    root@ubuntu:~# sudo wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | apt-key add -
    
    2. Add the Ubuntu repository:

    For Ubuntu 18.04 (bionic), create the file /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list and add the following line:

    root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
    ...
    deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
    ...
    

    For Ubuntu 20.04 (focal), create the file /etc/apt/sources.list.d/cumulus-apps-deb-focal.list and add the following line:

    root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-focal.list
    ...
    deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb focal netq-latest
    ...
    

    Using netq-latest in these examples means that apt-get always retrieves the latest version of NetQ from the repository, even across a major version update. To keep the repository on a specific version, such as netq-4.4, use that name instead.
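
The choice between netq-latest and a pinned version can be expressed as a parameter when you generate the repository line. The following is a minimal sketch; the function name ubuntu_repo_line is hypothetical, and it covers only the two Ubuntu releases shown above.

```shell
# Print the NetQ apt source line for an Ubuntu release.
# $1 = Ubuntu release (18.04 or 20.04),
# $2 = repository component (defaults to netq-latest).
ubuntu_repo_line() {
  local codename
  case "$1" in
    18.04) codename=bionic ;;
    20.04) codename=focal ;;
    *) echo "unsupported Ubuntu release: $1" >&2; return 1 ;;
  esac
  printf 'deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb %s %s\n' \
    "$codename" "${2:-netq-latest}"
}

ubuntu_repo_line 18.04            # follows the latest release
ubuntu_repo_line 20.04 netq-4.4   # pinned to a specific version
```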

    Install NetQ Agent

    After completing the preparation steps, install the agent on your switch or host.

    Cumulus Linux 4.4 and later includes the netq-agent package by default. To install the NetQ Agent on earlier versions of Cumulus Linux:

    1. Update the local apt repository, then install the NetQ software on the switch.

      cumulus@switch:~$ sudo apt-get update
      cumulus@switch:~$ sudo apt-get install netq-agent
      
    2. Verify you have the correct version of the Agent.

      cumulus@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-agent
      

      You should see version 4.8.0 and update 44 in the results.

        • netq-agent_4.8.0-cl4u44~1699077226.80e664937_armel.deb
        • netq-agent_4.8.0-cl4u44~1699245971.f796c0644_amd64.deb

    3. Restart rsyslog so it sends log files to the correct destination.

      cumulus@switch:~$ sudo systemctl restart rsyslog.service
      
    4. Configure the NetQ Agent, as described in the next section.
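
The version check in step 2 can be scripted when you install the agent on many switches. The following is a minimal sketch that tests a version string for the expected release and update number. The function name agent_version_matches is hypothetical, and the sketch assumes that the version field has the same shape as the package names listed above (release, a platform tag, the letter u followed by the update number, then a tilde).

```shell
# Succeed when the version string carries the expected release and update.
# $1 = version string, $2 = expected release, $3 = expected update number.
agent_version_matches() {
  local version="$1" release="$2" update="$3" parsed
  parsed="$(printf '%s\n' "$version" |
    sed -n 's/^\([0-9.]*\)-[a-z]*[0-9.]*u\([0-9][0-9]*\)~.*/\1 \2/p')"
  [ "$parsed" = "$release $update" ]
}

# Version field taken from the amd64 package name listed above.
if agent_version_matches '4.8.0-cl4u44~1699245971.f796c0644' 4.8.0 44; then
  echo "agent version OK"
fi
```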

    To install the NetQ Agent (the following example uses Cumulus Linux but the steps are the same for SONiC):

    1. Update the local apt repository, then install the NetQ software on the switch.

      admin@switch:~$ sudo apt-get update
      admin@switch:~$ sudo apt-get install netq-agent
      
    2. Verify you have the correct version of the Agent.

      admin@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-agent
      
    3. Restart rsyslog so it sends log files to the correct destination.

      admin@switch:~$ sudo systemctl restart rsyslog.service
      
    4. Configure the NetQ Agent, as described in the next section.

    To install the NetQ Agent:

    1. Install the Bash completion and NetQ packages on the server.

      root@rhel7:~# sudo yum -y install bash-completion
      root@rhel7:~# sudo yum install netq-agent
      
    2. Verify you have the correct version of the Agent.

      root@rhel7:~# rpm -qa | grep -i netq
      

      You should see version 4.8.0 and update 44 in the results.

      • netq-agent-4.8.0-rh7u44~1699074652.80e6649.x86_64.rpm
      3. Restart rsyslog so it sends log files to the correct destination.

        root@rhel7:~# sudo systemctl restart rsyslog
        
      4. Configure the NetQ Agent, as described in the next section.

      To install the NetQ Agent:

      1. Install the software packages on the server.

        root@ubuntu:~# sudo apt-get update
        root@ubuntu:~# sudo apt-get install netq-agent
        
      2. Verify you have the correct version of the Agent.

        root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' netq-agent
        

        You should see version 4.8.0 and update 44 in the results.

        • Ubuntu 20.04: netq-agent_4.8.0-ub20.04u44~1699074936.80e664937_amd64.deb
        3. Restart rsyslog so it sends log files to the correct destination.
        root@ubuntu:~# sudo systemctl restart rsyslog.service
        
        4. Configure the NetQ Agent, as described in the next section.

        Configure NetQ Agent

        After you install the NetQ Agents on the switches you want to monitor, you must configure them to obtain useful and relevant data.

        The NetQ Agent is aware of and communicates through the designated VRF. If you do not specify one, it uses the default VRF (named default). If you later change the VRF configured for the NetQ Agent (using a lifecycle management configuration profile, for example), you might cause the NetQ Agent to lose communication.

        If you configure the NetQ Agent to communicate in a VRF that is not default or mgmt, the following line must be added to /etc/netq/netq.yml in the netq-agent section:

        netq-agent:
          netq_stream_address: 0.0.0.0
        

        Two methods are available for configuring a NetQ Agent:

        Configure NetQ Agents Using a Configuration File

        You can configure the NetQ Agent in the netq.yml configuration file contained in the /etc/netq/ directory.

        1. Open the netq.yml file using your text editor of choice. For example:

          sudo nano /etc/netq/netq.yml
          
        2. Locate the netq-agent section, or add it.

        3. Set the parameters for the agent as follows:

          • port: 31980 (default configuration)
          • server: IP address of the NetQ server where the agent should send its collected data
          • vrf: default (or one that you specify)
          • inband-interface: the interface used to reach your NetQ server and used by lifecycle management to connect to the switch (for deployments where switches are managed through an in-band interface)

          Your configuration should be similar to this:

          netq-agent:
              port: 31980
              server: 192.168.1.254
              vrf: mgmt
          

          For in-band deployments:

          netq-agent:
              inband-interface: swp1
              port: 31980
              server: 192.168.1.254
              vrf: default
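
When you manage many switches, generating the netq-agent section from a few parameters avoids copy-and-paste mistakes. The following is a minimal sketch that prints the section in the layout shown above; the function name render_netq_agent_section is hypothetical, and merging the result into an existing /etc/netq/netq.yml file is left to you.

```shell
# Print a netq-agent section for netq.yml.
# $1 = server IP, $2 = VRF (default: default), $3 = port (default: 31980),
# $4 = in-band interface (optional).
render_netq_agent_section() {
  local server="$1" vrf="${2:-default}" port="${3:-31980}" inband="${4:-}"
  printf 'netq-agent:\n'
  [ -n "$inband" ] && printf '    inband-interface: %s\n' "$inband"
  printf '    port: %s\n' "$port"
  printf '    server: %s\n' "$server"
  printf '    vrf: %s\n' "$vrf"
}

render_netq_agent_section 192.168.1.254 mgmt                 # out-of-band
render_netq_agent_section 192.168.1.254 default 31980 swp1   # in-band
```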
          

        Configure NetQ Agents Using the NetQ CLI

        If you configured the NetQ CLI, you can use it to configure the NetQ Agent to send telemetry data to the NetQ appliance or VM. To configure the NetQ CLI, refer to Install NetQ CLI.

        If you intend to use a VRF for agent communication (recommended), refer to Configure the Agent to Use a VRF. If you intend to specify a port for communication, refer to Configure the Agent to Communicate over a Specific Port.

        Use the following command to configure the NetQ Agent:

        sudo netq config add agent server <text-opta-ip> [port <text-opta-port>] [ssl true | ssl false] [ssl-cert <text-ssl-cert-file> | ssl-cert download] [vrf <text-vrf-name>] [inband-interface <interface-name>]
        

        This example uses a NetQ server IP address of 192.168.1.254, the default port, and the mgmt VRF for a switch managed through an out-of-band connection:

        sudo netq config add agent server 192.168.1.254 vrf mgmt
        Updated agent server 192.168.1.254 vrf mgmt. Please restart netq-agent (netq config restart agent).
        sudo netq config restart agent
        

        This example uses a NetQ server IP address of 192.168.1.254, the default port, and the default VRF for a switch managed through an in-band connection on interface swp1:

        sudo netq config add agent server 192.168.1.254 vrf default inband-interface swp1
        Updated agent server 192.168.1.254 vrf default. Please restart netq-agent (netq config restart agent).
        sudo netq config restart agent
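
Before you run the command on a production switch, it can be useful to print it first and review it. The following is a minimal sketch that only assembles the command line from the options shown in the syntax above and prints it, without running anything; the function name build_agent_config_cmd is hypothetical.

```shell
# Print (do not run) the agent configuration command.
# $1 = server IP, $2 = VRF (optional), $3 = port (optional),
# $4 = in-band interface (optional).
build_agent_config_cmd() {
  local server="$1" vrf="${2:-}" port="${3:-}" inband="${4:-}"
  local cmd="sudo netq config add agent server ${server}"
  [ -n "$port" ] && cmd="${cmd} port ${port}"
  [ -n "$vrf" ] && cmd="${cmd} vrf ${vrf}"
  [ -n "$inband" ] && cmd="${cmd} inband-interface ${inband}"
  printf '%s\n' "$cmd"
}

build_agent_config_cmd 192.168.1.254 mgmt
build_agent_config_cmd 192.168.1.254 default "" swp1
```

After you apply the reviewed command, restart the agent with sudo netq config restart agent, as in the examples above.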
        

        Configure Advanced NetQ Agent Settings

        A couple of additional options are available for configuring the NetQ Agent. If you are using VRFs, you can configure the agent to communicate over a specific VRF. You can also configure the agent to use a particular port.

        Configure the Agent to Use a VRF

        By default, NetQ uses the default VRF for communication between the NetQ appliance or VM and NetQ Agents. While optional, NVIDIA strongly recommends that you configure NetQ Agents to communicate with the NetQ appliance or VM only via a VRF, including a management VRF. To do so, you need to specify the VRF name when configuring the NetQ Agent. For example, if you configured the management VRF and you want the agent to communicate with the NetQ appliance or VM over it, configure the agent like this:

        sudo netq config add agent server 192.168.1.254 vrf mgmt
        sudo netq config restart agent
        

        If you later change the VRF configured for the NetQ Agent (using a lifecycle management configuration profile, for example), you might cause the NetQ Agent to lose communication.

        Configure the Agent to Communicate over a Specific Port

        By default, NetQ uses port 31980 for communication between the NetQ server and NetQ Agents for on-premises deployments and port 443 for cloud deployments. If you want the NetQ Agent to communicate with the NetQ server via a different port, you need to specify the port number when configuring the NetQ Agent, like this:

        sudo netq config add agent server 192.168.1.254 port 7379
        sudo netq config restart agent
        

        Install On-switch OPTA

        Configure the On-switch OPTA

        Instead of installing a dedicated OPTA VM, you can enable the OPTA service on one or more switches in your environment to send data to the NetQ Cloud.

        On-switch OPTA (on-premises telemetry aggregator) is intended for use in small NetQ Cloud deployments where a dedicated OPTA VM might not be necessary. If you need help assessing the correct OPTA configuration for your deployment, contact your NVIDIA sales team.

        To configure a switch for OPTA functionality, install the netq-opta package. To obtain the package, add or uncomment the NetQ repository in /etc/apt/sources.list as needed:

        cumulus@switch:~$ sudo nano /etc/apt/sources.list
        ...
        deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-4.8
        ...
        

        You can use the deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest repository if you want to always retrieve the latest posted version of NetQ.

        After adding the repository, install the netq-opta package with the following commands:

        sudo apt-get update
        sudo apt-get install netq-opta
        

        After the netq-opta package is installed, add your OPTA configuration key. Run the following command with the config-key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key through the NetQ UI in the premises management configuration.

        sudo netq config add opta config-key <config_key> [vrf <vrf_name>] [proxy-host <text-proxy-host> proxy-port <text-proxy-port>] 
        

        The VRF name should be the VRF used to communicate with the NetQ Cloud. Specifying a proxy host and port is optional. For example:

        sudo netq config add opta config-key tHkSI2d3LmRldjMubmV0cWRldi5jdW11bHVasdf29ya3MuY29tGLsDIiwzeUpNc3BwK1IyUjVXY2p2dDdPL3JHS3ZrZ1dDUkpFY2JkMVlQOGJZUW84PTIEZGV2MzoHbmV0cWRldr vrf mgmt
        

        You can also add a proxy host separately with the following command:

        sudo netq config add opta proxy-host <text-proxy-host> proxy-port <text-proxy-port>
        

        After adding the config-key, restart the OPTA service:

        sudo netq config restart opta
        

        Connect NetQ Agents to the OPTA Service

        The final step is configuring NetQ Agents to connect to the OPTA service. To configure the agent on a switch to connect locally to the OPTA service running on that switch, configure the agent to connect to localhost with the following command:

        sudo netq config add agent server localhost vrf mgmt
        sudo netq config restart agent
        

        To configure the agent on a switch to connect to the OPTA service on another switch in your network, configure the agent to connect to the IP address of the switch running the OPTA service:

        sudo netq config add agent server 192.168.1.254 vrf mgmt
        sudo netq config restart agent
        

        Configure the LCM Executor

        When the LCM executor is configured, the on-switch OPTA service supports the following lifecycle management functions:

        The NetQ Agent must be running for lifecycle management to work properly.

        LCM with the on-switch OPTA service is supported on NVIDIA Spectrum-2 platforms and later.

        After installing and configuring the netq-opta package, enable the LCM executor with the following commands:

        sudo netq config add opta executor-enabled true
        sudo netq config restart lcm-executor
        

        Considerations

        Disable the LCM Executor

        Disable the LCM executor by stopping it, setting executor-enabled to false, and then restarting the OPTA service:

        sudo netq config stop lcm-executor
        sudo netq config add opta executor-enabled false
        sudo netq config restart opta
        

        Install NetQ CLI

        Installing the NetQ CLI on your NetQ VMs, switches, or hosts gives you access to new features and bug fixes, and allows you to manage your network from multiple points in the network.

        After installing the NetQ software and agent on each switch you want to monitor, you can also install the NetQ CLI on switches running:

        If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the NetQ repository.

        Prepare for NetQ CLI Installation on a RHEL, CentOS, or Ubuntu Server

        For servers running RHEL 7, CentOS or Ubuntu OS, you need to:

        These steps are not required for Cumulus Linux or SONiC.

        Verify Service Package Versions

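        On a Red Hat or CentOS server, make sure you install and run at least the minimum versions of the following packages:

        • iproute-3.10.0-54.el7_2.1.x86_64
        • lldpd-0.9.7-5.el7.x86_64
        • ntp-4.2.6p5-25.el7.centos.2.x86_64
        • ntpdate-4.2.6p5-25.el7.centos.2.x86_64

        On an Ubuntu server, make sure you install and run at least the minimum versions of the following packages:

        • iproute 1:4.3.0-1ubuntu3.16.04.1 all
        • iproute2 4.3.0-1ubuntu3 amd64
        • lldpd 0.7.19-1 amd64
        • ntp 1:4.2.8p4+dfsg-3ubuntu5.6 amd64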

        Verify That CentOS and Ubuntu Are Running lldpd

        For CentOS and Ubuntu, make sure you are running lldpd, not lldpad. CentOS and Ubuntu do not include lldpd by default, even though the installation requires it. You must also install the Wget utility on CentOS distributions.

        To install the packages on CentOS, run the following commands:

        root@centos:~# sudo yum -y install epel-release
        root@centos:~# sudo yum -y install lldpd
        root@centos:~# sudo systemctl enable lldpd.service
        root@centos:~# sudo systemctl start lldpd.service
        root@centos:~# sudo yum install wget
        

        To install lldpd on Ubuntu, run the following commands:

        root@ubuntu:~# sudo apt-get update
        root@ubuntu:~# sudo apt-get install lldpd
        root@ubuntu:~# sudo systemctl enable lldpd.service
        root@ubuntu:~# sudo systemctl start lldpd.service
        

        Install and Configure NTP

        If NTP is not already installed and configured, follow these steps:

        1. Install NTP on the server. Servers must be synchronized with the NetQ appliance or VM to enable useful statistical analysis.

          root@rhel7:~# sudo yum install ntp
          
        2. Configure the NTP server.

          1. Open the /etc/ntp.conf file in your text editor of choice.

          2. Under the Server section, specify the NTP server IP address or hostname.

        3. Enable and start the NTP service.

          root@rhel7:~# sudo systemctl enable ntp
          root@rhel7:~# sudo systemctl start ntp
          

        If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

        4. Verify that NTP is operating correctly. In the output, look for a peer marked with an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.

          root@rhel7:~# ntpq -pn
          remote           refid            st t when poll reach   delay   offset  jitter
          ==============================================================================
          +173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
          +12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
          2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
          *129.250.35.250  249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
          
        1. Install NTP on the server, if not already installed. Servers must be in time synchronization with the NetQ Platform or NetQ appliance to enable useful statistical analysis.

          root@ubuntu:~# sudo apt-get install ntp
          
        2. Configure the network time server.

          1. Open the /etc/ntp.conf file in your text editor of choice.

          2. Under the Server section, specify the NTP server IP address or hostname.

        3. Enable and start the NTP service.

          root@ubuntu:~# sudo systemctl enable ntp
          root@ubuntu:~# sudo systemctl start ntp
          

        If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

        4. Verify that NTP is operating correctly. In the output, look for a peer marked with an asterisk (*) or a plus sign (+), which indicates that the clock is synchronized with NTP.

          root@ubuntu:~# ntpq -pn
          remote           refid            st t when poll reach   delay   offset  jitter
          ==============================================================================
          +173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
          +12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
          2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
          *129.250.35.250  249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
          

        1. Install chrony if needed.

          root@ubuntu:~# sudo apt install chrony
          
        2. Start the chrony service.

          root@ubuntu:~# sudo /usr/local/sbin/chronyd
          
        3. Verify it installed successfully.

          root@ubuntu:~# chronyc activity
          200 OK
          8 sources online
          0 sources offline
          0 sources doing burst (return to online)
          0 sources doing burst (return to offline)
          0 sources with unknown address
          
        4. View the time servers chrony is using.

          root@ubuntu:~# chronyc sources
          210 Number of sources = 8
          MS Name/IP address         Stratum Poll Reach LastRx Last sample
          ===============================================================================
          ^+ golem.canonical.com           2   6   377    39  -1135us[-1135us] +/-   98ms
          ^* clock.xmission.com            2   6   377    41  -4641ns[ +144us] +/-   41ms
          ^+ ntp.ubuntu.net                2   7   377   106   -746us[ -573us] +/-   41ms
          ...
          

          Open the chrony.conf configuration file (by default at /etc/chrony/) and edit if needed.

          Example with individual servers specified:

          server golem.canonical.com iburst
          server clock.xmission.com iburst
          server ntp.ubuntu.com iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
          

          Example when using a pool of servers:

          pool pool.ntp.org iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
          
        5. View the server chrony is currently tracking.

          root@ubuntu:~# chronyc tracking
          Reference ID    : 5BBD59C7 (golem.canonical.com)
          Stratum         : 3
          Ref time (UTC)  : Mon Feb 10 14:35:18 2020
          System time     : 0.0000046340 seconds slow of NTP time
          Last offset     : -0.000123459 seconds
          RMS offset      : 0.007654410 seconds
          Frequency       : 8.342 ppm slow
          Residual freq   : -0.000 ppm
          Skew            : 26.846 ppm
          Root delay      : 0.031207654 seconds
          Root dispersion : 0.001234590 seconds
          Update interval : 115.2 seconds
          Leap status     : Normal
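
If you want to check the tracking output from a script, the sketch below shows one possible approach. It assumes the "Name : value" layout shown in the example output above. The helper names and the 0.1-second offset threshold are illustrative assumptions, not NetQ or chrony requirements.

```python
def parse_tracking(text):
    """Turn the 'Name : value' lines printed by chronyc tracking into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps contain colons.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def looks_healthy(fields, max_offset_s=0.1):
    """Assumed health rule: leap status is Normal and the offset is small."""
    offset = float(fields["System time"].split()[0])
    return fields["Leap status"] == "Normal" and offset <= max_offset_s


if __name__ == "__main__":
    sample = "\n".join([
        "Reference ID    : 5BBD59C7 (golem.canonical.com)",
        "Stratum         : 3",
        "System time     : 0.0000046340 seconds slow of NTP time",
        "Leap status     : Normal",
    ])
    print(looks_healthy(parse_tracking(sample)))
```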
          

        Get the NetQ CLI Software Package for Ubuntu

        To install the NetQ CLI on Ubuntu, install the netq-apps package on each Ubuntu server. The package is available from the NetQ repository.

        To get the NetQ CLI package:

        1. Reference and update the local apt repository.

          root@ubuntu:~# wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | sudo apt-key add -
          
        2. Add the Ubuntu repository:

          On Ubuntu 18.04 (bionic), create the file /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list and add the following line:

          root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
          ...
          deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
          ...
          

          On Ubuntu 20.04 (focal), create the file /etc/apt/sources.list.d/cumulus-apps-deb-focal.list and add the following line:

          root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-focal.list
          ...
          deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb focal netq-latest
          ...
          

          Using netq-latest in these examples means that the repository always provides the latest version of NetQ, including major version updates. To keep the repository on a specific version, such as netq-4.4, use that version instead.
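
As an illustration of pinning the repository to a specific version, here is a minimal sketch that rewrites the last component of a repository line. The function name pin_netq_repo is hypothetical. The sketch only transforms a string, so you would still write the result to the sources file yourself.

```python
import re


def pin_netq_repo(source_line, version):
    """Replace the trailing netq-<component> of an apt source line with netq-<version>."""
    # Anchored at end of line so that only the final component is rewritten.
    return re.sub(r"\bnetq-\S+\s*$", f"netq-{version}", source_line)


if __name__ == "__main__":
    line = "deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb focal netq-latest"
    print(pin_netq_repo(line, "4.4"))
```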

        Install NetQ CLI

        Follow these steps to install the NetQ CLI on a switch or host.

        Cumulus Linux 4.4 and later includes the netq-apps package by default. To upgrade the NetQ CLI to the latest version:

        1. Add the repository by uncommenting or adding the following line in /etc/apt/sources.list:
        cumulus@switch:~$ sudo nano /etc/apt/sources.list
        ...
        deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest
        ...
        

        You can specify a NetQ CLI version in the repository configuration. The following example shows the repository configuration to retrieve NetQ CLI v4.3:

        deb https://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-4.3

        2. Update the local apt repository and install the software on the switch.

          cumulus@switch:~$ sudo apt-get update
          cumulus@switch:~$ sudo apt-get install netq-apps
          
        3. Verify you have the correct version of the CLI.

          cumulus@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-apps
          

        You should see version 4.8.0 and update 44 in the results. For example:

          • netq-apps_4.8.0-cl4u44~1699077226.80e664937_armel.deb
          • netq-apps_4.8.0-cl4u44~1699245971.f796c0644_amd64.deb

        4. Continue with NetQ CLI configuration in the next section.
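
The version check in the previous step can be automated. The sketch below extracts the version and update number from a package string. The layout it expects (version, platform tag, the letter u, update number, then a tilde) is inferred from the example file names in this guide and is an assumption, as are the function names.

```python
import re

# Layout inferred from names such as netq-apps_4.8.0-cl4u44~1699077226.80e664937_amd64.deb
_VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)-(.+?)u(\d+)~")


def parse_netq_version(text):
    """Return (version, platform_tag, update) found in a package name or version string."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no NetQ version found in {text!r}")
    return match.group(1), match.group(2), int(match.group(3))


def is_expected(text, version="4.8.0", update=44):
    """True when the string reports the expected version and update number."""
    found_version, _platform, found_update = parse_netq_version(text)
    return (found_version, found_update) == (version, update)


if __name__ == "__main__":
    print(parse_netq_version("netq-apps\t4.8.0-cl4u44~1699077226.80e664937"))
    print(is_expected("netq-apps_4.8.0-ub20.04u44~1699074936.80e664937_amd64.deb"))
```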

        To install the NetQ CLI, install the netq-apps package on each switch. The package is available from the NVIDIA networking repository.

        If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the NVIDIA networking repository.

        To obtain the NetQ CLI package:

        1. Edit the /etc/apt/sources.list file to add the repository for NetQ.

          admin@switch:~$ sudo nano /etc/apt/sources.list
          ...
          deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb buster netq-latest
          ...
          
        2. Update the local apt repository and install the software on the switch.

          admin@switch:~$ sudo apt-get update
          admin@switch:~$ sudo apt-get install netq-apps
          
        3. Verify you have the correct version of the CLI.

          admin@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-apps
          

          You should see version 4.8.0 and update 44 in the results. For example:

          • netq-apps_4.8.0-deb10u44~1699076923.80e664937_amd64.deb
        4. Continue with NetQ CLI configuration in the next section.

        1. Reference and update the local yum repository and key.

          root@rhel7:~# rpm --import https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm.pubkey
          root@rhel7:~# wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm-el7.repo > /etc/yum.repos.d/cumulus-host-el.repo
          
        2. Edit /etc/yum.repos.d/cumulus-host-el.repo to set the enabled=1 flag for the two NetQ repositories.

          root@rhel7:~# vi /etc/yum.repos.d/cumulus-host-el.repo
          ...
          [cumulus-arch-netq-latest]
          name=Cumulus netq packages
          baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-latest/$basearch
          gpgcheck=1
          enabled=1
          [cumulus-noarch-netq-latest]
          name=Cumulus netq architecture-independent packages
          baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-latest/noarch
          gpgcheck=1
          enabled=1
          ...
          
        3. Install the Bash completion and CLI software on the server.

          root@rhel7:~# sudo yum -y install bash-completion
          root@rhel7:~# sudo yum install netq-apps
          
        4. Verify you have the correct version of the CLI.

          root@rhel7:~# rpm -q netq-apps
          

        You should see version 4.8.0 and update 44 in the results. For example:

        • netq-apps_4.8.0-rh7u44~1699074652.80e6649.x86_64.rpm

        5. Continue with NetQ CLI configuration in the next section.

        1. Install the CLI software on the server.

          root@ubuntu:~# sudo apt-get update
          root@ubuntu:~# sudo apt-get install netq-apps
          
        2. Verify you have the correct version of the CLI.

          root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' netq-apps
          

        You should see version 4.8.0 and update 44 in the results. For example:

        • Ubuntu 20.04: netq-apps_4.8.0-ub20.04u44~1699074936.80e664937_amd64.deb

        3. Continue with NetQ CLI configuration in the next section.

        Configure the NetQ CLI

        By default, you do not configure the NetQ CLI during the NetQ installation. The configuration resides in the /etc/netq/netq.yml file. Until the CLI is configured on a device, you can only run netq config and netq help commands, and you must use sudo to run them.

        At minimum, you need to configure the NetQ CLI and NetQ Agent to communicate with the telemetry server. To do so, configure the NetQ Agent and the NetQ CLI so that they are running in the VRF where the routing tables have connectivity to the telemetry server (typically the management VRF).

        To access and configure the CLI for your on-premises NetQ deployment, you must generate AuthKeys. You’ll need your username and password to generate them. These keys provide authorized access (access key) and user authentication (secret key).

        To generate AuthKeys:

        1. Enter your on-premises NetQ appliance hostname or IP address into your browser to open the NetQ UI login page.

        2. Enter your username and password.

        3. Expand the Menu, then select Management.

        4. Select Manage on the User Accounts card.

        5. Select your user and click Generate keys above the table.

        6. Copy these keys to a safe place. Select Copy to obtain the CLI configuration command to use on your devices.

        The secret key is only shown once. If you do not copy these, you will need to regenerate them and reconfigure CLI access.

        You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:

        • store the file wherever you like, for example in /home/cumulus/ or /etc/netq
        • name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml

        The file must have the following format:

        access-key: <user-access-key-value-here>
        secret-key: <user-secret-key-value-here>
        

        7. Insert the AuthKeys onto your device to configure the CLI. Alternatively, use the following command:

          netq config add cli server <text-gateway-dest> [access-key <text-access-key> secret-key <text-secret-key> premises <text-premises-name> | cli-keys-file <text-key-file> premises <text-premises-name>] [vrf <text-vrf-name>] [port <text-gateway-port>]
          
        8. Restart the CLI to activate the configuration.

          The following example uses the individual access key and secret key, a premises of datacenterwest, and the default port and VRF. Replace the server hostname and key values with your own if you are using this example on your server.

          sudo netq config add cli server netqhostname.labtest.net access-key 123452d9bc2850a1726f55534279dd3c8b3ec55e8b25144d4739dfddabe8149e secret-key /vAGywae2E4xVZg8F+HtS6h6yHliZbBP6HXU3J98765= premises datacenterwest
          Updated cli server netqhostname.labtest.net vrf default port 443. Please restart netqd (netq config restart cli)
          
          sudo netq config restart cli
          Restarting NetQ CLI... Success!
          

          This example uses an optional keys file. Replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.

          sudo netq config add cli server netqhostname.labtest.net cli-keys-file /home/netq/nq-cld-creds.yml premises datacenterwest
          Updated cli server netqhostname.labtest.net vrf default port 443. Please restart netqd (netq config restart cli)
          
          sudo netq config restart cli
          Restarting NetQ CLI... Success!
          

        If you have multiple premises and want to query data from a different premises than you originally configured, rerun the netq config add cli server command with the desired premises name. You can only view the data for one premises at a time with the CLI.
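
To show how the keys file and the configuration command fit together, the following sketch checks that a keys file contains both required entries and assembles the command line using the syntax shown above. It is an illustration under stated assumptions: read_keys and cli_server_command are hypothetical helper names, and the sketch only builds the command string without running it.

```python
import shlex


def read_keys(text):
    """Parse 'access-key: ...' and 'secret-key: ...' lines and require both."""
    keys = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            keys[name.strip()] = value.strip()
    missing = {"access-key", "secret-key"} - keys.keys()
    if missing:
        raise ValueError("keys file is missing: " + ", ".join(sorted(missing)))
    return keys


def cli_server_command(server, premises, keys_file=None, keys=None, vrf=None, port=None):
    """Build the 'netq config add cli server' command from its documented parts."""
    args = ["netq", "config", "add", "cli", "server", server]
    if keys_file:
        args += ["cli-keys-file", keys_file]
    elif keys:
        args += ["access-key", keys["access-key"], "secret-key", keys["secret-key"]]
    else:
        raise ValueError("provide either keys_file or keys")
    args += ["premises", premises]
    if vrf:
        args += ["vrf", vrf]
    if port:
        args += ["port", str(port)]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(cli_server_command("netqhostname.labtest.net", "datacenterwest",
                             keys_file="/home/netq/nq-cld-creds.yml"))
```

Prefix the resulting command with sudo when you run it, as in the examples above.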

        To access and configure the CLI for your NetQ cloud deployment, you must generate AuthKeys. You’ll need your username and password to generate them. These keys provide authorized access (access key) and user authentication (secret key). You obtained your credentials and NetQ Cloud addresses during your initial login to the NetQ Cloud and premises activation.

        To generate AuthKeys:

        1. Enter netq.nvidia.com into your browser to open the NetQ UI login page.

        2. Enter your username and password.

        3. Expand the Menu, then select Management.

        4. Select Manage on the User Accounts card.

        5. Select your user and click Generate keys above the table.

        6. Copy these keys to a safe place. Select Copy to obtain the CLI configuration command to use on your devices.

        The secret key is only shown once. If you do not copy these, you will need to regenerate them and reconfigure CLI access.

        You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:

        • store the file wherever you like, for example in /home/cumulus/ or /etc/netq
        • name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml

        The file must have the following format:

        access-key: <user-access-key-value-here>
        secret-key: <user-secret-key-value-here>
        

        7. Insert the AuthKeys onto your device to configure the CLI. Alternatively, use the following command:

          netq config add cli server <text-gateway-dest> [access-key <text-access-key> secret-key <text-secret-key> premises <text-premises-name> | cli-keys-file <text-key-file> premises <text-premises-name>] [vrf <text-vrf-name>] [port <text-gateway-port>]
          
        8. Restart the CLI to activate the configuration.

          The following example uses the individual access key, a premises of datacenterwest, and the default Cloud address, port and VRF. Replace the key values with your generated keys if you are using this example on your server.

          sudo netq config add cli server api.netq.cumulusnetworks.com access-key 123452d9bc2850a1726f55534279dd3c8b3ec55e8b25144d4739dfddabe8149e secret-key /vAGywae2E4xVZg8F+HtS6h6yHliZbBP6HXU3J98765= premises datacenterwest
          Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
          Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)
          
          sudo netq config restart cli
          Restarting NetQ CLI... Success!
          

          The following example uses an optional keys file. Replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.

          sudo netq config add cli server api.netq.cumulusnetworks.com cli-keys-file /home/netq/nq-cld-creds.yml premises datacenterwest
          Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
          Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)
          
          sudo netq config restart cli
          Restarting NetQ CLI... Success!
          

        If you have multiple premises and want to query data from a different premises than you originally configured, rerun the netq config add cli server command with the desired premises name. You can only view the data for one premises at a time with the CLI.

        Install NIC and DPU Agents

        Installing NetQ telemetry agents on your hosts with NVIDIA ConnectX adapters and NVIDIA BlueField data processing units (DPUs) allows you to track inventory data and statistics across devices. The DOCA Telemetry Service (DTS) is the agent that runs on hosts and DPUs to collect data.

        • NIC telemetry for ConnectX adapters is supported only for on-premises NetQ deployments.
        • ConnectX telemetry is supported on DTS version 1.14.2 and later.

        Install DTS on ConnectX Hosts

        To install and configure the DOCA Telemetry Service container on a host with ConnectX adapters, perform the following steps:

        1. Obtain the latest DTS container image path from the NGC catalog. Select Get Container and copy the image path.

        2. Run the DTS container with Docker on the host. Use the image path obtained in the previous step for the DTS_IMAGE variable and configure the IP address of your NetQ server for the -i option:

        export DTS_IMAGE=nvcr.io/nvidia/doca/doca_telemetry:1.14.2-doca2.2.0-host
        docker run -v "/opt/mellanox/doca/services/telemetry/config:/config" --rm --name doca-telemetry-init -ti $DTS_IMAGE /bin/bash -c "DTS_CONFIG_DIR=host_netq /usr/bin/telemetry-init.sh && /usr/bin/enable-fluent-forward.sh -i=10.10.10.1 -p=30001"
        docker run -d --net=host                                                              \
                      --privileged                                                            \
                      -v "/opt/mellanox/doca/services/telemetry/config:/config"               \
                      -v "/opt/mellanox/doca/services/telemetry/ipc_sockets:/tmp/ipc_sockets" \
                      -v "/opt/mellanox/doca/services/telemetry/data:/data"                   \
                      --rm --name doca-telemetry -it $DTS_IMAGE /usr/bin/telemetry-run.sh
        

        Configure Prometheus Targets for ConnectX Adapters

        The Prometheus adapter pod in NetQ collects statistics from ConnectX adapters in your network. To add adapters as a target for data collection, perform the following steps:

        1. On your NetQ VM, edit the targets-config ConfigMap with the kubectl edit cm targets-config command.

        Add the desired host IP addresses to the targets stanza, maintaining YAML indentation. Separate multiple entries with commas, and use port 9100 for each entry:

        data:
          targets.json: |-
            [
              {
                "labels": {
                  "job": "node"
                },
                "targets": [
                  "10.10.10.10:9100","10.10.10.11:9100"
                ]
              }
            ]
        
        2. Restart the netq-prom-adapter pod.

        Retrieve the current pod name with the kubectl get pods | grep netq-prom command:

        cumulus@netq-server:~$ kubectl get pods | grep netq-prom
        netq-prom-adapter-ffd9b874d-hxhbz                    2/2     Running   0          3h50m
        

        Restart the pod by deleting the running pod:

        kubectl delete pod netq-prom-adapter-ffd9b874d-hxhbz
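
If you maintain a long list of hosts, you can generate the targets.json content rather than typing it by hand. The sketch below is one way to do so. The function name build_targets is hypothetical, the port and job label follow the example above, and the sketch assumes IPv4 host addresses. Paste the printed JSON into the ConfigMap yourself.

```python
import ipaddress
import json


def build_targets(hosts, port=9100, job="node"):
    """Return the targets.json structure for a list of IPv4 host addresses."""
    targets = []
    for host in hosts:
        # Raises ValueError for anything that is not a valid IPv4 address.
        ipaddress.IPv4Address(host)
        targets.append(f"{host}:{port}")
    return [{"labels": {"job": job}, "targets": targets}]


if __name__ == "__main__":
    print(json.dumps(build_targets(["10.10.10.10", "10.10.10.11"]), indent=2))
```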
        

        Install DTS on DPUs

        To install and configure the DOCA Telemetry Service (DTS) container on a DPU, perform the following steps:

        1. Obtain the DTS container image path from the NGC catalog. Select Get Container, then View all tags. Copy the 1.18.2-doca2.8.0-host image path.

        2. Remove any current DTS configurations using the following command:

        sudo rm -rf /opt/mellanox/doca/services/telemetry/config
        
        3. Retrieve the container YAML configuration file onto the host. Use the path specified in the Adjusting the .yaml Configuration section in the NGC instructions. Copy it to /etc/kubelet.d/doca_telemetry_standalone.yaml:
        wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/doca/doca_container_configs/versions/2.0.2v1/files/configs/2.0.2/doca_telemetry.yaml -O /etc/kubelet.d/doca_telemetry_standalone.yaml
        
        4. Edit the image in both the containers and initContainers sections of the /etc/kubelet.d/doca_telemetry_standalone.yaml file to set the container image path retrieved in step 1.

        5. Edit the command in the initContainers section of the /etc/kubelet.d/doca_telemetry_standalone.yaml file to set the DTS_CONFIG_DIR parameter to inventory_netq. Configure the fluent forwarding -i option to your NetQ server IP address and the -p option to 30001:

          initContainers:
        ...
              command: ["/bin/bash", "-c", "DTS_CONFIG_DIR=inventory_netq /usr/bin/telemetry-init.sh && /usr/bin/enable-fluent-forward.sh -i=10.10.10.1 -p=30001"]
        

        This step replaces the default configuration of command: ["/bin/bash", "-c", "/usr/bin/telemetry-init.sh && /usr/bin/enable-fluent-forward.sh"].

        6. Restart the DPE service with the service dpe restart command.

        Add More Nodes to Your Server Cluster

        You can add nodes to your server cluster in on-premises and cloud deployments using the CLI.

        Run the following CLI command to add a new worker node for on-premises deployments:

        netq install cluster add-worker <text-worker-01>
        

        Run the following CLI command to add a new worker node for cloud deployments:

        netq install opta cluster add-worker <text-worker-01>
        

        Install a Custom Signed Certificate

        When you first log in to the NetQ UI as part of an on-premises deployment, your browser displays a warning indicating that the default certificate is not trusted. You can avoid this warning by installing your own custom-signed certificate using the steps outlined on this page. The default self-signed certificate is sufficient for non-production environments or cloud deployments.

        If you already have a certificate installed and want to change or update it, run the kubectl delete secret netq-gui-ingress-tls [name] --namespace default command before following the steps outlined in this section. After making your updates, restart nginx with the kubectl delete pod -l app.kubernetes.io/name=ingress-nginx --namespace ingress-nginx command.

        You need the following items to perform the certificate installation:

        Install a Certificate using the NetQ CLI

        1. Log in to the NetQ VM via SSH and copy your certificate and key file there.

        2. Generate a Kubernetes secret called netq-gui-ingress-tls:

          cumulus@netq-ts:~$ kubectl create secret tls netq-gui-ingress-tls \
              --namespace default \
              --key <name of your key file>.key \
              --cert <name of your cert file>.crt
          
        3. Verify that you created the secret successfully:

          cumulus@netq-ts:~$ kubectl get secret
          
          NAME                               TYPE                                  DATA   AGE
          netq-gui-ingress-tls               kubernetes.io/tls                     2      5s
          
        4. Update the ingress rule file to install self-signed certificates.

          1. Create a new file called ingress.yaml

          2. Copy and add the following content to the file:

          apiVersion: networking.k8s.io/v1
          kind: Ingress
          metadata:
            annotations:
              nginx.ingress.kubernetes.io/ssl-redirect: "true"
              nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
              nginx.ingress.kubernetes.io/proxy-connect-timeout: "3600"
              nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
              nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
              nginx.ingress.kubernetes.io/proxy-body-size: 10g
              nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
            name: netq-gui-ingress-external
            namespace: default
          spec:
            ingressClassName: ingress-nginx-class
            rules:
            - host: <your-hostname>
              http:
                paths:
                - path: /
                  pathType: Prefix
                  backend:
                    service:
                      name: netq-gui
                      port:
                        number: 80
            tls:
            - hosts:
              - <your-hostname>
              secretName: netq-gui-ingress-tls
          
          3. Replace <your-hostname> with the FQDN of the NetQ VM.

        5. Apply the new rule:

          cumulus@netq-ts:~$ kubectl apply -f ingress.yaml
          ingress.networking.k8s.io/netq-gui-ingress-external configured
          

          The message above appears if your ingress rule is successfully configured.

        6. Configure the NetQ API to use the new certificate by updating the Swagger ingress rule file.

          1. Create a new file called swagger-ingress.yaml

          2. Copy and add the following content to the file:

          apiVersion: networking.k8s.io/v1
          kind: Ingress
          metadata:
            annotations:
              nginx.ingress.kubernetes.io/ssl-redirect: "true"
              nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
              nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
              nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
              nginx.ingress.kubernetes.io/proxy-body-size: 10g
              nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
            name: netq-swagger-ingress-external
            namespace: default
          spec:
            ingressClassName: ingress-nginx-class
            rules:
            - host: <your-hostname>
              http:
                paths:
                - path: "/swagger"
                  pathType: Prefix
                  backend:
                    service:
                      name: swagger-ui
                      port:
                        number: 8080
            tls:
            - hosts:
              - <your-hostname>
              secretName: netq-gui-ingress-tls
          
          3. Replace <your-hostname> with the FQDN of the NetQ VM.

        7. Apply the new rule:

          cumulus@netq-ts:~$ kubectl apply -f swagger-ingress.yaml
          

        Your custom certificate should now be working. Verify this by opening the NetQ UI at https://<your-hostname-or-ipaddr> in your browser.

        Update Cloud Activation Key

        NVIDIA provides a cloud activation key when you set up your premises. You use the cloud activation key (called the config-key) to access the cloud services. Note that these authorization keys are different from the ones you use to configure the CLI.

        On occasion, you might want to update your cloud service activation key—for example, if you mistyped the key during installation and now your existing key does not work, or you received a new key for your premises from NVIDIA.

        Update the activation key using the NetQ CLI:

        Run the following command on your master NetQ VM, replacing <text-opta-key> with your new key.

        cumulus@<hostname>:~$ netq install standalone activate-job config-key <text-opta-key>
        

        Upgrade NetQ

        This section describes how to upgrade from your current installation to NetQ 4.8. Refer to the release notes before you upgrade.

        You must upgrade your NetQ on-premises or cloud virtual machines. Upgrading NetQ Agents is optional, but recommended. If you want access to new and updated commands, you can upgrade the CLI on your physical servers or VMs, and monitored switches and hosts as well.

        Follow these steps to upgrade your on-premises or cloud deployment. Note that these steps are sequential; you must upgrade your NetQ virtual machine before you upgrade the NetQ Agents.

        1. Upgrade NetQ Virtual Machines
        2. Upgrade NetQ Agents
        3. Upgrade NetQ CLI

        Upgrade NetQ Virtual Machines

        This page describes how to upgrade your NetQ virtual machines. Note that the upgrade instructions vary depending on the NetQ version you’re currently running.

        For deployments running:

        Upgrading from NetQ 4.7, 4.6, or 4.5

        You can upgrade directly to NetQ 4.8.0 if your deployment is currently running version 4.5.0, 4.6.0 or 4.7.0.

        Back up your NetQ Data

        Before you upgrade, you can back up your NetQ data. This is an optional step for on-premises deployments. NVIDIA automatically creates backups for NetQ cloud deployments.

        Update NetQ Debian Packages

        1. Update /etc/apt/sources.list.d/cumulus-netq.list to netq-4.8:

          cat /etc/apt/sources.list.d/cumulus-netq.list
          deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb focal netq-4.8
          
        2. Update the NetQ Debian packages. In cluster deployments, update the packages on the master and all worker nodes.

          cumulus@<hostname>:~$ sudo apt-get update
          Get:1 https://apps3.cumulusnetworks.com/repos/deb focal InRelease [13.8 kB]
          Get:2 https://apps3.cumulusnetworks.com/repos/deb focal/netq-4.8 amd64 Packages [758 B]
          Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
          Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [88.7 kB]
          Get:5 http://archive.ubuntu.com/ubuntu focal-updates InRelease [88.7 kB]
          ...
          Reading package lists... Done
          
          cumulus@<hostname>:~$ sudo apt-get install -y netq-agent netq-apps
          Reading package lists... Done
          Building dependency tree
          Reading state information... Done
          ...
          The following NEW packages will be installed:
          netq-agent netq-apps
          ...
          Fetched 39.8 MB in 3s (13.5 MB/s)
          ...
          Unpacking netq-agent (4.8.0-ub20.04u44~1699074936.80e664937) ...
          ...
          Unpacking netq-apps (4.8.0-ub20.04u44~1699074936.80e664937) ...
          Setting up netq-apps (4.8.0-ub20.04u44~1699074936.80e664937) ...
          Setting up netq-agent (4.8.0-ub20.04u44~1699074936.80e664937) ...
          Processing triggers for rsyslog (8.32.0-1ubuntu4) ...
          Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
          

        Download the Upgrade Software

        1. Download the upgrade tarball.

          1. On the NVIDIA Application Hub, log in to your account.
          2. Select NVIDIA Licensing Portal.
          3. Select Software Downloads from the menu.
          4. Click Product Family and select NetQ.
          5. Select the relevant software for your hypervisor:
            • If you are upgrading NetQ Platform software for a NetQ on-premises VM, select NetQ SW 4.8 Appliance to download the NetQ-4.8.0.tgz file.
            • If you are upgrading NetQ software for a NetQ cloud VM, select NetQ SW 4.8 Appliance Cloud to download the NetQ-4.8.0-opta.tgz file.
          6. If prompted, agree to the license agreement and proceed with the download.

          For enterprise customers, if you do not see a link to the NVIDIA Licensing Portal on the NVIDIA Application Hub, contact NVIDIA support.


          For NVIDIA employees, download NetQ directly from the NVIDIA Licensing Portal.

        2. Copy the tarball to the /mnt/installables/ directory on your NetQ VM.

        Run the Upgrade

        Perform the following steps using the cumulus user account.

        Pre-installation Checks

        Verify the following items before upgrading NetQ. For cluster deployments, verify steps 1 and 4 on all nodes in the cluster:

        1. Confirm your VM is configured with 16 vCPUs. If your VM is configured with fewer than 16 vCPUs, power off your VM, reconfigure your hypervisor to allocate 16 vCPUs, then power the VM on before proceeding.

        2. Check if there is sufficient disk space:

        cumulus@<hostname>:~$ df -h /
        Filesystem      Size  Used Avail Use% Mounted on
        /dev/sda1       248G   70G  179G  28% /
        cumulus@netq-appliance:~$
        

        NVIDIA recommends proceeding with the installation only if the Use% is less than 70%. You can delete previous software tarballs in the /mnt/installables/ directory to regain some space. If you cannot decrease disk usage to under 70%, contact the NVIDIA support team.

        1. Run the netq show opta-health command and check that all pods are in the READY state. If the pods are in a state other than READY, contact the NVIDIA support team.

        2. Check if the certificates have expired:

        cumulus@<hostname>:~$ sudo grep client-certificate-data /etc/kubernetes/kubelet.conf | cut -d: -f2 | xargs | base64 -d | openssl x509 -dates -noout | grep notAfter | cut -f2 -d=
        Dec 18 17:53:16 2021 GMT
        cumulus@netq-appliance:~$
        

        If the date in the above output is in the past, run the following commands before proceeding with the upgrade:

        sudo cp /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak
        sudo sed -i 's/client-certificate-data.*/client-certificate: \/var\/lib\/kubelet\/pki\/kubelet-client-current.pem/g' /etc/kubernetes/kubelet.conf
        sudo sed -i 's/client-key.*/client-key: \/var\/lib\/kubelet\/pki\/kubelet-client-current.pem/g' /etc/kubernetes/kubelet.conf
        sudo systemctl restart kubelet
        

        Confirm that the kubelet process is running with the sudo systemctl status kubelet command before proceeding with the upgrade.

        5. Confirm that the NetQ CLI is properly configured. The netq show agents command should complete successfully and display agent status.
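The numeric checks above lend themselves to a small script. The following is a minimal sketch, not an NVIDIA-provided tool: the function names (vcpu_status, disk_status, cert_status) are hypothetical, and the thresholds are the ones stated in the steps above. Each function only classifies a value passed to it; the commands that gather those values are shown as comments and assume GNU coreutils.

```shell
# Hypothetical pre-upgrade helpers; these function names are illustrative
# and are not part of NetQ.

# Prints "ok" when the vCPU count ($1) meets the 16-vCPU requirement.
vcpu_status() {
  if [ "$1" -ge 16 ]; then echo "ok"; else echo "insufficient"; fi
}

# Prints "ok" when a df Use% value ($1, for example "28%") is below 70%.
disk_status() (
  pct=${1%\%}   # strip the trailing percent sign
  if [ "$pct" -lt 70 ]; then echo "ok"; else echo "too-full"; fi
)

# Prints "valid" when the certificate expiry ($1, epoch seconds) is later
# than the current time ($2, epoch seconds), "expired" otherwise.
cert_status() {
  if [ "$1" -gt "$2" ]; then echo "valid"; else echo "expired"; fi
}

# Example invocations (GNU coreutils assumed):
#   vcpu_status "$(nproc)"
#   disk_status "$(df --output=pcent / | tail -n 1 | tr -d ' ')"
#   cert_status "$(date -d 'Dec 18 17:53:16 2021 GMT' +%s)" "$(date +%s)"
```

Any result other than ok or valid means the corresponding step above needs attention before you start the upgrade.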

        Upgrade Using the NetQ CLI

        Run the appropriate commands for your current version and deployment type:

        cumulus@<hostname>:~$ netq upgrade bundle /mnt/installables/NetQ-4.8.0.tgz
        
        

        For a cloud standalone deployment, clear the current install state:

        cumulus@<hostname>:~$ netq bootstrap reset
        

        Run the following install command on your NetQ cloud VM with the config key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key using the NetQ UI.

        cumulus@<hostname>:~$ netq install opta standalone full interface <interface-name> bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> [proxy-host <proxy-hostname> proxy-port <proxy-port>]
        

        You can specify the IP address instead of the interface name. To do so, use ip-addr <IP address> in place of the interface referenced with interface <interface-name> above.

        For a cloud cluster deployment, clear the current install state on your master node:

        cumulus@<hostname>:~$ netq bootstrap reset
        

        Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

        cumulus@<hostname>:~$ netq install cluster master-init
           Please run the following command on all worker nodes:
           netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
        

        Run the netq install cluster worker-init <ssh-key> command from the output on each of your worker nodes.

        Run the following command on your master NetQ cloud VM using the IP addresses of your worker nodes and the config key obtained from the email you received from NVIDIA titled NetQ Access Link. You can also obtain the configuration key using the NetQ UI.

        cumulus@<hostname>:~$ netq install opta cluster full interface <interface-name> bundle /mnt/installables/NetQ-4.8.0-opta.tgz config-key <your-config-key> workers <worker-1-ip> <worker-2-ip> [proxy-host <proxy-hostname> proxy-port <proxy-port>]
        

        You can specify the IP address instead of the interface name. To do so, use ip-addr <IP address> in place of the interface referenced with interface <interface-name> above.

        For an on-premises standalone deployment, clear the install state and save the current database:

        cumulus@<hostname>:~$ netq bootstrap reset keep-db purge-images
        

        Run the install command to install the new tarball:

        cumulus@<hostname>:~$ netq install standalone full interface <interface-name> bundle /mnt/installables/NetQ-4.8.0.tgz
        

        You can specify the IP address instead of the interface name. To do so, use ip-addr <IP address> in place of the interface referenced with interface <interface-name> above.
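The choice between the interface and ip-addr keywords can be sketched as a small helper. This is an illustration only: address_option is a hypothetical function name, and the pattern merely recognizes strings shaped like dotted IPv4 addresses without validating them.

```shell
# Hypothetical helper: prints "ip-addr <value>" for a string shaped like a
# dotted IPv4 address and "interface <value>" for anything else.
address_option() {
  case "$1" in
    *[!0-9.]*) echo "interface $1" ;;   # contains a non-digit, non-dot character
    *.*.*.*)   echo "ip-addr $1" ;;     # digits and dots with at least three dots
    *)         echo "interface $1" ;;
  esac
}

# Example: print (but do not run) an on-premises standalone install command.
#   echo "netq install standalone full $(address_option 10.10.10.10) bundle /mnt/installables/NetQ-4.8.0.tgz"
```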

        For an on-premises cluster deployment, clear the install state on your master node and save the current database:

        cumulus@<hostname>:~$ netq bootstrap reset keep-db purge-images
        

        Run the following command on your master node to initialize the cluster. Copy the output of the command to use on your worker nodes:

        cumulus@<hostname>:~$ netq install cluster master-init
           Please run the following command on all worker nodes:
           netq install cluster worker-init c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCQVFDM2NjTTZPdVVUWWJ5c2Q3NlJ4SHdseHBsOHQ4N2VMRWVGR05LSWFWVnVNcy94OEE4RFNMQVhKOHVKRjVLUXBnVjdKM2lnMGJpL2hDMVhmSVVjU3l3ZmhvVDVZM3dQN1oySVZVT29ZTi8vR1lOek5nVlNocWZQMDNDRW0xNnNmSzVvUWRQTzQzRFhxQ3NjbndIT3dwZmhRYy9MWTU1a
        

        Run the netq install cluster worker-init <ssh-key> command from the output on each of your worker nodes.

        Next, run the netq install cluster full command on your master node using the IP addresses of your worker nodes:

        cumulus@<hostname>:~$ netq install cluster full interface <interface-name> bundle /mnt/installables/NetQ-4.8.0.tgz workers <worker-1-ip> <worker-2-ip>
        

        You can specify the IPv4 or IPv6 address instead of the interface name. Refer to the command line reference for the full syntax.

        Confirm the upgrade was successful. The first output is from an on-premises deployment and the second is from a cloud deployment:

        ```
        cumulus@<hostname>:~$ cat /etc/app-release
        BOOTSTRAP_VERSION=4.8.0
        APPLIANCE_MANIFEST_HASH=8869b5423dfcc441ea56a3c89e680b1b2ad61f6887edccb11676bac893073beb
        APPLIANCE_VERSION=4.8.0
        APPLIANCE_NAME=NetQ On-premises Appliance
        ```
        
        ```
        cumulus@<hostname>:~$ cat /etc/app-release
        BOOTSTRAP_VERSION=4.8.0
        APPLIANCE_MANIFEST_HASH=271f5943ffae42f758fef09bafeb37a63d996bd6e41bf7aeeb3a4d33232f05de
        APPLIANCE_VERSION=4.8.0
        APPLIANCE_NAME=NetQ Cloud Appliance
        ```
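To check the release file from a script rather than by eye, something like the following could work. It is a sketch under the assumption that /etc/app-release contains plain KEY=value lines as shown above; release_field and upgrade_status are hypothetical names.

```shell
# Hypothetical helper: prints the value of a KEY=value entry ($2) from a
# release file ($1) such as /etc/app-release.
release_field() {
  sed -n "s/^$2=//p" "$1"
}

# Prints "upgraded" when APPLIANCE_VERSION in the file ($1) equals the
# expected version ($2), "mismatch" otherwise.
upgrade_status() {
  if [ "$(release_field "$1" APPLIANCE_VERSION)" = "$2" ]; then
    echo "upgraded"
  else
    echo "mismatch"
  fi
}

# Example: upgrade_status /etc/app-release 4.8.0
```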
        

        Next Steps

        Upgrade NetQ Agents

        After upgrading your NetQ VM, upgrade the NetQ Agent:

        1. Log in to your switch or host.

        2. Update and install the new NetQ Agent package.

          On Debian-based systems (Cumulus Linux or Ubuntu):

          sudo apt-get update
          sudo apt-get install -y netq-agent

          On RPM-based systems (RHEL):

          sudo yum update
          sudo yum install netq-agent
          
        3. Restart the NetQ Agent with the following command. The NetQ CLI must be installed for the command to run successfully.

          netq config restart agent
          

        Refer to Install NetQ Agents to complete the upgrade.
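When you upgrade many hosts, a wrapper can pick the package commands for each system. The sketch below is an assumption-laden illustration: detect_pkg_manager and upgrade_commands are hypothetical names, and the functions only print the apt-get or yum commands listed above rather than running them.

```shell
# Hypothetical helper: prints "apt", "yum", or "unknown" depending on which
# package tool is found on the PATH.
detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt"
  elif command -v yum >/dev/null 2>&1; then
    echo "yum"
  else
    echo "unknown"
  fi
}

# Prints the update and install commands for a package manager ($1) and a
# package name ($2), such as netq-agent or netq-apps.
upgrade_commands() {
  case "$1" in
    apt) printf 'sudo apt-get update\nsudo apt-get install -y %s\n' "$2" ;;
    yum) printf 'sudo yum update\nsudo yum install %s\n' "$2" ;;
    *)   echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Example: upgrade_commands "$(detect_pkg_manager)" netq-agent
```

Printing the commands first makes it easy to review them before piping the output to a shell.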

        Verify NetQ Agent Version

        You can verify the version of the agent software you have deployed as described in the following sections.

        On Cumulus Linux, run the following command to view the NetQ Agent version:

        cumulus@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-agent

        You should see version 4.8.0 and update 44 in the results.

          • netq-agent_4.8.0-cl4u44~1699077226.80e664937_armel.deb
          • netq-agent_4.8.0-cl4u44~1699245971.f796c0644_amd64.deb

        On Ubuntu, run:

        root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' netq-agent

        You should see version 4.8.0 and update 44 in the results.

        • Ubuntu 20.04: netq-agent_4.8.0-ub20.04u44~1699074936.80e664937_amd64.deb

        On RHEL, run:

        root@rhel7:~# rpm -q netq-agent

        You should see version 4.8.0 and update 44 in the results.

        • netq-agent-4.8.0-rh7u44~1699074652.80e6649.x86_64.rpm

        If you see an older version, upgrade the NetQ Agent, as described above.
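The version comparison can also be scripted. The sketch below assumes the Debian version string has the form suggested by the package file names above (for example, 4.8.0-cl4u44~1699077226.80e664937); agent_version_status is a hypothetical name and the pattern is an illustration, not an official format definition.

```shell
# Hypothetical helper: prints "current" when a Debian version string ($1)
# starts with the expected release ($2) and carries the expected update
# number ($3), "outdated" otherwise.
agent_version_status() {
  case "$1" in
    "$2"-*u"$3"~*) echo "current" ;;
    *)             echo "outdated" ;;
  esac
}

# Example on a Debian-based system:
#   agent_version_status "$(dpkg-query -W -f '${Version}' netq-agent)" 4.8.0 44
```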

        Next Steps

        Upgrade NetQ CLI

        To upgrade the NetQ CLI:

        1. Log in to your switch or host.

        2. Update and install the new NetQ CLI package (netq-apps).

          On Debian-based systems (Cumulus Linux or Ubuntu):

          sudo apt-get update
          sudo apt-get install -y netq-apps

          On RPM-based systems (RHEL):

          sudo yum update
          sudo yum install netq-apps
          
        3. Restart the CLI:

          netq config restart cli
          

        To complete the upgrade, refer to Configure the NetQ CLI.

        Accounts and Roles

        NetQ accounts are assigned one of two roles: admin or user. Accounts with admin privileges can perform the same actions as user accounts. Additionally, admins can access a management dashboard in the UI.

        From this dashboard, admins can:

        The following image displays the management dashboard. Accounts with user privileges cannot perform the functions described above and do not have access to the management dashboard.

        netq management dashboard

        Add and Manage Accounts

        Sign in to NetQ as an admin to view and manage accounts. If you want to change individual preferences, visit Set User Preferences.

        Navigate to the NetQ management dashboard to complete the tasks outlined in this section. To get there, expand the Menu on the NetQ dashboard and select Management.

        Add an Account

        This section outlines the steps to add a local user account. To add an LDAP account, refer to LDAP Authentication.

        To create a new account:

        1. On the User Accounts card, select Manage to open a table listing all accounts.

        2. Above the table, select Add to add an account.

        3. Complete the fields and select Save.

          Be especially careful entering the email address; you cannot change it once you save the account. If you save a mistyped email address, you must delete the account and create a new one.

        Edit an Account

        As an admin, you can:

        You cannot edit the email address associated with an account, because this is the identifier the system uses for authentication. If you need to change an email address, delete the account and create a new one.

        To edit an account:

        1. On the User Accounts card, select Manage to open a table listing all accounts.

        2. Select the account you’d like to edit. Above the table, click Edit to edit the account’s information.

        Reset an Admin Password

        If your account is assigned an admin role, reset your password by restoring the default password, then changing the password:

        1. Run the following command on your on-premises server’s CLI:
        kubectl exec $(kubectl get pod -oname -l app=cassandra) -- cqlsh -e "INSERT INTO master.user(id,  cust_id,  first_name,  last_name,  password,     access_key,  role,  email,  is_ldap_user,  is_active,  terms_of_use_accepted,  enable_alarm_notifications,  default_workbench,  preferences,  creation_time,  last_login,  reset_password)     VALUES(  'admin',  0,  'Admin',  '',  '009413d86fd42592e0910bb2146815deaceaadf3a4667b728463c4bc170a6511',     null, 'admin',  null,  false,  true,  true,  true,  { workspace_id : 'DEFAULT', workbench_id : 'DEFAULT' },  '{}',  toUnixTimestamp(now()),  toUnixTimestamp(now()),  true )"
        
        2. Log in to the NetQ UI with the default username and password: admin, admin. After you log in, NetQ prompts you to change the password.

        To reset a password for cloud deployments:

        1. Enter https://netq.nvidia.com in your browser to open the login page.

        2. Click Forgot Password? and enter an email address. Look for a message with the subject NetQ Password Reset Link from netq-sre@cumulusnetworks.com.

        3. Select the link in the email and follow the instructions to create a new password.

        Delete an Account

        To delete one or more accounts:

        1. On the User Accounts card, select Manage to open a table listing all accounts.

        2. Select one or more accounts. Above the table, click Delete to delete the selected account(s).

        View Account Activity

        Administrators can view account activity in the activity log. To get there, expand the Menu and select Activity log. Use the controls above the table to filter or export the data.

        Manage Login Policies

        Administrators can configure a session expiration time and the number of times users can refresh before requiring them to log in again to NetQ.

        To configure these login policies:

        1. On the Login Management card, select Manage.

        2. Select how long an account can be logged in before requiring a user to log in again:

        3. Click Update to save the changes.

          The Login Management card reflects the updated configuration.

        Configure Premises

        The NetQ management dashboard lets you configure a single NetQ UI and CLI for monitoring data from multiple premises. This means you do not need to log in to each premises individually to view the data.

        Configure Multiple Premises

        There are two ways to implement a multi-site, on-premises deployment: (1) as a full deployment at the primary premises and each of the external premises or (2) as a full deployment at the primary premises with smaller deployments at the secondary premises.

        The primary premises is called OPID0 by default in the UI.

        Full NetQ Deployment at Each Premises

        In this implementation, each premises has a NetQ appliance or VM running the NetQ software with its own database. Each premises operates independently as an external premises, with its own NetQ UI and CLI. The NetQ appliance or VM at one of the deployments acts as the primary premises. A list of external premises is stored with the primary deployment.

        To configure a single UI to monitor multiple premises:

        1. From the UI of the primary premises (OPID0), select the Premises dropdown in the top-right corner of the screen.

        2. Select Manage premises, then select the External premises tab.

        3. Select Add external premises.

        4. Enter the IP address for the external server, your username, and password. The username and password are the same credentials used to log in to the UI for the external server. Select Next.

          dialog prompting the user to enter the external server's IP and credentials
        5. Select the premises you want to connect, then click Finish.

          dialog displaying two premises

        To remove a premises from the list displayed in the UI, hover over the deployment and select Delete.

        To view the premises you just added, return to the home workbench and select the Premises dropdown in the top-right corner of the screen. Alternatively, run the netq config show cli premises command.

        Full NetQ Deployment at Primary Premises and Smaller Deployments at Secondary Premises

        In this implementation, there is a NetQ appliance or VM at one of the deployments acting as the primary premises for the other deployments. The primary premises runs the NetQ software (including the NetQ UI and CLI) and houses the database. All other deployments are secondary premises; they run the NetQ cloud software and send their data to the primary premises for storage and processing. A list of these secondary premises is stored with the primary deployment.

        After the multiple premises are configured, you can view this list of premises in the NetQ UI at the primary premises, change the name of premises on the list, and delete premises from the list.

        In this deployment model, the data is stored and can be viewed only from the NetQ UI at the primary premises.

        The primary NetQ premises must be installed and operational before the secondary premises can be added.

        To create and add secondary premises:

        1. In the workbench header, select the Premises dropdown.

        2. Click Manage premises. Your primary premises (OPID0) is shown by default.

        3. Click Add premises.

        4. Enter the name of a secondary premises you’d like to add, then click Done.

        5. From the confirmation dialog, select View config key.

        6. Click the copy icon, then save the key to a safe place, or click e-mail to send it to yourself or others. Then click Confirm activation.
        dialog displaying configuration key with options to copy or share the key

        To view the premises you just added, return to the home workbench and select the Premises dropdown at the top-right corner of the screen. Alternatively, run the netq config show cli premises command.

        Rename a Premises

        To rename an existing premises:

        1. In the workbench header, select the Premises dropdown, then Manage premises.

        2. Select a premises to rename, then click Edit.

        3. Enter the new name for the premises, then click Done.

        4. (Optional) Reconfigure the NetQ CLI by generating new AuthKeys. You can skip this step only if you do not use the CLI: after you rename a premises, the CLI does not function until you complete it.

        Back Up and Restore NetQ

        The following sections describe how to back up and restore your NetQ data and VMs.

        These procedures only apply to on-premises deployments. Cloud deployments are backed up automatically.

        You must run backup and restore scripts with sudo privileges.

        Back Up Your NetQ Data

        NetQ stores its data in a Cassandra database. You perform backups by running scripts provided with the software and located in the /usr/sbin directory. When you run a backup, the script creates a single tar file in the /opt/backuprestore/ directory.

        To create a backup, refer to the following steps for your NetQ version.

        Back Up NetQ 4.4.1 or Earlier

        1. Retrieve the vm-backuprestore.sh script:

        a. On the NVIDIA Application Hub, log in to your account.

        b. Select NVIDIA Licensing Portal.

        c. Select Software Downloads from the menu.

        d. Click Product Family and select NetQ.

        e. Locate the NetQ Upgrade Backup Restore file and select Download.

        f. If prompted, agree to the license agreement and proceed with the download.

        2. Copy the vm-backuprestore.sh script to your NetQ server:
        username@hostname:~$ scp ./vm-backuprestore.sh cumulus@10.10.10.10:/home/cumulus/
        cumulus@10.10.10.10's password:
        vm-backuprestore.sh                                                                                       100%   15KB  54.0KB/s   00:00 
        
        3. Log in to your NetQ server and set the script to executable:
        cumulus@netq-appliance:~$ chmod +x /home/cumulus/vm-backuprestore.sh
        
        4. In the directory where you copied the vm-backuprestore.sh script, run:
        cumulus@netq-appliance:~$ sudo ./vm-backuprestore.sh --backup
        [sudo] password for cumulus:
        Mon Feb  6 12:37:18 2023 - Please find detailed logs at: /var/log/vm-backuprestore.log
        Mon Feb  6 12:37:18 2023 - Starting backup of data, the backup might take time based on the size of the data
        Mon Feb  6 12:37:19 2023 - Scaling static pods to replica 0
        Mon Feb  6 12:37:19 2023 - Scaling all pods to replica 0
        Mon Feb  6 12:37:28 2023 - Scaling all daemonsets to replica 0
        Mon Feb  6 12:37:29 2023 - Waiting for all pods to go down
        Mon Feb  6 12:37:29 2023 - All pods are down
        Mon Feb  6 12:37:29 2023 - Creating backup tar /opt/backuprestore/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
        Backup is successful, please scp it to the master node the below command:
              sudo scp /opt/backuprestore/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar cumulus@<ip_addr>:/home/cumulus
         
          Restore the backup file using the below command:
              ./vm-backuprestore.sh --restore --backupfile /opt/backuprestore/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
        cumulus@netq-appliance:~$
        
        5. Verify the backup file creation was successful:

          cumulus@netq-appliance:~$ cd /opt/backuprestore/
          cumulus@netq-appliance:/opt/backuprestore$ ls
          backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
          

        Back Up NetQ 4.5.0 or Later

        1. Run the backup script /usr/sbin/vm-backuprestore.sh:
        cumulus@netq-appliance:~$ sudo /usr/sbin/vm-backuprestore.sh --backup
        
        2. Verify the backup file creation was successful:

          cumulus@netq-appliance:~$ cd /opt/backuprestore/
          cumulus@netq-appliance:/opt/backuprestore$ ls
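Beyond listing the directory, you can confirm that the newest backup is a readable archive. The following is a sketch: latest_backup and backup_status are hypothetical names, and the backup-netq-*.tar pattern is inferred from the sample file name shown earlier rather than from a documented naming rule.

```shell
# Hypothetical helper: prints the most recently modified backup tar file in
# a directory ($1), or nothing when no file matches. Assumes file names do
# not contain newlines.
latest_backup() {
  ls -t "$1"/backup-netq-*.tar 2>/dev/null | head -n 1
}

# Prints "ok" when tar can list the archive ($1), "unreadable" otherwise.
backup_status() {
  if tar -tf "$1" >/dev/null 2>&1; then echo "ok"; else echo "unreadable"; fi
}

# Example: backup_status "$(latest_backup /opt/backuprestore)"
```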
          

        Restore Your NetQ Data

        Restore NetQ data with the backup file you created in the steps above. The restore option of the backup script copies the data from the backup file to the database, decompresses it, verifies the restoration, and starts all necessary services. You should not see any data loss as a result of a restore operation.

        Run the restore script, referencing the full path to the backup file.

        If you restore NetQ data to a server with an IP address that is different from the one used to back up the data, you must reconfigure the agents on each switch as a final step.

        cumulus@netq-appliance:~$ sudo vm-backuprestore.sh --restore --backupfile /home/cumulus/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
        Mon Feb  6 12:39:57 2023 - Please find detailed logs at: /var/log/vm-backuprestore.log
        Mon Feb  6 12:39:57 2023 - Starting restore of data
        Mon Feb  6 12:39:57 2023 - Extracting release file from backup tar
        Mon Feb  6 12:39:57 2023 - Cleaning the system
        Mon Feb  6 12:39:57 2023 - Restoring data from tarball /home/cumulus/backup-netq-standalone-onprem-4.4.0-2023-02-06_12_37_29_UTC.tar
        Data restored successfully
          Please follow the below instructions to bootstrap the cluster
          The config key restored is EhVuZXRxLWVuZHBvaW50LWdhdGVfYXkYsagDIix2OUJhMUpyekMwSHBBaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==, alternately the config key is available in file /tmp/config-key
         
          Pass the config key while bootstrapping:
          Example(standalone): netq install standalone full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
          Example(cluster):    netq install cluster full interface eth0 bundle /mnt/installables/NetQ-4.8.0.tgz config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
          Alternately you can setup config-key post bootstrap in case you missed to pass it during bootstrap
          Example(standalone): netq install standalone activate-job config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
          Example(cluster):    netq install cluster activate-job config-key EhVuZXRxLWVuZHBvaW50LWdhdGV3YXkYsagDIix2OUJhMUpyekMwSHBbaitUdTVDaTRvbVJDR3F6Qlo4VHhZRytjUUhLZGJRPQ==
          In case the IP of the restore machine is different from the backup machine, please reconfigure the agents using: https://docs.nvidia.com/networking-ethernet-software/cumulus-netq-44/Installation-Management/Install-NetQ/Install-NetQ-Agents/#configure-netq-agents-using-a-configuration-file
        cumulus@netq-appliance:~$
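The restore output above mentions that the config key is also saved to /tmp/config-key. As a sketch, the helper below reads the key from a file and prints the matching activate-job command from that output. It assumes the file contains only the key; activate_command is a hypothetical name, and the function prints the command without running it.

```shell
# Hypothetical helper: prints the activate-job command for a deployment type
# ($1: standalone or cluster) using the key stored in a file ($2).
activate_command() (
  key=$(tr -d '[:space:]' < "$2")   # drop any trailing newline or spaces
  case "$1" in
    standalone|cluster) echo "netq install $1 activate-job config-key $key" ;;
    *) echo "unknown deployment type: $1" >&2; exit 1 ;;
  esac
)

# Example: activate_command standalone /tmp/config-key
```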
        

        Post-installation Configurations

        This section describes the various integrations you can configure after installing NetQ.

        LDAP Authentication

        As an administrator, you can integrate the NetQ role-based access control (RBAC) with your lightweight directory access protocol (LDAP) server in on-premises deployments. NetQ maintains control over role-based permissions for the NetQ application. With the RBAC integration, your LDAP directory service (such as Microsoft Active Directory, Kerberos, OpenLDAP, or Red Hat Directory Service) handles account authentication. A copy of each account from LDAP is stored in the local NetQ database.

        Integrating with an LDAP server does not prevent you from configuring local accounts (stored and managed in the NetQ database) as well.

        Get Started

        LDAP integration requires information about how to connect to your LDAP server, the type of authentication you plan to use, bind credentials, and, optionally, search attributes.

        Provide Your LDAP Server Information

        To connect to your LDAP server, you need the URI and bind credentials. The URI identifies the location of the LDAP server. It comprises an FQDN (fully qualified domain name) or IP address, and the port of the LDAP server where the LDAP client can connect. For example: myldap.mycompany.com or 192.168.10.2. Typically you use port 389 for connection over TCP or UDP. In production environments, you deploy a secure connection with SSL. In this case, the port used is typically 636. Setting the Enable SSL toggle automatically sets the server port to 636.
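Before entering these values in NetQ, it can help to confirm that the server is reachable from a shell. The sketch below builds the URI from the host and the two typical ports described above; ldap_uri is a hypothetical helper name, and the commented ldapsearch call assumes the OpenLDAP client tools are installed.

```shell
# Hypothetical helper: prints an LDAP URI for a host ($1). Pass "ssl" as $2
# for a secure connection on port 636; otherwise port 389 is used.
ldap_uri() {
  if [ "${2:-}" = "ssl" ]; then
    echo "ldaps://$1:636"
  else
    echo "ldap://$1:389"
  fi
}

# Example connectivity test with an anonymous bind against the root entry:
#   ldapsearch -x -H "$(ldap_uri myldap.mycompany.com)" -b "" -s base
```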

        Specify Your Authentication Method

        There are two types of user authentication: anonymous and basic.

        If you are unfamiliar with the configuration of your LDAP server, contact your administrator to ensure you select the appropriate authentication method and credentials.

        Define User Attributes

        You need the following two attributes to define a user entry in a directory:

        Optionally, you can specify the first name, last name, and email address of the user.

        Set Search Attributes

        While optional, specifying search scope indicates where to start and how deep a given user can search within the directory. You specify the data to search for in the search query.

        Search scope options include:

        A typical search query for users could be {userIdAttribute}={userId}.
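To see what such a query looks like once the placeholders are filled in, consider the sketch below. It only illustrates the substitution; how NetQ performs it internally is not described here, and expand_query is a hypothetical name.

```shell
# Hypothetical helper: substitutes an attribute name ($2) and a user ID ($3)
# into a search query template ($1). Values containing "/" or "&" would need
# escaping before being passed to sed.
expand_query() {
  printf '%s\n' "$1" | sed -e "s/{userIdAttribute}/$2/" -e "s/{userId}/$3/"
}

# Example: expand_query '{userIdAttribute}={userId}' uid jdoe
```

With an attribute of uid and a placeholder user ID of jdoe, the template becomes uid=jdoe.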

        Create an LDAP Configuration

        You can configure one LDAP server per bind DN (distinguished name). After you configure LDAP, you can verify the connectivity and save the configuration.

        To create an LDAP configuration:

        1. Expand the Menu and select Management.

        2. Locate the LDAP Server Info card, and click Configure LDAP.

        3. Fill out the LDAP server configuration form according to your particular configuration.

        4. Click Save to complete the configuration, or click Cancel to discard the configuration.

        You cannot change the LDAP configuration after you save it. To change it, delete the current LDAP configuration and create a new one. Note that if you change the LDAP server configuration, all users created against that LDAP server remain in the NetQ database and continue to be visible, but are no longer usable. You must manually delete those users if you do not want to see them.

        Example LDAP Configurations

        This section lists a variety of example configurations. Scenarios 1-3 are based on using an OpenLDAP or similar authentication service. Scenario 4 is based on using the Active Directory service for authentication.

        Scenario 1: Base Configuration

        In this scenario, we are configuring the LDAP server with anonymous authentication, a user ID based on an email address, and a search scope of base.

        | Parameter | Value |
        | --- | --- |
        | Host Server URL | ldap1.mycompany.com |
        | Host Server Port | 389 |
        | Authentication | Anonymous |
        | Base DN | dc=mycompany,dc=com |
        | User ID | email |
        | Search Scope | Base |
        | Search Query | {userIdAttribute}={userId} |

        Scenario 2: Basic Authentication and Subset of Users

        In this scenario, we are configuring the LDAP server with basic authentication, accessible only to users in the network operators group, and with a limited search scope.

        | Parameter | Value |
        | --- | --- |
        | Host Server URL | ldap1.mycompany.com |
        | Host Server Port | 389 |
        | Authentication | Basic |
        | Admin Bind DN | uid=admin,ou=netops,dc=mycompany,dc=com |
        | Admin Bind Password | nqldap! |
        | Base DN | dc=mycompany,dc=com |
        | User ID | UID |
        | Search Scope | One Level |
        | Search Query | {userIdAttribute}={userId} |

        Scenario 3: Scenario 2 with Widest Search Capability

        In this scenario, we are configuring the LDAP server with basic authentication, accessible only to users in the network administrators group, and with an unlimited search scope.

        | Parameter | Value |
        | --- | --- |
        | Host Server URL | 192.168.10.2 |
        | Host Server Port | 389 |
        | Authentication | Basic |
        | Admin Bind DN | uid=admin,ou=netadmin,dc=mycompany,dc=com |
        | Admin Bind Password | 1dap*netq |
        | Base DN | dc=mycompany,dc=net |
        | User ID | UID |
        | Search Scope | Subtree |
        | Search Query | {userIdAttribute}={userId} |

        Scenario 4: Scenario 3 with Active Directory Service

        In this scenario, we are configuring the LDAP server with basic authentication, accessible only to users in the given Active Directory group, and with an unlimited search scope.

        | Parameter | Value |
        | --- | --- |
        | Host Server URL | 192.168.10.2 |
        | Host Server Port | 389 |
        | Authentication | Basic |
        | Admin Bind DN | cn=netq,ou=45,dc=mycompany,dc=com |
        | Admin Bind Password | nq&4mAd! |
        | Base DN | dc=mycompany,dc=net |
        | User ID | sAMAccountName |
        | Search Scope | Subtree |
        | Search Query | {userIdAttribute}={userId} |
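You can try the values from a scenario against your directory with the OpenLDAP ldapsearch client before saving them in NetQ. The sketch below assumes that the Base, One Level, and Subtree labels correspond to the standard LDAP scopes that ldapsearch names base, one, and sub; ldapsearch_scope is a hypothetical helper name, and jdoe is a placeholder user ID.

```shell
# Hypothetical helper: maps a search scope label ($1) to the value the
# OpenLDAP ldapsearch client accepts for its -s option.
ldapsearch_scope() {
  case "$1" in
    "Base")      echo "base" ;;
    "One Level") echo "one" ;;
    "Subtree")   echo "sub" ;;
    *) echo "unknown scope: $1" >&2; return 1 ;;
  esac
}

# Example: try the Scenario 2 values for a placeholder user ID "jdoe".
# The -W option prompts for the bind password.
#   ldapsearch -x -H ldap://ldap1.mycompany.com:389 \
#     -D "uid=admin,ou=netops,dc=mycompany,dc=com" -W \
#     -b "dc=mycompany,dc=com" -s "$(ldapsearch_scope 'One Level')" "(uid=jdoe)"
```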

        Add LDAP Users to NetQ

        1. Click Menu and select Management.

        2. Locate the User Accounts card, and click Manage.

        3. From the User accounts tab, select Add user above the table.

        4. Select LDAP User, then enter the user’s ID.

        5. Enter your administrator password, then select Search.

        6. If the user is found, the email address, first name, and last name fields are automatically populated. If searching is not enabled on the LDAP server, you must enter the information manually.

          If the fields are not automatically filled in, and searching is enabled on the LDAP server, you might need to edit the mapping file.

          LDAP user passwords are not stored in the NetQ database and are always authenticated against LDAP.

        7. Repeat these steps to add additional LDAP users.

        Remove LDAP Users from NetQ

        You can remove LDAP users in the same manner as local users.

        1. Expand the Menu and select Management.

        2. Locate the User Accounts card, and click Manage.

        3. Select the user(s) you want to remove, then select Delete.

        If you delete an LDAP user in LDAP, the user is not automatically deleted from NetQ; however, the login credentials for that user stop working immediately.

        Integrate NetQ with Grafana

        Switches collect statistics about the performance of their interfaces. The NetQ Agent on each switch collects these statistics every 15 seconds and then sends them to your NetQ appliance or virtual machine.

        NetQ collects statistics for physical interfaces; it does not collect statistics for virtual interfaces, such as bonds, bridges, and VXLANs.

        NetQ displays:

        You can use Grafana, an open source analytics and monitoring tool, to view these statistics. The fastest way to achieve this is by installing Grafana on an application server or locally per user, and then installing the NetQ plugin.

        If you do not have Grafana installed already, refer to grafana.com for instructions on installing and configuring the Grafana tool.

        Install NetQ Plugin for Grafana

        Use the Grafana CLI to install the NetQ plugin. For more detail about this command, refer to the Grafana CLI documentation.

        The Grafana plugin comes unsigned. Before you can install it, you need to update the grafana.ini file then restart the Grafana service:

        1. Edit the /etc/grafana/grafana.ini file and add allow_loading_unsigned_plugins = netq-dashboard under plugins:

          cumulus@netq-appliance:~$ sudo nano /etc/grafana/grafana.ini
          ...
          allow_loading_unsigned_plugins = netq-dashboard
          ...
          
        2. If you are using Grafana v11.0 or later, add support for AngularJS to the same file under security:

          cumulus@netq-appliance:~$ sudo nano /etc/grafana/grafana.ini
          ...
          angular_support_enabled = true
          ...
          
        3. Restart the Grafana service:

          cumulus@netq-appliance:~$ sudo systemctl restart grafana-server.service
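
As a rough sketch of the two grafana.ini changes described in the steps above, the following Python snippet applies them to a sample configuration string using the standard configparser module. The sample file content and the function name are assumptions for illustration. Note that configparser discards comments when it writes the file back, so treat this as a way to see which section each setting belongs to rather than as a tool for editing a production grafana.ini.

```python
import configparser
import io

# Hypothetical, heavily trimmed stand-in for /etc/grafana/grafana.ini.
SAMPLE_INI = """\
[plugins]
;enable_alpha = false

[security]
;disable_gravatar = false
"""


def apply_netq_settings(ini_text: str, grafana_v11_or_later: bool) -> str:
    """Return ini_text with the settings needed to load the unsigned NetQ plugin."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for section in ("plugins", "security"):
        if not parser.has_section(section):
            parser.add_section(section)
    # Step 1: allow the unsigned plugin, under [plugins].
    parser.set("plugins", "allow_loading_unsigned_plugins", "netq-dashboard")
    # Step 2: AngularJS support, under [security], only for Grafana v11.0 or later.
    if grafana_v11_or_later:
        parser.set("security", "angular_support_enabled", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(apply_netq_settings(SAMPLE_INI, grafana_v11_or_later=True))
```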
          

        Then install the plugin:

        cumulus@netq-appliance:~$ grafana-cli --pluginUrl https://netq-grafana-dsrc.s3-us-west-2.amazonaws.com/NetQ-DSplugin-3.3.1-plus.zip plugins install netq-dashboard
        installing netq-dashboard @
        from: https://netq-grafana-dsrc.s3-us-west-2.amazonaws.com/NetQ-DSplugin-3.3.1-plus.zip
        into: /usr/local/var/lib/grafana/plugins
        
        ✔ Installed netq-dashboard successfully
        

        After installing the plugin, you must restart Grafana, following the steps specific to your implementation.

        Set Up the NetQ Data Source

        Now that you have the plugin installed, you need to configure access to the NetQ data source.

        1. Open the Grafana user interface and log in. Navigate to the Home Dashboard:

          Grafana Home Dashboard
        2. Click Add data source or > Data Sources.

        3. Enter Net-Q in the search box. Alternatively, scroll down to the Other category, and select it from there.

        4. Enter Net-Q into the Name field.

        5. Enter the URL used to access the database:

        6. From the Module dropdown, select procdevstats.

        7. Enter your credentials (the ones used to log in).

        8. For NetQ cloud deployments only, if you have more than one premises configured, you can select the premises you want to view, as follows:

          • If you leave the Premises field blank, the first premises name is selected by default.
          • If you enter a premises name, that premises is selected for viewing.
          • If multiple premises are configured with the same name, then the first listed premises is displayed.
        9. Select Save & Test.
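
The premises selection rules for cloud deployments can be sketched as a small lookup. This is only an illustration of the three rules listed above: the Premises class, its ident field, and the sample names are hypothetical, and the behavior when a name matches nothing is an assumption because the guide does not describe it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Premises:
    ident: int  # hypothetical identifier, used only to tell duplicates apart
    name: str


def select_premises(configured: Sequence[Premises], requested: Optional[str]) -> Premises:
    """Pick the premises to display, following the rules listed above."""
    if not configured:
        raise ValueError("no premises configured")
    if not requested:
        # Blank Premises field: the first premises is selected by default.
        return configured[0]
    for premises in configured:
        # With duplicate names, the first listed match wins.
        if premises.name == requested:
            return premises
    # Assumption: the guide does not say what happens when nothing matches.
    raise LookupError(f"no premises named {requested!r}")


sites = [Premises(1, "dc-east"), Premises(2, "dc-west"), Premises(3, "dc-west")]
print(select_premises(sites, ""))
print(select_premises(sites, "dc-west"))
```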

        Create Your NetQ Dashboard

        After you configure the data source, you can create a customizable dashboard with transmit and receive statistics.

        Create a Dashboard

        1. Click to open a blank dashboard.

        2. Click (Dashboard Settings) at the top of the dashboard.

        Add Variables

        1. Click Variables.

        2. In the Name field, enter hostname.

        3. In the Label field, enter hostname.

        4. From the Data source list, select Net-Q.

        5. From the Refresh list, select On Dashboard Load.

        6. In the Query field, enter hostname.

        7. Click Add.

          You should see a preview of the hostname values at the bottom.

        8. Click Variables to add another variable for the interface name.

        9. In the Name field, enter ifname.

        10. In the Label field, enter ifname.

        11. From the Data source list, select Net-Q.

        12. From the Refresh list, select On Dashboard Load.

        13. In the Query field, enter ifname.

        14. Click Add.

          You should see a preview of the ifname values at the bottom.

        15. Click Variables to add a variable for metrics.

        16. In the Name field, enter metrics.

        17. In the Label field, enter metrics.

        18. From the Data source list, select Net-Q.

        19. From the Refresh list, select On Dashboard Load.

        20. In the Query field, enter metrics.

        21. Click Add.

          You should see a preview of the metrics values at the bottom.

        Add Charts

        1. Now that the variables are defined, click to return to the new dashboard.

        2. Click Add Query.

        3. From the Query source list, select Net-Q.

        4. Select the interface statistic you want to view from the Metric list.

        5. Click the General icon.

        6. From the Repeat list, select hostname.

        7. Set any other parameters that control how the data is displayed.

        8. Return to the dashboard.

        9. Select one or more hostnames from the hostname list.

        10. Select one or more interface names from the ifname list.

        11. Select one or more metrics to display for these hostnames and interfaces from the metrics list.

        The following example shows a dashboard with two hostnames, two interfaces, and one metric selected. The more values you select from the variable options, the more charts appear on your dashboard.

        Grafana dashboard displaying metrics

        Analyze the Data

        After you have configured the dashboard, you can start analyzing the data. You can explore the data by modifying the viewing parameters in one of several ways using the dashboard tool set:

        SSO Authentication

        You can integrate your NetQ Cloud deployment with a Microsoft Azure Active Directory (AD) or Google Cloud authentication server to support single sign-on (SSO) to NetQ. NetQ supports integration with SAML (Security Assertion Markup Language), OAuth (Open Authorization), and multi-factor authentication (MFA). You can configure only one SSO integration at a time.

        You can create local accounts with default access roles by enabling SSO. After enabling SSO, users logging in for the first time can sign up for SSO through the NetQ login screen or with a link provided by an admin.

        Add SSO Configuration and Accounts

        To integrate your authentication server:

        1. Expand the Menu and select Management.

        2. Locate the SSO Configuration card and select Manage.

        3. Select either SAML or OpenID (which uses OAuth with OpenID Connect).

        4. Specify the parameters:

          For an OpenID configuration, you need several pieces of data from your Microsoft Azure or Google account and authentication server to complete the integration.

          sso configuration card with open id configuration

          SSO Organization is typically a company’s name or a department. The name entered in this field will appear in the SSO signup URL.

          Role (either user or admin) is automatically assigned when the account is initialized via SSO login.

          Name is a unique name for the SSO configuration.

          Client ID is the identifier for your resource server.

          Client Secret is the secret key for your resource server.

          Authorization Endpoint is the URL of the authorization application.

          Token Endpoint is the URL of the authorization token.

          After you enter the fields, select Add.

          As indicated, copy the redirect URI (https://api.netq.nvidia.com/netq/auth/v1/sso-callback) into your OpenID Connect configuration.

          Select Test to verify the configuration and ensure that you can log in. If it is not working, you are logged out. Check your specification and retest the configuration until it is working properly.

          Select Close. The card reflects the configuration:

          sso config card displaying an Open ID configuration with a disabled status

          To require users to log in using this SSO configuration, select Change under the “Disabled” status and confirm. The card updates to reflect that SSO is enabled.

          After an admin has configured and enabled SSO, users logging in for the first time can sign up for SSO.

          Admins can also provide users with an SSO signup URL: https://netq.nvidia.com/signup?organization=SSO_Organization

          The SSO organization you entered during the configuration will replace SSO_Organization in the URL.

          For a SAML configuration, you need several pieces of data from your Microsoft Azure or Google account and authentication server to complete the integration.

          sso configuration card with SAML configuration

          SSO Organization is typically a company’s name or a department. The name entered in this field will appear in the SSO signup URL.

          Role (either user or admin) is automatically assigned when the account is initialized via SSO login.

          Name is a unique name for the SSO configuration.

          Login URL is the URL for the authorization server login page.

          Identity Provider Identifier is the name of the authorization server.

          Service Provider Identifier is the name of the application server.

          Email Claim Key is an optional field. When left blank, the email address is captured.

          After you enter the fields, select Add.

          As indicated, copy the redirect URI (https://api.netq.nvidia.com/netq/auth/v1/sso-callback) into your identity provider's SAML configuration.

          Select Test to verify the configuration and ensure that you can log in. If it is not working, you are logged out. Check your specification and retest the configuration until it is working properly.

          Select Close. The card reflects the configuration:

          sso config card displaying a SAML configuration with a disabled status

          To require users to log in using this SSO configuration, select Change under the “Disabled” status and confirm. The card updates to reflect that SSO is enabled.

          Select Submit to enable the configuration. The SSO card reflects the “enabled” status.

          After an admin has configured and enabled SSO, users logging in for the first time can sign up for SSO.

          Admins can also provide users with an SSO signup URL: https://netq.nvidia.com/signup?organization=SSO_Organization

          The SSO organization you entered during the configuration will replace SSO_Organization in the URL.
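
As a small illustration of how the signup URL is formed, the sketch below substitutes an SSO organization name into the URL pattern shown above. The function name and the sample organization are hypothetical, and URL-encoding the organization name (which matters for names containing spaces or punctuation) is an assumption; the guide only states that the organization replaces SSO_Organization.

```python
from urllib.parse import urlencode

SIGNUP_BASE = "https://netq.nvidia.com/signup"


def build_signup_url(sso_organization: str) -> str:
    """Return the SSO signup URL for the given SSO Organization value."""
    if not sso_organization:
        raise ValueError("SSO organization must not be empty")
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{SIGNUP_BASE}?{urlencode({'organization': sso_organization})}"


print(build_signup_url("mycompany"))
```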

        Modify Configuration

        You can change the specifications for SSO integration with your authentication server at any time, including changing to an alternate SSO type, disabling the existing configuration, or reconfiguring SSO.

        Change SSO Type

        From the SSO Configuration card:

        1. Select Disable, then Yes.

        2. Select Manage, then select the desired SSO type and complete the form.

        3. Copy the redirect URL on the success dialog into your identity provider configuration.

        4. Select Test to verify that the login is working. Modify your specification and retest the configuration until it is working properly.

        5. Select Update.

        Disable SSO Configuration

        From the SSO Configuration card:

        1. Select Disable.

        2. Select Yes to disable the configuration, or Cancel to keep it enabled.

        Uninstall NetQ

        This page outlines how to remove the NetQ software from your system server and switches.

        Remove the NetQ Agent and CLI

        Use the apt-get purge command to remove the NetQ Agent or CLI package from a Cumulus Linux switch or an Ubuntu host:

        cumulus@switch:~$ sudo apt-get update
        cumulus@switch:~$ sudo apt-get purge netq-agent netq-apps
        Reading package lists... Done
        Building dependency tree
        Reading state information... Done
        The following packages will be REMOVED:
          netq-agent* netq-apps*
        0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
        After this operation, 310 MB disk space will be freed.
        Do you want to continue? [Y/n] Y
        Creating pre-apt snapshot... 2 done.
        (Reading database ... 42026 files and directories currently installed.)
        Removing netq-agent (3.0.0-cl3u27~1587646213.c5bc079) ...
        /usr/sbin/policy-rc.d returned 101, not running 'stop netq-agent.service'
        Purging configuration files for netq-agent (3.0.0-cl3u27~1587646213.c5bc079) ...
        dpkg: warning: while removing netq-agent, directory '/etc/netq/config.d' not empty so not removed
        Removing netq-apps (3.0.0-cl3u27~1587646213.c5bc079) ...
        /usr/sbin/policy-rc.d returned 101, not running 'stop netqd.service'
        Purging configuration files for netq-apps (3.0.0-cl3u27~1587646213.c5bc079) ...
        dpkg: warning: while removing netq-apps, directory '/etc/netq' not empty so not removed
        Processing triggers for man-db (2.7.0.2-5) ...
        grep: extra.services.enabled: No such file or directory
        Creating post-apt snapshot... 3 done.
        

        If you only want to remove the agent or the CLI, but not both, specify just the relevant package in the apt-get purge command.

        To verify the removal of the packages from the switch, run:

        cumulus@switch:~$ dpkg-query -l netq-agent
        dpkg-query: no packages found matching netq-agent
        cumulus@switch:~$ dpkg-query -l netq-apps
        dpkg-query: no packages found matching netq-apps
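
If you need to run this check on several switches, the verification can be scripted. The following is a minimal sketch that relies on dpkg-query exiting with a non-zero status when no matching package is found; the function name and the injectable run parameter (used so the logic can be exercised without dpkg installed) are assumptions for illustration. A package that was removed but not purged can still be listed by dpkg-query, so this check is only meaningful after apt-get purge.

```python
import subprocess
from typing import Callable, Iterable


def removed_packages(
    packages: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[str, bool]:
    """Map each package name to True if dpkg-query no longer finds it."""
    status = {}
    for package in packages:
        result = run(["dpkg-query", "-l", package], capture_output=True, text=True)
        # dpkg-query exits with status 1 when no package matches the name.
        status[package] = result.returncode != 0
    return status


# Example call on a Debian-based system:
#   removed_packages(["netq-agent", "netq-apps"])
```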
        

        Use the yum remove command to remove the NetQ agent or CLI package from a RHEL7 or CentOS host:

        root@rhel7:~# sudo yum remove netq-agent netq-apps
        Loaded plugins: fastestmirror
        Resolving Dependencies
        --> Running transaction check
        ---> Package netq-agent.x86_64 0:3.1.0-rh7u28~1594097110.8f00ba1 will be erased
        --> Processing Dependency: netq-agent >= 3.2.0 for package: cumulus-netq-3.1.0-rh7u28~1594097110.8f00ba1.x86_64
        --> Running transaction check
        ---> Package cumulus-netq.x86_64 0:3.1.0-rh7u28~1594097110.8f00ba1 will be erased
        --> Finished Dependency Resolution
        
        Dependencies Resolved
        
        ...
        
        Removed:
          netq-agent.x86_64 0:3.1.0-rh7u28~1594097110.8f00ba1
        
        Dependency Removed:
          cumulus-netq.x86_64 0:3.1.0-rh7u28~1594097110.8f00ba1
        
        Complete!
        
        

        If you only want to remove the agent or the CLI, but not both, specify just the relevant package in the yum remove command.

        To verify the removal of the packages from the host, run:

        root@rhel7:~# rpm -q netq-agent
        package netq-agent is not installed
        root@rhel7:~# rpm -q netq-apps
        package netq-apps is not installed
        

        Uninstall NetQ from the System Server

        First remove the data collected to free up used disk space. Then remove the software.

        1. Log in to the NetQ system server.

        2. Remove the data:

        netq bootstrap reset purge-db
        
        3. Remove the software with apt-get purge:
        cumulus@switch:~$ sudo apt-get update
        cumulus@switch:~$ sudo apt-get purge netq-agent netq-apps
        
        4. Verify the removal of the packages from the server:
        cumulus@switch:~$ dpkg-query -l netq-agent
        dpkg-query: no packages found matching netq-agent
        cumulus@switch:~$ dpkg-query -l netq-apps
        dpkg-query: no packages found matching netq-apps
        
        5. Delete the virtual machine according to the usual VMware or KVM practice.

        For VMware, delete the virtual machine from the host computer using one of the following methods:

        • Right-click the name of the virtual machine in the Favorites list, then select Delete from Disk.
        • Select the virtual machine and choose VM > Delete from disk.

        For KVM, delete the virtual machine from the host computer using one of the following methods:

        • Run virsh undefine <vm-domain> --remove-all-storage
        • Run virsh undefine <vm-domain> --wipe-storage

        Manage Users

        As an admin, you can manage users and authentication settings from the NetQ management dashboard.

        Switch Management

        Lifecycle management displays an inventory of switches that are available for software installation or upgrade through NetQ. From the inventory list, you can assign access profiles and roles to switches, and select switches for software installation and upgrades. You can also decommission switches, which removes them from the NetQ database.

        If you manage a switch using an in-band network interface, additional configurations are required for LCM operations.

        View the LCM Switch Inventory

        From the LCM dashboard, select the Switch management tab. The Switches card displays the number of switches that NetQ discovered and the network OS versions that are running on those switches:

        switches card displaying 15 discovered switches with Cumulus Linux version 4.4.4

        To view a table of all discovered switches and their attributes, select Manage on the Switches card.

        If you have more than one network OS version running on your switches, you can click a version segment on the Switches card graph to open a list of switches pre-filtered by that version.

        To view a list of all switches discovered by lifecycle management, run:

        netq lcm show switches
            [cl-version <text-cumulus-linux-version>]
            [netq-version <text-netq-version>]
            [json]
        

        Use the version options to display switches with a given OS version. For additional details, refer to the command line reference.

        This list is the starting point for network OS upgrades or NetQ installations and upgrades. If the switches you want to upgrade are not present in the list, you can:

        Switch Discovery

        A switch discovery searches your network for all Cumulus Linux switches (with and without NetQ currently installed) and determines the versions of Cumulus Linux and NetQ installed. These results can be used to install or upgrade Cumulus Linux and NetQ on all discovered switches in a single procedure.

        To discover switches running Cumulus Linux:

        1. Click Devices in the workbench header, then click Manage switches.

        2. On the Switches card, click Discover.

        3. Enter a name for the scan.

        4. Choose whether you want to look for switches by entering IP address ranges or import switches using a comma-separated values (CSV) file.

        If you do not have a switch listing, then you can manually add the address ranges where your switches are located in the network. This has the advantage of catching switches that might have been missed in a file.

        A maximum of 50 addresses can be included in an address range. If necessary, break the range into smaller ranges.

        To discover switches using address ranges:

        1. Enter an IP address range in the IP Range field.

          Ranges can be contiguous, for example 192.168.0.24-64, or non-contiguous, for example 192.168.0.24-64,128-190,235, but they must be contained within a single subnet.

        2. Optionally, enter another IP address range (in a different subnet) by clicking .

          For example, 198.51.100.0-128 or 198.51.100.0-128,190,200-253.

        3. Add additional ranges as needed. Click to remove a range.

        If you decide to use a CSV file instead, the ranges you entered will remain if you return to using IP ranges again.
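
To check what a range covers before starting a scan, you can expand it offline. The sketch below assumes the range syntax works the way the examples above suggest: the first element carries the full dotted address, and every later element is a last-octet value or last-octet span within the same subnet. The function name is hypothetical, and the sketch only expands and counts addresses; it does not enforce the 50-address limit mentioned above.

```python
import ipaddress


def expand_ip_range(spec: str) -> list[str]:
    """Expand a spec such as '192.168.0.24-64,128-190,235' into IPv4 addresses."""
    first, *rest = [part.strip() for part in spec.split(",")]
    # The first element holds the network prefix, for example '192.168.0'.
    prefix, _, first_octets = first.rpartition(".")
    addresses = []
    for part in [first_octets, *rest]:
        start, _, end = part.partition("-")
        low = int(start)
        high = int(end) if end else low
        if not 0 <= low <= high <= 255:
            raise ValueError(f"invalid last-octet range: {part!r}")
        for octet in range(low, high + 1):
            # IPv4Address validates the reconstructed dotted address.
            addresses.append(str(ipaddress.IPv4Address(f"{prefix}.{octet}")))
    return addresses


hosts = expand_ip_range("192.168.0.24-64")
print(len(hosts), hosts[0], hosts[-1])
```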

        To import switches through a CSV file:

        1. Click Browse.

        2. Select the CSV file containing the list of switches.

          The CSV file must include a header containing hostname, ip, and port. They can be in any order you like, but the data must match that order. For example, a CSV file that represents the Cumulus reference topology could look like this:

        or this:

        You must have an IP address in your file, but the hostname is optional. If the port is blank, NetQ uses switch port 22 by default.

        Click Remove if you decide to use a different file or want to use IP address ranges instead. If you entered ranges before selecting the CSV file option, they remain.
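
The CSV rules above (a header naming hostname, ip, and port in any order, a mandatory IP address, an optional hostname, and port 22 when the port is blank) can be checked before you upload a file. The following sketch is one way to do that with the standard csv module. The function name is hypothetical, and the sample rows are illustrative; port 2222 in particular is made up.

```python
import csv
import io

DEFAULT_SSH_PORT = 22  # used when the port column is blank, per the text above
REQUIRED_COLUMNS = {"hostname", "ip", "port"}


def load_switch_csv(text: str) -> list[dict]:
    """Parse a switch list, applying the defaults described above."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing header column(s): {sorted(missing)}")
    switches = []
    for row in reader:
        ip = (row["ip"] or "").strip()
        if not ip:
            raise ValueError(f"line {reader.line_num}: an IP address is required")
        port = (row["port"] or "").strip()
        switches.append({
            "hostname": (row["hostname"] or "").strip(),  # optional
            "ip": ip,
            "port": int(port) if port else DEFAULT_SSH_PORT,
        })
    return switches


SAMPLE = """ip,hostname,port
192.168.200.11,leaf01,
192.168.200.21,,2222
"""
for switch in load_switch_csv(SAMPLE):
    print(switch)
```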

        5. Select an access profile from the dropdown menu. If you use Netq-Default, you will see a message requesting that you create or update your credentials.

        6. Click Next.

          When the network discovery is complete, NetQ presents the number of Cumulus Linux switches it found. Each switch can be in one of the following categories:

          • Discovered without NetQ: Switches found without NetQ installed
          • Discovered with NetQ: Switches found with some version of NetQ installed
          • Discovered but Rotten: Switches found that are unreachable
          • Incorrect Credentials: Switches found that are unreachable because the provided access credentials do not match those for the switches
          • OS not Supported: Switches found that are running a Cumulus Linux version not supported by LCM upgrades
          • Not Discovered: IP addresses which did not have an associated Cumulus Linux switch

          If the discovery process does not find any switches for a particular category, then it does not display that category.

        Use the netq lcm discover command, specifying a single IP address, a range of IP addresses where your switches are located in the network, or a CSV file containing the IP addresses.

        You must also specify the access profile ID, which you can obtain with the netq lcm show credentials command.

           cumulus@switch:~$ netq lcm discover ip-range 10.0.1.12 profile_id credential_profile_3eddab251bddea9653df7cd1be0fc123c5d7a42f818b68134e42858e54a9c289
           NetQ Discovery Started with job id: job_scan_4f3873b0-5526-11eb-97a2-5b3ed2e556db
        

        When the network discovery is complete, NetQ presents the number of Cumulus Linux switches it has found. The output displays their discovery status, which can be one of the following:

        • Discovered without NetQ: Switches found without NetQ installed
        • Discovered with NetQ: Switches found with some version of NetQ installed
        • Discovered but Rotten: Switches found that are unreachable
        • Incorrect Credentials: Switches found that are unreachable because the provided access credentials do not match those for the switches
        • OS not Supported: Switches found that are running a Cumulus Linux version not supported by the LCM upgrade feature
        • NOT_FOUND: IP addresses which did not have an associated Cumulus Linux switch

        Note that if you previously ran a switch discovery, you can display its results with netq lcm show discovery-job:

        cumulus@switch:~$ netq lcm show discovery-job job_scan_921f0a40-5440-11eb-97a2-5b3ed2e556db
        Scan COMPLETED
        
        Summary
        -------
        Start Time: 2021-01-11 19:09:47.441000
        End Time: 2021-01-11 19:09:59.890000
        Total IPs: 1
        Completed IPs: 1
        Discovered without NetQ: 0
        Discovered with NetQ: 0
        Incorrect Credentials: 0
        OS Not Supported: 0
        Not Discovered: 1
        
        
        Hostname          IP Address                MAC Address        CPU      CL Version  NetQ Version  Config Profile               Discovery Status Upgrade Status
        ----------------- ------------------------- ------------------ -------- ----------- ------------- ---------------------------- ---------------- --------------
        N/A               10.0.1.12                 N/A                N/A      N/A         N/A           []                           NOT_FOUND        NOT_UPGRADING
        cumulus@switch:~$ 
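
If you want to act on the discovery results in a script, the counters in the Summary block are straightforward to extract. The sketch below parses text shaped like the example output above into a dictionary. The function name is hypothetical, the sample is an excerpt of that output, and the parsing relies on the layout shown here, which could differ between releases.

```python
SAMPLE = """\
Scan COMPLETED

Summary
-------
Start Time: 2021-01-11 19:09:47.441000
End Time: 2021-01-11 19:09:59.890000
Total IPs: 1
Completed IPs: 1
Discovered without NetQ: 0
Discovered with NetQ: 0
Incorrect Credentials: 0
OS Not Supported: 0
Not Discovered: 1
"""


def parse_discovery_summary(output: str) -> dict[str, int]:
    """Collect the 'Label: <integer>' counters from a discovery-job summary."""
    counts = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        # Timestamps and table rows are skipped because their value is not a plain integer.
        if sep and value.isdigit():
            counts[label.strip()] = int(value)
    return counts


summary = parse_discovery_summary(SAMPLE)
print(summary["Total IPs"], summary["Not Discovered"])
```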
        

        Attach an Access Profile to a Switch

        After creating access profiles from your credentials, you can attach a profile to one or more switches.

        1. Expand the Menu and select Manage switches. On the Switches card, select Manage.

        2. The table displays a list of switches. The Access type column specifies whether the type of authentication is basic or SSH. The Profile name column displays the access profile that is assigned to the switch.

        Select the switches to which you want to assign access profiles, then select Manage access profile above the table:

        3. Select the profile from the list, then click Done.

        If the profile you want to use isn’t listed, select Add new profile and follow the steps to create an access profile.

        4. Select Ok on the confirmation dialog. The updated access profiles are now reflected in the Profile name column.

        The command syntax to attach a profile to a switch is:

        netq lcm attach credentials 
            profile_id <text-switch-profile-id> 
            hostnames <text-switch-hostnames>
        
        1. Run netq lcm show credentials to display a list of access profiles. Note the profile ID that you’d like to assign to a switch.

        2. Run netq lcm show switches to display a list of switches. Note the hostname of the switch(es) you’d like to attach a profile to.

        3. Next, attach the credentials to the switch:

        netq lcm attach credentials profile_id credential_profile_3eddab251bddea9653df7cd1be0fc123c5d7a42f818b68134e42858e54a9c289 hostnames tor-1,tor-2
        Attached profile to switch(es).
        
        4. Run netq lcm show switches and verify the change in the credential profile column.

        Reassign or Detach an Access Profile

        Detaching a profile from a switch restores it to the default access profile, Netq-Default.

        1. On the Switches card, click Manage.

        2. The table displays a list of switches. In the profile name column, locate the access profile. Hover over the access type column and select Manage access:

        3. To assign a different access profile to the switch, select it from the list. To detach the access profile, select Detach.

        After you detach the profile from the switch, NetQ reassigns the switch to the Netq-Default profile.

        The syntax for the detach command is netq lcm detach credentials hostname <text-switch-hostname>.

        1. To obtain a list of hostnames, run netq lcm show switches.

        2. Detach the access profile and specify the hostname. The following example detaches spine-1 from its assigned access profile:

        cumulus@switch:~$ netq lcm detach credentials hostname spine-1
        Detached profile from switch.
        
        3. Run netq lcm show switches and verify the change in the credential profile column.

        Role Management

        You can assign switches one of four roles: superspine, spine, leaf, and exit.

        Switch roles identify switch dependencies and determine the order in which switches are upgraded. The upgrade process begins with switches assigned the superspine role, then continues with the spine switches, leaf switches, exit switches, and finally, switches with no role assigned. Upgrades for all switches with a given role must be successful before the upgrade proceeds to the switches with the closest dependent role.

        Role assignment is optional, but recommended. Assigning roles can prevent switches from becoming unreachable due to dependencies between switches or single attachments. Additionally, when you deploy MLAG pairs, assigned roles avoid upgrade conflicts.
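
The role-based upgrade order described above can be sketched as a simple grouping: superspine first, then spine, leaf, exit, and finally switches with no role. The function name and the use of None for an unassigned role are assumptions for illustration; the hostnames come from the example inventory later in this section. The sketch only models the ordering, not the rule that each batch must succeed before the next one starts.

```python
from typing import Optional

# Order stated above; None stands for a switch with no role assigned.
ROLE_ORDER = ["superspine", "spine", "leaf", "exit", None]


def upgrade_batches(switch_roles: dict[str, Optional[str]]) -> list[list[str]]:
    """Group hostnames into batches in the order their roles are upgraded."""
    unknown = {role for role in switch_roles.values() if role not in ROLE_ORDER}
    if unknown:
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    batches = []
    for role in ROLE_ORDER:
        members = sorted(host for host, r in switch_roles.items() if r == role)
        if members:
            batches.append(members)
    return batches


roles = {"leaf01": "leaf", "spine01": "spine", "border01": None, "leaf02": "leaf"}
for batch in upgrade_batches(roles):
    print(batch)
```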

        Assign Roles to Switches

        1. On the Switches card, click Manage.

        2. Select one switch or multiple switches to assign to the same role.

        3. Above the table, select Assign role.

        4. Select the role (superspine, leaf, spine, or exit) that applies to the selected switch(es).

        5. Click Assign.

          Note that the Role column is updated with the role assigned to the selected switch(es). To return to the full list of switches, click All.

        table displaying role column with updated role assignments
        6. Continue selecting switches and assigning roles until most or all switches have roles assigned.

        To add a role to one or more switches, run:

        netq lcm add role (superspine | spine | leaf | exit) switches <text-switch-hostnames>
        

        For a single switch, run:

        netq lcm add role leaf switches leaf01
        

        To assign multiple switches to the same role, separate the hostnames with commas (no spaces). This example configures leaf01 through leaf04 switches with the leaf role:

        netq lcm add role leaf switches leaf01,leaf02,leaf03,leaf04
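
When you assign roles to many switches, it can help to generate the command from a list of hostnames so that the comma-separated argument contains no stray spaces. The following sketch builds the argument list for the command syntax shown above. The function name is hypothetical, and the sketch only constructs the command; it does not run it.

```python
VALID_ROLES = ("superspine", "spine", "leaf", "exit")


def add_role_command(role: str, hostnames: list[str]) -> list[str]:
    """Build the 'netq lcm add role' argument list for the given switches."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}")
    cleaned = [name.strip() for name in hostnames]
    if not cleaned or any(not name or "," in name or " " in name for name in cleaned):
        raise ValueError("hostnames must be non-empty and contain no commas or spaces")
    # Hostnames are joined with commas and no spaces, as the syntax requires.
    return ["netq", "lcm", "add", "role", role, "switches", ",".join(cleaned)]


print(" ".join(add_role_command("leaf", ["leaf01", "leaf02", "leaf03", "leaf04"])))
```

The printed line matches the multi-switch example above.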
        

        To view all switch roles, run:

        netq lcm show switches [cl-version <text-cumulus-linux-version>] [netq-version <text-netq-version>] [json]
        

        Use the version options to only show switches with a given network OS or NetQ version.

        The Role column displays assigned roles:

        cumulus@switch:~$ netq lcm show switches
        Hostname          Role       IP Address                MAC Address        CPU      CL Version           NetQ Version             Last Changed
        ----------------- ---------- ------------------------- ------------------ -------- -------------------- ------------------------ -------------------------
        leaf01            leaf       192.168.200.11            44:38:39:00:01:7A  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Wed Sep 30 21:55:37 2020
                                                                                                                104fb9ed
        spine04           spine      192.168.200.24            44:38:39:00:01:6C  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Tue Sep 29 21:25:16 2020
                                                                                                                104fb9ed
        leaf03            leaf       192.168.200.13            44:38:39:00:01:84  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Wed Sep 30 21:55:56 2020
                                                                                                                104fb9ed
        leaf04            leaf       192.168.200.14            44:38:39:00:01:8A  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Wed Sep 30 21:55:07 2020
                                                                                                                104fb9ed
        border02                     192.168.200.64            44:38:39:00:01:7C  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Wed Sep 30 21:56:49 2020
                                                                                                                104fb9ed
        border01                     192.168.200.63            44:38:39:00:01:74  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Wed Sep 30 21:56:37 2020
                                                                                                                104fb9ed
        fw2                          192.168.200.62            44:38:39:00:01:8E  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Tue Sep 29 21:24:58 2020
                                                                                                                104fb9ed
        spine01           spine      192.168.200.21            44:38:39:00:01:82  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Tue Sep 29 21:25:07 2020
                                                                                                                104fb9ed
        spine02           spine      192.168.200.22            44:38:39:00:01:92  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Tue Sep 29 21:25:08 2020
                                                                                                                104fb9ed
        spine03           spine      192.168.200.23            44:38:39:00:01:70  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Tue Sep 29 21:25:16 2020
                                                                                                                104fb9ed
        fw1                          192.168.200.61            44:38:39:00:01:8C  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Tue Sep 29 21:24:58 2020
                                                                                                                104fb9ed
        leaf02            leaf       192.168.200.12            44:38:39:00:01:78  x86_64   4.1.0                3.2.0-cl4u30~1601410518. Wed Sep 30 21:55:53 2020
                                                                                                                104fb9ed
        

        Reassign Roles to Switches

        1. On the Switches card, click Manage.

        2. Select the switches with the incorrect role from the list.

        3. Click Assign role.

        4. Select the correct role. To leave a switch unassigned, select No Role.

        5. Click Assign.

        You use the same command to both assign a role and change a role.

        For a single switch, run:

        netq lcm add role exit switches border01
        

        To assign multiple switches to the same role, separate the hostnames with commas (no spaces). For example:

        cumulus@switch:~$ netq lcm add role exit switches border01,border02
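
        The comma-separated hostname format can be generated from a list instead of typed by hand. The following is a minimal sketch; the helper name build_role_command is hypothetical and not part of NetQ. It only assembles the command string shown above and does not run it.

```python
def build_role_command(role, hostnames):
    """Return the `netq lcm add role` command for one or more switches."""
    cleaned = [name.strip() for name in hostnames]
    if not cleaned or any(not n or " " in n or "," in n for n in cleaned):
        raise ValueError("each hostname must be non-empty, with no spaces or commas")
    # Hostnames are joined with commas and no spaces, as described above.
    return f"netq lcm add role {role} switches {','.join(cleaned)}"


print(build_role_command("exit", ["border01", "border02"]))
```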
        

        Decommission a Switch with LCM

        Decommissioning the switch or host removes information about the switch or host from the NetQ database. When the NetQ Agent restarts at a later date, it sends a connection request back to the database, so NetQ can monitor the switch or host again.

        1. From the LCM dashboard, navigate to the Switch management tab.

        2. On the Switches card, select Manage.

        3. Select the devices to decommission, then select Decommission switch above the table.

        If you attempt to decommission a switch that is assigned a default, unmodified access profile, the process will fail. Create a unique access profile (or update the default with unique credentials), then attach the profile to the switch you want to decommission.

        4. Confirm the devices you want to decommission.

        5. Wait for the decommission process to complete, then select Done.

        To decommission a switch or host:

        1. On the given switch or host, stop and disable the NetQ Agent service:

          cumulus@switch:~$ sudo systemctl stop netq-agent
          cumulus@switch:~$ sudo systemctl disable netq-agent
          
        2. On the NetQ appliance or VM, decommission the switch or host:

          cumulus@netq-appliance:~$ netq decommission <hostname-to-decommission>
          

        Credentials and Profiles

        This section describes how to create and modify two types of profiles: access profiles and agent configuration profiles. Access profiles store user authentication credentials. Agent configuration profiles specify settings for a NetQ Agent running on a switch. You must apply both types of profiles to a switch for the changes to take effect.

        Access Profiles

        Authentication credentials are stored in access profiles, which you can assign to individual switches. You can create credentials with either basic (username/password) or SSH (public/private key) authentication. This section describes how to create, edit, and delete access profiles. After you create a profile, attach it to individual switches so that you can perform upgrades on those switches.

        By default, NVIDIA supplies two access profiles: Netq-Default and Nvl4-Default (for NVLink devices). NVIDIA strongly recommends creating new access profiles or updating the default profiles with unique credentials. You cannot delete default profiles.

        Create Access Profiles

        1. Expand the Menu and select Manage switches.

        2. On the Access Profiles card, select Add profile.

        3. Enter a name for the profile, then select the authentication method you want to use: SSH or Basic.

        The SSH user must have sudoer permission to configure switches when using the SSH key method. To provide sudo access to the SSH user on a switch, create a file in the /etc/sudoers.d/ directory with the following content. Replace <USER> with the SSH access profile username:

        "<USER>" ALL=(ALL) NOPASSWD: ALL
        

        1. Create a pair of SSH private and public keys on the NetQ appliance:

          ssh-keygen -t rsa -C "<USER>"
          

        When prompted, press the Enter/Return key.

        1. Copy the SSH public key to each switch that you want to upgrade using one of the following methods:

          • Manually copy the SSH public key to the /home/<USER>/.ssh/authorized_keys file on each switch, or
          • Run ssh-copy-id <USER>@<switch_ip> on the server where you generated the SSH key pair for each switch
        2. Copy the SSH private key into the entry field.

        For security, your private key is stored in an encrypted format, and only provided to internal processes while encrypted.

        1. (Optional) To verify that the new profile is listed among available profiles, select View profiles from the Access Profiles card.

        2. (Optional) Attach the profile to a switch so that you can perform upgrades.

        Be sure to use credentials for an account that has permission to configure switches.

        The default credentials for Cumulus Linux have changed from cumulus/CumulusLinux! to cumulus/cumulus for releases 4.2 and later. For details, read Cumulus Linux User Accounts.

        1. Enter a username and password.

        2. Click Create, then confirm.

        3. (Optional) To verify that the new profile is listed among available profiles, select View profiles from the Access Profiles card.

        4. (Optional) Attach the profile to a switch so that you can perform upgrades.

        To configure basic authentication, run:

        cumulus@netq-server:~$ netq lcm add credentials profile_name NEWPROFILE username cumulus password cumulus
        

        Specify a unique name for the configuration after profile_name.

        The default credentials for Cumulus Linux have changed from cumulus/CumulusLinux! to cumulus/cumulus for releases 4.2 and later. For details, read Cumulus Linux User Accounts.

        To configure SSH authentication using a public/private key:

        You must have sudoer permission to properly configure switches when using the SSH key method.

        1. If the keys do not yet exist, create a pair of SSH private and public keys on the NetQ appliance.

          ssh-keygen -t rsa -C "<USER>"
          

        When prompted, press the Enter/Return key.

        1. Copy the SSH public key to each switch that you want to upgrade using one of the following methods:

          • Manually copy the SSH public key to the /home/<USER>/.ssh/authorized_keys file on each switch, or
          • Run ssh-copy-id <USER>@<switch_ip> on the server where you generated the SSH key pair for each switch

        2. Add these credentials to the switch. Specify a unique name for the configuration after profile_name.

          cumulus@netq-server:~$ netq lcm add credentials profile_name NEWPROFILE username <USERNAME> ssh-key PUBLIC_SSH_KEY
          

        Edit Access Profiles

        1. Open the LCM dashboard.

        2. On the Access Profiles card, select View profiles.

        3. Select the checkbox next to the profile you’d like to edit. Then select Edit above the table.

        4. Make your edits, then click Update.

        The syntax for editing access profiles is:

        cumulus@netq-server:~$ netq lcm edit credentials 
            profile_id <text-switch-profile-id> 
            [profile_name <text-switch-profile-name>] 
            [auth-type <text-switch-auth-type>] 
            [username <text-switch-username>] 
            [password <text-switch-password> | ssh-key <text-ssh-key>]
        

        Run netq lcm show credentials to obtain the profile ID. See the command line reference for further details.

        To configure SSH authentication using a public/private key (requires sudoer permission):

        1. If the new keys do not yet exist, create a pair of SSH private and public keys:

          ssh-keygen -t rsa -C "<USER>"
          
        2. Copy the SSH public key to each switch that you want to upgrade using one of the following methods:

          • Manually copy the SSH public key to the /home/<USER>/.ssh/authorized_keys file on each switch, or
          • Run ssh-copy-id <USER>@<switch_ip> on the server where you generated the SSH key pair for each switch

        3. Add these new credentials to the switch:

          cumulus@netq-server:~$ netq lcm edit credentials profile_id <text-switch-profile-id> ssh-key PUBLIC_SSH_KEY
          

        Delete Access Profiles

        Any profile that is assigned to a switch can’t be deleted. You must attach a different profile to the switch first. Note that Netq-Default and Nvl4-Default can’t be deleted.

        1. On the Access Profiles card, select View profiles.

        2. From the list of profiles, select Delete in the profile’s row.

        The delete icon only appears next to custom profiles that are not already attached to a switch.

        3. Select Remove.

        1. Run netq lcm show credentials. Identify the profiles you’d like to delete and copy their identifiers from the Profile ID column. The following example deletes the n-1000 profile:
        cumulus@netq-server:~$ netq lcm show credentials
        Profile ID           Profile Name             Type             SSH Key        Username         Password         Number of switches                   Last Changed
        -------------------- ------------------------ ---------------- -------------- ---------------- ---------------- ------------------------------------ -------------------------
        credential_profile_d Netq-Default             BASIC                           cumulus          **************   11                                   Fri Feb  3 18:20:33 2023
        9e875bd2e6784617b304
        c20090ce28ff2bb46a4b
        9bf23cda98f1bdf91128
        5c9
        credential_profile_3 Nvl4-Default             BASIC                           admin            **************   1                                    Fri Feb  3 19:18:26 2023
        5a2eead7344fb91218bc
        dec29b12c66ebef0d806
        659b20e8805e4ff629bc
        23e
        credential_profile_3 n-1000                   BASIC                           admin            **************   0                                    Fri Feb  3 21:49:10 2023
        eddab251bddea9653df7
        cd1be0fc123c5d7a42f8
        18b68134e42858e54a9c
        289
        
        2. Run netq lcm del credentials profile_ids <text-credential-profile-ids>:
        cumulus@netq-server:~$ netq lcm del credentials profile_ids credential_profile_3eddab251bddea9653df7cd1be0fc123c5d7a42f818b68134e42858e54a9c289
        
        3. Verify that the profile is deleted with netq lcm show credentials.
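
        In the table output above, each Profile ID wraps across several lines, so you must join the fragments before passing the ID to netq lcm del credentials. The following sketch reassembles the IDs from saved output. The function name parse_profile_ids is hypothetical, and the parsing assumes that profile names contain no spaces and that continuation lines hold only an ID fragment, as in the example.

```python
def parse_profile_ids(table_text):
    """Map each profile name to its full profile ID from wrapped table output."""
    profiles = {}   # profile name -> full profile ID
    current = None
    for line in table_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "Profile" or set(tokens[0]) == {"-"}:
            continue  # skip blank lines, the header row, and the separator row
        if len(tokens) > 1:
            # A multi-column row starts a new profile: the first token is the
            # start of the ID and the second token is the profile name.
            current = tokens[1]
            profiles[current] = tokens[0]
        elif current is not None:
            # A lone token is a wrapped continuation of the current ID.
            profiles[current] += tokens[0]
    return profiles


SAMPLE = """\
Profile ID           Profile Name  Type   SSH Key  Username  Password        Number of switches  Last Changed
-------------------- ------------- ------ -------- --------- --------------- ------------------- ------------------------
credential_profile_3 n-1000        BASIC           admin     **************  0                   Fri Feb  3 21:49:10 2023
eddab251bddea9653df7
cd1be0fc123c5d7a42f8
18b68134e42858e54a9c
289
"""

print(parse_profile_ids(SAMPLE)["n-1000"])
```

        For the n-1000 row, the joined value is the same identifier used in the netq lcm del credentials example above.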

        View Access Profiles

        You can view the type of credentials used to access your switches in the NetQ UI. You can view the details of the credentials using the NetQ CLI.

        1. Open the LCM dashboard.

        2. On the Access Profiles card, select View profiles.

        To view a list of access profiles and their associated credentials, run netq lcm show credentials.

        If you use an SSH key for the credentials, the public key appears in the command output.

        If you use a username and password for the credentials, the username appears in the command output with the password masked.

        Agent Configuration Profiles

        You can customize configuration profiles for NetQ Agents running on switches. When you create a configuration profile, you can adjust agent settings, including the VRF, log level, WJH status, and CPU limit.

        The default NetQ agent configuration profile sets the VRF to mgmt, the log level to info, the WJH status to disabled, and the CPU limit to disabled.

        Create Configuration Profiles

        1. Expand the Menu and select Manage switches.

        2. Select NetQ agent configurations.

        3. On the NetQ agent configurations card, select Add config.

        4. Enter a profile name and choose the settings from the options presented in the UI. Select Advanced to set values for the log level and CPU limit.

        5. Enter your NetQ CLI authentication keys and select Add.

        If you use an in-band interface to manage your switch, you must use the CLI to create NetQ agent configuration profiles using the inband-interface option.

        Create a NetQ agent configuration profile with the netq lcm add netq-config command. If you manage the switch using an in-band interface, you must specify the interface name using the inband-interface option:

        cumulus@netq-server:~$ netq lcm add netq-config 
            config-profile-name <text-config-profile> 
            accesskey <text-access-key> 
            secret-key <text-secret-key> 
            [cpu-limit <text-cpu-limit>] 
            [log-level error | log-level warn | log-level info | log-level debug] 
            [vrf default | vrf mgmt | vrf <text-config-vrf>] 
            [wjh enable | wjh disable] 
            [inband-interface <text-inband-interface>]
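
        To make the syntax above concrete, the following sketch assembles a command string from the documented keywords and rejects values that fall outside the listed choices. The function name, the profile name leaf-profile, and the interface swp1 are hypothetical; the keys are left as placeholders. The sketch does not run the command.

```python
LOG_LEVELS = {"error", "warn", "info", "debug"}


def build_netq_config_command(name, access_key, secret_key, cpu_limit=None,
                              log_level=None, vrf=None, wjh=None,
                              inband_interface=None):
    """Assemble a `netq lcm add netq-config` command from the documented options."""
    if log_level is not None and log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be one of {sorted(LOG_LEVELS)}")
    if wjh is not None and wjh not in ("enable", "disable"):
        raise ValueError("wjh must be 'enable' or 'disable'")
    # Required arguments come first, in the order shown in the syntax.
    parts = ["netq", "lcm", "add", "netq-config",
             "config-profile-name", name,
             "accesskey", access_key,
             "secret-key", secret_key]
    # Optional arguments are appended only when a value is supplied.
    optional = [("cpu-limit", cpu_limit), ("log-level", log_level),
                ("vrf", vrf), ("wjh", wjh),
                ("inband-interface", inband_interface)]
    for keyword, value in optional:
        if value is not None:
            parts += [keyword, str(value)]
    return " ".join(parts)


print(build_netq_config_command("leaf-profile", "<access-key>", "<secret-key>",
                                log_level="debug", inband_interface="swp1"))
```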
        

        Apply Configuration Profiles

        After you create an agent configuration profile, you must apply the profile to a switch to update the agent settings.

        1. Run a switch discovery.

        2. Select a switch from the Discovered with NetQ category and select Change config.

        3. Select a configuration profile, then click Next.

        4. Specify which NetQ agent version you want to run on the switch, then click Next.

        5. Click Install to begin the pre-check process. After the pre-checks are successful, NetQ applies the configuration profile and installs the agent version you specified.

        NetQ and Network OS Images

        NetQ and network operating system images (Cumulus Linux and SONiC) are managed with LCM. This section explains how to check for missing images, upgrade images, and specify default images.

        View and Upload Missing Images

        You should upload images for each network OS and NetQ version currently installed in your inventory so you can support rolling back to a known good version should an installation or upgrade fail. If you have specified a default network OS and/or NetQ version, the NetQ UI also verifies that the necessary versions of the default image are available based on the known switch inventory, and if not, lists those that are missing.

        To upload missing network OS images:

        1. Expand the Menu and select Manage switches. Select the Image management tab.

        2. On the Cumulus Linux Images card, select View # missing CL images to see which images you need.

        If you have already specified a default image, you must click Manage and then Missing to see the missing images.

        3. Select one or more of the missing images and take note of the version, ASIC vendor, and CPU architecture for each.

        4. Download the network OS disk images (.bin files) from the NVIDIA Enterprise Support Portal. Log in to the portal and from the Downloads tab, select Switches and Gateways. Under Switch Software, click All downloads next to Cumulus Linux for Mellanox Switches. Select the current version and the target version, then click Show Downloads Path. Download the file.

        5. In the UI, select Add image above the table.

        6. Provide the .bin file from an external drive that matches the criteria for the selected image(s).

        7. Click Import.

        If the upload was unsuccessful, an Image Import Failed message appears. Close the dialog and try uploading the file again.

        8. Click Done.

        9. (Optional) Click the Uploaded tab to verify the image is in the repository.

        10. Click Close to return to the LCM dashboard.

          The Cumulus Linux Images card reflects the number of images you uploaded.

        1. (Optional) Display a summary of Cumulus Linux images uploaded to the LCM repo on the NetQ appliance or VM:
        netq lcm show cl-images
        
        1. Download the network OS disk images (.bin files) from the NVIDIA Enterprise Support Portal. Log in to the portal and from the Downloads tab, select Switches and Gateways. Under Switch Software, click All downloads next to Cumulus Linux for Mellanox Switches. Select the current version and the target version, then click Show Downloads Path. Download the file.

        2. Upload the images to the LCM repository. The following example uses a Cumulus Linux 4.2.0 disk image.

          cumulus@switch:~$ netq lcm add cl-image /path/to/download/cumulus-linux-4.2.0-mlnx-amd64.bin
          
        3. Repeat step 2 for each image you need to upload to the LCM repository.

        To upload missing NetQ images:

        1. Expand the Menu and select Manage switches. Select the Image management tab.

        2. On the NetQ Images card, select View # missing NetQ images to see which images you need.

        If you have already specified a default image, you must click Manage and then Missing to see the missing images.

        3. Select one or all of the missing images and make note of the OS version, CPU architecture, and image type. Remember that you need both netq-apps and netq-agent for NetQ to perform the installation or upgrade.

        4. Download the NetQ Debian packages needed for upgrade from the NetQ repository, selecting the appropriate OS version and architecture. Place the files in an accessible part of your local network.

        5. In the UI, click Add image above the table.

        6. Provide the .deb file(s) from an external drive that matches the criteria for the selected image.

        7. Click Import.

        If the upload was unsuccessful, an Image Import Failed message appears. Close the dialog and try uploading the file again.

        8. Click Done.

        9. (Optional) Click the Uploaded tab to verify that the image is in the repository.

        10. Click Close to return to the LCM dashboard.

        The NetQ Images card reflects the number of images you uploaded.

        1. (Optional) Display a summary of NetQ images uploaded to the LCM repo on the NetQ appliance or VM:
        netq lcm show netq-images
        
        1. Download the NetQ Debian packages needed for upgrade from the NetQ repository, selecting the appropriate version and hypervisor/platform. Place them in an accessible part of your local network.

        2. Upload the images to the LCM repository. This example uploads the two packages (netq-agent and netq-apps) needed for NetQ version 4.4.0 for a NetQ appliance or VM running Ubuntu 18.04 with an x86 architecture.

          cumulus@switch:~$ netq lcm add netq-image /path/to/download/netq-agent_4.4.0-ub18.04u40~1667493385.97ef4c9_amd64.deb
          cumulus@switch:~$ netq lcm add netq-image /path/to/download/netq-apps_4.4.0-ub18.04u40~1667493385.97ef4c9_amd64.deb
          

        Upload Upgrade Images

        To upload the network OS or NetQ images that you want to use for upgrade, first download the Cumulus Linux or SONiC disk images (.bin files) and NetQ Debian packages from the NVIDIA Enterprise Support Portal and NetQ repository, respectively. Place them in an accessible part of your local network.

        If you are upgrading the network OS on switches with different ASIC vendors or CPU architectures, you need more than one image. For NetQ, you need both the netq-apps and netq-agent packages for each variant.
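
        Before uploading, you can check that every NetQ variant has both packages. The following sketch groups downloaded package file names by version, OS tag, and architecture, then reports which package is missing for each incomplete variant. The file name layout in the regular expression is an assumption inferred from the example package names in this guide, and the function name missing_packages is hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the example file names in this guide:
# <package>_<netq-version>-<os-tag>u<build><sep><timestamp>.<hash>_<arch>.deb
PACKAGE_RE = re.compile(
    r"^(?P<package>netq-agent|netq-apps)_(?P<version>\d+\.\d+\.\d+)-"
    r"(?P<os>[a-z]+[\d.]+)u\d+[~_]\d+\.[0-9a-f]+_(?P<arch>\w+)\.deb$"
)
REQUIRED = {"netq-agent", "netq-apps"}


def missing_packages(filenames):
    """Return {(version, os, arch): [missing packages]} for incomplete variants."""
    found = defaultdict(set)
    for filename in filenames:
        match = PACKAGE_RE.match(filename)
        if match is None:
            raise ValueError(f"unrecognized package name: {filename}")
        key = (match["version"], match["os"], match["arch"])
        found[key].add(match["package"])
    # Keep only the variants that lack one of the two required packages.
    return {key: sorted(REQUIRED - have)
            for key, have in found.items() if have != REQUIRED}


downloads = [
    "netq-agent_4.4.0-ub18.04u40~1667493385.97ef4c9_amd64.deb",
    "netq-apps_4.4.0-ub18.04u40~1667493385.97ef4c9_amd64.deb",
    "netq-agent_4.0.0-cl4u32_1609391187.7df4e1d2_amd64.deb",
]
print(missing_packages(downloads))
```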

        After obtaining the images, upload them to NetQ with the UI or CLI:

        1. From the LCM dashboard, select the Image management tab.

        2. Select Add image on the appropriate card.

        3. Provide one or more images from an external drive.

        4. Click Import.

        5. Monitor the progress until it completes. Click Done.

        Use the netq lcm add cl-image <text-cl-image-path> and netq lcm add netq-image <text-image-path> commands to upload the images. Run the relevant command for each image that needs to be uploaded.

        Network OS images:

        cumulus@switch:~$ netq lcm add cl-image /path/to/download/cumulus-linux-4.2.0-mlx-amd64.bin
        

        NetQ images:

        cumulus@switch:~$ netq lcm add netq-image /path/to/download/netq-agent_4.4.0-ub18.04u40~1667493385.97ef4c9_amd64.deb
        cumulus@switch:~$ netq lcm add netq-image /path/to/download/netq-apps_4.4.0-ub18.04u40~1667493385.97ef4c9_amd64.deb
        

        Specify a Default Upgrade Version

        Specifying a default upgrade version is optional, but recommended. You can assign a specific OS or NetQ version as the default version to use when installing or upgrading switches. The default is typically the newest version that you intend to install or upgrade on all, or the majority, of your switches. If necessary, you can override the default selection during the installation or upgrade process if an alternate version is needed for a given set of switches.

        To specify a default version in the NetQ UI:

        1. From the LCM dashboard, select the Image management tab.

        2. Select Click here to set default x version on the relevant card.

        3. Select the version you want to use as the default for switch upgrades.

        4. Click Save. The default version is now displayed on the relevant Images card.

        To specify a default network OS version, run:

        cumulus@switch:~$ netq lcm add default-version cl-images <text-cumulus-linux-version>
        

        To verify the default network OS version, run:

        cumulus@switch:~$ netq lcm show default-version cl-images
        

        To specify a default NetQ version, run:

        cumulus@switch:~$ netq lcm add default-version netq-images <text-netq-version>
        

        To verify the default NetQ version, run:

        cumulus@switch:~$ netq lcm show default-version netq-images
        

        Remove Images from Local Repository

        After you upgrade all your switches beyond a particular release, you can remove images from the LCM repository to save space on the server. To remove images:

        1. From the LCM dashboard, select the Image management tab.

        2. Click Manage on the Cumulus Linux Images or NetQ Images card.

        3. On the Uploaded tab, select the images you want to remove.

        4. Click Delete.

        To remove Cumulus Linux images, run:

        netq lcm show cl-images [json]
        netq lcm del cl-image <text-cl-image-id>
        
        1. Determine the ID of the image you want to remove.

          cumulus@switch:~$ netq lcm show cl-images json
          [
              {
                  "id": "image_cc97be3955042ca41857c4d0fe95296bcea3e372b437a535a4ad23ca300d52c3",
                  "name": "cumulus-linux-4.2.0-vx-amd64-1594775435.dirtyzc24426ca.bin",
                  "clVersion": "4.2.0",
                  "cpu": "x86_64",
                  "asic": "VX",
                  "lastChanged": 1600726385400.0
              },
              {
                  "id": "image_c6e812f0081fb03b9b8625a3c0af14eb82c35d79997db4627c54c76c973ce1ce",
                  "name": "cumulus-linux-4.1.0-vx-amd64.bin",
                  "clVersion": "4.1.0",
                  "cpu": "x86_64",
                  "asic": "VX",
                  "lastChanged": 1600717860685.0
              }
          ]
          
        2. Remove the image you no longer need.

          cumulus@switch:~$ netq lcm del cl-image image_c6e812f0081fb03b9b8625a3c0af14eb82c35d79997db4627c54c76c973ce1ce
          
        3. Verify the command removed the image.

          cumulus@switch:~$ netq lcm show cl-images json
          [
              {
                  "id": "image_cc97be3955042ca41857c4d0fe95296bcea3e372b437a535a4ad23ca300d52c3",
                  "name": "cumulus-linux-4.2.0-vx-amd64-1594775435.dirtyzc24426ca.bin",
                  "clVersion": "4.2.0",
                  "cpu": "x86_64",
                  "asic": "VX",
                  "lastChanged": 1600726385400.0
              }
          ]
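
        When the repository holds many images, you can extract the ID from the JSON output rather than copying it by hand. The following sketch filters saved output from netq lcm show cl-images json by Cumulus Linux version. The function name image_ids_for_version is hypothetical; the field names id and clVersion come from the example output above.

```python
import json


def image_ids_for_version(show_output, cl_version):
    """Return the IDs of uploaded Cumulus Linux images matching cl_version."""
    images = json.loads(show_output)
    return [image["id"] for image in images if image["clVersion"] == cl_version]


SAMPLE = """
[
    {"id": "image_cc97be3955042ca41857c4d0fe95296bcea3e372b437a535a4ad23ca300d52c3",
     "name": "cumulus-linux-4.2.0-vx-amd64-1594775435.dirtyzc24426ca.bin",
     "clVersion": "4.2.0", "cpu": "x86_64", "asic": "VX",
     "lastChanged": 1600726385400.0},
    {"id": "image_c6e812f0081fb03b9b8625a3c0af14eb82c35d79997db4627c54c76c973ce1ce",
     "name": "cumulus-linux-4.1.0-vx-amd64.bin",
     "clVersion": "4.1.0", "cpu": "x86_64", "asic": "VX",
     "lastChanged": 1600717860685.0}
]
"""

for image_id in image_ids_for_version(SAMPLE, "4.1.0"):
    # Print the delete command for review instead of running it.
    print(f"netq lcm del cl-image {image_id}")
```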
          

        To remove NetQ images, run:

        netq lcm show netq-images [json]
        netq lcm del netq-image <text-netq-image-id>
        
        1. Determine the ID of the image you want to remove.

          cumulus@switch:~$ netq lcm show netq-images json
          [
              {
                  "id": "image_d23a9e006641c675ed9e152948a9d1589404e8b83958d53eb0ce7698512e7001",
                  "name": "netq-agent_4.0.0-cl4u32_1609391187.7df4e1d2_amd64.deb",
                  "netqVersion": "4.0.0",
                  "clVersion": "cl4u32",
                  "cpu": "x86_64",
                  "imageType": "NETQ_AGENT",
                  "lastChanged": 1609885430638.0
              }, 
              {
                  "id": "image_68db386683c796d86422f2172c103494fef7a820d003de71647315c5d774f834",
                  "name": "netq-apps_4.0.0-cl4u32_1609391187.7df4e1d2_amd64.deb",
                  "netqVersion": "4.0.0",
                  "clVersion": "cl4u32",
                  "cpu": "x86_64",
                  "imageType": "NETQ_CLI",
                  "lastChanged": 1609885434704.0
              }
          ]
          
        2. Remove the image you no longer need.

          cumulus@switch:~$ netq lcm del netq-image image_68db386683c796d86422f2172c103494fef7a820d003de71647315c5d774f834
          
        3. Verify the command removed the image.

          cumulus@switch:~$ netq lcm show netq-images json
          [
              {
                  "id": "image_d23a9e006641c675ed9e152948a9d1589404e8b83958d53eb0ce7698512e7001",
                  "name": "netq-agent_4.0.0-cl4u32_1609391187.7df4e1d2_amd64.deb",
                  "netqVersion": "4.0.0",
                  "clVersion": "cl4u32",
                  "cpu": "x86_64",
                  "imageType": "NETQ_AGENT",
                  "lastChanged": 1609885430638.0
              }
          ]
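
        You can also script the verification step by comparing the listing taken before the deletion with the one taken after it. The following sketch reports which image IDs disappeared between two saved outputs of netq lcm show netq-images json. The function name removed_image_ids is hypothetical, and the sample data is trimmed to the id and imageType fields from the example above.

```python
import json


def removed_image_ids(before_output, after_output):
    """Return the IDs present in the first listing but absent from the second."""
    before = {image["id"] for image in json.loads(before_output)}
    after = {image["id"] for image in json.loads(after_output)}
    return sorted(before - after)


BEFORE = """
[
    {"id": "image_d23a9e006641c675ed9e152948a9d1589404e8b83958d53eb0ce7698512e7001",
     "imageType": "NETQ_AGENT"},
    {"id": "image_68db386683c796d86422f2172c103494fef7a820d003de71647315c5d774f834",
     "imageType": "NETQ_CLI"}
]
"""
AFTER = """
[
    {"id": "image_d23a9e006641c675ed9e152948a9d1589404e8b83958d53eb0ce7698512e7001",
     "imageType": "NETQ_AGENT"}
]
"""

print(removed_image_ids(BEFORE, AFTER))
```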
          

        Network Monitoring

        Upgrade NetQ Agent

        Lifecycle management lets you upgrade to the latest agent version on switches with an existing NetQ Agent. You can upgrade only the NetQ Agent or both the NetQ Agent and NetQ CLI simultaneously. You can run up to five jobs at the same time; however, a given switch can only appear in one running job at a time.

        Prepare for a NetQ Agent Upgrade

        Before you upgrade, make sure you have the appropriate files and credentials:

        1. Upload the upgrade images.

        2. (Optional) Specify a default upgrade version.

        3. Verify or add switch access credentials.

        1. Verify or add switch access credentials.

        2. Configure switch roles to determine the order in which the switches get upgraded.

        3. Upload the Cumulus Linux upgrade images.

        Perform a NetQ Agent Upgrade

        After you complete the preparation steps, upgrade the NetQ Agents:

        1. From the LCM dashboard, select the Switch management tab. Locate the Switches card and click Manage.

        2. Select the switches you want to upgrade.

        3. Click Upgrade NetQ above the table and follow the steps in the UI.

        4. Verify that the number of switches selected for upgrade matches your expectation.

        5. Enter a name for the upgrade job. The name can contain a maximum of 22 characters (including spaces).

        6. Review each switch:

          • Is the configuration profile the one you want to apply? If not, click Change config, then select an alternate profile to apply to all selected switches.

        You can apply different profiles to switches in a single upgrade job by selecting a subset of switches then choosing a different profile. You can also change the profile on a per-switch basis by clicking the current profile link and selecting an alternate one.

        7. Review the summary indicating the number of switches and the configuration profile to be used. If either is incorrect, click Back and review your selections.

        8. Select the version of NetQ Agent for upgrade. If you have designated a default version, keep the Default selection. Otherwise, select an alternate version by clicking Custom and selecting it from the list.

        By default, the NetQ Agent and CLI are upgraded on the selected switches. If you do not want to upgrade the NetQ CLI, click Advanced and change the selection to No.

        9. NetQ performs several checks to eliminate preventable problems during the upgrade process. When all of the pre-checks pass, click Upgrade to initiate the upgrade.

        To upgrade the NetQ Agent on one or more switches, run:

        netq lcm upgrade netq-image 
            job-name <text-job-name> 
            [netq-version <text-netq-version>] 
            [upgrade-cli True | upgrade-cli False] 
            hostnames <text-switch-hostnames> 
            [config_profile <text-config-profile>]
        

        The following example creates a NetQ Agent upgrade job called upgrade-cl550-nq480. It upgrades the NetQ Agents on the spine01 and spine02 switches to version 4.8.0.

        cumulus@switch:~$ netq lcm upgrade netq-image job-name upgrade-cl550-nq480 netq-version 4.8.0 hostnames spine01,spine02
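
        The following sketch assembles the same command from its parts, keeping the argument order shown in the syntax above. The function name build_agent_upgrade_command is hypothetical. The sketch only builds the string and does not start an upgrade.

```python
def build_agent_upgrade_command(job_name, hostnames, netq_version=None,
                                upgrade_cli=None, config_profile=None):
    """Assemble a `netq lcm upgrade netq-image` command string."""
    if not hostnames:
        raise ValueError("at least one hostname is required")
    parts = ["netq", "lcm", "upgrade", "netq-image", "job-name", job_name]
    if netq_version is not None:
        parts += ["netq-version", netq_version]
    if upgrade_cli is not None:
        # The syntax spells the choices as True and False.
        parts += ["upgrade-cli", "True" if upgrade_cli else "False"]
    # Hostnames are comma-separated with no spaces.
    parts += ["hostnames", ",".join(hostnames)]
    if config_profile is not None:
        parts += ["config_profile", config_profile]
    return " ".join(parts)


print(build_agent_upgrade_command("upgrade-cl550-nq480", ["spine01", "spine02"],
                                  netq_version="4.8.0"))
```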
        

        Analyze the NetQ Agent Upgrade Results

        After starting the upgrade, you can monitor the progress in the NetQ UI. Successful upgrades are indicated in green. Failed upgrades display error messages indicating the cause of failure.

        To view the progress of upgrade jobs using the CLI, run:

        netq lcm show upgrade-jobs netq-image [json]
        netq lcm show status <text-lcm-job-id> [json]
        
        Example netq lcm show upgrade-jobs

        You can view the progress of one upgrade job at a time. This requires the job identifier.
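
        To find the identifier of a job that needs attention, you can filter saved output from netq lcm show upgrade-jobs netq-image json. The following sketch lists failed jobs with their start times. The function name failed_jobs is hypothetical, and treating startTime as milliseconds since the Unix epoch in UTC is an assumption; it is consistent with the Start Time shown in the status output of the example in this section.

```python
import json
from datetime import datetime, timezone


def failed_jobs(show_output):
    """Return (jobId, name, start time) for each job whose overallStatus is FAILED."""
    results = []
    for job in json.loads(show_output):
        if job["overallStatus"] == "FAILED":
            # Assumption: startTime is milliseconds since the Unix epoch (UTC).
            started = datetime.fromtimestamp(job["startTime"] / 1000, tz=timezone.utc)
            results.append((job["jobId"], job["name"], started))
    return results


SAMPLE = """
[
    {"jobId": "job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7",
     "name": "Leaf01-02 to NetQ330",
     "netqVersion": "4.1.0",
     "overallStatus": "FAILED",
     "pre-checkStatus": "COMPLETED",
     "warnings": [],
     "errors": [],
     "startTime": 1611863290557.0}
]
"""

for job_id, name, started in failed_jobs(SAMPLE):
    print(job_id, name, started.strftime("%Y-%m-%d %H:%M:%S"))
```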

        The following example shows all upgrade jobs that are currently running or have completed, and then shows the status of the job with a job identifier of job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7.

        cumulus@switch:~$ netq lcm show upgrade-jobs netq-image json
        [
            {
                "jobId": "job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7",
                "name": "Leaf01-02 to NetQ330",
                "netqVersion": "4.1.0",
                "overallStatus": "FAILED",
                "pre-checkStatus": "COMPLETED",
                "warnings": [],
                "errors": [],
                "startTime": 1611863290557.0
            }
        ]
        
        cumulus@switch:~$ netq lcm show status netq-image job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7
        NetQ Upgrade FAILED
        
        Upgrade Summary
        ---------------
        Start Time: 2021-01-28 19:48:10.557000
        End Time: 2021-01-28 19:48:17.972000
        Upgrade CLI: True
        NetQ Version: 4.1.0
        Pre Check Status COMPLETED
        Precheck Task switch_precheck COMPLETED
        	Warnings: []
        	Errors: []
        Precheck Task version_precheck COMPLETED
        	Warnings: []
        	Errors: []
        Precheck Task config_precheck COMPLETED
        	Warnings: []
        	Errors: []
        
        
        Hostname          CL Version  NetQ Version  Prev NetQ Ver Config Profile               Status           Warnings         Errors       Start Time
                                                    sion
        ----------------- ----------- ------------- ------------- ---------------------------- ---------------- ---------------- ------------ --------------------------
        leaf01            4.2.1       4.1.0         3.2.1         ['NetQ default config']      FAILED           []               ["Unreachabl Thu Jan 28 19:48:10 2021
                                                                                                                                 e at Invalid
                                                                                                                                 /incorrect u
                                                                                                                                 sername/pass
                                                                                                                                 word. Skippi
                                                                                                                                 ng remaining
                                                                                                                                 10 retries t
                                                                                                                                 o prevent ac
                                                                                                                                 count lockou
                                                                                                                                 t: Warning:
                                                                                                                                 Permanently
                                                                                                                                 added '192.1
                                                                                                                                 68.200.11' (
                                                                                                                                 ECDSA) to th
                                                                                                                                 e list of kn
                                                                                                                                 own hosts.\r
                                                                                                                                 \nPermission
                                                                                                                                 denied,
                                                                                                                                 please try a
                                                                                                                                 gain."]
        leaf02            4.2.1       4.1.0         3.2.1         ['NetQ default config']      FAILED           []               ["Unreachabl Thu Jan 28 19:48:10 2021
                                                                                                                                 e at Invalid
                                                                                                                                 /incorrect u
                                                                                                                                 sername/pass
                                                                                                                                 word. Skippi
                                                                                                                                 ng remaining
                                                                                                                                 10 retries t
                                                                                                                                 o prevent ac
                                                                                                                                 count lockou
                                                                                                                                 t: Warning:
                                                                                                                                 Permanently
                                                                                                                                 added '192.1
                                                                                                                                 68.200.12' (
                                                                                                                                 ECDSA) to th
                                                                                                                                 e list of kn
                                                                                                                                 own hosts.\r
                                                                                                                                 \nPermission
                                                                                                                                 denied,
                                                                                                                                 please try a
                                                                                                                                 gain."]
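
When several jobs are listed, the JSON form of the output is easier to filter than the table. The following is a minimal Python sketch, not part of NetQ, that picks out failed jobs from the JSON shown above and prints the matching status command for each. It assumes that startTime is expressed in milliseconds since the Unix epoch, which is consistent with the Start Time shown in the status output; the function name failed_jobs is hypothetical.

```python
import json
from datetime import datetime, timezone

# Sample copied from the JSON output above.
raw = """
[
    {
        "jobId": "job_netq_install_7152a03a8c63c906631c3fb340d8f51e70c3ab508d69f3fdf5032eebad118cc7",
        "name": "Leaf01-02 to NetQ330",
        "netqVersion": "4.1.0",
        "overallStatus": "FAILED",
        "pre-checkStatus": "COMPLETED",
        "warnings": [],
        "errors": [],
        "startTime": 1611863290557.0
    }
]
"""


def failed_jobs(json_text):
    """Return (jobId, name, UTC start time) for every job whose overallStatus is FAILED."""
    results = []
    for job in json.loads(json_text):
        if job.get("overallStatus") == "FAILED":
            # Assumption: startTime is milliseconds since the Unix epoch.
            started = datetime.fromtimestamp(job["startTime"] / 1000, tz=timezone.utc)
            results.append((job["jobId"], job["name"], started))
    return results


for job_id, name, started in failed_jobs(raw):
    print(f"{name}: started {started:%Y-%m-%d %H:%M:%S} UTC")
    # The job identifier is what the status command needs.
    print(f"  netq lcm show status {job_id}")
```

In practice you would feed the function the text produced by the json option of the command rather than a pasted sample.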
        

        Upgrade Cumulus Linux

        Lifecycle management (LCM) lets you upgrade Cumulus Linux on one or more switches in your network with the NetQ UI or the CLI. You do this by scheduling upgrade jobs. Each job can upgrade Cumulus Linux on up to 50 switches. NetQ upgrades the switches 5 at a time until all switches in the upgrade job are upgraded. You can schedule up to 5 upgrade jobs to run simultaneously.
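
To get a feel for these limits, the following Python sketch splits a list of switches into jobs of at most 50 and counts how many groups of 5 each job works through. It is a planning aid only, not something NetQ provides; the function name and the 120 hostnames are hypothetical.

```python
import math

MAX_SWITCHES_PER_JOB = 50   # each job can upgrade up to 50 switches
SWITCHES_PER_BATCH = 5      # NetQ upgrades the switches 5 at a time
MAX_CONCURRENT_JOBS = 5     # up to 5 jobs can run simultaneously


def plan_upgrade_jobs(hostnames):
    """Split hostnames into jobs of at most 50 and count the batches of 5 in each."""
    jobs = []
    for start in range(0, len(hostnames), MAX_SWITCHES_PER_JOB):
        members = hostnames[start:start + MAX_SWITCHES_PER_JOB]
        jobs.append({
            "hostnames": members,
            "batches": math.ceil(len(members) / SWITCHES_PER_BATCH),
        })
    return jobs


switches = [f"leaf{n:02d}" for n in range(1, 121)]  # 120 hypothetical switches
jobs = plan_upgrade_jobs(switches)
for index, job in enumerate(jobs, start=1):
    print(f"job {index}: {len(job['hostnames'])} switches, {job['batches']} batches")
print("fits within the concurrent job limit:", len(jobs) <= MAX_CONCURRENT_JOBS)
```

For 120 switches this yields three jobs (50, 50, and 20 switches), which fits within the limit of 5 simultaneous jobs.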

        You can upgrade switches running Cumulus Linux 5.0.0 or later that are managed with flat configuration files or with NVUE.

        When you upgrade a switch that has not been configured using NVUE, LCM backs up and restores flat file configurations in Cumulus Linux. After you upgrade a switch that has been managed with flat files and subsequently run NVUE configuration commands, NVUE will overwrite the configuration restored by NetQ LCM. See Upgrading Cumulus Linux for additional information.

        During the Cumulus Linux upgrade process, NetQ does not upgrade or reinstall packages that are not part of the Cumulus Linux image. For example, if you installed node_exporter packages on a switch, you must reinstall these packages after the upgrade is complete.

        Prepare for a Cumulus Linux Upgrade

        If the NetQ Agent is already installed on the switches you’d like to upgrade, follow the steps below. If the NetQ Agent is not installed on the switches you’d like to upgrade, run a switch discovery, then proceed with the upgrade.

        Before you upgrade, make sure you have the appropriate files and credentials:

        1. Upload the Cumulus Linux upgrade images.

        2. (Optional) Specify a default upgrade version.

        3. Verify or add switch access credentials.

        4. (Optional) Assign a role to each switch.

        Upgrade Cumulus Linux

        After you complete the preparation steps, upgrade Cumulus Linux:

        1. Click Devices in any workbench header, then select Manage switches.

        2. Locate the Switches card and click Manage.

        3. Select the switches you want to upgrade.

        4. Click Upgrade OS above the table.

          Follow the steps in the UI. Create a name for the upgrade and review the switches that you selected to upgrade:

        screen displaying 2 switches selected for upgrading

        If you accidentally included a switch that you do not want to upgrade, hover over the switch information card and click Delete to remove it from the upgrade.

        If the role is incorrect or missing, click Edit, then select a role for that switch from the dropdown. Click Cancel to discard the change.

        5. Click Next.

        6. Select either a default image or custom version.

        7. Verify or add switch access credentials.

        8. Click Next.

        9. Verify the upgrade job options.

          By default, NetQ rolls back to the original Cumulus Linux version on any switch that fails to upgrade. It also takes network snapshots before and after the upgrade.

          You can exclude selected services and protocols from the snapshots by clicking them. Nodes and services must be included.

        10. Click Next.

        11. NetQ performs several checks to eliminate preventable problems during the upgrade process. When all of the pre-checks pass, click Preview.

        12. NetQ directs you to a screen where you can review the upgrade. After reviewing, select Start upgrade and confirm.

        Perform the upgrade using the netq lcm upgrade cl-image command, providing a name for the upgrade job, the Cumulus Linux and NetQ versions, and a comma-separated list of the hostnames to upgrade:

        cumulus@switch:~$ netq lcm upgrade cl-image job-name upgrade-480 cl-version 5.5.0 netq-version 4.8.0 hostnames spine01,spine02
        

        Create a Network Snapshot

        You can also generate a network snapshot before and after the upgrade by adding the run-snapshot-before-after option to the command:

        cumulus@switch:~$ netq lcm upgrade cl-image job-name upgrade-480 cl-version 5.5.0 netq-version 4.8.0 hostnames spine01,spine02,leaf01,leaf02 order spine,leaf run-snapshot-before-after
        

        Restore upon an Upgrade Failure

        (Recommended) You can restore the previous version of Cumulus Linux if the upgrade job fails by adding the run-restore-on-failure option to the command.

        cumulus@switch:~$ netq lcm upgrade cl-image job-name upgrade-540 cl-version 5.4.0 netq-version 4.8.0 hostnames spine01,spine02,leaf01,leaf02 order spine,leaf run-restore-on-failure
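
If you generate upgrade commands from an inventory, a small helper keeps the option order and the comma-separated lists consistent. The following Python sketch assembles a command line using only the options that appear in the examples in this section. The function name and its parameters are hypothetical, and the sketch only builds the string; it does not run the command or check which option combinations NetQ accepts.

```python
import shlex


def build_cl_upgrade_command(job_name, cl_version, netq_version, hostnames,
                             order=None, snapshot=False, restore_on_failure=False):
    """Assemble a 'netq lcm upgrade cl-image' command line from the options shown above."""
    args = [
        "netq", "lcm", "upgrade", "cl-image",
        "job-name", job_name,
        "cl-version", cl_version,
        "netq-version", netq_version,
        "hostnames", ",".join(hostnames),   # comma-separated, no spaces
    ]
    if order:
        args += ["order", ",".join(order)]
    if snapshot:
        args.append("run-snapshot-before-after")
    if restore_on_failure:
        args.append("run-restore-on-failure")
    # shlex.join quotes any argument that contains shell metacharacters.
    return shlex.join(args)


print(build_cl_upgrade_command(
    "upgrade-480", "5.5.0", "4.8.0",
    ["spine01", "spine02", "leaf01", "leaf02"],
    order=["spine", "leaf"],
    snapshot=True,
))
```

With these arguments the helper reproduces the snapshot example from the previous section.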
        

        Pre-check Failures

        If one or more of the pre-checks fail, resolve the related issue and start the upgrade again. In the NetQ UI, these failures appear on the Upgrade Preview page. In the NetQ CLI, they appear as error messages in the netq lcm show upgrade-jobs cl-image command output.

        Analyze Results

        After starting the upgrade, you can monitor the progress in the NetQ UI. Successful upgrades are indicated by a green icon. Failed upgrades display error messages indicating the cause of failure.

        To view the progress of current upgrade jobs and the history of previous upgrade jobs using the CLI, run netq lcm show upgrade-jobs cl-image.

        To see details of a particular upgrade job, run netq lcm show status job-ID.

        To see only Cumulus Linux upgrade jobs, run netq lcm show status cl-image job-ID.

        Upon successful upgrade, you can:

        Post-check Failures

        A successful upgrade can still have post-check warnings. For example, you updated the OS, but not all services are fully up and running after the upgrade. If one or more of the post-checks fail, warning messages appear in the Post-Upgrade Tasks section of the preview. Click the warning category to view the detailed messages.

        Upgrade Cumulus Linux on Switches Without NetQ Agent Installed

        To upgrade Cumulus Linux on switches without NetQ installed, create a switch discovery. The discovery results are then used to install or upgrade Cumulus Linux and NetQ on all discovered switches in a single procedure. You can run up to five jobs simultaneously; however, a given switch can only appear in one running job at a time.

        1. Run a switch discovery to discover switches without NetQ installed and add them to the device inventory.

        2. Select which switches you want to upgrade from each discovered category by clicking the checkbox on each switch card. Then click Next.

        3. Accept the default NetQ version or click Custom and select an alternate version.

        4. By default, the NetQ Agent and CLI are upgraded on the selected switches. If you do not want to upgrade the NetQ CLI, click Advanced and change the selection to No.

        5. Click Next.

        6. NetQ performs several checks to eliminate preventable problems during the upgrade process. When all of the pre-checks pass, select Install.

          After starting the upgrade you can monitor the progress from the preview page or the Upgrade History page.

        1. Run a switch discovery to discover switches without NetQ installed and add them to the device inventory.

        Use the netq lcm discover command, specifying a single IP address, a range of IP addresses where your switches are located in the network, or a CSV file containing the IP addresses.

        2. After the discovery completes and you determine which switches you need to upgrade, run the upgrade process as described above.

        Network Snapshots

        Snapshots capture a network’s state—including the services running on the network—at a particular point in time. Comparing snapshots lets you check what (if anything) changed in the network, which can be helpful when upgrading a switch or modifying its configuration. This section outlines how to create, compare, and interpret snapshots.

        Create a Network Snapshot

        To create a snapshot:

        1. From the workbench header, select Snapshot, then Create snapshot.

        2. Next, enter the snapshot’s name, time frame, and the elements you’d like included in the snapshot:

          modal prompting user to add name, time frame, and options while creating a snapshot

          To capture the network’s current state, click Now. To capture the network’s state at a previous date and time, click Past, then in the Start Time field, select the calendar icon.

          The Choose options field includes all the elements and services that may run on the network. All are selected by default. Click any element to remove it from the snapshot. Nodes and services are included in all snapshots.

          The Notes field is optional. You can add a note as a reminder of the snapshot’s purpose.

        3. Select Finish. The card now appears on your workbench.

        4. When you are finished viewing the snapshot, click Dismiss to remove it from your workbench. You can add it back by selecting Snapshot in the header and navigating to the option to view snapshots.

        Compare Network Snapshots

        You can compare the state of your network before and after an upgrade or other configuration change to help avoid unwanted changes to your network’s state.

        To compare network snapshots:

        1. From the workbench header, select Snapshot.

        2. Select Compare snapshots, then select the two snapshots you want to compare.

        3. Click Finish.

        If the snapshot cards are already on your workbench, place the cards side-by-side for a high-level comparison. For a more detailed comparison, click Compare on one of the cards and select a snapshot for comparison from the list.

        Interpreting the Comparison Data

        For each network element with changes, a visualization displays the differences between the two snapshots. Green represents additions, red represents subtractions, and orange represents updates.

        In the following example, Snapshot 3 and Snapshot 4 are being compared. Snapshot 3 has a BGP count of 212 and Snapshot 4 has a BGP count of 186. The comparison also shows 98 BGP updates.

        comparison data displayed for two snapshots

        From this view, you can dismiss the snapshots or select View Details for additional information and to filter and export the data as a JSON file.
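
Conceptually, the additions and subtractions in a comparison are set differences between the two snapshots. The following Python sketch illustrates the idea with hypothetical BGP session records. It is not how NetQ computes the comparison, and it does not model updates, which require comparing the attributes of records present in both snapshots.

```python
def compare_sessions(before, after):
    """Return the sessions added and removed between two snapshots."""
    added = after - before      # corresponds to green (additions) in the UI
    removed = before - after    # corresponds to red (subtractions) in the UI
    return added, removed


# Hypothetical (hostname, session) records; not real NetQ data.
snapshot_a = {("leaf01", "swp51"), ("leaf01", "swp52"), ("leaf02", "swp51")}
snapshot_b = {("leaf01", "swp51"), ("leaf02", "swp51"), ("leaf02", "swp52")}

added, removed = compare_sessions(snapshot_a, snapshot_b)
print("added:", sorted(added))
print("removed:", sorted(removed))
print("net change in count:", len(snapshot_b) - len(snapshot_a))
```

Note that the net change in count can hide activity: here one session was added and one removed, so the count is unchanged. In the example above, the BGP count drops from 212 to 186, a net change of -26.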

        The following table describes the information provided for each element type when changes are present:

        Element Data Descriptions
        BGP
        • Hostname: Name of the host running the BGP session
        • VRF: Virtual routing and forwarding interface, if used
        • BGP session: Session that was removed or added
        • ASN: Autonomous system number
        Config
        • Hostname: Name of the host where the configuration file was added or removed
        • Configuration file: File that was added or removed
        Interface
        • Hostname: Name of the host where the interface resides
        • Interface name: Name of the interface that was removed or added
        IP Address
        • Hostname: Name of the host where address was removed or added
        • Prefix: IP address prefix
        • Mask: IP address mask
        • Interface name: Name of the interface that owns the address
        Links
        • Hostname: Name of the host where the link was removed or added
        • Interface name: Name of the link
        • Kind: Bond, bridge, eth, loopback, macvlan, swp, vlan, vrf, or vxlan
        LLDP
        • Hostname: Name of the discovered host that was removed or added
        • Interface name: Name of the interface
        MAC Address
        • Hostname: Name of the host where MAC address resides
        • MAC address: MAC address that was removed or added
        • VLAN: VLAN associated with the MAC address
        MLAG
        • Hostname: Name of the host running the MLAG session
        • MLAG Sysmac: MAC address for a bond interface pair that was removed or added
        Neighbor
        • Hostname: Name of the neighbor peer that was removed or added
        • VRF: Virtual routing and forwarding interface, if used
        • Interface name: Name of the neighbor interface
        • IP address: Neighbor IP address
        Node
        • Hostname: Name of the network node that was removed or added
        OSPF
        • Hostname: Name of the host running the OSPF session
        • Interface name: Name of the associated interface that was removed or added
        • Area: Routing domain for this host device
        • Peer ID: Network subnet address of router with access to the peer device
        Route
        • Hostname: Name of the host running the route that was removed or added
        • VRF: Virtual routing and forwarding interface associated with the route
        • Prefix: IP address prefix
        Sensors
        • Hostname: Name of the host where sensor resides
        • Kind: Power supply unit, fan, or temperature
        • Name: Name of the sensor that was removed or added
        Services
        • Hostname: Name of the host where service is running
        • Name: Name of the service that was removed or added
        • VRF: Virtual routing and forwarding interface associated with the service

        Events and Notifications

        Events provide information about how a network and its devices are operating during a given time period. They help with troubleshooting and alert network administrators to potential network problems before they become critical. You can view events in the UI or CLI and receive notifications about events via Slack, PagerDuty, syslog, email, or a generic webhook channel.

        NetQ captures three types of events:

        You can track events in the NetQ UI with the Events and What Just Happened cards:

        You can monitor system and threshold-crossing events in the CLI with the netq show events command. The netq show wjh-drop command lists all What Just Happened events or those with a selected drop type.

        Manage NetQ Agents

        Run the following commands to view the status of an agent, disable an agent, manage logging, and configure the events the agent collects.

        View NetQ Agent Status

        The syntax for the NetQ Agent status command is:

        netq [<hostname>] show agents
            [fresh | dead | rotten | opta]
            [around <text-time>]
            [json]
        

        You can view the status for a given switch, host, NetQ appliance, or virtual machine. You can also filter by status and view the status at a time in the past.

        To view the current status of all NetQ Agents, run:

        cumulus@switch:~$ netq show agents
        

        To view NetQ Agents that are not communicating, run:

        cumulus@switch:~$ netq show agents rotten
        No matching agents records found
        

        To view NetQ Agent status on the NetQ appliance or VM, run:

        cumulus@switch:~$ netq show agents opta
        Matching agents records:
        Hostname          Status           NTP Sync Version                              Sys Uptime                Agent Uptime              Reinitialize Time          Last Changed
        ----------------- ---------------- -------- ------------------------------------ ------------------------- ------------------------- -------------------------- -------------------------
        netq-ts           Fresh            yes      3.2.0-ub18.04u30~1601393774.104fb9e  Mon Sep 21 16:46:53 2020  Tue Sep 29 21:13:07 2020  Tue Sep 29 21:13:07 2020   Thu Oct  1 16:29:51 2020
        

        View NetQ Agent Configuration

        You can view the current configuration of a NetQ Agent to determine what data it collects and where it sends that data. The syntax for this command is:

        sudo netq config show agent 
            [cpu-limit|frr-monitor|kubernetes-monitor|loglevel|ssl|stats|wjh|wjh-threshold] 
            [json]
        

        The following example shows a NetQ Agent in an on-premises deployment, talking to an appliance or VM at 127.0.0.1 using the default ports and VRF. There is no special configuration to monitor Kubernetes, FRR, interface statistics, or WJH, and there are no limits on CPU usage or change to the default logging level.

        cumulus@switch:~$ sudo netq config show agent
        netq-agent             value      default
        ---------------------  ---------  ---------
        exhibitport
        exhibiturl
        server                 127.0.0.1  127.0.0.1
        cpu-limit              100        100
        agenturl
        enable-opta-discovery  True       True
        agentport              8981       8981
        port                   31980      31980
        vrf                    default    default
        ()
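
To spot settings that differ from their defaults, you can compare the value and default columns programmatically. The following Python sketch parses the table above as plain text. It assumes the three-column layout shown here and treats rows with no entries (such as exhibitport) as unset. The function name is hypothetical, and a row with only one entry would be read as a value with no default, which might not hold for real output.

```python
SAMPLE = """\
netq-agent             value      default
---------------------  ---------  ---------
exhibitport
exhibiturl
server                 127.0.0.1  127.0.0.1
cpu-limit              100        100
agenturl
enable-opta-discovery  True       True
agentport              8981       8981
port                   31980      31980
vrf                    default    default
()
"""


def parse_agent_config(text):
    """Map each setting to a (value, default) pair; missing entries become None."""
    settings = {}
    for line in text.splitlines()[2:]:          # skip the header and the dashed rule
        fields = line.split()
        if not fields or fields[0] == "()":     # ignore blank lines and the trailing "()"
            continue
        name = fields[0]
        value = fields[1] if len(fields) > 1 else None
        default = fields[2] if len(fields) > 2 else None
        settings[name] = (value, default)
    return settings


config = parse_agent_config(SAMPLE)
changed = {name: pair for name, pair in config.items() if pair[0] != pair[1]}
print("server:", config["server"][0])
print("settings that differ from their defaults:", changed or "none")
```

For the sample above, every value matches its default, which agrees with the description that no special configuration is present.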
        

        To view the configuration of a particular aspect of a NetQ Agent, use the various options.

        This example shows a NetQ Agent configured with a CPU limit of 60%.

        cumulus@switch:~$ sudo netq config show agent cpu-limit
        CPU Quota
        -----------
        60%
        ()
        

        Modify the Configuration of the NetQ Agent on a Node

        The agent configuration commands let you:

        Commands apply to one agent at a time, and you run them on the switch or host where the NetQ Agent resides.

        Add or Remove a NetQ Agent

        To add or remove a NetQ Agent, you must add or remove the IP address (as well as the port and VRF, if specified) from the NetQ configuration file, /etc/netq/netq.yml. This adds or removes the information about the server where the agent sends the data it collects.

        To use the NetQ CLI to add a NetQ Agent on a switch or host, run:

        sudo netq config add agent server <text-opta-ip> [port <text-opta-port>] [vrf <text-vrf-name>] [inband-interface <interface-name>]
        

        If you want to use a specific port on the server, use the port option. If you want the data sent over a particular virtual routing and forwarding (VRF) interface, use the vrf option.

        This example shows how to add a NetQ Agent and tell it to send the data it collects to the NetQ server at the IPv4 address of 10.0.0.23 using the default port (port 31980 for on-premises and port 443 for cloud deployments) and the default VRF (mgmt). The port and VRF are not specified, so NetQ assumes default settings.

        cumulus@switch:~$ sudo netq config add agent server 10.0.0.23
        cumulus@switch:~$ sudo netq config restart agent
        

        This example shows how to add a NetQ Agent and tell it to send the data it collects to the NetQ server at the IPv4 address of 10.0.0.23 using the default port (port 31980 for on-premises and port 443 for cloud deployments) and the default VRF for a switch managed through an in-band connection on interface swp1:

        cumulus@switch:~$ sudo netq config add agent server 10.0.0.23 vrf default inband-interface swp1
        cumulus@switch:~$ sudo netq config restart agent
        

        To remove a NetQ Agent on a switch or host, run:

        sudo netq config del agent server
        

        Disable and Reenable a NetQ Agent

        You can temporarily disable the NetQ Agent on a node. Disabling the NetQ Agent maintains the data already collected in the NetQ database, but stops the NetQ Agent from collecting new data until you reenable it.

        To disable a NetQ Agent, run:

        cumulus@switch:~$ sudo netq config stop agent
        

        To reenable a NetQ Agent, run:

        cumulus@switch:~$ sudo netq config restart agent
        

        Configure a NetQ Agent to Limit Switch CPU Usage

        You can limit the NetQ Agent to use only a certain percentage of CPU resources on a switch. This setting requires a switch running Cumulus Linux versions 3.7, 4.1, or later.

        For more detail about this feature, refer to this Knowledge Base article.

        This example limits a NetQ Agent from consuming more than 40% of the CPU resources on a Cumulus Linux switch.

        cumulus@switch:~$ sudo netq config add agent cpu-limit 40
        cumulus@switch:~$ sudo netq config restart agent
        

        To remove the limit, run:

        cumulus@switch:~$ sudo netq config del agent cpu-limit
        cumulus@switch:~$ sudo netq config restart agent
        

        Configure a NetQ Agent to Collect Data from Selected Services

        You can enable and disable data collection about FRRouting (FRR), Kubernetes, and What Just Happened (WJH).

        To configure the agent to start or stop collecting FRR data, run:

        cumulus@switch:~$ sudo netq config add agent frr-monitor
        cumulus@switch:~$ sudo netq config restart agent
        
        cumulus@switch:~$ sudo netq config del agent frr-monitor
        cumulus@switch:~$ sudo netq config restart agent
        

        To configure the agent to start or stop collecting Kubernetes data, run:

        cumulus@switch:~$ sudo netq config add agent kubernetes-monitor
        cumulus@switch:~$ sudo netq config restart agent
        
        cumulus@switch:~$ sudo netq config del agent kubernetes-monitor
        cumulus@switch:~$ sudo netq config restart agent
        

        To configure the agent to start or stop collecting WJH data, run:

        cumulus@switch:~$ sudo netq config add agent wjh
        cumulus@switch:~$ sudo netq config restart agent
        
        cumulus@switch:~$ sudo netq config del agent wjh
        cumulus@switch:~$ sudo netq config restart agent
        

        Configure a NetQ Agent to Send Data to a Server Cluster

        If you have a server cluster arrangement for NetQ, you should configure the NetQ Agent to send the data it collects to every server in the cluster.

        To configure the agent to send data to the servers in your cluster, run:

        sudo netq config add agent cluster-servers <text-opta-ip-list> [port <text-opta-port>] [vrf <text-vrf-name>]
        

        You must separate the list of IP addresses by commas (not spaces). You can optionally specify a port or VRF.

        This example configures the NetQ Agent on a switch to send the data to three servers located at 10.0.0.21, 10.0.0.22, and 10.0.0.23 using the rocket VRF.

        cumulus@switch:~$ sudo netq config add agent cluster-servers 10.0.0.21,10.0.0.22,10.0.0.23 vrf rocket
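
Because the address list must be comma-separated with no spaces, it can help to validate the addresses before building the command. The following Python sketch uses the standard ipaddress module to reject malformed entries and then joins the addresses with commas. The function name is hypothetical, and the sketch only builds the command string; it does not run it.

```python
import ipaddress


def cluster_servers_command(addresses, port=None, vrf=None):
    """Validate each IP address and build the cluster-servers command."""
    # ip_address raises ValueError for anything that is not a valid IPv4 or IPv6
    # address, which also catches entries containing stray spaces or commas.
    validated = [str(ipaddress.ip_address(address)) for address in addresses]
    parts = ["sudo", "netq", "config", "add", "agent",
             "cluster-servers", ",".join(validated)]
    if port is not None:
        parts += ["port", str(port)]
    if vrf is not None:
        parts += ["vrf", vrf]
    return " ".join(parts)


print(cluster_servers_command(["10.0.0.21", "10.0.0.22", "10.0.0.23"], vrf="rocket"))
```

With these arguments the helper reproduces the command in the example above.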
        

        To stop a NetQ Agent from sending data to a server cluster, run:

        cumulus@switch:~$ sudo netq config del agent cluster-servers
        

        Configure Logging to Troubleshoot a NetQ Agent

        The logging level used for a NetQ Agent determines what types of events get logged about the NetQ Agent on the switch or host.

        First, you need to decide what level of logging you want to configure. You can configure the logging level to be the same for every NetQ Agent, or selectively increase or decrease the logging level for a NetQ Agent on a problematic node.

        Logging Level Description
        debug Sends notifications for all debug, info, warning, and error messages.
        info Sends notifications for info, warning, and error messages (default).
        warning Sends notifications for warning and error messages.
        error Sends notifications for error messages.
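
The levels in the table form an ordered scale: each level includes its own messages plus those of every more severe level. The following Python sketch models that rule. It illustrates the table only and is not the agent's implementation; the names are hypothetical.

```python
# Levels ordered from most verbose to least verbose, as in the table above.
LEVELS = ["debug", "info", "warning", "error"]


def is_logged(configured_level, message_level):
    """Return True if a message at message_level is emitted at configured_level."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


for configured in LEVELS:
    emitted = [level for level in LEVELS if is_logged(configured, level)]
    print(f"{configured:8} -> {', '.join(emitted)}")
```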

        You can view the NetQ Agent log directly. Messages have the following structure:

        <timestamp> <node> <service>[PID]: <level>: <message>

        Element Description
        timestamp Date and time event occurred in UTC format
        node Hostname of network node where event occurred
        service [PID] Service and process identifier (PID) that generated the event
        level Logging level assigned for the given event: debug, error, info, or warning
        message Text description of event, including the node where the event occurred

        For example:

        logging message anatomy, including timestamp, node, service, level, and message
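
If you need to filter or summarize the log, the message structure above maps onto a regular expression. The following Python sketch splits one line into its elements. The sample line is hypothetical and constructed only to match the documented structure; in particular, the sketch assumes the timestamp contains no spaces, which might not match the format your system writes.

```python
import re

# <timestamp> <node> <service>[PID]: <level>: <message>
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<node>\S+)\s+"
    r"(?P<service>[^\s\[]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>debug|info|warning|error):\s+"
    r"(?P<message>.*)$",
    re.IGNORECASE,
)


def parse_log_line(line):
    """Split one log line into its named elements, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical line built to match the structure above; not captured from a real agent.
sample = "2024-01-15T14:03:27.123456+00:00 leaf01 netq-agent[4821]: INFO: Connected to server"
entry = parse_log_line(sample)
print(entry["node"], entry["level"], entry["message"])
```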

        To configure a logging level, follow these steps. This example sets the logging level to debug:

        1. Set the logging level:

          cumulus@switch:~$ sudo netq config add agent loglevel debug
          
        2. Restart the NetQ Agent:

          cumulus@switch:~$ sudo netq config restart agent
          
        3. (Optional) Verify the connection to the NetQ appliance or VM by viewing the netq-agent.log messages.

        Disable Agent Logging

        If you set the logging level to debug for troubleshooting, NVIDIA recommends that you either change the logging level to a less verbose mode or disable agent logging when you finish troubleshooting.

        To change the logging level from debug to another level, run:

        cumulus@switch:~$ sudo netq config add agent loglevel [info|warning|error]
        cumulus@switch:~$ sudo netq config restart agent
        

        To disable all logging:

        cumulus@switch:~$ sudo netq config del agent loglevel
        cumulus@switch:~$ sudo netq config restart agent
        

        Change NetQ Agent Polling Data and Frequency

        The NetQ Agent contains a pre-configured set of modular commands that run periodically and send event and resource data to the NetQ appliance or VM. You can fine-tune which events the agent polls and vary the polling frequency using the NetQ CLI.

        For example, if your network is not running OSPF, you can disable the command that polls for OSPF events. Or you can poll for LLDP less often by increasing its polling interval from the default of 120 seconds. By not polling for selected data or polling less frequently, you can reduce switch CPU usage by the NetQ Agent.

        Depending on the switch platform, the NetQ Agent might not execute some supported protocol commands. For example, if a switch has no VXLAN capability, then the agent skips all VXLAN-related commands.
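
        As a rough illustration of how the polling period affects agent load, the following Python sketch counts how many times a command runs per day at a given period. It is a back-of-the-envelope calculation under the assumption that a command runs exactly once per period. It does not measure the CPU cost of each run.

```python
SECONDS_PER_DAY = 86_400


def runs_per_day(period_seconds: int) -> int:
    """Return how many times a command runs per day at the given polling period."""
    if period_seconds <= 0:
        raise ValueError("the polling period must be a positive number of seconds")
    return SECONDS_PER_DAY // period_seconds


for period in (60, 120):
    print(period, runs_per_day(period))
```

        Doubling the period from 60 to 120 seconds halves the number of runs, from 1440 to 720 per day.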

        Supported Commands

        To see the list of supported modular commands, run:

        cumulus@switch:~$ sudo netq config show agent commands
         Service Key               Period  Active       Command
        -----------------------  --------  --------  ---------------------------------------------------------------------
        bgp-neighbors                  60  yes       ['/usr/bin/vtysh', '-c', 'show ip bgp vrf all neighbors json']
        evpn-vni                       60  yes       ['/usr/bin/vtysh', '-c', 'show bgp l2vpn evpn vni json']
        lldp-json                     120  yes       /usr/sbin/lldpctl -f json
        clagctl-json                   60  yes       /usr/bin/clagctl -j
        dpkg-query                  21600  yes       dpkg-query --show -f ${Package},${Version},${Status}\n
        ptmctl-json                   120  yes       ptmctl
        mstpctl-bridge-json            60  yes       /sbin/mstpctl showall json
        ports                        3600  yes       Netq Predefined Command
        proc-net-dev                   30  yes       Netq Predefined Command
        agent_stats                   300  yes       Netq Predefined Command
        agent_util_stats               30  yes       Netq Predefined Command
        tcam-resource-json            120  yes       /usr/cumulus/bin/cl-resource-query -j
        btrfs-json                   1800  yes       /sbin/btrfs fi usage -b /
        config-mon-json               120  yes       Netq Predefined Command
        running-config-mon-json        30  yes       Netq Predefined Command
        cl-support-json               180  yes       Netq Predefined Command
        resource-util-json            120  yes       findmnt / -n -o FS-OPTIONS
        smonctl-json                   30  yes       /usr/sbin/smonctl -j
        ssd-util-json               86400  yes       sudo /usr/sbin/smartctl -a /dev/sda
        ospf-neighbor-json             60  yes       ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all neighbor detail json']
        ospf-interface-json            60  yes       ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all interface json']
        

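        The NetQ predefined commands are the entries that show Netq Predefined Command in the Command column: ports, proc-net-dev, agent_stats, agent_util_stats, config-mon-json, running-config-mon-json, and cl-support-json.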

        Modify the Polling Frequency

        You can change the polling frequency (in seconds) of a modular command. For example, to change the polling frequency of the lldp-json command to 60 seconds from its default of 120 seconds, run:

        cumulus@switch:~$ sudo netq config add agent command service-key lldp-json poll-period 60
        Successfully added/modified Command service lldpd command /usr/sbin/lldpctl -f json
        
        cumulus@switch:~$ sudo netq config show agent commands
         Service Key               Period  Active       Command
        -----------------------  --------  --------  ---------------------------------------------------------------------
        bgp-neighbors                  60  yes       ['/usr/bin/vtysh', '-c', 'show ip bgp vrf all neighbors json']
        evpn-vni                       60  yes       ['/usr/bin/vtysh', '-c', 'show bgp l2vpn evpn vni json']
        lldp-json                      60  yes       /usr/sbin/lldpctl -f json
        clagctl-json                   60  yes       /usr/bin/clagctl -j
        dpkg-query                  21600  yes       dpkg-query --show -f ${Package},${Version},${Status}\n
        ptmctl-json                   120  yes       /usr/bin/ptmctl -d -j
        mstpctl-bridge-json            60  yes       /sbin/mstpctl showall json
        ports                        3600  yes       Netq Predefined Command
        proc-net-dev                   30  yes       Netq Predefined Command
        agent_stats                   300  yes       Netq Predefined Command
        agent_util_stats               30  yes       Netq Predefined Command
        tcam-resource-json            120  yes       /usr/cumulus/bin/cl-resource-query -j
        btrfs-json                   1800  yes       /sbin/btrfs fi usage -b /
        config-mon-json               120  yes       Netq Predefined Command
        running-config-mon-json        30  yes       Netq Predefined Command
        cl-support-json               180  yes       Netq Predefined Command
        resource-util-json            120  yes       findmnt / -n -o FS-OPTIONS
        smonctl-json                   30  yes       /usr/sbin/smonctl -j
        ssd-util-json               86400  yes       sudo /usr/sbin/smartctl -a /dev/sda
        ospf-neighbor-json             60  no        ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all neighbor detail json']
        ospf-interface-json            60  no        ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all interface json']
        

        Disable a Command

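        You can disable unnecessary commands to reduce the compute resources the NetQ Agent consumes on the switch. For example, if your network does not run OSPF, you can disable the two OSPF commands, ospf-neighbor-json and ospf-interface-json. The following example disables ospf-neighbor-json; run the same command with the ospf-interface-json service key to disable the second command: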

        cumulus@switch:~$ sudo netq config add agent command service-key ospf-neighbor-json enable False
        Command Service ospf-neighbor-json is disabled
        
        cumulus@switch:~$ sudo netq config show agent commands
         Service Key               Period  Active       Command
        -----------------------  --------  --------  ---------------------------------------------------------------------
        bgp-neighbors                  60  yes       ['/usr/bin/vtysh', '-c', 'show ip bgp vrf all neighbors json']
        evpn-vni                       60  yes       ['/usr/bin/vtysh', '-c', 'show bgp l2vpn evpn vni json']
        lldp-json                      60  yes       /usr/sbin/lldpctl -f json
        clagctl-json                   60  yes       /usr/bin/clagctl -j
        dpkg-query                  21600  yes       dpkg-query --show -f ${Package},${Version},${Status}\n
        ptmctl-json                   120  yes       /usr/bin/ptmctl -d -j
        mstpctl-bridge-json            60  yes       /sbin/mstpctl showall json
        ports                        3600  yes       Netq Predefined Command
        proc-net-dev                   30  yes       Netq Predefined Command
        agent_stats                   300  yes       Netq Predefined Command
        agent_util_stats               30  yes       Netq Predefined Command
        tcam-resource-json            120  yes       /usr/cumulus/bin/cl-resource-query -j
        btrfs-json                   1800  yes       /sbin/btrfs fi usage -b /
        config-mon-json               120  yes       Netq Predefined Command
        running-config-mon-json        30  yes       Netq Predefined Command
        cl-support-json               180  yes       Netq Predefined Command
        resource-util-json            120  yes       findmnt / -n -o FS-OPTIONS
        smonctl-json                   30  yes       /usr/sbin/smonctl -j
        ssd-util-json               86400  yes       sudo /usr/sbin/smartctl -a /dev/sda
        ospf-neighbor-json             60  no        ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all neighbor detail json']
        ospf-interface-json            60  no        ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all interface json']
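
        To check from a script which commands are disabled, you can parse the text of the table shown above. The following Python sketch is one way to do that. It assumes the four-column layout in the output above and service keys without spaces. The function name and the shortened sample text are illustrative.

```python
def parse_agent_commands(output):
    """Parse the text of `netq config show agent commands` into a list of dicts."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        # Skip the header, the dashed separator, and blank lines.
        if len(parts) < 4 or not parts[1].isdigit():
            continue
        key, period, active, command = parts
        rows.append(
            {"key": key, "period": int(period), "active": active == "yes", "command": command}
        )
    return rows


# Shortened copy of the output shown above.
SAMPLE = """\
 Service Key               Period  Active       Command
-----------------------  --------  --------  ---------------------------------------------------------------------
lldp-json                      60  yes       /usr/sbin/lldpctl -f json
ports                        3600  yes       Netq Predefined Command
ospf-neighbor-json             60  no        ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all neighbor detail json']
ospf-interface-json            60  no        ['/usr/bin/vtysh', '-c', 'show ip ospf vrf all interface json']
"""

disabled = [row["key"] for row in parse_agent_commands(SAMPLE) if not row["active"]]
print(disabled)  # ['ospf-neighbor-json', 'ospf-interface-json']
```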
        

        Reset to Default

        To revert to the original command settings, run:

        cumulus@switch:~$ sudo netq config agent factory-reset commands
        Netq Command factory reset successful
        

        Networkwide Inventory

        Use the UI or CLI to monitor your network’s inventory of switches, hosts, NICs, and DPUs. The inventory includes a count for each device and information about the hardware and software components on individual switches, such as the operating system, motherboard, ASIC, microprocessor, disk, memory, fan, and power supply information.

        Networkwide Inventory Commands

        Several forms of this command are available based on the inventory component you’d like to view. See the command line reference for additional options, definitions, and examples.

        netq show inventory (brief | asic | board | cpu | disk | memory | os)
        

        View Networkwide Inventory in the UI

        To view the quantity of devices in your network, open the Inventory/Devices card. The medium-sized card displays operating system distribution across the network and the total number of devices in the network. Hover over the chart’s outer circle to view operating system distribution; hover over the chart’s inner circle to view device counts.

        medium inventory card displaying 5 switches, 3 hosts, and 1 DPU as a chart

        Expand to the large card for additional distribution info. By default, the Switches tab shows the total number of switches, ASIC vendors, OS versions, NetQ Agent versions, and platforms deployed across all your switches. You can hover over and select any of the segments in a component distribution chart to highlight and filter data, including:

        • Name or value of the component type, such as the version number or status
        • Total number of switches with a particular type of component deployed compared to the total number of switches
        • Percentage of this type as compared to all component types

        Expand the Inventory/Devices card to full-screen to view information for all switches, hosts, DPUs, and NICs in your network in a table where you can filter and export data:

        full-screen inventory/devices card displaying a list of switches

        Switch Inventory

        With the NetQ UI and NetQ CLI, you can monitor your inventory of switches across the network or individually. A user can view operating system, motherboard, ASIC, microprocessor, disk, memory, fan, and power supply information.

        For switch performance information, refer to Switch Monitoring.

        Switch Inventory Commands

        Several forms of this command are available based on the inventory component you’d like to view. See the command line reference for additional options, definitions, and examples.

        netq show inventory (brief | asic | board | cpu | disk | memory | license | os)
        

        To view Cumulus Linux OS versions supported on your switches, run netq show cl-manifest:

        netq show cl-manifest
        

        To view all installed software packages on your switches, run netq show cl-pkg-info:

        netq show cl-pkg-info
        

        To view recommended software package information for a switch, run netq <hostname> show recommended-pkg-version:

        netq <hostname> show recommended-pkg-version
        

        Cumulus Linux, SONiC, and NetQ run services to deliver the various features of these products. You can monitor their status using the netq show services command:

        netq show services
        

        View Switch Inventory in the UI

        Add the Inventory/Switches card to your workbench to monitor the hardware and software component inventory on switches running NetQ in your network. To add this card to your workbench, select Add card > Inventory > Inventory/Switches card > Open cards. Select the dropdown to view additional inventory information.

        medium switch card displaying disk information for 15 switches    

        View Distribution and Component Counts

        Open the large Inventory/Switches card to display more granular information about software and hardware distribution. By default, the card displays data for fresh switches. Select Rotten switches from the dropdown to display information for switches that are in a down state. Hover over the top of the card and select a category to restrict the view to ASICs, platform, or software.

        switch software and hardware information

        Expand the Inventory/Switches card to full-screen to view, filter, or export information about ASICs, motherboards, CPUs, memory, disks, and operating system.

        Decommission a Switch

        Decommissioning a switch or host removes information about the switch or host from the NetQ database. When the NetQ Agent restarts at a later date, it sends a connection request back to the database, so NetQ can monitor the switch or host again.

        1. Locate the Inventory/Switches card on your workbench and expand it to full-screen.

        2. Select the switches to decommission, then select Decommission device above the table.

          If you attempt to decommission a switch that is assigned a default, unmodified access profile, the process will fail. Create a unique access profile (or update the default with unique credentials), then attach the profile to the switch you want to decommission.

        3. Confirm the devices you want to decommission.

        4. Wait for the decommission process to complete, then select Done.

        To decommission a switch or host using the CLI:

        1. On the given switch or host, stop and disable the NetQ Agent service:

          cumulus@switch:~$ sudo systemctl stop netq-agent
          cumulus@switch:~$ sudo systemctl disable netq-agent
          
        2. On the NetQ appliance or VM, decommission the switch or host:

          cumulus@netq-appliance:~$ netq decommission <hostname-to-decommission>
          

        Host Inventory

        In the UI, you can view your inventory of hosts across the network or individually, including a host’s operating system, ASIC, CPU model, disk, platform, and memory information.

        Access and View Host Inventory Data

        The Inventory/Hosts card monitors the hardware- and software-component inventory on hosts running NetQ in your network. To add this card to your workbench, select Add card > Inventory > Inventory/Hosts card > Open cards.

        host inventory card with chart

        Hover over the chart in the default card view to view component details. To view the distribution of components, hover over the card header and increase the card’s size. Select the corresponding icon to view a detailed chart for ASIC, platform, or software components:

        host inventory card displaying component distribution

        To display detailed information as a table, expand the card to its largest size:

        fully expanded host inventory card displaying table with hosts information

        Decommission a Host

        Decommissioning hosts removes information about the host from the NetQ database. The NetQ Agent must be disabled and in a ‘rotten’ state to complete the decommissioning process.

        1. Locate the Inventory/Devices card on your workbench and expand it to full-screen.

        2. From the Hosts tab, locate the Agent state column.

        list of hosts displaying a fresh netq agent

        If the NetQ Agent is in a ‘fresh’ state, you must stop and disable the NetQ Agent and wait until it reflects a ‘rotten’ state. To disable the agent, run the following commands on the host you want to decommission:

        cumulus@host:~$ sudo systemctl stop netq-agent
        cumulus@host:~$ sudo systemctl disable netq-agent
        

        It may take a few minutes for the agent’s new state to be reflected in the UI.

        3. After you have confirmed that the agent is in a ‘rotten’ state, select the host you’d like to decommission, then select Decommission device above the table.

        To decommission a host using the CLI:

        1. Stop and disable the NetQ Agent service on the host:

          cumulus@host:~$ sudo systemctl stop netq-agent
          cumulus@host:~$ sudo systemctl disable netq-agent
          
        2. On the NetQ appliance or VM, decommission the host:

          cumulus@netq-appliance:~$ netq decommission <hostname-to-decommission>
          

        Validation Checks

        When you discover operational anomalies, you can check whether the devices, hosts, network protocols, and services are operating as expected. NetQ lets you see when changes have occurred to the network, devices, and interfaces by viewing their operation, configuration, and status at an earlier point in time.

        Validation support is available in the NetQ UI and the NetQ CLI for the following:

        Item NetQ UI NetQ CLI
        Addresses Yes Yes
        Agents Yes Yes
        BGP Yes Yes
        Cumulus Linux version No Yes
        EVPN Yes Yes
        Interfaces Yes Yes
        MLAG (CLAG) Yes Yes
        MTU Yes Yes
        NTP Yes Yes
        OSPF Yes Yes
        RoCE Yes Yes
        Sensors Yes Yes
        VLAN Yes Yes
        VXLAN Yes Yes

        View and Run Validations in the UI

        The Validation Summary card displays a summary of validation checks from the past 24 hours. Select Validation in the header to create or schedule new validation checks, as well as view previous checks.

        Validation with the NetQ CLI

        The NetQ CLI uses the netq check commands to validate the various elements of your network fabric, looking for inconsistencies in configuration across your fabric, connectivity faults, missing configurations, and so forth. You can run commands from any node in the network.

        View Default Validation Tests

        To view the list of tests run for a given protocol or service by default, use either netq show unit-tests <protocol/service> or perform a tab completion on netq check <protocol/service> [include|exclude]. Refer to Validation Tests Reference for a description of the individual tests.

        Select Which Tests to Run

        You can include or exclude one or more of the tests performed during a validation. Each test is assigned a number that identifies it. By default, all tests run. Use the <protocol-number-range-list> value with the include and exclude options to indicate which tests to include or exclude. The value is a comma-separated list of numbers, a range using a dash, or a combination of these. Do not use spaces after commas. For example, include 0,2 runs only tests 0 and 2.

        The output indicates whether a given test passed, failed, or was skipped.
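
        The list syntax described above can be sketched in Python as follows. This only mirrors the documented syntax and is not the parser NetQ uses. The function name is illustrative, and treating a dash range as inclusive of both ends is an assumption.

```python
def parse_test_list(spec):
    """Expand a list such as '0,2-4' into the sorted test numbers [0, 2, 3, 4]."""
    if " " in spec:
        raise ValueError("spaces are not allowed in the test list")
    numbers = set()
    for part in spec.split(","):
        if "-" in part:
            # Assumption: a range includes both its start and end numbers.
            start, end = part.split("-")
            numbers.update(range(int(start), int(end) + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


print(parse_test_list("0,2"))    # [0, 2]
print(parse_test_list("0,2-4"))  # [0, 2, 3, 4]
```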

        Example Validation Test

        The following example shows a BGP validation that includes only the session establishment and router ID tests. Note that you can obtain the same results using either of the include or exclude options and that the test that is not run is marked as skipped.

        cumulus@switch:~$ netq show unit-tests bgp
           0 : Session Establishment     - check if BGP session is in established state
           1 : Address Families          - check if tx and rx address family advertisement is consistent between peers of a BGP session
           2 : Router ID                 - check for BGP router id conflict in the network
        
        Configured global result filters:
        Configured per test result filters:
        
        cumulus@switch:~$ netq check bgp include 0,2
        bgp check result summary:
        
        Total nodes         : 10
        Checked nodes       : 10
        Failed nodes        : 0
        Rotten nodes        : 0
        Warning nodes       : 0
        
        Additional summary:
        Total Sessions      : 54
        Failed Sessions     : 0
        
        Session Establishment Test   : passed
        Address Families Test        : skipped
        Router ID Test               : passed
        
        cumulus@switch:~$ netq check bgp exclude 1
        bgp check result summary:
        
        Total nodes         : 10
        Checked nodes       : 10
        Failed nodes        : 0
        Rotten nodes        : 0
        Warning nodes       : 0
        
        Additional summary:
        Total Sessions      : 54
        Failed Sessions     : 0
        
        Session Establishment Test   : passed
        Address Families Test        : skipped
        Router ID Test               : passed
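
        If a script consumes this text output, the label and value pairs can be collected into a dictionary. The following Python sketch assumes the layout shown above, with a label, a colon, and a value on each result line. The function name is illustrative.

```python
def parse_check_summary(output):
    """Collect the 'label : value' lines of a netq check summary into a dict."""
    summary = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Section titles such as 'Additional summary:' have no value and are skipped.
        if sep and value.strip():
            summary[key.strip()] = value.strip()
    return summary


# Copy of the output shown above.
SAMPLE = """\
bgp check result summary:

Total nodes         : 10
Checked nodes       : 10
Failed nodes        : 0
Rotten nodes        : 0
Warning nodes       : 0

Additional summary:
Total Sessions      : 54
Failed Sessions     : 0

Session Establishment Test   : passed
Address Families Test        : skipped
Router ID Test               : passed
"""

summary = parse_check_summary(SAMPLE)
skipped = [name for name, result in summary.items() if result == "skipped"]
print(skipped)  # ['Address Families Test']
```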
        

        Validation Check Result Filtering

        You can create filters to suppress false alarms or uninteresting errors and warnings. For example, certain configurations permit a singly connected MLAG bond, which generates a standard error that is not useful.

        Filtered errors and warnings related to validation checks do not generate notifications and are not counted in event totals. They are counted as part of suppressed notifications instead.

        You define these filters in the /etc/netq/check-filter.yml file. You can create a rule for individual check commands or you can create a global rule that applies to all tests run by the check command. Additionally, you can create a rule specific to a particular test run by the check command.

        Each rule must contain at least one match criterion and an action response. The only action currently available is filter. The match can comprise multiple criteria, one per line, creating a logical AND. You can match against any column in the validation check output. The match criteria values must match the case and spacing of the column names in the corresponding netq check output and are parsed as regular expressions.

        This example shows a global rule for the BGP checks that suppresses any events generated by the DataVrf virtual routing and forwarding (VRF) interface coming from swp3 or swp7. It also shows a test-specific rule to filter all Address Families events from devices with hostnames starting with exit1 or containing firewall.

        bgp:
            global:
                - rule:
                    match:
                        VRF: DataVrf
                        Peer Name: (swp3|swp7.)
                    action:
                        filter
            tests:
                Address Families:
                    - rule:
                        match:
                            Hostname: (^exit1|firewall)
                        action:
                            filter
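
        The following Python sketch illustrates how a rule with several match criteria combines them as a logical AND. It is a mental model only. It assumes that each regular expression can match anywhere in the column value, which might differ from how NetQ evaluates the patterns. The result rows are hypothetical.

```python
import re


def rule_matches(match, row):
    """Return True when every column pattern in the rule matches the row (logical AND)."""
    return all(
        re.search(pattern, row.get(column, "")) is not None
        for column, pattern in match.items()
    )


# Match section of the global BGP rule shown above.
global_match = {"VRF": "DataVrf", "Peer Name": "(swp3|swp7.)"}

# Hypothetical result rows; the column names follow the rule above.
rows = [
    {"Hostname": "leaf01", "VRF": "DataVrf", "Peer Name": "swp3"},
    {"Hostname": "leaf01", "VRF": "DataVrf", "Peer Name": "swp5"},
    {"Hostname": "leaf02", "VRF": "default", "Peer Name": "swp3"},
]

kept = [row for row in rows if not rule_matches(global_match, row)]
print([(row["VRF"], row["Peer Name"]) for row in kept])
```

        Only the first row satisfies both criteria, so it is the only row filtered out. Under the same assumption, the pattern swp7. needs one more character after swp7, because a dot in a regular expression matches any single character.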
        

        Create Filters for Provisioning Exceptions

        You can configure filters that change validation errors to warnings when those errors result only from the default expectations of the netq check commands. This applies to all protocols and services, except for agents. For example, if you provision BGP with configurations where a BGP peer is not expected or desired, NetQ reports errors that a BGP peer is missing. By creating a filter, you can replace the error with a warning.

        To create a validation filter:

        1. Navigate to the /etc/netq directory.

        2. Create or open the check_filter.yml file using your text editor of choice.

          This file contains the syntax to follow to create one or more rules for one or more protocols or services. Create your own rules, and/or edit and un-comment any example rules you would like to use.

          # Netq check result filter rule definition file.  This is for filtering
          # results based on regex match on one or more columns of each test result.
          # Currently, only action 'filter' is supported. Each test can have one or
          # more rules, and each rule can match on one or more columns.  In addition,
          # rules can also be optionally defined under the 'global' section and will
          # apply to all tests of a check.
          #
          # syntax:
          #
          # <check name>:
          #   tests:
          #     <test name, as shown in test list when using the include/exclude and tab>:
          #       - rule:
          #           match:
          #             <column name>: regex
          #             <more columns and regex.., result is AND>
          #           action:
          #             filter
          #       - <more rules..>
          #   global:
          #     - rule:
          #         . . .
          #     - rule:
          #         . . .
          #
          # <another check name>:
          #   . . .
          #
          # e.g.
          #
          # bgp:
          #   tests:
          #     Address Families:
          #       - rule:
          #           match:
          #             Hostname: (^exit*|^firewall)
          #             VRF: DataVrf1080
          #             Reason: AFI/SAFI evpn not activated on peer
          #           action:
          #             filter
          #       - rule:
          #           match:
          #             Hostname: exit-2
          #             Reason: SAFI evpn not activated on peer
          #           action:
          #             filter
          #     Router ID:
          #       - rule:
          #           match:
          #             Hostname: exit-2
          #           action:
          #             filter
          #
          # evpn:
          #   tests:
          #     EVPN Type 2:
          #       - rule:
          #           match:
          #             Hostname: exit-1
          #           action:
          #             filter
          #
          

        Use Validation Commands in Scripts

        If you are running scripts based on the older version of the netq check commands and want to stay with the old output, edit the netq.yml file to include old-check: true in the netq-cli section of the file. For example:

        netq-cli:
          port: 32708
          server: 127.0.0.1
          old-check: true
        

        Then run netq config restart cli to apply the change.

        If you update your scripts to work with the new version of the commands, change the old-check value to false or remove it. Then restart the CLI.

        DPU Inventory

        Use the UI or CLI to view your data processing unit (DPU) inventory. For DPU performance information, refer to DPU Monitoring.

        You must install and configure DOCA Telemetry Service to display DPU data in NetQ.

        DPU Inventory Commands

        Several forms of this command are available based on the inventory component you’d like to view. See the command line reference for additional options, definitions, and examples.

        netq show inventory (brief | asic | board | cpu | disk | memory | license | os)
        

        View DPU Inventory in the UI

        The Inventory/DPU card displays the hardware- and software-component inventory on DPUs running NetQ in your network, including operating system, ASIC, CPU model, disk, platform, and memory information.

        To add this card to your workbench, select Add card > Inventory > Inventory/DPU card > Open cards.

        DPU inventory card with chart

        Hover over the chart to view component details. To view the distribution of components, hover over the card header and increase the card’s size. Select the corresponding icon to view a detailed chart for ASIC, platform, or software components:

        medium DPU inventory card displaying component distribution

        Expand the card to its largest size to view, filter, and export detailed information:

        fully expanded DPU inventory card displaying a table with data

        Decommission a DPU

        Decommissioning DPUs removes information about the DPU from the NetQ database. The NetQ Agent must be disabled and in a ‘rotten’ state to complete the decommissioning process.

        1. Locate the Inventory/Devices card on your workbench and expand it to full-screen.

        2. From the DPUs tab, locate the Agent state column.

        list of DPUs displaying a fresh agent

        If the NetQ Agent is in a ‘fresh’ state, you must stop and disable the NetQ Agent and wait until it reflects a ‘rotten’ state. To disable the agent, run the following command on the DPU you want to decommission. Replace <netq_server> with the IP address of your NetQ VM:

        sed -i s'/<netq_server>/127.0.0.1/g' /etc/kubelet.d/doca_telemetry_standalone.yaml
        
        3. After you have confirmed that the agent is in a ‘rotten’ state, select the DPU you’d like to decommission, then select Decommission device above the table.

        To decommission a DPU using the CLI:

        1. Stop and disable the NetQ Agent service on the DPU. Replace <netq_server> with the IP address of your NetQ VM:

          sed -i s'/<netq_server>/127.0.0.1/g' /etc/kubelet.d/doca_telemetry_standalone.yaml
          
        2. On the NetQ appliance or VM, decommission the DPU:

          cumulus@netq-appliance:~$ netq decommission <hostname-to-decommission>
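
        The sed substitution in the steps above amounts to replacing the NetQ server address with the loopback address. The following Python sketch shows the same replacement on a string. The file content and the address are hypothetical because the real file layout is not shown here. The sketch escapes the address so that its dots match literally. In an unescaped pattern, a dot matches any character.

```python
import re


def point_to_loopback(config_text, netq_server):
    """Replace every occurrence of the NetQ server address with 127.0.0.1."""
    # re.escape makes the dots in the address match literally.
    return re.sub(re.escape(netq_server), "127.0.0.1", config_text)


# Hypothetical file content and address, for illustration only.
before = "endpoint: 10.1.1.5:30001\n"
print(point_to_loopback(before, "10.1.1.5"), end="")
```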
          

        To read more about NVIDIA BlueField DPUs and the DOCA Telemetry Service, refer to the DOCA SDK Documentation.

        NIC Inventory

        Use the UI or CLI to view your network interface controller (NIC) inventory. For NIC performance information, refer to NIC Monitoring.

        NIC telemetry for ConnectX adapters is supported for on-premises NetQ deployments. You must have DOCA Telemetry Service enabled and Prometheus targets configured to display NIC data in NetQ.

        NIC Inventory Commands

        Run the netq show inventory brief command to display an inventory summary, including a list of NICs.

        netq show inventory brief
        

        View NIC Inventory in the UI

        The Inventory/NIC card displays the hardware- and software-component inventory on NICs running NetQ in your network, including connection adapters and firmware versions.

        To add this card to your workbench, select Add card > Inventory > Inventory/NICs card > Open cards. Select the dropdown on the card to display either connection adapters or firmware versions.

        NIC inventory card displaying firmware version

        Expand the card to full-screen to view a list of hosts and their associated NICs:

        fullscreen NIC inventory card displaying hosts and their associated NICs

        To view data from an individual NIC, select it from the table, then select Add card above the table. An individual NIC monitoring card opens on your workbench, displaying ports, packets, and bytes information:

        You can expand this card to large or full-screen to view detailed interface statistics, including frame and carrier errors.

        Decommission a NIC

        Decommissioning removes information about the NIC from the NetQ database.

        1. Stop the DTS container on the NIC’s host with the following command:

          docker stop doca_telemetry
          
        2. Locate the Inventory/Devices card on your workbench and expand it to full-screen.

        3. Navigate to the NICs tab.

        list of nics displaying a rotten netq agent
        4. Select the NIC you’d like to decommission, then select Decommission device above the table.

        To decommission a NIC using the CLI:

        1. Stop the DTS container on the NIC’s host with the following command:

          docker stop doca_telemetry
          
        2. On the NetQ appliance or VM, decommission the NIC:

          cumulus@netq-appliance:~$ netq decommission '<hostname-to-decommission>;<NIC-guid>'
          

        Either obtain the NIC guid from the NetQ UI in the full-screen NIC Inventory card, or use tab completion with the netq decommission <hostname> command to view the NIC guids.
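
        The single quotes around the hostname and GUID matter because a shell treats an unquoted semicolon as a command separator. If you generate the command from a script, the following Python sketch shows one way to build it with safe quoting. The function name, hostname, and GUID are hypothetical.

```python
import shlex


def decommission_command(hostname, nic_guid):
    """Build the quoted `netq decommission` command line for one NIC."""
    target = f"{hostname};{nic_guid}"
    # shlex.join quotes any argument that contains shell metacharacters such as ';'.
    return shlex.join(["netq", "decommission", target])


# Hypothetical hostname and GUID, for illustration only.
print(decommission_command("server01", "0x1234abcd"))
# netq decommission 'server01;0x1234abcd'
```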

        Device Groups

        Device groups allow you to create a label for a subset of devices in the inventory. You can configure validation checks to run on select devices by referencing group names.

        Create a Device Group

        To create a device group, add the Device Groups card to your workbench. In the header, click Open card. Select the Device groups card:

        The Device Groups card now appears on your workbench. Select Create new group and follow the instructions in the UI to create a new group:

        1. Enter a name for the group.

        2. Create a hostname-based rule to define which devices in the inventory should be added to the group.

        3. Confirm the expected matched devices appear in the inventory, and click Create device group.

        The following example shows a group name of “exit group” matching any device in the inventory with “exit” in the hostname:

        Update a Device Group

        When new devices that match existing group rules are added to the inventory, NetQ flags the matching devices for review. The following example shows the switch “exit-2” detected in the inventory after the group was configured:

        To add the new device to the group inventory, click Add device and then click Update device group.
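
The rule matching and review behavior described above can be pictured with a short Python sketch. It is illustrative only: the function names and the sample inventory are hypothetical, and it assumes a plain substring match on the hostname, which may not reflect every match type the NetQ UI offers.

```python
def matching_devices(inventory: list[str], pattern: str) -> list[str]:
    """Return hostnames that contain the pattern, in inventory order."""
    return [hostname for hostname in inventory if pattern in hostname]


def devices_to_review(
    inventory: list[str], group_members: list[str], pattern: str
) -> list[str]:
    """Return matching hostnames that are not yet members of the group."""
    members = set(group_members)
    return [h for h in matching_devices(inventory, pattern) if h not in members]


# Hypothetical inventory when the group is created.
group = matching_devices(["leaf01", "exit-1", "spine01"], "exit")

# Later, a new switch joins the inventory and is flagged for review.
pending = devices_to_review(["leaf01", "exit-1", "spine01", "exit-2"], group, "exit")
print(group, pending)
```

In this model a newly matched device stays out of the group until someone adds it, which mirrors the Add device and Update device group steps.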

        Delete a Device Group

        To delete a device group:

        1. Expand the Device Groups card:
        2. Click Menu on the desired group and select Delete.

        Monitor Events

        Use the UI or CLI to monitor events: you can view all events across the entire network or all events on a device, then filter events according to their type, severity, or time frame. Event querying is supported for a 72-hour window within the past 30 days.
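
The query limits can be expressed as a simple check. The Python sketch below is illustrative and not part of NetQ. The function name is hypothetical, and it assumes that both limits are inclusive and that the whole window must fall within the past 30 days.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(hours=72)  # longest span a single query can cover
MAX_AGE = timedelta(days=30)      # how far back the window can start


def is_valid_event_query(start: datetime, end: datetime, now: datetime) -> bool:
    """Check a query window against the 72-hour and 30-day limits."""
    if start >= end or end > now:
        return False
    if end - start > MAX_WINDOW:
        return False
    return now - start <= MAX_AGE


now = datetime(2024, 1, 31, 12, 0)
print(is_valid_event_query(now - timedelta(hours=72), now, now))
print(is_valid_event_query(now - timedelta(hours=73), now, now))
```

A window that starts more than 30 days ago is rejected even if it is shorter than 72 hours.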

        Note that in the UI, it can take several minutes for NetQ to process and accurately display network events. The delay is caused by events with multiple network dependencies. It takes between 5 and 10 minutes for NetQ to consolidate and display these events.

        Refer to Configure System Event Notifications and Configure and Monitor Threshold-Crossing Events for information about configuring third-party applications to broadcast NetQ events.

        Event Commands

        Monitor events with the following command. See the command line reference for additional options, definitions, and examples.

        netq show events
        

        Monitor Events in the UI

        Expand the Menu, then select Events.

        The dashboard presents a timeline of events alongside the devices that are causing the most events.

        Events dashboard with networkwide error and info events.

        Use the controls above the summary to filter events by time, device (hostname), type, severity, or state.

        Select the tabs below the controls to display all events networkwide, interface events, network services events, system events, or threshold-crossing events. The charts and tables update according to the tab you’ve selected. In this example, the TCA tab is selected; the chart and tables update to reflect only threshold-crossing events:

        Events dashboard with networkwide error and info events.

        Events are also generated when streaming validation checks detect a failure. If an event is generated from a failed validation check, it will be marked resolved automatically the next time the check runs successfully.

        Suppress Events

        If you are receiving too many event notifications, you can create rules to suppress events. You can also create rules to suppress events attributable to known issues or false alarms. In addition to the rules you create to suppress events, NetQ suppresses some events by default.

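        You can suppress events for the following types of messages: agent, bgp, btrfsinfo, clag, clsupport, configdiff, evpn, link, ntp, ospf, sensor, services, and ssdutil.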

        NetQ suppresses BGP, EVPN, link, and sensor-related events with a severity level of 'info' by default in the UI. You can disable this rule if you'd prefer to receive these notifications.

        Create an Event Suppression Configuration

        To suppress events using the NetQ UI:

        1. Click Menu, then Events.
        2. In the top-right corner, select Show suppression rules.
        3. Select Add rule. You can configure individual suppression rules or you can create a group rule that suppresses events for all message types.
        4. Enter the suppression rule parameters and click Create.

        When you add a new configuration using the CLI, you can specify a scope, which limits the suppression in the following order:

        1. Hostname.
        2. Severity.
        3. Message type-specific filters. For example, the target VNI for EVPN messages, or the interface name for a link message.

        NetQ has a predefined set of filter conditions. To see these conditions, run netq show events-config show-filter-conditions:

        cumulus@switch:~$ netq show events-config show-filter-conditions
        Matching config_events records:
        Message Name             Filter Condition Name                      Filter Condition Hierarchy                           Filter Condition Description
        ------------------------ ------------------------------------------ ---------------------------------------------------- --------------------------------------------------------
        evpn                     vni                                        3                                                    Target VNI
        evpn                     severity                                   2                                                    Severity error/info
        evpn                     hostname                                   1                                                    Target Hostname
        clsupport                fileAbsName                                3                                                    Target File Absolute Name
        clsupport                severity                                   2                                                    Severity error/info
        clsupport                hostname                                   1                                                    Target Hostname
        link                     new_state                                  4                                                    up / down
        link                     ifname                                     3                                                    Target Ifname
        link                     severity                                   2                                                    Severity error/info
        link                     hostname                                   1                                                    Target Hostname
        ospf                     ifname                                     3                                                    Target Ifname
        ospf                     severity                                   2                                                    Severity error/info
        ospf                     hostname                                   1                                                    Target Hostname
        sensor                   new_s_state                                4                                                    New Sensor State Eg. ok
        sensor                   sensor                                     3                                                    Target Sensor Name Eg. Fan, Temp
        sensor                   severity                                   2                                                    Severity error/info
        sensor                   hostname                                   1                                                    Target Hostname
        configdiff               old_state                                  5                                                    Old State
        configdiff               new_state                                  4                                                    New State
        configdiff               type                                       3                                                    File Name
        configdiff               severity                                   2                                                    Severity error/info
        configdiff               hostname                                   1                                                    Target Hostname
        ssdutil                  info                                       3                                                    low health / significant health drop
        ssdutil                  severity                                   2                                                    Severity error/info
        ssdutil                  hostname                                   1                                                    Target Hostname
        agent                    db_state                                   3                                                    Database State
        agent                    severity                                   2                                                    Severity error/info
        agent                    hostname                                   1                                                    Target Hostname
        ntp                      new_state                                  3                                                    yes / no
        ntp                      severity                                   2                                                    Severity error/info
        ntp                      hostname                                   1                                                    Target Hostname
        bgp                      vrf                                        4                                                    Target VRF
        bgp                      peer                                       3                                                    Target Peer
        bgp                      severity                                   2                                                    Severity error/info
        bgp                      hostname                                   1                                                    Target Hostname
        services                 new_status                                 4                                                    active / inactive
        services                 name                                       3                                                    Target Service Name Eg.netqd, mstpd, zebra
        services                 severity                                   2                                                    Severity error/info
        services                 hostname                                   1                                                    Target Hostname
        btrfsinfo                info                                       3                                                    high btrfs allocation space / data storage efficiency
        btrfsinfo                severity                                   2                                                    Severity error/info
        btrfsinfo                hostname                                   1                                                    Target Hostname
        clag                     severity                                   2                                                    Severity error/info
        clag                     hostname                                   1                                                    Target Hostname
        

        For example, to create a configuration called mybtrfs that suppresses OSPF-related events on leaf01 for the next 10 minutes, run:

        netq add events-config events_config_name mybtrfs message_type ospf scope '[{"scope_name":"hostname","scope_value":"leaf01"},{"scope_name":"severity","scope_value":"*"}]' suppress_until 600
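
The scope argument is a JSON list of scope_name and scope_value pairs, which is easy to mistype by hand. The following Python sketch is illustrative and not part of NetQ. The function name is hypothetical, and it assumes that suppress_until takes a duration in seconds, as the 10-minute example with a value of 600 suggests. It assembles the same command from a plain dictionary.

```python
import json
import shlex


def events_config_command(
    name: str, message_type: str, scope: dict, suppress_seconds: int
) -> str:
    """Build a netq add events-config command line from a scope mapping."""
    # Each filter condition becomes one {"scope_name", "scope_value"} object.
    scope_list = [
        {"scope_name": key, "scope_value": value} for key, value in scope.items()
    ]
    # Compact separators reproduce the spacing used in the example above.
    scope_json = json.dumps(scope_list, separators=(",", ":"))
    return (
        f"netq add events-config events_config_name {name} "
        f"message_type {message_type} scope {shlex.quote(scope_json)} "
        f"suppress_until {suppress_seconds}"
    )


cmd = events_config_command(
    "mybtrfs", "ospf", {"hostname": "leaf01", "severity": "*"}, 10 * 60
)
print(cmd)
```

Listing hostname before severity in the dictionary keeps the entries in the same order as the filter condition hierarchy.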
        

        Delete or Disable an Event Suppression Rule

        You can delete or disable suppression rules. After you delete a rule, event notifications will resume. Disabling suppression rules pauses those rules, allowing you to receive event notifications temporarily.

        To remove suppressed event configurations:

        1. Click Menu, then Events.
        2. Select Show suppression rules at the top of the page.
        3. Toggle between the Single and All tabs to view the suppression rules. Navigate to the rule you want to delete or disable.
        4. Click the three-dot menu and select Delete. To pause the rule instead of deleting it, click Disable.

        To remove an event suppression configuration, run netq del events-config events_config_id <text-events-config-id-anchor>.

        cumulus@switch:~$ netq del events-config events_config_id eventsconfig_10
        Successfully deleted Events Config eventsconfig_10
        

        Show Event Suppression Rules

        To view suppressed events:

        1. Click Menu, then Events.
        2. Select Show suppression rules at the top of the page.
        3. Toggle between the Single and All tabs to view individual and group rules, respectively.

        You can view all event suppression configurations, or you can filter by a specific configuration or message type.

        cumulus@switch:~$ netq show events-config events_config_id eventsconfig_1
        Matching config_events records:
        Events Config ID     Events Config Name   Message Type         Scope                                                        Active Suppress Until
        -------------------- -------------------- -------------------- ------------------------------------------------------------ ------ --------------------
        eventsconfig_1       job_cl_upgrade_2d89c agent                {"db_state":"*","hostname":"spine02","severity":"*"}         True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine02
        eventsconfig_1       job_cl_upgrade_2d89c bgp                  {"vrf":"*","peer":"*","hostname":"spine04","severity":"*"}   True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c btrfsinfo            {"hostname":"spine04","info":"*","severity":"*"}             True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c clag                 {"hostname":"spine04","severity":"*"}                        True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c clsupport            {"fileAbsName":"*","hostname":"spine04","severity":"*"}      True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c configdiff           {"new_state":"*","old_state":"*","type":"*","hostname":"spin True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                      e04","severity":"*"}                                                2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c evpn                 {"hostname":"spine04","vni":"*","severity":"*"}              True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c link                 {"ifname":"*","new_state":"*","hostname":"spine04","severity True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                      ":"*"}                                                              2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c ntp                  {"new_state":"*","hostname":"spine04","severity":"*"}        True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c ospf                 {"ifname":"*","hostname":"spine04","severity":"*"}           True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c sensor               {"sensor":"*","new_s_state":"*","hostname":"spine04","severi True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                      ty":"*"}                                                            2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c services             {"new_status":"*","name":"*","hostname":"spine04","severity" True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                      :"*"}                                                               2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_1       job_cl_upgrade_2d89c ssdutil              {"hostname":"spine04","info":"*","severity":"*"}             True   Tue Jul  7 16:16:20
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             spine04
        eventsconfig_10      job_cl_upgrade_2d89c btrfsinfo            {"hostname":"fw2","info":"*","severity":"*"}                 True   Tue Jul  7 16:16:22
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             fw2
        eventsconfig_10      job_cl_upgrade_2d89c clag                 {"hostname":"fw2","severity":"*"}                            True   Tue Jul  7 16:16:22
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             fw2
        eventsconfig_10      job_cl_upgrade_2d89c clsupport            {"fileAbsName":"*","hostname":"fw2","severity":"*"}          True   Tue Jul  7 16:16:22
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             fw2
        eventsconfig_10      job_cl_upgrade_2d89c link                 {"ifname":"*","new_state":"*","hostname":"fw2","severity":"* True   Tue Jul  7 16:16:22
                             21b3effd79796e585c35                      "}                                                                  2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             fw2
        eventsconfig_10      job_cl_upgrade_2d89c ospf                 {"ifname":"*","hostname":"fw2","severity":"*"}               True   Tue Jul  7 16:16:22
                             21b3effd79796e585c35                                                                                          2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             fw2
        eventsconfig_10      job_cl_upgrade_2d89c sensor               {"sensor":"*","new_s_state":"*","hostname":"fw2","severity": True   Tue Jul  7 16:16:22
                             21b3effd79796e585c35                      "*"}                                                                2020
                             096d5fc6cef32b463e37
                             cca88d8ee862ae104d5_
                             fw2
        

        When you filter for a message type, you must include the show-filter-conditions keyword to show the conditions associated with that message type and the hierarchy in which they get processed.

        cumulus@switch:~$ netq show events-config message_type evpn show-filter-conditions
        Matching config_events records:
        Message Name             Filter Condition Name                      Filter Condition Hierarchy                           Filter Condition Description
        ------------------------ ------------------------------------------ ---------------------------------------------------- --------------------------------------------------------
        evpn                     vni                                        3                                                    Target VNI
        evpn                     severity                                   2                                                    Severity error/info
        evpn                     hostname                                   1                                                    Target Hostname
        

        Configure System Event Notifications

        You can view system events via the NetQ UI or CLI. You can also receive event notifications via a third-party application. This page explains how to integrate NetQ with syslog, PagerDuty, Slack, or email to receive notifications about events on your network. Alternatively, you can send notifications to other third-party applications via a generic webhook channel.

        In an on-premises deployment, NetQ receives the raw data stream from the NetQ Agents, processes the data, then delivers events to notification channels. In a cloud deployment, NetQ passes the raw data stream to the NetQ Cloud service for processing and delivery.

        You can implement a proxy server (that sits between the NetQ appliance or VM and the integration channels) that receives, processes, and distributes the notifications rather than having them sent directly to the integration channel. If you use such a proxy, you must configure NetQ with the proxy information.

        NetQ generates notifications for network protocols, interfaces, services, traces, sensors, system software, and system hardware. Refer to the System Events Reference for descriptions and examples of these events.

        Event filters are based on rules you create. You must have at least one rule per filter. A select set of events can be triggered by a user-configured threshold. Refer to the Threshold-Crossing Events Reference for descriptions and examples of these events.

        Event Message Format

        Messages have the following structure: <message-type><timestamp><opid><hostname><severity><message>

        Element       Description
        ------------  -------------------------------------------------------------
        message type  Category of event
        timestamp     Date and time the event occurred
        opid          Identifier of the service or process that generated the event
        hostname      Hostname of the network device where the event occurred
        severity      Severity classification: error or info
        message       Text description of the event

        For example:

        To set up the integrations, you must configure NetQ with at least one channel, one rule, and one filter. To refine what messages you want to view and where to send them, you can add additional rules and filters and set thresholds on supported event types. You can also configure a proxy server to receive, process, and forward the messages. This is accomplished in the following order:

        Configure Basic NetQ Event Notifications

        The simplest configuration you can create is one that sends all events generated by all interfaces to a single notification application. A notification configuration must contain one channel, one rule, and one filter. Creation of the configuration follows this same path:

        1. Create a channel.
        2. Create a rule that accepts a selected set of events.
        3. Create a filter that associates this rule with the newly created channel.

        Create a Channel

        The first step is to create a Slack, PagerDuty, syslog, email, or generic channel to receive the notifications.

        You can use the NetQ UI or the NetQ CLI to create a Slack channel.

        1. Expand the Menu and select Notification channels.

        2. The Slack tab is displayed by default.

        3. Add a channel.

          • When no channels have been specified, click Add Slack channel.
          • When at least one channel has been specified, click Add above the table.
        4. Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.

        5. Create an incoming webhook as described in the Slack documentation, then copy and paste its URL into the Webhook URL field.

        6. Click Add.

        7. (Optional) To verify the channel configuration, click Test.

        To create and verify a Slack channel, run:

        netq add notification channel slack <text-channel-name> webhook <text-webhook-url> [severity info|severity error] [tag <text-slack-tag>]
        netq show notification channel [json]
        
        Option                      Description
        --------------------------  -----------
        <text-channel-name>         User-specified Slack channel name
        webhook <text-webhook-url>  Webhook URL for the desired channel. For example: https://hooks.slack.com/services/text/moretext/evenmoretext
        severity <level>            The log level, either info or error. The severity defaults to info if unspecified.
        tag <text-slack-tag>        Optional tag appended to the Slack notification to highlight particular channels or people. An @ sign must precede the tag value. For example, @netq-info.
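
The option rules above can be checked before a command is run. The Python sketch below is illustrative and not part of NetQ. The function name is hypothetical, and applying the UI's no-spaces naming rule to the CLI channel name is an assumption. It validates the inputs and assembles the command using only the documented syntax.

```python
import shlex


def slack_channel_command(name, webhook_url, severity=None, tag=None):
    """Build a netq add notification channel slack command line."""
    # Assumption: the UI's "no spaces in the name" rule also applies here.
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("channel name must not be empty or contain spaces")
    args = ["netq", "add", "notification", "channel", "slack",
            name, "webhook", webhook_url]
    if severity is not None:
        if severity not in ("info", "error"):
            raise ValueError("severity must be 'info' or 'error'")
        args += ["severity", severity]
    if tag is not None:
        if not tag.startswith("@"):
            raise ValueError("an @ sign must precede the tag value")
        args += ["tag", tag]
    # shlex.join quotes any argument that needs it for the shell.
    return shlex.join(args)


print(slack_channel_command(
    "slk-netq-events",
    "https://hooks.slack.com/services/text/moretext/evenmoretext",
    severity="error",
    tag="@netq-info",
))
```

When severity is omitted, the command leaves it out and NetQ applies the default of info.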

        The following example shows the creation of a slk-netq-events channel and verifies the configuration.

        1. Create an incoming webhook as described in the documentation for your version of Slack.

        2. Create the channel.

          cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
          Successfully added/updated channel slk-netq-events
          
        3. Verify the configuration.

          cumulus@switch:~$ netq show notification channel
          Matching config_notify records:
          Name            Type             Severity Channel Info
          --------------- ---------------- -------- ----------------------
          slk-netq-events slack            info     webhook:https://hooks.s
                                                      lack.com/services/text/
                                                      moretext/evenmoretext
          

        You can use the NetQ UI or the NetQ CLI to create a PagerDuty channel.

        1. Expand the Menu and select Notification channels.

        2. Click PagerDuty.

        3. Add a channel.

          • When no channels have been specified, click Add PagerDuty channel.
          • When at least one channel has been specified, click Add above the table.
        4. Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.

        5. Obtain and enter an integration key (also called a service key or routing key).

        6. Click Add.

        7. (Optional) To verify the channel configuration, click Test.

        To create and verify a PagerDuty channel, run:

        netq add notification channel pagerduty <text-channel-name> integration-key <text-integration-key> [severity info|severity error]
        netq show notification channel [json]
        
        Option                                  Description
        --------------------------------------  -----------
        <text-channel-name>                     User-specified PagerDuty channel name
        integration-key <text-integration-key>  The integration key is also called the service_key or routing_key. The default is an empty string ("").
        severity <level>                        (Optional) The log level, either info or error. The severity defaults to info if unspecified.

        The following example shows the creation of a pd-netq-events channel and verifies the configuration.

        1. Obtain an integration key as described in the PagerDuty support documentation.

        2. Create the channel.

          cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key c6d666e210a8425298ef7abde0d1998
          Successfully added/updated channel pd-netq-events
          
        3. Verify the configuration.

          cumulus@switch:~$ netq show notification channel
          Matching config_notify records:
          Name            Type             Severity         Channel Info
          --------------- ---------------- ---------------- ------------------------
          pd-netq-events  pagerduty        info             integration-key: c6d666e
                                                          210a8425298ef7abde0d1998
          

        You can use the NetQ UI or the NetQ CLI to create a syslog channel.

        1. Expand the Menu and select Notification channels.

        2. Click Syslog.

        3. Add a channel.

          • When no channels have been specified, click Add syslog channel.
          • When at least one channel has been specified, click Add above the table.
        4. Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.

        5. Enter the IP address and port of the syslog server.

        6. Click Add.

        7. (Optional) To verify the channel configuration, click Test.

        To create and verify a syslog channel, run:

        netq add notification channel syslog <text-channel-name> hostname <text-syslog-hostname> port <text-syslog-port> [severity info|severity error]
        netq show notification channel [json]
        
        Option Description
        <text-channel-name> User-specified syslog channel name
        hostname <text-syslog-hostname> Hostname or IP address of the syslog server to receive notifications
        port <text-syslog-port> Port on the syslog server to receive notifications
        severity <level> The log level, either info or error. The severity defaults to info if unspecified.

        The following example shows the creation of a syslog-netq-events channel and verifies the configuration.

        1. Obtain the syslog server hostname (or IP address) and port.

        2. Create the channel.

          cumulus@switch:~$ netq add notification channel syslog syslog-netq-events hostname syslog-server port 514
          Successfully added/updated channel syslog-netq-events
          
        3. Verify the configuration.

          cumulus@switch:~$ netq show notification channel
          Matching config_notify records:
          Name            Type             Severity Channel Info
          --------------- ---------------- -------- ----------------------
          syslog-netq-eve syslog           info     host:syslog-server
          nts                                       port: 514
          

        You can use the NetQ UI or the NetQ CLI to create an email channel.

        1. Expand the Menu and select Notification channels.

        2. Click Email.

        3. Add a channel.

          • When no channels have been specified, click Add email channel.
          • When at least one channel has been specified, click Add above the table.
        4. Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.

        5. Enter a list of emails for the people who you want to receive notifications from this channel.

          Enter the emails separated by commas, and no spaces. For example: user1@domain.com,user2@domain.com,user3@domain.com

        6. The first time you configure an email channel, you must also specify the SMTP server information:

          • Host: hostname or IP address of the SMTP server
          • Port: port of the SMTP server (typically 587)
          • User ID/Password: your administrative credentials
          • From: email address that indicates who sent the notifications

          After the first time, any additional email channels you create can use this configuration, by clicking Existing.

        7. Click Add.

        8. (Optional) To verify the channel configuration, click Test.

        To create and verify the specification of an email channel, run:

        netq add notification channel email <text-channel-name> to <text-email-toids> [smtpserver <text-email-hostname>] [smtpport <text-email-port>] [login <text-email-id>] [password <text-email-password>] [severity info | severity error ]
        netq add notification channel email <text-channel-name> to <text-email-toids>
        netq show notification channel [json]
        

        The configuration is different depending on whether you are using the on-premises or cloud version of NetQ. Do not configure SMTP for cloud deployments as the NetQ cloud service uses the NetQ SMTP server to push email notifications.

        For an on-premises deployment:

        1. Set up an SMTP server. The server can be internal or public.

        2. Create a user account (login and password) on the SMTP server. NetQ sends notifications to this address.

        3. Create the notification channel using this form of the CLI command:

          netq add notification channel email <text-channel-name> to <text-email-toids> [smtpserver <text-email-hostname>] [smtpport <text-email-port>] [login <text-email-id>] [password <text-email-password>] [severity info | severity error ]
          
        For example:
        cumulus@switch:~$ netq add notification channel email onprem-email to netq-notifications@domain.com smtpserver smtp.domain.com smtpport 587 login smtphostlogin@domain.com password MyPassword123
        Successfully added/updated channel onprem-email
        
        4. Verify the configuration.

          cumulus@switch:~$ netq show notification channel
          Matching config_notify records:
          Name            Type             Severity         Channel Info
          --------------- ---------------- ---------------- ------------------------
          onprem-email    email            info             password: MyPassword123,
                                                            port: 587,
                                                            isEncrypted: True,
                                                            host: smtp.domain.com,
                                                            from: smtphostlogin@doma
                                                            in.com,
                                                            id: smtphostlogin@domain
                                                            .com,
                                                            to: netq-notifications@d
                                                            omain.com
          

        For a cloud deployment:

        1. Create the notification channel using this form of the CLI command:

          netq add notification channel email <text-channel-name> to <text-email-toids>
          
        For example:
        cumulus@switch:~$ netq add notification channel email cloud-email to netq-cloud-notifications@domain.com
        Successfully added/updated channel cloud-email
        
        2. Verify the configuration.

          cumulus@switch:~$ netq show notification channel
          Matching config_notify records:
          Name            Type             Severity         Channel Info
          --------------- ---------------- ---------------- ------------------------
          cloud-email     email            info             password: TEiO98BOwlekUP
                                                            TrFev2/Q==, port: 587,
                                                            isEncrypted: True,
                                                            host: netqsmtp.domain.com,
                                                            from: netqsmtphostlogin@doma
                                                            in.com,
                                                            id: smtphostlogin@domain
                                                            .com,
                                                            to: netq-cloud-notificat
                                                            ions@domain.com
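The difference between the on-premises and cloud forms of the command can be sketched as a small helper. This Python example is illustrative only: the function name `email_channel_args` and its parameters are hypothetical, it only assembles the argument list for the syntax shown above without running NetQ, and it assumes the CLI accepts the same comma-separated, no-space recipient list that the UI requires.

```python
def email_channel_args(name, recipients, smtp=None):
    """Build the argument list for `netq add notification channel email`.

    name       -- channel name (spaces are not allowed)
    recipients -- iterable of email addresses
    smtp       -- None for cloud deployments, or a dict with the keys
                  server, port, login, password for on-premises deployments
    """
    if " " in name:
        raise ValueError("channel names cannot contain spaces")
    # Assumption: recipients are joined with commas and no spaces, as in the UI.
    to = ",".join(addr.strip() for addr in recipients)
    args = ["netq", "add", "notification", "channel", "email", name, "to", to]
    if smtp is not None:
        # SMTP options apply to on-premises deployments only.
        args += [
            "smtpserver", smtp["server"],
            "smtpport", str(smtp["port"]),
            "login", smtp["login"],
            "password", smtp["password"],
        ]
    return args


cloud = email_channel_args("cloud-email", ["netq-cloud-notifications@domain.com"])
onprem = email_channel_args(
    "onprem-email",
    ["netq-notifications@domain.com"],
    smtp={"server": "smtp.domain.com", "port": 587,
          "login": "smtphostlogin@domain.com", "password": "MyPassword123"},
)
print(" ".join(cloud))
print(" ".join(onprem))
```

Passing `smtp=None` yields the short cloud form; supplying the dictionary yields the on-premises form with the SMTP options appended.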
          

        You can use the NetQ UI or the NetQ CLI to create a generic channel.

        1. Click Menu, then click Notification channels.

        2. Click Generic.

        3. Add a channel.

          • When no channels have been specified, click Add generic channel.
          • When at least one channel has been specified, click Add above the table.
        4. Provide a unique name for the channel. Note that spaces are not allowed. Use dashes or camelCase instead.

        5. Specify a webhook URL.

        6. Set the desired notification severity, SSL, and authentication parameters for this channel.

        7. Click Add.

        8. (Optional) To verify the channel configuration, click Test.

        To create and verify a generic channel, run:

        netq add notification channel generic <text-channel-name> webhook <text-webhook-url> [severity info | severity error ] [use-ssl True | use-ssl False] [auth-type basic-auth generic-username <text-generic-username> generic-password <text-generic-password> | auth-type api-key key-name <text-api-key-name> key-value <text-api-key-value>]
        netq show notification channel [json]
        
        Option Description
        <text-channel-name> User-specified generic channel name
        webhook <text-webhook-url> URL of the remote application to receive notifications
        severity <level> The log level, either info or error. The severity defaults to info if unspecified.
        use-ssl [True | False] Enable or disable SSL
        auth-type [basic-auth | api-key] Set authentication parameters. Either basic-auth with generic-username and generic-password or api-key with a key-name and key-value
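The two authentication types are mutually exclusive, which a short helper can make explicit. The following Python sketch is a hedged illustration: `generic_channel_args`, the channel name, the webhook URL, and the key pair are all hypothetical, and the function only builds the argument list from the syntax shown above; it does not contact NetQ.

```python
def generic_channel_args(name, webhook, severity=None, use_ssl=None,
                         basic_auth=None, api_key=None):
    """Build the argument list for `netq add notification channel generic`.

    basic_auth -- optional (username, password) tuple
    api_key    -- optional (key_name, key_value) tuple
    """
    if basic_auth and api_key:
        raise ValueError("choose basic-auth or api-key, not both")
    if severity not in (None, "info", "error"):
        raise ValueError("severity must be info or error")
    args = ["netq", "add", "notification", "channel", "generic", name,
            "webhook", webhook]
    if severity is not None:
        args += ["severity", severity]
    if use_ssl is not None:
        args += ["use-ssl", "True" if use_ssl else "False"]
    if basic_auth:
        user, password = basic_auth
        args += ["auth-type", "basic-auth",
                 "generic-username", user, "generic-password", password]
    elif api_key:
        key_name, key_value = api_key
        args += ["auth-type", "api-key",
                 "key-name", key_name, "key-value", key_value]
    return args


args = generic_channel_args(
    "generic-netq-events",                   # hypothetical channel name
    "https://alerts.example.com/netq",       # hypothetical webhook URL
    severity="error", use_ssl=True,
    api_key=("X-Api-Key", "example-value"),  # hypothetical key pair
)
print(" ".join(args))
```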

        Create a Rule

        The second step is to create and verify a rule that accepts a set of events. You create rules for system events using the NetQ CLI.

        To create and verify a rule, run:

        netq add notification rule <text-rule-name> key <text-rule-key> value <text-rule-value>
        netq show notification rule [json]
        

        Refer to the Rule Keys and Values Reference for a list of available keys and values.

        To remove notification rules, run:

        netq del notification rule <text-rule-name-anchor>
        
        Example rules

        This example creates a rule named all-interfaces, using the key ifname and the value ALL, which sends all events from all interfaces to any channel with this rule.

        cumulus@switch:~$ netq add notification rule all-interfaces key ifname value ALL
        Successfully added/updated rule all-interfaces
        
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        all-interfaces  ifname           ALL
        

        Create a BGP rule based on hostname:

        cumulus@switch:~$ netq add notification rule bgpHostname key hostname value spine-01
        Successfully added/updated rule bgpHostname 
        

        Create a rule based on a configuration file state change:

        cumulus@switch:~$ netq add notification rule sysconf key message_type value configdiff
        Successfully added/updated rule sysconf
        

        Create an EVPN rule based on a VNI:

        cumulus@switch:~$ netq add notification rule evpnVni key vni value 42
        Successfully added/updated rule evpnVni
        

        Create an interface rule based on FEC support:

        cumulus@switch:~$ netq add notification rule fecSupport key new_supported_fec value supported
        Successfully added/updated rule fecSupport
        

        Create a service rule based on a status change:

        cumulus@switch:~$ netq add notification rule svcStatus key new_status value down
        Successfully added/updated rule svcStatus
        

        Create a sensor rule based on a threshold:

        cumulus@switch:~$ netq add notification rule overTemp key new_s_crit value 24
        Successfully added/updated rule overTemp
        

        Create an interface rule based on a port:

        cumulus@switch:~$ netq add notification rule swp52 key port value swp52
        Successfully added/updated rule swp52 
        

        Create a Filter

        The final step is to create a filter to tie the rule to the channel. You create filters for system events using the NetQ CLI.

        To create and verify a filter, run:

        netq add notification filter <text-filter-name> rule <text-rule-name-anchor> channel <text-channel-name-anchor>
        netq show notification filter [json]
        

        The following examples use the channels and rules created in the previous sections. Each example ties the all-interfaces rule to a different channel. After you create the filter, NetQ sends all interface events to that channel.

        cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel pd-netq-events
        Successfully added/updated filter notify-all-ifs
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        notify-all-ifs  1          info             pd-netq-events   all-interfaces
        
        cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel slk-netq-events
        Successfully added/updated filter notify-all-ifs
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        notify-all-ifs  1          info             slk-netq-events  all-interfaces
        
        cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel syslog-netq-events
        Successfully added/updated filter notify-all-ifs
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        notify-all-ifs  1          info             syslog-netq-events all-interfaces
        
        cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-interfaces channel onprem-email
        Successfully added/updated filter notify-all-ifs
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        notify-all-ifs  1          info             onprem-email     all-interfaces
        

        Additional filter examples

        Create a filter for BGP events on a particular device:

        cumulus@switch:~$ netq add notification filter bgpSpine rule bgpHostname channel pd-netq-events
        Successfully added/updated filter bgpSpine
        

        Create a filter for a given VNI in your EVPN overlay:

        cumulus@switch:~$ netq add notification filter vni42 severity warning rule evpnVni channel pd-netq-events
        Successfully added/updated filter vni42
        

        Create a filter for when a configuration file is updated:

        cumulus@switch:~$ netq add notification filter configChange severity info rule sysconf channel slk-netq-events
        Successfully added/updated filter configChange
        

        Create a filter to monitor ports with FEC support:

        cumulus@switch:~$ netq add notification filter newFEC rule fecSupport channel slk-netq-events
        Successfully added/updated filter newFEC
        

        Create a filter to monitor for services that change to a down state:

        cumulus@switch:~$ netq add notification filter svcDown severity error rule svcStatus channel slk-netq-events
        Successfully added/updated filter svcDown
        

        Create a filter to monitor overheating platforms:

        cumulus@switch:~$ netq add notification filter critTemp severity error rule overTemp channel onprem-email
        Successfully added/updated filter critTemp
        

        Create a filter to drop messages from a given interface, and match against this filter before any other filters. To create a drop-style filter, do not specify a channel. To list the filter first, use the before option.

        cumulus@switch:~$ netq add notification filter swp52Drop severity error rule swp52 before bgpSpine
        Successfully added/updated filter swp52Drop
        

        Filter names can contain spaces, but you must then enclose them in single quotes in commands. Names that use dashes or mixed case instead of spaces are easier to type and read. For example, use bgpSessionChanges, BGP-session-changes, or BGPsessions instead of 'BGP Session Changes'. Filter names are also case sensitive.
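If you generate NetQ commands from a script, the quoting requirement can be handled automatically. This is a minimal Python sketch, assuming the command runs through a POSIX shell; `filter_command` is a hypothetical helper, and `shlex.quote` from the standard library adds single quotes only when a value needs them.

```python
import shlex


def filter_command(filter_name, rule, channel):
    """Return a shell-safe `netq add notification filter` command line."""
    parts = ["netq", "add", "notification", "filter", filter_name,
             "rule", rule, "channel", channel]
    # shlex.quote wraps a value in single quotes only when it contains
    # characters (such as spaces) that the shell would otherwise split on.
    return " ".join(shlex.quote(p) for p in parts)


print(filter_command("BGP Session Changes", "bgpHostname", "pd-netq-events"))
print(filter_command("bgpSessionChanges", "bgpHostname", "pd-netq-events"))
```

The first call quotes the filter name; the second leaves it untouched, which is why names without spaces are simpler to work with.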

        As you create filters, they are added to the bottom of a list of filters. By default, NetQ processes event messages against filters starting at the top of the filter list and works its way down until it finds a match. NetQ applies the first filter that matches an event message, ignoring the other filters. Then it moves to the next event message and reruns the process, starting at the top of the list of filters. NetQ ignores events that do not match any filter.
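The first-match behavior described above can be modeled in a few lines. The following Python sketch is a simplified illustration, not NetQ's implementation: the `Filter` class, the `route` function, and the event dictionaries are hypothetical, and each filter's rules and severity are reduced to a single predicate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Filter:
    name: str
    matches: Callable[[dict], bool]  # stand-in for the filter's rules and severity
    channel: Optional[str] = None    # None models a drop-style filter


def route(event: dict, filters: List[Filter]) -> Optional[str]:
    """Return the channel for an event, or None if it is dropped or unmatched."""
    for f in filters:                # start at the top of the list
        if f.matches(event):
            return f.channel         # first match wins; later filters are ignored
    return None                      # no filter matched, so the event is ignored


filters = [
    Filter("swp52Drop", lambda e: e.get("ifname") == "swp52"),
    Filter("bgpSpine", lambda e: e.get("hostname") == "spine-01", "pd-netq-events"),
    Filter("notify-all-ifs", lambda e: "ifname" in e, "slk-netq-events"),
]

print(route({"hostname": "spine-01", "ifname": "swp52"}, filters))  # None (dropped)
print(route({"hostname": "spine-01", "ifname": "swp1"}, filters))   # pd-netq-events
print(route({"hostname": "leaf01", "ifname": "swp1"}, filters))     # slk-netq-events
print(route({"hostname": "leaf01"}, filters))                       # None (no match)
```

Because the drop-style filter sits first, the swp52 event from spine-01 never reaches the bgpSpine filter, even though it would match it.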

        You might have to change the order of filters in the list to ensure you capture the events you want and drop the events you do not want. Use the before or after keywords to ensure NetQ processes one filter before or after another.
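The effect of the before and after keywords on the filter list can be sketched as follows. This Python model is an assumption-laden illustration: `add_filter` is a hypothetical function that tracks only filter names, and it does not reflect how NetQ stores its configuration.

```python
def add_filter(order, name, before=None, after=None):
    """Return a new filter order with `name` placed relative to an existing filter."""
    if before and after:
        raise ValueError("use before or after, not both")
    result = list(order)
    if before:
        result.insert(result.index(before), name)
    elif after:
        result.insert(result.index(after) + 1, name)
    else:
        result.append(name)          # default: bottom of the list
    return result


order = []
order = add_filter(order, "bgpSpine")
order = add_filter(order, "vni42")
order = add_filter(order, "swp52Drop", before="bgpSpine")
print(order)
```

After the three calls, swp52Drop is evaluated first, followed by bgpSpine and vni42.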

        To delete notification filters, run:

        netq del notification filter <text-filter-name-anchor>
        

        Delete a Channel

        You can remove channels if they are not part of an existing notification configuration.

        To remove notification channels:

        1. Expand the Menu and select Notification channels.

        2. Select the tab for the type of channel you want to remove.

        3. Select one or more channels.

        4. Click Delete.

        To remove notification channels, run:

        netq del notification channel <text-channel-name-anchor>
        

        This example removes a Slack integration and verifies it is no longer in the configuration:

        cumulus@switch:~$ netq del notification channel slk-netq-events
        
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity         Channel Info
        --------------- ---------------- ---------------- ------------------------
        pd-netq-events  pagerduty        info             integration-key: 1234567
                                                            890
        

        Configure a Proxy Server

        To send notification messages through a proxy server instead of directly to a notification channel, you configure NetQ with the hostname and optionally a port of a proxy server. If you do not specify a port, NetQ defaults to port 80. NetQ supports one proxy server. To simplify deployment, configure your proxy server before configuring channels, rules, or filters.

        To configure and verify the proxy server, run:

        netq add notification proxy <text-proxy-hostname> [port <text-proxy-port>]
        netq show notification proxy
        

        This example configures and verifies the proxy4 server on port 80 to act as a proxy for event notifications.

        cumulus@switch:~$ netq add notification proxy proxy4
        Successfully configured notifier proxy proxy4:80
        
        cumulus@switch:~$ netq show notification proxy
        Matching config_notify records:
        Proxy URL          Slack Enabled              PagerDuty Enabled
        ------------------ -------------------------- ----------------------------------
        proxy4:80          yes                        yes
        

        You can remove the proxy server with netq del notification proxy. This changes the NetQ behavior to send events directly to the notification channels.

        Rule Keys and Values Reference

        Each rule comprises a single key-value pair. The key-value pair indicates what messages to include or drop from event information sent to a notification channel. You can create more than one rule for a single filter, which makes the filter more specific. For example, rules based on hostnames or interface names restrict a filter to messages from those hosts or interfaces. You can only create rules after you have set up your notification channels.

        NetQ includes a predefined fixed set of valid rule keys. You enter values as regular expressions, which vary according to your deployment.
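Because rule values are regular expressions, one value can cover several devices or states. The following Python sketch shows the idea with the standard `re` module. It is illustrative only: `rule_matches` and the event dictionary are hypothetical, and the sketch assumes the whole field must match, which might differ from how NetQ anchors its patterns.

```python
import re


def rule_matches(rule_key, rule_value, event):
    """Return True if the event field named by rule_key matches the regex rule_value."""
    field = event.get(rule_key)
    if field is None:
        return False
    # Assumption: the entire field must match the pattern.
    return re.fullmatch(rule_value, str(field)) is not None


event = {"message_type": "bgp", "hostname": "spine-01", "new_state": "Failed"}

print(rule_matches("hostname", r"spine-\d+", event))     # True
print(rule_matches("hostname", r"leaf\d+", event))       # False
print(rule_matches("new_state", r"Failed|Idle", event))  # True
print(rule_matches("vni", r"42", event))                 # False (field absent)
```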

        Service | Rule Key | Description | Example Rule Values
        BGP | message_type | Network protocol or service identifier | bgp
        BGP | hostname | User-defined, text-based name for a switch or host | server02, leaf11, exit01, spine-4
        BGP | peer | User-defined, text-based name for a peer switch or host | server4, leaf-3, exit02, spine06
        BGP | desc | Text description |
        BGP | vrf | Name of VRF interface | mgmt, default
        BGP | old_state | Previous state of the BGP service | Established, Failed
        BGP | new_state | Current state of the BGP service | Established, Failed
        BGP | old_last_reset_time | Previous time that BGP service was reset | Apr3, 2019, 4:17 PM
        BGP | new_last_reset_time | Most recent time that BGP service was reset | Apr8, 2019, 11:38 AM
        ConfigDiff | message_type | Network protocol or service identifier | configdiff
        ConfigDiff | hostname | User-defined, text-based name for a switch or host | server02, leaf11, exit01, spine-4
        ConfigDiff | vni | Virtual Network Instance identifier | 12, 23
        ConfigDiff | old_state | Previous state of the configuration file | created, modified
        ConfigDiff | new_state | Current state of the configuration file | created, modified
        EVPN | message_type | Network protocol or service identifier | evpn
        EVPN | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
        EVPN | vni | Virtual Network Instance identifier | 12, 23
        EVPN | old_in_kernel_state | Previous VNI state, in kernel or not | true, false
        EVPN | new_in_kernel_state | Current VNI state, in kernel or not | true, false
        EVPN | old_adv_all_vni_state | Previous VNI advertising state, advertising all or not | true, false
        EVPN | new_adv_all_vni_state | Current VNI advertising state, advertising all or not | true, false
        LCM | message_type | Network protocol or service identifier | clag
        LCM | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
        LCM | old_conflicted_bonds | Previous pair of interfaces in a conflicted bond | swp7 swp8, swp3 swp4
        LCM | new_conflicted_bonds | Current pair of interfaces in a conflicted bond | swp11 swp12, swp23 swp24
        LCM | old_state_protodownbond | Previous state of the bond | protodown, up
        LCM | new_state_protodownbond | Current state of the bond | protodown, up
        Link | message_type | Network protocol or service identifier | link
        Link | hostname | User-defined, text-based name for a switch or host | server02, leaf-6, exit01, spine7
        Link | ifname | Software interface name | eth0, swp53
        LLDP | message_type | Network protocol or service identifier | lldp
        LLDP | hostname | User-defined, text-based name for a switch or host | server02, leaf41, exit01, spine-5, tor-36
        LLDP | ifname | Software interface name | eth1, swp12
        LLDP | old_peer_ifname | Previous software interface name | eth1, swp12, swp27
        LLDP | new_peer_ifname | Current software interface name | eth1, swp12, swp27
        LLDP | old_peer_hostname | Previous user-defined, text-based name for a peer switch or host | server02, leaf41, exit01, spine-5, tor-36
        LLDP | new_peer_hostname | Current user-defined, text-based name for a peer switch or host | server02, leaf41, exit01, spine-5, tor-36
        MLAG (CLAG) | message_type | Network protocol or service identifier | clag
        MLAG (CLAG) | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
        MLAG (CLAG) | old_conflicted_bonds | Previous pair of interfaces in a conflicted bond | swp7 swp8, swp3 swp4
        MLAG (CLAG) | new_conflicted_bonds | Current pair of interfaces in a conflicted bond | swp11 swp12, swp23 swp24
        MLAG (CLAG) | old_state_protodownbond | Previous state of the bond | protodown, up
        MLAG (CLAG) | new_state_protodownbond | Current state of the bond | protodown, up
        Node | message_type | Network protocol or service identifier | node
        Node | hostname | User-defined, text-based name for a switch or host | server02, leaf41, exit01, spine-5, tor-36
        Node | ntp_state | Current state of NTP service | in sync, not sync
        Node | db_state | Current state of DB | Add, Update, Del, Dead
        NTP | message_type | Network protocol or service identifier | ntp
        NTP | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04
        NTP | old_state | Previous state of service | in sync, not sync
        NTP | new_state | Current state of service | in sync, not sync
        Port | message_type | Network protocol or service identifier | port
        Port | hostname | User-defined, text-based name for a switch or host | server02, leaf13, exit01, spine-8, tor-36
        Port | ifname | Interface name | eth0, swp14
        Port | old_speed | Previous speed rating of port | 10 G, 25 G, 40 G, unknown
        Port | old_transreceiver | Previous transceiver | 40G Base-CR4, 25G Base-CR
        Port | old_vendor_name | Previous vendor name of installed port module | Amphenol, OEM, NVIDIA, Fiberstore, Finisar
        Port | old_serial_number | Previous serial number of installed port module | MT1507VS05177, AVE1823402U, PTN1VH2
        Port | old_supported_fec | Previous forward error correction (FEC) support status | none, Base R, RS
        Port | old_advertised_fec | Previous FEC advertising state | true, false, not reported
        Port | old_fec | Previous FEC capability | none
        Port | old_autoneg | Previous activation state of auto-negotiation | on, off
        Port | new_speed | Current speed rating of port | 10 G, 25 G, 40 G
        Port | new_transreceiver | Current transceiver | 40G Base-CR4, 25G Base-CR
        Port | new_vendor_name | Current vendor name of installed port module | Amphenol, OEM, NVIDIA, Fiberstore, Finisar
        Port | new_part_number | Current part number of installed port module | SFP-H10GB-CU1M, MC3309130-001, 603020003
        Port | new_serial_number | Current serial number of installed port module | MT1507VS05177, AVE1823402U, PTN1VH2
        Port | new_supported_fec | Current FEC support status | none, Base R, RS
        Port | new_advertised_fec | Current FEC advertising state | true, false
        Port | new_fec | Current FEC capability | none
        Port | new_autoneg | Current activation state of auto-negotiation | on, off
        Sensors | sensor | Network protocol or service identifier | Fan: fan1, fan-2; Power Supply Unit: psu1, psu2; Temperature: psu1temp1, temp2
        Sensors | hostname | User-defined, text-based name for a switch or host | server02, leaf-26, exit01, spine2-4
        Sensors | old_state | Previous state of a fan, power supply unit, or thermal sensor | Fan: ok, absent, bad; PSU: ok, absent, bad; Temp: ok, busted, bad, critical
        Sensors | new_state | Current state of a fan, power supply unit, or thermal sensor | Fan: ok, absent, bad; PSU: ok, absent, bad; Temp: ok, busted, bad, critical
        Sensors | old_s_state | Previous state of a fan or power supply unit | Fan: up, down; PSU: up, down
        Sensors | new_s_state | Current state of a fan or power supply unit | Fan: up, down; PSU: up, down
        Sensors | new_s_max | Current maximum temperature threshold value | Temp: 110
        Sensors | new_s_crit | Current critical high temperature threshold value | Temp: 85
        Sensors | new_s_lcrit | Current critical low temperature threshold value | Temp: -25
        Sensors | new_s_min | Current minimum temperature threshold value | Temp: -50
        Services | message_type | Network protocol or service identifier | services
        Services | hostname | User-defined, text-based name for a switch or host | server02, leaf03, exit01, spine-8
        Services | name | Name of service | clagd, lldpd, ssh, ntp, netqd, netq-agent
        Services | old_pid | Previous process or service identifier | 12323, 52941
        Services | new_pid | Current process or service identifier | 12323, 52941
        Services | old_status | Previous status of service | up, down
        Services | new_status | Current status of service | up, down

        Examples of Advanced Notification Configurations

        The following section lists examples of advanced notification configurations.

        Create a Notification for BGP Events from a Selected Switch

        This example creates a notification integration with a PagerDuty channel called pd-netq-events. It then creates a rule called bgpHostname and a filter called bgpSpine for any notifications from spine-01. The result is that any info severity event messages from spine-01 are filtered to the pd-netq-events channel.

        cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
        Successfully added/updated channel pd-netq-events
        cumulus@switch:~$ netq add notification rule bgpHostname key hostname value spine-01
        Successfully added/updated rule bgpHostname
         
        cumulus@switch:~$ netq add notification filter bgpSpine rule bgpHostname channel pd-netq-events
        Successfully added/updated filter bgpSpine
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity         Channel Info
        --------------- ---------------- ---------------- ------------------------
        pd-netq-events  pagerduty        info             integration-key: 1234567
                                                          890   
        
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        bgpHostname     hostname         spine-01
         
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                                     e
        

        Create a Notification for Errors on a Given EVPN VNI

        This example creates a notification integration with a PagerDuty channel called pd-netq-events. It then creates a rule called evpnVni and a filter called vni42 for any error messages from VNI 42 on the EVPN overlay network. The result is that any event messages from VNI 42 with a severity level of ‘error’ are filtered to the pd-netq-events channel.

        cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
        Successfully added/updated channel pd-netq-events
         
        cumulus@switch:~$ netq add notification rule evpnVni key vni value 42
        Successfully added/updated rule evpnVni
         
        cumulus@switch:~$ netq add notification filter vni42 severity error rule evpnVni channel pd-netq-events
        Successfully added/updated filter vni42
         
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity         Channel Info
        --------------- ---------------- ---------------- ------------------------
        pd-netq-events  pagerduty        info             integration-key: 1234567
                                                          890   
        
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        bgpHostname     hostname         spine-01
        evpnVni         vni              42
         
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                                     e
        vni42           2          error            pd-netq-events   evpnVni
        

        Create a Notification for Configuration File Changes

        This example creates a notification integration with a Slack channel called slk-netq-events. It then creates a rule sysconf and a filter called configChange for any configuration file update messages. The result is that any configuration update messages are filtered to the slk-netq-events channel.

        cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
        Successfully added/updated channel slk-netq-events
         
        cumulus@switch:~$ netq add notification rule sysconf key message_type value configdiff
        Successfully added/updated rule sysconf
         
        cumulus@switch:~$ netq add notification filter configChange severity info rule sysconf channel slk-netq-events
        Successfully added/updated filter configChange
         
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity Channel Info
        --------------- ---------------- -------- ----------------------
        slk-netq-events slack            info     webhook:https://hooks.s
                                                  lack.com/services/text/
                                                  moretext/evenmoretext     
         
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        bgpHostname     hostname         spine-01
        evpnVni         vni              42
        sysconf         message_type     configdiff 
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                                     e
        vni42           2          error            pd-netq-events   evpnVni
        configChange    3          info             slk-netq-events  sysconf
        

        Create a Notification for When a Service Goes Down

        This example creates a notification integration with a Slack channel called slk-netq-events. It then creates a rule called svcStatus and a filter called svcDown for any service state messages indicating that a service is no longer operational. As a result, NetQ sends any service down message to the slk-netq-events channel.

        cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
        Successfully added/updated channel slk-netq-events
         
        cumulus@switch:~$ netq add notification rule svcStatus key new_status value down
        Successfully added/updated rule svcStatus
         
        cumulus@switch:~$ netq add notification filter svcDown severity error rule svcStatus channel slk-netq-events
        Successfully added/updated filter svcDown
         
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity Channel Info
        --------------- ---------------- -------- ----------------------
        slk-netq-events slack            info     webhook:https://hooks.s
                                                  lack.com/services/text/
                                                  moretext/evenmoretext     
         
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        bgpHostname     hostname         spine-01
        evpnVni         vni              42
        svcStatus       new_status       down
        sysconf         message_type     configdiff
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                                     e
        vni42           2          error            pd-netq-events   evpnVni
        configChange    3          info             slk-netq-events  sysconf
        svcDown         4          error            slk-netq-events  svcStatus
        

        Create a Filter to Drop Notifications from a Given Interface

        This example creates a notification integration with a Slack channel called slk-netq-events. It then creates a rule swp52 and a filter called swp52Drop that drops all notifications for events from interface swp52.

        cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
        Successfully added/updated channel slk-netq-events
         
        cumulus@switch:~$ netq add notification rule swp52 key port value swp52
        Successfully added/updated rule swp52
         
        cumulus@switch:~$ netq add notification filter swp52Drop severity error rule swp52 before bgpSpine
        Successfully added/updated filter swp52Drop
         
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity Channel Info
        --------------- ---------------- -------- ----------------------
        slk-netq-events slack            info     webhook:https://hooks.s
                                                  lack.com/services/text/
                                                  moretext/evenmoretext     
         
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        bgpHostname     hostname         spine-01
        evpnVni         vni              42
        svcStatus       new_status       down
        swp52           port             swp52
        sysconf         message_type     configdiff
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        swp52Drop       1          error            NetqDefaultChann swp52
                                                    el
        bgpSpine        2          info             pd-netq-events   bgpHostnam
                                                                     e
        vni42           3          error            pd-netq-events   evpnVni
        configChange    4          info             slk-netq-events  sysconf
        svcDown         5          error            slk-netq-events  svcStatus
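
        The before bgpSpine keyword matters because the filter table is ordered. The Python sketch below models that idea under two assumptions that this guide does not state outright: filters are checked from the lowest Order value upward with the first match deciding the outcome, and a drop filter is represented by an empty channel list. The Filter class and the route function are hypothetical illustrations, not part of NetQ.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    """Hypothetical model of one row in `netq show notification filter`."""
    name: str
    rule_key: str
    rule_value: str
    channels: list = field(default_factory=list)  # empty list = drop (assumption)


def route(event, filters):
    """Return (filter name, channels) for the first filter whose rule matches.

    Assumes first-match-wins evaluation in list order.
    """
    for flt in filters:
        if str(event.get(flt.rule_key)) == flt.rule_value:
            return flt.name, flt.channels
    return None, []


# Order mirrors the table above: swp52Drop was inserted before bgpSpine.
filters = [
    Filter("swp52Drop", "port", "swp52"),
    Filter("bgpSpine", "hostname", "spine-01", ["pd-netq-events"]),
    Filter("vni42", "vni", "42", ["pd-netq-events"]),
]

# An event from spine-01 that also concerns interface swp52.
event = {"hostname": "spine-01", "port": "swp52"}
print(route(event, filters))  # ('swp52Drop', [])
```

        Under these assumptions the event is dropped, because swp52Drop is evaluated before bgpSpine. Had swp52Drop been placed after bgpSpine, the bgpSpine filter would have matched first and sent the event to pd-netq-events.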
        

        Create a Notification for a Given Device that Has a Tendency to Overheat (Using Multiple Rules)

        This example creates a notification for when switch leaf04 exceeds the high temperature threshold. It requires two rules: one to identify the specific device and one to identify the temperature trigger. NetQ then sends the message to the pd-netq-events channel.

        cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
        Successfully added/updated channel pd-netq-events
         
        cumulus@switch:~$ netq add notification rule switchLeaf04 key hostname value leaf04
        Successfully added/updated rule switchLeaf04
        cumulus@switch:~$ netq add notification rule overTemp key new_s_crit value 24
        Successfully added/updated rule overTemp
         
        cumulus@switch:~$ netq add notification filter critTemp rule switchLeaf04 channel pd-netq-events
        Successfully added/updated filter critTemp
        cumulus@switch:~$ netq add notification filter critTemp severity critical rule overTemp channel pd-netq-events
        Successfully added/updated filter critTemp
         
        cumulus@switch:~$ netq show notification channel
        Matching config_notify records:
        Name            Type             Severity         Channel Info
        --------------- ---------------- ---------------- ------------------------
        pd-netq-events  pagerduty        info             integration-key: 1234567
                                                          890
        
        cumulus@switch:~$ netq show notification rule
        Matching config_notify records:
        Name            Rule Key         Rule Value
        --------------- ---------------- --------------------
        bgpHostname     hostname         spine-01
        evpnVni         vni              42
        overTemp        new_s_crit       24
        svcStatus       new_status       down
        switchLeaf04    hostname         leaf04
        swp52           port             swp52
        sysconf         message_type     configdiff
        
        cumulus@switch:~$ netq show notification filter
        Matching config_notify records:
        Name            Order      Severity         Channels         Rules
        --------------- ---------- ---------------- ---------------- ----------
        swp52Drop       1          error            NetqDefaultChann swp52
                                                    el
        bgpSpine        2          info             pd-netq-events   bgpHostnam
                                                                     e
        vni42           3          error            pd-netq-events   evpnVni
        configChange    4          info             slk-netq-events  sysconf
        svcDown         5          error            slk-netq-events  svcStatus
        critTemp        6          error            pd-netq-events   switchLeaf
                                                                     04
                                                                     overTemp
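
        The critTemp filter carries two rules. A reasonable reading of the description above is that an event must satisfy both of them before NetQ notifies the channel. The Python sketch below illustrates that reading only; the matches_all helper and the sample event dictionaries are hypothetical, and the event field names are simply the rule keys shown above.

```python
def matches_all(event, rules):
    """Return True only if every (key, value) rule matches the event.

    Values are compared as strings so that 24 and "24" are treated alike.
    """
    return all(key in event and str(event[key]) == value for key, value in rules)


# The two rules attached to the critTemp filter in the example above.
crit_temp_rules = [("hostname", "leaf04"), ("new_s_crit", "24")]

events = [
    {"hostname": "leaf04", "new_s_crit": 24},  # both rules match
    {"hostname": "leaf04", "new_s_crit": 20},  # device matches, trigger does not
    {"hostname": "leaf03", "new_s_crit": 24},  # trigger matches, device does not
]

for event in events:
    print(event["hostname"], event["new_s_crit"], matches_all(event, crit_temp_rules))
```

        Only the first sample event satisfies both rules, which is why a single hostname rule or a single temperature rule would not be enough to express this notification.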
        

        Monitor Container Environments Using Kubernetes API Server

        The NetQ Agent monitors many aspects of containers on your network by integrating with the Kubernetes API server. In particular, the NetQ Agent tracks:

        This topic assumes a reasonable familiarity with Kubernetes terminology and architecture.

        Use NetQ with Kubernetes Clusters

        The NetQ Agent interfaces with the Kubernetes API server and listens to Kubernetes events. The NetQ Agent monitors network identity and physical network connectivity of Kubernetes resources like pods, daemon sets, services, and so forth. NetQ works with any container network interface (CNI), including Calico and Flannel.

        The NetQ Kubernetes integration enables network administrators to:

        NetQ also helps network administrators identify changes within a Kubernetes cluster and determine if such changes had an adverse effect on the network performance (caused by a noisy neighbor for example). Additionally, NetQ helps the infrastructure administrator determine the distribution of Kubernetes workloads within a network.

        Requirements

        The NetQ Agent supports Kubernetes version 1.9.2 or later.

        Command Summary

        A large set of commands is available to monitor Kubernetes configurations, including clusters, nodes, daemon sets, deployments, pods, replication, and services. Run netq show kubernetes help to view the commands. Refer to the command line reference for additional details.

        Enable Kubernetes Monitoring

        For Kubernetes monitoring, the NetQ Agent must be installed, running, and enabled on the hosts providing the Kubernetes service.

        To enable NetQ Agent monitoring of the containers using the Kubernetes API, you must configure the following on the Kubernetes master node:

        1. Install and configure the NetQ Agent and CLI on the master node.

          Follow the steps outlined in Install NetQ Agents and Install NetQ CLI.

        2. Enable Kubernetes monitoring by the NetQ Agent on the master node.

          You can specify a polling period between 10 and 120 seconds; 15 seconds is the default.

          cumulus@host:~$ netq config add agent kubernetes-monitor poll-period 20
          Successfully added kubernetes monitor. Please restart netq-agent.
          
        3. Restart the NetQ Agent:

          cumulus@host:~$ netq config restart agent
          
        4. After waiting for a minute, run the show command to view the cluster:

          cumulus@host:~$ netq show kubernetes cluster
          
        5. Next, you must enable the NetQ Agent on every worker node for complete insight into your container network. Repeat steps 2 and 3 on each worker node.
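
        When repeating steps 2 and 3 on many worker nodes, it can help to generate the commands from a script so that the polling period is checked once. The Python sketch below only builds command strings; it does not run them or connect to any host. The function name kubernetes_monitor_commands is hypothetical, and the range check restates the 10 to 120 second limits given in step 2.

```python
MIN_POLL_SECONDS = 10
MAX_POLL_SECONDS = 120
DEFAULT_POLL_SECONDS = 15


def kubernetes_monitor_commands(poll_period=DEFAULT_POLL_SECONDS):
    """Return the commands from steps 2 and 3 for one node.

    Raises ValueError if poll_period is not a whole number of seconds
    within the documented 10-120 second range.
    """
    if isinstance(poll_period, bool) or not isinstance(poll_period, int):
        raise ValueError("poll period must be a whole number of seconds")
    if not MIN_POLL_SECONDS <= poll_period <= MAX_POLL_SECONDS:
        raise ValueError(
            f"poll period must be between {MIN_POLL_SECONDS} and "
            f"{MAX_POLL_SECONDS} seconds, got {poll_period}"
        )
    return [
        f"netq config add agent kubernetes-monitor poll-period {poll_period}",
        "netq config restart agent",
    ]


for command in kubernetes_monitor_commands(20):
    print(command)
```

        Rejecting an out-of-range value before anything is sent to a node keeps a typo from being repeated across every worker.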

        View Status of Kubernetes Clusters

        Run the netq show kubernetes cluster command to view the status of all Kubernetes clusters in the fabric. The following example shows two clusters: one with server11 as the master server and the other with server12 as the master server. Both are healthy and both list their associated worker nodes.

        cumulus@host:~$ netq show kubernetes cluster
        Matching kube_cluster records:
        Master                   Cluster Name     Controller Status    Scheduler Status Nodes
        ------------------------ ---------------- -------------------- ---------------- --------------------
        server11:3.0.0.68        default          Healthy              Healthy          server11 server13 se
                                                                                        rver22 server11 serv
                                                                                        er12 server23 server
                                                                                        24
        server12:3.0.0.69        default          Healthy              Healthy          server12 server21 se
                                                                                        rver23 server13 serv
                                                                                        er14 server21 server
                                                                                        22
        

        For deployments with multiple clusters, you can use the hostname option to filter the output. This example shows filtering of the list by server11:

        cumulus@host:~$ netq server11 show kubernetes cluster
        Matching kube_cluster records:
        Master                   Cluster Name     Controller Status    Scheduler Status Nodes
        ------------------------ ---------------- -------------------- ---------------- --------------------
        server11:3.0.0.68        default          Healthy              Healthy          server11 server13 se
                                                                                        rver22 server11 serv
                                                                                        er12 server23 server
                                                                                        24
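
        In the output above, the Nodes column wraps at a fixed width, so a name such as server22 is split across two lines. The Python sketch below shows one way to rejoin wrapped cells when post-processing captured output. It assumes that column positions can be read from the row of dashes and that continuation lines leave the first column empty; parse_wrapped_table is a hypothetical helper, and the sample text is rebuilt with a format string purely to keep the column alignment exact.

```python
import re


def parse_wrapped_table(text):
    """Parse fixed-width CLI output in which long cells wrap onto extra lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The separator row holds only dashes and spaces; the header sits just above it.
    sep_idx = next(i for i, ln in enumerate(lines) if set(ln.strip()) <= {"-", " "})
    header, sep = lines[sep_idx - 1], lines[sep_idx]
    runs = list(re.finditer(r"-+", sep))
    starts = [m.start() for m in runs]
    widths = [len(m.group()) for m in runs]
    names = [header[s:e].strip() for s, e in zip(starts, starts[1:] + [None])]

    rows = []
    for line in lines[sep_idx + 1:]:
        cells = []
        for i, (start, width) in enumerate(zip(starts, widths)):
            is_last = i == len(starts) - 1
            raw = line[start:] if is_last else line[start:start + width]
            # Pad to the column width so a wrap that falls on a space keeps it.
            cells.append(raw.ljust(width))
        if cells[0].strip() or not rows:
            rows.append(cells)  # first column filled: a new record starts
        else:
            # first column empty: continuation of the previous record
            rows[-1] = [old + new for old, new in zip(rows[-1], cells)]
    return [dict(zip(names, (cell.strip() for cell in row))) for row in rows]


# Rebuild the server11 record shown above with the same column widths.
FMT = "{:<24} {:<16} {:<20} {:<16} {}"
sample = "\n".join([
    "Matching kube_cluster records:",
    FMT.format("Master", "Cluster Name", "Controller Status", "Scheduler Status", "Nodes"),
    FMT.format("-" * 24, "-" * 16, "-" * 20, "-" * 16, "-" * 20),
    FMT.format("server11:3.0.0.68", "default", "Healthy", "Healthy", "server11 server13 se"),
    FMT.format("", "", "", "", "rver22 server11 serv"),
    FMT.format("", "", "", "", "er12 server23 server"),
    FMT.format("", "", "", "", "24"),
])

record = parse_wrapped_table(sample)[0]
print(record["Master"], record["Nodes"].split())
```

        The rejoined Nodes value splits cleanly into whole hostnames, including the repeated server11 entry that appears in the original output.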
        

        View Changes to a Cluster

        If data collection from the NetQ Agents is not occurring as it did previously, use the around option to verify that no changes were made to the Kubernetes cluster configuration. Be sure to include the unit of measure with the around value. Valid units include:

        This example shows changes made to the cluster in the last hour, namely the addition of the two master nodes and the various worker nodes for each cluster.

        cumulus@host:~$ netq show kubernetes cluster around 1h
        Matching kube_cluster records:
        Master                   Cluster Name     Controller Status    Scheduler Status Nodes                                    DBState  Last changed
        ------------------------ ---------------- -------------------- ---------------- ---------------------------------------- -------- -------------------------
        server11:3.0.0.68        default          Healthy              Healthy          server11 server13 server22 server11 serv Add      Fri Feb  8 01:50:50 2019
                                                                                        er12 server23 server24
        server12:3.0.0.69        default          Healthy              Healthy          server12 server21 server23 server13 serv Add      Fri Feb  8 01:50:50 2019
                                                                                        er14 server21 server22
        server12:3.0.0.69        default          Healthy              Healthy          server12 server21 server23 server13      Add      Fri Feb  8 01:50:50 2019
        server11:3.0.0.68        default          Healthy              Healthy          server11                                 Add      Fri Feb  8 01:50:50 2019
        server12:3.0.0.69        default          Healthy              Healthy          server12                                 Add      Fri Feb  8 01:50:50 2019
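
        The Last changed column uses a fixed timestamp layout, so rows returned with the around option can be checked against a time window in a script. The Python sketch below parses that layout with the standard library. The changed_within helper and the reference time are hypothetical choices for illustration, and the day and month names are assumed to be in English as in the output above.

```python
from datetime import datetime, timedelta

# Layout of the "Last changed" column, for example "Fri Feb  8 01:50:50 2019".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def changed_within(last_changed, reference, window):
    """Return True if last_changed falls inside the window that ends at reference."""
    changed_at = datetime.strptime(last_changed, TIMESTAMP_FORMAT)
    return reference - window <= changed_at <= reference


reference = datetime(2019, 2, 8, 2, 30, 0)  # stand-in for the current time
stamp = "Fri Feb  8 01:50:50 2019"

print(changed_within(stamp, reference, timedelta(hours=1)))     # True
print(changed_within(stamp, reference, timedelta(minutes=30)))  # False
```

        Passing the reference time in as an argument, rather than calling datetime.now() inside the helper, keeps the check repeatable when it is run against saved output.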
        

        View Kubernetes Pod Information

        You can show the configuration and status of the pods in a cluster, including the names, labels, addresses, associated cluster and containers, and whether the pod is running. This example shows pods for FRR, nginx, Calico, and various Kubernetes components, sorted by master node.

        cumulus@host:~$ netq show kubernetes pod
        Matching kube_pod records:
        Master                   Namespace    Name                 IP               Node         Labels               Status   Containers               Last Changed
        ------------------------ ------------ -------------------- ---------------- ------------ -------------------- -------- ------------------------ ----------------
        server11:3.0.0.68        default      cumulus-frr-8vssx    3.0.0.70         server13     pod-template-generat Running  cumulus-frr:f8cac70bb217 Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server11:3.0.0.68        default      cumulus-frr-dkkgp    3.0.5.135        server24     pod-template-generat Running  cumulus-frr:577a60d5f40c Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server11:3.0.0.68        default      cumulus-frr-f4bgx    3.0.3.196        server11     pod-template-generat Running  cumulus-frr:1bc73154a9f5 Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server11:3.0.0.68        default      cumulus-frr-gqqxn    3.0.2.5          server22     pod-template-generat Running  cumulus-frr:3ee0396d126a Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server11:3.0.0.68        default      cumulus-frr-kdh9f    3.0.3.197        server12     pod-template-generat Running  cumulus-frr:94b6329ecb50 Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server11:3.0.0.68        default      cumulus-frr-mvv8m    3.0.5.134        server23     pod-template-generat Running  cumulus-frr:b5845299ce3c Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server11:3.0.0.68        default      httpd-5456469bfd-bq9 10.244.49.65     server22     app:httpd            Running  httpd:79b7f532be2d       Fri Feb  8 01:50:50 2019
                                              zm
        server11:3.0.0.68        default      influxdb-6cdb566dd-8 10.244.162.128   server13     app:influx           Running  influxdb:15dce703cdec    Fri Feb  8 01:50:50 2019
                                              9lwn
        server11:3.0.0.68        default      nginx-8586cf59-26pj5 10.244.9.193     server24     run:nginx            Running  nginx:6e2b65070c86       Fri Feb  8 01:50:50 2019
        server11:3.0.0.68        default      nginx-8586cf59-c82ns 10.244.40.128    server12     run:nginx            Running  nginx:01b017c26725       Fri Feb  8 01:50:50 2019
        server11:3.0.0.68        default      nginx-8586cf59-wjwgp 10.244.49.64     server22     run:nginx            Running  nginx:ed2b4254e328       Fri Feb  8 01:50:50 2019
        server11:3.0.0.68        kube-system  calico-etcd-pfg9r    3.0.0.68         server11     k8s-app:calico-etcd  Running  calico-etcd:f95f44b745a7 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:142071906
                                                                                                 5
        server11:3.0.0.68        kube-system  calico-kube-controll 3.0.2.5          server22     k8s-app:calico-kube- Running  calico-kube-controllers: Fri Feb  8 01:50:50 2019
                                              ers-d669cc78f-4r5t2                                controllers                   3688b0c5e9c5
        server11:3.0.0.68        kube-system  calico-node-4px69    3.0.2.5          server22     k8s-app:calico-node  Running  calico-node:1d01648ebba4 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:da350802a3d2
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  calico-node-bt8w6    3.0.3.196        server11     k8s-app:calico-node  Running  calico-node:9b3358a07e5e Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:d38713e6fdd8
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  calico-node-gtmkv    3.0.3.197        server12     k8s-app:calico-node  Running  calico-node:48fcc6c40a6b Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:f0838a313eff
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  calico-node-mvslq    3.0.5.134        server23     k8s-app:calico-node  Running  calico-node:7b361aece76c Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:f2da6bc36bf8
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  calico-node-sjj2s    3.0.5.135        server24     k8s-app:calico-node  Running  calico-node:6e13b2b73031 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:fa4b2b17fba9
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  calico-node-vdkk5    3.0.0.70         server13     k8s-app:calico-node  Running  calico-node:fb3ec9429281 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:b56980da7294
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  calico-node-zzfkr    3.0.0.68         server11     k8s-app:calico-node  Running  calico-node:c1ac399dd862 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:60a779fdc47a
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  etcd-server11        3.0.0.68         server11     tier:control-plane c Running  etcd:dde63d44a2f5        Fri Feb  8 01:50:50 2019
                                                                                                 omponent:etcd
        server11:3.0.0.68        kube-system  kube-apiserver-hostd 3.0.0.68         server11     tier:control-plane c Running  kube-apiserver:0cd557bbf Fri Feb  8 01:50:50 2019
                                              -11                                                omponent:kube-apiser          2fe
                                                                                                 ver
        server11:3.0.0.68        kube-system  kube-controller-mana 3.0.0.68         server11     tier:control-plane c Running  kube-controller-manager: Fri Feb  8 01:50:50 2019
                                              ger-server11                                       omponent:kube-contro          89b2323d09b2
                                                                                                 ller-manager
        server11:3.0.0.68        kube-system  kube-dns-6f4fd4bdf-p 10.244.34.64     server23     k8s-app:kube-dns     Running  dnsmasq:284d9d363999 kub Fri Feb  8 01:50:50 2019
                                              lv7p                                                                             edns:bd8bdc49b950 sideca
                                                                                                                               r:fe10820ffb19
        server11:3.0.0.68        kube-system  kube-proxy-4cx2t     3.0.3.197        server12     k8s-app:kube-proxy p Running  kube-proxy:49b0936a4212  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-proxy-7674k     3.0.3.196        server11     k8s-app:kube-proxy p Running  kube-proxy:5dc2f5fe0fad  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-proxy-ck5cn     3.0.2.5          server22     k8s-app:kube-proxy p Running  kube-proxy:6944f7ff8c18  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-proxy-f9dt8     3.0.0.68         server11     k8s-app:kube-proxy p Running  kube-proxy:032cc82ef3f8  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-proxy-j6qw6     3.0.5.135        server24     k8s-app:kube-proxy p Running  kube-proxy:10544e43212e  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-proxy-lq8zz     3.0.5.134        server23     k8s-app:kube-proxy p Running  kube-proxy:1bcfa09bb186  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-proxy-vg7kj     3.0.0.70         server13     k8s-app:kube-proxy p Running  kube-proxy:8fed384b68e5  Fri Feb  8 01:50:50 2019
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-scheduler-hostd 3.0.0.68         server11     tier:control-plane c Running  kube-scheduler:c262a8071 Fri Feb  8 01:50:50 2019
                                              -11                                                omponent:kube-schedu          3cb
                                                                                                 ler
        server12:3.0.0.69        default      cumulus-frr-2gkdv    3.0.2.4          server21     pod-template-generat Running  cumulus-frr:25d1109f8898 Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server12:3.0.0.69        default      cumulus-frr-b9dm5    3.0.3.199        server14     pod-template-generat Running  cumulus-frr:45063f9a095f Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server12:3.0.0.69        default      cumulus-frr-rtqhv    3.0.2.6          server23     pod-template-generat Running  cumulus-frr:63e802a52ea2 Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server12:3.0.0.69        default      cumulus-frr-tddrg    3.0.5.133        server22     pod-template-generat Running  cumulus-frr:52dd54e4ac9f Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server12:3.0.0.69        default      cumulus-frr-vx7jp    3.0.5.132        server21     pod-template-generat Running  cumulus-frr:1c20addfcbd3 Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server12:3.0.0.69        default      cumulus-frr-x7ft5    3.0.3.198        server13     pod-template-generat Running  cumulus-frr:b0f63792732e Fri Feb  8 01:50:50 2019
                                                                                                 ion:1 name:cumulus-f
                                                                                                 rr controller-revisi
                                                                                                 on-hash:3710533951
        server12:3.0.0.69        kube-system  calico-etcd-btqgt    3.0.0.69         server12     k8s-app:calico-etcd  Running  calico-etcd:72b1a16968fb Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:142071906
                                                                                                 5
        server12:3.0.0.69        kube-system  calico-kube-controll 3.0.5.132        server21     k8s-app:calico-kube- Running  calico-kube-controllers: Fri Feb  8 01:50:50 2019
                                              ers-d669cc78f-bdnzk                                controllers                   6821bf04696f
        server12:3.0.0.69        kube-system  calico-node-4g6vd    3.0.3.198        server13     k8s-app:calico-node  Running  calico-node:1046b559a50c Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:0a136851da17
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:490828062
        server12:3.0.0.69        kube-system  calico-node-4hg6l    3.0.0.69         server12     k8s-app:calico-node  Running  calico-node:4e7acc83f8e8 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:a26e76de289e
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:490828062
        server12:3.0.0.69        kube-system  calico-node-4p66v    3.0.2.6          server23     k8s-app:calico-node  Running  calico-node:a7a44072e4e2 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:9a19da2b2308
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:490828062
        server12:3.0.0.69        kube-system  calico-node-5z7k4    3.0.5.133        server22     k8s-app:calico-node  Running  calico-node:9878b0606158 Fri Feb  8 01:50:50 2019
                                                                                                 pod-template-generat          install-cni:489f8f326cf9
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:490828062
        ...
        

        You can filter this information to focus on pods on a particular node:

        cumulus@host:~$ netq show kubernetes pod node server11
        Matching kube_pod records:
        Master                   Namespace    Name                 IP               Node         Labels               Status   Containers               Last Changed
        ------------------------ ------------ -------------------- ---------------- ------------ -------------------- -------- ------------------------ ----------------
        server11:3.0.0.68        kube-system  calico-etcd-pfg9r    3.0.0.68         server11     k8s-app:calico-etcd  Running  calico-etcd:f95f44b745a7 2d:14h:0m:59s
                                                                                                 pod-template-generat
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:142071906
                                                                                                 5
        server11:3.0.0.68        kube-system  calico-node-zzfkr    3.0.0.68         server11     k8s-app:calico-node  Running  calico-node:c1ac399dd862 2d:14h:0m:59s
                                                                                                 pod-template-generat          install-cni:60a779fdc47a
                                                                                                 ion:1 controller-rev
                                                                                                 ision-hash:324404111
                                                                                                 9
        server11:3.0.0.68        kube-system  etcd-server11        3.0.0.68         server11     tier:control-plane c Running  etcd:dde63d44a2f5        2d:14h:1m:44s
                                                                                                 omponent:etcd
        server11:3.0.0.68        kube-system  kube-apiserver-serve 3.0.0.68         server11     tier:control-plane c Running  kube-apiserver:0cd557bbf 2d:14h:1m:44s
                                              r11                                                omponent:kube-apiser          2fe
                                                                                                 ver
        server11:3.0.0.68        kube-system  kube-controller-mana 3.0.0.68         server11     tier:control-plane c Running  kube-controller-manager: 2d:14h:1m:44s
                                              ger-server11                                       omponent:kube-contro          89b2323d09b2
                                                                                                 ller-manager
        server11:3.0.0.68        kube-system  kube-proxy-f9dt8     3.0.0.68         server11     k8s-app:kube-proxy p Running  kube-proxy:032cc82ef3f8  2d:14h:0m:59s
                                                                                                 od-template-generati
                                                                                                 on:1 controller-revi
                                                                                                 sion-hash:3953509896
        server11:3.0.0.68        kube-system  kube-scheduler-serve 3.0.0.68         server11     tier:control-plane c Running  kube-scheduler:c262a8071 2d:14h:1m:44s
                                              r11                                                omponent:kube-schedu          3cb
                                                                                                 ler
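
The Last Changed column in the filtered output above reports a relative age such as 2d:14h:0m:59s rather than a timestamp. If you post-process this output in a script, a helper along the following lines could convert those values for sorting or thresholding. This is a minimal Python sketch: the function name parse_age is hypothetical (it is not part of NetQ), and the accepted format is inferred only from the samples in this section.

```python
import re
from datetime import timedelta

# Map the unit suffixes seen in the sample output to timedelta keyword names.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_age(text):
    """Convert a relative age like '2d:14h:0m:59s' into a timedelta."""
    kwargs = {}
    for part in text.strip().split(":"):
        match = re.fullmatch(r"(\d+)([dhms])", part)
        if match is None:
            raise ValueError(f"unrecognized age component: {part!r}")
        kwargs[_UNITS[match.group(2)]] = int(match.group(1))
    return timedelta(**kwargs)


# The pod that changed 2d:14h:0m:59s ago is older than one that changed 14h:24m:1s ago.
print(parse_age("2d:14h:0m:59s") > parse_age("14h:24m:1s"))  # True
```

Converting to a timedelta keeps comparisons numeric, so you do not have to compare the age strings lexically.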
        

        View Kubernetes Node Information

        You can view detailed information about a node, including its role in the cluster, pod CIDR, and kubelet status. This example shows all the nodes in the cluster with server11 as the master. Note that server11 also acts as a worker node along with the other nodes in the cluster: server12, server13, server22, server23, and server24.

        cumulus@host:~$ netq server11 show kubernetes node
        Matching kube_cluster records:
        Master                   Cluster Name     Node Name            Role       Status           Labels               Pod CIDR                 Last Changed
        ------------------------ ---------------- -------------------- ---------- ---------------- -------------------- ------------------------ ----------------
        server11:3.0.0.68        default          server11             master     KubeletReady     node-role.kubernetes 10.224.0.0/24            14h:23m:46s
                                                                                                   .io/master: kubernet
                                                                                                   es.io/hostname:hostd
                                                                                                   -11 beta.kubernetes.
                                                                                                   io/arch:amd64 beta.k
                                                                                                   ubernetes.io/os:linu
                                                                                                   x
        server11:3.0.0.68        default          server13             worker     KubeletReady     kubernetes.io/hostna 10.224.3.0/24            14h:19m:56s
                                                                                                   me:server13 beta.kub
                                                                                                   ernetes.io/arch:amd6
                                                                                                   4 beta.kubernetes.io
                                                                                                   /os:linux
        server11:3.0.0.68        default          server22             worker     KubeletReady     kubernetes.io/hostna 10.224.1.0/24            14h:24m:31s
                                                                                                   me:server22 beta.kub
                                                                                                   ernetes.io/arch:amd6
                                                                                                   4 beta.kubernetes.io
                                                                                                   /os:linux
        server11:3.0.0.68        default          server11             worker     KubeletReady     kubernetes.io/hostna 10.224.2.0/24            14h:24m:16s
                                                                                                   me:server11 beta.kub
                                                                                                   ernetes.io/arch:amd6
                                                                                                   4 beta.kubernetes.io
                                                                                                   /os:linux
        server11:3.0.0.68        default          server12             worker     KubeletReady     kubernetes.io/hostna 10.224.4.0/24            14h:24m:16s
                                                                                                   me:server12 beta.kub
                                                                                                   ernetes.io/arch:amd6
                                                                                                   4 beta.kubernetes.io
                                                                                                   /os:linux
        server11:3.0.0.68        default          server23             worker     KubeletReady     kubernetes.io/hostna 10.224.5.0/24            14h:24m:16s
                                                                                                   me:server23 beta.kub
                                                                                                   ernetes.io/arch:amd6
                                                                                                   4 beta.kubernetes.io
                                                                                                   /os:linux
        server11:3.0.0.68        default          server24             worker     KubeletReady     kubernetes.io/hostna 10.224.6.0/24            14h:24m:1s
                                                                                                   me:server24 beta.kub
                                                                                                   ernetes.io/arch:amd6
                                                                                                   4 beta.kubernetes.io
                                                                                                   /os:linux
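
Each node in the output above has its own pod CIDR. One check you might script against these values is that no two nodes received overlapping ranges. The sketch below uses Python's standard ipaddress module. The dictionary is transcribed by hand from the sample output, and the helper name overlapping_cidrs is an assumption for illustration, not a NetQ feature.

```python
import ipaddress
from itertools import combinations

# Pod CIDR per node, transcribed from the sample output (server11 appears
# twice there, once as master and once as worker).
POD_CIDRS = {
    "server11 (master)": "10.224.0.0/24",
    "server13": "10.224.3.0/24",
    "server22": "10.224.1.0/24",
    "server11 (worker)": "10.224.2.0/24",
    "server12": "10.224.4.0/24",
    "server23": "10.224.5.0/24",
    "server24": "10.224.6.0/24",
}


def overlapping_cidrs(cidrs):
    """Return pairs of node names whose pod CIDRs overlap."""
    networks = {node: ipaddress.ip_network(cidr) for node, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


print(overlapping_cidrs(POD_CIDRS))  # [] for the sample data
```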
        

        To display the kubelet or Docker version, use the components option with the show command. This example lists the kubelet version, a proxy address if used, and the status of the container for server11 master and worker nodes.

        cumulus@host:~$ netq server11 show kubernetes node components
        Matching kube_cluster records:
        Master                   Cluster Name     Node Name            Kubelet      KubeProxy    Container Runtime
        ------------------------ ---------------- -------------------- ------------ ------------ ----------------- --------------
        server11:3.0.0.68        default          server11             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        server11:3.0.0.68        default          server13             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        server11:3.0.0.68        default          server22             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        server11:3.0.0.68        default          server11             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        server11:3.0.0.68        default          server12             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        server11:3.0.0.68        default          server23             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        server11:3.0.0.68        default          server24             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
        

        To view only the details for a selected node, use the name option with the hostname of that node following the components option:

        cumulus@host:~$ netq server11 show kubernetes node components name server13
        Matching kube_cluster records:
        Master                   Cluster Name     Node Name            Kubelet      KubeProxy    Container Runtime
        ------------------------ ---------------- -------------------- ------------ ------------ ----------------- --------------
        server11:3.0.0.68        default          server13             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
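
If you collect the components output across many nodes, you may want to confirm that every node reports the same kubelet, kube-proxy, and container runtime versions. The following Python sketch groups nodes by version. It assumes each data row has exactly seven whitespace-separated fields, as in the sample above, and the function name nodes_by_version is hypothetical.

```python
from collections import defaultdict

# Data rows copied from the sample output; none of the cells wrap.
ROWS = """\
server11:3.0.0.68        default          server11             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
server11:3.0.0.68        default          server13             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
server11:3.0.0.68        default          server22             v1.9.2       v1.9.2       docker://17.3.2   KubeletReady
"""


def nodes_by_version(rows_text):
    """Group node names by their (kubelet, kube-proxy, runtime) versions."""
    groups = defaultdict(set)
    for line in rows_text.splitlines():
        fields = line.split()
        # Skip headers, the dashed rule line, and anything else unexpected.
        if len(fields) != 7 or fields[0].startswith("-"):
            continue
        _master, _cluster, node, kubelet, kube_proxy, runtime, _status = fields
        groups[(kubelet, kube_proxy, runtime)].add(node)
    return dict(groups)


groups = nodes_by_version(ROWS)
if len(groups) > 1:
    print("version drift detected:", groups)
else:
    print("all nodes report the same versions")
```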
        

        View Kubernetes Replica Set on a Node

        You can view information about the replica set, including the name, labels, and number of replicas present for each application. This example shows the number of replicas for each application in the server11 cluster:

        cumulus@host:~$ netq server11 show kubernetes replica-set
        Matching kube_replica records:
        Master                   Cluster Name Namespace        Replication Name               Labels               Replicas                           Ready Replicas Last Changed
        ------------------------ ------------ ---------------- ------------------------------ -------------------- ---------------------------------- -------------- ----------------
        server11:3.0.0.68        default      default          influxdb-6cdb566dd             app:influx           1                                  1              14h:19m:28s
        server11:3.0.0.68        default      default          nginx-8586cf59                 run:nginx            3                                  3              14h:24m:39s
        server11:3.0.0.68        default      default          httpd-5456469bfd               app:httpd            1                                  1              14h:19m:28s
        server11:3.0.0.68        default      kube-system      kube-dns-6f4fd4bdf             k8s-app:kube-dns     1                                  1              14h:27m:9s
        server11:3.0.0.68        default      kube-system      calico-kube-controllers-d669cc k8s-app:calico-kube- 1                                  1              14h:27m:9s
                                                               78f                            controllers
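
A common follow-up to this output is to compare the Replicas and Ready Replicas columns to find replica sets that are not fully ready. The Python sketch below does this with simple token splitting. It assumes the two counts are the second-to-last and third-to-last fields on each data row, as in the sample, and it ignores continuation lines, so names wider than the column appear truncated. The function name replica_counts is hypothetical.

```python
import textwrap

# Two data rows and one continuation line, copied from the sample output.
SAMPLE = """\
server11:3.0.0.68        default      default          nginx-8586cf59                 run:nginx            3                                  3              14h:24m:39s
server11:3.0.0.68        default      kube-system      calico-kube-controllers-d669cc k8s-app:calico-kube- 1                                  1              14h:27m:9s
                                                       78f                            controllers
"""


def replica_counts(text):
    """Return (name, desired, ready) for each data row in the output."""
    rows = []
    for line in textwrap.dedent(text).splitlines():
        # Continuation lines of wrapped cells start with whitespace; skip them.
        if not line.strip() or line[0].isspace():
            continue
        tokens = line.split()
        if len(tokens) < 8 or not (tokens[-3].isdigit() and tokens[-2].isdigit()):
            continue  # header, rule line, or "Matching ..." banner
        rows.append((tokens[3], int(tokens[-3]), int(tokens[-2])))
    return rows


not_ready = [row for row in replica_counts(SAMPLE) if row[2] < row[1]]
print(not_ready)  # [] for the sample data, where every replica is ready
```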
        

        View the Daemon-sets on a Node

        You can view information about the daemon set running on the node. This example shows that six copies of the cumulus-frr daemon are running on the server11 node:

        cumulus@host:~$ netq server11 show kubernetes daemon-set namespace default
        Matching kube_daemonset records:
        Master                   Cluster Name Namespace        Daemon Set Name                Labels               Desired Count Ready Count Last Changed
        ------------------------ ------------ ---------------- ------------------------------ -------------------- ------------- ----------- ----------------
        server11:3.0.0.68        default      default          cumulus-frr                    k8s-app:cumulus-frr  6             6           14h:25m:37s
        

        View Pods on a Node

        You can view information about the pods on the node. The first example shows all pods running nginx in the default namespace for the server11 cluster. The second example shows all pods running any application in the default namespace for the server11 cluster.

        cumulus@host:~$ netq server11 show kubernetes pod namespace default label nginx
        Matching kube_pod records:
        Master                   Namespace    Name                 IP               Node         Labels               Status   Containers               Last Changed
        ------------------------ ------------ -------------------- ---------------- ------------ -------------------- -------- ------------------------ ----------------
        server11:3.0.0.68        default      nginx-8586cf59-26pj5 10.244.9.193     server24     run:nginx            Running  nginx:6e2b65070c86       14h:25m:24s
        server11:3.0.0.68        default      nginx-8586cf59-c82ns 10.244.40.128    server12     run:nginx            Running  nginx:01b017c26725       14h:25m:24s
        server11:3.0.0.68        default      nginx-8586cf59-wjwgp 10.244.49.64     server22     run:nginx            Running  nginx:ed2b4254e328       14h:25m:24s
         
        cumulus@host:~$ netq server11 show kubernetes pod namespace default label app
        Matching kube_pod records:
        Master                   Namespace    Name                 IP               Node         Labels               Status   Containers               Last Changed
        ------------------------ ------------ -------------------- ---------------- ------------ -------------------- -------- ------------------------ ----------------
        server11:3.0.0.68        default      httpd-5456469bfd-bq9 10.244.49.65     server22     app:httpd            Running  httpd:79b7f532be2d       14h:20m:34s
                                              zm
        server11:3.0.0.68        default      influxdb-6cdb566dd-8 10.244.162.128   server13     app:influx           Running  influxdb:15dce703cdec    14h:20m:34s
                                              9lwn
        

        View Status of the Replication Controller on a Node

        After you create the replicas, you can then view information about the replication controller:

        cumulus@host:~$ netq server11 show kubernetes replication-controller
        No matching kube_replica records found
        

        View Kubernetes Deployment Information

        For each deployment, you can view the number of replicas associated with an application. This example shows information for a deployment of the nginx application:

        cumulus@host:~$ netq server11 show kubernetes deployment name nginx
        Matching kube_deployment records:
        Master                   Namespace       Name                 Replicas                           Ready Replicas Labels                         Last Changed
        ------------------------ --------------- -------------------- ---------------------------------- -------------- ------------------------------ ----------------
        server11:3.0.0.68        default         nginx                3                                  3              run:nginx                      14h:27m:20s
        

        Search Using Labels

        You can search for information about your Kubernetes clusters using labels. A label search is similar to a “contains” regular expression search. The following example looks for all replica sets that contain kube in the replica set name or label:

        cumulus@host:~$ netq server11 show kubernetes replica-set label kube
        Matching kube_replica records:
        Master                   Cluster Name Namespace        Replication Name               Labels               Replicas                           Ready Replicas Last Changed
        ------------------------ ------------ ---------------- ------------------------------ -------------------- ---------------------------------- -------------- ----------------
        server11:3.0.0.68        default      kube-system      kube-dns-6f4fd4bdf             k8s-app:kube-dns     1                                  1              14h:30m:41s
        server11:3.0.0.68        default      kube-system      calico-kube-controllers-d669cc k8s-app:calico-kube- 1                                  1              14h:30m:41s
                                                               78f                            controllers
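
To illustrate what a "contains" search means here, the Python sketch below applies the same idea to the replica set names and labels from the earlier example. It is only an approximation of the behavior described above: it treats the search term as a literal substring, which is an assumption, and the function name contains_search is hypothetical.

```python
import re

# (replica set name, labels) pairs transcribed from the earlier replica-set
# output, with wrapped cells rejoined.
REPLICA_SETS = [
    ("influxdb-6cdb566dd", "app:influx"),
    ("nginx-8586cf59", "run:nginx"),
    ("httpd-5456469bfd", "app:httpd"),
    ("kube-dns-6f4fd4bdf", "k8s-app:kube-dns"),
    ("calico-kube-controllers-d669cc78f", "k8s-app:calico-kube-controllers"),
]


def contains_search(records, term):
    """Return names of records whose name or labels contain the term."""
    pattern = re.compile(re.escape(term))
    return [
        name
        for name, labels in records
        if pattern.search(name) or pattern.search(labels)
    ]


print(contains_search(REPLICA_SETS, "kube"))
# ['kube-dns-6f4fd4bdf', 'calico-kube-controllers-d669cc78f']
```

The result matches the two rows returned by the label kube search above.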
        

        View Container Connectivity

        You can view the connectivity graph of a Kubernetes pod at the replica set, deployment, or service level. The connectivity graph starts with the server where you deployed the pod, and shows the peer for each server interface. This data appears in a format similar to the output of the netq trace command, showing the interface name, the outbound port on that interface, and the inbound port on the peer.

        This example shows connectivity at the deployment level, where the nginx-8586cf59-wjwgp replica is in a pod on the server22 node. It has four possible communication paths, through interfaces swp1-4, out varying ports, to peer interfaces swp7 and swp20 on the torc-21, torc-22, edge01, and edge02 nodes. Similarly, it shows the connections for two additional nginx replicas.

        cumulus@host:~$ netq server11 show kubernetes deployment name nginx connectivity
        nginx -- nginx-8586cf59-wjwgp -- server22:swp1:torbond1 -- swp7:hostbond3:torc-21
                                      -- server22:swp2:torbond1 -- swp7:hostbond3:torc-22
                                      -- server22:swp3:NetQBond-2 -- swp20:NetQBond-20:edge01
                                      -- server22:swp4:NetQBond-2 -- swp20:NetQBond-20:edge02
              -- nginx-8586cf59-c82ns -- server12:swp2:NetQBond-1 -- swp23:NetQBond-23:edge01
                                      -- server12:swp3:NetQBond-1 -- swp23:NetQBond-23:edge02
                                      -- server12:swp1:swp1 -- swp6:VlanA-1:tor-1
              -- nginx-8586cf59-26pj5 -- server24:swp2:NetQBond-1 -- swp29:NetQBond-29:edge01
                                      -- server24:swp3:NetQBond-1 -- swp29:NetQBond-29:edge02
                                      -- server24:swp1:swp1 -- swp8:VlanA-1:tor-2
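
The connectivity output above is a tree: the first line names the deployment and the first pod, and indented lines add further paths or pods. A script could flatten that tree into per-pod path lists roughly as follows. This is a Python sketch; the function name parse_connectivity is hypothetical, and the field order it assumes (host:port:interface on the server side, port:interface:host on the peer side) is read off the sample output rather than taken from a format specification.

```python
import re
from collections import defaultdict

SAMPLE = """\
nginx -- nginx-8586cf59-wjwgp -- server22:swp1:torbond1 -- swp7:hostbond3:torc-21
                              -- server22:swp2:torbond1 -- swp7:hostbond3:torc-22
                              -- server22:swp3:NetQBond-2 -- swp20:NetQBond-20:edge01
                              -- server22:swp4:NetQBond-2 -- swp20:NetQBond-20:edge02
      -- nginx-8586cf59-c82ns -- server12:swp2:NetQBond-1 -- swp23:NetQBond-23:edge01
                              -- server12:swp3:NetQBond-1 -- swp23:NetQBond-23:edge02
                              -- server12:swp1:swp1 -- swp6:VlanA-1:tor-1
"""


def parse_connectivity(text):
    """Return {pod: [(local_host, local_port, peer_host, peer_port), ...]}."""
    paths = defaultdict(list)
    pod = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = re.split(r"\s*--\s*", line)
        if len(fields) == 4:
            # "<owner> -- <pod> -- local -- peer" or "-- <pod> -- local -- peer"
            pod = fields[1]
        elif len(fields) != 3 or pod is None:
            raise ValueError(f"unexpected line: {line!r}")
        # Assumed field order, inferred from the sample output.
        local_host, local_port, _local_if = fields[-2].split(":")
        peer_port, _peer_if, peer_host = fields[-1].split(":")
        paths[pod].append((local_host, local_port, peer_host, peer_port))
    return dict(paths)


for pod, pod_paths in parse_connectivity(SAMPLE).items():
    peers = sorted({peer_host for _, _, peer_host, _ in pod_paths})
    print(pod, "->", ", ".join(peers))
```

Splitting on the -- separator rather than on column positions keeps the sketch independent of how deeply each line is indented.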
        

        View Kubernetes Services Information

        You can show details about the Kubernetes services in a cluster, including service name, labels associated with the service, type of service, associated IP address, an external address if it is a public service, and ports used. This example shows the services available in the Kubernetes cluster:

        cumulus@host:~$ netq show kubernetes service
        Matching kube_service records:
        Master                   Namespace        Service Name         Labels       Type       Cluster IP       External IP      Ports                               Last Changed
        ------------------------ ---------------- -------------------- ------------ ---------- ---------------- ---------------- ----------------------------------- ----------------
        server11:3.0.0.68        default          kubernetes                        ClusterIP  10.96.0.1                         TCP:443                             2d:13h:45m:30s
        server11:3.0.0.68        kube-system      calico-etcd          k8s-app:cali ClusterIP  10.96.232.136                     TCP:6666                            2d:13h:45m:27s
                                                                       co-etcd
        server11:3.0.0.68        kube-system      kube-dns             k8s-app:kube ClusterIP  10.96.0.10                        UDP:53 TCP:53                       2d:13h:45m:28s
                                                                       -dns
        server12:3.0.0.69        default          kubernetes                        ClusterIP  10.96.0.1                         TCP:443                             2d:13h:46m:24s
        server12:3.0.0.69        kube-system      calico-etcd          k8s-app:cali ClusterIP  10.96.232.136                     TCP:6666                            2d:13h:46m:20s
                                                                       co-etcd
        server12:3.0.0.69        kube-system      kube-dns             k8s-app:kube ClusterIP  10.96.0.10                        UDP:53 TCP:53                       2d:13h:46m:20s
                                                                       -dns
        

        You can filter the list to view details about a particular Kubernetes service using the name option, as shown here:

        cumulus@host:~$ netq show kubernetes service name calico-etcd
        Matching kube_service records:
        Master                   Namespace        Service Name         Labels       Type       Cluster IP       External IP      Ports                               Last Changed
        ------------------------ ---------------- -------------------- ------------ ---------- ---------------- ---------------- ----------------------------------- ----------------
        server11:3.0.0.68        kube-system      calico-etcd          k8s-app:cali ClusterIP  10.96.232.136                     TCP:6666                            2d:13h:48m:10s
                                                                       co-etcd
        server12:3.0.0.69        kube-system      calico-etcd          k8s-app:cali ClusterIP  10.96.232.136                     TCP:6666                            2d:13h:49m:3s
                                                                       co-etcd
        

        View Kubernetes Service Connectivity

        To see the connectivity of a given Kubernetes service, include the connectivity option. This example shows the connectivity of the calico-etcd service:

        cumulus@host:~$ netq show kubernetes service name calico-etcd connectivity
        calico-etcd -- calico-etcd-pfg9r -- server11:swp1:torbond1 -- swp6:hostbond2:torc-11
                                         -- server11:swp2:torbond1 -- swp6:hostbond2:torc-12
                                         -- server11:swp3:NetQBond-2 -- swp16:NetQBond-16:edge01
                                         -- server11:swp4:NetQBond-2 -- swp16:NetQBond-16:edge02
        calico-etcd -- calico-etcd-btqgt -- server12:swp1:torbond1 -- swp7:hostbond3:torc-11
                                         -- server12:swp2:torbond1 -- swp7:hostbond3:torc-12
                                         -- server12:swp3:NetQBond-2 -- swp17:NetQBond-17:edge01
                                         -- server12:swp4:NetQBond-2 -- swp17:NetQBond-17:edge02
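        If you want to post-process this output in a script, the lines can be split on the -- separators. The following is a minimal sketch, not part of NetQ: the Link record and the parse_connectivity function are hypothetical names, and the meaning of each colon-separated field is only inferred from the example output above.

```python
from typing import NamedTuple


class Link(NamedTuple):
    service: str
    pod: str
    local: tuple   # e.g. ("server11", "swp1", "torbond1")
    remote: tuple  # e.g. ("swp6", "hostbond2", "torc-11")


def parse_connectivity(text: str) -> list:
    """Turn connectivity output lines into Link records (illustrative only)."""
    links = []
    service = pod = ""
    for line in text.splitlines():
        if "--" not in line:
            continue  # skip blank or unrelated lines
        fields = [f.strip() for f in line.split("--")]
        local, remote = fields[-2], fields[-1]
        head = fields[:-2]
        # Continuation lines leave the leading columns empty, so the most
        # recent service and pod names are carried forward.
        if len(head) == 2:
            service = head[0] or service
            pod = head[1]
        links.append(
            Link(service, pod, tuple(local.split(":")), tuple(remote.split(":")))
        )
    return links


SAMPLE = """\
calico-etcd -- calico-etcd-pfg9r -- server11:swp1:torbond1 -- swp6:hostbond2:torc-11
                                 -- server11:swp2:torbond1 -- swp6:hostbond2:torc-12
calico-etcd -- calico-etcd-btqgt -- server12:swp1:torbond1 -- swp7:hostbond3:torc-11
"""

links = parse_connectivity(SAMPLE)
for link in links:
    print(link.pod, link.local, link.remote)
```

        Each record keeps the pod name even on continuation lines, which makes it straightforward to group links by pod or by host.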
        

        View the Impact of Connectivity Loss for a Service

        You can preview how the loss of a particular node affects service availability using the impact option. The output is color coded (not shown in the example below) so you can clearly see the impact: green shows no impact, yellow shows partial impact, and red shows full impact.

        cumulus@host:~$ netq server11 show impact kubernetes service name calico-etcd
        calico-etcd -- calico-etcd-pfg9r -- server11:swp1:torbond1 -- swp6:hostbond2:torc-11
                                         -- server11:swp2:torbond1 -- swp6:hostbond2:torc-12
                                         -- server11:swp3:NetQBond-2 -- swp16:NetQBond-16:edge01
                                         -- server11:swp4:NetQBond-2 -- swp16:NetQBond-16:edge02
        

        View Kubernetes Cluster Configuration in the Past

        You can use the around option to go back in time to check the network status and identify any changes that occurred on the network.

        This example shows the current state of the network. Notice the node named server23: it is present because the node server22 went down and Kubernetes spun up a third replica on a different host to satisfy the deployment requirement.

        cumulus@host:~$ netq server11 show kubernetes deployment name nginx connectivity
        nginx -- nginx-8586cf59-fqtnj -- server12:swp2:NetQBond-1 -- swp23:NetQBond-23:edge01
                                      -- server12:swp3:NetQBond-1 -- swp23:NetQBond-23:edge02
                                      -- server12:swp1:swp1 -- swp6:VlanA-1:tor-1
              -- nginx-8586cf59-8g487 -- server24:swp2:NetQBond-1 -- swp29:NetQBond-29:edge01
                                      -- server24:swp3:NetQBond-1 -- swp29:NetQBond-29:edge02
                                      -- server24:swp1:swp1 -- swp8:VlanA-1:tor-2
              -- nginx-8586cf59-2hb8t -- server23:swp1:swp1 -- swp7:VlanA-1:tor-2
                                      -- server23:swp2:NetQBond-1 -- swp28:NetQBond-28:edge01
                                      -- server23:swp3:NetQBond-1 -- swp28:NetQBond-28:edge02
        

        You can see this by going back in time 10 minutes. server23 was not present, whereas server22 was present:

        cumulus@host:~$ netq server11 show kubernetes deployment name nginx connectivity around 10m
        nginx -- nginx-8586cf59-fqtnj -- server12:swp2:NetQBond-1 -- swp23:NetQBond-23:edge01
                                      -- server12:swp3:NetQBond-1 -- swp23:NetQBond-23:edge02
                                      -- server12:swp1:swp1 -- swp6:VlanA-1:tor-1
              -- nginx-8586cf59-2xxs4 -- server22:swp1:torbond1 -- swp7:hostbond3:torc-21
                                      -- server22:swp2:torbond1 -- swp7:hostbond3:torc-22
                                      -- server22:swp3:NetQBond-2 -- swp20:NetQBond-20:edge01
                                      -- server22:swp4:NetQBond-2 -- swp20:NetQBond-20:edge02
              -- nginx-8586cf59-8g487 -- server24:swp2:NetQBond-1 -- swp29:NetQBond-29:edge01
                                      -- server24:swp3:NetQBond-1 -- swp29:NetQBond-29:edge02
                                      -- server24:swp1:swp1 -- swp8:VlanA-1:tor-2
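        The comparison between the two outputs amounts to a set difference on the host names. A minimal sketch, assuming the host names have already been read out of each output (host_changes is a hypothetical helper, not a NetQ function):

```python
def host_changes(past_hosts, current_hosts):
    """Report hosts that appeared or disappeared between two snapshots."""
    past, current = set(past_hosts), set(current_hosts)
    return {
        "added": sorted(current - past),
        "removed": sorted(past - current),
    }


# Hosts taken from the two example outputs above.
past = ["server12", "server22", "server24"]     # output with `around 10m`
current = ["server12", "server24", "server23"]  # current output

changes = host_changes(past, current)
print(changes)
```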
        

        View the Impact of Connectivity Loss for a Deployment

        You can determine the impact on the Kubernetes deployment in the event a host or switch goes down. The output is color coded (not shown in the example below) so you can clearly see the impact: green shows no impact, yellow shows partial impact, and red shows full impact.

        cumulus@host:~$ netq torc-21 show impact kubernetes deployment name nginx
        nginx -- nginx-8586cf59-wjwgp -- server22:swp1:torbond1 -- swp7:hostbond3:torc-21
                                      -- server22:swp2:torbond1 -- swp7:hostbond3:torc-22
                                      -- server22:swp3:NetQBond-2 -- swp20:NetQBond-20:edge01
                                      -- server22:swp4:NetQBond-2 -- swp20:NetQBond-20:edge02
              -- nginx-8586cf59-c82ns -- server12:swp2:NetQBond-1 -- swp23:NetQBond-23:edge01
                                      -- server12:swp3:NetQBond-1 -- swp23:NetQBond-23:edge02
                                      -- server12:swp1:swp1 -- swp6:VlanA-1:tor-1
              -- nginx-8586cf59-26pj5 -- server24:swp2:NetQBond-1 -- swp29:NetQBond-29:edge01
                                      -- server24:swp3:NetQBond-1 -- swp29:NetQBond-29:edge02
                                      -- server24:swp1:swp1 -- swp8:VlanA-1:tor-2
        cumulus@server11:~$ netq server12 show impact kubernetes deployment name nginx
        nginx -- nginx-8586cf59-wjwgp -- server22:swp1:torbond1 -- swp7:hostbond3:torc-21
                                      -- server22:swp2:torbond1 -- swp7:hostbond3:torc-22
                                      -- server22:swp3:NetQBond-2 -- swp20:NetQBond-20:edge01
                                      -- server22:swp4:NetQBond-2 -- swp20:NetQBond-20:edge02
              -- nginx-8586cf59-c82ns -- server12:swp2:NetQBond-1 -- swp23:NetQBond-23:edge01
                                      -- server12:swp3:NetQBond-1 -- swp23:NetQBond-23:edge02
                                      -- server12:swp1:swp1 -- swp6:VlanA-1:tor-1
              -- nginx-8586cf59-26pj5 -- server24:swp2:NetQBond-1 -- swp29:NetQBond-29:edge01
                                      -- server24:swp3:NetQBond-1 -- swp29:NetQBond-29:edge02
        

        Kubernetes Cluster Maintenance

        If you need to perform maintenance on the Kubernetes cluster itself, use the following commands to bring the cluster down and then back up.

        1. Display the list of all the nodes in the Kubernetes cluster:

          cumulus@host:~$ kubectl get nodes 
          
        2. Tell Kubernetes to drain the node so that the pods running on it are gracefully scheduled elsewhere:

          cumulus@host:~$ kubectl drain <node name> 
          
        3. After the maintenance window is over, put the node back into the cluster so that Kubernetes can start scheduling pods on it again:

          cumulus@host:~$ kubectl uncordon <node name>
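        If you script this procedure, it can help to build the command list first and review it before running anything. The sketch below only assembles and prints the three commands shown above; maintenance_commands is a hypothetical helper and server11 is a placeholder node name.

```python
import shlex


def maintenance_commands(node: str) -> dict:
    """Return the kubectl commands to run before and after maintenance."""
    return {
        "before": [
            ["kubectl", "get", "nodes"],
            ["kubectl", "drain", node],
        ],
        "after": [
            ["kubectl", "uncordon", node],
        ],
    }


plan = maintenance_commands("server11")  # placeholder node name
for phase, commands in plan.items():
    for argv in commands:
        print(f"{phase}: {shlex.join(argv)}")
```

        Each argument list can be passed to subprocess.run once you are satisfied with the plan.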
          

        Adaptive Routing

        Adaptive routing is a load balancing feature that improves network utilization for eligible IP packets by selecting forwarding paths dynamically based on the state of the switch, such as queue occupancy and port utilization. You can use the adaptive routing dashboard to view switches with adaptive routing capabilities, events related to adaptive routing, RoCE settings, and egress queue lengths in the form of histograms.

        Adaptive routing monitoring is supported on Spectrum-4 switches. It requires a switch fabric running Cumulus Linux 5.5.0 or above. This feature is in beta.

        Requirements

        To display adaptive routing data, you must have adaptive routing configured on the switch; it can be either enabled or disabled. Switches without an adaptive routing configuration will not appear in the UI or CLI. Additionally, RoCE lossless mode must be enabled to display adaptive routing data. Switches with RoCE lossy mode enabled will appear in the UI and CLI, but will not display adaptive routing data.
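        These requirements can be summarized as a small decision function. This is a sketch of the rules as stated above, not NetQ code: the function name and the roce_mode values are illustrative, and treating a switch with no RoCE mode at all the same as a lossy one is an assumption.

```python
from typing import Optional


def adaptive_routing_visibility(ar_configured: bool,
                                roce_mode: Optional[str]) -> dict:
    """roce_mode is 'lossless', 'lossy', or None (assumed: no RoCE mode set)."""
    # A switch appears only if it has an adaptive routing configuration,
    # regardless of whether adaptive routing is enabled or disabled.
    appears = ar_configured
    # Adaptive routing data additionally requires RoCE lossless mode.
    shows_data = ar_configured and roce_mode == "lossless"
    return {"appears": appears, "shows_data": shows_data}


print(adaptive_routing_visibility(True, "lossless"))
print(adaptive_routing_visibility(True, "lossy"))
print(adaptive_routing_visibility(False, "lossless"))
```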

        Adaptive Routing Commands

        Monitor adaptive routing with the netq show adaptive-routing config command.

        netq show adaptive-routing config global
        netq show adaptive-routing config interface
        

        Access the Adaptive Routing Dashboard

        1. Select Menu.

        2. Under the Network section, select Adaptive routing.

        The adaptive routing dashboard displays:

        adaptive routing dashboard displaying two devices with AR enabled

        Configure and Monitor Threshold-Crossing Events

        Threshold-crossing events are user-defined events that detect and prevent network failures for ACL resources, BGP, digital optics, ECMP, forwarding resources, interface errors and statistics, link flaps, resource utilization, RoCE, sensors, and What Just Happened events.

        You can find a complete list of TCAs, including the event IDs required for the command line, in the Threshold-Crossing Events Reference.

        Create a Threshold-crossing Rule

        1. Click Menu and navigate to Threshold crossing rules.

        2. Select the tab that reflects the event type for the rule.

        3. Click Create a rule. Enter a name for the rule and assign a severity, then click Next.

        4. Select the attribute you want to monitor. The listed attributes change depending on the type of event you chose in the previous step.

        5. Click Next.

        6. On the Set threshold step, enter a threshold value.

          For digital optics, you can choose to use the thresholds defined by the optics vendor (default) or specify your own.

        7. Define the scope of the rule.

          • If you want to restrict the rule based on a particular parameter, enter values for one or more of the available attributes. For What Just Happened rules, select a reason from the available list.

          • If you want the rule to apply across the network, select the Apply rule to entire network toggle.

        8. Click Next.

        9. (Optional) Select a notification channel where you want the events to be sent.

          Only previously created channels are available for selection. If no channel is available or selected, the notifications can only be retrieved from the database. You can add a channel at a later time and then add it to the rule.

        10. Click Finish. The rules may take several minutes to appear in the UI.

        The simplest configuration you can create is one that sends a TCA event generated by all devices and all interfaces to a single notification application. Use the netq add tca command to configure the event. Its syntax is:

        netq add tca event_id <text-event-id-anchor>
            [scope <text-scope-anchor>]
            [severity info | severity error]
            [is_active true | is_active false]
            [suppress_until <text-suppress-ts>]
            [threshold_type user_set | threshold_type vendor_set]
            [threshold <text-threshold-value>]
            [channel <text-channel-name-anchor> | channel drop <text-drop-channel-name>]
        

        Note that the event ID is case-sensitive and must be in all uppercase.

        For example, this rule tells NetQ to deliver an event notification to the tca_slack_ifstats pre-configured Slack channel when the CPU utilization exceeds 95% of its capacity on any monitored switch:

        cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope '*' channel tca_slack_ifstats threshold 95
        

        This rule tells NetQ to deliver an event notification to the tca_pd_ifstats PagerDuty channel when the number of transmit bytes per second (Bps) on the leaf12 switch exceeds 20,000 Bps on any interface:

        cumulus@switch:~$ netq add tca event_id TCA_TXBYTES_UPPER scope leaf12,'*' channel tca_pd_ifstats threshold 20000
        

        This rule tells NetQ to deliver an event notification to the syslog-netq syslog channel when the temperature on sensor temp1 on the leaf12 switch exceeds 32 degrees Celsius:

        cumulus@switch:~$ netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf12,temp1 channel syslog-netq threshold 32
        

        This rule tells NetQ to deliver an event notification to the tca-slack channel when the total number of ACL drops on the leaf04 switch exceeds 20,000 for any reason, ingress port, or drop type:

        cumulus@switch:~$ netq add tca event_id TCA_WJH_ACL_DROP_AGG_UPPER scope leaf04,'*','*','*' channel tca-slack threshold 20000
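        When you generate many rules, a small helper can assemble the command line and catch obvious mistakes, such as a lowercase event ID, before anything is sent to the CLI. The following is a hedged sketch: build_add_tca is a hypothetical function, it covers only a subset of the options in the syntax above, and it emits them in the order the syntax block lists them.

```python
import shlex
from typing import List, Optional


def build_add_tca(event_id: str,
                  scope: Optional[str] = None,
                  severity: Optional[str] = None,
                  threshold: Optional[float] = None,
                  channel: Optional[str] = None) -> str:
    """Assemble a `netq add tca` command string (illustrative only)."""
    if event_id != event_id.upper():
        raise ValueError("event_id is case-sensitive and must be all uppercase")
    if severity not in (None, "info", "error"):
        raise ValueError("severity must be 'info' or 'error'")
    argv: List[str] = ["netq", "add", "tca", "event_id", event_id]
    for option, value in (("scope", scope), ("severity", severity),
                          ("threshold", threshold), ("channel", channel)):
        if value is not None:
            argv += [option, str(value)]
    # shlex.join quotes values such as * so the shell does not expand them.
    return shlex.join(argv)


command = build_add_tca("TCA_CPU_UTILIZATION_UPPER", scope="*",
                        channel="tca_slack_ifstats", threshold=95)
print(command)
```

        The helper quotes a whole scope value at once (for example 'leaf12,*'), which the shell treats the same way as the leaf12,'*' form used in the examples.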
        

        For a Slack channel, the event messages should be similar to this:

        Set the Severity of a Threshold-crossing Event

        In addition to defining a scope for a TCA rule, you can also set a severity of either info or error. To add a severity to a rule, use the severity option.

        For example, if you want to add an error severity to the CPU utilization rule you created earlier:

        cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope '*' severity error channel tca_slack_resources threshold 95
        

        Or, if an event is important but not an error, set the severity to info:

        cumulus@switch:~$ netq add tca event_id TCA_TXBYTES_UPPER scope leaf12,'*' severity info channel tca_pd_ifstats threshold 20000
        

        Set the Threshold for Digital Optics Events

        Digital optics have the additional option of applying user- or vendor-defined thresholds, using the threshold_type and threshold options.

        This example shows how to send an error to channel ch1 when the module voltage exceeds the vendor-defined upper threshold for interface swp31 on the mlx-2700-04 switch.

        cumulus@switch:~$ netq add tca event_id TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER scope 'mlx-2700-04,swp31' severity error is_active true threshold_type vendor_set channel ch1
        Successfully added/updated tca
        

        This example shows how to send an error to channel ch1 when the module voltage exceeds the user-defined upper threshold of 3V for interface swp31 on the mlx-2700-04 switch.

        cumulus@switch:~$ netq add tca event_id TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER scope 'mlx-2700-04,swp31' severity error is_active true threshold_type user_set threshold 3 channel ch1
        Successfully added/updated tca
        

        Create Multiple Rules for a Single Event

        You may want to create more than one rule per event. For example, you might want to:

        To do this in the NetQ UI, create additional rule cards (as shown in the previous section).

        In the NetQ CLI, you can also add multiple rules. The following example shows the creation of three additional rules for the max temperature sensor:

        netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf*,temp1 channel syslog-netq threshold 32
        
        netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope '*',temp1 channel tca_sensors,tca_pd_sensors threshold 32
        
        netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf03,temp1 channel syslog-netq threshold 29
        

        Now you have four rules created (the original one, plus these three new ones) all based on the TCA_SENSOR_TEMPERATURE_UPPER event. To identify the various rules, NetQ automatically generates a TCA name for each rule. As you create each rule, NetQ adds an _# to the event name. The TCA Name for the first rule created is then TCA_SENSOR_TEMPERATURE_UPPER_1, the second rule created for this event is TCA_SENSOR_TEMPERATURE_UPPER_2, and so forth.
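        The naming scheme described above can be modeled with a per-event counter. This is only a sketch of the visible behavior: TcaNamer is a hypothetical class, and how NetQ numbers rules after one has been deleted is not covered here.

```python
from collections import defaultdict


class TcaNamer:
    """Generate TCA names by appending _<n> to the event name."""

    def __init__(self):
        self._count = defaultdict(int)

    def next_name(self, event_id: str) -> str:
        self._count[event_id] += 1
        return f"{event_id}_{self._count[event_id]}"


namer = TcaNamer()
names = [namer.next_name("TCA_SENSOR_TEMPERATURE_UPPER") for _ in range(4)]
print(names)
```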

        View Threshold-crossing Rules

        1. Click Menu and navigate to Threshold crossing rules.

        2. Select the relevant tab. The UI displays each rule and its parameters as a card. Each attribute is displayed on the rule card as a regular expression:

        • Equals is displayed as an equals sign (=)
        • Starts with is displayed as a caret (^)
        • Blank (all) is displayed as an asterisk (*)

        This example indicates that the rule applies across all interfaces on the exit-1 switch.
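        The three symbols on the rule card can be read as simple match operators. The sketch below illustrates that reading for a rule covering all interfaces on the exit-1 switch; attribute_matches and rule_applies are hypothetical helpers, and the attribute names hostname and ifname are borrowed from the scope output of netq show tca.

```python
def attribute_matches(operator: str, pattern: str, value: str) -> bool:
    """Apply one rule-card operator to an attribute value."""
    if operator == "*":   # blank (all)
        return True
    if operator == "=":   # equals
        return value == pattern
    if operator == "^":   # starts with
        return value.startswith(pattern)
    raise ValueError(f"unknown operator: {operator!r}")


def rule_applies(rule: dict, event: dict) -> bool:
    """A rule applies when every attribute in its scope matches the event."""
    return all(
        attribute_matches(op, pattern, event[attr])
        for attr, (op, pattern) in rule.items()
    )


# All interfaces on the exit-1 switch.
rule = {"hostname": ("=", "exit-1"), "ifname": ("*", "")}

print(rule_applies(rule, {"hostname": "exit-1", "ifname": "swp1"}))
print(rule_applies(rule, {"hostname": "exit-2", "ifname": "swp1"}))
```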

        After creating a rule, you can use the filters that appear above the rule cards to filter by status, severity, channel, and/or events.

        To view TCA rules, run netq show tca:

        netq show tca [tca_id <text-tca-id-anchor>] [json]
        

        This example displays all TCA rules:

        cumulus@switch:~$ netq show tca
        Matching config_tca records:
        TCA Name                     Event Name           Scope                      Severity Channel/s          Active Threshold          Unit     Threshold Type Suppress Until
        ---------------------------- -------------------- -------------------------- -------- ------------------ ------ ------------------ -------- -------------- ----------------------------
        TCA_CPU_UTILIZATION_UPPER_1  TCA_CPU_UTILIZATION_ {"hostname":"leaf01"}      info     pd-netq-events,slk True   87                 %        user_set       Fri Oct  9 15:39:35 2020
                                     UPPER                                                    -netq-events
        TCA_CPU_UTILIZATION_UPPER_2  TCA_CPU_UTILIZATION_ {"hostname":"*"}           error    slk-netq-events    True   93                 %        user_set       Fri Oct  9 15:39:56 2020
                                     UPPER
        TCA_DOM_BIAS_CURRENT_ALARM_U TCA_DOM_BIAS_CURRENT {"hostname":"leaf*","ifnam error    slk-netq-events    True   0                  mA       vendor_set     Fri Oct  9 16:02:37 2020
        PPER_1                       _ALARM_UPPER         e":"*"}
        TCA_DOM_RX_POWER_ALARM_UPPER TCA_DOM_RX_POWER_ALA {"hostname":"*","ifname":" info     slk-netq-events    True   0                  mW       vendor_set     Fri Oct  9 15:25:26 2020
        _1                           RM_UPPER             *"}
        TCA_SENSOR_TEMPERATURE_UPPER TCA_SENSOR_TEMPERATU {"hostname":"leaf","s_name error    slk-netq-events    True   32                 degreeC  user_set       Fri Oct  9 15:40:18 2020
        _1                           RE_UPPER             ":"temp1"}
        TCA_TCAM_IPV4_ROUTE_UPPER_1  TCA_TCAM_IPV4_ROUTE_ {"hostname":"*"}           error    pd-netq-events     True   20000              %        user_set       Fri Oct  9 16:13:39 2020
                                     UPPER
        

        This example displays a specific TCA rule:

        cumulus@switch:~$ netq show tca tca_id TCA_TXMULTICAST_UPPER_1
        Matching config_tca records:
        TCA Name                     Event Name           Scope                      Severity         Channel/s          Active Threshold          Suppress Until
        ---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
        TCA_TXMULTICAST_UPPER_1      TCA_TXMULTICAST_UPPE {"ifname":"swp3","hostname info             tca-tx-bytes-slack True   0                  Sun Dec  8 16:40:14 2269
                                     R                    ":"leaf01"}
        

        Manage Threshold-crossing Event Notifications

        Change the Threshold on a Rule

        After receiving notifications based on a rule, you might want to increase or decrease the threshold value to limit or increase the number of events you receive.

        To modify the threshold:

        1. Locate the rule you want to modify and hover over the top of the card.

        2. Click Edit.

        3. Enter a new threshold value, then select Update rule.

        To modify the threshold, run netq add tca:

        netq add tca tca_id <text-tca-id-anchor> threshold <text-threshold-value>
        

        This example changes the threshold for the rule TCA_CPU_UTILIZATION_UPPER_1 to a value of 96 percent. This overwrites the existing threshold value.

        cumulus@switch:~$ netq add tca tca_id TCA_CPU_UTILIZATION_UPPER_1 threshold 96
        

        Change the Scope of a Rule

        After receiving notifications based on a rule, you might find that you want to narrow or widen the scope value to limit or increase the number of events you receive.

        To modify the scope:

        1. Locate the rule you want to modify and hover over the top of the card.

        2. Click Edit.

        3. Select the toggle to either apply the rule to the entire network or individual hosts.

        4. Click Update rule.

        To modify the scope, run:

        netq add tca event_id <text-event-id-anchor> scope <text-scope-anchor> threshold <text-threshold-value>
        

        This example changes the scope for the rule TCA_CPU_UTILIZATION_UPPER to apply only to switches whose hostname begins with leaf. You must also provide a threshold value. This example uses a value of 95 percent. Note that this overwrites the existing scope and threshold values.

        cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope hostname^leaf threshold 95
        Successfully added/updated tca
        
        cumulus@switch:~$ netq show tca
        
        Matching config_tca records:
        TCA Name                     Event Name           Scope                      Severity         Channel/s          Active Threshold          Suppress Until
        ---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
        TCA_CPU_UTILIZATION_UPPER_1  TCA_CPU_UTILIZATION_ {"hostname":"*"}           error            onprem-email       True   93                 Mon Aug 31 20:59:57 2020
                                     UPPER
        TCA_CPU_UTILIZATION_UPPER_2  TCA_CPU_UTILIZATION_ {"hostname":"hostname^leaf info                                True   95                 Tue Sep  1 18:47:24 2020
                                     UPPER                "}
        
        

        Change, Add, or Remove Channels

        1. Locate the rule you want to modify and hover over the top of the card.

        2. Click Edit.

        3. Select the Channels tab.

        4. Select one or more channels.

        5. Click Update rule.

        To change a channel association, run:

        netq add tca tca_id <text-tca-id-anchor> channel <text-channel-name-anchor>
        

        This overwrites the existing channel association.

        This example changes the channel for the disk utilization 1 rule to the PagerDuty channel pd-netq-events:

        cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 channel pd-netq-events
        Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
        

        To remove a channel association (stop sending events to a particular channel), run:

        netq add tca tca_id <text-tca-id-anchor> channel drop <text-drop-channel-name>
        

        This example removes the tca_slack_resources channel from the disk utilization 1 rule.

        cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 channel drop tca_slack_resources
        Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
        

        Change the Name of a Rule

        You cannot change the name of a threshold-crossing rule using the NetQ CLI because the rules do not have names. They receive identifiers (the tca_id) automatically. In the NetQ UI, to change a rule name, you must delete the rule and re-create it with the new name.

        Change the Severity of a Rule

        Threshold-crossing rules are categorized as either info or error.

        In the NetQ UI, you must delete the rule and re-create it, specifying the new severity.

        In the NetQ CLI, to change the severity, run:

        netq add tca tca_id <text-tca-id-anchor> (severity info | severity error)
        

        This example changes the severity of the maximum CPU utilization 1 rule from error to info:

        cumulus@switch:~$ netq add tca tca_id TCA_CPU_UTILIZATION_UPPER_1 severity info
        Successfully added/updated tca TCA_CPU_UTILIZATION_UPPER_1
        

        Suppress a Rule

        During troubleshooting or switch maintenance, you might want to suppress a rule to prevent erroneous or excessive notifications. This effectively pauses notifications for a specified time period.

        1. Locate the rule you want to disable and click Disable.

        2. Select the Date/Time field to set when you want the rule to be reenabled.

        3. Click Disable.

        Note the changes in the card:
        • The state changes to Snoozed.
        • The Suppressed field displays the date and time at which the rule will be reenabled.
        • The Disable button changes to Disable forever.

        The suppress_until option prevents the rule from being applied for a designated amount of time (in seconds). When this time has passed, the rule is automatically reenabled.

        To suppress a rule, run:

        netq add tca tca_id <text-tca-id-anchor> suppress_until <text-suppress-ts>
        

        This example suppresses the maximum CPU utilization event for 24 hours:

        cumulus@switch:~$ netq add tca tca_id TCA_CPU_UTILIZATION_UPPER_2 suppress_until 86400
        Successfully added/updated tca TCA_CPU_UTILIZATION_UPPER_2
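        Because the value is given in seconds, it is easy to mistype longer periods. A small conversion helper avoids the arithmetic; suppress_seconds is a hypothetical name, shown here as a sketch.

```python
from datetime import timedelta


def suppress_seconds(**duration) -> int:
    """Convert a duration such as hours=24 into whole seconds."""
    return int(timedelta(**duration).total_seconds())


print(suppress_seconds(hours=24))    # value used in the example above
print(suppress_seconds(minutes=90))
```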
        

        Disable a Rule

        Whereas suppression temporarily disables a rule, you can also disable a rule indefinitely.

        To disable a rule that is currently active:

        1. Locate the rule you want to disable.

        2. Click Disable.

        3. Leave the Date/Time field blank.

        4. Click Disable.

        Note the changes in the card:
        • The state changes to Inactive.
        • The rule definition is grayed out.
        • The Disable option changes to Enable, which you can use to reactivate the rule when you are ready.

        To disable a rule that is currently suppressed, click Disable forever.

        To disable a rule, run:

        netq add tca tca_id <text-tca-id-anchor> is_active false
        

        This example disables the maximum disk utilization 1 rule:

        cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 is_active false
        Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
        

        To reenable the rule, set the is_active option to true.

        Delete a Rule

        To delete a rule:

        1. Locate the rule you want to remove and hover over the card.

        2. In the card’s top-right corner, select Delete.

        To remove a rule altogether, run:

        netq del tca tca_id <text-tca-id-anchor>
        

        This example deletes the maximum receive bytes rule:

        cumulus@switch:~$ netq del tca tca_id TCA_RXBYTES_UPPER_1
        Successfully deleted TCA TCA_RXBYTES_UPPER_1
        

        Resolve Scope Conflicts

        There might be occasions where the scopes defined by multiple threshold-crossing rules overlap. In such cases, NetQ uses the rule with the most specific scope that is still true to generate the event.

        To clarify this, consider this example. Three events occurred:

        NetQ attempts to match the threshold-crossing event against hostname and interface name with three threshold-crossing rules with different scopes:

        The result is:

        In summary:

        Input Event   Scope Parameters     TCA Scope 1  TCA Scope 2  TCA Scope 3  Scope Applied
        ------------  -------------------  -----------  -----------  -----------  -------------
        leaf01,swp1   Hostname, Interface  '*','*'      leaf*,'*'    leaf01,swp1  Scope 3
        leaf01,swp3   Hostname, Interface  '*','*'      leaf*,'*'    leaf01,swp1  Scope 2
        spine01,swp1  Hostname, Interface  '*','*'      leaf*,'*'    leaf01,swp1  Scope 1

        You can modify threshold-crossing rules to remove conflicts.
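        The selection in the summary table can be reproduced with a short model: keep the scopes that match the event, then pick the most specific one. This is a sketch, not NetQ's implementation. Ranking specificity by the number of literal (non-wildcard) characters is an assumption that happens to agree with the table, and fnmatchcase also gives special meaning to ? and [ ] in patterns.

```python
from fnmatch import fnmatchcase

# Scopes from the summary table, as (hostname, interface) patterns.
SCOPES = {
    "Scope 1": ("*", "*"),
    "Scope 2": ("leaf*", "*"),
    "Scope 3": ("leaf01", "swp1"),
}


def specificity(scope) -> int:
    """Assumed ranking: more literal characters means more specific."""
    return sum(len(field.replace("*", "")) for field in scope)


def applied_scope(event, scopes=SCOPES):
    """Return the name of the most specific scope matching the event."""
    matching = {
        name: scope
        for name, scope in scopes.items()
        if all(fnmatchcase(value, pattern)
               for value, pattern in zip(event, scope))
    }
    if not matching:
        return None
    return max(matching, key=lambda name: specificity(matching[name]))


for event in [("leaf01", "swp1"), ("leaf01", "swp3"), ("spine01", "swp1")]:
    print(event, "->", applied_scope(event))
```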

        BGP

        Use the UI or CLI to monitor Border Gateway Protocol (BGP) on a networkwide or per-session basis.

        BGP Commands

        Monitor BGP with the following commands. See the command line reference for additional options, definitions, and examples.

        netq show bgp
        netq show events message_type bgp
        netq show events-config message_type bgp
        

        The netq check bgp command checks for consistency across BGP sessions in your network fabric.

        netq check bgp
        

        View BGP in the UI

        To add the BGP card to your workbench, navigate to the header and select Add card > Network services > All BGP Sessions card > Open cards. In this example, there are 13 nodes running the BGP protocol, 0 open events (from the last 24 hours), and 10 nodes with unestablished sessions.

        Expand to the large card for additional BGP info. By default, the card displays the Sessions summary tab. From here you can see which devices are handling the most BGP sessions, or select the dropdown to view nodes with the most unestablished BGP sessions. You can view BGP-related events by selecting the Events tab.

        Expand the BGP card to full-screen to view, filter, or export:

        From this table, you can select a row, then click Add card above the table.

        NetQ adds a new, BGP ‘single-session’ card to your workbench. From this card, you can view session state changes and compare them with events, and monitor the running BGP configuration and changes to the configuration file.

        Before adding a BGP single-session card, verify that both the peer hostname and peer ASN are valid. This ensures the information presented is reliable.

        Monitor a Single BGP Session

        The BGP single-session card displays the node, its peer, its status (established or unestablished), and its router ID. This information can help you determine the stability of the BGP session between two devices. The heat map indicates the status of the session over the designated time period. In this example, the session has been established throughout the entire time period:

        Understanding the Heat Map

        On the medium and large single-session cards, vertically stacked heat maps represent the status of the sessions: one for established sessions and one for unestablished sessions. Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block reflects the results. If only established sessions occurred during the entire time block, then the top block is 100% saturated (white) and the unestablished block is 0% saturated (gray). As the unestablished block increases in saturation, the established block is proportionally reduced in saturation. The following table lists the most common time periods, their corresponding number of blocks, and the amount of time represented by one block:

        Time Period  Number of Runs  Number of Time Blocks  Amount of Time in Each Block
        -----------  --------------  ---------------------  ----------------------------
        6 hours      18              6                      1 hour
        12 hours     36              12                     1 hour
        24 hours     72              24                     1 hour
        1 week       504             7                      1 day
        1 month      2,086           30                     1 day
        1 quarter    7,000           13                     1 week
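        To make the table and the saturation rule concrete, the sketch below computes how many runs fall into one block and how a block's results might split between the two heat maps. The function names are hypothetical, the linear split is an assumption based on the word "proportionally", and the handling of a block with no results is also assumed.

```python
# (number of runs, number of time blocks) from the table above.
PERIODS = {
    "6 hours": (18, 6),
    "12 hours": (36, 12),
    "24 hours": (72, 24),
    "1 week": (504, 7),
    "1 month": (2086, 30),
    "1 quarter": (7000, 13),
}


def runs_per_block(runs: int, blocks: int) -> float:
    """Average number of runs summarized by one time block."""
    return runs / blocks


def block_saturation(established: int, unestablished: int) -> tuple:
    """Return (established, unestablished) saturation as fractions of 1."""
    total = established + unestablished
    if total == 0:
        return (0.0, 0.0)  # assumed: no results, nothing to color
    return (established / total, unestablished / total)


for period, (runs, blocks) in PERIODS.items():
    print(f"{period}: {runs_per_block(runs, blocks):.1f} runs per block")

print(block_saturation(3, 0))
print(block_saturation(1, 3))
```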

        View Changes to the BGP Service Configuration File

        Each time a change is made to the configuration file for the BGP service, NetQ logs the change and lets you compare it with the previous version. This can be useful when you are troubleshooting potential causes for events or sessions losing their connections.

        To view the configuration file changes:

        1. From the large single-session card, select the Configuration File Evolution tab.

        2. Select the time.

        3. Choose between the File view and the Diff view.

          The File view displays the content of the file:

          The Diff view highlights the changes (if any) between this version (on left) and the most recent version (on right) side by side:

        Validate Overall Network Health

        The Validation Summary card in the NetQ UI lets you view the overall health of your network at a glance, giving you a high-level understanding of how well your network is operating. Successful validation results determine overall network health shown in this card.

        View Key Metrics of Network Health

        Overall network health in the NetQ UI is a calculated average of several key health metrics: system, network services, and interface health.

        System health represents the NetQ Agent and sensor health validations. In all cases, validation checks are performed on the agents. If you are monitoring platform sensors, the validation checks include these as well.

        Network service health represents the individual network protocol and services validation checks. In all cases, checks are performed on NTP. If you are running BGP, EVPN, MLAG, OSPF, or VXLAN protocols, the validation checks include these as well.

        Interface health represents the interfaces, VLAN, and link MTU validation checks.
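
As a rough illustration of how three category metrics could roll up into one number, the following sketch averages per-category pass percentages. The exact formula NetQ uses is not given here, so the unweighted mean, the function name, and the sample counts are all assumptions made for illustration.

```python
def overall_health(results):
    """Unweighted mean of per-category pass percentages.

    `results` maps a category name to (passed_checks, total_checks).
    Categories with no checks are skipped (an assumption of this sketch).
    """
    scores = [100.0 * passed / total
              for passed, total in results.values() if total > 0]
    if not scores:
        raise ValueError("no validation results to average")
    return sum(scores) / len(scores)


# Hypothetical counts, chosen only to exercise the function.
sample = {
    "system": (10, 10),          # agent and sensor checks
    "network_services": (6, 8),  # NTP plus any protocols in use
    "interface": (3, 4),         # interfaces, VLAN, link MTU
}
print(f"{overall_health(sample):.1f}%")
```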

        To view network health metrics:

        1. Open or locate the Validation Summary card on your workbench.

        2. Each metric displays a distribution of the validation results for each category. Hover over the individual categories to view detailed metrics for specific validation checks.

          In this example, system health is good, but network services and interface health display validation failures:

          medium validation summary card displaying high-level health metrics

        View Detailed Network Health

        To view details about your network’s health, open or locate the large Validation Summary card on your workbench. To view devices with the most issues or recent issues, select the Most failures tab or Recent failures tab, respectively. You can unselect one or more services on the left side of the card to display devices affected by the selected services on the right side of the card.

        By default, the System health tab is displayed.

        The health of agents and sensors is represented on the left side of the card. Hover over the chart for each type of validation to see detailed results. The right side of the card displays devices with failures related to agents and sensors.

        Click the Network service health tab.

        The health of each network protocol or service is represented on the left side of the card. Hover over the chart for each type of validation to see detailed results. The right side of the card displays devices with failures related to these protocols and services.

        Click the Interface health tab.

        The health of interfaces, VLANs, and link MTUs is represented on the left side of the card. Hover over the chart for each type of validation to see detailed results. The right side of the card displays devices with failures related to interfaces, VLANs, and link MTUs.

        View Details of a Particular Service

        From the relevant tab (System Health, Network Service Health, or Interface Health) on the large Validation Summary card, you can select a chart to open a full-screen view of the validation data for that service.

        The following example shows the EVPN chart:

        EVPN validation data

        View All Network Protocol and Service Validation Results

        Expand the Validation Summary card to full-screen to view all validation check results for all network protocols and services during a designated time period.

        fullscreen validation summary card displaying EVPN metrics

        Configure and Monitor What Just Happened

        What Just Happened (WJH) streams detailed and contextual telemetry data for analysis. This provides real-time visibility into problems in the network, such as hardware packet drops due to buffer congestion, incorrect routing, and ACL or layer 1 problems.

        Using WJH in combination with NetQ helps you identify losses anywhere in the fabric. From a single management console you can:

        For a list of supported WJH events, refer to the WJH Events Reference.

        To use a gNMI client to export WJH data to a collector, refer to Collect WJH Data with gNMI.

        WJH is only supported on NVIDIA Spectrum switches running Cumulus Linux 4.4.0 or later. WJH latency and congestion monitoring is supported on NVIDIA Spectrum-2 switches and later. SONiC only supports collection of WJH data with gNMI.

        By default, Cumulus Linux 4.4.0 and later includes the NetQ Agent and CLI. Depending on the version of Cumulus Linux running on your NVIDIA switch, you might need to upgrade the NetQ Agent and CLI to the latest release:

        cumulus@<hostname>:~$ sudo apt-get update
        cumulus@<hostname>:~$ sudo apt-get install -y netq-agent
        cumulus@<hostname>:~$ sudo netq config restart agent
        cumulus@<hostname>:~$ sudo apt-get install -y netq-apps
        cumulus@<hostname>:~$ sudo netq config restart cli
        

        Configure What Just Happened

        WJH is enabled by default on NVIDIA Spectrum switches running Cumulus Linux 4.4.0 or later. Before WJH can collect data, you must enable the NetQ Agent on your switches and servers.

        To enable WJH on any switch or server:

        1. Configure the NetQ Agent on the switch:

          cumulus@switch:~$ sudo netq config add agent wjh
          
        2. Restart the NetQ Agent to begin collecting WJH data:

          cumulus@switch:~$ sudo netq config restart agent
          

        When you finish viewing WJH metrics, you can stop the NetQ Agent from collecting WJH data to reduce network traffic. Use netq config del agent wjh followed by netq config restart agent to disable WJH on a given switch.

        Using wjh_dump.py on an NVIDIA platform that is running Cumulus Linux and the NetQ Agent causes the NetQ WJH client to stop receiving packet drop callbacks. To prevent this issue, run wjh_dump.py on a system other than the one where the NetQ Agent has WJH enabled, or disable wjh_dump.py and restart the NetQ Agent with netq config restart agent.

        View What Just Happened Metrics

        You can view the WJH metrics from the NetQ UI or the NetQ CLI. WJH metrics are visible on the WJH card and the Events card. To view the metrics on the Events card, open the large card and select the WJH tab at the top of the card. For a more detailed view, open the WJH card.

        To add the WJH card to your workbench, navigate to the header and select Add card > Events > What Just Happened > Open cards.

        what just happened card displaying errors and warnings

        You can expand the card to see a detailed summary of WJH data, including devices with the most drops, the number of drops, their distribution, and a timeline:

        expanded what just happened card displaying devices with the most drops

        Expand the card to its largest size to open the WJH dashboard. You can also access this dashboard by clicking Menu, then What Just Happened.

        fully expanded what just happened card with detailed drop information

        The table beneath the charts displays WJH events and recommendations for resolving them. Hover over the color-coded chart to view WJH event categories:

        donut chart displaying types of drops

        Click on a category in the chart for a detailed view:

        donut chart and graph displaying detailed drop information

        Select Advanced view in the top-right corner for a tabular display of drops that can be sorted by drop type. This display includes additional information, such as source and destination IP addresses, ports, and MACs.

        advanced view of WJH L2 drops

        For L1 events, you can group entries by switch and ingress port to reduce the number of events displayed. To do this, select the Aggregate by port toggle in the top-right corner.

        advanced view of WJH L1 drops with aggregated drops

        To view WJH drops, run one of the following commands. Refer to the command line reference for a comprehensive list of options and definitions.

        netq [<hostname>] show wjh-drop 
            [severity <text-severity>] 
            [details] 
            [between <text-fixed-time> and <text-fixed-endtime>] 
            [around <text-fixed-time>] 
            [json]
        
        netq [<hostname>] show wjh-drop <text-drop-type> 
            [ingress-port <text-ingress-port>] 
            [severity <text-severity>] 
            [reason <text-reason>] 
            [src-ip <text-src-ip>] 
            [dst-ip <text-dst-ip>] 
            [proto <text-proto>] 
            [src-port <text-src-port>] 
            [dst-port <text-dst-port>] 
            [src-mac <text-src-mac>] 
            [dst-mac <text-dst-mac>] 
            [egress-port <text-egress-port>] 
            [traffic-class <text-traffic-class>] 
            [rule-id-acl <text-rule-id-acl>] 
            [vlan <text-vlan>]
            [between <text-time> and <text-endtime>] 
            [around <text-time>] 
            [json]
        

        An additional command is available that aggregates WJH L1 errors that occur on the same ingress port.

        netq [<hostname>] show wjh-drop l1 
            [ingress-port <text-ingress-port>] 
            [severity <text-severity>]
            [reason <text-reason>] 
            [port-aggregate <text-port-aggregate>] 
            [between <text-time> and <text-endtime>] 
            [around <text-time>] [json]
        

        This example uses the first form of the command to show drops on switch leaf03 for the past week.

        cumulus@switch:~$ netq leaf03 show wjh-drop between now and 7d
        Matching wjh records:
        Drop type          Aggregate Count
        ------------------ ------------------------------
        L1                 560
        Buffer             224
        Router             144
        L2                 0
        ACL                0
        Tunnel             0
        

        This example uses the first form of the command with the details option to show drops on switch leaf03 for the past week, including the drop reasons.

        cumulus@switch:~$ netq leaf03 show wjh-drop details between now and 7d
        
        Matching wjh records:
        Drop type          Aggregate Count                Reason
        ------------------ ------------------------------ ---------------------------------------------
        L1                 556                            None
        Buffer             196                            WRED
        Router             144                            Blackhole route
        Buffer             14                             Packet Latency Threshold Crossed
        Buffer             14                             Port TC Congestion Threshold
        L1                 4                              Oper down
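
When you post-process this output in a script, the json option shown in the command syntax is likely the more robust input. For the text form, the following sketch parses the detailed table above and sums the counts per drop type. It assumes that columns are separated by two or more spaces, which holds for this sample but is not a documented guarantee.

```python
import re
from collections import Counter

DETAILS_OUTPUT = """\
Drop type          Aggregate Count                Reason
------------------ ------------------------------ ---------------------------------------------
L1                 556                            None
Buffer             196                            WRED
Router             144                            Blackhole route
Buffer             14                             Packet Latency Threshold Crossed
Buffer             14                             Port TC Congestion Threshold
L1                 4                              Oper down
"""


def parse_details(text):
    """Split each data row into (drop_type, count, reason)."""
    rows = []
    for line in text.splitlines()[2:]:  # skip the header and the dashed rule
        if not line.strip():
            continue
        drop_type, count, reason = re.split(r"\s{2,}", line.strip(), maxsplit=2)
        rows.append((drop_type, int(count), reason))
    return rows


totals = Counter()
for drop_type, count, _reason in parse_details(DETAILS_OUTPUT):
    totals[drop_type] += count

print(dict(totals))  # {'L1': 560, 'Buffer': 224, 'Router': 144}
```

The per-type totals equal the aggregate counts in the summary output for the same switch and time range.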
        

        This example shows the drops seen at layer 2 across the network.

        cumulus@mlx-2700-03:mgmt:~$ netq show wjh-drop l2
        Matching wjh records:
        Hostname          Ingress Port             Reason                                        Agg Count          Src Ip           Dst Ip           Proto  Src Port         Dst Port         Src Mac            Dst Mac            First Timestamp                Last Timestamp
        ----------------- ------------------------ --------------------------------------------- ------------------ ---------------- ---------------- ------ ---------------- ---------------- ------------------ ------------------ ------------------------------ ----------------------------
        mlx-2700-03       swp1s2                   Port loopback filter                          10                 27.0.0.19        27.0.0.22        0      0                0                00:02:00:00:00:73  0c:ff:ff:ff:ff:ff  Mon Dec 16 11:54:15 2019       Mon Dec 16 11:54:15 2019
        mlx-2700-03       swp1s2                   Source MAC equals destination MAC             10                 27.0.0.19        27.0.0.22        0      0                0                00:02:00:00:00:73  00:02:00:00:00:73  Mon Dec 16 11:53:17 2019       Mon Dec 16 11:53:17 2019
        mlx-2700-03       swp1s2                   Source MAC equals destination MAC             10                 0.0.0.0          0.0.0.0          0      0                0                00:02:00:00:00:73  00:02:00:00:00:73  Mon Dec 16 11:40:44 2019       Mon Dec 16 11:40:44 2019
        

        The following two examples include the severity of a drop event (error, warning, or notice) for ACLs and routers.

        cumulus@switch:~$ netq show wjh-drop acl
        Matching wjh records:
        Hostname          Ingress Port             Reason                                        Severity         Agg Count          Src Ip           Dst Ip           Proto  Src Port         Dst Port         Src Mac            Dst Mac            Acl Rule Id            Acl Bind Point               Acl Name         Acl Rule         First Timestamp                Last Timestamp
        ----------------- ------------------------ --------------------------------------------- ---------------- ------------------ ---------------- ---------------- ------ ---------------- ---------------- ------------------ ------------------ ---------------------- ---------------------------- ---------------- ---------------- ------------------------------ ----------------------------
        leaf01            swp2                     Ingress router ACL                            Error            49                 55.0.0.1         55.0.0.2         17     8492             21423            00:32:10:45:76:89  00:ab:05:d4:1b:13  0x0                    0                                                              Tue Oct  6 15:29:13 2020       Tue Oct  6 15:29:39 2020
        
        cumulus@switch:~$ netq show wjh-drop router
        Matching wjh records:
        Hostname          Ingress Port             Reason                                        Severity         Agg Count          Src Ip           Dst Ip           Proto  Src Port         Dst Port         Src Mac            Dst Mac            First Timestamp                Last Timestamp
        ----------------- ------------------------ --------------------------------------------- ---------------- ------------------ ---------------- ---------------- ------ ---------------- ---------------- ------------------ ------------------ ------------------------------ ----------------------------
        leaf01            swp1                     Blackhole route                               Notice           36                 46.0.1.2         47.0.2.3         6      1235             43523            00:01:02:03:04:05  00:06:07:08:09:0a  Tue Oct  6 15:29:13 2020       Tue Oct  6 15:29:47 2020
        

        Configure Latency and Congestion Thresholds

        WJH latency and congestion metrics depend on threshold settings to trigger the events. WJH measures packet latency as the time spent inside a single system (switch). When thresholds are specified, WJH triggers events when measured values cross the high threshold and suppresses events when values are below the low threshold.
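
One plausible reading of the high and low thresholds is hysteresis: an event fires when a value rises above the high threshold, and no new event fires until the value has dropped below the low threshold. The sketch below models that reading only. The function name and sample values are hypothetical, and the actual switch behavior may differ.

```python
def threshold_events(samples, th_hi, th_lo):
    """Yield (index, value) each time a sample crosses the high threshold.

    After an event fires, nothing more is reported until a sample drops
    below the low threshold (hysteresis; an assumed interpretation).
    """
    armed = True
    for i, value in enumerate(samples):
        if armed and value > th_hi:
            armed = False
            yield (i, value)
        elif not armed and value < th_lo:
            armed = True


# Hypothetical latency samples in microseconds.
latency_usec = [0.5, 4, 12, 15, 8, 11, 0.8, 3, 13]
print(list(threshold_events(latency_usec, th_hi=10, th_lo=1)))
# [(2, 12), (8, 13)]
```

The samples at index 3 and index 5 exceed the high threshold but produce no event, because the value has not yet fallen below the low threshold.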

        To configure these thresholds, run:

        netq config add agent wjh-threshold
            (latency|congestion)
            (<text-tc-list>|all)
            (<text-port-list>|all)
            <text-th-hi>
            <text-th-lo>
        

        You can specify multiple traffic classes and multiple ports by separating the classes or ports by a comma (no spaces).

        For example, the following command creates latency thresholds for Class 3 traffic on port swp1 where the upper threshold is 10 usecs and the lower threshold is 1 usec:

        cumulus@switch:~$ sudo netq config add agent wjh-threshold latency 3 swp1 10 1
        

        This example creates congestion thresholds for Class 4 traffic on port swp1 with an upper threshold of 200 cells and a lower threshold of 10 cells. A cell is a unit of 144 bytes:

        cumulus@switch:~$ sudo netq config add agent wjh-threshold congestion 4 swp1 200 10
        

        Refer to the command line reference for a comprehensive list of options and definitions for this command.
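
If you generate these commands from a script, a small helper can apply the comma-separated list rule and convert cell thresholds to bytes. This is a minimal sketch: the helper name is hypothetical, it only builds the command string, and it does not run the command or validate the values against the switch.

```python
CELL_BYTES = 144  # the text above defines a congestion cell as 144 bytes


def wjh_threshold_command(kind, traffic_classes, ports, th_hi, th_lo):
    """Compose a `netq config add agent wjh-threshold` command line.

    `traffic_classes` and `ports` are lists, or the string "all".
    """
    if kind not in ("latency", "congestion"):
        raise ValueError("kind must be 'latency' or 'congestion'")

    def joined(values):
        # Multiple values are separated by commas with no spaces.
        return values if values == "all" else ",".join(str(v) for v in values)

    return (f"sudo netq config add agent wjh-threshold {kind} "
            f"{joined(traffic_classes)} {joined(ports)} {th_hi} {th_lo}")


print(wjh_threshold_command("latency", [3], ["swp1"], 10, 1))
print(wjh_threshold_command("congestion", [4, 5], ["swp1", "swp2"], 200, 10))
print(200 * CELL_BYTES, 10 * CELL_BYTES)  # 28800 1440
```

The first call reproduces the latency example above. The last line shows that thresholds of 200 and 10 cells correspond to 28,800 and 1,440 bytes.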

        Suppress Events with Filters

        You can create filters with the UI or CLI to prevent WJH from generating events. Filters can be applied to a drop category (such as layer 1 drops or buffer drops), a drop reason (for example, “decapsulation error” or “multicast MAC mismatch”), or according to severity level (notice, warning, or error). With the CLI, you can create filters to suppress events according to their source or destination IP addresses.

        For a complete list of drop types, reasons, and severity levels, refer to the WJH Events Reference.

        Before configuring the NetQ Agent to filter WJH drops, you must generate AuthKeys. Copy the access key and secret key to an accessible location. You will enter them in one of the final steps.

        1. Expand the Menu and select Manage switches.

        2. Select the NetQ agent configurations tab.

        3. On the NetQ Agent Configurations card, select Add config.

        4. Enter a name for the profile. In the WJH row, select Enable, then Customize. By default, WJH includes all drop reasons and severities. Uncheck any drop reasons or severity you do not want to generate WJH events, then click Done.

          modal describing WJH event capture options

        5. Enter your NetQ access key and secret key.

        6. Select Add to save the configuration profile, or click Close to discard it.

        To configure the NetQ Agent to filter WJH drops, run netq config add agent wjh-drop-filter. Use tab completion to view the available drop type, drop reason, and severity values.

        netq config add agent wjh-drop-filter 
           drop-type <text-wjh-drop-type> 
           [drop-reasons <text-wjh-drop-reasons>] 
           [severity <text-drop-severity-list>]
        

        To configure the NetQ Agent to ignore WJH drops based on IP addresses (both source and destination), run:

        netq config add agent wjh-drop-filter 
           ips [<text-wjh-ips>]
        

        To display filter configurations, run netq config show agent wjh-drop-filter. To delete a filter, run netq config del agent wjh-drop-filter.
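
To make the filter dimensions concrete, the following sketch models a filter as data and checks which events it would suppress. It illustrates the concept only. The class names, field names, and matching rule (every specified field must match, and an IP filter matches either address) are assumptions, not NetQ internals. The sample events reuse values from the ACL and router examples earlier in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DropEvent:
    drop_type: str
    reason: str
    severity: str
    src_ip: str
    dst_ip: str


@dataclass(frozen=True)
class DropFilter:
    drop_type: Optional[str] = None
    reasons: frozenset = frozenset()     # empty means "any reason"
    severities: frozenset = frozenset()  # empty means "any severity"
    ips: frozenset = frozenset()         # set only for IP-based filters

    def suppresses(self, ev):
        if self.ips:
            # IP filters apply to both source and destination addresses.
            return ev.src_ip in self.ips or ev.dst_ip in self.ips
        if self.drop_type != ev.drop_type:
            return False
        if self.reasons and ev.reason not in self.reasons:
            return False
        if self.severities and ev.severity not in self.severities:
            return False
        return True


router_ev = DropEvent("Router", "Blackhole route", "Notice", "46.0.1.2", "47.0.2.3")
acl_ev = DropEvent("ACL", "Ingress router ACL", "Error", "55.0.0.1", "55.0.0.2")

by_severity = DropFilter(drop_type="Router", severities=frozenset({"Notice"}))
by_ip = DropFilter(ips=frozenset({"55.0.0.2"}))

print(by_severity.suppresses(router_ev), by_severity.suppresses(acl_ev))
print(by_ip.suppresses(router_ev), by_ip.suppresses(acl_ev))
```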

        DPUs

        With the NetQ UI, you can monitor hardware resources of individual data processing units (DPUs), including CPU utilization, disk usage, and memory utilization. For DPU inventory information, refer to DPU Inventory.

        You must install and configure the DOCA Telemetry Service to display DPU data in NetQ.

        View Overall Health of a DPU

        For an overview of the current or past health of DPU hardware resources, open the DPU device card. To open a DPU device card:

        1. Click Devices in the header, then click Open a device card.

        2. Select a DPU from the dropdown.

        3. Click Add. This example shows that the r-netq-bf2-01 DPU has low utilization across CPU, memory, and disks:

          DPU card displaying CPU, memory, and disk utilization statistics

        View DPU Attributes

        For a quick look at the key attributes of a particular DPU, expand the DPU card.

        Attributes are displayed as the default tab on the large DPU card. You can view the static information about the DPU, including its hostname, ASIC vendor and model, CPU information, OS version, and agent version.

        large DPU card displaying static DPU information

        To view a larger display of hardware resource utilization, select Utilization.

        Expand the card to its largest size to view a list of installed packages and RoCE counters for a given DPU. You can filter RoCE information by physical port, priority port, RoCE extended, RoCE, and peripheral component interconnect (PCI).

        gNMI Streaming

        You can use gRPC Network Management Interface (gNMI) to collect system resource, interface, and counter information from Cumulus Linux and export it to your own gNMI client.

        Configure the gNMI Agent

        The gNMI agent is disabled by default. To enable it, run:

        cumulus@switch:~$ netq config add agent gnmi-enable true
        

        The gNMI agent listens over port 9339. You can change the default port if another application already uses it. The /etc/netq/netq.yml file stores the configuration.

        Use the following commands to adjust the settings:

        1. Disable the gNMI agent:

          cumulus@switch:~$ netq config add agent gnmi-enable false
          
        2. Change the default port over which the gNMI agent listens:

          cumulus@switch:~$ netq config add agent gnmi-port <gnmi_port>
          
        3. Restart the NetQ agent to incorporate the configuration changes:

          cumulus@switch:~$ netq config restart agent
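
After restarting the agent, you may want to confirm from a client machine that something is listening on the configured port. The sketch below uses only the Python standard library. The function name is hypothetical, and a successful TCP connection shows only that the port is open, not that the listener is the gNMI agent.

```python
import socket


def port_is_open(host, port=9339, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("192.0.2.10") checks the default port 9339 on a switch at that placeholder address. Pass port=<gnmi_port> if you changed the default.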
          

        Use the gNMI Agent Only

        NVIDIA recommends collecting data with both the gNMI and NetQ agents. However, if you do not want to collect data with both agents, you can disable the NetQ agent. Data is then sent exclusively to the gNMI agent.

        To disable the NetQ agent, use the following command:

        cumulus@switch:~$ netq config add agent opta-enable false
        

        You cannot disable both the NetQ and gNMI agents. If both agents are enabled on Cumulus Linux and a NetQ server is unreachable, the data from the following models are not sent to gNMI:

        • openconfig-interfaces
        • openconfig-if-ethernet
        • openconfig-if-ethernet-ext
        • openconfig-system
        • nvidia-if-ethernet-ext

        WJH, openconfig-platform, and openconfig-lldp data continue streaming to gNMI in this state. If you are only using gNMI and a NetQ telemetry server does not exist, you should disable the NetQ agent by setting opta-enable to false.

        Supported Models

        Cumulus Linux supports the following OpenConfig models:

        Model Supported Data
        openconfig-interfaces Name, Operstatus, AdminStatus, IfIndex, MTU, LoopbackMode, Enabled, Counters (InPkts, OutPkts, InOctets, InUnicastPkts, InDiscards, InMulticastPkts, InBroadcastPkts, InErrors, OutOctets, OutUnicastPkts, OutMulticastPkts, OutBroadcastPkts, OutDiscards, OutErrors)
        openconfig-if-ethernet AutoNegotiate, PortSpeed, MacAddress, NegotiatedPortSpeed, Counters (InJabberFrames, InOversizeFrames, InUndersizeFrames)
        openconfig-if-ethernet-ext Frame size counters (InFrames_64Octets, InFrames_65_127Octets, InFrames_128_255Octets, InFrames_256_511Octets, InFrames_512_1023Octets, InFrames_1024_1518Octets)
        openconfig-system Memory, CPU
        openconfig-platform Platform data (Name, Description, Version)
        openconfig-lldp LLDP data (PortIdType, PortDescription, LastUpdate, SystemName, SystemDescription, ChassisId, Ttl, Age, ManagementAddress, ManagementAddressType, Capability)

        gNMI clients can also use the following NVIDIA models:

        Model Supported Data
        nvidia-if-wjh-drop-aggregate Aggregated WJH drops, including L1, L2, router, ACL, tunnel, and buffer drops
        nvidia-if-ethernet-ext Extended Ethernet counters (AlignmentError, InAclDrops, InBufferDrops, InDot3FrameErrors, InDot3LengthErrors, InL3Drops, InPfc0Packets, InPfc1Packets, InPfc2Packets, InPfc3Packets, InPfc4Packets, InPfc5Packets, InPfc6Packets, InPfc7Packets, OutNonQDrops, OutPfc0Packets, OutPfc1Packets, OutPfc2Packets, OutPfc3Packets, OutPfc4Packets, OutPfc5Packets, OutPfc6Packets, OutPfc7Packets, OutQ0WredDrops, OutQ1WredDrops, OutQ2WredDrops, OutQ3WredDrops, OutQ4WredDrops, OutQ5WredDrops, OutQ6WredDrops, OutQ7WredDrops, OutQDrops, OutQLength, OutWredDrops, SymbolErrors, OutTxFifoFull)

        The client should use the following YANG models as a reference:

        nvidia-if-ethernet-ext
        module nvidia-if-ethernet-counters-ext {
            // xPath --> /interfaces/interface[name=*]/ethernet/counters/state/
        
           namespace "http://nvidia.com/yang/nvidia-ethernet-counters";
           prefix "nvidia-if-ethernet-counters-ext";
        
        
          // import some basic types
          import openconfig-interfaces { prefix oc-if; }
          import openconfig-if-ethernet { prefix oc-eth; }
          import openconfig-yang-types { prefix oc-yang; }
        
        
          revision "2021-10-12" {
            description
              "Initial revision";
            reference "1.0.0.";
          }
        
          grouping ethernet-counters-ext {
        
            leaf alignment-error {
              type oc-yang:counter64;
            }
        
            leaf in-acl-drops {
              type oc-yang:counter64;
            }
        
            leaf in-buffer-drops {
              type oc-yang:counter64;
            }
        
            leaf in-dot3-frame-errors {
              type oc-yang:counter64;
            }
        
            leaf in-dot3-length-errors {
              type oc-yang:counter64;
            }
        
            leaf in-l3-drops {
              type oc-yang:counter64;
            }
        
            leaf in-pfc0-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc1-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc2-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc3-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc4-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc5-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc6-packets {
              type oc-yang:counter64;
            }
        
            leaf in-pfc7-packets {
              type oc-yang:counter64;
            }
        
            leaf out-non-q-drops {
              type oc-yang:counter64;
            }
        
            leaf out-pfc0-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc1-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc2-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc3-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc4-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc5-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc6-packets {
              type oc-yang:counter64;
            }
        
            leaf out-pfc7-packets {
              type oc-yang:counter64;
            }
        
            leaf out-q0-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q1-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q2-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q3-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q4-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q5-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q6-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q7-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q8-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q9-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q-drops {
              type oc-yang:counter64;
            }
        
            leaf out-q-length {
              type oc-yang:counter64;
            }
        
            leaf out-wred-drops {
              type oc-yang:counter64;
            }
        
            leaf symbol-errors {
              type oc-yang:counter64;
            }
        
            leaf out-tx-fifo-full {
              type oc-yang:counter64;
            }
        
          }
        
          augment "/oc-if:interfaces/oc-if:interface/oc-eth:ethernet/" +
            "oc-eth:state/oc-eth:counters" {
              uses ethernet-counters-ext;
          }
        
        }
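
The leaf names in this module are hyphenated, while the supported-data table lists the same counters in CamelCase (for example, InAclDrops). The following sketch converts one form to the other. The naming rule is inferred from the names shown in this section rather than from a documented convention, and the function name is hypothetical.

```python
def leaf_to_counter_name(leaf):
    """Convert a hyphenated YANG leaf name to its CamelCase table name."""
    # Capitalize only the first character of each part so digits and
    # the remaining letters are left untouched (pfc0 -> Pfc0, l3 -> L3).
    return "".join(part[:1].upper() + part[1:] for part in leaf.split("-"))


for leaf in ("in-acl-drops", "in-pfc0-packets", "out-q0-wred-drops",
             "out-tx-fifo-full"):
    print(leaf, "->", leaf_to_counter_name(leaf))
```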
        
        nvidia-if-wjh-drop-aggregate
        module nvidia-wjh {
            // Entrypoint /oc-if:interfaces/oc-if:interface
            //
            // xPath L1     --> /interfaces/interface[name=*]/wjh/aggregate/l1
            // xPath L2     --> /interfaces/interface[name=*]/wjh/aggregate/l2/reasons/reason[id=*][severity=*]
            // xPath Router --> /interfaces/interface[name=*]/wjh/aggregate/router/reasons/reason[id=*][severity=*]
            // xPath Tunnel --> /interfaces/interface[name=*]/wjh/aggregate/tunnel/reasons/reason[id=*][severity=*]
            // xPath Buffer --> /interfaces/interface[name=*]/wjh/aggregate/buffer/reasons/reason[id=*][severity=*]
            // xPath ACL    --> /interfaces/interface[name=*]/wjh/aggregate/acl/reasons/reason[id=*][severity=*]
        
            import openconfig-interfaces { prefix oc-if; }
        
            namespace "http://nvidia.com/yang/what-just-happened-config";
            prefix "nvidia-wjh";
        
            revision "2021-10-12" {
                description
                    "Initial revision";
                reference "1.0.0.";
            }
        
            augment "/oc-if:interfaces/oc-if:interface" {
                uses interfaces-wjh;
            }
        
            grouping interfaces-wjh {
                description "Top-level grouping for What Just Happened data.";
                container wjh {
                    container aggregate {
                        container l1 {
                            container state {
                                leaf drop {
                                    type string;
                                    description "Drop list based on wjh-drop-types module encoded in JSON";
                                }
                            }
                        }
                        container l2 {
                            uses reason-drops;
                        }
                        container router {
                            uses reason-drops;
                        }
                        container tunnel {
                            uses reason-drops;
                        }
                        container acl {
                            uses reason-drops;
                        }
                        container buffer {
                            uses reason-drops;
                        }
                    }
                }
            }
        
            grouping reason-drops {
                container reasons {
                    list reason {
                        key "id severity";
                        leaf id {
                            type leafref {
                                path "../state/id";
                            }
                            description "reason ID";
                        }
                        leaf severity {
                            type leafref {
                                path "../state/severity";
                            }
                            description "Reason severity";
                        }
                        container state {
                            leaf id {
                                type uint32;
                                description "Reason ID";
                            }
                            leaf name {
                                type string;
                                description "Reason name";
                            }
                            leaf severity {
                                type string;
                                mandatory "true";
                                description "Reason severity";
                            }
                            leaf drop {
                                type string;
                                description "Drop list based on wjh-drop-types module encoded in JSON";
                            }
                        }
                    }
                }
            }
        }
        
        module wjh-drop-types {
            namespace "http://nvidia.com/yang/what-just-happened-config-types";
            prefix "wjh-drop-types";
        
            container l1-aggregated {
                uses l1-drops;
            }
            container l2-aggregated {
                uses l2-drops;
            }
            container router-aggregated {
                uses router-drops;
            }
            container tunnel-aggregated {
                uses tunnel-drops;
            }
            container acl-aggregated {
                uses acl-drops;
            }
            container buffer-aggregated {
                uses buffer-drops;
            }
        
            grouping reason-key {
                leaf id {
                    type uint32;
                    mandatory "true";
                    description "reason ID";
                }
                leaf severity {
                    type string;
                    mandatory "true";
                    description "Severity";
                }
            }
        
            grouping reason_info {
                leaf reason {
                    type string;
                    mandatory "true";
                    description "Reason name";
                }
                leaf drop_type {
                    type string;
                    mandatory "true";
                    description "reason drop type";
                }
                leaf ingress_port {
                    type string;
                    mandatory "true";
                    description "Ingress port name";
                }
                leaf ingress_lag {
                    type string;
                    description "Ingress LAG name";
                }
                leaf egress_port {
                    type string;
                    description "Egress port name";
                }
                leaf agg_count {
                    type uint64;
                    description "Aggregation count";
                }
                leaf severity {
                    type string;
                    description "Severity";
                }
                leaf first_timestamp {
                    type uint64;
                    description "First timestamp";
                }
                leaf end_timestamp {
                    type uint64;
                    description "End timestamp";
                }
            }
        
            grouping packet_info {
                leaf smac {
                    type string;
                    description "Source MAC";
                }
                leaf dmac {
                    type string;
                    description "Destination MAC";
                }
                leaf sip {
                    type string;
                    description "Source IP";
                }
                leaf dip {
                    type string;
                    description "Destination IP";
                }
                leaf proto {
                    type uint32;
                    description "Protocol";
                }
                leaf sport {
                    type uint32;
                    description "Source port";
                }
                leaf dport {
                    type uint32;
                    description "Destination port";
                }
            }
        
            grouping l1-drops {
                description "What-just happened drops.";
                leaf ingress_port {
                    type string;
                    description "Ingress port";
                }
                leaf is_port_up {
                    type boolean;
                    description "Is port up";
                }
                leaf port_down_reason {
                    type string;
                    description "Port down reason";
                }
                leaf description {
                    type string;
                    description "Description";
                }
                leaf state_change_count {
                    type uint64;
                    description "State change count";
                }
                leaf symbol_error_count {
                    type uint64;
                    description "Symbol error count";
                }
                leaf crc_error_count {
                    type uint64;
                    description "CRC error count";
                }
                leaf first_timestamp {
                    type uint64;
                    description "First timestamp";
                }
                leaf end_timestamp {
                    type uint64;
                    description "End timestamp";
                }
                leaf timestamp {
                    type uint64;
                    description "Timestamp";
                }
            }
            grouping l2-drops {
                description "What-just happened drops.";
                uses reason_info;
                uses packet_info;
            }
        
            grouping router-drops {
                description "What-just happened drops.";
                uses reason_info;
                uses packet_info;
            }
        
            grouping tunnel-drops {
                description "What-just happened drops.";
                uses reason_info;
                uses packet_info;
            }
        
            grouping acl-drops {
                description "What-just happened drops.";
                uses reason_info;
                uses packet_info;
                leaf acl_rule_id {
                    type uint64;
                    description "ACL rule ID";
                }
                leaf acl_bind_point {
                    type uint32;
                    description "ACL bind point";
                }
                leaf acl_name {
                    type string;
                    description "ACL name";
                }
                leaf acl_rule {
                    type string;
                    description "ACL rule";
                }
            }
        
            grouping buffer-drops {
                description "What-just happened drops.";
                uses reason_info;
                uses packet_info;
                leaf traffic_class {
                    type uint32;
                    description "Traffic Class";
                }
                leaf original_occupancy {
                    type uint32;
                    description "Original occupancy";
                }
                leaf original_latency {
                    type uint64;
                    description "Original latency";
                }
            }
        }
        

        Collect WJH Data Using gNMI

        You can export What Just Happened data from the NetQ agent to your own gNMI client. Refer to the previous section for the nvidia-if-wjh-drop-aggregate reference YANG model.

        Supported Features

        WJH Drop Reasons

        NetQ sends data to the gNMI agent in the form of WJH drop reasons. The SDK generates these reasons and stores them in the /usr/etc/wjh_lib_conf.xml file on the switch. Use this file as a guide to filter for specific reason types (L1, ACL, and so forth), reason IDs, or event severities.

        L1 Drop Reasons

        Reason ID | Reason | Description
        10021 | Port admin down | Validate port configuration
        10022 | Auto-negotiation failure | Set port speed manually, disable auto-negotiation
        10023 | Logical mismatch with peer link | Check cable/transceiver
        10024 | Link training failure | Check cable/transceiver
        10025 | Peer is sending remote faults | Replace cable/transceiver
        10026 | Bad signal integrity | Replace cable/transceiver
        10027 | Cable/transceiver is not supported | Use supported cable/transceiver
        10028 | Cable/transceiver is unplugged | Plug cable/transceiver
        10029 | Calibration failure | Check cable/transceiver
        10030 | Cable/transceiver bad status | Check cable/transceiver
        10031 | Other reason | Other L1 drop reason

        L2 Drop Reasons

        Reason ID | Reason | Severity | Description
        201 | MLAG port isolation | Notice | Expected behavior
        202 | Destination MAC is reserved (DMAC=01-80-C2-00-00-0x) | Error | Bad packet was received from the peer
        203 | VLAN tagging mismatch | Error | Validate the VLAN tag configuration on both ends of the link
        204 | Ingress VLAN filtering | Error | Validate the VLAN membership configuration on both ends of the link
        205 | Ingress spanning tree filter | Notice | Expected behavior
        206 | Unicast MAC table action discard | Error | Validate MAC table for this destination MAC
        207 | Multicast egress port list is empty | Warning | Validate why IGMP join or multicast router port does not exist
        208 | Port loopback filter | Error | Validate MAC table for this destination MAC
        209 | Source MAC is multicast | Error | Bad packet was received from peer
        210 | Source MAC equals destination MAC | Error | Bad packet was received from peer

        Router Drop Reasons

        Reason ID | Reason | Severity | Description
        301 | Non-routable packet | Notice | Expected behavior
        302 | Blackhole route | Warning | Validate routing table for this destination IP
        303 | Unresolved neighbor/next hop | Warning | Validate ARP table for the neighbor/next hop
        304 | Blackhole ARP/neighbor | Warning | Validate ARP table for the next hop
        305 | IPv6 destination in multicast scope FFx0:/16 | Notice | Expected behavior - packet is not routable
        306 | IPv6 destination in multicast scope FFx1:/16 | Notice | Expected behavior - packet is not routable
        307 | Non-IP packet | Notice | Destination MAC is the router, packet is not routable
        308 | Unicast destination IP but multicast destination MAC | Error | Bad packet was received from the peer
        309 | Destination IP is loopback address | Error | Bad packet was received from the peer
        310 | Source IP is multicast | Error | Bad packet was received from the peer
        311 | Source IP is in class E | Error | Bad packet was received from the peer
        312 | Source IP is loopback address | Error | Bad packet was received from the peer
        313 | Source IP is unspecified | Error | Bad packet was received from the peer
        314 | Checksum or IPver or IPv4 IHL too short | Error | Bad cable or bad packet was received from the peer
        315 | Multicast MAC mismatch | Error | Bad packet was received from the peer
        316 | Source IP equals destination IP | Error | Bad packet was received from the peer
        317 | IPv4 source IP is limited broadcast | Error | Bad packet was received from the peer
        318 | IPv4 destination IP is local network (destination=0.0.0.0/8) | Error | Bad packet was received from the peer
        320 | Ingress router interface is disabled | Warning | Validate your configuration
        321 | Egress router interface is disabled | Warning | Validate your configuration
        323 | IPv4 routing table (LPM) unicast miss | Warning | Validate routing table for this destination IP
        324 | IPv6 routing table (LPM) unicast miss | Warning | Validate routing table for this destination IP
        325 | Router interface loopback | Warning | Validate the interface configuration
        326 | Packet size is larger than router interface MTU | Warning | Validate the router interface MTU configuration
        327 | TTL value is too small | Warning | Actual path is longer than the TTL

        Tunnel Drop Reasons

        Reason ID | Reason | Severity | Description
        402 | Overlay switch - Source MAC is multicast | Error | The peer sent a bad packet
        403 | Overlay switch - Source MAC equals destination MAC | Error | The peer sent a bad packet
        404 | Decapsulation error | Error | The peer sent a bad packet

        ACL Drop Reasons

        Reason ID | Reason | Severity | Description
        601 | Ingress port ACL | Notice | Validate ACL configuration
        602 | Ingress router ACL | Notice | Validate ACL configuration
        603 | Egress router ACL | Notice | Validate ACL configuration
        604 | Egress port ACL | Notice | Validate ACL configuration

        Buffer Drop Reasons

        Reason ID | Reason | Severity | Description
        503 | Tail drop | Warning | Monitor network congestion
        504 | WRED | Warning | Monitor network congestion
        505 | Port TC congestion threshold crossed | Notice | Monitor network congestion
        506 | Packet latency threshold crossed | Notice | Monitor network congestion
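The reason IDs listed in the tables above fall into separate numeric ranges for each drop type, so a client can map a reason ID to its drop type before filtering or building a request. The following Python sketch is illustrative only. The function names `drop_category` and `wjh_reason_xpath` are hypothetical, the ranges are inferred from the IDs listed in this section rather than taken from an official specification, and the `l1` container name is an assumption because the YANG excerpt shows only the l2, router, tunnel, acl, and buffer containers. The path layout follows the WJH request example later in this section.

```python
# Reason-ID ranges inferred from the drop reason tables in this section.
# Assumption: these bounds are for illustration, not an official specification.
# Assumption: "l1" as a container name is not confirmed by the YANG excerpt.
_RANGES = [
    (10021, 10031, "l1"),
    (201, 210, "l2"),
    (301, 327, "router"),
    (402, 404, "tunnel"),
    (503, 506, "buffer"),
    (601, 604, "acl"),
]


def drop_category(reason_id: int) -> str:
    """Return the drop type for a WJH reason ID, or raise ValueError if unknown."""
    for low, high, name in _RANGES:
        if low <= reason_id <= high:
            return name
    raise ValueError(f"unrecognized WJH reason ID: {reason_id}")


def wjh_reason_xpath(ifname: str, reason_id: int) -> str:
    """Build an xpath shaped like the WJH request example in this guide."""
    category = drop_category(reason_id)
    return (
        f"/interfaces/interface[name={ifname}]"
        f"/wjh/aggregate/{category}/reasons/reason[id={reason_id}]"
    )


print(wjh_reason_xpath("swp8", 210))
```

The ranges are intentionally coarse: an ID such as 319 falls inside the router range even though the table does not list it, so treat the result as a hint and confirm the reason against /usr/etc/wjh_lib_conf.xml on the switch.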

        gNMI Client Requests

        You can use your gNMI client on a host server to request capabilities and data that the agent is subscribed to.

        The following example shows a gNMI client request for interface speed:

        gnmi_client -target_addr 10.209.37.121:9339 -xpath "/interfaces/interface[name=swp1]/ethernet/state/port-speed" -once
        {
           "Response": {
              "Update": {
                 "update": [
                    {
                       "val": {
                          "Value": {
                             "StringVal": "SPEED_40GB"
                          }
                       },
                       "path": {
                          "elem": [
                             {
                                "name": "state"
                             },
                             {
                                "name": "port-speed"
                             }
                          ]
                       }
                    }
                 ],
                 "timestamp": 1636910588085654861,
                 "prefix": {
                    "target": "netq",
                    "elem": [
                       {
                          "name": "interfaces"
                       },
                       {
                          "name": "interface",
                          "key": {
                             "name": "swp1"
                          }
                       },
                       {
                          "name": "ethernet"
                       }
                    ]
                 }
              }
           }
        }
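In the response above, the requested path is split between the `prefix` elements and the `path` elements of the update. As a rough illustration of how the two pieces relate to the original `-xpath` argument, the following Python sketch joins them back into a single string. The helper names `elems_to_xpath` and `full_xpath` are hypothetical, and the dictionary simply mirrors the contents of the "Update" object as printed; a real client library may expose these fields differently.

```python
def elems_to_xpath(elems):
    """Render a list of gNMI path elements as an xpath-style string."""
    parts = []
    for elem in elems:
        # Each key becomes a [name=value] predicate; sorted for stable output.
        keys = "".join(
            f"[{k}={v}]" for k, v in sorted(elem.get("key", {}).items())
        )
        parts.append(elem["name"] + keys)
    return "/" + "/".join(parts)


def full_xpath(update_response):
    """Join the prefix and the first update's path into one xpath."""
    prefix = update_response["prefix"]["elem"]
    path = update_response["update"][0]["path"]["elem"]
    return elems_to_xpath(prefix + path)


# Data copied from the interface speed response above.
response = {
    "update": [
        {
            "val": {"Value": {"StringVal": "SPEED_40GB"}},
            "path": {"elem": [{"name": "state"}, {"name": "port-speed"}]},
        }
    ],
    "timestamp": 1636910588085654861,
    "prefix": {
        "target": "netq",
        "elem": [
            {"name": "interfaces"},
            {"name": "interface", "key": {"name": "swp1"}},
            {"name": "ethernet"},
        ],
    },
}

print(full_xpath(response))
# /interfaces/interface[name=swp1]/ethernet/state/port-speed
```

An element with more than one key, such as a WJH `reason` entry keyed by both `id` and `severity`, renders as consecutive predicates, for example `reason[id=210][severity=error]`.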
        
        
        

        The following example shows a gNMI client request for WJH drop data:

        gnmi_client -target_addr 10.209.37.121:9339 -xpath "/interfaces/interface[name=swp8]/wjh/aggregate/l2/reasons/reason[id=210]"
        {
           "Response": {
              "Update": {
                 "update": [
                    {
                       "val": {
                          "Value": {
                             "StringVal": "[{
                                "IngressPort": "swp8",
                                "DropType": "L2",
                                "Reason": "Source MAC equals destination MAC",
                                "Severity": "Error",
                                "Smac": "00:02:10:00:00:01",
                                "Dmac": "00:02:10:00:00:01",
                                "Proto": 6,
                                "Sport": 15,
                                "Dport": 16,
                                "Sip": "1.1.1.1",
                                "Dip": "2.2.2.2",
                                "AggCount": 192,
                                "FirstTimestamp": 1636907412,
                                "EndTimestamp": 1636907432
                             }]"
                          }
                       },
                       "path": {
                          "elem": [
                             {
                                "name": "state"
                             },
                             {
                                "name": "drop"
                             }
                          ]
                       }
                    }
                 ],
                 "prefix": {
                    "elem": [
                       {
                          "name": "interfaces"
                       },
                       {
                          "key": {
                             "name": "swp8"
                          },
                          "name": "interface"
                       },
                       {
                          "name": "wjh"
                       },
                       {
                          "name": "aggregate"
                       },
                       {
                          "name": "l2"
                       },
                       {
                          "name": "reasons"
                       },
                       {
                          "key" : {
                             "severity": "error",
                             "id": "210"
                          },
                          "name" : "reason"
                       }
                    ],
                    "target": "netq"
                 },
                 "timestamp": 1636907442362981645
              }
           }
        }
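The YANG model describes the `drop` leaf as a drop list encoded in JSON, so the `StringVal` in the response above needs a second decoding step before a client can work with individual fields. The following Python sketch shows one way to do that. The function name `summarize_drops` is hypothetical, the string literal reuses the field values from the example written as strictly valid JSON, and treating `FirstTimestamp` and `EndTimestamp` as seconds is an assumption based on their magnitude.

```python
import json

# The "drop" leaf value, using the field values from the example response.
string_val = """[{
  "IngressPort": "swp8",
  "DropType": "L2",
  "Reason": "Source MAC equals destination MAC",
  "Severity": "Error",
  "Smac": "00:02:10:00:00:01",
  "Dmac": "00:02:10:00:00:01",
  "Proto": 6,
  "Sport": 15,
  "Dport": 16,
  "Sip": "1.1.1.1",
  "Dip": "2.2.2.2",
  "AggCount": 192,
  "FirstTimestamp": 1636907412,
  "EndTimestamp": 1636907432
}]"""


def summarize_drops(encoded):
    """Decode the JSON drop list and return one summary dict per entry."""
    summaries = []
    for drop in json.loads(encoded):
        summaries.append(
            {
                "port": drop["IngressPort"],
                "reason": drop["Reason"],
                "count": drop["AggCount"],
                # Assumption: both timestamps are expressed in seconds.
                "window_seconds": drop["EndTimestamp"] - drop["FirstTimestamp"],
            }
        )
    return summaries


for summary in summarize_drops(string_val):
    print(summary)
```

For the example data, the summary reports 192 aggregated drops on swp8 over a 20-second window.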
        

        System Events Reference

        The following tables list all system event messages, organized by type. You can view these messages with the NetQ UI or CLI, or receive them through third-party notification applications.

        Agent Events

        Type | Trigger | Severity | Message Format | Example
        agent | NetQ Agent state changed to Rotten (not heard from in over 15 seconds) | Error | Agent state changed to rotten | Agent state changed to rotten
        agent | NetQ Agent rebooted | Error | Netq-agent rebooted at (@last_boot) | Netq-agent rebooted at 1573166417
        agent | Node running NetQ Agent rebooted | Error | Switch rebooted at (@sys_uptime) | Switch rebooted at 1573166131
        agent | NetQ Agent state changed to Fresh | Info | Agent state changed to fresh | Agent state changed to fresh
        agent | NetQ Agent state was reset | Info | Agent state was paused and resumed at (@last_reinit) | Agent state was paused and resumed at 1573166125
        agent | Version of NetQ Agent has changed | Info | Agent version has been changed old_version:@old_version and new_version:@new_version. Agent reset at @sys_uptime | Agent version has been changed old_version:2.1.2 and new_version:2.3.1. Agent reset at 1573079725
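Each message format in these tables uses `@name` placeholders that are filled with event-specific values, as the Example column shows. The following Python sketch illustrates that relationship by substituting values into a format string. It is not how NetQ itself builds messages; the function name `render_event` and the field dictionary are hypothetical.

```python
import re

# Matches placeholders such as @old_version or @sys_uptime.
_PLACEHOLDER = re.compile(r"@(\w+)")


def render_event(message_format: str, fields: dict) -> str:
    """Replace each @name placeholder with the matching value from fields."""

    def substitute(match):
        # A missing field raises KeyError, which surfaces incomplete data early.
        return str(fields[match.group(1)])

    return _PLACEHOLDER.sub(substitute, message_format)


fmt = (
    "Agent version has been changed old_version:@old_version "
    "and new_version:@new_version. Agent reset at @sys_uptime"
)
values = {
    "old_version": "2.1.2",
    "new_version": "2.3.1",
    "sys_uptime": 1573079725,
}

print(render_event(fmt, values))
# Agent version has been changed old_version:2.1.2 and new_version:2.3.1. Agent reset at 1573079725
```

The same pattern can be reversed when filtering notifications: knowing which placeholders a format contains tells you which fields to expect in a message of that type.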

        BGP Events

        Type | Trigger | Severity | Message Format | Example
        bgp | BGP Session state changed | Error | BGP session with peer @peer @neighbor vrf @vrf state changed from @old_state to @new_state | BGP session with peer leaf03 leaf04 vrf mgmt state changed from Established to Failed
        bgp | BGP Session state changed from Failed to Established | Info | BGP session with peer @peer @peerhost @neighbor vrf @vrf session state changed from Failed to Established | BGP session with peer swp5 spine02 spine03 vrf default session state changed from Failed to Established
        bgp | BGP Session state changed from Established to Failed | Info | BGP session with peer @peer @neighbor vrf @vrf state changed from established to failed | BGP session with peer leaf03 leaf04 vrf mgmt state changed from established to failed
        bgp | The reset time for a BGP session changed | Info | BGP session with peer @peer @neighbor vrf @vrf reset time changed from @old_last_reset_time to @new_last_reset_time | BGP session with peer spine03 swp9 vrf vrf2 reset time changed from 1559427694 to 1559837484

        BTRFS Events

        Type | Trigger | Severity | Message Format | Example
        btrfsinfo | Disk space available after BTRFS allocation is less than 80% of partition size or only 2 GB remain. | Error | @info : @details | high btrfs allocation space : greater than 80% of partition size, 61708420
        btrfsinfo | Indicates if a rebalance operation can free up space on the disk | Error | @info : @details | data storage efficiency : space left after allocation greater than chunk size 6170849.2

        Cable Events

        Type | Trigger | Severity | Message Format | Example
        cable | Link speed is not the same on both ends of the link | Error | @ifname speed @speed, mismatched with peer @peer @peer_if speed @peer_speed | swp2 speed 10, mismatched with peer server02 swp8 speed 40
        cable | The speed setting for a given port changed | Info | @ifname speed changed from @old_speed to @new_speed | swp9 speed changed from 10 to 40
        cable | The transceiver status for a given port changed | Info | @ifname transceiver changed from @old_transceiver to @new_transceiver | swp4 transceiver changed from disabled to enabled
        cable | The vendor of a given transceiver changed | Info | @ifname vendor name changed from @old_vendor_name to @new_vendor_name | swp23 vendor name changed from Broadcom to NVIDIA
        cable | The part number of a given transceiver changed | Info | @ifname part number changed from @old_part_number to @new_part_number | swp7 part number changed from FP1ZZ5654002A to MSN2700-CS2F0
        cable | The serial number of a given transceiver changed | Info | @ifname serial number changed from @old_serial_number to @new_serial_number | swp4 serial number changed from 571254X1507020 to MT1552X12041
        cable | The status of forward error correction (FEC) support for a given port changed | Info | @ifname supported fec changed from @old_supported_fec to @new_supported_fec | swp12 supported fec changed from supported to unsupported; swp12 supported fec changed from unsupported to supported
        cable | The advertised support for FEC for a given port changed | Info | @ifname supported fec changed from @old_advertised_fec to @new_advertised_fec | swp24 supported FEC changed from advertised to not advertised
        cable | The FEC status for a given port changed | Info | @ifname fec changed from @old_fec to @new_fec | swp15 fec changed from disabled to enabled

        CLAG/MLAG Events

        Type | Trigger | Severity | Message Format | Example
        clag | CLAG remote peer state changed from up to down | Error | Peer state changed to down | Peer state changed to down
        clag | Local CLAG host MTU does not match its remote peer MTU | Error | SVI @svi1 on vlan @vlan mtu @mtu1 mismatched with peer mtu @mtu2 | SVI svi7 on vlan 4 mtu 1592 mismatched with peer mtu 1680
        clag | CLAG SVI on VLAN is missing from remote peer state | Error | SVI on vlan @vlan is missing from peer | SVI on vlan vlan4 is missing from peer
        clag | CLAG peerlink is not operating at full capacity. At least one link is down. | Error | Clag peerlink not at full redundancy, member link @slave is down | Clag peerlink not at full redundancy, member link swp40 is down
        clag | CLAG remote peer state changed from down to up | Info | Peer state changed to up | Peer state changed to up
        clag | Local CLAG host state changed from down to up | Info | Clag state changed from down to up | Clag state changed from down to up
        clag | CLAG bond in Conflicted state updated with new bonds | Info | Clag conflicted bond changed from @old_conflicted_bonds to @new_conflicted_bonds | Clag conflicted bond changed from swp7 swp8 to swp9 swp10
        clag | CLAG bond changed state from protodown to up state | Info | Clag conflicted bond changed from @old_state_protodownbond to @new_state_protodownbond | Clag conflicted bond changed from protodown to up

        CL Support Events

        Type | Trigger | Severity | Message Format | Example
        clsupport | A new CL Support file has been created for the given node | Error | HostName @hostname has new CL SUPPORT file | HostName leaf01 has new CL SUPPORT file

        Config Diff Events

        Type | Trigger | Severity | Message Format | Example
        configdiff | Configuration file deleted on a device | Error | @hostname config file @type was deleted | spine03 config file /etc/frr/frr.conf was deleted
        configdiff | Configuration file has been created | Info | @hostname config file @type was created | leaf12 config file /etc/lldp.d/README.conf was created
        configdiff | Configuration file has been modified | Info | @hostname config file @type was modified | spine03 config file /etc/frr/frr.conf was modified

        EVPN Events

        Type | Trigger | Severity | Message Format | Example
        evpn | A VNI was configured and moved from the up state to the down state | Error | VNI @vni state changed from up to down | VNI 36 state changed from up to down
        evpn | A VNI was configured and moved from the down state to the up state | Info | VNI @vni state changed from down to up | VNI 36 state changed from down to up
        evpn | The kernel state changed on a VNI | Info | VNI @vni kernel state changed from @old_in_kernel_state to @new_in_kernel_state | VNI 3 kernel state changed from down to up
        evpn | A VNI state changed from not advertising all VNIs to advertising all VNIs | Info | VNI @vni vni state changed from @old_adv_all_vni_state to @new_adv_all_vni_state | VNI 11 vni state changed from false to true

        Lifecycle Management Events

        Type | Trigger | Severity | Message Format | Example
        lcm | Cumulus Linux backup started for a switch or host | Info | CL configuration backup started for hostname @hostname | CL configuration backup started for hostname spine01
        lcm | Cumulus Linux backup completed for a switch or host | Info | CL configuration backup completed for hostname @hostname | CL configuration backup completed for hostname spine01
        lcm | Cumulus Linux backup failed for a switch or host | Error | CL configuration backup failed for hostname @hostname | CL configuration backup failed for hostname spine01
        lcm | Cumulus Linux upgrade from one version to a newer version has started for a switch or host | Error | CL Image upgrade from version @old_cl_version to version @new_cl_version started for hostname @hostname | CL Image upgrade from version 4.1.0 to version 4.2.1 started for hostname server01
        lcm | Cumulus Linux upgrade from one version to a newer version has completed successfully for a switch or host | Info | CL Image upgrade from version @old_cl_version to version @new_cl_version completed for hostname @hostname | CL Image upgrade from version 4.1.0 to version 4.2.1 completed for hostname server01
        lcm | Cumulus Linux upgrade from one version to a newer version has failed for a switch or host | Error | CL Image upgrade from version @old_cl_version to version @new_cl_version failed for hostname @hostname | CL Image upgrade from version 4.1.0 to version 4.2.1 failed for hostname server01
        lcm | Restoration of a Cumulus Linux configuration started for a switch or host | Info | CL configuration restore started for hostname @hostname | CL configuration restore started for hostname leaf01
        lcm | Restoration of a Cumulus Linux configuration completed successfully for a switch or host | Info | CL configuration restore completed for hostname @hostname | CL configuration restore completed for hostname leaf01
        lcm | Restoration of a Cumulus Linux configuration failed for a switch or host | Error | CL configuration restore failed for hostname @hostname | CL configuration restore failed for hostname leaf01
        lcm | Rollback of a Cumulus Linux image has started for a switch or host | Error | CL Image rollback from version @old_cl_version to version @new_cl_version started for hostname @hostname | CL Image rollback from version 4.2.1 to version 4.1.0 started for hostname leaf01
        lcm | Rollback of a Cumulus Linux image has completed successfully for a switch or host | Info | CL Image rollback from version @old_cl_version to version @new_cl_version completed for hostname @hostname | CL Image rollback from version 4.2.1 to version 4.1.0 completed for hostname leaf01
        lcm | Rollback of a Cumulus Linux image has failed for a switch or host | Error | CL Image rollback from version @old_cl_version to version @new_cl_version failed for hostname @hostname | CL Image rollback from version 4.2.1 to version 4.1.0 failed for hostname leaf01
        lcm | Installation of a NetQ image has started for a switch or host | Info | NetQ Image version @netq_version installation started for hostname @hostname | NetQ Image version 3.2.0 installation started for hostname spine02
        lcm | Installation of a NetQ image has completed successfully for a switch or host | Info | NetQ Image version @netq_version installation completed for hostname @hostname | NetQ Image version 3.2.0 installation completed for hostname spine02
        lcm | Installation of a NetQ image has failed for a switch or host | Error | NetQ Image version @netq_version installation failed for hostname @hostname | NetQ Image version 3.2.0 installation failed for hostname spine02
        lcm | Upgrade of a NetQ image has started for a switch or host | Info | NetQ Image upgrade from version @old_netq_version to version @netq_version started for hostname @hostname | NetQ Image upgrade from version 3.1.0 to version 3.2.0 started for hostname spine02
        lcm | Upgrade of a NetQ image has completed successfully for a switch or host | Info | NetQ Image upgrade from version @old_netq_version to version @netq_version completed for hostname @hostname | NetQ Image upgrade from version 3.1.0 to version 3.2.0 completed for hostname spine02
        lcm | Upgrade of a NetQ image has failed for a switch or host | Error | NetQ Image upgrade from version @old_netq_version to version @netq_version failed for hostname @hostname | NetQ Image upgrade from version 3.1.0 to version 3.2.0 failed for hostname spine02

        Link Events

        Type | Trigger | Severity | Message Format | Example
        link | Link operational state changed from up to down | Error | HostName @hostname changed state from @old_state to @new_state Interface:@ifname | HostName leaf01 changed state from up to down Interface:swp34
        link | Link operational state changed from down to up | Info | HostName @hostname changed state from @old_state to @new_state Interface:@ifname | HostName leaf04 changed state from down to up Interface:swp11

        LLDP Events

        Type | Trigger | Severity | Message Format | Example
        lldp | Local LLDP host has new neighbor information | Info | LLDP Session with host @hostname and @ifname modified fields @changed_fields | LLDP Session with host leaf02 swp6 modified fields leaf06 swp21
        lldp | Local LLDP host has new peer interface name | Info | LLDP Session with host @hostname and @ifname @old_peer_ifname changed to @new_peer_ifname | LLDP Session with host spine01 and swp5 swp12 changed to port12
        lldp | Local LLDP host has new peer hostname | Info | LLDP Session with host @hostname and @ifname @old_peer_hostname changed to @new_peer_hostname | LLDP Session with host leaf03 and swp2 leaf07 changed to exit01

        MTU Events

        Type | Trigger | Severity | Message Format | Example
        mtu | VLAN interface link MTU is smaller than that of its parent MTU | Error | vlan interface @link mtu @mtu is smaller than parent @parent mtu @parent_mtu | vlan interface swp3 mtu 1500 is smaller than parent peerlink-1 mtu 1690
        mtu | Bridge interface MTU is smaller than the member interface with the smallest MTU | Error | bridge @link mtu @mtu is smaller than least of member interface mtu @min | bridge swp0 mtu 1280 is smaller than least of member interface mtu 1500

        NTP Events

        Type | Trigger | Severity | Message Format | Example
        ntp | NTP sync state changed from in sync to not in sync | Error | Sync state changed from @old_state to @new_state for @hostname | Sync state changed from in sync to not sync for leaf06
        ntp | NTP sync state changed from not in sync to in sync | Info | Sync state changed from @old_state to @new_state for @hostname | Sync state changed from not sync to in sync for leaf06

        OSPF Events

        Type | Trigger | Severity | Message Format | Example
        ospf | OSPF session state on a given interface changed from Full to a down state | Error | OSPF session @ifname with @peer_address changed from Full to @down_state | OSPF session swp7 with 27.0.0.18 state changed from Full to Fail; OSPF session swp7 with 27.0.0.18 state changed from Full to ExStart
        ospf | OSPF session state on a given interface changed from a down state to full | Info | OSPF session @ifname with @peer_address changed from @down_state to Full | OSPF session swp7 with 27.0.0.18 state changed from Down to Full; OSPF session swp7 with 27.0.0.18 state changed from Init to Full; OSPF session swp7 with 27.0.0.18 state changed from Fail to Full

        Package Information Events

        Type Trigger Severity Message Format Example
        packageinfo Package version on device does not match the version identified in the existing manifest Error @package_name manifest version mismatch netq-apps manifest version mismatch

        PTM Events

        Type Trigger Severity Message Format Example
        ptm Physical interface cabling does not match configuration specified in topology.dot file Error PTM cable status failed PTM cable status failed
        ptm Physical interface cabling matches configuration specified in topology.dot file Error PTM cable status passed PTM cable status passed

        Resource Events

        Type Trigger Severity Message Format Example
        resource A physical resource has been deleted from a device Error Resource Utils deleted for @hostname Resource Utils deleted for spine02
        resource Root file system access on a device has changed from Read/Write to Read Only Error @hostname root file system access mode set to Read Only server03 root file system access mode set to Read Only
        resource Root file system access on a device has changed from Read Only to Read/Write Info @hostname root file system access mode set to Read/Write leaf11 root file system access mode set to Read/Write
        resource A physical resource has been added to a device Info Resource Utils added for @hostname Resource Utils added for spine04

        Running Config Diff Events

        Type Trigger Severity Message Format Example
        runningconfigdiff Running configuration file has been modified Info @commandname config result was modified @commandname config result was modified

        Sensor Events

        Type Trigger Severity Message Format Example
        sensor A fan or power supply unit sensor has changed state Error Sensor @sensor state changed from @old_s_state to @new_s_state Sensor fan state changed from up to down
        sensor A temperature sensor has crossed the maximum threshold for that sensor Error Sensor @sensor max value @new_s_max exceeds threshold @new_s_crit Sensor temp max value 110 exceeds the threshold 95
        sensor A temperature sensor has crossed the minimum threshold for that sensor Error Sensor @sensor min value @new_s_lcrit fall behind threshold @new_s_min Sensor psu min value 10 fell below threshold 25
        sensor A temperature, fan, or power supply sensor state changed Info Sensor @sensor state changed from @old_state to @new_state

        Sensor temperature state changed from Error to ok

        Sensor fan state changed from absent to ok

        Sensor psu state changed from bad to ok

        sensor A fan or power supply sensor state changed Info Sensor @sensor state changed from @old_s_state to @new_s_state

        Sensor fan state changed from down to up

        Sensor psu state changed from down to up

        Services Events

        Type Trigger Severity Message Format Example
        services A service status changed from down to up Error Service @name status changed from @old_status to @new_status Service bgp status changed from down to up
        services A service status changed from up to down Error Service @name status changed from @old_status to @new_status Service lldp status changed from up to down
        services A service changed state from inactive to active Info Service @name changed state from inactive to active

        Service bgp changed state from inactive to active

        Service lldp changed state from inactive to active

        SSD Utilization Events

        Type Trigger Severity Message Format Example
        ssdutil 3ME3 disk health has dropped below 10% Error @info: @details low health : 5.0%
        ssdutil A dip in 3ME3 disk health of more than 2% has occurred within the last 24 hours Error @info: @details significant health drop : 3.0%
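
The two SSD triggers above can be expressed as a pair of comparisons. This Python sketch assumes health is reported as a percentage and that a reading from 24 hours earlier is available; the function name and the returned (info, details) pairs are illustrative assumptions, not NetQ's internal representation.

```python
def ssd_events(health_now, health_24h_ago):
    """Return (info, details) pairs for the two ssdutil triggers.

    Both arguments are disk health percentages (0-100).
    """
    events = []
    # Trigger 1: health has dropped below 10%.
    if health_now < 10.0:
        events.append(("low health", f"{health_now:.1f}%"))
    # Trigger 2: health dipped by more than 2% within the last 24 hours.
    drop = health_24h_ago - health_now
    if drop > 2.0:
        events.append(("significant health drop", f"{drop:.1f}%"))
    return events


print(ssd_events(5.0, 8.0))
```

A disk at 5.0% that was at 8.0% a day earlier satisfies both triggers, so both pairs are returned.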

        Version Events

        Type Trigger Severity Message Format Example
        version An unknown version of the operating system was detected Error unexpected os version @my_ver unexpected os version cl3.2
        version Desired version of the operating system is not available Error os version @ver os version cl3.7.9
        version An unknown version of a software package was detected Error expected release version @ver expected release version cl3.6.2
        version Desired version of a software package is not available Error different from version @ver different from version cl4.0

        VXLAN Events

        Type Trigger Severity Message Format Example
        vxlan Replication list contains an inconsistent set of nodes Error VNI @vni replication list inconsistent with @conflicts diff:@diff VNI 14 replication list inconsistent with ["leaf03","leaf04"] diff:+:["leaf03","leaf04"] -:["leaf07","leaf08"]

        Threshold-Crossing Events Reference

        This reference lists the threshold-based events that NetQ supports. You can view these messages through third-party notification applications. For details about configuring notifications for these events, refer to Configure and Monitor Threshold-Crossing Events.

        ACL Resources

        NetQ UI Name NetQ CLI Event ID Description
        Ingress ACL IPv4 % TCA_TCAM_IN_ACL_V4_FILTER_UPPER Number of ingress ACL filters for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
        Egress ACL IPv4 % TCA_TCAM_EG_ACL_V4_FILTER_UPPER Number of egress ACL filters for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
        Ingress ACL IPv4 mangle % TCA_TCAM_IN_ACL_V4_MANGLE_UPPER Number of ingress ACL mangles for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
        Egress ACL IPv4 mangle % TCA_TCAM_EG_ACL_V4_MANGLE_UPPER Number of egress ACL mangles for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
        Ingress ACL IPv6 % TCA_TCAM_IN_ACL_V6_FILTER_UPPER Number of ingress ACL filters for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
        Egress ACL IPv6 % TCA_TCAM_EG_ACL_V6_FILTER_UPPER Number of egress ACL filters for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
        Ingress ACL IPv6 mangle % TCA_TCAM_IN_ACL_V6_MANGLE_UPPER Number of ingress ACL mangles for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
        Egress ACL IPv6 mangle % TCA_TCAM_EG_ACL_V6_MANGLE_UPPER Number of egress ACL mangles for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold
        Ingress ACL 8021x % TCA_TCAM_IN_ACL_8021x_FILTER_UPPER Number of ingress ACL 802.1X filters on a given switch or host exceeded user-defined maximum threshold
        ACL L4 port % TCA_TCAM_ACL_L4_PORT_CHECKERS_UPPER Number of ACL port range checkers on a given switch or host exceeded user-defined maximum threshold
        ACL regions % TCA_TCAM_ACL_REGIONS_UPPER Number of ACL regions on a given switch or host exceeded user-defined maximum threshold
        Ingress ACL mirror % TCA_TCAM_IN_ACL_MIRROR_UPPER Number of ingress ACL mirrors on a given switch or host exceeded user-defined maximum threshold
        ACL 18B rules % TCA_TCAM_ACL_18B_RULES_UPPER Number of ACL 18B rules on a given switch or host exceeded user-defined maximum threshold
        ACL 32B % TCA_TCAM_ACL_32B_RULES_UPPER Number of ACL 32B rules on a given switch or host exceeded user-defined maximum threshold
        ACL 54B % TCA_TCAM_ACL_54B_RULES_UPPER Number of ACL 54B rules on a given switch or host exceeded user-defined maximum threshold
        Ingress PBR IPv4 % TCA_TCAM_IN_PBR_V4_FILTER_UPPER Number of ingress policy-based routing (PBR) filters for IPv4 addresses on a given switch or host exceeded user-defined maximum threshold
        Ingress PBR IPv6 % TCA_TCAM_IN_PBR_V6_FILTER_UPPER Number of ingress policy-based routing (PBR) filters for IPv6 addresses on a given switch or host exceeded user-defined maximum threshold

        BGP

        NetQ UI Name NetQ CLI Event ID Description
        BGP connection drop TCA_BGP_CONN_DROP Increase in drop count for a BGP session exceeding user-defined threshold
        BGP packet queue length TCA_BGP_PACKET_QUEUE_LENGTH Packet queue length persistently non-zero for more than the threshold duration (in seconds)
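
The packet queue length event depends on the queue staying non-zero for longer than a duration, not on a single reading. The following Python sketch shows one way to evaluate that condition over timestamped samples. The sample format and function name are assumptions made for illustration; NetQ's actual evaluation is not described here.

```python
def queue_persistently_nonzero(samples, threshold_seconds):
    """Check whether the queue length stayed non-zero for longer than
    threshold_seconds.

    samples is a list of (timestamp_seconds, queue_length) tuples,
    ordered by time.
    """
    run_start = None  # timestamp where the current non-zero run began
    for timestamp, length in samples:
        if length > 0:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start > threshold_seconds:
                return True
        else:
            # A zero reading ends the current run.
            run_start = None
    return False


samples = [(0, 3), (30, 1), (60, 2), (90, 4)]
print(queue_persistently_nonzero(samples, 60))
```

A single zero reading resets the run, so a queue that drains briefly does not count as persistently non-zero in this sketch.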

        Digital Optics

        NetQ UI Name NetQ CLI Event ID Description
        Laser Rx power alarm upper TCA_DOM_RX_POWER_ALARM_UPPER Transceiver Input power (mW) for the digital optical module on a given switch or host interface exceeded user-defined maximum alarm threshold
        Laser Rx power alarm lower TCA_DOM_RX_POWER_ALARM_LOWER Transceiver Input power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum alarm threshold
        Laser Rx power warning upper TCA_DOM_RX_POWER_WARNING_UPPER Transceiver Input power (mW) for the digital optical module on a given switch or host exceeded user-defined maximum warning threshold
        Laser Rx power warning lower TCA_DOM_RX_POWER_WARNING_LOWER Transceiver Input power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum warning threshold
        Laser bias current alarm upper TCA_DOM_BIAS_CURRENT_ALARM_UPPER Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined maximum alarm threshold
        Laser bias current alarm lower TCA_DOM_BIAS_CURRENT_ALARM_LOWER Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined minimum alarm threshold
        Laser bias current warning upper TCA_DOM_BIAS_CURRENT_WARNING_UPPER Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined maximum warning threshold
        Laser bias current warning lower TCA_DOM_BIAS_CURRENT_WARNING_LOWER Laser bias current (mA) for the digital optical module on a given switch or host exceeded user-defined minimum warning threshold
        Laser output power alarm upper TCA_DOM_OUTPUT_POWER_ALARM_UPPER Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined maximum alarm threshold
        Laser output power alarm lower TCA_DOM_OUTPUT_POWER_ALARM_LOWER Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum alarm threshold
        Laser output power warning upper TCA_DOM_OUTPUT_POWER_WARNING_UPPER Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined maximum warning threshold
        Laser output power warning lower TCA_DOM_OUTPUT_POWER_WARNING_LOWER Laser output power (mW) for the digital optical module on a given switch or host exceeded user-defined minimum warning threshold
        Laser module temperature alarm upper TCA_DOM_MODULE_TEMPERATURE_ALARM_UPPER Digital optical module temperature (°C) on a given switch or host exceeded user-defined maximum alarm threshold
        Laser module temperature alarm lower TCA_DOM_MODULE_TEMPERATURE_ALARM_LOWER Digital optical module temperature (°C) on a given switch or host exceeded user-defined minimum alarm threshold
        Laser module temperature warning upper TCA_DOM_MODULE_TEMPERATURE_WARNING_UPPER Digital optical module temperature (°C) on a given switch or host exceeded user-defined maximum warning threshold
        Laser module temperature warning lower TCA_DOM_MODULE_TEMPERATURE_WARNING_LOWER Digital optical module temperature (°C) on a given switch or host exceeded user-defined minimum warning threshold
        Laser module voltage alarm upper TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER Transceiver voltage (V) on a given switch or host exceeded user-defined maximum alarm threshold
        Laser module voltage alarm lower TCA_DOM_MODULE_VOLTAGE_ALARM_LOWER Transceiver voltage (V) on a given switch or host exceeded user-defined minimum alarm threshold
        Laser module voltage warning upper TCA_DOM_MODULE_VOLTAGE_WARNING_UPPER Transceiver voltage (V) on a given switch or host exceeded user-defined maximum warning threshold
        Laser module voltage warning lower TCA_DOM_MODULE_VOLTAGE_WARNING_LOWER Transceiver voltage (V) on a given switch or host exceeded user-defined minimum warning threshold
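
The digital optics event IDs in this table follow a visible pattern: the TCA_DOM_ prefix, a measurement name, a level (ALARM or WARNING), and a bound (UPPER or LOWER). The Python sketch below splits an ID along that pattern. The helper name is hypothetical, and the pattern is inferred from the table rather than from a documented naming rule.

```python
def parse_dom_event_id(event_id):
    """Split a TCA_DOM_* event ID into (measurement, level, bound)."""
    prefix = "TCA_DOM_"
    if not event_id.startswith(prefix):
        raise ValueError(f"not a digital optics event ID: {event_id}")
    parts = event_id[len(prefix):].split("_")
    # The last two tokens are the level and the bound; the rest is the measurement.
    if (len(parts) < 3
            or parts[-2] not in ("ALARM", "WARNING")
            or parts[-1] not in ("UPPER", "LOWER")):
        raise ValueError(f"unexpected event ID layout: {event_id}")
    return "_".join(parts[:-2]), parts[-2], parts[-1]


print(parse_dom_event_id("TCA_DOM_MODULE_TEMPERATURE_WARNING_LOWER"))
```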

        DPU RoCE

        NetQ UI Name NetQ CLI Event ID Description
        Implied nak seq error TCA_HOSTD_IMPLIED_NAK_SEQ_ERR Count of implied sequence errors exceeded user-defined maximum threshold
        Out of buffer TCA_HOSTD_OUT_OF_BUFFER Count of out-of-buffer errors exceeded user-defined maximum threshold
        Outbound PCI stalled read TCA_HOSTD_OUTBOUND_PCI_STALLED_RD Percentage of outbound stalled read requests exceeded user-defined maximum threshold
        Outbound PCI stalled write TCA_HOSTD_OUTBOUND_PCI_STALLED_WR Percentage of outbound stalled write requests exceeded user-defined maximum threshold
        Packet seq err TCA_HOSTD_PACKET_SEQ_ERR Count of packet sequence errors exceeded user-defined maximum threshold
        Req CQE error TCA_HOSTD_REQ_CQE_ERROR Count of request completion queue event (CQE) errors exceeded user-defined maximum threshold
        Req remote access errors TCA_HOSTD_REQ_REMOTE_ACCESS_ERRORS Count of request remote access errors exceeded user-defined maximum threshold
        Resp CQE error TCA_HOSTD_RESP_CQE_ERROR Count of response completion queue event (CQE) errors exceeded user-defined maximum threshold
        Resp remote access errors TCA_HOSTD_RESP_REMOTE_ACCESS_ERRORS Count of response remote access errors exceeded user-defined maximum threshold
        RNR nak retry error TCA_HOSTD_RNR_NAK_RETRY_ERR Count of RNR retry errors exceeded user-defined maximum threshold
        Rx CRC errors phy TCA_HOSTD_RX_CRC_ERRORS_PHY Count of Rx CRC errors exceeded user-defined maximum threshold
        Rx discards phy TCA_HOSTD_RX_DISCARDS_PHY Rate of Rx discards exceeded user-defined maximum threshold
        Rx PCI signal integrity TCA_HOSTD_RX_PCI_SIGNAL_INTEGRITY Count of Rx PCIe signal integrity errors exceeded user-defined maximum threshold
        Rx pcs symbol err phy TCA_HOSTD_RX_PCS_SYMBOL_ERR_PHY Count of Rx symbol errors exceeded user-defined maximum threshold
        Rx prio0 buf discard TCA_HOSTD_RX_PRIO0_BUF_DISCARD Rate of p0 buffer discards exceeded user-defined maximum threshold
        Rx prio0 cong discard TCA_HOSTD_RX_PRIO0_CONG_DISCARD Rate of p0 congestion discards exceeded user-defined maximum threshold
        Rx prio1 buf discard TCA_HOSTD_RX_PRIO1_BUF_DISCARD Rate of p1 buffer discards exceeded user-defined maximum threshold
        Rx prio1 cong discard TCA_HOSTD_RX_PRIO1_CONG_DISCARD Rate of p1 congestion discards exceeded user-defined maximum threshold
        Rx prio2 buf discard TCA_HOSTD_RX_PRIO2_BUF_DISCARD Rate of p2 buffer discards exceeded user-defined maximum threshold
        Rx prio2 cong discard TCA_HOSTD_RX_PRIO2_CONG_DISCARD Rate of p2 congestion discards exceeded user-defined maximum threshold
        Rx prio3 buf discard TCA_HOSTD_RX_PRIO3_BUF_DISCARD Rate of p3 buffer discards exceeded user-defined maximum threshold
        Rx prio3 cong discard TCA_HOSTD_RX_PRIO3_CONG_DISCARD Rate of p3 congestion discards exceeded user-defined maximum threshold
        Rx prio4 buf discard TCA_HOSTD_RX_PRIO4_BUF_DISCARD Rate of p4 buffer discards exceeded user-defined maximum threshold
        Rx prio4 cong discard TCA_HOSTD_RX_PRIO4_CONG_DISCARD Rate of p4 congestion discards exceeded user-defined maximum threshold
        Rx prio5 buf discard TCA_HOSTD_RX_PRIO5_BUF_DISCARD Rate of p5 buffer discards exceeded user-defined maximum threshold
        Rx prio5 cong discard TCA_HOSTD_RX_PRIO5_CONG_DISCARD Rate of p5 congestion discards exceeded user-defined maximum threshold
        Rx prio6 buf discard TCA_HOSTD_RX_PRIO6_BUF_DISCARD Rate of p6 buffer discards exceeded user-defined maximum threshold
        Rx prio6 cong discard TCA_HOSTD_RX_PRIO6_CONG_DISCARD Rate of p6 congestion discards exceeded user-defined maximum threshold
        Rx prio7 buf discard TCA_HOSTD_RX_PRIO7_BUF_DISCARD Rate of p7 buffer discards exceeded user-defined maximum threshold
        Rx prio7 cong discard TCA_HOSTD_RX_PRIO7_CONG_DISCARD Rate of p7 congestion discards exceeded user-defined maximum threshold
        Rx symbol err phy TCA_HOSTD_RX_SYMBOL_ERR_PHY Count of Rx symbol errors (physical coding errors) exceeded user-defined maximum threshold
        Tx discards phy TCA_HOSTD_TX_DISCARDS_PHY Rate of Tx discards exceeded user-defined maximum threshold
        Tx errors phy TCA_HOSTD_TX_ERRORS_PHY Count of Tx errors exceeded user-defined maximum threshold
        Tx pause storm error events TCA_HOSTD_TX_PAUSE_STORM_ERROR_EVENTS Count of pause error events exceeded user-defined maximum threshold
        Tx pause storm warning events TCA_HOSTD_TX_PAUSE_STORM_WARNING_EVENTS Count of pause warning events exceeded user-defined maximum threshold
        Tx PCI signal integrity TCA_HOSTD_TX_PCI_SIGNAL_INTEGRITY Count of Tx PCIe signal integrity errors exceeded user-defined maximum threshold

        ECMP

        NetQ UI Name NetQ CLI Event ID Description
        ECMP imbalance TCA_ECMP_IMBALANCE ECMP path utilization imbalance greater than the threshold

        Forwarding Resources

        NetQ UI Name NetQ CLI Event ID Description
        Total route entries % TCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPER Number of routes on a given switch or host exceeded user-defined maximum threshold
        Mcast routes % TCA_TCAM_TOTAL_MCAST_ROUTES_UPPER Number of multicast routes on a given switch or host exceeded user-defined maximum threshold
        MAC entries % TCA_TCAM_MAC_ENTRIES_UPPER Number of MAC addresses on a given switch or host exceeded user-defined maximum threshold
        IPv4 routes % TCA_TCAM_IPV4_ROUTE_UPPER Number of IPv4 routes on a given switch or host exceeded user-defined maximum threshold
        IPv4 hosts % TCA_TCAM_IPV4_HOST_UPPER Number of IPv4 hosts on a given switch or host exceeded user-defined maximum threshold
        Exceeding IPv6 routes % TCA_TCAM_IPV6_ROUTE_UPPER Number of IPv6 routes on a given switch or host exceeded user-defined maximum threshold
        IPv6 hosts % TCA_TCAM_IPV6_HOST_UPPER Number of IPv6 hosts on a given switch or host exceeded user-defined maximum threshold
        ECMP next hop % TCA_TCAM_ECMP_NEXTHOPS_UPPER Number of equal cost multi-path (ECMP) next hop entries on a given switch or host exceeded user-defined maximum threshold

        Interface Errors

        NetQ UI Name NetQ CLI Event ID Description
        Oversize errors TCA_HW_IF_OVERSIZE_ERRORS Number of frames longer than the maximum size (1518 bytes) exceeded user-defined threshold
        Undersize errors TCA_HW_IF_UNDERSIZE_ERRORS Number of frames shorter than the minimum size (64 bytes) exceeded user-defined threshold
        Alignment errors TCA_HW_IF_ALIGNMENT_ERRORS Number of frames with an uneven byte count and a CRC error exceeded user-defined threshold
        Jabber errors TCA_HW_IF_JABBER_ERRORS Number of frames longer than the maximum size (1518 bytes) and with a CRC error exceeded user-defined threshold
        Symbol errors TCA_HW_IF_SYMBOL_ERRORS Number of detected undefined or invalid symbols exceeded user-defined threshold
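
The size-based error types above can be told apart from a frame's length and CRC result. The Python sketch below classifies a single frame using the 64-byte and 1518-byte limits from the table. It assumes that an oversize frame with a good CRC counts as oversize and one with a bad CRC counts as jabber, and that undersize is judged on length alone; alignment and symbol errors are omitted because they need bit-level information. The function name is hypothetical.

```python
MIN_FRAME_BYTES = 64     # minimum frame size from the table
MAX_FRAME_BYTES = 1518   # maximum frame size from the table


def classify_frame(length_bytes, crc_ok):
    """Return 'oversize', 'jabber', 'undersize', or None for a frame."""
    if length_bytes > MAX_FRAME_BYTES:
        # Assumption: a bad CRC turns an oversize frame into a jabber error.
        return "oversize" if crc_ok else "jabber"
    if length_bytes < MIN_FRAME_BYTES:
        return "undersize"
    return None


for length, crc_ok in [(1600, True), (1600, False), (40, True), (512, True)]:
    print(length, crc_ok, classify_frame(length, crc_ok))
```

A threshold-crossing event would then compare the running count of each error type against its user-defined threshold.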

        Interface Statistics

        NetQ UI Name NetQ CLI Event ID Description
        Broadcast received bytes TCA_RXBROADCAST_UPPER Number of broadcast receive bytes per second exceeded user-defined maximum threshold on a switch interface
        Received bytes TCA_RXBYTES_UPPER Number of receive bytes exceeded user-defined maximum threshold on a switch interface
        Multicast received bytes TCA_RXMULTICAST_UPPER Number of multicast receive bytes per second on a given switch or host exceeded user-defined maximum threshold
        Broadcast transmitted bytes TCA_TXBROADCAST_UPPER Number of broadcast transmit bytes per second exceeded user-defined maximum threshold on a switch interface
        Transmitted bytes TCA_TXBYTES_UPPER Number of transmit bytes exceeded user-defined maximum threshold on a switch interface
        Multicast transmitted bytes TCA_TXMULTICAST_UPPER Number of multicast transmit bytes per second exceeded user-defined maximum threshold on a switch interface

        Link Flaps

        NetQ UI Name NetQ CLI Event ID Description
        Link flap errors TCA_LINK_FLAP_UPPER Number of link flaps exceeded user-defined maximum threshold

        Resource Utilization

        NetQ UI Name NetQ CLI Event ID Description
        Service memory utilization TCA_SERVICE_MEMORY_UTILIZATION_UPPER Percentage of service memory utilization exceeded user-defined maximum threshold on a switch
        Disk utilization TCA_DISK_UTILIZATION_UPPER Percentage of disk utilization exceeded user-defined maximum threshold on a switch or host
        CPU utilization TCA_CPU_UTILIZATION_UPPER Percentage of CPU utilization exceeded user-defined maximum threshold on a switch or host
        Service CPU utilization TCA_SERVICE_CPU_UTILIZATION_UPPER Percentage of service CPU utilization exceeded user-defined maximum threshold on a switch
        Memory utilization TCA_MEMORY_UTILIZATION_UPPER Percentage of memory utilization exceeded user-defined maximum threshold on a switch or host
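
Each event in this table fires when a utilization percentage exceeds a user-defined maximum. As a minimal sketch of that comparison, the Python function below takes current readings and thresholds keyed by event ID and reports which IDs were crossed. The dictionary layout and function name are assumptions for illustration only.

```python
def crossed_events(readings, thresholds):
    """Return the event IDs whose reading exceeds its maximum threshold.

    readings and thresholds map event IDs to percentages. Event IDs
    without a reading are skipped rather than treated as crossed.
    """
    return sorted(
        event_id
        for event_id, limit in thresholds.items()
        if event_id in readings and readings[event_id] > limit
    )


thresholds = {
    "TCA_CPU_UTILIZATION_UPPER": 90.0,
    "TCA_DISK_UTILIZATION_UPPER": 80.0,
    "TCA_MEMORY_UTILIZATION_UPPER": 85.0,
}
readings = {
    "TCA_CPU_UTILIZATION_UPPER": 95.5,
    "TCA_DISK_UTILIZATION_UPPER": 42.0,
}
print(crossed_events(readings, thresholds))
```

Only the CPU reading is above its limit here, so only that event ID is reported.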

        RoCE

        NetQ UI Name NetQ CLI Event ID Description
        Rx CNP buffer usage TCA_RX_CNP_BUFFER_USAGE_CELLS Percentage of Rx General+CNP buffer usage exceeded user-defined maximum threshold on a switch interface
        Rx CNP no buffer discard TCA_RX_CNP_NO_BUFFER_DISCARD Rate of Rx General+CNP no buffer discard exceeded user-defined maximum threshold on a switch interface
        Rx CNP PG usage TCA_RX_CNP_PG_USAGE_CELLS Percentage of Rx General+CNP PG usage exceeded user-defined maximum threshold on a switch interface
        Rx RoCE buffer usage TCA_RX_ROCE_BUFFER_USAGE_CELLS Percentage of Rx RoCE buffer usage exceeded user-defined maximum threshold on a switch interface
        Rx RoCE no buffer discard TCA_RX_ROCE_NO_BUFFER_DISCARD Rate of Rx RoCE no buffer discard exceeded user-defined maximum threshold on a switch interface
        Rx RoCE PG usage TCA_RX_ROCE_PG_USAGE_CELLS Percentage of Rx RoCE PG usage exceeded user-defined maximum threshold on a switch interface
        Rx RoCE PFC pause duration TCA_RX_ROCE_PFC_PAUSE_DURATION Rx RoCE PFC pause duration exceeded user-defined maximum threshold on a switch interface
        Rx RoCE PFC pause packets TCA_RX_ROCE_PFC_PAUSE_PACKETS Rate of Rx RoCE PFC pause packets exceeded user-defined maximum threshold on a switch interface
        Tx CNP buffer usage TCA_TX_CNP_BUFFER_USAGE_CELLS Percentage of Tx General+CNP buffer usage exceeded user-defined maximum threshold on a switch interface
        Tx CNP TC usage TCA_TX_CNP_TC_USAGE_CELLS Percentage of Tx CNP TC usage exceeded user-defined maximum threshold on a switch interface
        Tx CNP unicast no buffer discard TCA_TX_CNP_UNICAST_NO_BUFFER_DISCARD Rate of Tx CNP unicast no buffer discard exceeded user-defined maximum threshold on a switch interface
        Tx ECN marked packets TCA_TX_ECN_MARKED_PACKETS Rate of Tx Port ECN marked packets exceeded user-defined maximum threshold on a switch interface
        Tx RoCE buffer usage TCA_TX_ROCE_BUFFER_USAGE_CELLS Percentage of Tx RoCE buffer usage exceeded user-defined maximum threshold on a switch interface
        Tx RoCE PFC pause duration TCA_TX_ROCE_PFC_PAUSE_DURATION Tx RoCE PFC pause duration exceeded user-defined maximum threshold on a switch interface
        Tx RoCE PFC pause packets TCA_TX_ROCE_PFC_PAUSE_PACKETS Rate of Tx RoCE PFC pause packets exceeded user-defined maximum threshold on a switch interface
        Tx RoCE TC usage TCA_TX_ROCE_TC_USAGE_CELLS Percentage of Tx RoCE TC usage exceeded user-defined maximum threshold on a switch interface
        Tx RoCE unicast no buffer discard TCA_TX_ROCE_UNICAST_NO_BUFFER_DISCARD Rate of Tx RoCE unicast no buffer discard exceeded user-defined maximum threshold on a switch interface

        Sensors

        NetQ UI Name NetQ CLI Event ID Description
        Fan speed TCA_SENSOR_FAN_UPPER Fan speed exceeded user-defined maximum threshold on a switch
        Power supply watts TCA_SENSOR_POWER_UPPER Power supply output exceeded user-defined maximum threshold on a switch
        Power supply volts TCA_SENSOR_VOLTAGE_UPPER Power supply voltage exceeded user-defined maximum threshold on a switch
        Switch temperature TCA_SENSOR_TEMPERATURE_UPPER Temperature (°C) exceeded user-defined maximum threshold on a switch

        What Just Happened

        NetQ UI Name NetQ CLI Event ID Drop Type Reason/Port Down Reason Description
        ACL drop aggregate upper TCA_WJH_ACL_DROP_AGG_UPPER ACL Egress port ACL ACL action set to deny on the physical egress port or bond
        ACL drop aggregate upper TCA_WJH_ACL_DROP_AGG_UPPER ACL Egress router ACL ACL action set to deny on the egress switch virtual interfaces (SVIs)
        ACL drop aggregate upper TCA_WJH_ACL_DROP_AGG_UPPER ACL Ingress port ACL ACL action set to deny on the physical ingress port or bond
        ACL drop aggregate upper TCA_WJH_ACL_DROP_AGG_UPPER ACL Ingress router ACL ACL action set to deny on the ingress switch virtual interfaces (SVIs)
        Buffer drop aggregate upper TCA_WJH_BUFFER_DROP_AGG_UPPER Buffer Packet Latency Threshold Crossed Time a packet spent within the switch exceeded or dropped below the specified high or low threshold
        Buffer drop aggregate upper TCA_WJH_BUFFER_DROP_AGG_UPPER Buffer Port TC Congestion Threshold Crossed Percentage of the occupancy buffer exceeded or dropped below the specified high or low threshold
        Buffer drop aggregate upper TCA_WJH_BUFFER_DROP_AGG_UPPER Buffer Tail drop Tail drop is enabled, and buffer queue is filled to maximum capacity
        Buffer drop aggregate upper TCA_WJH_BUFFER_DROP_AGG_UPPER Buffer WRED Weighted Random Early Detection is enabled, and buffer queue is filled to maximum capacity or the RED engine dropped the packet as part of random congestion prevention
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Auto-negotiation failure Negotiation of port speed with peer has failed
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Bad signal integrity Integrity of the signal on port is not sufficient for good communication
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Cable/transceiver is not supported The attached cable or transceiver is not supported by this port
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Cable/transceiver is unplugged A cable or transceiver is missing or not fully inserted into the port
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Calibration failure Calibration failure
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Link training failure Link is not able to go operational up due to link training failure
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Peer is sending remote faults Peer node is not operating correctly
        CRC error upper TCA_WJH_CRC_ERROR_UPPER L1 Port admin down Port has been purposely set down by user
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 Destination MAC is reserved (DMAC=01-80-C2-00-00-0x) The address cannot be used by this link
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 Ingress spanning tree filter Port is in Spanning Tree blocking state
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 Ingress VLAN filtering Frames whose port is not a member of the VLAN are discarded
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 MLAG port isolation Not supported for port isolation implemented with system ACL
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 Multicast egress port list is empty No ports are defined for multicast egress
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 Port loopback filter Port is operating in loopback mode; packets are being sent to itself (source MAC address is the same as the destination MAC address)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 Unicast MAC table action discard Currently not supported
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER L2 VLAN tagging mismatch VLAN tags on the source and destination do not match
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Blackhole ARP/neighbor Packet received with blackhole adjacency
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Blackhole route Packet received with action equal to discard
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Checksum or IPver or IPv4 IHL too short Cannot read packet due to header checksum error, IP version mismatch, or IPv4 header length is too short
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Destination IP is loopback address Cannot read packet as destination IP address is a loopback address (dip=>127.0.0.0/8)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Egress router interface is disabled Packet destined to a different subnet cannot be routed because egress router interface is disabled
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Ingress router interface is disabled Packet destined to a different subnet cannot be routed because ingress router interface is disabled
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv4 destination IP is link local Packet has IPv4 destination address that is link local (destination in 169.254.0.0/16)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv4 destination IP is local network (destination=0.0.0.0/8) Packet has IPv4 destination address that is a local network (destination=0.0.0.0/8)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv4 routing table (LPM) unicast miss No route available in routing table for packet
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv4 source IP is limited broadcast Packet has broadcast source IP address
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv6 destination in multicast scope FFx0:/16 Packet received with multicast destination address in FFx0:/16 address range
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv6 destination in multicast scope FFx1:/16 Packet received with multicast destination address in FFx1:/16 address range
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router IPv6 routing table (LPM) unicast miss No route available in routing table for packet
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Multicast MAC mismatch For IPv4, destination MAC address is not equal to {0x01-00-5E-0 (25 bits), DIP[22:0]} and DIP is multicast. For IPv6, destination MAC address is not equal to {0x3333, DIP[31:0]} and DIP is multicast
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Non IP packet Cannot read packet header because it is not an IP packet
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Non-routable packet Packet has no route in routing table
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Packet size is larger than router interface MTU Packet has larger MTU configured than the VLAN
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Router interface loopback Packet has destination IP address that is local. For example, SIP = 1.1.1.1, DIP = 1.1.1.128.
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Source IP equals destination IP Packet has a source IP address equal to the destination IP address
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Source IP is in class E Cannot read packet as source IP address is a Class E address
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Source IP is loopback address Cannot read packet as source IP address is a loopback address (ipv4 => 127.0.0.0/8; for ipv6 => ::1/128)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Source IP is multicast Cannot read packet as source IP address is a multicast address (ipv4 SIP => 224.0.0.0/4)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Source IP is unspecified Cannot read packet as source IP address is unspecified (ipv4 = 0.0.0.0/32; for ipv6 = ::0)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router TTL value is too small Packet has TTL value of 1
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Unicast destination IP but multicast destination MAC Cannot read packet with IP unicast address when destination MAC address is not unicast (FF:FF:FF:FF:FF:FF)
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Router Unresolved neighbor/next-hop The next hop in the route is unknown
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Tunnel Decapsulation error Decapsulation produced incorrect format of packet. For example, encapsulation of packet with many VLANs or IP options on the underlay can cause de-capsulation to result in a short packet.
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Tunnel Overlay switch - Source MAC equals destination MAC Overlay packet’s source MAC address is the same as the destination MAC address
        Drop aggregate upper TCA_WJH_DROP_AGG_UPPER Tunnel Overlay switch - Source MAC is multicast Overlay packet’s source MAC address is multicast
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Auto-negotiation failure Negotiation of port speed with peer has failed
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Bad signal integrity Integrity of the signal on port is not sufficient for good communication
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Cable/transceiver is not supported The attached cable or transceiver is not supported by this port
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Cable/transceiver is unplugged A cable or transceiver is missing or not fully inserted into the port
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Calibration failure Calibration failure
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Link training failure Link is not able to go operational up due to link training failure
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Peer is sending remote faults Peer node is not operating correctly
        Symbol error upper TCA_WJH_SYMBOL_ERROR_UPPER L1 Port admin down Port has been purposely set down by user

        WJH Events Reference

        This reference lists all the NetQ-supported What Just Happened (WJH) metrics and provides a brief description of each. The full outputs vary slightly based on the type of drop and whether you are viewing the results in the NetQ UI or through one of the NetQ CLI commands.

        For instructions on how to configure and monitor What Just Happened events, refer to Configure and Monitor What Just Happened.

        Layer 1 Drops

        Describes why a port is in the down state.

        Reason Description
        Auto-negotiation failure Negotiation of port speed with peer has failed
        Logical mismatch with peer link Logical mismatch with peer link
        Link training failure Link is not able to go operational up due to link training failure
        Peer is sending remote faults Peer node is not operating correctly
        Bad signal integrity Integrity of the signal on port is not sufficient for good communication
        Cable/transceiver is not supported The attached cable or transceiver is not supported by this port
        Cable/transceiver is unplugged A cable or transceiver is missing or not fully inserted into the port
        Calibration failure Calibration failure
        Port state changes counter Cumulative number of state changes
        Symbol error counter Cumulative number of symbol errors
        CRC error counter Cumulative number of CRC errors

        In addition to the reason, the information provided for these drops includes:

        Parameter Description
        Corrective Action Provides recommended actions to take to resolve the port down state
        First Timestamp Date and time this port was marked as down for the first time
        Ingress Port Port accepting incoming traffic
        CRC Error Count Number of CRC errors generated by this port
        Symbol Error Count Number of Symbol errors generated by this port
        State Change Count Number of state changes that have occurred on this port
        OPID Operation identifier; used for internal purposes
        Is Port Up Indicates whether the port is in an Up (true) or Down (false) state

        Layer 2 Drops

        Describes why a link is down.

        Reason Severity Description
        MLAG port isolation Notice Not supported for port isolation implemented with system ACL
        Destination MAC is reserved (DMAC=01-80-C2-00-00-0x) Error The address cannot be used by this link
        VLAN tagging mismatch Error VLAN tags on the source and destination do not match
        Ingress VLAN filtering Error Frames whose port is not a member of the VLAN are discarded
        Ingress spanning tree filter Notice Port is in Spanning Tree blocking state
        Unicast MAC table action discard Notice Packet dropped due to a MAC table configuration rule
        Multicast egress port list is empty Warning No ports are defined for multicast egress
        Port loopback filter Error Port is operating in loopback mode; packets are being sent to itself (source MAC address is the same as the destination MAC address)
        Source MAC is multicast Error Packets have multicast source MAC address
        Source MAC equals destination MAC Error Source MAC address is the same as the destination MAC address
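
Three of the reasons in this table depend only on the source and destination MAC addresses of a frame. The following is a minimal sketch of those three checks, not the switch's implementation. The function names are hypothetical, and the reason strings are copied from the table above.

```python
from typing import List


def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 raw bytes."""
    octets = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return octets


def is_multicast(mac: bytes) -> bool:
    # The least-significant bit of the first octet is the group (multicast) bit.
    return bool(mac[0] & 0x01)


def is_reserved_dmac(mac: bytes) -> bool:
    # 01-80-C2-00-00-0x: first five octets fixed, high nibble of the last octet is 0.
    return mac[:5] == bytes.fromhex("0180c20000") and (mac[5] & 0xF0) == 0


def l2_drop_reasons(src_mac: str, dst_mac: str) -> List[str]:
    """Return the MAC-based layer 2 drop reasons that apply to a frame."""
    src, dst = parse_mac(src_mac), parse_mac(dst_mac)
    reasons = []
    if is_multicast(src):
        reasons.append("Source MAC is multicast")
    if src == dst:
        reasons.append("Source MAC equals destination MAC")
    if is_reserved_dmac(dst):
        reasons.append("Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)")
    return reasons
```

For example, `l2_drop_reasons("44:38:39:00:00:32", "01-80-C2-00-00-0E")` reports only the reserved destination MAC reason, because the source address is unicast and differs from the destination.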

        In addition to the reason, the information provided for these drops includes:

        Parameter Description
        Source Port Port ID where the link originates
        Source IP Port IP address where the link originates
        Source MAC Port MAC address where the link originates
        Destination Port Port ID where the link terminates
        Destination IP Port IP address where the link terminates
        Destination MAC Port MAC address where the link terminates
        First Timestamp Date and time this link was marked as down for the first time
        Aggregate Count Total number of dropped packets
        Protocol ID of the communication protocol running on this link
        Ingress Port Port accepting incoming traffic
        OPID Operation identifier; used for internal purposes

        Router Drops

        Describes why the server is unable to route a packet.

        Reason Severity Description
        Non-routable packet Notice Packet has no route in routing table
        Blackhole route Warning Packet received with action equal to discard
        Unresolved next hop Warning The next hop in the route is unknown
        Blackhole ARP/neighbor Warning Packet received with blackhole adjacency
        IPv6 destination in multicast scope FFx0:/16 Notice Packet received with multicast destination address in FFx0:/16 address range
        IPv6 destination in multicast scope FFx1:/16 Notice Packet received with multicast destination address in FFx1:/16 address range
        Non-IP packet Notice Cannot read packet header because it is not an IP packet
        Unicast destination IP but non-unicast destination MAC Error Cannot read packet with IP unicast address when destination MAC address is not unicast (FF:FF:FF:FF:FF:FF)
        Destination IP is loopback address Error Cannot read packet as destination IP address is a loopback address (DIP in 127.0.0.0/8)
        Source IP is multicast Error Cannot read packet as source IP address is a multicast address (IPv4 SIP in 224.0.0.0/4)
        Source IP is in class E Error Cannot read packet as source IP address is a Class E address
        Source IP is loopback address Error Cannot read packet as source IP address is a loopback address (IPv4: 127.0.0.0/8; IPv6: ::1/128)
        Source IP is unspecified Error Cannot read packet as source IP address is unspecified (IPv4: 0.0.0.0/32; IPv6: ::0)
        Checksum or IP ver or IPv4 IHL too short Error Cannot read packet due to header checksum error, IP version mismatch, or IPv4 header length is too short
        Multicast MAC mismatch Error For IPv4, destination MAC address is not equal to {0x01-00-5E-0 (25 bits), DIP[22:0]} and DIP is multicast. For IPv6, destination MAC address is not equal to {0x3333, DIP[31:0]} and DIP is multicast
        Source IP equals destination IP Error Packet has a source IP address equal to the destination IP address
        IPv4 source IP is limited broadcast Error Packet has broadcast source IP address
        IPv4 destination IP is local network (destination = 0.0.0.0/8) Error Packet has IPv4 destination address that is a local network (destination=0.0.0.0/8)
        IPv4 destination IP is link-local (destination in 169.254.0.0/16) Error Packet has IPv4 destination address that is a link-local address
        Ingress router interface is disabled Warning Packet destined to a different subnet cannot be routed because ingress router interface is disabled
        Egress router interface is disabled Warning Packet destined to a different subnet cannot be routed because egress router interface is disabled
        IPv4 routing table (LPM) unicast miss Warning No route available in routing table for packet
        IPv6 routing table (LPM) unicast miss Warning No route available in routing table for packet
        Router interface loopback Warning Packet has destination IP address that is local. For example, SIP = 1.1.1.1, DIP = 1.1.1.128.
        Packet size is larger than MTU Warning Packet is larger than the MTU configured on the VLAN (router interface)
        TTL value is too small Warning Packet has TTL value of 1
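
The Multicast MAC mismatch reason compares the destination MAC address with the address derived from the multicast destination IP: 01-00-5E followed by a zero bit and DIP[22:0] for IPv4, and 33-33 followed by DIP[31:0] for IPv6. The following sketch shows that derivation using Python's standard `ipaddress` module. The function names are hypothetical and chosen for illustration only.

```python
import ipaddress


def expected_multicast_mac(dip: str) -> str:
    """Derive the Ethernet MAC address that a multicast destination IP maps to."""
    addr = ipaddress.ip_address(dip)
    if not addr.is_multicast:
        raise ValueError(f"{dip} is not a multicast address")
    if addr.version == 4:
        # 25 fixed bits (01-00-5E plus a zero bit), then the low 23 bits of the DIP.
        value = (0x01005E << 24) | (int(addr) & 0x7FFFFF)
    else:
        # 16 fixed bits (33-33), then the low 32 bits of the DIP.
        value = (0x3333 << 32) | (int(addr) & 0xFFFFFFFF)
    return ":".join(f"{octet:02x}" for octet in value.to_bytes(6, "big"))


def multicast_mac_mismatch(dip: str, dmac: str) -> bool:
    """Return True when the destination MAC does not match the multicast DIP."""
    normalized = dmac.lower().replace("-", ":")
    return normalized != expected_multicast_mac(dip)
```

With this mapping, a packet sent to 224.0.0.5 is expected to carry destination MAC 01:00:5e:00:00:05. Any other destination MAC on that packet corresponds to the mismatch condition described in the table.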

        Tunnel Drops

        Describes why a tunnel is down.

        Reason Severity Description
        Overlay switch - source MAC is multicast Error Overlay packet’s source MAC address is multicast
        Overlay switch - source MAC equals destination MAC Error Overlay packet’s source MAC address is the same as the destination MAC address
        Decapsulation error Error Decapsulation produced incorrect format of packet. For example, encapsulation of packet with many VLANs or IP options on the underlay can cause decapsulation to result in a short packet.
        Tunnel interface is disabled Error Packet cannot be decapsulated because the tunnel interface is disabled

        Buffer Drops

        Describes why the server buffer has dropped packets.

        Reason Severity Description
        Tail drop Warning Tail drop is enabled, and buffer queue is filled to maximum capacity
        WRED Warning Weighted Random Early Detection is enabled, and buffer queue is filled to maximum capacity or the RED engine dropped the packet as part of random congestion prevention
        Port TC Congestion Threshold Crossed Warning Percentage of the occupancy buffer exceeded or dropped below the specified high or low threshold
        Packet Latency Threshold Crossed Warning Time a packet spent within the switch exceeded or dropped below the specified high or low threshold
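
To illustrate the difference between a tail drop and a WRED drop, the following sketch uses the textbook random early detection curve: no drops below a minimum queue threshold, a linearly increasing drop probability up to a maximum threshold, and a certain drop beyond it. The parameter names and the shape of the curve are assumptions for illustration. They do not describe how a particular switch or NetQ is configured.

```python
def wred_drop_probability(avg_queue: float, min_threshold: float,
                          max_threshold: float, max_probability: float) -> float:
    """Textbook RED drop probability for a given average queue occupancy."""
    if not 0 <= min_threshold < max_threshold:
        raise ValueError("thresholds must satisfy 0 <= min < max")
    if not 0 <= max_probability <= 1:
        raise ValueError("max_probability must be between 0 and 1")
    if avg_queue < min_threshold:
        return 0.0  # queue is short: never drop early
    if avg_queue >= max_threshold:
        return 1.0  # queue is effectively full: behaves like a tail drop
    # Between the thresholds, the probability ramps linearly up to max_probability.
    ramp = (avg_queue - min_threshold) / (max_threshold - min_threshold)
    return max_probability * ramp
```

In this model, a drop with a probability between 0 and 1 corresponds to the random congestion prevention case, and a probability of 1 corresponds to the queue being filled to maximum capacity.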

        ACL Drops

        Describes why an ACL has dropped packets.

        Reason Severity Description
        Ingress port ACL Notice ACL action set to deny on the physical ingress port or bond
        Ingress router ACL Notice ACL action set to deny on the ingress switch virtual interfaces (SVIs)
        Egress port ACL Notice ACL action set to deny on the physical egress port or bond
        Egress router ACL Notice ACL action set to deny on the egress SVIs

        EVPN

        Use the UI or CLI to monitor Ethernet VPN (EVPN) on a networkwide or per-session basis.

        EVPN Commands

        Monitor EVPN with the following commands. See the command line reference for additional options, definitions, and examples.

        netq show evpn
        netq show events message_type evpn
        netq show events-config message_type evpn
        

        The netq check evpn command verifies the communication status for all nodes (leafs, spines, and hosts) running instances of EVPN in your network fabric:

        netq check evpn
        

        View EVPN in the UI

        To add the EVPN card to your workbench, navigate to the header and select Add card > Network services > All EVPN Sessions card > Open cards. In this example, there are 6 nodes running the EVPN service, 0 open events (from the last 24 hours), and 48 VNIs.

        View the Distribution of Layer-2 and -3 VNIs and Sessions

        To view the number of sessions between devices and Virtual Network Identifiers (VNIs) that occur over layer 3, open the large EVPN Sessions card. In this example, there are 18 layer-3 VNIs.

        Select the dropdown to display the switches with the most EVPN sessions, as well as the switches with the most layer-2 and layer-3 EVPN sessions.

        You can view EVPN-related events by selecting the Events tab.

        Expand the EVPN card to full-screen to view, filter, or export:

        From this table, you can select a row, then click Add card above the table.

        NetQ adds a new, EVPN ‘single-session’ card to your workbench. From this card, you can view the number of VTEPs (VXLAN Tunnel Endpoints) for a given EVPN session as well as the attributes of all EVPN sessions for a given VNI.

        Monitor a Single EVPN Session

        The EVPN single-session card displays the number of VTEPs for a given EVPN session (in this case, 48).

        Expand the card to display the associated VRF (layer 3) or VLAN (layer 2) on each device participating in this session. The full-screen card displays all stored attributes of all EVPN sessions running networkwide.

        Hosts

        The NetQ Agent monitors the following on Linux hosts:

        Using NetQ on a Linux host is the same as using it on a Cumulus Linux switch. For example, if you want to check LLDP neighbor information for a given host, run netq show lldp and specify the hostname:

        cumulus@host:~$ netq server01 show lldp
        Matching lldp records:
        Hostname          Interface                 Peer Hostname     Peer Interface            Last Changed
        ----------------- ------------------------- ----------------- ------------------------- -------------------------
        server01          eth0                      oob-mgmt-switch   swp2                      Thu Sep 17 20:27:48 2020
        server01          eth1                      leaf01            swp1                      Thu Sep 17 20:28:21 2020
        server01          eth2                      leaf02            swp1                      Thu Sep 17 20:28:21 2020
        

        Then, to see LLDP from the switch perspective, run the same command, specifying the hostname of the switch:

        cumulus@switch:~$ netq leaf01 show lldp
        Matching lldp records:
        Hostname          Interface                 Peer Hostname     Peer Interface            Last Changed
        ----------------- ------------------------- ----------------- ------------------------- -------------------------
        leaf01            eth0                      oob-mgmt-switch   swp10                     Thu Sep 17 20:10:05 2020
        leaf01            swp54                     spine04           swp1                      Thu Sep 17 20:26:13 2020
        leaf01            swp53                     spine03           swp1                      Thu Sep 17 20:26:13 2020
        leaf01            swp49                     leaf02            swp49                     Thu Sep 17 20:26:13 2020
        leaf01            swp2                      server02          mac:44:38:39:00:00:34     Thu Sep 17 20:28:14 2020
        leaf01            swp51                     spine01           swp1                      Thu Sep 17 20:26:13 2020
        leaf01            swp52                     spine02           swp1                      Thu Sep 17 20:26:13 2020
        leaf01            swp50                     leaf02            swp50                     Thu Sep 17 20:26:13 2020
        leaf01            swp1                      server01          mac:44:38:39:00:00:32     Thu Sep 17 20:28:14 2020
        leaf01            swp3                      server03          mac:44:38:39:00:00:36     Thu Sep 17 20:28:14 2020
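
If you capture this tabular output in a script, the dashed ruler line under the header marks where each column starts and ends. The following sketch uses the ruler to split each row into named fields. It assumes the output keeps the layout shown above (a header line, a dashed ruler, then data rows). The function name `parse_netq_table` is hypothetical and not part of NetQ.

```python
import re
from typing import Dict, List


def parse_netq_table(output: str) -> List[Dict[str, str]]:
    """Split fixed-width table output into one dictionary per data row."""
    lines = [line.rstrip() for line in output.splitlines() if line.strip()]

    # Find the ruler: a line after the header that contains only dashes and spaces.
    ruler_idx = None
    for i, line in enumerate(lines):
        if i > 0 and "-" in line and set(line) <= {"-", " "}:
            ruler_idx = i
            break
    if ruler_idx is None:
        raise ValueError("no dashed ruler line found")

    # Each run of dashes gives the (start, end) character range of one column.
    spans = [match.span() for match in re.finditer(r"-+", lines[ruler_idx])]
    header = lines[ruler_idx - 1]
    names = [header[start:end].strip() for start, end in spans]

    records = []
    for line in lines[ruler_idx + 1:]:
        cells = {}
        for idx, (name, (start, end)) in enumerate(zip(names, spans)):
            # Let the last column run to the end of the line.
            stop = None if idx == len(spans) - 1 else end
            cells[name] = line[start:stop].strip()
        records.append(cells)
    return records
```

Each returned dictionary is keyed by the column headings, such as `Hostname`, `Interface`, and `Peer Hostname`. Using the ruler rather than splitting on whitespace keeps multi-word values such as the Last Changed timestamp intact.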
        

        To view the routing table for a server, run netq show ip routes:

        cumulus@host:~$ netq server01 show ip routes
        Matching routes records:
        Origin VRF             Prefix                         Hostname          Nexthops                            Last Changed
        ------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
        no     default         0.0.0.0/0                      server01          192.168.200.1: eth0                 Thu Sep 17 20:27:30 2020
        yes    default         192.168.200.31/32              server01          eth0                                Thu Sep 17 20:27:30 2020
        yes    default         10.1.10.101/32                 server01          uplink                              Thu Sep 17 20:27:30 2020
        no     default         10.0.0.0/8                     server01          10.1.10.1: uplink                   Thu Sep 17 20:27:30 2020
        yes    default         192.168.200.0/24               server01          eth0                                Thu Sep 17 20:27:30 2020
        yes    default         10.1.10.0/24                   server01          uplink                              Thu Sep 17 20:27:30 2020
        

        Interfaces

        Physical Interfaces Commands

        Use the CLI to monitor OSI Layer 1 physical components on network devices, including interfaces, ports, links, and peers. You can monitor transceivers and cabling deployed per port (interface), per vendor, per part number, and so forth.

        This information can help you:

        NetQ uses LLDP (Link Layer Discovery Protocol) to collect port information. NetQ can also identify peer ports connected to DACs (Direct Attached Cables) and AOCs (Active Optical Cables) without using LLDP, even if the link is not UP.

        View performance and status information about cables, transceiver modules, and interfaces with netq show interfaces physical:

        netq show interfaces physical 
        

        View Utilization Statistics Networkwide

        Utilization statistics can indicate whether resources are becoming dangerously close to their maximum capacity or other, user-defined thresholds. Depending on the function of the switch, the acceptable thresholds can vary.

        Compute Resources Utilization

        View how many compute resources—CPU, disk, and memory—the switches on your network consume with netq show resource-util:

        netq show resource-util 
        

        Port Statistics

        View statistics about a given node and interface, including frame errors, ACL drops, and buffer drops with netq show ethtool-stats:

        netq show ethtool-stats
        

        Interface Statistics and Utilization

        NetQ Agents collect performance statistics every 30 seconds for the physical interfaces on switches in your network. The NetQ Agent does not collect statistics for non-physical interfaces, such as bonds, bridges, and VXLANs. The NetQ Agent collects the following statistics:

        To view interface statistics and utilization, run the netq show interface-stats or netq show interface-utilization commands:

        netq show interface-stats 
        netq show interface-utilization
        

        ACL Resource Utilization Networkwide

        View incoming and outgoing access control lists (ACLs) configured on all switches and hosts with netq show cl-resource acl:

        netq show cl-resource acl
        

        Forwarding Resources Utilization Networkwide

        View forwarding resources on all devices with netq show cl-resource forwarding:

        netq show cl-resource forwarding
        

        SSD Utilization Networkwide

        For NetQ Appliances that have 3ME3 solid state drives (SSDs) installed (primarily in on-premises deployments), you can view the utilization of the drive on demand. A warning is generated when a drive drops below 10% health, or has more than a 2% loss of health in 24 hours, indicating the need to rebalance the drive. Tracking SSD utilization over time lets you see any downward trend or drive instability before you receive a warning message.
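
The two warning conditions above can be expressed as a simple rule. This is a sketch only. It assumes that health is reported as a percentage and that "a 2% loss" means a drop of more than two percentage points over the previous 24 hours. The function name is hypothetical.

```python
def ssd_warning(health_now: float, health_24h_ago: float) -> bool:
    """Return True when either SSD warning condition from the text holds."""
    below_floor = health_now < 10.0                      # drive dropped below 10% health
    rapid_loss = (health_24h_ago - health_now) > 2.0     # more than 2% lost in 24 hours
    return below_floor or rapid_loss
```

For example, a drive that goes from 83% to 80% health in a day triggers the rule because of the rate of loss, even though its health is well above 10%.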

        View SSD utilization with netq show cl-ssd-util:

        netq show cl-ssd-util
        

        Disk Storage After BTRFS Allocation Networkwide

        Customers running Cumulus Linux 3, which uses BTRFS (b-tree file system), might experience issues with disk space management. This is a known problem with BTRFS because it does not perform periodic garbage collection, or rebalancing. If left unattended, these issues can make it impossible to rebalance the partitions on the disk. To avoid this, NVIDIA recommends rebalancing the BTRFS partitions preemptively, but only when absolutely needed, to avoid reducing the lifetime of the disk. By tracking the state of the disk space usage, users can determine when to rebalance.

        For details about when to rebalance a partition, refer to When to Rebalance BTRFS Partitions.

        View BTRFS disk utilization with netq show cl-btrfs-info:

        netq show cl-btrfs-info
        

        View interface (link) state, type, count, aliases, and additional information with variations of the netq show interfaces command, including netq show interfaces type and netq show events message_type interfaces:

        netq show interfaces
        netq show interfaces type
        netq show events message_type interfaces 
        

        The netq check interfaces command verifies interface communication status for all nodes (leafs, spines, and hosts) or an interface between specific nodes in your network fabric. This command only checks the physical interfaces; it does not check bridges, bonds, or other software constructs.

        netq check interfaces
        

        You can monitor the same information outlined in the section above via the UI by expanding the Menu, then selecting Interfaces.

        Check for MTU Inconsistencies

        The maximum transmission unit (MTU) determines the largest size packet or frame that can be transmitted across a given communication link. When the MTU is not configured to the same value on both ends of the link, communication problems can occur. Use the netq check mtu command to verify that the MTU is correctly specified for each link.
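
Conceptually, the link-level part of this check compares the MTU configured on the two ends of each link. The following sketch shows that comparison on hand-built data. It is not how NetQ implements `netq check mtu`, and the function name, data structures, and MTU values are hypothetical.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, str]  # (hostname, interface)


def find_mtu_mismatches(
    links: List[Tuple[Endpoint, Endpoint]],
    mtus: Dict[Endpoint, int],
) -> List[Tuple[Endpoint, int, Endpoint, int]]:
    """Return every link whose two ends are configured with different MTUs."""
    mismatches = []
    for end_a, end_b in links:
        mtu_a, mtu_b = mtus.get(end_a), mtus.get(end_b)
        if mtu_a is None or mtu_b is None:
            continue  # MTU unknown for one end, so nothing to compare
        if mtu_a != mtu_b:
            mismatches.append((end_a, mtu_a, end_b, mtu_b))
    return mismatches


# Hypothetical example data: one consistent link and one inconsistent link.
links = [
    (("leaf01", "swp51"), ("spine01", "swp1")),
    (("leaf01", "swp1"), ("server01", "eth1")),
]
mtus = {
    ("leaf01", "swp51"): 9216,
    ("spine01", "swp1"): 9216,
    ("leaf01", "swp1"): 9216,
    ("server01", "eth1"): 1500,
}
```

Calling `find_mtu_mismatches(links, mtus)` on this data reports only the leaf01 swp1 to server01 eth1 link, where one end uses 9216 and the other 1500.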

        IP Addresses

        Use the UI or CLI to monitor Internet Protocol (IP) addresses, neighbors, and routes.

        This information can help you:

        IP Address Commands

        Monitor IP addresses and determine neighbors and routes with netq show ip addresses, netq show ip neighbors, and netq show ip routes. Two sets of IP commands are available—one for IPv4 and one for IPv6.

        netq show ip addresses
        netq show ipv6 addresses
        
        netq show ip neighbors
        netq show ipv6 neighbors 
        
        netq show ip routes    
        netq show ipv6 routes
        

        The netq show address-history command displays when an IP address configuration changed for an interface. Add options to the command to show:

        All changes are listed chronologically.

        netq show address-history
        

        The netq show neighbor-history command displays when the neighbor configuration changed for an IP address.

        netq show neighbor-history
        

        The netq check addresses command searches for duplicate IPv4 and IPv6 addresses assigned to interfaces across devices in the inventory, and checks for duplicate /32 host routes in each VRF.

        netq check addresses
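
The idea behind a duplicate address check can be sketched as grouping interface addresses and reporting any address that appears more than once. The sketch below is illustrative only and is not NetQ's implementation. The record layout and function name are hypothetical, and treating the VRF as part of the key is an assumption.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str, str]  # (hostname, interface, vrf, address/prefix)


def find_duplicate_addresses(records: Iterable[Record]) -> Dict[tuple, List[Tuple[str, str]]]:
    """Map each duplicated (vrf, address) to the interfaces that carry it."""
    owners = defaultdict(list)
    for hostname, interface, vrf, cidr in records:
        # ip_interface() accepts IPv4 and IPv6 and normalizes the textual form.
        address = ipaddress.ip_interface(cidr).ip
        owners[(vrf, address)].append((hostname, interface))
    return {key: where for key, where in owners.items() if len(where) > 1}
```

Parsing the addresses before comparing them means that different spellings of the same IPv6 address, such as 2001:db8::1 and 2001:0db8:0:0::1, count as duplicates.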
        

        View IP Addresses in the UI

        IPv4 and IPv6 address, neighbor, and route information is available in the NetQ UI. To access this information, select the Menu. Then select IP addresses, IP neighbors, or IP routes from the list of options. The following image displays a list of IP addresses:

        Validate Network Protocol and Service Operations

        NetQ lets you validate the operation of the protocols and services running in your network either on demand or according to a schedule. For a general understanding of how well your network is operating, refer to Validate Overall Network Health.

        On-demand Validations

        When you want to validate the operation of one or more network protocols and services right now, you can create and run on-demand validations using the NetQ UI or the NetQ CLI.

        Create an On-demand Validation

        Using the NetQ UI, you can create an on-demand validation for multiple protocols or services at the same time. This is handy when the protocols are strongly related regarding a possible issue or if you only want to create one validation request.

        To run on-demand validations with the CLI, use the netq check commands.

        To create and run a request containing checks on one or more protocols or services within the NetQ UI:

        1. In the workbench header, select Validation, then Create a validation. Choose whether the on-demand validation should run on all devices or on specific device groups.

        2. Select the protocols or services you want to validate, then click Next.

          This example has BGP selected and displays the 8 checks that NetQ runs during a BGP validation:

        3. Select Now and specify a workbench:

        4. Click Run to start the validation. It might take a few minutes for results to appear.

          The respective On-demand Validation Result card opens on your workbench. If you selected more than one protocol or service, a card opens for each selection. To view additional information about the errors reported, hover over a check and click View details. To view all data for all on-demand validation results for a given protocol, click Show all results.

        To create a request containing checks on a single protocol or service in the NetQ CLI, run:

        netq add validation type (ntp | interfaces | license | sensors | evpn | vxlan | agents | mlag | vlan | bgp | mtu | ospf | roce | addr) [alert-on-failure]
        

        The associated Validation Result card is accessible from the full-screen Validate Network card.

        Run an Existing Scheduled Validation On Demand

        To run a scheduled validation now:

        1. Click Validation, then click Existing validations.

        2. Select one or more validations, then click View results.

        3. The associated Validation Result cards open on your workbench.

        Scheduled Validations

        By default, a scheduled validation for each protocol and service runs every hour. You can disable these validation checks in the UI, but you cannot edit them.

        You can create and schedule up to 15 custom validation checks. The hourly, default validation checks do not count towards this limit.

        Schedule a Validation

        1. Click Validation, then click Create a validation. Choose whether the scheduled validation should run on all devices or on specific device groups.

        2. Select the protocols or services you want to validate, then click Next.

        3. Click Later then choose when to start the check and how frequently to repeat the check (every 30 minutes, 1 hour, 3 hours, 6 hours, 12 hours, or 1 day).

        4. Click Schedule.

          To see the card with the other network validations, click View. If you selected more than one protocol or service, a card opens for each selection. To view the card on your workbench, click Open card.

        To create a scheduled request containing checks on a single protocol or service in the NetQ CLI, run:

        netq add validation name <text-new-validation-name> type (addr | agents | bgp | evpn | interfaces | license | mlag | mtu | ntp | ospf | roce | sensors | vlan | vxlan) interval <text-time-min> [alert-on-failure]
        

        The following example creates a BGP validation that runs every 15 minutes:

        cumulus@switch:~$ netq add validation name Bgp15m type bgp interval 15m
        Successfully added Bgp15m running every 15m
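
When you create several scheduled validations from a script, it can help to build the command string from checked inputs. The following sketch assembles the command using the syntax shown above. The function name is hypothetical, and the interval check assumes the value is a whole number of minutes followed by m, as in the 15m example.

```python
import re
import shlex

# Validation types listed in the command syntax above.
VALIDATION_TYPES = {
    "addr", "agents", "bgp", "evpn", "interfaces", "license", "mlag",
    "mtu", "ntp", "ospf", "roce", "sensors", "vlan", "vxlan",
}


def scheduled_validation_command(name: str, vtype: str, interval: str,
                                 alert_on_failure: bool = False) -> str:
    """Build a 'netq add validation' command line for a scheduled validation."""
    if vtype not in VALIDATION_TYPES:
        raise ValueError(f"unsupported validation type: {vtype}")
    # Assumption: intervals are written as minutes with an 'm' suffix, such as 15m.
    if not re.fullmatch(r"[1-9]\d*m", interval):
        raise ValueError(f"interval must look like '15m', got {interval!r}")
    parts = ["netq", "add", "validation", "name", name,
             "type", vtype, "interval", interval]
    if alert_on_failure:
        parts.append("alert-on-failure")
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(part) for part in parts)
```

For example, `scheduled_validation_command("Bgp15m", "bgp", "15m")` produces the same command as the BGP example above.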
        

        The associated Validation Result card is accessible from the full-screen Scheduled Validation Result card.

        View Scheduled Validation Results

        After creating scheduled validations with either the NetQ UI or the NetQ CLI, the results appear in the Scheduled Validation Result card. When a request has completed processing, you can access the Validation Result card from the full-screen Validations card. Each protocol and service has its own validation result card, but the content is similar on each.

        Granularity of Data Shown Based on Time Period

        On the medium and large Validation Result cards, vertically stacked heat maps represent the status of the runs: one for passing runs, one for runs with warnings, and one for runs with failures. The number of smaller time blocks used to indicate the status varies with the time period of data on the card. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block indicates the results. If all validations during that time period pass, then the middle block is 100% saturated (white) and the warning and failure blocks are 0% saturated (gray). As warnings and errors increase in saturation, the passing block is proportionally reduced in saturation. The example heat map shown here covers a time period of 24 hours; the table lists the most common time periods and the resulting time blocks.

        Time Period  Number of Runs  Number of Time Blocks  Amount of Time in Each Block
        6 hours      18              6                      1 hour
        12 hours     36              12                     1 hour
        24 hours     72              24                     1 hour
        1 week       504             7                      1 day
        1 month      2,086           30                     1 day
        1 quarter    7,000           13                     1 week
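
One way to read the heat maps is to treat the saturation of each block as the share of runs in that time block with the corresponding status. The following sketch computes those shares. The proportional model is an assumption based on the description of the heat maps, not a statement of how the NetQ UI renders them, and the function name is hypothetical.

```python
from typing import Dict


def block_saturation(passed: int, warned: int, failed: int) -> Dict[str, float]:
    """Return the share of runs per status for one time block (0.0 to 1.0)."""
    total = passed + warned + failed
    if total == 0:
        # No runs in this block: nothing to color.
        return {"pass": 0.0, "warning": 0.0, "failure": 0.0}
    return {
        "pass": passed / total,
        "warning": warned / total,
        "failure": failed / total,
    }
```

With three runs in a one-hour block that all pass, the passing block is fully saturated and the other two are not colored. If one of four runs passes, one warns, and two fail, the shares are 0.25, 0.25, and 0.5.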

        Access and Analyze the Scheduled Validation Results

        After a scheduled validation request has completed, the results are available in the corresponding Validation Result card.

        To access the results:

        1. In the workbench header, select Validation, then click Existing validations.

        2. Select the validation results you want to view, then click View results.

        3. The medium Scheduled Validation Result card(s) for the selected items appear on your workbench.

        To analyze the results:

        1. Note the distribution of results. Are there many failures? Are they concentrated together in time? Has the protocol or service recovered after the failures?

        2. Hover over the heat maps to view the status numbers and what percentage of the total results that represents for a given region. The tooltip also shows the number of devices included in the validation and the number with warnings and/or failures. This is useful when you see the failures occurring on a small set of devices, as it might point to an issue with the devices rather than the network service.

        3. Expand the card to display a chart alongside events metrics. Click to expand or collapse the chart.

        4. To view the configuration of the request that produced the results shown on this card, hover over the card and select the Configuration tab.

        5. To view all data available for all scheduled validation results for the given protocol or service, expand the card to full-screen.

        6. In the Checks box, hover over an individual check and select View details for additional information:

        Manage Scheduled Validations

        You can edit or delete any scheduled validation that you created. Default validations cannot be edited or deleted, but can be disabled.

        Edit a Scheduled Validation

        You can change the schedule or the validation types specified in a scheduled validation request. Editing creates a new validation request, and NetQ applies the (old) label to the name of the original validation. You can no longer edit the old validation.

        When you update a scheduled request, the results for all future runs of the validation will be different from the results of previous runs of the validation.

        To edit a scheduled validation:

        1. Click Validation, then click Scheduled validations.

        2. Hover over the validation then click Edit.

        3. Select which checks to add or remove from the validation request, then click Update.

        4. Change the schedule for the validation, then click Update.

          You can run the modified validation immediately or wait for it to run according to the schedule you specified.

        Delete a Scheduled Validation

        You can remove a user-defined scheduled validation using the NetQ UI or the NetQ CLI. Default validations cannot be deleted, but they can be disabled.

        1. Click Validation, then click Scheduled validations.

        2. Hover over the validation you want to remove.

        3. Click the delete icon, then click Yes to confirm.

        4. To disable a default validation, select the icon on the card for the desired validation and select Disable validation. You can re-enable validation checks from the same menu.

        To remove a scheduled validation using the NetQ CLI:

        1. Determine the name of the scheduled validation you want to remove:

          netq show validation summary [name <text-validation-name>] type (ntp | interfaces | license | sensors | evpn | vxlan | agents | mlag | vlan | bgp | mtu | ospf | roce | addr) [around <text-time-hr>] [json]
          

          This example shows all scheduled validations for BGP.

          cumulus@switch:~$ netq show validation summary type bgp
          Name            Type             Job ID       Checked Nodes              Failed Nodes             Total Nodes            Timestamp
          --------------- ---------------- ------------ -------------------------- ------------------------ ---------------------- -------------------------
          Bgp30m          scheduled        4c78cdf3-24a 0                          0                        0                      Thu Nov 12 20:38:20 2020
                                          6-4ecb-a39d-
                                          0c2ec265505f
          Bgp15m          scheduled        2e891464-637 10                         0                        10                     Thu Nov 12 20:28:58 2020
                                          a-4e89-a692-
                                          3bf5f7c8fd2a
          Bgp30m          scheduled        4c78cdf3-24a 0                          0                        0                      Thu Nov 12 20:24:14 2020
                                          6-4ecb-a39d-
                                          0c2ec265505f
          Bgp30m          scheduled        4c78cdf3-24a 0                          0                        0                      Thu Nov 12 20:15:20 2020
                                          6-4ecb-a39d-
                                          0c2ec265505f
          Bgp15m          scheduled        2e891464-637 10                         0                        10                     Thu Nov 12 20:13:57 2020
                                          a-4e89-a692-
                                          3bf5f7c8fd2a
          ...
          
        2. To remove the validation, run:

          netq del validation <text-validation-name>
          

          This example removes the scheduled validation named Bgp15m.

          cumulus@switch:~$ netq del validation Bgp15m
          Successfully deleted validation Bgp15m
          
        3. Repeat these steps to remove additional scheduled validations.
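
        When many scheduled validations exist, the name lookup in step 1 can be scripted. The following Python sketch extracts the unique validation names from text shaped like the netq show validation summary output above and prints the matching netq del validation commands without running them. The function name scheduled_validation_names and the SAMPLE text are illustrative assumptions, and the parser assumes the output is captured without leading indentation, so that wrapped Job ID lines begin with whitespace.

```python
# Abbreviated copy of the example output shown above (assumed to be captured as plain text).
SAMPLE = """\
Name            Type             Job ID       Checked Nodes              Failed Nodes             Total Nodes            Timestamp
--------------- ---------------- ------------ -------------------------- ------------------------ ---------------------- -------------------------
Bgp30m          scheduled        4c78cdf3-24a 0                          0                        0                      Thu Nov 12 20:38:20 2020
                                 6-4ecb-a39d-
                                 0c2ec265505f
Bgp15m          scheduled        2e891464-637 10                         0                        10                     Thu Nov 12 20:28:58 2020
                                 a-4e89-a692-
                                 3bf5f7c8fd2a
Bgp30m          scheduled        4c78cdf3-24a 0                          0                        0                      Thu Nov 12 20:24:14 2020
                                 6-4ecb-a39d-
                                 0c2ec265505f
...
"""


def scheduled_validation_names(output):
    """Return unique validation names, in order of first appearance."""
    names = []
    in_body = False
    for line in output.splitlines():
        if line.startswith("---"):
            # Everything after the separator row is table data.
            in_body = True
            continue
        # Skip the header, blank lines, and wrapped Job ID continuation lines.
        if not in_body or not line or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "scheduled" and fields[0] not in names:
            names.append(fields[0])
    return names


for name in scheduled_validation_names(SAMPLE):
    # Print the command instead of running it so the list can be reviewed first.
    print(f"netq del validation {name}")
```

        Printing the commands rather than executing them keeps the deletion a deliberate, reviewable step.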

        Validate Device Groups

        Both on-demand and scheduled validations can run on specific device groups. To create a validation for a device group rather than all devices:

        1. Click Validation, then click Create a validation. Choose Run on group of switches.

        2. Select which group to run the validation on.

        3. Select the protocols or services you want to validate, then click Next.

        4. Select which individual validations to run for each service. You can also disable individual checks.

        5. Choose whether to run the validation now or schedule it for another time, then click Run.

        LLDP

        Network devices use Link Layer Discovery Protocol (LLDP) to advertise their identity, capabilities, and neighbors on a LAN. You can view this information for one or more devices. You can also view the information at an earlier point in time or view changes that have occurred to the information during a specified time period. For an overview and how to configure LLDP in your network, refer to Link Layer Discovery Protocol.

        LLDP Commands

        Monitor LLDP with the following commands. See the command line reference for additional options, definitions, and examples.

        netq show lldp
        netq show events message_type lldp
        

        View LLDP in the UI

        To add the LLDP card to your workbench, navigate to the header and select Add card > Network services > All LLDP Sessions card > Open cards. In this example, there are 25 nodes running the LLDP protocol, 184 established sessions, and no LLDP-related events from the past 24 hours.

        Expand to the large card for additional LLDP information. This view displays the number of missing neighbors and how that number has changed over time, which is a good indicator of link communication issues. This information appears in the bottom chart, under Total sessions with no NBR. The right half of the card displays the switches handling the most LLDP traffic. Select the dropdown to view switches with unestablished LLDP sessions.

        Expand the LLDP card to full-screen to view, filter, or export:

        From this table, you can select a row, then click Add card above the table.

        NetQ adds a new, LLDP ‘single-session’ card to your workbench.

        Monitor a Single LLDP Session

        From the LLDP single-session card, you can view the number of nodes running the LLDP service, view neighbor state changes, and monitor the running LLDP configuration and any changes to the configuration file. This view is helpful for determining the stability of the LLDP session between two devices.

        Understanding the Heat Map

        On the medium and large single-session cards, vertically stacked heat maps represent the status of the neighboring peers: one for peers that are reachable (neighbor detected) and one for peers that are unreachable (neighbor not detected). Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block indicates the results. If LLDP detected all peers for the entire time block, then the top block is 100% saturated (white) and the neighbor-not-detected block is 0% saturated (gray). As peers become reachable, the neighbor-detected block increases in saturation and the neighbor-not-detected block is proportionally reduced in saturation. The following table lists the most common time periods, their corresponding number of runs and time blocks, and the amount of time represented by one block:

        Time Period   Number of Runs   Number of Time Blocks   Amount of Time in Each Block
        -----------   --------------   ---------------------   ----------------------------
        6 hours       18               6                       1 hour
        12 hours      36               12                      1 hour
        24 hours      72               24                      1 hour
        1 week        504              7                       1 day
        1 month       2,086            30                      1 day
        1 quarter     7,000            13                      1 week
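
        Dividing the number of runs by the number of time blocks shows how many results a single block summarizes for each time period. The short Python sketch below does this arithmetic for the values in the table. The TABLE constant and the runs_per_block helper are names introduced only for this example.

```python
# (time period, number of runs, number of time blocks), copied from the table above.
TABLE = [
    ("6 hours", 18, 6),
    ("12 hours", 36, 12),
    ("24 hours", 72, 24),
    ("1 week", 504, 7),
    ("1 month", 2086, 30),
    ("1 quarter", 7000, 13),
]


def runs_per_block(runs, blocks):
    """Average number of runs summarized by one time block."""
    return runs / blocks


for period, runs, blocks in TABLE:
    print(f"{period:<10} {runs_per_block(runs, blocks):8.1f} runs per block")
```

        For the periods that use 1-hour blocks, each block summarizes three runs. For a 1-week period, each 1-day block summarizes 72 runs, so a single block can hide more variation than an hourly block does.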

        View Changes to the LLDP Service Configuration File

        Each time a change is made to the configuration file for the LLDP service, NetQ logs the change and lets you compare it with the last version using the NetQ UI. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.

        1. From the large single-session card, select the Configuration file evolution tab.

        2. Select the time.

        3. Choose between the File view and the Diff view.

          The File view displays the content of the file:

          The Diff view highlights the changes (if any) between this version (on left) and the most recent version (on right) side by side:

        MAC Addresses

        A MAC (media access control) address is a layer 2 construct that uses 48 bits to uniquely identify a network interface controller (NIC) for communication within a network.

        With NetQ, you can:

        MAC addresses are associated with switch interfaces. They are classified as:

        The NetQ UI provides a listing of current MAC addresses that you can filter by hostname, timestamp, MAC address, VLAN, and origin. You can sort the list by these parameters, as well as by remote, static, and next hop.

        The NetQ CLI provides the following commands:

        netq show macs [<mac>] [vlan <1-4096>] [origin] [around <text-time>] [json]
        netq <hostname> show macs [<mac>] [vlan <1-4096>] [origin | count] [around <text-time>] [json]
        netq <hostname> show macs egress-port <egress-port> [<mac>] [vlan <1-4096>] [origin] [around <text-time>] [json]
        netq [<hostname>] show mac-history <mac> [vlan <1-4096>] [diff] [between <text-time> and <text-endtime>] [listby <text-list-by>] [json]
        netq [<hostname>] show mac-commentary <mac> vlan <1-4096> [between <text-time> and <text-endtime>] [json]
        netq [<hostname>] show events [severity info | severity error ] message_type macs [between <text-time> and <text-endtime>] [json]
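
        The optional parts of the netq show macs syntax can be assembled programmatically. The following Python sketch builds the argument list for the first two command forms listed above. The helper name build_show_macs and its parameter names are assumptions made for this example; the sketch only constructs the command and does not run it.

```python
def build_show_macs(hostname=None, mac=None, vlan=None, around=None, as_json=False):
    """Build the argument list for `netq [<hostname>] show macs ...`."""
    # The syntax above documents the VLAN argument as <1-4096>.
    if vlan is not None and not 1 <= vlan <= 4096:
        raise ValueError("vlan must be in the range 1-4096")
    args = ["netq"]
    if hostname:
        args.append(hostname)
    args += ["show", "macs"]
    if mac:
        args.append(mac)
    if vlan is not None:
        args += ["vlan", str(vlan)]
    if around:
        args += ["around", around]
    if as_json:
        args.append("json")
    return args


print(" ".join(build_show_macs(hostname="leaf02", vlan=10)))
```

        On a system where the NetQ CLI is installed, the resulting list could be handed to a process launcher such as Python's subprocess.run.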
        

        View MAC Addresses Networkwide

        You can view all MAC addresses across your network with the NetQ UI or the NetQ CLI.

        1. Select the Menu.

        2. Under the Network section, select MACs.

        Use the netq show macs command to view all MAC addresses.

        This example shows all MAC addresses in the Cumulus Networks reference topology.

        cumulus@switch:~$ netq show macs
        Matching mac records:
        Origin MAC Address        VLAN   Hostname          Egress Port                    Remote Last Changed
        ------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
        no     46:38:39:00:00:46  20     leaf04            bond2                          no     Tue Oct 27 22:29:07 2020
        yes    44:38:39:00:00:5e  20     leaf04            bridge                         no     Tue Oct 27 22:29:07 2020
        yes    00:00:00:00:00:1a  10     leaf04            bridge                         no     Tue Oct 27 22:29:07 2020
        yes    44:38:39:00:00:5e  4002   leaf04            bridge                         no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:5d  30     leaf04            peerlink                       no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:37  30     leaf04            vni30                          no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:59  30     leaf04            vni30                          no     Tue Oct 27 22:29:07 2020
        yes    7e:1a:b3:4f:05:b8  20     leaf04            vni20                          no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:36  30     leaf04            vni30                          yes    Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:59  20     leaf04            vni20                          no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:37  20     leaf04            vni20                          no     Tue Oct 27 22:29:07 2020
        ...
        yes    7a:4a:c7:bb:48:27  4001   border01          vniRED                         no     Tue Oct 27 22:28:48 2020
        yes    ce:93:1d:e3:08:1b  4002   border01          vniBLUE                        no     Tue Oct 27 22:28:48 2020
        

        View MAC Addresses for a Given Device

        1. Select the Menu.

        2. Under the Network section, select MACs.

        3. Click Filters and enter a hostname.

        4. Click Apply.

        Use the netq <hostname> show macs command to view the MAC addresses on a given device.

        This example shows all MAC addresses on the leaf03 switch.

        cumulus@switch:~$ netq leaf03 show macs
        Matching mac records:
        Origin MAC Address        VLAN   Hostname          Egress Port                    Remote Last Changed
        ------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
        yes    2e:3d:b4:55:40:ba  4002   leaf03            vniBLUE                        no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:5e  20     leaf03            peerlink                       no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:46  20     leaf03            bond2                          no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  4001   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1a  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  30     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    26:6e:54:35:3b:28  4001   leaf03            vniRED                         no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:37  30     leaf03            vni30                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:59  30     leaf03            vni30                          no     Tue Oct 27 22:28:24 2020
        yes    72:78:e6:4e:3d:4c  20     leaf03            vni20                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:36  30     leaf03            vni30                          yes    Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:59  20     leaf03            vni20                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:37  20     leaf03            vni20                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:59  10     leaf03            vni10                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:37  10     leaf03            vni10                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:48  30     leaf03            bond3                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:38  10     leaf03            vni10                          yes    Tue Oct 27 22:28:24 2020
        yes    36:99:0d:48:51:41  10     leaf03            vni10                          no     Tue Oct 27 22:28:24 2020
        yes    1a:6e:d8:ed:d2:04  30     leaf03            vni30                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:36  30     leaf03            vni30                          yes    Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:5e  30     leaf03            peerlink                       no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:3e  10     leaf03            bond1                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:34  20     leaf03            vni20                          yes    Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:5e  10     leaf03            peerlink                       no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:3c  30     leaf03            vni30                          yes    Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:3e  10     leaf03            bond1                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:34  20     leaf03            vni20                          yes    Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:42  30     leaf03            bond3                          no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  4002   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  20     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:be:ef:bb  4002   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:32  10     leaf03            vni10                          yes    Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1b  20     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:44  10     leaf03            bond1                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:42  30     leaf03            bond3                          no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:be:ef:bb  4001   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1c  30     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:32  10     leaf03            vni10                          yes    Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:40  20     leaf03            bond2                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:3a  20     leaf03            vni20                          yes    Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:40  20     leaf03            bond2                          no     Tue Oct 27 22:28:24 2020
        

        View MAC Addresses Associated with a VLAN

        1. Select the Menu.

        2. Under the Network section, select MACs.

        3. Click Filters and enter a VLAN ID.

        4. Click Apply.

        5. (Optional) Select Filters and add the additional hostname filter to view the MAC addresses for a VLAN on a particular device.

        Use the netq show macs command with the vlan option to view the MAC addresses for a given VLAN.

        This example shows the MAC addresses associated with VLAN 10.

        cumulus@switch:~$ netq show macs vlan 10
        Matching mac records:
        Origin MAC Address        VLAN   Hostname          Egress Port                    Remote Last Changed
        ------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
        yes    00:00:00:00:00:1a  10     leaf04            bridge                         no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:37  10     leaf04            vni10                          no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:59  10     leaf04            vni10                          no     Tue Oct 27 22:29:07 2020
        no     46:38:39:00:00:38  10     leaf04            vni10                          yes    Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:3e  10     leaf04            bond1                          no     Tue Oct 27 22:29:07 2020
        no     46:38:39:00:00:3e  10     leaf04            bond1                          no     Tue Oct 27 22:29:07 2020
        yes    44:38:39:00:00:5e  10     leaf04            bridge                         no     Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:32  10     leaf04            vni10                          yes    Tue Oct 27 22:29:07 2020
        no     44:38:39:00:00:5d  10     leaf04            peerlink                       no     Tue Oct 27 22:29:07 2020
        no     46:38:39:00:00:44  10     leaf04            bond1                          no     Tue Oct 27 22:29:07 2020
        no     46:38:39:00:00:32  10     leaf04            vni10                          yes    Tue Oct 27 22:29:07 2020
        yes    36:ae:d2:23:1d:8c  10     leaf04            vni10                          no     Tue Oct 27 22:29:07 2020
        yes    00:00:00:00:00:1a  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:59  10     leaf03            vni10                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:37  10     leaf03            vni10                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:38  10     leaf03            vni10                          yes    Tue Oct 27 22:28:24 2020
        yes    36:99:0d:48:51:41  10     leaf03            vni10                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:3e  10     leaf03            bond1                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:5e  10     leaf03            peerlink                       no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:3e  10     leaf03            bond1                          no     Tue Oct 27 22:28:24 2020
        no     44:38:39:00:00:32  10     leaf03            vni10                          yes    Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:44  10     leaf03            bond1                          no     Tue Oct 27 22:28:24 2020
        no     46:38:39:00:00:32  10     leaf03            vni10                          yes    Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1a  10     leaf02            bridge                         no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:59  10     leaf02            peerlink                       no     Tue Oct 27 22:28:51 2020
        yes    44:38:39:00:00:37  10     leaf02            bridge                         no     Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:38  10     leaf02            bond1                          no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:3e  10     leaf02            vni10                          yes    Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:3e  10     leaf02            vni10                          yes    Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:5e  10     leaf02            vni10                          no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:5d  10     leaf02            vni10                          no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:32  10     leaf02            bond1                          no     Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:44  10     leaf02            vni10                          yes    Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:32  10     leaf02            bond1                          no     Tue Oct 27 22:28:51 2020
        yes    4a:32:30:8c:13:08  10     leaf02            vni10                          no     Tue Oct 27 22:28:51 2020
        yes    00:00:00:00:00:1a  10     leaf01            bridge                         no     Tue Oct 27 22:28:42 2020
        no     44:38:39:00:00:37  10     leaf01            peerlink                       no     Tue Oct 27 22:28:42 2020
        yes    44:38:39:00:00:59  10     leaf01            bridge                         no     Tue Oct 27 22:28:42 2020
        no     46:38:39:00:00:38  10     leaf01            bond1                          no     Tue Oct 27 22:28:42 2020
        no     44:38:39:00:00:3e  10     leaf01            vni10                          yes    Tue Oct 27 22:28:43 2020
        no     46:38:39:00:00:3e  10     leaf01            vni10                          yes    Tue Oct 27 22:28:42 2020
        no     44:38:39:00:00:5e  10     leaf01            vni10                          no     Tue Oct 27 22:28:42 2020
        no     44:38:39:00:00:5d  10     leaf01            vni10                          no     Tue Oct 27 22:28:42 2020
        no     44:38:39:00:00:32  10     leaf01            bond1                          no     Tue Oct 27 22:28:43 2020
        no     46:38:39:00:00:44  10     leaf01            vni10                          yes    Tue Oct 27 22:28:43 2020
        no     46:38:39:00:00:32  10     leaf01            bond1                          no     Tue Oct 27 22:28:42 2020
        yes    52:37:ca:35:d3:70  10     leaf01            vni10                          no     Tue Oct 27 22:28:42 2020
        

        Use the netq show macs command with the hostname and vlan options to view the MAC addresses for a given VLAN on a particular device.

        This example shows the MAC addresses associated with VLAN 10 on the leaf02 switch.

        cumulus@switch:~$ netq leaf02 show macs vlan 10
        Matching mac records:
        Origin MAC Address        VLAN   Hostname          Egress Port                    Remote Last Changed
        ------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
        yes    00:00:00:00:00:1a  10     leaf02            bridge                         no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:59  10     leaf02            peerlink                       no     Tue Oct 27 22:28:51 2020
        yes    44:38:39:00:00:37  10     leaf02            bridge                         no     Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:38  10     leaf02            bond1                          no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:3e  10     leaf02            vni10                          yes    Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:3e  10     leaf02            vni10                          yes    Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:5e  10     leaf02            vni10                          no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:5d  10     leaf02            vni10                          no     Tue Oct 27 22:28:51 2020
        no     44:38:39:00:00:32  10     leaf02            bond1                          no     Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:44  10     leaf02            vni10                          yes    Tue Oct 27 22:28:51 2020
        no     46:38:39:00:00:32  10     leaf02            bond1                          no     Tue Oct 27 22:28:51 2020
        yes    4a:32:30:8c:13:08  10     leaf02            vni10                          no     Tue Oct 27 22:28:51 2020
        

        View MAC Addresses Associated with an Egress Port

        1. Select the Menu.

        2. Under the Network section, select MACs.

        3. Locate the Egress port column. Hover over the column header and select it to sort the egress ports used by the MAC addresses in A-Z or Z-A order.

        4. (Optional) Click Filters and enter a hostname to view the MAC addresses on a particular device.

        Use the netq <hostname> show macs egress-port <egress-port> command to view the MAC addresses on a given device that use a given egress port. Note that you cannot view this information across all devices.

        This example shows MAC addresses associated with the leaf03 switch that use the bridge port for egress.

        cumulus@switch:~$ netq leaf03 show macs egress-port bridge
        Matching mac records:
        Origin MAC Address        VLAN   Hostname          Egress Port                    Remote Last Changed
        ------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
        yes    44:38:39:00:00:5d  4001   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1a  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  30     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  4002   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  20     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:be:ef:bb  4002   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:00:00:5d  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1b  20     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    44:38:39:be:ef:bb  4001   leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
        yes    00:00:00:00:00:1c  30     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
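
        Because the egress-port form of the command requires a hostname, one way to get a networkwide view is to filter the output of the networkwide netq show macs command by the Egress Port column. The Python sketch below illustrates that idea on a few rows taken from the VLAN 10 example earlier in this section. The function name macs_by_host_for_port and the SAMPLE rows are assumptions made for this example.

```python
from collections import defaultdict

# Rows copied from the VLAN 10 example output earlier in this section.
SAMPLE = """\
yes    00:00:00:00:00:1a  10     leaf04            bridge                         no     Tue Oct 27 22:29:07 2020
no     44:38:39:00:00:37  10     leaf04            vni10                          no     Tue Oct 27 22:29:07 2020
yes    00:00:00:00:00:1a  10     leaf03            bridge                         no     Tue Oct 27 22:28:24 2020
no     44:38:39:00:00:59  10     leaf02            peerlink                       no     Tue Oct 27 22:28:51 2020
yes    44:38:39:00:00:37  10     leaf02            bridge                         no     Tue Oct 27 22:28:51 2020
"""


def macs_by_host_for_port(output, egress_port):
    """Group (MAC address, VLAN) pairs by hostname for one egress port."""
    result = defaultdict(list)
    for line in output.splitlines():
        cols = line.split()
        # Skip headers, separators, and anything that is not a data row.
        if len(cols) < 6 or cols[0] not in ("yes", "no"):
            continue
        mac, vlan, hostname, port = cols[1], cols[2], cols[3], cols[4]
        if port == egress_port:
            result[hostname].append((mac, int(vlan)))
    return dict(result)


for host, entries in macs_by_host_for_port(SAMPLE, "bridge").items():
    print(host, entries)
```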
        

        View MAC Addresses Associated with VRR Configurations

        You can view all MAC addresses associated with your VRR (virtual router redundancy) interface configuration using the netq show interfaces type macvlan command. This is useful for determining whether the MAC address inside a VLAN is the same or different across your VRR configuration.

        cumulus@switch:~$ netq show interfaces type macvlan
        Matching link records:
        Hostname          Interface                 Type             State      VRF             Details                             Last Changed
        ----------------- ------------------------- ---------------- ---------- --------------- ----------------------------------- -------------------------
        leaf01            vlan10-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1a,             Tue Oct 27 22:28:42 2020
                                                                                                Mode: Private
        leaf01            vlan20-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1b,             Tue Oct 27 22:28:42 2020
                                                                                                Mode: Private
        leaf01            vlan30-v0                 macvlan          up         BLUE            MAC: 00:00:00:00:00:1c,             Tue Oct 27 22:28:42 2020
                                                                                                Mode: Private
        leaf02            vlan10-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1a,             Tue Oct 27 22:28:51 2020
                                                                                                Mode: Private
        leaf02            vlan20-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1b,             Tue Oct 27 22:28:51 2020
                                                                                                Mode: Private
        leaf02            vlan30-v0                 macvlan          up         BLUE            MAC: 00:00:00:00:00:1c,             Tue Oct 27 22:28:51 2020
                                                                                                Mode: Private
        leaf03            vlan10-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1a,             Tue Oct 27 22:28:23 2020
                                                                                                Mode: Private
        leaf03            vlan20-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1b,             Tue Oct 27 22:28:23 2020
                                                                                                Mode: Private
        leaf03            vlan30-v0                 macvlan          up         BLUE            MAC: 00:00:00:00:00:1c,             Tue Oct 27 22:28:23 2020
                                                                                                Mode: Private
        leaf04            vlan10-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1a,             Tue Oct 27 22:29:06 2020
                                                                                                Mode: Private
        leaf04            vlan20-v0                 macvlan          up         RED             MAC: 00:00:00:00:00:1b,             Tue Oct 27 22:29:06 2020
                                                                                                Mode: Private
        leaf04            vlan30-v0                 macvlan          up         BLUE            MAC: 00:00:00:00:00:1c,             Tue Oct 27 22:29:06 2020
                                                                                                Mode: Private
        

        View the History of a MAC Address

        When debugging, it is useful to see whether a MAC address was learned, where it moved in the network afterward, whether a duplicate existed at any time, and so forth. The netq show mac-history command makes this information available. You can view changes in chronological order, within a given time frame, as differences only, ordered by a given attribute, or for a given VLAN.

        The default time range is from one hour ago to now. You can also view the output in JSON format.

        View MAC Address Changes in Chronological Order

        View the full listing of changes for a MAC address for the last hour in chronological order using the netq show mac-history command.

        This example shows how to view a full chronology of changes for a MAC address of 44:38:39:00:00:5d. When shown, the caret (^) notation indicates no change in this value from the row above.

        cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d
        Matching machistory records:
        Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
        ------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
        Tue Oct 27 22:28:24 2020  leaf03            10     yes    bridge                                  no     no
        Tue Oct 27 22:28:42 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 27 22:28:51 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 27 22:29:07 2020  leaf04            10     no     peerlink                                no     yes
        Tue Oct 27 22:28:24 2020  leaf03            4002   yes    bridge                                  no     no
        Tue Oct 27 22:28:24 2020  leaf03            0      yes    peerlink                                no     no
        Tue Oct 27 22:28:24 2020  leaf03            20     yes    bridge                                  no     no
        Tue Oct 27 22:28:42 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Tue Oct 27 22:28:51 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Tue Oct 27 22:29:07 2020  leaf04            20     no     peerlink                                no     yes
        Tue Oct 27 22:28:24 2020  leaf03            4001   yes    bridge                                  no     no
        Tue Oct 27 22:28:24 2020  leaf03            30     yes    bridge                                  no     no
        Tue Oct 27 22:28:42 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Tue Oct 27 22:28:51 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Tue Oct 27 22:29:07 2020  leaf04            30     no     peerlink                                no     yes
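
If you want to post-process this tabular output in a script, one option is to derive the column boundaries from the dashed rule under the header. The following Python sketch is illustrative only and is not part of NetQ: the function name parse_table is hypothetical, and it assumes the text layout shown in the example above (a header line, a rule made of dashes, then one row per record).

```python
import re

# Sample text copied from the first rows of the example output above.
SAMPLE = """\
Matching machistory records:
Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
Tue Oct 27 22:28:24 2020  leaf03            10     yes    bridge                                  no     no
Tue Oct 27 22:28:42 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
"""


def parse_table(text):
    """Turn fixed-width CLI output into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The rule line is the first line made up only of dashes and spaces.
    rule_idx = next(
        i for i, ln in enumerate(lines) if set(ln.strip()) <= {"-", " "}
    )
    header, rule, rows = lines[rule_idx - 1], lines[rule_idx], lines[rule_idx + 1:]
    # Each run of dashes marks where a column starts; a column extends
    # to the start of the next one (the last column runs to end of line).
    starts = [m.start() for m in re.finditer(r"-+", rule)]
    ends = starts[1:] + [None]
    names = [header[s:e].strip() for s, e in zip(starts, ends)]
    return [
        {name: row[s:e].strip() for name, s, e in zip(names, starts, ends)}
        for row in rows
    ]


records = parse_table(SAMPLE)
for rec in records:
    print(rec["Last Changed"], rec["Hostname"], rec["Link"], rec["Destination"])
```

Empty cells, such as the Destination of a locally learned address, come back as empty strings.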
        

        View MAC Address Changes for a Given Time Frame

        View a listing of changes for a MAC address for a given timeframe using the netq show mac-history command with the between option. When shown, the caret (^) notation indicates no change in this value from the row above.

        This example shows changes for a MAC address of 44:38:39:00:00:5d between three and seven days ago.

        cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d between 3d and 7d
        Matching machistory records:
        Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
        ------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
        Tue Oct 20 22:28:19 2020  leaf03            10     yes    bridge                                  no     no
        Tue Oct 20 22:28:24 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 20 22:28:37 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 20 22:28:53 2020  leaf04            10     no     peerlink                                no     yes
        Wed Oct 21 22:28:19 2020  leaf03            10     yes    bridge                                  no     no
        Wed Oct 21 22:28:26 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Wed Oct 21 22:28:44 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Wed Oct 21 22:28:55 2020  leaf04            10     no     peerlink                                no     yes
        Thu Oct 22 22:28:20 2020  leaf03            10     yes    bridge                                  no     no
        Thu Oct 22 22:28:28 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Thu Oct 22 22:28:45 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Thu Oct 22 22:28:57 2020  leaf04            10     no     peerlink                                no     yes
        Fri Oct 23 22:28:21 2020  leaf03            10     yes    bridge                                  no     no
        Fri Oct 23 22:28:29 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Fri Oct 23 22:28:45 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Fri Oct 23 22:28:58 2020  leaf04            10     no     peerlink                                no     yes
        Sat Oct 24 22:28:28 2020  leaf03            10     yes    bridge                                  no     no
        Sat Oct 24 22:28:29 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Sat Oct 24 22:28:45 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Sat Oct 24 22:28:59 2020  leaf04            10     no     peerlink                                no     yes
        Tue Oct 20 22:28:19 2020  leaf03            4002   yes    bridge                                  no     no
        Tue Oct 20 22:28:19 2020  leaf03            0      yes    peerlink                                no     no
        Tue Oct 20 22:28:19 2020  leaf03            20     yes    bridge                                  no     no
        Tue Oct 20 22:28:24 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Tue Oct 20 22:28:37 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Tue Oct 20 22:28:53 2020  leaf04            20     no     peerlink                                no     yes
        Wed Oct 21 22:28:19 2020  leaf03            20     yes    bridge                                  no     no
        Wed Oct 21 22:28:26 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Wed Oct 21 22:28:44 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Wed Oct 21 22:28:55 2020  leaf04            20     no     peerlink                                no     yes
        Thu Oct 22 22:28:20 2020  leaf03            20     yes    bridge                                  no     no
        Thu Oct 22 22:28:28 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Thu Oct 22 22:28:45 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Thu Oct 22 22:28:57 2020  leaf04            20     no     peerlink                                no     yes
        Fri Oct 23 22:28:21 2020  leaf03            20     yes    bridge                                  no     no
        Fri Oct 23 22:28:29 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Fri Oct 23 22:28:45 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Fri Oct 23 22:28:58 2020  leaf04            20     no     peerlink                                no     yes
        Sat Oct 24 22:28:28 2020  leaf03            20     yes    bridge                                  no     no
        Sat Oct 24 22:28:29 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Sat Oct 24 22:28:45 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Sat Oct 24 22:28:59 2020  leaf04            20     no     peerlink                                no     yes
        Tue Oct 20 22:28:19 2020  leaf03            4001   yes    bridge                                  no     no
        Tue Oct 20 22:28:19 2020  leaf03            30     yes    bridge                                  no     no
        Tue Oct 20 22:28:24 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Tue Oct 20 22:28:37 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Tue Oct 20 22:28:53 2020  leaf04            30     no     peerlink                                no     yes
        Wed Oct 21 22:28:19 2020  leaf03            30     yes    bridge                                  no     no
        Wed Oct 21 22:28:26 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Wed Oct 21 22:28:44 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Wed Oct 21 22:28:55 2020  leaf04            30     no     peerlink                                no     yes
        Thu Oct 22 22:28:20 2020  leaf03            30     yes    bridge                                  no     no
        Thu Oct 22 22:28:28 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Thu Oct 22 22:28:45 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Thu Oct 22 22:28:57 2020  leaf04            30     no     peerlink                                no     yes
        Fri Oct 23 22:28:21 2020  leaf03            30     yes    bridge                                  no     no
        Fri Oct 23 22:28:29 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Fri Oct 23 22:28:45 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Fri Oct 23 22:28:58 2020  leaf04            30     no     peerlink                                no     yes
        Sat Oct 24 22:28:28 2020  leaf03            30     yes    bridge                                  no     no
        Sat Oct 24 22:28:29 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Sat Oct 24 22:28:45 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Sat Oct 24 22:28:59 2020  leaf04            30     no     peerlink                                no     yes
        

        View Only the Differences in MAC Address Changes

        Instead of viewing the full chronology of changes made for a MAC address within a given timeframe, you can view only the differences between two snapshots using the netq show mac-history command with the diff option. When shown, the caret (^) notation indicates no change in this value from the row above.

        This example shows only the differences in the changes for a MAC address of 44:38:39:00:00:5d between now and an hour ago.

        cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d diff
        Matching machistory records:
        Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
        ------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
        Tue Oct 27 22:29:07 2020  leaf04            30     no     peerlink                                no     yes
        

        This example shows only the differences in the changes for a MAC address of 44:38:39:00:00:5d between now and 30 days ago.

        cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d diff between now and 30d
        Matching machistory records:
        Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
        ------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
        Mon Sep 28 00:02:26 2020  leaf04            30     no     peerlink                                no     no
        Tue Oct 27 22:29:07 2020  leaf04            ^      ^      ^                ^                      ^      yes
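
The caret notation is compact for reading but awkward for scripted comparisons. As a minimal sketch, assuming each row has already been split into a list of cell values (the helper name expand_carets is hypothetical), the carets can be replaced with the value carried down from the row above:

```python
def expand_carets(rows):
    """Replace each '^' cell with the value from the same column in the row above."""
    expanded = []
    for row in rows:
        if expanded:
            prev = expanded[-1]
            row = [prev[i] if cell == "^" else cell for i, cell in enumerate(row)]
        expanded.append(list(row))
    return expanded


# Cells taken from the 30-day diff example above, in column order:
# Last Changed, Hostname, VLAN, Origin, Link, Destination, Remote, Static
rows = [
    ["Mon Sep 28 00:02:26 2020", "leaf04", "30", "no", "peerlink", "", "no", "no"],
    ["Tue Oct 27 22:29:07 2020", "leaf04", "^", "^", "^", "^", "^", "yes"],
]

full = expand_carets(rows)
print(full[1])
```

After expansion, the second row differs from the first only in the Last Changed and Static columns, which is the change the diff output reports.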
        

        View MAC Address Changes by a Given Attribute

        You can order the output of MAC address changes by many of their associated attributes using the netq show mac-history command with the listby option. For example, you can order the output by hostname, link, destination, and so forth.

        This example shows the history of MAC address 44:38:39:00:00:5d ordered by hostname. When shown, the caret (^) notation indicates no change in this value from the row above.

        cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d listby hostname
        Matching machistory records:
        Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
        ------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
        Tue Oct 27 22:28:51 2020  leaf02            20     no     vni20            10.0.1.2               no     yes
        Tue Oct 27 22:28:24 2020  leaf03            4001   yes    bridge                                  no     no
        Tue Oct 27 22:28:24 2020  leaf03            0      yes    peerlink                                no     no
        Tue Oct 27 22:28:24 2020  leaf03            4002   yes    bridge                                  no     no
        Tue Oct 27 22:28:42 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 27 22:29:07 2020  leaf04            10     no     peerlink                                no     yes
        Tue Oct 27 22:29:07 2020  leaf04            30     no     peerlink                                no     yes
        Tue Oct 27 22:28:42 2020  leaf01            30     no     vni30            10.0.1.2               no     yes
        Tue Oct 27 22:28:42 2020  leaf01            20     no     vni20            10.0.1.2               no     yes
        Tue Oct 27 22:28:51 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 27 22:29:07 2020  leaf04            20     no     peerlink                                no     yes
        Tue Oct 27 22:28:51 2020  leaf02            30     no     vni30            10.0.1.2               no     yes
        Tue Oct 27 22:28:24 2020  leaf03            10     yes    bridge                                  no     no
        Tue Oct 27 22:28:24 2020  leaf03            20     yes    bridge                                  no     no
        Tue Oct 27 22:28:24 2020  leaf03            30     yes    bridge                                  no     no
        

        View MAC Address Changes for a Given VLAN

        View a listing of changes for a MAC address for a given VLAN using the netq show mac-history command with the vlan option. When shown, the caret (^) notation indicates no change in this value from the row above.

        This example shows changes for a MAC address of 44:38:39:00:00:5d and VLAN 10.

        cumulus@switch:~$ netq show mac-history 44:38:39:00:00:5d vlan 10
        Matching machistory records:
        Last Changed              Hostname          VLAN   Origin Link             Destination            Remote Static
        ------------------------- ----------------- ------ ------ ---------------- ---------------------- ------ ------------
        Tue Oct 27 22:28:24 2020  leaf03            10     yes    bridge                                  no     no
        Tue Oct 27 22:28:42 2020  leaf01            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 27 22:28:51 2020  leaf02            10     no     vni10            10.0.1.2               no     yes
        Tue Oct 27 22:29:07 2020  leaf04            10     no     peerlink                                no     yes
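
The options shown in the preceding sections can be combined in a script. The following Python sketch only assembles the argument list; the helper name mac_history_args is hypothetical, the option order follows the examples in this section, and the trailing json keyword is an assumption based on the note that JSON output is available. Check the command line reference for the exact syntax in your release.

```python
def mac_history_args(mac, vlan=None, diff=False, between=None, listby=None, as_json=False):
    """Build an argument list for netq show mac-history (sketch, not an official API)."""
    args = ["netq", "show", "mac-history", mac]
    if vlan is not None:
        args += ["vlan", str(vlan)]
    if diff:
        args.append("diff")
    if between is not None:
        start, end = between  # for example ("now", "30d")
        args += ["between", start, "and", end]
    if listby:
        args += ["listby", listby]
    if as_json:
        args.append("json")  # assumed keyword for JSON output
    return args


cmd = mac_history_args("44:38:39:00:00:5d", diff=True, between=("now", "30d"))
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True) on a system where the NetQ CLI is installed.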
        

        View MAC Address Commentary

        You can get more descriptive information about changes to a given MAC address on a specific VLAN. Commentary is available for the following MAC address-related events based on their classification (refer to the definition of these at the beginning of this topic):

        Event Triggers | Example Commentary
        A MAC address is created, or the MAC address on the interface is changed via the hwaddress option in /etc/network/interfaces | leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0
        An interface becomes a slave in, or is removed from, a bond | leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0
        An interface is a bridge and it inherits a different MAC address due to a membership change | leaf01 00:00:5e:00:00:03 configured on interface vlan1000-v0
        A remote MAC address is learned or installed by the control plane on a tunnel interface | 44:38:39:00:00:5d learned/installed on vni vni10 pointing to remote dest 10.0.1.34
        A remote MAC address is flushed or expires | leaf01 44:38:39:00:00:5d is flushed or expired
        A remote MAC address moves from behind one remote switch to another remote switch or becomes a local MAC address | leaf02: 00:08:00:00:aa:13 moved from remote dest 27.0.0.22 to remote dest 27.0.0.34
        | 00:08:00:00:aa:13 moved from remote dest 27.0.0.22 to local interface hostbond2
        A MAC address is learned at the first-hop switch (or MLAG switch pair) | leaf04 (and MLAG peer leaf05): 44:38:39:00:00:5d learned on first hop switch, pointing to local interface bond4
        A local MAC address is flushed or expires | leaf04 (and MLAG peer leaf05) 44:38:39:00:00:5d is flushed or expires from bond4
        A local MAC address moves from one interface to another interface or to another switch | leaf04: 00:08:00:00:aa:13 moved from hostbond2 to hostbond3
        | 00:08:00:00:aa:13 moved from hostbond2 to remote dest 27.0.0.13

        To view MAC address commentary:

        1. Select the Menu.

        2. Under the Network heading, select MACs.

        3. Select the checkbox next to one of the entries, then select Open card above the table.

        4. Choose a time range, then click Continue.

        5. Scroll through the list to see comments related to the MAC address moves and changes.

        [Figure: MAC move commentary card displaying 7 results from the past 24 hours]

        6. (Optional) From here, you can filter the list by a given device by selecting Filters. A red dot on the filter icon indicates that filtering is active. To remove the filter, click the filter icon again, then click Clear Filter.

        To see MAC address commentary, use the netq show mac-commentary command. The following examples show the commentary seen in common situations.

        MAC Address Configured Locally

        In this example, the 46:38:39:00:00:44 MAC address was configured on the VlanA-1 interface of multiple switches, so we see the MAC configured commentary on each of them.

        cumulus@server-01:~$ netq show mac-commentary 46:38:39:00:00:44 between now and 1hr 
        Matching mac_commentary records:
        Last Updated              Hostname         VLAN   Commentary
        ------------------------- ---------------- ------ --------------------------------------------------------------------------------
        Mon Aug 24 2020 14:14:33  leaf11           100    leaf11: 46:38:39:00:00:44 configured on interface VlanA-1
        Mon Aug 24 2020 14:15:03  leaf12           100    leaf12: 46:38:39:00:00:44 configured on interface VlanA-1
        Mon Aug 24 2020 14:15:19  leaf21           100    leaf21: 46:38:39:00:00:44 configured on interface VlanA-1
        Mon Aug 24 2020 14:15:40  leaf22           100    leaf22: 46:38:39:00:00:44 configured on interface VlanA-1
        Mon Aug 24 2020 14:15:19  leaf21           1003   leaf21: 46:38:39:00:00:44 configured on interface VlanA-1
        Mon Aug 24 2020 14:15:40  leaf22           1003   leaf22: 46:38:39:00:00:44 configured on interface VlanA-1
        Mon Aug 24 2020 14:16:32  leaf02           1003   leaf02: 00:00:5e:00:01:01 configured on interface VlanA-1
        

        MAC Address Configured on Server and Learned from a Peer

        In this example, the 00:08:00:00:aa:13 MAC address was configured on server01. As a result, both leaf11 and leaf12 learned this address locally on the next-hop interface serv01bond2, whereas the leaf01 switch learned this address remotely on vx-34.

        cumulus@server11:~$ netq show mac-commentary 00:08:00:00:aa:13 vlan 1000 between now and 5hr 
        Matching mac_commentary records:
        Last Updated              Hostname         VLAN   Commentary
        ------------------------- ---------------- ------ --------------------------------------------------------------------------------
        Tue Aug 25 2020 10:29:23  leaf12           1000   leaf12: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
        Tue Aug 25 2020 10:29:23  leaf11           1000   leaf11: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
        Tue Aug 25 2020 10:29:23  leaf01           1000   leaf01: 00:08:00:00:aa:13 learned/installed on vni vx-34 pointing to remote dest 36.0.0.24
        

        MAC Address Removed

        In this example, the bridge FDB entry for the 00:02:00:00:00:a0 MAC address, interface VlanA-1, and VLAN 100 was deleted, impacting leaf11 and leaf12.

        cumulus@server11:~$ netq show mac-commentary 00:02:00:00:00:a0 vlan 100 between now and 5hr 
        Matching mac_commentary records:
        Last Updated              Hostname         VLAN   Commentary
        ------------------------- ---------------- ------ --------------------------------------------------------------------------------
        Mon Aug 24 2020 14:14:33  leaf11           100    leaf11: 00:02:00:00:00:a0 configured on interface VlanA-1
        Mon Aug 24 2020 14:15:03  leaf12           100    leaf12: 00:02:00:00:00:a0 learned on first hop switch interface peerlink-1
        Tue Aug 25 2020 13:06:52  leaf11           100    leaf11: 00:02:00:00:00:a0 unconfigured on interface VlanA-1
        

        MAC Address Moved on Server and Learned from a Peer

        The MAC address on server11 changed from 00:08:00:00:aa:13. In this example, the MAC learned remotely on leaf01 is now a locally learned MAC address from its local interface swp6. Similarly, the locally learned MAC addresses on leaf11 and leaf12 are now learned from remote dest 27.0.0.22.

        cumulus@server11:~$ netq show mac-commentary 00:08:00:00:aa:13 vlan 1000 between now and 5hr
        Matching mac_commentary records:
        Last Updated              Hostname         VLAN   Commentary
        ------------------------- ---------------- ------ --------------------------------------------------------------------------------
        Tue Aug 25 2020 10:29:23  leaf12           1000   leaf12: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
        Tue Aug 25 2020 10:29:23  leaf11           1000   leaf11: 00:08:00:00:aa:13 learned on first hop switch interface serv01bond2
        Tue Aug 25 2020 10:29:23  leaf01           1000   leaf01: 00:08:00:00:aa:13 learned/installed on vni vx-34 pointing to remote dest 36.0.0.24
        Tue Aug 25 2020 10:33:06  leaf01           1000   leaf01: 00:08:00:00:aa:13 moved from remote dest 36.0.0.24 to local interface swp6
        Tue Aug 25 2020 10:33:06  leaf12           1000   leaf12: 00:08:00:00:aa:13 moved from local interface serv01bond2 to remote dest 27.0.0.22
        Tue Aug 25 2020 10:33:06  leaf11           1000   leaf11: 00:08:00:00:aa:13 moved from local interface serv01bond2 to remote dest 27.0.0.22
        

        MAC Address Learned from MLAG Pair

        In this example, after the local first hop learning of the 00:02:00:00:00:1c MAC address on leaf11 and leaf12, the MLAG exchanged the learning on the dually connected interface serv01bond3.

        cumulus@server11:~$ netq show mac-commentary 00:02:00:00:00:1c vlan 105 between now and 2d
        Matching mac_commentary records:
        Last Updated              Hostname         VLAN   Commentary
        ------------------------- ---------------- ------ --------------------------------------------------------------------------------
        Sun Aug 23 2020 14:13:39  leaf11           105    leaf11: 00:02:00:00:00:1c learned on first hop switch interface serv01bond3
        Sun Aug 23 2020 14:14:02  leaf12           105    leaf12: 00:02:00:00:00:1c learned on first hop switch interface serv01bond3
        Sun Aug 23 2020 14:14:16  leaf11           105    leaf11: 00:02:00:00:00:1c moved from interface serv01bond3 to interface serv01bond3
        Sun Aug 23 2020 14:14:23  leaf12           105    leaf12: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
        Sun Aug 23 2020 14:14:37  leaf11           105    leaf11: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
        Sun Aug 23 2020 14:14:39  leaf12           105    leaf12: 00:02:00:00:00:1c moved from interface serv01bond3 to interface serv01bond3
        Sun Aug 23 2020 14:53:31  leaf11           105    leaf11: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
        Mon Aug 24 2020 14:15:03  leaf12           105    leaf12: 00:02:00:00:00:1c learned on MLAG peer dually connected interface serv01bond3
        

        MAC Address Flushed

        In this example, the interface VlanA-1 associated with the 00:02:00:00:00:2d MAC address and VLAN 1008 is deleted, impacting leaf11 and leaf12.

        cumulus@server11:~$ netq show mac-commentary 00:02:00:00:00:2d vlan 1008 between now and 5hr 
        Matching mac_commentary records:
        Last Updated              Hostname         VLAN   Commentary
        ------------------------- ---------------- ------ --------------------------------------------------------------------------------
        Mon Aug 24 2020 14:14:33  leaf11           1008   leaf11: 00:02:00:00:00:2d learned/installed on vni vx-42 pointing to remote dest 27.0.0.22
        Mon Aug 24 2020 14:15:03  leaf12           1008   leaf12: 00:02:00:00:00:2d learned/installed on vni vx-42 pointing to remote dest 27.0.0.22
        Mon Aug 24 2020 14:16:03  leaf01           1008   leaf01: 00:02:00:00:00:2d learned on MLAG peer dually connected interface swp8
        Tue Aug 25 2020 11:36:06  leaf11           1008   leaf11: 00:02:00:00:00:2d is flushed or expired
        Tue Aug 25 2020 11:36:06  leaf11           1008   leaf11: 00:02:00:00:00:2d on vni 1008 remote dest changed to 27.0.0.22
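
When you collect commentary from many switches, it can help to sort the messages by event type. The following Python sketch matches the commentary strings that appear in the examples above. The pattern list and the category names are illustrative assumptions, not an official NetQ schema, and messages worded differently fall through to "unknown".

```python
import re

# Patterns derived from the example commentary in this section.
# Order matters only for readability; \b keeps "configured" from
# matching inside "unconfigured".
PATTERNS = [
    ("unconfigured", re.compile(r"\bunconfigured on interface (?P<interface>\S+)")),
    ("configured", re.compile(r"\bconfigured on interface (?P<interface>\S+)")),
    ("remote_learned", re.compile(
        r"learned/installed on vni (?P<vni>\S+) pointing to remote dest (?P<dest>\S+)")),
    ("first_hop_learned", re.compile(
        r"learned on first hop switch interface (?P<interface>\S+)")),
    ("mlag_peer_learned", re.compile(
        r"learned on MLAG peer dually connected interface (?P<interface>\S+)")),
    ("moved", re.compile(r"moved from (?P<source>.+?) to (?P<target>.+)$")),
    ("flushed", re.compile(r"is flushed or expired")),
]


def classify(commentary):
    """Return (category, extracted fields) for one commentary string."""
    for kind, pattern in PATTERNS:
        match = pattern.search(commentary)
        if match:
            return kind, match.groupdict()
    return "unknown", {}


print(classify("leaf01: 00:08:00:00:aa:13 moved from remote dest 36.0.0.24 to local interface swp6"))
print(classify("leaf11: 00:02:00:00:00:2d is flushed or expired"))
```

A script could use the "moved" category, for example, to flag MAC addresses that move repeatedly within a short time range.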
        

        MLAG

        Multi-Chassis Link Aggregation (MLAG) enables a server or switch with a two-port bond (such as a link aggregation group/LAG, EtherChannel, port group, or trunk) to connect those ports to different switches and operate as if they were connected to a single, logical switch. This provides greater redundancy and greater system throughput. Dual-connected devices can create LACP bonds that contain links to each physical switch. Therefore, NetQ supports active-active links from the dual-connected devices even though each link connects to a different physical switch. For an overview of MLAG and how to configure it in your network, refer to Multi-Chassis Link Aggregation - MLAG.

        MLAG or CLAG? Other vendors refer to the Cumulus Linux implementation of MLAG as MLAG, MC-LAG, or VPC. The NetQ UI predominantly uses the MLAG terminology. However, the management daemon, named clagd, and other options in the code, such as clag-id, retain the CLAG name for historical reasons.

        MLAG Commands

        Monitor MLAG with the following commands. See the command line reference for additional options, definitions, and examples.

        netq show mlag
        netq show events message_type mlag
        

        The netq check mlag command verifies MLAG session consistency by identifying all MLAG peers with errors or misconfigurations in the NetQ domain.

        netq check mlag
        

        View MLAG in the UI

        To add the MLAG card to your workbench, navigate to the header and select Add card > Network services > All MLAG Sessions card > Open cards. This example shows the following for the last 24 hours:

        Expand to the large card for additional MLAG info. By default, the card displays the Sessions summary tab. From here you can see which devices are handling the most MLAG sessions, or select the dropdown to view nodes with the most unestablished MLAG sessions. You can view MLAG-related events by selecting the Events tab.

        Expand the MLAG card to full-screen to view, filter, or export:

        From this table, you can select a row, then click Add card above the table.

        NetQ adds a new MLAG ‘single-session’ card to your workbench. From this card, you can monitor the number of nodes running the MLAG service, view switches with the most peers alive and not alive, and view events triggered by the MLAG service.

        Monitor a Single MLAG Session

        The MLAG single-session card displays a summary of the MLAG session. In this example, the leaf01 switch plays the primary role in this session with leaf02 and the session is in good health. The heat map tells us that the peer switch has been alive for the entire 24-hour period.

        From this card, you can also view the node role, peer role and state, and MLAG system MAC address, which identify the session in further detail.

        Granularity of Data Shown Based on Time Period

        On the medium and large single MLAG session cards, vertically stacked heat maps represent the status of the peers: one for peers that are reachable (alive), and one for peers that are unreachable (not alive). Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The amount of saturation for each block indicates how many peers were alive. If all peers were alive for the entire time block, then the top (alive) block is 100% saturated (white) and the not alive block is zero percent saturated (gray). As the saturation of the not alive block increases, the saturation of the alive block diminishes proportionally. The following table lists the most common time periods, their corresponding number of blocks, and the amount of time represented by one block:

        Time Period  Number of Runs  Number of Time Blocks  Amount of Time in Each Block
        -----------  --------------  ---------------------  ----------------------------
        6 hours      18              6                      1 hour
        12 hours     36              12                     1 hour
        24 hours     72              24                     1 hour
        1 week       504             7                      1 day
        1 month      2,086           30                     1 day
        1 quarter    7,000           13                     1 week
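
The arithmetic behind the table and the saturation description can be sketched in a few lines of shell. This is only an illustration: the function names are hypothetical, and the linear relationship between the share of peers and the saturation percentage is an assumption drawn from the description above, not a statement of how the NetQ UI computes colors.

```shell
# runs_per_block RUNS BLOCKS: validation runs summarized by one time block
# (integer division, so any remainder is dropped)
runs_per_block() {
    echo $(( $1 / $2 ))
}

# block_saturation ALIVE NOT_ALIVE: percent saturation of the alive and
# not alive blocks, ASSUMING saturation tracks the share of peers linearly
block_saturation() {
    total=$(( $1 + $2 ))
    if [ "$total" -eq 0 ]; then
        echo "alive=0 not_alive=0"
        return
    fi
    alive=$(( 100 * $1 / total ))
    echo "alive=$alive not_alive=$(( 100 - alive ))"
}
```

For the 24-hour row, `runs_per_block 72 24` gives 3 runs per one-hour block. Under the linear assumption, a block in which three of four peers were alive would be reported by `block_saturation 3 1` as 75% alive and 25% not alive.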

        View Changes to the MLAG Service Configuration File

        Each time a change is made to the configuration file for the MLAG service, NetQ logs the change and enables you to compare it with the last version using the NetQ UI. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.

        1. From the large single-session card, select the MLAG Configuration File Evolution tab.

        2. Select the time.

        3. Choose between the File view and the Diff view.

          The File view displays the content of the file:

          The Diff view highlights the changes (if any) between this version (on left) and the most recent version (on right) side by side:

        Network Topology

        The network topology dashboard displays a visual representation of your network, showing connections and device information for all monitored nodes. The view allows you to understand your network’s architecture at a high level, but also lets you isolate individual devices or sections.

        The topology view has been redesigned for 4.8 to better accommodate larger networks with over 50 devices. This feature is in beta and feedback is welcome. You can still access the legacy topology view by following the instructions in the NetQ 4.7 documentation.

        Access the Topology View

        To open the topology view, click Topology in the workbench header. Select Topology Beta to open a full-screen view of your network topology. The UI displays the highest-level view of your network’s topology, showing devices as part of tiers corresponding to your network’s architecture.

        The default view reflects the devices comprising the network. A two-tier architecture is made up of leaf and spine devices; a three-tier architecture is made up of leaf, spine, and super-spine devices. There is an additional ‘unclassified’ tier for devices that do not have a role assigned to them. If your devices appear in this tier, assign roles to them. Then return to the topology view and select Auto arrange.

        view of a networkwide topology displaying connections between devices

        Interact with the Topology

        The topology screen features a main panel displaying tiers or, when zoomed in, the individual devices that comprise the tiers. You can zoom in or out of the topology via the zoom controls at the bottom-right corner of the screen, a mouse with a scroll wheel, or with a trackpad on your computer. You can also adjust the focus by clicking anywhere on the topology and dragging it with your mouse to view a different portion of the network diagram. Above the zoom controls, a smaller screen reflects a macro view of your network and helps with orienting, similar to mapping applications.

        Zoom in and select a device to open a side panel with additional statistics, including interface statistics, resource utilization, and events occurring on that device.

        overview of events, protocols, and utilization data for spine 1

        The following data is presented in the side panel for each selected device:

        Node Data             Description
        --------------------  -----------
        ASIC                  Name of the ASIC used in the switch. A value of Cumulus Networks VX indicates a virtual machine.
        NetQ Agent status     Operational status of the NetQ Agent on the switch (fresh or rotten).
        NetQ Agent version    Version ID of the NetQ Agent on the switch.
        OS Name               Operating system running on the switch.
        Platform              Vendor and name of the switch hardware.
        Interface statistics  Transmit and receive data.
        Resource utilization  CPU, memory, and disk utilization.
        Events                Warning and info events.

        Hovering over a line highlights each end of the connection; the number embedded in the line displays the total number of links. Select the line to open a side panel with additional configuration data, which can be sorted by link pairs.

        side panel displaying configuration data between two nodes

        From the side panel, you can view the following data about links:

        Link Data         Description
        ----------------  -----------
        Source hostname   Switch where the connection originates
        Source interface  Port on the source switch used by the connection
        Peer hostname     Switch where the connection ends
        Peer interface    Port on the destination switch used by the connection

        Rearrange and Edit the Topology

        You can rearrange the topology’s tiers by selecting Edit at the top of the screen and dragging the tiers into different positions. Click Save to preserve the view or Reset to undo the changes. You can also move devices to other tiers or create new tiers by right-clicking on a device. Through this menu, you can move the device to a different tier or enforce the role's assignment and tier associated with that assignment.

        This menu also displays options to move the device to the unclassified tier or to a new tier. In the example above, the topology consists of three tiers and an unclassified tier. When you select Move to tier 4, NetQ creates a new tier and places the selected device within it.

        Create Queries to View a Subset of Devices

        You can create queries to segment a topology into smaller, more manageable parts. This can be especially helpful when you need to view a particular section of a very large topology. To create a query, select Queries on the left side of the screen, then Add query. The name of the query is pre-populated with a unique identifier that you can edit by selecting the field.

        You can select between node_name and node_tier to display a subsection of nodes based on either their names or the tiers where they’re located, respectively. Select Add filter group to combine queries with logical operators. For example, the following filter group consists of two queries: the first matches any node whose name contains the letters “tor”, and the second narrows those nodes to the ones located in tier three:
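
To make the logic of that filter group concrete, the same selection can be sketched outside the UI. The input format (one `node_name node_tier` pair per line) and the function name are hypothetical and are not something NetQ exports. The sketch only mirrors the name-contains and tier-equals conditions combined with a logical AND.

```shell
# filter_nodes SUBSTR TIER: read "node_name node_tier" pairs on stdin and
# print the names that contain SUBSTR and sit in tier TIER (logical AND)
filter_nodes() {
    awk -v s="$1" -v t="$2" 'index($1, s) > 0 && $2 == t { print $1 }'
}
```

For example, piping the three hypothetical pairs `tor-1 3`, `tor-2 2`, and `spine01 3` through `filter_nodes tor 3` prints only `tor-1`, because it is the only node that satisfies both conditions.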

        Select the three-dot menu on a given query to either delete or remove the query.

        Lifecycle Management

        Using the NetQ UI or CLI, lifecycle management (LCM) allows you to:

        Access Lifecycle Management in the UI

        You can access the LCM dashboard in a few ways:

        dashboard displaying switch management tab

        Access Lifecycle Management with the CLI

        Lifecycle management workflows use the netq lcm command set. Refer to the command line reference for a comprehensive list of options and definitions.

        LCM Support for In-band Management

        If you manage a switch using an in-band network interface, the inband-interface option must be specified in the agent configuration for LCM operations:

        After the NetQ Agent is configured for in-band connections, you can create custom agent configuration profiles using the CLI, then apply the custom profiles to switches during upgrades.

        NICs

        With the NetQ UI, you can view the attributes of individual network interface controllers (NICs), including their connection adapters and firmware versions. For NIC inventory information, refer to NIC Inventory.

        NIC telemetry for ConnectX adapters is supported for on-premises NetQ deployments. You must have DOCA Telemetry Service enabled and Prometheus targets configured to display NIC data in NetQ.

        View NIC Attributes in the UI

        To view attributes per NIC, open a NIC device card:

        1. Click Devices in the header, then click Open a device card.

        2. Select a NIC from the dropdown.

        3. Click Add to open an individual NIC card on your workbench, displaying ports, packets, and bytes information:

        For a quick look at the key attributes of a particular NIC, expand the NIC card. Attributes are displayed as the default tab on the large NIC card. Select the Interface stats tab at the top of the card to view detailed interface statistics, including frame and carrier errors.

        NIC card displaying transmit and receive data

        Expand the card to its largest size to view this information as tabular data, which you can filter and export.

        NTP

        Use the CLI to view Network Time Protocol (NTP) status. The command output displays the time synchronization status for all devices. You can filter for devices that are either in synchronization or out of synchronization, currently or at a time in the past.

        Monitor NTP with the following commands. See the command line reference for additional options, definitions, and examples.

        netq show ntp
        netq show events message_type ntp
        netq show events-config message_type ntp
        

        The netq check ntp command verifies network time synchronization for all nodes (leafs, spines, and hosts) in your network fabric.

        netq check ntp
        

        OSPF

        Use the UI or CLI to monitor Open Shortest Path First (OSPF) on your switches and hosts. For each device, you can view its associated interfaces, areas, peers, state, and type of OSPF running (numbered or unnumbered).

        On switches running Cumulus Linux 5.4.0 and later, NetQ supports OSPF monitoring only on interfaces configured for point-to-point mode and a single IP subnet in the default VRF.

        OSPF Commands

        Monitor OSPF with the following commands. See the command line reference for additional options, definitions, and examples.

        netq show ospf
        netq show events message_type ospf
        netq show events-config message_type ospf
        

        The netq check ospf command checks for consistency across OSPF sessions in your network fabric.

        netq check ospf
        

        View OSPF in the UI

        To add the OSPF card to your workbench, navigate to the header and select Add card > Network services > All OSPF Sessions card > Open cards. In this example, there are 8 nodes running OSPF and no reported events.

        Expand to the large card to display which switches are handling the most OSPF traffic. By default, the card displays the Sessions summary tab and lists switches with established sessions. Select the dropdown to view nodes with the most unestablished OSPF sessions. You can view OSPF-related events by selecting the Events tab.

        Expand the OSPF card to full-screen to view, filter, or export all stored attributes of all switches and hosts running OSPF in your network.

        From the table, you can select a row, then click Open card above the table. NetQ adds a new, OSPF ‘single-session’ card to your workbench. From this card, you can view session state changes and compare them with events, and monitor the running OSPF configuration and changes to the configuration file.

        Monitor a Single OSPF Session

        The OSPF single-session card displays the interface name, peer address, and peer ID that identifies the session. The heat map indicates the stability of the OSPF session between two devices over a period of time. In this example, the session has been established throughout the past 24 hours:

        Understanding the Heat Map

        On the medium and large single OSPF session cards, vertically stacked heat maps represent the status of the sessions: one for established sessions, and one for unestablished sessions. The number of time blocks used to indicate the status varies with the time period of data on the card. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The color saturation of each block indicates the results. If all sessions were established for the entire time block, then the top block is 100% saturated (white) and the unestablished block is zero percent saturated (gray). As the proportion of unestablished sessions increases, the unestablished block becomes more saturated and the established block becomes proportionally less saturated. The following example heat map is for a time period of 24 hours; the table lists the most common time periods and the resulting time blocks.

        Time Period  Number of Runs  Number of Time Blocks  Amount of Time in Each Block
        -----------  --------------  ---------------------  ----------------------------
        6 hours      18              6                      1 hour
        12 hours     36              12                     1 hour
        24 hours     72              24                     1 hour
        1 week       504             7                      1 day
        1 month      2,086           30                     1 day
        1 quarter    7,000           13                     1 week

        View Changes to the OSPF Service Configuration File

        Each time a change is made to the configuration file for the OSPF service, NetQ logs the change and lets you compare it with the previous version. This can be useful when you are troubleshooting potential causes for events or sessions losing their connections.

        To view the configuration file changes:

        1. From the large single-session card, select the Configuration File Evolution tab.

        2. Select the time.

        3. Select the toggle to display either the File or Diff view. The file view displays the contents of the file and the diff view highlights the changes (if any) between configurations.

          OSPF card displaying configuration file

        PTP

        PTP monitoring is only supported on Spectrum switches running Cumulus Linux version 5.0.0 and later.

        Use the UI or CLI to monitor Precision Time Protocol (PTP), including clock hierarchies and priorities, synchronization thresholds, and accuracy rates.

        PTP Commands

        Monitor PTP with the following commands. See the command line reference for additional options, definitions, and examples.

           netq show ptp clock-details
           netq show ptp counters (tx | rx) 
           netq show ptp global-config
           netq show ptp port-status 
           netq show events message_type ptp
        

        Access the PTP Dashboard

        1. Select Menu.

        2. Under the Network section, select PTP.

        The PTP summary dashboard displays:

        PTP summary screen displaying grandmaster clock details, events total, and violations summary

        Navigate to the Events tab to view, filter, and sort PTP-related events:

        detailed display of 133 PTP events, including list of devices with PTP-related events

        View PTP on a Switch

        1. Select Devices in the workbench header, then click Open a device card.

        2. Select a switch from the dropdown and specify the large card.

        3. Hover over the top of the card and select the PTP icon:

        large switch card with PTP display selected
        4. For more granular data, expand the card to full-size and navigate to PTP:
        full screen graph of a switch's average offset-from-master and average mean-path-delay statistics

        Hover over the chart at any point to display timestamped mean-path-delay and offset-from-master data. You can drag the bottom bar to expand and compress the period of time displayed in the graph.

        Select the tabs above the chart to display information about domains, clocks, ports, and configurations:

        clock domain, identity, port, and quality information for the grandmaster clock

        RoCE

        Use the UI or CLI to monitor RDMA over Converged Ethernet (RoCE) for Spectrum switches and BlueField DPUs.

        RoCE Commands

        The following commands display your network’s RoCE configuration, RoCE counters and counter pools, and RoCE-related events. See the command line reference for additional options, definitions, and examples.

        netq show roce-config 
        netq show roce-counters (dpu | nic)
        netq show roce-counters pool
        netq show events message_type tca_roce
        netq show events message_type roceconfig
        

        The netq check roce command checks for consistent RoCE and QoS configurations across all nodes in your network fabric.

        netq check roce
        

        View RoCE Counters Networkwide in the UI

        1. Select the Menu.

        2. Under the RoCE counters heading, select either RoCE switches or RoCE DPUs.

        The RoCE switches tab displays transmit (TX) and receive (RX) counters as well as counter pools for all switches running RoCE in your network.

        The RoCE DPUs tab displays physical port, priority port, RoCE extended, RoCE, and peripheral component interconnect (PCI) information for all DPUs running RoCE in your network.

        View RoCE Counters for a Given Switch

        You can view the following RoCE counters for a given switch:

        To view RoCE counters on a switch, navigate to the header and select Devices, then click Open a device card. Select a switch that is running RoCE and open the large card on your workbench. Click the RoCE icon at the top of the card to view RoCE counters and their associated ports:

        switch card displaying list of ports

        Expand the card to the largest size, then select RoCE counters from the side menu. Use the controls above the table to view, filter, or export counter statistics by Rx, Tx, or Pool.

        Disable RoCE Monitoring

        To disable RoCE monitoring:

        1. Edit /etc/netq/commands/cl4-netq-commands.yml and comment out the following lines:

           cumulus@netq-ts:~$ sudo nano /etc/netq/commands/cl4-netq-commands.yml
          
           #- period: "60"
           #  key: "roce"
           #  isactive: true
           #  command: "/usr/lib/cumulus/mlxcmd --json roce counters"
           #  parser: "local"
          
        2. Delete the /var/run/netq/netq_commands.yml file:

           cumulus@netq-ts:~$ sudo rm /var/run/netq/netq_commands.yml
          
        3. Restart the NetQ Agent:

          cumulus@netq-ts:~$ netq config restart agent
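
If you prefer not to edit the file by hand, step 1 can be approximated with a small filter. This is a sketch under stated assumptions, not an NVIDIA-provided tool. The function name is hypothetical, and it assumes that every entry in the commands file begins with a `- ` list marker (as in the lines shown in step 1) and that all lines up to the next `- ` marker belong to the same entry. Review the output before replacing the original file.

```shell
# comment_out_roce: read a commands file on stdin and write it to stdout with
# the entry whose key is "roce" commented out. ASSUMES every entry starts with
# a "- " list marker and runs until the next one.
comment_out_roce() {
    awk '
        # print the buffered entry, prefixing "#" when it is the roce entry
        function emit(    i) {
            for (i = 1; i <= n; i++) print (hit ? ("#" buf[i]) : buf[i])
            n = 0
            hit = 0
        }
        /^[ \t]*- / { emit() }                  # a new entry begins
        { buf[++n] = $0 }                       # collect the current line
        !/^[ \t]*#/ && /key:[ \t]*"roce"/ { hit = 1 }
        END { emit() }
    '
}
```

A possible use is `comment_out_roce < /etc/netq/commands/cl4-netq-commands.yml > /tmp/cl4-netq-commands.yml`, where the `/tmp` path is only an example. After checking the result and copying it back with `sudo`, continue with steps 2 and 3 as described above.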
          

        STP

        Use the CLI to view the Spanning Tree Protocol (STP) topology on a bridge or switch.

        Monitor STP with the following command. If you do not have a bridge in your configuration, the output indicates this. See the command line reference for additional options, definitions, and examples.

        netq show stp topology
        

        Switches

        With the NetQ UI and NetQ CLI, you can monitor the health of individual switches, including interface performance and resource utilization.

        NetQ reports switch performance metrics for the following categories:

        For switch inventory information (ASIC, platform, CPU, memory, disk, and OS), refer to Switch Inventory.

        View Switch Metrics and Attributes

        To view events, metrics, and attributes per switch, open the Switch card:

        1. In the header, select Devices, then click Open a device card.

        2. Select a switch from the list:

        dropdown displaying switches
        3. Click Add.

        4. Adjust the card’s size to view information at different levels of granularity.

        Attributes are displayed as the default tab on the large Switch card. You can view the static information about the switch, including its hostname, addresses, server and ASIC vendors and models, OS and NetQ software information. You can also view the state of the interfaces and NetQ Agent on the switch.

        large switch card displaying attributes

        Hover over the top of the card and select the appropriate icon to view utilization info, interface statistics, digital optics info, RoCE metrics, and PTP clock graphs. This example displays utilization information, including CPU, memory, and disk utilization from the past 24 hours:

        large switch card displaying attributes

        Expand the Switch card to full-screen to view, filter, or export information about events, interfaces, MAC addresses, VLANs, IP routes, IP neighbors, IP addresses, BTRFS utilization, software packages, SSD utilization, forwarding resources, ACL resources, What Just Happened events, sensors, RoCE counters, digital optics, PTP, and process monitoring:

        The information available in the UI can also be displayed via the CLI with a corresponding netq show command. Each command that begins with netq show includes the option <hostname>. When the <hostname> option is included in the command, the output displays results limited to the switch or host you specified.

        For example, you can view all events across your network with the netq show events command. To view all events on a particular switch, specify its name in the <hostname> field in netq <hostname> show events. The following example displays all events on the leaf01 switch:

        cumulus@switch:~$ netq leaf01 show events
        
        Matching events records:
        Hostname          Message Type             Severity         Message                             Timestamp
        ----------------- ------------------------ ---------------- ----------------------------------- -------------------------
        leaf01            btrfsinfo                error            data storage efficiency : space lef Wed Sep  2 20:34:31 2020
                                                                    t after allocation greater than chu
                                                                    nk size 0.57 GB
        leaf01            btrfsinfo                error            data storage efficiency : space lef Wed Sep  2 20:04:30 2020
                                                                    t after allocation greater than chu
                                                                    nk size 0.57 GB
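
When the event list is long, a per-severity tally can be easier to read than the raw table. The following sketch post-processes the text output shown above. The function name is hypothetical, and it assumes that event rows start with the hostname and that the message type is a single word, as in this example. Wrapped continuation lines are ignored because their first word is not the hostname.

```shell
# count_by_severity HOSTNAME: read "netq <hostname> show events" text output
# on stdin and print one "severity count" line per severity
count_by_severity() {
    awk -v h="$1" '
        $1 == h { count[$3]++ }     # field 3 is the Severity column
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

A possible use is `netq leaf01 show events | count_by_severity leaf01`. For the two records shown above, this prints `error 2`.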
        

        Refer to the command line reference for a comprehensive list of netq show commands.

        View CPU and Memory Utilization for Processes and Services

        Use the UI or CLI to visualize which services and processes are consuming the most CPU and memory on a switch. You can add or remove certain services that NetQ monitors using the CLI.

        Process monitoring is only supported on Spectrum switches.

        To visualize CPU and memory utilization at the process level, open a large device card and navigate to the Utilization tab. Then select Show process monitoring data. The UI depicts two charts—one each for CPU and memory utilization—along with a list of services and processes.

        Select a process from the Process name column for its usage data to be reflected in the CPU and memory utilization charts. The data presented is aggregated over a 5-minute period. NetQ lists processes from highest to lowest CPU consumption, as reported in the CPU 5min column. The process whose data is reflected in the charts is indicated by an icon next to the name of the process.

        The following graphs depict CPU and memory usage over a 6-hour time period from the system monitor daemon, smond.

        CPU and memory utilization info for the smond service

        The information displayed in the UI can be viewed using the CLI with the netq show services resource-util command:

        cumulus@switch:~$ netq show services resource-util
        
        Matching services records:
        Hostname          Service              PID   VRF                  Enabled Active Uptime               CPU one Minute       CPU five Minute      Memory one Minute    Memory five Minute   Last Updated
        ----------------- -------------------- ----- -------------------- ------- ------ -------------------- -------------------- -------------------- -------------------- -------------------- ------------------------
        r-3700-02         sx_sdk               19012 default              yes     yes    81 day 17h ago       7.7                  24.65                9.44                 9.44                 Tue Jul 18 18:49:19 2023
        r-3700-03         sx_sdk               13627 default              yes     yes    81 day 18h ago       0                    17.82                9.44                 9.44                 Tue Jul 18 18:49:19 2023
        r-3700-02         switchd              21100 default              yes     yes    81 day 17h ago       56.77                15.07                1.13                 1.13                 Tue Jul 18 18:49:19 2023
        r-3700-03         switchd              15768 default              yes     yes    81 day 18h ago       0                    8.28                 1.11                 1.11                 Tue Jul 18 18:49:19 2023
        neo-switch02      sx_sdk               1841  default              yes     yes    2h 29min ago         30.1                 6.55                 9.67                 9.67                 Tue Jul 18 18:49:19 2023
        ufm-switch19      sx_sdk               2343  default              yes     yes    21h 3min ago         5.22                 5.73                 2.84                 2.84                 Tue Jul 18 18:49:19 2023
        ufm-switch29      sx_sdk               2135  default              yes     yes    8 day 4h ago         2.88                 5.73                 9.54                 9.54                 Tue Jul 18 18:49:19 2023
        r-3420-01         sx_sdk               1885  default              yes     yes    9 day 3h ago         5.28                 5.01                 9.3                  9.3                  Tue Jul 18 18:49:19 2023
        ufm-switch29      clagd                7095  default              no      yes    8 day 4h ago         23.57                4.71                 0.63                 0.63                 Tue Jul 18 18:49:19 2023
        r-3700-01         smond                7301  default              yes     yes    9 day 3h ago         0                    4.7                  0.2                  0.2                  Tue Jul 18 18:49:19 2023
        ... 
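
To pick out only the busiest services from this output, you can filter on the CPU five Minute column. The sketch below works on the text output shown above. The function name is hypothetical, and it assumes each data row ends with a five-word timestamp (for example, Tue Jul 18 18:49:19 2023) preceded by the four utilization values. It counts fields from the right because the Uptime column contains a variable number of words.

```shell
# high_cpu_services LIMIT: read "netq show services resource-util" text output
# on stdin and print hostname, service, and five-minute CPU for rows above LIMIT
high_cpu_services() {
    awk -v limit="$1" '
        # counting from the right: 5 timestamp words, memory five, memory one,
        # then CPU five Minute, which is field NF - 7
        NF >= 9 && $(NF - 7) ~ /^[0-9.]+$/ && $(NF - 7) + 0 > limit + 0 {
            print $1, $2, $(NF - 7)
        }
    '
}
```

A possible use is `netq show services resource-util | high_cpu_services 15`. With the sample output above, that would list the sx_sdk service on r-3700-02 and r-3700-03 and the switchd service on r-3700-02.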
        

        To configure the NetQ Agent to start monitoring additional services, run netq config add agent services, specifying the services you want the agent to monitor in the command. Restart the agent, then run netq config show agent services to display a list of services that the NetQ Agent is monitoring for CPU and memory usage.

        To stop the agent from monitoring a service, run netq config del agent services. Some services and processes cannot be excluded from monitoring.

        To actively monitor process-level CPU and memory utilization, you can create threshold-crossing rules. These rules generate events when a process or service exceeds the utilization limit you defined when creating the rule. Refer to the resource utilization table in the TCA Events Reference for service memory and service CPU utilization event IDs.

        View Queue Lengths in Histograms

        Monitoring queue lengths in your network’s fabric is useful for detecting microbursts, which can lead to higher packet latency or buffer congestion. The Cumulus Linux documentation provides a detailed description of ASIC monitoring, including example bin configurations and information on interpreting histogram queue lengths.

        Queue length monitoring is supported on Spectrum switches running Cumulus Linux 5.1 and later. To display queue histogram data, you must set the snapshot file count to at least 120 when you are configuring ASIC monitoring, as described in the Snapshots section in the ASIC monitoring configuration documentation.

        The information available in the UI can also be displayed via the CLI with the netq show histogram command. To view queue histograms in the UI:

        1. Expand the Menu. Under the traffic histograms section, select Queue histogram.

        Devices are grouped according to their roles: superspine, leaf, spine, or exit. If you haven’t assigned roles to your devices, they appear as ‘unassigned.’

        dashboard displaying 6 devices with egress queue lengths as histograms

        Each device is represented by a card that displays its hostname, the port with the longest queue length (displayed horizontally, divided into bins), standard deviation, P95 value across all ports (with an ASIC monitoring configuration), and average queue length. The data updates when you change the time parameters using the controls at the top of the screen. The values reflected in the bins are color-coded, with higher values displayed in darker colors and lower values in lighter colors. Hover over a bin to view its corresponding queue length count.

        dashboard displaying 6 devices with egress queue lengths as histograms

        Select View more to open a dashboard that displays the full range of ports configured to send histogram data along with their associated devices, which are visible when you hover over a section with your cursor. From this view, you can compare devices against each other or the same devices over a different time period. For example, the following view displays switch r-qa-sw-eth-2231 with queue length data from the past minute in the top panel and the past 30 minutes in the bottom panel.

        histogram comparison of the same device with different time parameters

        The y-axis represents bins 0 through 9. The hostname associated with the port is displayed on the x-axis.

        VLAN

        Use the UI or CLI to view Virtual Local Area Network (VLAN) information.

        VLAN Commands

        Monitor VLAN with the following commands. Use these commands to display configuration information, interfaces associated with VLANs, MAC addresses associated with a given VLAN, MAC addresses associated with your vRR (virtual route reflector) interface configuration, and VLAN events. See the command line reference for additional options, definitions, and examples.

        netq show vlan
        netq show interfaces type macvlan
        netq show interfaces type vlan 
        netq show macs
        netq show events message_type vlan 
        

        The netq check vlan command verifies consistency of the VLAN nodes and interfaces across all links in your network fabric:

        netq check vlan
        

        View VLAN in the UI

        To view VLAN information, select the Menu, then VLAN.

        From here you can view a list of switches or hostnames and their associated VLANs, interfaces, SVIs (switch virtual interfaces), and ports.

        To view MAC addresses associated with a given VLAN, select the Menu, then MACs.

        VXLAN

        Use the CLI to monitor Virtual Extensible LAN (VXLAN) and validate overlay communication paths. See the command line reference for additional options, definitions, and examples.

        netq show vxlan
        netq show interfaces type vxlan
        netq show events message_type vxlan
        

        The netq check vxlan command verifies the consistency of the VXLAN nodes and interfaces across all links in your network fabric.

        netq check vxlan
        

        Device Inventory

        This section describes how to monitor your inventory from networkwide and device-specific perspectives. Use the UI or CLI to view all hardware and software components installed and running on switches, hosts, DPUs, and NICs.

        NVLink Quick Start Guide

        System Requirements and Installation

        Follow the installation instructions for the NetQ on-premises deployment with a server cluster arrangement. This arrangement requires 3 worker nodes. Each node requires the following:

        Resource                 Minimum Requirements
        -----------------------  --------------------
        Processor                Sixteen (16) virtual CPUs
        Memory                   64 GB RAM
        Local disk storage       500 GB SSD with minimum disk IOPS of 1000 for a standard 4 KB block size
                                 (Note: This must be an SSD; use of other storage options can lead to system instability and is not supported.)
        Network interface speed  1 Gb NIC
        Hypervisor               KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and RedHat operating systems
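
Before installing, it can help to compare each planned worker node against these minimums. The following sketch checks only the three numeric values from the table (virtual CPUs, memory, and disk size). The function names are hypothetical, the values are supplied by you rather than detected, and the SSD and IOPS requirements are not verified.

```shell
# meets_minimums VCPUS MEM_GB DISK_GB: succeed only when all three values
# reach the minimums from the table (16 virtual CPUs, 64 GB RAM, 500 GB disk)
meets_minimums() {
    [ "$1" -ge 16 ] && [ "$2" -ge 64 ] && [ "$3" -ge 500 ]
}

# report NODE VCPUS MEM_GB DISK_GB: print a one-line verdict for a node
report() {
    if meets_minimums "$2" "$3" "$4"; then
        echo "$1: meets minimums"
    else
        echo "$1: below minimums"
    fi
}
```

For example, `report node1 16 64 500` prints `node1: meets minimums`, while `report node2 8 64 500` prints `node2: below minimums`. Pass the provisioned values for each node; the memory figure reported by a running operating system is typically slightly lower than the provisioned amount.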

        After ensuring you have the minimum system resource requirements, follow the installation instructions for either a KVM hypervisor or VMware hypervisor.

        After you complete the installation, log in to NetQ as described in the next section.

        Log in to NetQ

        1. Open a new Chrome or Firefox browser window or tab.
        2. Enter the following URL into the address bar: https://<hostname-or-ipaddress>:443
        NetQ login screen
        3. Log in.

          The default username and password for UI access are both admin.

        After creating a new password and accepting the Terms of Use, the default workbench opens with your username displayed in the upper-right corner.

        Access NVLink4

        NVLink4 features are hidden by default in the NetQ UI. To access these features, run netq features nvl4 enable on your NetQ server’s CLI. Return to the UI and refresh the page. The NVL4 icon should be visible in the header. Select this icon to display the NVLink management dashboard.

        To verify that NVLink4 features are enabled, run netq show features nvl4. You can also hide the NVLink4 features in the UI with netq features nvl4 disable.

        Intro to the NetQ UI

        If you are unfamiliar with NetQ, the following sections provide an overview of the NetQ layout and functionality.

        Validation Tests Reference

        NetQ collects data that validates the health of your network fabric, devices, and interfaces. You can create and run validations with either the NetQ UI or the NetQ CLI. The number of checks and the type of checks are tailored to the particular protocol or element being validated.

        Use the value in the Test Number column in the tables below with the CLI when you want to include or exclude specific tests with the netq check command. You can get the test numbers by running the netq show unit-tests command.

        Addresses Validation Tests

        The duplicate address detection tests look for duplicate IPv4 and IPv6 addresses assigned to interfaces across devices in the inventory. They also check for duplicate /32 host routes in each VRF.

        Test Number Test Name Description
        0 IPv4 Duplicate Addresses Checks for duplicate IPv4 addresses
        1 IPv6 Duplicate Addresses Checks for duplicate IPv6 addresses

        Agent Validation Tests

        NetQ Agent validation looks for an agent status of rotten for each node in the network. A fresh status indicates the agent is running as expected. The agent sends a heartbeat every 30 seconds; if it misses three consecutive heartbeats, its status changes to rotten.

        Test Number Test Name Description
        0 Agent Health Checks for nodes that have failed or lost communication
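
As a rough illustration of the timing described above, the following Python sketch classifies an agent as fresh or rotten based on the time since its last heartbeat. The 30-second interval and the three-heartbeat threshold come from the text. The function name and the exact boundary (rotten once three full intervals pass with no heartbeat) are assumptions for illustration, not NetQ's actual implementation.

```python
HEARTBEAT_INTERVAL_S = 30   # from the text: one heartbeat every 30 seconds
MISSED_LIMIT = 3            # from the text: three consecutive missed heartbeats


def agent_status(seconds_since_last_heartbeat: float) -> str:
    """Return 'fresh' or 'rotten' for an agent (hypothetical helper)."""
    # Number of whole heartbeat intervals that elapsed with no heartbeat.
    missed = int(seconds_since_last_heartbeat // HEARTBEAT_INTERVAL_S)
    return "rotten" if missed >= MISSED_LIMIT else "fresh"


print(agent_status(45))   # fresh
print(agent_status(95))   # rotten
```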

        BGP Validation Tests

        The BGP validation tests look for status and configuration anomalies.

        Test Number Test Name Description
        0 Session Establishment Checks that BGP sessions are in an established state
        1 Address Families Checks if transmit and receive address family advertisement is consistent between peers of a BGP session
        2 Router ID Checks for BGP router ID conflict in the network
        3 Hold Time Checks for mismatch of hold time between peers of a BGP session
        4 Keep Alive Interval Checks for mismatch of keep alive interval between peers of a BGP session
        5 IPv4 Stale Path Time Checks for mismatch of IPv4 stale path timer between peers of a BGP session
        6 IPv6 Stale Path Time Checks for mismatch of IPv6 stale path timer between peers of a BGP session
        7 Interface MTU Checks for consistency of interface MTU for BGP peers

        Cumulus Linux Version Tests

        The Cumulus Linux version test looks for version consistency.

        Test Number Test Name Description
        0 Cumulus Linux Image Version Checks the following:
        • No version specified, checks that all switches in the network have a consistent version
        • match-version specified, checks that a switch’s OS version equals the specified version
        • min-version specified, checks that a switch’s OS version is equal to or greater than the specified version
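
The three modes of the version test can be sketched in Python as follows. This is only an illustration of the comparison logic, not how NetQ implements it. The helper names are hypothetical, versions are assumed to be purely numeric dotted strings, and in the "no version specified" mode the sketch assumes the most common version in the fabric is the reference.

```python
from collections import Counter


def parse_version(version: str) -> tuple:
    """Turn '5.4.0' into (5, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def failing_switches(versions, match_version=None, min_version=None):
    """Return sorted hostnames that fail the version check (hypothetical helper).

    versions maps hostname -> OS version string.
    """
    if match_version is not None:
        target = parse_version(match_version)
        bad = [h for h, v in versions.items() if parse_version(v) != target]
    elif min_version is not None:
        floor = parse_version(min_version)
        bad = [h for h, v in versions.items() if parse_version(v) < floor]
    else:
        # Assumption: treat the most common version as the reference.
        reference, _ = Counter(versions.values()).most_common(1)[0]
        bad = [h for h, v in versions.items() if v != reference]
    return sorted(bad)


fabric = {"leaf01": "5.4.0", "leaf02": "5.4.0", "spine01": "5.3.1"}
print(failing_switches(fabric))                         # ['spine01']
print(failing_switches(fabric, match_version="5.4.0"))  # ['spine01']
print(failing_switches(fabric, min_version="5.3.0"))    # []
```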

        EVPN Validation Tests

        The EVPN validation tests look for status and configuration anomalies.

        Test Number Test Name Description
        0 EVPN BGP Session Checks if:
        • BGP EVPN sessions are established
        • The EVPN address family advertisement is consistent
        1 EVPN VNI Type Consistency Because a VNI can be of type L2 or L3, checks that for a given VNI, its type is consistent across the network
        2 EVPN Type 2 Checks for consistency of IP-MAC binding and the location of a given IP-MAC across all VTEPs
        3 EVPN Type 3 Checks for consistency of replication group across all VTEPs
        4 EVPN Session For each EVPN session, checks if:
        • adv_all_vni is enabled
        • FDB learning is disabled on tunnel interface
        5 VLAN Consistency Checks for consistency of VLAN to VNI mapping across the network
        6 VRF Consistency Checks for consistency of VRF to L3 VNI mapping across the network

        Interface Validation Tests

        The interface validation tests look for consistent configuration between two nodes.

        Test Number Test Name Description
        0 Admin State Checks for consistency of administrative state on two sides of a physical interface
        1 Oper State Checks for consistency of operational state on two sides of a physical interface
        2 Speed Checks for consistency of the speed setting on two sides of a physical interface
        3 Autoneg Checks for consistency of the auto-negotiation setting on two sides of a physical interface

        The link MTU validation tests look for consistency across an interface and appropriate size MTU for VLAN and bridge interfaces.

        Test Number Test Name Description
        0 Link MTU Consistency Checks for consistency of MTU setting on two sides of a physical interface
        1 VLAN interface Checks if the MTU of an SVI is no smaller than the parent interface, subtracting the VLAN tag size
        2 Bridge interface Checks if the MTU on a bridge is not arbitrarily smaller than the smallest MTU among its members

        MLAG Validation Tests

        The MLAG validation tests look for misconfigurations, peering status, and bond error states.

        Test Number Test Name Description
        0 Peering Checks if:
        • MLAG peerlink is up
        • MLAG peerlink bond slaves are down (not in full capacity and redundancy)
        • Peering is established between two nodes in an MLAG pair
        1 Backup IP Checks if:
        • MLAG backup IP configuration is missing on an MLAG node
        • MLAG backup IP is correctly pointing to the MLAG peer and its connectivity is available
        2 CLAG Sysmac Checks if:
        • MLAG Sysmac is consistently configured on both nodes in an MLAG pair
        • Any duplication of an MLAG sysmac exists within a bridge domain
        3 VXLAN Anycast IP Checks if the VXLAN anycast IP address is consistently configured on both nodes in an MLAG pair
        4 Bridge Membership Checks if the MLAG peerlink is part of bridge
        5 Spanning tree* Checks if:
        • STP is enabled and running on the MLAG nodes
        • MLAG peerlink role is correct from STP perspective
        • The bridge ID is consistent between two nodes of an MLAG pair
        • The VNI in the bridge has BPDU guard and BPDU filter enabled
        *Not supported in per-VLAN rapid spanning tree (PVRST) mode
        6 Dual Home Checks for:
        • MLAG bonds that are not in dually connected state
        • Dually connected bonds have consistent VLAN and MTU configuration on both sides
        • STP has consistent view of bonds' dual connectedness
        7 Single Home Checks for:
        • Singly connected bonds
        • STP has consistent view of bond’s single connectedness
        8 Conflicted Bonds Checks for bonds in MLAG conflicted state and shows the reason
        9 ProtoDown Bonds Checks for bonds in protodown state and shows the reason
        10 SVI Checks if:
        • Both sides of an MLAG pair have an SVI configured
        • SVI on both sides have consistent MTU setting

        NTP Validation Tests

        The NTP validation test looks for poor operational status of the NTP service.

        Test Number Test Name Description
        0 NTP Sync Checks if the NTP service is running and in sync state

        OSPF Validation Tests

        The OSPF validation tests look for indications of the service health and configuration consistency.

        Test Number Test Name Description
        0 Router ID Checks for OSPF router ID conflicts in the network
        1 Adjacency Checks for OSPF adjacencies in a down or unknown state
        2 Timers Checks for consistency of OSPF timer values in an OSPF adjacency
        3 Network Type Checks for consistency of network type configuration in an OSPF adjacency
        4 Area ID Checks for consistency of area ID configuration in an OSPF adjacency
        5 Interface MTU Checks for MTU consistency in an OSPF adjacency
        6 Service Status Checks for OSPF service health in an OSPF adjacency

        RoCE Validation Tests

        The RoCE validation tests look for consistent RoCE and QoS configurations across nodes.

        Test Number Test Name Description
        0 RoCE Mode Checks whether RoCE is configured for lossy or lossless mode
        1 Classification Checks for consistency of DSCP, service pool, port group, and traffic class settings
        2 Congestion Control Checks for consistency of ECN and RED threshold settings
        3 Flow Control Checks for consistency of PFC configuration for RoCE lossless mode
        4 ETS Checks for consistency of Enhanced Transmission Selection settings
        5 RoCE Miscellaneous Checks for consistency across related services

        Sensor Validation Tests

        The sensor validation tests look for chassis power supply, fan, and temperature sensors that are not operating as expected.

        Test Number Test Name Description
        0 PSU sensors Checks for power supply unit sensors that are not in ok state
        1 Fan sensors Checks for fan sensors that are not in ok state
        2 Temperature sensors Checks for temperature sensors that are not in ok state

        VLAN Validation Tests

        The VLAN validation tests look for configuration consistency between two nodes.

        Test Number Test Name Description
        0 Link Neighbor VLAN Consistency Checks for consistency of VLAN configuration on two sides of a port or a bond
        1 CLAG Bond VLAN Consistency Checks for consistent VLAN membership of a CLAG (MLAG) bond on each side of the CLAG (MLAG) pair

        VXLAN Validation Tests

        The VXLAN validation tests look for configuration consistency across all VTEPs.

        Test Number Test Name Description
        0 VLAN Consistency Checks for consistent VLAN to VXLAN mapping across all VTEPs
        1 BUM replication Checks for consistent replication group membership across all VTEPs

        Flow Analysis

        Create a flow analysis to sample data from TCP and UDP flows in your environment and to review latency and buffer utilization statistics across network paths.

        Flow analysis is supported on NVIDIA Spectrum-2 switches and later. It requires a switch fabric running Cumulus Linux version 5.0 or later. You must enable Lifecycle Management (LCM) to run a flow analysis. If LCM is disabled, you will not see the flow analysis icon in the UI. LCM is enabled for on-premises deployments by default and disabled for cloud deployments by default. Contact your local NVIDIA sales representative or submit a support ticket to activate LCM on cloud deployments.

        Create a New Flow Analysis

        To start a new flow analysis, click the Flow analysis icon and select Create new flow analysis.

        flow analysis menu with options to create a new flow analysis or view a previous analysis

        In the dialog, enter the application parameters, including the source IP address, destination IP address, source port, and destination port of the flow you wish to analyze. Select the protocol and VRF for the flow from the dropdown menus.

        flow analysis wizard prompting user to enter application parameters

        After you enter the application parameters, enter the monitor settings, including the sampling rate and time parameters.

        flow analysis wizard prompting user to enter sampling and scheduling information

        If you attempt to run a flow analysis that includes switches assigned a default, unmodified access profile, the process will fail. Create a unique access profile (or update the default profile with unique credentials), then assign the profile to the switches you want to include in the flow analysis.

        Running a flow analysis will affect switch CPU performance. For high-volume flows, set a lower sampling rate to limit switch CPU impact.

        View Flow Analysis Data

        After you start the flow analysis, a flow analysis card appears on the NetQ Workbench.

        flow analysis card showing that a flow analysis is in progress

        View a previous flow analysis by selecting Flow analysis and View previous flow analysis.

        flow analysis menu with the option to view previous flow analysis highlighted

        Select View details next to the name of the flow analysis to display the analysis dashboard. You can use this dashboard to view latency and buffer statistics for the monitored flow. If bi-directional monitoring was enabled, you can view the reverse direction of the flow by selecting the icon. The following example shows flow data across a single path:

        flow analysis dashboard displaying flow data across a single path

        The dashboard header shows the monitored flow settings:

        dashboard header displaying settings and parameters selected with the flow analysis wizard
        Flow Settings Description
        Lifetime The lifetime of the flow analysis. This example completed in 11 minutes.
        Source IP The source IP address of the flow. In this example it is 10.1.100.125.
        Destination IP The destination IP address of the flow. In this example it is 10.1.10.105.
        Source Port The source port of the flow. In this example it displays N/A because it was not set.
        Destination Port The destination port of the flow. In this example it is 2222.
        Protocol The protocol of the monitored flow. In this example it is UDP.
        Sampling Rate The sampling rate of the flow. In this example it is low.
        VRF The VRF the flow is present in. In this example it is the default VRF.
        Bi-directional Monitoring This determines if the flow is monitored in both directions between the source IP address and the destination IP address. In this example it is enabled. Click to change the direction that is displayed.

        Understanding the Flow Analysis Graph

        The flow analysis graph is color coded relative to the values measured across devices. Lower values are displayed in green, and higher values are displayed in orange. The color gradient is displayed below the graph along with the low and high values from the collected flow data. Each hop in the path is represented in the graph with a vertical, gray-striped line labeled by hostname. The following example shows a single path:

        single-path flow analysis with five hops ranging from low to high values

        The flow graph panel on the right side of the dashboard displays the devices along the selected path.

        flow graph panel showing the five devices associated with the flow analysis graph

        View Flow Latency

        The latency measured by the flow analysis is the total transit time of the sampled packets through individual devices. A summary of measured latency for each device is displayed above the main flow analysis graph.

        three devices displaying their average latencies, including minimum, maximum and P95 value.

        The average latency for packets in the flow is displayed under the hostname of each device, along with the minimum and maximum latencies observed during the analysis lifetime. The 95th percentile (P95) latency value for sampled packets is also displayed. The P95 value is the latency that 95% of the sampled packets do not exceed; that is, 95% of the sampled packets have a latency less than or equal to it.
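
To make the P95 definition concrete, here is a minimal Python sketch that computes a percentile with the nearest-rank method. The choice of nearest-rank is an assumption for illustration; the text does not say which percentile method NetQ uses. The function name and the latency samples are made up.

```python
def percentile_nearest_rank(samples, pct=95):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up latency samples (microseconds) with one outlier.
latencies_us = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14,
                15, 12, 13, 14, 11, 13, 12, 15, 14, 13]
print(percentile_nearest_rank(latencies_us))  # 16
```

With 20 samples, 19 of them (95%) are less than or equal to 16, so the single 90-microsecond outlier raises the maximum but not the P95 value.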

        Use your cursor to hover over sections of the main analysis graph to view average latency values for each device in a path.

        cursor hovering over a device to show latency values

        The left panel of the flow analysis dashboard also displays a timeline of measured latency for each device on that path. Use your cursor to hover over the plotted data points on the timeline for each device to view the latency measured at each time interval.

        a cursor hovering over a device's timeline showing maximum, minimum, and average latency at 6:15 AM on November 24th 2021

        View Buffer Occupancy

        The main flow analysis dashboard also displays the buffer occupancy of each device along the path. To change the graph view to display buffer occupancy for the flow, click next to Avg. flow latency and select Avg. buffer occupancy. You can view an overview graph of buffer occupancy or select each device to see the buffer occupancy for the analyzed flow:

        overview graph displaying average buffer occupancy between 8 total devices

        The percentages represent the amount of buffer space on the switch that the analyzed flow occupied while the analysis was running.

        buffer occupancy displaying percentages at 0

        View Multiple Paths

        When packets matching the flow settings traverse multiple paths in the topology, the flow graph displays latency and buffer occupancy for each path:

        flow graph displaying multiple paths along with latency and buffer-occupancy data along those paths

        You can switch between paths by clicking on an alternate path in the Flow graph panel, or by clicking on an unselected path on the main analysis graph:

        flow graph panel highlighting a selected path with several unselected paths also displayed

        In the detail panel on the left side of the dashboard, you can select a path to view the percent of packets distributed over each path.

        a selected path showing that 50.1% of packets are distributed over that path

        Partial Path Support

        Some flows can still be analyzed if they traverse a network path that includes switches lacking flow analysis support. Partial-path flow analysis is supported in the following conditions:

        An unsupported device is represented in the flow analysis graph as a black bar lined with red x’s. Flow statistics are not displayed for that device.

        flow analysis graph showing an unsupported switch

        Unsupported devices are also designated in the flow graph panel:

        flow graph panel with an unsupported switch

        Selecting the unsupported device displays device statistics in the left panel if they are available to NetQ. Otherwise, the display will indicate why the device is not supported:

        a panel showing an unsupported device. The device is not supported because the CL version is not supported for flow analysis

        Path discovery will terminate if multiple consecutive switches do not support flow analysis. When additional data is available from switches outside of discovered paths, you can view data from those devices from the menu at the top of the page:

        menu displaying three unsupported devices

        The left panel displays the data, along with ingress and egress ports.

        View Device Statistics

        You can view latency, buffer occupancy, interface statistics, resource utilization, and WJH events for each device by clicking on a device in the Flow Graph panel, or by clicking on the line associated with a device in the main flow analysis graph. The left panel will then update to reflect statistics for the respective device.

        panel displaying statistics of a selected device

        After selecting a device, click to expand the statistics chart:

        a cursor hovering over an icon that, when selected, expands the chart

        In this view, you can select additional categories to add to the chart:

        expanded chart displaying latency and WJH data, with buffer occupancy and total packet unselected and therefore not displayed

        The Flow Graph panel allows you to access the topology view, where you can also click the paths and devices to view statistics. Click View in topology to switch to the topology view.

        topology view showing both selected and unselected devices and their paths

        View WJH Events

        Flow analysis monitors the path for WJH events and records any drops for the flow. Switches with WJH events recorded are represented in the flow analysis graph as a red bar with white stripes. Hover over the device to see a WJH event summary:

        a user hovering over a device in the main flow analysis graph with a WJH event summary showing 94,300 total packet drops

        You can also view devices with WJH events in the flow graph panel:

        a user hovering over a device in the flow graph panel with a WJH event summary showing 94,300 total packet drops

        Click on a device with WJH events to see the statistics in the left panel. Hover over the data to reveal the type of drops over time:

        individual device WJH statistics showing 2673 router drops

        WJH drops can also be viewed from the expanded device chart by selecting the WJH category:

        expanded device chart showing WJH data of 24 total router drops

        Select Show all drops to display a list of all WJH drops for the device:

        WJH statistics for all drops, including tabular information on count, drop type, drop reason, severity, and corrective action

        Verify Network Connectivity

        You can verify the connectivity between two devices on demand (ad hoc) or by defining connectivity checks that run on a schedule.

        Specifying Source and Destination Values

        When specifying traces, the following options are available for the source and destination values:

        Trace Type   Source                                         Destination
        Layer 2      Hostname                                       MAC address plus VLAN
        Layer 2      IPv4/IPv6 address plus VRF (if not default)    MAC address plus VLAN
        Layer 2      MAC address                                    MAC address plus VLAN
        Layer 3      Hostname                                       IPv4/IPv6 address
        Layer 3      IPv4/IPv6 address plus VRF (if not default)    IPv4/IPv6 address

        If you use an IPv6 address, you must enter the complete, non-truncated address.
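
The following Python sketch shows one way to prepare a destination value before requesting a trace. It reports whether the value implies a layer 2 trace (MAC address) or a layer 3 trace (IP address), and expands IPv6 addresses to their full form. The helper name is hypothetical, and it is an assumption that "complete, non-truncated" means the fully expanded form that the standard `ipaddress` module produces with `.exploded`.

```python
import ipaddress
import re

# Colon-separated MAC address, for example 44:38:39:00:00:38.
MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def normalize_destination(destination: str):
    """Return (layer, destination) for a trace destination (hypothetical helper)."""
    if MAC_RE.match(destination):
        return 2, destination.lower()           # layer 2: also needs a VLAN
    addr = ipaddress.ip_address(destination)    # raises ValueError if invalid
    # .exploded expands "::" and leading zeros for IPv6; IPv4 is unchanged.
    return 3, addr.exploded


print(normalize_destination("44:38:39:00:00:38"))  # (2, '44:38:39:00:00:38')
print(normalize_destination("10.10.10.63"))        # (3, '10.10.10.63')
print(normalize_destination("2001:db8::1"))
# (3, '2001:0db8:0000:0000:0000:0000:0000:0001')
```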

        Known Addresses

        The tracing function only knows about previously learned addresses. If you find that a path is invalid or incomplete, ping the identified device so that its address becomes known.

        Create On-demand Traces

        You can view the current connectivity between two devices in your network by creating an on-demand trace. You can perform these traces at layer 2 or layer 3 using the NetQ UI or the NetQ CLI.

        Create a Layer 3 On-demand Trace Request

        It is helpful to verify the connectivity between two devices when you suspect an issue is preventing proper communication between them. If you cannot find a layer 3 path, you might also try checking connectivity through a layer 2 path.

        1. Determine the IP addresses of the two devices you want to trace.

          1. Click Menu, then select IP addresses.

          2. Select Filter and enter a hostname.

          3. From the list of results, note the relevant address.

          4. Filter the list again for the other hostname, and note its address.

        2. Open the Trace Request card.

          • On a new workbench: Type trace in the Global search field and select the card.
          • On a current workbench: Click Add card, then select the Trace card.
        3. In the Source field, enter the hostname or IP address of the device where you want to start the trace.

        4. In the Destination field, enter the IP address of the device where you want to end the trace.

        If you mistype an address, you must double-click it, or backspace over the error, and retype the address. You cannot select the address by dragging over it as this action attempts to move the card to another location.

        5. Click Run now. A corresponding Trace Results card opens on your workbench.

        Use the netq trace command to view the results in the terminal window. Use the netq add trace command to view the results in the NetQ UI.

        To create a layer 3 on-demand trace and see the results in the terminal window, run:

        netq trace <ip> from (<src-hostname>|<ip-src>) [vrf <vrf>] [around <text-time>] [json|detail|pretty] [debug]
        

        Note the syntax requires the destination device address first and then the source device address or hostname.
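
If you generate trace commands from a script, a small helper can keep the destination-first argument order straight. The sketch below only assembles the command from the syntax shown above; the function and parameter names are hypothetical, and it does not run the command or contact NetQ.

```python
def build_trace_command(destination_ip, source, vrf=None, around=None,
                        output=None, debug=False):
    """Build a `netq trace` argument list (hypothetical helper)."""
    if output not in (None, "json", "detail", "pretty"):
        raise ValueError("output must be json, detail, or pretty")
    # Destination comes first, then the source hostname or IP address.
    args = ["netq", "trace", destination_ip, "from", source]
    if vrf:
        args += ["vrf", vrf]
    if around:
        args += ["around", around]
    if output:
        args.append(output)
    if debug:
        args.append("debug")
    return args


cmd = build_trace_command("10.10.10.63", "leaf01", output="pretty")
print(" ".join(cmd))  # netq trace 10.10.10.63 from leaf01 pretty
```

The returned list can be passed to a process runner such as `subprocess.run` on a host where the NetQ CLI is installed.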

        This example shows a trace from 10.10.10.1 (source, leaf01) to 10.10.10.63 (destination, border01) on the underlay in pretty output. You could have used leaf01 as the source instead of its IP address. The example first identifies the addresses for the source and destination devices using netq show ip addresses, then runs the trace.

        cumulus@switch:~$ netq border01 show ip addresses
        
        Matching address records:
        Address                   Hostname          Interface                 VRF             Last Changed
        ------------------------- ----------------- ------------------------- --------------- -------------------------
        192.168.200.63/24         border01          eth0                                      Tue Nov  3 15:45:31 2020
        10.0.1.254/32             border01          lo                        default         Mon Nov  2 22:28:54 2020
        10.10.10.63/32            border01          lo                        default         Mon Nov  2 22:28:54 2020
        
        cumulus@switch:~$ netq trace 10.10.10.63 from 10.10.10.1 pretty
        Number of Paths: 12
        Number of Paths with Errors: 0
        Number of Paths with Warnings: 0
        Path MTU: 9216
        
         leaf01 swp54 -- swp1 spine04 swp6 -- swp54 border02 peerlink.4094 -- peerlink.4094 border01 lo
                                                             peerlink.4094 -- peerlink.4094 border01 lo
         leaf01 swp53 -- swp1 spine03 swp6 -- swp53 border02 peerlink.4094 -- peerlink.4094 border01 lo
                                                             peerlink.4094 -- peerlink.4094 border01 lo
         leaf01 swp52 -- swp1 spine02 swp6 -- swp52 border02 peerlink.4094 -- peerlink.4094 border01 lo
                                                             peerlink.4094 -- peerlink.4094 border01 lo
         leaf01 swp51 -- swp1 spine01 swp6 -- swp51 border02 peerlink.4094 -- peerlink.4094 border01 lo
                                                             peerlink.4094 -- peerlink.4094 border01 lo
         leaf01 swp54 -- swp1 spine04 swp5 -- swp54 border01 lo
         leaf01 swp53 -- swp1 spine03 swp5 -- swp53 border01 lo
         leaf01 swp52 -- swp1 spine02 swp5 -- swp52 border01 lo
         leaf01 swp51 -- swp1 spine01 swp5 -- swp51 border01 lo
        

        Each row of the pretty output shows one of the 12 available paths, with each path described by hops using the following format:

        source hostname and source egress port -- ingress port of first hop and device hostname and egress port -- n*(ingress port of next hop and device hostname and egress port) -- ingress port of destination device hostname

        In this example, eight of the 12 paths use four hops to reach the destination and the other four use three hops. The overall MTU for all paths is 9216. No errors or warnings are present on any of the paths.
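
The row format described above can be split mechanically. The following Python sketch extracts the device hostnames from one complete row of pretty output and counts them the same way the example counts hops. The function name is hypothetical. The sketch assumes a full row that starts with the source hostname; it does not handle the indented continuation rows that begin with an ingress port.

```python
def devices_in_path(row: str):
    """Return the device hostnames along one complete pretty-output row."""
    segments = [seg.split() for seg in row.strip().split(" -- ")]
    # First segment is "<hostname> <egress port>"; every later segment is
    # "<ingress port> <hostname> <egress port or interface>".
    hosts = [segments[0][0]]
    hosts += [seg[1] for seg in segments[1:]]
    return hosts


row = " leaf01 swp54 -- swp1 spine04 swp5 -- swp54 border01 lo"
print(devices_in_path(row))       # ['leaf01', 'spine04', 'border01']
print(len(devices_in_path(row)))  # 3
```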

        To create a layer 3 on-demand trace and see the results in the On-demand Trace Results card, run:

        netq add trace <ip> from (<src-hostname> | <ip-src>) [alert-on-failure]
        

        This example shows a trace from 10.10.10.1 (source, leaf01) to 10.10.10.63 (destination, border01).

        cumulus@switch:~$ netq add trace 10.10.10.63 from 10.10.10.1
        Running job None src 10.10.10.1 dst 10.10.10.63
        

        Create a Layer 3 On-demand Trace Through a Given VRF

        To create the trace request:

        1. Follow steps 1 through 4 as outlined in the previous section.

        2. In the VRF field, enter the identifier for the VRF associated with these devices.

        3. Click Run now. A corresponding Trace Results card opens on your workbench.

        Use the netq trace command to view the results in the terminal window. Use the netq add trace command to view the results in the NetQ UI.

        To create a layer 3 on-demand trace through a given VRF and see the results in the terminal window, run:

        netq trace <ip> from (<src-hostname>|<ip-src>) [vrf <vrf>] [around <text-time>] [json|detail|pretty] [debug]
        

        Note the syntax requires the destination device address first and then the source device address or hostname.

        This example shows a trace from 10.1.10.101 (source, server01) to 10.1.10.104 (destination, server04) through VRF RED in detail output. It first identifies the addresses for the source and destination devices and a VRF between them using netq show ip addresses, then runs the trace. Note that the VRF name is case sensitive. The trace job might take some time to compile all the available paths, especially if there are many of them.

        cumulus@switch:~$ netq server01 show ip addresses
        Matching address records:
        Address                   Hostname          Interface                 VRF             Last Changed
        ------------------------- ----------------- ------------------------- --------------- -------------------------
        192.168.200.31/24         server01          eth0                      default         Tue Nov  3 19:50:21 2020
        10.1.10.101/24            server01          uplink                    default         Tue Nov  3 19:50:21 2020
        
        cumulus@switch:~$ netq server04 show ip addresses
        Matching address records:
        Address                   Hostname          Interface                 VRF             Last Changed
        ------------------------- ----------------- ------------------------- --------------- -------------------------
        10.1.10.104/24            server04          uplink                    default         Tue Nov  3 19:50:23 2020
        192.168.200.34/24         server04          eth0                      default         Tue Nov  3 19:50:23 2020
        
        cumulus@switch:~$ netq trace 10.1.10.104 from 10.1.10.101 vrf RED
        Number of Paths: 16
        Number of Paths with Errors: 0
        Number of Paths with Warnings: 0
        Path MTU: 9000
        
        Id  Hop Hostname    InPort          InTun, RtrIf    OutRtrIf, Tun   OutPort
        --- --- ----------- --------------- --------------- --------------- ---------------
        1   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp54
            3   spine04     swp2            swp2            swp4            swp4
            4   leaf04      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        2   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp54
            3   spine04     swp2            swp2            swp3            swp3
            4   leaf03      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        3   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp53
            3   spine03     swp2            swp2            swp4            swp4
            4   leaf04      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        4   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp53
            3   spine03     swp2            swp2            swp3            swp3
            4   leaf03      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        5   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp52
            3   spine02     swp2            swp2            swp4            swp4
            4   leaf04      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        6   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp52
            3   spine02     swp2            swp2            swp3            swp3
            4   leaf03      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        7   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp51
            3   spine01     swp2            swp2            swp4            swp4
            4   leaf04      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        8   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp51
            3   spine01     swp2            swp2            swp3            swp3
            4   leaf03      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        9   1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp54
            3   spine04     swp1            swp1            swp4            swp4
            4   leaf04      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        10  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp54
            3   spine04     swp1            swp1            swp3            swp3
            4   leaf03      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        11  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp53
            3   spine03     swp1            swp1            swp4            swp4
            4   leaf04      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        12  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp53
            3   spine03     swp1            swp1            swp3            swp3
            4   leaf03      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        13  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp52
            3   spine02     swp1            swp1            swp4            swp4
            4   leaf04      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        14  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp52
            3   spine02     swp1            swp1            swp3            swp3
            4   leaf03      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        15  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp51
            3   spine01     swp1            swp1            swp4            swp4
            4   leaf04      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        16  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp51
            3   spine01     swp1            swp1            swp3            swp3
            4   leaf03      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        

        To create a layer 3 on-demand trace and see the results in the On-demand Trace Results card, run:

        netq add trace <ip> from (<src-hostname> | <ip-src>) vrf <vrf>
        

        This example shows a trace from 10.1.10.101 (source, server01) to 10.1.10.104 (destination, server04) through VRF RED.

        cumulus@switch:~$ netq add trace 10.1.10.104 from 10.1.10.101 vrf RED
        

        Create a Layer 2 On-demand Trace

        It is helpful to verify the connectivity between two devices when you suspect an issue is preventing proper communication between them. If you cannot find a layer 2 path, try checking connectivity through a layer 3 path.

        To create a layer 2 trace request:

        Follow steps 1 through 4 as outlined in the previous section.

        1. In the VLAN ID field, enter the identifier for the VLAN associated with the destination.

        2. Click Run Now. A corresponding Trace Results card is opened on your workbench.

        Use the netq trace command to view on-demand trace results in the terminal window.

        To create a layer 2 on-demand trace and see the results in the terminal window, run:

        netq trace (<mac> vlan <1-4096>) from <mac-src> [around <text-time>] [json|detail|pretty] [debug]
        

        Note the syntax requires the destination device address first and then the source device address or hostname.
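
The argument order in the syntax above is easy to get backwards when scripting. The following is a minimal Python sketch, not part of NetQ, that assembles the arguments for a layer 2 on-demand trace with the destination first. The helper name build_l2_trace and the MAC address pattern are assumptions made for illustration; the VLAN bounds are taken from the syntax line.

```python
import re

# Hypothetical helper, not part of NetQ: assembles the argument list for a
# layer 2 on-demand trace in the order the documented syntax requires.
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def build_l2_trace(dst_mac: str, vlan: int, src_mac: str) -> list[str]:
    """Return argv for `netq trace <mac> vlan <vlan> from <mac-src>`."""
    for mac in (dst_mac, src_mac):
        if not MAC_RE.match(mac):
            raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    # The syntax line documents the accepted VLAN range as 1-4096.
    if not 1 <= vlan <= 4096:
        raise ValueError(f"VLAN out of documented range: {vlan}")
    # Destination first, then the source after the `from` keyword.
    return ["netq", "trace", dst_mac, "vlan", str(vlan), "from", src_mac]


if __name__ == "__main__":
    # Destination is server04, source is server01.
    print(" ".join(build_l2_trace("44:38:39:00:00:3e", 10, "44:38:39:00:00:32")))
```

Returning a list rather than a single string lets the caller pass the arguments to a process launcher without additional shell quoting.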

        This example shows a trace from 44:38:39:00:00:32 (source, server01) to 44:38:39:00:00:3e (destination, server04) through VLAN 10 in detail output. It first identifies the MAC addresses for the two devices using netq show ip neighbors. Then it determines the VLAN using netq show macs. Then it runs the trace.

        cumulus@switch:~$ netq show ip neighbors
        Matching neighbor records:
        IP Address                Hostname          Interface                 MAC Address        VRF             Remote Last Changed
        ------------------------- ----------------- ------------------------- ------------------ --------------- ------ -------------------------
        ...
        192.168.200.1             server04          eth0                      44:38:39:00:00:6d  default         no     Tue Nov  3 19:50:23 2020
        10.1.10.1                 server04          uplink                    00:00:00:00:00:1a  default         no     Tue Nov  3 19:50:23 2020
        10.1.10.101               server04          uplink                    44:38:39:00:00:32  default         no     Tue Nov  3 19:50:23 2020
        10.1.10.2                 server04          uplink                    44:38:39:00:00:5d  default         no     Tue Nov  3 19:50:23 2020
        10.1.10.3                 server04          uplink                    44:38:39:00:00:5e  default         no     Tue Nov  3 19:50:23 2020
        192.168.200.250           server04          eth0                      44:38:39:00:01:80  default         no     Tue Nov  3 19:50:23 2020
        192.168.200.1             server03          eth0                      44:38:39:00:00:6d  default         no     Tue Nov  3 19:50:22 2020
        192.168.200.250           server03          eth0                      44:38:39:00:01:80  default         no     Tue Nov  3 19:50:22 2020
        192.168.200.1             server02          eth0                      44:38:39:00:00:6d  default         no     Tue Nov  3 19:50:22 2020
        10.1.20.1                 server02          uplink                    00:00:00:00:00:1b  default         no     Tue Nov  3 19:50:22 2020
        10.1.20.2                 server02          uplink                    44:38:39:00:00:59  default         no     Tue Nov  3 19:50:22 2020
        10.1.20.3                 server02          uplink                    44:38:39:00:00:37  default         no     Tue Nov  3 19:50:22 2020
        10.1.20.105               server02          uplink                    44:38:39:00:00:40  default         no     Tue Nov  3 19:50:22 2020
        192.168.200.250           server02          eth0                      44:38:39:00:01:80  default         no     Tue Nov  3 19:50:22 2020
        192.168.200.1             server01          eth0                      44:38:39:00:00:6d  default         no     Tue Nov  3 19:50:21 2020
        10.1.10.1                 server01          uplink                    00:00:00:00:00:1a  default         no     Tue Nov  3 19:50:21 2020
        10.1.10.2                 server01          uplink                    44:38:39:00:00:59  default         no     Tue Nov  3 19:50:21 2020
        10.1.10.3                 server01          uplink                    44:38:39:00:00:37  default         no     Tue Nov  3 19:50:21 2020
        10.1.10.104               server01          uplink                    44:38:39:00:00:3e  default         no     Tue Nov  3 19:50:21 2020
        192.168.200.250           server01          eth0                      44:38:39:00:01:80  default         no     Tue Nov  3 19:50:21 2020
        ...
        
        cumulus@switch:~$ netq show macs
        Matching mac records:
        Origin MAC Address        VLAN   Hostname          Egress Port                    Remote Last Changed
        ------ ------------------ ------ ----------------- ------------------------------ ------ -------------------------
        yes    44:38:39:00:00:5e  4002   leaf04            bridge                         no     Fri Oct 30 22:29:16 2020
        no     46:38:39:00:00:46  20     leaf04            bond2                          no     Fri Oct 30 22:29:16 2020
        no     44:38:39:00:00:5d  30     leaf04            peerlink                       no     Fri Oct 30 22:29:16 2020
        yes    00:00:00:00:00:1a  10     leaf04            bridge                         no     Fri Oct 30 22:29:16 2020
        yes    44:38:39:00:00:5e  20     leaf04            bridge                         no     Fri Oct 30 22:29:16 2020
        yes    7e:1a:b3:4f:05:b8  20     leaf04            vni20                          no     Fri Oct 30 22:29:16 2020
        ...
        no     46:38:39:00:00:3e  10     leaf01            vni10                          yes    Fri Oct 30 22:28:50 2020
        ...
        yes    44:38:39:00:00:4d  4001   border01          bridge                         no     Fri Oct 30 22:28:53 2020
        yes    7a:4a:c7:bb:48:27  4001   border01          vniRED                         no     Fri Oct 30 22:28:53 2020
        yes    ce:93:1d:e3:08:1b  4002   border01          vniBLUE                        no     Fri Oct 30 22:28:53 2020
        
        cumulus@switch:~$ netq trace 44:38:39:00:00:3e vlan 10 from 44:38:39:00:00:32
        Number of Paths: 16
        Number of Paths with Errors: 0
        Number of Paths with Warnings: 0
        Path MTU: 9000
        
        Id  Hop Hostname    InPort          InTun, RtrIf    OutRtrIf, Tun   OutPort
        --- --- ----------- --------------- --------------- --------------- ---------------
        1   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp54
            3   spine04     swp2            swp2            swp4            swp4
            4   leaf04      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        2   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp54
            3   spine04     swp2            swp2            swp3            swp3
            4   leaf03      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        3   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp53
            3   spine03     swp2            swp2            swp4            swp4
            4   leaf04      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        4   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp53
            3   spine03     swp2            swp2            swp3            swp3
            4   leaf03      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        5   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp52
            3   spine02     swp2            swp2            swp4            swp4
            4   leaf04      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        6   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp52
            3   spine02     swp2            swp2            swp3            swp3
            4   leaf03      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        7   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp51
            3   spine01     swp2            swp2            swp4            swp4
            4   leaf04      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        8   1   server01                                                    mac:44:38:39:00
                                                                            :00:38
            2   leaf02      swp1                            vni: 10         swp51
            3   spine01     swp2            swp2            swp3            swp3
            4   leaf03      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        9   1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp54
            3   spine04     swp1            swp1            swp4            swp4
            4   leaf04      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        10  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp54
            3   spine04     swp1            swp1            swp3            swp3
            4   leaf03      swp54           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        11  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp53
            3   spine03     swp1            swp1            swp4            swp4
            4   leaf04      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        12  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp53
            3   spine03     swp1            swp1            swp3            swp3
            4   leaf03      swp53           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        13  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp52
            3   spine02     swp1            swp1            swp4            swp4
            4   leaf04      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        14  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp52
            3   spine02     swp1            swp1            swp3            swp3
            4   leaf03      swp52           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        15  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp51
            3   spine01     swp1            swp1            swp4            swp4
            4   leaf04      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
        16  1   server01                                                    mac:44:38:39:00
                                                                            :00:32
            2   leaf01      swp1                            vni: 10         swp51
            3   spine01     swp1            swp1            swp3            swp3
            4   leaf03      swp51           vni: 10                         bond1
            5   server04    uplink
        --- --- ----------- --------------- --------------- --------------- ---------------
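
The four summary lines at the top of the text output are often all a script needs. The following is a minimal Python sketch, not part of NetQ, that reads those lines into a dictionary. The function names are hypothetical. The trace syntax also lists a json option, but its structure is not shown in this guide, so the sketch parses only the text form shown above.

```python
import re

# Header lines copied from the trace output shown in this section.
SAMPLE = """\
Number of Paths: 16
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9000
"""


def parse_trace_header(text: str) -> dict[str, int]:
    """Collect the leading `Label: <integer>` lines of text-mode trace output."""
    summary: dict[str, int] = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*([A-Za-z ]+):\s*(\d+)\s*", line)
        if match is None:
            # Once the header has started, the first non-matching line
            # (a blank line or the path table) ends it.
            if summary:
                break
            continue
        summary[match.group(1).strip()] = int(match.group(2))
    return summary


def has_clean_path(summary: dict[str, int]) -> bool:
    """True when at least one path exists and no path reports an error."""
    return (
        summary.get("Number of Paths", 0) > 0
        and summary.get("Number of Paths with Errors", 0) == 0
    )


if __name__ == "__main__":
    header = parse_trace_header(SAMPLE)
    print(header, has_clean_path(header))
```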
        

        Use the netq add trace command to view on-demand trace results in the NetQ UI.

        To create a layer 2 on-demand trace and see the results in the On-demand Trace Results card, run:

        netq add trace <mac> vlan <1-4096> from <mac-src>
        

        This example shows a trace from 44:38:39:00:00:32 (source, server01) to 44:38:39:00:00:3e (destination, server04) through VLAN 10.

        cumulus@switch:~$ netq add trace 44:38:39:00:00:3e vlan 10 from 44:38:39:00:00:32
        

        View On-demand Trace Results

        After you have started an on-demand trace or run the netq add trace command, the results appear in either the UI or CLI. In the CLI, run the netq show trace results command. In the UI, locate the On-demand Trace Results card:

        After you click Run Now, the corresponding results card opens on your workbench. While the trace is running, a notice on the card indicates that it is in progress. When the trace finishes, the card displays the results:

        To view additional information, expand the card to its largest size and click on a trace. From this screen, you can view configuration details, error and warning messages, and granular data for individual paths.

        Create Scheduled Traces

        There might be paths through your network that you consider critical or particularly important to your everyday operations. In these cases, it might be useful to create one or more traces to periodically confirm that at least one path is available between the two relevant devices. You can create scheduled traces at layer 2 or layer 3 in your network, from the NetQ UI and the NetQ CLI.

        Create a Layer 3 Scheduled Trace

        To schedule a trace:

        Follow steps 1 through 4 as outlined in the previous section.

        1. Select a timeframe under Schedule to specify how often you want to run the trace.

        2. Accept the default starting time, or click in the Starting field to specify the day you want the trace to run for the first time.

        3. Verify your entries are correct, then click Save as new.

        4. Provide a name for the trace. Note: This name must be unique for a given user.

        5. Click Save.

          You can now run this trace on demand by selecting it from the dropdown list, or wait for it to run on its defined schedule.

        To create a layer 3 scheduled trace and see the results in the Scheduled Trace Results card, run:

        netq add trace name <text-new-trace-name> <ip> from (<src-hostname>|<ip-src>) interval <text-time-min>
        

        This example shows the creation of a scheduled trace between leaf01 (source, 10.10.10.1) and border01 (destination, 10.10.10.63) with a name of Lf01toBor01Daily that runs on a daily basis. The interval option value is 1440 minutes, as denoted by the units indicator (m).

        cumulus@switch:~$ netq add trace name Lf01toBor01Daily 10.10.10.63 from 10.10.10.1 interval 1440m
        Successfully added/updated Lf01toBor01Daily running every 1440m
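
Because the interval option takes its value in minutes, a schedule stated in hours or days has to be converted first. The following is a minimal Python sketch of that arithmetic. The helper name interval_arg is hypothetical and is not part of NetQ.

```python
def interval_arg(*, days: int = 0, hours: int = 0, minutes: int = 0) -> str:
    """Convert a schedule to a minutes value with the `m` units indicator."""
    total = days * 24 * 60 + hours * 60 + minutes
    if total <= 0:
        raise ValueError("interval must be at least one minute")
    return f"{total}m"


if __name__ == "__main__":
    print(interval_arg(days=1))   # daily schedule
    print(interval_arg(hours=1))  # hourly schedule
    print(interval_arg(hours=3))  # every three hours
```

A daily schedule works out to 24 x 60 = 1440 minutes, which matches the value used in the example above.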
        

        View the results in the NetQ UI.

        Create a Layer 3 Scheduled Trace through a Given VRF

        To schedule a trace from the NetQ UI:

        Follow steps 1 through 4 as outlined in the previous section.

        1. Enter a VRF interface if you are using anything other than the default VRF.

        2. Select a timeframe under Schedule to specify how often you want to run the trace.

        3. Accept the default starting time, or click in the Starting field to specify the day you want the trace to run for the first time.

        4. Verify your entries are correct, then click Save as new.

        5. Provide a name for the trace. Note: This name must be unique for a given user.

        6. Click Save.

          You can now run this trace on demand by selecting it from the dropdown list, or wait for it to run on its defined schedule.

        To create a layer 3 scheduled trace that uses a VRF other than default and then see the results in the Scheduled Trace Results card, run:

        netq add trace name <text-new-trace-name> <ip> from (<src-hostname>|<ip-src>) vrf <vrf> interval <text-time-min>
        

        This example shows the creation of a scheduled trace between server01 (source, 10.1.10.101) and server04 (destination, 10.1.10.104) through VRF RED with a name of Svr01toSvr04Hrly that runs on an hourly basis. The interval option value is 60 minutes, as denoted by the units indicator (m).

        cumulus@switch:~$ netq add trace name Svr01toSvr04Hrly 10.1.10.104 from 10.1.10.101 vrf RED interval 60m
        Successfully added/updated Svr01toSvr04Hrly running every 60m
        

        View the results in the NetQ UI.

        Create a Layer 2 Scheduled Trace

        To schedule a layer 2 trace:

        Follow steps 1 through 4 as outlined in the previous section.

        1. In the VLAN field, enter the VLAN ID associated with the destination device.

        2. Select a timeframe under Schedule to specify how often you want to run the trace.

        3. Accept the default starting time, or click in the Starting field to specify the day you want the trace to run for the first time.

        4. Verify your entries are correct, then click Save as new.

        5. Provide a name for the trace. Note: This name must be unique for a given user.

        6. Click Save.

          You can now run this trace on demand by selecting it from the dropdown list, or wait for it to run on its defined schedule.

        To create a layer 2 scheduled trace and then see the results in the Scheduled Trace Results card, run:

        netq add trace name <text-new-trace-name> <mac> vlan <1-4096> from (<src-hostname> | <ip-src>) [vrf <vrf>] interval <text-time-min>
        

        This example shows the creation of a scheduled trace between server01 (source, 10.1.10.101) and server04 (destination, 44:38:39:00:00:3e) on VLAN 10 with a name of Svr01toSvr04x3Hrs that runs every three hours. The interval option value is 180 minutes, as denoted by the units indicator (m).

        cumulus@switch:~$ netq add trace name Svr01toSvr04x3Hrs 44:38:39:00:00:3e vlan 10 from 10.1.10.101 interval 180m
        Successfully added/updated Svr01toSvr04x3Hrs running every 180m
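
The three scheduled trace syntaxes in this section differ only in the optional vlan and vrf keywords. The following is a minimal Python sketch, not part of NetQ, that renders any of the three from one record. The ScheduledTrace class and its field names are hypothetical. The keyword order follows the syntax lines above.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScheduledTrace:
    """Hypothetical record describing one scheduled trace."""

    name: str
    destination: str            # IP address (layer 3) or MAC address (layer 2)
    source: str                 # source hostname or IP address
    interval_minutes: int
    vlan: Optional[int] = None  # set only for a layer 2 trace
    vrf: Optional[str] = None   # set only when not using the default VRF

    def command(self) -> str:
        parts = ["netq", "add", "trace", "name", self.name, self.destination]
        if self.vlan is not None:
            parts += ["vlan", str(self.vlan)]
        parts += ["from", self.source]
        if self.vrf is not None:
            parts += ["vrf", self.vrf]
        parts += ["interval", f"{self.interval_minutes}m"]
        # shlex.join applies shell quoting only where an argument needs it.
        return shlex.join(parts)


if __name__ == "__main__":
    trace = ScheduledTrace(
        "Svr01toSvr04x3Hrs", "44:38:39:00:00:3e", "10.1.10.101", 180, vlan=10
    )
    print(trace.command())
```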
        

        View the results in the NetQ UI.

        View Scheduled Trace Results

        The results of scheduled traces are displayed on the Scheduled Trace Results card. To view the results:

        1. Locate the Scheduled Trace Request card on your workbench and expand it to its largest size:

        2. Select the scheduled trace results you want to view. Above the table, select Open card. This opens the medium Scheduled Trace Results card(s) for the selected items.

        View a Summary of All Scheduled Traces

        You can view a summary of all scheduled traces using the netq show trace summary command. The summary displays the name of the trace, a job ID, status, and timestamps for when the trace started and when it completed.

        This example shows all scheduled traces run in the last 24 hours.

        cumulus@switch:~$ netq show trace summary
        Name            Job ID       Status           Status Details               Start Time           End Time
        --------------- ------------ ---------------- ---------------------------- -------------------- ----------------
        leaf01toborder0 f8d6a2c5-54d Complete         0                            Fri Nov  6 15:04:54  Fri Nov  6 15:05
        1               b-44a8-9a5d-                                               2020                 :21 2020
                        9d31f4e4701d
        New Trace       0e65e196-ac0 Complete         1                            Fri Nov  6 15:04:48  Fri Nov  6 15:05
                        5-49d7-8c81-                                               2020                 :03 2020
                        6e6691e191ae
        Svr01toSvr04Hrl 4c580c97-8af Complete         0                            Fri Nov  6 15:01:16  Fri Nov  6 15:01
        y               8-4ea2-8c09-                                               2020                 :44 2020
                        038cde9e196c
        Abc             c7174fad-71c Complete         1                            Fri Nov  6 14:57:18  Fri Nov  6 14:58
                        a-49d3-8c1d-                                               2020                 :11 2020
                        67962039ebf9
        Lf01toBor01Dail f501f9b0-cca Complete         0                            Fri Nov  6 14:52:35  Fri Nov  6 14:57
        y               3-4fa1-a60d-                                               2020                 :55 2020
                        fb6f495b7a0e
        L01toB01Daily   38a75e0e-7f9 Complete         0                            Fri Nov  6 14:50:23  Fri Nov  6 14:57
                        9-4e0c-8449-                                               2020                 :38 2020
                        f63def1ab726
        leaf01toborder0 f8d6a2c5-54d Complete         0                            Fri Nov  6 14:34:54  Fri Nov  6 14:57
        1               b-44a8-9a5d-                                               2020                 :20 2020
                        9d31f4e4701d
        leaf01toborder0 f8d6a2c5-54d Complete         0                            Fri Nov  6 14:04:54  Fri Nov  6 14:05
        1               b-44a8-9a5d-                                               2020                 :20 2020
                        9d31f4e4701d
        New Trace       0e65e196-ac0 Complete         1                            Fri Nov  6 14:04:48  Fri Nov  6 14:05
                        5-49d7-8c81-                                               2020                 :02 2020
                        6e6691e191ae
        Svr01toSvr04Hrl 4c580c97-8af Complete         0                            Fri Nov  6 14:01:16  Fri Nov  6 14:01
        y               8-4ea2-8c09-                                               2020                 :43 2020
                        038cde9e196c
        ...
        L01toB01Daily   38a75e0e-7f9 Complete         0                            Thu Nov  5 15:50:23  Thu Nov  5 15:58
                        9-4e0c-8449-                                               2020                 :22 2020
                        f63def1ab726
        leaf01toborder0 f8d6a2c5-54d Complete         0                            Thu Nov  5 15:34:54  Thu Nov  5 15:58
        1               b-44a8-9a5d-                                               2020                 :03 2020
                        9d31f4e4701d
        

        View Scheduled Trace Settings for a Given Trace

        You can view the configuration settings used by a given scheduled trace using the netq show trace settings command.

        This example shows the settings for the scheduled trace named Lf01toBor01Daily.

        cumulus@switch:~$ netq show trace settings name Lf01toBor01Daily
        

        View Scheduled Trace Results for a Given Trace

        You can view the results for a given scheduled trace using the netq show trace results command.

        This example obtains the job ID for the trace named Lf01toBor01Daily, then shows the results.

        cumulus@switch:~$ netq show trace summary name Lf01toBor01Daily json
        cumulus@switch:~$ netq show trace results f501f9b0-cca3-4fa1-a60d-fb6f495b7a0e
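
The two-step lookup above can also be scripted. The following is a minimal Python sketch, assuming the JSON printed by netq show trace summary is a list of objects that carry trace_name and jobid fields, as in the JSON example in this guide. The helper name job_ids_for_trace is hypothetical and is not part of NetQ.

```python
import json
from typing import List


def job_ids_for_trace(summary_json: str, trace_name: str) -> List[str]:
    """Return the distinct job IDs recorded for one scheduled trace.

    summary_json is assumed to be the text printed by
    `netq show trace summary name <trace-name> json`.
    """
    seen: List[str] = []
    for entry in json.loads(summary_json):
        jobid = entry.get("jobid")
        # Keep the first occurrence of each job ID for the requested trace.
        if entry.get("trace_name") == trace_name and jobid and jobid not in seen:
            seen.append(jobid)
    return seen


# Sample data that reuses the field names and IDs shown in this guide.
sample = json.dumps([
    {"jobid": "f501f9b0-cca3-4fa1-a60d-fb6f495b7a0e", "trace_name": "Lf01toBor01Daily"},
    {"jobid": "f501f9b0-cca3-4fa1-a60d-fb6f495b7a0e", "trace_name": "Lf01toBor01Daily"},
    {"jobid": "c7174fad-71ca-49d3-8c1d-67962039ebf9", "trace_name": "Abc"},
])
print(job_ids_for_trace(sample, "Lf01toBor01Daily"))
```

The returned ID is the value you pass to netq show trace results.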
        

        Modify a Scheduled Trace

        You can modify scheduled traces at any time as described below. An administrator can also manage scheduled traces through the NetQ management dashboard.

        Be aware that changing the configuration of a trace can cause the results to be inconsistent with prior runs of the trace. If this is unacceptable, create a new scheduled trace. Optionally, you can remove the original trace.

        To modify a scheduled trace:

        1. Open the Trace Request card.

        2. Select the trace from the New trace request dropdown.

        3. Edit the schedule, VLAN, or VRF and select Update.

        4. From the confirmation dialog, click Yes to complete the changes or select the link to change the name of the previous version of this scheduled trace.

          The trace can now be selected from the New Trace listing and run immediately by selecting Go or Run now. Alternatively, you can wait for it to run the first time according to the schedule you specified.

        Remove Scheduled Traces

        If you have reached the maximum of 15 scheduled traces for your premises, you will need to remove traces to create additional ones.

        Both a standard user and an administrative user can remove scheduled traces. No notification is generated on removal. Be sure to communicate with other users before removing a scheduled trace to avoid confusion and support issues.

        1. Open the Trace Request card and expand the card to the largest size.

        2. Select one or more traces.

        3. Above the table, select Delete.

        To remove a scheduled trace using the NetQ CLI:

        1. Find the name of the scheduled trace you want to remove:

          netq show trace summary [name <text-trace-name>] [around <text-time-hr>] [json]
          

          The following example shows all scheduled traces in JSON format:

          cumulus@switch:~$ netq show trace summary json
          [
              {
                  "job_end_time": 1605300327131,
                  "job_req_time": 1604424893944,
                  "job_start_time": 1605300318198,
                  "jobid": "f8d6a2c5-54db-44a8-9a5d-9d31f4e4701d",
                  "status": "Complete",
                  "status_details": "1",
                  "trace_name": "leaf01toborder01",
                  "trace_params": {
                      "alert_on_failure": "0",
                      "dst": "10.10.10.63",
                      "src": "10.10.10.1",
                      "vlan": "-1",
                      "vrf": ""
                  }
              },
              {
                  "job_end_time": 1605300237448,
                  "job_req_time": 1604424893944,
                  "job_start_time": 1605300206939,
                  "jobid": "f8d6a2c5-54db-44a8-9a5d-9d31f4e4701d",
                  "status": "Complete",
                  "status_details": "1",
                  "trace_name": "leaf01toborder01",
                  "trace_params": {
                      "alert_on_failure": "0",
                      "dst": "10.10.10.63",
                      "src": "10.10.10.1",
                      "vlan": "-1",
                      "vrf": ""
                  }
              },
              {
                  "job_end_time": 1605300223824,
                  "job_req_time": 1604599038706,
                  "job_start_time": 1605300206930,
                  "jobid": "c7174fad-71ca-49d3-8c1d-67962039ebf9",
                  "status": "Complete",
                  "status_details": "1",
                  "trace_name": "Abc",
                  "trace_params": {
                      "alert_on_failure": "1",
                      "dst": "27.0.0.2",
                      "src": "27.0.0.1",
                      "vlan": "-1",
                      "vrf": ""
                  }
              },
              {
                  "job_end_time": 1605300233045,
                  "job_req_time": 1604519423182,
                  "job_start_time": 1605300206930,
                  "jobid": "38a75e0e-7f99-4e0c-8449-f63def1ab726",
                  "status": "Complete",
                  "status_details": "1",
                  "trace_name": "L01toB01Daily",
                  "trace_params": {
                      "alert_on_failure": "0",
                      "dst": "10.10.10.63",
                      "src": "10.10.10.1",
                      "vlan": "-1",
                      "vrf": ""
                  }
              },
          ...
          
        2. To remove the trace, run:

          netq del trace <text-trace-name>
          

          This example removes the leaf01toborder01 trace.

          cumulus@switch:~$ netq del trace leaf01toborder01
          Successfully deleted schedule trace leaf01toborder01
          
        3. Repeat these steps to remove additional traces.
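
To judge how close a premises is to the limit of 15 scheduled traces, you can count the distinct trace names in the JSON summary. This is a minimal Python sketch, not a NetQ feature. It assumes that every scheduled trace appears at least once in the summary output, which might not hold for a trace that has never run. The function names are hypothetical.

```python
import json

MAX_SCHEDULED_TRACES = 15  # limit per premises stated in this guide


def scheduled_trace_names(summary_json):
    """Return the sorted, distinct trace names found in the summary output."""
    return sorted({entry["trace_name"] for entry in json.loads(summary_json)})


def remaining_slots(summary_json):
    """Estimate how many more scheduled traces can be created (assumption: see lead-in)."""
    return max(0, MAX_SCHEDULED_TRACES - len(scheduled_trace_names(summary_json)))


# Trimmed entries that reuse names and job IDs from the example output above.
sample = json.dumps([
    {"jobid": "f8d6a2c5-54db-44a8-9a5d-9d31f4e4701d", "trace_name": "leaf01toborder01"},
    {"jobid": "f8d6a2c5-54db-44a8-9a5d-9d31f4e4701d", "trace_name": "leaf01toborder01"},
    {"jobid": "c7174fad-71ca-49d3-8c1d-67962039ebf9", "trace_name": "Abc"},
    {"jobid": "38a75e0e-7f99-4e0c-8449-f63def1ab726", "trace_name": "L01toB01Daily"},
])
print(scheduled_trace_names(sample))
print(remaining_slots(sample))
```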

        Troubleshoot NetQ

        This page describes how to generate a support file for the NVIDIA support team to help troubleshoot issues with NetQ itself.

        Browse Configuration and Log Files

        The following configuration and log files contain information that can help with troubleshooting:

        | File | Description |
        | --- | --- |
        | /etc/netq/netq.yml | The NetQ configuration file. This file appears only if you installed either the netq-apps package or the NetQ Agent on the system. |
        | /var/log/netqd.log | The NetQ daemon log file for the NetQ CLI. This log file appears only if you installed the netq-apps package on the system. |
        | /var/log/netq-agent.log | The NetQ Agent log file. This log file appears only if you installed the NetQ Agent on the system. |
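
Because each of these files appears only when the related package is installed, a quick presence check can tell you which components are on a system. The following Python sketch is one way to do that. The helper name present_files and its exists parameter are illustrative assumptions, and reading the log files themselves might require elevated permissions.

```python
import os

# Paths and descriptions taken from the list above.
NETQ_FILES = {
    "/etc/netq/netq.yml": "NetQ configuration file",
    "/var/log/netqd.log": "NetQ daemon log file for the NetQ CLI",
    "/var/log/netq-agent.log": "NetQ Agent log file",
}


def present_files(paths, exists=os.path.exists):
    """Return the paths that exist, preserving the input order."""
    return [path for path in paths if exists(path)]


if __name__ == "__main__":
    found = present_files(NETQ_FILES)
    for path, description in NETQ_FILES.items():
        state = "present" if path in found else "missing"
        print(f"{path}: {state} ({description})")
```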

        Check NetQ System Installation Status

        The netq show status verbose command shows the status of NetQ components after installation. Use this command to validate NetQ system readiness:

        cumulus@netq:~$ netq show status verbose
        NetQ Live State: Active
        Installation Status: FINISHED
        Version: 4.8.0
        Installer Version: 4.8.0
        Installation Type: Standalone
        Activation Key: EhVuZXRxLWasdW50LWdhdGV3YXkYsagDIixkWUNmVmhVV2dWelVUOVF3bXozSk8vb2lSNGFCaE1FR2FVU2dHK1k3RzJVPQ==
        Master SSH Public Key: c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFCfdsaHpjKzcwNmJiNVROOExRRXdLL3l5RVNLSHRhUE5sZS9FRjN0cTNzaHh1NmRtMkZpYmg3WWxKUE9lZTd5bnVlV2huaTZxZ0xxV3ZMYkpLMGdkc3RQcGdzNUlqanNMR3RzRTFpaEdNa3RZNlJYenQxLzh4Z3pVRXp3WTBWZDB4aWJrdDF3RGQwSjhnbExlbVk1RDM4VUdBVFVkMWQwcndLQ3gxZEhRdEM5L1UzZUs5cHFlOVdBYmE0ZHdiUFlaazZXLzM0ZmFsdFJxaG8rNUJia0pkTkFnWHdkZGZ5RXA1Vjc3Z2I1TUU3Q1BxOXp2Q1lXZW84cGtXVS9Wc0gxWklNWnhsa2crYlZ4MDRWUnN4ZnNIVVJHVmZvckNLMHRJL0FrQnd1N2FtUGxObW9ERHg2cHNHaU1EQkM0WHdud1lmSlNleUpmdTUvaDFKQ2NuRXpOVnVWRjUgcm9vdEBhbmlscmVzdG9yZQ==
        Is Cloud: False
        Kubernetes Cluster Nodes Status:
        IP Address     Hostname       Role    NodeStatus
        -------------  -------------  ------  ------------
        10.188.46.243  10.188.46.243  Role    Ready
        Task                                                                Status
        ------------------------------------------------------------------  --------
        Prepared for download and extraction                                FINISHED
        Completed setting up python virtual environment                     FINISHED
        Checked connectivity from master node                               FINISHED
        Installed Kubernetes control plane services                         FINISHED
        Installed Calico CNI                                                FINISHED
        Installed K8 Certificates                                           FINISHED
        Updated etc host file with master node IP address                   FINISHED
        Stored master node hostname                                         FINISHED
        Generated and copied master node configuration                      FINISHED
        Updated cluster information                                         FINISHED
        Plugged in release bundle                                           FINISHED
        Downloaded, installed, and started node service                     FINISHED
        Downloaded, installed, and started port service                     FINISHED
        Patched Kubernetes infrastructure                                   FINISHED
        Removed unsupported conditions from master node                     FINISHED
        Installed NetQ Custom Resource Definitions                          FINISHED
        Installed Master Operator                                           FINISHED
        Updated Master Custom Resources                                     FINISHED
        Updated NetQ cluster manager custom resource                        FINISHED
        Installed Cassandra                                                 FINISHED
        Created new database                                                FINISHED
        Updated Master Custom Resources                                     FINISHED
        Updated Kafka Custom Resources                                      FINISHED
        Read Config Key ConfigMap                                           FINISHED
        Backed up ConfigKey                                                 FINISHED
        Read ConfigKey                                                      FINISHED
        Created Keys                                                        FINISHED
        Verified installer version                                          FINISHED
        ...
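
# Illustrative only: a minimal Python sketch for spotting installation tasks that have not finished.
# It assumes the Task table keeps the two-column layout shown above, with columns separated by
# two or more spaces. The function name is hypothetical and is not part of NetQ.

```python
import re


def unfinished_tasks(status_output):
    """Return (task, status) pairs from the Task table whose status is not FINISHED."""
    pending = []
    in_task_table = False
    for line in status_output.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"Task\s{2,}Status", stripped):
            in_task_table = True  # rows after this header belong to the task table
            continue
        # Skip everything before the table, blank lines, and the dashed separator.
        if not in_task_table or not stripped or set(stripped) <= {"-", " "}:
            continue
        cells = re.split(r"\s{2,}", stripped)
        if len(cells) == 2 and cells[1] != "FINISHED":
            pending.append((cells[0], cells[1]))
    return pending


# The IN_PROGRESS value below is hypothetical; this guide only shows FINISHED.
sample = """\
Task                                 Status
-----------------------------------  -----------
Installed Cassandra                  FINISHED
Created new database                 IN_PROGRESS
...
"""
print(unfinished_tasks(sample))
```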
        

        Verify Connectivity between Agents and Appliances

        The sudo opta-info.py command displays the status of and connectivity between agents and appliances. This command is typically used when debugging NetQ.

        In the output below, the Opta Health Status column displays a healthy status, which indicates that the appliance is functioning properly. The Opta-Gateway Channel Status column displays the connectivity status between the appliance and cloud endpoint. The Agent ID column displays the switches connected to the appliance.

        cumulus@netq-appliance:~$ sudo opta-info.py
        [sudo] password for cumulus:
        Service IP:  10.102.57.27
        
        Opta Health Status    Opta-Gateway Channel Status
        --------------------  -----------------------------
        Healthy               READY
        
        Agent ID        Remote Address    Status      Messages Exchanged  Time Since Last Communicated
        ----------      ----------------  --------  --------------------  ------------------------------
        switch1         /20.1.1.10:46420  UP                         906  2023-02-14 00:32:43.920000
        netq-appliance  /20.1.1.10:44717  UP                        1234  2023-02-14 00:32:31.757000
        
        cumulus@sm-telem-06:~$ sudo opta-info.py
        Service IP:  10.97.49.106
        
        Agent ID                                   Remote Address         Status      Messages Exchanged  Time Since Last Communicated
        -----------------------------------------  ---------------------  --------  --------------------  ------------------------------
        netq-lcm-executor-deploy-65c984fc7c-x97bl  /10.244.207.135:52314  UP                        1340  2023-02-13 19:31:37.311000
        sm-telem-06                                /10.188.47.228:2414    UP                        1449  2023-02-14 06:42:12.215000
        mlx-2010a1-14                              /10.188.47.228:12888   UP                          15  2023-02-14 06:42:27.003000
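
# Illustrative only: a minimal Python sketch that lists agents whose Status column is not UP.
# It assumes the agent table keeps the five-column layout shown above, with columns separated
# by two or more spaces. The function names are hypothetical and are not part of NetQ.

```python
import re


def agent_statuses(opta_info_output):
    """Map each Agent ID to the value in its Status column."""
    statuses = {}
    header = None
    for line in opta_info_output.splitlines():
        cells = re.split(r"\s{2,}", line.strip())
        if cells[0] == "Agent ID":
            header = cells
            continue
        # Ignore text before the table, the dashed separator, and malformed rows.
        if header is None or len(cells) != len(header) or cells[0].startswith("-"):
            continue
        row = dict(zip(header, cells))
        statuses[row["Agent ID"]] = row["Status"]
    return statuses


def agents_not_up(opta_info_output):
    """Return the Agent IDs whose status is anything other than UP."""
    return [agent for agent, status in agent_statuses(opta_info_output).items() if status != "UP"]


# Rows copied from the first example output above.
sample = """\
Agent ID        Remote Address    Status      Messages Exchanged  Time Since Last Communicated
----------      ----------------  --------  --------------------  ------------------------------
switch1         /20.1.1.10:46420  UP                         906  2023-02-14 00:32:43.920000
netq-appliance  /20.1.1.10:44717  UP                        1234  2023-02-14 00:32:31.757000
"""
print(agent_statuses(sample))
print(agents_not_up(sample))
```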
        

        Generate a Support File on the NetQ System

        The opta-support command generates information for troubleshooting issues with NetQ. It provides information about the NetQ Platform configuration and runtime statistics as well as output from the docker ps command.

        cumulus@server:~$ sudo opta-support
        Please send /var/support/opta_support_server_2021119_165552.txz to Nvidia support.
        

        To export network validation check data in addition to OPTA health data to the support bundle, the NetQ CLI must be activated with AuthKeys. If the CLI access key is not activated, the command output displays a notification and data collection excludes netq show output:

        cumulus@server:~$ sudo opta-support
        Access key is not found. Please check the access key entered or generate a fresh access_key,secret_key pair and add it to the CLI configuration
        Proceeding with opta-support generation without netq show outputs
        Please send /var/support/opta_support_server_20211122_22259.txz to Nvidia support.
        

        Generate a Support File on Switches and Hosts

        The netq-support command generates information for troubleshooting NetQ issues on a host or switch. Similar to collecting a support bundle on the NetQ system, the NVIDIA support team might request this output to gather more information about switch and host status.

        When you run the netq-support command on a switch running Cumulus Linux, a cl-support file will also be created and bundled within the NetQ support archive:

        cumulus@switch:mgmt:~$ sudo netq-support
        Collecting cl-support...
        Collecting netq-support...
        Please send /var/support/netq_support_switch_20221220_16188.txz to Nvidia support.
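
# Illustrative only: when you script support-file collection, the archive path can be read from
# the final line of the command output. This Python sketch assumes the message keeps the wording
# shown in the examples in this guide. The helper name is hypothetical.

```python
import re

# Matches the closing message printed by opta-support and netq-support in this guide.
ARCHIVE_RE = re.compile(r"Please send (\S+\.txz) to Nvidia support\.")


def support_archive_path(command_output):
    """Return the archive path named in the command output, or None if absent."""
    match = ARCHIVE_RE.search(command_output)
    return match.group(1) if match else None


sample = """\
Collecting cl-support...
Collecting netq-support...
Please send /var/support/netq_support_switch_20221220_16188.txz to Nvidia support.
"""
print(support_archive_path(sample))
```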
        

        More Documents

        The following sections contain NetQ reference materials.

        NVLink4

        With NetQ, you can monitor the performance of your NVLink devices, manage NVOS upgrades, create NVLink domains, and troubleshoot issues. This section describes the NetQ integration with NVLink4.

        Each NVSwitch has a designated telemetry agent embedded in NVOS. This agent fetches telemetry data and streams it to a Fluentd data collector that integrates with NetQ or a third-party client.

        NetQ maintains GFM processes with high availability. If the GFM process stops unexpectedly, NetQ quickly and automatically remediates issues.

        Domain Management

        This section describes how to create, edit, and delete NVLink4 domains. To collect telemetry data that can be visualized in the UI, create and configure a domain, then run Global Fabric Manager (GFM).

        Requirements

        To run GFM, each domain needs a configuration file, a topology file, and an IP address file. You need to upload the topology and IP address files during the domain creation process. The configuration file is created automatically after you have configured the domain.

        Create a Domain

        This section outlines the steps to create a new domain using the UI. Advanced users can manually adjust GFM variables (beyond what is presented in the UI) by creating a new domain as outlined below, then following the steps to edit GFM configuration variables.

        1. Select the NVL4 icon in the header, then select Add domain.

        2. Fill out the fields in the UI, starting with configuring the GFM:


        When toggled on, the Create all nodes partition switch creates a single, default partition.

        3. Select Next.

        4. The next screen prompts you to upload a topology file. For GFM to run, the topology file must reflect how the network is wired. The same topology file is frequently reused for multiple domains. If a topology file was previously used to create a domain, you will be able to select it from this screen. NetQ supports both IPv4 and IPv6 addresses.

        5. Select Next.

        6. The subsequent screen prompts you to upload your fabric node configuration. This is a text file listing the IP addresses for the nodes (NVLink switches) that comprise the domain.

        7. Select Next.

        8. The final screen displays a summary of the domain’s parameters. In addition to the summary, you can choose to start GFM after creating the domain. If you are not ready to start GFM or if you are planning to edit the GFM variables, you can save the configuration and start it later.

        9. After reviewing the summary, select Finish. NetQ adds the domain to a list of all NVLink4 domains.
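
Before you upload the fabric node configuration, it can help to check that every entry is a well-formed IPv4 or IPv6 address. The following Python sketch does that with the standard ipaddress module. It assumes one address per line, which is an assumption made for illustration; this guide does not specify the exact file layout that GFM expects. The function name is hypothetical.

```python
import ipaddress


def parse_fabric_nodes(text):
    """Split entries into (valid, invalid); valid holds ip_address objects."""
    valid, invalid = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        try:
            valid.append(ipaddress.ip_address(entry))
        except ValueError:
            invalid.append(entry)
    return valid, invalid


# Made-up addresses for illustration only.
sample = "10.0.0.1\nfd00::2\nnot-an-ip\n"
valid, invalid = parse_fabric_nodes(sample)
print([str(address) for address in valid])
print(invalid)
```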

        From the list of NVLink4 domains, you can view and manage multiple domains. Per domain, you can view the:

        You can also perform the following actions:

        View Additional Domain Information

        When you select the View details button on a given domain, you gain access to granular information about the domain’s configuration:

        Edit a Domain

        To edit a domain, you must first stop the GFM by selecting the stop button. After GFM stops, select the three-dot menu, then select Edit. Note that this menu only appears when GFM is not running. To manually adjust and configure a broader range of GFM variables that are unavailable in the NetQ UI, refer to Edit GFM Variables.

        Delete a Domain

        To delete a domain, you must first stop the GFM by selecting the stop button. After GFM stops, select the three-dot menu, then select Delete. You cannot delete a topology file that is in use by a domain.

        NetQ CLI Reference

        This reference provides details about each of the NetQ CLI commands. For an overview of the CLI structure and usage, read the NetQ Command Line Overview.

        The commands appear alphabetically by command name.

        When options are available, you should use them in the order listed.

        Integrate NetQ API with Your Applications

        The NetQ API provides data about the performance and operation of your network and its devices. You can view the data with your internal tools or with third-party analytic tools. The API displays the health of individual switches, network protocols and services, trace and validation results, as well as networkwide inventory and events.

        This guide provides an overview of the NetQ API framework, including the basics of using Swagger UI 2.0 or bash plus curl to view and test the APIs. Descriptions of each endpoint and model parameter are in individual API JSON files.

        API Organization

        The NetQ API provides endpoints for:

        Each endpoint has its own API. You can make requests for all data and all devices or you can filter the request by a given hostname. Each API returns a predetermined set of data as defined in the API models.

        The Swagger interface displays both public and internal APIs. Public APIs do not have internal in their name. Internal APIs are not supported for public use and are subject to change without notice.

        Get Started

        You can access the API gateway and execute requests from the Swagger UI or a terminal interface. If you are using a terminal window, proceed to the next section.

        1. Open the Swagger interface by entering one of the following in your browser’s address bar:

        2. Select auth from the Select a definition dropdown at the top right of the window. This opens the authorization API.

        Log In

        Although you can view the API endpoints without authorization, you can only execute the API endpoints if you have been authorized.

        You must first obtain an access token and then use that token to authorize your access to the API.

        1. Click POST/login.
        2. Click Try it out.
        3. Enter the username and password you use to log in to the NetQ UI. Do not change the access-key value.
        4. Click Execute.

        5. Scroll down to view the Responses. In the Server response section, in the Response body of the 200 code response, copy the access token in the top line.

        6. Click Authorize.
        7. Paste the access token into the Value field, and click Authorize.

        8. Click Close.

        To log in and obtain authorization:

        1. Open a terminal window.

        2. Log in to obtain the access token. You will need the following information:

          • Hostname or IP address, and port (443 for cloud deployments, 32708 for on-premises deployments) of your API gateway
          • Your login credentials that were provided as part of the NetQ installation process. For this release, the default is username admin and password admin.

          This example uses an IP address of 192.168.0.10, port of 443, and the default credentials:

          <computer-name>:~ <username>$ curl -X POST "https://api.192.168.0.10.netq.nvidia.com:443/netq/auth/v1/login" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"username\": \"admin\", \"password\": \"admin\", \"access_key\": \"string\"}"
          

          The output provides the access token as the first parameter.

          {"access_token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9....","customer_id":0,"expires_at":1597200346504,"id":"admin","is_cloud":true,"premises":[{"name":"OPID0","namespace":"NAN","opid":0},{"name":"ea-demo-dc-1","namespace":"ea1","opid":30000},{"name":"ea-demo-dc-2","namespace":"ea1","opid":30001},{"name":"ea-demo-dc-3","namespace":"ea1","opid":30002},{"name":"ea-demo-dc-4","namespace":"ea1","opid":30003},{"name":"ea-demo-dc-5","namespace":"ea1","opid":30004},{"name":"ea-demo-dc-6","namespace":"ea1","opid":30005},{"name":"ea-demo-dc-7","namespace":"ea1","opid":80006},{"name":"Cumulus Data Center","namespace":"NAN","opid":1568962206}],"reset_password":false,"terms_of_use_accepted":true}
          
        3. Copy the access token to a text file. You will need this token to make API data requests.

        You are now able to create and execute API requests against the endpoints.

        By default, authorization is valid for 24 hours, after which users must sign in again and reauthorize their account.
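
The terminal login above can also be expressed in Python with the standard library. This is a minimal sketch under stated assumptions: the request body mirrors the curl example in this guide (including the access_key placeholder), the host name is a made-up placeholder, and expires_at is assumed to be an epoch timestamp in milliseconds. The function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_login_request(host, port, username, password):
    """Build (but do not send) the POST request for the login endpoint."""
    url = f"https://{host}:{port}/netq/auth/v1/login"
    body = json.dumps(
        {"username": username, "password": password, "access_key": "string"}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )


def parse_login_response(text):
    """Return the access token and its expiry time (assumed epoch milliseconds)."""
    payload = json.loads(text)
    expires = datetime.fromtimestamp(payload["expires_at"] / 1000, tz=timezone.utc)
    return payload["access_token"], expires


request = build_login_request("netq.example.com", 443, "admin", "admin")
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request).read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any credentials leave your machine.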

        API Requests

        You can use either the Swagger UI or a terminal window with bash and curl commands to create and execute API requests.

        1. Select the endpoint from the definition dropdown at the top right of the application.

          This example shows the BGP endpoint selected:

        2. Select the endpoint object.

          This example shows the results of selecting the GET bgp object:

          A description is provided for each object and the various parameters that can be specified. In the Responses section, you can see the data that is returned when the request is successful.

        3. Click Try it out.

        4. Enter values for the required parameters.

        5. Click Execute.

        In a terminal window, use bash plus curl to execute requests. Each request contains an API method (GET, POST, etc.), the address and API endpoint object to query, a variety of headers, and sometimes a body. For example, in the login step above:

        • API method = POST
        • Address and API object = "https://<netq.domain>:443/netq/auth/v1/login"
        • Headers = -H "accept: application/json" and -H "Content-Type: application/json"
        • Body = -d "{ \"username\": \"admin\", \"password\": \"admin\", \"access_key\": \"string\"}"

        API Responses

        A NetQ API response comprises a status code, any relevant error codes (if unsuccessful), and the collected data (if successful).

        The following HTTP status codes might be presented in the API responses:

        | Code | Name | Description | Action |
        | --- | --- | --- | --- |
        | 200 | Success | Request was successfully processed. | Review response. |
        | 400 | Bad Request | Invalid input was detected in request. | Check the syntax of your request and make sure it matches the schema. |
        | 401 | Unauthorized | Authentication has failed or credentials were not provided. | Provide or verify your credentials, or request access from your administrator. |
        | 403 | Forbidden | Request was valid, but user might not have the needed permissions. | Verify your credentials or request an account from your administrator. |
        | 404 | Not Found | Requested resource could not be found. | Try the request again after a period of time or verify status of resource. |
        | 409 | Conflict | Request cannot be processed due to conflict in current state of the resource. | Verify status of resource and remove conflict. |
        | 500 | Internal Server Error | Unexpected condition has occurred. | Perform general troubleshooting and try the request again. |
        | 503 | Service Unavailable | The service being requested is currently unavailable. | Verify the status of the NetQ Platform or appliance, and the associated service. |
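
If you call the API from a script, you might want the script to report the recommended action alongside the status code. The following Python sketch transcribes the table above into a lookup. The function name is hypothetical, and codes that the table does not list return None rather than a guess.

```python
# Status codes and suggested actions transcribed from the table above.
STATUS_ACTIONS = {
    200: "Review response.",
    400: "Check the syntax of your request and make sure it matches the schema.",
    401: "Provide or verify your credentials, or request access from your administrator.",
    403: "Verify your credentials or request an account from your administrator.",
    404: "Try the request again after a period of time or verify status of resource.",
    409: "Verify status of resource and remove conflict.",
    500: "Perform general troubleshooting and try the request again.",
    503: "Verify the status of the NetQ Platform or appliance, and the associated service.",
}


def suggested_action(status_code):
    """Return the documented action for a status code, or None if it is not listed."""
    return STATUS_ACTIONS.get(status_code)


print(suggested_action(401))
```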

        Example Requests and Responses

        Some example requests and their responses are shown here, but feel free to run your own requests. To run a request, you need your authorization token. In the curl examples, the responses are piped through the Python json.tool module to make them more readable. You can choose to do so as well.

        Validate Networkwide Status of the BGP Service

        Make your request to the check endpoint with proto=bgp to validate the operation of the BGP service on all nodes running the service.

        1. Open the check endpoint.
        2. Open the check object.
        3. Click Try it out.

        4. Enter values for time, duration, by, and proto parameters.

          In this example, time=1597256560, duration=24, by=scheduled, and proto=bgp.

        5. Click Execute, then scroll down to see the results under Server response.

        Run the following curl command, entering values for the various parameters. In this example, time=1597256560, duration=24 (hours), by=scheduled, and proto=bgp.

        curl -X GET "https://<netq.domain>:<port>/netq/telemetry/v1/object/check?time=1597256560&duration=24&by=scheduled&proto=bgp" -H "accept: application/json" -H "Authorization: <auth-token>" | python -m json.tool
        
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed
        100 22869  100 22869    0     0  34235      0 --:--:-- --:--:-- --:--:-- 34183
        {
            "count": 24,
            "data": [
                {
                    "additional_summary": {
                        "failed_sessions": 0,
                        "total_sessions": 0
                    },
                    "failed_node_set": [],
                    "jobid": "c5c046d1-3cc5-4c8b-b4e8-cf2bbfb050e6",
                    "res_timestamp": 1597254743280,
                    "rotten_node_set": [],
                    "summary": {
                        "checkedNodeCount": 0,
                        "failedNodeCount": 0,
                        "failedSessionCount": 0,
                        "rottenNodeCount": 0,
                        "totalNodeCount": 0,
                        "totalSessionCount": 0,
                        "warningNodeCount": 0
                    },
        ...
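
# Illustrative only: a minimal Python sketch that builds the query string for the check request
# and picks out validation jobs that report failures. It assumes the response fields shown above
# (data, jobid, failed_node_set, summary.failedNodeCount). Treating a nonzero failedNodeCount or
# a non-empty failed_node_set as a failure is an assumption, the base URL is a placeholder, and
# the function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def check_url(base, **params):
    """Return the check endpoint URL with the given query parameters."""
    return f"{base}/netq/telemetry/v1/object/check?{urlencode(params)}"


def failed_jobs(response_text):
    """Return the job IDs whose results report at least one failed node."""
    payload = json.loads(response_text)
    return [
        item["jobid"]
        for item in payload.get("data", [])
        if item["summary"]["failedNodeCount"] > 0 or item["failed_node_set"]
    ]


print(check_url("https://netq.example.com:32708",
                time=1597256560, duration=24, by="scheduled", proto="bgp"))

# First entry reuses values from the response above; the second is hypothetical.
sample = json.dumps({"count": 2, "data": [
    {"jobid": "c5c046d1-3cc5-4c8b-b4e8-cf2bbfb050e6", "failed_node_set": [],
     "summary": {"failedNodeCount": 0}},
    {"jobid": "hypothetical-job", "failed_node_set": ["leaf01"],
     "summary": {"failedNodeCount": 1}},
]})
print(failed_jobs(sample))
```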
        

        Get Status of EVPN on a Specific Switch

        Make your request to the evpn/hostname endpoint to view the status of all EVPN sessions running on that node.

        This example uses the server01 switch.

        1. Open the EVPN endpoint.
        2. Open the hostname object.
        3. Click Try it out.

        4. Enter a value for hostname, and optional values for eq_timestamp, count, and offset parameters.

          In this example, hostname=server01.

        5. Click Execute, then scroll down to see the results under Server response.

        This example uses the server01 switch in an on-premises network deployment.

        curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/evpn/hostname/server01" -H "accept: application/json" -H "Authorization: <auth-token>" | python -m json.tool
        
        [
            {
            "import_rt": "[\"197:42\"]",
            "vni": 42,
            "rd": "27.0.0.22:2",
            "hostname": "server01",
            "timestamp": 1556037403853,
            "adv_all_vni": true,
            "export_rt": "[\"197:42\"]",
            "db_state": "Update",
            "in_kernel": true,
            "adv_gw_ip": "Disabled",
            "origin_ip": "27.0.0.22",
            "opid": 0,
            "is_l3": false
            },
            {
            "import_rt": "[\"197:37\"]",
            "vni": 37,
            "rd": "27.0.0.22:8",
            "hostname": "server01",
            "timestamp": 1556037403811,
            "adv_all_vni": true,
            "export_rt": "[\"197:37\"]",
            "db_state": "Update",
            "in_kernel": true,
            "adv_gw_ip": "Disabled",
            "origin_ip": "27.0.0.22",
            "opid": 0,
            "is_l3": false
            },
            {
            "import_rt": "[\"197:4001\"]",
            "vni": 4001,
            "rd": "6.0.0.194:5",
            "hostname": "server01",
            "timestamp": 1556036360169,
            "adv_all_vni": true,
            "export_rt": "[\"197:4001\"]",
            "db_state": "Refresh",
            "in_kernel": true,
            "adv_gw_ip": "Disabled",
            "origin_ip": "27.0.0.22",
            "opid": 0,
            "is_l3": true
            },
        ...
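
In the sample response above, the import_rt and export_rt fields hold JSON text inside a JSON string, so they need a second decoding pass before you can work with them as lists. The following is a minimal Python sketch of that step, not part of the NetQ tooling. The function name summarize_evpn and the trimmed SAMPLE_RESPONSE are illustrative assumptions; a real script would read the body returned by the evpn/hostname request.

```python
import json

# Two records trimmed from the sample response above (hypothetical test data).
SAMPLE_RESPONSE = r'''
[
  {"import_rt": "[\"197:42\"]", "vni": 42, "rd": "27.0.0.22:2",
   "hostname": "server01", "export_rt": "[\"197:42\"]", "is_l3": false},
  {"import_rt": "[\"197:4001\"]", "vni": 4001, "rd": "6.0.0.194:5",
   "hostname": "server01", "export_rt": "[\"197:4001\"]", "is_l3": true}
]
'''


def summarize_evpn(records):
    """Return one dict per record with the route targets decoded into lists."""
    summary = []
    for rec in records:
        summary.append({
            "vni": rec["vni"],
            "layer": "L3" if rec["is_l3"] else "L2",
            "rd": rec["rd"],
            # import_rt and export_rt arrive as JSON text inside a string,
            # so decode them a second time.
            "import_rt": json.loads(rec["import_rt"]),
            "export_rt": json.loads(rec["export_rt"]),
        })
    return summary


records = json.loads(SAMPLE_RESPONSE)
for row in summarize_evpn(records):
    print(row)
```

An empty response ([]) produces an empty summary, so the same function works for nodes without EVPN sessions.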
        

        Get Status on All Interfaces at a Given Time

        Make your request to the interface endpoint to view the status of all interfaces. To retrieve data for a specific time instead of the default (the last hour), add the eq_timestamp parameter and provide the date and time in epoch format, as follows:

        curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/interface?eq_timestamp=1556046250" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool
         
        [
          {
            "hostname": "exit-1",
            "timestamp": 1556046270494,
            "state": "up",
            "vrf": "DataVrf1082",
            "last_changed": 1556037405259,
            "ifname": "swp3.4",
            "opid": 0,
            "details": "MTU: 9202",
            "type": "vlan"
          },
          {
            "hostname": "exit-1",
            "timestamp": 1556046270496,
            "state": "up",
            "vrf": "DataVrf1081",
            "last_changed": 1556037405320,
            "ifname": "swp7.3",
            "opid": 0,
            "details": "MTU: 9202",
            "type": "vlan"
          },
          {
            "hostname": "exit-1",
            "timestamp": 1556046270497,
            "state": "up",
            "vrf": "DataVrf1080",
            "last_changed": 1556037405310,
            "ifname": "swp7.2",
            "opid": 0,
            "details": "MTU: 9202",
            "type": "vlan"
          },
          {
            "hostname": "exit-1",
            "timestamp": 1556046270499,
            "state": "up",
            "vrf": "",
            "last_changed": 1556037405315,
            "ifname": "DataVrf1081",
            "opid": 0,
            "details": "table: 1081, MTU: 65536, Members:  swp7.3,  DataVrf1081,  swp4.3,  swp6.3,  swp5.3,  swp3.3, ",
            "type": "vrf"
          },
        ...
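
The request above passes eq_timestamp in epoch seconds, while the timestamp and last_changed values in the sample response have 13 digits and appear to be epoch milliseconds. The following Python sketch converts in both directions. The helper names are illustrative assumptions, and the millisecond interpretation is inferred from the sample output rather than stated by the API.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_seconds(when):
    """Convert a timezone-aware datetime to whole epoch seconds."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(when.timestamp())


def from_epoch_millis(millis):
    """Convert an epoch value in milliseconds to a UTC datetime."""
    seconds, remainder = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=remainder)


def interface_query(when):
    """Build the path and query string used in the curl example above."""
    return f"/netq/telemetry/v1/object/interface?eq_timestamp={to_epoch_seconds(when)}"


requested = datetime(2019, 4, 23, 19, 4, 10, tzinfo=timezone.utc)
print(interface_query(requested))
print(from_epoch_millis(1556046270494).isoformat())
```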
        

        Get a List of All Devices Being Monitored

        Make your request to the inventory endpoint to get a listing of all monitored nodes and their configuration information, as follows:

        curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/inventory" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool
         
        [
          {
            "hostname": "border01",
            "timestamp": 1556037425658,
            "asic_model": "A-Z",
            "agent_version": "3.2.0-cl4u30~1601403318.104fb9ed",
            "os_version": "A.2.0",
            "disk_total_size": "10 GB",
            "os_version_id": "A.2.0",
            "platform_model": "A_VX",
            "memory_size": "2048.00 MB",
            "asic_vendor": "AA Inc",
            "cpu_model": "A-SUBLEQ",
            "asic_model_id": "N/A",
            "platform_vendor": "A Systems",
            "asic_ports": "N/A",
            "cpu_arch": "x86_64",
            "cpu_nos": "2",
            "platform_mfg_date": "N/A",
            "platform_label_revision": "N/A",
            "agent_state": "fresh",
            "cpu_max_freq": "N/A",
            "platform_part_number": "3.7.6",
            "asic_core_bw": "N/A",
            "os_vendor": "CL",
            "platform_base_mac": "00:01:00:00:01:00",
            "platform_serial_number": "00:01:00:00:01:00"
          },
          {
            "hostname": "exit-2",
            "timestamp": 1556037432361,
            "asic_model": "C-Z",
            "agent_version": "3.2.0-cl4u30~1601403318.104fb9ed",
            "os_version": "C.2.0",
            "disk_total_size": "30 GB",
            "os_version_id": "C.2.0",
            "platform_model": "C_VX",
            "memory_size": "2048.00 MB",
            "asic_vendor": "CC Inc",
            "cpu_model": "C-CRAY",
            "asic_model_id": "N/A",
            "platform_vendor": "C Systems",
            "asic_ports": "N/A",
            "cpu_arch": "x86_64",
            "cpu_nos": "2",
            "platform_mfg_date": "N/A",
            "platform_label_revision": "N/A",
            "agent_state": "fresh",
            "cpu_max_freq": "N/A",
            "platform_part_number": "3.7.6",
            "asic_core_bw": "N/A",
            "os_vendor": "CL",
            "platform_base_mac": "00:01:00:00:02:00",
            "platform_serial_number": "00:01:00:00:02:00"
          },
          {
            "hostname": "firewall-1",
            "timestamp": 1556037438002,
            "asic_model": "N/A",
            "agent_version": "2.1.0-ub16.04u15~1555608012.1d98892",
            "os_version": "16.04.1 LTS (Xenial Xerus)",
            "disk_total_size": "3.20 GB",
            "os_version_id": "(hydra-poc-01 /tmp/purna/Kleen-Gui1/)\"16.04",
            "platform_model": "N/A",
            "memory_size": "4096.00 MB",
            "asic_vendor": "N/A",
            "cpu_model": "QEMU Virtual  version 2.2.0",
            "asic_model_id": "N/A",
            "platform_vendor": "N/A",
            "asic_ports": "N/A",
            "cpu_arch": "x86_64",
            "cpu_nos": "2",
            "platform_mfg_date": "N/A",
            "platform_label_revision": "N/A",
            "agent_state": "fresh",
            "cpu_max_freq": "N/A",
            "platform_part_number": "N/A",
            "asic_core_bw": "N/A",
            "os_vendor": "Ubuntu",
            "platform_base_mac": "N/A",
            "platform_serial_number": "N/A"
          },
        ...
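
Once you have the inventory response, a short script can summarize it. The following Python sketch counts devices by a chosen field and lists hosts whose agent version is below a minimum. SAMPLE_INVENTORY is trimmed from the sample output above, and the helper names and version-parsing rule (everything before the first hyphen, split on periods) are assumptions made for illustration.

```python
from collections import Counter

# Records trimmed from the sample response above (hypothetical test data).
SAMPLE_INVENTORY = [
    {"hostname": "border01", "os_vendor": "CL", "agent_state": "fresh",
     "agent_version": "3.2.0-cl4u30~1601403318.104fb9ed"},
    {"hostname": "exit-2", "os_vendor": "CL", "agent_state": "fresh",
     "agent_version": "3.2.0-cl4u30~1601403318.104fb9ed"},
    {"hostname": "firewall-1", "os_vendor": "Ubuntu", "agent_state": "fresh",
     "agent_version": "2.1.0-ub16.04u15~1555608012.1d98892"},
]


def count_by(records, field):
    """Count records by the value of one field, using 'N/A' when it is absent."""
    return Counter(rec.get(field, "N/A") for rec in records)


def version_tuple(agent_version):
    """Turn '3.2.0-cl4u30~...' into (3, 2, 0) for comparison."""
    return tuple(int(part) for part in agent_version.split("-", 1)[0].split("."))


def agents_older_than(records, minimum):
    """Return hostnames whose agent version is lower than the minimum tuple."""
    return [rec["hostname"] for rec in records
            if version_tuple(rec["agent_version"]) < minimum]


print(count_by(SAMPLE_INVENTORY, "os_vendor"))
print(agents_older_than(SAMPLE_INVENTORY, (3, 0, 0)))
```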
        

        Spectrum Support

        Several NetQ features function exclusively on NVIDIA Spectrum switches. The following table summarizes supported features:

        | Feature | Spectrum-1 | Spectrum-2 | Spectrum-3 | Spectrum-4 |
        | --- | --- | --- | --- | --- |
        | What Just Happened | Partial support; no latency and congestion monitoring | Yes | Yes | Yes |
        | LCM with on-switch OPTA | No | Yes | Yes | Yes |
        | Flow analysis | No | Yes | Yes | Yes |
        | Adaptive routing monitoring | No | No | No | Yes |
        | ECMP monitoring | Yes | Yes | Yes | Yes |
        | PTP monitoring | Yes | Yes | Yes | Yes |
        | RoCE monitoring | Yes | Yes | Yes | Yes |
        | Queue length histograms | Yes | Yes | Yes | Yes |
        | Process monitoring | Yes | Yes | Yes | Yes |

        Visit the Spectrum website to learn more about the Spectrum-X networking platform.

        Glossary

        Common Cumulus Linux and NetQ Terminology

        The following table covers some basic terms used throughout the NetQ user documentation.

        | Term | Definition |
        | --- | --- |
        | Agent | NetQ software that resides on a host server and provides metrics about the host to the NetQ Telemetry Server for network health analysis. |
        | Bridge | Device that connects two communication networks or network segments. Operates at OSI Model Layer 2, the Data Link Layer. |
        | Clos | Multistage circuit switching network used by the telecommunications industry, first formalized by Charles Clos in 1952. |
        | Device | UI term referring to a switch, host, or chassis, or a combination of these. Typically used when describing hardware and components rather than software or network topology. See also Node. |
        | Event | Change or occurrence in the network or a component that can trigger a notification. Events are categorized by severity: error or info. |
        | Fabric | Network topology where a set of network nodes interconnects through one or more network switches. |
        | Fresh | Node that has been communicative for the last 120 seconds. |
        | High Availability | Software used to provide a high percentage of uptime (running and available) for network devices. |
        | Host | A device connected to a TCP/IP network. It can run one or more virtual machines. |
        | Hypervisor | Software that creates and runs virtual machines. Also called a virtual machine monitor. |
        | IP Address | An Internet Protocol address comprises a series of numbers assigned to a network device to uniquely identify it on a given network. Version 4 addresses are 32 bits and written in dotted decimal notation as four 8-bit numbers separated by periods. Example: 10.10.10.255. Version 6 addresses are 128 bits and written as groups of 16-bit hexadecimal numbers separated by colons. Example: 2018:3468:1B5F::6482:D673. |
        | Leaf | An access layer switch in a Spine-Leaf or Clos topology. An Exit-Leaf is a switch that connects to services outside of the data center such as firewalls, load balancers, and internet routers. See also Spine, Clos, Top of Rack, and Access Switch. |
        | Linux | Set of free and open-source software operating systems built around the Linux kernel. Cumulus Linux is one of the available distribution packages. |
        | Node | UI term referring to a switch, host, or chassis in a topology. |
        | Notification | Item that informs a user of an event. Notifications are received through third-party applications, such as email or Slack. |
        | Peer link | Link, or bonded links, used to connect two switches in an MLAG pair. |
        | Rotten | Node that has been silent for 120 seconds or more. |
        | Router | Device that forwards data packets (directs traffic) from nodes on one communication network to nodes on another network. Operates at OSI Model Layer 3, the Network Layer. |
        | Spine | Used to describe the role of a switch in a Spine-Leaf or Clos topology. See also Aggregation switch, End of Row switch, and distribution switch. |
        | Switch | High-speed device that receives data packets from one device or node and redirects them to other devices or nodes on a network. |
        | Telemetry server | NetQ server that receives metrics and other data from NetQ agents on leaf and spine switches and hosts. |
        | Top of Rack | Switch that connects to the network (versus internally); also known as a ToR switch. |
        | Virtual Machine | Emulation of a computer system that provides all the functions of a particular architecture. |
        | Web-scale | A network architecture designed to deliver capabilities of large cloud service providers within an enterprise IT environment. |
        | Whitebox | Generic, off-the-shelf switch or router hardware used in Software Defined Networks (SDN). |
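
The Fresh and Rotten definitions in the glossary share a 120-second boundary: a node heard from within the last 120 seconds is fresh, and a node silent for 120 seconds or more is rotten. The following Python sketch expresses that rule. The function name node_state and the use of epoch seconds as inputs are assumptions for illustration, not NetQ's internal implementation.

```python
FRESH_WINDOW_SECONDS = 120  # boundary taken from the glossary definitions


def node_state(last_heard, now):
    """Classify a node from the time (epoch seconds) it was last heard from.

    Silence shorter than 120 seconds is 'fresh'; 120 seconds or more is 'rotten'.
    """
    silence = now - last_heard
    return "fresh" if silence < FRESH_WINDOW_SECONDS else "rotten"


print(node_state(last_heard=1556037300, now=1556037405))  # 105 seconds of silence
print(node_state(last_heard=1556037200, now=1556037405))  # 205 seconds of silence
```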

        Common Cumulus Linux and NetQ Acronyms

        The following table covers some common acronyms used throughout the NetQ user documentation.

        | Acronym | Meaning |
        | --- | --- |
        | ACL | Access Control List |
        | ARP | Address Resolution Protocol |
        | ASN | Autonomous System Number |
        | BGP/eBGP/iBGP | Border Gateway Protocol, External BGP, Internal BGP |
        | CLAG | Cumulus multi-chassis Link Aggregation Group |
        | DHCP | Dynamic Host Configuration Protocol |
        | DNS | Domain Name System |
        | ECMP | Equal Cost Multi-Path routing |
        | EVPN | Ethernet Virtual Private Network |
        | FDB | Forwarding Database |
        | GNU | “GNU’s Not Unix” |
        | HA | High Availability |
        | IGMP | Internet Group Management Protocol |
        | IPv4/IPv6 | Internet Protocol, version 4 or 6 |
        | LACP | Link Aggregation Control Protocol |
        | LAN | Local Area Network |
        | LCM | Lifecycle Management |
        | LLDP | Link Layer Discovery Protocol |
        | MAC | Media Access Control |
        | MIB | Management Information Base |
        | MLAG | Multi-chassis Link Aggregation Group |
        | MLD | Multicast Listener Discovery |
        | NTP | Network Time Protocol |
        | OOB | Out of Band (management) |
        | OPTA | On-premises Telemetry Aggregator |
        | OSPF | Open Shortest Path First |
        | PTP | Precision Time Protocol |
        | RFC | Request for Comments |
        | RoCE | RDMA over Converged Ethernet |
        | SDN | Software-Defined Network |
        | SNMP | Simple Network Management Protocol |
        | STP | Spanning Tree Protocol |
        | TCA | Threshold Crossing Alarms |
        | TCP | Transmission Control Protocol |
        | ToR | Top of Rack |
        | UDP | User Datagram Protocol |
        | URL | Uniform Resource Locator |
        | USB | Universal Serial Bus |
        | VLAN | Virtual Local Area Network |
        | VNI | VXLAN Network Identifier |
        | VPN | Virtual Private Network |
        | VRF | Virtual Routing and Forwarding |
        | VRR | Virtual Router Redundancy |
        | VTEP | VXLAN Tunnel EndPoint |
        | VXLAN | Virtual Extensible Local Area Network |
        | ZTP | Zero Touch Provisioning |

        NVLink4 Events

        View events generated by NVLink devices and NetQ in the events dashboard. The dashboard displays LFM and GFM status events, events related to the health of NVLink switches, and the severity levels of those events. The dashboard also displays timestamp data and ASIC IDs to help with troubleshooting. The dashboard updates every 60 seconds.

        Monitor Events in the UI

        Expand the Menu, then select Events. The dashboard presents a timeline of events alongside the devices that are causing the most events. Select the NVL4 tab to view events related exclusively to NVLink devices.

        Use the controls above the summary to filter events by time, device (hostname), type, severity, or state.

        To learn more about the types of events generated by NetQ and how to manage those events, see Events and Notifications.

        NVLink4 Inventory

        This section describes how to view device statistics and data for NVLink4 L1 and L2 switches.

        Add inventory cards to your workbench to:

        Add NVLink4 Cards to Your Dashboard

        Select Add card in the header:

        Select the cards to add them to your workbench. There are two NVLink4 inventory cards—NVLink L1 Switches and NVLink Switches. You can also enter the name of the device in the global search field at the top of the dashboard and add the respective device card to your workbench.

        When fully expanded, NVLink4 cards display a table with device statistics about cable ports, sensors, and digital optics. You can view additional data about each of these categories by selecting a sub-category in the header:

        fully-expanded NVLink card showing devices statistics

        By selecting devices and adjusting a card’s size, you can view device statistics and data using different displays and visualizations. The following cards display interface data for a given switch:

        card displaying flits data
        card displaying channel data

        For more information on how to interact with cards, refer to Access Data with Cards in the NetQ UI overview section.

        NVOS Management

        NVOS images are managed with lifecycle management in the NetQ UI. This section details how to check for missing images, upload images, and specify default images. You can download NVOS images from the NVIDIA Application Hub.

        To complete the tasks outlined in this section, expand the Menu in the upper-left corner of the UI and select Manage switches. From the dashboard, select the Image management tab to display the NetQ and network OS images, including NVOS:

        images card with link to view missing images

        View and Upload Missing Images

        You should upload images for each NVOS version currently installed in your inventory so you can support rolling back to a known good version should an installation or upgrade fail. If you have specified a default NVOS version, NetQ verifies that the necessary versions of the default image are available based on the known device inventory, and if not, lists those that are missing.

        To upload missing NVOS images:

        1. On the NVOS Images card, select View # missing NVOS images to see which images you need.
        nvos images card with link to view missing images

        If you have already specified a default image, you must click Manage and then Missing to see the missing images.

        2. Select one or more of the missing images and make note of the version, ASIC vendor, and CPU architecture for each.
        3. In the UI, select Add image above the table.

        4. Provide the file that matches the criteria for the selected image(s).

        5. Click Import.

        If the upload was unsuccessful, an Image Import Failed message appears. Close the dialog and try uploading the file again.
        6. Click Done.

        7. (Optional) Click the Uploaded tab to verify the image is in the repository.

        Upload Images

        To upload the NVOS images that you want to use for the upgrade, first download the images (.img files). Place them in an accessible part of your local network. After obtaining the images, upload them to NetQ:

        1. Select the Add image button on the NVOS card:
        nvos images card
        2. Upload the images, then select Import.

        3. Monitor the progress until it completes. Click Done.

        Specify a Default Upgrade Version

        Specifying a default upgrade version is optional, but recommended. You can assign an NVOS version as the default version to use when installing or upgrading switches. The default is typically the newest version that you intend to install or upgrade on all, or the majority, of your switches. If necessary, you can override the default selection during the installation or upgrade process if an alternate version is needed for a given set of switches.

        To specify a default version:

        1. On the NVOS Images card, select Click here to set default NVOS version:

          card highlighting link to set default NVOS version

        2. Select the version you want to use as the default for switch upgrades.

        3. Click Save. The default version is now displayed on the card.

        Remove Images from Local Repository

        After you upgrade all your switches beyond a particular release, you can remove images from the LCM repository to save space on the server. To remove images:

        1. Click Manage on the NVOS Images card.

        2. On the Uploaded tab, select the images you want to remove.

        3. Click Delete.

        Perform an NVOS Upgrade

        1. Expand the Menu in the upper-left corner of the UI and select Manage switches.

        2. From the Switches card, select Manage:

        3. Select the device(s) to include in the upgrade, then click Upgrade NVOS above the table and follow the steps in the UI: give the upgrade a name, select the NVOS version, then choose whether to restart the devices after they’ve been upgraded. If you choose not to restart the devices after the upgrade, the upgrade will remain in a pending state until the devices are restarted.

        4. NetQ directs you to a screen where you can monitor the upgrade and view past upgrades:

        View Previous NVOS Upgrades

        To view the full history of NVOS upgrades:

        1. Expand the Menu and select Manage switches.

        2. Select the Job history tab:

        3. On the NVOS upgrade history card, select View. From here, you can sort and filter upgrades using the controls at the top of the screen.

        To view information at the most granular level, expand an individual upgrade job and select the arrow:

        Select Details on any device to display a timestamped history of the upgrade:

        Debugging Files

        Use the NetQ UI to generate and download diagnostic files for debugging. You can generate system dumps for NVLink L1 and L2 switches and GFM logs for a given domain. Note that after you delete a domain, you will not be able to generate or download debugging files and any files previously generated for the domain will also be deleted.

        Create and Download a System Dump

        1. From the NVLink4 management dashboard, locate the domain and select the View details button.

        2. Navigate to the Devices tab.

        3. Select up to 5 devices, then select the Generate sysdump icon above the table.

        You can monitor the progress of the system dump file in the Sysdump file status column. If the system dump file fails to generate, the column indicates a failed status along with a tooltip indicating the reason for the failure.

        4. Open the File manager at the top of the screen.
        5. Select the files you’d like to download, then select the download icon above the table.
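
Because each request covers at most 5 devices, collecting system dumps from a larger domain means working through the device list in groups. The following Python sketch shows one way to plan those groups before you start selecting devices in the UI. The function name and the host names are hypothetical and only illustrate the batching.

```python
MAX_DEVICES_PER_SYSDUMP = 5  # limit stated in the procedure above


def sysdump_batches(hostnames, size=MAX_DEVICES_PER_SYSDUMP):
    """Split a device list into consecutive groups of at most `size` devices."""
    return [hostnames[i:i + size] for i in range(0, len(hostnames), size)]


# Hypothetical device names used only for illustration.
devices = [f"nvl-switch-{n:02d}" for n in range(1, 13)]
for number, batch in enumerate(sysdump_batches(devices), start=1):
    print(f"Batch {number}: {', '.join(batch)}")
```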

        Create and Download GFM Log Files

        1. Generate GFM logs by selecting the Fetch button on a given domain.
        2. After the file is generated, a download icon appears in place of the Fetch button. Select the icon to download the logs. You can also download the file via the file manager, as described in the previous section.

        Edit GFM Variables

        After creating a new domain, you can manually adjust GFM variables in the fabric manager configuration file, fabricmanager.cfg. This section contains an example configuration and a complete reference of variables that you can adjust.

        You can only edit domains that have already been created. To get started, create a domain, then return to this section.

        Example

        In the following example, we’ll adjust the GFM_WAIT_TIMEOUT variable from 20 seconds to 15 seconds.

        1. After creating a new domain, run the following command to display the current configuration. The output indicates that the wait time is currently set to 20 seconds.
        cumulus@netq-appliance:~$ kubectl exec -it netq-app-nvl4-controller-12-deploy-0 -c gfm -- cat /usr/share/nvidia/nvswitch/fabricmanager.cfg | grep GFM
        GFM_WAIT_TIMEOUT=20
        
        2. On your NetQ server, edit the configuration with the kubectl edit netqapps netqapps-nvl4 command. In the example below, the gfm_wait_timeout variable is adjusted to 15 seconds. Note that this command displays variables in lower case; the same variables are written in upper case in the fabric manager configuration file, fabricmanager.cfg.
        cumulus@netq-appliance:~$ kubectl edit netqapps netqapps-nvl4
        domains:
            "12":
                created_by: admin
                domain_name: test
                fabric_configuration_id: 23
                fabric_mode: 0
                gfm_wait_timeout: 15
                log_level: 4
                multi_node_enable_all_node_partition: 0
                opid: 0
                topology_id: 21
        
        3. Save the new configuration and exit. Then restart the pods.

        4. Retrieve the pod name with the kubectl get pods | grep nvl4 command:

        cumulus@netq-appliance:~$ kubectl get pods | grep nvl4
        netq-app-nvl4-controller-12-deploy-0
        
        5. Check the fabric manager configuration file, fabricmanager.cfg, to confirm that the values updated to reflect the new configuration:
        cumulus@netq-appliance:~$ kubectl exec -it netq-app-nvl4-controller-12-deploy-0 -c gfm -- cat /usr/share/nvidia/nvswitch/fabricmanager.cfg | grep GFM
        GFM_WAIT_TIMEOUT=15
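
The final check compares a lower-case name from kubectl edit with an upper-case KEY=VALUE line in fabricmanager.cfg. The following Python sketch automates that comparison on text you have already captured from the kubectl exec command. The function names are illustrative, and treating lines that start with # as comments is an assumption about the file format.

```python
def parse_cfg(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def setting_applied(cfg_text, edited_name, expected):
    """Check that a variable edited in lower case appears in upper case in the file."""
    return parse_cfg(cfg_text).get(edited_name.upper()) == str(expected)


captured = "GFM_WAIT_TIMEOUT=15\n"  # output of the grep in the last step
print(setting_applied(captured, "gfm_wait_timeout", 15))
```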
        

        Variables Reference

        You can edit the following GFM variables as described in the previous section.

        LOG_LEVEL=${LOG_LEVEL}
        Description: Fabric Manager logging levels
        Possible Values:
        	0  - All the logging is disabled
        	1  - Set log level to CRITICAL and above
        	2  - Set log level to ERROR and above
        	3  - Set log level to WARNING and above
        	4  - Set log level to INFO and above
        
        LOG_FILE_NAME=/var/log/fabricmanager.log
        Description: Filename for Fabric Manager logs
        Possible Values:
            Full path/filename string (max length of 256). Logs will be redirected to console(stderr) if the specified log file can't be opened or the path is empty.
        
        LOG_APPEND_TO_LOG=${LOG_APPEND_TO_LOG}
        Description: Append to an existing log file or overwrite the logs
        Possible Values:
            0  - No (Log file will be overwritten)
            1  - Yes (Append to existing log)
        
        LOG_FILE_MAX_SIZE=${LOG_FILE_MAX_SIZE}
        Description: Max size of log file (in MB)
        Possible Values:
        	Any integer values
        
        LOG_USE_SYSLOG=${LOG_USE_SYSLOG}
        Description: Redirect all the logs to syslog instead of logging to file
        Possible Values:
        	0  - No
        	1  - Yes
        
        DAEMONIZE=${DAEMONIZE}
        Description: daemonize Fabric Manager on start-up
        Possible Values:
            0  - No (Do not daemonize and run fabric manager as a normal process)
            1  - Yes (Run Fabric Manager process as Unix daemon)
        
        BIND_INTERFACE_IP=${BIND_INTERFACE_IP}
        Description: Network interface to listen for Global and Local Fabric Manager communication
        Possible Values:
        	A valid IPv4 address. By default, uses loopback (127.0.0.1) interface
        
        STARTING_TCP_PORT=${STARTING_TCP_PORT}
        Description: Starting TCP port number for Global and Local Fabric Manager communication
        Possible Values:
        	Any value between 0 and 65535
        
        UNIX_SOCKET_PATH=${UNIX_SOCKET_PATH}
        Description: Use Unix sockets instead of TCP Socket for Global and Local Fabric Manager communication
        Possible Values:
        	Unix domain socket path (max length of 256)
        Default Value: 
        	Empty String (TCP socket will be used instead of Unix sockets)
        
        FABRIC_MODE=${FABRIC_MODE}
        Description: Fabric Manager Operating Mode
        Possible Values:
            0  - Start Fabric Manager in Bare metal or Full pass through virtualization mode
            1  - Start Fabric Manager in Shared NVSwitch multitenancy mode. 
            2  - Start Fabric Manager in vGPU based multitenancy mode.
        
        FABRIC_MODE_RESTART=${FABRIC_MODE_RESTART}
        Description: Restart Fabric Manager after exit. Applicable only in Shared NVSwitch or vGPU based multitenancy mode
        Possible Values:
            0  - Start Fabric Manager and follow full initialization sequence
            1  - Start Fabric Manager and follow Shared NVSwitch or vGPU based multitenancy mode resiliency/restart sequence.
        
        STATE_FILE_NAME=${STATE_FILE_NAME}
        Description: Specify the filename to be used to save Fabric Manager states. Valid only if Shared NVSwitch or vGPU based multitenancy mode is enabled
        Possible Values:
        	Full path/filename string (max length of 256)
        
        FM_CMD_BIND_INTERFACE=${FM_CMD_BIND_INTERFACE}
        Description: Network interface to listen for Fabric Manager SDK/API to communicate with running FM instance.
        Possible Values:
        	A valid IPv4 address. By default, uses loopback (127.0.0.1) interface
        
        FM_CMD_PORT_NUMBER=${FM_CMD_PORT_NUMBER}
        Description: TCP port number for Fabric Manager SDK/API to communicate with running FM instance.
        Possible Values:
        	Any value between 0 and 65535
        
        FM_CMD_UNIX_SOCKET_PATH=${FM_CMD_UNIX_SOCKET_PATH}
        Description: Use Unix sockets instead of TCP Socket for Fabric Manager SDK/API communication
        Possible Values:
        		Unix domain socket path (max length of 256)
        Default Value: 
        		Empty string (TCP socket will be used instead of Unix sockets)
        
        
        FM_STAY_RESIDENT_ON_FAILURES=${FM_STAY_RESIDENT_ON_FAILURES}
        Description: Fabric Manager does not exit when facing failures
        Possible Values:
            0  - Fabric Manager service will terminate on errors such as NVSwitch and GPU config failure, typical software errors, etc.
            1  - Fabric Manager service will stay running on errors such as NVSwitch and GPU config failure, typical software errors, etc. However, the system will be uninitialized and CUDA application launch will fail.
        
        ACCESS_LINK_FAILURE_MODE=${ACCESS_LINK_FAILURE_MODE}
        Description: Degraded Mode options when there is an Access Link Failure (GPU to NVSwitch NVLink failure)
        Possible Values:
            In bare metal or full passthrough virtualization mode
            0  - Remove the GPU with the Access NVLink failure from NVLink P2P capability
            1  - Disable the NVSwitch and its peer NVSwitch, which reduces NVLink P2P bandwidth
        
            In Shared NVSwitch or vGPU based multitenancy mode
            0  - Disable partitions which are using the Access Link failed GPU
            1  - Disable the NVSwitch and its peer NVSwitch, all partitions will be available but with reduced NVLink P2P bandwidth
        
        TRUNK_LINK_FAILURE_MODE=${TRUNK_LINK_FAILURE_MODE}
        Description: Degraded Mode options when there is a Trunk Link Failure (NVSwitch to NVSwitch NVLink failure)
        Possible Values:
            In bare metal or full passthrough virtualization mode
            0  - Exit Fabric Manager and leave the system/NVLinks uninitialized
            1  - Disable the NVSwitch and its peer NVSwitch, which reduces NVLink P2P bandwidth
        
            In Shared NVSwitch or vGPU based multitenancy mode
            0  - Remove partitions that are using the Trunk NVLinks
            1  - Disable the NVSwitch and its peer NVSwitch,
                    all partitions will be available but with reduced NVLink P2P bandwidth
        
        NVSWITCH_FAILURE_MODE=${NVSWITCH_FAILURE_MODE}
        Description: Degraded Mode options when there is a NVSwitch failure or an NVSwitch is excluded
        Possible Values:
            In bare metal or full passthrough virtualization mode
            0  - Abort Fabric Manager
            1  - Disable the NVSwitch and its peer NVSwitch, which reduces P2P bandwidth
        
            In Shared NVSwitch or vGPU based multitenancy mode
            0  - Disable partitions that are using the NVSwitch
            1  - Disable the NVSwitch and its peer NVSwitch,
                   all partitions will be available but with reduced NVLink P2P bandwidth
        
        ABORT_CUDA_JOBS_ON_FM_EXIT=${ABORT_CUDA_JOBS_ON_FM_EXIT}
        Description: Control running CUDA jobs behavior when Fabric Manager service is stopped or terminated
        Possible Values:
            0  - Do not abort running CUDA jobs when Fabric Manager exits. However new CUDA job launch will fail.
            1  - Abort all running CUDA jobs when Fabric Manager exits.
        
        TOPOLOGY_FILE_PATH=${TOPOLOGY_FILE_PATH}
        Description: Absolute directory path containing Fabric Manager topology files
        Possible Values:
            A valid directory path string (max length of 256)
        
        ENABLE_LOCALFM=${ENABLE_LOCALFM}
        
        GFM_WAIT_TIMEOUT=${GFM_WAIT_TIMEOUT}
        Description: Time in seconds. A negative value for GFM_WAIT_TIMEOUT denotes an infinite wait time.
        
        ENABLE_TOPOLOGY_VALIDATION=${ENABLE_TOPOLOGY_VALIDATION}
        
        FABRIC_NODE_CONFIG_FILE=/usr/share/nvidia/nvswitch/${FABRIC_NODE_CONFIG}
        
        MULTI_NODE_TOPOLOGY=${MULTI_NODE_TOPOLOGY}
        Description: Filename of the active multi-node topology
        
        MULTI_NODE_ENABLE_ALL_NODE_PARTITION=${MULTI_NODE_ENABLE_ALL_NODE_PARTITION}
        Description: Indicates that all nodes are by default in a single default partition
        Possible Values:
           0 -  No default partition is enabled
           1 -  default partition is enabled
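
Several of the variables above have documented value ranges that you can check before applying a change. The following Python sketch validates a handful of them. It covers only the ranges listed in this reference, treats the port range as inclusive (an assumption), and uses hypothetical helper names; it is not a complete validator for fabricmanager.cfg.

```python
import ipaddress


def _int_in(low, high):
    """Build a check that accepts integer strings within [low, high]."""
    def check(value):
        try:
            return low <= int(value) <= high
        except ValueError:
            return False
    return check


def _is_ipv4(value):
    """Accept only a valid IPv4 address string."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def _max_len_256(value):
    """Accept path strings up to the documented maximum length."""
    return len(value) <= 256


# Rules transcribed from the "Possible Values" entries above.
RULES = {
    "LOG_LEVEL": _int_in(0, 4),
    "LOG_APPEND_TO_LOG": _int_in(0, 1),
    "LOG_USE_SYSLOG": _int_in(0, 1),
    "DAEMONIZE": _int_in(0, 1),
    "FABRIC_MODE": _int_in(0, 2),
    "STARTING_TCP_PORT": _int_in(0, 65535),
    "FM_CMD_PORT_NUMBER": _int_in(0, 65535),
    "BIND_INTERFACE_IP": _is_ipv4,
    "FM_CMD_BIND_INTERFACE": _is_ipv4,
    "UNIX_SOCKET_PATH": _max_len_256,
    "STATE_FILE_NAME": _max_len_256,
    "TOPOLOGY_FILE_PATH": _max_len_256,
}


def invalid_settings(settings):
    """Return the sorted names of settings that break a documented rule."""
    return sorted(name for name, value in settings.items()
                  if name in RULES and not RULES[name](value))


proposed = {"LOG_LEVEL": "4", "FABRIC_MODE": "3",
            "STARTING_TCP_PORT": "70000", "BIND_INTERFACE_IP": "127.0.0.1"}
print(invalid_settings(proposed))
```

Variables without a documented range, such as GFM_WAIT_TIMEOUT, are deliberately left out of RULES and pass through unchecked.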
        
        

        Fluentd Reference

        Enable Fluentd Streaming

        To enable Fluentd streaming from NVLink4 switches to your Fluent collector, use the netq_telemetry_agent_handler tool to configure streaming parameters. You can download the netq_telemetry_agent_handler application from the NVIDIA Product Information Delivery Portal. To review the command syntax, run the netq_telemetry_agent_handler -h command:

        $ ./netq_telemetry_agent_handler -h
        Usage of ./netq_telemetry_agent_handler:
          -add
                Append a new destination collector
          -address value
                List of addresses for discovery
          -delete
                Delete destination collectors
          -delete_all
                Delete all destination collectors
          -destination value
                List of fluent collectors (format <ip_addr>,<port>,<tcp/tls>)
          -domain string
                Domain name (optional)
          -domain_id string
                Domain identifier (optional)
          -password string
                NVOS http password (default "admin")
          -replace
                Replace destination collectors
          -user string
                NVOS http user (default "admin")
        Examples for configuring one switch to append a fluent destination:
                ./netq_telemetry_agent_handler -add -address 192.168.0.17 -destination 10.188.44.17,30001,tcp -user admin -password admin -domain test -domain_id 1 -domain my_domain
        Examples for configuring one switch to append two fluent destinations:
                ./netq_telemetry_agent_handler -add -address 192.168.0.17 -destination 10.188.44.17,30001,tcp -destination 10.188.44.43,30001,tcp -user admin -password admin -domain_id 1 -domain my_domain
        Examples for configuring two switches to replace with one fluent destination:
                ./netq_telemetry_agent_handler -replace -address 192.168.0.17 -address 192.168.0.21 -destination 10.188.44.17,30001,tcp -user admin -password admin -domain_id 1 -domain my_domain
        Examples for configuring one switche to delete a fluent destination:
                ./netq_telemetry_agent_handler -delete -address 192.168.0.17 -destination 10.188.44.17,30001,tcp -user admin -password admin -domain_id 1 -domain my_domain
        Examples for configuring one switch to delete all fluent destinations:
                ./netq_telemetry_agent_handler -delete_all -address 192.168.0.17 -user admin -password admin -domain_id 1 -domain my_domain
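
When you configure many switches, it can help to generate these command lines from a script. The sketch below is illustrative only: it uses just the flags listed in the help output above, and the `build_handler_command` helper is a hypothetical name. It assumes that every action except `-delete_all` needs at least one `-destination`, as in the examples.

```python
import shlex


def build_handler_command(action, switches, destinations=(), user="admin",
                          password="admin", domain=None, domain_id=None,
                          binary="./netq_telemetry_agent_handler"):
    """Return an argv list that uses only the flags shown by -h."""
    if action not in {"add", "replace", "delete", "delete_all"}:
        raise ValueError(f"unsupported action: {action}")
    if action == "delete_all":
        destinations = ()  # -delete_all takes no destination in the examples
    elif not destinations:
        raise ValueError(f"-{action} needs at least one destination")

    argv = [binary, f"-{action}"]
    for switch in switches:
        argv += ["-address", switch]
    for ip_addr, port, transport in destinations:
        if transport not in {"tcp", "tls"}:
            raise ValueError(f"transport must be tcp or tls, got {transport}")
        # Destination format from the help text: <ip_addr>,<port>,<tcp/tls>
        argv += ["-destination", f"{ip_addr},{port},{transport}"]
    argv += ["-user", user, "-password", password]
    if domain_id is not None:
        argv += ["-domain_id", str(domain_id)]
    if domain is not None:
        argv += ["-domain", domain]
    return argv


cmd = build_handler_command(
    "add",
    switches=["192.168.0.17"],
    destinations=[("10.188.44.17", 30001, "tcp"),
                  ("10.188.44.43", 30001, "tcp")],
    domain="my_domain",
    domain_id=1,
)
print(shlex.join(cmd))
```

Pass the returned list to `subprocess.run` to execute it, or print it with `shlex.join` to review the command first.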
        

        NVLink4 Fluentd Message Examples

        The following examples show NVLink4 Fluentd message output in JSON format:

        ResourceUtil
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "ResourceUtil",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "cpu_utilization": 132,
              "deleted": false,
              "disk_utilization": {
                "/dev/sda10": {
                  "percent": 10.61,
                  "total": 50950307840,
                  "used": 5406457856
                },
                "/dev/sda8": {
                  "percent": 1.65,
                  "total": 190840832,
                  "used": 3145728
                }
              },
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "is_disk_read_only": true,
              "mem_utilization": 12.34,
              "message_type": "ResourceUtil",
              "timestamp": 1698784259506
            }
          ]
        }
        
        Port
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Port",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "connector": "Optical module",
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "identifier": "OSFP",
              "ifname": "NVL1/14/2/1",
              "length": "30m OM3 ,50m OM4 ,50m OM5",
              "message_type": "Port",
              "part_number": "MMA4Z00-NS",
              "serial_number": "MT2306FT11744",
              "speed": "106.250 Gbps",
              "state": "active",
              "timestamp": 1698784259506,
              "transceiver": "2 x NDR, 2 x 400G-SR4",
              "vendor_name": "NVIDIA"
            }
          ]
        }
        
        Power
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Power",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "message_type": "Power",
              "s_adapter_name": "PS1",
              "s_power_in_input": 582.65,
              "s_power_out_input": 536,
              "s_voltage_in_input": 200.5,
              "s_voltage_in_max": 13.8,
              "s_voltage_in_min": 10.2,
              "s_voltage_out_input": 12.02,
              "timestamp": 1698784259506
            }
          ]
        }
        
        PSU
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "PSU",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "message_type": "PSU",
              "s_name": "PS1",
              "s_prev_state": "ok",
              "s_state": "ok",
              "timestamp": 1698784259506
            }
          ]
        }
        
        Fan
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Fan",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "message_type": "Fan",
              "s_input": 29446,
              "s_name": "FAN3-F1",
              "s_prev_state": "OK",
              "s_state": "ok",
              "timestamp": 1698784259506
            }
          ]
        }
        
        Temp
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Temp",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "message_type": "Temp",
              "s_desc": "PS2 power-mon T1",
              "s_input": 54,
              "s_name": "temp1",
              "s_prev_state": "ok",
              "s_state": "ok",
              "timestamp": 1698784259506
            }
          ]
        }
        
        Address
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Address",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "ifname": "mgmt0",
              "is_ipv6": false,
              "mask": 32,
              "message_type": "Address",
              "prefix": "10.137.20.77",
              "timestamp": 1698784259506,
              "vrf": "default"
            }
          ]
        }
        
        Inventory
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Inventory",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "agent_version": "N/A",
              "asic_core_bw": "N/A",
              "asic_data": "[]",
              "asic_model": "N/A",
              "asic_model_id": "SGXLS10-NS2F",
              "asic_ports": "NVL4-1,NVL4-2,EROT-1,EROT-2",
              "asic_vendor": "NVIDIA",
              "cpu_arch": "x86_64",
              "cpu_data": "[]",
              "cpu_max_freq": "",
              "cpu_model": "",
              "cpu_nos": "4",
              "deleted": false,
              "disk_data": "[{\"firmware_version\":\"0202-000\",\"model\":\"StorFly VSFBM4XI060G-MLX\",\"serial_number\":\"58247-0059\"}]",
              "disk_total_size": "60.0 GB",
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "license_data": "[{\"license\":\"LK2-RESTRICTED_CMDS_GEN2-43A5-Q7Y9-1X1U-888A-2B7M-9JLR-E0\",\"name\":\"B8:3F:D2:1F:07:68 (ok)\"}]",
              "license_state": "ok",
              "memory_data": "[]",
              "memory_total_size": "15779 MB total",
              "message_type": "Inventory",
              "os_name": "NVOS",
              "os_version": "20231009-190322",
              "os_version_id": "X86_64 20231009-190322 2023-10-09 16:08:03 x86_64",
              "platform_base_mac": "B8:3F:D2:1F:07:68",
              "platform_label_revision": "A4",
              "platform_mfg_date": "N/A",
              "platform_model": "unknown",
              "platform_part_number": "N/A",
              "platform_serial_number": "MT2243XZ0C2L",
              "platform_vendor": "NVIDIA",
              "timestamp": 1698784259506
            }
          ]
        }
        
        Node
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Node",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "last_reinit": 1698784661,
              "lastboot": 1698784661,
              "message_type": "Node",
              "ntp_state": "yes",
              "sys_uptime": 1698351073,
              "timestamp": 1698784661,
              "version": "N/A"
            }
          ]
        }
        
        Link
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Link",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "admin_state": "Enabled",
              "deleted": false,
              "domain": "",
              "down_reason": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "ifalias": "",
              "ifname": "NVL2/28/1/2",
              "kind": "nvl",
              "managed": true,
              "master": "",
              "message_type": "Link",
              "mtu": 256,
              "oper_state": "active",
              "pre_failure": "no",
              "recovery_count": 0,
              "timestamp": 1698784259506,
              "vrf": "default"
            }
          ]
        }
        
        NvlStats
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "NvlStats",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "crc_errors": 358,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "ifname": "NVL1/6/2/2",
              "message_type": "NvlStats",
              "rx_all_flits": 0,
              "rx_crc_bit_error_rate": 0,
              "rx_data_flits": 0,
              "rx_physical_bit_error_rate": 0,
              "rx_physical_errors_per_lane_0": 0,
              "rx_physical_errors_per_lane_1": 0,
              "rx_replay_rate": 0,
              "timestamp": 1698784259506,
              "tx_all_flits": 0,
              "tx_data_flits": 0,
              "tx_replay_rate": 0,
              "wait": 0
            }
          ]
        }
        
        Dom
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "Dom",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "identifier": "",
              "ifname": "NVL1/5/2/1",
              "laser_bias_current": {
                "Channel 1": "9.07200 mA",
                "Channel 2": "8.96600 mA",
                "Channel 3": "9.14200 mA",
                "Channel 4": "9.50800 mA",
                "Channel 5": "9.22800 mA",
                "Channel 6": "9.42000 mA",
                "Channel 7": "9.28000 mA",
                "Channel 8": "9.05400 mA"
              },
              "laser_bias_current_high_alarm_th": "11.00000 mA",
              "laser_bias_current_high_warning_th": "131 mA",
              "laser_bias_current_low_alarm_th": "7.00000 mA",
              "laser_bias_current_low_warning_th": "0 mA",
              "laser_output_power": {
                "Channel 1": "1.18400 mW / 0.73352 dBm",
                "Channel 2": "1.22080 mW / 0.86645 dBm",
                "Channel 3": "1.18750 mW / 0.74634 dBm",
                "Channel 4": "1.22320 mW / 0.87497 dBm",
                "Channel 5": "1.19420 mW / 0.77077 dBm",
                "Channel 6": "1.23460 mW / 0.91526 dBm",
                "Channel 7": "1.17860 mW / 0.71366 dBm",
                "Channel 8": "1.23960 mW / 0.93282 dBm"
              },
              "laser_output_power_high_alarm_th": "2.51190 mW / 4.00002 dBm",
              "laser_output_power_high_warning_th": "6.5535 mW",
              "laser_output_power_low_alarm_th": "0.45000 mW / -3.46788 dBm",
              "laser_output_power_low_warning_th": "0 mW",
              "laser_rx_power": {
                "Channel 1": "1.17770 mW / 0.71035 dBm",
                "Channel 2": "1.21720 mW / 0.85362 dBm",
                "Channel 3": "1.15860 mW / 0.63934 dBm",
                "Channel 4": "1.20620 mW / 0.81419 dBm",
                "Channel 5": "1.16100 mW / 0.64832 dBm",
                "Channel 6": "1.21460 mW / 0.84433 dBm",
                "Channel 7": "1.16280 mW / 0.65505 dBm",
                "Channel 8": "1.14250 mW / 0.57856 dBm"
              },
              "laser_rx_power_high_alarm_th": "2.51190 mW / 4.00002 dBm",
              "laser_rx_power_high_warning_th": "6.5535 mW",
              "laser_rx_power_low_alarm_th": "0.31620 mW / -5.00038 dBm",
              "laser_rx_power_low_warning_th": "0 mW",
              "message_type": "Dom",
              "module_temp": "66.000000 degrees C/150.800003 degrees F",
              "module_temp_high_alarm_th": "80.000000 degrees C/176.000000 degrees F",
              "module_temp_high_warning_th": "127.000000 degrees C/260.600006 degrees F",
              "module_temp_low_alarm_th": "-10.000000 degrees C/14.000000 degrees F",
              "module_temp_low_warning_th": "-127.000000 degrees C/-196.599991 degrees F",
              "module_voltage": "3.23440 V",
              "module_voltage_high_alarm_th": "3.50000 V",
              "module_voltage_high_warning_th": "0 V",
              "module_voltage_low_alarm_th": "3.10000 V",
              "module_voltage_low_warning_th": "6.5535 V",
              "timestamp": 1698784259506
            }
          ]
        }
        
        NvlDeviceInfo
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "NvlDeviceInfo",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "active": true,
              "bind_interface_ip": "10.137.20.77",
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "log_append_to_log": "enabled",
              "log_level": "info",
              "message_type": "NvlDeviceInfo",
              "shutdown": "no",
              "starting_tcp_port": "16000",
              "state": "running",
              "timestamp": 1698784259506,
              "uid_led_status": "Off"
            }
          ]
        }
        
        NvlEvents
        {
          "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
          "message_type": "NvlEvents",
          "ts": 1698784259506,
          "trans_mode": 1,
          "message": [
            {
              "ASIC": "NVL4-1",
              "Description": "-",
              "DomainName": "",
              "InstanceLink": "1/5/1/2 (44)",
              "SXID": "-",
              "Severity": "Non-fatal",
              "Subinstance": "-",
              "Time": 1698350778,
              "Type": "Port up",
              "active": true,
              "deleted": false,
              "domain": "",
              "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
              "message_type": "NvlEvents",
              "timestamp": 1698784259506
            }
          ]
        }
        

        Fluentd Message Reference

        General

        Field Name Type Description
        date Double Message timestamp by fluent-bit
        aid String Agent ID/hostname
        domain String Domain name
        message_type String Message type (node, inventory, link, etc.)
        ts Long Message timestamp
        trans_mode String Transition mode (full, partial, first segment, mid segment, last segment)
        message Object Message content for the message type
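
The general fields wrap the records for each message type. The following minimal sketch shows how a collector-side script might unpack one message. The `iter_records` helper is a hypothetical name, and the sample is a shortened copy of the PSU example, where `message` is a list of records.

```python
import json

# Shortened copy of the PSU example message.
SAMPLE = """
{
  "aid": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
  "message_type": "PSU",
  "ts": 1698784259506,
  "trans_mode": 1,
  "message": [
    {
      "active": true,
      "deleted": false,
      "hostname": "s-a14-ou20-ch1-evt-kg4.nvidia.com",
      "message_type": "PSU",
      "s_name": "PS1",
      "s_prev_state": "ok",
      "s_state": "ok",
      "timestamp": 1698784259506
    }
  ]
}
"""


def iter_records(raw):
    """Yield (agent ID, message type, record) for each record in a message."""
    envelope = json.loads(raw)
    for record in envelope.get("message", []):
        yield envelope["aid"], envelope["message_type"], record


for aid, message_type, record in iter_records(SAMPLE):
    print(aid, message_type, record["s_name"], record["s_state"])
```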

        Node

        Field Name Type Description
        domain String Domain name
        deleted Boolean Record deletion status
        hostname String Hostname
        last_reinit Long Last system initialization
        lastboot Long Last system reboot
        ntp_state String NTP daemon state
        sys_uptime Long System uptime
        timestamp Long Inner message timestamp
        version String Agent version
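
In the JSON examples, `ts` and most `timestamp` values appear to be in milliseconds since the epoch, while `lastboot` and `last_reinit` appear to be in seconds. The units are inferred from the sample values, not stated in the source. The sketch below converts either form to UTC. The `to_utc` helper and its magnitude cutoff are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: epoch values above this cutoff are milliseconds, not seconds.
MILLISECOND_CUTOFF = 100_000_000_000


def to_utc(value):
    """Convert an epoch value in seconds or milliseconds to a UTC datetime."""
    if value > MILLISECOND_CUTOFF:
        seconds, millis = divmod(value, 1000)
    else:
        seconds, millis = value, 0
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


print(to_utc(1698784259506).isoformat())  # "ts" from the examples
print(to_utc(1698784661).isoformat())     # "lastboot" from the Node example
```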

        Inventory

        Field Name Type Description
        active Boolean Record activity status
        agent_version String Agent version
        asic_core_bw String (not supported)
        asic_data String Additional ASIC data (not supported)
        asic_model String ASIC model
        asic_model_id String ASIC model ID (system type)
        asic_ports String ASIC ports
        asic_vendor String ASIC vendor (hard-coded NVIDIA)
        cpu_arch String CPU architecture
        cpu_data String (not supported)
        cpu_max_freq String CPU max frequency
        cpu_model String CPU model
        cpu_nos String Number of CPUs
        deleted Boolean Record deletion status
        disk_data String Additional disk data information
        disk_total_size String Disk total size
        domain String Domain name
        hostname String Hostname
        license_data String License information
        license_state String License state
        memory_data String Additional memory data information (not supported)
        memory_total_size String Memory total size on the switch
        message_type String Message type
        os_name String Operating system name
        os_version String Operating system version
        os_version_id String Operating system version summary
        platform_base_mac String Platform management MAC address
        platform_label_revision String Platform label revision (not supported)
        platform_mfg_date String Platform manufacture date (not supported)
        platform_model String Platform model
        platform_part_number String Platform part number (not supported)
        platform_serial_number String Platform serial number (not supported)
        platform_vendor String Platform vendor (hard-coded NVIDIA)
        timestamp Long Message timestamp

        ResourceUtil

        Field Name Type Description
        active Boolean Record activity status
        cpu_utilization String Total CPU utilization as a percentage
        deleted Boolean Record deletion status
        mem_utilization String Memory utilization as a percentage
        message_type String Message type
        timestamp Long Message timestamp
        disk_utilization List Disk utilization per device
        “disk_utilization” Item start
        percent Double Percentage of used disk space
        total Long Total disk space in bytes
        used Long Used disk space in bytes
        “disk_utilization” Item end
        is_disk_read_only Boolean Read-only status (always true)
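
The `percent` value can be reproduced from `used` and `total`. The following short sketch uses the values from the ResourceUtil JSON example. The `used_percent` and `devices_above` helpers and the 80 percent threshold are hypothetical.

```python
# Values copied from the ResourceUtil JSON example.
disk_utilization = {
    "/dev/sda10": {"percent": 10.61, "total": 50950307840, "used": 5406457856},
    "/dev/sda8": {"percent": 1.65, "total": 190840832, "used": 3145728},
}


def used_percent(entry):
    """Recompute the used-space percentage from the byte counters."""
    return round(100 * entry["used"] / entry["total"], 2)


def devices_above(utilization, threshold):
    """Return the devices whose reported percent exceeds the threshold."""
    return [dev for dev, entry in utilization.items()
            if entry["percent"] > threshold]


for device, entry in disk_utilization.items():
    print(device, entry["percent"], used_percent(entry))

print(devices_above(disk_utilization, 80.0))
```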

        Port

        Field Name Type Description
        active Boolean Record activity status
        connector String Interface connector (not supported)
        deleted Boolean Record deletion status
        hostname String Hostname
        identifier String Interface identifier (not supported)
        length String
        ifname String Interface name
        message_type String Message type
        part_number String Interface part number (not supported)
        serial_number String Interface serial number (not supported)
        speed String Interface port speed
        state String Interface operational state
        timestamp Long Message timestamp
        transceiver String Transceiver information (not supported)
        vendor_name String Interface vendor (not supported)

        Fan

        Field Name Type Description
        active Boolean Record activity status
        deleted Boolean Record deletion status
        hostname String Hostname
        message_type String Message type
        s_input String Fan source input
        s_name String Fan source name
        s_prev_state String Fan previous state
        s_state String Fan current state
        timestamp Long Message timestamp

        Temp

        Field Name Type Description
        active Boolean Record activity status
        deleted Boolean Record deletion status
        hostname String Hostname
        message_type String Message type
        s_desc String Temperature description
        s_input String Temperature input
        s_name String Temperature name
        s_prev_state String Temperature previous state
        s_state String Temperature current state
        timestamp Long Message timestamp

        Power

        Field Name Type Description
        active Boolean Record activity status
        deleted Boolean Record deletion status
        hostname String Hostname
        message_type String Message type
        s_adapter_name String Message adapter name
        timestamp Long Message timestamp

        PSU

        Field Name Type Description
        active Boolean Record activity status
        deleted Boolean Record deletion status
        hostname String Hostname
        message_type String Message type
        s_name String PSU name
        s_prev_state String PSU previous state
        s_state String PSU current state
        timestamp Long Message timestamp

        Link

        Field Name Type Description
        active Boolean Record activity status
        admin_state String Interface admin state
        deleted Boolean Record deletion status
        down_reason String Interface down reason
        hostname String Hostname
        ifalias String Interface description
        ifname String Interface name
        kind String Interface type
        managed String Interface device is managed
        master String Interface parent device
        message_type String Message type
        mtu String Interface MTU
        oper_state String Interface operational state
        timestamp Long Message timestamp

        NvlStats

        Field Name Type Description
        active Boolean Record activity status
        crc_errors Long CRC errors counter
        deleted Boolean Record deletion status
        hostname String Hostname
        ifname String Interface name
        message_type String Message type
        rx_all_flits Long RX all flits counter
        rx_data_flits Long RX data flits counter
        timestamp Long Message timestamp
        tx_all_flits Long TX all flits counter
        tx_data_flits Long TX data flits counter
        rx_physical_bit_error_rate Double RX physical bit error rate
        rx_physical_errors_per_lane_0 Long RX physical errors lane 0
        rx_physical_errors_per_lane_1 Long RX physical errors lane 1
        rx_crc_bit_error_rate Double RX CRC bit error rate
        tx_replay_rate Double TX replay rate
        rx_replay_rate Double RX replay rate
        wait Long TX wait counter
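
One way to use these counters is to flag links that report errors. The sketch below is illustrative: the `links_with_errors` helper and the choice of fields to treat as error counters are assumptions, and the sample record is a shortened copy of the NvlStats example.

```python
# Assumption: these counters indicate a problem when they are nonzero.
ERROR_FIELDS = (
    "crc_errors",
    "rx_physical_errors_per_lane_0",
    "rx_physical_errors_per_lane_1",
)

# Shortened copy of the NvlStats example record.
records = [
    {
        "ifname": "NVL1/6/2/2",
        "crc_errors": 358,
        "rx_physical_errors_per_lane_0": 0,
        "rx_physical_errors_per_lane_1": 0,
    },
]


def links_with_errors(stats):
    """Map each interface name to its nonzero error counters."""
    flagged = {}
    for record in stats:
        nonzero = {field: record[field] for field in ERROR_FIELDS
                   if record.get(field, 0) > 0}
        if nonzero:
            flagged[record["ifname"]] = nonzero
    return flagged


print(links_with_errors(records))
```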

        NVL device info

        Field Name Type Short Description
        active Boolean Record activity status
        bind_interface_ip String Bound interface IP
        deleted Boolean Record deletion status
        hostname String Hostname
        log_append_to_log Boolean Log appended to log
        log_level String Log level
        message_type String Message type
        shutdown String Shutdown state
        starting_tcp_port String Starting TCP port
        uid_led_status String Current switch LED status
        state String LFM state
        timestamp Long Message timestamp

        DOM

        Field Name Type Short Description
        active Boolean Record activity status
        deleted Boolean Record deletion status
        hostname String Hostname
        identifier String Identifier (not supported)
        ifname String Interface name
        laser_bias_current Object
        “laser_bias_current” start
        Channel 1 String Channel 1
        Channel 2 String Channel 2
        Channel 3 String Channel 3
        Channel 4 String Channel 4
        Channel 5 String Channel 5
        Channel 6 String Channel 6
        Channel 7 String Channel 7
        Channel 8 String Channel 8
        “laser_bias_current” end
        laser_bias_current_high_alarm_th String Laser bias current high alarm threshold
        laser_bias_current_high_warning_th String Laser bias current high warning threshold
        laser_bias_current_low_alarm_th String Laser bias current low alarm threshold
        laser_bias_current_low_warning_th String Laser bias current low warning threshold
        laser_output_power Object
        “laser_output_power” start
        Channel 1 String Channel 1
        Channel 2 String Channel 2
        Channel 3 String Channel 3
        Channel 4 String Channel 4
        Channel 5 String Channel 5
        Channel 6 String Channel 6
        Channel 7 String Channel 7
        Channel 8 String Channel 8
        “laser_output_power” end
        laser_output_power_high_alarm_th String Laser output power high alarm threshold
        laser_output_power_high_warning_th String Laser output power high warning threshold
        laser_output_power_low_alarm_th String Laser output power low alarm threshold
        laser_output_power_low_warning_th String Laser output power low warning threshold
        laser_rx_power Object
        “laser_rx_power” start
        Channel 1 String Channel 1
        Channel 2 String Channel 2
        Channel 3 String Channel 3
        Channel 4 String Channel 4
        Channel 5 String Channel 5
        Channel 6 String Channel 6
        Channel 7 String Channel 7
        Channel 8 String Channel 8
        “laser_rx_power” end
        laser_rx_power_high_alarm_th String Laser RX power high alarm threshold
        laser_rx_power_high_warning_th String Laser RX power high warning threshold
        laser_rx_power_low_alarm_th String Laser RX power low alarm threshold
        laser_rx_power_low_warning_th String Laser RX power low warning threshold
        message_type String Message type
        module_temp String Module temperature
        module_temp_high_alarm_th String Module temperature high alarm threshold
        module_temp_high_warning_th String Module temperature high warning threshold
        module_temp_low_alarm_th String Module temperature low alarm threshold
        module_temp_low_warning_th String Module temperature low warning threshold
        module_voltage String Module voltage
        module_voltage_high_alarm_th String Module voltage high alarm threshold
        module_voltage_high_warning_th String Module voltage high warning threshold
        module_voltage_low_alarm_th String Module voltage low alarm threshold
        module_voltage_low_warning_th String Module voltage low warning threshold
        timestamp Long Message timestamp
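
The DOM power values arrive as strings such as "1.18400 mW / 0.73352 dBm", so they need parsing before you can compare them to the alarm thresholds. The following is a minimal sketch. The `parse_mw`, `mw_to_dbm`, and `channels_outside_alarms` helpers are hypothetical, and the record is a shortened copy of the Dom JSON example.

```python
import math
import re

# Shortened copy of the Dom example record.
record = {
    "laser_rx_power": {
        "Channel 1": "1.17770 mW / 0.71035 dBm",
        "Channel 8": "1.14250 mW / 0.57856 dBm",
    },
    "laser_rx_power_high_alarm_th": "2.51190 mW / 4.00002 dBm",
    "laser_rx_power_low_alarm_th": "0.31620 mW / -5.00038 dBm",
}


def parse_mw(text):
    """Extract the leading milliwatt value from a DOM power string."""
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)\s*mW", text)
    if match is None:
        raise ValueError(f"no mW value in {text!r}")
    return float(match.group(1))


def mw_to_dbm(milliwatts):
    """Convert milliwatts to dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(milliwatts)


def channels_outside_alarms(dom, field):
    """List the channels whose value falls outside the alarm thresholds."""
    low = parse_mw(dom[f"{field}_low_alarm_th"])
    high = parse_mw(dom[f"{field}_high_alarm_th"])
    return [channel for channel, value in dom[field].items()
            if not low <= parse_mw(value) <= high]


print(channels_outside_alarms(record, "laser_rx_power"))
print(round(mw_to_dbm(1.184), 5))
```

The same helper works for `laser_output_power`, which uses the same string format. The bias current and voltage fields use other units and need their own parsers.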

        MGMT address

        Field Name Type Short Description
        ifname String Interface attached to mgmt address
        prefix String IP address for mgmt address
        mask String Mask for mgmt address
        vrf String VRF for mgmt address
        is_ipv6 Boolean IPv6 address status
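
The `prefix` and `mask` fields together describe one interface address. The sketch below combines them with Python's standard `ipaddress` module. The `mgmt_interface` helper is a hypothetical name, and the record is a shortened copy of the Address JSON example.

```python
import ipaddress

# Shortened copy of the Address example record.
record = {
    "ifname": "mgmt0",
    "is_ipv6": False,
    "mask": 32,
    "prefix": "10.137.20.77",
    "vrf": "default",
}


def mgmt_interface(address):
    """Combine prefix and mask into an ipaddress interface object."""
    return ipaddress.ip_interface(f"{address['prefix']}/{address['mask']}")


interface = mgmt_interface(record)
print(interface, interface.network)
# Cross-check the parsed IP version against the is_ipv6 flag.
print((interface.version == 6) == record["is_ipv6"])
```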

        NVL events

        Field Name Type Short Description
        ASIC String ASIC source
        SXID String
        Description String Event description
        Severity String Event severity
        InstanceLink String
        Subinstance String
        DomainName String Domain name
        timestamp Long Event timestamp

        Fluentd Collection

        You can use your own Fluent collector, or use Fluent Bit:

        $ /opt/fluent-bit/bin/fluent-bit -i forward -p port=30001 -o stdout -p format=json_lines -m '*'
        Fluent Bit v2.0.5
        * Copyright (C) 2015-2022 The Fluent Bit Authors
        * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
        * https://fluentbit.io
        
        
        [2023/01/19 03:15:40] [ info] [fluent bit] version=2.0.5, commit=, pid=2341886
        [2023/01/19 03:15:40] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
        [2023/01/19 03:15:40] [ info] [cmetrics] version=0.5.7
        [2023/01/19 03:15:40] [ info] [ctraces ] version=0.2.5
        [2023/01/19 03:15:40] [ info] [input:forward:forward.0] initializing
        [2023/01/19 03:15:40] [ info] [input:forward:forward.0] storage_strategy='memory' (memory only)
        [2023/01/19 03:15:40] [ info] [input:forward:forward.0] listening on 0.0.0.0:30001
        [2023/01/19 03:15:40] [ info] [sp] stream processor started
        [2023/01/19 03:15:40] [ info] [output:stdout:stdout.0] worker #0 started
        {"date":1674090930.626897,"aid":"<hostname>","message_type":"Link","ts":1674090926111,"trans_mode":1,"domain":"my_domain","message":[{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL1/2/1/2","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"active","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL1/12/2/1","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL1/13/2/2","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL2/29/2/2","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL2/32/1/2","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL1/7/1/1","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL1/9/1/2","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL2/30/1/1","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL1/16/2/2","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"},{"active":true,"admin_state":"Enabled","deleted":false,"down_reason":"","hostname":"<hostname>","ifalias":"","ifname":"NVL2/17/2/1","kind":"nvl","managed":true,"master":"test","message_type":"Link","mtu":256,"oper_state":"init","timestamp":1674090926111,"vrf":"default"}]}
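
Because Fluent Bit writes one JSON object per line with `format=json_lines`, you can pipe the output into a short script. The following minimal sketch counts Link records by operational state. The `count_link_states` helper is a hypothetical name, and the sample lines are shortened from the output above.

```python
import json
from collections import Counter

# Shortened sample: one log line and one json_lines record.
SAMPLE_LINES = [
    "[2023/01/19 03:15:40] [ info] [sp] stream processor started",
    '{"date":1674090930.626897,"aid":"<hostname>","message_type":"Link",'
    '"ts":1674090926111,"trans_mode":1,"domain":"my_domain","message":['
    '{"ifname":"NVL1/2/1/2","oper_state":"active"},'
    '{"ifname":"NVL1/12/2/1","oper_state":"init"}]}',
]


def count_link_states(lines):
    """Count Link records by oper_state, skipping lines that are not JSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # Fluent Bit banner or log line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed record
        if event.get("message_type") != "Link":
            continue
        for record in event.get("message", []):
            counts[record["oper_state"]] += 1
    return counts


print(dict(count_link_states(SAMPLE_LINES)))
```

To process live output, pass `sys.stdin` to `count_link_states` instead of the sample list.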