Monitor Network Health
As with any network, one of the challenges is keeping track of all of the moving parts. With the NetQ GUI, you can view the overall health of your network at a glance and then delve deeper for periodic checks or as conditions arise that require attention. For a general understanding of how well your network is operating, the Network Health card workflow is the best place to start, as it contains the highest-level view and performance roll-ups.
Network Health Card Workflow Summary
The small Network Health card displays:
Item | Description |
---|---|
(icon) | Indicates data is for overall Network Health |
Health trend | Trend of overall network health, represented by an arrow. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Health score | Average of health scores for system health, network services health, and interface health during the last data collection window. The health score for each category is calculated as the percentage of items which passed validations versus the number of items checked. The collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Health rating | Performance rating based on the health score during the time window |
Chart | Distribution of overall health status during the designated time period |
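The health score calculation described in the table can be illustrated with a minimal Python sketch. It is not NetQ code: the function names, category keys, and counts are hypothetical, and it assumes each category reports a count of items that passed and a count of items checked for one data collection window.

```python
def category_score(passed: int, checked: int) -> float:
    """Percentage of items that passed validation out of those checked."""
    if checked == 0:
        raise ValueError("no items were checked in this window")
    return 100.0 * passed / checked


def overall_health(window: dict) -> float:
    """Average the per-category scores for one data collection window."""
    scores = [category_score(passed, checked) for passed, checked in window.values()]
    return sum(scores) / len(scores)


# Hypothetical (passed, checked) counts for a single one-hour window.
window = {
    "system": (48, 50),
    "network_services": (45, 50),
    "interfaces": (30, 40),
}

print(round(overall_health(window), 1))
```

With these made-up counts the category scores are 96, 90, and 75 percent, so the overall score is their average.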
The medium Network Health card displays the distribution, score, and trend of system health, network services health, and interface health:
Item | Description |
---|---|
Time period | Range of time in which the displayed data was collected; applies to all card sizes |
(icon) | Indicates data is for overall Network Health |
Health trend | Trend of system, network service, and interface health, represented by an arrow. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Health score | Percentage of devices which passed validation versus the number of devices checked during the time window for system health, network services health, and interface health. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Chart | Distribution of overall health status during the designated time period |
The large Network Health card contains three tabs.
The System Health tab displays:
Item | Description |
---|---|
Time period | Range of time in which the displayed data was collected; applies to all card sizes |
(icon) | Indicates data is for System Health |
Health trend | Trend of NetQ Agents, Cumulus Linux licenses, and sensor health, represented by an arrow. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Health score | Percentage of devices which passed validation versus the number of devices checked during the time window for NetQ Agents, Cumulus Linux license status, and platform sensors. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Charts | Distribution of health score for NetQ Agents, Cumulus Linux license status, and platform sensors during the designated time period |
Table | Listing of items that match the filter selection (Most Failures or Recent Failures) |
Show All Validations | Opens full screen Network Health card with a listing of validations performed by network service and protocol |
The Network Service Health tab displays:
Item | Description |
---|---|
Time period | Range of time in which the displayed data was collected; applies to all card sizes |
(icon) | Indicates data is for Network Protocols and Services Health |
Health trend | Trend of BGP, CLAG, EVPN, NTP, OSPF, and VXLAN services health, represented by an arrow. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Health score | Percentage of devices which passed validation versus the number of devices checked during the time window for BGP, CLAG, EVPN, NTP, OSPF, and VXLAN protocols and services. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Charts | Distribution of passing validations for BGP, CLAG, EVPN, NTP, OSPF, and VXLAN services during the designated time period |
Table | Listing of devices that match the filter selection (Most Failures or Recent Failures) |
Show All Validations | Opens full screen Network Health card with a listing of validations performed by network service and protocol |
The Interface Health tab displays:
Item | Description |
---|---|
Time period | Range of time in which the displayed data was collected; applies to all card sizes |
(icon) | Indicates data is for Interface Health |
Health trend | Trend of interfaces, VLAN, and MTU health, represented by an arrow. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Health score | Percentage of devices which passed validation versus the number of devices checked during the time window for interfaces, VLAN, and MTU protocols and ports. The data collection window varies based on the time period of the card. For a 24 hour time period (default), the window is one hour. This gives you current, hourly updates about your network health. |
Charts | Distribution of passing validations for interfaces, VLAN, and MTU protocols and ports during the designated time period |
Table | Listing of devices that match the filter selection (Most Failures or Recent Failures) |
Show All Validations | Opens full screen Network Health card with a listing of validations performed by network service and protocol |
The full screen Network Health card displays the results of all validations run on the network protocols and services.
Item | Description |
---|---|
Title | Network Health |
(icon) | Closes full screen card and returns to workbench |
Default Time | Range of time in which the displayed data was collected |
(icon) | Displays data refresh status. Click to pause or resume data refresh. Current refresh rate is visible by hovering over the icon. |
Results | Number of results found for the selected tab |
Network protocol or service tab | Displays the results of validations for that network protocol or service that occurred during the designated time period. By default, the list is sorted by the date and time that the validation was completed (Time). The tab provides additional data about all protocols and services, and some protocols and services have further data of their own. |
Table Actions | Select, export, or filter the list. Refer to Table Settings. |
View Network Health Summary
Overall network health is based on successful validation results. The summary includes the percentage of successful results, a trend indicator, and a distribution of the validation results.
To view a summary of your network health, open the small Network Health card.
In this example, the overall health is relatively good and is improving compared to recent status. Refer to the next section to view the key health metrics.
View Key Metrics of Network Health
Overall network health is a calculated average of several key health metrics: System, Network Services, and Interface health.
To view these key metrics, open the medium Network Health card. Each metric is shown with percentage of successful validations, a trend indicator, and a distribution of the validation results.
In this example, system health and network services health are both good, but interface health is on the lower side. While it is improving, you might choose to dig further if it does not continue to improve. Refer to the following sections for additional details.
View System Health
The system health is a calculated average of the NetQ Agent, Cumulus Linux license, and sensor health metrics. In all cases, validation is performed on the agents and licenses. If you are monitoring platform sensors, the calculation includes these as well. You can view the overall health of the system from the medium Network Health card and information about each component from the System Health tab on the large Network Health card.
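As a rough illustration of how the sensor metric enters the average only when sensors are monitored, consider the following Python sketch. The function name and the percentage inputs are hypothetical; this is not how NetQ itself is implemented.

```python
from statistics import mean
from typing import Optional


def system_health(agents: float, licenses: float,
                  sensors: Optional[float] = None) -> float:
    """Average the component scores (percentages) for system health.

    Agents and licenses are always included. The sensor score is
    included only when platform sensors are being monitored.
    """
    components = [agents, licenses]
    if sensors is not None:
        components.append(sensors)
    return mean(components)


# Hypothetical scores: sensors not monitored, then monitored.
print(system_health(100.0, 90.0))
print(system_health(100.0, 90.0, sensors=80.0))
```

In this sketch, a low sensor score lowers the system health average only on deployments where sensors are monitored.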
To view information about each system component:
- Open the large Network Health card.
- Hover over the card and click the System Health tab.
The health of each system protocol or service is represented on the left side of the card by a distribution of the health score, a trend indicator, and a percentage of successful results. The right side of the card provides a listing of devices running the services.
View Devices with the Most Issues
It is useful to know which devices are experiencing the most issues with their system services in general, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with the most issues, select Most Failures from the filter above the table on the right.
Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Event cards and filter on the indicated switches.
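The Most Failures ordering amounts to counting failed validations per device and sorting by that count. The sketch below illustrates the idea in Python; the record layout and device names are hypothetical and do not reflect NetQ's internal data model.

```python
from collections import Counter

# Hypothetical failed-validation records: (device, system service).
failures = [
    ("leaf01", "agent"),
    ("leaf02", "sensors"),
    ("leaf01", "sensors"),
    ("spine01", "license"),
    ("leaf01", "license"),
    ("leaf02", "agent"),
]


def most_failures(records):
    """Return (device, failure_count) pairs, highest count first."""
    counts = Counter(device for device, _service in records)
    return counts.most_common()


for device, count in most_failures(failures):
    print(device, count)
```

Devices at the top of such a list are the natural candidates for a closer look in the Event cards.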
View Devices with Recent Issues
It is useful to know which devices are experiencing the most issues with their system services right now, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with recent issues, select Recent Failures from the filter above the table on the right. Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Switch card or the Event cards and filter on the indicated switches.
Filter Results by System Service
You can focus the data in the table on the right by deselecting one or more services. Click the checkbox next to the service you want to remove from the data. In this example, we have unchecked Licenses.
This removes the checkbox next to the associated chart and grays out the title of the chart, temporarily removing the data related to that service from the table. Add it back by hovering over the chart and clicking the checkbox that appears.
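Conceptually, the checkboxes act as a set of selected services, and the table shows only rows whose service is in that set. A minimal Python sketch of this behavior follows; the row layout and service names are hypothetical.

```python
# Hypothetical table rows: one entry per device and system service.
rows = [
    {"device": "leaf01", "service": "agent"},
    {"device": "leaf01", "service": "license"},
    {"device": "leaf02", "service": "sensors"},
    {"device": "spine01", "service": "license"},
]


def filter_rows(rows, selected):
    """Keep only the rows whose service is currently selected."""
    return [row for row in rows if row["service"] in selected]


selected = {"agent", "license", "sensors"}

# Unchecking a service temporarily hides its rows.
selected.discard("license")
visible = filter_rows(rows, selected)
print([row["device"] for row in visible])

# Checking it again restores them; the underlying data is unchanged.
selected.add("license")
restored = filter_rows(rows, selected)
```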
View Details of a Particular System Service
From the System Health tab on the large Network Health card, you can click a chart to open the full screen card pre-focused on the data for that service.
View Network Services Health
The network services health is a calculated average of the individual network protocol and services health metrics. In all cases, validation is performed on NTP. If you are running BGP, CLAG, EVPN, OSPF, or VXLAN protocols the calculation includes these as well. You can view the overall health of network services from the medium Network Health card and information about individual services from the Network Service Health tab on the large Network Health card.
To view information about each network protocol or service:
- Open the large Network Health card.
- Hover over the card and click the Network Service Health tab.
The health of each network protocol or service is represented on the left side of the card by a distribution of the health score, a trend indicator, and a percentage of successful results. The right side of the card provides a listing of devices running the services.
If you have more services running than fit naturally into the chart area, a scroll bar appears for you to access their data. Use the scroll bars on the table to view more columns and rows.
View Devices with the Most Issues
It is useful to know which devices are experiencing the most issues with their network services in general, as this can help focus troubleshooting efforts toward selected devices versus the protocol or service. To view devices with the most issues, open the large Network Health card, then click the Network Service Health tab. Select Most Failures from the dropdown above the table on the right.
Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Event cards and filter on the indicated switches.
View Devices with Recent Issues
It is useful to know which devices are experiencing the most issues with their network services right now, as this can help focus troubleshooting efforts toward selected devices versus the protocol or service. To view devices with recent issues, open the large Network Health card, then click the Network Service Health tab. Select Recent Failures from the dropdown above the table on the right. Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Switch card or the Event cards and filter on the indicated switches.
Filter Results by Network Service
You can focus the data in the table on the right by deselecting one or more services. Click the checkbox next to the service you want to remove. In this example, we removed NTP and are in the process of removing OSPF.
This grays out the chart title and removes the associated checkbox, temporarily removing the data related to that service from the table.
View Details of a Particular Network Service
From the Network Service Health tab on the large Network Health card, you can click a chart to open the full screen card pre-focused on the data for that service.
View Interfaces Health
The interface health is a calculated average of the interfaces, VLAN, and MTU health metrics. You can view the overall health of interfaces from the medium Network Health card and information about each component from the Interface Health tab on the large Network Health card.
To view information about each interface component:
- Open the large Network Health card.
- Hover over the card and click the Interface Health tab.
The health of each interface protocol or service is represented on the left side of the card by a distribution of the health score, a trend indicator, and a percentage of successful results. The right side of the card provides a listing of devices running the services.
View Devices with the Most Issues
It is useful to know which devices are experiencing the most issues with their interfaces in general, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with the most issues, select Most Failures from the filter above the table on the right.
Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Event cards and filter on the indicated switches.
View Devices with Recent Issues
It is useful to know which devices are experiencing the most issues with their interfaces right now, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with recent issues, select Recent Failures from the filter above the table on the right. Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Switch card or the Event cards and filter on the indicated switches.
Filter Results by Interface Service
You can focus the data in the table on the right by deselecting one or more services. Click the checkbox next to the interface item you want to remove from the data. In this example, we have unchecked MTU.
This removes the checkbox next to the associated chart and grays out the title of the chart, temporarily removing the data related to that service from the table. Add it back by hovering over the chart and clicking the checkbox that appears.
View Details of a Particular Interface Service
From the Interface Health tab on the large Network Health card, you can click a chart to open the full screen card pre-focused on the data for that service.
View All Network Protocol and Service Validation Results
The Network Health card workflow enables you to view all of the results of all validations run on the network protocols and services during the designated time period.
To view all the validation results:
- Open the full screen Network Health card.
- Click the <network protocol or service name> tab in the navigation panel.
- Look for patterns in the data. For example, when did nodes, sessions, links, ports, or devices start failing validation? Was it at a specific time? Was it when you started running the service on more nodes? Did sessions fail while nodes were fine?
Where to go next depends on what data you see, but a few options include:
- Look for matching event information for the failure points in a given protocol or service.
- When you find failures in one protocol, compare with higher level protocols to see if they fail at a similar time (or vice versa with supporting services).
- Export the data for use in another analytics tool, by using the export option in the table actions and providing a name for the data file.
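Once the data is exported, a short script can help answer the pattern questions above, such as when each protocol first started failing validation. The following Python sketch assumes a CSV export with hypothetical column names (`timestamp`, `service`, `failed_nodes`) and ISO 8601 timestamps; the actual export format and column names may differ, so adjust them to match your file.

```python
import csv
import io

# Hypothetical sample of exported validation results.
SAMPLE = """timestamp,service,failed_nodes
2020-01-01T10:00:00,bgp,0
2020-01-01T11:00:00,bgp,2
2020-01-01T11:00:00,evpn,0
2020-01-01T12:00:00,evpn,3
2020-01-01T12:00:00,bgp,2
"""


def first_failure_times(csv_text):
    """Map each service to the earliest timestamp with failed nodes."""
    first = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["failed_nodes"]) > 0:
            service, ts = row["service"], row["timestamp"]
            # ISO 8601 timestamps in the same format sort correctly as strings.
            if service not in first or ts < first[service]:
                first[service] = ts
    return first


for service, ts in sorted(first_failure_times(SAMPLE).items()):
    print(service, ts)
```

Comparing the first failure times across protocols is one way to see whether a higher-level protocol started failing shortly after a supporting one.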