Configure Threshold-Crossing Event Notifications
Threshold-crossing events are user-defined events that detect and prevent network failures for ACL resources, digital optics, forwarding resources, interface errors and statistics, link flaps, resource utilization, and sensor events. You can find a complete list in the Threshold-Crossing Events Reference.
A notification configuration must contain one rule. Each rule must contain a scope and a threshold. If you want to deliver events to one or more notification channels (for example, email or Slack), create them by following the instructions in Create a Channel, and then return here to define your rule.
Define a Scope
Scope parameters filter the events that a given rule generates. You can filter every rule by hostname; some rules can also be filtered by interface or by event-specific parameters.
Select Scope Parameters
For each event type, you can filter rules according to the following parameters:
Event ID | Scope Parameters |
---|---|
TCA_TCAM_IN_ACL_V4_FILTER_UPPER | Hostname |
TCA_TCAM_EG_ACL_V4_FILTER_UPPER | Hostname |
TCA_TCAM_IN_ACL_V4_MANGLE_UPPER | Hostname |
TCA_TCAM_EG_ACL_V4_MANGLE_UPPER | Hostname |
TCA_TCAM_IN_ACL_V6_FILTER_UPPER | Hostname |
TCA_TCAM_EG_ACL_V6_FILTER_UPPER | Hostname |
TCA_TCAM_IN_ACL_V6_MANGLE_UPPER | Hostname |
TCA_TCAM_EG_ACL_V6_MANGLE_UPPER | Hostname |
TCA_TCAM_IN_ACL_8021x_FILTER_UPPER | Hostname |
TCA_TCAM_ACL_L4_PORT_CHECKERS_UPPER | Hostname |
TCA_TCAM_ACL_REGIONS_UPPER | Hostname |
TCA_TCAM_IN_ACL_MIRROR_UPPER | Hostname |
TCA_TCAM_ACL_18B_RULES_UPPER | Hostname |
TCA_TCAM_ACL_32B_RULES_UPPER | Hostname |
TCA_TCAM_ACL_54B_RULES_UPPER | Hostname |
TCA_TCAM_IN_PBR_V4_FILTER_UPPER | Hostname |
TCA_TCAM_IN_PBR_V6_FILTER_UPPER | Hostname |
Event ID | Scope Parameters |
---|---|
TCA_DOM_RX_POWER_ALARM_UPPER | Hostname, Interface |
TCA_DOM_RX_POWER_ALARM_LOWER | Hostname, Interface |
TCA_DOM_RX_POWER_WARNING_UPPER | Hostname, Interface |
TCA_DOM_RX_POWER_WARNING_LOWER | Hostname, Interface |
TCA_DOM_BIAS_CURRENT_ALARM_UPPER | Hostname, Interface |
TCA_DOM_BIAS_CURRENT_ALARM_LOWER | Hostname, Interface |
TCA_DOM_BIAS_CURRENT_WARNING_UPPER | Hostname, Interface |
TCA_DOM_BIAS_CURRENT_WARNING_LOWER | Hostname, Interface |
TCA_DOM_OUTPUT_POWER_ALARM_UPPER | Hostname, Interface |
TCA_DOM_OUTPUT_POWER_ALARM_LOWER | Hostname, Interface |
TCA_DOM_OUTPUT_POWER_WARNING_UPPER | Hostname, Interface |
TCA_DOM_OUTPUT_POWER_WARNING_LOWER | Hostname, Interface |
TCA_DOM_MODULE_TEMPERATURE_ALARM_UPPER | Hostname, Interface |
TCA_DOM_MODULE_TEMPERATURE_ALARM_LOWER | Hostname, Interface |
TCA_DOM_MODULE_TEMPERATURE_WARNING_UPPER | Hostname, Interface |
TCA_DOM_MODULE_TEMPERATURE_WARNING_LOWER | Hostname, Interface |
TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER | Hostname, Interface |
TCA_DOM_MODULE_VOLTAGE_ALARM_LOWER | Hostname, Interface |
TCA_DOM_MODULE_VOLTAGE_WARNING_UPPER | Hostname, Interface |
TCA_DOM_MODULE_VOLTAGE_WARNING_LOWER | Hostname, Interface |
Event ID | Scope Parameters |
---|---|
TCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPER | Hostname |
TCA_TCAM_TOTAL_MCAST_ROUTES_UPPER | Hostname |
TCA_TCAM_MAC_ENTRIES_UPPER | Hostname |
TCA_TCAM_ECMP_NEXTHOPS_UPPER | Hostname |
TCA_TCAM_IPV4_ROUTE_UPPER | Hostname |
TCA_TCAM_IPV4_HOST_UPPER | Hostname |
TCA_TCAM_IPV6_ROUTE_UPPER | Hostname |
TCA_TCAM_IPV6_HOST_UPPER | Hostname |
Event ID | Scope Parameters |
---|---|
TCA_HW_IF_OVERSIZE_ERRORS | Hostname, Interface |
TCA_HW_IF_UNDERSIZE_ERRORS | Hostname, Interface |
TCA_HW_IF_ALIGNMENT_ERRORS | Hostname, Interface |
TCA_HW_IF_JABBER_ERRORS | Hostname, Interface |
TCA_HW_IF_SYMBOL_ERRORS | Hostname, Interface |
Event ID | Scope Parameters |
---|---|
TCA_RXBROADCAST_UPPER | Hostname, Interface |
TCA_RXBYTES_UPPER | Hostname, Interface |
TCA_RXMULTICAST_UPPER | Hostname, Interface |
TCA_TXBROADCAST_UPPER | Hostname, Interface |
TCA_TXBYTES_UPPER | Hostname, Interface |
TCA_TXMULTICAST_UPPER | Hostname, Interface |
Event ID | Scope Parameters |
---|---|
TCA_LINK | Hostname, Interface |
Event ID | Scope Parameters |
---|---|
TCA_CPU_UTILIZATION_UPPER | Hostname |
TCA_DISK_UTILIZATION_UPPER | Hostname |
TCA_MEMORY_UTILIZATION_UPPER | Hostname |
Event ID | Scope Parameters |
---|---|
Tx CNP Unicast No Buffer Discard | Hostname, Interface |
Rx RoCE PFC Pause Duration | Hostname |
Rx RoCE PG Usage Cells | Hostname, Interface |
Tx RoCE TC Usage Cells | Hostname, Interface |
Rx RoCE No Buffer Discard | Hostname, Interface |
Tx RoCE PFC Pause Duration | Hostname, Interface |
Tx CNP Buffer Usage Cells | Hostname, Interface |
Tx ECN Marked Packets | Hostname, Interface |
Tx RoCE PFC Pause Packets | Hostname, Interface |
Rx CNP No Buffer Discard | Hostname, Interface |
Rx CNP PG Usage Cells | Hostname, Interface |
Tx CNP TC Usage Cells | Hostname, Interface |
Rx RoCE Buffer Usage Cells | Hostname, Interface |
Tx RoCE Unicast No Buffer Discard | Hostname, Interface |
Rx CNP Buffer Usage Cells | Hostname, Interface |
Rx RoCE PFC Pause Packets | Hostname, Interface |
Tx RoCE Buffer Usage Cells | Hostname, Interface |
Event ID | Scope Parameters |
---|---|
TCA_SENSOR_FAN_UPPER | Hostname, Sensor Name |
TCA_SENSOR_POWER_UPPER | Hostname, Sensor Name |
TCA_SENSOR_TEMPERATURE_UPPER | Hostname, Sensor Name |
TCA_SENSOR_VOLTAGE_UPPER | Hostname, Sensor Name |
Event ID | Scope Parameters |
---|---|
TCA_WJH_DROP_AGG_UPPER | Hostname, Reason |
TCA_WJH_ACL_DROP_AGG_UPPER | Hostname, Reason, Ingress port |
TCA_WJH_BUFFER_DROP_AGG_UPPER | Hostname, Reason |
TCA_WJH_SYMBOL_ERROR_UPPER | Hostname, Port down reason |
TCA_WJH_CRC_ERROR_UPPER | Hostname, Port down reason |
Specify the Scope
A rule’s scope can include all monitored devices or a subset. You define scopes as regular expressions, which is how they appear in NetQ. Each event has a set of attributes you can use to apply the rule to a subset of all devices. The definition and display are slightly different between the NetQ UI and the NetQ CLI, but the results are the same.
You define the scope in the Choose Attributes step when creating an event rule. You can choose to apply the rule to all devices or narrow the scope using attributes. If you choose to narrow the scope, but then do not enter any values for the available attributes, the result is all devices and attributes.
Scopes appear in threshold-crossing rule cards using the following format: Attribute, Operation, Value.
In this example, three attributes are available. For one or more of these attributes, select the operation (equals or starts with) and enter a value. For drop reasons, click in the value field to open a list of reasons, and select one from the list.
Note that you should leave the drop type attribute blank.
Create rule to show events from a … | Attribute | Operation | Value |
---|---|---|---|
Single device | hostname | Equals | <hostname> such as spine01 |
Single interface | ifname | Equals | <interface-name> such as swp6 |
Single sensor | s_name | Equals | <sensor-name> such as fan2 |
Single WJH drop reason | reason or port_down_reason | Equals | <drop-reason> such as WRED |
Single WJH ingress port | ingress_port | Equals | <port-name> such as 47 |
Set of devices | hostname | Starts with | <partial-hostname> such as leaf |
Set of interfaces | ifname | Starts with | <partial-interface-name> such as swp or eth |
Set of sensors | s_name | Starts with | <partial-sensor-name> such as fan, temp, or psu |
Refer to WJH Events Reference for WJH drop types and reasons. Leaving an attribute value blank defaults to all: all hostnames, interfaces, sensors, forwarding resources, ACL resources, and so forth.
Each attribute is displayed on the rule card as a regular expression equivalent to your choices above:
- Equals is displayed as an equals sign (=)
- Starts with is displayed as a caret (^)
- Blank (all) is displayed as an asterisk (*)
Scopes are defined with regular expressions. When more than one scoping parameter is available, they must be separated by a comma (without spaces), and all parameters must be defined in order. When an asterisk (*) is used alone, it must be entered inside either single or double quotes. Single quotes are used here.
The single hostname scope parameter is used by the ACL resources, forwarding resources, and resource utilization events.
Scope Value | Example | Result |
---|---|---|
<hostname> | leaf01 | Deliver events for the specified device |
<partial-hostname>* | leaf* | Deliver events for devices with hostnames starting with specified text (leaf) |
The hostname and interface scope parameters are used by the digital optics, interface errors, interface statistics, and link flaps events.
Scope Value | Example | Result |
---|---|---|
<hostname>,<interface> | leaf01,swp9 | Deliver events for the specified interface (swp9) on the specified device (leaf01) |
<hostname>,'*' | leaf01,'*' | Deliver events for all interfaces on the specified device (leaf01) |
'*',<interface> | '*',swp9 | Deliver events for the specified interface (swp9) on all devices |
<partial-hostname>*,<interface> | leaf*,swp9 | Deliver events for the specified interface (swp9) on all devices with hostnames starting with the specified text (leaf) |
<hostname>,<partial-interface>* | leaf01,swp* | Deliver events for all interfaces with names starting with the specified text (swp) on the specified device (leaf01) |
The hostname and sensor name scope parameters are used by the sensor events.
Scope Value | Example | Result |
---|---|---|
<hostname>,<sensorname> | leaf01,fan1 | Deliver events for the specified sensor (fan1) on the specified device (leaf01) |
'*',<sensorname> | '*',fan1 | Deliver events for the specified sensor (fan1) for all devices |
<hostname>,'*' | leaf01,'*' | Deliver events for all sensors on the specified device (leaf01) |
<partial-hostname>*,<sensorname> | leaf*,fan1 | Deliver events for the specified sensor (fan1) on all devices with hostnames starting with the specified text (leaf) |
<hostname>,<partial-sensorname>* | leaf01,fan* | Deliver events for all sensors with names starting with the specified text (fan) on the specified device (leaf01) |
The hostname, reason/port down reason, ingress port, and drop type scope parameters are used by the What Just Happened events.
Scope Value | Example | Result |
---|---|---|
<hostname>,<reason>,<ingress_port>,<drop_type> | leaf01,ingress-port-acl,'*','*' | Deliver WJH events for all ports on the specified device (leaf01) with the specified reason triggered (ingress-port-acl exceeded the threshold) |
'*',<reason>,'*' | '*',tail-drop,'*' | Deliver WJH events for the specified reason (tail-drop) for all devices |
<partial-hostname>*,<port_down_reason>,<drop_type> | leaf*,calibration-failure,'*' | Deliver WJH events for the specified reason (calibration-failure) on all devices with hostnames starting with the specified text (leaf) |
<hostname>,<partial-reason>*,<drop_type> | leaf01,blackhole,'*' | Deliver WJH events for reasons starting with the specified text (blackhole [route]) on the specified device (leaf01) |
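The CLI scope rules above (ordered parameters, comma-separated without spaces, a lone asterisk in quotes) can be sketched as a small Python helper. This is only an illustration: `scope_field` and `build_scope` are hypothetical names, not part of NetQ, and the helper does not check which parameters a given event accepts.

```python
def scope_field(value=None, prefix=False):
    """Render one scope parameter the way the tables above write it.

    value=None  -> all values, written as a quoted asterisk ('*')
    prefix=True -> "starts with" match, written as <text>*
    otherwise   -> exact match on <text>
    """
    if value is None:
        return "'*'"
    return f"{value}*" if prefix else value


def build_scope(*fields):
    """Join already-rendered fields, in order, with commas and no spaces."""
    if not fields:
        raise ValueError("a scope needs at least the hostname parameter")
    return ",".join(fields)


# Hostname and interface scopes, matching rows in the table above.
print(build_scope(scope_field("leaf01"), scope_field("swp9")))
print(build_scope(scope_field("leaf", prefix=True), scope_field("swp9")))
print(build_scope(scope_field("leaf01"), scope_field()))
```

The three calls print `leaf01,swp9`, `leaf*,swp9`, and `leaf01,'*'`. The caller is responsible for passing the fields in the order the event expects, for example hostname first and then interface.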
Create a Threshold-crossing Rule
- Click Menu and navigate to Threshold crossing rules.
- Select the tab that reflects the event type for the rule.
- Click Create a rule. Enter a name for the rule and assign a severity, then click Next.
- Select the attribute you want to monitor. The listed attributes change depending on the type of event you chose in the previous step.
- Click Next.
- On the Set threshold step, enter a threshold value.
- Define the scope of the rule.
  - If you want to restrict the rule based on a particular parameter, enter values for one or more of the available attributes. For What Just Happened rules, select a reason from the available list.
  - If you want the rule to apply across the network, select the Apply rule to entire network toggle.
- Click Next.
- (Optional) Select a notification channel where you want the events to be sent.
  Only previously created channels are available for selection. If no channel is available or selected, the notifications can only be retrieved from the database. You can add a channel at a later time and then add it to the rule.
- Click Finish. The rules may take several minutes to appear in the UI.
The simplest configuration you can create is one that sends a TCA event generated by all devices and all interfaces to a single notification application. Use the netq add tca command to configure the event. Its syntax is:
netq add tca [event_id <text-event-id-anchor>] [scope <text-scope-anchor>] [tca_id <text-tca-id-anchor>] [severity info | severity error] [is_active true | is_active false] [suppress_until <text-suppress-ts>] [threshold_type user_set | threshold_type vendor_set] [threshold <text-threshold-value>] [channel <text-channel-name-anchor> | channel drop <text-drop-channel-name>]
Note that the event ID is case sensitive and must be in all uppercase.
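If you script rule creation, a thin wrapper can catch a lowercase event ID before the command runs. The following Python sketch assembles an argument list using only options that appear in the syntax line above. The function name `add_tca_args` is hypothetical, and the option order it emits is an assumption.

```python
import shlex


def add_tca_args(event_id, scope, threshold=None, channel=None,
                 severity=None, threshold_type=None):
    """Return an argument list for a `netq add tca` call (illustrative only)."""
    if event_id != event_id.upper():
        raise ValueError("event_id is case sensitive and must be uppercase")
    if severity not in (None, "info", "error"):
        raise ValueError("severity must be info or error")
    if threshold_type not in (None, "user_set", "vendor_set"):
        raise ValueError("threshold_type must be user_set or vendor_set")

    args = ["netq", "add", "tca", "event_id", event_id, "scope", scope]
    if severity is not None:
        args += ["severity", severity]
    if threshold_type is not None:
        args += ["threshold_type", threshold_type]
    if threshold is not None:
        args += ["threshold", str(threshold)]
    if channel is not None:
        args += ["channel", channel]
    return args


# The scope is passed unquoted here; shlex.join adds the shell quoting
# when the list is rendered as a single command line.
args = add_tca_args("TCA_CPU_UTILIZATION_UPPER", "*",
                    threshold=95, channel="tca_slack_ifstats")
print(shlex.join(args))
```

Passing the list directly to `subprocess.run` (without a shell) avoids shell globbing, so the asterisk needs no extra quoting in that case.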
For example, this rule tells NetQ to deliver an event notification to the tca_slack_ifstats pre-configured Slack channel when the CPU utilization exceeds 95% of its capacity on any monitored switch:
cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope '*' channel tca_slack_ifstats threshold 95
This rule tells NetQ to deliver an event notification to the tca_pd_ifstats PagerDuty channel when the number of transmit bytes per second (Bps) on the leaf12 switch exceeds 20,000 Bps on any interface:
cumulus@switch:~$ netq add tca event_id TCA_TXBYTES_UPPER scope leaf12,'*' channel tca_pd_ifstats threshold 20000
This rule tells NetQ to deliver an event notification to the syslog-netq syslog channel when the temperature on sensor temp1 on the leaf12 switch exceeds 32 degrees Celsius:
cumulus@switch:~$ netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf12,temp1 channel syslog-netq threshold 32
This rule tells NetQ to deliver an event notification to the tca-slack channel when the total number of ACL drops on the leaf04 switch exceeds 20,000 for any reason, ingress port, or drop type:
cumulus@switch:~$ netq add tca event_id TCA_WJH_ACL_DROP_AGG_UPPER scope leaf04,'*','*','*' channel tca-slack threshold 20000
For a Slack channel, the event messages should be similar to this:
Set the Severity of a Threshold-crossing Event
In addition to defining a scope for a TCA rule, you can also set a severity of either info or error. To add a severity to a rule, use the severity option.
For example, if you want to add an error severity to the CPU utilization rule you created earlier:
cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope '*' severity error channel tca_slack_resources threshold 95
Or, if an event is important but not an error, set the severity to info:
cumulus@switch:~$ netq add tca event_id TCA_TXBYTES_UPPER scope leaf12,'*' severity info channel tca_pd_ifstats threshold 20000
Set the Threshold for Digital Optics Events
Digital optics have the additional option of applying user- or vendor-defined thresholds, using the threshold_type and threshold options.
This example shows how to send an error to channel ch1 when the module voltage exceeds the vendor-defined upper alarm threshold for interface swp31 on the mlx-2700-04 switch.
cumulus@switch:~$ netq add tca event_id TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER scope 'mlx-2700-04,swp31' severity error is_active true threshold_type vendor_set channel ch1
Successfully added/updated tca
This example shows how to send an error to channel ch1 when the module voltage exceeds the user-defined upper alarm threshold of 3V for interface swp31 on the mlx-2700-04 switch.
cumulus@switch:~$ netq add tca event_id TCA_DOM_MODULE_VOLTAGE_ALARM_UPPER scope 'mlx-2700-04,swp31' severity error is_active true threshold_type user_set threshold 3 channel ch1
Successfully added/updated tca
Create Multiple Rules for a Single Event
You may want to create more than one rule per event. For example, you might want to:
- Monitor the same event but for a different interface, sensor, or device
- Send the event notification to more than one channel
- Change the threshold for a particular device that you are troubleshooting
To do this in the NetQ UI, create additional rule cards (as shown in the previous section).
In the NetQ CLI, you can also add multiple rules. The following example shows the creation of three additional rules for the max temperature sensor:
netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf*,temp1 channel syslog-netq threshold 32
netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope '*',temp1 channel tca_sensors,tca_pd_sensors threshold 32
netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf03,temp1 channel syslog-netq threshold 29
Now you have four rules created (the original one, plus these three new ones) all based on the TCA_SENSOR_TEMPERATURE_UPPER event. To identify the various rules, NetQ automatically generates a TCA name for each rule. As you create each rule, NetQ adds an _# to the event name. The TCA Name for the first rule created is then TCA_SENSOR_TEMPERATURE_UPPER_1, the second rule created for this event is TCA_SENSOR_TEMPERATURE_UPPER_2, and so forth.
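The naming scheme described above can be modeled with a per-event counter. This Python sketch is a toy model, not NetQ's implementation: the class name `TcaNamer` is hypothetical, and it does not model what happens to the numbering after a rule is deleted.

```python
from collections import defaultdict


class TcaNamer:
    """Toy model of the naming scheme: <event_id>_<n>, counted per event."""

    def __init__(self):
        self._count = defaultdict(int)

    def next_name(self, event_id):
        # Each new rule for an event gets the next sequence number.
        self._count[event_id] += 1
        return f"{event_id}_{self._count[event_id]}"


namer = TcaNamer()
names = [namer.next_name("TCA_SENSOR_TEMPERATURE_UPPER") for _ in range(4)]
print(names[0])
print(names[-1])
```

With four rules for the same event, the first name printed is `TCA_SENSOR_TEMPERATURE_UPPER_1` and the last is `TCA_SENSOR_TEMPERATURE_UPPER_4`. You use these names as the tca_id when you modify or delete a rule.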
Manage Threshold-crossing Event Notifications
View Threshold-crossing Rules
- Click Menu and navigate to Threshold crossing rules.
- Select the relevant tab. The UI displays each rule and its parameters as a card.
After creating a rule, you can use the filters that appear above the rule cards to filter by status, severity, channel, and/or events.
To view TCA rules, run:
netq show tca [tca_id <text-tca-id-anchor>] [json]
This example displays all TCA rules:
cumulus@switch:~$ netq show tca
Matching config_tca records:
TCA Name Event Name Scope Severity Channel/s Active Threshold Unit Threshold Type Suppress Until
---------------------------- -------------------- -------------------------- -------- ------------------ ------ ------------------ -------- -------------- ----------------------------
TCA_CPU_UTILIZATION_UPPER_1 TCA_CPU_UTILIZATION_ {"hostname":"leaf01"} info pd-netq-events,slk True 87 % user_set Fri Oct 9 15:39:35 2020
UPPER -netq-events
TCA_CPU_UTILIZATION_UPPER_2 TCA_CPU_UTILIZATION_ {"hostname":"*"} error slk-netq-events True 93 % user_set Fri Oct 9 15:39:56 2020
UPPER
TCA_DOM_BIAS_CURRENT_ALARM_U TCA_DOM_BIAS_CURRENT {"hostname":"leaf*","ifnam error slk-netq-events True 0 mA vendor_set Fri Oct 9 16:02:37 2020
PPER_1 _ALARM_UPPER e":"*"}
TCA_DOM_RX_POWER_ALARM_UPPER TCA_DOM_RX_POWER_ALA {"hostname":"*","ifname":" info slk-netq-events True 0 mW vendor_set Fri Oct 9 15:25:26 2020
_1 RM_UPPER *"}
TCA_SENSOR_TEMPERATURE_UPPER TCA_SENSOR_TEMPERATU {"hostname":"leaf","s_name error slk-netq-events True 32 degreeC user_set Fri Oct 9 15:40:18 2020
_1 RE_UPPER ":"temp1"}
TCA_TCAM_IPV4_ROUTE_UPPER_1 TCA_TCAM_IPV4_ROUTE_ {"hostname":"*"} error pd-netq-events True 20000 % user_set Fri Oct 9 16:13:39 2020
UPPER
This example displays a specific TCA rule:
cumulus@switch:~$ netq show tca tca_id TCA_TXMULTICAST_UPPER_1
Matching config_tca records:
TCA Name Event Name Scope Severity Channel/s Active Threshold Suppress Until
---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
TCA_TXMULTICAST_UPPER_1 TCA_TXMULTICAST_UPPE {"ifname":"swp3","hostname info tca-tx-bytes-slack True 0 Sun Dec 8 16:40:14 2269
R ":"leaf01"}
Change the Threshold on a Rule
After receiving notifications based on a rule, you might want to increase or decrease the threshold value to limit or increase the number of events you receive.
To modify the threshold:
- Locate the rule you want to modify and hover over the top of the card.
- Click Edit.
- Enter a new threshold value, then select Update rule.
To modify the threshold, run:
netq add tca tca_id <text-tca-id-anchor> threshold <text-threshold-value>
This example changes the threshold for the rule TCA_CPU_UTILIZATION_UPPER_1 to a value of 96 percent. This overwrites the existing threshold value.
cumulus@switch:~$ netq add tca tca_id TCA_CPU_UTILIZATION_UPPER_1 threshold 96
Change the Scope of a Rule
After receiving notifications based on a rule, you might find that you want to narrow or widen the scope value to limit or increase the number of events you receive.
To modify the scope:
- Locate the rule you want to modify and hover over the top of the card.
- Click Edit.
- Select the toggle to either apply the rule to the entire network or individual hosts.
- Click Update rule.
To modify the scope, run:
netq add tca event_id <text-event-id-anchor> scope <text-scope-anchor> threshold <text-threshold-value>
This example changes the scope for the rule TCA_CPU_UTILIZATION_UPPER to apply only to switches whose hostnames begin with leaf. You must also provide a threshold value; this example uses 95 percent. Note that this overwrites the existing scope and threshold values.
cumulus@switch:~$ netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope hostname^leaf threshold 95
Successfully added/updated tca
cumulus@switch:~$ netq show tca
Matching config_tca records:
TCA Name Event Name Scope Severity Channel/s Active Threshold Suppress Until
---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
TCA_CPU_UTILIZATION_UPPER_1 TCA_CPU_UTILIZATION_ {"hostname":"*"} error onprem-email True 93 Mon Aug 31 20:59:57 2020
UPPER
TCA_CPU_UTILIZATION_UPPER_2 TCA_CPU_UTILIZATION_ {"hostname":"hostname^leaf info True 95 Tue Sep 1 18:47:24 2020
UPPER "}
Change, Add, or Remove Channels
- Locate the rule you want to modify and hover over the top of the card.
- Click Edit.
- Select the Channels tab.
- Select one or more channels.
- Click Update rule.
To change a channel association, run:
netq add tca tca_id <text-tca-id-anchor> channel <text-channel-name-anchor>
This overwrites the existing channel association.
This example changes the channel for the disk utilization 1 rule to the PagerDuty channel pd-netq-events.
cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 channel pd-netq-events
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
To remove a channel association (stop sending events to a particular channel), run:
netq add tca tca_id <text-tca-id-anchor> channel drop <text-drop-channel-name>
This example removes the tca_slack_resources channel from the disk utilization 1 rule.
cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 channel drop tca_slack_resources
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
Change the Name of a Rule
You cannot change the name of a threshold-crossing rule using the NetQ CLI because the rules do not have names. They receive identifiers (the tca_id) automatically. In the NetQ UI, to change a rule name, you must delete the rule and re-create it with the new name.
Change the Severity of a Rule
Threshold-crossing rules are categorized as either info or error.
In the NetQ UI, you must delete the rule and re-create it, specifying the new severity.
In the NetQ CLI, to change the severity, run:
netq add tca tca_id <text-tca-id-anchor> (severity info | severity error)
This example changes the severity of the maximum CPU utilization 1 rule from error to info:
cumulus@switch:~$ netq add tca tca_id TCA_CPU_UTILIZATION_UPPER_1 severity info
Successfully added/updated tca TCA_CPU_UTILIZATION_UPPER_1
Suppress a Rule
During troubleshooting or switch maintenance, you might want to suppress a rule to prevent erroneous or excessive notifications. This effectively pauses notifications for a specified time period.
- Locate the rule you want to disable and click Disable.
- Select the Date/Time field to set when you want the rule to be reenabled.
- Click Disable.
  - The state changes to Snoozed.
  - The Suppressed field displays the date and time at which the rule will be reenabled.
  - The Disable button changes to Disable forever.
Using the suppress_until option allows you to prevent the rule from being applied for a designated amount of time (in seconds). When this time has passed, the rule is automatically reenabled.
To suppress a rule, run:
netq add tca tca_id <text-tca-id-anchor> suppress_until <text-suppress-ts>
This example suppresses the maximum CPU utilization event for 24 hours:
cumulus@switch:~$ netq add tca tca_id TCA_CPU_UTILIZATION_UPPER_2 suppress_until 86400
Successfully added/updated tca TCA_CPU_UTILIZATION_UPPER_2
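Because suppress_until takes a duration in seconds, it can help to compute the value rather than work it out by hand. A minimal Python sketch, where `suppress_seconds` is a hypothetical helper name:

```python
from datetime import timedelta


def suppress_seconds(**duration):
    """Convert a duration (timedelta keyword arguments) to whole seconds."""
    return int(timedelta(**duration).total_seconds())


print(suppress_seconds(hours=24))             # 86400, as in the example above
print(suppress_seconds(hours=2, minutes=30))  # 9000
```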
Disable a Rule
Whereas suppression temporarily disables a rule, you can also disable a rule indefinitely.
To disable a rule that is currently active:
- Locate the rule you want to disable.
- Click Disable.
- Leave the Date/Time field blank.
- Click Disable.
  - The state changes to Inactive.
  - The rule definition is grayed out.
  - The Disable option changes to Enable, which you can use to reactivate the rule when you are ready.
To disable a rule that is currently suppressed, click Disable forever.
To disable a rule, run:
netq add tca tca_id <text-tca-id-anchor> is_active false
This example disables the maximum disk utilization 1 rule:
cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 is_active false
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
To reenable the rule, set the is_active option to true.
Delete a Rule
To delete a rule:
- Locate the rule you want to remove and hover over the card.
- In the card’s top-right corner, select Delete.
To remove a rule altogether, run:
netq del tca tca_id <text-tca-id-anchor>
This example deletes the maximum receive bytes rule:
cumulus@switch:~$ netq del tca tca_id TCA_RXBYTES_UPPER_1
Successfully deleted TCA TCA_RXBYTES_UPPER_1
Resolve Scope Conflicts
There might be occasions where the scopes defined by multiple threshold-crossing rules overlap. In such cases, NetQ generates the event using the rule with the most specific scope that still matches.
To clarify this, consider this example. Three events occurred:
- First event on switch leaf01, interface swp1
- Second event on switch leaf01, interface swp3
- Third event on switch spine01, interface swp1
NetQ attempts to match the hostname and interface name of each event against three threshold-crossing rules with different scopes:
- Scope 1 sends events for the swp1 interface on switch leaf01 (very specific)
- Scope 2 sends events for all interfaces on switches that start with leaf (moderately specific)
- Scope 3 sends events for all switches and interfaces (very broad)
The result is:
- For the first event, NetQ applies the scope from rule 1 because it matches scope 1 exactly
- For the second event, NetQ applies the scope from rule 2 because it does not match scope 1, but does match scope 2
- For the third event, NetQ applies the scope from rule 3 because it does not match either scope 1 or scope 2
In summary:
Input Event | Scope Parameters | TCA Scope 1 | TCA Scope 2 | TCA Scope 3 | Scope Applied |
---|---|---|---|---|---|
leaf01,swp1 | Hostname, Interface | leaf01,swp1 | leaf*,'*' | '*','*' | Scope 1 |
leaf01,swp3 | Hostname, Interface | leaf01,swp1 | leaf*,'*' | '*','*' | Scope 2 |
spine01,swp1 | Hostname, Interface | leaf01,swp1 | leaf*,'*' | '*','*' | Scope 3 |
You can modify threshold-crossing rules to remove conflicts.
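To check how overlapping scopes might resolve before you create them, you can model the "most specific match" rule in a few lines of Python. The source does not describe how NetQ ranks specificity, so the ranking here is an assumption: an exact value beats a prefix match, a prefix match beats match-all, and fields are compared left to right. The names `field_rank` and `pick_scope` are hypothetical. Asterisks are unquoted because the quotes in the CLI examples are only shell quoting.

```python
from fnmatch import fnmatchcase


def field_rank(pattern):
    """2 = exact value, 1 = prefix match (text*), 0 = match all (*)."""
    if pattern == "*":
        return 0
    return 1 if pattern.endswith("*") else 2


def pick_scope(event, scopes):
    """Return the most specific scope that matches the event, or None.

    event  -- tuple of values, e.g. ("leaf01", "swp1")
    scopes -- iterable of pattern tuples in the same field order
    """
    matching = [
        s for s in scopes
        if len(s) == len(event)
        and all(fnmatchcase(v, p) for v, p in zip(event, s))
    ]
    # Assumed ranking: compare fields left to right, hostname first.
    return max(matching, key=lambda s: tuple(field_rank(p) for p in s),
               default=None)


scopes = [("leaf01", "swp1"), ("leaf*", "*"), ("*", "*")]
for event in [("leaf01", "swp1"), ("leaf01", "swp3"), ("spine01", "swp1")]:
    print(event, "->", pick_scope(event, scopes))
```

For the three events in the example above, this sketch selects the leaf01,swp1 scope, the leaf* scope, and the match-all scope, in that order.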