RoCE
RDMA over Converged Ethernet (RoCE) is a network protocol that writes to compute or storage elements using remote direct memory access (RDMA) over an Ethernet network instead of using host CPUs. RoCE relies on congestion control and lossless Ethernet to operate. Cumulus Linux and SONiC both support features that can enable lossless Ethernet for RoCE environments.
RoCE helps you obtain a converged network, where all services run over the Ethernet infrastructure, including InfiniBand apps.
You can monitor RoCE in your network with the UI or with the following CLI commands:
netq [<hostname>] show roce-counters [<text-port>] tx | rx [roce | general] [around <text-time>] [json]
netq [<hostname>] show roce-config [<text-port>] [around <text-time>] [json]
netq [<hostname>] show roce-counters pool [json]
netq [<hostname>] show events tca_roce
netq [<hostname>] show events roceconfig
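The optional and required parts of the `netq show roce-counters` syntax listed above can be easier to see when assembled programmatically. The following is a minimal sketch, not part of NetQ: the function name `roce_counters_cmd` and its parameters are hypothetical, and it only builds the argument list without running anything.

```python
from typing import List, Optional


def roce_counters_cmd(
    direction: str,
    hostname: Optional[str] = None,
    port: Optional[str] = None,
    kind: Optional[str] = None,
    around: Optional[str] = None,
    as_json: bool = False,
) -> List[str]:
    """Build an argument list that follows the roce-counters syntax shown above.

    This helper is illustrative only; it is not shipped with NetQ.
    """
    if direction not in ("tx", "rx"):
        raise ValueError("direction must be 'tx' or 'rx'")
    if kind is not None and kind not in ("roce", "general"):
        raise ValueError("kind must be 'roce' or 'general'")

    cmd = ["netq"]
    if hostname:
        cmd.append(hostname)          # optional [<hostname>]
    cmd += ["show", "roce-counters"]
    if port:
        cmd.append(port)              # optional [<text-port>]
    cmd.append(direction)             # required tx | rx
    if kind:
        cmd.append(kind)              # optional [roce | general]
    if around:
        cmd += ["around", around]     # optional [around <text-time>]
    if as_json:
        cmd.append("json")            # optional [json]
    return cmd


print(" ".join(roce_counters_cmd("rx", port="swp1s1", kind="general", around="1h")))
# prints: netq show roce-counters swp1s1 rx general around 1h
```

The returned list can be passed to `subprocess.run` on a host where the `netq` CLI is installed.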
View the RoCE Configuration
To view the RoCE configuration, run netq show roce-config:
cumulus@switch:~$ netq show roce-config
Matching roce records:
Hostname          Interface       RoCE Mode  Enabled TCs  Mode     ECN Max  ECN Min  DSCP->SP   SP->PG   SP->TC   PFC SPs  PFC Rx     PFC Tx     ETS Mode   Last Changed
----------------- --------------- ---------- ------------ -------- -------- -------- ---------- -------- -------- -------- ---------- ---------- ---------- -------------------------
switch            swp34           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp47           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp19           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp37           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp30           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp45           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp57           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp33           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp31           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp39           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp24           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp13           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp53           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp1s1          Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp6            Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp29           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp42           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
switch            swp35           Lossy      0,3          ECN      10432    1088     26 -> 3    3 -> 2   3 -> 3   3        disabled   disabled   dwrr       Thu May 20 22:05:48 2021
...
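When every port is expected to share one RoCE profile, as in the output above, a quick check is to look for ports whose settings differ from the most common combination. The sketch below assumes the records are already loaded as Python dictionaries; the field names (`interface`, `roce_mode`, `enabled_tcs`, `ecn_max`, `ecn_min`) are hypothetical and are not necessarily the keys that NetQ emits with the json option. The port `swp99` and its values are invented for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical field names; adjust them to match the keys in your own data.
FIELDS = ("roce_mode", "enabled_tcs", "ecn_max", "ecn_min")


def find_outliers(records: List[Dict[str, object]]) -> List[object]:
    """Return interfaces whose settings differ from the most common combination."""
    if not records:
        return []
    profiles = [tuple(r[f] for f in FIELDS) for r in records]
    baseline, _count = Counter(profiles).most_common(1)[0]
    return [r["interface"] for r, p in zip(records, profiles) if p != baseline]


records = [
    {"interface": "swp34", "roce_mode": "Lossy", "enabled_tcs": "0,3", "ecn_max": 10432, "ecn_min": 1088},
    {"interface": "swp47", "roce_mode": "Lossy", "enabled_tcs": "0,3", "ecn_max": 10432, "ecn_min": 1088},
    # Invented record that deviates from the other two.
    {"interface": "swp99", "roce_mode": "Lossy", "enabled_tcs": "0,3", "ecn_max": 20000, "ecn_min": 1088},
]
print(find_outliers(records))
```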
View RoCE Counters
You can view several types of RoCE counters for a given switch, including:
- Rx and Tx counters
- General, CNP and RoCE-specific counters
- Counter pools
- Port-specific counters
You can also go back in time to view counters at a particular point in the past.
View Rx Counters
You can view RoCE Rx counters in both the UI and CLI.
- To view Rx counters, open the large switch card, then click the RoCE icon.
- Switch to the full-screen card, then click RoCE Counters.
To view general and CNP Rx counters, run netq show roce-counters rx general:
cumulus@switch:~$ netq show roce-counters rx general
Matching roce records:
Hostname          Interface            PG packets           PG bytes             no buffer discard    buffer usage         buffer max usage     PG usage             PG max usage
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
switch            swp1s1               1627273              152582910            0                    0                    1                    0                    1
switch            swp1s2               1627273              152582910            0                    0                    1                    0                    1
switch            swp63s1              1618361              160178796            0                    0                    2                    0                    2
switch            swp1s0               1627273              152582910            0                    0                    1                    0                    1
switch            swp63s3              1618361              160178796            0                    0                    2                    0                    2
switch            swp1s3               1627273              152582910            0                    0                    1                    0                    1
switch            swp63s0              1094532              120228456            0                    0                    1                    0                    1
switch            swp63s2              1618361              160178796            0                    0                    2                    0                    2
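To post-process this output, for example to list ports that report discards, the data rows can be split on whitespace because no value in this particular table contains a space. The following is a rough sketch under that assumption. The dictionary keys are names chosen here for illustration, following the column order shown above. The json option of the command is likely a better source for scripts, but its schema is not shown in this section, so the sketch parses the text form.

```python
from typing import Dict, List

# Column order as shown in the `rx general` output above; the key names are illustrative.
RX_GENERAL_COLUMNS = [
    "hostname", "interface", "pg_packets", "pg_bytes", "no_buffer_discard",
    "buffer_usage", "buffer_max_usage", "pg_usage", "pg_max_usage",
]


def parse_rx_general(output: str) -> List[Dict[str, object]]:
    """Turn the data rows of `netq show roce-counters rx general` text output into dicts."""
    rows: List[Dict[str, object]] = []
    for line in output.splitlines():
        fields = line.split()
        # Keep only lines with one value per column and numeric counters;
        # this skips the prompt, the header, and the dashed separator.
        if len(fields) != len(RX_GENERAL_COLUMNS):
            continue
        if not all(f.isdigit() for f in fields[2:]):
            continue
        row: Dict[str, object] = dict(zip(RX_GENERAL_COLUMNS[:2], fields[:2]))
        row.update({name: int(v) for name, v in zip(RX_GENERAL_COLUMNS[2:], fields[2:])})
        rows.append(row)
    return rows


SAMPLE = """Matching roce records:
Hostname Interface PG packets PG bytes no buffer discard buffer usage buffer max usage PG usage PG max usage
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
switch swp1s1 1627273 152582910 0 0 1 0 1
switch swp63s1 1618361 160178796 0 0 2 0 2
"""

rows = parse_rx_general(SAMPLE)
# Ports with a non-zero "no buffer discard" counter.
discarding = [r["interface"] for r in rows if r["no_buffer_discard"] > 0]
print(len(rows), discarding)
```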
To view RoCE-specific Rx counters, run netq show roce-counters rx roce:
cumulus@switch:~$ netq show roce-counters rx roce
Matching roce records:
Hostname          Interface       PG packets   PG bytes     no buffer discard  PFC pause packets  PFC pause duration buffer usage buffer max usage   PG usage     PG max usage
----------------- --------------- ------------ ------------ ------------------ ------------------ ------------------ ------------ ------------------ ------------ ---------------
switch            swp1s1          0            0            0                  0                  0                  0            0                  0            0
switch            swp1s2          0            0            0                  0                  0                  0            0                  0            0
switch            swp63s1         0            0            0                  0                  0                  0            0                  0            0
switch            swp1s0          0            0            0                  0                  0                  0            0                  0            0
switch            swp63s3         0            0            0                  0                  0                  0            0                  0            0
switch            swp1s3          0            0            0                  0                  0                  0            0                  0            0
switch            swp63s0         0            0            0                  0                  0                  0            0                  0            0
switch            swp63s2         0            0            0                  0                  0                  0            0                  0            0
View Tx Counters
You can view RoCE Tx counters in both the UI and CLI.
- To view Tx counters, open the large switch card, then click the RoCE icon.
- Switch to the full-screen card, then click RoCE Counters.
- Click Tx above the panel on the right.
To view general and CNP Tx counters, run netq show roce-counters tx general:
cumulus@switch:~$ netq show roce-counters tx general
Matching roce records:
Hostname          Interface       ECN marked packets   TC packets   TC bytes     unicast no buffer discard buffer usage buffer max usage   TC usage     TC max usage
----------------- --------------- -------------------- ------------ ------------ ------------------------- ------------ ------------------ ------------ ------------
switch            swp1s1          0                    0            0            0                         0            0                  0            0
switch            swp1s2          0                    0            0            0                         0            0                  0            0
switch            swp63s1         0                    0            0            0                         0            0                  0            0
switch            swp1s0          0                    0            0            0                         0            0                  0            0
switch            swp63s3         0                    0            0            0                         0            0                  0            0
switch            swp1s3          0                    0            0            0                         0            0                  0            0
switch            swp63s0         0                    0            0            0                         0            0                  0            0
switch            swp63s2         0                    0            0            0                         0            0                  0            0
To view RoCE-specific Tx counters, run netq show roce-counters tx roce:
cumulus@switch:~$ netq show roce-counters tx roce
Matching roce records:
Hostname          Interface       TC packets TC bytes   unicast no buffer discard PFC pause packets  PFC pause duration buffer usage buffer max usage   TC usage   TC max usage
----------------- --------------- ---------- ---------- ------------------------- ------------------ ------------------ ------------ ------------------ ---------- ---------------
switch            swp1s1          0          0          0                         0                  0                  0            0                  0          0
switch            swp1s2          0          0          0                         0                  0                  0            0                  0          0
switch            swp63s1         0          0          0                         0                  0                  0            0                  0          0
switch            swp1s0          0          0          0                         0                  0                  0            0                  0          0
switch            swp63s3         0          0          0                         0                  0                  0            0                  0          0
switch            swp1s3          0          0          0                         0                  0                  0            0                  0          0
switch            swp63s0         0          0          0                         0                  0                  0            0                  0          0
switch            swp63s2         0          0          0                         0                  0                  0            0                  0          0
View RoCE Counter Pools
- To view RoCE counter pools, open the large switch card, then click the RoCE icon.
- Switch to the full-screen card, then click RoCE Counters. Look for these columns: Lossy Default Ingress Size, RoCE Reserved Ingress Size, Lossy Default Egress Size, and RoCE Reserved Egress Size.
To view the RoCE counter pools, run netq show roce-counters pool:
cumulus@switch:~$ netq show roce-counters pool
Matching roce records:
Hostname          Lossy Default Ingress Size     Roce Reserved Ingress Size     Lossy Default Egress Size      Roce Reserved Egress Size
----------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------
switch            104823                         104823                         104823                         104823
View Counters for a Specific Switch Port
To view counters for a specific port:
- Open the large switch card, then click the RoCE icon.
- Select a port on the left.
To view counters for a specific switch port, include the port name in the command:
cumulus@switch:~$ netq show roce-counters swp1s1 rx general
Matching roce records:
Hostname          Interface            PG packets           PG bytes             no buffer discard    buffer usage         buffer max usage     PG usage             PG max usage
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
switch            swp1s1               1643392              154094520            0                    0                    1                    0                    1
View Results from a Time in the Past
To view counters for a different time period in the past:
- Open the large switch card, then click the RoCE icon.
- Click in the header and select a different time period.
You can use the around keyword with any RoCE-related command to view counters as they were at an earlier time:
cumulus@switch:~$ netq show roce-counters swp1s1 rx general around 1h
Matching roce records:
Hostname          Interface            PG packets           PG bytes             no buffer discard    buffer usage         buffer max usage     PG usage             PG max usage
----------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
switch            swp1s1               661                  61856                0                    0                    1                    0                    1
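Comparing a current reading with an earlier one gives a rough traffic rate for the port. The sketch below uses the two swp1s1 readings shown in this document (1643392 packets now, 661 packets with around 1h). It assumes the counters are cumulative, were not reset between readings, and that the readings were taken one hour apart. None of that is stated in the output itself. The function name `counter_rate` is made up for this example.

```python
def counter_rate(now: int, earlier: int, seconds: float) -> float:
    """Average increase per second between two readings of a cumulative counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (now - earlier) / seconds


# PG packets and PG bytes for swp1s1: current reading versus the `around 1h` reading.
packets_delta = 1643392 - 661
bytes_delta = 154094520 - 61856

# One hour between readings is an assumption.
INTERVAL_SECONDS = 3600
print(packets_delta, bytes_delta)
print(round(counter_rate(1643392, 661, INTERVAL_SECONDS), 1), "packets per second on average")
```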
Disable RoCE Monitoring
If you need to disable RoCE monitoring, do the following:
- Edit /etc/netq/commands/cl4-netq-commands.yml and comment out the following lines:
cumulus@netq-ts:~$ sudo nano /etc/netq/commands/cl4-netq-commands.yml
#- period: "60"
#  key: "roce"
#  isactive: true
#  command: "/usr/lib/cumulus/mlxcmd --json roce counters"
#  parser: "local"
- Delete the /var/run/netq/netq_commands.yml file:
cumulus@netq-ts:~$ sudo rm /var/run/netq/netq_commands.yml
- Restart the NetQ agent:
cumulus@netq-ts:~$ netq config agent restart
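If the edit in the first step has to be repeated on many devices, the commenting-out could be scripted. The following is a minimal sketch that works on the file contents as a string. It assumes each command entry is a YAML list item beginning with "- " and that the entry to disable contains the line key: "roce", as in the lines shown above. The function name `comment_out_roce` and the sample entry with key "ports" are invented for illustration. The sketch does not touch any file, and the remaining steps (deleting /var/run/netq/netq_commands.yml and restarting the agent) still apply.

```python
from typing import List


def comment_out_roce(yaml_text: str) -> str:
    """Prefix every line of the list item that contains key: "roce" with '#'."""
    # Split the text into chunks, starting a new chunk at each "- " list item.
    chunks: List[List[str]] = [[]]
    for line in yaml_text.splitlines():
        if line.lstrip().startswith("- "):
            chunks.append([])
        chunks[-1].append(line)

    out: List[str] = []
    for chunk in chunks:
        # Strip indentation and a leading "- " before comparing.
        is_roce = any(l.strip().lstrip("- ") == 'key: "roce"' for l in chunk)
        out.extend("#" + l if is_roce else l for l in chunk)

    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if yaml_text.endswith("\n") else "")


# Invented sample content; the real file may contain other entries and fields.
sample = (
    '- period: "30"\n'
    '  key: "ports"\n'
    '- period: "60"\n'
    '  key: "roce"\n'
    '  isactive: true\n'
)
print(comment_out_roce(sample))
```

Entries that do not contain the roce key are left unchanged, so the function can be run again on already edited text without commenting the entry out twice.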