Before You Install

This overview is designed to help you understand the various NetQ deployment and installation options.

Installation Overview

Consider the following deployment options and requirements before you install the NetQ system.

Single Server	Cluster	Scale Cluster
On-premises only	On-premises only	On-premises only
Network size: small (1 node); supports up to 40 switches*	Network size: medium (3 nodes); supports up to 100 switches*	Network size: large; support varies by number of nodes (see Server Arrangement)
KVM or VMware hypervisor	KVM or VMware hypervisor	KVM or VMware hypervisor
No high-availability option	High availability	High availability
System requirements: 16 virtual CPUs, 64GB RAM, 500GB SSD disk	System requirements (per node): 16 virtual CPUs, 64GB RAM, 500GB SSD disk	System requirements (per node): 48 virtual CPUs, 512GB RAM, 3.2TB SSD disk
Not supported: NVLink monitoring	Not supported: NVLink monitoring	Not supported: Network snapshots, Trace requests, Flow analysis, MAC commentary, Duplicate IP address validations. Limited support: Link health view (beta)

*When switches run both OpenTelemetry (OTLP) and the NetQ agent, the number of switches supported by each deployment model is halved.
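The deployment table and its footnote can be read as a simple decision rule. The following is a minimal Python sketch of that rule, assuming only the limits listed above; `suggest_deployment` and its parameters are hypothetical names, not an NVIDIA tool. It models NVLink (the scale cluster is the only model that supports it) but not Spectrum-X, and because NVIDIA recommends the scale cluster for 100 or more switches, treat a count of exactly 100 as a judgment call.

```python
# Hypothetical sizing helper (not an NVIDIA tool). Limits are copied from
# the deployment table: single server up to 40 switches, cluster up to 100.
SINGLE_SERVER_MAX_SWITCHES = 40
CLUSTER_MAX_SWITCHES = 100


def suggest_deployment(
    switches: int,
    otlp_and_agent: bool = False,
    nvlink: bool = False,
    high_availability: bool = False,
) -> str:
    """Return the smallest deployment model whose listed limits fit the inputs."""
    if switches < 0:
        raise ValueError("switch count must be non-negative")
    # NVLink monitoring is supported only on the scale cluster.
    if nvlink:
        return "scale cluster"
    # Running both OTLP and the NetQ agent halves the supported switch count.
    divisor = 2 if otlp_and_agent else 1
    # The single server has no high-availability option.
    if not high_availability and switches <= SINGLE_SERVER_MAX_SWITCHES // divisor:
        return "single server"
    if switches <= CLUSTER_MAX_SWITCHES // divisor:
        return "cluster"
    return "scale cluster"


print(suggest_deployment(30))                       # fits a single server
print(suggest_deployment(30, otlp_and_agent=True))  # 30 > 20, so a cluster
print(suggest_deployment(80, otlp_and_agent=True))  # 80 > 50, so a scale cluster
```

The halving is applied to the limit rather than to the switch count, which gives the same comparison while keeping the table values visible in the code.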

Server Arrangement

Single server: A standalone server is easier to set up, configure, and manage, but limits your ability to scale your network monitoring and provides no redundancy in case of a hardware failure.

Cluster: The cluster deployment comprises three servers: one master node and two worker nodes. NetQ supports high availability using a virtual IP address, so NetQ services remain operational even if the master node fails.

Scale cluster: The scale cluster deployment is intended for large network environments and lets you expand NetQ monitoring capacity by adding nodes as your network grows. NVIDIA typically recommends this deployment for environments with 100 or more switches. It is the only deployment model that supports monitoring NVIDIA NVLink and NVIDIA Spectrum-X Ethernet networks, as well as mixed Ethernet and NVLink networks.

The following table shows high-level device support by cluster size for Ethernet-only, NVLink-only, and combined Ethernet and NVLink deployments. The scale cluster deployment is currently in beta for clusters with more than five nodes. See Verified Limits for detailed testing information.

Deployment	3 Nodes	4 Nodes	5 Nodes	6 Nodes	7 Nodes	8 Nodes	9 Nodes
Exclusively Ethernet	500 switches, 2K hosts	750 switches, 3K hosts	1000 switches, 4K hosts	1250 switches, 5K hosts	1500 switches, 6K hosts	1750 switches, 7K hosts	2000 switches, 8K hosts
Exclusively NVLink	128 NVL	160 NVL	192 NVL	224 NVL	256 NVL	288 NVL	320 NVL
Ethernet and NVLink combined	250 switches, 1K hosts, 64 NVL	375 switches, 1.5K hosts, 96 NVL	500 switches, 2K hosts, 128 NVL	625 switches, 2.5K hosts, 160 NVL	750 switches, 3K hosts, 192 NVL	875 switches, 3.5K hosts, 224 NVL	1000 switches, 4K hosts, 256 NVL
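In the table above, every node added beyond three increases capacity by a fixed amount (for example, 250 switches and 1K hosts per node for Ethernet-only deployments). The following minimal Python sketch reproduces the published figures for 3 to 9 nodes and finds the smallest listed node count for a target number of switches. The function names are hypothetical, and the linear pattern is read off the table rather than documented as a formula, so the sketch refuses node counts outside the published range.

```python
# (capacity at 3 nodes, increment per additional node), read from the table.
BASE_AND_STEP = {
    "ethernet": {"switches": (500, 250), "hosts": (2000, 1000)},
    "nvlink": {"nvl": (128, 32)},
    "combined": {"switches": (250, 125), "hosts": (1000, 500), "nvl": (64, 32)},
}


def scale_cluster_capacity(nodes: int, mode: str) -> dict:
    """Return the table's device support for `nodes` nodes in the given mode."""
    if not 3 <= nodes <= 9:
        raise ValueError("the published table covers 3 to 9 nodes only")
    if mode not in BASE_AND_STEP:
        raise ValueError(f"unknown mode: {mode!r}")
    extra_nodes = nodes - 3
    return {
        metric: base + step * extra_nodes
        for metric, (base, step) in BASE_AND_STEP[mode].items()
    }


def min_nodes_for_switches(switches: int, mode: str = "ethernet") -> int:
    """Smallest node count in the table that supports `switches` switches."""
    for nodes in range(3, 10):
        if scale_cluster_capacity(nodes, mode)["switches"] >= switches:
            return nodes
    raise ValueError("exceeds the largest node count in the table")


print(scale_cluster_capacity(6, "combined"))
print(min_nodes_for_switches(800))              # Ethernet-only
print(min_nodes_for_switches(800, "combined"))  # Ethernet and NVLink
```

Remember that any result above five nodes falls in the beta range, and that the Verified Limits below describe what has actually been tested.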

In both cluster deployments, a majority of nodes must be operational for NetQ to function. For example, a three-node cluster can tolerate a one-node failure, but not a two-node failure; a five-node cluster can tolerate a two-node failure, but not a three-node failure. If the majority of the failed nodes are Kubernetes control plane nodes, NetQ stops functioning. For more information, refer to the etcd documentation.
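The majority rule above is simple arithmetic: a cluster of n nodes needs floor(n/2) + 1 nodes running, so it can lose the remainder. This minimal Python sketch applies that rule to the node counts in the scale cluster table; the function names are illustrative, and it counts all nodes equally, so it does not model the separate condition on Kubernetes control plane nodes.

```python
def quorum(nodes: int) -> int:
    """Smallest strict majority of a cluster with `nodes` members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority stays operational."""
    return nodes - quorum(nodes)


for n in range(3, 10):
    print(f"{n} nodes: majority = {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

By this arithmetic, adding a node to an odd-sized cluster adds capacity but not failure tolerance: four nodes tolerate one failure, the same as three.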

Verified Limits

The following values have been explicitly tested and validated, but they might not reflect the maximum theoretical system limits for NetQ.

Deployment Type	Verified Features	Verified Scale Limit	Data Rate	Hardware Requirements
6-node scale cluster: Ethernet + NVLink	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); switch OTLP data collection; DPU OTLP data collection; NVLink data collection (topology, partitions, metrics)	Ethernet switches: 675 (GPUs: 32K); DPUs: 8K (OTLP data); NVLink: 450 GB with 72x1 configuration	NetQ Agent: ~7 Mbps; OTLP switch: 445 MB/s (3.56 Gbps); OTLP host: 1,000,000 samples/s at 10-second interval; NVLink: ~32,000 messages/s (2,628 ports); counters: 112 per GB/s	6 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
6-node scale cluster: Ethernet + NVLink	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); switch OTLP data collection; DPU OTLP data collection	Ethernet switches: 1,300 (GPUs: 55K); DPUs: 14K (OTLP data)	NetQ Agent: ~7 Mbps; OTLP switch: 445 MB/s (3.56 Gbps); OTLP host: 1,718,750 samples/s at 10-second interval	6 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
5-node scale cluster: Ethernet + NVLink (Ethernet agent only)	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations)	Ethernet switches: 1,300 (GPUs: 55K)	NetQ Agent: ~14 Mbps	5 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
3-node scale cluster: Ethernet + NVLink	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); switch OTLP data collection; DPU OTLP data collection; NVLink data collection (topology, partitions, metrics)	Ethernet switches: 250 (GPUs: 8K); DPUs: 1K (OTLP data); NVLink: 100 GB with 72x1 configuration	NetQ Agent: 2.5 Mbps; OTLP switch: 165 MB/s (1.32 Gbps); OTLP host: 250,000 samples/s at 10-second interval; NVLink: ~9,200 messages/s (2,628 ports); counters: 112 per GB/s	3 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
3-node scale cluster: Ethernet-only	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); Ethernet OTLP data collection	Ethernet switches: 500 (GPUs: 16K); DPUs: 2K (OTLP data)	NetQ Agent: 5 Mbps; OTLP switch: 330 MB/s (2.64 Gbps); OTLP host: 500,000 samples/s at 10-second interval	3 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
3-node scale cluster: NVLink-only	NVLink data collection (topology, partitions, metrics)	NVLink: 110 GB with 72x1 configuration; partitions: 1,600	NVLink: ~10,000 messages/s (2,628 ports); counters: 112 per GB/s	3 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
5-node scale cluster: Ethernet-only	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); Ethernet OTLP data collection	Ethernet switches: 1,000 (GPUs: 32K); DPUs: 4K (OTLP data)	NetQ Agent: 10 Mbps; OTLP switch: 660 MB/s (5.28 Gbps); OTLP host: 1,000,000 samples/s at 10-second interval	5 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe
3-node cluster (non-scale): Ethernet-only	Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); Ethernet OTLP data collection	Ethernet switches: 50 (GPUs: 1.6K)	NetQ Agent: 500 Kbps; OTLP switch: 33 MB/s (264 Mbps); OTLP host: 50,000 samples/s at 10-second interval	3 nodes, each with 16 vCPUs, 64 GB RAM, 500 GB SSD/NVMe
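The Data Rate column gives OTLP switch throughput in megabytes per second with the equivalent in bits per second in parentheses. The conversion is multiplication by 8 bits per byte with decimal prefixes (1 Gbps = 1000 Mbps). This minimal Python sketch applies that conversion to the OTLP switch rates from the table, which can help when sizing the links into the NetQ servers; the dictionary keys are shortened labels chosen for illustration.

```python
def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal prefixes)."""
    return mb_per_s * 8 / 1000  # 8 bits per byte, 1000 Mb per Gb


# OTLP switch rates from the Verified Limits table, in MB/s.
OTLP_SWITCH_MB_PER_S = {
    "6-node scale cluster": 445,
    "5-node scale cluster, Ethernet-only": 660,
    "3-node scale cluster, Ethernet-only": 330,
    "3-node scale cluster, Ethernet + NVLink": 165,
    "3-node cluster (non-scale)": 33,
}

for deployment, rate in OTLP_SWITCH_MB_PER_S.items():
    print(f"{deployment}: {rate} MB/s = {mb_per_s_to_gbps(rate):g} Gbps")
```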

Large networks can generate a large amount of data. For large networks, NVIDIA does not recommend using the NetQ CLI, and tabular data in the UI is limited to 10,000 rows. To review a large amount of data, NVIDIA recommends exporting the tabular data as a CSV or JSON file and analyzing it in a spreadsheet program.
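When an export is too large to browse comfortably in a spreadsheet, a short script can summarize it instead. The following is a minimal Python sketch using only the standard library, assuming a CSV export; the column names (`hostname`, `state`) and the filename in the comment are hypothetical, so check the header row of your actual export before adapting it.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str) -> Counter:
    """Count the rows of a CSV export, grouped by the value in `column`."""
    reader = csv.DictReader(lines)
    # fieldnames reads the header row; it is None for an empty file.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in the CSV header")
    return Counter(row[column] for row in reader)


# Inline sample with hypothetical columns. For a real export, for example:
#   with open("netq-export.csv", newline="") as f:
#       print(count_by_column(f, "state"))
sample = io.StringIO("hostname,state\nleaf01,up\nleaf02,down\nspine01,up\n")
print(count_by_column(sample, "state").most_common())
```

Reading rows through `csv.DictReader` streams the file rather than loading it all at once, so the same approach works for exports well beyond the UI's 10,000-row limit.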

Base Command Manager

NetQ is also available through NVIDIA’s cluster management software, Base Command Manager. Refer to the Base Command Manager administrator and containerization manuals for instructions on how to launch and configure NetQ using Base Command Manager.

Next Steps

After you’ve decided on your deployment type, you’re ready to install NetQ.
