A Proof-of-Concept demonstration showcasing RoCEv2 configuration on SONiC switches in an NVIDIA Air simulated environment, highlighting expertise in lossless Ethernet fabrics, RDMA networking, and NVIDIA networking ecosystems.
All objectives achieved successfully!
- β 3 SONiC Switches: PFC configured on priority 3 (Ethernet0, Ethernet4)
- β 6 Ubuntu Hosts: Soft-RoCE operational on all servers
- β RDMA Tests: 20-23 MB/sec bandwidth, zero packet loss
- β Lossless Transport: Verified across VXLAN EVPN fabric
- β Multi-VLAN: Successfully tested both VLAN 10 and VLAN 20
Quick Results:
- VLAN 10: server01 β server03 (23.08 MB/sec), server01 β server05 (21.10 MB/sec)
- VLAN 20: server02 β server04 (20.56 MB/sec)
- PFC: Enabled and ready (0 pause frames = no congestion)
See detailed results for complete analysis.
This PoC demonstrates:
- SONiC Configuration: Lossless Ethernet fabric setup using Priority Flow Control (PFC) and QoS on virtual NVIDIA Spectrum switches
- RoCEv2 Enablement: RDMA over Converged Ethernet configuration on simulated ConnectX adapters
- Performance Validation: Functional RDMA testing using industry-standard tools (
perftestsuite) - Automation: Scripted configuration and validation workflows
Platform: Entirely cloud-based using NVIDIA Air - no physical hardware required.
This project evidences practical experience with:
- SONiC NOS: Configuration management, PFC/QoS setup, lossless fabric design
- NVIDIA Networking: Spectrum switches and ConnectX adapter configuration
- RoCE/RDMA: RoCEv2 protocol, RDMA programming, performance testing
- Linux Networking: Interface configuration, RDMA tools, system tuning
- Automation: Bash/Python scripting for network configuration
- PoC Development: End-to-end proof-of-concept design and execution
- Lab: SONiC Numbered BGP EVPN VXLAN Demo (NVIDIA Air)
- Architecture: 3-leaf, 2-spine SONiC switches with 6 Ubuntu 18.04 servers
- VLANs: VLAN 10 and VLAN 20 (L2 extension via VXLAN)
- Protocols: BGP EVPN VXLAN overlay, RoCEv2 for RDMA traffic
- SONiC Version: SONiC.202305_RC.78 (config_db.json configuration method)
See docs/lab-topology.md for detailed topology and IP addressing information.
- SONiC Switches: Virtual NVIDIA Spectrum switches running SONiC
- Host Systems: Ubuntu servers with RDMA-capable interfaces
- Lossless Fabric: PFC-enabled ports, QoS policies for RoCE traffic
- RDMA Tools:
perftestsuite for bandwidth and latency testing
roceonsonic/
βββ README.md # This file
βββ PRD.md # Product Requirements Document
βββ docs/ # Additional documentation
β βββ setup-guide.md # Step-by-step setup instructions
β βββ lab-topology.md # Lab topology and IP addressing details
β βββ results.md # Test results and analysis
βββ configs/ # SONiC configuration files
β βββ sonic/ # SONiC switch configurations
β βββ qos/ # QoS and PFC configurations
βββ scripts/ # Automation scripts
β βββ setup/ # Initial setup scripts
β βββ validation/ # Validation and testing scripts
β βββ perftest/ # RDMA performance test scripts
βββ screenshots/ # Topology, configs, and results screenshots
βββ results/ # Test output files and logs
- NVIDIA Air Account: Free registration at https://air.nvidia.com/
- Browser: Modern browser with WebRTC support for NVIDIA Air console access
- Basic Knowledge: Familiarity with Linux CLI and networking concepts
- Access NVIDIA Air: Log into your NVIDIA Air account at https://air.nvidia.com/
- Launch Lab: Start the "SONiC Numbered BGP EVPN VXLAN Demo" lab
- Access Environment: Connect via oob-mgmt-server (username:
ubuntu, password:nvidia) - Review Topology: See
docs/lab-topology.mdfor device names, IPs, and connectivity - Configure Switches: Apply PFC/QoS configurations on leaf switches (see
configs/anddocs/setup-guide.md) - Configure Hosts: Enable RoCEv2 on server
eth1interfaces (seescripts/setup/) - Validate: Run validation scripts to verify configuration
- Test: Execute RDMA performance tests between same-VLAN servers
Note: Detailed step-by-step instructions are available in
docs/setup-guide.md. Lab-specific topology information is indocs/lab-topology.md.
- β FR-01: SONiC lossless configuration with PFC and QoS
- β FR-02: RoCEv2 enabled on host interfaces
- β FR-03: RDMA tests (ib_write_bw, ib_send_lat) running successfully
- β FR-04: Zero packet loss validation with PFC counters
- β FR-05: Automated configuration and testing scripts
- β FR-06: Comprehensive documentation
ibv_devices/ibstat: RDMA device verificationib_write_bw: RDMA write bandwidth testib_send_lat: RDMA send latency testshow qos(SONiC): QoS and PFC verification
# Example PFC configuration commands
# (See configs/ directory for complete examples)# Example host configuration
# (See scripts/setup/ directory for complete examples)Switch Configuration:
- β PFC configured on priority 3 for Ethernet0 and Ethernet4 on all three leaf switches (leaf01, leaf02, leaf03)
- β Lossless buffer pools configured (12.7 MB ingress/egress)
- β PORT_QOS_MAP applied successfully
Host Configuration:
- β Soft-RoCE (rdma_rxe) configured on server01 and server03
- β rxe0 RDMA devices created and active on eth1 interfaces
RDMA Testing:
- β Functional RDMA communication established between server01 and server03
- β Achieved ~23 MB/sec bandwidth with ib_send_bw
- β Zero packet loss demonstrated
- β PFC counters verified (PFC enabled and ready)
Performance Metrics:
- Bandwidth: ~23 MB/sec (ib_send_bw, 65536 byte messages)
- Packet Loss: Zero
- PFC Status: Enabled on priority 3, ready to activate on congestion
Important: Results are from NVIDIA Air simulation and are not representative of real hardware performance. Focus is on functional correctness and configuration validation.
Performance results and analysis are documented in:
docs/results.md: Comprehensive test results and analysis with full detailsscreenshots/: Visual evidence of configuration and testingresults/: Raw test output files
This is a personal portfolio project. However, suggestions and improvements are welcome!
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or feedback, please open an issue in this repository.
Status: β PoC Completed Successfully
Completed: December 30, 2025
- PFC configured on leaf01, leaf02, leaf03 (priority 3, Ethernet0/4)
- Soft-RoCE configured on server01, server03
- RDMA tests passing with ~23 MB/sec bandwidth
- Zero packet loss demonstrated
- PFC ready to activate on congestion
Last Updated: December 30, 2025
Network Topology
This diagram shows the OOB management plane, spine/leaf fabric, and servers used in the lab.
flowchart LR
subgraph OOB[OOB Management]
OMS[OOB Management Server]
OBS[OOB Management Switch]
OMS --- OBS
end
subgraph Spine[Spines]
SP1[Spine01]
SP2[Spine02]
SP1 --- SP2
end
subgraph Leafs[Leaf Fabric]
L1[Leaf01]
L2[Leaf02]
L3[Leaf03]
end
subgraph Hosts[Servers]
S1[server01]
S2[server02]
S3[server03]
S4[server04]
S5[server05]
S6[server06]
end
%% Spine-to-Leaf connectivity
SP1 --> L1
SP1 --> L2
SP1 --> L3
SP2 --> L1
SP2 --> L2
SP2 --> L3
%% Leaf-to-Host connectivity
L1 --> S1
L1 --> S2
L2 --> S3
L2 --> S4
L3 --> S5
L3 --> S6
%% Management plane connectivity to leaf fabric
OBS --> L1
OBS --> L2
OBS --> L3
You can also edit the separate source file at diagrams/topology.mmd.