The Great Interconnect Debate: NVLink, InfiniBand, UALink, and the Quest for AI Supremacy

In the cutting-edge realm of Artificial Intelligence (AI) and High-Performance Computing (HPC), the performance bottleneck has long since shifted from raw processor power to the intricate high-speed networks that connect these powerful compute units: the interconnect technologies. Our recent deep dive has unveiled a fascinating world of competition and complementarity, populated by technologies like NVLink, InfiniBand, UALink, and Ultra Ethernet. This article systematically organizes our key findings to present a panoramic view of these critical technologies.

Chapter 1: The Four Protagonists—The Cornerstones of AI Networking

Before we delve into the technical details, let's first meet the protagonists of this grand drama:

  • NVLink & NVSwitch: NVIDIA's trump card, a proprietary, high-speed, short-range interconnect technology for GPU-to-GPU communication.
  • InfiniBand: A battle-tested high-performance interconnect standard in the HPC and AI fields, renowned for its ultra-low latency and high throughput.
  • Ultra Ethernet (from the Ultra Ethernet Consortium, UEC): The Ethernet camp's revolutionary response, aiming to bring InfiniBand-level performance to the ubiquitous Ethernet ecosystem.
  • UALink (Ultra Accelerator Link): An open standard for connecting AI accelerators within a node, seen as a direct challenger to NVLink.

Chapter 2: Scale-Up vs. Scale-Out—Two Dimensions of Expansion

The first key to understanding these technologies lies in distinguishing between two different dimensions of expansion: Scale-Up and Scale-Out.

| Interconnect Technology | Expansion Type | Core Application Scenario |
| --- | --- | --- |
| NVLink / NVSwitch | Scale-Up (in-node) | Tightly connects multiple GPUs within a single server or compute node to form a powerful compute unit. |
| UALink | Scale-Up (in-node) | Connects multiple AI accelerators within a node to build a unified memory pool. |
| InfiniBand | Scale-Out (inter-node) | Connects thousands of server nodes to build large-scale AI training and HPC clusters. |
| Ultra Ethernet | Scale-Out (inter-node) | Also connects nodes at large scale, but focuses on achieving the high performance required for AI/HPC on top of Ethernet. |

Chapter 3: The Reliability Debate—Why Traditional IP Networks Fall Short

One of the most profound discoveries from our discussion is the fundamental difference in reliability design philosophy between traditional IP networks and specialized interconnect technologies.

Traditional Ethernet/IP: Best-Effort

The core design of IP networks is "best-effort," meaning it does not guarantee that a data packet will arrive. When the network is congested and router buffers are full, it chooses to directly drop packets. Reliability is entirely left to the upper-layer TCP protocol, which performs complex packet loss detection and retransmission in software (the operating system kernel). The costs of this design are:

  • High Latency: TCP retransmissions are on the order of milliseconds (ms), which is fatal for AI training that requires microsecond (μs) level synchronization.
  • High CPU Overhead: The processing of the kernel protocol stack consumes valuable CPU resources.
  • Unpredictable Performance: Packet loss rates and latency fluctuate dramatically with network load.
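A back-of-the-envelope calculation makes the latency mismatch concrete. The numbers below are illustrative assumptions (a typical kernel minimum retransmission timeout and an assumed per-step synchronization budget), not measurements:

```python
# Illustrative comparison of a TCP retransmission stall vs. an AI sync budget.
# Both constants are assumptions for illustration, not measured values.

TCP_MIN_RTO_S = 0.2        # a common kernel minimum retransmission timeout: 200 ms
ALLREDUCE_STEP_S = 50e-6   # assumed per-step gradient-sync budget: 50 microseconds

# One lost packet that waits out a full RTO stalls the collective for the
# equivalent of this many synchronization steps:
steps_lost = TCP_MIN_RTO_S / ALLREDUCE_STEP_S
print(f"One RTO ~ {steps_lost:.0f} sync steps stalled")
```

Even under these rough assumptions, a single software retransmission costs thousands of hardware-speed synchronization steps, which is why millisecond-scale recovery is unacceptable in this domain.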

InfiniBand & NVLink: Lossless Networks

In contrast, InfiniBand and NVLink are designed as "lossless networks," which fundamentally avoid packet loss through hardware mechanisms.

Core Mechanism: Credit-Based Flow Control (CBFC)

This is a proactive congestion-management technique. Before sending data, the sender must first obtain "credits" from the receiver, with each credit representing one available buffer slot on the receiver's side. Without a credit, no data is ever sent; like the saying "don't release the hawk until you see the rabbit," the sender commits nothing until delivery is assured. Data packets are therefore never dropped due to insufficient buffer space at the receiving end.

This hardware-level reliability guarantee provides microsecond or even sub-microsecond latency and almost zero CPU overhead, providing a stable and predictable performance foundation for AI/HPC.
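A minimal sketch of the credit mechanism, with buffer sizes chosen arbitrarily and the credit-return path simplified to direct function calls (this models the idea, not any real NIC or switch API):

```python
from collections import deque

class Receiver:
    """Receiver with a fixed buffer; each free slot backs exactly one credit."""
    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.new_credits = buffer_slots  # initially, every slot is free

    def grant_credits(self):
        """Advertise newly freed slots to the sender (credit return)."""
        granted, self.new_credits = self.new_credits, 0
        return granted

    def deliver(self, packet):
        # Guaranteed to fit: the sender spent a credit to send this packet.
        self.buffer.append(packet)

    def consume(self):
        """The application drains one packet, freeing a slot (one new credit)."""
        self.buffer.popleft()
        self.new_credits += 1

class Sender:
    def __init__(self):
        self.credits = 0
        self.pending = deque()

    def pump(self, receiver):
        # Without a credit, a packet simply waits here -- it is never dropped.
        while self.pending and self.credits > 0:
            self.credits -= 1
            receiver.deliver(self.pending.popleft())

    def send(self, receiver, packet):
        self.pending.append(packet)
        self.pump(receiver)

rx, tx = Receiver(buffer_slots=2), Sender()
tx.credits += rx.grant_credits()             # link-up: 2 credits advertised
for p in ("a", "b", "c"):
    tx.send(rx, p)
backlog = (len(rx.buffer), len(tx.pending))  # (2, 1): "c" waits for a credit
rx.consume()                                 # application frees one slot
tx.credits += rx.grant_credits()             # the credit flows back
tx.pump(rx)                                  # now "c" can be sent
final = (len(rx.buffer), len(tx.pending))    # (2, 0): nothing was ever dropped
```

Note the key property: congestion shows up as back-pressure at the sender, never as a drop at the receiver.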

Chapter 4: Ultra Ethernet—The "Super Evolution" of Ethernet

Ultra Ethernet aims to transplant the advantages of InfiniBand into the Ethernet ecosystem. It achieves this goal through a brand-new protocol stack, the Ultra Ethernet Transport (UET), with its core innovations including:

  1. Multi-path Routing and Packet Spraying: Dispersing a single data stream across multiple network paths, greatly improving network utilization and effective bandwidth.
  2. Flexible Ordered Transport: Allowing a certain degree of out-of-order packet delivery to optimize network paths, while reordering at the receiving end to ensure data correctness at the application layer.
  3. Advanced Congestion Control: Introducing more refined congestion management mechanisms than traditional PFC (Priority Flow Control) to avoid issues like head-of-line blocking.
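The interplay of the first two innovations, spraying packets across paths and restoring order at the receiver, can be shown in a toy model. The round-robin path choice and sequence-number tagging below are simplifying assumptions for illustration, not the UET wire format:

```python
def spray(packets, num_paths):
    """Spray one flow's packets round-robin across all equal-cost paths."""
    paths = [[] for _ in range(num_paths)]
    for seq, pkt in enumerate(packets):
        paths[seq % num_paths].append((seq, pkt))  # tag with a sequence number
    return paths

def receive_and_reorder(paths):
    """Per-path order is preserved, but the flow's packets arrive interleaved;
    the receiver uses sequence numbers to restore application-level order."""
    arrived = [tagged for path in paths for tagged in path]  # out-of-order mix
    return [pkt for _, pkt in sorted(arrived)]

data = [f"chunk{i}" for i in range(8)]
restored = receive_and_reorder(spray(data, num_paths=4))
assert restored == data  # the application never sees the reordering
```

The design choice this illustrates: by tolerating out-of-order delivery on the wire, the network is free to use every path at once instead of pinning a flow to a single path.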

Chapter 5: NVIDIA's Unified Architecture—An Engineering Marvel

The most astonishing part of our discussion was NVIDIA's "unified architecture" implemented across its three major switch product lines—NVSwitch, Quantum (InfiniBand), and Spectrum (Ethernet).

The Truth About "Shared IP Blocks"

Here, "IP" does not refer to the Internet Protocol, but to Intellectual Property. In the semiconductor design field, an IP Block or IP Core refers to a reusable, pre-designed circuit functional module. NVIDIA's unified architecture is achieved precisely by reusing these mature IP blocks in different chips.

The Three Tiers of the Unified Architecture

  1. NVSwitch ↔ Quantum IB (Control Plane Sharing): The fourth-generation NVSwitch shares a large number of control plane IP blocks with InfiniBand switches, such as topology discovery, routing calculation, and partition management logic. Its core management service, NVLSM (NVLink Subnet Manager), evolved from the IB Subnet Manager.
  2. Quantum IB ↔ Spectrum Ethernet (Physical and Link Layer Sharing): These two share underlying SerDes technology, silicon photonics integration (CPO) technology, and even manufacturing and packaging partners.
  3. Unified Management Plane Concept: Although the management tools are separate, they share similar management concepts (like LID, PKEY, etc.) and architecture, greatly reducing the learning and management costs for users.

An Interesting Discovery: Why Do Spectrum Switches Have Twice the Transistors of Quantum Switches?

An NVIDIA networking lead revealed that the Spectrum-4 switch has more than twice as many transistors as the Quantum-2. The reason is that Ethernet needs to support distributed routing protocols (like BGP/OSPF), and this complex routing calculation logic must be implemented within the switch chip. In contrast, InfiniBand uses centralized routing, where routing calculations are performed by an external Subnet Manager (SM). The switch itself only needs to perform simple table lookups and forwarding, resulting in a simpler chip design.
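The split described above can be sketched as follows: a subnet-manager-style controller computes every switch's forwarding table centrally, leaving the switch data plane with nothing but a lookup. The topology, node names, and port numbers are invented for illustration:

```python
from collections import deque

def compute_lfts(topology, switches):
    """SM-style centralized routing: for every destination, BFS the fabric and
    fill each switch's linear forwarding table (destination -> output port)."""
    lfts = {sw: {} for sw in switches}
    for dest in topology:
        prev = {dest: None}             # BFS tree rooted at the destination
        q = deque([dest])
        while q:
            node = q.popleft()
            for neigh in topology[node].values():
                if neigh not in prev:
                    prev[neigh] = node
                    q.append(neigh)
        for sw in switches:
            nxt = prev.get(sw)          # neighbor one hop closer to dest
            if sw != dest and nxt is not None:
                port = next(p for p, n in topology[sw].items() if n == nxt)
                lfts[sw][dest] = port
    return lfts

def forward(lft, dest):
    """The switch's entire data-plane routing decision: a single table lookup."""
    return lft[dest]

# Hypothetical fabric: host A -- switch S1 -- switch S2 -- host B
topology = {
    "A":  {1: "S1"},
    "B":  {1: "S2"},
    "S1": {1: "A", 2: "S2"},
    "S2": {1: "B", 2: "S1"},
}
lfts = compute_lfts(topology, switches=("S1", "S2"))
port = forward(lfts["S1"], "B")  # S1 forwards traffic bound for B out port 2
```

All the graph computation lives in `compute_lfts` (the SM's job); `forward` is all a switch ASIC has to do, which is exactly why the chip can be simpler.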

Chapter 6: Clarifying Misconceptions—Fabric Manager vs. UFM

A common misconception is that Fabric Manager can uniformly manage all NVIDIA networks. The actual situation is:

| Management Tool | Managed Objects | Scope |
| --- | --- | --- |
| Fabric Manager (FM) + NVLSM | NVLink and NVSwitch | In-node |
| UFM (Unified Fabric Manager) | InfiniBand network | Data-center scale |

"Unified" refers to the underlying architecture and design philosophy, not a single management software. It is a layered management system with distinct responsibilities.

Chapter 7: The UDP "Disguise" of RoCEv2

In our discussion, we also uncovered a clever design in RoCEv2 (RDMA over Converged Ethernet): why does it need to encapsulate a UDP header on top of the InfiniBand transport layer?

The answer is: the UDP here does not play the role of a transport layer, but rather a "stateless encapsulation layer." It acts as a "disguise," wrapping the IB data packet into a standard UDP packet so that it can be routed on existing IP network devices without any modifications to those devices. The source port number in the UDP header is also cleverly used to implement ECMP (Equal-Cost Multi-Path) load balancing. The real reliable transport, congestion control, and other functions are still handled by the encapsulated InfiniBand transport layer within.
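The encapsulation idea can be sketched in a few lines. The UDP header layout and the well-known RoCEv2 destination port 4791 are standard; the `entropy_sport` helper and its hash are illustrative assumptions, not the method any particular NIC uses:

```python
import struct
import zlib

ROCEV2_UDP_DPORT = 4791  # IANA-assigned UDP destination port for RoCEv2

def entropy_sport(qp_id):
    """Hypothetical helper: derive the UDP source port from the RDMA flow
    (e.g. a queue-pair id) so that routers hashing the 5-tuple for ECMP
    spread different flows across different equal-cost paths."""
    return 49152 + (zlib.crc32(qp_id.to_bytes(4, "big")) % 16384)

def udp_header(sport, payload_len):
    """Stateless encapsulation: just ports and length. Reliability stays with
    the encapsulated InfiniBand transport, so no UDP state is needed."""
    return struct.pack("!HHHH", sport, ROCEV2_UDP_DPORT, 8 + payload_len, 0)

def ecmp_path(sport, num_paths):
    """Toy ECMP decision: a router hashes header fields to pick a path."""
    return zlib.crc32(struct.pack("!HH", sport, ROCEV2_UDP_DPORT)) % num_paths

hdr = udp_header(entropy_sport(17), payload_len=1024)
sport, dport, length, checksum = struct.unpack("!HHHH", hdr)
```

The fixed destination port is what lets ordinary routers recognize nothing special at all: to them this is just UDP traffic, while the varying source port is the only field RoCEv2 "spends" to steer ECMP.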

Chapter 8: The New Contender—Huawei's Unified Bus

Just as the battle between NVIDIA's proprietary ecosystem and the open standards alliance intensifies, a new, ambitious contender has emerged: Huawei's Unified Bus (UB) and its UB-Mesh architecture. Unveiled at Hot Chips 2025, this technology represents a radical vision: to replace the entire alphabet soup of interconnects—from PCIe and NVLink to TCP/IP and RoCE—with a single, unified protocol.

One Protocol to Rule Them All

Huawei's core idea is that by creating a single protocol, any port can talk to any other port without the need for protocol conversion. This approach aims to dramatically reduce latency, control costs, and improve reliability, especially in the context of massive, gigawatt-scale AI data centers, which Huawei calls "SuperNodes." The vision is to transform the entire data center into a single, coherent system connected by UB-Mesh.

Key Technical Ambitions of Unified Bus:

  • Scale: Designed to unify up to 1,000,000 processors (CPUs, GPUs, NPUs) into a single SuperNode.
  • Bandwidth: Aims for per-chip bandwidth scaling from 100 Gbps to an astonishing 10 Tbps (1.25 TB/s).
  • Latency: Targets a reduction in hop latency from microseconds down to ~150 nanoseconds.
  • Semantics: Shifts from the asynchronous DMA model of traditional networks to a synchronous load/store model, similar to NVLink, for more efficient data access.
  • Reliability: Implements reliability at every layer of the stack, from link-level retries on optical paths to hot-spare racks for system-level resilience, targeting a 100x improvement in Mean Time Between Failures (MTBF).

A Strategic Move in the Geopolitical Landscape

Huawei's push for a completely new, open-source interconnect standard is also a strategic response to the geopolitical landscape and semiconductor sanctions. By creating a homegrown, open standard, Huawei aims to reduce its dependence on Western technologies and build a self-sufficient AI ecosystem. This makes the Unified Bus not just a technical competitor to NVLink, UALink, and Ultra Ethernet, but also a significant player in the global technology race.

Conclusion: The "Warring States" Era of Interconnects

The field of AI and HPC interconnects is entering an exciting "Warring States" era. NVIDIA, with its highly vertically integrated NVLink+InfiniBand ecosystem and unified architecture, has built a formidable technological barrier. At the same time, the open standards camp, represented by Ultra Ethernet and UALink, is launching a powerful challenge, leveraging the vast Ethernet ecosystem. Now, with Huawei's Unified Bus entering the fray with its all-encompassing vision and open-source strategy, the competition has become even more complex and fascinating.

There is no absolute winner in this contest; different technological paths will continue to evolve in their respective areas of strength. But one thing is certain: it is this fierce competition that is driving the entire industry forward at an unprecedented pace, paving the way for the next generation of super-intelligent systems.
