# Update on Recent Developments in Industry

Lily Lyu (Huawei) Jieyu Li (CMCC)

**January Interim 2025** 

## Motivation

- Given the critical role of networks in AI clusters, the industry has seen rapid development.
- Industry alliances have been established to create an open ecosystem for the networks.
- This presentation aims to provide an update on the latest progress in industry.
  - The concepts of 'scale-out' and 'scale-up' are important.
  - Industry alliances have focused their efforts around 'scale-out' and 'scale-up'.

#### 'Scale-Out' and 'Scale-Up' Recap


### AICN: Connecting Accelerators for AI Training



Source: https://mentor.ieee.org/802.1/dcn/24/1-24-0055-00-ICne-clarification-of-the-content-in-aicn.pdf

## **Commercial Product Example**



## Scale-out Network Technologies Converging on Ethernet Stack

#### Ultra Ethernet Consortium (UEC) is formed by industry giants

#### **Background:**

- Steering members include AMD, Broadcom, Arista, Cisco, Eviden, HPE, Intel, Meta, Microsoft and Oracle.
- Aims to build an Ethernet-based, open, interoperable, high-performance, full-communications stack architecture to meet the growing network demands of AI & HPC at scale.
- Spec 1.0 targets the scale-out network
- Website: https://ultraethernet.org/

#### **Timelines**:



#### **Key technologies:**

- Transport layer
  - Packet spraying, multipathing
  - Out-of-order delivery
  - Congestion control: enhanced Tx and new Rx
  - ...
- Ethernet link/PHY layer
  - Data link layer: negotiation (LLDP)
  - Data link layer: Link Level Retry (LLR), Credit Based Flow Control (CBFC)
  - Physical layer: IEEE-compliant 100G signaling

Stack shown in the figure: Application, Transport Layer, Network Layer, Data Link Layer, Physical Layer.

Source: 2024 OCP Global summit
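As a rough illustration of the credit-based flow control (CBFC) idea listed above, the sketch below models a sender that may transmit only while it holds credits, and a receiver that returns a credit each time it frees a buffer slot. The class names, the one-credit-per-packet granularity, and the buffer size are assumptions made for illustration; they are not taken from the UEC specification.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a fixed number of buffer slots."""

    def __init__(self, buffer_slots: int) -> None:
        self.buffer_slots = buffer_slots
        self.queue = deque()

    def accept(self, packet: str) -> None:
        # With correct credit accounting this can never overflow.
        assert len(self.queue) < self.buffer_slots, "buffer overrun"
        self.queue.append(packet)

    def drain_one(self) -> int:
        """Consume one buffered packet; return the number of credits freed."""
        if not self.queue:
            return 0
        self.queue.popleft()
        return 1


class Sender:
    """Hypothetical sender that starts with one credit per receiver slot."""

    def __init__(self, receiver: Receiver) -> None:
        self.receiver = receiver
        self.credits = receiver.buffer_slots

    def try_send(self, packet: str) -> bool:
        if self.credits == 0:
            return False  # hold the packet instead of dropping it
        self.credits -= 1
        self.receiver.accept(packet)
        return True

    def add_credits(self, n: int) -> None:
        self.credits += n


def demo() -> list:
    rx = Receiver(buffer_slots=2)
    tx = Sender(rx)
    results = [tx.try_send(f"pkt{i}") for i in range(3)]  # third is blocked
    tx.add_credits(rx.drain_one())                        # receiver frees a slot
    results.append(tx.try_send("pkt2"))                   # now it goes through
    return results


print(demo())  # [True, True, False, True]
```

The point of the sketch is that the sender stalls rather than overrunning the receiver, so no packet is dropped for lack of buffer space.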

## **Scale-up Network Technologies Under Discussion**

#### **Ultra Accelerator Link Consortium (UALink) Leveraging Ethernet PHY**

#### **Background:**

- Promoter members include AMD, AWS, Astera Labs, Cisco, Google, HPE, Intel, Meta, Microsoft, Synopsys, Alibaba, Apple.
- Aims to develop interconnect technical specifications that facilitate direct load, store, and atomic operations between AI accelerators (<1K endpoints), supporting AI/ML scale-up networks and workloads.
- Website: https://www.ualinkconsortium.org/



#### Key technologies:

### Scale-Out + Scale-Up = Interconnection Between Accelerators

#### The Scale-Out and Scale-Up networks constitute Accelerator-to-Accelerator Interconnection, but there is no clear boundary between the two.

#### UEC point of view:

- Scale-Out Network
  - Scale: Cluster, ~10k nodes and up
  - Distance: <100 m; RTT <10 µs +; BW ~100 GB/s
  - Network semantics (DMA and packetized I/O)
- Scale-Up Network
  - Scale: Within a node; small scale, e.g., 256 XPU?
  - Distance: ~1 m; RTT ~1 µs +; BW ~1200 GB/s
  - Memory and network semantics
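One way to read the RTT and bandwidth figures above is through the bandwidth-delay product, i.e. how many bytes are in flight on a path at once. The sketch below is a back-of-the-envelope calculation only: it treats the approximate values (10 µs at 100 GB/s, 1 µs at 1200 GB/s) as exact, and the function name is hypothetical.

```python
def bytes_in_flight(bandwidth_gb_per_s: int, rtt_ns: int) -> int:
    """Bandwidth-delay product in bytes.

    1 GB/s (decimal) equals 1 byte per nanosecond, so multiplying
    GB/s by nanoseconds already yields a byte count.
    """
    return bandwidth_gb_per_s * rtt_ns


scale_out = bytes_in_flight(bandwidth_gb_per_s=100, rtt_ns=10_000)  # 100 GB/s, 10 us
scale_up = bytes_in_flight(bandwidth_gb_per_s=1200, rtt_ns=1_000)   # 1200 GB/s, 1 us

print(scale_out, scale_up)  # 1000000 1200000
```

Under these assumptions both networks keep roughly 1 MB in flight, but the scale-up network has to turn that data around in about a tenth of the time.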

#### UALink point of view:

- Memory shared across Accelerators
  - Memory semantics (direct load, store, and atomic operations between accelerators)
- The ability to make accelerators in a single Rack Pod or several Rack Pods act like one giant accelerator to complete the task
- Complementary to scale-out approaches such as UEC

Source: 2024 OCP Global summit

#### My point of view:

- Supporting memory semantics (e.g. load/store operations) is the key characteristic of scale-up.
  - Both scale-up and scale-out require a low-latency, high-bandwidth network. The scale-up network has more demanding requirements in order to support memory semantics.
- Defining the boundary between scale-out and scale-up networks is a multifaceted issue, affected by factors such as network technology, accelerator capability, and model architecture.
  - Open discussion in industry
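To make the distinction between memory semantics and network semantics concrete, here is a toy sketch. It is only a software analogy: `PeerMemory` and `PeerMailbox` are hypothetical names, a lock stands in for hardware atomicity, and nothing here reflects an actual UALink or UEC interface.

```python
import threading
from collections import deque


class PeerMemory:
    """Toy model of a peer accelerator's memory reached with memory semantics."""

    def __init__(self, size: int) -> None:
        self._cells = [0] * size
        self._lock = threading.Lock()

    def load(self, addr: int) -> int:
        with self._lock:
            return self._cells[addr]

    def store(self, addr: int, value: int) -> None:
        with self._lock:
            self._cells[addr] = value

    def atomic_add(self, addr: int, delta: int) -> int:
        """Add delta in one indivisible step and return the previous value."""
        with self._lock:
            old = self._cells[addr]
            self._cells[addr] = old + delta
            return old


class PeerMailbox:
    """Toy model of network semantics: data moves as explicit messages."""

    def __init__(self) -> None:
        self._inbox = deque()

    def send(self, message: bytes) -> None:
        self._inbox.append(message)

    def recv(self) -> bytes:
        return self._inbox.popleft()


def demo():
    mem = PeerMemory(size=4)
    mem.store(0, 40)                 # write directly to a remote address
    previous = mem.atomic_add(0, 2)  # read-modify-write in one step
    box = PeerMailbox()
    box.send(b"gradient-chunk")      # explicit message, explicit receive
    return previous, mem.load(0), box.recv()


print(demo())  # (40, 42, b'gradient-chunk')
```

With memory semantics the initiator addresses remote data directly; with network semantics both sides take part in an explicit send and receive.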

## Scale-Out Focuses More on Network LB & CC; Both Scale-Out and Scale-Up Enhance PHY/LL with Similar Strategies

- Scale-out: load balancing  $\rightarrow$  packet spray
- PHY/LL: both use similar technical approaches to enhance the performance of the underlying links, such as LLR and CBFC.
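The move from per-flow load balancing to packet spray can be sketched as follows. This is a simplified model under stated assumptions: four equal-cost paths, a CRC32 hash standing in for a switch's flow hash, and round-robin standing in for the spraying policy. Real implementations choose paths in more sophisticated ways.

```python
import zlib
from collections import Counter
from itertools import cycle

PATHS = 4  # assumed number of equal-cost paths


def flow_hash_path(flow_id: str) -> int:
    """Per-flow load balancing: every packet of a flow takes the same path."""
    return zlib.crc32(flow_id.encode()) % PATHS


def spray_paths():
    """Per-packet spraying, modeled here as simple round-robin."""
    return cycle(range(PATHS))


def distribute(num_packets: int, flow_id: str):
    """Count packets per path for one flow under both policies."""
    hashed = Counter(flow_hash_path(flow_id) for _ in range(num_packets))
    sprayer = spray_paths()
    sprayed = Counter(next(sprayer) for _ in range(num_packets))
    return hashed, sprayed


hashed, sprayed = distribute(1000, "gpu0->gpu1")
print(len(hashed), sorted(sprayed.values()))  # 1 [250, 250, 250, 250]
```

A single large flow occupies one path under flow hashing but spreads evenly under spraying. The trade-off is that sprayed packets can arrive out of order, which is why spraying is paired with out-of-order delivery support in the transport.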



## **Discussion: IEEE 802's Role in AI Network Development**

IEEE 802 covers the Ethernet physical layer and link layer.

- ✓ IEEE 802.1 has standardized DCB technologies. PFC is widely used, and QCN has inspired other congestion control mechanisms (e.g. DCQCN).
  - CBFC was discussed in IEEE 802.1
  - LLR was discussed in IEEE 802
  - LB was analyzed in the Nendica report
- ✓ IEEE 802.3 is working on the Beyond 400G project, which is recognized as a high-bandwidth opportunity for AI networks.
  - There was a contribution about low-latency PHY in NEA

Figure: protocol stacks for scale-out (Application, Transport Layer, Network Layer, Data Link Layer, Physical Layer) and scale-up (Application, Transaction Layer, Data Link Layer, Physical Layer). Link layer goals: high goodput, high reliability, payload efficiency (LLR, CBFC). Physical layer goals: high bandwidth (Ethernet SerDes, 200G/lane) and low latency (low-latency FEC).

IEEE 802 could start or restart investigations into PHY/link technologies to meet the performance metrics required for AI networks.

## **Thanks!**