"Pertinent" and "timely" are redundant. Delete one or the other.
Be consistent with capitalization; don't capitalize here. Same goes for "High Performance Computing" and "Artificial Intelligence"...
during?
delete
to quickly transmit
Consider deleting all of this text. While all true, it isn't adding much value at this point, and the paper is already too long.
Could delete this sub-clause title and simply continue
What does this stand for? spell out first time.
GIV = Global Industry Vision
Unclear what all this means. Re-word this
on?
to
Today data centers
Re-word; this reads clumsily.
the network
Maybe we just say that "AI applications and environments are growing in scale and complexity."
creates temporal correlations
This
can exceed
consider removing this and putting a single sentence at the beginning of the next paragraph
Reverse this - spell it out first and then put the abbreviation in parens
Do we really care? Delete the sentence
results
, but is poised to become a significant component of latency with networked SCM storage.
There are two distinct types of network latency:
that
at the
The parallel AI computing models create unique traffic patterns that result in heavy network congestion.
and the key to addressing dynamic latency is mitigating congestion.
maybe too much background and not necessary?
This really seems like it could have been placed up in the model parallel data parallel area
relative historical
Again, reverse this - spell out first, then acronym in parens
another's
Could be deleted without any loss
In 2000, the InfiniBand Trade Association (IBTA) released the initial InfiniBand specification with support for RDMA. InfiniBand is tailored for an efficient hardware design that ensures reliable data transmission and direct access to the memory of remote nodes.
iWARP is an RDMA protocol, defined by the IETF in 2014 to run over TCP. Using TCP as a transport allows iWARP to traverse the Internet and wide-area networks as well as standard Ethernet networks within a data center. While iWARP can be implemented in software, specialized iWARP NICs are used to obtain the desired performance within the data center.
Reverse: spell out first, acronym last.
Can be removed
has become the protocol of choice for
For example, distributed training for machine learning has accelerated more than 100 times and the I/O speed of networked SSD storage has improved more than 50 times using RDMA for communications as opposed to TCP/IP.
a GPU memory.
This technology can be supported by any PCIe peer that provides access to its memory, such as NVIDIA GPUs, Xeon Phi, AMD GPUs, FPGAs, and so on.
These two types of “pinned” memory are separate sections of host memory that are dedicated to the GPU and the SmartNIC.
Next, the SmartNIC uses RDMA to transmit the data to the remote server across the network.
1. Consumption of CPU resources. The CPU may become a bottleneck during the data copy.
2. Increased latency and reduced bandwidth. The additional memory copies take time and reduce I/O bandwidth.
3. Host memory consumption. Multiple sets of pinned buffers reduce available host memory, which impacts application performance and increases system TCO.
For this optimization to work, the CPU coordinates RDMA communication tasks for the GPU and SmartNIC.
data to a remote GPU memory.
These graphs have no legend; it is unclear what the green and red lines represent. I suspect red is GPUDirect RDMA and green is the traditional path.
flows
increasing
The CBD is created when a dependent switch, in a sequence of switches, is waiting for the availability of buffers in other switches before transmitting a packet
the sequence of switches is
Use Caps for first letter as done in subsequent usage
This leads to multiple combinations of TCP and RoCE flows traversing common links.
The
a fixed amount of
a common
the memory usage for
to
by deploying
network, but
assigned
Just move the reference to the end of the previous sentence and delete the last sentence.
let the original PAUSE timeout.
These probably shouldn't be all CAPs. Maybe ok to leave XON and XOFF, but the PAUSE should maybe be lower case
Could have a sub-heading here
With thousands of ports to configure, a network operator will benefit from an automated solution that configures PFC headroom.
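The per-port headroom calculation that such a tool would automate can be sketched roughly as follows. This is a minimal illustrative sketch: the constants (propagation delay, peer response time) and the formula structure are assumptions for illustration, not the procedure specified in IEEE Std 802.1Q.

```python
def pfc_headroom_bytes(port_speed_gbps: float, cable_len_m: float,
                       mtu_bytes: int = 9216) -> int:
    """Estimate per-priority PFC headroom for one port.

    The headroom must absorb every byte that can still arrive after the
    switch decides to send a PAUSE frame:
      - one maximum-size packet the peer has already begun transmitting
      - one maximum-size packet currently being received
      - bytes in flight during the round-trip propagation delay
      - the peer's reaction time (modeled here as a fixed delay)
    All constants below are illustrative assumptions.
    """
    propagation_s_per_m = 5e-9       # ~5 ns/m signal propagation
    peer_response_s = 1e-6           # assumed peer PAUSE reaction time
    bytes_per_s = port_speed_gbps * 1e9 / 8

    in_flight = 2 * cable_len_m * propagation_s_per_m * bytes_per_s
    response = peer_response_s * bytes_per_s
    return int(2 * mtu_bytes + in_flight + response)

# A 100 Gbps port on a 100 m cable with 9216-byte jumbo frames:
print(pfc_headroom_bytes(100, 100))
```

Note how the cable length and port speed dominate at scale: an automated tool can measure or import these per port, which is exactly what is impractical to do by hand across thousands of ports.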
Could have a sub-heading here
As shown in Figure 18
oriented
in real-time
the rate
end-stations
Delayed congestion signals and untimely adjustment to sending rates can cause fluctuations in switch queue depths within the network, leading to variance in throughput and latency.
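The effect this sentence describes is easy to demonstrate with a toy model. The numbers and the on/off control rule below are entirely hypothetical (not any specific congestion-control algorithm); the point is only that a longer feedback delay produces larger queue-depth swings.

```python
def max_queue_depth(feedback_delay: int, steps: int = 300) -> float:
    """Toy single-queue model: a sender toggles between a high and a low
    sending rate based on a *delayed* observation of the queue depth.
    Returns the deepest queue seen; larger delay -> larger swings."""
    queue, rate, drain = 0.0, 1.5, 1.0     # arbitrary units per step
    history = [0.0]
    peak = 0.0
    for _ in range(steps):
        queue = max(0.0, queue + rate - drain)
        history.append(queue)
        # The sender sees the queue as it was feedback_delay steps ago.
        observed = history[max(0, len(history) - 1 - feedback_delay)]
        rate = 0.5 if observed > 5.0 else 1.5   # crude on/off control
        peak = max(peak, queue)
    return peak

print(max_queue_depth(0), max_queue_depth(40))
```

With immediate feedback the queue hovers near the threshold; with stale feedback the sender overshoots in both directions, which is the throughput and latency variance the comment refers to.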
loss, but as previously discussed, it is increasingly difficult to support large buffers.
at
at
flow
Is this really an ODCC proposal? Propose new sentence: A mechanism to prevent PFC deadlock involves discovering and avoiding CBD loops.
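The loop-discovery step suggested here amounts to cycle detection in a buffer-dependency graph. The sketch below uses a standard depth-first search with hypothetical switch names; it is an illustration of the idea, not any vendor's or ODCC's actual algorithm.

```python
def has_cbd_loop(deps: dict[str, list[str]]) -> bool:
    """Return True if the buffer-dependency graph contains a cycle.

    deps maps each switch to the switches whose buffer availability it
    waits on before it can transmit (i.e., PFC back-pressure edges).
    """
    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited / in progress / done
    color = {s: WHITE for s in deps}

    def dfs(s: str) -> bool:
        color[s] = GRAY
        for nxt in deps.get(s, []):
            if color.get(nxt, WHITE) == GRAY:
                return True             # back edge: circular dependency
            if color.get(nxt, WHITE) == WHITE and nxt in deps and dfs(nxt):
                return True
        color[s] = BLACK
        return False

    return any(color[s] == WHITE and dfs(s) for s in deps)

# Hypothetical topology: PFC back-pressure forming a ring vs. a chain.
ring = {"sw1": ["sw2"], "sw2": ["sw3"], "sw3": ["sw4"], "sw4": ["sw1"]}
chain = {"sw1": ["sw2"], "sw2": ["sw3"], "sw3": []}
print(has_cbd_loop(ring), has_cbd_loop(chain))   # True False
```

Once a loop is found, the avoidance policy (e.g., rerouting or selectively disabling PFC on one edge) breaks the cycle.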
again, we could put the reference at the end of the previous sentence and delete this sentence.
in
transmitter (RP)
receiver (NP)
This could be deleted without loss of information.
congestion spreading or can lead to congestion spreading
The unwanted
PFC messages
messages
In this scenario, the
the packet sequence number for a congested flow
has experienced
results
buffer? or provide buffer to receive
, but
for the amount of
For example, interface
has
makes it
flowing directly
Using the telemetry stream of data from network devices,
into the packet headers by the switch data plane
with the contents of the original data packet, which has a finite size.
thus
are
a result of
optimizes
for the channel
in
specification
CNPs,
above, augment
in
IEEE Std 802.1AS-2020
A solution for the data center is needed to reduce the overhead of lossless mode configuration and the associated chance of error.
A mechanism to communicate this capability between peers and an update to the current description of how to manually calculate headroom are excellent candidates for an amendment to IEEE Std 802.1Q.
, but
of the network state,
algorithm that leverages