Resnics Technical Forum丨What can P4 do?

2022 May 13



Preface

When Intel acquired Barefoot Networks in 2019, the deal did not make many waves in the industry. Broadcom, the dominant player in the switch chip market at the time, had not joined the P4 camp and was pushing its own P4-like alternative, OpenNAS, while Mellanox’s Spectrum series was only a hybrid solution based on its own SAI. [1]

By 2022, when AMD spent 1.9 billion dollars to acquire Pensando, a P4 NIC manufacturer, P4 had attracted wide attention. The main reason was that Intel had already launched its own P4-enabled IPU, Mount Evans, and switch heavyweight Cisco had launched the Silicon One G100 [2], which fully supports P4.

The P4 organization’s website offers plenty of information and standards. But to understand why P4 exists and what it can do, we have to start with the network revolution set off by the father of P4.

The Changing Face of the Network

In the traditional 7-layer or 4-layer network model, the layered architecture removes the need for the network to maintain connection state, and each layer focuses only on its own functionality, which allows a large number of applications to be built on top of TCP/IP. Layering also lets IP run over many different media, from Ethernet to Wi-Fi to 5G, without concern for how the underlying layer is implemented.

The Internet is essentially a distributed packet-switched network built on this model, with its networks connected through store-and-forward gateways. Such networks deliver the basic properties of the early Internet: reliability and scalability.

Since TCP/IP came to dominate Layer 3 and Layer 4, change in networking has essentially come to a standstill. The following network framework became accepted:

1. Hardware switches and routers with distributed management

2. A packet-switched store-and-forward mechanism

3. Host-side use of kernel-based TCP/IP stacks

On this basis, the industry has run network protocols over more transmission media, built a wider variety of applications, tested various flow control mechanisms (in a TCP network, after all, flow control is triggered by dropped packets), and extended IP addressing from IPv4 to IPv6.

But an important change has been under way since the hyperscalers came along. They run large private networks in which east-west traffic is often orders of magnitude greater than north-south traffic. Such data center networks bring new demands:

🔹 Managing hardware network devices one by one is no longer sufficient; they need global management.

🔹 They run a wide variety of devices and want full lifecycle management of them.

🔹 Compared with the WAN, the network environment of a private data center is relatively simple, so they want to pursue higher performance.

Switch Changes

As a result, for traditional network switch hardware, separating the control path from the data path became mainstream. With the advent of OpenFlow [3] and SDN controllers [4], white box switches became standard in hyperscale data centers.

At the same time, as data center networks scale, more and more ports sit on one large L2 network, and the broadcast-based topology discovery that L2 networks originally relied on runs into trouble: broadcasts consume too much bandwidth and scalability suffers. The original VLAN technology cannot meet the requirements either, again because of scalability. The global view offered by SDN has therefore become mainstream in data center networks: the physical switch is reduced to a single-function packet forwarding engine, and the control path is implemented by the network OS.

Once the data path and control path are separated, flexible control of the data path becomes a new requirement. The hope is that a programmable data path can provide the following:

🔹 Protocol-agnostic: the switch’s data path can handle arbitrary packet formats.

🔹 Configurable: the switch’s controller can define the configuration of the parsers in the data path as well as how packets are processed and modified.

🔹 Completely decoupled from hardware: the packet processing language is independent of the hardware, and the compiler maps it onto the hardware.
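
The three goals above can be illustrated with a small software model. The sketch below is written in Python rather than P4 and is not tied to any real switch: the header layouts, the `custom_tag` protocol, the parse graph, and the table contents are all hypothetical. It only shows the idea that the parser is driven by data supplied from outside (here, plain dictionaries standing in for what a controller would install), so a new packet format needs a new description rather than new parsing code.

```python
# Hypothetical header layouts: (field name, size in bytes).
ETHERNET = [("dst", 6), ("src", 6), ("ether_type", 2)]
CUSTOM_TAG = [("tenant_id", 4), ("next_type", 2)]  # made-up protocol for illustration

# Parse graph: state -> (layout, selector field, {selector value: next state}).
# 0x88B5 is used here only as an example EtherType for the made-up header.
PARSE_GRAPH = {
    "ethernet": (ETHERNET, "ether_type", {0x88B5: "custom_tag"}),
    "custom_tag": (CUSTOM_TAG, "next_type", {}),
}

def parse(packet: bytes, start: str = "ethernet") -> dict:
    """Walk the parse graph and return extracted fields keyed as 'header.field'."""
    fields, offset, state = {}, 0, start
    while state is not None:
        layout, selector, transitions = PARSE_GRAPH[state]
        for name, size in layout:
            raw = packet[offset:offset + size]
            fields[f"{state}.{name}"] = int.from_bytes(raw, "big")
            offset += size
        # No matching transition means parsing stops here.
        state = transitions.get(fields[f"{state}.{selector}"])
    return fields

# Match-action table: exact match on one parsed field -> (action, parameter).
TABLE = {("custom_tag.tenant_id", 7): ("forward", 3)}

def apply_table(fields: dict):
    """Return the action of the first matching entry, or a default drop."""
    for (key, value), action in TABLE.items():
        if fields.get(key) == value:
            return action
    return ("drop", None)

# Usage: an Ethernet frame carrying the made-up tag for tenant 7.
packet = (bytes(12) + (0x88B5).to_bytes(2, "big")
          + (7).to_bytes(4, "big") + (0).to_bytes(2, "big"))
print(apply_table(parse(packet)))
```

In a real P4 target the compiler, not an interpreter loop, maps such a description onto parser and match-action hardware; the model above only mirrors the separation between description and engine.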

The Emergence and Adoption of P4 on Switches

This is how the earliest PISA (Protocol Independent Switch Architecture) designs emerged. [5]

Barefoot led the way, and mainstream switches in the industry are now moving in this direction. In academia there are also many applications of P4, currently focused mostly on network monitoring. In China, Alibaba’s Luo Shen gateway, presented at SIGCOMM 2021 [6], is regarded as a real production deployment.

The solution uses the programmability of the Barefoot switch chip, combined with the gateway’s service flow information, to apply different custom processing to the flows of different services in the hardware path.

In other scenarios, such as host offloading of multicast applications in the public cloud, P4’s data path programmability can also be put to good use. [7]

It follows that, on a switch chip that supports P4, users can accelerate their existing applications in the following directions:

1 Network monitoring: using the P4 language to collect flow-based statistics.

2 Forwarding of data streams: in the example above, the data streams pass only through the switch’s data path and never enter the host’s protocol stack.

3 Flow control algorithms for data center networks: P4-defined INT (In-band Network Telemetry) enables end-to-end congestion control for network flows.
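
As a rough illustration of the first direction, flow-based statistics amount to keeping counters indexed by a flow key. The sketch below models that in Python; the `FlowKey` and `FlowStats` names and the choice of a 5-tuple key are assumptions made for the example, and a P4 program would express the same idea with counters or registers in the data path rather than Python dictionaries.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowKey(NamedTuple):
    """Hypothetical 5-tuple used to identify a flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int

class FlowStats:
    """Per-flow packet and byte counters, updated once per packet."""

    def __init__(self):
        self.packets = defaultdict(int)
        self.bytes = defaultdict(int)

    def update(self, key: FlowKey, length: int) -> None:
        self.packets[key] += 1
        self.bytes[key] += length

    def top_talkers(self, n: int = 1):
        """Return the n flows with the most bytes as (key, bytes) pairs."""
        ranked = sorted(self.bytes.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]

# Usage: two packets on one flow, one packet on another.
stats = FlowStats()
web = FlowKey("10.0.0.1", "10.0.0.2", 6, 1234, 80)
dns = FlowKey("10.0.0.3", "10.0.0.2", 17, 5353, 53)
stats.update(web, 1500)
stats.update(web, 500)
stats.update(dns, 100)
print(stats.top_talkers(1))
```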

Pensando

After P4 support arrived on the switch side, Intel’s IPU, introduced in 2021, also added P4 support. Pensando [8], a company founded by Cisco’s “MPLS” quartet, had implemented a P4 engine in its NIC from the beginning, before Intel.

The first-generation NIC architecture still shows traces of a switch design, using HBM as the memory cache for the P4 engine. The P4 engine uses 112 MPUs, 4 per P4 stage, forming a pipeline of 6+6 stages for P4 ingress/egress and 8+8 stages for P4 TxDMA/RxDMA. In the second-generation chip, however, DDR4 replaces HBM. After all, 1.6TB/s of bandwidth is nothing special for a switch, but it is overkill for a NIC.
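
The MPU count can be checked against the stage counts with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Stage counts quoted in the text for the first-generation pipeline.
MPUS_PER_STAGE = 4
stages = {"P4 ingress": 6, "P4 egress": 6, "P4 TxDMA": 8, "P4 RxDMA": 8}

total_stages = sum(stages.values())          # 6 + 6 + 8 + 8 = 28
total_mpus = total_stages * MPUS_PER_STAGE   # 28 * 4 = 112

print(total_stages, total_mpus)  # prints "28 112"
```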

Each P4 stage has the same architecture: it starts with a TE and contains 4 MPUs. On each MPU, Pensando implements its own instruction set for P4. [9]

Beyond the rich L3/L4 features of traditional switches, the addition of P4 lets Pensando implement rich network features that were originally available only in high-end devices. [10]

Intel IPU

Intel’s IPU, released in 2021, also includes a P4 engine for network packet processing. Unlike the switch-derived architectures of Barefoot and Pensando, it targets the virtual switches that currently dominate hyperscale data centers. [11]

Its characteristics are also distinctive:

1 Full support for Open vSwitch, with the entire OVS data path fully hardened.

2 A programmable parser that supports exact lookups, wildcard lookups, and range lookups, fully compatible with OVS.

3 The OVS slow path and fast path implementations map seamlessly onto the IPU.
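
The three lookup kinds named in item 2 differ only in how a key is compared with a table entry. The following sketch shows the comparison rules in Python; it is a generic illustration, not Intel’s implementation, and the function names and rule formats are assumptions. In hardware, exact lookups are typically served by hash tables and wildcard lookups by TCAM.

```python
def exact_match(table: dict, key: int):
    """Exact lookup: the key must equal an entry's key bit for bit."""
    return table.get(key)

def wildcard_match(rules, key: int):
    """Wildcard (ternary) lookup over (value, mask, action) rules.

    A mask bit of 1 means the bit must match; 0 means "don't care".
    The first matching rule wins.
    """
    for value, mask, action in rules:
        if key & mask == value & mask:
            return action
    return None

def range_match(rules, key: int):
    """Range lookup over (low, high, action) rules, bounds inclusive."""
    for low, high, action in rules:
        if low <= key <= high:
            return action
    return None

# Usage with made-up entries.
ports = {80: "http", 443: "https"}
prefixes = [(0x0A000000, 0xFF000000, "internal")]  # 10.0.0.0/8 as value/mask
port_ranges = [(1024, 65535, "unprivileged")]

print(exact_match(ports, 443))
print(wildcard_match(prefixes, 0x0A010203))  # key is 10.1.2.3
print(range_match(port_ranges, 5000))
```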

To help hyperscalers make better use of the IPU, Intel also provides P4-based frameworks such as IPDK. In Intel’s framework, ovs-opctl can be seen on the control path. [12]

Resnics P4

Like Intel, Resnics set out to use P4 programmability to keep its network data path flexible, while reusing the most popular open source virtual switch software framework for its control path, in order to accelerate data center networks.

Resnics implemented a P4 engine based on VLIW instructions that generates customized PHVs (packet header vectors), enabling an all-hardware network data path inside the NIC. As with Pensando, the P4 engine moves data between the host’s PCIe interface, the internal CPU, and the network. [13]

With a P4-based hardware implementation and an OVS control path, Resnics’ 25G NICs provide OVS data path acceleration, while the P4-based parser supports different network protocols, tunnel protocols, and various optional header fields.

For OVS data path acceleration in current open source virtualization solutions, Resnics implements the following features on top of standard OVS:

▶ Hardware acceleration of OVS exact lookups

▶ Hardware acceleration of encapsulation and decapsulation for tunnel protocols

▶ Hardware acceleration of OVS connection tracking based on TCP state

▶ QoS at the queue level of Virtio devices

▶ Hardware offloading of TCP checksums and a native RSS mechanism

▶ Ethernet port-based bonding

As more and more NICs adopt P4 as their packet processing engine, the P4 engine on the NIC can take on more business processing beyond virtual switch data path acceleration.

Scenarios for using P4 on NICs

As with the Alibaba data center gateway cited earlier, many current Internet applications use tunnel technology to connect to the server side and build secure tunnel-based data services. Examples include enterprise VPNs and a variety of video services, which often require a large number of network access servers. Deploying P4-based smart NICs on these servers makes it possible to offload tunnel protocol encap/decap as well as the related NAT operations.

Another relatively mature scenario is network function virtualization for carriers. In an NFV deployment, a large number of vRouters provide routing functions on the network.
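
To make the encap/decap work concrete, the sketch below builds and strips a tunnel header in Python. VXLAN is chosen here purely as an assumption, since the text does not name a specific tunnel protocol, and only the 8-byte VXLAN header from RFC 7348 is handled; the outer Ethernet, IP, and UDP headers that a real tunnel endpoint also adds are left out. A P4 smart NIC would perform the equivalent header push and pop in its hardware pipeline.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)
VXLAN_HEADER_LEN = 8

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend a VXLAN header carrying the 24-bit VNI to the inner frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame

def vxlan_decap(packet: bytes):
    """Strip the VXLAN header and return (vni, inner_frame)."""
    if len(packet) < VXLAN_HEADER_LEN:
        raise ValueError("packet shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:VXLAN_HEADER_LEN])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8, packet[VXLAN_HEADER_LEN:]

# Usage: round trip with a placeholder inner frame.
encapsulated = vxlan_encap(b"inner-frame", vni=5001)
print(vxlan_decap(encapsulated))
```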

P4-based smart NICs enable hardware offloading for different levels of network traffic through the programmability of data paths.

P4 Switch and NIC Linkage

Likewise, linking P4 switches with P4 smart NICs is an important direction among P4 usage scenarios. In today’s AI training scenarios, SwitchML [14] is one example of such linkage. Deploying a P4-based smart NIC on the server side lets a hardware engine process network packets, replacing the kernel-bypass DPDK framework in use today.

In large data center networks, ICMP-based network monitoring can no longer meet demand. P4 defines INT (In-band Network Telemetry), which can give users end-to-end network latency information on a per-packet basis.

Finally, as port bandwidth in large data center networks grows, more applications lean on the underlying hardware for acceleration and offloading in pursuit of low latency and stability, and P4-based network processing engines will matter to more and more of them.
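
The per-packet mechanism can be sketched as follows: each hop on the path appends a small metadata record to the packet, and the last hop strips the records and reports them. The Python model below is a simplified illustration only; the record fields, class names, and report format are assumptions and do not follow the INT specification’s wire format.

```python
from dataclasses import dataclass, field

@dataclass
class HopMetadata:
    """Hypothetical per-hop record appended by each switch."""
    switch_id: int
    ingress_ns: int
    egress_ns: int

@dataclass
class Packet:
    payload: bytes
    int_stack: list = field(default_factory=list)

def transit(packet: Packet, switch_id: int, ingress_ns: int, egress_ns: int) -> None:
    """Model a transit hop: append this hop's timestamps to the packet."""
    packet.int_stack.append(HopMetadata(switch_id, ingress_ns, egress_ns))

def sink_report(packet: Packet) -> dict:
    """Model the sink: summarize the telemetry, then strip it from the packet."""
    hop_latency = {h.switch_id: h.egress_ns - h.ingress_ns for h in packet.int_stack}
    report = {
        "path": [h.switch_id for h in packet.int_stack],
        "hop_latency_ns": hop_latency,
        # Sum of time spent inside the switches; link propagation is not included.
        "total_hop_latency_ns": sum(hop_latency.values()),
    }
    packet.int_stack = []  # the host receives the original packet only
    return report

# Usage: a packet crossing two switches with made-up timestamps.
pkt = Packet(b"data")
transit(pkt, switch_id=1, ingress_ns=100, egress_ns=350)
transit(pkt, switch_id=2, ingress_ns=1000, egress_ns=1100)
print(sink_report(pkt))
```

Unlike an ICMP probe, the measurement rides on the application’s own packets, so it reflects the path and queuing that the real traffic experienced.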

[1] https://gist.github.com/jopietsch/c1573518516af6071ae9cd0462ff0fd3#file-ethernet_asics-csv

[2] https://www.cisco.com/c/en/us/solutions/collateral/silicon-one/datasheet-c78-744833.html

[3] https://opennetworking.org/sdn-resources/customer-case-studies/openflow/

[4] https://www.sdxcentral.com/networking/sdn/definitions/what-the-definition-of-software-defined-networking-sdn/what-is-sdn-controller/

[5] https://www.infoq.com/presentations/pisa-asic-p4/

[6] https://dl.acm.org/doi/abs/10.1145/3452296.3472889

[7] https://onfstaging1.opennetworking.org/wp-content/uploads/2019/09/5.30pm-Satoshi-Horiuchu-Shinji-Yonesaka-The-Result-of-Usecase-of-P4-and-New-Usecase-of-P4.pdf

[8] www.pensando.io

[9] http://www.opennetworking.org/wp-content/uploads/2020/04/Plenary-4-Slide-Deck.pdf

[10] https://www.youtube.com/watch?v=8UIYP7h_KG4

[11] https://www.servethehome.com/intel-mount-evans-dpu-ipu-arm-accelerator-at-hot-chips-33/hc33-intel-mount-evans-dpu-ipu-packet-processing-p4/

[12] https://www.youtube.com/watch?v=jpE2qdV0tJw&ab_channel=IPDK

[13] www.resnics.com

[14] https://github.com/p4lang/p4app-switchML

About Us

Resnics was founded in July 2020 and is headquartered in Caohejing New Technology Development Zone, Shanghai, China. The team is made up of core professionals from China and abroad, with deep technical expertise in networking, switching, storage, and high-performance CPUs.

The company is committed to providing leading storage and network chip solutions for the communications and Internet industries, aiming to become a high-tech innovative company with international competitiveness, leading the industry with innovation and participating in the development of industry standards.