Resnics Technology Forum|NVMe over Fabric: The Rise of NVMe over Fabric

2022 Mar 30

Preface

NVMe SSDs began to be deployed in data centers in 2013, and the PCIe SSD market, previously a mishmash of 200+ vendors such as Fusion-io, Virident, and OCZ, was finally unified by the NVMe protocol. See [1] for an idea of what the 211 SSD makers were up to back in the day.

The data center industry requires a high degree of interoperability between components. This is unlike the communications industry, where a vendor can start by making its own chips and work its way up to boards, systems, and solutions; after all, it has a relatively fixed number of customers. Most data center vendors need to focus on their strongest areas, so the whole system naturally divides into three domains: compute, network, and storage.

As with traditional storage, once the NVMe standard had settled the standalone (direct-attached) configuration, demand for networked storage naturally followed. Fusion-io, an early leader in PCIe SSDs, demonstrated products such as ioSAN [2], which used 10G Ethernet and 40G InfiniBand to export storage resources to other hosts.

In the same vein, while the NVMe protocol standard was still being developed, Mr. Yiren Huang, the founder of Resnics Technology, invented the technology behind the NVMoE patent in 2013 and designed the world's first NVMe SSD controller that supports NVMe over Ethernet. The patent is titled "NVM Express Controller for Remote Access of Memory and I/O Over Ethernet-Type Networks" [3].

First Wave

Silicon Valley has no shortage of exceptional talent. Andy Bechtolsheim [4], the first employee and chief hardware engineer of Sun Microsystems and an angel investor in Google, is a bellwether for investors. He also invested in DSSD [5] and Annapurna Labs [6], undoubtedly two of the brightest new stars to emerge after the NVMe protocol standard was introduced.

DSSD’s systems, launched after DSSD was acquired by EMC, use an NVMe over PCIe architecture [7] [8].

Customized PCIe cables connect the control side of the storage system to the clients, and this NVMe over PCIe approach provides high bandwidth and high capacity. According to a statement from one of DSSD’s supercomputing customers [9]:

The entire DSSD D5 system provides 10 PB of storage capacity and delivers 1 TB/s of bandwidth. It was undoubtedly the fastest storage system available in 2015.

Annapurna Labs, which was also later acquired, built its architecture on 10G Ethernet, the more mainstream choice at the time, to transport the NVMe protocol [10].

It used NVMe over Ethernet to deliver low-latency, high-bandwidth storage resources before NVMe over Fabric became a standard.

After AWS acquired Annapurna Labs in 2015, it went on to launch the Nitro series of data center chips [11], which data center giants are now emulating.

DSSD and Annapurna Labs both carried NVMe across a fabric (PCIe and Ethernet, respectively) before an industry standard existed; in both cases the fabric can be understood as a transport for the NVMe protocol. After the NVM Express organization released the NVMe over Fabric specification in July 2016, more and more vendors began to offer products of this kind.

NVMe over Fabric

NVMe over PCIe

In terms of NVMe implementation, using PCIe switches for expansion and networking is the solution that requires the fewest changes. Dolphin's solution [12], based on PCIe switches and NTB host adapters, continues the DSSD approach. However, because PCIe needs to carry the clock signal over the connecting cable, it cannot match Ethernet's data-center-wide scalability: rack-level deployments are certainly achievable, but scaling beyond the rack is a problem.

PCIe device hot plug is also a feature that current motherboards support only incompletely. Hot-plugging a physical add-in card (AIC) is considerably more complex than hot-plugging through a hot-plug backplane, and it has a larger impact on the system.

NVMe over RDMA

There are two main RDMA transports, RoCEv2 and iWARP, each with its representative vendors. RoCEv2 runs over UDP, so it needs a lossless network in the data center, which is a challenge for network operations and stability. TCP-based iWARP has a relatively lower barrier to adoption.

NVMe over TCP

NVMe over TCP is a newer standard that the NVM Express organization added after 2018, mainly because RDMA-based solutions are costly to deploy, whereas TCP-based solutions are very cheap to deploy. The trade-off is a performance gap between NVMe over TCP and NVMe over RDMA, especially in latency. Because RDMA operations are one-sided, I/O processing at the receiving end does not involve the operating system kernel; the HCA interfaces directly with the application.
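One reason TCP deployment is cheap is that NVMe over TCP frames its commands and data as PDUs on an ordinary TCP byte stream, so no special NIC is needed. The sketch below illustrates that framing by building the first PDU a host sends, the Initialize Connection Request (ICReq). It is a minimal illustration, not a working initiator: the field layout follows the NVMe/TCP transport binding as we understand it, and the function names `build_icreq` and `parse_common_header` are hypothetical.

```python
import struct

# Common header shared by every NVMe/TCP PDU:
# PDU type, flags, header length, PDU data offset (1 byte each),
# then the total PDU length as a little-endian 32-bit integer.
COMMON_HDR = struct.Struct("<BBBBI")

PDU_TYPE_ICREQ = 0x00  # Initialize Connection Request (host -> controller)
ICREQ_LEN = 128        # assumed fixed size of an ICReq PDU


def build_icreq(maxr2t=0, hpda=0, header_digest=False, data_digest=False):
    """Return the bytes of an ICReq PDU (illustrative sketch only)."""
    # Digest field: bit 0 enables the header digest, bit 1 the data digest.
    digest = (0x1 if header_digest else 0) | (0x2 if data_digest else 0)
    hdr = COMMON_HDR.pack(PDU_TYPE_ICREQ, 0, ICREQ_LEN, 0, ICREQ_LEN)
    # PDU format version (0), host PDU data alignment, digest flags, max R2T.
    body = struct.pack("<HBBI", 0, hpda, digest, maxr2t)
    padding = bytes(ICREQ_LEN - len(hdr) - len(body))  # reserved bytes
    return hdr + body + padding


def parse_common_header(pdu):
    """Decode the 8-byte common header at the start of a PDU."""
    pdu_type, flags, hlen, pdo, plen = COMMON_HDR.unpack_from(pdu)
    return {"type": pdu_type, "flags": flags, "hlen": hlen,
            "pdo": pdo, "plen": plen}


if __name__ == "__main__":
    pdu = build_icreq(header_digest=True)
    print(len(pdu), parse_common_header(pdu))
```

In a real initiator these bytes would be written to a plain TCP socket and processed by the kernel network stack on both ends, which is where the extra latency relative to RDMA comes from.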

NVMe over FC

NVMe over FC [13] is the most popular option among traditional storage vendors. IBM, NetApp, Dell/EMC, and many others have launched products that leverage the reliability of FC networks to deliver, to a certain extent, highly available storage.

One of the biggest differences from the original FC is that traditional SCSI-based solutions provide I/O fencing, which reserves device IDs so that devices can be shared. The NVMe over Fabric protocol doesn’t define this feature, so people are more likely to rely on mirroring for data reliability.

NVMe over Ethernet

CNEX Labs, a company founded by Mr. Yiren Huang, founder of Resnics Technology, has also introduced NVMe over Ethernet functionality, enabling NVMe I/O queues to interface with SSD I/O queues at the hardware level. 40Gbps NVMe over Ethernet is available on CNEX Labs’ Westlake SSD controllers.

The image above shows the NVMe over Ethernet card developed by CNEX Labs based on the Westlake controller chip [14].

NVMe over InfiniBand

With the popularity of AI training, InfiniBand is spreading from traditional HPC to a broader set of customers, and NVMe over InfiniBand solutions have emerged to keep expensive GPUs running at full speed.

EasyCore’s NVMe End-to-End Solution

As shown above, NVMe over Fabric comes in a wide variety of forms, which makes it hard for users to make a clear choice. Most NVMe over Fabric solutions, like those of traditional storage vendors, concern themselves only with the server (target) side and offer little guidance on how users should consume the storage, hoping that the open source community will fill the gap. On the initiator side, where users rely on ordinary NICs or customized hardware, there are potential pitfalls in data reliability and system security.
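To illustrate what the initiator side asks of users today, the sketch below builds the `nvme connect` command line that a Linux host with the open source nvme-cli tool would typically run to attach a remote subsystem. It is only a sketch under stated assumptions: the helper name `nvme_connect_argv`, the address, and the NQN are hypothetical, port 4420 is assumed as the default service ID, and the command is only assembled, never executed.

```python
# Service port commonly used by IP-based NVMe over Fabric targets (assumed default).
DEFAULT_TRSVCID = "4420"

# Only the IP-based transports are handled in this sketch.
IP_TRANSPORTS = {"tcp", "rdma"}


def nvme_connect_argv(transport, traddr, nqn, trsvcid=DEFAULT_TRSVCID):
    """Build (but do not run) an nvme-cli connect command as an argv list."""
    if transport not in IP_TRANSPORTS:
        raise ValueError(f"unsupported transport in this sketch: {transport}")
    return [
        "nvme", "connect",
        "--transport", transport,  # fabric type chosen by the user
        "--traddr", traddr,        # address of the remote target
        "--trsvcid", trsvcid,      # service ID (port) on the target
        "--nqn", nqn,              # name of the remote NVMe subsystem
    ]


if __name__ == "__main__":
    # Hypothetical target address and subsystem name, for illustration only.
    argv = nvme_connect_argv("tcp", "192.0.2.10",
                             "nqn.2014-08.org.example:subsys1")
    print(" ".join(argv))
```

Even in this simple case the user has to know the fabric type, the target address, and the subsystem name, and has to keep the NIC and its drivers healthy, which is the operational burden described above.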

Resnics saw this problem and realized that managing both RDMA HBAs and NVMe SSDs at the same time would be a challenge for users. Users want NVMe for storage capacity without having to care where the NVMe SSD is located, and they want their network to need no additional modifications for NVMe over Fabric.

Resnics provides a client device that presents an NVMe device directly to the user, who does not have to care about the physical location of the NVMe SSD. For supported networks, Resnics uses its self-developed P4 network engine to support the various underlying fabrics seamlessly, unifying NVMe over Fabric behind a single interface. Building on standard NVMe protocol support, shared storage is realized through end-to-end coordination.

Major international manufacturers such as Intel [15] and Nvidia [16] have seen the same need and have added NVMe client implementations to their IPUs/DPUs.

Cloud native is starting to spread from hyperscalers to enterprise users. A survey of cloud-native SDI (software-defined infrastructure) unsurprisingly shows that EBS, which AWS provides through NVMe interfaces, is the dominant player in the market.

Therefore, it is foreseeable that as cloud computing develops, the NVMe interface, like virtio-blk, will become cloud native: users will no longer need to care about the physical location of their storage, the NVMe protocol will be ubiquitous, and NVMe over Fabric will see a renewed surge. With this in mind, we are pleased to offer our customers the latest NVMe technology in the form of a new NVMe platform.


[1] https://www.storagenewsletter.com/2011/06/14/91-ssd-manufacturers-in-the-world-document/

[2] https://thefutureofthings.com/6193-iosan-ssd/

[3] https://patents.google.com/patent/US20150006663

[4] https://en.wikipedia.org/wiki/Andy_Bechtolsheim

[5] https://gigaom.com/2013/04/04/meet-dssd-andy-bechtolsheims-secret-chip-startup-for-big-data/

[6] https://globalny.biz/catalog/id/966

[7] https://virtualgeek.typepad.com/virtual_geek/2015/05/emc-world-day-3-dssd-tech-preview.html

[8] https://virtualgeek.typepad.com/virtual_geek/2016/02/dssd-and-emc-breaking-records-creating-categories.html

[9] https://virtualgeek.typepad.com/virtual_geek/2015/05/emc-world-day-3-dssd-tech-preview.html

[10] https://community.cadence.com/cadence_blogs_8/b/breakfast-bytes/posts/the-aws-nitro-project

[11] https://en.wikipedia.org/wiki/Annapurna_Labs

[12] https://www.dolphinics.com/download/WHITEPAPERS/nvme_over_pcie_fabrics_device_lending.pdf

[13] https://demartek.principledtechnologies.com/Demartek_NetApp_Broadcom_NVMe_over_Fibre_Channel_Evaluation_2018-05.html

[14] https://www.datacenterdynamics.com/en/news/ocp-summit-microsoft-speeds-up-cloud-ssds-with-project-denali/

[15] https://www.intel.cn/content/www/cn/zh/products/network-io/smartnic.html

[16] https://www.mellanox.com/related-docs/solutions/SB_Mellanox_NVMe_SNAP.pdf