The goal of computational storage architecture is either to reduce the need to move large datasets around or to alleviate constraints on existing compute or storage resources, as in an edge deployment.
One factor driving the development of computational storage is data, or more precisely, the growing volumes of data that organizations have to contend with. Organizations are turning to data science, analytics, and machine learning to glean insights from all this data, but these workloads are very data-intensive and tend to be either bound by input/output (I/O) speeds or sensitive to latency. It therefore makes more sense to process the data as close as possible to where it is stored, rather than shuffling gigabytes or terabytes into memory and back again.
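To make the data-movement argument concrete, here is a minimal Python sketch. It is not a model of any real product: `SimulatedDrive`, `RECORD_SIZE`, and both query functions are hypothetical names invented for illustration. The sketch only counts how many bytes would cross the bus when a filter runs on the host versus next to the data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

RECORD_SIZE = 64  # bytes per record; an arbitrary value chosen for illustration


@dataclass
class SimulatedDrive:
    """Hypothetical stand-in for a drive holding fixed-size records."""
    records: List[int]

    def read_all(self) -> Tuple[List[int], int]:
        # Conventional path: every record is shipped to the host.
        return list(self.records), len(self.records) * RECORD_SIZE

    def filter_in_place(self, predicate: Callable[[int], bool]) -> Tuple[List[int], int]:
        # Computational storage path: only matching records leave the drive.
        matches = [r for r in self.records if predicate(r)]
        return matches, len(matches) * RECORD_SIZE


def host_side_query(drive: SimulatedDrive, predicate: Callable[[int], bool]):
    data, bytes_moved = drive.read_all()
    return [r for r in data if predicate(r)], bytes_moved


def near_data_query(drive: SimulatedDrive, predicate: Callable[[int], bool]):
    return drive.filter_in_place(predicate)


def is_wanted(record: int) -> bool:
    return record % 100 == 0


drive = SimulatedDrive(records=list(range(1000)))
host_result, host_bytes = host_side_query(drive, is_wanted)
near_result, near_bytes = near_data_query(drive, is_wanted)
print(host_bytes, near_bytes)
```

Both paths return the same ten records, but in this toy setup the host-side query moves all 1,000 records while the near-data query moves only the matches. The real saving depends on how selective the workload is.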
The companies developing computational storage products have taken differing architectural approaches, from integrating processors into drives to building accelerators that plug into a PCIe slot and access existing data stores via NVMe.
The Storage Networking Industry Association (SNIA) formed a Computational Storage Technical Work Group (TWG) to avoid a balkanization of the nascent computational storage ecosystem into mutually incompatible product lines. The group is working to define standards and develop a common programming model that will allow applications to discover and use any computational storage resources attached to a computer system.
SNIA has split the definition of computational storage devices into computational storage processors (CSPs), computational storage drives (CSDs), and computational storage arrays (CSAs). A CSP contains a compute engine but does not contain any storage itself. A CSD (typically a solid-state drive, or SSD) includes both compute and storage. A CSA contains one or more compute engines and storage devices.
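The three categories can be expressed as a small classification rule. The Python sketch below is a deliberate simplification of the description above, not the SNIA specification itself: the `classify` function and its rule of telling a CSD from a CSA by the number of storage devices are assumptions made for illustration.

```python
from enum import Enum


class DeviceClass(Enum):
    CSP = "computational storage processor"
    CSD = "computational storage drive"
    CSA = "computational storage array"


def classify(compute_engines: int, storage_devices: int) -> DeviceClass:
    """Map a device description onto the three categories (simplified rule)."""
    if compute_engines < 1:
        raise ValueError("a computational storage device needs a compute engine")
    if storage_devices < 0:
        raise ValueError("storage_devices cannot be negative")
    if storage_devices == 0:
        return DeviceClass.CSP  # compute only, no storage of its own
    if storage_devices == 1:
        return DeviceClass.CSD  # compute and storage in one drive
    return DeviceClass.CSA      # compute plus several storage devices


print(classify(1, 0).value)
print(classify(1, 1).value)
print(classify(2, 8).value)
```

Under this rule, an accelerator card with no flash of its own comes out as a CSP, while an SSD with an embedded processor comes out as a CSD.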
The SNIA model includes a list of computational storage functions performed by computational storage devices, such as compression and decompression. Some computational storage products have been designed to carry out specific functions, such as video encoding or decoding, while others have been designed to be user-programmable.
NGD Systems is one of the more prominent computational storage suppliers. Its products are CSDs under the SNIA definition, integrating compute processing into an NVMe SSD. This is achieved by using a custom application-specific integrated circuit (ASIC) that incorporates both the SSD controller functions and a quad-core Arm Cortex-A53 CPU block.
There are several advantages to this architecture. The ASIC has direct access to the NAND flash chips in the drive via common flash interface (CFI) channels. These provide high-bandwidth and low-latency access, compared with transferring data into memory for the host CPU to process it.
Thanks to the embedded Arm cores, NGD’s devices can run a version of Ubuntu Linux, which simplifies the development and deployment of applications, or Microsoft’s Azure IoT Edge. The drive itself can also be accessed simply as a standard SSD.
This type of architecture is well suited to edge deployments, where there may only be enough space or power for a single edge server but there are demanding requirements to analyze data in real time, such as a video feed from a security camera. NGD has a solution brief on its website that describes how a MongoDB database can be sharded across multiple CSDs inside a single server instead of across multiple server nodes, reducing the datacentre footprint and the overall cost while delivering lower latency when replicating data.
NGD also cites automotive artificial intelligence (AI), content delivery networks, and hyperscale datacentres as use cases, and offers a fully integrated In-Situ Processing Development System (ISDP) that enables developers and integrators to build and deploy applications.
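The idea of spreading one database across several drives in a single server can be illustrated with generic hash-based partitioning. The Python sketch below does not reproduce MongoDB's own mechanism or anything in NGD's solution brief: `shard_for`, the drive count of four, and the `doc-N` keys are hypothetical, and each drive is modelled as a plain dictionary.

```python
import hashlib

NUM_DRIVES = 4  # assumed number of drives in the server


def shard_for(key: str, num_drives: int) -> int:
    """Pick a drive for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_drives


# Each dictionary stands in for the document store on one drive.
drives = [dict() for _ in range(NUM_DRIVES)]

for i in range(1000):
    key = f"doc-{i}"
    drives[shard_for(key, NUM_DRIVES)][key] = {"n": i}


def lookup(key: str):
    # A read only has to touch the one drive that owns the key.
    return drives[shard_for(key, NUM_DRIVES)].get(key)


print([len(d) for d in drives])
print(lookup("doc-42"))
```

Because the hash is stable, a lookup goes straight to the drive that holds the document, and with compute on each drive a query could in principle be evaluated on every shard in parallel.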
Samsung has a similar CSD product, but its SmartSSD integrates a Xilinx field-programmable gate array (FPGA) and Samsung NVMe SSD controller inside a standard 2.5in (U.2) form factor SSD with a capacity of up to 4TB. The resulting product is marketed by Xilinx.
Xilinx provides a development platform, Vitis, which allows development in C, C++, or OpenCL. It also enables organizations to build accelerated applications via a set of open-source libraries optimized for the Xilinx FPGA in the SmartSSD. There are Vitis libraries for accelerating AI inferencing, data analytics, quantitative finance, and other workloads. Xilinx claims that, using Bigstream’s hyper-acceleration layer, SmartSSD can make Apache Spark analytics 10 times faster.
Meanwhile, the NoLoad products from Eideticom are CSPs in that they contain an accelerator engine but no storage. Instead, they connect with storage and the host CPU via NVMe, allowing compute and storage to be scaled independently. In fact, with support for NVMe-oF, the data could equally be held in external storage arrays.
NoLoad devices use an FPGA as the accelerator and are available as a PCIe card, in a U.2 form factor like a drive enclosure, or in the EDSFF format, which is based on Intel’s Ruler SSD format. NoLoad can support various functions, such as compression, encryption, erasure coding, deduplication, data analytics, and machine learning (ML).
NoLoad devices have already been deployed at the Los Alamos National Laboratory (LANL) as part of a next-generation storage system for high-performance computing (HPC). This has seen NoLoad devices used to offload essential storage tasks in a Lustre/ZFS file system, leading to improved performance and reduced costs for the storage system.
Also targeting storage is Pliops, which uses a PCIe card with an FPGA to accelerate key-value operations used in applications such as databases. The Pliops Storage Processor (PSP) implements an optimized data structure for database-related storage operations, such as indexing, searching, or sorting, and accelerates them without requiring any software changes to the application. It does this by replacing the underlying key-value storage engine, such as InnoDB, the default option for MySQL, with its hardware accelerator. Pliops claims that this implementation can deliver 10 times the number of queries per second while making more efficient use of SSD storage space, delivering immediate business value.
GPUs can do computational storage too.
Perhaps the most extreme example of a computational storage accelerator comes from Nyriad. The firm has developed a software-defined storage platform called Nsulate that uses an Nvidia GPU to accelerate erasure coding functions. It is intended as an alternative to RAID for high-performance scale-out storage deployments requiring a high level of reliability.
Nsulate is claimed to be able to cope with dozens of simultaneous device failures in real time, with no performance degradation, because it can rebuild any missing data faster than the data can be fetched from storage. This means that replacing a failed drive does not need to be a high priority for the IT team. Nyriad also claims that the GPU can simultaneously be used for other workloads such as machine learning.
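The principle behind rebuilding missing data from redundancy can be shown with the simplest possible erasure code, a single XOR parity block. This Python sketch is not Nyriad's algorithm: single parity survives only one lost block, whereas tolerating many simultaneous failures would need a more general code such as Reed-Solomon. All function names here are hypothetical.

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def encode(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: List[bytes], missing_index: int) -> bytes:
    """Recompute one lost block (data or parity) from the survivors."""
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = encode(data)

# Pretend the drive holding block 1 has failed and rebuild its contents.
recovered = rebuild(stripe, missing_index=1)
print(recovered)
```

The rebuild is pure arithmetic over the surviving blocks, which is why this kind of work lends itself to a highly parallel processor such as a GPU.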
Nsulate is currently available as part of pre-built systems from Boston Limited, which offers a Supermicro-based Nsulate storage server.
Computational storage is still at an early stage of development, although some suppliers have been offering deployable products for several years. Organizations evaluating it for their datacentre therefore need to exercise caution, but there are already benefits to be had from using computational storage products in specific applications. They can lead to lower overall power consumption and a need for fewer CPU cores per server node, for example, and in many cases deliver a significant boost in performance.