An air mass that begins life as a single huge cloud may separate, under the complex interactions of wind, temperature, pressure, and humidity, into several distinct strata with different characteristics. Similarly, today’s seemingly uniform cloud data centers are being transformed, under the emerging pressures of big-data computing and the Internet of Things (IoT), into multiple, distinct layers of computing, networking, and storage, reaching from the heart of the data center all the way to the myriad sensors and actuators strewn through the real world. Papers at April’s Open Server Summit in Santa Clara, California, offered a unique cross-section view into this developing cloud stack (Figure 1).
An unusual feature of this stratification is that it appears to be driven not by the need for concentrated computing power—the usual force behind architectural change—but by constraints on the movement of data. Increasingly, bandwidth and latency, rather than billions of floating-point operations per second (GFLOPS), are determining the gross structure of the cloud.
Storage Realigns Itself
Some of the most dramatic changes are happening deep in the data center, in the realm of memory and storage. What began as a simple hierarchy of archival tape, massive arrays of disks, and bank upon bank of local DRAM has shattered into a fine mist of different device types, locations, and interconnect technologies. Now that confused mist is recondensing into a few distinct layers.
The driving need is to get as much data as possible as close to the server CPUs as possible. This trend, traceable to the early importance of Memcached, has received a strong push from software engines such as Spark and Redis, which attempt to make an entire data set memory-resident rather than disk-resident. Under this pressure, legacy configurations of centralized RAID arrays, connected into server racks via Ethernet, gave way to small, fast drives on the server cards, linked by SATA into the CPU and DRAM complex.
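The idea behind keeping data memory-resident can be seen in the cache-aside pattern that Memcached popularized. The sketch below is a single-process simplification (Memcached and Redis provide this tier as a shared network service); `load_from_storage` and its latency are made-up stand-ins for a disk or RAID read.

```python
import time

# Hypothetical backing store: simulate a slow disk/RAID read.
def load_from_storage(key):
    time.sleep(0.01)  # stand-in for disk and network latency
    return f"value-for-{key}"

cache = {}  # the in-memory tier; Memcached/Redis offer this across a cluster

def get(key):
    # Cache-aside: serve from DRAM when possible, fall back to storage,
    # and populate the cache so later reads avoid the slow path.
    if key not in cache:
        cache[key] = load_from_storage(key)
    return cache[key]

get("user:42")   # first access pays the storage latency
get("user:42")   # subsequent accesses are served from memory
```

Engines like Spark go a step further, trying to hold the whole working set in this fast tier rather than caching pieces of it on demand.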
These drives, in turn, are giving way to massive fast solid-state memory on the server card. Today that means providing arrays of high-density NAND flash, connected to the CPU clusters via NVMe protocol over a PCI Express® (PCIe®) bus, or even residing in DIMMs on the server card’s DRAM bus.
Tomorrow, according to Micron Technology vice president of advanced storage Rob Peglar in his Summit keynote, the best choice will be Micron’s 3D XPoint memory. With non-volatility, a thousand times the speed of NAND flash, and far greater density than DRAM, 3D XPoint fits naturally in between big, shared solid-state drives (SSDs) in the rack and DRAM on the blade, Peglar said (Figure 2).
Not only does local 3D XPoint help systems like Spark by, in effect, expanding the local memory on the blade into one massive DRAM-like pool, but it has broader implications as well. Peglar pointed out that massive non-volatile memory in the CPUs’ address space could be a huge help in servicing the sorts of latency-critical applications that device virtualization and the IoT are creating. And once that non-volatile memory is in place, it changes the ground rules for operating-system kernels as well. Now structures such as file-system metadata, key-value stores, and application data can be persistent, rather than reloaded with each new invocation of a process.
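The persistence idea can be illustrated with memory-mapped I/O, the same programming model (load/store into a mapped region) that byte-addressable non-volatile memory exposes. In this sketch a plain file stands in for a persistent-memory region, and the file name is a made-up example; with true NVM, the `flush` step would correspond to flushing CPU caches to the persistence domain rather than writing blocks.

```python
import mmap, os

PATH = "kvstore.bin"   # a plain file stands in for a persistent-memory region
SIZE = 4096

# First "process invocation": create the region and store a value in place.
fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)
m = mmap.mmap(fd, SIZE)
m[0:5] = b"hello"   # byte-addressable update, no block-I/O path
m.flush()           # analogous to flushing caches to the persistence domain
m.close(); os.close(fd)

# Second invocation: the data is simply there -- no reload step.
fd = os.open(PATH, os.O_RDWR)
m = mmap.mmap(fd, SIZE)
recovered = bytes(m[0:5])
m.close(); os.close(fd); os.remove(PATH)
```

This is why persistent memory changes the kernel’s ground rules: metadata and key-value state can outlive the process that created them without a serialize-to-disk round trip.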
The sinew holding together the server blades, disk arrays, and bulk storage in the data center is Ethernet. Often today this means legacy 10 Gbps Ethernet over copper backplane—although speeds between the server cards and the top-of-rack (ToR) switches are moving to 40 Gbps. The ToR switches are then linked, usually via fiber, to centralized Ethernet switches that define a broader network spanning the data center. But two speakers from Microsoft at the Summit described how this traditional arrangement is disaggregating and evolving.
To begin with, the network is separating into two distinct strata—a control plane and a data plane—with very different implementations. This control/data distinction has nearly always been there within the network switches. But now, the switches and network interface cards (NICs) are opening up, as it were, allowing the control-plane functions to migrate into the general-purpose computing hardware—the server CPUs and their hardware accelerators—while the data plane becomes a pure transport medium.
“Networking is becoming a server problem,” said Microsoft general manager of Azure server hardware engineering Kushagra Vaid. “We are moving toward a policy-based network control plane in the servers, and a data plane that is just pipe. This gives us a flat space for routable RDMA, with network appliances and security implemented on the servers.”
There is a problem with the vision, however. As you ramp up speed in the pipe, the server CPUs have trouble keeping up, even when they are not handling data at full wire speed. So Microsoft has added a layer of computing—hardware acceleration—at the point where Ethernet attaches to the server card: the NIC. This configurable processor—Microsoft calls it a Smart NIC—has access to the Ethernet packet streams entering or leaving the card, allowing it to cooperate with the server CPUs on stream-oriented tasks. “For example, we can do wire-speed, end-to-end encryption at 40 Gbps,” Vaid illustrated.
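Conceptually, the Smart NIC is a “bump in the wire”: every packet passes through a NIC-resident transform before the host CPU ever sees it. The sketch below models that data path; the XOR transform is only a dependency-free stand-in for a real wire-speed cipher such as an FPGA AES pipeline, and all the names are made up.

```python
# Stand-in for a NIC-resident transform (e.g., an FPGA encryption pipeline).
def nic_transform(packet: bytes, key: int = 0x5A) -> bytes:
    return bytes(b ^ key for b in packet)

def smart_nic(stream):
    # Every packet is processed inline, spending no host-CPU cycles.
    for pkt in stream:
        yield nic_transform(pkt)

inbound = [b"payload-1", b"payload-2"]
host_receives = list(smart_nic(inbound))

# XOR is its own inverse, so applying the transform again recovers plaintext.
assert nic_transform(host_receives[0]) == b"payload-1"
```

The point of the architecture is in the generator: the host consumes an already-transformed stream, so stream-oriented work like encryption scales with link speed rather than with CPU capacity.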
Once this sort of processing power is available, it tends to attract new code. “It is important that the Smart NIC is configurable,” Vaid said. “We update its FPGA about every week.” But Vaid sees this as just the beginning of a trend. “Many emerging workloads, such as machine learning, network functions virtualization, or storage processing, are very parallel, and map poorly to CPU instruction sets. But they can be done in the I/O complex with minimal CPU interaction. We could see disaggregation of the server and I/O, with legacy code remaining on the CPUs and new applications executing in the I/O. There is experimentation with all kinds of parallel architectures going on today,” he observed.
As the network adds processing power, the nature of the pipe may be changing as well. Microsoft is moving the server card’s network connection from 40 to 50 Gbps this year, said that company’s principal Azure network architect Brad Booth. From there, the company hopes to reach 100 Gbps in 2018. But that will require a fundamental change.
Today the transceivers on the blade are driving a couple of meters of copper backplane to reach the ToR switch. “But 100 G PAM4 can’t drive a long piece of copper,” Booth warned. The network inside the rack will have to move to optical fiber. And once you make that move, the range goes from 3 m to 20 m or more. You can eliminate the ToR switch altogether and take fiber directly from the server card to centralized data-center switches, eliminating an entire layer in the network hierarchy.
The End of the Wire
The data center, then, is set for profound changes. A new class of server memory, in-network computing, and a new network topology could all appear in the next few years, each innovation moving us incrementally closer to a single, flat address space where tasks come to data, not data to tasks.
But another speaker at the Open Server Summit, HP Enterprise vice president and general manager Tom Bradicich, argued that the IoT would bring profound change outside the data center’s concrete-slab walls—at the very edge of the Internet.
A simple view of the IoT has an enterprise’s constellation of Things pouring data into a cloud, from which big-data analyses periodically hand down guiding or controlling edicts. “At last, Operations directly connects to IT,” Bradicich enthused. And then he explained why it’s not going to be like that at all.
To begin with, he cited the volume of data involved. “To take an extreme example of an embedded system, the Super Collider at CERN generates 40 terabytes of data per second,” Bradicich related. “One ordinary car, connected, can generate 500 megabytes (MB) per minute.” All that data is not likely to be funneling back into the data center, even over some future 5G wireless network. And then there is the matter of security. “Nobody in their right mind would put the nation’s power grid on the Internet,” he observed, perhaps rather optimistically.
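A back-of-envelope calculation shows why even the modest per-car figure adds up. The per-car rate below is the number quoted in the talk; the fleet size and link capacity are made-up assumptions for illustration.

```python
# Per-car rate from the talk; fleet and link figures are assumptions.
car_rate_mb_per_min = 500          # one connected car, MB per minute
fleet = 10_000                     # hypothetical regional fleet
link_gbps = 100                    # one generously provisioned backhaul link

# Aggregate rate: MB/min -> megabits/s -> gigabits/s.
aggregate_gbps = fleet * car_rate_mb_per_min * 8 / 60 / 1000

print(round(aggregate_gbps))       # roughly 667 Gb/s of raw telemetry
```

Even this small fleet would saturate several 100 Gbps links with raw data, before a single other tenant touches the network. Filtering and reducing that stream locally, at the edge, is the only arithmetic that works.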
In any case, massive bandwidth requirements and stringent security concerns both argue for a stratum of local processing at the network edge. So does latency. Some systems will have control loops or functional-safety overrides that depend both on analysis of the data they are collecting and on having the answers within a known latency. Those systems will require some degree of local computing to achieve latency and jitter requirements that simply could not be met across the best-effort Internet.
Further, that local processing resource needs to be tailored to the application, Bradicich argued. He pointed out that the environment at the edge of the IoT—with its flurries of sensor data, interrupts, and real-time deadlines—is completely unlike the data-center environment. So why try to meet edge computing requirements by deploying data-center server racks?
Bradicich offered examples. In one, Airbus Industries is deploying smart glasses to assembly workers. When a user prepares to install a fastener, the glasses identify the location of the hole, and preset the correct driver to the specified torque for that specific fastener and location. When the insertion is complete, the glasses log the data they have captured during the task, creating a complete assembly log of the airframe. In effect this is an augmented-reality application. In order not to slow down the worker, many of these operations must meet real-time deadlines, and so must be local. You don’t want a technician picking up the wrong driver or setting the wrong torque because she couldn’t wait for an unresponsive system.
Another example came from the Virgin Racing Formula E team, which is competing in the global electric formula-car championship series. The races are on a tight schedule, with practice laps and qualifying laps separated only by a few hours. The challenge is to capture telemetry, video, and audio from the cars during practice laps, perform big-data analyses, and tune the cars for optimal motor/chassis/tire settings and battery management.
The team uses two HP Enterprise Moonshot server racks: one in the labs back at headquarters, and one in the pit area at the track. Both are configured specifically for their task loads. Virgin Racing has said that their original intent was to use a Moonshot in the lab as a private cloud, and do all the processing there. But with a global championship series, they quickly discovered that in some race locations there would not be enough Internet bandwidth to meet the computing deadlines. So they evolved the edge-cloud approach.
The original, simple image of a computing cloud, the Internet, and a world full of Things is resolving into a far more complex picture. Within the data center, storage, computing, and networking are separating into multiple, sometimes overlapping layers. Outside, a new layer of real-time computing is condensing at the network edge. All these new formations are responses to the contending forces of application data flows and real-world bandwidth and latency limitations. The future will be interesting.