Embedded computing has passed—more or less unscathed—through many technology shifts and marketing fashions. But the most recent—the rise of edge computing—could mean important new possibilities and challenges.
So what is edge computing (Figure 1)? The cynic might say it is just a grab for market share by giant cloud companies that have in the past struggled in the fragmented embedded market, but now see their chance. That theory goes something like this.
With the concept of the Internet of Things came a rather naïve new notion of embedded architecture: all the embedded system’s sensors and actuators would be connected directly to the Internet—think smart wall switch and smart lightbulb—and all the computing would be done in the cloud. Naturally, this proved wildly impractical for a number of reasons, so the gurus of the IoT retreated to a more tenable position: some computing had to be local, even though the embedded system was still very much connected to the Internet.
Since the local processing would be done at the extreme periphery of the Internet, where IP connectivity ended and private industrial networks or dedicated connections began, the cloud- and network-centric folks called it edge computing. They saw the opportunity to lever their command of the cloud and network resources to redefine embedded computing as a networking application, with edge computing as its natural extension.
A less cynical and more useful view looks at edge computing as one facet of a new partitioning problem that the concurrence of cloud computing, widespread broadband access, and some innovations in LTE cellular networks has created. Today, embedded systems designers must, from requirements definition on through the design process, remember that there are several very different processing sites available to them (Figure 2). There is the cloud. There is the so-called fog. And there is the edge. Partitioning tasks and data among these sites has become a skill necessary to the success of an embedded design project. If you don’t use the new computing resources wisely, you will be vulnerable to a competitor who does, not only in terms of the features, performance, and cost advantages to be gained, but also in consideration of the growing value of data that can be collected from embedded systems in operation.
The Joy of Partitioning
Unfortunately, partitioning is not often a skill embedded-system designers cultivate. Traditional embedded designs employ a single processor, or at worst a multi-core SoC with an obvious division of labor amongst the cores.
But edge computing creates a new scale of difficulty. There are several different kinds of processing sites, each with quite distinct characteristics. And the connections between processors are far more complicated than the nearly transparent inter-task communications of shared-memory multicore systems. So, doing edge computing well requires a rather formal partitioning process. It begins with defining the tasks and identifying their computing, storage, bandwidth, and latency requirements. Then the process continues by characterizing the compute resources you have available, and the links between them. Finally, partitioning must map tasks onto processors and inter-task communications onto links so that the system requirements are met. This is often an iterative process that at best refines the architecture and at worst turns into a protracted, multi-party game of Whack-a-Mole. It is helpful, perhaps, to look at each of these issues: tasks, processing and storage sites, and communications links, in more detail.
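The three steps just described (define the tasks, characterize the sites and links, then map one onto the other) can be sketched as a first-pass greedy placement. This is only an illustration under stated assumptions, not a recommended tool: the Site and Task fields, the response-time model (link round trip plus compute time), and every number below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    name: str
    ops_per_ms: float     # assumed sustained throughput of the site
    round_trip_ms: float  # assumed worst-case link delay, edge to site and back
    capacity_ops: float   # assumed compute budget still unallocated


@dataclass
class Task:
    name: str
    ops: float            # work per activation
    deadline_ms: float    # maximum input-to-response latency


def response_ms(task: Task, site: Site) -> float:
    """Simplified model: link round trip plus execution time at the site."""
    return site.round_trip_ms + task.ops / site.ops_per_ms


def partition(tasks: List[Task], sites: List[Site]) -> Dict[str, Optional[str]]:
    """Place the tightest deadlines first, each on the first site in
    preference order that has capacity left and meets the deadline."""
    remaining = {s.name: s.capacity_ops for s in sites}
    placement: Dict[str, Optional[str]] = {}
    for task in sorted(tasks, key=lambda t: t.deadline_ms):
        placement[task.name] = None  # None means no site fits: revisit the design
        for site in sites:
            fits = remaining[site.name] >= task.ops
            if fits and response_ms(task, site) <= task.deadline_ms:
                remaining[site.name] -= task.ops
                placement[task.name] = site.name
                break
    return placement


# Sites listed in preference order: most plentiful resources first.
sites = [
    Site("cloud", ops_per_ms=1000.0, round_trip_ms=80.0, capacity_ops=1e9),
    Site("fog", ops_per_ms=200.0, round_trip_ms=10.0, capacity_ops=5000.0),
    Site("edge", ops_per_ms=20.0, round_trip_ms=0.0, capacity_ops=400.0),
]
tasks = [
    Task("analytics", ops=500000.0, deadline_ms=60000.0),
    Task("vision", ops=2000.0, deadline_ms=30.0),
    Task("motor_loop", ops=10.0, deadline_ms=1.0),
    Task("obstacle_check", ops=1000.0, deadline_ms=5.0),
]
placement = partition(tasks, sites)
print(placement)
```

With these made-up numbers the motor loop stays at the edge, vision lands in the fog, analytics goes to the cloud, and the obstacle check fits nowhere. That last result is the signal to iterate: add edge resources, relax the requirement, or split the task.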
There are several categories of tasks in a traditional embedded system, and a couple of categories that have recently become important for many designs. Each category has its own characteristic needs in computing, storage, I/O bandwidth, and task latency.
In any embedded design there are supervisory and housekeeping tasks that are necessary, but are not particularly compute- or I/O- intensive, and that have no hard deadlines. This category includes most operating-system services, user interfaces, utilities, system maintenance and update, and data logging.
A second category of tasks with very different characteristics is present in most embedded designs. These tasks directly influence the physical behavior of the system, and they do have hard real-time deadlines, often because they are implementing algorithms within feedback control loops responsible for motion control or dynamic process control. Or they may be signal-processing or signal interpretation tasks that lie on a critical path to a system response, such as object recognition routines behind a camera input.
Often these tasks don’t have complex I/O needs: just a stream or two of data in and one or two out. But today these data rates can be extremely high, as in the case of multiple HD cameras on a robot or digitized radar signals coming off a target-acquisition and tracking radar. Algorithm complexity has traditionally been low, held down by the history of budget-constrained embedded designs in which a microcontroller had to implement the digital transfer function in a control loop. But as control systems adopt more modern techniques, including stochastic state estimation, model-based control, and, recently, insertion of artificial intelligence into control loops, in some designs the complexity of algorithms inside time-critical loops has exploded. As we will see, this explosion scatters shrapnel over a wide area.
The most important issue for all these time-critical tasks is that the overall delay from sensor or control input to actuator response be below a set maximum latency, and often that it lies within a narrow jitter window. That makes partitioning of these tasks particularly interesting, because it forces designers to consider both execution time—fully laden with indeterminacies, memory access and storage access delays—and communications latencies together. The fastest place to execute a complex algorithm may be unacceptably far from the system.
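A minimal sketch of that combined check follows, assuming each contribution can be bounded by a best-case and worst-case figure. The function name, the two-term budget, and the example numbers are illustrative assumptions, not measurements.

```python
from typing import Tuple

Range = Tuple[float, float]  # (best_case_ms, worst_case_ms)


def meets_deadline(exec_ms: Range, link_ms: Range,
                   max_latency_ms: float, max_jitter_ms: float) -> bool:
    """Check worst-case latency and the best-to-worst spread (jitter).

    exec_ms covers execution including memory and storage delays;
    link_ms covers the full round trip between the system and the site.
    """
    best = exec_ms[0] + link_ms[0]
    worst = exec_ms[1] + link_ms[1]
    return worst <= max_latency_ms and (worst - best) <= max_jitter_ms


# Hypothetical loop requirement: respond within 10 ms, jitter under 2 ms.
local = meets_deadline(exec_ms=(4.0, 5.0), link_ms=(0.0, 0.1),
                       max_latency_ms=10.0, max_jitter_ms=2.0)
remote = meets_deadline(exec_ms=(0.5, 0.6), link_ms=(20.0, 90.0),
                        max_latency_ms=10.0, max_jitter_ms=2.0)
print(local, remote)
```

In this example the remote accelerator executes the algorithm roughly eight times faster, yet only the slower local processor satisfies the budget, because the link dominates both the latency and the jitter.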
We also need to recognize a third category of tasks. These have appeared fairly recently for many designers, and differ from both supervisory and real-time tasks. They arise from the intrusion of three new areas of concern: machine learning, functional safety, and cyber security. The distinguishing characteristic of these tasks is that, while each can be performed in miniature with very modest demands on the system, each can quickly develop an enormous appetite for computing and memory resources. And, most unfortunately, each can end up inside delay-sensitive control loops, posing very tricky challenges for the design team.
Machine learning is a good case in point. Relatively simple deep-learning programs are already being used as supervisory tasks to, for instance, examine sensor data to detect progressive wear on machinery or signs of impending failure. Such tasks normally run in the cloud without any real-time constraints, which is just as well, as they do best with access to huge volumes of data. At the other extreme, trained networks can be ported to quite compact blocks of code, especially with the use of small hardware accelerators, making it possible to use a neural network inside a smart phone. But a deep-learning inference engine trained to detect, say, excessive vibration in a cutting tool during a cut or the intrusion of an unidentified object into a robot’s planned trajectory (either of which could require immediate intervention) could end up being both computationally intensive and on a time-critical path.
Similarly for functional safety and system security, simple rule-based safety checks or authentication/encryption tasks may present few problems for the system design. But simple often, in these areas, means weak. Systems that must operate in an unfamiliar environment or must actively repel novel intrusion attempts may require very complex algorithms, including machine learning, with very fast response times. Intrusion detection, for instance, is much less valuable as a forensic tool than as a prevention.
Traditionally, the computing and storage resources available to an embedded system designer were easy to list. There were microcontroller chips, single-board computers based on commercial microprocessors, and in some cases boards or boxes using digital signal processing hardware of one sort or another. Any of these could have external memory, and most could attach, with the aid of an operating system, mass storage ranging from a thumb drive to a RAID disk array. And these resources were all in one place: they were physically part of the system, directly connected to sensors, actuators, and maybe to an industrial network.
But add Internet connectivity, and this simple picture snaps out of focus. The original system is now just the network edge. And in addition to edge computing, there are two new locations where there may be important computing resources: the cloud, and what Cisco and some others are calling the fog.
The edge remains much as it has been, except of course that everything is growing in power. In the shadow of the massive market for smart-phone SoCs, microcontrollers have morphed into low-cost SoCs too, often with multiple 32-bit CPU cores, extensive caches, and dedicated functional IP suited to a particular range of applications. Board-level computers have exploited the monotonically growing power of personal computer CPU chips and the growth in solid-state storage. And the commoditization of servers for the world’s data centers has put even racks of data-center-class servers within the reach of well-funded edge computing sites, if the sites can provide the necessary space, power, and cooling.
Recently, with the advent of more demanding algorithms, hardware accelerators have become important options for edge computing as well. FPGAs have long been used to accelerate signal-processing and numerically intensive transfer functions. Today, with effective high-level design tools they have broadened their use beyond these applications into just about anything that can benefit from massively parallel or, more importantly, deeply pipelined execution. GPUs have applications in massively data-parallel tasks such as vision processing and neural network training. And as soon as an algorithm becomes stable and widely used enough to have good library support—machine vision, location and mapping, security, and deep learning are examples—someone will start work on an ASIC to accelerate it.
The cloud, of course, is a profoundly different environment: a world of essentially infinite numbers of big x86 servers and storage resources. Recently, hardware accelerators from all three races—FPGAs, GPUs, and ASICs—have begun appearing in the cloud as well. All these resources are available for the embedded system end-user to rent on an as-used basis.
The important questions in the cloud are not about how many resources are available—there are more than you need—but about terms and conditions. Will your workload run continuously, and if not, what is the activation latency? What guarantees of performance and availability are there? What will this cost the end user? And what happens if the cloud platform provider—who in specialized application areas is often not a giant data-center owner, but a small company that itself leases or rents the cloud resources—suffers a change in situation? These sorts of questions are generally not familiar to embedded-system developers, nor to their customers.
Recently there has been discussion of yet another possible processing site: the so-called fog. The fog is located somewhere between the edge and the cloud, both physically and in terms of its characteristics.
As network operators and wireless service providers turn from old dedicated switching hardware to software on servers, increasingly, Internet connections from the edge will run not through racks of networking hardware, but through data centers. For edge systems relying on cloud computing, this raises an important question: why send your inter-task communications through one data center just to get them to another one? It may be that the networking data center can provide all the resources your task needs without having to go all the way to a cloud service provider (CSP). Or it may be that a service provider can offer hardware or software packages to allow some processing in your edge-computing system, or in an aggregation node near your system, before having to make the jump to a central facility. At the very least you would have one less vendor to deal with. And you might also have less latency and uncertainty introduced by Internet connections. Thus, you can think of fog computing as a cloud computing service spread across the network and into the edge, with all the advantages and questions we have just discussed.
When all embedded computing is local, inter-task communications can almost be neglected. There are situations where multiple tasks share a critical resource, like a message-passing utility in an operating system, and on extremely critical timing paths you must be aware of the uncertainty in the delay in getting a message between tasks. But for most situations, how long it takes to trigger a task and get data to it is a secondary concern. Most designs confine real-time tasks to a subset of the system where they have a nearly deterministic environment, and focus their timing analyses there.
But when you partition a system between edge, fog, and cloud resources, the kinds of connections between those three environments, their delay characteristics, and their reliability all become important system issues. They may limit where you can place particular tasks. And they may require—by imposing timing uncertainty and the possibility of non-delivery on inter-task messages—the use of more complex control algorithms that can tolerate such surprises.
So what are the connections? We have to look at two different situations: when the edge hardware is connected to an internet service provider (ISP) through copper or fiber-optics (or a blend of the two), and when the connection is wireless (Figure 3).
The two situations have one thing in common. Unless your system will have a dedicated leased virtual channel to a cloud or fog service provider, part of the connection will be over the public Internet. That part could be from your ISP’s switch plant to the CSP’s data center, or it could be from a wireless operator’s central office to the CSP’s data center.
That Internet connection has two unfortunate characteristics, from this point of view. First, it is a packet-switching network in which different packets may take very different routes, with very different latencies. So, it is impossible to predict more than statistically what the transmission delay between two points will be. Second, Internet Protocol by itself offers only best-effort, not guaranteed, delivery. So, a system that relies on cloud tasks must tolerate some packets simply vanishing.
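Because the delay can only be described statistically, one practical starting point is to profile the link from measured round trips. The sketch below is an assumption-laden illustration: the function name is hypothetical, the sample values are invented, and a nearest-rank percentile is just one simple choice of statistic.

```python
import math
from typing import List, Optional, Tuple


def link_profile(samples_ms: List[Optional[float]],
                 percentile: float = 99.0) -> Tuple[Optional[float], float]:
    """Return (percentile round-trip time in ms, loss rate).

    samples_ms holds measured round-trip times; None marks a packet
    that never arrived. The percentile uses the nearest-rank method
    over delivered packets only.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    delivered = sorted(s for s in samples_ms if s is not None)
    loss_rate = 1.0 - len(delivered) / len(samples_ms)
    if not delivered:
        return None, loss_rate
    rank = max(1, math.ceil(percentile / 100.0 * len(delivered)))
    return delivered[rank - 1], loss_rate


# Invented measurements: mostly about 20 ms, one slow route, two lost packets.
samples = [20.0, 22.0, 25.0, None, 21.0, 180.0, 23.0, 24.0, None, 26.0]
typical_ms, loss = link_profile(samples, percentile=50.0)
tail_ms, _ = link_profile(samples, percentile=99.0)
print(typical_ms, tail_ms, round(loss, 2))
```

The gap between the typical and tail figures, together with the loss rate, is what a task placed across the Internet has to be designed to survive.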
An additional point worth considering is that so-called data locality laws—which limit or prohibit transmission of data outside the country of origin—are spreading around the world. Inside the European Union, for instance, it is currently illegal to transmit data containing personal information across the borders of a number of member countries, even to other EU members. And in China, which uses locality rules for both privacy and industrial policy purposes, it is illegal to transmit virtually any sort of data to any destination outside the country. So, designers must ask whether their edge system will be able to exchange data with the cloud legally, given the rapidly evolving country-by-country legislation.
Avoiding these limitations is one of the potential advantages of the fog computing concept. By not traversing the public network, systems relying on ISP or wireless-carrier computing resources or local edge resources can exploit additional provisions to reduce the uncertainty in connection delays.
But messages still have to get from your edge system to the service provider’s aggregation hardware or data center. For ISPs, that will mean a physical connection, typically using Internet Protocol over fiber or hybrid copper/fiber connections, often arranged in a tree structure. Such connections allow for provisioning of fog computing nodes at points where branches intersect. But as any cable TV viewer can attest, they also allow for congestion at nodes or on branches to create great uncertainties in available bandwidth and latency. Suspension of net neutrality in the US has added a further uncertainty, allowing carriers to offer different levels of service to traffic from different sources, and to charge for quality-of-service guarantees.
If the connection is wireless, as we are assured many will be once 5G is deployed, the uncertainties multiply. A 5G link will connect your edge system through multiple parallel RF channels and multiple antennas to one or more base stations. The base stations may be anything from a small cell with minimal hardware to a large local processing site with, again, the ability to offer fog-computing resources, to a remote radio transceiver that relies on a central data center for all its processing. In at least the first two cases, there will be a separate backhaul network, usually either fiber or microwave, connecting the base station to the service provider’s central data center.
The challenges include, first, that latency will depend on what kind of base stations you are working with, something often completely beyond your control. Second, changes in RF transmission characteristics along the mostly line-of-sight paths can be caused by obstacles, multipath shifts, vegetation, and even weather. If the channel deteriorates, retry rates will go up, and at some point the base station and your edge system will negotiate a new data rate, or roll the connection over to a different base station. So even for a fixed client system, the characteristics of the connection may change significantly over time, sometimes quite rapidly.
Connectivity opens a new world for the embedded-system designer, offering amounts of computing power and storage inconceivable in local platforms. But it creates a partitioning problem: an iterative process of locating tasks where they have the resources they need, but with the latencies, predictability, and reliability they require.
For many tasks, location is obvious. Big-data analyses that comb terabytes of data to predict maintenance needs or extract valuable conclusions about the user can go in the cloud. So can compute-intensive real-time tasks when acceptable latency is long and the occasional lost message is survivable or handled in a higher-level networking protocol. A smart speaker in your kitchen can always reply “Let me think on that a moment,” or “Sorry, what?”
Critical, high-frequency control loops must stay at or very near the edge. Conventional control algorithms can’t tolerate the delay and uncertainty of any other choice.
But what if there is a conflict: a task too big for the edge resources, but too time-sensitive to be located across the Internet? Fog computing may solve some of these dilemmas. Others may require you to place more resources in your system.
Just how far today’s technology has enriched the choices was illustrated recently by a series of Microsoft announcements. Primarily involved in edge computing as a CSP, Microsoft has for some time offered the Azure Stack (essentially, an instance of its Azure cloud platform) to run on servers on the customer premises. Just recently, the company enriched this offering with two new options: FPGA acceleration, including Microsoft’s Project Brainwave machine-learning acceleration, for Azure Stack installations, and Azure Sphere, a way of encapsulating Azure’s security provisions in an approved microcontroller, secure operating system, and coordinated cloud service for use at the edge. Similarly, Intel recently announced the OpenVINO™ toolkit, a platform for implementing vision-processing and machine intelligence algorithms at the edge, relying on CPUs with optional support from FPGAs or vision-processing ASICs. Such fog-oriented provisions could allow embedded-system designers to simply incorporate cloud-oriented tasks into hardware within the confines of their own systems, eliminating the communications considerations and making ideas like deep-learning networks within control loops far more feasible.
In other cases, designers may simply have to refactor critical tasks into time-critical and time-tolerant portions. Or they may have to replace tried and true control algorithms with far more complex approaches that can tolerate the delay and uncertainty of communications links. For example, a complex model-based control algorithm could be moved to the cloud, and used to monitor and adjust a much simpler control loop that is running locally at the edge.
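One way to picture that split is a simple local loop whose gain is tuned by a remote supervisor, with a safe fallback when updates stop arriving. The sketch below is a hypothetical illustration: the class, the proportional-only control law, the staleness rule, and the numbers are all assumptions chosen for brevity, not a recommended controller.

```python
from typing import Optional


class LocalLoop:
    """Time-critical part: a proportional controller running at the edge."""

    def __init__(self, default_gain: float, stale_after_ms: float) -> None:
        self.default_gain = default_gain      # conservative gain, always safe
        self.stale_after_ms = stale_after_ms  # how long a cloud update is trusted
        self.gain = default_gain
        self.last_update_ms: Optional[float] = None

    def on_cloud_update(self, gain: float, now_ms: float) -> None:
        """Time-tolerant part: called whenever a supervisor message arrives."""
        self.gain = gain
        self.last_update_ms = now_ms

    def step(self, setpoint: float, measurement: float, now_ms: float) -> float:
        """Runs every control period, whether or not the link is healthy."""
        stale = (self.last_update_ms is None
                 or now_ms - self.last_update_ms > self.stale_after_ms)
        if stale:
            self.gain = self.default_gain  # link late or lost: fall back
        return self.gain * (setpoint - measurement)


loop = LocalLoop(default_gain=0.5, stale_after_ms=1000.0)
before = loop.step(setpoint=10.0, measurement=8.0, now_ms=0.0)
loop.on_cloud_update(gain=2.0, now_ms=100.0)
tuned = loop.step(setpoint=10.0, measurement=8.0, now_ms=200.0)
after_outage = loop.step(setpoint=10.0, measurement=8.0, now_ms=2000.0)
print(before, tuned, after_outage)
```

The design choice is that a late or missing message degrades performance rather than stability: the loop never waits on the network, it only accepts advice from it.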
Life at the edge, then, is full of opportunities and complexities. It offers a range of computing and storage resources, and hence of algorithms, never before available to most embedded systems. But it demands a new level of analysis and partitioning, and it beckons the system designer into realms of advanced system control that go far beyond traditional PID control loops. Competitive pressures will force many embedded systems into this new territory, so it is best to get ahead of the curve.