Many embedded designs absolutely have to work right. A malfunction could do unacceptable harm to persons or property. Until recently, this requirement has been addressed through careful design and hardware reliability: if the software and the logic are right and there are no hardware failures, the system will work.
But today we live in the age of undeclared cyber warfare. If your system must work, you must assume that everyone from bored hackers to criminal gangs to lavishly funded government laboratories will attack it. In order to defend your system, you must determine what—and, eventually, whom—you can trust. This is not an easy, or, some argue, even an achievable quest. But undertake it you must.
Defining a Hierarchy
By itself, the question “Can I trust my system?” is all but imponderable. To get a grip on it, we need to partition that huge, fundamental question into a number of smaller, merely difficult questions. This is commonly done by breaking the system apart at well-defined boundaries: most often, the pieces we use are application software; operating systems, boot code, and firmware; and hardware. In general, if we can define the interfaces between these levels (thereby defining what we are trusting each level to do), and if we can trust each level of the design individually, we can trust the system (Figure 1).
Trust can only be inclusive. If your object software is perfect, you must still trust the operating system (OS) to respond correctly to its application-program interfaces and to not corrupt your code. To do that, you must trust that there is no pernicious code in the boot loader that could have corrupted the OS, and no Trojan horse in the hardware that could have taken control of the system. Some architects describe this recursive questioning as finding the root of trust. But perhaps this metaphor is too optimistic. In a panel on secure systems at June’s Design Automation Conference (DAC), Intel senior principal engineer Vincent Zimmer summoned the apocryphal story of the speaker who claimed the earth was a disk on the back of a turtle. Asked what the turtle stood on, the speaker replied “It’s turtles; all the way down.”
Eventually—if indeed the recursion ends somewhere—you end up trusting not a design nor a methodology, but a specific group of engineers—human beings. At least so some experts argue.
Sources of Trust
All of this raises an obvious question: how do you come to trust a specific layer in your design? The term “root of trust” suggests that you can just burrow down to some ultimate trusted layer and then you are safe. But it is not quite like that.
In general, there are only a few ways to establish trust in a layer of the design, be it hardware or software.
And, of course, once you trust your layer of the design, you must proceed to establish trust in the lower layers from which it is physically possible to alter the behavior of your design—all the way down. It is useful to examine this process in more detail, layer by layer.
Let’s start with application software, because your attackers probably will start there too. “It is always 1999 somewhere,” offered Dino Dai Zovi, mobile security lead at Square, speaking at the DAC panel. “Hackers don’t have to penetrate your hardware. It is still too easy to penetrate your software.” In a subsequent panel on automotive security, Craig Smith, reverse-engineering specialist at Theia Labs, underlined the point. “You can’t make systems unhackable,” he said. “But today you make it too easy. Buffer-overflow attacks work just fine in embedded systems. After all this time they shouldn’t be working at all.”
Several factors conspire to keep application code so vulnerable. One is simple indifference. Managers are still far more interested in schedules and coder productivity than in security. Jeff Massimilla, chief product cybersecurity officer at General Motors, warned the automotive panel: “If security is not a boardroom discussion, it won’t get the resources it needs.”
Also, accessibility is an issue. Many embedded systems run continuously, or are not accessible for remote updates once they are put into the field. You can’t take a transformer yard or a generating station off-line for software updates. Nor is it prudent to update code in moving cars: “a car can’t be off-line,” Massimilla observed. So when you discover a vulnerability in a deployed system, you may not be able to do much about it.
Another issue is scale, Massimilla said. “We have an average of 30 to 40 electronic control units (ECUs) to secure in a new car.” Jeffrey Owens, CTO and executive VP at Delphi, agreed. “A car today can have 50 ECUs and 100 million lines of code. That is four times the complexity of the F-35 fighter.” The size of such code sets makes use of formal verification tools impractical above the block level, and full redundancy out of the question. An auto manufacturer is not going to add ECUs just for security. So all of the ways to achieve trust are at best difficult.
But size also means that much of the code will be reused. And that opens the possibility of establishing trust one block at a time. One automotive safety standard, ISO 26262, offers a path to certifying trustability of code assembled from blocks of trusted IP.
ISO 26262 specifies that you can build compliant systems by putting together components that meet certain rigid criteria. For example, the components can themselves be developed to 26262 standards, which include requirements such as formal reviews, strict traceability, and trusted tools. Or components that were developed before 26262 existed may be accepted because they have been extensively deployed in the field without failures. All such components are called Safety Elements out of Context (SEooCs, if you will) because they have been certified as safe independent of the particular system context in which they were designed.
Because they are out of context, each SEooC must include a formal safety manual that lists requirements and assumptions for safe use of the IP. The manual is mandatory for anyone reusing an SEooC in a 26262 environment. Its dictates should find their way into another document, the Development Interface Agreement (DIA)—a multiparty contract that governs a 26262-compliant design in which multiple parties participate, defining who has what roles and responsibilities in the process (Figure 2).
ISO 26262 is by heritage an automotive standard—more specific perhaps than industrial standards like IEC 61508, from which it derives, but far more rigorous than the casual approach of most embedded design methodologies. But as more embedded designers face the real cost of attacks, 26262’s influence is spreading into other application areas.
Such rigor is nearly the only way to establish trust in large bodies of application software. The situation for operating systems and firmware is, if anything, more complex.
The OS Question
Operating systems and firmware share a lot of issues with application code. But there are some significant differences. First is that many designs license an entire operating system—or an entire development platform—rather than writing their own. It is difficult to find anything beyond the level of a small real-time kernel that is even close to the quality of an SEooC. Second is the fact that both operating software and firmware are often updated in the field. Think about how often you get updates to Windows. Third, system developers usually treat OSs as black boxes, with no idea what code is really under the hood, how it works, or what risks it might contain.
One major trend in system software actively undermines security. “Open source is not a reliability method,” warned Marilyn Wolf, holder of the distinguished chair in embedded computing at Georgia Tech, speaking to an audience at DAC. In an unrelated webinar on ISO 26262 SEooCs, Robert Bates, chief safety officer of the Mentor Graphics embedded division, stated “open-source software today has no path to becoming an SEooC. Basically, forget Linux. There is an effort in Europe to get a Linux version to Automotive Safety Integrity Level (ASIL) B [the second-lowest level of 26262 rating] but that will take two or three years.”
Experts suggest negotiating with the customer either to reduce the ASIL of the system or to isolate the OS so that it can physically only affect non-critical functions. For instance, it might be acceptable to use Linux in an information-only user interface, so long as it cannot influence control loops. Otherwise the alternatives would be to use a field-proven kernel, to build an OS from SEooCs or similarly-secure components, to write a custom kernel to 26262 standards, or to run the application code on bare-metal hardware. In multicore systems it might be feasible to run different operating systems on two or three cores and use cross-checking or voting, but you would still have to examine cases in which two of the OSs had been attacked.
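The cross-checking idea can be illustrated with a simple majority voter. This is only a sketch under stated assumptions: the function name vote and the choice to return None when there is no majority are hypothetical, and nothing here is prescribed by ISO 26262.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of channels, else None.

    Each entry in `outputs` is the result computed by one core/OS pair
    (an illustrative assumption). None means "no agreement", which the
    caller should treat as a reason to enter a safe state.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require a strict majority, not just the most frequent value.
    return value if count * 2 > len(outputs) else None
```

With three channels, vote([42, 42, 17]) masks the single faulty channel, and vote([1, 2, 3]) reports no agreement. But vote([99, 99, 42]) returns 99: if two channels have been compromised in the same way, they outvote the honest one, which is exactly the case the designer still has to examine.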
Updating presents a new set of issues. Updates would need to be at SEooC quality: you have to be sure that nothing in the update will interact with your system to create a vulnerability, even though the team that created the update may have no knowledge of your system. And before you replace any code in your system, you want to be certain the new code comes from a valid source, and that what you received is exactly what the source sent. Generally this requires that the update be accompanied by a strong cryptographic hash, signed by the authorized releasing party, and encrypted. Your system must receive the update, decrypt it, and verify the signature and the hash before you install anything.
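The verify-before-install order can be sketched in a few lines. This is a minimal illustration, not a production design. It uses only Python's standard hashlib and hmac modules, so it substitutes a shared-secret HMAC for the public-key signature a real releasing party would use, and it omits decryption entirely. The names make_update and verify_update, the dictionary layout, and the demo key are all hypothetical.

```python
import hashlib
import hmac

def make_update(payload: bytes, key: bytes) -> dict:
    """Releasing party: attach a SHA-256 digest and an authentication tag."""
    digest = hashlib.sha256(payload).hexdigest()
    # Stand-in for a signature (assumption): HMAC over the digest
    # with a secret shared between releaser and device.
    tag = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "digest": digest, "tag": tag}

def verify_update(update: dict, key: bytes) -> bool:
    """Device: check the origin, then the integrity; install only if both pass."""
    expected_tag = hmac.new(key, update["digest"].encode(),
                            hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected_tag, update["tag"]):
        return False  # digest was not vouched for by the releasing party
    actual_digest = hashlib.sha256(update["payload"]).hexdigest()
    return hmac.compare_digest(actual_digest, update["digest"])

KEY = b"demo-key-not-for-production"                      # hypothetical
good = make_update(b"firmware v2", KEY)
tampered = dict(good, payload=b"firmware v2 + implant")   # altered in transit
forged = make_update(b"firmware v2", b"attacker-key")     # wrong releaser
```

Here verify_update(good, KEY) succeeds, while the tampered and forged updates are both rejected before anything would be installed. Note that the sketch itself runs on software and hardware that must in turn be trusted.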
But do you trust this process of decrypting and checking? To do so, you must trust the software that performs it, and the hardware on which that code executes, including, for example, a hardware crypto accelerator and the on-chip bus. Such issues take us across a boundary into the realm of secure hardware. In this world the concepts—the sources of trust—are the same, but the stakes are higher. Hardware is difficult and time-consuming to design and update. Tool chains are more complex, and more parties are involved. Attacks can be physical as well as logical. And because hardware is difficult and expensive to attack, its defenders are by definition facing the most skilled and persistent opponents.
Securing the Hardware
Trusted hardware must begin with a trusted specification: a requirements document to which all future elements of the design can be traced. You can’t agree that a system is secure until you can separate what it is supposed to do from what it is not supposed to do.
In principle requirements would be in a form that would permit formal equivalence checking and assertion testing. But in practice requirements are usually stated in often-ambiguous prose, often not in the native language of the developers and sometimes referencing previous generations of hardware that have themselves never been fully characterized. In such projects it is vital that hardware developers and system developers communicate about all the questions that come up—hence the importance of 26262’s DIA. You don’t want to tape out an SoC with questions like “how exactly does block move execute?” or “under what conditions is the key store readable?” up in the air.
From agreed requirements, the design ideally passes through successive levels of abstraction, employing trusted tools at each level, equivalence checking, assertion testing, and reviews. This process must detect not only inadvertent errors, but intentional sabotage. And yes, we are being realistic about sabotage. At the DAC security panel Lisa McIlrath of Raytheon BBN Technologies commented, “I know it is realistic to put an advanced persistent threat into someone’s Verilog, because I’ve done it. Some points of attack are the CPU core, and especially, the instruction-decode logic. You have to have trusted Verilog.”
Tools are equally an issue, as is IP. Clearly IP fits in the discussion of SEooCs—but design tools? Synthesis, place and route, even timing tools are opportunities to modify the design. Again, we are being realistic: in his DAC keynote, Cadence president and CEO Lip-Bu Tan said “Security is next after power on the list of needs our customers mention. We are actively working to secure our tools and IP. We know our customers are actively worried.”
Such scrutiny can’t be limited to one operating mode. Devices have been successfully attacked through debug and test modes. JTAG ports often expose appalling amounts of data to whoever activates them. And there are side-channel attacks: monitoring core supply current to detect the flow of code execution, for example, or, as one DAC paper demonstrated, using error injection into text strings to identify encryption keys. Finally, there are physical attacks: reducing supply voltage to force an exploitable exception, or physically deconstructing and reverse-engineering an SoC.
In extreme cases, one takes extreme measures. You can split your design into two pieces—each unintelligible without the other—fabbed at different foundries and only united in a 3D module. You can obfuscate the design with misleading layouts and dummy or ambiguous functions. Such measures may seem implausibly difficult and expensive, but if the alternative is a foreign power gaining control of your air defenses, for example, such measures may seem quite reasonable. In fact both have been suggested by serious researchers for use in high-risk systems.
After all these considerations comes tape-out. And then—do you actually trust everyone in your foundry who will have access to design files, test files, or wafers? How about your assembly and test houses? All of them can introduce vulnerabilities into your hardware or reverse-engineer your design. Would you still trust them if the global political or economic situation changed dramatically?
Securing the Imponderable
After reading this far you might be tempted to just give up. Securing every stage of a design from specification of the chips to application support can sound both financially infeasible and technically impossible. “It is impossible to instill trust in every level of the stack,” agreed Siva Narendra, CEO of Tyfone, during the security panel. Fortunately, it is not usually necessary.
“Attackers aren’t mindless demons,” observed Dai Zovi. “They are businesses with a different risk appetite. They shift slowly, and they don’t spend more on an attack than they expect to earn from it.”
Attacks do vary in cost to the attacker. You can get inside many applications and operating systems with shareware tools, a reasonable PC, and a little ingenuity. Really persistent attacks may require assembling a botnet from other people’s unprotected systems, or putting together a bunch of graphics processing units (GPUs). “A thousand graphics cards and nine hours are sufficient to crack any password of 64 characters,” Narendra assured his by now uneasy audience. At the extreme, reverse-engineering an SoC may require stealing several working chips and using multi-million-dollar ion-beam tools and electron microscopes.
An attacker angling for free soda from a vending machine isn’t likely to invest much. But a criminal gang might invest tens of thousands to credibly threaten a power grid that a utility would pay tens of millions to ransom. (Sadly, today they can attack for far less than that.) A nation might invest an entire research laboratory in attacking a system vital to a potential enemy’s military or civilian order. In principle you only have to secure your design against attacks that would make economic sense.
Unfortunately, it is difficult to estimate risk versus reward for many embedded systems. What is it worth to you to keep your home water heater from failing? What is it worth to a nation-wide insurance company to keep a hundred-thousand water heaters from exploding? Both scale and the ability to attack an entire network from one node can make an attack on a seemingly insignificant device far more valuable than the device itself.
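The economic test, and the way scale distorts it, can be written as a one-line comparison. In the sketch below every number is invented purely for illustration, and attack_is_rational and its parameters are hypothetical names. Real estimates of cost, payoff, and success probability are far less tidy.

```python
def attack_is_rational(cost: float, payoff_per_device: float,
                       devices_reached: int,
                       success_probability: float) -> bool:
    """True when the attacker's expected return exceeds the cost of the attack."""
    expected_return = payoff_per_device * devices_reached * success_probability
    return expected_return > cost

# Illustrative figures only: a $20,000 attack against one water heater,
# then the same attack reaching 100,000 networked units.
single = attack_is_rational(cost=20_000, payoff_per_device=50,
                            devices_reached=1, success_probability=0.5)
fleet = attack_is_rational(cost=20_000, payoff_per_device=50,
                           devices_reached=100_000, success_probability=0.5)
```

With these made-up inputs the single-device attack is not worth mounting, but the fleet-wide attack is: the device has not changed, only the number of units reachable from one point of entry.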
This uncertainty notwithstanding, return on investment may give us the best answer to the question with which we began this article. You can stop digging for the root of trust when you reach the level where no rational attacker would invest in going deeper. For many systems this can mean just careful application coding, a secure update process based on something like ARM’s TrustZone, and a very secure key store. For transportation or industrial systems with lives on the line, or for a country’s power and communications infrastructure, it will mean ISO 26262 or something similar. For defense systems it may mean split, obfuscated designs, multiple domestic foundries, and formally proven code.
Certainly today most designs are far more trusting than their situation warrants. Sadly, that is only likely to change as the headlines convey really bad news, and, as GM’s Massimilla said, the discussion reaches the boardroom.