Embedded Computing Needs Hardware-Based Security

Embedded systems are in a profound transition: from physically isolated, autonomous devices to Internet-connected, accessible devices. Designers are learning—often to their dismay—that the transformation requires far more than just gluing a network interface onto the bus and adding an Internet Protocol stack. In many ways, these Internet-aware designs are coming to look less like traditional embedded systems and more like miniaturized enterprise data centers.

Much data-center technology—multitasking, multiprocessing, and fast private networks, for example—is already familiar to designers of large embedded systems, albeit on a far smaller scale. But one data-center technology—system security—may prove novel. Yet the same needs that shape data-center security architectures magically appear in embedded systems once you connect them to the Internet. Unlike compute, storage, or connectivity requirements, though, the demands of security don’t diminish much when you scale the system down from a warehouse-sized data center to a connected embedded device.

Data Center Security

So what is it that data centers—and connected embedded systems—need in the way of security? First, they need to protect themselves from external attacks and internal subversion by their own applications. This means providing a protected envelope in which any attempt to read or write code or data will be authenticated before it is performed. It also means that all system code and data—for operating systems, hypervisors, management, or maintenance—must be strongly encrypted when it is in storage or in transit outside that trusted envelope.

Second, data centers must support the security needs of their applications. Apps may provide Transport Layer Security (TLS, the successor to the Secure Sockets Layer, or SSL) for their clients, or they may use public-key authentication and encryption. They may also require authenticated and encrypted inter-process communication and storage, often using symmetric-key cryptography. They will look to the data center for key management and, often, crypto algorithm acceleration.

There are three common elements to all of these needs. They all require a secure, accelerated environment (Figure 1) in which to execute cryptographic algorithms. They need a safe way to create, store, send, and receive cryptographic keys. And, in order to create strong keys, they need a true random number generator based on a physical source of entropy.

Figure 1. Extreme measures are necessary to protect encryption keys and codes.

Crypto algorithms need a special environment for two reasons. First, they must be kept secure from corruption and monitoring. They are the ideal point of attack in the data center. Second, they can place an unsupportable computing burden on application CPUs, driving up latency in just the places where apps are most latency-sensitive. Both of these arguments suggest a physically secure proprietary hardware accelerator.

The problem of cryptographic key management presents similar issues. Secret keys must, of course, be kept secret. Less obviously, public keys must be protected from tampering. If a hacker can substitute a key she created for a public key you obtained from a certificate authority, you will authenticate messages from the hacker instead of genuine messages. These concerns preclude allowing unencrypted keys ever to reside in server memory or storage. In fact, some experts argue that they preclude allowing even encrypted keys into shared memory.
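To make the tampering risk concrete, systems that cannot keep every public key inside protected hardware often pin a known-good key instead, so that a substituted key is detected before use. Below is a minimal sketch of that check in C, assuming OpenSSL is available; the pinned digest value is a placeholder, not a real key's fingerprint.

```c
/* Minimal sketch of public-key pinning: compare the SHA-256 digest of a
 * DER-encoded public key against a digest pinned at build time. The pinned
 * value below is a placeholder, not a real key's digest.
 * Build (assuming OpenSSL is installed): cc pin.c -lcrypto */
#include <openssl/sha.h>
#include <string.h>

static const unsigned char pinned_digest[SHA256_DIGEST_LENGTH] = {
    0x00, /* placeholder: fill in the digest of the CA-supplied key */
};

/* Returns 1 if the received key matches the pinned digest, 0 otherwise. */
int key_matches_pin(const unsigned char *der_key, size_t der_len)
{
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(der_key, der_len, digest);
    /* memcmp is acceptable here: the digest is public information, so a
     * timing-safe comparison is not strictly required for this check. */
    return memcmp(digest, pinned_digest, sizeof digest) == 0;
}
```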

The random number problem is more mathematical. In order to generate a new key, you start with a random number. If the number is not truly random, but follows a statistical pattern, you have just narrowed the space in which an attacker must search to discover the key. Software random-number generators, though, can only approximate a genuinely random distribution. The poorer the approximation, the easier it will be for an attacker to find the key through directed trial and error. So ideally, you would get your random number by sampling a truly random physical process, such as delay-line jitter, RF noise, or semiconductor junction noise. Hence the strong motivation for a hardware-based random-number generator.
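A classic illustration of why even raw physical samples need conditioning is von Neumann debiasing, which turns a stream of independent but biased raw bits into unbiased output bits. The sketch below is in C; read_raw_bit() is a hypothetical stand-in for sampling the physical source.

```c
/* Sketch of von Neumann debiasing. Assumes raw bits are independent with a
 * fixed (possibly unknown) bias; read_raw_bit() is hypothetical hardware
 * sampling, e.g. of delay-line jitter. */
extern int read_raw_bit(void);  /* hypothetical: returns 0 or 1 from hardware */

/* Emit one unbiased bit: sample raw bits in pairs, keep the first bit of a
 * (0,1) or (1,0) pair, discard (0,0) and (1,1). If each raw bit is 1 with
 * probability p, both kept outcomes occur with probability p*(1-p), so the
 * output bits are unbiased. */
int debiased_bit(void)
{
    for (;;) {
        int a = read_raw_bit();
        int b = read_raw_bit();
        if (a != b)
            return a;   /* (0,1) -> 0 and (1,0) -> 1, equally likely */
    }
}
```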

The HSM

These considerations led vendors to develop, and most data centers to install, a specialized appliance called a hardware security module (HSM). In either board or box form factor, the HSM meets the requirements outlined above, with several distinctive features.
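In practice, applications usually reach an HSM through a deliberately narrow, standardized interface; PKCS#11 (also called Cryptoki) is the most common. The sketch below shows one way an application might pull true random bytes from the first available token, assuming a vendor-supplied PKCS#11 header and library; error handling is elided and the PIN is a placeholder.

```c
/* Sketch: requesting random bytes from an HSM through PKCS#11 (Cryptoki).
 * Assumes a vendor-supplied pkcs11.h and library; error checks elided. */
#include <pkcs11.h>

int get_hsm_random(unsigned char *buf, CK_ULONG len)
{
    CK_SLOT_ID slot;
    CK_ULONG slot_count = 1;
    CK_SESSION_HANDLE session;
    CK_RV rv;

    C_Initialize(NULL_PTR);
    C_GetSlotList(CK_TRUE, &slot, &slot_count);     /* first token present */
    C_OpenSession(slot, CKF_SERIAL_SESSION, NULL_PTR, NULL_PTR, &session);
    C_Login(session, CKU_USER, (CK_UTF8CHAR_PTR)"placeholder-pin", 15);
    rv = C_GenerateRandom(session, buf, len);       /* entropy from the HSM */
    C_Logout(session);
    C_CloseSession(session);
    C_Finalize(NULL_PTR);
    return rv == CKR_OK;
}
```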

First, the HSM is physically tamper-resistant, in much the same manner as a smartcard. The package may be designed to resist penetration, voltage manipulation, thermal attacks, and even examination by x-rays or ion beams. Such events should trigger the module to delete internal memory. Ideally, the module should also block side-channel attacks such as differential power analysis.

Second, the HSM should provide proprietary hardware for crypto algorithm acceleration, key storage, and random-number generation.

Third, the HSM must have a highly restrictive, bullet-proof firewall. The device should only respond to authenticated requests for a small number of pre-defined actions, such as to encrypt or decrypt a string or to create, read, write, or apply a key. Private or secret keys should only be readable under rigorous conditions, and only in encrypted form. Two special functions, key backup and restore (usually to a smartcard) and firmware update, must be very carefully controlled, ideally by multi-party authentication involving at least one trusted human.
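What that firewall's command dispatch might look like is sketched below; every name here is illustrative rather than drawn from any real HSM. The essential property is that the command set is closed: anything outside the whitelist, or not authenticated, is rejected.

```c
/* Illustrative sketch of an HSM firewall's command dispatch: only a fixed,
 * pre-defined set of commands is recognized, and each is gated on session
 * authentication. All types and handler names are hypothetical. */
#include <stdbool.h>

typedef enum {          /* the complete externally visible command set */
    CMD_ENCRYPT,
    CMD_DECRYPT,
    CMD_CREATE_KEY,
    CMD_APPLY_KEY,
    CMD_EXPORT_WRAPPED_KEY  /* keys leave only in encrypted (wrapped) form */
} hsm_cmd_t;

typedef struct {
    hsm_cmd_t cmd;
    bool      authenticated;   /* set only after request authentication */
    /* ... command payload ... */
} hsm_request_t;

/* Hypothetical handlers, implemented behind the firewall. */
extern int do_encrypt(const hsm_request_t *);
extern int do_decrypt(const hsm_request_t *);
extern int do_create_key(const hsm_request_t *);
extern int do_apply_key(const hsm_request_t *);
extern int do_export_wrapped(const hsm_request_t *);

int hsm_dispatch(const hsm_request_t *req)
{
    if (!req->authenticated)
        return -1;                      /* reject anything unauthenticated */

    switch (req->cmd) {
    case CMD_ENCRYPT:            return do_encrypt(req);
    case CMD_DECRYPT:            return do_decrypt(req);
    case CMD_CREATE_KEY:         return do_create_key(req);
    case CMD_APPLY_KEY:          return do_apply_key(req);
    case CMD_EXPORT_WRAPPED_KEY: return do_export_wrapped(req);
    default:                     return -1;  /* unknown command: reject */
    }
}
```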

By providing multiple levels of security, from external tamper protection to strong encryption of internal data and code, the HSM becomes so hard to hack that for most attackers it just isn’t worth the bother (Figure 2). Sadly, in practice it usually isn’t worth the bother because some other part of the data center is much more vulnerable. In any case, the HSM establishes the foundation on which the rest of the data-center security architecture is constructed.

Figure 2. Full security requires multiple layers of defenses.

Understandably, HSM vendors are reluctant to describe the internal architectures of their modules. But it is possible to make some generalizations about just what is in a typical box-level HSM (Figure 3).

Figure 3. A typical HSM has a relatively simple structure.

The tamper resistance functions require hardware support, including motion, capacitive, radiation, voltage, and temperature sensors. There will be a secure microcontroller, ideally with in-line encryption/decryption on the memory and I/O interfaces. It will be the job of this MCU to monitor the sensors and supervise the other functions of the HSM. It will also read some sort of analog device to get a seed for random number generation. The MCU should of course also be secure against side-channel attacks.
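The supervisory loop on such an MCU might look something like the following sketch, in which the sensor read and the key store are hypothetical stand-ins for device-specific hardware.

```c
/* Sketch of a secure MCU's tamper-monitoring loop. sensor_tripped() and
 * key_store are hypothetical stand-ins for device-specific hardware. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

extern bool sensor_tripped(void);         /* hypothetical: any sensor out of range */
extern volatile uint8_t key_store[4096];  /* hypothetical on-chip key memory */

/* Overwrite key material immediately; the volatile qualifier keeps the
 * writes from being optimized away. */
static void zeroize_keys(void)
{
    for (size_t i = 0; i < sizeof key_store; i++)
        key_store[i] = 0;
}

void tamper_monitor(void)
{
    for (;;) {
        if (sensor_tripped()) {
            zeroize_keys();   /* erase before an attacker can probe */
            /* then latch a permanent tamper flag and halt or reset */
        }
        /* ...service other HSM functions, kick the watchdog, etc. ... */
    }
}
```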

In addition, there should be secure memory for key storage. Ideally this should be a custom device resistant to scanning from outside and instantly erasable when intrusion is detected. But the very large amount of memory that may be necessary for key storage and for buffers for encryption and decryption tasks in a data center may make DRAM the only practical solution, and the security features will have to be incorporated into the DIMMs.

Since the firewall is so restrictive, it can probably be implemented in a hardware state machine, relieving the MCU of some overhead and reducing the risk of a successful attack on the MCU software. And last but not least, our HSM will include a crypto algorithm accelerator. This would usually be a hardware data path optimized for the necessary encryption and authentication algorithms.

But there is a problem in that last statement. There are dozens of key-exchange, authentication, and encryption algorithms in wide use. Murphy’s Law dictates that a data center will have to support a large subset of them, plus some proprietary algorithms dreamed up by apps developers. Covering all these needs with a fixed hardware accelerator might mean either accelerating only very primitive operations, as in a large bank of multiply-accumulators, and shifting a lot of the work back onto the MCU, or else building a very complex—and very hard to verify—reprogrammable state machine. If the latter approach is taken, there will immediately be pressure from data center managers to make the accelerator more general and more accessible to users for application acceleration. HSM vendors must balance these desires against the absolute need to keep the accelerator verifiable during the design process and secure during operation. Some security experts, though, argue that user programmability and security are fundamentally incompatible. If you want the accelerator to be incorruptible, you must define and verify its functions at design time.

The custom hardware—primarily the crypto datapath—could be implemented in an ASIC, but it would require special attention to ensure that differential power attacks could glean no information from the ASIC’s supply rails, and that the circuitry was protected against voltage and temperature exploits—not unlike the precautions you would take in designing a smartcard chip. With these provisions, a secure MCU core could be included in the ASIC as well, if the design team had the necessary expertise or access to appropriate intellectual property (IP). ARM, for example, now offers a line of tamper-resistant processor IP cores based on the Cortex*-M architecture, called SecurCore. These might prove adequate if the heavy lifting of the crypto algorithms stays in the accelerator.

This custom design could also be done in an FPGA. But use of an FPGA raises some new issues. Most FPGAs are volatile and configured at power-up from an external memory. This boot process can be protected by encryption, and vendors provide for that. Also, most FPGAs have limited or no mixed-signal capabilities, so it might be impossible to integrate the range of sensor inputs required for tamper detection without external analog-to-digital converters (ADCs), which would themselves add to the attack surface and have to be protected. There are exceptions to both the need for external configuration ROM and lack of mixed-signal circuits, but the exceptions tend to be smaller devices, such as the Intel® MAX® 10 device family.

FPGAs also introduce some new opportunities. Because the accelerator datapath would be run-time reconfigurable, the crypto accelerator could be reconfigured for each algorithm family as needed, bypassing the dilemma of flexibility versus security. Additionally, there has been some work in creating entropy sources in FPGAs for use by true random number generators.
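Whatever the entropy source, its raw output has to be continuously health-tested before it feeds a key generator; NIST SP 800-90B mandates such tests. Below is a sketch of the simplest of them, the repetition count test, which flags a stuck source. The cutoff shown is a placeholder, since the standard derives the real value from the source's assessed min-entropy and a target false-alarm rate.

```c
/* Sketch of the NIST SP 800-90B repetition count health test: raise an
 * alarm if the raw source emits the same sample too many times in a row,
 * a symptom of a stuck or failed entropy source. RCT_CUTOFF below is a
 * placeholder; the standard derives it from the source's assessed
 * min-entropy. */
#include <stdbool.h>
#include <stdint.h>

#define RCT_CUTOFF 31   /* placeholder cutoff */

static uint8_t  last_sample;
static unsigned run_length;

/* Feed each raw sample through the test; returns true on alarm. */
bool rct_feed(uint8_t sample)
{
    if (sample == last_sample) {
        if (++run_length >= RCT_CUTOFF)
            return true;    /* stuck source: stop using its output */
    } else {
        last_sample = sample;
        run_length  = 1;
    }
    return false;
}
```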

All of these implementation options raise another important question. With so many ways to implement the HSM, how can a user know how secure a particular device actually is? The answer is independent certification. The main standard used for HSMs, Federal Information Processing Standard (FIPS) 140-2, was created by the US National Institute of Standards and Technology (NIST). FIPS 140-2 defines four levels of security, ranging from just an unprotected crypto engine on the weak end to an engine and storage subsystem fully enclosed by intrusion and tamper resistant or detecting hardware on the strong end. Each individual design must be certified by a third-party lab recognized under a certification program jointly operated by NIST and Canada’s Communications Security Establishment.

HSMs may also be evaluated at the product level under the international Common Criteria for Information Technology Security Evaluation (glibly known as CC), ISO 15408. This certification process is also done by recognized third-party labs. But unlike FIPS 140-2, which evaluates the overall actual security level of the HSM, CC evaluation in effect only checks that claims submitted by the vendor are supportable. This approach, which may or may not involve actual testing of the product, has been used to, for example, get various versions of Microsoft Windows certified under the CC. So it is pretty much up to the user to determine what was actually certified, at what level, and what the implications are for their own use case.

An Embedded HSM?

The challenges that brought HSMs to the data center are now present in edge computing, with a few important differences. Embedded systems are likely to use just a few crypto algorithms compared to the plethora a data center would face. And similarly, connected embedded systems probably would need to manage far fewer keys than a data center. Both of these differences could simplify the one big problem with bringing HSMs to embedded systems.

That problem is scale. Depending on capabilities and level of security, data-center HSMs cost from hundreds to thousands of dollars. They range in size from PCIe* cards to pizza-box-sized units. For an edge-computing rack full of servers, that is not a serious problem. But for a more typical embedded system, expected to fit into a small box or onto a circuit board inside a mechanical assembly, it is a non-starter. There is a clear need for HSM technology to scale down from the pizza box to chip level without compromising functionality or security.

But is this feasible? Technically, the answer appears to be yes. As we have seen, all the functions of an HSM could in principle be absorbed into an ASIC or FPGA, with the exception of some sensors and the more mechanical elements of physical intrusion detection. And MCU vendors have already offered pieces of a full solution, including secure software-execution modes, on-chip private memories, and limited crypto accelerators. As one report observed, even ordinary smartcard hardware could be used as a reasonably secure but very limited HSM. So an embedded design team with the requisite skills and motivation should be able to produce a chip-level HSM.
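If such a chip were built, its host-facing interface could stay as narrow as a data-center HSM's. The header sketched below is purely illustrative of that shape; it is not any real product's API. Note that keys are addressed only by opaque handles and leave the chip only in wrapped form.

```c
/* Illustrative sketch of a host-facing API for a hypothetical chip-level
 * HSM. Nothing here is a real product's interface; it simply mirrors the
 * narrow command set a data-center HSM exposes. */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t hsm_key_id_t;  /* opaque handle: raw keys never leave the chip */

int hsm_self_test(void);                     /* power-on integrity check      */
int hsm_random(uint8_t *buf, size_t len);    /* true random bytes             */
int hsm_generate_key(hsm_key_id_t *out_id);  /* key created and kept inside   */
int hsm_encrypt(hsm_key_id_t key, const uint8_t *in, size_t len, uint8_t *out);
int hsm_decrypt(hsm_key_id_t key, const uint8_t *in, size_t len, uint8_t *out);
int hsm_export_wrapped(hsm_key_id_t key,     /* backup: encrypted form only   */
                       uint8_t *wrapped, size_t *wrapped_len);
```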

But such a project would face several serious challenges. The requisite skills include secure processor and memory design, a good grasp of cryptography, and experience with physical tamper protection. That’s not a common skill set in embedded design teams. The design should get FIPS 140-2 certification. But that can be an expensive and time-consuming process, as can ISO 15408, running into the hundreds of thousands of dollars and months of delays. And all this work could only be amortized across the relatively tiny volumes of the embedded system under design.

Most serious, perhaps, would be a less tangible challenge: convincing management to take system security seriously enough to reject halfway measures and undertake an HSM chip design. Unfortunately, there is still a great deal of magical thinking in management about the threats facing connected embedded systems, even in applications like power generation and transportation where the potential for damage is vast.

But there is another way. Perhaps it is time for a semiconductor vendor, with its far broader market and greater access to specialized expertise, to undertake a FIPS 140-2 certified HSM chip. At some point, after a few more high-profile attacks on too-important and too-vulnerable physical plants, further progress in edge computing may require it.

