As one security expert has said, system security is not a thing, it is a process. In our increasingly connected—and increasingly hostile—environment, security has become a continuous case of new attacks leading to new countermeasures, followed by more innovative new attacks, and so on. Just as the Cold War arms race sapped the resources of its belligerents, this new challenge is claiming an ever-growing fraction of the resources of vulnerable systems.
In data centers, provisioned with fungible computing resources on a massive scale, this drain is problem enough. But in embedded systems, where resources are dedicated and often strictly limited, security may become an insuperable obstacle. Yet as the Internet of Things (IoT) spreads its tendrils through the developed world’s critical infrastructure, the need for security in these systems is becoming literally a matter of life and death. So it is vital for embedded-system designers, just as much as for data-center architects, to understand this evolving conflict.
We can divide the pursuit of system security into three distinct stages. This division works both physically—the stages focus on different areas of the system—and chronologically, as designers tend to progress through the stages in order as they come to understand the depth of their problem. And the stages lend themselves to a nice military analogy (Figure 1).
The stages are perimeter integrity, internal security, and active security. In brief, perimeter integrity techniques seek to prevent an attack from entering the system, much as a ring wall surrounded a castle. Internal security measures try to minimize the damage should an attack penetrate the system, just as towers in a wall, street mazes, and, ultimately, the castle keep sought to defeat attackers who breached the walls. Active security works to identify and neutralize attacks anywhere in the system, much as a castle guard would patrol outside and inside the walls, lay ambushes, sortie against enemy forces, or counter-mine against enemy tunnels.
In the real world all three stages of system security interlock and rely on each other, just as castle walls require a garrison, and without walls the garrison would have to be far larger. But we will examine each stage separately, in part for clarity and in part because hapless system designers tend not to implement a stage until the previous one has failed them. It is rather as if a warlord built undefended walls, then when the wall was overrun, built towers and a keep manned by citizens with clubs and pitchforks, and only after a second defeat hired, armed, and trained soldiers and officers.
The most intuitively obvious way to protect your treasure is to build a wall around it. The fact that this has almost never succeeded—either in military history or in system security—somehow has not diminished its appeal to either politicians or system designers. So we should begin by looking at perimeter integrity: the attacks it addresses, the techniques it employs, and the resources it requires.
A fundamental form of attack on a system is false identity. An attacker pretends to be someone with the authority to read or write data, execute code, or update your system. Blocking such attacks requires authentication: requiring the corresponding party to prove their identity, usually with a password or certificate.
But anyone monitoring the transaction could easily copy a password. So systems often encrypt passwords with a changing pattern, preventing a so-called man in the middle from stealing even an encrypted form of the password and reusing it.
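One common way to realize such a "changing pattern" is a challenge-response exchange. The sketch below is an illustration under assumptions, not the scheme of any particular product: the server issues a fresh random challenge, and the client returns an HMAC of that challenge keyed by the password. The function names and the sample password are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """Server side: issue a fresh random nonce for each login attempt."""
    return secrets.token_bytes(16)

def respond(password: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password, challenge, hashlib.sha256).digest()

def verify(password: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response, compare in constant time."""
    expected = hmac.new(password, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

password = b"correct horse battery staple"  # hypothetical shared secret

first = make_challenge()
captured = respond(password, first)          # what an eavesdropper could record
print(verify(password, first, captured))     # the legitimate login

second = make_challenge()
print(verify(password, second, captured))    # the recorded response, replayed
```

Because every login uses a new challenge, a response recorded from one session does not verify against the next, which is the property the paragraph above describes.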
Still this is not enough. That middleman could steal valuable data just by watching, or he could corrupt code or data being sent to the system. So it is important in many applications to encrypt entire transactions, not just passwords. This level of security is often provided on the Internet by Transport Layer Security (TLS, the successor to Secure Sockets Layer, or SSL) or by IPsec. In principle, assuming no keys are stolen, these protocols protect the entire content of a transaction from reading or tampering. There are significant differences between the two, however.
TLS, the basis of the familiar HTTPS secure Web connection, runs above Layer 4, on top of Transmission Control Protocol (TCP). Consequently, it can be implemented in applications such as browsers. IPsec, running in Layer 3, must be supported in the operating-system kernel, so it is nowhere near as widely used as TLS. Today IPsec is mainly employed in enterprise Virtual Private Networks (VPNs), even though it is arguably more secure than TLS. A third and even more powerful means of encrypting Ethernet traffic, MACsec, encrypts the entire packet carried inside each Ethernet frame, so that the man in the middle has no idea of the packet's source, destination, format, or content. But MACsec requires significant hardware changes to the Ethernet interface, and is not as yet widely used.
Both authentication and encryption present a logical problem: how do you get an encryption key to your correspondent in the first place without someone stealing it? The answer is usually public-key cryptography. This unfortunately compute-intensive process uses a pair of keys, both derived from a large random number. One key is public: you can share it with anyone. The other is private, and must be protected by its owner.
For authentication, a server can encrypt a certificate (or a hash of it) using its private key, and send this encrypted message to a client. Then any client can verify that the server is the actual owner of the private key by decrypting the message with the matching public key. If the decryption produces a valid certificate or hash, the data must have been encrypted with the matching private key. There are further details about registries to record ownership of certificates that provide more confidence. Unfortunately, server operators are often less than careful about keeping certificates registered, so most of the time clients are happy if they can decrypt the message.
For security of the traffic itself, the process is just the reverse. Anyone can encrypt a message using a public key, but only the holder of the matching private key can decrypt it.
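The two directions of use can be seen in a toy example. The sketch below uses textbook RSA with deliberately tiny primes, which is an assumption made purely for illustration: it has no padding, its keys could be broken by hand, and real systems rely on vetted cryptographic libraries and far larger keys. All names and constants are hypothetical.

```python
import hashlib

# Toy key pair built from tiny primes (illustration only, trivially breakable).
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Authentication: only the private-key holder can produce this value."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)

def encrypt(value: int) -> int:
    """Confidentiality: anyone can encrypt with the public key."""
    return pow(value, E, N)

def decrypt(cipher: int) -> int:
    """Only the private-key holder can recover the value."""
    return pow(cipher, D, N)

certificate = b"server certificate contents"
signature = sign(certificate)
print(verify(certificate, signature))            # genuine signature
print(verify(certificate, (signature + 1) % N))  # altered signature
print(decrypt(encrypt(42)))                      # round trip of a small value
```

The same key pair serves both purposes: the private key signs and the public key verifies, while the public key encrypts and the private key decrypts.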
This asymmetric approach gives a way of passing information without first having to somehow pass an unencrypted secret key between the server and the client. But the encryption and decryption tasks are arduous. Gregory Baudet of Barco Silex says that checking one short digital signature with crypto software takes about 2.7 seconds on a Cortex®-M3 microcontroller. That is not a huge issue in a data center, where you can throw a Xeon core at the problem. But in a connected embedded system that may receive hundreds of signatures per second, it is untenable without hardware acceleration. Even in the client-server world, TLS only uses public-key encryption to protect the exchange of a separate set of secret keys, which are then used in a far more efficient symmetric-key algorithm to protect the real messages.
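A quick calculation shows the size of the gap. The 2.7-second figure is the one quoted above; the load of 200 signatures per second is an assumed stand-in for "hundreds", chosen only to make the arithmetic concrete.

```python
SOFTWARE_VERIFY_SECONDS = 2.7         # per-signature figure quoted in the text
ASSUMED_SIGNATURES_PER_SECOND = 200   # hypothetical load ("hundreds per second")

def max_software_rate(seconds_per_verify: float) -> float:
    """Signatures per second one core can check in software."""
    return 1.0 / seconds_per_verify

def required_speedup(seconds_per_verify: float, signatures_per_second: float) -> float:
    """Factor by which verification must accelerate to keep up with the load."""
    return seconds_per_verify * signatures_per_second

rate = max_software_rate(SOFTWARE_VERIFY_SECONDS)
speedup = required_speedup(SOFTWARE_VERIFY_SECONDS, ASSUMED_SIGNATURES_PER_SECOND)
print(f"{rate:.2f} verifications per second in software")
print(f"{speedup:.0f}x speedup needed for the assumed load")
```

Under these assumptions the microcontroller falls short by a factor of several hundred, which is why the text treats hardware acceleration as a requirement rather than an optimization.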
None of this can protect you if your correspondent—or one of the Internet hops along the way, if you are using TLS—is infected. So perimeter integrity may include a last line of defense that scans incoming decrypted messages for threats. In a data center this may be an exhaustive search for patterns associated with known viruses. In an embedded system that has only a limited repertoire of messages, the task may be a simple rule check. Depending on complexity and speed, this firewall may be anything from a carefully protected software task to a dedicated hardware regular-expression engine.
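For the embedded case, the "simple rule check" might look like the following sketch. The message repertoire, the temperature limit, and the function name are all hypothetical; the point is only that a small, closed set of legal messages can be checked exhaustively, and anything else is dropped.

```python
import re

# Hypothetical message repertoire for a small embedded controller.
ALLOWED_MESSAGES = [
    re.compile(r"GET_STATUS"),
    re.compile(r"SET_VALVE (OPEN|CLOSED)"),
    re.compile(r"SET_TEMP (\d{1,3})"),
]
MAX_TEMP = 120  # hypothetical upper limit for the SET_TEMP argument

def accept(message: str) -> bool:
    """Accept a message only if it matches one rule in full and is in range."""
    for pattern in ALLOWED_MESSAGES:
        match = pattern.fullmatch(message)
        if match is None:
            continue
        if message.startswith("SET_TEMP"):
            return int(match.group(1)) <= MAX_TEMP
        return True
    return False  # default deny: anything outside the repertoire is rejected

for msg in ["GET_STATUS", "SET_TEMP 80", "SET_TEMP 999", "SET_VALVE OPEN; reboot"]:
    print(msg, "->", "accept" if accept(msg) else "reject")
```

The default-deny structure matters more than the particular rules: the check lists what is allowed rather than trying to enumerate what is dangerous.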
Military history is cluttered with the remains of breached walls and destroyed forts. And today, as horrific tales of security breaches accumulate, data-center architects are coming to accept that perimeter defense is not enough. Their systems must be secure against an attack that is already inside the perimeter. Increasingly, the same is true for embedded systems.
It is natural to think of an internal threat as an attack that has somehow battered its way through the outer defenses of the system. But since the revelations by Edward Snowden, attention has shifted to another scenario: the insider threat. If a target is of high value—and many industrial, infrastructure, and transportation systems are in this category—the easiest and most reliable attack may be to get an insider—an authorized user—to do the deed. With passwords or credentials that allow them to enter the system freely, insiders have already bypassed the first line of defense, and can go to work directly at inserting malware or stealing data. Increasingly, it is these insider threats that are the focus of internal security.
Perhaps the central task of internal security is to prevent execution of unauthorized code. This task, in turn, begins with a secure boot process. Ultimately, the system must rely on tamper-proof hardware to authenticate boot code before anything gets executed. The boot code, in turn, must authenticate operating-system code before passing control to it (Figure 2). Often, this process will include creation of a trusted mode running authenticated, privileged code that will manage access to key stores, access to address spaces, and authentication of application codes.
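The chain of trust can be sketched in a few lines. This simplified model assumes each stage carries the expected SHA-256 digest of the stage that follows it, with the first digest held in tamper-proof hardware; production secure-boot schemes typically verify digital signatures instead, and the names here are hypothetical.

```python
import hashlib
import hmac

def sha256(image: bytes) -> bytes:
    return hashlib.sha256(image).digest()

def boot(root_digest: bytes, stages: list[tuple[bytes, bytes]]) -> bool:
    """Walk the boot chain, refusing to run any stage that fails its check.

    Each stage is (image, expected_digest_of_next_stage). root_digest stands
    in for the value held in tamper-proof hardware (assumed ROM or fuses).
    """
    expected = root_digest
    for image, next_digest in stages:
        if not hmac.compare_digest(sha256(image), expected):
            return False           # halt: this stage was modified
        expected = next_digest     # the trusted stage vouches for the next one
    return True

bootloader = b"stage-1 boot code"
kernel = b"operating-system image"
root = sha256(bootloader)

good_chain = [(bootloader, sha256(kernel)), (kernel, b"")]
bad_chain = [(bootloader, sha256(kernel)), (kernel + b" + malware", b"")]

print(boot(root, good_chain))   # unmodified images
print(boot(root, bad_chain))    # tampered operating-system image
```

Note that nothing is executed until it has been checked by something already trusted, and the whole scheme is only as strong as the protection of the root value.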
The most sensitive of these tasks is protection of keys. Often in data centers key storage, key management, and crypto acceleration will all reside in a dedicated hardware security module, separate from the servers. Such modules provide access control, encryption, and tamper resistance. In embedded systems the question of key storage is far more difficult. On one hand, cost, size, and power constraints usually make a hardware security module infeasible. On the other hand, property or even lives may be at risk. And the embedded system may be physically under the control of an attacker, so the key store must be protected against side-channel attacks and physical tampering, not just against unauthorized read requests.
Constraints may force embedded-system developers to store keys and crypto code somewhere in the system processor’s address space. This choice emphasizes the vital importance of a secure boot process, and of a trusted operating mode that has sole access to all the memory-management units in the system. Naturally it is equally vital that no untrusted mode on any device have direct access to physical memory.
These measures not only make it possible to protect keys, but they enable the other facets of internal security as well. If a task can only be initialized and granted memory access via trusted code, the system has a good chance of preventing most kinds of malware attacks. But there is always the chance of something, somewhere, compromising the defenses.
A final line of static defense, obvious but amazingly neglected in the interest of performance or economy, is to encrypt data in storage. If stored data is encrypted, any task that attempts to read or write it must present a valid key. So even if an attacker were able to launch a malware task, he would still have to get access to an authorized task’s storage-encryption key. Storage encryption has the added benefit of protecting data even if the medium (be it flash, disk, or tape) is physically stolen.
But there are challenges. Decryption adds to storage access latency—especially if done in software. Worse, a software decryption task is a glaringly inviting target for attackers. So for both reasons, hardware-based encryption is essential.
And there is the question of where to put the encryption wall. Between disk and DRAM is an obvious choice. But with increasing use of flash as disk cache or even as main memory, do you really want unencrypted files in a shared, non-volatile memory? In the most secure systems, the processor SoC itself may be the crypto site, so that even in DRAM the data is encrypted, and the only clear version is in on-chip caches. This approach has major implications for SoC design and system performance.
Finally, there is deterrence. Even if it is not possible to completely protect the system from an insider threat, it may be possible to make the attack too dangerous for the insider. To this end, systems can track activity by user or by task, especially noting milestones like launching a new task or transferring data into or out of the system. Then these logs can be audited, either algorithmically or by human inspection, to identify suspicious activity. Auditing can’t prevent an attack from succeeding. But many insider threats require repeated attempts to find a weakness in the system’s defenses. Auditing can make any attack that isn’t likely to succeed quickly just too big a risk for the attacker to take.
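An algorithmic audit can be as simple as counting repeated failures. The sketch below assumes a hypothetical log format (a list of records with "user" and "event" fields) and an arbitrary threshold; a real audit would draw on whatever milestones the system actually records.

```python
from collections import Counter

def flag_suspicious(log: list[dict], max_failures: int = 3) -> list[str]:
    """Return users whose denied-access count exceeds the threshold."""
    failures = Counter(
        entry["user"] for entry in log if entry["event"] == "access_denied"
    )
    return sorted(user for user, count in failures.items() if count > max_failures)

# Hypothetical activity log: one user probing repeatedly, one ordinary user.
log = (
    [{"user": "mallory", "event": "access_denied"}] * 5
    + [{"user": "alice", "event": "access_denied"}]
    + [{"user": "alice", "event": "task_launched"}]
)
print(flag_suspicious(log))
```

A single denied request is unremarkable, but the repeated probing that many insider attacks need stands out clearly in the counts.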
This scrutiny of the system’s behavior doesn’t have to be done off-line and after the fact. You can design a system to monitor its own behavior and even take countermeasures against suspicious activity. That idea takes us into the third layer of security, active methods.
The measures we have discussed so far have been essentially passive. Either they attempt to block access to system resources or they try to render those resources unusable to an intruder. But there is a third category of defenses that employs a very different tactic: assuming that a security breach has occurred, and acting to counter it. We can cite three kinds of such measures: scrubbing, monitoring, and machine learning. All three find their heritage not in security, but in the realms of reliability and random-fault detection. And they pursue intrusions as if they were hardware glitches.
Scrubbing’s idea is simple: if there is state in the system that is critical to security, read it frequently, verify its correctness against a hash, and if necessary give an alarm or correct the state. Candidates for such scrubbing would include boot ROMs, any code executed in a CPU’s secure mode, authentication keys, MMU and cache TLB entries, and the control RAM that determines the function of FPGAs. It is very hard to avoid an infinite-regression problem if this scrubbing is not done by separate hardware, isolated from tampering malware.
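The read, verify, and correct cycle can be modeled as follows. This is a software sketch only: as the paragraph notes, the reference copy and the checking logic would need to live in separate, isolated hardware, and the class and attribute names are hypothetical.

```python
import hashlib

class ScrubbedRegion:
    """Security-critical state guarded by a reference hash and a golden copy."""

    def __init__(self, golden: bytes):
        self._golden = bytes(golden)                    # assumed tamper-proof copy
        self._digest = hashlib.sha256(golden).digest()  # assumed tamper-proof hash
        self.live = bytearray(golden)                   # the state actually in use

    def scrub(self) -> bool:
        """Check the live state; return True if corruption was found and repaired."""
        if hashlib.sha256(self.live).digest() == self._digest:
            return False
        self.live[:] = self._golden   # correct the state (an alarm could go here)
        return True

region = ScrubbedRegion(b"\x01\x02\x03\x04")
print(region.scrub())    # nothing has changed yet
region.live[2] ^= 0xFF   # simulate malware or a glitch flipping bits
print(region.scrub())    # corruption detected and repaired
print(bytes(region.live) == b"\x01\x02\x03\x04")
```

How often to call scrub is a design trade-off: the interval bounds how long corrupted state can remain in use before it is caught.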
Our second category is behavior monitoring: watching the system to ensure that it does nothing wrong. In practice this approach is more useful in embedded systems, where incorrect behavior may be clearly defined, than in data centers, where the host may have no clue about the expected behavior of a customer’s code.
An extreme example used in some high-reliability systems is triple modular redundancy (TMR; see Figure 3). Here, you have three modules of identical behavior (but, ideally, of different implementations) running in lock-step and voting on the result of each step. An attacker would have to cause two different modules to get the same wrong result on the same step in order to influence the system behavior.
TMR demands three complete modules plus comparison, voting, and error-recovery hardware. If these are not secure hardware, attack on the voting or recovery processes negates the protection afforded by the redundancy. You can save money by using only two modules, but the system can then only detect errors, not correct them on the fly.
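The difference between the three-module and two-module arrangements shows up clearly in a small model of the voter. This is a software sketch of logic that the text says must be secure hardware in practice; the function names are hypothetical.

```python
from collections import Counter

def vote(a, b, c):
    """Majority vote over three module outputs for one step.

    Returns (agreed_value, indices_of_disagreeing_modules).
    Raises RuntimeError when no two modules agree.
    """
    results = [a, b, c]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: cannot recover")
    faulty = [i for i, r in enumerate(results) if r != value]
    return value, faulty

def compare(a, b) -> bool:
    """Two-module variant: can report a mismatch but not which side is right."""
    return a == b

print(vote(7, 7, 7))   # all modules agree
print(vote(7, 7, 9))   # module 2 is outvoted and identified
print(compare(7, 9))   # dual-module system detects the error only
```

With three modules the voter both masks the bad result and identifies its source, so `vote(7, 7, 9)` returns `(7, [2])`; with two, `compare(7, 9)` can only report that something is wrong.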
A related approach can save even more. Instead of a redundant module, create a state machine that compares the system state at each critical step to a set of rules that fully define acceptable behavior. This approach can save a lot of cost and power in embedded systems where it is possible to concisely define acceptable behavior. And it is used in functionally safe systems where it is possible to define unsafe actions. But in practice it can be excruciatingly hard to discover an adequate rule set.
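A rule-checking monitor of this kind can be sketched as a table of permitted state transitions. The states and transitions below describe a hypothetical tank controller and are assumptions for illustration; finding the right table for a real system is the hard part the paragraph warns about.

```python
# Hypothetical rule set: for each state, the states that may legally follow it.
ALLOWED_TRANSITIONS = {
    "IDLE": {"IDLE", "FILLING"},
    "FILLING": {"FILLING", "HOLDING", "IDLE"},
    "HOLDING": {"HOLDING", "DRAINING"},
    "DRAINING": {"DRAINING", "IDLE"},
}

def first_violation(trace: list[str]):
    """Return the index of the first state that breaks a rule, or None."""
    for step, (prev, curr) in enumerate(zip(trace, trace[1:]), start=1):
        if curr not in ALLOWED_TRANSITIONS.get(prev, set()):
            return step
    return None

normal = ["IDLE", "FILLING", "HOLDING", "DRAINING", "IDLE"]
attack = ["IDLE", "FILLING", "DRAINING"]   # drains without ever holding

print(first_violation(normal))
print(first_violation(attack))
```

The monitor needs no copy of the module it watches, only the rules, which is where the savings in cost and power come from.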
That brings us to our third—and today, still experimental—alternative: deep learning. Connect the critical state variables of the system to a deep-learning neural network. Train the network with a blend of normal operation, known attacks, and randomly injected faults. You may never know how the resulting network operates, and you can’t prove that the network is sufficient or even correct. But in complex systems it is likely to out-perform human-devised rule lists. Needless to say, the trained network itself must be secure against attacks.
These active measures raise a critical question: at what level should we apply them? Should the module we are checking be an individual register, or a functional block within a processor, a whole processor within an SoC, or an entire embedded system? The finer-grained the module, the greater the cost and, ironically, the harder it is to detect some attacks. A perfectly functioning CPU can execute malware perfectly. Checking at the full-system level may be more economical, but it may be far more difficult to define the range of acceptable behavior, and to discover an attack before it has done internal damage to the system.
We offer three kinds of defenses: perimeter integrity, vital-asset securing, and active monitoring. From outside walls to castle stronghold to mobile guards, we offer nothing unfamiliar to a medieval military commander. Yet as cloud data centers, enabled by hardware function virtualization, take on more security-critical tasks, these ideas will appear as innovations in server, network, and storage hardware. And as embedded systems become increasingly mission-critical and connected, the ideas will be implemented in them, too.
Be ready. It is your castle.