Blockchains are spreading. Originally proposed in 2008, by the still-unidentified Satoshi Nakamoto, as a secure public ledger for recording Bitcoin transactions, blockchains have moved beyond Bitcoin to a profusion of other cryptocurrencies, and beyond cryptocurrencies into a plethora of other applications (Figure 1).
This proliferation raises a vital question for cloud service providers, data-center architects, and ultimately, for designers of embedded and edge systems. Just what kind of workloads do blockchains create, and what are their system implications? It is not an easy question, because the answer depends on scalability, and that, in turn, depends on the use and implementation of the particular blockchain.
Satoshi Nakamoto, whoever he may be, originally described a blockchain as a public record of who transferred what Bitcoin amounts to whom. The data structure was designed so any user could view any transaction, yet all parties remained anonymous behind a wall of public-key cryptography and no one could alter a transaction or create an invalid one that would survive the scrutiny of the protocol. From the beginning, the developers anticipated blockchain uses beyond Bitcoin, so they built in flexibility.
Spending Bitcoin is, in the most abstract sense, a contract between two parties to change ownership of an asset, and the blockchain was designed to support that abstraction. The payload in each transaction record includes an executable script. Bitcoin’s script language is deliberately limited (it has no loops, for instance), but in principle the same mechanism could carry far more complicated logic. In Bitcoin, the script is used mainly to authenticate the two parties and check the validity of the transfer. But in another application the script could serve a purpose beyond simple verification.
It could, for example, verify the transfer of a physical object from a factory to a shipper, or from a trucking company to a marine terminal. Or it could perform another transaction, such as selling stock in accordance with an option contract. Or it could initiate a physical action, such as updating software or actuating a function on a machine. The only limitations are the script’s access to external data, the application’s tolerance for latency—blockchain transactions can take several seconds or minutes—and the developer’s creativity.
Accordingly, there has been considerable activity in new applications, both from startups and from established companies. Numerous teams have built asset tracking systems and supply chain management systems. Both eFinance startups and banks have begun developing blockchain systems for clearing derivatives trades and other back-office contract processing. A variety of projects are investigating blockchains for use in various aspects of the Internet of Things—especially remote software update and secure registration.
With all of these activities it seems possible—even allowing for the tech industry’s wild optimism about undeployed new toys—that blockchain implementations will make up a significant share of datacenter workloads in the future. And with growing discussion of IoT blockchains, they may become an important factor in edge computing as well. So it is important to understand what actually goes on when a blockchain is in use.
Blockchains in Action
If we want to understand blockchain workloads, we have to look at three separate tasks, each of which is intermittently active as the blockchain grows. One task creates a transaction record in response to an external event. The next task packages a number of transaction records into a new block. The third task determines which of the peers on the blockchain network appends the new block to the chain.
For Bitcoin, an elemental blockchain implementation, the process goes something like this. A Bitcoin transaction is a simple, unconditional transfer of some amount of Bitcoin from one or more inputs to one or more outputs. Except when new Bitcoin is mined, all of the inputs of a transaction come from the outputs of earlier transactions. For instance, if Joe wants to send me a Bitcoin, he will create a transaction with some number of inputs and one output. The inputs will reference, using a unique ID, the outputs of earlier transactions that sent Bitcoins to Joe, and the output will reference me. When I want to spend Bitcoin, I will create a new transaction with my input referencing the output of Joe’s transaction and an output referencing my payee.
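The input/output bookkeeping just described is usually called the unspent-transaction-output (UTXO) model. The Python sketch below models it with plain data classes; the class and field names, the string IDs, and the in-memory `ledger` dictionary are illustrative assumptions, not Bitcoin’s actual serialization format. Amounts are in satoshis, the hundred-millionth-of-a-Bitcoin unit the protocol uses internally.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class OutPoint:
    """Points at one output of an earlier transaction: its txid plus position."""
    txid: str
    index: int

@dataclass(frozen=True)
class TxOutput:
    amount: int         # value in satoshis (1 BTC = 100,000,000 satoshis)
    recipient_id: str   # hash of the payee's public key (hypothetical string here)

@dataclass
class Transaction:
    txid: str
    inputs: list = field(default_factory=list)   # OutPoints; empty only when coins are mined
    outputs: list = field(default_factory=list)  # TxOutputs

def input_total(tx, ledger):
    """Sum the earlier outputs that this transaction's inputs reference."""
    return sum(ledger[ref.txid].outputs[ref.index].amount for ref in tx.inputs)

def output_total(tx):
    return sum(out.amount for out in tx.outputs)

def spends_no_more_than_it_has(tx, ledger):
    """A transaction may not pay out more than its inputs bring in."""
    return output_total(tx) <= input_total(tx, ledger)

# Hypothetical history: Joe receives newly mined coin, pays me, and I pay someone else.
ledger = {}
for tx in (
    Transaction("tx-mined", [], [TxOutput(100_000_000, "joe-id")]),
    Transaction("tx-joe", [OutPoint("tx-mined", 0)], [TxOutput(100_000_000, "my-id")]),
):
    ledger[tx.txid] = tx

my_payment = Transaction("tx-me", [OutPoint("tx-joe", 0)], [TxOutput(100_000_000, "payee-id")])
print(input_total(my_payment, ledger))                 # 100000000
print(spends_no_more_than_it_has(my_payment, ledger))  # True
```

Note that nothing in the records is an account balance: what I own is simply the set of earlier outputs addressed to my ID that no later input has referenced yet.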
The genius of Bitcoin is making this all public yet concealing the identities of all parties involved. I don’t have to log into some central accounting system to make the transaction—I just compose it, normally through a Bitcoin wallet, and broadcast it to the network. The nodes of the network verify my transaction and see that it is properly included in the blockchain. All of this is done through public-key cryptography (Figure 2).
Each participant on the Bitcoin network has a private key and a matching public key. Each account ID is a hash of its public key. When Joe sends me his Bitcoin, he includes in the transaction record his public key and a digital signature: a value computed with his private key over a hash of the most important parts of the record. He addresses the output to my ID, a hash of my public key. He labels the record not with a sequence number or a time stamp, but with a hash of most of the record itself.
When I spend the Bitcoin, any third party can check the validity of the transaction using just the information contained in the record. They can hash my public key and compare it to the ID Joe sent the funds to, proving I was the recipient. They can check my signature against my public key, proving that only I, or someone who has stolen my private key, could have created this transaction.
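Those two checks can be sketched in Python. Bitcoin itself signs with ECDSA over the secp256k1 elliptic curve; so that this example runs with only the standard library, it swaps in a textbook Schnorr signature over a deliberately tiny group (P = 2039), which is trivially breakable and meant only to show the shape of the computation. The point it illustrates is that a signature is verified with the public key rather than decrypted. The function names and the plain SHA-256 account ID are assumptions of the sketch.

```python
import hashlib
import secrets

# Toy group: P = 2*Q + 1 with P and Q prime; G = 4 generates the subgroup of
# order Q. These sizes are far too small to be secure.
P, Q, G = 2039, 1019, 4

def h(*parts):
    """SHA-256 over the parts' string forms, returned as an integer."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def keygen():
    x = secrets.randbelow(Q - 1) + 1      # private key in [1, Q-1]
    return x, pow(G, x, P)                 # (private key, public key)

def account_id(public_key):
    """Account ID = hash of the public key (Bitcoin uses SHA-256 then RIPEMD-160)."""
    return hashlib.sha256(str(public_key).encode()).hexdigest()

def sign(x, message):
    k = secrets.randbelow(Q - 1) + 1      # fresh secret nonce for every signature
    r = pow(G, k, P)
    e = h(r, message) % Q
    s = (k + x * e) % Q
    return e, s

def verify(y, message, signature):
    e, s = signature
    # G^s * Y^(-e) recovers G^k for a genuine signature; Y^(Q-e) equals Y^(-e)
    # because Y lies in the subgroup of order Q.
    r = (pow(G, s, P) * pow(y, Q - e, P)) % P
    return h(r, message) % Q == e

# Joe addressed his output to my ID; later I reveal my public key and sign a spend.
my_private, my_public = keygen()
addressed_to = account_id(my_public)
record = "spend tx-joe:0 to payee-id"
sig = sign(my_private, record)

print(account_id(my_public) == addressed_to)  # True: I am the recipient
print(verify(my_public, record, sig))          # True: only my private key could sign this
```

Changing any part of `record` makes `verify` fail, except that with parameters this small a forgery passes by chance about once in a thousand tries, one more reason real systems use far larger groups.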
That is exactly what happens when someone creates a transaction. When I spend the funds Joe sent me, I create a transaction record that sends the Bitcoin to someone else, then broadcast that record to all the nodes on the Bitcoin network. Each node verifies the transaction and, if it is valid, puts it into a buffer of transactions waiting to be included in a block.
Playing with Blocks
That brings us to the second major task: forming blocks (Figure 3). A blockchain is not a linked list of transactions—it is a linked list of blocks. As each node in the network receives transaction records, it packs them into a block. When the block is full, the nodes compete to determine which one adds its block to the chain. (This competition is necessary to ensure a bad actor can’t corrupt the chain by cleverly creating consecutive invalid blocks.)
The winner of the competition completes the block with a time stamp and a few other pieces, links it to the block at the end of its copy of the blockchain, and broadcasts the new block to the network. Every other node verifies the new block, including each transaction it contains, and if the block is valid, appends it to its own copy of the chain. It can be shown that this process, with all nodes working independently and without a master, is virtually impervious to lost messages, node or network failures, or attacks by anyone controlling less than half of the network’s computing power.
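A minimal Python sketch of the linking and checking just described follows. It hashes a JSON encoding of each block with SHA-256, where Bitcoin hashes a fixed binary header; the dictionary layout, the function names, and the `tx_is_valid` callback (a hypothetical stand-in for the per-transaction checks described earlier) are assumptions for illustration.

```python
import hashlib
import json
import time

def block_hash(block):
    """SHA-256 of a canonical JSON encoding (a stand-in for Bitcoin's binary header)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(prev_hash, transactions):
    """The winner completes the block with a time stamp and a link to the previous block."""
    return {"prev_hash": prev_hash, "timestamp": time.time(), "transactions": transactions}

def append_if_valid(chain, block, tx_is_valid):
    """What every other node does with a broadcast block before accepting it."""
    if block["prev_hash"] != block_hash(chain[-1]):
        return False                      # does not extend this copy of the chain
    if not all(tx_is_valid(tx) for tx in block["transactions"]):
        return False                      # contains an invalid transaction
    chain.append(block)
    return True

def chain_is_intact(chain):
    """Recompute every link; editing any earlier block breaks the link after it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = [make_block("0" * 64, [])]                          # genesis block
new_block = make_block(block_hash(chain[-1]), ["tx-a", "tx-b"])
print(append_if_valid(chain, new_block, lambda tx: True))   # True
chain[0]["transactions"].append("tx-forged")                # tamper with history
print(chain_is_intact(chain))                               # False
```

Because each block carries the hash of its predecessor, rewriting an old block would require rebuilding every block after it, which is what the competition described next makes prohibitively hard.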
The competition to complete the next block is the third task we have to consider. For the Bitcoin blockchain, this competition is a computing problem: find a string, called a nonce, that when inserted into the block causes a hash of the block to have a given number of leading zeros. There is no known analytical solution; you just have to keep trying strings until you find one that works. When you find one, you can form a block and broadcast it. Every other node can verify you really solved the problem simply by hashing the block.
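A toy version of the nonce search, in Python, shows the asymmetry between finding and checking. Real Bitcoin mining applies SHA-256 twice to an 80-byte block header and compares the result numerically against a target; counting leading zero hex digits, as here, is the simplified form the paragraph describes. The function names, the header bytes, and the 8-byte nonce encoding are illustrative assumptions.

```python
import hashlib

def mine(block_data: bytes, difficulty: int, max_tries: int = 10_000_000):
    """Try nonces until SHA-256(block_data + nonce) starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    for nonce in range(max_tries):
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
    return None                           # gave up without finding a nonce

def check(block_data: bytes, nonce: int, difficulty: int) -> bool:
    """Verification costs a single hash, however long the search took."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).hexdigest()
    return digest.startswith("0" * difficulty)

block = b"previous-hash|merkle-root|timestamp"   # hypothetical header contents
nonce, digest = mine(block, difficulty=4)
print(digest[:4])                # 0000
print(check(block, nonce, 4))    # True
```

Each additional zero hex digit multiplies the expected number of tries by 16 (about 65,000 tries at difficulty 4), while checking stays at one hash.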
This seemingly busy-work task makes the choice of who forms the next block essentially random, assuming all the nodes have similar compute capability. Bitcoin makes the effort worthwhile to node owners by allowing the winning node of each contest to pay itself a reward in newly created Bitcoin, plus the fees attached to the transactions in its block. This has led to something of an arms race among so-called Bitcoin miners, who invest in ever more powerful hardware engines to find nonces, on the assumption that the growing value of Bitcoin will repay their investment. That race seems to have halted rather abruptly as growth in Bitcoin’s value has stalled of late. But the fact remains: finding the nonce in order to create the next block is a hugely compute-intensive task.
The Bitcoin blockchain was optimized for the cryptocurrency it created. Many of the individual decisions—how transactions and blocks are structured, how much central authority and control is involved, who can participate, and how the competition is conducted—were made for a cryptocurrency trading network. Other applications have led to other choices.
For instance, anyone can join the Bitcoin network. As of last summer, studies suggested there were between three and six million active users and over 20 million wallets, the software accounts through which users hold their Bitcoin keys.
But a blockchain for a different purpose might have a far smaller and more restricted network. A blockchain network for derivatives clearing might be restricted to licensed securities dealers. A network for asset tracking might include only a few hundred approved vendors, shippers, and customers. A network for secure software updates might have only a few registered devices.
Restricting membership might seem like a simplification, but it can actually complicate things. If you have a central authority authenticate transactions, you have just created a single point of attack and undone much of the purpose of the blockchain. But if you keep authentication distributed, each node has to verify that the user who created the transaction is authorized to perform that transaction. That, in turn, may require each node to keep a list of acceptable public keys, creating yet another vulnerability.
Another important variation arises when users want the ledger to be secure but not public: for instance, a trading environment where members keep their contracts secret from each other for competitive reasons. In this case the entire mechanism of the blockchain (generating transactions, verifying them, and creating blocks) may be kept in a secure container, such as an enclave created with Intel® Software Guard Extensions (Intel® SGX). Microsoft’s Confidential Consortium (Coco) Framework uses this technique. Users can request contracts or transactions, but they can’t see the blockchain or its machinery, which stays inside the secure container. The blockchain is allowed outside the container only in encrypted form. Such approaches require a hardware-based means of protecting memory and execution streams from tampering.
Along with different ideas about security, there are different types of transactions. Bitcoin mainly uses one simple transaction: I redeem funds someone paid to me by paying them to some third party. Once the block containing the transaction that redeems the funds is posted, the transaction is complete and cannot be altered. But even the Bitcoin blockchain allows for more complex transactions.
Bitcoin’s transaction record includes not just data strings, but also executable scripts in a Forth-like, stack-based language. So a transaction could involve computations and conditional actions employing any data available to the script interpreter at run time. For instance, a transaction might occur when and only when some criteria are met. A bond may make an interest payment at the end of each month, until either the bond matures or the issuer’s quarterly profit falls below a predefined threshold.
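To make the idea of a Forth-like script concrete, here is a toy stack interpreter in Python. The opcode names OP_DUP, OP_SHA256, OP_EQUAL, and OP_VERIFY exist in Bitcoin Script, but this interpreter implements only those four, with simplified semantics, and is not Bitcoin’s evaluator. The example is a hash lock: the output can be spent only by someone who supplies data whose hash matches a value fixed in the locking script, a simple case of a transaction that occurs only when a condition is met.

```python
import hashlib

def truthy(item: bytes) -> bool:
    """Simplified from Bitcoin Script: empty or all-zero byte strings count as false."""
    return any(item)

def run_script(script):
    """Evaluate data items and opcodes on a stack; succeed if the top item is true."""
    stack = []
    for op in script:
        if isinstance(op, bytes):          # data items are simply pushed
            stack.append(op)
        elif op == "OP_DUP":
            stack.append(stack[-1])
        elif op == "OP_SHA256":
            stack.append(hashlib.sha256(stack.pop()).digest())
        elif op == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(b"\x01" if a == b else b"")
        elif op == "OP_VERIFY":
            if not truthy(stack.pop()):
                return False               # fail the whole script immediately
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return bool(stack) and truthy(stack[-1])

# Locking script: "whoever can reveal data hashing to this value may spend."
secret = b"open sesame"
locking_script = ["OP_SHA256", hashlib.sha256(secret).digest(), "OP_EQUAL"]

print(run_script([secret] + locking_script))          # True
print(run_script([b"wrong guess"] + locking_script))  # False
```

The spender supplies the unlocking data, the payer supplies the locking script, and every verifying node runs the two together.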
Such transactions, common in financial applications, are more properly called contracts and raise new issues. Simple transactions are just created by the sender, broadcast, verified, and put into blocks. Once their outputs have been redeemed the transactions are essentially dead. But the script that defines a contract may stay active until all of its conditions have been met. This may mean continually cycling through all the transaction records in the blockchain that contain still-active contract scripts. Or it may mean continuously monitoring external data feeds, setting triggers, and linking the triggers to the scripts they would activate. This adds a whole new dimension to the workload, and may convert what had been more of a list-processing task into a stream-processing task.
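The difference between a transaction and a contract can be sketched as a small state machine in Python, using the bond example above. The class, its fields, and the assumption that a scheduler or data feed calls `on_month_end` with the latest quarterly profit figure are hypothetical; the point is that the contract keeps state and must be revisited on every relevant event until it deactivates itself.

```python
from dataclasses import dataclass, field

@dataclass
class BondContract:
    """Hypothetical bond: pay interest monthly until maturity or a profit shortfall."""
    maturity_month: int
    profit_floor: float
    monthly_payment: int
    active: bool = True
    payments: list = field(default_factory=list)

    def on_month_end(self, month, latest_quarterly_profit):
        """Re-evaluated on every trigger; returns the payment made, or None."""
        if not self.active:
            return None
        if month > self.maturity_month or latest_quarterly_profit < self.profit_floor:
            self.active = False      # from here on the contract is dead, like a spent output
            return None
        self.payments.append((month, self.monthly_payment))
        return self.monthly_payment

bond = BondContract(maturity_month=12, profit_floor=1_000_000.0, monthly_payment=500)
profits = [2.1e6, 2.1e6, 1.8e6, 0.6e6, 2.5e6]   # hypothetical feed, one value per month
for month, profit in enumerate(profits, start=1):
    bond.on_month_end(month, profit)

print(len(bond.payments), bond.active)   # 3 False
```

Multiply this object by every open contract on the chain, and by every event that might affect one of them, and the shift from list processing to stream processing becomes clear.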
About that Competition
Bitcoin’s coin-mining process is ruinously expensive, and hence impractical for most applications. But it is still essential to the blockchain’s integrity that no node be able to predict who will form the next block. If we eliminate mining, a central authority can pick the next winner based on some random process. But this once again introduces a single point of attack for hackers. Alternatively, members could participate in some sort of game that is known not to converge to a predictable sequence of states and that distributes wins randomly. Intel, once again relying on its secure execution mode, has proposed an alternative in which every node enters a secure timing loop until a randomized counter has counted to zero and emitted a signed token. The first node to get its token broadcasts it, and everyone else authenticates it with a public key. The computing load, while not inconsequential, is massively lower than searching for a nonce.
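Intel’s proposal is known as Proof of Elapsed Time (PoET), and it is one of the consensus options in Hyperledger Sawtooth. The Python simulation below captures only the lottery: each node draws a random wait from the same distribution, and the shortest wait wins. It leaves out the part that makes PoET trustworthy, the enclave that enforces the wait and signs the result. The exponential distribution, the mean wait, and the function name are assumptions for illustration.

```python
import random
from collections import Counter

def elect_leader(node_ids, rng, mean_wait=10.0):
    """Each node draws a random wait time; the node whose timer expires first wins."""
    waits = {node: rng.expovariate(1.0 / mean_wait) for node in node_ids}
    return min(waits, key=waits.get)

rng = random.Random(2018)                # seeded so the run is repeatable
nodes = ["node-a", "node-b", "node-c", "node-d"]
wins = Counter(elect_leader(nodes, rng) for _ in range(10_000))
for node in nodes:
    print(node, round(wins[node] / 10_000, 3))   # each should land close to 0.25
```

Because every node draws from the same distribution, each is equally likely to win, the same fairness property mining provides, at the cost of one random draw and one wait per node instead of billions of hashes.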
The precise application and its security requirements will dictate choices about how to implement a blockchain and its surrounding platform. And those choices, in turn, will influence computing requirements.
Clearly there is no simple answer to predicting the resource load from a blockchain. Too much depends on implementation choices. But we can at least identify subtasks that are likely to present challenges as blockchains grow and become more common.
We can start by setting aside proof-of-work tasks such as Bitcoin mining. While they are at the heart of cryptocurrencies, they are irrelevant elsewhere, and they tend to end up on specialized FPGAs or ASICs.
But that still leaves a number of important tasks. One, obviously, is all the hashing that has to be done: hashes of public keys, of transaction records, and of blocks. Individually, these computations aren’t onerous. But they are byte-level operations that add up quickly and can benefit from hardware assistance, either by special CPU instructions or an external accelerator.
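As one example of how the hashing adds up: in Bitcoin, a block’s transaction hashes are combined pairwise into a Merkle tree, and only the root goes into the block header. The Python sketch below follows Bitcoin’s rules of double SHA-256 and of duplicating the last hash when a level has an odd count (it ignores Bitcoin’s byte-order conventions), and it counts the hashes the tree requires. The function names and the 2,000-transaction block are illustrative.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for transactions and blocks."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes):
    """Hash pairs level by level until one root remains; also count the tree hashes."""
    if not tx_hashes:
        raise ValueError("a block must contain at least one transaction")
    level = list(tx_hashes)
    hashes_done = 0
    while len(level) > 1:
        if len(level) % 2:                # odd count: duplicate the last hash
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        hashes_done += len(level)
    return level[0], hashes_done

# Hypothetical block of 2,000 transactions.
tx_hashes = [dsha256(f"tx-{n}".encode()) for n in range(2000)]
root, tree_hashes = merkle_root(tx_hashes)
print(tree_hashes)   # 2001
```

That is 2,001 double hashes for the tree on top of the 2,000 needed to hash the transactions themselves, and every node that verifies the block repeats the work.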
Then there are the signature and encryption tasks. Of these, probably the heaviest is the asymmetric (public-key) operation that signs the hash of the transaction record. This must be done every time someone creates a new transaction. In most applications this will be infrequent enough to be unimportant, but it does put a lower bound on the computing power of a node. And with many nodes participating in many blockchains in the same cloud, the load could become significant enough to justify acceleration.
In cases where the blockchain is stored in encrypted form, there is the work of encrypting (probably using a relatively efficient symmetric algorithm) and decrypting blocks as they are created, checked, or referenced. Depending on the frequency of these operations, this job may be offloaded to something like a hardware security module for acceleration. That would also mostly solve the problem of securely storing the necessary keys.
One more task to watch is verifying transactions. While this is not computationally hard (a hash, a signature check, and some string comparisons), it also requires searching through the blockchain to find the transactions referenced in the inputs. Unless the blockchain is indexed, that could become a very long search through a linked list as the blockchain grows over time. For reference, the Bitcoin blockchain has grown rapidly and passed 185 GB in September 2018. And there is no guarantee that the referenced transaction is in a recently created block. Every node trying to form or verify a block must do this verification for each input of each transaction in the new block. Poorly managed, the resulting searches could become a nightmare for the storage system, especially in an edge computing environment. In extreme cases, map/reduce algorithms may be in order just to search the blockchain.
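The cost difference between searching the chain and consulting an index can be sketched in Python. Bitcoin Core, for one, maintains a separate database of unspent transaction outputs for exactly this purpose. Here each block is a list of transaction dictionaries; the layout, field names, and function names are hypothetical.

```python
def find_output_linear(chain, txid, index):
    """Walk blocks newest-first until the referenced transaction turns up."""
    blocks_visited = 0
    for block in reversed(chain):
        blocks_visited += 1
        for tx in block:
            if tx["txid"] == txid:
                return tx["outputs"][index], blocks_visited
    return None, blocks_visited

def build_utxo_index(chain):
    """One pass over the chain: map (txid, output index) to every still-unspent output."""
    utxo = {}
    for block in chain:
        for tx in block:
            for ref in tx["inputs"]:
                utxo.pop(ref, None)               # a spent output leaves the set
            for i, out in enumerate(tx["outputs"]):
                utxo[(tx["txid"], i)] = out
    return utxo

# Hypothetical chain of 10,000 one-transaction blocks, plus one block that spends tx5.
chain = [[{"txid": f"tx{n}", "inputs": [], "outputs": [{"amount": n}]}]
         for n in range(10_000)]
chain.append([{"txid": "tx-spend", "inputs": [("tx5", 0)], "outputs": [{"amount": 5}]}])

print(find_output_linear(chain, "tx0", 0))   # ({'amount': 0}, 10001)
utxo = build_utxo_index(chain)               # built once, then updated per block
print(utxo[("tx0", 0)], ("tx5", 0) in utxo)  # {'amount': 0} False
```

The linear search touched every block to reach an old output, and it still cannot tell whether that output was already spent; the index answers both questions in one lookup, at the price of keeping it current as each new block arrives.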
Finally, in contracts triggered by external events, there is the need to monitor data feeds, such as stock-exchange ticker feeds, for trigger events, and then find and execute the appropriate contract in the blockchain when a trigger event occurs. This requirement amounts to directing one or more data streams through regular-expression processors, then performing a search of a linked list, perhaps assisted by an index. With high-density, fast input streams, which are often already parsed and processed by FPGA accelerators, and with large numbers of open contracts, this task could scale poorly, particularly in latency-critical applications.
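A small Python sketch of that pipeline: a regular expression parses each line of a hypothetical ticker feed, and a dictionary keyed by stock symbol stands in for the index over open contracts, so a quote never requires a walk through the chain. The feed format, the contract fields, and the function names are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical feed format: "SYMBOL PRICE", one quote per line.
QUOTE = re.compile(r"^(?P<symbol>[A-Z]{1,5})\s+(?P<price>\d+(?:\.\d+)?)$")

def build_trigger_index(contracts):
    """Index open contracts by the symbol they watch."""
    index = defaultdict(list)
    for c in contracts:
        index[c["symbol"]].append(c)
    return index

def process_feed(lines, index):
    """Parse each quote and return the ids of contracts whose trigger price is crossed."""
    fired = []
    for line in lines:
        m = QUOTE.match(line.strip())
        if not m:
            continue                      # skip malformed lines
        price = float(m["price"])
        for c in index.get(m["symbol"], []):
            if price <= c["trigger_below"]:
                fired.append(c["id"])
    return fired

contracts = [
    {"id": "put-1", "symbol": "ACME", "trigger_below": 50.0},
    {"id": "put-2", "symbol": "INIT", "trigger_below": 10.0},
]
index = build_trigger_index(contracts)
feed = ["ACME 51.20", "INIT 12.00", "corrupted line", "ACME 49.95"]
print(process_feed(feed, index))   # ['put-1']
```

In a latency-critical deployment the parsing stage is the natural candidate for an FPGA, and the index must be updated every time a contract opens or closes; the sketch shows only the logic.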
None of these tasks is beyond the capabilities of modern CPUs if scaling remains restricted. But as blockchains grow, as they demand more encryption, or as they require monitoring of intense data streams, data center and node architects may need recourse to hardware acceleration. Further in the future, if large-scale quantum computing becomes a practical reality, it will undermine the public-key cryptography on which blockchains are based. A new assessment will then have to be made based on the requirements of whatever quantum-resistant cryptography replaces it.
In the short term, then, the needs of blockchains are manageable, even at the edge. In the longer term, their requirements, like the chains themselves, are unbounded.