The phrase neuromorphic computing has a long history, dating back at least to the 1980s, when legendary Caltech researcher Carver Mead proposed designing ICs to mimic the organization of living neuron cells. But recently the term has taken on a much more specific meaning, to denote a branch of neural network research that has diverged significantly from the orthodoxy of convolutional deep-learning networks. So, what exactly is neuromorphic computing now? And does it have a future of important applications, or is it just another fertile ground for sowing thesis projects?
A Matter of Definition
As the name implies—if you read Greek, anyway—neuromorphic networks model themselves closely on biological nerve cells, or neurons. This is quite unlike modern deep-learning networks, so it is worthwhile to take a quick look at biological neurons.
Living nerve cells have four major components (Figure 1). Electrochemical pulses enter the cell through tiny interface points called synapses. The synapses are scattered over the surfaces of tree-root-like fibers called dendrites, which reach out into the surrounding nerve tissue, gather pulses from their synapses, and conduct the pulses back to the heart of the neuron, the cell body.
In the cell body are structures that transform the many pulse trains arriving over the dendrites into an output pulse train. At least 20 different transform types have been identified in nature, ranging from simple logic-like functions to some rather sophisticated transforms. One of the most interesting for researchers—and the most widely used in neuromorphic computing—is the leaky integrator: a function that adds up pulses as they arrive, while constantly decrementing the sum at a fixed rate. If the sum exceeds a threshold, the cell body outputs a pulse.
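The leaky-integrator behavior described above can be sketched in a few lines of Python. This is a toy model, not any particular chip's circuit: the unit charge per pulse, the leak rate, and the threshold are arbitrary illustrative choices.

```python
def leaky_integrate_and_fire(spike_times, leak=0.1, threshold=3.0, dt=1.0, t_end=20.0):
    """Toy leaky integrate-and-fire neuron.

    Adds a unit of charge for each arriving input pulse, decrements the
    running sum at a fixed rate (the "leak"), and emits an output pulse,
    resetting the sum, whenever the sum crosses the threshold.
    """
    potential = 0.0
    output_spikes = []
    inputs = sorted(spike_times)
    i = 0
    t = 0.0
    while t < t_end:
        # leak: decrement the sum at a constant rate, never below zero
        potential = max(0.0, potential - leak * dt)
        # integrate: add up any pulses arriving in this time step
        while i < len(inputs) and inputs[i] < t + dt:
            potential += 1.0
            i += 1
        # fire: emit a pulse and reset when the threshold is crossed
        if potential >= threshold:
            output_spikes.append(t)
            potential = 0.0
        t += dt
    return output_spikes
```

Note the timing sensitivity this creates: four closely spaced input pulses push the sum over the threshold, while the same pulses spread far apart leak away and produce no output at all.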
Synapses, dendrites, and cell bodies are three of the four components. The fourth is the axon: the tree-like fiber that conducts output pulses from the cell body out into the nervous tissue, ending at synapses on other cells’ dendrites or on muscle and organ tissue.
So neuromorphic computers use architectural structures modeled on neurons. But there are many different implementation approaches, ranging from pure software simulations to dedicated ICs. The best way to define the field as it exists today may be to contrast it against traditional neural networks. Both are networks in which relatively simple computations occur at the nodes. But beyond that generalization there are many important differences.
Perhaps the most fundamental difference is in signaling. The nodes in traditional neural networks communicate by sending numbers across the network, usually represented as either floating-point or integer digital quantities. Neuromorphic nodes send pulses, or sometimes strings of pulses, in which timing and frequency carry the information—in other words, forms of pulse-frequency or pulse-timing modulation. This is similar to what we observe in biological nervous systems.
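As a toy illustration of frequency coding—a generic sketch, not the encoding used by any particular chip—a scalar value can be carried as the rate of pulses within a time window and recovered by counting:

```python
def encode_rate(value, window=100):
    """Encode a value in [0, 1] as a spike train: the fraction of
    time steps carrying a pulse approximates the value."""
    n_spikes = round(value * window)
    train = [0] * window
    if n_spikes == 0:
        return train
    # spread the pulses evenly across the window
    step = window / n_spikes
    for k in range(n_spikes):
        train[int(k * step)] = 1
    return train

def decode_rate(train):
    """Recover the value as the observed pulse frequency."""
    return sum(train) / len(train)
```

A receiver that simply counts pulses per window recovers the original quantity, which is why pulse-rate signaling can stand in for the explicit numbers of a conventional network.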
A second important difference is in the function performed in each node. Conventional network nodes do arithmetic: they multiply the numbers arriving on each of their inputs by predetermined weights and add up the products. Mathematicians see this as a simple dot product of the input vector and the weight vector. The resulting sum may then be subjected to some non-linear function such as normalization, min or max setting, or whatever other creative impulse moves the network designer. The number is then sent on to the next layer in the network.
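The arithmetic at one conventional node can be written in a few lines. The sigmoid here is just one common choice of non-linearity, standing in for whatever function the network designer picks:

```python
import math

def conventional_node(inputs, weights, bias=0.0):
    """One node of a conventional neural network.

    Computes the dot product of the input vector and the weight
    vector, then applies a non-linear function (here a sigmoid)
    before the result is passed on to the next layer.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))
```

With zero weights the node outputs exactly 0.5, the sigmoid's midpoint; large positive dot products saturate toward 1.0.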
In contrast, neuromorphic nodes, like neuron cell bodies, can perform a large array of pulse-oriented functions. Most commonly used, as we have mentioned, is the leaky integrate-and-fire function, but various designers have implemented many others. Like real neurons, neuromorphic nodes usually have many input connections feeding in, but usually only one output. In reference to living cells, neuromorphic inputs are often called synapses or dendrites, the node may be called a neuron, and the output tree an axon.
The topologies of conventional and neuromorphic networks also differ significantly. Conventional deep-learning networks comprise strictly cascaded layers of computing nodes. The outputs from one layer of nodes go only into selected inputs of the next layer (Figure 2). In inference mode—when the network is already trained and is in use—signals flow only in one direction. (During training, signals flow in both directions, as we will discuss in a moment.)
There are no such restrictions on the topology of neuromorphic networks. As in real nervous tissue, a neuromorphic node may get inputs from any other node, and its axon may extend to anywhere (Figure 3). Thus, configurations such as feedback loops and delay-line memories, anathema in conventional neural networks, are in principle quite acceptable in the neuromorphic field. This allows the topologies of neuromorphic networks to extend well beyond what can be done in conventional networks, into areas of research such as long short-term memory (LSTM) networks and other recurrent networks.
Carver Mead may have dreamt of implementing the structure of a neuron in silicon, but developers of today’s deep-learning networks have abandoned that idea for a much simpler approach. Modern, conventional neural networks are in effect software simulations—computer programs that perform the matrix arithmetic defined by the neural network architecture. The network is just a graphic representation of a large linear algebra computation.
Given the inefficiencies of simulation, developers have been quick to adopt optimizations to reduce the computing load, and hardware accelerators to speed execution. Data compression, use of shorter number formats for the weights and outputs, and use of sparse-matrix algorithms have all been applied. GPUs, clever arrangements of multiply-accumulator arrays, and FPGAs have been used as accelerators. An interesting recent trend has been to explore FPGAs or ASICs organized as data-flow engines with embedded RAM, in an effort to reduce the massive memory traffic loads that can form around the accelerators—in effect, extracting a data-flow graph from the network and encoding it in silicon.
In contrast, silicon implementations of neuromorphic processors tend to resemble architecturally the biological neurons they consciously mimic, with identifiable hardware blocks corresponding to synapses, dendrites, cell bodies, and axons. The implementations are usually, but not always, digital, allowing them to run much faster than organic neurons or analog emulations, but they retain the pulsed operation of the biological cells and are often event-driven, offering the opportunity for huge energy savings compared to software or to synchronous arithmetic circuits.
The grandfather of neuromorphic chips is IBM’s TrueNorth, a 2014 spin-off from the US DARPA research program Systems of Neuromorphic Adaptive Plastic Scalable Electronics. (Now that is really working for an acronym.) The heart of TrueNorth is a digital core that is replicated within a network-on-chip interconnect grid. The core contains five key blocks: a crossbar switch connecting incoming axons to synapses, a time-multiplexed neuron processor, a local SRAM holding synapse, weight, and configuration data, a router for exchanging pulse packets with other cores, and supervisory hardware that schedules events against time stamps.
The TrueNorth chip includes 4,096 such cores.
The components in the core cooperate to perform a hardware emulation of neuron activity. Pulses move through the crossbar switch from axons to synapses to the neuron processor, and are transformed for each virtual neuron. Pulse trains pass through the routers to and from other cores as encoded packets. Since transforms like leaky integration depend on arrival time, the supervisory hardware in each core maintains a time-stamping mechanism so that incoming packets can be applied at their intended arrival times.
Like many other neuromorphic implementations, TrueNorth’s main neuron function is a leaky pulse integrator, but designers have added a number of other functions, selectable via control bits in the local SRAM. As an exercise, IBM designers showed that their neuron was sufficiently flexible to mimic 20 different functions that have been observed in living neurons.
So far we have discussed mostly behavior of conventional and neuromorphic networks that have already been fully trained. But of course that is only part of the story. How the networks learn defines another important distinction between conventional and neuromorphic networks. And that subject will introduce another IC example.
Let’s start with networks of living neurons. Learning in living organisms is not well understood, but a few of the things we do know are relevant here. First, there are two separate aspects to learning: real nerve cells are able to reach out and establish new connections, in effect rewiring the network as they learn, and they also have a wide variety of functions available in their cell bodies. So learning can involve both changing connections and changing functions. Second, real nervous systems learn very quickly. Humans can learn to recognize a new face or a new abstract symbol from one or two instances. Conventional convolutional deep-learning networks might require tens of thousands of training examples to master the same new item.
This observation suggests, correctly, that training of deep-learning networks is profoundly different from biological learning. To begin with, the two aspects of learning are separated. Designers specify a topology before training, and it does not change unless the network requires redesign. Only the weights applied to the inputs at each node are altered during training.
The process itself is also different. The implementation of the network that gets trained is generally a software simulation running on server CPUs, often with graphics processing unit (GPU) acceleration. Trainers must assemble huge numbers—often tens or hundreds of thousands—of input data sets, and label each one with the correct classification values. Then, one by one, trainers feed an input data set into the simulation’s inputs along with its label. The software compares the output of the network to the correct classification and adjusts the weights of the final stage to bring the output closer to the right answer, generally using a gradient-descent algorithm. The software then steps back to the previous stage and repeats the process, and so on (the familiar backpropagation algorithm), until all the weights in the network have been adjusted to be a bit closer to yielding the correct classification for this example. Then on to the next example. Obviously this is time- and compute-intensive.
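The weight-adjustment step at a single stage can be sketched as follows. This is a deliberately minimal example—one linear node trained by gradient descent on squared error—showing the per-stage move that the backward pass repeats through the whole network:

```python
def train_node(examples, lr=0.1, epochs=200):
    """Fit the weights of one linear node, y = w . x, by gradient descent.

    examples: list of (input_vector, target) pairs.
    For each labeled example, compare the node's output to the correct
    answer and nudge every weight slightly in the direction that
    reduces the squared error.
    """
    n_weights = len(examples[0][0])
    weights = [0.0] * n_weights
    for _ in range(epochs):
        for x, target in examples:
            output = sum(w * xi for w, xi in zip(weights, x))
            error = output - target
            # gradient of 0.5 * error**2 with respect to weight i is error * x[i]
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights
```

Even this single node needs hundreds of small corrective steps to settle on its weights, which hints at why training a deep network over huge labeled data sets is so compute-hungry.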
Once the network has been trained and tested—there is no guarantee that training on a given network and set of examples will be successful—designers extract the weights from the trained network, optimize the computations, and port the topology and weights to an entirely different piece of software with a quite different sort of hardware acceleration, this time optimized for inference. This is how a convolutional network that required days of training in a GPU-accelerated cloud can end up running in a smart phone.
Learning in TrueNorth is quite a different matter. The system includes its own programming language that allows users to set up the parameters in each core’s local SRAM, defining synapses within the core, selecting weights to apply to them, and choosing the functions for the virtual neurons, as well as setting up the routing table for connections with other cores. There is no learning mode per se, but apparently the programming environment can be set up so that TrueNorth cores can modify their own SRAMs, allowing for experiments with a wide variety of learning models.
That brings us to one more example, the Loihi chip described this year by Intel. Superficially, Loihi resembles TrueNorth rather closely. The chip is built as an orthogonal array of cores that contain digital emulations of cell-body functions and SRAM-based synaptic connection tables. Both use digital pulses to carry information. But that is about the end of the similarity.
Instead of one time-multiplexed neuron processor in each core, each Loihi core contains 1,024 simple pulse processors, preconnected in what Intel describes as tree-like groups. Communications between these little pulse processors are said to be entirely asynchronous. The processors themselves perform leaky integration via a digital state machine. Synapse weights vary the influence of each synapse on the neuron body. Connectivity is hierarchical, with direct tree connections within a group, links between groups within a core, and a mesh packet network connecting the 128 cores on the die.
The largest difference between Loihi and TrueNorth is in learning. Each Loihi core includes a microcoded Learning Engine that captures trace data from each neuron’s synaptic inputs and axon outputs and can modify the synaptic weights during operation. The fact that the engine is programmable allows users to explore different kinds of learning, including unsupervised approaches, where the network learns without requiring tagged examples.
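One family of rules such a programmable engine could apply is spike-timing-dependent plasticity (STDP), where decaying "traces" of recent input and output spikes steer the weight updates. The sketch below is an illustration of that general idea only—the function name and parameters are invented for this example, and this is not Loihi's actual microcode:

```python
def update_weight(weight, pre_trace, post_trace, pre_spike, post_spike,
                  lr=0.05, w_min=0.0, w_max=1.0):
    """Toy spike-timing-dependent plasticity (STDP) rule.

    pre_trace and post_trace are decaying records of recent spike
    activity on a synapse's input and its neuron's output. If the
    input fired shortly before the output (pre "predicts" post), the
    synapse is strengthened; the reverse ordering weakens it. The
    weight is clamped to [w_min, w_max]. Purely illustrative, not a
    model of any shipping chip's learning engine.
    """
    if post_spike:
        weight += lr * pre_trace   # potentiate: input preceded output
    if pre_spike:
        weight -= lr * post_trace  # depress: input followed output
    return min(w_max, max(w_min, weight))
```

Because the rule consumes only locally captured traces and spike events, it needs no labeled examples at all, which is what makes this style of learning a candidate for unsupervised operation.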
Where are the Apps?
We have only described two digital implementations of neuromorphic networks. There are many more examples, both digital and mixed-signal, as well as some rather speculative projects such as an MIT analog device using crystalline silicon-germanium to implement synapses. But are these devices only research aids and curiosities, or will they have practical applications? After all, conventional deep-learning networks, for all their training costs and—probably under-appreciated—limitations, are quite good at some kinds of pattern recognition.
It is just too early to say. Critics point out that in the four years TrueNorth has been available to researchers, the most impressive demo has been a pattern recognition implementation that was less effective than convolutional neural networks, and to make things even less impressive, was constructed by emulating a conventional neural network in the TrueNorth architecture. As for the other implementations, some were intended only for neurological research, some have been little-used, and some, like Loihi, are too recent to have been explored much.
But neuromorphic networks offer two tantalizing promises. First, because they are pulse-driven, potentially asynchronous, and highly parallel, they could be a gateway to an entirely new way of computing at high performance and very low energy. Second, they could be the best vehicle for developing unsupervised learning—a goal that may prove necessary for key applications like autonomous vehicles, security, and natural-language comprehension. Succeed or fail, they will create a lot more thesis projects.