It is getting increasingly difficult to attend a conference without hearing far too much about terms like Internet of Things (IoT) and Wearable Computing. But please keep reading. Submerged somewhere in the debate over whether there will be 20 billion or 40 billion IoT devices in five years, an unexpected fatality has gone unremarked. Dead as the proverbial doornail is the assumption that the newest semiconductor technology on the market is the best process technology for your next design. That idea perished as an accidental victim of the IoT avalanche.
Nothing could illustrate this unlamented passing more clearly than the recent TSMC Ecosystem Forum in Silicon Valley. Along with updates on 20, 16, and 10 nm processes, the giant foundry’s executives announced no fewer than five new low-power processes based on older geometries, reaching clear back to 180 nm. Why—and why so many? These questions frame a new reality in small-system design.
It’s not that the IoT clamor invented low-power operation. There have been niche products in the ultra-low-power microcontroller area for years. But the hype about enormous volumes has focused the industry’s attention on two specific aspects of the low-power question: the need to operate reliably at extremely low power drain, and the presence of very low duty cycles in many IoT and wearable nodes.
The demand for very low power is coming from a rather unusual source: energy scavenging. Rather than struggle to fit a battery into a tiny package or an inaccessible space, some IoT designers choose to scavenge energy from the node’s environment. They may use a small photovoltaic cell to harvest energy from ambient light, a thermoelectric pile to convert waste heat, an inertial generator to convert motion into current, or some other means. The common result is a source—reliable, at least in combination with a small rechargeable battery or ultracapacitor—of a very small amount of power.
This strategy eliminates the battery-change problem, if you can keep the node electronics within a power budget ranging from a few hundred µW from a well-endowed thermoelectric stack to only a few µW from a tiny photocell in an ill-lit room.
Low duty cycles, in contrast, present not a constraint but an opportunity. In general, the further you get from the data center, the greater the portion of the time your node can be idle. A data center might aim never to fall below 80 percent utilization. But in his keynote at the Hot Chips conference this year, ARM® CTO Mike Muller estimated the average active time for a CPU browsing the Web at seven percent, and for an MP3 playback task, about three percent. Moving even further away from the core, an IoT node taking a periodic sample of air temperature might only be active a few milliseconds per hour, or around a millionth of the time.
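The arithmetic behind that last figure is easy to check. Here is a back-of-the-envelope sketch; the active and sleep power numbers are invented for illustration, not measurements from any real node:

```python
# Duty-cycle arithmetic for a hypothetical periodic IoT sensor node.
# All figures below are illustrative assumptions, not measured values.

active_time_s = 0.003      # ~3 ms awake per sample (assumed)
sample_period_s = 3600.0   # one sample per hour

duty_cycle = active_time_s / sample_period_s
print(f"duty cycle: {duty_cycle:.2e}")   # roughly a millionth of the time

# Time-averaged power with assumed active and sleep drains:
p_active_w = 10e-3   # 10 mW while sampling and transmitting (assumed)
p_sleep_w = 1e-6     # 1 uW in a data-retention sleep mode (assumed)

p_avg_w = duty_cycle * p_active_w + (1 - duty_cycle) * p_sleep_w
print(f"average power: {p_avg_w * 1e6:.2f} uW")
```

Note that at a duty cycle this low, the average power is almost entirely the sleep-mode drain: the active burst contributes only nanowatts to the average. That is why idle-state leakage, not active power, dominates the battery-life budget.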
It is fairly obvious that low duty cycles should be an opportunity to save energy. The question is how. Investigating that question will bring us back to the issues of process technology and ultra-low power.
The most effective strategy for saving energy—thereby extending battery life—in a low-duty-cycle system is closely related to your grandmother’s strategy for managing electric bills: turn things off when you aren’t using them. But following this good advice often involves considerable planning and some hard decisions.
Turning off the power means saving state into non-volatile memory, unless you can design the node to operate without any persistent state information. But saving state costs time and energy, and writing to flash memory may demand a burst of power too large for an energy scavenger to supply. So a full shutdown is not always feasible, especially if the idle time will be short or unpredictable. In these cases you may want a low-power mode that retains data, both in state machines and in memory (Figure 1). That is where the older process nodes, with their larger transistors, become once again important.
One of the inherent characteristics of those larger transistors is lower leakage current. If you are building a server that is intended to run at fMAX all the time, low leakage isn’t nearly so important as high speed and low dynamic power—neither of which is a strong point for older processes. But if you are designing a low-duty-cycle system that will spend most of its time in a data-retention mode, and you have to extend the life of a tiny battery, you care a lot more about static leakage than about dynamic power or raw performance.
This fact explains the plethora of process options. At 180 nm you get almost insignificant leakage, but relatively high dynamic power and lower fMAX. With 28 nm you get comparatively high leakage—even with the improved low-leakage transistors TSMC has designed for 28 ULP—but much better dynamic power and speed. You can look at your planned duty cycle and pick your process technology.
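The tradeoff behind that choice can be sketched numerically. The operating points below are invented purely to illustrate the crossover; they are not TSMC figures:

```python
# Sketch of the duty-cycle vs. process-geometry tradeoff described above,
# using made-up numbers for two hypothetical processes (not TSMC data).

def avg_power(duty, p_dynamic, p_leak):
    """Time-averaged power for a node active a fraction `duty` of the time."""
    return duty * p_dynamic + (1 - duty) * p_leak

# Hypothetical operating points, in watts:
old_node = {"p_dynamic": 5e-3, "p_leak": 0.05e-6}  # 180 nm-class: slow, tiny leakage
new_node = {"p_dynamic": 1e-3, "p_leak": 2e-6}     # 28 nm-class: fast, leakier

for duty in (1e-6, 1e-3, 0.1):
    p_old = avg_power(duty, **old_node)
    p_new = avg_power(duty, **new_node)
    winner = "older process" if p_old < p_new else "newer process"
    print(f"duty {duty:.0e}: old {p_old:.2e} W, new {p_new:.2e} W -> {winner}")
```

With these assumed numbers, the older process wins at the millionth-of-the-time duty cycle because leakage dominates the average, while the newer process wins once the node is active even a tenth of a percent of the time. The crossover point moves with the actual process figures, which is exactly why a menu of geometries is useful.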
Of course it’s never that simple. At every stage of architecture and implementation there are things you can do to influence the duty cycle. For instance, you can screen interrupts in a state machine or a very low-power MCU like an ARM Cortex®-M0, only interrupting the main CPU for critical events that activate demanding code. You can choose a wireless network that allows the node to sleep most of the time, instead of requiring it to be constantly ready to respond to a message. You can use hardware acceleration to shorten the active portion of the duty cycle. You can push tasks up-stream, onto a wireless hub or into the cloud.
Conversely, you can mess up the duty cycle. For example, you can slow down the clock to save power, and end up having a task run so slowly that you never get a chance to use sleep mode. Or you can choose a means of polling the nodes that, like the proverbial night nurse with the sleeping pill, keeps the nodes continually awake.
All told, finding the best combination of dynamic power, running versus idle versus shutdown time, and process technology can be a challenging quest. Add in ideas like ARM’s big.LITTLE multicore CPU configuration, which could allow you to run a difficult thread quickly on a powerful core, then switch to a slower, much-lower-power core for background tasks. At some point you have almost too many choices.
There is one strategy that is a clear winner regardless of the duty cycle or the processor arrangement. Supply voltage appears as a squared term in the dynamic-power equation, and leakage power also falls steeply as the supply drops. If you can reduce Vdd, your fMAX will go down, but so will your power. This point explains an important feature of TSMC’s ultra-low-power offerings: they have been characterized for operation at very low—in fact, near-threshold—voltages.
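The dynamic term follows the familiar relation P = C·Vdd²·f. A quick sketch, with an assumed switched capacitance and clock rate, shows why voltage is such a powerful lever:

```python
# Dynamic CMOS power follows P = C * Vdd^2 * f: switched capacitance times
# the square of the supply voltage times clock frequency. The capacitance
# and frequency here are illustrative assumptions, not process data.

def dynamic_power(c_switched_f, vdd_v, freq_hz):
    """Dynamic switching power in watts."""
    return c_switched_f * vdd_v ** 2 * freq_hz

C_SW = 100e-12   # 100 pF of switched capacitance per cycle (assumed)
F_CLK = 1e6      # 1 MHz clock (assumed)

p_nominal = dynamic_power(C_SW, 1.0, F_CLK)   # nominal 1.0 V supply
p_near_vt = dynamic_power(C_SW, 0.5, F_CLK)   # near-threshold 0.5 V supply

# Halving Vdd at the same frequency cuts dynamic power to a quarter:
print(f"1.0 V: {p_nominal * 1e6:.0f} uW; 0.5 V: {p_near_vt * 1e6:.0f} uW")
```

Halving the supply quarters the dynamic power before you even touch the clock; in practice fMAX drops too, so real designs trade the two against each other.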
Preparing processes for Vdd in the range of 0.7 V down to 0.5 V required a number of activities, according to TSMC vice president of R&D Cliff Hou. The company focused on getting the best performance it could from high-threshold transistors. But it also had to deal with two other major issues: timing changes and SRAM topologies.
MOSFETs operating near threshold are going to take longer to drive their loads. This simple bit of physics will limit the clock frequency of most processors to 1 MHz or so. But Hou pointed to another issue as well. “Near threshold you get non-linear waveforms,” he explained. “It was necessary to adjust the static timing analysis to account for this, so that chip designers wouldn’t have to change their methodology.”
With significant changes to timing, it was naturally necessary to inspect all the IP for proper operation at near-threshold levels, Hou said. “In general the inspection went well,” he reported. “But we observed that some cells—those with three to four stages, and those using transmission gates—tend to have problems when you approach 0.5V.”
SRAM presented a different issue: different voltage levels require different cell designs. Normal SRAM cells with read/write assist work fine at higher voltages. But near 0.5V, 8- or 10-transistor cells become necessary. Below 0.5V, Hou suggested logic-based memory cells.
If you are determined to reduce Vdd, you don’t have to stop at the transistor’s threshold voltage. There is a whole world of logic—and even analog—design in the sub-threshold region. Here, in effect, the transistors remain turned off, and your circuits work by modulating the leakage current. Naturally this is a world of very energy-efficient, but very slow, logic. At the TSMC Forum ARM’s Muller argued that it isn’t easy, but in the world of IoT it will be important to employ sub-threshold operation.
Muller described a test chip ARM developed with TSMC in an unspecified 40 nm process with low-voltage optimizations. The chip contained Cortex-A5 and Cortex-M0 cores and a dozen independent power domains, allowing engineers to experiment with a wide variety of near- and sub-threshold strategies in different combinations for different parts of different processors.
The CTO began with some warnings about designing such a chip. You have to design the power-gating switches, and the level-shifters that carry signals between the power domains, very carefully, he cautioned. These devices have to work efficiently over a range of voltages so wide that it spans both above and below the logic transistors’ threshold voltages.
At an architectural level, Muller pointed out the huge difference between powering-down a core and keeping it in data-retention mode. “80 percent of our standby power went to the SRAM,” he reported. This result underlines the value of minimizing the amount of state that actually has to be retained between active periods. State costs joules.
Nor was it easy, Muller said, to manage timing using conventional timing-closure techniques. “Timing tools today assume that delay is RC-dominated. But at these voltages gate-delay dominates timing. When you try to close timing, the tools take you in the wrong direction,” Muller lamented.
Once you have an architecture, the next question is operating points. Muller presented a very informative graph (Figure 2) on the subject. As you reduce Vdd below threshold, Muller explained, the power—both dynamic and static—goes down. There is a minimum-power point at about 200 mV, below which the circuits stop working. That is the best place to operate if you are power-limited by, for example, an energy-harvesting device.
Figure 2. Below threshold voltage, dynamic energy-consumption per task continues to decrease until the circuit fails, but energy lost to leakage increases as the circuit slows down. So the energy minimum is above the lowest-power point.
But as power decreases, so does the speed. And the longer a task takes, the more static power is burned during its execution. So the total energy per task does not continue to go down as you lower the voltage. Instead there is a pronounced minimum, below which the total energy per task starts rising again as the voltage decreases. In Muller’s data, that minimum-energy point is at 400 mV.
Muller observed that the curve presents designers with a nice range of alternatives. If power is paramount, a CPU can operate at about 1 kHz at 200 mV and consume about 1 µW. If the goal is minimum energy for a fixed task, the optimum point in this experiment would be about 400 mV, yielding 100 kHz operation at around 100 µW. “And in the near-threshold region around 600 mV, you give up about half the energy savings, but design is much easier to do,” Muller concluded.
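The shape of that curve can be reproduced with a toy model: switching energy falls as CV², while the clock frequency rolls off exponentially below threshold, so the leakage burned per task grows as the circuit slows. Every constant below is invented to mimic the qualitative shape of Figure 2; none is fitted to ARM’s measurements:

```python
import math

# Toy model of energy-per-task vs. supply voltage in sub-threshold operation.
# All constants are illustrative assumptions, not fitted to ARM's data.

C_EFF = 1e-9    # effective switched capacitance per cycle, farads (assumed)
I_LEAK = 50e-9  # leakage current, amps (assumed roughly constant)
F0 = 100e3      # clock frequency at the reference voltage, Hz (assumed)
V_REF = 0.4     # reference supply voltage for F0, volts
N_VT = 0.1      # sub-threshold slope factor times thermal voltage, volts
CYCLES = 1000   # cycles per task (assumed)

def energy_per_task(vdd):
    freq = F0 * math.exp((vdd - V_REF) / N_VT)   # exponential frequency rolloff
    t_task = CYCLES / freq                       # task runs longer at low Vdd
    e_dynamic = C_EFF * vdd ** 2                 # CV^2 switching energy
    e_leakage = vdd * I_LEAK * t_task            # leakage burned while running
    return e_dynamic + e_leakage

volts = [0.20 + 0.01 * i for i in range(41)]     # sweep 0.20 V to 0.60 V
best_v = min(volts, key=energy_per_task)
print(f"minimum-energy point in this model: {best_v:.2f} V")
```

Sweeping the supply shows both effects: push Vdd well below the minimum-energy point and leakage-per-task explodes; push it well above and CV² dominates. The model’s minimum lands mid-sweep, echoing the 400 mV optimum in Muller’s data, though its exact location depends entirely on the assumed constants.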
Unfortunately, reworking the design and selecting the appropriate voltage are not the end of the story. In the sub-threshold region, threshold-voltage variations caused by both process variations and aging effects can be catastrophic. Muller showed that sub-threshold circuits are so sensitive to changes in Vth that even the minuscule drift caused by Bias-Temperature Instability (BTI) in a low-power circuit with a low duty cycle can cause circuit failure in an unfortunately achievable length of time. So designers must think very carefully about variation-tolerance in their circuits.
The need for energy efficiency in IoT nodes has taken us on quite a journey, from exploiting low duty cycles to the depths of sub-threshold transistor behavior. Today some of these things are common design techniques in niche markets, and others are the stuff of PhD theses. But increasingly, as fabless companies race to cash in on the maybe-mythical promise of the IoT, all of these options—presented in a range of processes at widely different geometries—will form a continuum from which designers can select their operating point. That freedom for chip designers will have its own implications for system designers, as they face chip specifications unlike anything they’ve seen before, but also system-level opportunities that used to lie way outside the envelope. Nor, in the long term, will near- and sub-threshold techniques remain in the capillary parts of the IoT. They will be coming to a design mainstream near you.