From Glue Logic to Subsystem: Altera’s Second Decade

The year was 1994. The U.S. Space Shuttle fleet was in regular service. The world watched in fascination as comet Shoemaker-Levy 9 broke up and crashed into Jupiter’s atmosphere. The Channel Tunnel realized a centuries-old dream, connecting England and France. Achieving the undreamt-of, the Provisional Irish Republican Army ceased military hostilities in Northern Ireland. In the electronics industry, Intel’s recently announced Pentium processor overcame the interruption of the famous Pentium bug and dominated personal computing. And Altera began its second decade.

From beginnings as simple logic chips, Programmable Logic Devices (PLDs) had in ten years ridden Moore’s Law to far greater complexity—logic capacities exceeding the equivalent of 10K gate-array gates. Along the way, PLDs had divided into two architectural camps: Complex PLDs (CPLDs) based on the Boolean sum-of-products and programmed via EEPROM cells, and FPGAs, based on look-up tables (LUTs) implemented as tiny SRAMs.
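
The difference between the two camps can be sketched in a few lines of Python. This is only an illustration, not a model of any particular device: the example function and the names `sum_of_products`, `build_lut`, and `lut_read` are hypothetical. The first function evaluates logic the CPLD way, as an OR of AND terms. The other two mimic a LUT by storing the truth table once (as a LUT's SRAM bits would) and then using the inputs as an address.

```python
from itertools import product


def sum_of_products(a, b, c, d):
    """CPLD-style view: an OR of AND terms (an arbitrary example function)."""
    return (a and b) or (c and not d)


def build_lut(func, n_inputs=4):
    """FPGA-style view: precompute the truth table, one stored bit per input combination."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(lut, *bits):
    """Evaluate the function by using the input bits as a memory address."""
    addr = 0
    for bit in bits:
        addr = (addr << 1) | bit  # first input ends up as the most significant address bit
    return lut[addr]


# "Programming" the LUT means filling its 16 storage bits.
LUT = build_lut(sum_of_products)
```

Any 4-input function fits in the same 16 bits, which is why a LUT's delay does not depend on how complicated the function is, whereas a sum-of-products array handles wide fan-in more naturally.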

The two architectures had begun to specialize. CPLDs (Figure 1) were preferred for circuits that had high fan-in or rigorous timing constraints. FPGAs favored designs that were rich in registers. A less reliable generalization held that CPLDs were more friendly to designers who thought in terms of combinatorial blocks and state machines—by 1994, the old school. FPGAs favored the younger wave of designers who visualized logic as text: Verilog or VHDL.

Figure 1. CPLD Architecture

The two architectures shared one trend: as the logic capacity of the chips increased, PLD applications began to change. When CPLDs had been limited to the equivalent of a few thousand logic gates, designers used the chips mainly to implement simple functions next to a central microprocessor or microcontroller (MCU) chip. The smaller devices served as address decoders, interrupt controllers, or bus extenders. Larger devices might be intelligent direct-memory-access controllers or fast state machines, handling some real-time operation for which software was too slow.

But 10K gates opened up new possibilities. A designer could build an 8-bit MCU in about 3K gates, or a simple 16-bit CPU core like the already-ancient 8086 in just over twice that many gates. With 10K-20K gates, designers could build small CPU-based subsystems, such as a bus interface and programmable serial-interface controller, all inside the PLD. Such a design would be slower and less energy-efficient than a standard-product non-PLD solution, but it would be completely user-defined and reprogrammable. Notably, most such subsystem designs were register-rich, and many of them needed several kilobytes (KB) of internal SRAM.

Altera Moves Towards FPGAs

In this environment, Altera designers concluded that the market needed a new kind of product. Like FPGAs, it should employ fine-grained logic elements, each containing a LUT and a register. But like CPLDs, it should use a hierarchy of deterministic interconnect, to keep timing predictable and simple. And Altera added a third element, borrowed from certain types of gate arrays. This new element was a set of SRAM blocks embedded in the architecture, designed to serve as buffer or scratchpad memory, as register files, or as look-up tables for complex function generators. The result of this thinking was, in 1995, Altera’s first FPGA-like family, the FLEX® 10K device family (Figure 2).

Figure 2. FLEX 10K Device Block Diagram
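
To make the idea of an embedded SRAM block acting as a look-up table for a complex function concrete, here is a minimal Python sketch. It assumes a block organized as 256 words of 8 bits, a size chosen for illustration rather than taken from the article, and the names `ROM` and `multiply_4x4` are hypothetical. The two 4-bit operands are concatenated into an 8-bit address, and the stored word at that address is the product.

```python
# Assumed organization: 256 words x 8 bits (illustrative, not from the article).
# Address = {a[3:0], b[3:0]}; stored word = a * b (at most 15 * 15 = 225, which fits in 8 bits).
ROM = [(addr >> 4) * (addr & 0xF) for addr in range(256)]


def multiply_4x4(a, b):
    """Multiply two 4-bit values with a single memory read instead of logic gates."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    return ROM[(a << 4) | b]
```

The same pattern works for any function of eight input bits, such as a sine table or a nonlinear correction curve: the cost is one memory read, regardless of how awkward the function would be to build from LUT-sized logic elements.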

In another break from company tradition, Altera engineers used SRAM cells, rather than their trusted EEPROM technology, to hold the FLEX chip’s configuration data. This choice meant that FLEX, like SRAM-based FPGAs, had to be configured from an external memory each time the system powered up. But it also meant that, without the need for space-gobbling EEPROM cells, FLEX logic elements would be smaller. And, in an implication that would take a decade to bear fruit, the switch to SRAM meant that Altera could build its devices on leading-edge processes as soon as each new process was available, instead of waiting two years or more for the foundry to develop embedded EEPROM for that process. That change would eventually make FPGAs among the first designs in the semiconductor industry to use each new process node.

New Application, New Techniques

The FLEX 10K device and its successors added momentum to the PLD industry’s eternal wheel of evolution (Figure 3). New architecture made possible new applications, which demanded new tools and techniques, which increased the demand for even newer architectures.

Figure 3. Circle of FPGA Evolution

With as many as 250K gate-equivalents, up to 40 Kbits of SRAM, and the ability to implement some quite complex state machines and arithmetic functions using its SRAM blocks, the FLEX device family went far beyond traditional interface and glue applications. Users could contemplate putting an entire subsystem on one FLEX chip: an Ethernet interface with its media access controller and a protocol off-load engine, for example, or a signal-processing accelerator with its bus interface, local storage, and hardware finite-impulse-response (FIR) filter. Despite Altera’s initial desire to differentiate FLEX devices from FPGAs, FPGA was becoming the generic term for large PLDs, and Altera increasingly adopted the term too.

As users began to think in subsystems, they found new needs. Subsystems often required multiple clocks, each clock synchronized to its own phase-locked loop (PLL). But discrete PLLs were expensive and ate up board space. So in 1996 Altera introduced a FLEX device with internal, programmable PLLs. The capability saved board space and could improve clock quality on the chip. But more important, it foreshadowed an enormous trend: as FPGAs became subsystems, they began to integrate blocks of commonly-used, non-PLD hardware to improve the overall system design.

Design techniques were also changing. In the late 1990s, FLEX devices with over 100K gates had simply outgrown traditional, clean-sheet-of-paper design techniques. Design teams broke their work into blocks, and tried to reuse previously-designed blocks instead of creating new ones. And teams began to license blocks from third-party intellectual-property (IP) developers. Altera began building IP libraries. And design tools adapted, adding features—such as the ability to set the parameters on a reusable IP block through a simple user interface, or the ability for a third-party vendor to provide an IP block in encrypted form—to help with IP reuse.

The added capacity brought with it another issue: debug. For small PLDs, the most common debug technique had been the smoke test: plug in the chip and turn on the power. At 100K gates, the probability that the chip would work on the first try, or that the designer could learn anything useful by watching the external pins with a logic analyzer, was about nil. The smoke test had become useless. Designers began to mimic their relatives in the ASIC world, testing their register-transfer-level (RTL) code with an RTL simulator before they synthesized it into a netlist for the FPGA.

But RTL that worked in simulation could still break in practice. Designers needed a way to observe the internal workings of the chip while it was running. With SRAM-programmed FPGAs it was possible to write ad hoc debug circuitry into the RTL, resynthesize, and retest. But this was laborious and time-consuming. So in 1999, Altera introduced the SignalTap™ logic analyzer: a hardware feature that allowed users to monitor every register in a FLEX device while the system was running.

Enter the Processors, and The Great Crash

One of the most requested IP blocks for these large FPGAs was some kind of microprocessor core. CPU core IP was commonly available, both as hardened netlists and as RTL, for ASIC designs. But CPUs had proved a challenge for FPGAs. Some structures vital to CPU cores, such as arithmetic units and multi-port register files, fit poorly into FPGA hardware. And timing closure on a CPU core’s critical paths could be difficult.

So in 2000, Altera introduced Nios®, a microprocessor core designed from the ground up, by FPGA designers, to be implemented in a FLEX device. Unlike cores intended for cell-based ASICs, the 16-bit RISC Nios processor could be small, fast (50 MHz was blazingly fast for a CPU in FPGA fabric then), and easy to implement. An ecosystem of bus interfaces, peripherals, and software quickly began to form around the core.

Then, in mid-2000, the lights went out. The dot-com bubble, which had fed not only ridiculous stock prices but a huge build-out of Internet capacity, began its catastrophic collapse, carrying away much of the demand for semiconductors as it fell. FPGA companies were particularly hard-hit, as the bubble had made network-equipment vendors dominant users of FPGAs. Along with retrenching financially, FPGA vendors became more aggressive in pursuing applications outside the networking community.

In 2001, amid the rubble of the burst bubble, Altera chose the Embedded Systems Conference to introduce a new device, destined to be more important as a harbinger than long-lived as a product. The Excalibur™ processor united FPGA fabric from the APEX™ device family (an evolution of the FLEX device architecture) with a cell-based ARM922T CPU core running at up to 200 MHz. The device represented a conscious attempt to tune an FPGA for embedded applications, and a major step forward in embedding key IP blocks into FPGA hardware.

Trends in IP

As FPGAs grew and IP reuse became more established, more trends began to emerge. Certainly design teams wanted processor cores they could drop into a design. Also, they wanted cores for industry-standard interfaces. But new needs were showing up as well.

For example, starting in the bandwidth-starved communications industry, system designers were beginning to abandon parallel interfaces with separate clocks in favor of high-speed, self-clocking serial I/O. Signals between chips on a board were beginning to resemble the signals coming from a disk read-amplifier or a satellite receiver. And the transceiver circuits on these chips were correspondingly complex mixed-signal blocks, often running at gigahertz speeds. See Figure 4.

Designing these transceivers was a specialized task, outside the expertise of most FPGA users, and the circuits themselves were unsuited to implementation in programmable logic. Accordingly, in 2001 Altera announced the Mercury™ device family of FPGAs, with 1.25 Gbps transceivers built in as hard IP. The blocks included both the 1.25 GHz analog drivers and receivers and the mixed-signal clock-data recovery circuits that recreated the original data from the received waveform.

A similar process was taking place in the world of signal processing, both for wireless communications and in military applications. The most critical building block in these designs, the multiply-accumulator, was particularly taxing for programmable logic. Beginning with dedicated 8×8 multiplier sub-blocks in Mercury devices, Altera moved on to embed full digital signal processing (DSP) building blocks when it introduced the Stratix® device architecture in 2002.
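
To show why the multiply-accumulator matters so much, the following Python sketch builds a FIR filter out of nothing but repeated multiply-accumulate steps. It is an illustrative software model under simple assumptions (integer samples, a direct-form structure), and the names `mac` and `fir_filter` are hypothetical rather than anything from Altera's tools.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: the operation a DSP block implements in hardware."""
    return acc + a * b


def fir_filter(samples, coeffs):
    """Direct-form FIR filter: each output is a chain of MAC steps over the delay line."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        # Shift the newest sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for tap, coeff in zip(delay_line, coeffs):
            acc = mac(acc, tap, coeff)
        outputs.append(acc)
    return outputs
```

For example, `fir_filter([1, 0, 0, 0], [1, 2, 3])` returns `[1, 2, 3, 0]`, the filter's impulse response. Every output sample costs one MAC per tap, so a filter with many taps at a high sample rate needs many multipliers working in parallel, which is expensive to build from general-purpose logic elements.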

Figure 4. A Mercury Device Incorporating Clock-Recovery Circuits for High-Speed Serial Interconnect

An Evolving Methodology

FPGAs were growing in capacity, and increasingly they included significant blocks of hard IP, such as PLLs, debug controllers, serial transceivers, and, in some applications, CPU cores. Customers increasingly created their designs by tying together blocks of previously-designed IP. Some of these designs were application-specific accelerators, often for packet processing or signal processing, that implemented one powerful pipeline. But now another architecture was emerging as well: designs centered on a CPU core, with a system bus emanating from the CPU as the backbone upon which the other blocks hung.

This design approach emphasized selection and verification of IP, and making the right connections, over creating new subsystems in Verilog. And once again Altera responded, inventing SOPC Builder. This tool was an interactive, guided user interface for constructing CPU-based systems on an FPGA. The user indicated which blocks to assemble where, and the tool generated the necessary RTL.

Once again evolutionary change was setting the stage for revolution. PLDs had grown from glue logic and bus interface components to self-contained packet-processing, signal-processing, or CPU-based subsystems. With enough logic capacity, the right IP, and the appropriate tools, FPGAs were ready to move from their subsystem role to become the heart of the system.

Categories: All, System Architecture / Author: Ron Wilson
