The case for integrating FPGA fabrics with CPU architectures

August 28, 2017 //By Alok Sanghavi
Physics limits how much further process-geometry shrinkage can boost processor throughput.

As this happens, designers are debating how they can approach designs in such a way that they’re not relying on packing yet-more transistors onto a chip to achieve speed increases. One of the biggest innovations in this industry is going to come from a fundamental reapplication of a technology that has been known and understood for some time; enter the humble FPGA.

FPGAs started life as discrete devices in 1984. At that time Xilinx and Actel began introducing products used primarily in low-volume industrial applications and prototyping, as a useful ‘band aid’ to patch holes in system logic. Altera, Lucent and Agere drove FPGAs into networking and telco applications. Subsequent process shrinkage, reductions in mask costs, and the integration of SRAM blocks, large MACs, sophisticated configurable I/O and banks of SerDes marked a period of growth for FPGAs from 1995. Over the last 10 years FPGAs have continued to proliferate, and prices have fallen to the point that FPGAs are adding significant value even to high-volume applications, in functions previously associated only with DSPs, GPUs, and MCUs.

Latterly, with low-cost, low-power FPGAs, we’re arguably entering the ‘third age’ of FPGA development: architects now integrate FPGAs into data centre systems, for example, as hardware accelerators providing packet inspection, database acceleration and security, machine learning, and software-defined networking.

 

Blending FPGA fabrics into CPUs

This idea has historically met with resistance. SoC developers were initially concerned about size, speed and cost, but that just isn’t the case any more: FPGAs have made order-of-magnitude advances on all three fronts. Nor is it the case that CPUs have advanced to the point where they can reasonably take on the required processing loads alone.

 

The basics

Every modern CPU you’ll encounter essentially employs the load-store, modified-Harvard architecture, wherein instructions and data are stored separately and transmitted along different signal pathways. Instructions are communicated on the control plane, which describes how data will be acted upon and administers ‘housekeeping’ for the overall system.

The constant need to load (often highly complex) instructions and store the resulting data, to keep a number of different hard-wired fabrics on standby to act on data within the same chip, and to switch context (every 100 cycles or so) to carry out different tasks, makes the humble CPU relatively inefficient at handling complex yet largely consistent data-plane operations.

