
Hard-wired floating point changes FPGAs
Previously, floating-point DSP designs such as radar or pattern matching had to be converted to fixed point, taking up the DSP blocks and up to 700 logic elements per block. By adding a layer of hardened multipliers and adders to the existing DSP blocks in the architecture, and tweaking the interconnect, designs can be implemented from the C output of tools such as OpenCL and Simulink directly in the DSP blocks without additional logic usage, says Altera architect Martin Langhammer. This also allows FPGAs to run more of the high-performance computing algorithms that currently use GPU arrays.
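To make the cost of that conversion concrete, here is a minimal sketch of a fixed-point multiply. The 16-bit format with 15 fractional bits and the helper names are assumptions chosen for illustration, not anything taken from Altera's tools.

```python
def to_fixed(x, frac_bits=15, total_bits=16):
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q, frac_bits=15):
    """Convert a fixed-point integer back to a Python float."""
    return q / (1 << frac_bits)


def fixed_mul(qa, qb, frac_bits=15):
    """Multiply two fixed-point values and rescale the double-width product."""
    return (qa * qb) >> frac_bits


# Mid-range values survive the round trip exactly.
print(from_fixed(fixed_mul(to_fixed(0.5), to_fixed(0.25))))      # 0.125
# Small values underflow to zero: the true product is 2e-6.
print(from_fixed(fixed_mul(to_fixed(0.001), to_fixed(0.002))))   # 0.0
# Values outside the format's range saturate.
print(to_fixed(1.5))                                             # 32767
```

The designer has to pick the scaling for every signal by hand and check each one for underflow and saturation, which is the step that a native floating-point datapath removes.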
The IEEE 754-compliant hardened blocks are shipping in 20nm Arria 10 FPGAs, but the software to use them will not be available for a few more months. The blocks deliver 1.5TFLOPS of processing in the Arria devices and will provide over 10TFLOPS in the next-generation Stratix 10 devices. That increase comes from the move to a 14nm process, larger devices and architectural changes, he says.
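A rough way to see how such peak figures arise is to multiply block count, clock rate and operations per block. The sketch below assumes each DSP block completes one multiply and one add per cycle. The 500MHz clock and the block counts are hypothetical values for illustration, not Altera specifications.

```python
import math


def peak_flops(dsp_blocks, clock_hz, flops_per_block_per_cycle=2):
    """Peak throughput if every block issues a multiply and an add each cycle."""
    return dsp_blocks * clock_hz * flops_per_block_per_cycle


def blocks_needed(target_flops, clock_hz, flops_per_block_per_cycle=2):
    """Smallest block count that reaches target_flops at the given clock."""
    return math.ceil(target_flops / (clock_hz * flops_per_block_per_cycle))


# Hypothetical device: 1,000 blocks at 500MHz.
print(peak_flops(1000, 500_000_000))                  # 1000000000000 (1 TFLOPS)
# Blocks required for 1.5 TFLOPS at the same assumed clock.
print(blocks_needed(1_500_000_000_000, 500_000_000))  # 1500
```

Under these assumptions, throughput scales linearly with both block count and clock rate, which is consistent with larger devices and a faster process both contributing to the Stratix 10 figure.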
“I had to find a way to do it for close to free for it to work, so the floating point multiplier is overlaid on the 27 x 27 fixed point multipliers,” said Langhammer. “The adder is more complex as it’s a separate structure – I had to design it to target the technology library, the maximum frequency and the routing that was available in the DSP block as well as the block aspect ratio of the block. The adder not only had to fit into the spare space in the block but as routing is more expensive than the logic I had to find a way to reuse all that wiring for the floating point adder.”
“In this way we were able to get floating point on the DSP block – it’s cheap enough to put on every device and doesn’t affect the fixed point power or performance,” he said. “This opens up a new market for people that want to do floating point.”
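The overlay is plausible because IEEE 754 single precision has a 24-bit significand (23 stored bits plus a hidden leading one), so the core of a floating-point multiply is an integer product that fits within a 27 x 27 multiplier. The following is a software sketch of that idea, not a model of Altera's circuit. The function names are hypothetical, and it handles only normal numbers, ignoring zeros, subnormals, infinities, NaNs and exponent overflow.

```python
import struct


def f32_bits(x):
    """Return the 32-bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Return the single-precision value encoded by the 32-bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def f32_mul(a, b):
    """Multiply two normal float32 values using integer operations on the fields."""
    ba, bb = f32_bits(a), f32_bits(b)
    sign = (ba ^ bb) & 0x80000000
    exp = ((ba >> 23) & 0xFF) + ((bb >> 23) & 0xFF) - 127  # remove one bias
    ma = (ba & 0x7FFFFF) | 0x800000   # 24-bit significand with hidden one
    mb = (bb & 0x7FFFFF) | 0x800000
    prod = ma * mb                    # integer product, at most 48 bits
    if prod & (1 << 47):              # significand product in [2, 4)
        shift = 24
        exp += 1
    else:                             # significand product in [1, 2)
        shift = 23
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant == 1 << 24:           # rounding carried out of the significand
            mant >>= 1
            exp += 1
    return bits_f32(sign | (exp << 23) | (mant & 0x7FFFFF))


print(f32_mul(1.5, -2.25))   # -3.375
```

Only the `ma * mb` line needs a wide multiplier. The exponent addition, normalization shift and rounding are narrow operations by comparison, which fits Langhammer's point that the multiplier could be overlaid while the adder needed separate design work.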
The other element is the design of the floating point units to support large matrix operations. “We designed hardware recursive structures for the vector mode that greatly reduces the latency to seamlessly combine thousands of operators, reusing the existing fixed point routing,” he said. The structures are self-timed and self-aligned, with timing adjustment registers that avoid problems with data dependencies while minimizing the latency of large vector calculations.
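The article does not describe the recursive structures in detail. As one illustration of why recursion helps latency, the sketch below computes a dot product with a pairwise recursive reduction and reports how many adder levels the result passes through. The tree arrangement and the function name are assumptions, not Altera's published design.

```python
def dot_recursive(a, b):
    """Return (dot product, adder levels) using pairwise recursive reduction."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")

    def partial_sum(lo, hi):
        # Base case: one multiplier output, no adders used yet.
        if hi - lo == 1:
            return a[lo] * b[lo], 0
        mid = (lo + hi) // 2
        left, left_levels = partial_sum(lo, mid)
        right, right_levels = partial_sum(mid, hi)
        # One adder combines the two halves and adds one level of latency.
        return left + right, max(left_levels, right_levels) + 1

    return partial_sum(0, len(a))


print(dot_recursive([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # (70.0, 2)
print(dot_recursive([1.0] * 1024, [2.0] * 1024))                  # (2048.0, 10)
```

A 1,024-element dot product passes through 10 adder levels here, against 1,023 sequential additions if the adders were simply chained end to end. Real hardware also has to align the pipeline delays of the two halves at each adder, which is the role the article attributes to the timing adjustment registers.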
“The two big innovations are that we got floating point on cost effectively and the new vector structures that allow us to put together all the blocks however you want,” he said.
The integration of hardened floating-point DSP blocks in Altera FPGAs and SoCs can reduce development time by upwards of 12 months. Designers can translate their DSP designs directly into floating-point hardware, rather than converting their designs to fixed point. As a result, timing closure and verification times are cut, which will benefit the move to the larger Stratix 10 devices, says Langhammer.
DSP Builder Advanced Blockset offers a model-based design flow that allows designers to go from system definition and simulation to system implementation using the industry-standard MathWorks Simulink tools. Altera also offers a publicly available C-based, high-level OpenCL design flow that targets FPGAs.
“The implementation of IEEE 754-compliant floating-point DSP blocks in our devices is truly a game-changer for FPGAs,” said Alex Grbic, director of software, IP and DSP marketing at Altera. “With hardened floating point, Altera FPGAs and SoCs offer a performance and power efficiency advantage over microprocessors and GPUs in an expanded range of applications.”
The implementation has been independently verified by BDTI for minimal impact on power consumption and utilization, says Langhammer, and the architecture will be published in peer-reviewed journals later in the year. The 20nm Arria 10 FPGAs with hardened floating-point DSP blocks are available now. Floating-point design flows that target the hardened floating-point DSP blocks in Arria 10 devices, including demonstrations and benchmarks, will be available in the second half of 2014. The 14nm Stratix 10 devices with the floating-point DSP blocks are due in 2015.
