TI’s multicore DSPs sink to new power lows
Texas Instruments Inc. (TI) showed off its highest-performing, lowest-power multicore digital signal processors (DSPs), based on its TMS320C66x DSP generation, at Supercomputing 2011 on Monday (November 14).
TI said its C66x KeyStone-based multicore DSPs were its highest performing yet, at 16 GFLOPs/W, while also being fairly straightforward to program, making them well suited to researchers working in oil and gas exploration, financial modeling and molecular dynamics.
“We have a really good power/performance profile,” said Arnon Friedmann, TI’s business manager of Multicore DSP, adding that the firm also offered a plethora of connectivity options which weren’t widely available in the HPC space.
“What we’re offering is already production level, we know it’s viable and it gives customers more teraflops, using less power in less space,” he said.
TI’s DSP constantly converts signals from analog to digital, manipulates them digitally, and then converts them back again to analog form. Aimed squarely at the HPC accelerator market, the offerings can also act as media accelerators, and are widely used in platforms for medical imaging, avionics, military defense systems and more.
TI also provides free optimized libraries for HPC, which the firm claims make it easier to achieve maximum performance without having to spend a lot of time optimizing code. The DSPs also support standard programming tools such as C and OpenMP, which are open, unlike Nvidia’s proprietary CUDA model.
Indeed, TI’s programming model is a lot more similar to Intel’s method for programming multiple cores.
The firm said the method provided “an exceptional level of density and integration” as well as a number of options for high speed, low latency and socket to socket interconnect.
“We’ve learned these lessons from having sold millions of DSPs into the market already,” said Friedmann, who also discussed TI’s SmartReflex feature, which modulates the power supplied to the chip so that it gets exactly the amount needed at the right time.
TI has teamed up with telecom computing blade maker Advantech to develop its DSPC-8681 multimedia processing engine (MPE), which consists of a half-length PCIe card with more than 500 GFLOPs of performance, using just 50W of power. The firms also plan to come out with full-length cards which aim to deliver one and two teraflops of performance.
The DSPC-8681 PCIe card includes four C6678 multicore DSPs, while the upcoming PCIe cards will include either eight C6678 multicore DSPs to achieve one teraflop or four TCI6609 multicore DSPs to achieve two teraflops.
TI says the C6678’s eight 1.25-GHz DSP cores will be able to deliver 160 GFLOPs at 10W. TI’s forthcoming TCI6609 multicore DSP will offer developers 4X the performance of its C6678 multicore DSP, achieving 512 GFLOPs at just 32W.
The TCI6609 is set to sample in 2012.
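For readers who want to check how the quoted figures relate, here is a minimal Python sketch that works only from the numbers given above. The dictionary and function names are illustrative, not anything from TI, and the card totals are a naive "per-device peak times device count" estimate that ignores any board-level overhead.

```python
# Peak figures as quoted in the article: (GFLOPS, watts) per device.
# The names below are illustrative only and do not come from TI.
QUOTED = {
    "C6678": (160.0, 10.0),
    "TCI6609": (512.0, 32.0),
}


def gflops_per_watt(part):
    """Power efficiency implied by the quoted peak and power figures."""
    gflops, watts = QUOTED[part]
    return gflops / watts


def card_peak_gflops(part, count):
    """Naive aggregate peak: per-device peak multiplied by device count."""
    return QUOTED[part][0] * count


if __name__ == "__main__":
    for part in QUOTED:
        print(part, gflops_per_watt(part), "GFLOPS/W")
    print("4 x C6678:", card_peak_gflops("C6678", 4), "GFLOPS")
    print("8 x C6678:", card_peak_gflops("C6678", 8), "GFLOPS")
    print("4 x TCI6609:", card_peak_gflops("TCI6609", 4), "GFLOPS")
```

On these numbers both parts work out to 16 GFLOPS/W, four C6678 devices give 640 GFLOPS (in line with the "more than 500 GFLOPs" quoted for the half-length card), and the eight-C6678 and four-TCI6609 configurations come to 1,280 and 2,048 GFLOPS, matching the one- and two-teraflop targets. The ratio of the two quoted peak figures is 3.2.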
Meanwhile, TI has also announced that the University of Texas at Austin has ported its libflame linear algebra library for scientific computing to the firm’s TMS320C6678 multicore DSP, showing how easily such a port can be done.