The results were presented at the Linley Spring Processor Conference by Vinay Mehta, Flex Logix’s inference technical marketing manager.
The InferX X1 has a very small die: one-seventh the area of Nvidia’s Xavier NX and one-eleventh the area of Nvidia’s Tesla T4. Despite being so much smaller, the InferX X1 delivers latency similar to the Xavier NX on YOLOv3, an open-source model that many customers plan to use. On two real customer models, the InferX X1 was much faster, as much as 10x faster in one case, says Flex Logix.
In terms of price/performance, measured as streaming throughput divided by die size, the InferX X1 is 2-10x better than the Tesla T4 and 10-30x better than the Xavier NX, according to the company.
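Because the metric is a simple ratio, a small die can score well even without winning on raw throughput. The sketch below illustrates that arithmetic. It is not Flex Logix’s benchmarking methodology. The function names are hypothetical, and any throughput or area values passed in are placeholders rather than measured results. The area ratios of 7 (Xavier NX) and 11 (Tesla T4) come from the die-size comparison above.

```python
def throughput_per_area(throughput: float, die_area: float) -> float:
    """Streaming throughput divided by die area (hypothetical helper)."""
    if die_area <= 0:
        raise ValueError("die_area must be positive")
    return throughput / die_area


def relative_advantage(throughput_ratio: float, area_ratio: float) -> float:
    """How many times better chip A is than chip B on throughput per area.

    throughput_ratio: throughput_A / throughput_B
    area_ratio:       area_B / area_A (e.g. 7.0 if A is 1/7th the area of B)
    """
    # (T_A / A_A) / (T_B / A_B) == (T_A / T_B) * (A_B / A_A)
    return throughput_ratio * area_ratio


# Hypothetical example: chip A matches chip B's throughput with 1/7th the die area.
print(relative_advantage(throughput_ratio=1.0, area_ratio=7.0))  # 7.0
```

In other words, a chip that is one-seventh the size of a competitor only has to match that competitor’s throughput to come out 7x ahead on this metric. Any advantage beyond that has to come from higher throughput.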
“Customers expect that they can use performance on ResNet-50 to compare alternatives. These benchmarks demonstrate that the relative performance of an inference accelerator on one model does not apply to all models,” said Geoff Tate, CEO and co-founder of Flex Logix. “Customers should really be asking each vendor they evaluate to benchmark the model that they will use to find out the performance they will experience. We are doing this for customers now and welcome more – we can benchmark any neural network model in TensorFlow Lite or ONNX.”
The InferX X1 edge inference co-processor is based on Flex Logix’s nnMAX architecture, integrating four tiles with a total of 4K MACs and 8 MB of L2 SRAM. The InferX X1 connects to a single x32 LPDDR4 DRAM. Four lanes of PCIe Gen3 connect it to the host processor, and a x32 GPIO link is available for hosts without PCIe. Two X1s can work together to increase throughput by up to 2x. The InferX X1 is completing final design checks and will tape out soon, with sampling expected in Q3 2020 both as a chip and on a PCIe board.
Flex Logix - https://flex-logix.com