Over the last six months, there has been a rapid influx of new inference chip announcements. As each new chip is launched, the only performance indicator the vendor usually provides is TOPS. Unfortunately, TOPS doesn't measure throughput for a given model, which is really the only way to measure an inference chip's performance. TOPS simply states the theoretical peak: TOPS (trillions of operations per second) = the number of MACs in the chip x the MAC clock frequency x 2, since each multiply-accumulate counts as two operations.
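The peak-TOPS arithmetic is simple enough to sketch directly. The chip parameters below (MAC count, clock frequency) are hypothetical, chosen only to illustrate the formula:

```python
# Sketch: theoretical peak TOPS from MAC count and clock frequency.
# The chip parameters used in the example are hypothetical.

def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak TOPS = MACs x frequency x 2 ops per MAC, in trillions."""
    return num_macs * clock_hz * 2 / 1e12

# Example: a hypothetical chip with 4,096 MACs running at 1 GHz.
print(peak_tops(4096, 1e9))  # → 8.192 TOPS
```

Note that this number assumes every MAC does useful work on every cycle; the article's point is that real models rarely achieve that.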
As a result, when an inference chip vendor gives a benchmark at all, it is typically just one, and it is almost always ResNet-50, usually with the batch size omitted! ResNet-50 is a classification benchmark that uses 224 x 224 pixel images, and performance is typically measured with INT8 operations. However, ResNet-50 is a very misleading benchmark for megapixel images, because models that process megapixel images use memory very differently than a model working on tiny 224 x 224 images. This article will explain why, and highlight a more accurate way to benchmark inference on megapixel images.
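To get a feel for why image size changes the memory picture so much, here is a rough sketch comparing the activation memory of a single early convolution layer at 224 x 224 versus a roughly 2-megapixel frame. The channel count (64) and the 1920 x 1080 resolution are illustrative assumptions, not figures from the article:

```python
# Rough sketch: activation memory for one early conv layer with INT8
# activations (1 byte per value). Channel count and resolutions are
# illustrative assumptions.

def activation_bytes(height: int, width: int, channels: int) -> int:
    return height * width * channels  # 1 byte per INT8 activation

small = activation_bytes(224, 224, 64)      # ResNet-50-style input
large = activation_bytes(1080, 1920, 64)    # ~2-megapixel frame

print(small / 1e6)  # ~3.2 MB: can fit in on-chip SRAM on many chips
print(large / 1e6)  # ~132.7 MB: typically spills to off-chip DRAM
```

The ~40x gap in activation size is why a chip that looks fast on 224 x 224 ResNet-50, where intermediate activations stay on-chip, can behave very differently once megapixel activations start moving through DRAM.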
Batch Size Matters
When ResNet-50 throughput is quoted, the batch size is very often not mentioned, even though batch size has a huge effect on throughput! For example, on the Nvidia Tesla T4, ResNet-50 throughput at batch = 1 is roughly 25% of the throughput at batch = 32, a 4x difference. When the batch size is not specified, customers should assume it was measured at a large batch size that maximizes throughput.
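One common reason batching helps is that each batch carries a fixed cost (kernel launch, weight loading, host transfers) that gets amortized across more images as the batch grows. The sketch below is an illustrative cost model with made-up constants, not measured T4 data:

```python
# Illustrative model (made-up constants, not measured data): a fixed
# per-batch overhead is amortized across images, so throughput rises
# with batch size.

def throughput(batch: int, fixed_ms: float = 1.0, per_image_ms: float = 0.3) -> float:
    """Images/second for a hypothetical accelerator with a fixed per-batch cost."""
    latency_ms = fixed_ms + batch * per_image_ms
    return batch / latency_ms * 1000.0

b1 = throughput(1)    # ~769 img/s
b32 = throughput(32)  # ~3019 img/s
print(b32 / b1)       # ~3.9x, in the ballpark of the T4 example above
```

The flip side is latency: at batch = 32 each image waits for the whole batch, which is why batch = 1 numbers matter for real-time applications even though they look worse.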
Even more interesting is that no customer actually uses ResNet-50 in any real-world application. Customers have image sensors that produce megapixel images, and they want to run real-time object detection and recognition on them using models such as YOLOv2 or YOLOv3. Both larger images and more challenging models yield higher prediction accuracy.