Wave Computing: A Coarse Grain Reconfigurable Array (CGRA) for Statically Scheduled Data Flow Computing

May 03, 2017 // By Chris Nicol
This paper proposes the use of coarse grained reconfigurable array (CGRA) architectures to accelerate the dataflow computations used in deep neural network training and inference. The paper discusses the problems with other parallel acceleration systems, such as massively parallel processor arrays (MPPAs) and heterogeneous systems based on CUDA and OpenCL, and argues that CGRAs with autonomous computing features deliver improved performance and computational efficiency. The paper describes the tools needed for efficient compilation of dataflow graphs to the CGRA architecture, and outlines Wave Computing’s WaveFlow software (SW) framework for the online mapping of models from popular frameworks such as TensorFlow, MXNet and Caffe.
asynchronous, clock-less logic, self-timed, TensorFlow
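To make the notion of "mapping a model's dataflow graph" concrete, the sketch below shows how a framework graph can be traced and its operations and tensor edges enumerated, which is the kind of structure a CGRA mapper would then schedule onto processing elements. This is a minimal illustration assuming TensorFlow 2.x, not Wave Computing's WaveFlow API; the function and tensor names are illustrative only.

```python
# Minimal sketch (hypothetical, not the WaveFlow framework): extract the
# dataflow graph of a small TensorFlow model so its nodes and edges can be
# handed to an accelerator mapper. Assumes TensorFlow 2.x.
import tensorflow as tf

@tf.function
def small_model(x):
    # A toy two-op model: matrix multiply followed by ReLU.
    w = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    return tf.nn.relu(tf.matmul(x, w))

# Trace the Python function into a concrete dataflow graph.
concrete = small_model.get_concrete_function(
    tf.TensorSpec(shape=[1, 2], dtype=tf.float32))
graph = concrete.graph

# Enumerate nodes (operations) and edges (tensors); a static scheduler
# would assign these to compute elements and route the tensor edges.
for op in graph.get_operations():
    inputs = [t.name for t in op.inputs]
    outputs = [t.name for t in op.outputs]
    print(f"{op.type:12s} inputs={inputs} outputs={outputs}")
```

The printed operation list (MatMul, Relu, constants, placeholders) is the dataflow graph view that a compilation flow for a statically scheduled architecture would consume.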