Abstract
In this paper, we propose a multi-task convolutional neural network (CNN) architecture optimized for a low-power automotive-grade SoC. We introduce a network based on a unified architecture in which the encoder is shared between two tasks, namely detection and segmentation. The proposed network runs at 25 FPS at 1280x800 resolution. We briefly discuss the methods used to optimize the network architecture, such as using the native YUV image directly, optimizing layers and feature maps, and applying quantization. We also focus on memory bandwidth in our design, as convolutions are data intensive and most SoCs are bandwidth bottlenecked. We then demonstrate the efficiency of our proposed network on a dedicated CNN accelerator, presenting the key performance indicators (KPIs) for the detection and segmentation tasks obtained from hardware execution, together with the corresponding run-time.
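To give a rough sense of why feeding the native YUV image matters for memory bandwidth, the arithmetic below estimates the input-read traffic at the stated 1280x800 resolution and 25 FPS. It is a minimal sketch, not the authors' analysis: the abstract does not name the YUV layout, so 8-bit YUV 4:2:0 (12 bits per pixel) and an RGB888 baseline (24 bits per pixel) are assumptions, and the helper names `frame_bytes` and `input_bandwidth_mb_per_s` are hypothetical.

```python
# Resolution and frame rate are taken from the abstract.
WIDTH, HEIGHT, FPS = 1280, 800, 25

# Assumed pixel formats (not stated in the abstract):
YUV420_BPP = 12  # 8-bit YUV 4:2:0: full-res luma plus quarter-res chroma planes
RGB888_BPP = 24  # 8-bit RGB, three full-resolution channels


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def input_bandwidth_mb_per_s(width: int, height: int,
                             bits_per_pixel: int, fps: int) -> float:
    """Traffic needed to read every input frame once, in MB/s (1 MB = 1e6 bytes)."""
    return frame_bytes(width, height, bits_per_pixel) * fps / 1e6


yuv_mb_s = input_bandwidth_mb_per_s(WIDTH, HEIGHT, YUV420_BPP, FPS)
rgb_mb_s = input_bandwidth_mb_per_s(WIDTH, HEIGHT, RGB888_BPP, FPS)

print(f"YUV 4:2:0 input: {yuv_mb_s:.1f} MB/s")
print(f"RGB888 input:    {rgb_mb_s:.1f} MB/s")
```

Under these assumptions the YUV input needs half the read traffic of an RGB input and avoids writing a converted RGB frame back to memory. This covers only the input image; intermediate feature maps, which the abstract also targets, typically dominate total bandwidth and are not modeled here.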
URL
https://arxiv.org/abs/1904.05673