Abstract
The key to the device-edge co-inference paradigm is to partition models into computation-friendly and computation-intensive parts across the device and the edge, respectively. However, for Graph Neural Networks (GNNs), we find that simply partitioning without altering their structures can hardly achieve the full potential of the co-inference paradigm, due to the varying computation and communication overheads of GNN operations on heterogeneous devices. We present GCoDE, the first automatic framework for GNNs that innovatively Co-designs the architecture search and the mapping of each operation on Device-Edge hierarchies. GCoDE abstracts the device communication process into an explicit operation and fuses architecture search and operation mapping into a unified space for joint optimization. In addition, a performance-awareness approach, used in GCoDE's constraint-based search process, enables effective evaluation of architecture efficiency across diverse heterogeneous systems. We implement the co-inference engine and runtime dispatcher in GCoDE to enhance deployment efficiency. Experimental results show that GCoDE achieves up to $44.9\times$ speedup and $98.2\%$ energy reduction compared to existing approaches across various applications and system configurations.
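The core idea of a unified architecture-and-mapping search space, with communication abstracted as an explicit operation, can be illustrated with a minimal sketch. All operation names, cost numbers, and the exhaustive search below are illustrative assumptions, not GCoDE's actual implementation:

```python
# Hypothetical sketch: each GNN operation carries both an architecture
# choice and a device/edge placement, and every placement switch becomes
# an explicit communication operation. Costs are made-up assumptions.
import itertools

OPS = ["gcn", "gat", "sage"]            # candidate GNN operations (assumed)
PLACES = ["device", "edge"]             # where each operation may run

# Assumed per-operation latency (ms) on each platform, plus a fixed
# communication cost whenever consecutive ops run on different platforms.
COMPUTE_MS = {"device": {"gcn": 8.0, "gat": 15.0, "sage": 10.0},
              "edge":   {"gcn": 2.0, "gat": 4.0,  "sage": 3.0}}
COMM_MS = 6.0

def latency(plan):
    """Estimate end-to-end latency of a (op, placement) sequence,
    counting an explicit communication op at each placement switch."""
    total = 0.0
    prev_place = "device"               # input data originates on the device
    for op, place in plan:
        if place != prev_place:
            total += COMM_MS            # explicit communication operation
        total += COMPUTE_MS[place][op]
        prev_place = place
    return total

def search(depth=3):
    """Exhaustively search the joint space of operations and mappings."""
    best = min(
        (list(zip(ops, places))
         for ops in itertools.product(OPS, repeat=depth)
         for places in itertools.product(PLACES, repeat=depth)),
        key=latency)
    return best, latency(best)

plan, ms = search()
```

Because architecture and mapping are searched jointly, the optimizer can trade a slower operation for one that avoids a placement switch, which a fixed-architecture partitioner cannot do. Real systems would replace the exhaustive enumeration with a constraint-based search and the cost table with measured or predicted performance.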
Abstract (translated)
The key to the device-edge co-inference paradigm lies in partitioning models into computation-friendly and computation-intensive parts across the device and the edge. However, for Graph Neural Networks (GNNs), we find that merely partitioning the model without altering its structure can hardly realize the full potential of the co-inference paradigm, because the computation and communication overheads of GNN operations vary across heterogeneous devices. We propose GCoDE, the first automatic framework that innovatively co-designs GNN architecture search and the mapping of each operation onto device-edge hierarchies. GCoDE abstracts the device communication process as an explicit operation and unifies architecture search and operation mapping in a single space for joint optimization. Moreover, the performance-awareness approach used in GCoDE enables effective evaluation of architecture efficiency in diverse heterogeneous systems. We implement a co-inference engine and runtime dispatcher in GCoDE to improve deployment efficiency. Experimental results show that GCoDE achieves up to 44.9× speedup and 98.2% energy reduction across various applications and system configurations.
URL
https://arxiv.org/abs/2404.05605