Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks

2021-04-22 17:32:14

Yuansheng Hua, Lichao Mou, Jianzhe Lin, Konrad Heidler, Xiao Xiang Zhu

arXiv_CV

arXiv_CV Recognition Memory_Networks Attention Prediction Pose

Abstract

Aerial scene recognition is a fundamental visual task and has attracted increasing research interest in the last few years. Most current research focuses on categorizing an aerial image into a single scene-level label, while in real-world scenarios a single image often contains multiple scenes. Therefore, in this paper, we propose to take a step forward to a more practical and challenging task, namely multi-scene recognition in single images. Moreover, we note that manually producing annotations for such a task is extraordinarily time- and labor-consuming. To address this, we propose a prototype-based memory network that recognizes multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. More specifically, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. Afterwards, a multi-head attention-based memory retrieval module retrieves the scene prototypes relevant to a query multi-scene image for the final prediction. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate progress in aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
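
The three components named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not say how prototypes are computed or how the attention heads are parameterized. The sketch assumes each prototype is the mean feature vector of a class, that retrieval is standard scaled dot-product attention, and that random matrices stand in for learned weights. All function names (`learn_prototypes`, `retrieve`, `random_matrix`) and the toy feature values are hypothetical.

```python
import math
import random

def learn_prototypes(features, labels, num_classes):
    """Assumed prototype definition: mean feature vector of each scene class."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, label in zip(features, labels):
        counts[label] += 1
        for i, x in enumerate(vec):
            sums[label][i] += x
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(vec, mat):
    """Multiply a length-d vector by a d x h matrix, giving a length-h vector."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def retrieve(query, memory, projections):
    """Multi-head scaled dot-product attention of one query over the memory.

    projections holds one (w_q, w_k, w_v) triple per head.
    Returns the concatenated head outputs and the per-head attention weights.
    """
    retrieved, weights = [], []
    for w_q, w_k, w_v in projections:
        q = matvec(query, w_q)
        keys = [matvec(proto, w_k) for proto in memory]
        values = [matvec(proto, w_v) for proto in memory]
        scale = math.sqrt(len(q))
        attn = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        head = [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(q))]
        retrieved.extend(head)
        weights.append(attn)
    return retrieved, weights

rng = random.Random(0)

def random_matrix(rows, cols):
    """Random stand-in for a learned projection matrix."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]

# Toy single-scene features (4-dimensional) for 3 scene classes.
features = [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0],
            [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 4.0, 2.0]]
labels = [0, 0, 1, 2]

# 1) prototype learning and 2) external memory: one prototype per class.
memory = learn_prototypes(features, labels, num_classes=3)

# 3) memory retrieval with 2 heads for a query multi-scene feature vector.
projections = [(random_matrix(4, 2), random_matrix(4, 2), random_matrix(4, 2))
               for _ in range(2)]
query = [1.0, 1.0, 0.0, 0.0]
retrieved, weights = retrieve(query, memory, projections)

# Multi-label prediction: an independent sigmoid score per scene class.
w_out = random_matrix(4, 3)
probs = [1.0 / (1.0 + math.exp(-z)) for z in matvec(retrieved, w_out)]
```

Each class gets its own sigmoid score rather than sharing one softmax, because several scenes can be present in the same image. In a trained model the projection and output matrices would be fitted on the limited set of annotated multi-scene images the abstract mentions.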

URL

https://arxiv.org/abs/2104.11200

PDF

https://arxiv.org/pdf/2104.11200.pdf