Paper Reading AI Learner

A Framework for Real-time Object Detection and Image Restoration

2023-03-16 10:01:12
Rui-Yang Ju, Chih-Chia Chen, Jen-Shiun Chiang, Yu-Shian Lin, Wei-Han Chen

Abstract

Object detection and single image super-resolution are classic problems in computer vision (CV). The object detection task aims to recognize the objects in input images, while the image restoration task aims to reconstruct high quality images from given low quality images. In this paper, a two-stage framework for object detection and image restoration is proposed. The first stage uses YOLO series algorithms to complete the object detection and then performs image cropping. In the second stage, this work improves Swin Transformer and uses the new proposed algorithm to connect the Swin Transformer layer to design a new neural network architecture. We name the newly proposed network for image restoration SwinOIR. This work compares the model performance of different versions of YOLO detection algorithms on MS COCO dataset and Pascal VOC dataset, demonstrating the suitability of different YOLO network models for the first stage of the framework in different scenarios. For image super-resolution task, it compares the model performance of using different methods of connecting Swin Transformer layers and design different sizes of SwinOIR for use in different life scenarios. Our implementation code is released at this https URL.

Abstract (translated)

物体检测和单张图像超分辨率是计算机视觉(CV)中的经典问题。物体检测任务旨在从输入图像中识别物体,而图像修复任务旨在从给定低质量图像中重构高质量的图像。在本文中,提出了一种物体检测和图像修复的二阶段框架。在第一阶段,使用YOLO系列算法完成物体检测,然后进行图像裁剪。在第二阶段,改进了 Swin Transformer,并使用新提出的算法连接 Swin Transformer 层来设计新的神经网络架构。我们提出了名为 SwinOIR 的图像修复网络。该工作对不同版本的YOLO检测算法在MS COCO数据和Pascal VOC数据集上的性能进行了比较,证明了不同YOLO网络模型在不同场景下的适用性。对于图像超分辨率任务,该工作比较了使用不同连接 Swin Transformer 层的方法以及设计不同大小的 SwinOIR 用于不同生命场景的性能。我们的实现代码在此httpsURL上发布。

URL

https://arxiv.org/abs/2303.09190

PDF

https://arxiv.org/pdf/2303.09190.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot