Paper Reading AI Learner

Image Coding for Machines with Edge Information Learning Using Segment Anything

2024-03-07 03:07:59
Takahiro Shindo, Kein Yamada, Taiju Watanabe, Hiroshi Watanabe

Abstract

Image Coding for Machines (ICM) is an image compression technique for image recognition. This technique is essential due to the growing demand for image recognition AI. In this paper, we propose a method for ICM that focuses on encoding and decoding only the edge information of object parts in an image, which we call SA-ICM. SA-ICM is a Learned Image Compression (LIC) model trained using edge information created by Segment Anything. Our method can be used with image recognition models for various tasks. SA-ICM is also robust to changes in input data, making it effective for a variety of use cases. Additionally, our method offers privacy benefits, as it removes human facial information on the encoder's side. Furthermore, this LIC training method can also be used to train Neural Representations for Videos (NeRV), a video compression model. By training NeRV with edge information created by Segment Anything, it is possible to create a NeRV that is effective for image recognition (SA-NeRV). Experimental results confirm the advantages of SA-ICM, which achieves the best performance in image compression for image recognition. We also show that SA-NeRV outperforms ordinary NeRV in video compression for machines.
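To make the compression target concrete, the sketch below shows one plausible way to build the Segment Anything edge maps described above and use them as a training signal. It is not the authors' released code: the segment_anything calls follow the official repository, but the contour-based edge extraction, the placeholder file names, and the edge-masked distortion comment are assumptions about how "encoding only the edge information of object parts" could be realized.

    # Sketch: derive a binary edge map from Segment Anything (SAM) masks.
    # Assumes the official `segment_anything` package, OpenCV, and the ViT-H
    # checkpoint from the SAM repository; edge thickness and mask handling
    # are illustrative choices, not the paper's confirmed settings.
    import cv2
    import numpy as np
    import torch
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

    def sam_edge_map(image_bgr: np.ndarray,
                     mask_generator: SamAutomaticMaskGenerator) -> np.ndarray:
        """Return an H x W binary map marking the contours of all SAM masks."""
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        masks = mask_generator.generate(image_rgb)  # list of dicts with a boolean 'segmentation'
        edges = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
        for m in masks:
            seg = m["segmentation"].astype(np.uint8)
            contours, _ = cv2.findContours(seg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
            cv2.drawContours(edges, contours, -1, color=1, thickness=1)
        return edges

    if __name__ == "__main__":
        device = "cuda" if torch.cuda.is_available() else "cpu"
        # ViT-H weights distributed with the Segment Anything repository.
        sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
        mask_generator = SamAutomaticMaskGenerator(sam)

        img = cv2.imread("example.jpg")  # placeholder input image
        edge = sam_edge_map(img, mask_generator)
        cv2.imwrite("example_edges.png", edge * 255)

        # One way such a map could enter LIC training (an assumption, not the
        # paper's exact loss): keep the usual rate-distortion objective
        # L = R + lambda * D, but compute the distortion only where edge == 1,
        # e.g. D = mse(x_hat * edge, x * edge), so the codec spends bits on
        # object edges rather than textures or faces.

The same edge maps could, in principle, weight the frame-reconstruction loss of a NeRV model in an analogous way, which is the rough intuition behind the SA-NeRV variant mentioned in the abstract.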

URL

https://arxiv.org/abs/2403.04173

PDF

https://arxiv.org/pdf/2403.04173.pdf

