Paper Reading AI Learner

Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks

2024-04-17 11:42:39
Eri Hosonuma, Taku Yamazaki, Takumi Miyoshi, Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki

Abstract

To reduce network traffic and support resource-limited environments, a method for transmitting images with a small amount of transmission data is required. Machine learning-based image compression methods, which reduce the data size of images while preserving their features, have been proposed. However, in certain situations, reconstructing only part of the semantic information of an image at the receiver may be sufficient. To realize this concept, semantic-information-based communication, called semantic communication, has been proposed, along with an image transmission method that uses it. This method transmits only the semantic information of an image, and the receiver reconstructs the image using an image-generation model. However, it relies on a single type of semantic information, which makes it difficult to reconstruct images similar to the original. This study proposes a multi-modal image transmission method that leverages diverse semantic information for efficient semantic communication. The proposed method extracts multi-modal semantic information from an image and transmits only that information. The receiver then generates multiple images using an image-generation model and selects an output based on semantic similarity. The receiver must select the output based only on the received features; however, evaluating semantic similarity with conventional metrics is challenging. Therefore, this study explored new metrics to evaluate the similarity between semantic features of images and proposes two scoring procedures. The results indicate that the proposed procedures can compare semantic similarities, such as position and composition, between the semantic features of the original and generated images. Thus, the proposed method can facilitate the transmission and use of photographs over mobile networks for various service applications.
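
The receiver-side step in the abstract (generate several candidate images, then keep the one whose semantic features best match what was received) can be illustrated with a minimal Python sketch. It assumes each image's semantic features are available as one numeric vector per modality; the modality names `caption` and `layout`, the weights, and the functions `cosine`, `score`, and `select_output` are all hypothetical. The score is a weighted mean of per-modality cosine similarities, used only as a generic stand-in. It is not either of the two scoring procedures proposed in the paper, which the abstract introduces precisely because conventional metrics struggle with this comparison.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: modality name -> feature vector for one image.
Features = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(received: Features, candidate: Features, weights: Dict[str, float]) -> float:
    """Stand-in score: weighted mean of per-modality cosine similarities."""
    total_w = sum(weights[m] for m in received)
    return sum(weights[m] * cosine(received[m], candidate[m]) for m in received) / total_w


def select_output(received: Features, candidates: List[Features],
                  weights: Dict[str, float]) -> int:
    """Index of the generated candidate whose features best match the received ones."""
    scores = [score(received, c, weights) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Features sent by the transmitter (hypothetical values).
    received = {"caption": [1.0, 0.0, 0.5], "layout": [0.2, 0.8]}
    # Features re-extracted from two images generated at the receiver.
    candidates = [
        {"caption": [0.0, 1.0, 0.0], "layout": [0.2, 0.8]},  # right layout, wrong content
        {"caption": [1.0, 0.1, 0.4], "layout": [0.3, 0.7]},  # close on both modalities
    ]
    weights = {"caption": 0.5, "layout": 0.5}
    print(select_output(received, candidates, weights))  # prints 1
```

Combining modalities means a candidate that matches only one kind of semantic information (candidate 0 above scores 0.5) loses to one that is close on all of them (about 0.99), which mirrors the motivation for transmitting multi-modal rather than single-type semantic information.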

URL

https://arxiv.org/abs/2404.11280

PDF

https://arxiv.org/pdf/2404.11280.pdf