Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks

Abstract

Reducing network traffic and supporting resource-limited environments requires a way to transmit images using little data. Machine learning-based image compression methods, which reduce the data size of images while preserving their features, have been proposed. In some situations, however, reconstructing only part of an image's semantic information at the receiver may be sufficient. Communication based on semantic information, called semantic communication, has been proposed to realize this idea, along with an image transmission method built on it. That method transmits only the semantic information of an image, and the receiver reconstructs the image using an image-generation model. Because it relies on a single type of semantic information, however, reconstructing images that resemble the original is difficult. This study proposes a multi-modal image transmission method that leverages diverse semantic information for efficient semantic communication. The proposed method extracts multi-modal semantic information from an image and transmits only that information. The receiver then generates multiple images using an image-generation model and selects one as the output based on semantic similarity. The receiver must make this selection using only the received features, yet evaluating semantic similarity with conventional metrics is difficult. This study therefore explores new metrics for evaluating the similarity between the semantic features of images and proposes two scoring procedures. The results indicate that the proposed procedures can compare semantic similarities, such as position and composition, between the semantic features of the original and generated images. The proposed method can thus facilitate the transmission and use of photographs over mobile networks for various service applications.
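
The receiver-side selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the two proposed scoring procedures, so this example substitutes plain cosine similarity averaged over modalities as a stand-in. The names `Features`, `score_candidate`, `select_output`, and the `extract` callback are hypothetical, as is the assumption that each modality is represented by a fixed-length vector.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical layout: modality name -> feature vector.
Features = Dict[str, List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_candidate(received: Features, candidate: Features) -> float:
    """Average per-modality similarity (a stand-in metric, not the paper's)."""
    if not received:
        raise ValueError("no features received")
    sims = [cosine_similarity(vec, candidate[name]) for name, vec in received.items()]
    return sum(sims) / len(sims)


def select_output(
    received: Features,
    candidates: Sequence[Any],
    extract: Callable[[Any], Features],
) -> int:
    """Return the index of the generated image whose features best match.

    Only the received features are used; the original image is never
    available at the receiver.
    """
    if not candidates:
        raise ValueError("no candidate images to choose from")
    scores = [score_candidate(received, extract(c)) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: each "image" is already a feature dict, so extract is the identity.
received = {"layout": [1.0, 0.0], "caption": [0.0, 1.0]}
candidates = [
    {"layout": [0.0, 1.0], "caption": [1.0, 0.0]},
    {"layout": [1.0, 0.1], "caption": [0.1, 1.0]},
    {"layout": [1.0, 1.0], "caption": [1.0, 1.0]},
]
best_index = select_output(received, candidates, lambda image: image)
```

In a real receiver, `extract` would run the same feature extractors on each generated image that the sender ran on the original, and the scoring function would be replaced by whichever procedure the system adopts.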

URL

https://arxiv.org/abs/2404.11280

PDF

https://arxiv.org/pdf/2404.11280.pdf
