Vision + Language Applications: A Survey

2023-05-24 00:42:06

Yutong Zhou, Nobutaka Shimada

arXiv_CV

arXiv_CV Survey

Abstract

Text-to-image generation has attracted significant interest from researchers and practitioners in recent years due to its widespread and diverse applications across various industries. Despite the progress made in vision and language research, the existing literature remains relatively limited, particularly with regard to advancements and applications in this field. This paper explores a relevant research track within multimodal applications, including text, vision, audio, and others. In addition to the studies discussed in this paper, we are also committed to continually updating the latest relevant papers, datasets, application projects, and corresponding information at this https URL.

URL

https://arxiv.org/abs/2305.14598

PDF

https://arxiv.org/pdf/2305.14598.pdf