Abstract
Despite progress in comment-aware multimodal and multilingual summarization for English and Chinese, research in Indian languages remains limited. This study addresses this gap by introducing COSMMIC, a comment-sensitive multimodal, multilingual dataset covering nine major Indian languages. COSMMIC comprises 4,959 article-image pairs and 24,484 reader comments, with ground-truth summaries available in all included languages. Our approach enhances summaries by integrating reader insights and feedback. We explore summarization and headline generation across four configurations: (1) using article text alone, (2) incorporating user comments, (3) utilizing images, and (4) combining text, comments, and images. To assess the dataset's effectiveness, we employ state-of-the-art language models such as Llama 3 and GPT-4. We conduct a comprehensive study of different component combinations, including identifying supportive comments, filtering out noise with a dedicated IndicBERT-based comment classifier, and extracting valuable insights from images with a multilingual CLIP-based classifier. This study helps determine the most effective configurations for natural language generation (NLG) tasks. Unlike many existing datasets, which are either text-only or lack user comments in multimodal settings, COSMMIC integrates text, images, and user feedback. This holistic approach bridges gaps in Indian language resources, advancing NLP research and fostering inclusivity.
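The four input configurations listed in the abstract can be pictured with a minimal sketch. This is an illustration only, not the authors' pipeline: the names `Sample`, `CONFIGS`, `build_inputs`, and `is_supportive` are hypothetical, the article text is assumed to be present in every configuration, and the `is_supportive` callback merely stands in for a comment classifier such as the IndicBERT-based one mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Sample:
    """One hypothetical dataset record: article text, reader comments, image."""
    article: str
    comments: List[str] = field(default_factory=list)
    image_path: Optional[str] = None


# Hypothetical configuration names mapped to (use_comments, use_image).
# Assumption: the article text is included in all four settings.
CONFIGS: Dict[str, tuple] = {
    "text_only": (False, False),
    "text_comments": (True, False),
    "text_image": (False, True),
    "text_comments_image": (True, True),
}


def build_inputs(
    sample: Sample,
    config: str,
    is_supportive: Callable[[str], bool] = lambda comment: True,
) -> dict:
    """Select which components of a sample are passed to the generator.

    `is_supportive` is a placeholder for a trained comment classifier;
    by default it keeps every comment.
    """
    use_comments, use_image = CONFIGS[config]
    inputs = {"article": sample.article}
    if use_comments:
        # Keep only comments that the (stand-in) classifier accepts.
        inputs["comments"] = [c for c in sample.comments if is_supportive(c)]
    if use_image and sample.image_path is not None:
        inputs["image"] = sample.image_path
    return inputs
```

Keeping the configuration as a lookup table means each of the four settings can be run over the same samples by changing a single string, which is one simple way to compare component combinations.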
URL
https://arxiv.org/abs/2506.15372