Paper Reading AI Learner

COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation

2025-06-18 11:38:23
Raghvendra Kumar, S. A. Mohammed Salman, Aryan Sahu, Tridib Nandi, Pragathi Y. P., Sriparna Saha, Jose G. Moreno

Abstract

Despite progress in comment-aware multimodal and multilingual summarization for English and Chinese, research in Indian languages remains limited. This study addresses this gap by introducing COSMMIC, a pioneering comment-sensitive multimodal, multilingual dataset featuring nine major Indian languages. COSMMIC comprises 4,959 article-image pairs and 24,484 reader comments, with ground-truth summaries available in all included languages. Our approach enhances summaries by integrating reader insights and feedback. We explore summarization and headline generation across four configurations: (1) using article text alone, (2) incorporating user comments, (3) utilizing images, and (4) combining text, comments, and images. To assess the dataset's effectiveness, we employ state-of-the-art language models such as LLaMA-3 and GPT-4. We conduct a comprehensive study to evaluate different component combinations, including identifying supportive comments, filtering out noise with a dedicated IndicBERT-based comment classifier, and extracting valuable insights from images with a multilingual CLIP-based classifier. This helps determine the most effective configurations for natural language generation (NLG) tasks. Unlike many existing datasets, which are either text-only or lack user comments in multimodal settings, COSMMIC uniquely integrates text, images, and user feedback. This holistic approach bridges gaps in Indian language resources, advancing NLP research and fostering inclusivity.
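The four input configurations the abstract enumerates can be sketched as a small prompt-assembly helper. This is a hypothetical illustration, not the paper's actual pipeline: the function name, prompt layout, and the use of an image caption as a stand-in for the CLIP-derived image signal are all assumptions.

```python
def build_input(article, comments=None, image_caption=None):
    """Assemble model input for one of the four COSMMIC-style configurations:
    (1) article only, (2) article + comments, (3) article + image,
    (4) article + comments + image. Layout is illustrative only."""
    parts = [f"Article: {article}"]
    if comments:
        # The paper filters noisy comments with an IndicBERT-based
        # classifier first; here the comments are taken as already filtered.
        parts.append("Reader comments: " + " | ".join(comments))
    if image_caption:
        # The paper extracts image insights with a multilingual CLIP-based
        # classifier; a plain caption stands in for that signal here.
        parts.append(f"Image: {image_caption}")
    return "\n".join(parts)

# Configuration (4): text + comments + image
prompt = build_input(
    "Monsoon rains flood several districts.",
    comments=["Relief camps are needed urgently."],
    image_caption="Submerged streets in a residential area.",
)
print(prompt)
```

The assembled string would then be passed to a generator such as LLaMA-3 or GPT-4 for summarization or headline generation; dropping the optional arguments yields configurations (1)-(3).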

URL

https://arxiv.org/abs/2506.15372

PDF

https://arxiv.org/pdf/2506.15372.pdf

