Abstract
The proliferation of social media has led to information overload and increased interest in opinion mining. We propose "Question-Answering Network Analysis" (QANA), a novel opinion mining framework that uses Large Language Models (LLMs) to generate questions from users' comments, constructs a bipartite graph based on the comments' answerability to the questions, and applies centrality measures to examine the importance of opinions. We investigate the impact of question generation style, LLM selection, and the choice of embedding model on the quality of the constructed QA networks by comparing them with annotated Key Point Analysis datasets. On the Key Point Matching task, QANA performs comparably to previous state-of-the-art supervised models in a zero-shot manner, while reducing the computational cost from quadratic to linear. For Key Point Generation, questions with high PageRank or degree centrality align well with manually annotated key points. Notably, QANA enables analysts to assess the importance of key points from various aspects according to their selection of centrality measure. QANA's primary contribution lies in its flexibility to extract key points from a wide range of perspectives, which enhances the quality and impartiality of opinion mining.
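The graph-and-centrality step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the answerability scores are hand-written toy values (the abstract does not say how they are computed), the cut-off `THRESHOLD` is a hypothetical parameter, and the helper names `build_bipartite`, `degree_centrality`, and `pagerank` are introduced here for illustration only. The sketch uses only the Python standard library.

```python
# Toy answerability scores, keyed by (comment, question).
# ASSUMPTION: these values are invented for illustration; in practice they
# would come from some model-based measure of whether a comment answers a question.
scores = {
    ("c1", "q1"): 0.9, ("c1", "q2"): 0.2,
    ("c2", "q1"): 0.8, ("c2", "q2"): 0.7,
    ("c3", "q1"): 0.6, ("c3", "q2"): 0.1,
}
THRESHOLD = 0.5  # hypothetical cut-off for "the comment answers the question"


def build_bipartite(scores, threshold):
    """Return an undirected comment-question graph as {node: set(neighbours)}."""
    adj = {}
    for (comment, question), score in scores.items():
        adj.setdefault(comment, set())
        adj.setdefault(question, set())
        if score >= threshold:
            adj[comment].add(question)
            adj[question].add(comment)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iters=100):
    """Plain power-iteration PageRank on an undirected graph."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iters):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v, nbrs in adj.items() if not nbrs)
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(adj[u]) for u in adj[v]) + dangling / n)
            for v in adj
        }
    return rank


graph = build_bipartite(scores, THRESHOLD)
degree = degree_centrality(graph)
ranks = pagerank(graph)

# Rank only the question nodes, most central first.
questions = sorted((n for n in graph if n.startswith("q")),
                   key=lambda q: ranks[q], reverse=True)
print(questions)
```

In this toy graph, `q1` is answered by all three comments and `q2` by only one, so `q1` ranks higher under both measures. Swapping the sort key between `ranks` and `degree` mirrors the idea that the analyst's choice of centrality measure determines which questions surface as key points.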
URL
https://arxiv.org/abs/2404.18371