Abstract
As generative language models rapidly evolve in natural language processing, evaluation methodologies need to be revisited and reformulated, particularly for aspect-based sentiment analysis (ABSA). This paper addresses the challenges introduced by the generative paradigm, which has moderately blurred the traditional boundary between understanding and generation tasks. Building on prevailing practices in the field, we analyze the advantages and shortcomings of the prevalent ABSA evaluation paradigms. Through an in-depth examination, supplemented by illustrative examples, we highlight the intricacies of aligning generative outputs with evaluation metrics derived from other tasks, including question answering. While we do not advocate a single, definitive metric, our contribution lies in paving the way for a comprehensive guideline tailored to ABSA evaluation in the generative paradigm. In this position paper, we aim to offer practitioners reflections, insights, and directions that can help them navigate this evolving landscape and ensure evaluations that are both accurate and reflective of generative capabilities.
URL
https://arxiv.org/abs/2404.11539