Paper Reading AI Learner

FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering

2024-04-29 10:55:08
Wei Zhou, Mohsen Mesgar, Heike Adel, Annemarie Friedrich

Abstract

Table Question Answering (TQA) aims at composing an answer to a question based on tabular data. While prior research has shown that TQA models lack robustness, the underlying causes and nature of this issue remain largely unclear, posing a significant obstacle to the development of robust TQA systems. In this paper, we formalize three major desiderata for a fine-grained evaluation of the robustness of TQA systems. Such systems should (i) answer questions regardless of alterations in table structure, (ii) base their responses on the content of relevant cells rather than on biases, and (iii) demonstrate robust numerical reasoning capabilities. To investigate these aspects, we create and publish a novel TQA evaluation benchmark in English. Our extensive experimental analysis reveals that none of the examined state-of-the-art TQA systems consistently excels in all three aspects. Our benchmark is a crucial instrument for monitoring the behavior of TQA systems and paves the way for the development of robust TQA systems. We release our benchmark publicly.
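To make desideratum (i) concrete, here is a minimal Python sketch of a structure-invariance check. It is not the FREB-TQA perturbation suite or its scoring code. The toy table, the two perturbations (row shuffling and column swapping), and the functions `shuffle_rows`, `swap_columns`, `lookup_answer`, `first_row_answer`, and `is_structure_robust` are all hypothetical illustrations. A system passes if its answer stays the same when the table layout changes but its content does not.

```python
import random
from typing import Callable, List

Table = List[List[str]]  # row 0 is the header row
AnswerFn = Callable[[Table, str], str]


def shuffle_rows(table: Table, seed: int = 0) -> Table:
    """Hypothetical perturbation: permute the data rows, keeping the header on top."""
    body = table[1:]  # slicing makes a new list, so the input table is untouched
    random.Random(seed).shuffle(body)
    return [table[0]] + body


def swap_columns(table: Table, i: int, j: int) -> Table:
    """Hypothetical perturbation: exchange columns i and j in every row (header included)."""
    out = []
    for row in table:
        row = list(row)
        row[i], row[j] = row[j], row[i]
        out.append(row)
    return out


def lookup_answer(table: Table, question: str) -> str:
    """Toy content-based 'model': finds the city named in the question and reads
    the 'Population' column by header name, not by position."""
    header = table[0]
    name_col, pop_col = header.index("City"), header.index("Population")
    for row in table[1:]:
        if row[name_col] in question:
            return row[pop_col]
    return ""


def first_row_answer(table: Table, question: str) -> str:
    """Toy position-biased 'model': ignores the question, returns the last cell of row 1."""
    return table[1][-1]


def is_structure_robust(answer_fn: AnswerFn, table: Table, question: str,
                        perturbations: List[Callable[[Table], Table]]) -> bool:
    """True if every perturbed table yields the same answer as the original."""
    reference = answer_fn(table, question)
    return all(answer_fn(p(table), question) == reference for p in perturbations)


table = [
    ["City", "Country", "Population"],
    ["Berlin", "Germany", "3.7M"],
    ["Paris", "France", "2.1M"],
    ["Rome", "Italy", "2.8M"],
]
question = "What is the population of Rome?"
perturbations = [lambda t: shuffle_rows(t, seed=1), lambda t: swap_columns(t, 0, 2)]

print(is_structure_robust(lookup_answer, table, question, perturbations))     # True
print(is_structure_robust(first_row_answer, table, question, perturbations))  # False
```

The position-biased toy model fails because swapping columns changes which cell it reads; it depends on where a value sits rather than on the cell relevant to the question, which also touches desideratum (ii).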


URL

https://arxiv.org/abs/2404.18585

PDF

https://arxiv.org/pdf/2404.18585.pdf
