Is Peer Review Really in Decline? Analyzing Review Quality across Venues and Time

2026-01-21 16:48:29
Ilia Kuznetsov, Rohan Nayak, Alla Rozovskaya, Iryna Gurevych

Abstract

Peer review is at the heart of modern science. As submission numbers rise and research communities grow, the decline in review quality is a popular narrative and a common concern. Yet, is it true? Review quality is difficult to measure, and the ongoing evolution of reviewing practices makes it hard to compare reviews across venues and time. To address this, we introduce a new framework for evidence-based comparative study of review quality and apply it to major AI and machine learning conferences: ICLR, NeurIPS and *ACL. We document the diversity of review formats and introduce a new approach to review standardization. We propose a multi-dimensional schema for quantifying review quality as utility to editors and authors, coupled with both LLM-based and lightweight measurements. We study the relationships between measurements of review quality, and its evolution over time. Contradicting the popular narrative, our cross-temporal analysis reveals no consistent decline in median review quality across venues and years. We propose alternative explanations, and outline recommendations to facilitate future empirical studies of review quality.

URL

https://arxiv.org/abs/2601.15172

PDF

https://arxiv.org/pdf/2601.15172.pdf

