Measuring the State of Open Science in Transportation Using Large Language Models

2026-01-20 19:39:52
Junyi Ji, Ruth Lu, Linda Belkessa, Liming Wang, Silvia Varotto, Yongqi Dong, Nicolas Saunier, Mostafa Ameli, Gregory S. Macfarlane, Bahman Madadi, Cathy Wu

Abstract

Open science initiatives have strengthened scientific integrity and accelerated research progress across many fields, but the state of their practice within transportation research remains under-investigated. Key features of open science, defined here as data and code availability, are difficult to extract due to the inherent complexity of the field. Previous work has either been limited to small-scale studies due to the labor-intensive nature of manual analysis or has relied on large-scale bibliometric approaches that sacrifice contextual richness. This paper introduces an automatic and scalable feature-extraction pipeline to measure data and code availability in transportation research. We employ Large Language Models (LLMs) for this task and validate their performance against a manually curated dataset and through an inter-rater agreement analysis. We applied this pipeline to examine 10,724 research articles published in the Transportation Research Part series of journals between 2019 and 2024. Our analysis found that only 5% of quantitative papers shared a code repository, 4% of quantitative papers shared a data repository, and about 3% of papers shared both, with trends differing across journals, topics, and geographic regions. We found no significant difference in citation counts or review duration between papers that provided data and code and those that did not, suggesting a misalignment between open science efforts and traditional academic metrics. Consequently, encouraging these practices will likely require structural interventions from journals and funding agencies to supplement the lack of direct author incentives. The pipeline developed in this study can be readily scaled to other journals, representing a critical step toward the automated measurement and monitoring of open science practices in transportation research.
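
The validation step described in the abstract, comparing LLM-extracted labels with a manually curated dataset through an inter-rater agreement analysis, can be illustrated with a small sketch. The abstract does not say which agreement statistic the authors used; Cohen's kappa is assumed here as one common choice, and the function name `cohen_kappa` and the sample labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    # Both raters used one identical label throughout: agreement is perfect.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (NOT data from the paper): does each article
# share a code repository?
manual = [True, False, False, True, False, False, False, True]
llm = [True, False, False, False, False, False, False, True]

print(f"kappa = {cohen_kappa(manual, llm):.3f}")
```

Kappa corrects raw agreement for chance, which matters when the positive class is rare, as it is here with only about 5% of quantitative papers sharing code: a labeler that always answers "no" would reach high raw agreement but a kappa near zero.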

URL

https://arxiv.org/abs/2601.14429

PDF

https://arxiv.org/pdf/2601.14429.pdf

