Abstract
Data-driven approaches have reshaped scientific research, and machine learning and statistical analysis are both commonly used in such work. Despite their widespread use, the two methodologies differ substantially in their techniques and objectives. Few studies have used a single, consistent dataset to demonstrate these differences in the social sciences, particularly in the language and cognitive sciences. This study uses the Buckeye Speech Corpus to illustrate how machine learning and statistical analysis are each applied in data-driven research and how they yield distinct insights. In doing so, it clarifies the different approaches available within data-driven research.
URL
https://arxiv.org/abs/2404.14052