Abstract
Deep learning (DL) is revolutionizing the scientific computing community. Active learning has been identified as a promising way to reduce the data gap caused by simulations and experiments that are usually expensive. However, the deep active learning (DAL) literature is currently dominated by image classification problems and pool-based methods. These are not directly transferable to scientific computing, where most problems are regression problems with no pre-defined 'pool' of unlabeled data. Here, for the first time, we investigate the robustness of DAL methods for scientific computing problems using ten state-of-the-art DAL methods and eight benchmark problems. We show that, to our surprise, the majority of the DAL methods are not robust, even compared to random sampling, when the ideal pool size is unknown. We further analyze the effectiveness and robustness of DAL methods and suggest that diversity is necessary for robust DAL in scientific computing problems.
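To make the pool-based setting concrete, the following is a minimal sketch of one selection step when no pre-defined pool exists, so a pool must first be sampled from the input domain. Everything here is an assumption for illustration: the function names, the uniform input domain [-1, 1]^dim, and the greedy farthest-point rule used as the diversity criterion. It is not one of the ten DAL methods evaluated in the paper, and the pool size is simply the hyperparameter whose ideal value the abstract says is unknown.

```python
import math
import random


def sample_pool(pool_size, dim, rng):
    """Draw a fresh pool of unlabeled inputs uniformly from [-1, 1]^dim (assumed domain)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pool_size)]


def select_random(pool, k, rng):
    """Baseline: pick k pool points uniformly at random, without replacement."""
    return rng.sample(pool, k)


def select_diverse(pool, labeled, k):
    """Greedy farthest-point selection (illustrative diversity criterion).

    Repeatedly takes the pool point whose nearest labeled or
    already-chosen point is farthest away.
    """
    chosen = []
    # Distance from each pool point to its nearest reference point so far.
    nearest = [
        min((math.dist(x, y) for y in labeled), default=math.inf) for x in pool
    ]
    for _ in range(k):
        idx = max(range(len(pool)), key=nearest.__getitem__)
        chosen.append(pool[idx])
        # The chosen point now counts as a reference point for later picks.
        nearest = [min(d, math.dist(x, pool[idx])) for d, x in zip(nearest, pool)]
    return chosen


rng = random.Random(0)
labeled = sample_pool(5, 2, rng)    # points that already have labels
pool = sample_pool(200, 2, rng)     # pool size 200 is an arbitrary choice
diverse_batch = select_diverse(pool, labeled, 10)
random_batch = select_random(pool, 10, rng)
```

In a full loop, the selected batch would be labeled by the simulator, added to the training set, and the regressor retrained before the next pool is drawn. Changing the pool size (200 here) changes which points each criterion can reach, which is the sensitivity the abstract refers to.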
URL
https://arxiv.org/abs/2302.00098