Abstract
Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before large-scale deployment. In practice, the number of testing scenarios permitted for a specific AV is severely limited by tight constraints on testing budget and time. Under such strict limits on the number of tests, existing testing methods often yield evaluation results with significant uncertainty that is difficult to quantify. In this paper, we formulate this problem for the first time as the "few-shot testing" (FST) problem and propose a systematic framework to address it. To alleviate the considerable uncertainty inherent in a small testing scenario set, we frame FST as an optimization problem and search for the testing scenario set based on neighborhood coverage and similarity. Specifically, guided by the goal of better generalization of the testing scenario set across AVs, we dynamically adjust this set and the contribution of each testing scenario to the evaluation result based on coverage, leveraging prior information from surrogate models (SMs). Under certain hypotheses on the SMs, we establish a theoretical upper bound on the evaluation error to verify that evaluation accuracy is sufficient within the given limited number of tests. Experimental results on cut-in scenarios demonstrate a notable reduction in evaluation error and variance of our method compared to conventional testing methods, especially in situations with a strict limit on the number of scenarios.
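The abstract's core idea of selecting a few testing scenarios by neighborhood coverage and then weighting each scenario's contribution to the evaluation result can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's actual algorithm: a 2-D cut-in scenario space (initial gap, relative speed), a greedy k-center-style selection as a stand-in for the coverage-based search, Voronoi-cell sizes as the per-scenario contribution weights, and a toy failure indicator in place of running a real AV or surrogate model.

```python
import numpy as np

# Hypothetical 2-D cut-in scenario space: (initial gap [m], relative speed [m/s]).
scenarios = np.array([(gap, dv)
                      for gap in np.linspace(10, 60, 11)
                      for dv in np.linspace(0, 10, 11)])

def greedy_coverage_select(points, n):
    """Greedy k-center selection: pick n scenarios so every point in the
    scenario space lies close to some selected scenario (coverage proxy)."""
    chosen = [0]
    d = np.linalg.norm(points - points[0], axis=1)  # distance to nearest chosen
    for _ in range(n - 1):
        nxt = int(np.argmax(d))  # farthest uncovered scenario
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

def coverage_weighted_estimate(points, chosen, test_outcome):
    """Weight each tested scenario by the fraction of the scenario space it
    covers (its Voronoi cell), then average the observed test outcomes."""
    dists = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    weights = np.bincount(nearest, minlength=len(chosen)) / len(points)
    outcomes = np.array([test_outcome(points[i]) for i in chosen])
    return float(weights @ outcomes)

# Toy stand-in for the AV under test: failure when the gap is small
# and the closing speed is high (purely illustrative).
toy_failure = lambda s: float(s[0] < 20 and s[1] > 6)

chosen = greedy_coverage_select(scenarios, n=8)       # few-shot budget of 8 tests
risk = coverage_weighted_estimate(scenarios, chosen, toy_failure)
```

In the paper's framework, the selection and the weights would additionally be informed by surrogate-model priors and adjusted dynamically; this sketch only shows the coverage-then-weighted-average skeleton.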
URL
https://arxiv.org/abs/2402.01795