Abstract
We analyse the effectiveness of root mean square error (RMSE), peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) and Pratt's figure of merit (FOM) for evaluating edge detection algorithms used in automated coastline detection. The accuracy of detected coastlines is typically assessed visually. This is impractical at large scale, so objective evaluation metrics are needed. We therefore conduct an experiment to identify reliable metrics. We apply Canny edge detection to 95 coastline satellite images across 49 test locations. We vary the hysteresis thresholds and compare the metric values with a visual analysis of the detected edges. We find that FOM is the most reliable metric for selecting the best threshold: it selects a better threshold 92.6% of the time and the best threshold 66.3% of the time. By comparison, RMSE, PSNR and SSIM select the best threshold only 6.3%, 6.3% and 11.6% of the time, respectively. We explain these results by reformulating RMSE, PSNR and SSIM in terms of confusion matrix measures. This suggests that these metrics not only fail in this experiment but are also not useful for evaluating edge detection in general.
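To make the confusion-matrix argument concrete, here is a minimal Python sketch. It is our own illustration, not the authors' code, and it assumes binary edge maps stored as nested lists with 1 marking an edge pixel. For such maps the per-pixel squared error is 1 exactly when a pixel is a false positive (FP) or a false negative (FN), so MSE = (FP + FN)/N, RMSE = sqrt((FP + FN)/N), and PSNR with peak value 1 is -10 log10((FP + FN)/N). Neither metric can tell a detection that misses the true edge by one pixel from one that misses it by many. The sketch contrasts this with Pratt's figure of merit, FOM = (1/max(N_I, N_D)) Σ_k 1/(1 + α d_k²), using the commonly cited α = 1/9. The function names, the brute-force nearest-edge search and the convention of returning 1.0 when both maps are empty are assumptions made for illustration; SSIM is not covered here.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a binary edge map as a list of rows,
# where 1 marks an edge pixel and 0 marks background.
EdgeMap = List[List[int]]


def confusion_counts(detected: EdgeMap, truth: EdgeMap) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for two binary edge maps of the same shape."""
    tp = fp = fn = tn = 0
    for row_d, row_t in zip(detected, truth):
        for d, t in zip(row_d, row_t):
            if d and t:
                tp += 1
            elif d:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def rmse_binary(detected: EdgeMap, truth: EdgeMap) -> float:
    # Squared error is 1 exactly at FP or FN pixels, so MSE = (FP + FN) / N.
    tp, fp, fn, tn = confusion_counts(detected, truth)
    return math.sqrt((fp + fn) / (tp + fp + fn + tn))


def psnr_binary(detected: EdgeMap, truth: EdgeMap) -> float:
    # PSNR = 10 * log10(MAX^2 / MSE) with MAX = 1, i.e. -10 * log10(MSE).
    tp, fp, fn, tn = confusion_counts(detected, truth)
    mse = (fp + fn) / (tp + fp + fn + tn)
    return math.inf if mse == 0 else -10 * math.log10(mse)


def pratt_fom(detected: EdgeMap, truth: EdgeMap, alpha: float = 1 / 9) -> float:
    """Pratt's figure of merit with a brute-force nearest-edge search."""
    ideal = [(i, j) for i, row in enumerate(truth) for j, v in enumerate(row) if v]
    found = [(i, j) for i, row in enumerate(detected) for j, v in enumerate(row) if v]
    if not ideal and not found:
        return 1.0  # assumed convention: two empty maps agree perfectly
    if not ideal or not found:
        return 0.0
    total = 0.0
    for i, j in found:
        # Squared Euclidean distance to the nearest ground-truth edge pixel.
        d2 = min((i - a) ** 2 + (j - b) ** 2 for a, b in ideal)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(ideal), len(found))


# Toy 5x5 example: the hypothetical true coastline is column 2.
truth = [[1 if j == 2 else 0 for j in range(5)] for _ in range(5)]
shifted = [[1 if j == 3 else 0 for j in range(5)] for _ in range(5)]  # off by one pixel
empty = [[0] * 5 for _ in range(5)]  # no edges detected at all

if __name__ == "__main__":
    for name, det in [("shifted", shifted), ("empty", empty)]:
        print(name,
              f"RMSE={rmse_binary(det, truth):.3f}",
              f"PSNR={psnr_binary(det, truth):.3f}",
              f"FOM={pratt_fom(det, truth):.3f}")
```

In this toy case the empty edge map beats the one-pixel-shifted coastline under both RMSE (lower) and PSNR (higher), because it has fewer mismatched pixels, while FOM rates the shifted map at 0.9 and the empty map at 0. This mirrors, in miniature, the failure mode the abstract describes; it is not a reproduction of the paper's experiment.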
URL
https://arxiv.org/abs/2405.11498