Improving Baselines in the Wild

2021-12-31 16:59:03

Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

arXiv_CV

Abstract

We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies that are robust to domain shifts. Several experiments yield a few critical observations that we believe are of general interest for any future work on WILDS. Our study focuses on two datasets: iWildCam and FMoW. We show that (1) conducting separate cross-validation for each evaluation metric is crucial for both datasets, (2) a weak correlation between validation and test performance might make model development difficult for iWildCam, (3) minor changes in the training hyper-parameters improve the baseline by a relatively large margin (mainly on FMoW), and (4) there is a strong correlation between certain domains and certain target labels (mainly on iWildCam). To the best of our knowledge, no prior work on these datasets has reported these observations despite their obvious importance. Our code is public.
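
Observation (1) is about model selection: the checkpoint that scores best on the validation set under one metric need not be the best under another, so each reported metric should get its own selection. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes every checkpoint has already been evaluated on the validation split. The checkpoint names, the metric keys (`accuracy`, `macro_f1`), the scores, and the helper `select_per_metric` are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Mapping: checkpoint name -> {metric name -> validation score}.
ValScores = Dict[str, Dict[str, float]]


def select_per_metric(val_scores: ValScores) -> Dict[str, str]:
    """For each metric, return the checkpoint with the highest validation score.

    Selection is done independently per metric, so different metrics may
    end up reporting different checkpoints.
    """
    metrics = next(iter(val_scores.values())).keys()
    return {
        metric: max(val_scores, key=lambda ckpt: val_scores[ckpt][metric])
        for metric in metrics
    }


# Hypothetical validation scores (illustrative only, not results from the paper).
val_scores = {
    "epoch_3": {"accuracy": 0.78, "macro_f1": 0.31},
    "epoch_7": {"accuracy": 0.74, "macro_f1": 0.36},
    "epoch_9": {"accuracy": 0.80, "macro_f1": 0.29},
}

print(select_per_metric(val_scores))
# {'accuracy': 'epoch_9', 'macro_f1': 'epoch_7'}
```

In this toy example, selecting only by accuracy would report `epoch_9` for both metrics and understate the achievable macro F1, which is the kind of pitfall that separate per-metric selection avoids.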

URL

https://arxiv.org/abs/2112.15550

PDF

https://arxiv.org/pdf/2112.15550.pdf