ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English

2023-01-11 00:22:56

Tanmay Laud, Daniel Spokoyny, Tom Corringham, Taylor Berg-Kirkpatrick

arXiv_AI

Abstract

The topic of Climate Change (CC) has received limited attention in NLP despite its real-world urgency. Activists and policy-makers need NLP tools to effectively process the vast and rapidly growing body of textual data produced on CC. The utility of such tools, however, depends primarily on whether current state-of-the-art models can generalize across various tasks in the CC domain. To address this gap, we introduce the Climate Change Benchmark (ClimaBench), a collection of existing, disparate datasets for systematically evaluating model performance across a diverse set of CC natural language understanding (NLU) tasks. Further, we enhance the benchmark by releasing two large-scale labelled datasets for text classification and question answering, curated from publicly available environmental disclosures. Lastly, we analyze several generic and CC-oriented models to determine whether fine-tuning on domain text offers any improvement across these tasks. We hope this work provides a standard assessment tool for research on CC text data.

URL

https://arxiv.org/abs/2301.04253

PDF

https://arxiv.org/pdf/2301.04253.pdf