Abstract
Zero-shot grammatical error detection is the task of tagging token-level errors in a sentence when only given access to labels at the sentence-level for training. Recent work has explored attention- and gradient-based approaches for the task. We analyze a decomposition of a CNN trained as a sentence-level classifier, demonstrating zero-shot labeling effectiveness competitive with previously proposed bi-LSTM attention-based approaches. Interestingly, with the advantage of pre-trained contextualized embeddings, the approach is competitive with baseline (but no longer state-of-the-art) fully supervised bi-LSTM models (using standard pre-trained word embeddings), despite only having access to sentence-level labels for training. For reference, we also show that the basic approach extends to the fully supervised setting, yielding an error detection model as strong as the current state-of-the-art fully supervised approach with feature-based contextualized embeddings.
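The abstract does not spell out the decomposition, but the general idea of attributing a max-pooled CNN sentence classifier's score back to individual tokens can be illustrated as follows. This is a minimal sketch under assumed details (single conv layer, ReLU, max-over-time pooling, linear read-out; all dimensions are toy values), not the authors' exact model: each filter's weighted pooled activation is routed back to the tokens inside the window where it fired.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only).
T, d, F, k = 6, 8, 4, 3   # tokens, embedding dim, num filters, filter width

X = rng.normal(size=(T, d))      # token embeddings for one sentence
W = rng.normal(size=(F, k, d))   # convolutional filters
v = rng.normal(size=F)           # sentence-classifier weights ("error" class)

# Convolution + ReLU: one activation per (filter, window position).
n_win = T - k + 1
A = np.zeros((F, n_win))
for f in range(F):
    for t in range(n_win):
        A[f, t] = max(0.0, float(np.sum(W[f] * X[t:t + k])))

# Max-over-time pooling; record which window each filter fired on.
pooled = A.max(axis=1)
argmax = A.argmax(axis=1)
sentence_score = float(v @ pooled)   # sentence-level "contains error" logit

# Decomposition: spread each filter's weighted contribution uniformly
# over the tokens inside its maximally activating window.
token_scores = np.zeros(T)
for f in range(F):
    token_scores[argmax[f]:argmax[f] + k] += v[f] * pooled[f] / k
```

By construction the token contributions sum back to the sentence logit, so thresholding `token_scores` yields zero-shot token-level error predictions from a model trained only with sentence-level labels.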
URL
https://arxiv.org/abs/1906.01154