Abstract
Extreme multi-label text classification commonly uses the label hierarchy to partition a very large label set into multiple label groups, turning the task into a set of simpler multi-group multi-label classification tasks. Current research encodes labels as fixed-length vectors, which requires establishing a separate classifier for each label group. The problem is how to build a single classifier without sacrificing the label relationships in the hierarchy. This paper adopts a multi-answer questioning task for extreme multi-label classification and proposes an auxiliary classification evaluation metric. The proposed method and metric are applied to the legal domain, and the use of legal BERT models and the study of task distribution are discussed. The experimental results show that the proposed hierarchy and multi-answer questioning task can perform extreme multi-label classification on the EURLEX dataset. In fine-tuning for the multi-label classification task, the domain-adapted BERT models did not show apparent advantages in this experiment. The method is also theoretically applicable to zero-shot learning.
URL
https://arxiv.org/abs/2303.01064