Abstract
Modeling text-based time series to make predictions about a future event or outcome is an important task with a wide range of applications. The standard approach is to train and test the model on the same input window, but this neglects the data collected in the longer window between prediction time and the final outcome, which is often available during training. In this study, we propose to treat this neglected text as privileged information available during training and to use it to enhance early prediction modeling through knowledge distillation, a method we present as Learning using Privileged tIme-sEries Text (LuPIET). We evaluate the method on clinical and social media text: four clinical prediction tasks based on clinical notes and two mental health prediction tasks based on social media posts. Our results show that LuPIET is effective in enhancing text-based early predictions, though one may need to choose the text representation and privileged-text windows carefully to achieve optimal performance. Compared with two related methods based on transfer learning and mixed training, LuPIET offers more stable improvements over the baseline of standard training. To the best of our knowledge, this is the first study to examine learning using privileged information for time series in the NLP context.
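The abstract describes distilling knowledge from a teacher that sees the privileged, longer text window into a student that sees only the early window. As an illustration only (the function name, temperature, and mixing weight below are assumptions, not details from the paper), a standard distillation objective of this kind combines cross-entropy on the hard labels with a temperature-scaled KL term toward the teacher's soft predictions:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def privileged_distillation_loss(student_logits, teacher_logits, labels,
                                 T=2.0, alpha=0.5):
    """Hypothetical LuPIET-style objective (a sketch, not the paper's exact
    formulation): cross-entropy on the hard labels plus a KL term that
    distills the soft predictions of a teacher trained with access to the
    privileged, longer text window."""
    student_logits = np.asarray(student_logits, dtype=float)
    teacher_logits = np.asarray(teacher_logits, dtype=float)
    labels = np.asarray(labels)
    # Hard-label cross-entropy for the early-window student.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    # KL(teacher || student) at temperature T, scaled by T^2 as usual.
    ps_T = softmax(student_logits, T)
    pt_T = softmax(teacher_logits, T)
    kl = (pt_T * (np.log(pt_T + 1e-12) - np.log(ps_T + 1e-12))).sum(-1).mean()
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```

With `alpha = 0` this reduces to standard training on the early window; the privileged information enters only through the teacher's logits, so nothing beyond the early-window text is needed at test time.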
URL
https://arxiv.org/abs/2301.10887