Paper Reading AI Learner

Input-length-shortening and text generation via attention values

2023-03-14 02:11:24
Neşet Özkan Tan, Alex Yuxuan Peng, Joshua Bensemann, Qiming Bao, Tim Hartill, Mark Gahegan, Michael Witbrock

Abstract

Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformer models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than to others. Because of the attention mechanism's high computational cost, transformer models usually have an input-length limitation caused by hardware constraints. This limitation applies to many transformers, including the well-known Bidirectional Encoder Representations from Transformers (BERT) model. In this paper, we examined BERT's attention assignment mechanism, focusing on two questions: (1) How can attention be employed to reduce input length? (2) How can attention be used as a control mechanism for conditional text generation? We investigated these questions in the context of a text classification task. We found that BERT's early layers assign more critical attention scores for text classification tasks than its later layers. We demonstrated that the first layer's attention sums can be used to filter tokens in a given sequence, considerably decreasing the input length while maintaining good test accuracy. We also applied a filtering method based on a computationally efficient semantic-similarity algorithm and found that retaining approximately 6% of the original sequence is sufficient to obtain 86.5% accuracy. Finally, we showed that we could stably generate data that is indistinguishable from the original by using only a small percentage (10%) of the tokens with the highest attention scores according to BERT's first layer.
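
Below is a minimal sketch of the token-filtering idea the abstract describes: score each token by the total attention it receives in BERT's first layer, then keep only the highest-scoring fraction of the sequence. This is not the authors' code; the Hugging Face transformers library, the bert-base-uncased checkpoint, the 10% keep ratio, and the head/query aggregation are illustrative assumptions, not details taken from the paper.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def filter_by_first_layer_attention(text, keep_ratio=0.10):
    """Keep the tokens that receive the most first-layer attention (hypothetical helper)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
    first_layer = outputs.attentions[0][0]        # (heads, seq, seq) for this single example
    # Attention each token receives, summed over heads and query positions.
    scores = first_layer.sum(dim=0).sum(dim=0)    # shape: (seq,)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    k = max(1, int(len(tokens) * keep_ratio))
    keep = torch.topk(scores, k).indices.sort().values  # restore original token order
    return [tokens[i] for i in keep.tolist()]

print(filter_by_first_layer_attention(
    "The movie started slowly, but the final act was genuinely thrilling."))

In practice one would likely force-keep special tokens such as [CLS] and [SEP] before feeding the shortened sequence to a classifier, and tune the keep ratio against test accuracy, in the spirit of the 6% and 10% figures reported in the abstract.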

URL

https://arxiv.org/abs/2303.07585

PDF

https://arxiv.org/pdf/2303.07585.pdf

