
Energy-Based Models with Applications to Speech and Language Processing


Abstract

Energy-Based Models (EBMs) are an important class of probabilistic models, also known as random fields and undirected graphical models. EBMs are un-normalized and thus radically different from other popular self-normalized probabilistic models such as hidden Markov models (HMMs), autoregressive models, generative adversarial nets (GANs) and variational auto-encoders (VAEs). Over the past years, EBMs have attracted increasing interest not only from the core machine learning community but also from application domains such as speech, vision, and natural language processing (NLP), due to significant theoretical and algorithmic progress. The sequential nature of speech and language also presents special challenges and requires a different treatment from processing fixed-dimensional data (e.g., images). Therefore, the purpose of this monograph is to present a systematic introduction to energy-based models, covering both algorithmic progress and applications in speech and language processing. First, the basics of EBMs are introduced, including classic models, recent models parameterized by neural networks, sampling methods, and various learning methods, from classic learning algorithms to the most advanced ones. Then, the application of EBMs in three different scenarios is presented, namely for modeling marginal, conditional, and joint distributions, respectively: 1) EBMs for sequential data with applications in language modeling, where the main focus is on the marginal distribution of a sequence itself; 2) EBMs for modeling conditional distributions of target sequences given observation sequences, with applications in speech recognition, sequence labeling, and text generation; and 3) EBMs for modeling joint distributions of both sequences of observations and targets, with applications in semi-supervised learning and calibrated natural language understanding.
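
To make the "un-normalized" distinction concrete, here is a minimal sketch of the standard EBM formulation (the notation below is illustrative and not quoted from the monograph): an EBM defines a distribution only up to a normalizing constant,

$$
p_\theta(x) \;=\; \frac{\exp\big(-E_\theta(x)\big)}{Z(\theta)}, \qquad Z(\theta) \;=\; \sum_{x} \exp\big(-E_\theta(x)\big),
$$

where $E_\theta(x)$ is an energy function, often parameterized by a neural network, and the normalizer $Z(\theta)$ is generally intractable for high-dimensional or sequential $x$, which is why specialized sampling and learning methods are needed. Under this notation, the three scenarios above correspond to modeling the marginal $p_\theta(x)$ of an observation sequence, the conditional $p_\theta(y \mid x)$ of a target sequence given observations, and the joint $p_\theta(x, y)$ over both.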


URL

https://arxiv.org/abs/2403.10961

PDF

https://arxiv.org/pdf/2403.10961.pdf

