Speech
-
Thoughts on the potential to compensate a hearing loss in noise
Marc René Schädler
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
-
SEP-28k: A Dataset for Stuttering Event Detection From Podcasts With People Who Stutter
Colin Lea, Vikramjit Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey P. Bigham
arXiv_SD
Recognition
Speech
Detection
Speech_Recognition
-
From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection
Quang Huu Pham, Viet Anh Nguyen, Linh Bao Doan, Ngoc N. Tran, Ta Minh Thanh
arXiv_CL
Transformer
Bert
Text_Classification
Speech
Pose
Classification
Detection
Language_Model
-
Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese
Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
arXiv_CL
Segmentation
Speech
Pose
-
Hopeful_Men@LT-EDI-EACL2021: Hope Speech Detection Using Indic Transliteration and Transformers
Ishan Sanjeev Upadhyay, Nikhil E, Anshul Wadhawan, Radhika Mamidi
arXiv_CL
Transformer
Embedding
Bert
RNN
Speech
Detection
-
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks
Ju Lin, Adriaan J. van Wijngaarden, Kuang-Ching Wang, Melissa C. Smith
arXiv_SD
Enhancement
Recognition
Speech
Attention
Speech_Recognition
CNN
Prediction
-
Handling Background Noise in Neural Speech Generation
Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Bastiaan Kleijn, Jan Skoglund
arXiv_SD
Speech
Denoising
-
Dual-Path Modeling for Long Recording Speech Separation in Meetings
Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian
arXiv_SD
Transformer
Speech
Pose
CNN
-
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition
Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv_SD
Recognition
RNN
Speech
Pose
Attention
Speech_Recognition
-
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain
Julio Wissing, Benedikt Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
Prediction
-
Memory-efficient Speech Recognition on Smart Devices
Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra
arXiv_SD
Recognition
Optimization
Speech
Speech_Recognition
-
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian
arXiv_SD
Recognition
Speech
Pose
Detection
Speech_Recognition
Activity
-
Senone-aware Adversarial Multi-task Training for Unsupervised Child to Adult Speech Adaptation
Richeng Duan, Nancy F. Chen
arXiv_SD
Unsupervised
Recognition
Adversarial
Speech
Pose
Speech_Recognition
Prediction
-
Evolutionary optimization of contexts for phonetic correction in speech recognition systems
Rafael Viana-Cámara, Diego Campos-Sobrino, Mario Campos-Soberanis
arXiv_SD
Recognition
Optimization
Speech
Pose
Speech_Recognition
Language_Model
-
Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion
Samuel J. Broughton, Md Asif Jalal, Roger K. Moore
arXiv_SD
Transfer_Learning
Adversarial
Speech
GAN
-
'Am I A Good Therapist?' Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies
Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan
arXiv_CL
Speech
-
Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data
Anouck Braggaar, Rob van der Goot
arXiv_CL
Segmentation
Speech
Pose
-
Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model
Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng
arXiv_CL
Recognition
Bert
Speech
Pose
Action
Speech_Recognition
Language_Model
-
ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable Intelligence Identification on Vietnamese Social Network Sites
Kim Thi-Thanh Nguyen, Kiet Nguyen Van
arXiv_CL
Transfer_Learning
Bert
Speech
Pose
GAN
-
The Use of Voice Source Features for Sung Speech Recognition
Gerardo Roa Dabike, Jon Barker
arXiv_AI
Recognition
Speech
Speech_Recognition
-
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
Xian Shi, Fan Yu, Yizhou Lu, Yuhao Liang, Qiangze Feng, Daliang Wang, Yanmin Qian, Lei Xie
arXiv_SD
Recognition
Review
Speech
Pose
Speech_Recognition
-
TransMask: A Compact and Fast Speech Separation Model Based on Transformer
Zining Zhang, Bingsheng He, Zhenjie Zhang
arXiv_SD
Transformer
Speech
Pose
Deep_Learning
Attention
Inference
-
Speech enhancement with weakly labelled data from AudioSet
Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang
arXiv_SD
Enhancement
Speech
Pose
GAN
Inference
Prediction
-
Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast
Satvik Venkatesh, David Moffat, Alexis Kirke, Gözel Shakeri, Stephen Brewster, Jörg Fachner, Helen Odell-Miller, Alex Street, Nicolas Farina, Sube Banerjee, Eduardo Reck Miranda
arXiv_SD
Segmentation
RNN
Speech
Classification
Deep_Learning
Detection
CNN
-
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar, Shrikanth Narayanan
arXiv_SD
Recognition
Speech
Speech_Recognition
Language_Model
-
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier
arXiv_CL
Speech
Language_Model
-
KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text
Varsha Pathak, Manish Joshi, Prasad Joshi, Monica Mundada, Tanmay Joshi
arXiv_CL
Speech
Pose
Classification
Detection
GAN
-
AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge
Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian
arXiv_SD
Embedding
Recognition
Speech
Pose
Speech_Recognition
-
Unit selection synthesis based data augmentation for fixed phrase speaker verification
Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang, Yanmin Qian
arXiv_SD
Speech
Pose
-
Fixing Errors of the Google Voice Recognizer through Phonetic Distance Metrics
Diego Campos-Sobrino, Mario Campos-Soberanis, Iván Martínez-Chin, Víctor Uc-Cetina
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Language_Model
-
Dynamic curriculum learning via data parameters for noise robust keyword spotting
Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir
arXiv_AI
Gradient_Descent
Optimization
Speech
Pose
-
Generative Speech Coding with Predictive Variance Regularization
W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh
arXiv_SD
Regularization
Speech
-
Modelling Paralinguistic Properties in Conversational Speech to Detect Bipolar Disorder and Borderline Personality Disorder
Bo Wang, Yue Wu, Nemanja Vaci, Maria Liakata, Terry Lyons, Kate E A Saunders
arXiv_SD
Speech
Pose
Detection
-
Semantic Parsing to Manipulate Relational Database For a Management System
Muhammad Hamzah Mushtaq
arXiv_AI
Speech
Pose
Relation
QA
-
DINO: A Conditional Energy-Based GAN for Domain Translation
Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
arXiv_CV
Reconstruction
Adversarial
Speech
Pose
GAN
-
Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition
Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe
arXiv_SD
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
Inference
-
Echo State Speech Recognition
Harsh Shrivastava, Ankush Garg, Yuan Cao, Yu Zhang, Tara Sainath
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
-
Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition
Gary Yeung, Ruchao Fan, Abeer Alwan
arXiv_SD
Recognition
RNN
Speech
Pose
Action
Relation
Speech_Recognition
-
Introducing the Hidden Neural Markov Chain framework
Elie Azeraf, Emmanuel Monfrini, Emmanuel Vignon, Wojciech Pieczynski
arXiv_CL
Embedding
Recognition
Restoration
RNN
Speech
Pose
-
Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin, Arkaitz Zubiaga
arXiv_CL
Review
Speech
Pose
Survey
Detection
-
Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann
arXiv_SD
Enhancement
Speech
Pose
-
Do End-to-End Speech Recognition Models Care About Context?
Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel
arXiv_CL
Recognition
Speech
Classification
Attention
Speech_Recognition
Language_Model
-
End-to-end lyrics Recognition with Voice to Singing Style Transfer
Sakya Basak, Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi
arXiv_SD
Transfer_Learning
Style_Transfer
Recognition
Speech
Pose
Contour
Language_Model
-
ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems
Yi Lin, Bo Yang, Linchao Li, Dongyue Guo, Jianwei Zhang, Hu Chen, Yi Zhang
arXiv_CL
Unsupervised
Recognition
Representation_Learning
Speech
Self-Supervised
Pose
Speech_Recognition
Prediction
-
Context-Aware Prosody Correction for Text-Based Speech Editing
Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
arXiv_SD
Salient
Speech
Pose
Denoising
-
Improving Deep-learning-based Semi-supervised Audio Tagging with Mixup
Léo Cances, Etienne Labbé, Thomas Pellegrini
arXiv_SD
Recognition
Speech
Classification
Deep_Learning
-
End-to-End Automatic Speech Recognition with Deep Mutual Learning
Ryo Masumura, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Takanori Ashihara
arXiv_CL
Transformer
Recognition
Knowledge
Speech
Pose
Classification
Speech_Recognition
Prediction
-
Axial Residual Networks for CycleGAN-based Voice Conversion
Jaeseong You, Gyuhyeon Nam, Dalhyun Kim, Gyeongsu Chae
arXiv_SD
Speech
Pose
GAN
-
Improving speech recognition models with small samples for air traffic control systems
Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan, Zhengmao Chen
arXiv_CL
Unsupervised
Transfer_Learning
Recognition
Speech
Pose
Face
Deep_Learning
Speech_Recognition
-
Voice Gender Scoring and Independent Acoustic Characterization of Perceived Masculinity and Femininity
Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan
arXiv_SD
Speech
Pose
Classification
Relation
-
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu
arXiv_SD
Recognition
Speech
Pose
Classification
Deep_Learning
Relation
Speech_Recognition
Prediction
-
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
arXiv_CL
Transformer
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Language_Model
-
A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Yongwan Lim, Asterios Toutios, Yannick Bliesener, Ye Tian, Sajan Goud Lingala, Colin Vaz, Tanner Sorensen, Miran Oh, Sarah Harper, Weiyi Chen, Yoonjeong Lee, Johannes Töger, Mairym Lloréns Montesserin, Caitlin Smith, Bianca Godinez, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan
arXiv_SD
Reconstruction
3D
Speech
Action
-
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
arXiv_SD
Speech
Pose
-
Personalization Strategies for End-to-End Speech Recognition Systems
Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko
arXiv_CL
Recognition
Speech
Speech_Recognition
-
Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
arXiv_AI
Transfer_Learning
Recognition
Bert
Knowledge
Speech
Pose
Classification
Relation
Attention
Speech_Recognition
Inference
Language_Model
Prediction
-
Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon
Hadi Veisi, Hawre Hosseini, Mohammad Mohammadamini (LIA), Wirya Fathy, Aso Mahmudi
arXiv_AI
Recognition
Speech
Pose
Face
Speech_Recognition
Language_Model
-
Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification
Bidisha Sharma, Maulik Madhavi, Haizhou Li
arXiv_CL
Embedding
Recognition
Knowledge
Speech
Pose
Classification
Attention
Speech_Recognition
Language_Model
-
I-vector Based Within Speaker Voice Quality Identification on connected speech
Chuyao Feng, Eva van Leer, Mackenzie Lee Curtis, David V. Anderson
arXiv_SD
Speech
Classification
-
Attention-gated convolutional neural networks for off-resonance correction of spiral real-time MRI
Yongwan Lim, Shrikanth S. Narayanan, Krishna S. Nayak
arXiv_CV
Speech
Relation
Attention
CNN
-
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Priyabrata Karmakar, Shyh Wei Teng, Guojun Lu
arXiv_CL
Transformer
Recognition
Review
Speech
Survey
Attention
Speech_Recognition
-
Multi-Channel Speech Enhancement using Graph Neural Networks
Panagiotis Tzirakis, Anurag Kumar, Jacob Donley
arXiv_SD
Embedding
Enhancement
Speech
Pose
Relation
-
Learning Speech-driven 3D Conversational Gestures from Video
Ikhsanul Habibie, Weipeng Xu, Dushyant Mehta, Lingjie Liu, Hans-Peter Seidel, Gerard Pons-Moll, Mohamed Elgharib, Christian Theobalt
arXiv_CV
3D
Gesture
Pose_Estimation
Adversarial
Speech
Pose
Face
Relation
GAN
-
Multimodal Punctuation Prediction with Contextual Dropout
Andrew Silva, Barry-John Theobald, Nicholas Apostoloff
arXiv_AI
Transformer
Recognition
Speech
Speech_Recognition
Prediction
-
Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding
Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
-
Hybrid phonetic-neural model for correction in speech recognition systems
Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino
arXiv_CL
Recognition
Speech
Deep_Learning
Speech_Recognition
-
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma, Stavros Petridis, Maja Pantic
arXiv_CV
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
Language_Model
-
Detecting Adversarial Attacks On Audiovisual Speech Recognition
Pingchuan Ma, Stavros Petridis, Maja Pantic
arXiv_CV
Recognition
Knowledge
Adversarial
Speech
Pose
Deep_Learning
Detection
Relation
Speech_Recognition
-
On the human evaluation of audio adversarial examples
Jon Vadillo, Roberto Santana
arXiv_SD
Adversarial
Speech
Pose
Action
Attention
Prediction
-
Transformer Language Models with LSTM-based Cross-utterance Information Representation
G. Sun, C. Zhang, P. C. Woodland
arXiv_AI
Transformer
Recognition
RNN
Speech
Pose
Speech_Recognition
Language_Model
-
Content-Aware Speaker Embeddings for Speaker Diarisation
G. Sun, D. Liu, C. Zhang, P. C. Woodland
arXiv_SD
Segmentation
Embedding
Recognition
Adversarial
Speech
Pose
Relation
Speech_Recognition
-
Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier
Guillaume Carbajal, Julius Richter, Timo Gerkmann
arXiv_SD
Enhancement
Speech
Pose
Activity
-
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su
arXiv_CL
Speech
Pose
Attention
Inference
-
Emoji-Based Transfer Learning for Sentiment Tasks
Susann Boy, Dana Ruiter, Dietrich Klakow
arXiv_CL
Transfer_Learning
Speech
Emotion
Detection
Sentiment
-
Contrastive Unsupervised Learning for Speech Emotion Recognition
Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang
arXiv_SD
Unsupervised
Recognition
Representation_Learning
Salient
Speech
Emotion
Relation
-
DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals
Satwinder Singh, Ruili Wang, Yuanhang Qiu
arXiv_AI
Speech
Pose
Deep_Learning
CNN
-
A Multi-View Approach To Audio-Visual Speaker Verification
Leda Sarı, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf
arXiv_SD
Embedding
Speech
Pose
-
Speech-language Pre-training for End-to-end Spoken Language Understanding
Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng
arXiv_CL
Transformer
Speech
Pose
Inference
Language_Model
-
An Investigation of End-to-End Models for Robust Speech Recognition
Archiki Prasad, Preethi Jyothi, Rajbabu Velmurugan
arXiv_SD
Enhancement
Recognition
Knowledge
Adversarial
Speech
Speech_Recognition
-
Speech enhancement with mixture-of-deep-experts with clean clustering pre-training
Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot
arXiv_SD
Enhancement
Speech
-
Language Independent Emotion Quantification using Non linear Modelling of Speech
Uddalok Sarkar, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh
arXiv_CL
Speech
Emotion
Action
-
CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
Ali Bou Nassif, Ismail Shahin, Shibani Hamsa, Nawel Nemmour, Keikichi Hirose
arXiv_AI
Recognition
Speech
Pose
Emotion
CNN
-
ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee
arXiv_SD
Speech
Detection
-
Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation
Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang
arXiv_CL
Representation_Learning
Speech
Pose
Language_Model
-
ABSP System for The Third DIHARD Challenge
A Kishore Kumar, Shefali Waldekar, Goutam Saha, Md Sahidullah
arXiv_SD
Embedding
Speech
-
Automated Video Labelling: Identifying Faces by Corroborative Evidence
Andrew Brown, Ernesto Coto, Andrew Zisserman
arXiv_CV
Speech
Face
Quantitative
-
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Giuseppe Ruggiero, Enrico Zovato, Luigi Di Caro, Vincent Pollet
arXiv_SD
Transfer_Learning
Speech
Pose
Deep_Learning
-
Dompteur: Taming Audio Adversarial Examples
Thorsten Eisenhofer, Lea Schönherr, Joel Frank, Lars Speckemeier, Dorothea Kolossa, Thorsten Holz
arXiv_SD
Recognition
Adversarial
Speech
Pose
Face
Speech_Recognition
-
NUVA: A Naming Utterance Verifier for Aphasia Treatment
David Sabate Barbera, Mark Huckvale, Victoria Fleming, Emily Upton, Henry Coley-Fisher, Catherine Doogan, Ian Shaw, William Latham, Alexander P. Leff, Jenny Crinion
arXiv_CL
Recognition
Speech
Deep_Learning
Speech_Recognition
-
CDPAM: Contrastive learning for perceptual audio similarity
Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein
arXiv_SD
Enhancement
Speech
Contrastive_Learning
Deep_Learning
-
On permutation invariant training for speech source separation
Xiaoyu Liu, Jordi Pons
arXiv_AI
Tracking
Speech
Pose
-
Sparsification via Compressed Sensing for Automatic Speech Recognition
Kai Zhen (1 and 2), Hieu Duy Nguyen (2), Feng-Ju Chang (2), Athanasios Mouchtaris (2), Ariya Rastrow (2). ((1) Indiana University Bloomington, (2) Alexa Machine Learning, Amazon, USA)
arXiv_AI
Quantization
Recognition
Sparse
Speech
Pose
Action
Speech_Recognition
-
Leveraging cross-platform data to improve automated hate speech detection
John D Gallacher
arXiv_CL
Speech
Pose
Classification
Detection
-
BembaSpeech: A Speech Recognition Corpus for the Bemba Language
Claytone Sikasote, Antonios Anastasopoulos
arXiv_CL
Recognition
Speech
Speech_Recognition
-
Amplitude Demodulation of Wideband Signals
Mantas Gabrielaitis
arXiv_SD
Speech
Pose
-
A Study on the Manifestation of Trust in Speech
Lara Gauder, Leonardo Pepino, Pablo Riera, Silvina Brussino, Jazmín Vidal, Agustín Gravano, Luciana Ferrer
arXiv_AI
Speech
Action
Prediction
-
Bayesian Transformer Language Models for Speech Recognition
Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
arXiv_CL
Transformer
Embedding
Recognition
Speech
Pose
Attention
Speech_Recognition
Inference
Language_Model
-
Principal components variable importance reconstruction : Exploring predictive importance in multicollinear acoustic speech data
Christopher Carignan, Ander Egurtzegi
arXiv_SD
Reconstruction
Speech
Pose
Prediction
-
Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
arXiv_AI
Recognition
RNN
Speech
Pose
Attention
Speech_Recognition
Language_Model
-
Independent Vector Extraction for Joint Blind Source Separation and Dereverberation
Rintaro Ikeshita, Tomohiro Nakatani
arXiv_SD
Optimization
Speech
Pose
Action
CNN
Prediction
-
Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform
Qinglong Li, Fei Gao, Haixin Guan, Kaichi Ma
arXiv_SD
Enhancement
Speech
Pose
Deep_Learning
CNN
-
A study of text representations in Hate Speech Detection
Chrysoula Themeli, George Giannakopoulos, Nikiforos Pittaras
arXiv_CL
Embedding
Speech
Classification
Detection
-
Federated Acoustic Modeling For Automatic Speech Recognition
Xiaodong Cui, Songtao Lu, Brian Kingsbury
arXiv_SD
Recognition
RNN
Speech
Pose
Speech_Recognition
-
ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network
Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li
arXiv_SD
Optimization
Speech
Pose
Denoising
-
Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
Mostafa Sadeghi, Xavier Alameda-Pineda
arXiv_CV
Unsupervised
Enhancement
Speech
Pose
-
Effects of Layer Freezing when Transferring DeepSpeech to New Languages
Onno Eberhard, Torsten Zesch
arXiv_CL
Speech
-
Speaker and Direction Inferred Dual-channel Speech Separation
Chenxing Li, Jiaming Xu, Nima Mesgarani, Bo Xu
arXiv_SD
Speech
Pose
Attention
-
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu
arXiv_AI
NAS
Speech
Pose
Inference
-
Non-linear frequency warping using constant-Q transformation for speech emotion recognition
Premjeet Singh, Goutam Saha, Md Sahidullah
arXiv_SD
Recognition
Speech
Emotion
-
Extracting the Locus of Attention at a Cocktail Party from Single-Trial EEG using a Joint CNN-LSTM Model
Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe
arXiv_AI
RNN
Speech
Pose
Quantitative
Relation
Attention
CNN
-
End-to-End Multi-Channel Transformer for Speech Recognition
Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann
arXiv_CL
Transformer
Recognition
Speech
Relation
Attention
Speech_Recognition
-
'Short is the Road that Leads from Fear to Hate': Fear Speech in Indian WhatsApp Groups
Punyajoy Saha, Binny Mathew, Kiran Garimella, Animesh Mukherjee
arXiv_AI
Speech
Face
Survey
-
U-vectors: Generating clusterable speaker embedding from unlabeled data
M. F. Mridha, Abu Quwsar Ohi, M. Ameer Ali, Muhammad Mostafa Monowar, Md. Abdul Hamid
arXiv_AI
Embedding
Unsupervised
Recognition
Speech
Pose
Deep_Learning
-
EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao
arXiv_SD
Speech
-
Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
arXiv_CL
Embedding
Recognition
Speech
Pose
Action
Speech_Recognition
-
A bandit approach to curriculum generation for automatic speech recognition
Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers
arXiv_CL
Recognition
Reinforcement_Learning
Adversarial
Speech
Speech_Recognition
-
The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge
Weiqing Wang, Qingjian Lin, Danwei Cai, Lin Yang, Ming Li
arXiv_SD
Segmentation
Embedding
Speech
Action
Detection
Activity
-
Speaker attribution with voice profiles by graph-based semi-supervised learning
Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno
arXiv_SD
Embedding
Speech
Pose
-
Child-directed Listening: How Caregiver Inference Enables Children's Early Verbal Communication
Stephan C. Meylan, Ruthe Foushee, Elika Bergelson, Roger P. Levy
arXiv_CL
Recognition
Speech
Inference
-
Supervised Speaker Embedding De-Mixing in Two-Speaker Environment
Yanpei Shi, Thomas Hain
arXiv_CL
Reconstruction
Embedding
Speech
Pose
-
Multi-Task Self-Supervised Pre-Training for Music Classification
Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang
arXiv_SD
Reconstruction
Unsupervised
Recognition
Speech
Self-Supervised
Emotion
Classification
Deep_Learning
Speech_Recognition
-
Intermediate Loss Regularization for CTC-based Speech Recognition
Jaesong Lee, Shinji Watanabe
arXiv_CL
Recognition
Regularization
Speech
Pose
Classification
Speech_Recognition
Inference
Language_Model
-
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee
arXiv_AI
Enhancement
Speech
Pose
Deep_Learning
Denoising
Inference
-
Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR
Ruizhi Li, Gregory Sell, Hynek Hermansky
arXiv_CL
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
Inference
-
Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach
Gang Min, Xiongwei Zhang, Xia Zou, Xiangyang Liu
arXiv_SD
Quantization
Speech
Deep_Learning
-
VSEGAN: Visual Speech Enhancement Generative Adversarial Network
Xinmeng Xu, Yang Wang, Dongxiang Xu, Yiyuan Peng, Cong Zhang, Jie Jia, Binbin Chen
arXiv_SD
Enhancement
Adversarial
Speech
Pose
GAN
-
Audio Adversarial Examples: Attacks Using Vocal Masks
Lynnette Ng, Kai Yuan Tay, Wei Han Chua, Lucerne Loke, Danqi Ye, Melissa Chua
arXiv_AI
Adversarial
Speech
-
Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords
Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan
arXiv_CL
Unsupervised
Recognition
Speech
Pose
Detection
Speech_Recognition
-
What Do We See in Them? Identifying Dimensions of Partner Models for Speech Interfaces Using a Psycholexical Approach
Philip R Doyle, Leigh Clark, Benjamin R Cowan
arXiv_AI
Review
Speech
Face
Action
-
Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses
Shengkui Zhao, Trung Hieu Nguyen, Bin Ma
arXiv_SD
Enhancement
Speech
Pose
Attention
CNN
-
Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram
Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma
arXiv_SD
Transformer
Speech
Pose
Inference
-
General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework
Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
arXiv_SD
Recognition
Representation_Learning
Speech
Self-Supervised
Pose
Emotion
Classification
Speech_Recognition
Language_Model
-
Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation
Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang
arXiv_SD
Recognition
Knowledge
Speech
Emotion
Attention
CNN
-
The Multilingual TEDx Corpus for Speech Recognition and Translation
Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post
arXiv_CL
Recognition
Speech
Speech_Recognition
-
SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer
Pramit Saha, Debasish Ray Mohapatra, Sidney Fels
arXiv_CL
Gesture
Speech
Face
-
CTC-based Compression for Direct Speech Translation
Marco Gaido, Mauro Cettolo, Matteo Negri, Marco Turchi
arXiv_CL
Recognition
Speech
Pose
Classification
-
WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit
Binbin Zhang, Di Wu, Chao Yang, Xiaoyu Chen, Zhendong Peng, Xiangming Wang, Zhuoyuan Yao, Xiong Wang, Fan Yu, Lei Xie, Xin Lei
arXiv_CL
Recognition
Speech
Speech_Recognition
-
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
arXiv_AI
Recognition
Speech
Pose
Attention
Speech_Recognition
Inference
Language_Model
-
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur
arXiv_CL
Speech
-
Multimodal Attention Fusion for Target Speaker Extraction
Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
arXiv_SD
arXiv_SD
Speech
Pose
Action
Attention
PDF
-
Generación de voces artificiales infantiles en castellano con acento costarricense (Generating artificial children's voices in Spanish with a Costa Rican accent)
Ana Lilia Alvarez-Blanco, Eugenia Cordoba-Warner, Marvin Coto-Jimenez, Vivian Fallas-Lopez, Maribel Morales Rodriguez
arXiv_CL
arXiv_CL
Speech
Detection
PDF
-
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Adelrahman Mohamed, Emmanuel Dupoux
arXiv_CL
arXiv_CL
Unsupervised
Bert
Zero-Shot
Speech
Language_Model
PDF
-
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov
arXiv_CL
arXiv_CL
Speech
Pose
PDF
-
Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis
Chenpeng Du, Kai Yu
arXiv_SD
arXiv_SD
Speech
Inference
PDF
-
On Scaling Contrastive Representations for Low-Resource Speech Recognition
Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel
arXiv_SD
arXiv_SD
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
-
Commonsense Knowledge Mining from Term Definitions
Zhicheng Liang, Deborah L. McGuinness
arXiv_AI
arXiv_AI
Knowledge
Knowledge_Graph
Speech
Relation
PDF
-
Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning
Yi Shi, Congyi Wang, Yu Chen, Bin Wang
arXiv_AI
arXiv_AI
RNN
Knowledge
Speech
Pose
Quantitative
PDF
-
High Fidelity Speech Regeneration with Application to Speech Enhancement
Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
Denoising
PDF
-
Graph Neural Networks to Predict Customer Satisfaction Following Interactions with a Corporate Call Center
Teja Kanchinadam, Zihang Meng, Joseph Bockhorst, Vikas Singh Kim, Glenn Fung
arXiv_CL
arXiv_CL
Speech
Pose
Survey
Action
Classification
Relation
GAN
Prediction
PDF
-
Speech Recognition by Simply Fine-tuning BERT
Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Pose
Speech_Recognition
Language_Model
PDF
-
Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet
Shilun Lin, Xinhui Li, Li Lu
arXiv_CL
arXiv_CL
Speech
Pose
Attention
Inference
PDF
-
LSSED: a large-scale dataset and benchmark for speech emotion recognition
Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Weidong Chen, Dongyan Huang
arXiv_AI
arXiv_AI
Recognition
Speech
Emotion
Action
PDF
-
Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures
Karn Watcharasupat, Anh H. T. Nguyen, Ching-Hui Ooi, Andy W. H. Khong
arXiv_SD
arXiv_SD
Sparse
Speech
Pose
PDF
-
Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Helen Meng
arXiv_SD
arXiv_SD
Style_Transfer
Bert
Represenation_Learning
Adversarial
Speech
Pose
Relation
Prediction
PDF
-
Expressive Neural Voice Cloning
Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
arXiv_SD
arXiv_SD
Style_Transfer
Speech
Pose
Quantitative
Contour
Inference
PDF
-
Speech Enhancement for Wake-Up-Word detection in Voice Assistants
David Bonet, Guillermo Cámbara, Fernando López, Pablo Gómez, Carlos Segura, Jordi Luque
arXiv_CL
arXiv_CL
Reconstruction
Enhancement
Recognition
Speech
Pose
Classification
Detection
Object_Detection
Denoising
CNN
PDF
-
BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge
Martin Kocour, Guillermo Cámbara, Jordi Luque, David Bonet, Mireia Farrús, Martin Karafiát, Karel Veselý, Jan "Honza" Černocký
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
CNN
Language_Model
PDF
-
Acoustic Structure Inverse Design and Optimization Using Deep Learning
Xuecong Sun, Han Jia, Yuzhen Yang, Han Zhao, Yafeng Bi, Zhaoyong Sun, Jun Yang
arXiv_SD
arXiv_SD
Enhancement
Optimization
Speech
Pose
Deep_Learning
Attention
Prediction
PDF
-
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani
arXiv_CV
arXiv_CV
Transformer
Embedding
Speech
Pose
Text_Generation
Caption
PDF
-
LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content
Shreya Gupta, Parantak Singh, Megha Sundriyal, Md Shad Akhtar, Tanmoy Chakraborty
arXiv_CL
arXiv_CL
Embedding
Speech
Pose
Detection
Language_Model
PDF
-
Transformer Based Deliberation for Two-Pass Speech Recognition
Ke Hu, Ruoming Pang, Tara N. Sainath, Trevor Strohman
arXiv_CL
arXiv_CL
Transformer
Recognition
RNN
Speech
Attention
Speech_Recognition
PDF
-
Mining Large-Scale Low-Resource Pronunciation Data From Wikipedia
Tania Chakraborty, Manasa Prasad, Theresa Breiner, Sandy Ritchie, Daan van Esch
arXiv_CL
arXiv_CL
Speech
PDF
-
Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition
Pranay Gupta, Divyanshu Sharma, Ravi Kiran Sarvadevabhatla
arXiv_CV
arXiv_CV
Embedding
Recognition
Zero-Shot
Speech
Action_Recognition
Action
PDF
-
Convolutional Neural Network-Based Age Estimation Using B-Mode Ultrasound Tongue Image
Kele Xu, Tamas Gábor Csapó, Ming Feng
arXiv_CV
arXiv_CV
Speech
Pose
Face
Deep_Learning
Attention
CNN
PDF
-
Exploring multi-task multi-lingual learning of transformer models for hate speech and offensive speech identification in social media
Sudhanshu Mishra, Shivangi Prasad, Shubhanshu Mishra
arXiv_AI
arXiv_AI
Transformer
Speech
Pose
Inference
PDF
-
A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers
Mahsa Shafaei, Christos Smailis, Ioannis A. Kakadiaris, Thamar Solorio
arXiv_SD
arXiv_SD
Speech
Pose
Deep_Learning
PDF
-
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolox'ochitl Mixtec
Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe
arXiv_SD
arXiv_SD
Recognition
Review
Speech
Pose
Speech_Recognition
PDF
-
Unsupervised Abstractive Summarization of Bengali Text Documents
Radia Rayan Chowdhury, Mir Tafseer Nayeem, Tahsin Tasnim Mim, Md. Saifur Rahman Chowdhury, Taufiqul Jannat
arXiv_CL
arXiv_CL
Unsupervised
Speech
Pose
Summarization
Language_Model
PDF
-
Semi-supervised source localization in reverberant environments with deep generative modeling
Michael J. Bianco, Sharon Gannot, Efren Fernandez-Grande, Peter Gerstoft
arXiv_SD
arXiv_SD
Speech
Pose
Activity
CNN
PDF
-
droidlet: modular, heterogenous, multi-modal agents
Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam
arXiv_AI
arXiv_AI
Speech
Action
PDF
-
High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion
Mohammed Salah Al-Radhi
arXiv_SD
arXiv_SD
Speech
Pose
Deep_Learning
PDF
-
Domain-Dependent Speaker Diarization for the Third DIHARD Challenge
A Kishore Kumar, Shefali Waldekar, Goutam Saha, Md Sahidullah
arXiv_SD
arXiv_SD
Embedding
Speech
PDF
-
Separating Stimulus-Induced and Background Components of Dynamic Functional Connectivity in Naturalistic fMRI
Chee-Ming Ting, Jeremy I. Skipper, Steven L. Small, Hernando Ombao
arXiv_SD
arXiv_SD
Sparse
Speech
Pose
Action
Detection
Relation
Activity
PDF
-
A Review of Speaker Diarization: Recent Advances with Deep Learning
Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan
arXiv_CL
arXiv_CL
Recognition
Review
Speech
Survey
Deep_Learning
Speech_Recognition
PDF
-
Towards efficient models for real-time deep noise suppression
Sebastian Braun, Hannes Gamper, Chandan K.A. Reddy, Ivan Tashev
arXiv_SD
arXiv_SD
Enhancement
Speech
Deep_Learning
CNN
Inference
PDF
-
Streaming Models for Joint Speech Recognition and Translation
Orion Weller, Matthias Sperber, Christian Gollan, Joris Kluivers
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Inference
PDF
-
Exploiting Beam Search Confidence for Energy-Efficient Speech Recognition
Dennis Pinto, Jose-María Arnau, Antonio González
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Face
Action
Speech_Recognition
PDF
-
HASOCOne@FIRE-HASOC2020: Using BERT and Multilingual BERT models for Hate Speech Detection
Suman Dowlagar, Radhika Mamidi
arXiv_CL
arXiv_CL
Transfer_Learning
Bert
Speech
Pose
Detection
PDF
-
Understanding the Tradeoffs in Client-Side Privacy for Speech Recognition
Peter Wu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency
arXiv_SD
arXiv_SD
Recognition
Speech
Speech_Recognition
PDF
-
Mindless Attractor: A False-Positive Resistant Intervention for Drawing Attention Using Auditory Perturbation
Riku Arakawa, Hiromu Yakura
arXiv_AI
arXiv_AI
Speech
Pose
Detection
Attention
PDF
-
LEAF: A Learnable Frontend for Audio Classification
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
arXiv_SD
arXiv_SD
Speech
Pose
Action
Classification
PDF
-
A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender
Pierre Champion (MULTISPEECH), Denis Jouvet (MULTISPEECH), Anthony Larcher (LIUM)
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers
Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu
arXiv_SD
arXiv_SD
RNN
Speech
Pose
PDF
-
Arabic Speech Recognition by End-to-End, Modular Systems and Human
Amir Hussein, Shinji Watanabe, Ahmed Ali
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Speech_Recognition
PDF
-
A survey of joint intent detection and slot-filling models in natural language understanding
H. Weld, X. Huang, S. Long, J. Poon, S. C. Han (School of Computer Science, The University of Sydney)
arXiv_CL
arXiv_CL
Speech
Survey
Classification
Detection
Relation
PDF
-
The Challenges of Persian User-generated Textual Content: A Machine Learning-Based Approach
Mohammad Kasra Habib
arXiv_CL
arXiv_CL
Recognition
Speech
Sentiment
PDF
-
VOTE400: A Speech Dataset to Study Voice Interface for Elderly-Care
Minsu Jang, Sangwon Seo, Dohyung Kim, Jaeyeon Lee, Jaehong Kim, Jun-Hwan Ahn
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Face
GAN
Speech_Recognition
PDF
-
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
Represenation_Learning
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
-
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Eunwoo Song, Ryuichi Yamamoto, Min-Jae Hwang, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim
arXiv_SD
arXiv_SD
Adversarial
Speech
Pose
GAN
CNN
PDF
-
Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices
Yuekai Zhang, Sining Sun, Long Ma
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
-
Fusing Wav2vec2.0 and BERT into End-to-end Model for Low-resource Speech Recognition
Cheng Yi, Shiyu Zhou, Bo Xu
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Self-Supervised
Pose
Attention
Speech_Recognition
PDF
-
A Novel Approach for Earthquake Early Warning System Design using Deep Learning Techniques
Tonumoy Mukherjee, Chandrani Singh, Prabir Kumar Biswas
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Pose
Deep_Learning
Speech_Recognition
CNN
Prediction
PDF
-
Minimum-volume Multichannel Nonnegative matrix factorization for blind source separation
Jianyu Wang, Shanzheng Guan, Xiao-Lei Zhang
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
Mispronunciation Detection in Non-native English with Uncertainty Modeling
Daniel Korzekwa, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman, Bozena Kostek
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Detection
PDF
-
Attentional Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement
Xinmeng Xu, Yang Wang, Dongxiang Xu, Cong Zhang, Yiyuan Peng, Jie Jia, Binbin Chen
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
Attention
CNN
PDF
-
MFFCN: Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement
Xinmeng Xu, Dongxiang Xu, Jie Jia, Yang Wang, Binbin Chen
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
PDF
-
Enabling Robots to Draw and Tell: Towards Visually Grounded Multimodal Description Generation
Ting Han, Sina Zarrieß
arXiv_AI
arXiv_AI
Image_Caption
Gesture
Sketch
Speech
Pose
Face
Action
PDF
-
EmoCat: Language-agnostic Emotional Voice Conversion
Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman, Jaime Lorenzo-Trueba
arXiv_SD
arXiv_SD
Adversarial
Speech
Pose
Emotion
PDF
-
Generating coherent spontaneous speech and gesture from text
Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow
arXiv_SD
arXiv_SD
3D
Gesture
Speech
PDF
-
Stacked DeBERT: All Attention in Incomplete Data for Text Classification
Gwenaelle Cunha Sergio, Minho Lee
arXiv_CL
arXiv_CL
Transformer
Reconstruction
Embedding
Bert
Text_Classification
Speech
Pose
Classification
Sentiment
Denoising
Attention
QA
PDF
-
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition
Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu
arXiv_CL
arXiv_CL
Recognition
Speech
Deep_Learning
Speech_Recognition
Prediction
PDF
-
Speaker activity driven neural speech extraction
Marc Delcroix, Katerina Zmolikova, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani
arXiv_SD
arXiv_SD
Speech
Pose
Action
Activity
PDF
-
WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm
Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Pose
Classification
Speech_Recognition
PDF
-
Whispered and Lombard Neural Speech Synthesis
Qiong Hu, Tobias Bleisch, Petko Petkov, Tuomo Raitio, Erik Marchi, Varun Lakshminarasimhan
arXiv_CL
arXiv_CL
Embedding
Speech
Pose
PDF
-
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks
Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
arXiv_AI
arXiv_AI
RNN
Speech
Attention
PDF
-
Learning Efficient Representations for Keyword Spotting with Triplet Loss
Roman Vygon, Nikolay Mikhaylovskiy
arXiv_AI
arXiv_AI
Embedding
Recognition
Represenation_Learning
Speech
Pose
Classification
Speech_Recognition
CNN
PDF
-
Practical Speech Re-use Prevention in Voice-driven Services
Yangyong Zhang, Maliheh Shirvanian, Sunpreet S. Arora, Jianwei Huang, Guofei Gu
arXiv_AI
arXiv_AI
Unsupervised
Speech
Action
PDF
-
Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling
Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan
arXiv_AI
arXiv_AI
Zero-Shot
Speech
Pose
Few-Shot
Language_Model
PDF
-
Neural Network-based Virtual Microphone Estimator
Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
PDF
-
Evaluation of Deep Learning Models for Hostility Detection in Hindi Text
Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
arXiv_CL
arXiv_CL
Embedding
RNN
Speech
Pose
Face
Action
Classification
Deep_Learning
Detection
PDF
-
A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis
Xiao Fu, Guijun Zhang
arXiv_CL
arXiv_CL
Transformer
Segmentation
Recognition
Bert
Speech
Pose
PDF
-
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
arXiv_CV
arXiv_CV
Speech
Detection
Prediction
PDF
-
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
Minh Nguyen, Viet Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen
arXiv_CL
arXiv_CL
Transformer
Segmentation
Speech
Language_Model
PDF
-
Leveraging Multilingual Transformers for Hate Speech Detection
Sayar Ghosh Roy, Ujwal Narayan, Tathagata Raha, Zubair Abid, Vasudeva Varma
arXiv_AI
arXiv_AI
Transformer
Speech
Classification
Detection
Language_Model
PDF
-
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao, Kristen Grauman
arXiv_CV
arXiv_CV
Embedding
Enhancement
Speech
Pose
Face
PDF
-
A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph
Mohannad AlMousa, Rachid Benlamri, Richard Khoury
arXiv_CL
arXiv_CL
Knowledge
Knowledge_Graph
Speech
Pose
Face
PDF
-
Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario
Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi
arXiv_SD
arXiv_SD
Enhancement
RNN
Speech
Pose
Attention
PDF
-
Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNN
J. Lee, S. Lee, S. Back, S. Shin, K. Lee
arXiv_CV
arXiv_CV
Segmentation
Speech
Pose
Deep_Learning
Detection
Object_Detection
PDF
-
Detecting Suspicious Events in Fast Information Flows
Kristiaan Pelckmans, Moustafa Aboushady, Andreas Brosemyr
arXiv_AI
arXiv_AI
Knowledge
Speech
Action
Classification
Detection
Object_Detection
PDF
-
Interspeech 2021 Deep Noise Suppression Challenge
Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
arXiv_SD
arXiv_SD
Speech
Denoising
GAN
PDF
-
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
-
PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing
Linh The Nguyen, Dat Quoc Nguyen
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Language_Model
PDF
-
Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition
Anugunj Naman, Liliana Mancini
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Emotion
Classification
Few-Shot
PDF
-
Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data: From Subspaces to Metrics and Beyond
Dong Huang, Chang-Dong Wang, Jian-Huang Lai, Chee-Keong Kwoh
arXiv_CV
arXiv_CV
Speech
Pose
Attention
PDF
-
Domain-aware Neural Language Models for Speech Recognition
Linda Liu, Yile Gu, Aditya Gourav, Ankur Gandhe, Shashank Kalmane, Denis Filimonov, Ariya Rastrow, Ivan Bulyko
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Classification
Speech_Recognition
Language_Model
PDF
-
Generalized RNN beamformer for target speech separation
Yong Xu, Zhuohuang Zhang, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Dong Yu
arXiv_SD
arXiv_SD
RNN
Speech
Pose
PDF
-
Oral Billiards
Elaine Y L Tsiang
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
-
A novel policy for pre-trained Deep Reinforcement Learning for Speech Emotion Recognition
Thejan Rajapakshe, Rajib Rana, Sara Khalifa
arXiv_SD
arXiv_SD
Recognition
Reinforcement_Learning
Speech
Pose
Emotion
Action
Deep_Learning
PDF
-
DEVI: Open-source Human-Robot Interface for Interactive Receptionist Systems
Ramesha Karunasena, Piumi Sandarenu, Madushi Pinto, Achala Athukorala, Ranga Rodrigo, Peshala Jayasekara
arXiv_RO
arXiv_RO
Recognition
Gesture
Speech
Pose
Face
Face_Recognition
Speech_Recognition
PDF
-
Assessing Emoji Use in Modern Text Processing Tools
Abu Awal Md Shoeb, Gerard de Melo
arXiv_CL
arXiv_CL
Speech
Emotion
Sentiment
PDF
-
Substructure Substitution: Structured Data Augmentation for NLP
Haoyue Shi, Karen Livescu, Kevin Gimpel
arXiv_CL
arXiv_CL
Text_Classification
Speech
Classification
PDF
-
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Represenation_Learning
Speech
Speech_Recognition
PDF
-
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
Jui Shah, Yaman Kumar Singla, Changyou Chen, Rajiv Ratn Shah
arXiv_CL
arXiv_CL
Transformer
Embedding
Bert
Speech
Face
PDF
-
A Survey on Deep Reinforcement Learning for Audio-Based Applications
Siddique Latif, Heriberto Cuayáhuitl, Farrukh Pervez, Fahad Shamshad, Hafiz Shehbaz Ali, Erik Cambria
arXiv_SD
arXiv_SD
Reinforcement_Learning
Speech
Face
Survey
Deep_Learning
Autonomous
PDF
-
Towards Modelling Coherence in Spoken Discourse
Rajaswa Patil, Yaman Kumar Singla, Rajiv Ratn Shah, Mika Hama, Roger Zimmermann
arXiv_CL
arXiv_CL
Speech
PDF
-
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen, Tristan Thrush, Zeerak Waseem, Douwe Kiela
arXiv_CL
arXiv_CL
Adversarial
Speech
Classification
Detection
PDF
-
EfficientNet-Absolute Zero for Continuous Speech Keyword Spotting
Amir Mohammad Rostami, Ali Karimi, Mohammad Ali Akhaee
arXiv_CL
arXiv_CL
Speech
Pose
PDF
-
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, Janet Pierrehumbert
arXiv_CL
arXiv_CL
Transformer
Review
Speech
Detection
PDF
-
Deep Graph Generators: A Survey
Faezeh Faez, Yassaman Ommi, Mahdieh Soleymani Baghshah, Hamid R. Rabiee
arXiv_AI
arXiv_AI
Represenation_Learning
Adversarial
Speech
Survey
Deep_Learning
PDF
-
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu, David Harwath, Christopher Song, James Glass
arXiv_AI
arXiv_AI
Image_Caption
Speech
Self-Supervised
Caption
PDF
-
Unified Mandarin TTS Front-end Based on Distilled BERT Model
Yang Zhang, Liqun Deng, Yasheng Wang
arXiv_CL
arXiv_CL
Bert
Knowledge
Speech
Pose
Language_Model
Prediction
PDF
-
Generalized Operating Procedure for Deep Learning: an Unconstrained Optimal Design Perspective
Shen Chen, Mingwei Zhang, Jiamin Cui, Wei Yao
arXiv_SD
arXiv_SD
Speech
Pose
Deep_Learning
Inference
PDF
-
Robustness Testing of Language Understanding in Dialog Systems
Jiexi Liu, Ryuichi Takanobu, Jiaxin Wen, Dazhen Wan, Weiran Nie, Hongyan Li, Cheng Li, Wei Peng, Minlie Huang
arXiv_AI
arXiv_AI
Speech
Pose
PDF
-
Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis
Jose A. Gonzalez-Lopez, Miriam Gonzalez-Atienza, Alejandro Gomez-Alanis, Jose L. Perez-Cordoba, Phil D. Green
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
Detecting Hate Speech in Multi-modal Memes
Abhishek Das, Japsimar Singh Wahi, Siyao Li
arXiv_CV
arXiv_CV
Image_Caption
Speech
Pose
Face
Classification
Detection
Sentiment
VQA
Object_Detection
Caption
Prediction
PDF
-
Detection of Lexical Stress Errors in Non-native English with Data Augmentation and Attention
Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek
arXiv_SD
arXiv_SD
Speech
Pose
Action
Deep_Learning
Detection
Attention
PDF
-
RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems
Baolin Peng, Chunyuan Li, Zhu Zhang, Chenguang Zhu, Jinchao Li, Jianfeng Gao
arXiv_AI
arXiv_AI
Speech
PDF
-
DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language
Md. Rezaul Karim, Sumon Kanti Dey, Bharathi Raja Chakravarthi
arXiv_CL
arXiv_CL
Transformer
Embedding
Bert
RNN
Speech
Pose
Detection
PDF
-
Towards Fully Automated Manga Translation
Ryota Hinami, Shonosuke Ishiwatari, Kazuhiko Yasuda, Yusuke Matsui
arXiv_CL
arXiv_CL
Speech
Pose
PDF
-
Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models
Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard
arXiv_SD
arXiv_SD
Transformer
Speech
Self-Supervised
Pose
PDF
-
Building Multi lingual TTS using Cross Lingual Voice Conversion
Qinghua Sun, Kenji Nagamatsu
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition
Hengshun Zhou, Debin Meng, Yuanyuan Zhang, Xiaojiang Peng, Jun Du, Kai Wang, Yu Qiao
arXiv_CV
arXiv_CV
Recognition
Speech
Emotion
Attention
PDF
-
My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism
Swapnil Parekh, Yaman Kumar Singla, Changyou Chen, Junyi Jessy Li, Rajiv Ratn Shah
arXiv_AI
arXiv_AI
Knowledge
Adversarial
Speech
PDF
-
Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders
Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao
arXiv_SD
arXiv_SD
Enhancement
Recognition
Knowledge
Speech
Pose
Deep_Learning
Denoising
Speech_Recognition
PDF
-
Multi-channel Multi-frame ADL-MVDR for Target Speech Separation
Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Deep_Learning
Relation
Speech_Recognition
PDF
-
ThamizhiUDp: A Dependency Parser for Tamil
Kengatharaiyer Sarveswaran, Gihan Dias
arXiv_CL
arXiv_CL
Speech
PDF
-
Detecting Hateful Memes Using a Multimodal Deep Ensemble
Vlad Sandulescu
arXiv_CV
arXiv_CV
Transformer
Speech
Pose
PDF
-
Unsupervised neural adaptation model based on optimal transport for spoken language identification
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Speech
Pose
Classification
PDF
-
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang
arXiv_SD
arXiv_SD
Transformer
Enhancement
Recognition
Speech
Denoising
Speech_Recognition
PDF
-
Speech Synthesis as Augmentation for Low-Resource ASR
Deblin Bagchi, Shannon Wotherspoon, Zhuolin Jiang, Prasanna Muthukumar
arXiv_CL
arXiv_CL
Recognition
Adversarial
Speech
Speech_Recognition
PDF
-
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu, Jewgeni Rose
arXiv_AI
arXiv_AI
Bert
Speech
Pose
Deep_Learning
Caption
PDF
-
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe, Nithin Holla, Shantanu Chandra, Santhosh Rajamanickam, Georgios Antoniou, Ekaterina Shutova, Helen Yannakoudakis
arXiv_AI
arXiv_AI
Speech
Face
Classification
Detection
GAN
PDF
-
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
arXiv_SD
arXiv_SD
Knowledge
Speech
Pose
Language_Model
PDF
-
CN-Celeb: multi-genre speaker recognition
Lantian Li, Ruiqi Liu, Jiawen Kang, Yue Fan, Hao Cui, Yunqi Cai, Ravichander Vipperla, Thomas Fang Zheng, Dong Wang
arXiv_SD
arXiv_SD
Recognition
Speech
PDF
-
AudioViewer: Learning to Visualize Sound
Yuchi Zhang, Willis Peng, Bastian Wandt, Helge Rhodin
arXiv_CV
arXiv_CV
Speech
Face
Quantitative
PDF
-
Confronting Abusive Language Online: A Survey from the Ethical and Human Rights Perspective
Svetlana Kiritchenko, Isar Nejadgholi, Kathleen C. Fraser
arXiv_AI
arXiv_AI
Style_Transfer
Review
Speech
Survey
Classification
Detection
GAN
PDF
-
Applying wav2vec2.0 to Speech Recognition in various low-resource languages
Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Speech_Recognition
PDF
-
A Hierarchical Reasoning Graph Neural Network for The Automatic Scoring of Answer Transcriptions in Video Job Interviews
Kai Chen, Meng Niu, Qingcai Chen
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Relation
Attention
Speech_Recognition
Prediction
QA
Matching
PDF
-
Limitations of Deep Neural Networks: a discussion of G. Marcus' critical appraisal of deep learning
Stefanos Tsimenidis
arXiv_AI
arXiv_AI
Recognition
Speech
Deep_Learning
Speech_Recognition
Medical
Autonomous
PDF
-
Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition
Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
Adversarial
Speech
Speech_Recognition
PDF
-
Differentially Private Synthetic Medical Data Generation using Convolutional GANs
Amirsina Torfi, Edward A. Fox, Chandan K. Reddy
arXiv_AI
arXiv_AI
Unsupervised
Adversarial
Speech
Classification
Deep_Learning
Relation
GAN
Medical
CNN
Image_Classification
PDF
-
Encoding Syntactic Knowledge in Transformer Encoder for Intent Detection and Slot Filling
Jixuan Wang, Kai Wei, Martin Radfar, Weiwei Zhang, Clement Chung
arXiv_AI
arXiv_AI
Transformer
Knowledge
Speech
Pose
Detection
Attention
Inference
PDF
-
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Bhaskar Mitra
arXiv_AI
arXiv_AI
Recognition
Speech
Relation
Speech_Recognition
PDF
-
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network
Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng
arXiv_AI
arXiv_AI
Unsupervised
Recognition
Adversarial
Speech
Pose
Emotion
Classification
Deep_Learning
PDF
-
Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification
Wei Yao, Shen Chen, Jiamin Cui, Yaolin Lou
arXiv_SD
arXiv_SD
Embedding
Knowledge
Speech
Pose
Classification
CNN
PDF
-
Adjust-free adversarial example generation in speech recognition using evolutionary multi-objective optimization under black-box condition
Shoma Ishida
arXiv_CL
arXiv_CL
Recognition
Optimization
Adversarial
Speech
Pose
Speech_Recognition
PDF
-
Visual Speech Enhancement Without A Real Visual Stream
Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C.V. Jawahar
arXiv_CV
arXiv_CV
Enhancement
Speech
Pose
Quantitative
PDF
-
Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection
Yikang Li, Pulkit Goel, Varsha Kuppur Rajendra, Har Simrat Singh, Jonathan Francis, Kaixin Ma, Eric Nyberg, Alessandro Oltramari
arXiv_CL
arXiv_CL
Knowledge
Knowledge_Graph
Speech
Pose
Action
Relation
Attention
Text_Generation
Language_Model
Matching
PDF
-
DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement
Huixiang Huang, Renjie Wu, Jingbiao Huang, Jucai Lin
arXiv_SD
arXiv_SD
Enhancement
Optimization
RNN
Adversarial
Speech
Pose
Relation
GAN
PDF
-
Non-uniform FIR Digital Filter Bank for Hearing Aid Application Using Frequency Response Masking Technique: A Review
Arun Sebastian, Manu Francis, Arun Mathew
arXiv_SD
arXiv_SD
Review
Speech
Matching
PDF
-
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee
arXiv_AI
arXiv_AI
Speech
Classification
Detection
PDF
-
End-to-End Speaker Diarization as Post-Processing
Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu
arXiv_CL
arXiv_CL
Speech
Pose
Classification
PDF
-
NeurST: Neural Speech Translation Toolkit
Chengqi Zhao, Mingxuan Wang, Lei Li
arXiv_CL
arXiv_CL
Speech
Action
PDF
-
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording
Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen
arXiv_SD
arXiv_SD
Embedding
Speech
Pose
Attention
PDF
-
Parallel WaveNet conditioned on VAE latent vectors
Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote
arXiv_SD
arXiv_SD
Speech
Inference
PDF
-
Hate Speech detection in the Bengali language: A dataset and its baseline evaluation
Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, Md Saiful Islam
arXiv_CL
arXiv_CL
Embedding
Speech
Face
Deep_Learning
Detection
Language_Model
PDF
-
Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu
arXiv_SD
arXiv_SD
Embedding
Enhancement
Speech
Denoising
PDF
-
The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks
Siyuan Feng, Odette Scharenborg
arXiv_CL
arXiv_CL
Unsupervised
Knowledge
Speech
Self-Supervised
Pose
Relation
PDF
-
cif-based collaborative decoding for end-to-end contextual speech recognition
Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu
arXiv_AI
arXiv_AI
Embedding
Recognition
Speech
Speech_Recognition
PDF
-
Interactive Speech and Noise Modeling for Speech Enhancement
Chengyu Zheng, Xiulian Peng, Yuan Zhang, Sriram Srinivasan, Yan Lu
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
Action
Relation
Attention
CNN
PDF
-
Speech Enhancement with Zero-Shot Model Selection
Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
arXiv_SD
arXiv_SD
Embedding
Enhancement
Zero-Shot
Speech
Pose
Deep_Learning
PDF
-
You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection
Prateek Chaudhry, Matthew Lease
arXiv_AI
arXiv_AI
Speech
Detection
PDF
-
Pre-Training Transformers as Energy-Based Cloze Models
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
arXiv_CL
arXiv_CL
Transformer
Recognition
Bert
Represenation_Learning
Speech
Pose
Speech_Recognition
Language_Model
PDF
-
LiteMuL: A Lightweight On-Device Sequence Tagger using Multi-task Learning
Sonal Kumari, Vibhav Agarwal, Bharath Challa, Kranti Chalamalasetti, Sourav Ghosh, Harshavardhana, Barath Raj Kandur Raja
arXiv_CL
arXiv_CL
Knowledge
Speech
Pose
Detection
PDF
-
Exploring Transfer Learning For End-to-End Spoken Language Understanding
Subendhu Rongali, Beiye Liu, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Wael Hamza
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
Zero-Shot
Speech
Pose
Face
Action
Speech_Recognition
PDF
-
QUARC: Quaternion Multi-Modal Fusion Architecture For Hate Speech Classification
Deepak Kumar, Nalin Kumar, Subhankar Mishra
arXiv_CL
arXiv_CL
Speech
Pose
Classification
PDF
-
User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis
Oliver Adams, Benjamin Galliot (LACITO), Guillaume Wisniewski (LLF UMR7110), Nicholas Lambourne, Ben Foley, Rahasya Sanders-Dwyer, Janet Wiles, Alexis Michaud (LACITO), Séverine Guillaume (LACITO), Laurent Besacier (LIG), Christopher Cox, Katya Aplonova (LLACAN), Guillaume Jacques (CRLAO), Nathan Hill
arXiv_CL
arXiv_CL
Enhancement
Recognition
Speech
Face
Speech_Recognition
PDF
-
Writing Polishment with Simile: Task, Dataset and A Neural Approach
Jiayi Zhang, Zhi Cui, Xiaoqiang Xia, Yalong Guo, Yanran Li, Chen Wei, Jianwei Cui
arXiv_CL
arXiv_CL
Transformer
Speech
Pose
PDF
-
A review of on-device fully neural end-to-end automatic speech recognition algorithms
Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han
arXiv_CL
arXiv_CL
Transformer
Recognition
Optimization
RNN
Review
Speech
Pose
Classification
Attention
Speech_Recognition
Language_Model
PDF
-
Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection
Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
arXiv_CV
arXiv_CV
Embedding
Recognition
Speech
Pose
Face
Deep_Learning
Detection
Object_Detection
Speech_Recognition
PDF
-
Clickbait in Hindi News Media : A Preliminary Study
Vivek Kaushal, Kavita Vemuri
arXiv_CL
arXiv_CL
Speech
Action
Relation
PDF
-
Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks
Herman Kamper, Benjamin van Niekerk
arXiv_CL
arXiv_CL
Segmentation
Unsupervised
Speech
Self-Supervised
PDF
-
A learning perspective on the emergence of abstractions: the curious case of phonemes
Petar Milin, Benjamin V. Tucker, Dagmar Divjak
arXiv_AI
arXiv_AI
Knowledge
Speech
Action
PDF
-
Bayesian Learning for Deep Neural Network Adaptation
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Speech
Pose
Speech_Recognition
Inference
PDF
-
Robust One Shot Audio to Video Generation
Neeraj Kumar, Srishti Goel, Ankur Narang, Mujtaba Hasan
arXiv_CV
arXiv_CV
Adversarial
Speech
Pose
Quantitative
GAN
PDF
-
Towards localisation of keywords in speech using weak supervision
Kayode Olaleye, Benjamin van Niekerk, Herman Kamper
arXiv_CL
arXiv_CL
Weakly_Supervised
Salient
Speech
Self-Supervised
PDF
-
A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
Lisa van Staden, Herman Kamper
arXiv_CL
arXiv_CL
Embedding
Unsupervised
Represenation_Learning
Speech
Self-Supervised
PDF
-
Classification of ALS patients based on acoustic analysis of sustained vowel phonations
Maxim Vashkevich, Yulia Rushkevich
arXiv_CL
arXiv_CL
Speech
Pose
Classification
Detection
PDF
-
Multi Modal Adaptive Normalization for Audio to Video Generation
Neeraj Kumar, Srishti Goel, Ankur Narang, Brejesh Lall
arXiv_CV
arXiv_CV
Multi_Modal
Adversarial
Speech
Pose
Action
Quantitative
GAN
Optical_Flow
PDF
-
Group Communication with Context Codec for Ultra-Lightweight Source Separation
Yi Luo, Cong Han, Nima Mesgarani
arXiv_AI
arXiv_AI
Enhancement
Speech
Pose
PDF
-
SPARTA: Speaker Profiling for ARabic TAlk
Wael Farhan, Muhy Eddin Za'ter, Qusai Abu Obaidah, Hisham al Bataineh, Zyad Sober, Hussein T. Al-Natsheh
arXiv_CL
arXiv_CL
Text_Classification
RNN
Speech
Pose
Emotion
Classification
CNN
PDF
-
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal
Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
arXiv_CL
arXiv_CL
Embedding
Represenation_Learning
Knowledge
Speech
Pose
PDF
-
DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning
Mufan Sang, Wei Xia, John H.L. Hansen
arXiv_SD
arXiv_SD
Embedding
Represenation_Learning
Adversarial
Speech
Pose
PDF
-
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman
arXiv_SD
arXiv_SD
Recognition
Speech
PDF
-
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
-
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization
Shaoshi Ling, Yuzong Liu
arXiv_CL
arXiv_CL
Transformer
Reconstruction
Quantization
Recognition
Sparse
Represenation_Learning
RNN
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
-
Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition
Valentin Mendelev, Tina Raissi, Guglielmo Camporese, Manuel Giollo
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Speech_Recognition
PDF
-
Exploring wav2vec 2.0 on speaker verification and language identification
Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu
arXiv_CL
arXiv_CL
Recognition
Represenation_Learning
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
-
VaryFairyTED : A Fair in Rating Predictor for Public Speeches by Awareness of Verbal and Gesture Quality
Rupam Acharyya, Ankani Chattoraj, Shouman Das, Md. Iftekhar Tanveer, Ehsan Hoque
arXiv_AI
arXiv_AI
Gesture
Speech
Emotion
Relation
Prediction
PDF
-
Next Wave Artificial Intelligence: Robust, Explainable, Adaptable, Ethical, and Accountable
Odest Chadwicke Jenkins, Daniel Lopresti, Melanie Mitchell
arXiv_AI
arXiv_AI
Recognition
Knowledge
Adversarial
Speech
Face
Action
Deep_Learning
Speech_Recognition
Medical
Autonomous
PDF
-
Towards Neural Programming Interfaces
Zachary C. Brown, Nathaniel Robinson, David Wingate, Nancy Fulda
arXiv_AI
arXiv_AI
Transformer
Speech
Pose
Face
GAN
Language_Model
PDF
-
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
arXiv_CV
arXiv_CV
Transformer
Speech
Attention
Activity
Prediction
QA
PDF
-
Direct multimodal few-shot learning of speech and images
Leanne Nortje, Herman Kamper
arXiv_SD
arXiv_SD
Embedding
Unsupervised
Transfer_Learning
Speech
Pose
Few-Shot
Matching
PDF
-
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
arXiv_SD
arXiv_SD
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
Inference
PDF
-
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition
Ying Zhou, Xuefeng Liang, Yu Gu, Yifei Yin, Longshan Yao
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Emotion
Attention
Speech_Recognition
PDF
-
Segmenting Natural Language Sentences via Lexical Unit Analysis
Yangming Li, Lemao Liu, Shuming Shi
arXiv_CL
arXiv_CL
Segmentation
Recognition
Speech
Inference
PDF
-
Semantic Communications for Speech Signals
Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li
arXiv_SD
arXiv_SD
Speech
Pose
Deep_Learning
Attention
PDF
-
Speech Recognition for Endangered and Extinct Samoyedic languages
Niko Partanen, Mika Hämäläinen, Tiina Klooster
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
GAN
Speech_Recognition
Recommendation
PDF
-
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis
Anurag Chowdhury, Arun Ross, Prabu David
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Contour
PDF
-
On Knowledge Distillation for Direct Speech Translation
Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Speech_Recognition
PDF
-
Breeding Gender-aware Direct Speech Translation Systems
Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi
arXiv_CL
arXiv_CL
Speech
PDF
-
Recent Advances in Computer Audition for Diagnosing COVID-19: An Overview
Kun Qian, Bjorn W. Schuller, Yoshiharu Yamamoto
arXiv_SD
arXiv_SD
Sparse
Speech
Attention
PDF
-
Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation
Paul-Gauthier Noé, Mohammad Mohammadamini, Driss Matrouf, Titouan Parcollet, Jean-François Bonastre
arXiv_AI
arXiv_AI
Embedding
Adversarial
Speech
Pose
PDF
-
Incorporating Domain Knowledge To Improve Topic Segmentation Of Long MOOC Lecture Videos
Ananda Das, Partha Pratim Das
arXiv_CL
arXiv_CL
Segmentation
Knowledge
Knowledge_Graph
Speech
Pose
Language_Model
PDF
-
End-to-End Chinese Parsing Exploiting Lexicons
Yuan Zhang, Zhiyang Teng, Yue Zhang
arXiv_CL
arXiv_CL
Segmentation
Knowledge
Speech
Pose
Attention
PDF
-
Using multiple ASR hypotheses to boost i18n NLU performance
Charith Peris, Gokmen Oz, Khadige Abboud, Venkata sai Varada, Prashan Wanigasekara, Haidar Khan
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Classification
Summarization
Speech_Recognition
PDF
-
Modeling the effects of dynamic range compression on signals in noise
Ryan M. Corey, Andrew C. Singer
arXiv_SD
arXiv_SD
Speech
PDF
-
Using previous acoustic context to improve Text-to-Speech synthesis
Pilar Oplustil-Gallegos, Simon King
arXiv_CL
arXiv_CL
Embedding
Speech
Pose
Relation
Inference
PDF
-
Speech Imagery Classification using Length-Wise Training based on Deep Learning
Byeong-Hoo Lee, Byeong-Hee Kwon, Do-Yeun Lee, Ji-Hoon Jeong
arXiv_SD
arXiv_SD
Speech
Pose
Face
Classification
Deep_Learning
CNN
PDF
-
Towards end-to-end speech enhancement with a variational U-Net architecture
Eike J. Nustede, Jörn Anemüller
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
Relation
Denoising
PDF
-
Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function
Xiaofei Li, Laurent Girin, Fabien Badeig, Radu Horaud
arXiv_SD
arXiv_SD
Speech
Pose
Action
PDF
-
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao, Shuang Liang, Zhencheng Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
From syntactic structure to semantic relationship: hypernym extraction from definitions by recurrent neural networks using the part of speech information
Yixin Tan, Xiaomeng Wang, Tao Jia
arXiv_AI
arXiv_AI
RNN
Speech
Action
Relation
PDF
-
MLS: A Large-Scale Multilingual Dataset for Speech Research
Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Language_Model
PDF
-
Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System
Yuanjun Zhao, Roberto Togneri, Victor Sreeram
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Deep_Learning
Detection
PDF
-
On-Device Tag Generation for Unstructured Text
Manish Chugani, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre
arXiv_CL
arXiv_CL
Knowledge
Speech
Pose
Relation
PDF
-
SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams
Madina Abdrakhmanova, Askat Kuzdeuov, Sheikh Jarju, Yerbolat Khassanov, Michael Lewis, Huseyin Atakan Varol
arXiv_CV
arXiv_CV
Recognition
Speech
Face
Action
Classification
Speech_Recognition
PDF
-
On-Device Sentence Similarity for SMS Dataset
Arun D Prabhu, Nikhil Arora, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre
arXiv_CL
arXiv_CL
Speech
Pose
Face
Action
PDF
-
Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment
Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov
arXiv_AI
arXiv_AI
Speech
Pose
Classification
PDF
-
Predicting Emotions Perceived from Sounds
Faranak Abri, Luis Felipe Gutiérrez, Akbar Siami Namin, David R. W. Sears, Keith S. Jones
arXiv_CV
arXiv_CV
Speech
Emotion
Prediction
PDF
-
FinnSentiment -- A Finnish Social Media Corpus for Sentiment Polarity Annotation
Krister Lindén, Tommi Jauhiainen, Sam Hardwick
arXiv_CL
arXiv_CL
Speech
Survey
Sentiment
PDF
-
Automated Detection of Cyberbullying Against Women and Immigrants and Cross-domain Adaptability
Thushari Atapattu, Mahen Herath, Georgia Zhang, Katrina Falkner
arXiv_CL
arXiv_CL
Bert
Speech
Detection
Recommendation
PDF
-
CUED_speech at TREC 2020 Podcast Summarisation Track
Potsawee Manakul, Mark Gales
arXiv_CL
arXiv_CL
Speech
Attention
PDF
-
A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings
Puyuan Peng, Herman Kamper, Karen Livescu
arXiv_CL
arXiv_CL
Embedding
Unsupervised
RNN
Speech
Self-Supervised
Pose
PDF
-
Individually amplified text-to-speech
Josef Schlittenlacher, Thomas Baer
arXiv_SD
arXiv_SD
Transfer_Learning
Speech
Pose
PDF
-
LookOut! Interactive Camera Gimbal Controller for Filming Long Takes
Mohamed Sayed, Robert Cinca, Enrico Costanza, Gabriel Brostow
arXiv_RO
arXiv_RO
Tracking
Speech
Pose
Face
Action
PDF
-
End to End ASR System with Automatic Punctuation Insertion
Yushi Guan
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Action
Classification
Speech_Recognition
Language_Model
PDF
-
Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems
Ruan van der Merwe
arXiv_CV
arXiv_CV
Speech
Classification
PDF
-
Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition
Genta Indra Winata, Guangsen Wang, Caiming Xiong, Steven Hoi
arXiv_AI
arXiv_AI
Transformer
Recognition
Bert
Speech
Pose
Speech_Recognition
Inference
Language_Model
PDF
-
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv_AI
arXiv_AI
Optimization
Speech
Pose
PDF
-
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei Kong, Jing Xiao
arXiv_CL
arXiv_CL
Embedding
RNN
Speech
Pose
Relation
Attention
PDF
-
Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement
Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel
arXiv_SD
arXiv_SD
Enhancement
RNN
Speech
PDF
-
Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks
Zhaoheng Ni, Felix Grezes, Viet Anh Trinh, Michael I. Mandel
arXiv_SD
arXiv_SD
RNN
Speech
PDF
-
Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks
Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel
arXiv_SD
arXiv_SD
Enhancement
RNN
Speech
Pose
PDF
-
Joint gender and age estimation based on speech signals using x-vectors and transfer learning
Damian Kwasny, Daria Hemmerling
arXiv_SD
arXiv_SD
Transfer_Learning
Recognition
Speech
Pose
Classification
Speech_Recognition
CNN
PDF
-
The Third DIHARD Diarization Challenge
Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman
arXiv_SD
arXiv_SD
Segmentation
Speech
Action
Detection
Activity
PDF
-
Policy Supervectors: General Characterization of Agents by their Behaviour
Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
arXiv_AI
arXiv_AI
Reinforcement_Learning
Speech
Pose
PDF
-
Classification of Multimodal Hate Speech -- The Winning Solution of Hateful Memes Challenge
Xiayu Zhong
arXiv_CL
arXiv_CL
Speech
Pose
Classification
PDF
-
Federated Marginal Personalization for ASR Rescoring
Zhe Liu, Fuchun Peng
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Language_Model
PDF
-
Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios
Peter Wu, Yifan Zhong, Alan W Black
arXiv_CL
arXiv_CL
Speech
Pose
Deep_Learning
PDF
-
A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data
Weicheng Cai, Ming Li
arXiv_CL
arXiv_CL
Embedding
Speech
Pose
Classification
Image_Classification
PDF
-
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
Ziye Yang, Shanzheng Guan, Xiao-Lei Zhang
arXiv_CL
arXiv_CL
Enhancement
Knowledge
Speech
Pose
Action
Deep_Learning
Attention
PDF
-
NHSS: A Speech and Singing Parallel Database
Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li
arXiv_SD
arXiv_SD
Speech
Relation
PDF
-
Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion
Vijay Ravi, Yile Gu, Ankur Gandhe, Ariya Rastrow, Linda Liu, Denis Filimonov, Scott Novotney, Ivan Bulyko
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
Language_Model
PDF
-
Multi-Modal Detection of Alzheimer's Disease from Speech and Text
Amish Mittal, Sourav Sahoo, Arnhav Datar, Juned Kadiwala, Hrithwik Shalu, Jimson Mathew
arXiv_CL
arXiv_CL
Embedding
Recognition
Bert
Speech
Pose
Classification
Deep_Learning
Detection
Speech_Recognition
CNN
Prediction
PDF
-
Transformer-Transducers for Code-Switched Speech Recognition
Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Speech_Recognition
PDF
-
Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation
Christoph Boeddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Shinji Watanabe, Reinhold Haeb-Umbach
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
PDF
-
Look who's not talking
Youngki Kwon, Hee Soo Heo, Jaesung Huh, Bong-Jin Lee, Joon Son Chung
arXiv_SD
arXiv_SD
Embedding
Speech
Detection
Activity
PDF
-
Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview
Gauri Deshpande, Björn W. Schuller
arXiv_SD
arXiv_SD
Speech
Detection
PDF
-
Audio-visual Speech Separation with Adversarially Disentangled Visual Representation
Peng Zhang, Jiaming Xu, Jing shi, Yunzhe Hao, Bo Xu
arXiv_CV
arXiv_CV
Reconstruction
Adversarial
Speech
Pose
Face
Detection
Object_Detection
PDF
-
A comparison of handcrafted, parameterized, and learnable features for speech separation
Wenbo Zhu, Mou Wang, Xiao-Lei Zhang, Susanto Rahardja
arXiv_SD
arXiv_SD
Speech
Pose
CNN
PDF
-
Disentangling Homophemes in Lip Reading using Perplexity Analysis
Souheil Fenghour, Daqing Chen, Kun Guo, Perry Xiao
arXiv_CL
arXiv_CL
Transformer
Text_Classification
Speech
Pose
Classification
Language_Model
Prediction
PDF
-
Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks
Man-Ling Sung, Tan Lee
arXiv_CL
arXiv_CL
Unsupervised
Speech
Pose
Matching
PDF
-
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training
Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
-
Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu
arXiv_SD
arXiv_SD
Enhancement
Recognition
RNN
Speech
Pose
Action
Speech_Recognition
PDF
-
Analysing Social Media Network Data with R: Semi-Automated Screening of Users, Comments and Communication Patterns
Dennis Klinkhammer
arXiv_CV
arXiv_CV
Speech
PDF
-
Towards Interpretable Multilingual Detection of Hate Speech against Immigrants and Women in Twitter at SemEval-2019 Task 5
Alvi Md Ishmam
arXiv_CL
arXiv_CL
Speech
Pose
Detection
CNN
PDF
-
Streaming end-to-end multi-talker speech recognition
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
-
FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge
Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda
arXiv_SD
arXiv_SD
RNN
Speech
Pose
Inference
PDF
-
De-STT: De-entaglement of unwanted Nuisances and Biases in Speech to Text System using Adversarial Forgetting
Hemant Yadav, Janvijay Singh, Atul Anshuman Singh, Rachit Mittal, Rajiv Ratn Shah
arXiv_AI
arXiv_AI
Adversarial
Speech
Pose
Deep_Learning
PDF
-
Neural Representations for Modeling Variation in English Speech
Martijn Bartelds, Wietse de Vries, Faraz Sanal, Caitlin Richter, Mark Liberman, Martijn Wieling
arXiv_CL
arXiv_CL
Transformer
Embedding
Speech
Self-Supervised
Action
PDF
-
A Panoramic Survey of Natural Language Processing in the Arab World
Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Samhaa R. El-Beltagy, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Wassim El-Hajj, Mustafa Jarrar, Hamdy Mubarak
arXiv_CL
arXiv_CL
Recognition
OCR
Optical_Character
Speech
Survey
Sentiment
Speech_Recognition
PDF
-
SAR-Net: A End-to-End Deep Speech Accent Recognition Network
Wei Wang, Chao Zhang, Xiaopei Wu
arXiv_AI
arXiv_AI
Embedding
Recognition
Optimization
RNN
Speech
Pose
Face
Classification
Face_Recognition
Speech_Recognition
PDF
-
Enhancing deep neural networks with morphological information
Matej Klemen, Luka Krsnik, Marko Robnik-Šikonja
arXiv_CL
arXiv_CL
Transformer
Recognition
Bert
RNN
Speech
Deep_Learning
Language_Model
PDF
-
A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information
Yingxue Fu, Elaine Ui Dhonnchadha
arXiv_CL
arXiv_CL
Speech
Attention
PDF
-
Tight Integrated End-to-End Training for Cascaded Speech Translation
Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney
arXiv_CL
arXiv_CL
Speech
Pose
PDF
-
Multi-Decoder DPRNN: High Accuracy Source Counting and Separation
Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson
arXiv_SD
arXiv_SD
RNN
Speech
Pose
PDF
-
A Review of Recent Advances of Binary Neural Networks for Edge Computing
Wenyu Zhao, Teli Ma, Xuan Gong, Baochang Zhang, David Doermann
arXiv_AI
arXiv_AI
NAS
Quantization
Recognition
Optimization
Review
Speech
Speech_Recognition
PDF
-
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang
arXiv_SD
arXiv_SD
Recognition
Speech
Speech_Recognition
Matching
PDF
-
Acoustic span embeddings for multilingual query-by-example search
Yushi Hu, Shane Settle, Karen Livescu
arXiv_CL
arXiv_CL
Embedding
Speech
Matching
PDF
-
Multi-task Language Modeling for Improving Speech Recognition of Rare Words
Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
Language_Model
Prediction
PDF
-
Streaming Multi-speaker ASR with RNN-T
Ilya Sklyar, Anna Piunova, Yulan Liu
arXiv_CL
arXiv_CL
Tracking
Recognition
RNN
Speech
Pose
Action
Speech_Recognition
Inference
PDF
-
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux
arXiv_CL
arXiv_CL
Unsupervised
Bert
Zero-Shot
Represenation_Learning
RNN
Speech
Self-Supervised
Language_Model
PDF
-
Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems
Xianrui Zheng, Yulan Liu, Deniz Gunceler, Daniel Willett (Amazon Alexa)
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Speech_Recognition
PDF
-
An Online Multilingual Hate speech Recognition System
Neeraj Vashistha, Arkaitz Zubiaga
arXiv_CL
arXiv_CL
Recognition
Speech
Action
Detection
Speech_Recognition
PDF
-
Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer
Mohammad Soltanian (1), Junaid Malik (1), Jenni Raitoharju (2), Alexandros Iosifidis (3), Serkan Kiranyaz (4), Moncef Gabbouj (1) ((1) Department of Computing Sciences, Tampere University, Finland, (2) Programme for Environmental Information, Finnish Environment Institute, Jyvaskyla, Finland, (3) Department of Electrical and Computer Engineering, Aarhus University, Denmark, (4) Electrical Engineering Department, Qatar University, Qatar)
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Classification
Deep_Learning
GAN
PDF
-
STEPs-RL: Speech-Text Entanglement for Phonetically Sound Representation Learning
Prakamya Mishra
arXiv_CL
arXiv_CL
Represenation_Learning
Knowledge
Speech
Pose
Language_Model
PDF
-
End-to-end Silent Speech Recognition with Acoustic Sensing
Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Face
Deep_Learning
Attention
Speech_Recognition
PDF
-
Effect of Word Embedding Models on Hate and Offensive Speech Detection
Safa Alsafari, Samira Sadaoui, Malek Mouhoub
arXiv_CL
arXiv_CL
Embedding
Speech
Classification
Detection
PDF
-
Hierachical Delta-Attention Method for Multimodal Fusion
Kunjal Panchal
arXiv_CV
arXiv_CV
Speech
Emotion
Classification
Attention
PDF
-
A Better and Faster End-to-End Model for Streaming ASR
Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Speech_Recognition
Prediction
PDF
-
Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification
Xiaoyi Qin, Yaogen Yang, Lin Yang, Xuyang Wang, Junjie Wang, Ming Li
arXiv_SD
arXiv_SD
Speech
Pose
Deep_Learning
PDF
-
Deep Network Perceptual Losses for Speech Denoising
Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Classification
Denoising
PDF
-
Iterative Text-based Editing of Talking-heads Using Neural Retargeting
Xinwei Yao, Ohad Fried, Kayvon Fatahalian, Maneesh Agrawala
arXiv_CV
arXiv_CV
Gesture
Speech
Self-Supervised
PDF
-
Self-Supervised learning with cross-modal transformers for emotion recognition
Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram
arXiv_CL
arXiv_CL
Transformer
Embedding
Recognition
Bert
Speech
Self-Supervised
Emotion
Language_Model
PDF
-
Improving RNN-T ASR Accuracy Using Untranscribed Context Audio
Andreas Schwarz, Ilya Sklyar, Simon Wiesler
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Speech_Recognition
Inference
PDF
-
Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder
Sam Davis, Giuseppe Coccia, Sam Gooch, Julian Mack
arXiv_SD
arXiv_SD
Quantization
Speech
Pose
Survey
Deep_Learning
PDF
-
Are Chess Discussions Racist? An Adversarial Hate Speech Data Set
Rupak Sarkar, Ashiqur R. KhudaBukhsh
arXiv_CL
arXiv_CL
Adversarial
Speech
PDF
-
One Shot Learning for Speech Separation
Yuan-Kuei Wu, Kuan-Po Huang, Yu Tsao, Hung-yi Lee
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
Persuasive Dialogue Understanding: the Baselines and Negative Results
Hui Chen, Deepanway Ghosal, Navonil Majumder, Amir Hussain, Soujanya Poria
arXiv_CL
arXiv_CL
Transformer
Recognition
Optimization
RNN
Speech
Action
Relation
Attention
CNN
PDF
-
TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos
Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals
arXiv_CV
arXiv_CV
Recognition
Speech
Speech_Recognition
PDF
-
Deep Residual Local Feature Learning for Speech Emotion Recognition
Sattaya Singkul, Thakorn Chatchaisathaporn, Boontawee Suntisrivaraporn, Kuntpong Woraratpanya
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Emotion
Deep_Learning
Relation
PDF
-
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang, Dan Lim, Jaesam Yoon
arXiv_CL
arXiv_CL
Transformer
Speech
Pose
Emotion
GAN
Inference
PDF
-
Context-aware RNNLM Rescoring for Conversational Speech Recognition
Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Speech_Recognition
PDF
-
Combining Prosodic, Voice Quality and Lexical Features to Automatically Detect Alzheimer's Disease
Mireia Farrús, Joan Codina-Filbà
arXiv_CL
arXiv_CL
Gradient_Descent
Speech
Action
Classification
Detection
PDF
-
On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition
Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau
arXiv_CL
arXiv_CL
Embedding
Recognition
Bert
Knowledge
Speech
Self-Supervised
Emotion
Action
Relation
Language_Model
PDF
-
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation
Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shixiong Zhang, Dong Yu, Michael I Mandel
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Relation
Speech_Recognition
PDF
-
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet
Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
-
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li, Shan Yang, Liumeng Xue, Lei Xie
arXiv_SD
arXiv_SD
Embedding
Salient
Speech
Pose
Emotion
PDF
-
Adversarial Training for Multi-domain Speaker Recognition
Qing Wang, Wei Rao, Pengcheng Guo, Lei Xie
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Adversarial
Speech
Pose
PDF
-
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things
Jing Zhang, Dacheng Tao
arXiv_AI
arXiv_AI
Recognition
Speech
Survey
Deep_Learning
Speech_Recognition
PDF
-
Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li
arXiv_SD
arXiv_SD
Recognition
Adversarial
Speech
Pose
PDF
-
Optimizing voice conversion network with cycle consistency loss of speaker identity
Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang, Huaiping Ming, Lei He, Frank K. Soong
arXiv_SD
arXiv_SD
Transformer
Speech
Pose
Attention
PDF
-
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yi Lei, Shan Yang, Lei Xie
arXiv_SD
arXiv_SD
Speech
Pose
Emotion
Inference
Prediction
PDF
-
Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter
Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
Language_Model
Prediction
PDF
-
Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher
Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li
arXiv_SD
arXiv_SD
Embedding
Adversarial
Speech
Pose
Attention
Inference
Prediction
PDF
-
Implicit Filter-and-sum Network for Multi-channel Speech Separation
Yi Luo, Nima Mesgarani
arXiv_SD
arXiv_SD
Speech
Pose
Action
Relation
PDF
-
Rethinking the Separation Layers in Speech Separation Networks
Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani
arXiv_SD
arXiv_SD
Enhancement
Speech
PDF
-
Ultra-Lightweight Speech Separation via Group Communication
Yi Luo, Cong Han, Nima Mesgarani
arXiv_SD
arXiv_SD
Quantization
Enhancement
RNN
Speech
PDF
-
Refining Automatic Speech Recognition System for older adults
Liu Chen, Meysam Asgari
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
Speech
Attention
Speech_Recognition
PDF
-
It's a Thin Line Between Love and Hate: Using the Echo in Modeling Dynamics of Racist Online Communities
Eyal Arviv, Simo Hanouna, Oren Tsur
arXiv_CL
arXiv_CL
Transformer
Reconstruction
Tracking
Bert
RNN
Speech
Pose
Detection
PDF
-
A New Dataset and Proposed Convolutional Neural Network Architecture for Classification of American Sign Language Digits
Arda Mavi
arXiv_CV
arXiv_CV
Speech
Pose
Classification
CNN
PDF
-
Block-Online Guided Source Separation
Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu
arXiv_SD
arXiv_SD
Optimization
Speech
Pose
PDF
-
Deep Shallow Fusion for RNN-T Personalization
Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Action
Speech_Recognition
Language_Model
PDF
-
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha, Gurneet Arora, Navpreet Kaloty
arXiv_AI
arXiv_AI
Recognition
Video_Caption
Knowledge
Speech
Pose
Relation
VQA
Attention
Caption
Activity
QA
PDF
-
Learn an Effective Lip Reading Model without Pains
Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen
arXiv_CV
arXiv_CV
Recognition
Speech
Quantitative
Deep_Learning
Speech_Recognition
PDF
-
Automatic and perceptual discrimination between dysarthria, apraxia of speech, and neurotypical speech
I. Kodrasi, M. Pernon, M. Laganaro, H. Bourlard
arXiv_SD
arXiv_SD
Speech
Pose
Classification
PDF
-
Respiratory Distress Detection from Telephone Speech using Acoustic and Prosodic Features
Meemnur Rashid, Kaisar Ahmed Alman, Khaled Hasan, John H.L. Hansen, Taufiq Hasan
arXiv_SD
arXiv_SD
Speech
Detection
Relation
PDF
-
Speech enhancement guided by contextual articulatory information
Yen-Ju Lu, Chia-Yu Chang, Yu Tsao, Jeih-weih Hung
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
-
Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays
Jonah Casebeer, Jamshed Kaikaus, Paris Smaragdis
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
PDF
-
Speech Prediction in Silent Videos using Variational Autoencoders
Ravindra Yadav, Ashish Sardana, Vinay P Namboodiri, Rajesh M Hegde
arXiv_CV
arXiv_CV
RNN
Speech
Pose
Relation
Prediction
PDF
-
DebateSum: A large-scale argument mining and summarization dataset
Allen Roush, Arvind Balaji
arXiv_CL
arXiv_CL
Transformer
Speech
Summarization
PDF
-
Multi-Modal Emotion Detection with Transfer Learning
Amith Ananthram, Kailash Karthik Saravanakumar, Jessica Huynh, Homayoon Beigi
arXiv_CL
arXiv_CL
Embedding
Transfer_Learning
Bert
Speech
Emotion
Detection
PDF
-
Low-activity supervised convolutional spiking neural networks applied to speech commands recognition
Thomas Pellegrini, Romain Zimmer, Timothée Masquelier
arXiv_SD
arXiv_SD
Recognition
Sparse
Regularization
Speech
Activity
CNN
PDF
-
Cross-Domain Learning for Classifying Propaganda in Online Contents
Liqiang Wang, Xiaoyu Shen, Gerard de Melo, Gerhard Weikum
arXiv_CL
arXiv_CL
Salient
Speech
Attention
GAN
PDF
-
Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning
Morteza Rohanian, Julian Hough
arXiv_CL
arXiv_CL
Segmentation
Speech
Detection
Language_Model
PDF
-
A Survey on Recent Advances in Sequence Labeling from Deep Learning Models
Zhiyong He, Zanbo Wang, Wei Wei, Shanshan Feng, Xianling Mao, Sheng Jiang
arXiv_AI
arXiv_AI
Embedding
Recognition
Knowledge
Review
Knowledge_Graph
Speech
Survey
Deep_Learning
PDF
-
The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines
Yu Fan, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao
arXiv_SD
arXiv_SD
Recognition
Speech
Deep_Learning
Speech_Recognition
PDF
-
Evaluating the Intelligibility Benefits of Neural Speech Enrichment for Listeners with Normal Hearing and Hearing Impairment using the Greek Harvard Corpus
Muhammed PV Shifas, Anna Sfakianaki, Theognosia Chimona, Yannis Stylianou
arXiv_SD
arXiv_SD
Enhancement
Speech
Matching
PDF
-
Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot
Jonas Gonzalez-Billandon, Lukas Grasse, Matthew Tata, Alessandra Sciutti, Francesco Rea
arXiv_AI
arXiv_AI
Recognition
Reinforcement_Learning
Speech
Self-Supervised
Pose
Face
Action
Deep_Learning
Speech_Recognition
Autonomous
PDF
-
Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages: Towards a Generic Parser for Pre-Modern Slavic
Nilo Pedrazzini (University of Oxford)
arXiv_CL
arXiv_CL
Speech
PDF
-
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
Chung-Ming Chien, Hung-yi Lee
arXiv_SD
arXiv_SD
Speech
Pose
Inference
Prediction
PDF
-
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement
Hamed Hemati, Damian Borth
arXiv_SD
arXiv_SD
Transfer_Learning
Enhancement
Speech
Pose
PDF
-
The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge
Si-Ioi Ng, Wei Liu, Zhiyuan Peng, Siyuan Feng, Hing-Pang Huang, Odette Scharenborg, Tan Lee
arXiv_SD
arXiv_SD
Transfer_Learning
Recognition
Speech
Attention
Speech_Recognition
Language_Model
PDF
-
Cross-lingual and Multilingual Spoken Term Detection for Low-Resource Indian Languages
Sanket Shah, Satarupa Guha, Simran Khanuja, Sunayana Sitaram
arXiv_CL
arXiv_CL
Zero-Shot
Speech
Pose
Detection
Language_Model
Matching
PDF
-
Enabling Interactive Transcription in an Indigenous Community
Éric Le Ferrand, Steven Bird, Laurent Besacier
arXiv_CL
arXiv_CL
Speech
Pose
Detection
PDF
-
Augmenting BERT Carefully with Underrepresented Linguistic Features
Aparna Balagopalan, Jekaterina Novikova
arXiv_CL
arXiv_CL
Transformer
Bert
Speech
Classification
Detection
PDF
-
Efficient Knowledge Distillation for RNN-Transducer Models
Sankaran Panchapagesan, Daniel S. Park, Chung-Cheng Chiu, Yuan Shangguan, Qiao Liang, Alexander Gruenstein
arXiv_SD
arXiv_SD
Recognition
Sparse
RNN
Knowledge
Speech
Pose
Speech_Recognition
PDF
-
Text Augmentation for Language Models in High Error Recognition Scenario
Karel Beneš, Lukáš Burget
arXiv_CL
arXiv_CL
Recognition
Speech
Attention
Speech_Recognition
Language_Model
Prediction
PDF
-
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Speech_Recognition
CNN
PDF
-
FAT: Training Neural Networks for Reliable Inference Under Hardware Faults
Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, Kees Vissers
arXiv_CV
arXiv_CV
Recognition
Speech
Classification
Speech_Recognition
Medical
CNN
Image_Classification
Inference
PDF
-
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba
arXiv_CL
arXiv_CL
Speech
Pose
PDF
-
WaDeNet: Wavelet Decomposition based CNN for Speech Processing
Prithvi Suresh, Abhijith Ragav
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Emotion
Action
PDF
-
Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning
Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song
arXiv_SD
arXiv_SD
Enhancement
RNN
Speech
Pose
Inference
PDF
-
Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning
Jonathan Boigne, Biman Liyanage, Ted Östrem
arXiv_SD
arXiv_SD
Transfer_Learning
Recognition
Bert
Knowledge
Speech
Self-Supervised
Pose
Emotion
Attention
PDF
-
Surrogate Source Model Learning for Determined Source Separation
Robin Scheibler, Masahito Togami
arXiv_SD
arXiv_SD
Speech
Pose
PDF
-
Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop
Matthew Marge, Carol Espy-Wilson, Nigel Ward
arXiv_CL
arXiv_CL
Gesture
Speech
Face
Action
Recommendation
PDF
-
Recurrent Deep Stacking Networks for Speech Recognition
Peidong Wang, Zhongqiu Wang, Deliang Wang
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
-
Incorporating Language Level Information into Acoustic Models
Peidong Wang, Deliang Wang
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
-
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li
arXiv_AI
arXiv_AI
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
Language_Model
PDF
-
Artificial sound change: Language change and deep convolutional neural networks in iterative learning
Gašper Beguš
arXiv_CL
arXiv_CL
Adversarial
Speech
Pose
GAN
CNN
PDF
-
Using GANs to Synthesise Minimum Training Data for Deepfake Generation
Simranjeet Singh, Rajneesh Sharma, Alan F. Smeaton
arXiv_CV
arXiv_CV
Adversarial
Speech
Relation
GAN
PDF
-
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
Haoyu Li, Yang Ai, Junichi Yamagishi
arXiv_SD
arXiv_SD
Enhancement
Adversarial
Speech
Pose
PDF
-
A low latency ASR-free end to end spoken language understanding system
Mohamed Mhiri, Samuel Myer, Vikrant Singh Tomar
arXiv_CV
arXiv_CV
Speech
Pose
PDF
-
Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS
Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
-
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis
Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi
arXiv_CL
arXiv_CL
Zero-Shot
RNN
Speech
Inference
PDF
-
Language Through a Prism: A Spectral Approach for Multiscale Language Representations
Alex Tamkin, Dan Jurafsky, Noah Goodman
arXiv_CL
arXiv_CL
Embedding
Bert
Speech
Pose
Classification
PDF
-
Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig
arXiv_SD
arXiv_SD
Recognition
RNN
Knowledge
Speech
Speech_Recognition
Inference
PDF
-
Personalized Query Rewriting in Conversational AI Agents
Alireza Roshan-Ghias, Clint Solomon Mathialagan, Pragaash Ponnusamy, Lambert Mathias, Chenlei Guo
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Action
Attention
Speech_Recognition
PDF
-
Speaker De-identification System using Autoencoders and Adversarial Training
Fernando M. Espinoza-Cuadros, Juan M. Perero-Codosero, Javier Antón-Martín, Luis A. Hernández-Gómez
arXiv_CL
arXiv_CL
Adversarial
Speech
Pose
Face
Deep_Learning
PDF
-
FUN! Fast, Universal, Non-Semantic Speech Embeddings
Jacob Peplinski, Joel Shor, Sachin Joglekar, Jake Garrison, Shwetak Patel
arXiv_SD
arXiv_SD
Embedding
Knowledge
Speech
Pose
Detection
PDF
-
Data Augmentation For Children's Speech Recognition -- The 'Ethiopian' System For The SLT 2021 Children Speech Recognition Challenge
Guoguo Chen, Xingyu Na, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Sifan Ma, Yujun Wang
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
GAN
Speech_Recognition
PDF
-
Neural Architecture Search with an Efficient Multiobjective Evolutionary Framework
Maria Baldeon Calisto, Susana Lai-Yuen
arXiv_AI
arXiv_AI
NAS
Segmentation
Recognition
3D
Optimization
Speech
Pose
Classification
Deep_Learning
Speech_Recognition
Image_Classification
PDF
-
An Empirical Study of Visual Features for DNN based Audio-Visual Speech Enhancement in Multi-talker Environments
Shrishti Saha Shetu, Soumitro Chakrabarty, Emanuël A. P. Habets
arXiv_CV
arXiv_CV
Embedding
Enhancement
Knowledge
Speech
Optical_Flow
PDF
-
COVID-19 Patient Detection from Telephone Quality Speech Data
Kotra Venkata Sai Ritwik, Shareef Babu Kalluri, Deepu Vijayasenan
arXiv_SD
arXiv_SD
Recognition
Speech
Detection
PDF
-
STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model
Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang
arXiv_SD
arXiv_SD
RNN
Speech
Pose
Deep_Learning
Relation
Attention
CNN
PDF
-
Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition
Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen
arXiv_CL
arXiv_CL
Transformer
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
-
Efficient End-to-End Speech Recognition Using Performers in Conformers
Peidong Wang, DeLiang Wang
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
-
Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain
Koen Oostermeijer, Qing Wang, Jun Du
arXiv_SD
arXiv_SD
Enhancement
Speech
Pose
CNN
PDF
-
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li
arXiv_SD
arXiv_SD
Embedding
Recognition
Speech
Self-Supervised
Pose
Attention
Speech_Recognition
PDF
-
On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers
Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Relation
Attention
Speech_Recognition
PDF
-
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Automatic Speech Recognition with Transformers
Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
-
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation
Yang Ai, Haoyu Li, Xin Wang, Junichi Yamagishi, Zhenhua Ling
arXiv_SD
arXiv_SD
Enhancement
Speech
Denoising
PDF
-
Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement
Tan Daxin, Lee Tan
arXiv_SD
arXiv_SD
Embedding
Style_Transfer
Adversarial
Speech
Pose
PDF
-
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf
arXiv_SD
arXiv_SD
Enhancement
Recognition
RNN
Speech
Pose
Speech_Recognition
CNN
PDF
-
Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks
Sneha Das, Tom Bäckström
arXiv_SD
arXiv_SD
Quantization
Enhancement
Speech
Pose
PDF
-
NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora
Jason Angel, Carlos A. Rodriguez-Diaz, Alexander Gelbukh, Sergio Jimenez
arXiv_AI
arXiv_AI
Embedding
Unsupervised
Speech
Pose
PDF
-
Template Controllable keywords-to-text Generation
Abhijit Mishra, Md Faisal Mahbub Chowdhury, Sagar Manohar, Dan Gutfreund, Karthik Sankaranarayanan
arXiv_AI
arXiv_AI
Speech
Pose
Face
Quantitative
Text_Generation
PDF
-
Naturalization of Text by the Insertion of Pauses and Filler Words
Richa Sharma, Parth Vipul Shah, Ashwini M. Joshi
arXiv_CL
arXiv_CL
Speech
Pose
Survey
Action
PDF
-
ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration
Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, Shinji Watanabe
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Action
Denoising
Speech_Recognition
PDF
-
Detection and Evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems
Yang Gao, Jiachen Lian, Bhiksha Raj, Rita Singh
arXiv_SD
arXiv_SD
Speech
Detection
PDF
-
Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages
Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Classification
Speech_Recognition
PDF
-
Hostility Detection Dataset in Hindi
Mohit Bhardwaj, Md Shad Akhtar, Asif Ekbal, Amitava Das, Tanmoy Chakraborty
arXiv_CL
arXiv_CL
Speech
Detection
PDF
-
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma
arXiv_CL
arXiv_CL
RNN
Speech
Pose
PDF
-
Large-scale multilingual audio visual dubbing
Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Yu Zhang, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, Nando de Freitas
arXiv_CV
arXiv_CV
Speech
Relation
PDF
-
Self-Supervised Learning fro