Speech_Recognition
2023-01-30
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu
arXiv_AI
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Language_Model
2023-01-29
Exploring Attention Map Reuse for Efficient Transformer Neural Networks
Kyuhong Shim, Jungwook Choi, Wonyong Sung
arXiv_AI
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
Inference
2023-01-25
Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives
Tanvi Dinkar, Chloé Clavel, Ioana Vasilescu
arXiv_CL
Recognition
Knowledge
Speech
Speech_Recognition
2023-01-22
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali
arXiv_CL
Transformer
Unsupervised
Transfer_Learning
Recognition
Speech
Pose
Speech_Recognition
2023-01-21
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
Javad Peymanfard, Samin Heydarian, Ali Lashini, Hossein Zeinali, Mohammad Reza Mohammadi, Nasser Mozayani
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
2023-01-21
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio
arXiv_AI
Transformer
Recognition
Representation_Learning
Speech
Self-Supervised
Action
Speech_Recognition
Text_Generation
2023-01-20
Neural Architecture Search: Insights from 1000 Papers
Colin White, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela, Debadeepta Dey, Frank Hutter
arXiv_AI
NAS
Recognition
Reinforcement_Learning
Speech
Survey
Deep_Learning
GAN
Speech_Recognition
2023-01-20
Language Agnostic Data-Driven Inverse Text Normalization
Szu-Jui Chen, Debjyoti Paul, Yutong Pang, Peng Su, Xuedong Zhang
arXiv_CL
Recognition
Speech
Pose
Attention
Speech_Recognition
2023-01-19
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman
arXiv_AI
Enhancement
Recognition
Bert
RNN
Speech
Self-Supervised
Pose
Speech_Recognition
2023-01-18
Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining
Karol Nowakowski, Michal Ptaszynski, Kyoko Murasaki, Jagna Nieuważny
arXiv_CL
Recognition
Speech
Self-Supervised
Speech_Recognition
2023-01-17
Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam
Kavya Manohar, A. R. Jayan, Rajeev Rajan
arXiv_CL
Recognition
Speech
Speech_Recognition
Language_Model
2023-01-16
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger
arXiv_CL
Transformer
Recognition
RNN
Speech
Deep_Learning
Speech_Recognition
Inference
2023-01-16
Using Kaldi for Automatic Speech Recognition of Conversational Austrian German
Julian Linke, Saskia Wepner, Gernot Kubin, Barbara Schuppler
arXiv_CL
Recognition
Sparse
Knowledge
Speech
Face
Action
Speech_Recognition
Language_Model
2023-01-16
Multi-resolution location-based training for multi-channel continuous speech separation
Hassan Taherian, DeLiang Wang
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
CNN
2023-01-16
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
Jeongkyun Park, Jung-Wook Hwang, Kwanghee Choi, Seung-Hyun Lee, Jun Hwan Ahn, Rae-Hong Park, Hyung-Min Park
arXiv_AI
Recognition
Speech
Classification
Speech_Recognition
Prediction
2023-01-15
Rationalizing Predictions by Adversarial Information Calibration
Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz
arXiv_AI
Recognition
Adversarial
Speech
Pose
Action
Sentiment
Speech_Recognition
Medical
Prediction
2023-01-10
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition
Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang
arXiv_CL
Transformer
Segmentation
Recognition
Speech
Pose
Speech_Recognition
2023-01-09
FullStop: Punctuation and Segmentation Prediction for Dutch with Transformers
Vincent Vandeghinste, Oliver Guhr
arXiv_AI
Transformer
Segmentation
Recognition
Bert
Speech
Classification
Speech_Recognition
Language_Model
Prediction
2023-01-08
Equivariant and Steerable Neural Networks: A review with special emphasis on the symmetric group
Patrick Krüger, Hanno Gottschalk
arXiv_AI
Recognition
Review
Speech
Speech_Recognition
CNN
2023-01-06
Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister
arXiv_SD
Embedding
Recognition
Knowledge
Speech
Pose
Few-Shot
Speech_Recognition
2023-01-04
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi, Radu Timofte
arXiv_CV
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
Prediction
2023-01-03
Supervised Acoustic Embeddings And Their Transferability Across Languages
Sreepratha Ram, Hanan Aldarmaki
arXiv_CL
Transformer
Embedding
Unsupervised
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
2023-01-02
Towards Voice Reconstruction from EEG during Imagined Speech
Young-Eun Lee, Seo-Hyun Lee, Sang-Ho Kim, Seong-Whan Lee
arXiv_SD
Reconstruction
Recognition
Speech
Pose
Face
Speech_Recognition
Activity
2022-12-31
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A case study for Modern Greek
Georgios Paraskevopoulos, Theodoros Kouzelis, Georgios Rouvalis, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos
arXiv_CL
Unsupervised
Recognition
Weakly_Supervised
Speech
Pose
Speech_Recognition
2022-12-30
Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition
Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang
arXiv_CL
Transformer
Recognition
Speech
Pose
Speech_Recognition
Language_Model
Prediction
2022-12-29
Macro-block dropout for improved regularization in training end-to-end speech recognition models
Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung
arXiv_CL
Recognition
RNN
Regularization
Speech
Pose
Attention
Speech_Recognition
2022-12-27
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Tomer Wullach, Shlomo E. Chazan
arXiv_CL
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
Inference
Prediction
2022-12-26
Skit-S2I: An Indian Accented Speech to Intent dataset
Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil
arXiv_CL
Recognition
Speech
Classification
Speech_Recognition
Prediction
2022-12-22
Alignment Entropy Regularization
Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley
arXiv_CL
Recognition
Regularization
Speech
Speech_Recognition
2022-12-21
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi
arXiv_CV
Enhancement
Recognition
Inpainting
Speech
Self-Supervised
Pose
Speech_Recognition
2022-12-21
End-to-End Automatic Speech Recognition model for the Sudanese Dialect
Ayman Mansour, Wafaa F. Mukhtar
arXiv_CL
Recognition
Speech
Pose
Face
Action
Speech_Recognition
CNN
2022-12-21
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe
arXiv_CL
Recognition
RNN
Regularization
Speech
Pose
Classification
Attention
Speech_Recognition
2022-12-20
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
arXiv_CL
Recognition
Speech
Classification
Attention
Summarization
Speech_Recognition
Inference
2022-12-20
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
Changli Tang, Yujin Wang, Xie Chen, Wei-Qiang Zhang
arXiv_SD
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
2022-12-19
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti
arXiv_AI
Recognition
Zero-Shot
Speech
Speech_Recognition
2022-12-19
Mu²SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna
arXiv_CL
Recognition
RNN
Speech
SLAM
Denoising
Speech_Recognition
Language_Model
QA
2022-12-17
AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation
Xingshan Zeng, Liangyou Li, Qun Liu
arXiv_CL
Transformer
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Inference
2022-12-16
Speech Aware Dialog System Technology Challenge
Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda
arXiv_AI
Tracking
Recognition
Speech
Speech_Recognition
2022-12-16
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition
Aleksandr Laptev, Boris Ginsburg
arXiv_CL
Recognition
RNN
Speech
Pose
Classification
Speech_Recognition
2022-12-16
Context-aware Fine-tuning of Self-supervised Speech Models
Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe
arXiv_CL
Transformer
Embedding
Recognition
Speech
Self-Supervised
Pose
Sentiment
Attention
Speech_Recognition
Inference
Prediction
2022-12-16
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks
Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju
arXiv_AI
Embedding
Recognition
Speech
Classification
Detection
Speech_Recognition
2022-12-16
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, Marta R. Costa-jussà
arXiv_CL
Embedding
Recognition
Speech
Pose
Relation
Speech_Recognition
2022-12-14
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli
arXiv_CL
Transformer
Recognition
Bert
Speech
Self-Supervised
Classification
Speech_Recognition
CNN
Image_Classification
2022-12-14
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux
arXiv_SD
Recognition
Speech
Deep_Learning
Detection
Speech_Recognition
Activity
2022-12-14
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator
Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Karel Vesely
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
2022-12-14
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter
arXiv_SD
Reconstruction
Embedding
Unsupervised
Recognition
Bert
Speech
Pose
Emotion
Speech_Recognition
2022-12-13
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
arXiv_CV
Transformer
Recognition
Zero-Shot
Knowledge
Speech
Speech_Recognition
2022-12-12
Jointly Learning Visual and Auditory Speech Representations from Raw Data
Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic
arXiv_CV
Transformer
Recognition
Speech
Self-Supervised
Speech_Recognition
2022-12-11
End-to-End Speech Translation of Arabic to English Broadcast News
Fethi Bougares, Salim Jouili
arXiv_AI
Transfer_Learning
Recognition
Speech
Speech_Recognition
2022-12-11
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm
Yu-Wen Chen, Hsin-Min Wang, Yu Tsao
arXiv_AI
Enhancement
Recognition
Speech
Pose
Classification
Speech_Recognition
2022-12-10
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning
Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng
arXiv_CV
Recognition
Reinforcement_Learning
Speech
Pose
Face
Speech_Recognition
2022-12-07
Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
arXiv_AI
Recognition
RNN
Speech
Pose
Speech_Recognition
2022-12-07
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Pengcheng Li, Genshun Wan, Fenglin Ding, Hang Chen, Jianqing Gao, Jia Pan, Cong Liu
arXiv_SD
Transformer
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
2022-12-07
Progressive Multi-Scale Self-Supervised Learning for Speech Recognition
Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye
arXiv_SD
Recognition
Bert
Speech
Self-Supervised
Pose
Attention
Speech_Recognition
2022-12-07
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Fenglin Ding, Genshun Wan, Pengcheng Li, Jia Pan, Cong Liu
arXiv_CL
Transformer
Embedding
Recognition
Representation_Learning
Adversarial
Speech
Self-Supervised
Speech_Recognition
2022-12-06
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever
arXiv_CL
Recognition
Zero-Shot
Speech
Speech_Recognition
Inference
2022-12-05
SoftCTC – Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels
Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula
arXiv_CV
Handwriting
Recognition
Optical_Character
Speech
Pose
Speech_Recognition
2022-12-05
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang, Yu Pan, Jingjing Yin, Heng Lu
arXiv_SD
Embedding
Recognition
Speech
Pose
Attention
Speech_Recognition
Inference
2022-12-05
Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models
Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Language_Model
2022-12-03
Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models
Reem Gody, David Harwath
arXiv_CL
Transformer
Unsupervised
Recognition
Bert
Speech
Self-Supervised
Relation
Speech_Recognition
2022-12-02
Continual Learning for On-Device Speech Recognition using Disentangled Conformers
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
2022-12-02
Cross-Modal Mutual Learning for Cued Speech Recognition
Lei Liu, Li Liu
arXiv_CV
Transformer
Recognition
Gesture
Knowledge
Speech
Pose
Face
Action
Attention
Speech_Recognition
2022-12-02
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu
arXiv_CL
Recognition
Speech
Pose
Classification
Detection
Attention
Speech_Recognition
Language_Model
2022-12-01
Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition
Alexandre Bittar, Philip N. Garner
arXiv_CL
Recognition
Sparse
RNN
Speech
Speech_Recognition
2022-11-30
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur
arXiv_CL
Unsupervised
Recognition
Adversarial
Speech
Self-Supervised
Speech_Recognition
Activity
2022-11-30
Preliminary Study on SSCF-derived Polar Coordinate for ASR
Sotheara Leang (CADT, M-PSI), Eric Castelli (M-PSI), Dominique Vaufreydaz (M-PSI), Sethserey Sam (CADT)
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
2022-11-30
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
Prediction
2022-11-29
Better Transcription of UK Supreme Court Hearings
Hadeel Saadany, Constantin Orăsan, Catherine Breslin
arXiv_CL
Recognition
Speech
Detection
Speech_Recognition
Language_Model
2022-11-29
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun, Erik McDermott, Roger Hsiao
arXiv_CL
Recognition
Optimization
Speech
Pose
Speech_Recognition
2022-11-29
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou
arXiv_CL
Transformer
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
Prediction
2022-11-29
On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems
Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach
arXiv_SD
Recognition
Speech
Speech_Recognition
Recommendation
2022-11-29
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer, Ondřej Klejch, Peter Bell
arXiv_CL
Recognition
Speech
Speech_Recognition
Inference
2022-11-28
Handling and extracting key entities from customer conversations using Speech recognition and Named Entity recognition
Sharvi Endait, Ruturaj Ghatage, Prof. DD Kadam
arXiv_CL
Transformer
Recognition
Bert
Speech
Face
Action
Speech_Recognition
2022-11-28
Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition
Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim
arXiv_SD
Recognition
Knowledge
Speech
Pose
Classification
Deep_Learning
Speech_Recognition
Language_Model
2022-11-24
Improving Multi-task Learning via Seeking Task-based Flat Regions
Hoang Phan, Lam Tran, Ngoc N. Tran, Nhat Ho, Dinh Phung, Trung Le
arXiv_AI
Gradient_Descent
Recognition
Regularization
Knowledge
Speech
Pose
Speech_Recognition
2022-11-24
Bidirectional Representations for Low Resource Spoken Language Understanding
Quentin Meeus, Marie-Francine Moens, Hugo Van hamme
arXiv_CL
Recognition
Speech
Pose
Face
Attention
Speech_Recognition
Language_Model
Prediction
2022-11-24
Multitask Learning for Low Resource Spoken Language Understanding
Quentin Meeus, Marie-Francine Moens, Hugo Van hamme
arXiv_CL
Recognition
Sentiment_Classification
Speech
Classification
Sentiment
Speech_Recognition
2022-11-24
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao, Shuo Ren, Sanyuan Chen, Ziyang Ma, Pengcheng Guo, Lei Xie
arXiv_SD
Transformer
Unsupervised
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
Matching
2022-11-23
Device Directedness with Contextual Cues for Spoken Dialog Systems
Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff
arXiv_CL
Transformer
Recognition
Representation_Learning
RNN
Speech
Self-Supervised
Pose
Classification
Speech_Recognition
2022-11-23
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction
Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
2022-11-23
Whose Emotion Matters? Speaker Detection without Prior Knowledge
Hugo Carneiro, Cornelius Weber, Stefan Wermter
arXiv_CV
Recognition
Knowledge
Speech
Self-Supervised
Emotion
Face
Detection
Speech_Recognition
2022-11-22
Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation
Vinay Kothapally, John H.L. Hansen
arXiv_SD
Recognition
Speech
Pose
Attention
Speech_Recognition
CNN
2022-11-22
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition
Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali
arXiv_CL
Recognition
Speech
Pose
Relation
Speech_Recognition
2022-11-21
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
Raphael Tang, Karun Kumar, Gefei Yang, Akshat Pandey, Yajie Mao, Vladislav Belyaev, Madhuri Emmadi, Craig Murray, Ferhan Ture, Jimmy Lin
arXiv_CL
Recognition
Weakly_Supervised
Knowledge
Speech
Pose
Speech_Recognition
Inference
2022-11-21
Continually learning new languages
Ngoc-Quan Pham, Jan Niehues, Alexander Waibel
arXiv_CL
Transfer_Learning
Recognition
Speech
Speech_Recognition
2022-11-21
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei
arXiv_AI
Transformer
Recognition
Bert
Representation_Learning
Speech
Pose
Action
Speech_Recognition
Language_Model
Prediction
2022-11-18
A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora
Ali Yazdani, Yasser Shekofteh
arXiv_AI
Recognition
Speech
Emotion
Deep_Learning
Speech_Recognition
Language_Model
Prediction
2022-11-18
Exploring WavLM on Speech Enhancement
Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu
arXiv_SD
Enhancement
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
2022-11-17
Hey ASR System! Why Aren't You More Inclusive? Automatic Speech Recognition Systems' Bias and Proposed Bias Mitigation Techniques. A Literature Review
Mikel K. Ngueajio, Gloria Washington
arXiv_SD
Recognition
Review
Speech
Pose
Survey
Action
Speech_Recognition
2022-11-17
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian
arXiv_SD
Recognition
Bert
Speech
Pose
Speech_Recognition
Language_Model
Prediction
2022-11-17
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire
Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu
arXiv_SD
Segmentation
Recognition
Speech
Pose
Detection
Speech_Recognition
2022-11-17
Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition
Xurong Xie, Xunying Liu, Hui Chen, Hongan Wang
arXiv_SD
Unsupervised
Recognition
RNN
Regularization
Speech
Pose
Speech_Recognition
2022-11-16
L2 proficiency assessment using self-supervised speech representations
Stefano Bannò, Kate M. Knill, Marco Matassoni, Vyas Raina, Mark J. F. Gales
arXiv_CL
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
2022-11-16
Data Augmentation with Unsupervised Speaking Style Transfer for Speech Emotion Recognition
Leyuan Qu, Wei Wang, Taihao Li, Cornelius Weber, Stefan Wermter, Fuji Ren
arXiv_AI
Unsupervised
Style_Transfer
Recognition
Adversarial
Speech
Self-Supervised
Pose
Emotion
GAN
Speech_Recognition
2022-11-16
Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments
Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet
arXiv_SD
Transformer
Embedding
Recognition
Speech
Speech_Recognition
2022-11-16
Streaming Joint Speech Recognition and Disfluency Detection
Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe
arXiv_SD
Transformer
Recognition
Bert
Speech
Pose
Detection
Speech_Recognition
Inference
Language_Model
2022-11-15
Alzheimer's Dementia Detection through Spontaneous Dialogue with Proactive Robotic Listeners
Yuanchao Li, Catherine Lai, Divesh Lala, Koji Inoue, Tatsuya Kawahara
arXiv_RO
Recognition
Speech
Pose
Detection
Object_Detection
Attention
Speech_Recognition
Medical
2022-11-15
Introducing Semantics into Speech Encoders
Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang
arXiv_CL
Unsupervised
Recognition
Speech
Self-Supervised
Pose
Classification
Speech_Recognition
Language_Model
2022-11-14
Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations
Renee Lu, Mostafa Shahin, Beena Ahmed
arXiv_CL
Recognition
Speech
Self-Supervised
Speech_Recognition
2022-11-14
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong
arXiv_SD
Transformer
Transfer_Learning
Recognition
Speech
Relation
Attention
Speech_Recognition
CNN
Inference
2022-11-13
FullPack: Full Vector Utilization for Sub-Byte Quantized Inference on General Purpose CPUs
Hossein Katebi, Navidreza Asadi, Maziar Goudarzi
arXiv_SD
Quantization
Recognition
Speech
Pose
Deep_Learning
Speech_Recognition
Inference
2022-11-11
Breaking trade-offs in speech separation with sparsely-gated mixture of experts
Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka
arXiv_SD
Recognition
Sparse
Speech
Speech_Recognition
2022-11-11
The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification
Changye Li, Trevor Cohen, Serguei Pakhomov
arXiv_AI
Recognition
Speech
Self-Supervised
Classification
Deep_Learning
Relation
Speech_Recognition
2022-11-11
Continuous Soft Pseudo-Labeling in ASR
Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio
arXiv_SD
Recognition
Regularization
Speech
Pose
Speech_Recognition
2022-11-11
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation
Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Prediction
2022-11-10
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe
arXiv_CL
Transformer
Recognition
Speech
Self-Supervised
Sentiment
Speech_Recognition
Language_Model
2022-11-10
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
arXiv_SD
Recognition
Representation_Learning
Speech
Self-Supervised
Pose
Speech_Recognition
Prediction
2022-11-09
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
Yu Chen, Wen Ding, Junjie Lai
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Language_Model
2022-11-09
Adaptive Multi-Corpora Language Model Training for Speech Recognition
Yingyi Ma, Zhe Liu, Xuedong Zhang
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Language_Model
2022-11-08
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
Juan Zuluaga-Gomez, Karel Veselý, Igor Szöke, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Claudia Cevenini, Pavel Kolčárek, Allan Tart, Jan Černocký
arXiv_AI
Recognition
Speech
Action
Detection
Speech_Recognition
2022-11-08
Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors
Yik-Cheung Tam, Jiacheng Xu, Jiakai Zou, Zecheng Wang, Tinglong Liao, Shuhan Yuan
arXiv_CL
Recognition
Knowledge
Speech
Pose
Classification
Detection
Speech_Recognition
2022-11-08
Comparative layer-wise analysis of self-supervised speech models
Ankita Pasad, Bowen Shi, Karen Livescu
arXiv_CL
Transformer
Recognition
Speech
Self-Supervised
Pose
Relation
Speech_Recognition
2022-11-08
Towards Improved Room Impulse Response Estimation for Speech Recognition
Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia
arXiv_SD
Recognition
Speech
Pose
GAN
Speech_Recognition
2022-11-07
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong
arXiv_AI
Transformer
Recognition
Speech
Pose
Speech_Recognition
2022-11-07
End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics
Eda Okur, Saurav Sahay, Roddy Fuentes Alba, Lama Nachman
arXiv_CL
Enhancement
Recognition
Bert
Speech
Action
Speech_Recognition
2022-11-07
Hi,KIA: A Speech Emotion Recognition Dataset for Wake-Up Words
Taesu Kim, SeungHeon Doh, Gyunpyo Lee, Hyungseok Jeon, Juhan Nam, Hyeon-Jeong Suk
arXiv_SD
Recognition
Speech
Emotion
Classification
Speech_Recognition
2022-11-06
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-yi Lee
arXiv_CL
Unsupervised
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
QA
2022-11-05
Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective
Hannaneh B. Pasandi, Haniyeh B. Pasandi
arXiv_CL
arXiv_CL
Recognition
Speech
Face
Speech_Recognition
PDF
2022-11-05
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li
arXiv_CL
arXiv_CL
Recognition
Regularization
Speech
Pose
Classification
Speech_Recognition
Text_Generation
Prediction
PDF
2022-11-04
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech
Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran
arXiv_SD
arXiv_SD
Recognition
Speech
Face
Speech_Recognition
PDF
2022-11-04
Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion
Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman
arXiv_SD
arXiv_SD
Transformer
Transfer_Learning
Recognition
Sparse
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-11-04
Multi-blank Transducers for Speech Recognition
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Pose
Speech_Recognition
Inference
PDF
2022-11-04
Biased Self-supervised learning for ASR
Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
Prediction
PDF
2022-11-04
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability
Jian Xue, Peidong Wang, Jinyu Li, Eric Sun
arXiv_CL
arXiv_CL
Transformer
Recognition
Weakly_Supervised
Zero-Shot
Speech
Speech_Recognition
PDF
2022-11-04
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition
Yusuke Shinohara, Shinji Watanabe
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
2022-11-03
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic
arXiv_SD
arXiv_SD
Recognition
Regularization
Speech
Pose
Classification
Attention
Speech_Recognition
PDF
2022-11-03
Probing Statistical Representations For End-To-End ASR
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Relation
Speech_Recognition
Language_Model
PDF
2022-11-03
Hybrid-SD : A new hybrid evaluation metric for automatic speech recognition tasks
Zitha Sasindran, Harsha Yelchuri, Supreeth Rao, T. V. Prabhakar
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Pose
Speech_Recognition
PDF
2022-11-03
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition
Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu
arXiv_SD
arXiv_SD
Embedding
Recognition
Adversarial
Speech
Self-Supervised
Pose
GAN
Speech_Recognition
PDF
2022-11-03
Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise
Christian Heider Nielsen, Zheng-Hua Tan
arXiv_SD
arXiv_SD
Recognition
Adversarial
Speech
Pose
Classification
Deep_Learning
Detection
Speech_Recognition
PDF
2022-11-03
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge : Dataset, Tracks, Baseline and Results
Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen, Xin Xu
arXiv_SD
arXiv_SD
Recognition
Speech
Speech_Recognition
PDF
2022-11-03
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li, Dongxing Xu, Haoran Wei, Yanhua Long
arXiv_SD
arXiv_SD
Recognition
Represenation_Learning
Speech
Pose
Speech_Recognition
PDF
2022-11-02
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin
arXiv_SD
arXiv_SD
Recognition
Speech
Self-Supervised
Pose
Quantitative
Speech_Recognition
PDF
2022-11-02
Towards Zero-Shot Code-Switched Speech Recognition
Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe
arXiv_CL
arXiv_CL
Segmentation
Recognition
Zero-Shot
Speech
Pose
Detection
Speech_Recognition
Language_Model
PDF
2022-11-02
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Attention
Speech_Recognition
PDF
2022-11-02
Improving Named Entity Recognition in Telephone Conversations via Effective Active Learning with Human in the Loop
Md Tahmid Rahman Laskar, Cheng Chen, Xue-Yong Fu, Shashi Bhushan TN
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-11-02
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-11-02
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Tongtong Song, Qiang Xu, Haoyu Lu, Longbiao Wang, Hao Shi, Yuqin Lin, Yanbing Yang, Jianwu Dang
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-11-02
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
Chengdong Liang, Xiao-Lei Zhang, BinBin Zhang, Di Wu, Shengqiang Li, Xingchen Song, Zhendong Peng, Fuping Pan
arXiv_SD
arXiv_SD
Recognition
Knowledge
Speech
Attention
Speech_Recognition
PDF
2022-11-02
Conversation-oriented ASR with multi-look-ahead CBS architecture
Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
2022-11-02
More Speaking or More Speakers?
Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko
arXiv_SD
arXiv_SD
Recognition
Knowledge
Speech
Self-Supervised
Speech_Recognition
PDF
2022-11-02
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
Inference
PDF
2022-11-02
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
arXiv_CL
arXiv_CL
Recognition
Bert
Knowledge
Speech
Pose
Speech_Recognition
Inference
Language_Model
PDF
2022-11-01
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia, Shuo-yiin Chang, Bo Li, Tara Sainath, Chao Zhang, Yanzhang He
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Inference
Prediction
PDF
2022-11-01
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi, Brian Yan, Shinji Watanabe
arXiv_SD
arXiv_SD
Recognition
Speech
Self-Supervised
Pose
Classification
Relation
Speech_Recognition
Inference
Prediction
PDF
2022-11-01
A Comparative Study on multichannel Speaker-attributed automatic speech recognition in Multi-party Meetings
Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Shiliang Zhang, Li-Rong Dai
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
2022-11-01
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur
arXiv_SD
arXiv_SD
Transformer
Embedding
Recognition
Speech
Self-Supervised
Pose
Action
Speech_Recognition
PDF
2022-11-01
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
2022-10-31
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim, Ke Li, Lucas Kabela, Rongqing Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le
arXiv_CL
arXiv_CL
Transformer
Embedding
Recognition
Speech
Speech_Recognition
PDF
2022-10-31
An analysis of degenerating speech due to progressive dysarthria on ASR performance
Katrin Tomanek, Katie Seaver, Pan-Pan Jiang, Richard Cave, Lauren Harrel, Jordan R. Green
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-10-31
Audio-Visual Speech Enhancement and Separation by Leveraging Multi-Modal Self-Supervised Embeddings
I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou
arXiv_SD
arXiv_SD
Embedding
Enhancement
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-10-31
DiaCorrect: End-to-end error correction for speaker diarization
Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Action
Attention
Speech_Recognition
Activity
PDF
2022-10-31
Fast and parallel decoding for transducer
Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-31
Delay-penalized transducer for low-latency streaming ASR
Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
2022-10-31
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey
arXiv_CL
arXiv_CL
Embedding
Quantization
Recognition
Knowledge
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-10-31
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki, Masato Murata, Tomoki Koriyama
arXiv_SD
arXiv_SD
Transformer
Recognition
RNN
Speech
Pose
Classification
Attention
Speech_Recognition
PDF
2022-10-31
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingchen Song, Di Wu, Binbin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
Inference
PDF
2022-10-31
Modular Hybrid Autoregressive Transducer
Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Speech_Recognition
Language_Model
PDF
2022-10-31
Blank Collapse: Compressing CTC emission for the faster decoding
Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo, Soonshin Seo
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Classification
Speech_Recognition
Language_Model
PDF
2022-10-30
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
Sathvik Udupa, Siddarth C, Prasanta Kumar Ghosh
arXiv_SD
arXiv_SD
Transformer
Reconstruction
Recognition
Speech
Self-Supervised
Emotion
Classification
Relation
Speech_Recognition
PDF
2022-10-30
DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set
Arunkumar A, Mudit Batra, Umesh S
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-29
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
arXiv_CL
arXiv_CL
Embedding
Recognition
Bert
Knowledge
Speech
Classification
Attention
Speech_Recognition
Inference
Language_Model
PDF
2022-10-29
XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers
Roshan Sharma, Bhiksha Raj
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Attention
Summarization
Speech_Recognition
PDF
2022-10-29
Application of Knowledge Distillation to Multi-task Speech Representation Learning
Mine Kerpicci, Van Nguyen, Shuhua Zhang, Erik Visser
arXiv_CL
arXiv_CL
Recognition
Bert
Represenation_Learning
Knowledge
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-10-29
End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator
Guangzhi Sun, Chao Zhang, Philip C. Woodland
arXiv_CL
arXiv_CL
Recognition
Zero-Shot
Speech
Pose
Classification
Speech_Recognition
PDF
2022-10-29
Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili
Ebbie Awino, Lilian Wanzare, Lawrence Muchemi, Barack Wanjawa, Edward Ombui, Florence Indede, Owen McOnyango, Benard Okal
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-10-28
Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
Zezhong Jin, Dading Zhong, Xiao Song, Zhaoyi Liu, Naipeng Ye, Qingcheng Zeng
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-10-28
Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem
Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer
arXiv_SD
arXiv_SD
Recognition
Speech
Classification
Detection
Speech_Recognition
PDF
2022-10-28
Improving short-video speech recognition using random utterance concatenation
Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Yist Lin, Tao Han, Tze Yuan Chong, Yi He, Zejun Ma
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Detection
Speech_Recognition
Activity
PDF
2022-10-27
Evaluating context-invariance in unsupervised speech representations
Mark Hallap, Emmanuel Dupoux, Ewan Dunbar
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Speech
Self-Supervised
Speech_Recognition
Language_Model
PDF
2022-10-27
Simulating realistic speech overlaps improves multi-talker ASR
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-10-27
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Self-Supervised
Speech_Recognition
Inference
PDF
2022-10-27
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
2022-10-27
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Speech_Recognition
PDF
2022-10-27
Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning
Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung
arXiv_AI
arXiv_AI
Recognition
Regularization
Speech
Self-Supervised
Pose
Classification
Speech_Recognition
PDF
2022-10-27
Explicit Intensity Control for Accented Text-to-speech
Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li
arXiv_AI
arXiv_AI
Recognition
Adversarial
Speech
Pose
Relation
Attention
Speech_Recognition
PDF
2022-10-27
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai
arXiv_SD
arXiv_SD
Transformer
Recognition
Represenation_Learning
Speech
Self-Supervised
Pose
Contrastive_Learning
Speech_Recognition
PDF
2022-10-27
SAN: a robust end-to-end ASR model architecture
Zeping Min, Qian Ge, Guanhua Huang
arXiv_CL
arXiv_CL
Recognition
Adversarial
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-10-27
On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors
Zaharah Bukhsh, Aaqib Saeed
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Detection
Speech_Recognition
Prediction
PDF
2022-10-27
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition
Steven Vander Eeckt, Hugo Van hamme
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-10-27
Contextual-Utterance Training for Automatic Speech Recognition
Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler
arXiv_SD
arXiv_SD
Recognition
RNN
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-10-27
Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation
Fernando López, Jordi Luque
arXiv_CL
arXiv_CL
Recognition
Speech
Self-Supervised
Pose
Classification
Speech_Recognition
Caption
PDF
2022-10-27
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance
Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-27
V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization
Jiangyi Deng (1), Fei Teng (1), Yanjiao Chen (1), Xiaofu Chen (2), Zhaohui Wang (2), Wenyuan Xu (1) ((1) Zhejiang University, (2) Wuhan University)
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
Inference
PDF
2022-10-27
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover
arXiv_CL
arXiv_CL
Recognition
Speech
Self-Supervised
Speech_Recognition
PDF
2022-10-27
TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
Piyush Behre, Sharman Tan, Amy Shah, Harini Kesavamoorthy, Shuangyu Chang, Fei Zuo, Chris Basoglu, Sayan Pathak
arXiv_CL
arXiv_CL
Segmentation
Recognition
Speech
Quantitative
Relation
Speech_Recognition
PDF
2022-10-26
Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition
Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Speech_Recognition
PDF
2022-10-26
There is more than one kind of robustness: Fooling Whisper with adversarial examples
Raphael Olivier, Bhiksha Raj
arXiv_CL
arXiv_CL
Recognition
Adversarial
Speech
Detection
Object_Detection
Speech_Recognition
PDF
2022-10-26
End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English
Abhinav Goyal, Anupam Singh, Nikesh Garera
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Classification
Speech_Recognition
Prediction
PDF
2022-10-26
Efficient Use of Large Pre-Trained Models for Low Resource ASR
Peter Vieting, Christoph Lüscher, Julian Dierkes, Ralf Schlüter, Hermann Ney
arXiv_CL
arXiv_CL
Transformer
Unsupervised
Recognition
Represenation_Learning
Speech
Speech_Recognition
Medical
PDF
2022-10-26
Monotonic segmental attention for automatic speech recognition
Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney
arXiv_CL
arXiv_CL
Recognition
Speech
Attention
Speech_Recognition
PDF
2022-10-26
Pronunciation Generation for Foreign Language Words in Intra-Sentential Code-Switching Speech Recognition
Wei Wang, Chao Zhang, Xiaopei Wu
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-26
Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization
Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur
arXiv_SD
arXiv_SD
Recognition
Optimization
Adversarial
Speech
Pose
Speech_Recognition
PDF
2022-10-26
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu, Siqi Li, Qingtao Li, Liping Deng, Fangzhu Li, Lu Fan, Meng Chen, Xiaodong He
arXiv_SD
arXiv_SD
Transformer
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-10-26
Improving Speech-to-Speech Translation Through Unlabeled Text
Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-26
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead
Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Sayan Pathak
arXiv_CL
arXiv_CL
Segmentation
Recognition
Speech
Detection
Object_Detection
Speech_Recognition
Activity
PDF
2022-10-25
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao
arXiv_CL
arXiv_CL
Transformer
Embedding
Recognition
Speech
Pose
Attention
Speech_Recognition
Language_Model
PDF
2022-10-24
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-10-24
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition
Yufeng Yang, Ashutosh Pandey, DeLiang Wang
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-24
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation
Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristia, Emmanuel Dupoux, Hervé Bredin
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Detection
Speech_Recognition
Activity
Prediction
PDF
2022-10-24
Does Joint Training Really Help Cascaded Speech Translation?
Viet Anh Khoa Tran, David Thulke, Yingbo Gao, Christian Herold, Hermann Ney
arXiv_CL
arXiv_CL
Recognition
Review
Speech
Pose
Attention
Speech_Recognition
PDF
2022-10-24
Investigating the effect of domain selection on automatic speech recognition performance: a case study on Bangladeshi Bangla
Ahnaf Mozib Samin, M. Humayan Kobir, Md. Mushtaq Shahriyar Rafee, M. Firoz Ahmed, Shafkat Kibria, M. Shahidur Rahman
arXiv_CL
arXiv_CL
Recognition
Speech
Face
Action
Classification
Attention
Speech_Recognition
CNN
PDF
2022-10-22
Guided contrastive self-supervised pre-training for automatic speech recognition
Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas
arXiv_CL
arXiv_CL
Transformer
Recognition
Represenation_Learning
Knowledge
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-10-21
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation
Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati, Mohamed Ali
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Text_Generation
PDF
2022-10-21
Deep LSTM Spoken Term Detection using Wav2Vec 2.0 Recognizer
Jan Švec, Jan Lehečka, Luboš Šmídl
arXiv_CL
arXiv_CL
Embedding
Transfer_Learning
Recognition
RNN
Knowledge
Speech
Self-Supervised
Pose
Detection
Speech_Recognition
Language_Model
PDF
2022-10-21
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Pradip Pramanick, Chayan Sarkar
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Face
Action
Speech_Recognition
Inference
QA
PDF
2022-10-20
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
Chia-Yu Li, Ngoc Thang Vu
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
GAN
Speech_Recognition
PDF
2022-10-20
Anchored Speech Recognition with Neural Transducers
Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli
arXiv_SD
arXiv_SD
Embedding
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
2022-10-19
G-Augment: Searching For The Meta-Structure Of Data Augmentation Policies For ASR
Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-10-19
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Self-Supervised
Pose
Denoising
Speech_Recognition
PDF
2022-10-19
Tourist Guidance Robot Based on HyperCLOVA
Takato Yamazaki, Katsumasa Yoshikawa, Toshiki Kawamoto, Masaya Ohagi, Tomoya Mizumoto, Shuta Ichimura, Yusuke Kida, Toshinori Sato
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Summarization
Speech_Recognition
PDF
2022-10-19
Speaker- and Age-Invariant Training for Child Acoustic Modeling Using Adversarial Multi-Task Learning
Mostafa Shahin, Beena Ahmed, Julien Epps
arXiv_CL
arXiv_CL
Recognition
Adversarial
Speech
Pose
Speech_Recognition
PDF
2022-10-18
Simple and Effective Unsupervised Speech Translation
Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino
arXiv_CL
arXiv_CL
Transformer
Unsupervised
Recognition
Speech
Speech_Recognition
PDF
2022-10-18
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen
arXiv_CL
arXiv_CL
Embedding
Recognition
Represenation_Learning
Knowledge
Speech
Speech_Recognition
PDF
2022-10-18
It's a long way! Layer-wise Relevance Propagation for Echo State Networks applied to Earth System Variability
Marco Landt-Hayen, Peer Kröger, Martin Claus, Willi Rath
arXiv_AI
arXiv_AI
Recognition
RNN
Speech
Face
Classification
Detection
Object_Detection
Speech_Recognition
Image_Classification
Prediction
PDF
2022-10-18
HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch
Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Classification
Speech_Recognition
PDF
2022-10-18
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Chen Wang, Yuchen Liu, Boxing Chen, Jiajun Zhang, Wei Luo, Zhongqiang Huang, Chengqing Zong
arXiv_CL
arXiv_CL
Quantization
Recognition
Zero-Shot
Speech
Pose
Speech_Recognition
PDF
2022-10-18
Personalization of CTC Speech Recognition Models
Saket Dingliwal, Monica Sunkara, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff, Sravan Bodapati
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
Medical
Inference
Prediction
PDF
2022-10-17
Sub-8-bit quantization for on-device speech recognition: a regularization-free approach
Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
arXiv_SD
arXiv_SD
Quantization
Recognition
RNN
Regularization
Speech
Speech_Recognition
Inference
QA
PDF
2022-10-17
Language-agnostic Code-Switching in End-To-End Speech Recognition
Enes Yavuz Ugan, Christian Huber, Juan Hussain, Alexander Waibel
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-17
A Treatise On FST Lattice Based MMI Training
Adnan Haider, Tim Ng, Zhen Huang, Xingyu Na, Antti Veikko Rosti
arXiv_CV
arXiv_CV
Recognition
Speech
Speech_Recognition
PDF
2022-10-17
Towards Relation Extraction From Speech
Tongtong Wu, Guitao Wang, Jinming Zhao, Zhaoran Liu, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Relation
Relation_Extraction
Speech_Recognition
PDF
2022-10-16
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan, Guoli Ye, Yashesh Gaur, Jinyu Li
arXiv_SD
arXiv_SD
Transformer
Recognition
Bert
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-10-16
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li, Guodong Ma, Dexin Zhao, Ranran Zeng, Xiaoyu Li, Hao Huang
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-15
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao
arXiv_AI
arXiv_AI
Unsupervised
Recognition
Represenation_Learning
Speech
Speech_Recognition
PDF
2022-10-14
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Vinícius G. dos Santos, Bruno A. Papa Dias, Marli Quadros Leite, Flaviane Romani Fernandes Svartman, Sandra Aluísio
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-10-14
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition
Jakob Poncelet, Hugo Van hamme
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-14
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia, Mi Hong, Jingyu Hou, Kailong Ren, Sifan Ma, Jin Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Classification
Deep_Learning
Attention
Speech_Recognition
Language_Model
PDF
2022-10-14
TransFusion: Transcribing Speech with Multinomial Diffusion
Matthew Baas, Kevin Eloff, Herman Kamper
arXiv_AI
arXiv_AI
Recognition
Knowledge
Speech
Pose
Denoising
Speech_Recognition
PDF
2022-10-13
HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning
Ali Safaya, Engin Erzin
arXiv_CL
arXiv_CL
Transformer
Recognition
Bert
Represenation_Learning
Speech
Self-Supervised
Speech_Recognition
PDF
2022-10-13
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
Haoyu Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Speech_Recognition
Language_Model
PDF
2022-10-12
Foundation Transformers
Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
arXiv_CV
arXiv_CV
Transformer
Recognition
Bert
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-10-12
An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition
Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee
arXiv_AI
arXiv_AI
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-10-12
Can we use Common Voice to train a Multi-Speaker TTS system?
Sewade Ogun, Vincent Colotte, Emmanuel Vincent
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-12
A context-aware knowledge transferring strategy for CTC-based ASR
Ke-Han Lu, Kuan-Yu Chen
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Classification
Attention
Speech_Recognition
Language_Model
PDF
2022-10-12
Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge
Shuhao Deng, Chengfei Li, infeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-10-11
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman
arXiv_CL
arXiv_CL
Recognition
RNN
Knowledge
Speech
Speech_Recognition
PDF
2022-10-11
Scaling Up Deliberation for Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-11
Streaming Punctuation for Long-form Dictation with Transformers
Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang
arXiv_CL
arXiv_CL
Transformer
Segmentation
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-11
Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
Anastasia Safonova, Tatiana Yudina, Emil Nadimanov, Cydnie Davenport
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-10-11
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition
Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee
arXiv_SD
arXiv_SD
Transformer
Recognition
RNN
Speech
Attention
Speech_Recognition
PDF
2022-10-11
Inner speech recognition through electroencephalographic signals
Francesca Gasparini, Elisa Cazzaniga, Aurora Saibene
arXiv_AI
arXiv_AI
Recognition
RNN
Speech
Classification
Speech_Recognition
PDF
2022-10-11
CTC Alignments Improve Autoregressive Translation
Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
PDF
2022-10-07
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Face
Attention
Speech_Recognition
PDF
2022-10-07
Pronunciation Modeling of Foreign Words for Mandarin ASR by Considering the Effect of Language Transfer
Lei Wang, Rong Tong
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-10-07
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
Lei Wang, Rong Tong, Cheung Chi Leung, Sunil Sivadas, Chongjia Ni, Bin Ma
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Speech_Recognition
PDF
2022-10-07
Model-based estimation of in-car-communication feedback applied to speech zone detection
Kaspar Müller, Simon Doclo, Jan Østergaard, Tobias Wolff
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Detection
Speech_Recognition
PDF
2022-10-06
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-10-06
Synthetic Dataset Generation for Privacy-Preserving Machine Learning
Efstathia Soufleri, Gobinda Saha, Kaushik Roy
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Classification
Detection
Object_Detection
GAN
Speech_Recognition
Medical
Image_Classification
Matching
PDF
2022-10-05
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta, Julia Kreutzer, Stefan Riezler
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
NMT
Speech_Recognition
CNN
Prediction
PDF
2022-10-04
Code-Switching without Switching: Language Agnostic End-to-End Speech Translation
Christian Huber, Enes Yavuz Ugan, Alexander Waibel
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-10-03
Efficient acoustic feature transformation in mismatched environments using a Guided-GAN
Walter Heymans, Marelie H. Davel, Charl van Heerden
arXiv_SD
arXiv_SD
Recognition
Adversarial
Speech
Pose
GAN
Speech_Recognition
PDF
2022-10-01
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim, Wonyong Sung
arXiv_CL
arXiv_CL
Transformer
Recognition
RNN
Speech
Attention
Speech_Recognition
CNN
PDF
2022-10-01
Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition
Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda
arXiv_SD
arXiv_SD
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-09-30
Blind Signal Dereverberation for Machine Speech Recognition
Samik Sadhu, Hynek Hermansky
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-09-30
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
Chendong Zhao, Jianzong Wang, Wen qi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao
arXiv_AI
arXiv_AI
Transformer
Recognition
Sparse
Regularization
Speech
Attention
Speech_Recognition
PDF
2022-09-29
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
arXiv_CL
arXiv_CL
Transformer
Recognition
RNN
Speech
Pose
Attention
Speech_Recognition
CNN
PDF
2022-09-28
TVLT: Textless Vision-Language Transformer
Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
arXiv_AI
arXiv_AI
Transformer
Recognition
Represenation_Learning
Image_Retrieval
Speech
Sentiment
VQA
Speech_Recognition
Inference
Video_Retrieval
PDF
2022-09-27
An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts
Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan TN, Simon Corston-Oliver
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Speech_Recognition
PDF
2022-09-26
On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering
Georgios Sidiropoulos, Svitlana Vakulenko, Evangelos Kanoulas
arXiv_CL
arXiv_CL
Recognition
Speech
Face
Attention
Speech_Recognition
QA
PDF
2022-09-24
Unsupervised domain adaptation for speech recognition with unsupervised error correction
Long Mai, Julie Carson-Berndsen
arXiv_AI
arXiv_AI
Unsupervised
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-09-22
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Sherif Abdulatif, Ruizhe Cao, Bin Yang
arXiv_AI
arXiv_AI
Transformer
Enhancement
Recognition
Super_Resolution
Adversarial
Speech
Pose
Quantitative
Denoising
GAN
Speech_Recognition
PDF
2022-09-21
Assessing ASR Model Quality on Disordered Speech using BERTScore
Jimmy Tobin, Qisheng Li, Subhashini Venugopalan, Katie Seaver, Richard Cave, Katrin Tomanek
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Action
Speech_Recognition
Text_Generation
Prediction
PDF
2022-09-18
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions
Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou
arXiv_AI
arXiv_AI
Recognition
OCR
Speech
Detection
Sentiment
Object_Detection
Speech_Recognition
Prediction
PDF
2022-09-17
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
Raphael Olivier, Bhiksha Raj
arXiv_CL
arXiv_CL
Transformer
Recognition
Bert
Knowledge
Adversarial
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-09-17
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai, Jie Li, Wenjing Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang
arXiv_CL
arXiv_CL
Transformer
Recognition
Sparse
Knowledge
Speech
Pose
Speech_Recognition
Inference
PDF
2022-09-16
An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning
Tushar Talukder Showrav
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-09-16
Adaptive Natural Language Generation for Task-oriented Dialogue via Reinforcement Learning
Atsumoto Ohashi, Ryuichiro Higashinaka
arXiv_CL
arXiv_CL
Recognition
Reinforcement_Learning
Speech
Pose
Speech_Recognition
PDF
2022-09-15
Distribution Aware Metrics for Conditional Natural Language Generation
David M Chan, Yiming Ni, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, John Canny
arXiv_CV
arXiv_CV
Recognition
Speech
Pose
Summarization
Speech_Recognition
Caption
Matching
PDF
2022-09-15
MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement
Jianrong Wang, Xiaomin Li, Xuewei Li, Mei Yu, Qiang Fang, Li Liu
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-09-15
Non-Parallel Voice Conversion for ASR Augmentation
Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yinghui Huang, Jesse Emond, Pedro Moreno Mengibar
arXiv_SD
arXiv_SD
Recognition
Speech
Speech_Recognition
PDF
2022-09-14
ESSumm: Extractive Speech Summarization from Untranscribed Meeting
Jun Wang
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Speech
Self-Supervised
Pose
Summarization
Speech_Recognition
CNN
PDF
2022-09-14
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Tom O'Malley, Arun Narayanan, Quan Wang
arXiv_SD
arXiv_SD
Embedding
Enhancement
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
2022-09-14
Federated Pruning: Improving Neural Network Efficiency with Federated Learning
Rongmei Lin, Yonghui Xiao, Tien-Ju Yang, Ding Zhao, Li Xiong, Giovanni Motta, Françoise Beaufays
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-09-13
Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models
Mohammed Rakib, Md. Ismail Hossain, Nabeel Mohammed, Fuad Rahman
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
Language_Model
PDF
2022-09-13
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno
arXiv_SD
arXiv_SD
Transformer
Recognition
Speech
Attention
Speech_Recognition
PDF
2022-09-13
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-09-13
Learning ASR pathways: A sparse multilingual ASR model
Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, John H. L. Hansen, Ozlem Kalinli
arXiv_CL
arXiv_CL
Recognition
Sparse
RNN
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-09-12
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Speech_Recognition
Inference
PDF
2022-09-11
Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset
H.A.Z. Sameen Shahgir, Khondker Salman Sayeed, Tanjeem Azwad Zaman
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
Language_Model
PDF
2022-09-11
Lexicon and Attention based Handwritten Text Recognition System
Lalita Kumari, Sukhdeep Singh, VVS Rathore, Anuj Sharma
arXiv_CV
arXiv_CV
Recognition
Speech
Attention
Speech_Recognition
PDF
2022-09-09
Conversion of Acoustic Signal Into Text By Digital Filter using Natural Language Processing
Abhiram Katuri, Sindhu Salugu, Gelli Tharuni, Challa Sri Gouri
arXiv_AI
arXiv_AI
Recognition
Speech
Face
Speech_Recognition
PDF
2022-09-09
Streaming End-to-End Target Speaker ASR
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Pose
Action
Deep_Learning
Speech_Recognition
PDF
2022-09-08
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Classification
Speech_Recognition
Inference
Language_Model
PDF
2022-09-08
Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages
Li Miao, Jian Wu, Piyush Behre, Shuangyu Chang, Sarangarajan Parthasarathy
arXiv_CL
arXiv_CL
Transformer
Recognition
Sparse
Speech
Speech_Recognition
Language_Model
PDF
2022-09-08
Accented Speech Recognition under the Indian context
Ankit Grover
arXiv_SD
arXiv_SD
Recognition
Speech
Emotion
Speech_Recognition
PDF
2022-09-07
Modeling Dependent Structure for Utterances in ASR Evaluation
Zhe Liu, Fuchun Peng
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-09-07
Plant Species Classification Using Transfer Learning by Pretrained Classifier VGG-19
Thiru Siddharth, Bhupendra Singh Kirar, Dheeraj Kumar Agrawal
arXiv_CV
arXiv_CV
Transfer_Learning
Recognition
Knowledge
Speech
Action
Classification
Deep_Learning
Speech_Recognition
Medical
CNN
Image_Classification
PDF
2022-09-06
ASR2K: Speech Recognition for Around 2000 Languages without Audio
Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Language_Model
PDF
2022-09-05
Distilling the Knowledge of BERT for CTC-based ASR
Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
arXiv_CL
arXiv_CL
Recognition
Bert
Knowledge
Speech
Pose
Classification
Attention
Speech_Recognition
Inference
Language_Model
Prediction
PDF
2022-09-05
Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
Jiadong Wang, Xinyuan Qian, Haizhou Li
arXiv_SD
arXiv_SD
Embedding
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
2022-09-04
A Review of Sparse Expert Models in Deep Learning
William Fedus, Jeff Dean, Barret Zoph
arXiv_CL
arXiv_CL
Transformer
Recognition
Sparse
Review
Speech
Deep_Learning
Speech_Recognition
PDF
2022-09-02
Universal Fourier Attack for Time Series
Elizabeth Coda, Brad Clymer, Chance DeSmet, Yijing Watkins, Michael Girard
arXiv_AI
arXiv_AI
Recognition
Adversarial
Speech
Pose
Speech_Recognition
PDF
2022-09-01
A Wavelet Transform Based Scheme to Extract Speech Pitch and Formant Frequencies
Mahdieh Ghazvini, Seyedamiryousef Hosseini Goki, Sajad Hamzenejadi
arXiv_SD
arXiv_SD
Recognition
Review
Speech
Pose
Action
Classification
Speech_Recognition
PDF
2022-09-01
Attention Enhanced Citrinet for Speech Recognition
Xianchao Wu
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Classification
Attention
Speech_Recognition
CNN
PDF
2022-09-01
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
arXiv_SD
arXiv_SD
Transformer
Recognition
Sparse
Speech
Action
Attention
Speech_Recognition
CNN
PDF
2022-08-29
A Language Agnostic Multilingual Streaming On-Device ASR System
Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
arXiv_CL
arXiv_CL
Embedding
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
2022-08-29
Streaming Intended Query Detection using E2E Modeling for Continued Conversation
Shuo-yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Detection
Object_Detection
Speech_Recognition
PDF
2022-08-29
Turn-Taking Prediction for Natural Conversational Speech
Shuo-yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Speech_Recognition
Prediction
PDF
2022-08-28
Bayesian Neural Network Language Modeling for Speech Recognition
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng
arXiv_CL
arXiv_CL
Transformer
NAS
Recognition
RNN
Regularization
Speech
Pose
Speech_Recognition
Inference
Language_Model
PDF
2022-08-27
Minimal Feature Analysis for Isolated Digit Recognition for varying encoding rates in noisy environments
Muskan Garg, Naveen Aggarwal
arXiv_CV
arXiv_CV
Recognition
Speech
Action
Speech_Recognition
PDF
2022-08-26
Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Zoey Liu, Justin Spence, Emily Prud'hommeaux
arXiv_CL
arXiv_CL
Recognition
Adversarial
Speech
Speech_Recognition
PDF
2022-08-26
Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages
Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
OCR
Speech
Self-Supervised
Speech_Recognition
PDF
2022-08-26
Convolutional Neural Network to reduce construction loss in JPEG compression
Suman Kunwar
arXiv_CV
arXiv_CV
Reconstruction
Recognition
Speech
Pose
Action
Deep_Learning
Attention
Image_Compression
Speech_Recognition
CNN
PDF
2022-08-24
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages
Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
arXiv_CL
arXiv_CL
Recognition
Bert
Speech
Self-Supervised
Speech_Recognition
Language_Model
PDF
2022-08-22
Low-Level Physiological Implications of End-to-End Learning of Speech Recognition
Louise Coppieters de Gibson, Philip N. Garner
arXiv_AI
arXiv_AI
Recognition
Speech
Action
Speech_Recognition
Inference
PDF
2022-08-22
DualVoice: Speech Interaction that Discriminates between Normal and Whispered Voice Input
Jun Rekimoto
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
2022-08-22
Are disentangled representations all you need to build speaker anonymization systems?
Pierre Champion (MULTISPEECH, LIUM), Denis Jouvet (MULTISPEECH), Anthony Larcher (LIUM)
arXiv_SD
arXiv_SD
Quantization
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-08-17
Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition
Goutham Rajendran, Wei Zou
arXiv_CL
arXiv_CL
Recognition
Bert
Adversarial
Speech
Speech_Recognition
Inference
Prediction
PDF
2022-08-16
Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
Andrei Andrusenko, Rauf Nasretdinov, Aleksei Romanenko
arXiv_SD
arXiv_SD
Recognition
Optimization
Speech
Pose
Speech_Recognition
Inference
PDF
2022-08-10
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR
Georgios Karakasidis, Tamás Grósz, Mikko Kurimo
arXiv_SD
arXiv_SD
Recognition
Knowledge
Speech
Pose
GAN
Speech_Recognition
PDF
2022-08-09
Thai Wav2Vec2.0 with CommonVoice V8
Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong
arXiv_CL
arXiv_CL
Recognition
Speech
Face
Attention
Speech_Recognition
Language_Model
PDF
2022-08-09
ASR Error Correction with Constrained Decoding on Operation Prediction
Jingyuan Yang, Rongjun Li, Wei Peng
arXiv_CL
arXiv_CL
Embedding
Recognition
Speech
Pose
Speech_Recognition
Inference
Prediction
PDF
2022-08-09
Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim, Hyunjun Kim, Yong Man Ro
arXiv_AI
arXiv_AI
Unsupervised
Recognition
Speech
Pose
Action
Speech_Recognition
PDF
2022-08-05
Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning
Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim
arXiv_CL
arXiv_CL
Transformer
Unsupervised
Recognition
Speech
Self-Supervised
Speech_Recognition
PDF
2022-08-05
Model Blending for Text Classification
Ramit Pahwa
arXiv_AI
arXiv_AI
Recognition
Text_Classification
RNN
Knowledge
Speech
Classification
Speech_Recognition
Inference
PDF
2022-08-03
Adversarial Attacks on ASR Systems: An Overview
Xiao Zhang, Hao Tan, Xuan Huang, Denghui Zhang, Keke Tang, Zhaoquan Gu
arXiv_AI
arXiv_AI
Recognition
Adversarial
Speech
Survey
Relation
Attention
Speech_Recognition
PDF
2022-08-03
Multiclass ASMA vs Targeted PGD Attack in Image Segmentation
Johnson Vo (1), Jiabao Xie (1), Sahil Patel (1) ((1) University of Toronto)
arXiv_CV
arXiv_CV
Segmentation
Gradient_Descent
Recognition
Adversarial
Speech
Classification
Deep_Learning
Speech_Recognition
Image_Classification
PDF
2022-08-01
Performance Disparities Between Accents in Automatic Speech Recognition
Alex DiChristofano, Henry Shuster, Shefali Chandra, Neal Patwari
arXiv_CL
arXiv_CL
Recognition
Speech
Quantitative
Relation
Speech_Recognition
PDF
2022-08-01
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition
Z. Guo, C. Chen, E.S. Chng
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-29
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-07-29
Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer
Cong-Thanh Do, Mohan Li, Rama Doddipatla
arXiv_CL
arXiv_CL
Unsupervised
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
2022-07-29
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Peng Shen, Xugang Lu, Hisashi Kawai
arXiv_CL
arXiv_CL
Recognition
RNN
Speech
Pose
Speech_Recognition
PDF
2022-07-29
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
Alef Iury Siqueira Ferreira, Gustavo dos Reis Oliveira
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Emotion
Speech_Recognition
PDF
2022-07-28
Extending RNN-T-based speech recognition systems with emotion and language classification
Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon
arXiv_SD
arXiv_SD
Recognition
RNN
Speech
Pose
Emotion
Classification
Speech_Recognition
PDF
2022-07-27
Knowledge-driven Subword Grammar Modeling for Automatic Speech Recognition in Tamil and Kannada
Madhavaraj A, Bharathi Pilar, Ramakrishnan A G
arXiv_SD
arXiv_SD
Segmentation
Recognition
Knowledge
Speech
Speech_Recognition
PDF
2022-07-27
Subword Dictionary Learning and Segmentation Techniques for Automatic Speech Recognition in Tamil and Kannada
Madhavaraj A, Bharathi Pilar, Ramakrishnan A G
arXiv_SD
arXiv_SD
Segmentation
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-27
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation
Artem Ploujnikov, Mirco Ravanelli
arXiv_SD
arXiv_SD
Embedding
Recognition
Bert
Speech
Pose
Classification
Speech_Recognition
Language_Model
PDF
2022-07-26
Perception-Aware Attack: Creating Adversarial Music via Reverse-Engineering Human Perception
Rui Duan, Zhe Qu, Shangqing Zhao, Leah Ding, Yao Liu, Zhuo Lu
arXiv_AI
arXiv_AI
Recognition
Optimization
Adversarial
Speech
Pose
Classification
Detection
Object_Detection
Speech_Recognition
Prediction
PDF
2022-07-25
Unsupervised data selection for Speech Recognition with contrastive loss ratios
Chanho Park, Rehan Ahmad, Thomas Hain
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Speech
Pose
Speech_Recognition
Matching
PDF
2022-07-25
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli
arXiv_SD
arXiv_SD
Recognition
Sparse
Speech
Self-Supervised
Speech_Recognition
PDF
2022-07-24
A Deep Dive into Deep Cluster
Ahmad Mustapha, Wael Khreich, Wasim Masr
arXiv_CV
arXiv_CV
Unsupervised
Recognition
Speech
Pose
Deep_Learning
Speech_Recognition
CNN
PDF
2022-07-24
Improving Mandarin Speech Recognition with Block-augmented Transformer
Xiaoming Ren, Huifeng Zhu, Liuwei Wei, Minghui Wu, Jie Hao
arXiv_AI
arXiv_AI
Transformer
Recognition
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-07-23
Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition
Viswanatha V, Ramachandra A.C, Raghavendra Prasanna, Prem Chowdary Kakarla, Viveka Simha PJ, Nishant Mohan
arXiv_SD
arXiv_SD
Recognition
Gesture
Speech
Action
Speech_Recognition
PDF
2022-07-22
Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities
Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke
arXiv_CL
arXiv_CL
Embedding
Recognition
Speech
Speech_Recognition
PDF
2022-07-22
ASR Error Detection via Audio-Transcript entailment
Nimshi Venkat Meripo, Sandeep Konam
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Classification
Detection
Speech_Recognition
Medical
PDF
2022-07-20
AutoDiCE: Fully Automated Distributed CNN Inference at the Edge
Xiaotian Guo, Andy D. Pimentel, Todor Stefanov
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Classification
Deep_Learning
Speech_Recognition
CNN
Image_Classification
Inference
PDF
2022-07-20
When Is TTS Augmentation Through a Pivot Language Useful?
Nathaniel Robinson, Perez Ogayo, Swetha Gangu, David R. Mortensen, Shinji Watanabe
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-20
Towards Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou, Xiangming Gu, Ye Wang
arXiv_SD
arXiv_SD
Transfer_Learning
Recognition
Speech
Self-Supervised
Pose
Attention
Speech_Recognition
PDF
2022-07-20
Improving Data Driven Inverse Text Normalization using Data Augmentation
Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf
arXiv_CL
arXiv_CL
Recognition
Speech
Face
Action
Speech_Recognition
PDF
2022-07-19
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
arXiv_CL
arXiv_CL
Enhancement
Recognition
Speech
Face
Speech_Recognition
PDF
2022-07-19
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-17
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
Thierry Desot, François Portet, Michel Vacher
arXiv_CL
arXiv_CL
Recognition
Optimization
Speech
Action
Speech_Recognition
PDF
2022-07-16
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation
Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas
arXiv_SD
arXiv_SD
Recognition
Regularization
Speech
Pose
Speech_Recognition
PDF
2022-07-16
Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees
Michael Shoemate, Kevin Jett, Ethan Cowan, Sean Colbath, James Honaker, Prasanna Muthukumar
arXiv_CL
arXiv_CL
Recognition
Speech
Classification
GAN
Speech_Recognition
PDF
2022-07-15
Position Prediction as an Effective Pretraining Strategy
Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind
arXiv_CV
arXiv_CV
Transformer
Reconstruction
Embedding
Unsupervised
Recognition
Bert
Regularization
Speech
Self-Supervised
Pose
Classification
Relation
Speech_Recognition
Prediction
PDF
2022-07-15
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong, Zhikai Zhou, Yanmin Qian
arXiv_SD
arXiv_SD
Transfer_Learning
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Inference
PDF
2022-07-15
Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments
Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii
arXiv_SD
arXiv_SD
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-14
Efficient spike encoding algorithms for neuromorphic speech recognition
Sidi Yaya Arnaud Yarga, Jean Rouat, Sean U. N. Wood
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Classification
Deep_Learning
Speech_Recognition
CNN
PDF
2022-07-14
A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer
Wei-Ning Hsu, Bowen Shi
arXiv_AI
arXiv_AI
Transformer
Recognition
Bert
Zero-Shot
Speech
Self-Supervised
Speech_Recognition
Prediction
PDF
2022-07-14
Data Augmentation for Low-Resource Quechua ASR Improvement
Rodolfo Zevallos, Nuria Bel, Guillermo Cámbara, Mireia Farrús, Jordi Luque
arXiv_CL
arXiv_CL
Recognition
Speech
Deep_Learning
Speech_Recognition
PDF
2022-07-14
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Self-Supervised
Speech_Recognition
Inference
PDF
2022-07-14
Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases
Andis Lagzdiņš, Uldis Siliņš, Mārcis Pinnis, Toms Bergmanis, Artūrs Vasiļevskis, Andrejs Vasiļjevs
arXiv_CL
arXiv_CL
Recognition
Speech
Action
GAN
Speech_Recognition
PDF
2022-07-14
Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Inference
Language_Model
Prediction
PDF
2022-07-13
MM-ALT: A Multimodal Automatic Lyric Transcription System
Xiangming Gu, Longshen Ou, Danielle Ong, Ye Wang
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Detection
Attention
Speech_Recognition
Activity
PDF
2022-07-13
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro
arXiv_AI
arXiv_AI
Transformer
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-12
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition
Rodolfo Zevallos, Luis Camacho, Nelsi Melgarejo
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
QA
PDF
2022-07-12
End-to-end speech recognition modeling from de-identified data
Martin Flechl, Shou-Chun Yin, Junho Park, Peter Skala
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
Medical
Matching
PDF
2022-07-11
pMCT: Patched Multi-Condition Training for Robust Speech Recognition
Pablo Peso Parada, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-11
Speaker Anonymization with Phonetic Intermediate Representations
Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu
arXiv_SD
arXiv_SD
Embedding
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-11
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
arXiv_SD
arXiv_SD
Recognition
Optimization
Speech
Pose
Speech_Recognition
PDF
2022-07-11
Online Continual Learning of End-to-End Speech Recognition Models
Muqiao Yang, Ian Lane, Shinji Watanabe
arXiv_AI
arXiv_AI
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-07-09
Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder
Jicheng Zhang, Yizhou Peng, Haihua Xu, Yi He, Eng Siong Chng, Hao Huang
arXiv_SD
arXiv_SD
Recognition
Regularization
Speech
Pose
Attention
Speech_Recognition
PDF
2022-07-09
Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition
Yizhou Peng, Yufei Liu, Jicheng Zhang, Haihua Xu, Yi He, Hao Huang, Eng Siong Chng
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-07-08
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription
Xianrui Zheng, Chao Zhang, Philip C. Woodland
arXiv_SD
arXiv_SD
Segmentation
Embedding
Recognition
Speech
Self-Supervised
Pose
Classification
Detection
Speech_Recognition
Activity
PDF
2022-07-07
Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion
Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
2022-07-07
Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition
Muhammad Umar Farooq, Thomas Hain
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-07-07
End-to-end Speech-to-Punctuated-Text Recognition
Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-07-06
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Relation
Attention
Speech_Recognition
CNN
Inference
PDF
2022-07-06
Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands
Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Action
Deep_Learning
Speech_Recognition
PDF
2022-07-06
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies
Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan
arXiv_SD
arXiv_SD
Transformer
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-07-05
Compute Cost Amortized Transformer for Streaming ASR
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel
arXiv_CL
arXiv_CL
Transformer
Recognition
Sparse
Speech
Speech_Recognition
Inference
PDF
2022-07-04
Vietnamese Capitalization and Punctuation Recovery Models
Hoang Thi Thu Uyen, Nguyen Anh Tu, Ta Duc Huy
arXiv_CL
arXiv_CL
Recognition
Restoration
Speech
Pose
Speech_Recognition
PDF
2022-07-04
Minimizing Sequential Confusion Error in Speech Command Recognition
Zhanheng Yang, Hang Lv, Xiong Wang, Ao Zhang, Lei Xie
arXiv_SD
arXiv_SD
Recognition
Knowledge
Speech
Pose
Classification
Speech_Recognition
PDF
2022-07-03
Generating gender-ambiguous voices for privacy-preserving speech recognition
Dimitrios Stoidis, Andrea Cavallaro
arXiv_SD
arXiv_SD
Recognition
Represenation_Learning
Adversarial
Speech
Pose
GAN
Speech_Recognition
Inference
PDF
2022-07-03
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR
Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Relation
Attention
Speech_Recognition
Prediction
PDF
2022-07-02
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei, Pengcheng Guo, Ning Jiang
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
Prediction
PDF
2022-07-02
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
Guangzhi Sun, Chao Zhang, Philip C. Woodland
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-07-02
UserLibri: A Dataset for ASR Personalization Using Only Text
Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Inference
Language_Model
PDF
2022-07-01
Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training
Mitchell DeHaven, Jayadev Billa
arXiv_CL
arXiv_CL
Transformer
Recognition
Bert
Speech
Self-Supervised
Speech_Recognition
PDF
2022-06-30
Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow
arXiv_AI
arXiv_AI
Quantization
Recognition
RNN
Speech
Speech_Recognition
QA
PDF
2022-06-30
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
Szu-Jui Chen, Jiamin Xie, John H.L. Hansen
arXiv_SD
arXiv_SD
Recognition
Speech
Self-Supervised
Pose
Relation
Speech_Recognition
PDF
2022-06-29
Space-Efficient Representation of Entity-centric Query Language Models
Christophe Van Gysel, Mirko Hannemann, Ernest Pusateri, Youssef Oualil, Ilya Oparin
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
Language_Model
PDF
2022-06-29
The THUEE System Description for the IARPA OpenASR21 Challenge
Jing Zhao, Haoyu Wang, Jinpeng Li, Shuzhou Chai, Guan-Bo Wang, Guoguo Chen, Wei-Qiang Zhang
arXiv_CL
arXiv_CL
Recognition
Speech
Self-Supervised
Classification
Attention
Speech_Recognition
PDF
2022-06-29
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-06-29
STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello, Po-Chun Hsu, Akshat Shrivastava, Daniel Lazar, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossef Mordechay, Robin Algayres, Tu Ahn Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-06-28
Bengali Common Voice Speech Dataset for Automatic Speech Recognition
Samiul Alam, Asif Sushmit, Zaowad Abdullah, Shahrin Nakkhatra, MD. Nazmuddoha Ansary, Syed Mobassir Hossen, Sazia Morshed Mehnaz, Tahsin Reasat, Ahmed Imtiaz Humayun
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
PDF
2022-06-27
TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline
Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Speech_Recognition
PDF
2022-06-26
Improving the Training Recipe for a Robust Conformer-based Hybrid Model
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
2022-06-26
Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi
Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha
arXiv_CL
arXiv_CL
Recognition
Speech
Relation
Speech_Recognition
PDF
2022-06-26
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Raviraj Joshi, Subodh Kumar
arXiv_CL
arXiv_CL
Transformer
Recognition
RNN
Speech
Attention
Speech_Recognition
Text_Generation
PDF
2022-06-26
Meta Auxiliary Learning for Low-resource Spoken Language Understanding
Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Prediction
PDF
2022-06-26
Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective
Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Classification
Deep_Learning
Attention
Speech_Recognition
PDF
2022-06-25
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction
Hajo Nils Krabbenhöft, Erhardt Barth
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Language_Model
PDF
2022-06-25
Distilling a Pretrained Language Model to a Multilingual ASR Model
Kwanghee Choi, Hyung-Min Park
arXiv_AI
arXiv_AI
Transformer
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-06-24
PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis
Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang
arXiv_SD
arXiv_SD
Recognition
Review
Speech
Speech_Recognition
Medical
PDF
2022-06-24
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng
arXiv_SD
arXiv_SD
Transformer
Unsupervised
Recognition
RNN
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-06-23
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection
Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng
arXiv_SD
arXiv_SD
NAS
Recognition
Speech
Pose
Detection
Speech_Recognition
PDF
2022-06-23
Pruned RNN-T for fast, memory-efficient ASR training
Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey
arXiv_AI
arXiv_AI
Embedding
Recognition
RNN
Speech
Speech_Recognition
PDF
2022-06-23
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng
arXiv_SD
arXiv_SD
Transformer
Quantization
Recognition
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-06-23
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
PDF
2022-06-22
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
Ibrahim Ahmed, Sahil Parmar, Matthew Boyd, Michael Beidler, Kris Kang, Bill Liu, Kyle Roach, John Kim, Dennis Abts
arXiv_CL
arXiv_CL
Transformer
Recognition
Bert
Speech
Speech_Recognition
Inference
PDF
2022-06-22
A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data
Raviraj Joshi, Anupam Singh
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Deep_Learning
Attention
Speech_Recognition
PDF
2022-06-22
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi
arXiv_SD
arXiv_SD
Embedding
Enhancement
Recognition
Optimization
Regularization
Speech
Self-Supervised
Speech_Recognition
PDF
2022-06-21
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei
arXiv_SD
arXiv_SD
Transformer
Unsupervised
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
Prediction
PDF
2022-06-20
The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Jonathan Mukiibi, Andrew Katumba, Joyce Nakatumba-Nabende, Ali Hussein, Josh Meyer
arXiv_SD
arXiv_SD
Recognition
Knowledge
Speech
GAN
Speech_Recognition
PDF
2022-06-20
Boosting Cross-Domain Speech Recognition with Self-Supervision
Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan
arXiv_SD
arXiv_SD
Transformer
Unsupervised
Recognition
Knowledge
Speech
Self-Supervised
Pose
Face
Speech_Recognition
PDF
2022-06-19
Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping
Jenthe Thienpondt, Kris Demuynck
arXiv_SD
arXiv_SD
Transformer
Transfer_Learning
Enhancement
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-06-18
Decoupled Federated Learning for ASR with Non-IID Data
Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-06-16
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan
arXiv_SD
arXiv_SD
Transformer
Embedding
Recognition
Speech
Pose
Speech_Recognition
Inference
Language_Model
PDF
2022-06-16
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
arXiv_CV
arXiv_CV
Recognition
3D
Knowledge
Speech
Speech_Recognition
Matching
PDF
2022-06-16
Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki
arXiv_SD
arXiv_SD
Transformer
Recognition
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-06-16
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Ruchao Fan, Abeer Alwan
arXiv_SD
arXiv_SD
Recognition
Bert
Knowledge
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-06-15
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
arXiv_CV
arXiv_CV
Transformer
Recognition
Speech
Pose
Action
Attention
Speech_Recognition
PDF
2022-06-15
Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
Jan Lehečka, Josef V. Psutka, Josef Psutka
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Self-Supervised
Speech_Recognition
Language_Model
PDF
2022-06-15
Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech
Jan Lehečka, Jan Švec, Aleš Pražák, Josef V. Psutka
arXiv_CL
arXiv_CL
Transformer
Recognition
Zero-Shot
Speech
Speech_Recognition
PDF
2022-06-15
Residual Language Model for End-to-end Speech Recognition
Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe
arXiv_SD
arXiv_SD
Recognition
Speech
Pose
Speech_Recognition
Inference
Language_Model
PDF
2022-06-15
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition
Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Tianzi Wang, Xunying Liu, Helen Meng
arXiv_AI
arXiv_AI
Transformer
Recognition
Speech
Speech_Recognition
PDF
2022-06-15
Streaming non-autoregressive model for any-to-many voice conversion
Ziyi Chen, Haoran Miao, Pengyuan Zhang
arXiv_SD
arXiv_SD
Transformer
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-06-13
Toward Zero Oracle Word Error Rate on the Switchboard Benchmark
Arlo Faria, Adam Janin, Korbinian Riedhammer, Sidhi Adkoli
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-06-11
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
A Arunkumar, Vrunda N Sukhadia, S. Umesh
arXiv_CL
arXiv_CL
Embedding
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
PDF
2022-06-10
AHD ConvNet for Speech Emotion Classification
Asfand Ali, Danial Nasir, Mohammad Hassan Jawad
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Emotion
Classification
Speech_Recognition
PDF
2022-06-09
Revisiting End-to-End Speech-to-Text Translation From Scratch
Biao Zhang, Barry Haddow, Rico Sennrich
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
PDF
2022-06-09
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel, Moritz Behr, Fevziye Irem Eyiokur, Dogucan Yaman, Tuan-Nam Nguyen, Carlos Mullov, Mehmet Arif Demirtas, Alperen Kantarcı, Stefan Constantin, Hazım Kemal Ekenel
arXiv_CV
arXiv_CV
Recognition
Adversarial
Speech
Pose
Face
Detection
Speech_Recognition
PDF
2022-06-09
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Arunkumar A, Umesh S
arXiv_CL
arXiv_CL
Transformer
Embedding
Recognition
Bert
Speech
Self-Supervised
Pose
Speech_Recognition
Language_Model
Prediction
PDF
2022-06-09
Context-based out-of-vocabulary word recovery for ASR systems in Indian languages
Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
Language_Model
Matching
PDF
2022-06-07
Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition
Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Feng Wang, Jiashui Wang
arXiv_AI
arXiv_AI
Recognition
Adversarial
Speech
Pose
Speech_Recognition
PDF
2022-06-07
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Face
Speech_Recognition
PDF
2022-06-06
FedNST: Federated Noisy Student Training for Automatic Speech Recognition
Haaris Mehmood, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay
arXiv_AI
arXiv_AI
Recognition
Speech
Self-Supervised
Pose
Face
Speech_Recognition
PDF
2022-06-05
Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, Paweł Rychlikowski, Jan Chorowski
arXiv_AI
arXiv_AI
Segmentation
Quantization
Recognition
Speech
Self-Supervised
Pose
Deep_Learning
Speech_Recognition
Prediction
PDF
2022-06-05
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models
Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr, Hesham M. Eraqi
arXiv_CL
arXiv_CL
Recognition
Knowledge
Speech
Pose
Speech_Recognition
PDF
2022-06-05
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-06-02
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations
Chang Liu, Zhen-Hua Ling, Ling-Hui Chen
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Speech
Pose
Classification
Speech_Recognition
PDF
2022-06-02
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Pose
Attention
Speech_Recognition
CNN
Language_Model
PDF
2022-06-01
Snow Mountain: Dataset of Audio Recordings of The Bible in Low Resource Languages
Kavitha Raju, Anjaly V, Ryan Lish, Joel Mathew
arXiv_SD
arXiv_SD
Recognition
Speech
Speech_Recognition
PDF
2022-05-30
Adversarial synthesis based data-augmentation for code-switched spoken language identification
Parth Shastri, Chirag Patil, Poorval Wanere, Dr. Shrinivas Mahajan, Dr. Abhishek Bhatt, Dr. Hardik Sailor
arXiv_CV
arXiv_CV
Recognition
RNN
Adversarial
Speech
Pose
Classification
GAN
Speech_Recognition
CNN
PDF
2022-05-29
Speaker Identification using Speech Recognition
Syeda Rabia Arshad, Syed Mujtaba Haider, Abdul Basit Mughal
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Speech
Pose
Speech_Recognition
PDF
2022-05-28
Adaptive Activation Network For Low Resource Multilingual Speech Recognition
Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao
arXiv_SD
arXiv_SD
Transformer
Recognition
Speech
Pose
Classification
Deep_Learning
Speech_Recognition
PDF
2022-05-28
Is Lip Region-of-Interest Sufficient for Lipreading?
Jing-Xuan Zhang, Gen-Shun Wan, Jia Pan
arXiv_CV
arXiv_CV
Recognition
Bert
Speech
Self-Supervised
Pose
Emotion
Face
Speech_Recognition
PDF
2022-05-27
Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning
Xiliang Zhu, Shayna Gardiner, David Rossouw, Tere Roldán, Simon Corston-Oliver
arXiv_CL
arXiv_CL
Transfer_Learning
Recognition
Restoration
Speech
Pose
Speech_Recognition
PDF
2022-05-26
Global Normalization for Streaming Speech Recognition in a Modular Framework
Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
PDF
2022-05-26
Contextual Adapters for Personalized Speech Recognition in Neural Transducers
Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Inference
Language_Model
PDF
2022-05-26
Clinical Dialogue Transcription Error Correction using Seq2Seq Models
Gayani Nanayakkara, Nirmalie Wiratunga, David Corsar, Kyle Martin, Anjana Wijekoon
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
Medical
PDF
2022-05-26
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai
arXiv_SD
arXiv_SD
Transformer
Enhancement
Recognition
Speech
Self-Supervised
Pose
Attention
Speech_Recognition
PDF
2022-05-25
On Building Spoken Language Understanding Systems for Low Resourced Languages
Akshat Gupta
arXiv_AI
arXiv_AI
Recognition
Speech
Face
Classification
Speech_Recognition
PDF
2022-05-25
Semantic-preserved Communication System for Highly Efficient Speech Transmission
Tianxiao Han, Qianqian Yang, Zhiguo Shi, Shibo He, Zhaoyang Zhang
arXiv_SD
arXiv_SD
Reconstruction
Recognition
Speech
Pose
Deep_Learning
Speech_Recognition
PDF
2022-05-25
Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation
Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-05-25
Heterogeneous Reservoir Computing Models for Persian Speech Recognition
Zohreh Ansari, Farzin Pourhoseini, Fatemeh Hadaeghi
arXiv_SD
arXiv_SD
Recognition
RNN
Knowledge
Speech
Pose
Speech_Recognition
Language_Model
PDF
2022-05-25
An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech
Wei Liu, Jingyu Li, Tan Lee
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Speech
Classification
Speech_Recognition
Matching
PDF
2022-05-25
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang, Yuke Li, Binbin Du
arXiv_CL
arXiv_CL
Embedding
Recognition
Speech
Pose
Action
Speech_Recognition
Inference
Language_Model
PDF
2022-05-25
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
arXiv_CL
arXiv_CL
Recognition
Speech
SLAM
Few-Shot
Speech_Recognition
PDF
2022-05-24
Adaptive multilingual speech recognition with pretrained models
Ngoc-Quan Pham, Alex Waibel, Jan Niehues
arXiv_CL
arXiv_CL
Unsupervised
Recognition
Knowledge
Speech
Attention
Speech_Recognition
PDF
2022-05-24
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
Yuting Yang, Binbin Du, Yuke Li
arXiv_CL
arXiv_CL
Transformer
Recognition
Speech
Speech_Recognition
Inference
Language_Model
PDF
2022-05-24
DPSNN: A Differentially Private Spiking Neural Network
Jihang Wang, Dongcheng Zhao, Guobin Shen, Qian Zhang, Yi Zeng
arXiv_AI
arXiv_AI
Recognition
Speech
Pose
Face
Classification
Detection
Object_Detection
Face_Recognition
Speech_Recognition
Image_Classification
PDF
2022-05-23
Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models
Sabeesh Ethiraj, Bharath Kumar Bolla
arXiv_AI
arXiv_AI
Recognition
Speech
Deep_Learning
Speech_Recognition
PDF
2022-05-23
Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection
Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng
arXiv_CL
arXiv_CL
Recognition
Speech
Pose
Deep_Learning
Detection
Speech_Recognition
PDF
2022-05-22
Deep Learning for Visual Speech Analysis: A Survey
Changchong Sheng, Gangyao Kuang, Liang Bai, Chenping Hou, Yulan Guo, Xin Xu, Matti Pietikäinen, Li Liu
arXiv_CV
arXiv_CV
Recognition
Review
Speech
Pose
Survey
Deep_Learning
Attention
Speech_Recognition
Medical
PDF
2022-05-22
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
arXiv_AI
arXiv_AI
Recognition
Video_Caption
Speech
Pose
Few-Shot
Speech_Recognition
Caption
Language_Model
Prediction
PDF
2022-05-21
Self-Supervised Speech Representation Learning: A Review
Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe
arXiv_CL
arXiv_CL
Transformer
Embedding
Recognition
Represenation_Learning
Review
Speech
Self-Supervised
Deep_Learning
Speech_Recognition
PDF
2022-05-20
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement
Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu
arXiv_SD
arXiv_SD
Embedding
Enhancement
Recognition
RNN
Speech
Pose
Attention
Speech_Recognition
PDF
2022-05-20
Predicting electrode array impedance after one month from cochlear implantation surgery
Yousef A. Alohali, Yassin Abdelsamad, Tamer Mesallam, Fida Almuhawas, Abdulrahman Hagr, Mahmoud S. Fayed
arXiv_AI
arXiv_AI
Recognition
Speech
Speech_Recognition
PDF
2022-05-20
Set-based Meta-Interpolation for Few-Task Meta-Learning
Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
arXiv_AI
arXiv_AI
Recognition
Text_Classification
Optimization
Knowledge
Speech
Pose
Classification
Speech_Recognition
Image_Classification
Prediction
PDF
2022-05-19
Content-Context Factorized Representations for Automated Speech Recognition
David M. Chan, Shalini Ghosh
arXiv_SD
arXiv_SD
Unsupervised
Recognition
Speech
Relation
Speech_Recognition
PDF
2022-05-19
Automatic Spoken Language Identification using a Time-Delay Neural Network
Benjamin Kepecs, Homayoon Beigi
arXiv_CL
arXiv_CL
Recognition
Speech
Speech_Recognition
Language_Model
PDF
2022-05-19
Insights on Neural Representations for End-to-End Speech Recognition
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
arXiv_CL
arXiv_CL
Transformer
Recognition
RNN
Speech
Relation
Speech_Recognition
PDF