BowNet: Dilated Convolution Neural Network for Ultrasound Tongue Contour Extraction

2019-06-10 19:04:09
M. Hamed Mozaffari, Won-Sook Lee

Abstract

Ultrasound imaging is safe, relatively affordable, and capable of real-time performance. One application of this technology is to visualize and characterize human tongue shape and motion during real-time speech in order to study healthy or impaired speech production. Because ultrasound images are noisy and low in contrast, non-expert users may find it difficult to recognize organ shapes such as the tongue surface (dorsum). To ease quantitative analysis of tongue shape and motion, the tongue surface can be extracted, tracked, and visualized instead of the whole tongue region. Delineating the tongue surface manually in each frame is a cumbersome, subjective, and error-prone task. Furthermore, the rapidity and complexity of tongue gestures make the task challenging, and manual segmentation is not a feasible solution for real-time applications. With state-of-the-art deep neural network models and training techniques, it is feasible to implement fully automatic, accurate, and robust segmentation methods that run in real time and can track tongue contours during speech. This paper presents two novel deep neural network models, BowNet and wBowNet, which benefit from the global prediction ability of encoding-decoding models with integrated multi-scale contextual information and from the full-resolution (local) extraction capability of dilated convolutions. Experimental results on several ultrasound tongue image datasets revealed that combining local and global searching could improve prediction results significantly. Qualitative and quantitative assessment of the BowNet models showed outstanding accuracy and robustness in comparison with similar techniques.
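
The abstract credits dilated convolutions with full-resolution (local) extraction. As a rough illustration of that general idea only, and not the BowNet or wBowNet architecture, the sketch below applies a 1-D dilated convolution with zero padding and stride 1, so the output keeps the input length while the receptive field grows with the dilation rate. The function names `dilated_conv1d` and `receptive_field` are hypothetical and chosen for this example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded, stride-1 dilated convolution (cross-correlation form).

    Illustrative helper, not taken from the paper. The output has the
    same length as the input, i.e. resolution is preserved.
    """
    k = len(kernel)
    span = dilation * (k - 1)      # distance covered by the kernel taps
    pad_left = span // 2
    pad_right = span - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7, 8]
    out = dilated_conv1d(signal, [1, 1, 1], dilation=2)
    print(len(out) == len(signal))            # True: no downsampling
    print(receptive_field(3, [1, 2, 4, 8]))   # 31 samples seen by 4 layers
```

Stacking layers with increasing dilation rates widens the context each output sample sees without any pooling, which is the property usually cited when dilated convolutions are paired with an encoding-decoding path.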

URL

https://arxiv.org/abs/1906.04232

PDF

https://arxiv.org/pdf/1906.04232.pdf

