In this paper, we explore various approaches for semi supervised learning in an end to end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data selection mechanism to obtain the best hypothesized output, further used to retrain the seed model. However, uncertainties of the model may not be well captured with a single hypothesis. As opposed to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of an speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data selection process is also applied on these hypothesized transcripts to reduce the uncertainty. Experiments on freely available TEDLIUM corpus and proprietary Adobe's internal dataset show that the proposed approach significantly reduces ASR errors, compared to the baseline model.
https://arxiv.org/abs/1908.05227
We introduce a novel deep neural network architecture that links visual regions to corresponding textual segments including phrases and words. To accomplish this task, our architecture makes use of the rich semantic information available in a joint embedding space of multi-modal data. From this joint embedding space, we extract the associative localization maps that develop naturally, without explicitly providing supervision during training for the localization task. The joint space is learned using a bidirectional ranking objective that is optimized using a $N$-Pair loss formulation. This training mechanism demonstrates the idea that localization information is learned inherently while optimizing a Bidirectional Retrieval objective. The model's retrieval and localization performance is evaluated on MSCOCO and Flickr30K Entities datasets. This architecture outperforms the state of the art results in the semi-supervised phrase localization setting.
https://arxiv.org/abs/1908.02950
This paper presents a novel framework for predicting shot location and type in tennis. Inspired by recent neuroscience discoveries we incorporate neural memory modules to model the episodic and semantic memory components of a tennis player. We propose a Semi Supervised Generative Adversarial Network architecture that couples these memory models with the automatic feature learning power of deep neural networks and demonstrate methodologies for learning player level behavioural patterns with the proposed framework. We evaluate the effectiveness of the proposed model on tennis tracking data from the 2012 Australian Tennis open and exhibit applications of the proposed method in discovering how players adapt their style depending on the match context.
https://arxiv.org/abs/1901.05123
Conventional methods for visual assessment of civil infrastructures have certain limitations, such as subjectivity of the collected data, long inspection time, and high cost of labor. Although some new technologies i.e. robotic techniques that are currently in practice can collect objective, quantified data, the inspectors own expertise is still critical in many instances since these technologies are not designed to work interactively with human inspector. This study aims to create a smart, human centered method that offers significant contributions to infrastructure inspection, maintenance, management practice, and safety for the bridge owners. By developing a smart Mixed Reality framework, which can be integrated into a wearable holographic headset device, a bridge inspector, for example, can automatically analyze a certain defect such as a crack that he or she sees on an element, display its dimension information in real-time along with the condition state. Such systems can potentially decrease the time and cost of infrastructure inspections by accelerating essential tasks of the inspector such as defect measurement, condition assessment and data processing to management systems. The human centered artificial intelligence will help the inspector collect more quantified and objective data while incorporating inspectors professional judgement. This study explains in detail the described system and related methodologies of implementing attention guided semi supervised deep learning into mixed reality technology, which interacts with the human inspector during assessment. Thereby, the inspector and the AI will collaborate or communicate for improved visual inspection.
https://arxiv.org/abs/1812.05659
The level of PD-L1 expression in immunohistochemistry (IHC) assays is a key biomarker for the identification of Non-Small-Cell-Lung-Cancer (NSCLC) patients that may respond to anti PD-1/PD-L1 treatments. The quantification of PD-L1 expression currently includes the visual estimation of a Tumor Cell (TC) score by a pathologist and consists of evaluating the ratio of PD-L1 positive and PD-L1 negative tumor cells. Known challenges like differences in positivity estimation around clinically relevant cut-offs and sub-optimal quality of samples makes visual scoring tedious and subjective, yielding a scoring variability between pathologists. In this work, we propose a novel deep learning solution that enables the first automated and objective scoring of PD-L1 expression in late stage NSCLC needle biopsies. To account for the low amount of tissue available in biopsy images and to restrict the amount of manual annotations necessary for training, we explore the use of semi-supervised approaches against standard fully supervised methods. We consolidate the manual annotations used for training as well the visual TC scores used for quantitative evaluation with multiple pathologists. Concordance measures computed on a set of slides unseen during training provide evidence that our automatic scoring method matches visual scoring on the considered dataset while ensuring repeatability and objectivity.
免疫组织化学(IHC)测定中PD-L1表达水平是鉴定可能对抗PD-1 / PD-L1治疗有反应的非小细胞肺癌(NSCLC)患者的关键生物标志物。 PD-L1表达的定量目前包括由病理学家对肿瘤细胞(TC)评分的视觉估计,并且包括评估PD-L1阳性和PD-L1阴性肿瘤细胞的比率。已知的挑战如临床相关临界点附近的积极性评估差异和样品的次优质量使得视觉评分冗长和主观,产生病理学家之间的评分变异性。在这项工作中,我们提出了一种新型的深度学习解决方案,能够在晚期NSCLC穿刺活检中首次实现PD-L1表达的自动客观评分。为了解释活组织图像中可用的组织量较少并限制培训所需的手动注释的数量,我们探讨了在标准完全监督方法中使用半监督方法。我们整合了用于培训的手册注释以及用于与多位病理学家进行定量评估的视觉TC分数。通过在训练期间看不到的一组幻灯片计算出的一致性度量提供了证据,证明我们的自动评分方法与所考虑数据集的视觉评分匹配,同时确保可重复性和客观性。
https://arxiv.org/abs/1806.11036