Sparse-view NeRF is challenging because the limited input images lead to an under-constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads NeRF to produce images with undesired artifacts. To address these artifacts and enhance robustness, we propose SSNeRF, a sparse-view semi-supervised NeRF method based on a teacher-student framework. Our key idea is to challenge the NeRF module with progressively severe sparse-view degradation while providing high-confidence pseudo-labels. This approach helps the NeRF model become aware of the noise and incomplete information associated with sparse views, thus improving its robustness. The novelty of SSNeRF lies in its sparse-view-specific augmentations and semi-supervised learning mechanism. In this approach, the teacher NeRF generates novel views along with confidence scores, while the student NeRF, perturbed by the augmented input, learns from the high-confidence pseudo-labels. Our sparse-view degradation augmentation progressively injects noise into volume rendering weights, perturbs feature maps in vulnerable layers, and simulates sparse-view blurriness. These augmentation strategies force the student NeRF to recognize degradation and produce clearer rendered views. By transferring the student's parameters to the teacher, the teacher gains increased robustness in subsequent training iterations. Extensive experiments demonstrate the effectiveness of SSNeRF in generating novel views with less sparse-view degradation. We will release the code upon acceptance.
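Of the augmentations listed above, the noise injection into volume-rendering weights and the student-to-teacher parameter transfer lend themselves to a short sketch. This is a minimal illustration under assumptions, not the authors' code: the linear ramp schedule, `max_noise`, and the EMA-style transfer are illustrative choices.

```python
import torch

def render_weights(sigma: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Standard NeRF weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - torch.exp(-sigma * delta)                 # (rays, samples)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]),    # T_1 = 1
                       trans[..., :-1]], dim=-1)
    return trans * alpha

def perturb_weights(weights, step, max_steps, max_noise=0.1):
    """Progressively inject noise into the student's rendering weights."""
    noise_scale = max_noise * min(1.0, step / max_steps)    # ramps up over training
    noisy = weights + noise_scale * torch.randn_like(weights)
    return noisy.clamp_min(0.0)

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Transfer student parameters to the teacher (EMA-style, assumed here)."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)
```

In use, `perturb_weights` would degrade only the student branch while the teacher renders cleanly to produce pseudo-labels, and `ema_update` runs once per training iteration.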
https://arxiv.org/abs/2408.09144
Semi-supervised domain adaptation methods leverage information from a labelled source domain with the goal of generalizing over a scarcely labelled target domain. While this setting already poses challenges due to potential distribution shifts between domains, an even more complex scenario arises when source and target data differ in modality representation (e.g. they are acquired by sensors with different characteristics). For instance, in remote sensing, images may be collected via various acquisition modes (e.g. optical or radar), different spectral characteristics (e.g. RGB or multi-spectral) and spatial resolutions. Such a setting is denoted as Semi-Supervised Heterogeneous Domain Adaptation (SSHDA), and it exhibits an even more severe distribution shift due to modality heterogeneity across domains. To cope with the challenging SSHDA setting, here we introduce SHeDD (Semi-supervised Heterogeneous Domain adaptation via Disentanglement), an end-to-end neural framework tailored to learning a target domain classifier by leveraging both labelled and unlabelled data from heterogeneous data sources. SHeDD is designed to effectively disentangle domain-invariant representations, relevant for the downstream task, from domain-specific information that can hinder the cross-modality transfer. Additionally, SHeDD adopts an augmentation-based consistency regularization mechanism that takes advantage of reliable pseudo-labels on the unlabelled target samples to further boost its generalization ability on the target domain. Empirical evaluations on two remote sensing benchmarks, encompassing heterogeneous data in terms of acquisition modes and spectral/spatial resolutions, demonstrate the quality of SHeDD compared to both baseline and state-of-the-art competing approaches. Our code is publicly available here: this https URL
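A rough sketch of the two mechanisms named above — disentanglement and augmentation-based consistency regularization — under assumed layer sizes, an orthogonality penalty, and a FixMatch-style confidence threshold (all illustrative choices, not SHeDD's exact design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """Per-modality encoder splitting features into invariant/specific parts."""
    def __init__(self, in_dim, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.inv_head = nn.Linear(256, feat_dim)   # domain-invariant features
        self.spec_head = nn.Linear(256, feat_dim)  # domain-specific features

    def forward(self, x):
        h = self.backbone(x)
        return self.inv_head(h), self.spec_head(h)

def orthogonality_loss(inv, spec):
    """Push the invariant and specific subspaces apart."""
    inv = F.normalize(inv, dim=1)
    spec = F.normalize(spec, dim=1)
    return (inv * spec).sum(dim=1).pow(2).mean()

def consistency_loss(logits_weak, logits_strong, threshold=0.95):
    """Pseudo-label weakly augmented target samples, keep only confident
    ones, and enforce them on the strongly augmented view."""
    probs = logits_weak.detach().softmax(dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = (conf >= threshold).float()
    return (F.cross_entropy(logits_strong, pseudo, reduction="none") * mask).mean()
```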
https://arxiv.org/abs/2406.14087
A major roadblock in the seamless digitization of medical records remains the lack of interoperability of existing records. Extracting the relevant medical information required for further treatment planning or research is a time-consuming, labour-intensive task that consumes much of doctors' valuable time. In this demo paper, we present MedPromptExtract, an automated tool that uses a combination of semi-supervised learning, large language models, natural language processing and prompt engineering to convert unstructured medical records into structured data amenable to further analysis.
https://arxiv.org/abs/2405.02664
Photovoltaic (PV) systems allow us to tap into abundant solar energy; however, they require regular maintenance to sustain high efficiency and prevent degradation. Traditional manual health checks using Electroluminescence (EL) imaging are expensive and logistically challenging, making automated defect detection essential. Current automation approaches require extensive manual expert labeling, which is time-consuming, expensive, and prone to errors. We propose PV-S3 (Photovoltaic-Semi-Supervised Segmentation), a semi-supervised learning approach for semantic segmentation of defects in EL images that reduces reliance on extensive labeling. PV-S3 is a deep learning model trained using a few labeled images along with numerous unlabeled images. We introduce a novel Semi Cross-Entropy loss function to train PV-S3, which addresses challenges specific to automated PV defect detection, such as diverse defect types and class imbalance. We evaluate PV-S3 on multiple datasets and demonstrate its effectiveness and adaptability. With merely 20% labeled samples, we achieve an absolute improvement of 9.7% in IoU, 29.9% in Precision, 12.75% in Recall, and 20.42% in F1-Score over the prior state-of-the-art supervised method (which uses 100% labeled samples) on the UCF-EL dataset (the largest dataset available for semantic segmentation of EL images), improving performance while reducing annotation costs by 80%.
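The abstract does not define the Semi Cross-Entropy loss, so the following is only a plausible reading: class-weighted cross-entropy on the labelled pixels plus a confidence-masked, class-weighted term on pseudo-labelled pixels. The threshold and weighting scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def semi_ce_sketch(logits_l, target_l, logits_u, class_weights, threshold=0.9):
    # Supervised term on the few labelled EL images (weights counter imbalance).
    sup = F.cross_entropy(logits_l, target_l, weight=class_weights)
    # Unsupervised term: pseudo-label unlabelled pixels, keep confident ones only.
    probs = logits_u.detach().softmax(dim=1)
    conf, pseudo = probs.max(dim=1)                       # (B, H, W)
    mask = (conf >= threshold).float()
    unsup = F.cross_entropy(logits_u, pseudo, weight=class_weights,
                            reduction="none")             # per-pixel loss
    unsup = (unsup * mask).sum() / mask.sum().clamp_min(1.0)
    return sup + unsup
```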
https://arxiv.org/abs/2404.13693
Perceptual quality assessment of user-generated content (UGC) videos is challenging due to the requirement of large-scale human-annotated videos for training. In this work, we address this challenge by first designing a self-supervised Spatio-Temporal Visual Quality Representation Learning (ST-VQRL) framework to generate robust quality-aware features for videos. Then, we propose a dual-model based Semi-Supervised Learning (SSL) method specifically designed for the Video Quality Assessment task (SSL-VQA), through a novel knowledge transfer of quality predictions between the two models. Our SSL-VQA method uses the ST-VQRL backbone to produce robust performance across various VQA datasets, including cross-database settings, despite being learned with limited human-annotated videos. Our model improves the state-of-the-art performance by around 10% when trained only with limited data, and by around 15% when unlabelled data is also used in SSL. Source codes and checkpoints are available at this https URL.
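The exact knowledge-transfer rule between the two models is not specified in the abstract; as a hedged sketch, each model's prediction on an unlabelled clip can act as a soft regression target for the other, alongside the shared supervised term:

```python
import torch
import torch.nn.functional as F

def dual_model_ssl_step(model_a, model_b, labelled, scores, unlabelled):
    # Supervised term: both models regress the human quality scores.
    pred_a_l, pred_b_l = model_a(labelled), model_b(labelled)
    sup = F.l1_loss(pred_a_l, scores) + F.l1_loss(pred_b_l, scores)
    # Cross-model transfer on unlabelled clips: each model regresses the
    # other's (detached) prediction as a pseudo-label.
    pred_a_u, pred_b_u = model_a(unlabelled), model_b(unlabelled)
    transfer = F.l1_loss(pred_a_u, pred_b_u.detach()) \
             + F.l1_loss(pred_b_u, pred_a_u.detach())
    return sup + transfer
```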
https://arxiv.org/abs/2312.15425
Image quality assessment (IQA) plays a critical role in optimizing radiation dose and developing novel medical imaging techniques in computed tomography (CT). Traditional IQA methods relying on hand-crafted features have limitations in summarizing the subjective perceptual experience of image quality. Recent deep learning-based approaches have demonstrated strong modeling capabilities and potential for medical IQA, but challenges remain regarding model generalization and perceptual accuracy. In this work, we propose a multi-scale distributions regression approach to predict quality scores by constraining the output distribution, thereby improving model generalization. Furthermore, we design a dual-branch alignment network to enhance feature extraction capabilities. Additionally, semi-supervised learning is introduced by utilizing pseudo-labels for unlabeled data to guide model training. Extensive qualitative experiments demonstrate the effectiveness of our proposed method for advancing the state-of-the-art in deep learning-based medical IQA. Code is available at: this https URL.
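A sketch of what "constraining the output distribution" could look like in practice: the head predicts a distribution over discrete quality levels, the score is its expectation, and training regresses toward a soft target distribution. The bin range, bin count, and Gaussian target here are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def score_from_distribution(logits: torch.Tensor, lo=0.0, hi=100.0):
    """logits: (B, K) over K quality bins -> expected quality score, (B,)."""
    k = logits.shape[1]
    bins = torch.linspace(lo, hi, k, device=logits.device)
    return (logits.softmax(dim=1) * bins).sum(dim=1)

def distribution_loss(logits, target_scores, lo=0.0, hi=100.0, sigma=2.0):
    """Regress toward a soft (Gaussian) target distribution centred on the
    ground-truth score, constraining the predicted distribution's shape."""
    k = logits.shape[1]
    bins = torch.linspace(lo, hi, k, device=logits.device)
    target = torch.exp(-(bins[None, :] - target_scores[:, None]) ** 2
                       / (2 * sigma ** 2))
    target = target / target.sum(dim=1, keepdim=True)
    return F.kl_div(logits.log_softmax(dim=1), target, reduction="batchmean")
```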
https://arxiv.org/abs/2311.08024
Intentionally luring readers to click on particular content by exploiting their curiosity is what defines a title as clickbait. Although several studies have focused on detecting clickbait titles in English articles, low-resource languages like Bangla have not been given adequate attention. To tackle clickbait titles in Bangla, we have constructed the first Bangla clickbait detection dataset, containing 15,056 labeled news articles and 65,406 unlabelled news articles extracted from clickbait-dense news sites. Each article has been labeled by three expert linguists and includes the article's title, body, and other metadata. By incorporating labeled and unlabelled data, we finetune a pretrained Bangla transformer model in an adversarial fashion using Semi-Supervised Generative Adversarial Networks (SS-GANs). The proposed model acts as a good baseline for this dataset, outperforming traditional neural network models (LSTM, GRU, CNN) and linguistic-feature-based models. We expect that this dataset, together with the detailed analysis and comparison of these clickbait detection models, will provide a fundamental basis for future research into detecting clickbait titles in Bengali articles. We have released the corresponding code and dataset.
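The SS-GAN recipe cited above is commonly implemented with a discriminator that predicts K real classes plus one extra "fake" class, so both unlabelled and generated examples contribute to training. A minimal sketch under that assumption (the binary label set and hidden size are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 2  # clickbait / not-clickbait; index K is the extra "fake" class

class SSGANDiscriminator(nn.Module):
    def __init__(self, hidden_dim=768):
        super().__init__()
        self.head = nn.Linear(hidden_dim, K + 1)

    def forward(self, features):          # transformer [CLS] embeddings
        return self.head(features)

def ssgan_losses(logits_lab, y, logits_unlab, logits_fake):
    sup = F.cross_entropy(logits_lab, y)                  # labelled articles
    # Unlabelled articles should fall in one of the K "real" classes.
    p_real_u = 1.0 - logits_unlab.softmax(dim=1)[:, K]
    unsup_real = -torch.log(p_real_u.clamp_min(1e-8)).mean()
    # Generator outputs should be classified as the fake class K.
    fake_target = torch.full((logits_fake.shape[0],), K,
                             dtype=torch.long, device=logits_fake.device)
    fake = F.cross_entropy(logits_fake, fake_target)
    return sup + unsup_real + fake
```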
https://arxiv.org/abs/2311.06204
Surgical instrument segmentation is recognised as a key enabler for providing advanced surgical assistance and improving computer-assisted interventions. In this work, we propose SegMatch, a semi-supervised learning method that reduces the need for expensive annotation of laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi-supervised classification pipeline combining consistency regularization and pseudo-labelling, and adapts it for the purpose of segmentation. In our proposed SegMatch, the unlabelled images are weakly augmented and fed into the segmentation model to generate pseudo-labels; the unsupervised loss is then enforced between these pseudo-labels and the model's output for the adversarially augmented image, on pixels with a high confidence score. Our adaptation for segmentation tasks includes carefully considering the equivariance and invariance properties of the augmentation functions we rely on. To increase the relevance of our augmentations, we depart from using only handcrafted augmentations and introduce a trainable adversarial augmentation strategy. Our algorithm was evaluated on the MICCAI Instrument Segmentation Challenge datasets Robust-MIS 2019 and EndoVis 2017. Our results demonstrate that adding unlabelled data for training allows us to surpass the performance of fully supervised approaches, which are limited by the availability of training data in these challenges. SegMatch also outperforms a range of state-of-the-art semi-supervised semantic segmentation models across different labelled-to-unlabelled data ratios.
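The unsupervised branch described above (weak augmentation produces the pseudo-label; a confidence-masked loss is applied to the adversarially augmented view) can be sketched as follows. Here a single FGSM-style input perturbation stands in for the paper's trainable adversarial augmentation:

```python
import torch
import torch.nn.functional as F

def segmatch_unsup_loss(model, img_weak, img_strong, threshold=0.95, eps=0.03):
    # Pseudo-labels from the weakly augmented image, masked by confidence.
    with torch.no_grad():
        probs = model(img_weak).softmax(dim=1)            # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                   # per-pixel labels
        mask = (conf >= threshold).float()
    # One gradient-ascent step on the input: a cheap adversarial augmentation.
    adv = img_strong.clone().requires_grad_(True)
    loss = (F.cross_entropy(model(adv), pseudo, reduction="none") * mask).mean()
    grad, = torch.autograd.grad(loss, adv)
    adv = (adv + eps * grad.sign()).detach()
    # Enforce the pseudo-label on the adversarially augmented view.
    out = model(adv)
    return (F.cross_entropy(out, pseudo, reduction="none") * mask).mean()
```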
https://arxiv.org/abs/2308.05232
We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG), which leverages all modalities of the collected data during model development. We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase. During training, we leverage the LiDAR-teacher, monocular-student cross-modal framework from semi-supervised learning to distill information from unlabeled data as pseudo-labels. To handle the differences in sensor characteristics, we propose a data-noise-based weighting mechanism to reduce the effect of propagating noise from the LiDAR modality to the monocular one. For selecting which samples to label to improve model performance, we propose a sensor-consistency-based selection score that is also coherent with the training objective. Extensive experimental results on the KITTI and Waymo datasets verify the effectiveness of our proposed framework. In particular, our selection strategy consistently outperforms state-of-the-art active learning baselines, yielding up to a 17% better saving rate in labeling costs. Our training strategy attains the top place in the KITTI 3D and bird's-eye-view (BEV) monocular object detection official benchmarks by improving the BEV Average Precision (AP) by 2.02.
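The exact form of the sensor-consistency selection score is not given in the abstract; a hedged sketch is a disagreement measure between student and teacher detections, with the most inconsistent frames sent for labelling (box matching is simplified to aligned arrays here, whereas a real pipeline would match boxes by IoU):

```python
import torch

def consistency_score(student_boxes: torch.Tensor,
                      teacher_boxes: torch.Tensor) -> torch.Tensor:
    """boxes: (N, 7) 3D boxes (x, y, z, w, l, h, yaw) -> scalar disagreement
    between the monocular student and the LiDAR teacher on one frame."""
    return (student_boxes - teacher_boxes).abs().mean()

def select_for_labelling(scores, budget):
    """Pick the `budget` most inconsistent frames from a pool of scores."""
    scores = torch.as_tensor(scores)
    return torch.topk(scores, k=budget).indices.tolist()
```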
https://arxiv.org/abs/2307.08415
We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta-learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning it on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tuning it with a MANN backbone for action classification tasks.
https://arxiv.org/abs/2308.01916
The fifth Affective Behavior Analysis in-the-wild (ABAW) competition comprises multiple challenges, such as the Valence-Arousal Estimation Challenge, the Expression Classification Challenge, the Action Unit Detection Challenge, and the Emotional Reaction Intensity Estimation Challenge. In this paper, we deal only with the expression classification challenge, using multiple approaches: fully supervised, semi-supervised, and a noisy-label approach. Our noise-aware model outperformed the baseline model by 10.46%, our semi-supervised model outperformed it by 9.38%, and our fully supervised model outperformed it by 9.34%.
https://arxiv.org/abs/2303.09785
The increasing intensity and frequency of floods is one of the many consequences of our changing climate. In this work, we explore ML techniques that improve the flood detection module of an operational early flood warning system. Our method exploits an unlabelled dataset of paired multi-spectral and Synthetic Aperture Radar (SAR) imagery to reduce the labeling requirements of a purely supervised learning method. Prior works have used unlabelled data by creating weak labels from it. However, from our experiments we noticed that such a model still ends up learning the label mistakes in those weak labels. Motivated by knowledge distillation and semi-supervised learning, we explore the use of a teacher to train a student with the help of a small hand-labelled dataset and a large unlabelled dataset. Unlike the conventional self-distillation setup, we propose a cross-modal distillation framework that transfers supervision from a teacher trained on the richer modality (multi-spectral images) to a student model trained on SAR imagery. The trained models are then tested on the Sen1Floods11 dataset. Our model outperforms the Sen1Floods11 baseline model, trained on weakly labeled SAR imagery, by an absolute margin of 6.53% Intersection-over-Union (IoU) on the test split.
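A minimal sketch of the cross-modal distillation step: the multi-spectral teacher's per-pixel predictions supervise the SAR student on the unlabelled pairs, alongside the small hand-labelled term. The temperature and KL formulation are standard distillation choices assumed here, not necessarily the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def cross_modal_step(student, teacher, sar, ms, sar_l, mask_l, temp=2.0):
    # Supervised term on the small hand-labelled SAR set.
    sup = F.cross_entropy(student(sar_l), mask_l)
    # Distillation: teacher sees the richer modality (multi-spectral),
    # the student only sees the paired SAR imagery.
    with torch.no_grad():
        t_logits = teacher(ms)
    s_logits = student(sar)
    distill = F.kl_div(
        (s_logits / temp).log_softmax(dim=1),
        (t_logits / temp).softmax(dim=1),
        reduction="batchmean") * temp ** 2
    return sup + distill
```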
https://arxiv.org/abs/2302.08180
Content-based providers transmit real-time complex signals, such as video data, from one region to another. During this transmission, the signals often end up distorted or degraded, and the actual information present in the video is lost. This commonly happens in streaming video service applications. Hence, there is a need to know the level of degradation on the receiver side. This video degradation can be estimated from network state parameters such as data rate and packet loss values. Our proposed solution, vQoS GAN (video Quality of Service Generative Adversarial Network), estimates the network state parameters from the degraded received video data using a deep learning approach based on a semi-supervised generative adversarial network. A robust and unique deep learning network model has been trained with the video data along with data rate and packet loss class labels, achieving over 95 percent training accuracy. The proposed semi-supervised generative adversarial network can additionally reconstruct the degraded video data to its original form for a better end-user experience.
https://arxiv.org/abs/2204.07062
Semi-supervised learning (SSL) provides an effective means of leveraging unlabelled data to improve a model's performance. Even though the domain has received a considerable amount of attention in the past years, most methods present the common drawback of being unsafe. By safeness we mean the quality of not degrading a fully supervised model when including unlabelled data. Our starting point is to notice that the estimate of the risk that most discriminative SSL methods minimise is biased, even asymptotically. This bias makes these techniques untrustworthy without a proper validation set, but we propose a simple way of removing the bias. Our debiasing approach is straightforward to implement and applicable to most deep SSL methods. We provide simple theoretical guarantees on the safeness of these modified methods, without having to rely on the strong assumptions on the data distribution that SSL theory usually requires. We evaluate debiased versions of different existing SSL methods and show that debiasing can compete with classic deep SSL techniques in various classic settings and even performs well when traditional SSL fails.
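Assuming the correction takes the control-variate form common in this line of work, the debiased objective evaluates the unsupervised surrogate on both the unlabelled and the labelled inputs and subtracts the labelled copy, which cancels the bias in expectation. Entropy minimisation stands in for the surrogate in this sketch:

```python
import torch
import torch.nn.functional as F

def entropy_surrogate(logits):
    """A common unsupervised SSL surrogate: mean prediction entropy."""
    p = logits.softmax(dim=1)
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()

def debiased_ssl_loss(model, x_l, y_l, x_u, lam=0.5):
    logits_l = model(x_l)
    sup = F.cross_entropy(logits_l, y_l)
    unsup_u = entropy_surrogate(model(x_u))   # surrogate on unlabelled data
    unsup_l = entropy_surrogate(logits_l)     # same surrogate on labelled data
    return sup + lam * (unsup_u - unsup_l)    # subtraction removes the bias
```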
https://arxiv.org/abs/2203.07512
In this paper, we propose a Neural Architecture Search strategy based on self-supervision and semi-supervised learning for the task of semantic segmentation. Our approach builds an optimized neural network (NN) model for this task by jointly solving a jigsaw pretext task discovered with self-supervised learning over unlabeled training data and exploiting the structure of the unlabeled data with semi-supervised learning. The search for the architecture of the NN model is performed by dynamic routing using a gradient descent algorithm. Experiments on the Cityscapes and PASCAL VOC 2012 datasets demonstrate that the discovered neural network is more efficient than a state-of-the-art hand-crafted NN model, with four times fewer floating-point operations.
https://arxiv.org/abs/2201.12646
Computer-aided diagnostics often requires analysis of a region of interest (ROI) within a radiology scan, and the ROI may be an organ or a sub-organ. Although deep learning algorithms have the ability to outperform other methods, they rely on the availability of a large amount of annotated data. Motivated by the need to address this limitation, an approach to the localisation and detection of multiple organs based on supervised and semi-supervised learning is presented here. It draws upon previous work by the authors on localising the thoracic and lumbar spine region in CT images. The method generates six bounding boxes of organs of interest, which are then fused into a single bounding box. The results of experiments on localisation of the spleen and the left and right kidneys in CT images using supervised and semi-supervised learning (SSL) demonstrate the ability to address data limitations with a much smaller data set and fewer annotations, compared to other state-of-the-art methods. The SSL performance was evaluated using three different mixes of labelled and unlabelled data (i.e. 30:70, 35:65, 40:60) for each of the lumbar spine, spleen, and left and right kidneys, respectively. The results indicate that SSL provides a workable alternative, especially in medical imaging where it is difficult to obtain annotated data.
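The fusion rule for combining the six candidate boxes is not stated in the abstract; the simplest hedged reading is coordinate averaging:

```python
import torch

def fuse_boxes(boxes: torch.Tensor) -> torch.Tensor:
    """boxes: (6, 6) rows of (x1, y1, z1, x2, y2, z2) candidate boxes for one
    organ -> a single fused (6,) box by coordinate averaging (assumed rule)."""
    return boxes.mean(dim=0)
```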
https://arxiv.org/abs/2112.03276
Grasping objects the way humans do is challenging for COBOTs aiming at efficient, intelligent and optimal grasps. To streamline the process, we use deep learning techniques to help robots learn to generate and execute appropriate grasps quickly. We developed a Generative Inception Neural Network (GI-NNet) model capable of generating antipodal robotic grasps on both seen and unseen objects. It is trained on the Cornell Grasping Dataset (CGD) and attains 98.87% grasp pose accuracy for detecting both regular and irregularly shaped objects from RGB-Depth (RGB-D) images, while requiring only one third of the network trainable parameters compared to existing approaches. However, to attain this level of performance, the model requires 90% of the available labelled data of CGD, keeping only 10% labelled data for testing, which makes it vulnerable to poor generalization. Furthermore, obtaining a sufficient and high-quality labelled dataset is becoming increasingly difficult, keeping pace with the requirements of gigantic networks. To address these issues, we attach our model as a decoder to a semi-supervised learning based architecture known as a Vector Quantized Variational Auto-Encoder (VQVAE), which works efficiently when trained with both the available labelled and unlabelled data. The proposed model, which we name Representation-based GI-NNet (RGI-NNet), has been trained on various splits of labelled data from CGD, ranging from as little as 10% labelled data up to 50% labelled data, together with latent embeddings generated by the VQVAE. The grasp pose accuracy of RGI-NNet varies between 92.13% and 95.6%, which is far better than several existing models trained with only labelled data. For performance verification of both the GI-NNet and RGI-NNet models, we use the Anukul (Baxter) hardware cobot.
https://arxiv.org/abs/2107.07452
This paper addresses semi-supervised semantic segmentation by exploiting a small set of images with pixel-level annotations (strong supervision) and a large set of images with only image-level annotations (weak supervision). Most existing approaches aim to generate accurate pixel-level labels from weak supervision. However, we observe that those generated labels still inevitably contain noisy labels. Motivated by this observation, we present a novel perspective and formulate this task as a problem of learning with pixel-level label noise. Existing noisy-label methods, nevertheless, mainly target image-level tasks and cannot capture the relationship between neighboring labels in one image. Therefore, we propose a graph-based label noise detection and correction framework to deal with pixel-level noisy labels. In particular, for the pixel-level noisy labels generated from weak supervision by Class Activation Maps (CAM), we train a clean segmentation model with strong supervision to detect the clean labels among these noisy labels according to the cross-entropy loss. Then, we adopt a superpixel-based graph to represent the relations of spatial adjacency and semantic similarity between pixels in one image. Finally, we correct the noisy labels using a Graph Attention Network (GAT) supervised by the detected clean labels. We conduct comprehensive experiments on the PASCAL VOC 2012, PASCAL-Context and MS-COCO datasets. The experimental results show that our proposed semi-supervised method achieves state-of-the-art performance and even outperforms fully-supervised models on the PASCAL VOC 2012 and MS-COCO datasets in some cases.
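The clean-label detection step described above (keeping CAM labels that incur a small cross-entropy loss under the strongly supervised model) can be sketched as follows; the per-image loss quantile is an assumed threshold, and the superpixel-graph GAT correction is omitted for brevity:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_clean_labels(clean_model, image, cam_labels, loss_quantile=0.5):
    """Flag which CAM-generated pixel labels look clean (small loss)."""
    logits = clean_model(image)                                       # (B, C, H, W)
    pixel_loss = F.cross_entropy(logits, cam_labels, reduction="none")  # (B, H, W)
    threshold = pixel_loss.flatten(1).quantile(loss_quantile, dim=1)    # per image
    clean_mask = pixel_loss <= threshold[:, None, None]
    return clean_mask  # True where the weak (CAM) label is trusted by the GAT
```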
https://arxiv.org/abs/2103.14242
Few-shot learning aims to generalize to unseen classes that appear during testing but are unavailable during training. Prototypical networks incorporate few-shot metric learning by constructing a class prototype in the form of the mean vector of the embedded support points within a class. The performance of prototypical networks in extreme few-shot scenarios (like one-shot) degrades drastically, mainly because the variations within the clusters are neglected while constructing prototypes. In this paper, we propose to replace the typical prototypical loss function with an Episodic Triplet Mining (ETM) technique. Conventional triplet selection leads to overfitting because all possible combinations are used during training. We incorporate episodic training for mining the semi-hard positive and semi-hard negative triplets to overcome this overfitting. We also propose an adaptation that makes use of unlabeled training samples for better modeling. Experiments on two different audio processing tasks, namely speaker recognition and audio event detection, show improved performance, demonstrating the efficacy of ETM over the prototypical loss function and other meta-learning frameworks. Further, we show improved performance when unlabeled training samples are used.
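A sketch of semi-hard mining within one episode: for each anchor, a valid semi-hard negative lies farther than the positive but within the margin. The episode sampler and the unlabelled-sample adaptation are left out; the hardest-positive choice is an illustrative simplification:

```python
import torch
import torch.nn.functional as F

def semi_hard_triplet_loss(emb, labels, margin=0.2):
    """emb: (N, D) episode embeddings; labels: (N,) class ids."""
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)                      # (N, N) pairwise distances
    losses = []
    for i in range(emb.shape[0]):
        pos = labels == labels[i]
        pos[i] = False                                # exclude the anchor itself
        neg = labels != labels[i]
        if not pos.any() or not neg.any():
            continue
        d_ap = dist[i][pos].max()                     # hardest positive distance
        # Semi-hard negatives: farther than the positive, but within the margin.
        semi = neg & (dist[i] > d_ap) & (dist[i] < d_ap + margin)
        d_an = dist[i][semi].min() if semi.any() else dist[i][neg].min()
        losses.append(F.relu(d_ap - d_an + margin))
    return torch.stack(losses).mean() if losses else emb.sum() * 0.0
```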
https://arxiv.org/abs/2102.08074
In this paper, we present a semi-supervised deep quick-learning framework for instance detection and pixel-wise semantic segmentation of images in a dense clutter of items. The framework can quickly and incrementally learn novel items in an online manner through real-time data acquisition, generating the corresponding ground truths on its own. To learn various combinations of items, it can synthesize cluttered scenes in real time. The overall approach is based on a tutor-child analogy, in which a deep network (tutor) is pretrained for class-agnostic object detection and generates labeled data for another deep network (child). The child utilizes a customized convolutional neural network head for the purpose of quick learning. There are broadly four key components of the proposed framework: semi-supervised labeling, occlusion-aware clutter synthesis, a customized convolutional neural network head, and instance detection. The initial version of this framework was implemented during our participation in the Amazon Robotics Challenge (ARC), 2017, where our system was ranked 3rd, 4th and 5th worldwide in the pick, stow-pick and stow tasks, respectively. The proposed framework is an improved version of ARC17, to which novel features such as instance detection and online learning have been added.
https://arxiv.org/abs/2101.06405