We study the problem of object retrieval in scenarios where visual sensing is absent, object shapes are unknown beforehand, and objects can move freely, such as grabbing objects out of a drawer. Successful solutions require localizing free objects, identifying specific object instances, and then grasping the identified objects, using only touch feedback. Unlike cameras, which can observe the entire scene, touch sensors are local and only observe the parts of the scene that are in contact with the manipulator. Moreover, gathering information with touch sensors requires applying forces to the touched surface, which may disturb the scene itself. Reasoning with touch therefore requires careful exploration and integration of information over time, which is the challenge we tackle. We present a system capable of using sparse tactile feedback from fingertip touch sensors on a dexterous hand to localize, identify, and grasp novel objects without any visual feedback. Videos are available at this https URL.
https://arxiv.org/abs/2303.13482
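The tactile retrieval abstract above stresses integrating sparse touch information over time. One common way to do that is a discrete Bayes (histogram) filter over candidate object positions. The sketch below is not the authors' method. It assumes, hypothetically, a disk-shaped object of known radius and a binary contact sensor with fixed detection and false-alarm rates.

```python
import numpy as np

# Hypothetical setup: a disk-shaped object of KNOWN radius in a 30 cm x 30 cm drawer.
# The paper handles unknown shapes; this only illustrates integrating touches over time.

def touch_likelihood(centers, probe, contact, radius, p_detect=0.95, p_false=0.05):
    """P(observation | object centre) for every candidate centre."""
    inside = np.linalg.norm(centers - probe, axis=1) <= radius
    p_contact = np.where(inside, p_detect, p_false)
    return p_contact if contact else 1.0 - p_contact

def update_belief(belief, centers, probe, contact, radius):
    """Bayes rule: posterior is likelihood times prior, renormalised to sum to 1."""
    posterior = belief * touch_likelihood(centers, probe, contact, radius)
    return posterior / posterior.sum()

# Candidate centres on a 1 cm grid, starting from a uniform prior.
xs, ys = np.meshgrid(np.arange(0.0, 0.30, 0.01), np.arange(0.0, 0.30, 0.01))
centers = np.stack([xs.ravel(), ys.ravel()], axis=1)
belief = np.full(len(centers), 1.0 / len(centers))

true_center, radius = np.array([0.12, 0.20]), 0.04
for px in np.arange(0.005, 0.30, 0.02):          # systematic sweep of fingertip probes
    for py in np.arange(0.005, 0.30, 0.02):
        probe = np.array([px, py])
        contact = bool(np.linalg.norm(probe - true_center) <= radius)  # simulated, noiseless sensor
        belief = update_belief(belief, centers, probe, contact, radius)

estimate = centers[np.argmax(belief)]
```

Each probe multiplies the belief by a likelihood and renormalises. Contacts and non-contacts are both informative, so the belief concentrates near the true centre as probes accumulate.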
The objective of this study is to address the critical issue of de-identifying clinical reports in order to allow access to data for research purposes while ensuring patient privacy. The study highlights the difficulties faced in sharing tools and resources in this domain and presents the experience of the Greater Paris University Hospitals (AP-HP) in implementing systematic pseudonymization of the text documents in its Clinical Data Warehouse. We annotated a corpus of clinical documents according to 12 types of identifying entities and built a hybrid system that merges the outputs of a deep learning model with manual rules. Our results show an overall F1-score of 0.99. We discuss implementation choices and present experiments to better understand the effort involved in such a task, including the effects of dataset size, document types, language models, and rule addition. We share guidelines and code under a 3-Clause BSD license.
https://arxiv.org/abs/2303.13451
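The AP-HP abstract above describes merging the outputs of a deep learning model with manual rules and then pseudonymizing the detected entities. Below is a minimal sketch of that merge-then-replace step. It makes several assumptions: the two regex rules, the entity labels, the surrogate values, and the "keep the longer span on overlap" policy are all hypothetical, and the model output is taken as a ready-made list of character spans.

```python
import re

# Hypothetical rules for two entity types (the real system uses 12 types).
RULES = {
    "PHONE": re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def rule_spans(text):
    """(start, end, label) spans found by the regex rules."""
    return [(m.start(), m.end(), label) for label, rx in RULES.items() for m in rx.finditer(text)]

def merge_spans(model_spans, rules):
    """Union of model and rule spans; on overlap, keep the longer span (assumed policy)."""
    merged = []
    for span in sorted(model_spans + rules, key=lambda s: (s[0], -(s[1] - s[0]))):
        if merged and span[0] < merged[-1][1]:          # overlaps the last kept span
            if span[1] - span[0] > merged[-1][1] - merged[-1][0]:
                merged[-1] = span
            continue
        merged.append(span)
    return merged

def pseudonymize(text, spans, surrogates):
    """Replace spans right to left so earlier character offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + surrogates[label] + text[end:]
    return text
```

Replacing from the end of the document backwards avoids recomputing offsets after each substitution.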
Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. "Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the "False Non-Match Rate" (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples' lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details of this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real data for a face image quality assessment scenario, with a focus on general modality-independent conclusions for EDC evaluations.
https://arxiv.org/abs/2303.13294
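The EDC procedure described above can be sketched as follows. The threshold is fixed from the starting FNMR. Comparisons are ordered by the lower of their two sample qualities and discarded one at a time, and FNMR is recomputed on the remainder. The sketch assumes mated (genuine) comparisons only, that higher scores mean more similar, and stepwise pAUC integration, which is one of the interpolation choices the paper discusses. It is not the authors' reference code.

```python
import numpy as np

def edc_fnmr(scores, q1, q2, start_fnmr=0.1):
    """EDC points (discard fraction, FNMR) for mated comparisons.

    scores: comparison scores (higher = more similar, assumed);
    q1, q2: quality scores of the two samples in each comparison.
    """
    scores = np.asarray(scores, dtype=float)
    pair_q = np.minimum(q1, q2)                  # comparison quality = lower sample quality
    threshold = np.quantile(scores, start_fnmr)  # threshold giving ~start_fnmr at zero discard
    order = np.argsort(pair_q, kind="stable")    # lowest-quality comparisons discarded first
    rejected = scores[order] < threshold         # false non-matches, in discard order
    n = len(scores)
    fractions = np.arange(n) / n                 # discard 0, 1, ..., n-1 comparisons
    remaining_errors = np.cumsum(rejected[::-1])[::-1]  # errors among comparisons k..n-1
    fnmr = remaining_errors / (n - np.arange(n))
    return fractions, fnmr

def pauc(fractions, fnmr, limit=0.2):
    """Stepwise area under the EDC for discard fractions below `limit`."""
    step = 1.0 / len(fractions)
    return float(np.sum(fnmr[fractions < limit]) * step)
```

A quality algorithm that ranks the erroneous comparisons lowest makes FNMR drop quickly as comparisons are discarded, which yields a small pAUC.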
Deep point cloud registration methods struggle with partial overlaps and rely on labeled data. To address these issues, we propose UDPReg, an unsupervised deep probabilistic registration framework for point clouds with partial overlaps. Specifically, we first adopt a network to learn posterior probability distributions of Gaussian mixture models (GMMs) from point clouds. To handle partial point cloud registration, we apply the Sinkhorn algorithm to predict distribution-level correspondences under the constraint of the GMMs' mixing weights. To enable unsupervised learning, we design three distribution consistency-based losses: self-consistency, cross-consistency, and local contrastive. The self-consistency loss encourages the GMMs in Euclidean and feature spaces to share identical posterior distributions. The cross-consistency loss derives from the fact that points of two partially overlapping point clouds that belong to the same cluster share that cluster's centroid; it allows the network to flexibly learn a transformation-invariant posterior distribution of two aligned point clouds. The local contrastive loss helps the network extract discriminative local features. UDPReg achieves competitive performance on the 3DMatch/3DLoMatch and ModelNet/ModelLoNet benchmarks.
https://arxiv.org/abs/2303.13290
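The UDPReg abstract applies the Sinkhorn algorithm with the GMM mixing weights as constraints. The sketch below shows plain entropic Sinkhorn iterations in which the mixing weights serve as the row and column marginals of the transport plan. The cost (squared distance between component means) and the regularisation `eps` are assumptions, and the paper's handling of non-overlapping mass is not reproduced.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic optimal transport between two discrete distributions.

    cost: (J, K) cost between source and target mixture components;
    a, b: mixing weights (each sums to 1) used as row and column marginals.
    """
    K = np.exp(-cost / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match a
        v = b / (K.T @ u)        # rescale columns to match b
    return u[:, None] * K * v[None, :]

# Toy example: the target components are a permuted, slightly shifted copy of the source.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tgt = src[[2, 0, 1]] + 0.02
a = np.array([0.5, 0.3, 0.2])
b = a[[2, 0, 1]]
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
plan = sinkhorn(cost, a, b)
matches = plan.argmax(axis=1)    # distribution-level correspondences
```

The row-wise argmax of the plan recovers which target component each source component corresponds to.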
With the increasing ubiquity of cameras and smart sensors, humanity is generating data at an exponential rate. Access to this trove of information, which often covers still-underrepresented use cases (e.g., AI in medical settings), could fuel a new generation of deep learning tools. However, eager data scientists should first provide satisfactory guarantees regarding the privacy of the individuals present in these untapped datasets. This is especially important for images or videos depicting faces, as their biometric information is the target of most identification methods. While a variety of solutions have been proposed to de-identify such images, they often corrupt other, non-identifying facial attributes that would be relevant for downstream tasks. In this paper, we propose Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the altered data. Unlike prior art, we ground our solution in both the differential privacy and ensemble learning research domains. Our method extracts the depicted identities and swaps them with fake ones, synthesized via variational mechanisms to maximize obfuscation and non-invertibility, while leveraging supervision from a mixture of experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and better consistency than prior art across various downstream tasks.
https://arxiv.org/abs/2303.13269
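Disguise synthesizes fake identities "via variational mechanisms". As a rough illustration only, the sketch below draws a fake identity embedding with the reparameterisation trick and rejects samples that remain too similar to the original identity. Several pieces are hypothetical and are not Disguise's actual design: the Gaussian parameters `mu` and `log_var`, the unit-norm embeddings, and the cosine threshold `max_cos`.

```python
import numpy as np

def sample_fake_identity(mu, log_var, original, max_cos=0.2, rng=None, max_tries=100):
    """Draw z = mu + sigma * eps until z is dissimilar to the original identity.

    mu, log_var: parameters of a Gaussian over identity embeddings (hypothetical);
    max_cos: largest cosine similarity to the original identity that is accepted.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.exp(0.5 * log_var)
    orig = original / np.linalg.norm(original)
    for _ in range(max_tries):
        z = mu + sigma * rng.standard_normal(mu.shape)   # reparameterised sample
        z = z / np.linalg.norm(z)                        # assume unit-norm identity embeddings
        if float(z @ orig) < max_cos:
            return z
    raise RuntimeError("no sufficiently dissimilar identity found")
```

Sampling, rather than computing a fixed mapping, is what makes such a swap hard to invert. The accepted embedding would then condition a face generator, which is not shown here.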
This work is distinctive in its use of discrete wavelets built from Chebyshev polynomials of the second and third kinds: it filters the Discrete Second Chebyshev Wavelets Transform (DSCWT) and derives two effective filters. The Filter Discrete Third Chebyshev Wavelets Transform (FDTCWT) is used to analyze color images and to remove the noise and impurities that accompany them; it also addresses the large amount of data that makes up an image when it is captured. This volume of data makes images difficult to handle during transmission. To address this issue, an image compression technique is applied; the readings obtained show that the image loses no information, and the results were satisfactory. Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Bits Per Pixel (BPP), and Compression Ratio (CR) are used in the initial stage, while the processing stage trains Convolutional Neural Networks (CNNs), namely the Discrete Second Chebyshev Wavelets Convolutional Neural Network (DSCWCNN) and the Discrete Third Chebyshev Wavelets Convolutional Neural Network (DTCWCNN), to create an efficient face recognition algorithm; the best results were achieved in both accuracy and running time. Two samples of color images were used. The proposed approach yielded fast and good results, as the tables below show.
https://arxiv.org/abs/2303.13158
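The Chebyshev wavelet abstract above reports MSE, PSNR, BPP, and CR. The functions below implement the standard textbook definitions of these metrics, assuming 8-bit images (peak value 255) and BPP counted per pixel rather than per channel. The wavelet transform itself is not reproduced.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error over all pixels and channels."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for a lossless reconstruction."""
    err = mse(original, reconstructed)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)

def compression_ratio(original_bits, compressed_bits):
    return original_bits / compressed_bits

def bits_per_pixel(compressed_bits, height, width):
    return compressed_bits / (height * width)
```

For example, a 64x64 RGB image stored with 8 bits per channel occupies 98,304 bits. Compressing it to 12,288 bits gives CR = 8 and BPP = 3.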
Recently, face swapping has developed rapidly and achieved surprising realism, raising concerns about fake content. As a countermeasure, various detection approaches have been proposed and have achieved promising performance. However, most existing detectors struggle to maintain performance on unseen face swapping methods and low-quality images. Apart from this generalization problem, current detection approaches have been shown to be vulnerable to evasion attacks crafted by detection-aware manipulators. This lack of robustness in adversarial scenarios threatens the real-world application of face swapping detection. In this paper, we propose a novel face swapping detection approach based on face identification probability distributions, named IdP_FSD, to improve generalization and robustness. IdP_FSD is specifically designed to detect swapped faces whose identities belong to a finite set, a setting that is meaningful in real-world applications. Compared with previous general detection methods, we make use of the available real faces of the concerned identities and require no fake samples for training. IdP_FSD exploits a property common to face swapping: the identity of a swapped face combines the identities of the two faces involved in the swap. We capture this property through the confusion of a face identification model and measure that confusion with the maximum value of the output probability distribution. Moreover, to defend the detector in adversarial scenarios, we propose an attention-based fine-tuning scheme for the face identification models used in IdP_FSD. Extensive experiments show that IdP_FSD not only achieves high detection performance across different benchmark datasets and image qualities but also raises the bar for manipulators attempting to evade detection.
https://arxiv.org/abs/2303.13131
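IdP_FSD measures a face identification model's confusion through the maximum value of its output probability distribution. The sketch below turns that idea into a score and a decision rule. Two parts are assumptions: scoring with `1 - max probability` and the threshold of 0.3. A real face gets a confident, peaked distribution, while a swapped face splits its probability between two identities.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def swap_score(logits):
    """Confusion score: 1 - max identity probability (higher = more likely swapped)."""
    return 1.0 - softmax(logits).max(axis=-1)

def is_swapped(logits, threshold=0.3):
    """Flag faces whose identity prediction is too confused (threshold is hypothetical)."""
    return swap_score(logits) > threshold

# Row 0: a confident real face; row 1: probability split between two identities.
logits = np.array([[8.0, 0.0, 0.0, 0.0],
                   [4.0, 4.0, 0.0, 0.0]])
flags = is_swapped(logits)
```

In practice, the threshold would be calibrated on real faces of the enrolled identities, which matches the abstract's claim that no fake samples are needed for training.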
Self-supervised learning is attracting considerable attention in point cloud understanding. However, extracting discriminative and transferable features remains challenging because point clouds are irregular and sparse. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed PointGame. PointGame contains two core components: GATE and EAT. GATE stands for the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors, which capture surface shape effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT stands for the external attention-based Transformer encoder with linear computational complexity, which increases the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from training data. PointGame shows clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be made publicly available.
https://arxiv.org/abs/2303.13100
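EAT is described as an external attention-based encoder with linear complexity. The sketch below shows the general external attention pattern, in which N tokens attend to S shared memory slots rather than to each other. It uses a softmax over tokens followed by L1 normalisation over slots, so the cost is O(N·S·d) instead of O(N²·d). That EAT uses exactly this double normalisation is an assumption, and the memory sizes are arbitrary.

```python
import numpy as np

def external_attention(x, m_k, m_v):
    """External attention over N tokens with S learnable memory slots.

    x: (N, d) token features; m_k, m_v: (S, d) key and value memories.
    """
    attn = x @ m_k.T                                         # (N, S) token-to-memory scores
    attn = np.exp(attn - attn.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)            # softmax over tokens
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)   # L1-normalise over memory slots
    return attn @ m_v                                        # (N, d)
```

Because the memories are shared across all tokens and samples, doubling the number of points only doubles the work.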
Synthesis and reconstruction of 3D human heads have gained increasing interest in computer vision and computer graphics. Existing state-of-the-art 3D generative adversarial networks (GANs) for 3D human head synthesis are either limited to near-frontal views or struggle to preserve 3D consistency at large view angles. We propose PanoHead, the first 3D-aware generative model that enables high-quality, view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry, using only in-the-wild unstructured images for training. At its core, we enhance the representational power of recent 3D GANs and bridge the data alignment gap that arises when training on in-the-wild images with widely distributed views. Specifically, we propose a novel two-stage self-adaptive image alignment for robust 3D GAN training. We further introduce a tri-grid neural volume representation that effectively addresses the entanglement of front-face and back-head features rooted in the widely adopted tri-plane formulation. Our method instills prior knowledge of 2D image segmentation into the adversarial learning of 3D neural scene structures, enabling compositable head synthesis in diverse backgrounds. Benefiting from these designs, our method significantly outperforms previous 3D GANs, generating high-quality 3D heads with accurate geometry and diverse appearances, even with long wavy and afro hairstyles, renderable from arbitrary poses. Furthermore, we show that our system can reconstruct full 3D heads from single input images for personalized, realistic 3D avatars.
https://arxiv.org/abs/2303.13071
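The PanoHead abstract attributes front-face/back-head entanglement to the tri-plane formulation. The toy lookup below shows why. A point on the face and a point on the back of the head that share x and y project to the same cell of the xy plane. A tri-grid gives each plane a few extra layers along the missing axis, so those two points can read different features. The resolutions, the number of depth layers `D`, and the use of nearest-neighbour lookup (rather than the interpolation a real model would use) are simplifying assumptions, and only the xy plane is shown.

```python
import numpy as np

R, D, C = 8, 3, 4          # plane resolution, tri-grid depth layers, channels (hypothetical)
rng = np.random.default_rng(0)
triplane = rng.standard_normal((3, R, R, C))     # xy, xz, yz feature planes
trigrid = rng.standard_normal((3, D, R, R, C))   # same planes with D layers along the missing axis

def to_index(u, n):
    """Nearest of n cells for a coordinate in [-1, 1]."""
    return int(np.clip(np.round((u + 1) / 2 * (n - 1)), 0, n - 1))

def triplane_xy(p):
    x, y, _ = p                                  # the xy plane ignores z entirely
    return triplane[0, to_index(x, R), to_index(y, R)]

def trigrid_xy(p):
    x, y, z = p                                  # the extra depth axis is indexed by z
    return trigrid[0, to_index(z, D), to_index(x, R), to_index(y, R)]

front, back = (0.1, 0.2, 0.9), (0.1, 0.2, -0.9)  # same (x, y), opposite sides of the head
```

With the tri-plane, `front` and `back` receive identical xy features. With the tri-grid, they fall into different depth layers.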
Recent advances in neural rendering have shown great potential for reconstructing scenes from multiview images. However, accurately representing objects with glossy surfaces remains a challenge for existing methods. In this work, we introduce ENVIDR, a rendering and modeling framework for high-quality rendering and reconstruction of surfaces with challenging specular reflections. To achieve this, we first propose a novel neural renderer with decomposed rendering components that learns the interaction between surfaces and environment lighting. This renderer is trained using existing physically based renderers and is decoupled from actual scene representations. We then propose an SDF-based neural surface model that leverages this learned neural renderer to represent general scenes. Our model additionally synthesizes the indirect illumination caused by inter-reflections from shiny surfaces by marching surface-reflected rays. We demonstrate that our method outperforms state-of-the-art methods on challenging shiny scenes, providing high-quality rendering of specular reflections while also enabling material editing and scene relighting.
https://arxiv.org/abs/2303.13022
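ENVIDR obtains inter-reflections by marching surface-reflected rays through an SDF. The sketch below shows the two geometric ingredients: mirror reflection of a view direction about the surface normal, followed by sphere tracing along the reflected ray. An analytic sphere SDF stands in for the learned neural SDF, and the scene (a mirror floor at y = 0 and a unit sphere at (0, 2, 0)) is hypothetical.

```python
import numpy as np

def reflect(d, n):
    """Mirror-reflect direction d about unit normal n: r = d - 2 (d . n) n."""
    return d - 2.0 * np.dot(d, n) * n

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere (stand-in for a learned SDF)."""
    return np.linalg.norm(p - center) - radius

def march(origin, direction, sdf, max_steps=128, eps=1e-4, t_max=20.0):
    """Sphere tracing: advance by the SDF value until within eps of a surface."""
    t = 0.0
    for _ in range(max_steps):
        dist = sdf(origin + t * direction)
        if dist < eps:
            return t
        t += dist
        if t > t_max:
            break
    return None

# A view ray hits the floor at p and is reflected upward toward the sphere.
p, n = np.array([0.0, 0.0, 3.0]), np.array([0.0, 1.0, 0.0])
d = np.array([0.0, -1.0, -1.0]) / np.sqrt(2.0)
r = reflect(d, n)
t_hit = march(p, r, lambda q: sphere_sdf(q, np.array([0.0, 2.0, 0.0]), 1.0))
```

The hit point `p + t_hit * r` is where the indirect (reflected) radiance would be queried. In ENVIDR, that query goes to the learned neural renderer.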
Human-machine interaction (HMI) and human-robot interaction (HRI) can assist structural monitoring and structural dynamics testing in the laboratory and in the field. In vibratory experimentation, one way to generate vibration is to use electrodynamic exciters, whose input is commonly set manually by the operator. To measure the structural responses to the generated vibrations, sensors are attached to the structure. These sensors can be deployed by repeatable, high-endurance robots, which require on-the-fly control. If the interface between operators and the controls were augmented, operators could visualize the experiments and exciter levels and define robot inputs with better awareness of the area of interest. Robots can better aid humans if intelligent on-the-fly control of the robot is (1) quantified and presented to the human and (2) conducted in real time, so that human feedback is informed by data. Operators would use the information provided by the new interface to change the control input based on their understanding of real-time parameters. This research proposes using Augmented Reality (AR) applications to provide humans with sensor feedback and control of actuators and robots. This method improves cognition by allowing operators to maintain awareness of structures while adjusting conditions with the assistance of the new real-time interface. One interface application is developed to plot sensor data, in addition to providing voltage, frequency, and duration controls for vibration generation. Two more applications are developed under a similar framework: one to control the position of a mediating robot and one to control the frequency of the robot's movement. This paper presents the proposed model for the new control loop and then compares the new approach with a traditional method by measuring the time delay in control input and user efficiency.
https://arxiv.org/abs/2303.13016
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e., in a black-box setting). A variety of methods have been proposed in the literature for this task, but they have serious shortcomings, such as unrealistic outputs, long inference times, and strong requirements on the dataset and on access to the face recognition model. Through an analysis of the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in identity preservation and diversity, both qualitatively and quantitatively. Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process and does not suffer from any of the common shortcomings of competing methods.
https://arxiv.org/abs/2303.13006
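ID3PM relies on the stochastic denoising diffusion process, conditioned on an identity vector, to produce diverse images of the same person. The sketch below is generic DDPM ancestral sampling with a conditioning argument. Several pieces are placeholders rather than ID3PM's actual design: the linear beta schedule, the 50 steps, and the `eps_model` noise predictor (a trained network in practice).

```python
import numpy as np

def ddpm_sample(eps_model, identity, shape, n_steps=50, rng=None):
    """Ancestral DDPM sampling conditioned on an identity embedding.

    eps_model(x_t, t, identity) must predict the noise present in x_t (placeholder).
    """
    rng = np.random.default_rng() if rng is None else rng
    betas = np.linspace(1e-4, 0.02, n_steps)       # linear schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for t in reversed(range(n_steps)):
        eps = eps_model(x, t, identity)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise       # fresh noise at every step but the last
    return x
```

Running the sampler twice with the same `identity` but different random seeds gives different outputs. That randomness is the source of the pose, lighting, and background variety the abstract describes.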
The emergence of ChatGPT has recently garnered significant attention from the computational linguistics community. To demonstrate its capabilities as a keyphrase generator, we conduct a preliminary evaluation of ChatGPT on the keyphrase generation task. We evaluate its performance in various aspects, including keyphrase generation prompts, keyphrase generation diversity, multi-domain keyphrase generation, and long document understanding. Our evaluation is based on six benchmark datasets, and we adopt the prompt suggested by OpenAI while extending it to six candidate prompts. We find that ChatGPT performs exceptionally well on all six candidate prompts, with minor performance differences observed across the datasets. Based on our findings, we conclude that ChatGPT has great potential for keyphrase generation. However, we also find that ChatGPT still faces challenges in generating absent keyphrases. Finally, we present some limitations of this report and directions for future expansion.
https://arxiv.org/abs/2303.13001
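The keyphrase evaluation above distinguishes present keyphrases, which appear in the document, from absent ones, which do not. Below is a minimal sketch of that split together with an F1@k score. It assumes exact matching after lower-casing and word tokenisation. Many keyphrase benchmarks also apply stemming, which is omitted here, and the report's exact protocol is not given in the abstract.

```python
import re

def normalize(phrase):
    """Lower-case and keep word tokens only, so punctuation and spacing do not affect matching."""
    return " ".join(re.findall(r"\w+", phrase.lower()))

def split_present_absent(keyphrases, document):
    """Present keyphrases occur as whole words in the document; absent ones do not."""
    doc = f" {normalize(document)} "
    present = [k for k in keyphrases if f" {normalize(k)} " in doc]
    absent = [k for k in keyphrases if f" {normalize(k)} " not in doc]
    return present, absent

def f1_at_k(predicted, gold, k=5):
    """F1 between the top-k unique predictions and the gold keyphrases."""
    pred = list(dict.fromkeys(normalize(p) for p in predicted))[:k]
    gold_set = {normalize(g) for g in gold}
    correct = sum(p in gold_set for p in pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

Scoring present and absent keyphrases separately is what exposes the weakness on absent keyphrases that the report mentions.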
Facial expression is a means of communication that can be used to interact with computers or other electronic devices, and recognizing emotion from faces is an emerging practice with applications in many fields. Many cloud-based vision application programming interfaces (APIs) are available that recognize emotion from facial images and video. In this article, the performance of two well-known APIs was compared using a public dataset of 980 images of facial emotions. For these experiments, a client program was developed that iterates over the image set, calls the cloud services, and caches the emotion detection results for each image. Performance was evaluated for each emotion class using prediction accuracy. The prediction accuracy for each emotion was found to vary according to the cloud service used. Similarly, each service provider shows strong variation in performance depending on the class being analyzed, as described in more detail in this article.
https://arxiv.org/abs/2303.12974
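The client program described above iterates over the images, calls a cloud service, and caches each result, then computes accuracy per emotion class. A minimal sketch of that loop follows. `call_service` is a hypothetical stand-in for whichever vendor SDK is used, no real API is named here, and the JSON cache layout is an assumption.

```python
import json
from collections import Counter
from pathlib import Path

def classify_with_cache(image_paths, call_service, cache_file):
    """Call call_service(path) -> label at most once per image, caching labels in a JSON file."""
    cache_path = Path(cache_file)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    for path in image_paths:
        if path not in cache:              # only uncached images hit the (paid) cloud API
            cache[path] = call_service(path)
    cache_path.write_text(json.dumps(cache, indent=2))
    return {p: cache[p] for p in image_paths}

def per_class_accuracy(predictions, ground_truth):
    """Fraction of images of each true emotion class that were labelled correctly."""
    totals, hits = Counter(), Counter()
    for path, true_label in ground_truth.items():
        totals[true_label] += 1
        hits[true_label] += predictions.get(path) == true_label
    return {label: hits[label] / totals[label] for label in totals}
```

Caching lets the two services be re-scored or compared again without repeating API calls.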
The potential of large language models (LLMs) to reason like humans has been a highly contested topic in the machine learning community. However, human reasoning is multifaceted and takes various forms, including analogical, spatial, and moral reasoning, among others. This raises the question of whether LLMs can perform equally well across all of these domains. This work investigates the performance of LLMs on different reasoning tasks through experiments that directly use, or draw inspiration from, existing datasets on analogical and spatial reasoning. Additionally, to evaluate the ability of LLMs to reason like humans, their performance is evaluated on more open-ended natural language questions. My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks. I believe these experiments are crucial for informing the future development of LLMs, particularly in contexts that require diverse reasoning proficiencies. By shedding light on the reasoning abilities of LLMs, this study aims to advance our understanding of how they can better emulate human cognitive abilities.
https://arxiv.org/abs/2303.12810
Sarcasm is usually defined as a positive phrase or sentence with an underlying negative motive, and it is widely used on today's social media platforms such as Facebook, Twitter, and Reddit. The number of active social media users has increased dramatically in recent years, raising the need for automated NLP-based systems that can be used in tasks such as determining market demand, sentiment analysis, and threat detection. However, since sarcasm usually implies the opposite of its literal meaning and is often challenging to detect, extracting meaning from data with an NLP-based model becomes more complicated. As a result, sarcasm detection in English has been studied extensively over the past several years, with noticeable improvement, yet the state of sarcasm detection in Bangla has not advanced. In this article, we present a BERT-based system that achieves 99.60%, whereas the traditional machine learning algorithms we used achieve only 89.93%. Additionally, we employ Local Interpretable Model-Agnostic Explanations (LIME) to introduce explainability into our system. Moreover, we use BanglaSarc, a newly collected Bangla sarcasm dataset constructed specifically for evaluating this study. The dataset consists of fresh records of sarcastic and non-sarcastic comments, most of which were collected from Facebook and YouTube comment sections.
https://arxiv.org/abs/2303.12772
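The sarcasm abstract uses LIME for explainability. The from-scratch sketch below shows the core LIME idea for text, and it is not the `lime` package's API. It randomly drops words, queries the classifier on each perturbed text, and fits a proximity-weighted linear model whose coefficients serve as word importances. The cosine-distance kernel and its width are assumptions, and the toy classifier is a hypothetical stand-in for the BERT model.

```python
import numpy as np

def lime_word_weights(words, predict_proba, n_samples=500, kernel_width=0.25, seed=0):
    """LIME-style local explanation: importance of each word for the predicted probability."""
    rng = np.random.default_rng(seed)
    d = len(words)
    masks = rng.integers(0, 2, size=(n_samples, d))          # 1 = keep word, 0 = drop it
    masks[0] = 1                                              # include the unperturbed text
    preds = np.array([predict_proba(" ".join(w for w, keep in zip(words, m) if keep))
                      for m in masks])
    kept = masks.sum(axis=1)
    cos = kept / (np.sqrt(np.maximum(kept, 1)) * np.sqrt(d))  # cosine similarity to the full text
    weights = np.exp(-((1.0 - cos) ** 2) / kernel_width ** 2)  # closer perturbations count more
    X = np.hstack([masks, np.ones((n_samples, 1))])           # intercept column
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(X * sw, preds * sw[:, 0], rcond=None)  # weighted least squares
    return dict(zip(words, coef[:d]))

# Hypothetical classifier: "sarcastic" exactly when the word "wow" is present.
toy = lambda text: 0.9 if "wow" in text.split() else 0.1
weights = lime_word_weights(["wow", "great", "another", "monday"], toy)
```

Here the explanation assigns all of the effect (about 0.8) to "wow" and roughly zero to the other words.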
Design mockups are essential instruments for visualizing and testing design ideas. However, the process of generating mockups can be time-consuming and challenging for designers. In this article, we present and evaluate two different modalities for generating mockup ideas to support designers in their work: (1) a sketch-based approach to generate mockups based on hand-drawn sketches, and (2) a semantic-based approach to generate interfaces based on a set of predefined design elements. To evaluate the effectiveness of these two approaches, we conducted a series of experiments with 13 participants in which we asked them to generate mockups using each modality. Our results show that sketch-based generation was more intuitive and expressive, while semantic-based generative AI obtained better results in terms of quality and fidelity. Both methods can be valuable tools for UI designers looking to increase their creativity and efficiency.
https://arxiv.org/abs/2303.12709
We introduce Uni-Fusion, a universal continuous mapping framework for surfaces, surface properties (color, infrared, etc.), and more (latent features in CLIP embedding space, etc.). We propose the first Universal Implicit Encoding model that supports encoding both geometry and various types of properties (RGB, infrared, features, etc.) without requiring any training. Based on this model, our framework divides the point cloud into regular grid voxels and produces a latent feature in each voxel to form a Latent Implicit Map (LIM) for geometry and arbitrary properties. Incremental reconstruction is then achieved by fusing the Local LIM of each new frame into the Global LIM. Encoded with the corresponding types of data, our Latent Implicit Map can generate continuous surfaces, surface property fields, surface feature fields, and other possible outputs. To demonstrate the capabilities of our model, we implement three applications: (1) incremental reconstruction of surfaces and color, (2) 2D-to-3D transfer of fabricated properties, and (3) open-vocabulary scene understanding by producing a text CLIP feature field on surfaces. We evaluate Uni-Fusion in each of these applications, where it shows high flexibility across applications while performing best or competitively. The project page of Uni-Fusion is available at this https URL
https://arxiv.org/abs/2303.12678
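Uni-Fusion partitions the point cloud into regular voxels, stores one latent feature per voxel, and fuses each frame's Local LIM into the Global LIM. The sketch below mimics only that bookkeeping. In place of the Universal Implicit Encoding, the per-voxel "latent" is simply the mean of the input features in that voxel, and fusion is a count-weighted running average. Both choices are assumptions, as is the 10 cm voxel size.

```python
import numpy as np

VOXEL = 0.1  # voxel edge length in metres (hypothetical)

def voxel_keys(points, voxel=VOXEL):
    """Integer voxel index (i, j, k) of every point."""
    return [tuple(int(v) for v in k) for k in np.floor(points / voxel)]

def encode_local(points, features, voxel=VOXEL):
    """Local map: per-voxel mean feature and point count (stand-in for the latent encoder)."""
    sums = {}
    for key, f in zip(voxel_keys(points, voxel), features):
        s, n = sums.get(key, (0.0, 0))
        sums[key] = (s + f, n + 1)
    return {k: (s / n, n) for k, (s, n) in sums.items()}

def fuse(global_map, local_map):
    """Fuse a local map into the global map by count-weighted averaging per voxel."""
    for key, (f_new, n_new) in local_map.items():
        if key in global_map:
            f_old, n_old = global_map[key]
            n = n_old + n_new
            global_map[key] = ((f_old * n_old + f_new * n_new) / n, n)
        else:
            global_map[key] = (f_new, n_new)
    return global_map
```

Because each voxel is updated independently, a new frame only touches the voxels it observes, which is what makes the reconstruction incremental.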
A key challenge in developing practical face recognition (FR) attacks is the black-box nature of the target FR model, i.e., gradient and parameter information is inaccessible to attackers. While recent research took an important step toward attacking black-box FR models by leveraging transferability, their performance is still limited, especially against online commercial FR systems, where it can be pessimistic (e.g., an average attack success rate (ASR) below 50%). Motivated by this, we present Sibling-Attack, a new FR attack technique that, for the first time, explores a novel multi-task perspective (i.e., leveraging extra information from multiple correlated tasks to boost attack transferability). Intuitively, Sibling-Attack selects a set of tasks correlated with FR and, based on theoretical and quantitative analysis, picks the Attribute Recognition (AR) task as the task used in Sibling-Attack. Sibling-Attack then develops an optimization framework that fuses adversarial gradient information through (1) constraining the cross-task features to lie in the same space, (2) a joint-task meta optimization framework that enhances gradient compatibility among tasks, and (3) a cross-task gradient stabilization method that mitigates oscillation during the attack. Extensive experiments demonstrate that Sibling-Attack outperforms state-of-the-art FR attack techniques by a non-trivial margin, boosting ASR on average by 12.61% on state-of-the-art pre-trained FR models and by 55.77% on two well-known, widely used commercial FR systems.
https://arxiv.org/abs/2303.12512
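Sibling-Attack fuses adversarial gradients from FR and a correlated AR task. The sketch below is a deliberately simplified stand-in, not the paper's joint-task meta optimization or gradient stabilization. It normalises the two surrogate-model gradients, blends them, takes a sign step, and projects back into an L-infinity ball around the clean image. The blend weight `alpha`, the step size, and `eps` are hypothetical, and the gradients would come from white-box surrogate models.

```python
import numpy as np

def fused_attack_step(x_adv, x_orig, grad_fr, grad_ar, alpha=0.5, step=2 / 255, eps=8 / 255):
    """One sign-gradient step using a blend of normalised FR and AR gradients."""
    g_fr = grad_fr / (np.abs(grad_fr).mean() + 1e-12)   # put both tasks on a comparable scale
    g_ar = grad_ar / (np.abs(grad_ar).mean() + 1e-12)
    g = alpha * g_fr + (1 - alpha) * g_ar
    x_adv = x_adv + step * np.sign(g)
    x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)  # stay within the perturbation budget
    return np.clip(x_adv, 0.0, 1.0)                     # stay a valid image
```

Normalising before blending keeps one task's larger gradient magnitude from drowning out the other. This is the kind of gradient compatibility issue the paper's meta-optimization addresses more carefully.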
Programming robotic assembly tasks is a key component of manufacturing and automation. Force-sensitive assembly, however, often requires reactive strategies to handle slight changes in positioning and unforeseen part jamming. Learning such strategies from human performance is a promising approach, but it faces two common challenges: handling low part clearances, which are difficult to capture in demonstrations, and learning intuitive strategies offline without access to the real hardware. We address these two challenges by learning probabilistic force strategies from data that are easily acquired offline, from human demonstrations with a joystick in a robot-less simulation. We combine a Long Short-Term Memory (LSTM) network and a Mixture Density Network (MDN) to model human-inspired behavior in such a way that the learned strategies transfer easily to real hardware. In experiments, a UR10e robot completes a plastic assembly with clearances of less than 100 micrometers using strategies that were demonstrated solely in simulation.
https://arxiv.org/abs/2303.12440
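The assembly abstract combines an LSTM with a Mixture Density Network (MDN), so the policy outputs a distribution over actions rather than a single action. The sketch below shows the two MDN pieces that follow the network: the negative log-likelihood used for training, and sampling an action at run time. It assumes diagonal Gaussian components. The LSTM that would produce `pi_logits`, `mu`, and `log_sigma` from the force and pose history is omitted.

```python
import numpy as np

def mdn_nll(y, pi_logits, mu, log_sigma):
    """Mean negative log-likelihood of targets under a diagonal Gaussian mixture.

    y: (B, D) targets; pi_logits: (B, K); mu, log_sigma: (B, K, D).
    """
    log_pi = pi_logits - np.logaddexp.reduce(pi_logits, axis=1, keepdims=True)
    z = (y[:, None, :] - mu) / np.exp(log_sigma)
    log_comp = -0.5 * (z ** 2 + 2 * log_sigma + np.log(2 * np.pi)).sum(axis=2)  # (B, K)
    return float(-np.logaddexp.reduce(log_pi + log_comp, axis=1).mean())

def mdn_sample(pi_logits, mu, log_sigma, rng):
    """Draw one action per batch item: choose a component, then sample from its Gaussian."""
    probs = np.exp(pi_logits - np.logaddexp.reduce(pi_logits, axis=1, keepdims=True))
    ks = np.array([rng.choice(len(p), p=p) for p in probs])
    idx = np.arange(len(ks))
    noise = rng.standard_normal((len(ks), mu.shape[2]))
    return mu[idx, ks] + np.exp(log_sigma[idx, ks]) * noise
```

A mixture can represent several distinct human strategies for the same situation, for example wiggling left or right when a part jams. A single Gaussian would average these into one, possibly useless, action.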