We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild. Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors. To prevent ill-posed solutions, we propose a cross-instance consistency loss that exploits disentangled object shape deformation and articulation. This is helped by a new silhouette-based sampling mechanism to enhance viewpoint diversity during training. Our method only requires estimated object silhouettes and relative depth maps from off-the-shelf pre-trained networks during training. At inference time, given a single-view image, it efficiently outputs an explicit mesh representation. We obtain improved qualitative and quantitative results on challenging quadruped animals compared to relevant existing work.
https://arxiv.org/abs/2303.13514
Images taken under low-light conditions tend to suffer from poor visibility, which can degrade image quality and even reduce the performance of downstream tasks. It is hard for a CNN-based method to learn generalized features that can recover normal images from ones captured under various unknown low-light conditions. In this paper, we propose to incorporate contrastive learning into an illumination correction network to learn abstract representations that distinguish various low-light conditions in the representation space, with the purpose of enhancing the generalizability of the network. Considering that light conditions can change the frequency components of images, the representations are learned and compared in both the spatial and frequency domains to take full advantage of contrastive learning. The proposed method is evaluated on the LOL and LOL-V2 datasets; the results show that it achieves better qualitative and quantitative results than other state-of-the-art methods.
https://arxiv.org/abs/2303.13412
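To make the contrastive idea above concrete, here is a minimal numpy sketch of an InfoNCE-style loss together with a frequency-domain view obtained from the FFT amplitude spectrum. The function names, the temperature value, and the choice of amplitude spectrum as the frequency representation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss over feature vectors: the anchor
    should be more similar (cosine) to the positive than to any negative."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits /= temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                   # positive pair should win the softmax

def frequency_features(img):
    """Frequency-domain view of an image: flattened FFT amplitude spectrum."""
    return np.abs(np.fft.fft2(img)).ravel()
```

In a full pipeline, the same loss would be applied to encoder features of both the spatial image and `frequency_features(img)`, so that images under different light conditions are separated in both domains.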
Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. "Error versus Discard Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the "False Non Match Rate" (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples' lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details for this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real data for a face image quality assessment scenario, with a focus on general modality-independent conclusions for EDC evaluations.
https://arxiv.org/abs/2303.13294
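The EDC computation described above (progressively discarding the lowest-quality comparisons and recomputing the error on the remainder, then integrating up to a discard fraction limit) can be sketched as follows; this is a simplified illustration where each comparison already carries a single pairwise quality score and a binary error indicator at the starting threshold.

```python
import numpy as np

def edc_curve(pair_quality, is_error, discard_fractions):
    """Error-versus-Discard Characteristic: for each discard fraction,
    drop the comparisons with the lowest pairwise quality scores and
    compute the error rate over the remaining comparisons."""
    order = np.argsort(pair_quality)                  # lowest quality first
    errors_sorted = np.asarray(is_error, dtype=float)[order]
    n = len(errors_sorted)
    curve = []
    for f in discard_fractions:
        keep = errors_sorted[int(round(f * n)):]      # discard the first f*n comparisons
        curve.append(keep.mean() if len(keep) else 0.0)
    return np.array(curve)

def pauc(discard_fractions, errors, limit):
    """Partial area under the EDC curve up to a discard fraction limit,
    via trapezoidal integration."""
    f = np.asarray(discard_fractions, dtype=float)
    e = np.asarray(errors, dtype=float)
    mask = f <= limit
    f, e = f[mask], e[mask]
    return float(np.sum((e[1:] + e[:-1]) / 2 * np.diff(f)))
```

Ranking quality assessment algorithms then amounts to comparing their `pauc` values for a shared starting error and discard fraction limit.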
Knee OsteoArthritis (KOA) is a prevalent musculoskeletal disorder that causes decreased mobility in seniors. However, the diagnosis provided by physicians is subjective, as it relies on personal experience and the semi-quantitative Kellgren-Lawrence (KL) scoring system. KOA has been successfully diagnosed by Computer-Aided Diagnostic (CAD) systems that use deep learning techniques such as Convolutional Neural Networks (CNNs). In this paper, we propose a novel Siamese-based network and introduce a new hybrid loss strategy for the early detection of KOA. The model extends the classical Siamese network by integrating a collection of Global Average Pooling (GAP) layers for feature extraction at each level. Then, to improve classification performance, we use a novel training strategy that partitions each training batch into low-, medium- and high-confidence subsets, together with a hybrid loss function specific to the new label attributed to each sample. The final loss function is then derived by combining these loss functions with optimized weights. Our test results demonstrate that the proposed approach significantly improves detection performance.
https://arxiv.org/abs/2303.13203
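The batch-partitioning step above can be sketched as follows; the confidence thresholds and the linear weighting of per-subset losses are illustrative assumptions, not the paper's values.

```python
import numpy as np

def partition_by_confidence(probs, low=0.4, high=0.7):
    """Split a batch into low-, medium- and high-confidence index subsets
    based on the maximum predicted class probability (thresholds are
    illustrative, not the paper's)."""
    conf = np.max(probs, axis=1)
    low_idx = np.where(conf < low)[0]
    mid_idx = np.where((conf >= low) & (conf < high))[0]
    high_idx = np.where(conf >= high)[0]
    return low_idx, mid_idx, high_idx

def combine_losses(subset_losses, weights):
    """Final hybrid loss: weighted combination of the per-subset losses."""
    return float(np.dot(subset_losses, weights))
```

Each subset would then be trained with its own loss term, and `combine_losses` mirrors the final step of combining them with optimized weights.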
Detecting sets of relevant patterns from a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can actually be assessed from very different points of view. Rule-based languages like Answer Set Programming (ASP) seem well suited for specifying user-provided criteria to assess pattern utility in the form of constraints; moreover, the declarativity of ASP allows for a very easy switch between several criteria in order to analyze the dataset from different points of view. In this paper, we take steps toward extending the notion of High Utility Pattern Mining (HUPM); in particular, we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we exploit it as a building block in the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, an extensive experimental activity demonstrates the effectiveness of the proposed approach from both a quantitative and a qualitative point of view. Under consideration in Theory and Practice of Logic Programming (TPLP).
https://arxiv.org/abs/2303.13191
This paper presents a new adversarial training framework for image inpainting with segmentation confusion adversarial training (SCAT) and contrastive learning. SCAT plays an adversarial game between an inpainting generator and a segmentation network, which provides pixel-level local training signals and can adapt to images with free-form holes. By combining SCAT with standard global adversarial training, the new adversarial training framework exhibits the following three advantages simultaneously: (1) the global consistency of the repaired image, (2) the local fine texture details of the repaired image, and (3) the flexibility of handling images with free-form holes. Moreover, we propose the textural and semantic contrastive learning losses to stabilize and improve our inpainting model's training by exploiting the feature representation space of the discriminator, in which the inpainting images are pulled closer to the ground truth images but pushed farther from the corrupted images. The proposed contrastive losses better guide the repaired images to move from the corrupted image data points to the real image data points in the feature representation space, resulting in more realistic completed images. We conduct extensive experiments on two benchmark datasets, demonstrating our model's effectiveness and superiority both qualitatively and quantitatively.
https://arxiv.org/abs/2303.13133
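The pull-closer/push-farther objective above can be illustrated with a margin-based triplet form in feature space; this is a simplified stand-in (the paper's textural and semantic losses operate on discriminator features, and the margin here is an assumed hyperparameter).

```python
import numpy as np

def pull_push_loss(repaired, ground_truth, corrupted, margin=1.0):
    """Triplet-style contrastive objective: features of the repaired image
    are pulled toward the ground truth and pushed away from the corrupted
    input. Loss is zero once the repaired features are closer to the
    ground truth than to the corrupted image by at least the margin."""
    d_pos = np.linalg.norm(repaired - ground_truth)   # want this small
    d_neg = np.linalg.norm(repaired - corrupted)      # want this large
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this loss moves the repaired image's features from the corrupted data point toward the real data point, as described above.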
Digital human motion synthesis is a vibrant research field with applications in movies, AR/VR, and video games. While methods have been proposed to generate natural and realistic human motions, most focus only on modeling humans and largely ignore object movements. Generating task-oriented human-object interaction motions in simulation is challenging: depending on the intent of using an object, humans perform various motions, which requires the human to first approach the object and then make it move consistently with the human instead of staying still. Also, to be deployed in downstream applications, the synthesized motions should be flexible in length, providing options to personalize the predicted motions for various purposes. To this end, we propose TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations, which generates full human-object interaction motions to conduct specific tasks, given only the task type, the object, and a starting human status. TOHO generates human-object motions in three steps: 1) it first estimates the keyframe poses for conducting a task given the task type and object information; 2) it then infills the keyframes and generates continuous motions; 3) finally, it applies a compact closed-form object motion estimation to generate the object motion. Our method generates continuous motions that are parameterized only by the temporal coordinate, which allows upsampling or downsampling of the sequence to arbitrary frame counts and adjusting the motion speeds by designing the temporal coordinate vector. We demonstrate the effectiveness of our method, both qualitatively and quantitatively. This work takes a step further toward general human-scene interaction simulation.
https://arxiv.org/abs/2303.13129
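The benefit of parameterizing a motion purely by a temporal coordinate can be sketched as below. TOHO uses implicit neural representations; the linear interpolation here is only a minimal stand-in to illustrate resampling a sequence to arbitrary frame counts and re-timing it by redesigning the temporal coordinate vector.

```python
import numpy as np

def sample_motion(keyframes, key_times, query_times):
    """Evaluate a motion parameterized by a temporal coordinate:
    interpolate keyframe poses at arbitrary query times, so the sequence
    can be up-/down-sampled to any number of frames, or slowed down /
    sped up by rescaling the query times."""
    keyframes = np.asarray(keyframes, dtype=float)
    out = np.empty((len(query_times), keyframes.shape[1]))
    for d in range(keyframes.shape[1]):               # interpolate each pose dimension
        out[:, d] = np.interp(query_times, key_times, keyframes[:, d])
    return out
```

For example, sampling the same keyframes at `np.linspace(0, 1, 120)` instead of `np.linspace(0, 1, 60)` doubles the frame count without regenerating the motion.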
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e., the black-box setting). A variety of methods have been proposed in the literature for this task, but they have serious shortcomings such as a lack of realistic outputs, long inference times, and strong requirements on the dataset and accessibility of the face recognition model. Through an analysis of the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity both qualitatively and quantitatively. Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process and does not suffer from any of the common shortcomings of competing methods.
https://arxiv.org/abs/2303.13006
Recent portrait relighting methods have achieved realistic portrait lighting effects given a desired lighting representation such as an environment map. However, these methods are not intuitive for user interaction and lack precise lighting control. We introduce LightPainter, a scribble-based relighting system that allows users to interactively manipulate portrait lighting effects with ease. This is achieved by two conditional neural networks: a delighting module that recovers geometry and albedo, optionally conditioned on skin tone, and a scribble-based module for relighting. To train the relighting module, we propose a novel scribble simulation procedure to mimic real user scribbles, which allows our pipeline to be trained without any human annotations. We demonstrate high-quality and flexible portrait lighting editing capability with both quantitative and qualitative experiments. User study comparisons with commercial lighting editing tools also demonstrate consistent user preference for our method.
https://arxiv.org/abs/2303.12950
Font design is of vital importance in the digital content design and modern printing industry. Developing algorithms capable of automatically synthesizing vector fonts can significantly facilitate the font design process. However, existing methods mainly concentrate on raster image generation, and only a few approaches can directly synthesize vector fonts. This paper proposes an end-to-end trainable method, VecFontSDF, to reconstruct and synthesize high-quality vector fonts using signed distance functions (SDFs). Specifically, based on the proposed SDF-based implicit shape representation, VecFontSDF learns to model each glyph as shape primitives enclosed by several parabolic curves, which can be precisely converted to quadratic Bézier curves that are widely used in vector font products. In this manner, most image generation methods can be easily extended to synthesize vector fonts. Qualitative and quantitative experiments conducted on a publicly-available dataset demonstrate that our method obtains high-quality results on several tasks, including vector font reconstruction, interpolation, and few-shot vector font synthesis, markedly outperforming the state of the art.
https://arxiv.org/abs/2303.12675
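Since the abstract above hinges on converting shape primitives to the quadratic Bézier curves used in vector font formats, a minimal sketch of evaluating such a curve may help; this is the standard Bézier formula, not VecFontSDF's conversion procedure.

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bezier curve B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2
    at an array of parameter values t in [0, 1]."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    t = np.asarray(t, dtype=float)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
```

A glyph outline in a vector font is then a sequence of such segments, each defined by two on-curve endpoints and one off-curve control point.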
In this paper, we introduce a new approach, called "Posthoc Interpretation via Quantization (PIQ)", for interpreting decisions made by trained classifiers. Our method utilizes vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input data deemed relevant by the classifier for making a prediction. We evaluated our method through quantitative and qualitative studies and found that PIQ generates interpretations that are more easily understood by participants in our user studies when compared to several other interpretation methods in the literature.
https://arxiv.org/abs/2303.12659
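The vector quantization bottleneck above can be sketched in a few lines: each continuous feature vector is snapped to its nearest codebook entry. This is the generic VQ operation, shown here without PIQ's class-specific codebooks or training details.

```python
import numpy as np

def quantize(features, codebook):
    """Vector quantization: map each feature vector to the index and value
    of its nearest codebook entry (Euclidean distance)."""
    # pairwise distances: (num_features, num_codes)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    codes = np.argmin(dists, axis=1)
    return codes, codebook[codes]
```

Because only a finite set of codebook entries can pass through, the downstream interpreter is forced to rely on the discrete, class-specific parts of the representation.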
Assisting people in efficiently producing visually plausible 3D characters has always been a fundamental research topic in computer vision and computer graphics. Recent learning-based approaches have achieved unprecedented accuracy and efficiency in the area of 3D real human digitization. However, none of the prior works focus on modeling 3D biped cartoon characters, which are also in great demand in gaming and filming. In this paper, we introduce 3DBiCar, the first large-scale dataset of 3D biped cartoon characters, and RaBit, the corresponding parametric model. Our dataset contains 1,500 topologically consistent high-quality 3D textured models manually crafted by professional artists. Built upon this data, RaBit is designed with a SMPL-like linear blend shape model and a StyleGAN-based neural UV-texture generator, simultaneously expressing the shape, pose, and texture. To demonstrate the practicality of 3DBiCar and RaBit, we conduct various applications, including single-view reconstruction, sketch-based modeling, and 3D cartoon animation. For the single-view reconstruction setting, we find that a straightforward global mapping from input images to the output UV-based texture maps tends to lose detailed appearances of some local parts (e.g., nose, ears). Thus, a part-sensitive texture reasoner is adopted to ensure all important local areas are perceived. Experiments further demonstrate the effectiveness of our method both qualitatively and quantitatively. 3DBiCar and RaBit are available at this http URL.
https://arxiv.org/abs/2303.12564
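An SMPL-like linear blend shape model, as mentioned above, expresses a mesh as a mean shape plus a weighted sum of learned shape-basis offsets; a minimal sketch (with tiny toy dimensions, not RaBit's actual basis):

```python
import numpy as np

def blend_shape(mean_shape, shape_basis, betas):
    """Linear blend shape: vertices = mean shape + sum_k betas[k] * basis[k].
    mean_shape: (V, 3) vertices; shape_basis: (K, V, 3) offsets; betas: (K,)."""
    return mean_shape + np.tensordot(betas, shape_basis, axes=1)
```

Varying `betas` morphs the character's identity while keeping the topology fixed, which is what makes the 1,500 topologically consistent models suitable for fitting such a model.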
A hard challenge in developing practical face recognition (FR) attacks is the black-box nature of the target FR model, i.e., gradient and parameter information is inaccessible to attackers. While recent research took an important step towards attacking black-box FR models by leveraging transferability, their performance is still limited, especially against online commercial FR systems, where results can be pessimistic (e.g., a less than 50% attack success rate (ASR) on average). Motivated by this, we present Sibling-Attack, a new FR attack technique that, for the first time, explores a novel multi-task perspective (i.e., leveraging extra information from multi-correlated tasks to boost attacking transferability). Intuitively, Sibling-Attack selects a set of tasks correlated with FR and picks the Attribute Recognition (AR) task as the task used in Sibling-Attack based on theoretical and quantitative analysis. Sibling-Attack then develops an optimization framework that fuses adversarial gradient information through (1) constraining the cross-task features to lie in the same space, (2) a joint-task meta optimization framework that enhances the gradient compatibility among tasks, and (3) a cross-task gradient stabilization method which mitigates the oscillation effect during attacking. Extensive experiments demonstrate that Sibling-Attack outperforms state-of-the-art FR attack techniques by a non-trivial margin, boosting ASR by 12.61% and 55.77% on average on state-of-the-art pre-trained FR models and two well-known, widely used commercial FR systems.
https://arxiv.org/abs/2303.12512
Multi-agent collaborative perception (MCP) has recently attracted much attention. It includes three key processes: communication for sharing, collaboration for integration, and reconstruction for different downstream tasks. Existing methods pursue designing the collaboration process alone, ignoring their intrinsic interactions and resulting in suboptimal performance. In contrast, we propose a Unified Collaborative perception framework named UMC, optimizing the communication, collaboration, and reconstruction processes with a multi-resolution technique. The communication introduces a novel trainable multi-resolution and selective-region (MRSR) mechanism, achieving higher quality and lower bandwidth. Then, a graph-based collaboration is proposed, conducted at each resolution to adapt to the MRSR. Finally, the reconstruction integrates the multi-resolution collaborative features for downstream tasks. Since general metrics cannot systematically reflect the performance enhancement brought by MCP, we introduce a brand-new evaluation metric that evaluates MCP from different perspectives. To verify our algorithm, we conducted experiments on the V2X-Sim and OPV2V datasets. Our quantitative and qualitative experiments prove that the proposed UMC greatly outperforms state-of-the-art collaborative perception approaches.
https://arxiv.org/abs/2303.12400
Autonomous swarms of robots can bring robustness, scalability and adaptability to safety-critical tasks such as search and rescue but their application is still very limited. Using semi-autonomous swarms with human control can bring robot swarms to real-world applications. Human operators can define goals for the swarm, monitor their performance and interfere with, or overrule, the decisions and behaviour. We present the ``Human And Robot Interactive Swarm'' simulator (HARIS) that allows multi-user interaction with a robot swarm and facilitates qualitative and quantitative user studies through simulation of robot swarms completing tasks, from package delivery to search and rescue, with varying levels of human control. In this demonstration, we showcase the simulator by using it to study the performance gain offered by maintaining a ``human-in-the-loop'' over a fully autonomous system as an example. This is illustrated in the context of search and rescue, with an autonomous allocation of resources to those in need.
https://arxiv.org/abs/2303.12390
We propose a content-based system for matching video and background music. The system aims to address the challenge of music recommendation for new users or new music given short-form videos. To this end, we propose a cross-modal framework, VMCML, that finds a shared embedding space between video and music representations. To ensure the embedding space can be effectively shared by both representations, we leverage CosFace loss, a margin-based cosine similarity loss. Furthermore, we establish a large-scale dataset called MSVD, which provides 390 individual music tracks and the corresponding 150,000 matched videos. We conduct extensive experiments on the Youtube-8M and our MSVD datasets. Our quantitative and qualitative results demonstrate the effectiveness of the proposed framework, which achieves state-of-the-art video and music matching performance.
https://arxiv.org/abs/2303.12379
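The CosFace loss mentioned above subtracts a fixed margin from the target-class cosine similarity before a scaled softmax, which tightens the shared embedding space. A minimal single-sample numpy sketch follows; the margin and scale values are common defaults from the CosFace paper, not necessarily those used by VMCML.

```python
import numpy as np

def cosface_loss(embedding, class_weights, label, margin=0.35, scale=64.0):
    """Large-margin cosine loss: L2-normalize the embedding and class
    weights, subtract a margin from the target-class cosine similarity,
    then apply a scaled softmax cross-entropy."""
    e = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = w @ e                                  # cosine similarity to each class
    logits = scale * cos
    logits[label] = scale * (cos[label] - margin)  # penalize the target class
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])
```

Because the margin handicaps the correct class, embeddings must exceed other classes' similarity by at least the margin to drive the loss to zero.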
Many material properties are manifested in morphological appearance and characterized with microscopic images, such as scanning electron microscopy (SEM) images. Polymer compatibility is a key physical quantity of polymer materials and is commonly and intuitively judged from SEM images. However, human observation and judgement of the images is time-consuming, labor-intensive, and hard to quantify. Computer image recognition with machine learning can make up for the deficiencies of manual judgement, giving accurate and quantitative assessments. We achieve automatic compatibility recognition utilizing a convolutional neural network and transfer learning, and the model obtains up to 94% accuracy. We also put forward a quantitative criterion for polymer compatibility based on this model. The proposed method can be widely applied to the quantitative characterization of the microstructure and properties of various materials.
https://arxiv.org/abs/2303.12360
While the performance of deep convolutional neural networks for image super-resolution (SR) has improved significantly, the rapid increase of memory and computation requirements hinders their deployment on resource-constrained devices. Quantized networks, especially binary neural networks (BNNs) for SR, have been proposed to significantly improve model inference efficiency, but they suffer from large performance degradation. We observe that the activation distribution of SR networks exhibits very large pixel-to-pixel, channel-to-channel, and image-to-image variation, which is important for high-performance SR but gets lost during binarization. To address the problem, we propose two effective methods, spatial re-scaling as well as channel-wise shifting and re-scaling, which augment binary convolutions by retaining more spatial and channel-wise information. Our proposed models, dubbed EBSR, demonstrate superior performance over prior art both quantitatively and qualitatively across different datasets and model sizes. Specifically, for x4 SR on Set5 and Urban100, EBSRlight improves the PSNR by 0.31 dB and 0.28 dB compared to SRResNet-E2FIF, respectively, while EBSR outperforms EDSR-E2FIF by 0.29 dB and 0.32 dB PSNR, respectively.
https://arxiv.org/abs/2303.12270
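The channel-wise shifting and re-scaling idea above can be sketched as follows: shift each channel before taking the sign, then restore per-channel magnitude with a scale, so channel-to-channel variation survives binarization. The per-channel parameters here are illustrative placeholders for learnable values, not the paper's exact formulation.

```python
import numpy as np

def channel_shift_rescale_binarize(activations, shift, scale):
    """Binarize activations with channel-wise shifting and re-scaling:
    sign(x - shift_c) * scale_c per channel c. Plain sign() would map
    every channel to {-1, +1} uniformly, discarding channel-wise
    magnitude and offset information.
    activations: (C, H, W); shift, scale: (C,)."""
    binary = np.sign(activations - shift[:, None, None])
    binary[binary == 0] = 1.0                    # break ties consistently
    return scale[:, None, None] * binary
```

Different `shift`/`scale` per channel lets two channels with very different activation statistics produce differently scaled binary outputs, which is the variation the abstract says plain binarization loses.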
Ex vivo MRI of the brain provides remarkable advantages over in vivo MRI for visualizing and characterizing detailed neuroanatomy and helps to link microscale histology studies with morphometric measurements. However, automated segmentation methods for brain mapping in ex vivo MRI are not well developed, primarily due to the limited availability of labeled datasets and heterogeneity in scanner hardware and acquisition protocols. In this work, we present a high-resolution dataset of 37 ex vivo post-mortem human brain tissue specimens scanned on a 7T whole-body MRI scanner. We developed a deep learning pipeline to segment the cortical mantle, benchmarking the performance of nine deep neural architectures. We then segment four subcortical structures (caudate, putamen, globus pallidus, and thalamus), white matter hyperintensities, and the normal-appearing white matter. We show excellent generalization across whole brain hemispheres in different specimens, and also on unseen images acquired at different magnetic field strengths and with different imaging sequences. We then compute volumetric and localized cortical thickness measurements across key regions and link them with semi-quantitative neuropathological ratings. Our code, containerized executables, and the processed datasets are publicly available at: this https URL.
https://arxiv.org/abs/2303.12237
Head MRI pre-processing involves converting raw images to an intensity-normalized, skull-stripped brain in a standard coordinate space. In this paper, we propose an end-to-end weakly supervised learning approach, called Neural Pre-processing (NPP), for solving all three sub-tasks simultaneously via a neural network trained on a large dataset without individual sub-task supervision. Because the overall objective is highly under-constrained, we explicitly disentangle geometry-preserving intensity mapping (skull-stripping and intensity normalization) and spatial transformation (spatial normalization). Quantitative results show that our model outperforms state-of-the-art methods which tackle only a single sub-task. Our ablation experiments demonstrate the importance of the architecture design we chose for NPP. Furthermore, NPP affords the user the flexibility to control each of these tasks at inference time. The code and model are freely available at \url{this https URL}.
https://arxiv.org/abs/2303.12148