Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images judged to resemble the original generally indicate more privacy leakage, while images judged overall dissimilar indicate higher robustness against attack. However, there is no guarantee that these metrics faithfully reflect human opinions, which, as a judgment of model privacy leakage, are more trustworthy. In this paper, we comprehensively study how faithful these hand-crafted metrics are to human perception of privacy information in reconstructed images. On 5 datasets ranging from natural images and faces to fine-grained classes, we use 4 existing attack methods to reconstruct images from many different classification models and, for each reconstructed image, ask multiple human annotators to assess whether the image is recognizable. Our studies reveal that the hand-crafted metrics correlate only weakly with human evaluation of privacy leakage and that these metrics often contradict one another. These observations suggest that the community's current reliance on such metrics carries risk. To address this potential risk, we propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images. SemSim is trained with a standard triplet loss, using an original image as the anchor, one of its recognizable reconstructions as the positive sample, and an unrecognizable one as the negative. By training on human annotations, SemSim better reflects privacy leakage at the semantic level. We show that SemSim has a significantly higher correlation with human judgment than existing metrics, and that this strong correlation generalizes to unseen datasets, models, and attack methods.
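The triplet setup described above maps directly onto standard metric learning. A minimal PyTorch sketch, assuming a ResNet-18 backbone as the embedding network (the backbone choice is our assumption; it is not specified here):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Embedding network: pooled ResNet-18 features (backbone assumed).
embed = models.resnet18(weights=None)
embed.fc = nn.Identity()

triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(embed.parameters(), lr=1e-4)

def train_step(original, recognizable_recon, unrecognizable_recon):
    # anchor: original image; positive: a reconstruction annotators found
    # recognizable; negative: one they found unrecognizable.
    a = embed(original)
    p = embed(recognizable_recon)
    n = embed(unrecognizable_recon)
    loss = triplet(a, p, n)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At test time, privacy leakage would then be scored by the embedding distance between an original image and its reconstruction.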
https://arxiv.org/abs/2309.13038
Precise crop yield prediction is essential for improving agricultural practices and ensuring crop resilience in varying climates. Integrating weather data across the growing season, especially for different crop varieties, is crucial for understanding their adaptability in the face of climate change. In the MLCAS2021 Crop Yield Prediction Challenge, we utilized a dataset comprising 93,028 training records to forecast yields for 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015). This dataset included details on 5,838 distinct genotypes and daily weather data for a 214-day growing season, enabling comprehensive analysis. As one of the winning teams, we developed two novel convolutional neural network (CNN) architectures: the CNN-DNN model, combining CNN and fully-connected networks, and the CNN-LSTM-DNN model, with an added LSTM layer for weather variables. Leveraging the Generalized Ensemble Method (GEM), we determined optimal model weights, resulting in superior performance compared to baseline models. The GEM model achieved lower RMSE (by 5.55% to 39.88%), lower MAE (by 5.34% to 43.76%), and higher correlation coefficients (by 1.1% to 10.79%) when evaluated on test data. We applied the CNN-DNN model to identify top-performing genotypes for various locations and weather conditions, aiding genotype selection based on weather variables. Our data-driven approach is valuable for scenarios with limited testing years. Additionally, a feature importance analysis using RMSE change highlighted the significance of location, MG, year, and genotype, along with the importance of the weather variables MDNI and AP.
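In the spirit of the GEM step described above, the ensemble weights can be found by minimizing validation RMSE over convex weights; this is a sketch under our assumptions rather than the exact procedure used in the challenge:

```python
import numpy as np
from scipy.optimize import minimize

def gem_weights(preds, y):
    """preds: (n_models, n_samples) validation predictions; y: true yields."""
    k = preds.shape[0]

    def rmse(w):
        return np.sqrt(np.mean((w @ preds - y) ** 2))

    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(rmse, np.full(k, 1.0 / k),
                   bounds=[(0.0, 1.0)] * k, constraints=cons)
    return res.x

# e.g., w = gem_weights(np.vstack([pred_cnn_dnn, pred_cnn_lstm_dnn]), y_val)
```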
https://arxiv.org/abs/2309.13021
Recognizing domain shift as a common challenge in machine learning, various domain generalization (DG) techniques have been developed to enhance the performance of machine learning systems when dealing with out-of-distribution (OOD) data. Furthermore, in real-world scenarios, data distributions can gradually change across a sequence of domains. While current methodologies primarily focus on improving model effectiveness within these new domains, they often overlook fairness issues throughout the learning process. In response, we introduce an innovative framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). This approach effectively separates environmental information and sensitive attributes from the embedded representation of classification features. This concurrent separation not only greatly improves model generalization across diverse and unfamiliar domains but also effectively addresses challenges related to unfair classification. Our strategy is rooted in the principles of causal inference to tackle these dual issues. To examine the intricate relationship between semantic information, sensitive attributes, and environmental cues, we systematically categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. By incorporating fairness regularization, we exclusively employ semantic information for classification purposes. Empirical validation on synthetic and real-world datasets substantiates the effectiveness of our approach, demonstrating improved accuracy levels while ensuring the preservation of fairness in the evolving landscape of continuous domains.
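To make the fairness-regularization step concrete, here is an illustrative sketch (our construction, not the paper's exact objective): only the semantic latents feed the classifier, and a cross-covariance penalty discourages the semantic block from depending on the sensitive attribute s:

```python
import torch.nn.functional as F

def fairness_regularized_loss(logits, y, z_sem, s, lam=1.0):
    """logits: classifier output computed from semantic latents only;
    z_sem: (B, D) semantic latents; s: (B,) binary sensitive attribute."""
    cls = F.cross_entropy(logits, y)
    s_c = s.float().unsqueeze(1) - s.float().mean()
    z_c = z_sem - z_sem.mean(dim=0, keepdim=True)
    fair = (s_c * z_c).mean(dim=0).pow(2).sum()  # squared cross-covariance
    return cls + lam * fair
```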
https://arxiv.org/abs/2309.13005
Nested Event Extraction (NEE) aims to extract complex event structures in which an event recursively contains other events as its arguments. Nested events involve Pivot Elements (PEs) that simultaneously act as arguments of outer events and as triggers of inner events, thus connecting them into nested structures. This special characteristic of PEs brings challenges to existing NEE methods, as they cannot cope well with the dual identities of PEs. Therefore, this paper proposes a new model, called PerNee, which extracts nested events mainly based on recognizing PEs. Specifically, PerNee first recognizes the triggers of both inner and outer events and further recognizes the PEs by classifying the relation type between trigger pairs. To obtain better representations of triggers and arguments and further improve NEE performance, it incorporates the information of both event types and argument roles into PerNee through prompt learning. Since existing NEE datasets (e.g., Genia11) are limited to specific domains and contain a narrow range of event types with nested structures, we systematically categorize nested events in the generic domain and construct a new NEE dataset, namely ACE2005-Nest. Experimental results demonstrate that PerNee consistently achieves state-of-the-art performance on ACE2005-Nest, Genia11, and Genia13.
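A minimal sketch of the PE-recognition step as described, with layer sizes as our assumptions: given contextual embeddings of two recognized triggers, classify the relation type of the pair:

```python
import torch
import torch.nn as nn

class TriggerPairClassifier(nn.Module):
    """Scores relation types between a pair of trigger embeddings."""
    def __init__(self, hidden=768, n_relations=2):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(hidden * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_relations),
        )

    def forward(self, trig_outer, trig_inner):
        # concatenate the two trigger representations and classify
        return self.ffn(torch.cat([trig_outer, trig_inner], dim=-1))
```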
https://arxiv.org/abs/2309.12960
Event Relation Extraction (ERE) aims to extract multiple kinds of relations among events in texts. However, existing methods merely categorize event relations into different classes, which inadequately captures the intrinsic semantics of these relations. To comprehensively understand their intrinsic semantics, in this paper we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations. Specifically, ProtoEM extracts event relations in a two-step manner, i.e., prototype representing and prototype matching. In the first step, to capture the connotations of different event relations, ProtoEM utilizes examples to represent the prototypes corresponding to these relations. Subsequently, to capture the interdependence among event relations, it constructs a dependency graph for the prototypes corresponding to these relations and utilizes a Graph Neural Network (GNN)-based module for modeling. In the second step, it obtains the representations of new event pairs and calculates their similarity with the prototypes obtained in the first step to determine which types of event relations they belong to. Experimental results on the MAVEN-ERE dataset demonstrate that the proposed ProtoEM framework can effectively represent the prototypes of event relations and achieves a significant improvement over baseline models.
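The matching step admits a compact sketch; the mean-of-examples prototype and cosine scoring below are our assumptions about the simplest instantiation:

```python
import torch
import torch.nn.functional as F

def build_prototypes(example_embs):
    """example_embs: dict relation -> (n_examples, D) tensor of pair embeddings."""
    return {r: e.mean(dim=0) for r, e in example_embs.items()}

def match(pair_emb, prototypes):
    """Return the relation whose prototype is most similar to the new pair."""
    scores = {r: F.cosine_similarity(pair_emb, p, dim=0).item()
              for r, p in prototypes.items()}
    return max(scores, key=scores.get), scores
```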
https://arxiv.org/abs/2309.12892
With the rapid advances in high-throughput sequencing technologies, the focus of survival analysis has shifted from examining clinical indicators to incorporating genomic profiles with pathological images. However, existing methods either directly adopt a straightforward fusion of pathological features and genomic profiles for survival prediction, or take genomic profiles as guidance to integrate the features of pathological images. The former would overlook intrinsic cross-modal correlations. The latter would discard pathological information irrelevant to gene expression. To address these issues, we present a Cross-Modal Translation and Alignment (CMTA) framework to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we construct two parallel encoder-decoder structures for multi-modal data to integrate intra-modal information and generate cross-modal representation. Taking the generated cross-modal representation to enhance and recalibrate intra-modal representation can significantly improve its discrimination for comprehensive survival analysis. To explore the intrinsic cross-modal correlations, we further design a cross-modal attention module as the information bridge between different modalities to perform cross-modal interactions and transfer complementary information. Our extensive experiments on five public TCGA datasets demonstrate that our proposed framework outperforms the state-of-the-art methods.
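A sketch of such a cross-modal attention bridge, assuming nn.MultiheadAttention as the primitive (dimensions are placeholders): pathology tokens query genomic tokens so the histology stream absorbs complementary genomic information, with a symmetric module in the other direction:

```python
import torch.nn as nn

attn_p2g = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

def cross_modal(path_tokens, gene_tokens):
    """path_tokens: (B, Np, 256) pathology features; gene_tokens: (B, Ng, 256)."""
    # queries from pathology, keys/values from genomics
    fused, _ = attn_p2g(path_tokens, gene_tokens, gene_tokens)
    return path_tokens + fused  # residual recalibration of the intra-modal stream
```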
https://arxiv.org/abs/2309.12855
The rising usage of AI- and ML-based processing across application domains has exacerbated the need for low-cost ML implementation, specifically for resource-constrained embedded systems. To this end, approximate computing, an approach that explores the power, performance, area (PPA), and behavioral accuracy (BEHAV) trade-offs, has emerged as a possible solution for implementing embedded machine learning. Due to the predominance of MAC operations in ML, designing platform-specific approximate arithmetic operators forms one of the major research problems in approximate computing. Recently, there has been rising usage of AI/ML-based design space exploration techniques for implementing approximate operators. However, most of these approaches are limited to using ML-based surrogate functions for predicting the PPA and BEHAV impact of a set of related design decisions. While this approach leverages the regression capabilities of ML methods, it does not exploit more advanced approaches in ML. To this end, we propose AxOCS, a methodology for designing approximate arithmetic operators through ML-based supersampling. Specifically, we present a method to leverage the correlation of PPA and BEHAV metrics across operators of varying bit-widths for generating larger bit-width operators. The proposed approach involves traversing the relatively smaller design space of smaller bit-width operators and employing the associated Design-PPA-BEHAV relationship to generate initial solutions for metaheuristics-based optimization of larger operators. The experimental evaluation of AxOCS for FPGA-optimized approximate operators shows that the proposed approach significantly improves the quality (the resulting hypervolume for multi-objective optimization) of 8x8 signed approximate multipliers.
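A loose sketch of the supersampling idea as we read it (feature encoding and model choice are our assumptions): fit a regressor on measured small-bit-width operator data, then rank candidate large-bit-width configurations to seed the metaheuristic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def seed_large_designs(small_X, small_y, large_candidates_X, n_seeds=20):
    """small_X: encoded small-bit-width designs; small_y: their PPA/BEHAV
    quality metric; large_candidates_X: encoded large-bit-width candidates."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(small_X, small_y)
    scores = model.predict(large_candidates_X)
    return np.argsort(scores)[-n_seeds:]  # indices of the most promising seeds
```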
https://arxiv.org/abs/2309.12830
Background: View planning for the acquisition of cardiac magnetic resonance (CMR) imaging remains a demanding task in clinical practice. Purpose: Existing approaches to its automation have relied either on an additional volumetric image not typically acquired in clinical routine, or on laborious manual annotation of cardiac structural landmarks. This work presents a clinic-compatible, annotation-free system for automatic CMR view planning. Methods: The system mines the spatial relationship between the target planes and source views (more specifically, it locates their intersecting lines) and trains deep networks to regress heatmaps defined by distances from the intersecting lines. The intersecting lines are the prescription lines prescribed by the technologists at the time of image acquisition using cardiac landmarks, and are retrospectively identified from the spatial relationship. As the spatial relationship is self-contained in properly stored data, the need for additional manual annotation is eliminated. In addition, the interplay of multiple target planes predicted in a source view is utilized in a stacked hourglass architecture to gradually improve the regression. A multi-view planning strategy is then proposed to aggregate information from the predicted heatmaps of all the source views of a target plane for a globally optimal prescription, mimicking the strategy practiced by skilled human prescribers. Results: The experiments include 181 CMR exams. Our system yields a mean angular difference of 5.68 degrees and a mean point-to-plane distance of 3.12 mm. It not only achieves superior accuracy to existing approaches, including conventional atlas-based and newer deep-learning-based methods, in prescribing the four standard CMR planes, but also demonstrates prescription of the first cardiac-anatomy-oriented plane(s) from the body-oriented scout.
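The regression target can be sketched directly: a heatmap over a source view that decays with distance from the intersecting (prescription) line. The Gaussian form and sigma below are our assumptions:

```python
import numpy as np

def line_distance_heatmap(h, w, a, b, c, sigma=5.0):
    """Heatmap for the line a*x + b*y + c = 0 in an h-by-w source view."""
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.abs(a * xs + b * ys + c) / np.hypot(a, b)  # point-to-line distance
    return np.exp(-(d ** 2) / (2 * sigma ** 2))
```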
https://arxiv.org/abs/2309.12805
This research introduces an enhanced version of the multi-objective speech assessment model, called MOSA-Net+, which leverages the acoustic features of Whisper, a large pre-trained weakly supervised model, to create embedding features. The first part of this study investigates the correlation of the embedding features of Whisper and of two self-supervised learning (SSL) models with subjective quality and intelligibility scores. The second part evaluates the effectiveness of Whisper in deploying a more robust speech assessment model. Third, the possibility of combining representations from Whisper and the SSL models while deploying MOSA-Net+ is analyzed. The experimental results reveal that Whisper's embedding features correlate more strongly with subjective quality and intelligibility than the SSL models' embedding features, contributing to the more accurate prediction performance achieved by MOSA-Net+. Moreover, combining the embedding features from Whisper and the SSL models leads to only marginal improvement. Compared with MOSA-Net and other SSL-based speech assessment models, MOSA-Net+ yields notable improvements in estimating subjective quality and intelligibility scores across all evaluation metrics. We further tested MOSA-Net+ on Track 3 of the VoiceMOS Challenge 2023 and obtained the top-ranked performance.
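Feature extraction for such a correlation study might look like the following, assuming the openai-whisper package (MOSA-Net+ itself learns a predictor on top of these features; this only illustrates obtaining Whisper embeddings and a correlation check):

```python
import torch
import whisper
from scipy.stats import pearsonr

model = whisper.load_model("base")  # model size is a placeholder

def whisper_embedding(path):
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    with torch.no_grad():
        feats = model.embed_audio(mel.unsqueeze(0))  # (1, T, D) encoder output
    return feats.mean(dim=1).squeeze(0).cpu().numpy()  # pooled utterance embedding

# later: r, _ = pearsonr(predicted_scores, subjective_mos_scores)
```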
https://arxiv.org/abs/2309.12766
Noisy multi-label learning has garnered increasing attention due to the challenges posed by collecting large-scale accurate labels, making noisy labels a more practical alternative. Motivated by noisy multi-class learning, transition matrices have been introduced to model multi-label noise and to enable the development of statistically consistent algorithms for noisy multi-label learning. However, estimating multi-label noise transition matrices remains a challenging task, as most existing estimators in noisy multi-class learning rely on anchor points and accurate fitting of noisy class posteriors, conditions that are hard to satisfy in noisy multi-label learning. In this paper, we address this problem by first investigating the identifiability of class-dependent transition matrices in noisy multi-label learning. Building upon the identifiability results, we propose a novel estimator that leverages label correlations without the need for anchor points or precise fitting of noisy class posteriors. Specifically, we first estimate the occurrence probability of two noisy labels to capture noisy label correlations. Subsequently, we employ sample selection techniques to extract information implying clean label correlations, which is then used to estimate the occurrence probability of one noisy label when a certain clean label appears. By exploiting the mismatches in label correlations implied by these occurrence probabilities, we demonstrate that the transition matrix becomes identifiable and can be acquired by solving a bilinear decomposition problem. Theoretically, we establish an estimation error bound for our multi-label transition matrix estimator and derive a generalization error bound for our statistically consistent algorithm. Empirically, we validate the effectiveness of our estimator in estimating multi-label noise transition matrices, leading to excellent classification performance.
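The first estimation step has a direct empirical form; a minimal sketch:

```python
import numpy as np

def noisy_cooccurrence(Y_noisy):
    """Y_noisy: (n_samples, n_labels) binary noisy label matrix.
    Returns P[i, j] = empirical P(noisy label i = 1, noisy label j = 1)."""
    n = Y_noisy.shape[0]
    return (Y_noisy.T @ Y_noisy) / n
```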
https://arxiv.org/abs/2309.12706
Recent advancements in Natural Language Processing (NLP) have highlighted the potential of sentence embeddings in measuring semantic similarity. Yet, their application in analyzing real-world dyadic interactions and predicting the affect of conversational participants remains largely uncharted. To bridge this gap, the present study utilizes verbal conversations within 50 married couples talking about conflicts and pleasant activities. The Transformer-based model all-MiniLM-L6-v2 was employed to obtain embeddings of the utterances from each speaker. The overall similarity of a conversation was then quantified by the average cosine similarity between the embeddings of adjacent utterances. Results showed that semantic similarity had a positive association with wives' affect during conflict (but not pleasant) conversations. Moreover, this association was not observed with husbands' affect regardless of conversation type. Two validation checks further supported the validity of the similarity measure and showed that the observed patterns were not mere artifacts of the data. The present study underscores the potency of sentence embeddings in understanding the association between interpersonal dynamics and individual affect, paving the way for innovative applications in affective and relationship sciences.
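The similarity measure is easy to reproduce with the sentence-transformers package; a sketch:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def conversation_similarity(utterances):
    """Average cosine similarity between embeddings of adjacent utterances."""
    emb = model.encode(utterances, normalize_embeddings=True)
    adjacent = np.sum(emb[:-1] * emb[1:], axis=1)  # cosine, since normalized
    return float(adjacent.mean())
```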
https://arxiv.org/abs/2309.12646
Surface defect inspection is a very challenging task in which surface defects usually show weak appearances or exist under complex backgrounds. Most high-accuracy defect detection methods require expensive computation and storage overhead, making them less practical in some resource-constrained defect detection applications. Although some lightweight methods have achieved real-time inference speed with fewer parameters, they show poor detection accuracy in complex defect scenarios. To this end, we develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects, built on an encoder-decoder structure. First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module. The proposed DSA performs element-wise similarity in the channel dimension while maintaining linear complexity. In addition, we introduce a novel Channel Reference Attention (CRA) module before each decoder block to strengthen the representation of multi-level features in the bottom-up path. The proposed CRA exploits the channel correlation between features at different layers to adaptively enhance feature representation. The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with 17 other state-of-the-art methods. Specifically, GCANet achieves competitive accuracy (91.79% $F_{\beta}^{w}$, 93.55% $S_\alpha$, and 97.35% $E_\phi$) on SD-saliency-900 while running at 272 fps on a single GPU.
https://arxiv.org/abs/2309.12641
In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) plays a key role, but its current assignment process is time-consuming. We introduce DRG-LLaMA, a large language model (LLM) fine-tuned on clinical notes for improved DRG prediction. Using Meta's LLaMA as the base model, we optimized it with Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries. With an input token length of 512, DRG-LLaMA-7B achieved a macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986. Impressively, DRG-LLaMA-7B surpassed previously reported leading models on this task, demonstrating a relative improvement in macro-averaged F1 score of 40.3% compared to ClinicalBERT and 35.7% compared to CAML. When DRG-LLaMA is applied to predict base DRGs and complication or comorbidity (CC) / major complication or comorbidity (MCC) status, the top-1 prediction accuracy reached 67.8% for base DRGs and 67.5% for CC/MCC status. DRG-LLaMA's performance improves with larger model sizes and longer input context lengths. Furthermore, the use of LoRA enables training even on smaller GPUs with 48 GB of VRAM, highlighting the viability of adapting LLMs for DRG prediction.
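The LoRA setup can be sketched with Hugging Face transformers and peft; treating DRG assignment as sequence classification, and the model identifier, label count, rank, and target modules below, are all our assumptions:

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder LLaMA checkpoint
    num_labels=738)              # one label per DRG code (count assumed)
config = LoraConfig(task_type=TaskType.SEQ_CLS, r=16, lora_alpha=32,
                    lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the low-rank adapters are updated
```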
https://arxiv.org/abs/2309.12625
Dual-path is a popular architecture for speech separation models (e.g., Sepformer): it splits long sequences into overlapping chunks and processes them with intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure, consisting of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters.
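An SPGM-style block is small enough to sketch; the exact pooling and modulation design is the paper's, so the layout below is assumed:

```python
import torch.nn as nn

class SPGM(nn.Module):
    """Parameter-free global pooling followed by a light modulation module."""
    def __init__(self, dim):
        super().__init__()
        self.modulate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):                    # x: (batch, time, dim) local features
        g = x.mean(dim=1, keepdim=True)      # parameter-free global pooling
        return x * self.modulate(g)          # broadcast global modulation
```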
https://arxiv.org/abs/2309.12608
In an efficient and flexible human-robot collaborative work environment, a robot team member must be able to recognize both explicit requests and implied actions from human users. Identifying "what to do" in such cases requires an agent to have the ability to construct associations between objects, their actions, and the effect of actions on the environment. In this regard, we introduce semantic memory to understand explicit cues and their relationships with the available objects and the skills required to make "tea" and a "sandwich". We have extended our previous hierarchical robot control architecture to add the capability to execute the most appropriate task based on both feedback from the user and the environmental context. To validate this system, two types of skills were implemented in the hierarchical task tree: 1) tea-making skills and 2) sandwich-making skills. During conversation between the robot and the human, the robot was able to determine the hidden context using an ontology and began to act accordingly. For instance, if the person says "I am thirsty" or "It is cold outside", the robot will start to perform the tea-making skill. In contrast, if the person says "I am hungry" or "I need something to eat", the robot will make the sandwich. The humanoid robot Baxter was used for this experiment. We tested three scenarios with objects at different positions on the table for each skill. We observed that in all cases, the robot used only objects that were relevant to the skill.
https://arxiv.org/abs/2309.12562
While deep learning enables real robots to perform complex tasks that were difficult to implement in the past, the challenge is the enormous amount of trial-and-error and motion teaching required in a real environment. The manipulation of moving objects, due to their dynamic properties, requires learning a wide range of factors such as the object's position, movement speed, and grasping timing. We propose a data augmentation method that enables a robot to grasp moving objects with different speeds and grasping timings at low cost. Specifically, the robot is taught to grasp an object moving at low speed using teleoperation, and multiple data with different speeds and grasping timings are generated by down-sampling and padding the robot sensor data in the time-series direction. By learning from multiple sensor time series, the robot can generate motions while adjusting the grasping timing for unlearned movement speeds and sudden speed changes. Using a real robot, we have shown that this data augmentation method facilitates learning the relationship between object position and velocity and enables the robot to perform robust grasping motions for unlearned positions and for objects with dynamically changing positions and velocities.
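The augmentation itself reduces to a few lines; a sketch, assuming the padded tail simply repeats the final (post-grasp) sensor state:

```python
import numpy as np

def speed_variants(seq, factors=(1, 2, 3)):
    """seq: (T, D) sensor trajectory from one low-speed teleoperated grasp.
    Returns variants simulating faster object motion at the same length T."""
    out = []
    for f in factors:
        fast = seq[::f]  # down-sampling makes the motion f times faster
        pad = np.repeat(fast[-1:], len(seq) - len(fast), axis=0)
        out.append(np.concatenate([fast, pad], axis=0))
    return out
```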
https://arxiv.org/abs/2309.12547
In human-robot collaboration, there has been a trade-off relationship between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring safety for human workers. This algorithm runs in real-time and provides the safe and fastest control input for every cycle with respect to ISO standards. However, true optimality has not been achieved due to inaccurate distance computation resulting from conservative model simplification. To attain true optimality, we require a method that can compute distances (1) at the many robot configurations examined along a trajectory, (2) in real time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast, and precise distance checking method based on precomputed link-local SDFs. Our method can check distances for 500 waypoints along a trajectory within less than 1 millisecond using a GPU at runtime, making it suited for time-critical robotic control. Additionally, we propose a neural approximation that accelerates preprocessing by a factor of 2. Finally, we experimentally demonstrate that our method can navigate a 6-DoF robot earlier than a geometric-primitives-based distance checker in a dynamic and collaborative environment.
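A batched SDF query is the core primitive; a sketch assuming each link stores a dense voxel grid of precomputed signed distances (nearest-voxel lookup; interpolation omitted):

```python
import torch

def query_sdf(sdf_grid, origin, voxel_size, points):
    """sdf_grid: (X, Y, Z) tensor of signed distances in the link frame;
    points: (N, 3) query points, e.g., obstacle points for all waypoints."""
    idx = ((points - origin) / voxel_size).long()
    idx = torch.clamp(idx, min=torch.zeros(3, dtype=torch.long),
                      max=torch.tensor(sdf_grid.shape) - 1)
    return sdf_grid[idx[:, 0], idx[:, 1], idx[:, 2]]  # (N,) distances, one gather
```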
https://arxiv.org/abs/2309.12543
Many mathematical models have been leveraged to design embeddings for representing Knowledge Graph (KG) entities and relations for link prediction and many downstream tasks. These mathematically-inspired models are not only highly scalable for inference in large KGs, but also have many explainable advantages in modeling different relation patterns that can be validated through both formal proofs and empirical results. In this paper, we provide a comprehensive overview of the current state of research in KG completion. In particular, we focus on two main branches of KG embedding (KGE) design: 1) distance-based methods and 2) semantic matching-based methods. We discover the connections between recently proposed models and present an underlying trend that might help researchers invent novel and more effective models. Next, we delve into CompoundE and CompoundE3D, which draw inspiration from 2D and 3D affine operations, respectively. They encompass a broad spectrum of techniques, including distance-based and semantic-based methods. We also discuss an emerging approach for KG completion that leverages pre-trained language models (PLMs) and textual descriptions of entities and relations, and offer insights into the integration of KGE methods with PLMs for KG completion.
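The two branches can be summarized by their scoring functions; TransE for the distance-based family and DistMult for semantic matching, with h, r, t as embedding vectors:

```python
import torch

def transe_score(h, r, t, p=1):
    """Distance-based: plausible triples satisfy h + r close to t."""
    return -torch.norm(h + r - t, p=p, dim=-1)

def distmult_score(h, r, t):
    """Semantic matching: a trilinear product of the three embeddings."""
    return torch.sum(h * r * t, dim=-1)
```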
https://arxiv.org/abs/2309.12501
Instruction-tuned Large Language Models (It-LLMs) have exhibited outstanding abilities to reason about the cognitive states, intentions, and reactions of all people involved, letting humans guide and comprehend day-to-day social interactions effectively. In fact, several multiple-choice question (MCQ) benchmarks have been proposed to construct solid assessments of the models' abilities. However, earlier works have demonstrated the presence of an inherent "order bias" in It-LLMs, posing challenges to appropriate evaluation. In this paper, we investigate It-LLMs' resilience to a series of probing tests using four MCQ benchmarks. Introducing adversarial examples, we show a significant performance gap, mainly when varying the order of the choices, which reveals a selection bias and calls the models' reasoning abilities into question. Observing a correlation between first positions and model choices due to positional bias, we hypothesize the presence of structural heuristics in the decision-making process of It-LLMs, strengthened by including significant examples in few-shot scenarios. Finally, using the Chain-of-Thought (CoT) technique, we elicit reasoning from the model and mitigate the bias, obtaining more robust models.
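The order-bias probe can be sketched as follows, where `ask` is a placeholder for the It-LLM call (assumed to return the chosen letter for up to four choices):

```python
from itertools import permutations

def order_sensitive(question, choices, ask):
    """True if the model's selected option changes across choice orderings."""
    picked = set()
    for perm in permutations(choices):
        letter = ask(question, list(perm))      # e.g., returns "A", "B", "C", or "D"
        picked.add(perm["ABCD".index(letter)])  # map the letter back to its text
    return len(picked) > 1
```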
https://arxiv.org/abs/2309.12481
Applying link prediction (LP) methods over knowledge graphs (KG) for tasks such as causal event prediction presents an exciting opportunity. However, typical LP models are ill-suited for this task as they are incapable of performing inductive link prediction for new, unseen event entities and they require retraining as knowledge is added or changed in the underlying KG. We introduce a case-based reasoning model, EvCBR, to predict properties about new consequent events based on similar cause-effect events present in the KG. EvCBR uses statistical measures to identify similar events and performs path-based predictions, requiring no training step. To generalize our methods beyond the domain of event prediction, we frame our task as a 2-hop LP task, where the first hop is a causal relation connecting a cause event to a new effect event and the second hop is a property about the new event which we wish to predict. The effectiveness of our method is demonstrated using a novel dataset of newsworthy events with causal relations curated from Wikidata, where EvCBR outperforms baselines including translational-distance-based, GNN-based, and rule-based LP models.
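A training-free, case-based sketch of the idea as we read it (all data structures are assumptions): retrieve similar cause events by property overlap, then predict the new effect's property from the retrieved cases:

```python
def retrieve_similar_causes(new_cause_props, cases, k=10):
    """cases: list of {'cause_props': set, 'effect': dict}; similarity here is
    simple property overlap, standing in for EvCBR's statistical measures."""
    scored = sorted(cases, key=lambda c: -len(new_cause_props & c["cause_props"]))
    return scored[:k]

def predict_effect_property(similar_cases, prop):
    """Majority vote over the property values observed in retrieved cases."""
    values = [c["effect"][prop] for c in similar_cases if prop in c["effect"]]
    return max(set(values), key=values.count) if values else None
```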
https://arxiv.org/abs/2309.12423