Learning for robot navigation is a critical and challenging task. The scarcity and costliness of real-world datasets necessitate efficient learning approaches. In this letter, we exploit Euclidean symmetry in planning for 2D navigation, which originates from Euclidean transformations between reference frames and enables parameter sharing. To address the challenges of unstructured environments, we formulate the navigation problem as planning on a geometric graph and develop an equivariant message passing network to perform value iteration. Furthermore, to handle multi-camera input, we propose a learnable equivariant layer to lift features to a desired space. We conduct comprehensive evaluations across five diverse tasks spanning structured and unstructured environments, known and unknown maps, and point or semantic goals. Our experiments confirm substantial benefits in training efficiency, stability, and generalization.
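To make the value-iteration backbone concrete, here is a minimal NumPy sketch of value iteration expressed as message passing over a geometric graph. It only illustrates the planning update the paper builds on; the actual method wraps this kind of update in a learned, equivariant message passing network, and the `value_iteration` helper and the toy graph below are purely illustrative.

```python
# Minimal sketch of value iteration as message passing on a geometric graph.
import numpy as np

def value_iteration(edges, costs, goal, num_nodes, iters=50):
    """edges: list of (u, v) pairs; costs: per-edge traversal cost."""
    V = np.full(num_nodes, np.inf)
    V[goal] = 0.0
    for _ in range(iters):
        V_new = V.copy()
        for (u, v), c in zip(edges, costs):
            # Each edge passes a "cost-to-go" message in both directions.
            V_new[u] = min(V_new[u], c + V[v])
            V_new[v] = min(V_new[v], c + V[u])
        if np.allclose(V_new, V):
            break
        V = V_new
    return V

# Tiny example: 4 waypoints, goal at node 3.
V = value_iteration(edges=[(0, 1), (1, 2), (2, 3), (0, 2)],
                    costs=[1.0, 1.0, 1.0, 2.5], goal=3, num_nodes=4)
print(V)  # converges to the cost-to-go per node, here [3.0, 2.0, 1.0, 0.0]
```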
https://arxiv.org/abs/2309.13043
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to employ an off-the-shelf text-to-image diffusion model as a useful dataset generator for object instances and mask annotations. First, we divide an image canvas into several regions and perform a single round of diffusion process to generate multiple instances simultaneously, conditioning on different text prompts. Second, we obtain corresponding instance masks by aggregating cross-attention maps associated with object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement processing. Without bells and whistles, our MosaicFusion can produce a significant amount of synthetic labeled data for both rare and novel categories. Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models, especially for rare and novel categories. Code will be released at this https URL.
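As a rough illustration of the mask-extraction step, the sketch below aggregates cross-attention maps associated with one object prompt across layers and diffusion time steps and applies a simple threshold. The `attention_to_mask` helper, the shapes, and the omission of edge-aware refinement are assumptions for illustration only, not the released implementation.

```python
# Illustrative sketch: turn aggregated cross-attention into an instance mask.
import numpy as np

def attention_to_mask(attn_maps, threshold=0.5):
    """attn_maps: array of shape (num_steps, num_layers, H, W) holding the
    cross-attention weights associated with one object prompt."""
    agg = attn_maps.mean(axis=(0, 1))                          # average over steps and layers
    agg = (agg - agg.min()) / (agg.max() - agg.min() + 1e-8)   # normalize to [0, 1]
    return (agg >= threshold).astype(np.uint8)                 # binary instance mask

# Dummy example: 10 diffusion steps, 4 attention layers, 64x64 latent grid.
maps = np.random.rand(10, 4, 64, 64)
mask = attention_to_mask(maps)
print(mask.shape, mask.dtype)  # (64, 64) uint8
```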
https://arxiv.org/abs/2309.13042
Conformers have recently been proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming recurrent neural network-based approaches and transformers. Nevertheless, in general, the performance of these end-to-end models, especially attention-based models, is particularly degraded in the case of long utterances. To address this limitation, we propose adding a fully-differentiable memory-augmented neural network between the encoder and decoder of a conformer. This external memory can enrich the generalization for longer utterances since it allows the system to store and retrieve more information recurrently. Notably, we explore the neural Turing machine (NTM) that results in our proposed Conformer-NTM model architecture for ASR. Experimental results using Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the baseline conformer without memory for long utterances.
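For readers unfamiliar with NTM-style memories, the sketch below shows content-addressed read and write over a slot matrix in PyTorch. It is only a simplified illustration of the addressing idea that such a memory uses; it is not the actual Conformer-NTM layer, and the function names and sizes are placeholders.

```python
# Content-addressed memory read/write in the spirit of an NTM (simplified).
import torch
import torch.nn.functional as F

def read(memory, key):
    """memory: (N, D) slots; key: (D,) query emitted by the controller."""
    scores = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (N,)
    weights = torch.softmax(scores, dim=0)                          # attention over slots
    return weights @ memory                                         # (D,) read vector

def write(memory, key, value, erase_gate=0.5):
    scores = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)
    weights = torch.softmax(scores, dim=0).unsqueeze(-1)            # (N, 1)
    memory = memory * (1 - erase_gate * weights)                    # erase
    return memory + weights * value.unsqueeze(0)                    # add

memory = torch.zeros(16, 32)          # 16 slots of dimension 32
key, value = torch.randn(32), torch.randn(32)
memory = write(memory, key, value)
print(read(memory, key).shape)        # torch.Size([32])
```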
https://arxiv.org/abs/2309.13029
Precise crop yield prediction is essential for improving agricultural practices and ensuring crop resilience in varying climates. Integrating weather data across the growing season, especially for different crop varieties, is crucial for understanding their adaptability in the face of climate change. In the MLCAS2021 Crop Yield Prediction Challenge, we utilized a dataset comprising 93,028 training records to forecast yields for 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015). This dataset included details on 5,838 distinct genotypes and daily weather data for a 214-day growing season, enabling comprehensive analysis. As one of the winning teams, we developed two novel convolutional neural network (CNN) architectures: the CNN-DNN model, combining CNN and fully-connected networks, and the CNN-LSTM-DNN model, with an added LSTM layer for weather variables. Leveraging the Generalized Ensemble Method (GEM), we determined optimal model weights, resulting in superior performance compared to baseline models. The GEM model achieved lower RMSE (5.55% to 39.88%), reduced MAE (5.34% to 43.76%), and higher correlation coefficients (1.1% to 10.79%) when evaluated on test data. We applied the CNN-DNN model to identify top-performing genotypes for various locations and weather conditions, aiding genotype selection based on weather variables. Our data-driven approach is valuable for scenarios with limited testing years. Additionally, a feature importance analysis using RMSE change highlighted the significance of location, MG, year, and genotype, along with the importance of weather variables MDNI and AP.
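One plausible way to realize the ensemble-weighting step is to search for non-negative model weights, summing to one, that minimize validation RMSE, as sketched below with SciPy. This is a generic formulation offered for illustration and may differ from the exact GEM procedure used in the challenge submission.

```python
# Hedged sketch of optimal ensemble weighting on validation predictions.
import numpy as np
from scipy.optimize import minimize

def ensemble_weights(preds, y_true):
    """preds: (num_models, num_samples) validation predictions."""
    def rmse(w):
        blended = w @ preds
        return np.sqrt(np.mean((blended - y_true) ** 2))

    k = preds.shape[0]
    w0 = np.full(k, 1.0 / k)
    res = minimize(rmse, w0, bounds=[(0, 1)] * k,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
    return res.x

# Toy example with two models on five samples.
preds = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                  [1.5, 2.5, 2.5, 4.5, 5.5]])
y = np.array([1.2, 2.2, 2.9, 4.1, 5.2])
print(ensemble_weights(preds, y))
```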
https://arxiv.org/abs/2309.13021
Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has attracted attention due to its hardware-friendly pattern and capability of achieving a high sparse ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to neatly support both the regular dense operations and the computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods ranging from interleave mapping, pre-generation of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of our training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show the SAT accelerator with the BDWP sparse training method under 2:8 sparse ratio achieves an average speedup of 1.75x over that with the dense training, accompanied by a negligible accuracy loss of 0.56% on average. Furthermore, our proposed training scheme significantly improves the training throughput by 2.97~25.22x and the energy efficiency by 1.36~3.58x over prior FPGA-based accelerators.
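The N:M constraint itself is easy to state in code: within every group of M consecutive weights, only the N largest-magnitude entries are kept. The NumPy sketch below builds a 2:8 mask; it illustrates the sparsity pattern only, not BDWP's bidirectional forward/backward schedule or the SAT hardware.

```python
# Minimal sketch of N:M fine-grained structured pruning (here 2:8).
import numpy as np

def nm_prune(weights, n=2, m=8):
    flat = weights.reshape(-1, m)                      # group consecutive M elements
    keep = np.argsort(np.abs(flat), axis=1)[:, -n:]    # indices of top-N magnitudes
    mask = np.zeros_like(flat)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return (flat * mask).reshape(weights.shape)

w = np.random.randn(4, 16)                             # weight matrix, size divisible by M
sparse_w = nm_prune(w, n=2, m=8)
print((sparse_w != 0).sum(), "nonzeros out of", w.size)  # 2 nonzeros per group of 8
```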
https://arxiv.org/abs/2309.13015
Large Language Models (LLMs) still struggle with complex reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents to foster diverse thoughts and discussion for improved consensus. ReConcile enhances the reasoning capabilities of LLMs by holding multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their uncertainties, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. This discussion prompt enables each agent to revise their responses in light of insights from other agents. Once a consensus is reached and the discussion ends, ReConcile determines the final answer by leveraging the confidence of each agent in a weighted voting scheme. We implement ReConcile with ChatGPT, Bard, and Claude2 as the three agents. Our experimental results on various benchmarks demonstrate that ReConcile significantly enhances the reasoning performance of the agents (both individually and as a team), surpassing prior single-agent and multi-agent baselines by 7.7% and also outperforming GPT-4 on some of these datasets. We also experiment with GPT-4 itself as one of the agents in ReConcile and demonstrate that its initial performance also improves by absolute 10.0% through discussion and feedback from other agents. Finally, we also analyze the accuracy after every round and observe that ReConcile achieves better and faster consensus between agents, compared to a multi-agent debate baseline. Our code is available at: this https URL
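The confidence-weighted voting step can be illustrated in a few lines of Python: each agent's final answer contributes its stated confidence, and the answer with the highest total wins. This is a schematic rendering of the aggregation idea, not ReConcile's exact implementation.

```python
# Sketch of confidence-weighted voting over the agents' final answers.
from collections import defaultdict

def weighted_vote(responses):
    """responses: list of (answer, confidence) tuples, one per agent."""
    totals = defaultdict(float)
    for answer, confidence in responses:
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Example round with three agents (e.g., ChatGPT, Bard, Claude2).
print(weighted_vote([("A", 0.9), ("B", 0.6), ("B", 0.55)]))  # -> "B"
```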
https://arxiv.org/abs/2309.13007
Recognizing the prevalence of domain shift as a common challenge in machine learning, various domain generalization (DG) techniques have been developed to enhance the performance of machine learning systems when dealing with out-of-distribution (OOD) data. Furthermore, in real-world scenarios, data distributions can gradually change across a sequence of sequential domains. While current methodologies primarily focus on improving model effectiveness within these new domains, they often overlook fairness issues throughout the learning process. In response, we introduce an innovative framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). This approach effectively separates environmental information and sensitive attributes from the embedded representation of classification features. This concurrent separation not only greatly improves model generalization across diverse and unfamiliar domains but also effectively addresses challenges related to unfair classification. Our strategy is rooted in the principles of causal inference to tackle these dual issues. To examine the intricate relationship between semantic information, sensitive attributes, and environmental cues, we systematically categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. By incorporating fairness regularization, we exclusively employ semantic information for classification purposes. Empirical validation on synthetic and real-world datasets substantiates the effectiveness of our approach, demonstrating improved accuracy levels while ensuring the preservation of fairness in the evolving landscape of continuous domains.
https://arxiv.org/abs/2309.13005
In machine translation, a common problem is that certain words, even when translated, remain incomprehensible to the target-language audience because of differing cultural backgrounds. A solution to this problem is to add explanations for these words. As a first step, we therefore need to identify such words or phrases. In this work we explore techniques to extract example explanations from a parallel corpus. However, the sparsity of sentences containing words that need to be explained makes building the training dataset extremely difficult. We propose a semi-automatic technique to extract these explanations from a large parallel corpus. Experiments on the English->German language pair show that our method is able to extract sentences such that more than 10% of them contain explanations, whereas only 1.9% of the original sentences contain explanations. In addition, experiments on the English->French and English->Chinese language pairs show similar conclusions. This is therefore an essential first automatic step towards creating an explanation dataset. Furthermore, we show that the technique is robust for all three language pairs.
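As a toy illustration of what such extraction could look for, the snippet below flags sentence pairs whose target side carries an extra parenthetical gloss with no counterpart in the source. This heuristic and the example pair are hypothetical and are not the paper's actual selection criterion.

```python
# Hypothetical heuristic for spotting candidate "explanation" sentence pairs.
import re

def has_candidate_explanation(src, tgt):
    src_par = re.findall(r"\(([^)]+)\)", src)
    tgt_par = re.findall(r"\(([^)]+)\)", tgt)
    return len(tgt_par) > len(src_par)      # extra parenthetical on the target side

pair = ("He visited the Diet in Tokyo.",
        "Er besuchte das Diet (das japanische Parlament) in Tokio.")
print(has_candidate_explanation(*pair))     # True -> keep for manual review
```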
https://arxiv.org/abs/2309.12998
Despite the recent successes of vanilla Graph Neural Networks (GNNs) on many tasks, their foundation on pairwise interaction networks inherently limits their capacity to discern latent higher-order interactions in complex systems. To bridge this capability gap, we propose a novel approach exploiting the rich mathematical theory of simplicial complexes (SCs) - a robust tool for modeling higher-order interactions. Current SC-based GNNs are burdened by high complexity and rigidity, and quantifying higher-order interaction strengths remains challenging. Innovatively, we present a higher-order Flower-Petals (FP) model, incorporating FP Laplacians into SCs. Further, we introduce a Higher-order Graph Convolutional Network (HiGCN) grounded in FP Laplacians, capable of discerning intrinsic features across varying topological scales. By employing learnable graph filters, a parameter group within each FP Laplacian domain, we can identify diverse patterns where the filters' weights serve as a quantifiable measure of higher-order interaction strengths. The theoretical underpinnings of HiGCN's advanced expressiveness are rigorously demonstrated. Additionally, our empirical investigations reveal that the proposed model accomplishes state-of-the-art (SOTA) performance on a range of graph tasks and provides a scalable and flexible solution to explore higher-order interactions in graphs.
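The learnable-filter idea can be pictured as a polynomial filter over a Laplacian with one trainable coefficient per order, as in the hedged PyTorch sketch below. It uses an ordinary graph Laplacian stand-in rather than the paper's Flower-Petals Laplacians, and the `PolyFilter` module is illustrative only.

```python
# Illustrative learnable polynomial graph filter: y = sum_k w_k * L^k x.
import torch
import torch.nn as nn

class PolyFilter(nn.Module):
    def __init__(self, order=3):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(order + 1))  # one weight per power of L

    def forward(self, L, x):
        out, term = 0.0, x
        for k in range(self.weights.numel()):
            out = out + self.weights[k] * term
            term = L @ term                                  # next power of L applied to x
        return out

L = torch.eye(5) - torch.full((5, 5), 1 / 5)   # toy Laplacian-like operator
x = torch.randn(5, 8)                          # 5 nodes, 8 features
print(PolyFilter()(L, x).shape)                # torch.Size([5, 8])
```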
https://arxiv.org/abs/2309.12971
Assurance cases can be used to argue for the safety of products in safety engineering. In safety-critical areas, the construction of assurance cases is indispensable. Trustworthiness Derivation Trees (TDTs) enhance assurance cases by incorporating formal methods, rendering it possible for automatic reasoning about assurance cases. We present Trustworthiness Derivation Tree Analyzer (Trusta), a desktop application designed to automatically construct and verify TDTs. The tool has a built-in Prolog interpreter in its backend, and is supported by the constraint solvers Z3 and MONA. Therefore, it can solve constraints about logical formulas involving arithmetic, sets, Horn clauses etc. Trusta also utilizes large language models to make the creation and evaluation of assurance cases more convenient. It allows for interactive human examination and modification. We evaluated top language models like ChatGPT-3.5, ChatGPT-4, and PaLM 2 for generating assurance cases. Our tests showed a 50%-80% similarity between machine-generated and human-created cases. In addition, Trusta can extract formal constraints from text in natural languages, facilitating an easier interpretation and validation process. This extraction is subject to human review and correction, blending the best of automated efficiency with human insight. To our knowledge, this marks the first integration of large language models in automatic creating and reasoning about assurance cases, bringing a novel approach to a traditional challenge. Through several industrial case studies, Trusta has proven to quickly find some subtle issues that are typically missed in manual inspection, demonstrating its practical value in enhancing the assurance case development process.
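As an example of the kind of constraint such a backend solver can discharge, the snippet below checks a simple arithmetic safety budget with the Z3 Python bindings. The claim and variable names are invented for illustration and are not drawn from Trusta's case studies.

```python
# Example Z3 check: "response time stays under budget given measured components".
from z3 import Ints, Solver, sat

detect, react = Ints("detect react")
s = Solver()
s.add(detect >= 0, react >= 0, detect + react <= 100)  # safety budget in ms
s.add(detect == 40, react == 45)                        # measured values
print("claim holds" if s.check() == sat else "claim violated")
```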
https://arxiv.org/abs/2309.12941
Task-oriented dialogue (TOD) systems facilitate users in executing various activities via multi-turn dialogues, but Large Language Models (LLMs) often struggle to comprehend these intricate contexts. In this study, we propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of LLMs in multi-turn dialogues. This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks. Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts, demonstrating its potential as a powerful tool in enhancing LLMs' comprehension in complex dialogue tasks.
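A self-explanation style prompt can be assembled along the following lines: the model is asked to explain each utterance before executing the task. The wording and helper below are a hypothetical rendering, not the exact prompt used in the paper.

```python
# Hedged sketch of building a "self-explanation" prompt for a multi-turn dialogue.
def build_prompt(dialogue, task_instruction):
    turns = "\n".join(f"[{speaker}]: {text}" for speaker, text in dialogue)
    return (
        "First, explain the intent of every utterance below in one sentence each. "
        "Then complete the task.\n\n"
        f"Dialogue:\n{turns}\n\nTask: {task_instruction}"
    )

dialogue = [("user", "I need a table for two tonight."),
            ("system", "Sure, what time works for you?"),
            ("user", "Around 7 pm, somewhere near the station.")]
print(build_prompt(dialogue, "Book a matching restaurant and confirm the time."))
```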
https://arxiv.org/abs/2309.12940
As software projects progress, code quality assumes paramount importance because it affects the reliability, maintainability, and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality issues. However, developers need to spend extra effort revising their code to improve its quality based on the tool findings. In this work, we investigate the use of (instruction-following) large language models (LLMs) to assist developers in revising code to resolve code quality issues. We present a tool, CORE (short for COde REvisions), architected using a pair of LLMs organized as a duo comprising a proposer and a ranker. Providers of static analysis tools recommend ways to mitigate the tool warnings, and developers follow them to revise their code. The proposer LLM of CORE takes the same set of recommendations and applies them to generate candidate code revisions. The candidates that pass the static quality checks are retained. However, the LLM may introduce subtle, unintended functionality changes that may go undetected by the static analysis. The ranker LLM evaluates the changes made by the proposer using a rubric that closely follows the acceptance criteria that a developer would enforce. CORE uses the scores assigned by the ranker LLM to rank the candidate revisions before presenting them to the developer. CORE could revise 59.2% of Python files (across 52 quality checks) so that they pass scrutiny by both a tool and a human reviewer. The ranker LLM is able to reduce false positives by 25.8% in these cases. CORE produced revisions that passed the static analysis tool in 76.8% of Java files (across 10 quality checks), comparable to the 78.3% of a specialized program repair tool, with significantly less engineering effort.
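The proposer/ranker control flow might be organized roughly as below. `call_llm` and `static_check` are hypothetical placeholders for an LLM endpoint and the static analysis tool; only the flow is depicted, not CORE's actual prompts or rubric.

```python
# Skeleton of a proposer/ranker duo for code revision (illustrative only).
def revise(code, warning, recommendation, call_llm, static_check, k=5):
    # Proposer: generate candidate revisions from the tool's recommendation.
    candidates = [
        call_llm(f"Revise the code to fix: {warning}\n"
                 f"Recommendation: {recommendation}\n\n{code}")
        for _ in range(k)
    ]
    # Keep only candidates that silence the warning under static analysis.
    candidates = [c for c in candidates if static_check(c)]

    # Ranker: score the remaining candidates against acceptance criteria.
    def score(candidate):
        reply = call_llm("Rate 1-10 how well this revision fixes the issue "
                         f"without changing behaviour:\n{candidate}")
        return float(reply.strip().split()[0])   # assumes a numeric reply

    return sorted(candidates, key=score, reverse=True)
```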
https://arxiv.org/abs/2309.12938
Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] symbol and the tokens. We propose in this paper a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be optimally aligned with their individual roles. We empirically show that by utilizing a separate normalization layer, the [CLS] embeddings can better encode the global contextual information and are distributed more uniformly in its anisotropic space. When replacing the conventional normalization layer with the two separate layers, we observe an average 2.7% performance improvement over the image, natural language, and graph domains.
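The modification itself is small; a minimal PyTorch sketch of normalizing the [CLS] embedding and the remaining tokens with two separate LayerNorm modules is shown below (the surrounding transformer block is omitted, and the module name is illustrative).

```python
# Minimal sketch: separate LayerNorms for the [CLS] embedding and the tokens.
import torch
import torch.nn as nn

class SplitNorm(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cls_norm = nn.LayerNorm(dim)    # statistics dedicated to [CLS]
        self.token_norm = nn.LayerNorm(dim)  # statistics dedicated to tokens

    def forward(self, x):
        # x: (batch, 1 + num_tokens, dim) with [CLS] at position 0.
        cls, tokens = x[:, :1], x[:, 1:]
        return torch.cat([self.cls_norm(cls), self.token_norm(tokens)], dim=1)

x = torch.randn(2, 197, 768)                 # e.g. a ViT-style sequence
print(SplitNorm(768)(x).shape)               # torch.Size([2, 197, 768])
```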
https://arxiv.org/abs/2309.12931
Saliency maps have become one of the most widely used interpretability techniques for convolutional neural networks (CNN) due to their simplicity and the quality of the insights they provide. However, there are still some doubts about whether these insights are a trustworthy representation of what CNNs use to come up with their predictions. This paper explores how rescuing the sign of the gradients from the saliency map can lead to a deeper understanding of multi-class classification problems. Using both pretrained and trained from scratch CNNs we unveil that considering the sign and the effect not only of the correct class, but also the influence of the other classes, allows to better identify the pixels of the image that the network is really focusing on. Furthermore, how occluding or altering those pixels is expected to affect the outcome also becomes clearer.
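The quantity under discussion, signed per-class input gradients, can be computed with a few lines of autograd, as in the sketch below. The toy model and shapes are placeholders, and the paper's full analysis of cross-class influence is not reproduced here.

```python
# Sketch: signed input gradients for every class (sign is preserved, no abs()).
import torch

def signed_class_gradients(model, image):
    image = image.clone().requires_grad_(True)
    logits = model(image.unsqueeze(0)).squeeze(0)      # (num_classes,)
    grads = []
    for c in range(logits.numel()):
        g, = torch.autograd.grad(logits[c], image, retain_graph=True)
        grads.append(g)                                # keep the sign of each gradient
    return torch.stack(grads)                          # (num_classes, C, H, W)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 4))
maps = signed_class_gradients(model, torch.randn(3, 8, 8))
print(maps.shape)  # torch.Size([4, 3, 8, 8])
```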
https://arxiv.org/abs/2309.12913
Nowadays, increasingly more data are available as knowledge graphs (KGs). While this data model supports advanced reasoning and querying, KGs remain difficult to mine due to their size and complexity. Graph mining approaches can be used to extract patterns from KGs. However, this presents two main issues. First, graph mining approaches tend to extract too many patterns for a human analyst to interpret (pattern explosion). Second, real-life KGs tend to differ from the graphs usually treated in graph mining: they are multigraphs, their vertex degrees tend to follow a power-law, and the way in which they model knowledge can produce spurious patterns. Recently, a graph mining approach named GraphMDL+ has been proposed to tackle the problem of pattern explosion, using the Minimum Description Length (MDL) principle. However, GraphMDL+, like other graph mining approaches, is not suited for KGs without adaptations. In this paper we propose KG-MDL, a graph pattern mining approach based on the MDL principle that, given a KG, generates a human-sized and descriptive set of graph patterns, and does so in a parameter-less and anytime way. We report on experiments on medium-sized KGs showing that our approach generates sets of patterns that are both small enough to be interpreted by humans and descriptive of the KG. We show that the extracted patterns highlight relevant characteristics of the data: both of the schema used to create the data, and of the concrete facts it contains. We also discuss the issues related to mining graph patterns on knowledge graphs, as opposed to other types of graph data.
https://arxiv.org/abs/2309.12908
Event Relation Extraction (ERE) aims to extract multiple kinds of relations among events in texts. However, existing methods simply categorize event relations into different classes, which inadequately captures the intrinsic semantics of these relations. To comprehensively understand their intrinsic semantics, in this paper we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations. Specifically, ProtoEM extracts event relations in a two-step manner, i.e., prototype representing and prototype matching. In the first step, to capture the connotations of different event relations, ProtoEM utilizes examples to represent the prototypes corresponding to these relations. Subsequently, to capture the interdependence among event relations, it constructs a dependency graph for the prototypes corresponding to these relations and utilizes a Graph Neural Network (GNN)-based module for modeling. In the second step, it obtains the representations of new event pairs and calculates their similarity with the prototypes obtained in the first step to evaluate which types of event relations they belong to. Experimental results on the MAVEN-ERE dataset demonstrate that the proposed ProtoEM framework can effectively represent the prototypes of event relations and obtain a significant improvement over baseline models.
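The prototype-matching step can be illustrated as follows: each relation's prototype is the mean embedding of its example event pairs, and a new pair is assigned to the most similar prototype by cosine similarity. The sketch omits the prototype dependency graph and the GNN module, and the relation names are placeholders.

```python
# Sketch of prototype representation and matching for event relations.
import torch
import torch.nn.functional as F

def build_prototypes(example_embeddings):
    """example_embeddings: dict relation -> (num_examples, dim) tensor."""
    return {rel: embs.mean(dim=0) for rel, embs in example_embeddings.items()}

def match(pair_embedding, prototypes):
    sims = {rel: F.cosine_similarity(pair_embedding, proto, dim=0).item()
            for rel, proto in prototypes.items()}
    return max(sims, key=sims.get)

protos = build_prototypes({"CAUSE": torch.randn(8, 64), "BEFORE": torch.randn(8, 64)})
print(match(torch.randn(64), protos))   # predicted relation type
```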
https://arxiv.org/abs/2309.12892
This paper introduces a novel one-stage end-to-end detector specifically designed to detect small lesions in medical images. Precise localization of small lesions presents challenges due to their appearance and the diverse contextual backgrounds in which they are found. To address this, our approach introduces a new type of pixel-based anchor that dynamically moves towards the targeted lesion for detection. We refer to this new architecture as GravityNet, and the novel anchors as gravity points since they appear to be "attracted" by the lesions. We conducted experiments on two well-established medical problems involving small lesions to evaluate the performance of the proposed approach: microcalcifications detection in digital mammograms and microaneurysms detection in digital fundus images. Our method demonstrates promising results in effectively detecting small lesions in these medical imaging tasks.
https://arxiv.org/abs/2309.12876
High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
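The saturation problem that motivates AnglE can be checked numerically: the gradient of cosine similarity shrinks toward zero as two embeddings become nearly parallel. The snippet below demonstrates only this motivation; it is not AnglE's angle-based objective, and the vector construction is purely illustrative.

```python
# Numeric check of cosine saturation: gradients vanish as embeddings align.
import math
import torch
import torch.nn.functional as F

anchor = F.normalize(torch.randn(64), dim=0)
v = torch.randn(64)
perp = F.normalize(v - (v @ anchor) * anchor, dim=0)   # unit vector orthogonal to anchor

for theta in (1.0, 0.1, 0.001):                        # angle (radians) to the anchor
    x = (math.cos(theta) * anchor + math.sin(theta) * perp).requires_grad_(True)
    cos = F.cosine_similarity(x, anchor, dim=0)
    cos.backward()
    # grad norm is roughly sin(theta): it collapses as cos approaches 1.
    print(f"theta={theta:<6} cos={cos.item():.6f} grad_norm={x.grad.norm().item():.6f}")
```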
https://arxiv.org/abs/2309.12871
Existing video captioning approaches typically require to first sample video frames from a decoded video and then conduct a subsequent process (e.g., feature extraction and/or captioning model learning). In this pipeline, manual frame sampling may ignore key information in videos and thus degrade performance. Additionally, redundant information in the sampled frames may result in low efficiency in the inference of video captioning. Addressing this, we study video captioning from a different perspective in compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference as smaller and less redundant information is processed. We propose a simple yet effective end-to-end transformer in the compressed domain for video captioning that enables learning from the compressed video for captioning. We show that even with a simple design, our method can achieve state-of-the-art performance on different benchmarks while running almost 2x faster than existing approaches. Code is available at this https URL.
https://arxiv.org/abs/2309.12867
Neural machine translation (NMT) has shown impressive performance when trained on large-scale corpora. However, generic NMT systems have demonstrated poor performance on out-of-domain translation. To mitigate this issue, several domain adaptation methods have recently been proposed, which often lead to better translation quality than generic NMT systems. While there has been some continuous progress in NMT for English and other European languages, domain adaptation in Arabic has received little attention in the literature. The current study, therefore, aims to explore the effectiveness of domain-specific adaptation for Arabic MT (AMT) in a yet unexplored domain, financial news articles. To this end, we carefully developed a parallel corpus for Arabic-English (AR-EN) translation in the financial domain for benchmarking different domain adaptation methods. We then fine-tuned several pre-trained NMT and large language models, including ChatGPT-3.5 Turbo, on our dataset. The results showed that fine-tuning is successful using just a few well-aligned in-domain AR-EN segments. The quality of ChatGPT translation was superior to that of the other models based on automatic and human evaluations. To the best of our knowledge, this is the first work on fine-tuning ChatGPT towards financial domain transfer learning. To contribute to research in domain translation, we made our datasets and fine-tuned models available at this https URL.
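A generic fine-tuning recipe for a pretrained translation model on a handful of in-domain AR-EN segments might look like the sketch below with Hugging Face transformers. The model name, hyperparameters, and placeholder data are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: fine-tune a pretrained AR->EN model on in-domain segment pairs.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "Helsinki-NLP/opus-mt-ar-en"        # an example public AR->EN model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A handful of in-domain segment pairs (placeholders for the financial corpus).
pairs = Dataset.from_dict({"ar": ["..."], "en": ["..."]})

def preprocess(batch):
    features = tokenizer(batch["ar"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["en"], truncation=True, max_length=128)
    features["labels"] = labels["input_ids"]
    return features

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="amt-finance", num_train_epochs=3),
    train_dataset=pairs.map(preprocess, batched=True, remove_columns=["ar", "en"]),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```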
https://arxiv.org/abs/2309.12863