The core problem in zero-shot open vocabulary detection is how to align visual and text features so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining and struggles to prevent the language model from forgetting unseen classes. We propose three methods to alleviate these issues. Firstly, a simple scheme is used to augment the text embeddings, which prevents overfitting to the small number of classes seen during training while also saving memory and computation. Secondly, the feature pyramid network and the detection head are modified to include trainable gated shortcuts, which encourage vision-text feature alignment and guarantee it at the start of detection training. Finally, a self-training approach is used to leverage a larger corpus of image-text pairs, thus improving detection performance on classes with no human-annotated bounding boxes. Our three methods are evaluated on the zero-shot version of the LVIS benchmark, each of them showing clear and significant benefits. Our final network achieves a new state of the art on the mAP-all metric and demonstrates competitive performance on mAP-rare, as well as superior transfer to COCO and Objects365.
https://arxiv.org/abs/2303.13518
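The gated shortcuts are what let the modified feature pyramid and detection head start out as a no-op on pretrained features. A minimal sketch, assuming a scalar gate passed through tanh and initialized to zero (the abstract does not give the exact gate parameterization); `GatedShortcut` and its random ReLU branch are hypothetical stand-ins for the real layers:

```python
import numpy as np


class GatedShortcut:
    """Hypothetical gated residual block: y = x + tanh(gate) * branch(x).

    With the gate initialized to 0, tanh(0) = 0 and the block is exactly the
    identity, so features aligned during pretraining pass through unchanged
    at the start of detection training.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized layer standing in for newly added FPN/head weights.
        self.weight = rng.standard_normal((dim, dim)) * 0.1
        self.gate = 0.0  # trainable scalar gate, zero at initialization

    def branch(self, x):
        # Linear map followed by ReLU; a stand-in for the new trainable layers.
        return np.maximum(x @ self.weight, 0.0)

    def __call__(self, x):
        return x + np.tanh(self.gate) * self.branch(x)


feats = np.random.default_rng(1).standard_normal((3, 8))
block = GatedShortcut(dim=8)
out_at_init = block(feats)  # equals feats: the gate is closed
block.gate = 1.0            # after some training the gate may open
out_trained = block(feats)  # now mixes in the new branch
```

Because the gate is learned, training decides how far to move away from the pretrained alignment, rather than the new layers overwriting it from the first step.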
To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as a channel for learning the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.
https://arxiv.org/abs/2303.13512
Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads, with the possibility of novel view rendering. At the same time, one must solve an inverse problem to be able to re-render or modify an existing image or video. Despite the success of universal optimization-based methods for 2D GAN inversion, these methods, when applied to 3D GANs, may fail to produce 3D-consistent renderings. Fast encoder-based techniques, such as those developed for StyleGAN, may also be less appealing due to their lack of identity preservation. In our work, we introduce a real-time method that bridges the gap between the two approaches by directly utilizing the tri-plane representation introduced for the EG3D generative model. In particular, we build upon a feed-forward convolutional encoder for the latent code and extend it with a fully-convolutional predictor of tri-plane numerical offsets. As shown in our work, the renderings are similar in quality to those of optimization-based techniques and significantly outperform the baselines for novel views. As we show empirically, this is a consequence of operating directly in the tri-plane space rather than in the GAN parameter space, while making use of an encoder-based trainable approach.
https://arxiv.org/abs/2303.13497
Attention is the crucial cognitive ability that limits and selects what information we observe. Previous work by Bolander et al. (2016) proposes a model of attention based on dynamic epistemic logic (DEL) where agents are either fully attentive or not attentive at all. While introducing the realistic feature that inattentive agents believe nothing happens, the model does not represent the most essential aspect of attention: its selectivity. Here, we propose a generalization that allows for paying attention to subsets of atomic formulas. We introduce the corresponding logic for propositional attention, and show its axiomatization to be sound and complete. We then extend the framework to account for inattentive agents that, instead of assuming nothing happens, may default to a specific truth-value of what they failed to attend to (a sort of prior concerning the unattended atoms). This feature allows for a more cognitively plausible representation of the inattentional blindness phenomenon, where agents end up with false beliefs due to their failure to attend to conspicuous but unexpected events. Both versions of the model define attention-based learning through appropriate DEL event models based on a few clear edge principles. While the size of such event models grows exponentially with both the number of agents and the number of atoms, we introduce a new logical language for describing event models syntactically and show that using this language our event models can be represented linearly in the number of agents and atoms. Furthermore, representing our event models using this language is achieved by a straightforward formalisation of the aforementioned edge principles.
https://arxiv.org/abs/2303.13494
The vulnerability of machine learning models to adversarial attacks has been attracting considerable attention in recent years. Most existing studies focus on the behavior of stand-alone single-agent learners. In comparison, this work studies adversarial training over graphs, where individual agents are subjected to perturbations of varied strength levels across space. It is expected that interactions between linked agents, and the heterogeneity of the attack models that are possible over the graph, can help enhance robustness in view of the coordination power of the group. Using a min-max formulation of diffusion learning, we develop a decentralized adversarial training framework for multi-agent systems. We analyze the convergence properties of the proposed scheme for both convex and non-convex environments, and illustrate the enhanced robustness to adversarial attacks.
https://arxiv.org/abs/2303.13326
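The min-max diffusion idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's algorithm: each agent fits a linear regressor, the inner maximization is approximated by a single FGSM-style ℓ∞ step, and agents cooperate through the standard adapt-then-combine (ATC) diffusion update. The per-agent budgets `eps[k]` model attacks of varied strength across the graph; all function names are hypothetical.

```python
import numpy as np


def local_adversarial_grad(w, X, y, eps):
    """Gradient w.r.t. w of the squared loss at FGSM-style perturbed inputs."""
    residual = y - X @ w
    # d/dx (y - w.x)^2 = -2 * residual * w; move each input along its sign.
    grad_x = -2.0 * residual[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)
    residual_adv = y - X_adv @ w
    return -2.0 * X_adv.T @ residual_adv / len(y)


def diffusion_adversarial_training(data, eps, A, mu=0.01, iters=500):
    """Adapt-then-combine diffusion with per-agent attack strengths.

    data: list of (X_k, y_k) per agent; eps[k]: agent k's attack budget;
    A: combination matrix whose columns sum to 1, where A[l, k] is the
    weight agent k assigns to neighbour l (0 if not linked).
    """
    n_agents, dim = len(data), data[0][0].shape[1]
    W = np.zeros((n_agents, dim))
    for _ in range(iters):
        # Adapt: local gradient step on each agent's adversarial loss.
        psi = np.array([
            W[k] - mu * local_adversarial_grad(W[k], *data[k], eps[k])
            for k in range(n_agents)
        ])
        # Combine: W[k] = sum_l A[l, k] * psi[l].
        W = A.T @ psi
    return W


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
data = []
for _ in range(3):
    X = rng.standard_normal((50, 3))
    data.append((X, X @ w_true))  # noiseless local datasets
A = np.full((3, 3), 1.0 / 3.0)    # fully connected graph, uniform weights
W_clean = diffusion_adversarial_training(data, [0.0, 0.0, 0.0], A, iters=2000)
W_adv = diffusion_adversarial_training(data, [0.0, 0.1, 0.3], A)
```

With a fully connected graph and uniform weights, the combine step puts every agent at the same iterate after each round; sparser column-stochastic `A` matrices would model agents that only talk to local neighbors.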
The creation of new technological concepts through design reuses, recombination, and synthesis of prior concepts to create new ones may lead to exponential growth of the concept space over time. However, our statistical analysis of a large-scale technology semantic network consisting of over four million concepts from patent texts found evidence of a persistent deceleration in the pace of concept creation and a decline in the originality of newly created concepts. These trends may be attributed to the limitations of human intelligence in innovating beyond an expanding space of prior art. To sustain innovation, we recommend the development and implementation of creative artificial intelligence that can augment various aspects of the innovation process, including learning, creation, and evaluation.
https://arxiv.org/abs/2303.13300
As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods that give each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family of explainers in practice is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard accuracy term: an additional term that measures the difference in feature attribution between a pair of explainers. We observe on three datasets that training a model with this loss term improves explanation consensus on unseen data, including consensus between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance. Finally, we study the influence our method has on feature attribution explanations.
https://arxiv.org/abs/2303.13299
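The PEAR objective adds an explainer-disagreement penalty to the usual accuracy loss. A minimal sketch, assuming a linear model so that both explainers (plain gradient and gradient × input) have closed forms, and using the mean of (1 − cosine similarity) as the disagreement measure; the paper's actual measure and explainer pair may differ. All names are hypothetical, and actually training with the penalty would also require differentiating it (for example with an autodiff framework), which is omitted here.

```python
import numpy as np


def gradient_attr(w, X):
    # For a linear model f(x) = w @ x, the input gradient is w for every sample.
    return np.tile(w, (len(X), 1))


def grad_times_input_attr(w, X):
    # Gradient x input attribution for the same linear model.
    return X * w


def disagreement(attr_a, attr_b, eps=1e-12):
    """Mean of (1 - cosine similarity) between two explainers' attributions."""
    num = np.sum(attr_a * attr_b, axis=1)
    den = np.linalg.norm(attr_a, axis=1) * np.linalg.norm(attr_b, axis=1) + eps
    return float(np.mean(1.0 - num / den))


def pear_style_loss(w, X, y, lam):
    """Accuracy (MSE) term plus lam times the explainer-disagreement term."""
    mse = float(np.mean((X @ w - y) ** 2))
    penalty = disagreement(gradient_attr(w, X), grad_times_input_attr(w, X))
    return mse + lam * penalty


rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
w = rng.standard_normal(5)
y = X @ w + 0.1 * rng.standard_normal(32)
base = pear_style_loss(w, X, y, lam=0.0)         # accuracy term only
regularized = pear_style_loss(w, X, y, lam=1.0)  # accuracy + agreement penalty
```

The weight `lam` is where the consensus versus accuracy trade-off discussed in the abstract would be tuned.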
Machine learning algorithms, especially neural networks (NNs), are a valuable tool for approximating non-linear relationships, like the AC Optimal Power Flow (AC-OPF), with considerable accuracy, achieving a speedup of several orders of magnitude when deployed. In the power systems literature, NNs are often trained on a fixed dataset generated prior to the training process. In this paper, we show that adapting the NN training dataset during training can improve NN performance and substantially reduce its worst-case violations. This paper proposes an algorithm that identifies critical datapoints that reduce the worst-case violations, enriches the training dataset with them, and delivers a neural network with improved worst-case performance guarantees. We demonstrate the performance of our algorithm on four test power systems, ranging from 39 buses to 162 buses.
https://arxiv.org/abs/2303.13228
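The enrichment loop (train, locate the worst-case point, add it, retrain) can be sketched on a one-dimensional toy problem. Everything here is a hypothetical stand-in: `target` replaces the AC-OPF solution map, piecewise-linear interpolation replaces the neural network, and a dense candidate grid replaces the paper's worst-case search, which is not reproduced.

```python
import numpy as np


def target(x):
    # Hypothetical stand-in for the expensive AC-OPF solution map.
    return np.exp(x)


def fit(x_train):
    """Stand-in surrogate: piecewise-linear interpolation of the training set."""
    xs = np.sort(x_train)
    ys = target(xs)
    return lambda x: np.interp(x, xs, ys)


def enrich_dataset(x_train, candidates, rounds):
    """Repeatedly add the candidate point with the largest surrogate error."""
    x_train = np.asarray(x_train, dtype=float)
    worst_errors = []
    for _ in range(rounds):
        model = fit(x_train)
        errors = np.abs(model(candidates) - target(candidates))
        worst = int(np.argmax(errors))
        worst_errors.append(float(errors[worst]))
        # Enrich: add the critical datapoint to the training set, then refit.
        x_train = np.append(x_train, candidates[worst])
    return x_train, worst_errors


candidates = np.linspace(0.0, 2.0, 401)
x_final, worst_errors = enrich_dataset([0.0, 2.0], candidates, rounds=6)
```

For a convex target like exp, each added knot can only shrink the interpolation error, so the recorded worst-case errors decrease round by round; with a real network and solver, no such monotonicity is guaranteed.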
Large language models have demonstrated a surprising ability to perform in-context learning, i.e., these models can be directly applied to solve numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats. Therefore, constructing an appropriate prompt is essential for improving the performance of in-context learning. In this paper, we revisit this problem from the perspective of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attribute. We then show empirically that prompts with higher bias always lead to unsatisfactory predictive quality. Based on this observation, we propose a novel search strategy based on greedy search to identify the near-optimal prompt for improving the performance of in-context learning. We perform comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.
https://arxiv.org/abs/2303.13217
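The bias metric and greedy search can be sketched with a toy scorer in place of the LLM. Assumptions, all hypothetical: predictive bias is measured as the total-variation distance between the model's label distribution on a content-free input and the uniform distribution, and `toy_label_distribution` imitates a model that favors labels that are frequent among the demonstrations. A real implementation would query a model such as GPT-3 instead.

```python
from collections import Counter

LABELS = ("positive", "negative")


def toy_label_distribution(demos):
    """Hypothetical stand-in for querying an LLM with a content-free input.

    Predicts each label in proportion to how often it appears in the
    demonstrations (add-one smoothed), mimicking majority-label bias.
    """
    counts = Counter(label for _, label in demos)
    total = len(demos) + len(LABELS)
    return {lab: (counts[lab] + 1) / total for lab in LABELS}


def predictive_bias(demos):
    """Total-variation distance between the prediction and a uniform distribution."""
    probs = toy_label_distribution(demos)
    uniform = 1.0 / len(LABELS)
    return 0.5 * sum(abs(p - uniform) for p in probs.values())


def greedy_prompt_search(pool, k):
    """Greedily add the demonstration that keeps predictive bias lowest."""
    chosen, remaining = [], list(pool)
    for _ in range(k):
        best = min(remaining, key=lambda ex: predictive_bias(chosen + [ex]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


pool = [("great film", "positive"), ("loved it", "positive"),
        ("superb", "positive"), ("awful", "negative"), ("boring", "negative")]
chosen = greedy_prompt_search(pool, k=4)
```

Under this toy scorer, greedy search ends up balancing the labels in the prompt; with a real model the bias also depends on example order and format, which is what makes the search non-trivial.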
Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying these practices are rare. In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal to collect evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. Furthermore, we also used field notes to document challenges that occurred during the development, and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes, and evaluated how each practice improved the development. Lastly, we compared our evidence with existing literature. Most applied practices improved our system, albeit to varying extent, and we were able to overcome all major challenges. The qualitative results provide detailed accounts about 10 AI engineering practices, as well as challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.
https://arxiv.org/abs/2303.13216
Point cloud (PCD) anomaly detection is steadily emerging as a promising research area. This study aims to improve PCD anomaly detection performance by combining handcrafted PCD descriptors with powerful pre-trained 2D neural networks. To this end, this study proposes the Complementary Pseudo Multimodal Feature (CPMF), which incorporates local geometrical information in the 3D modality using handcrafted PCD descriptors and global semantic information in the generated pseudo 2D modality using pre-trained 2D neural networks. For global semantics extraction, CPMF projects the original PCD into a pseudo 2D modality containing multi-view images. These images are delivered to pre-trained 2D neural networks for informative 2D modality feature extraction. The 3D and 2D modality features are aggregated to obtain the CPMF for PCD anomaly detection. Extensive experiments demonstrate the complementary capacity between 2D and 3D modality features and the effectiveness of CPMF, with 95.15% image-level AU-ROC and 92.93% pixel-level PRO on the MVTec3D benchmark. Code is available on this https URL.
https://arxiv.org/abs/2303.13194
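The aggregation of 3D and 2D modality features, followed by anomaly scoring, can be sketched as follows. Assumptions not stated in the abstract: per-point features from both modalities are L2-normalized and concatenated, the 2D features have already been mapped back from the multi-view images onto the points, and anomalies are scored by nearest-neighbor distance to a memory bank of features from normal training samples. The 33-dimensional 3D descriptor (the size of an FPFH descriptor) and the 64-dimensional 2D feature are illustrative only, and the data are random.

```python
import numpy as np


def l2_normalize(feats, eps=1e-12):
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)


def cpmf_features(feat_3d, feat_2d):
    """Concatenate per-point 3D descriptors with back-projected 2D features.

    Each modality is L2-normalized first so neither dominates the distance.
    """
    return np.concatenate([l2_normalize(feat_3d), l2_normalize(feat_2d)], axis=1)


def anomaly_scores(memory_bank, test_feats):
    """Per-point score: Euclidean distance to the nearest normal feature."""
    diffs = test_feats[:, None, :] - memory_bank[None, :, :]
    return np.min(np.linalg.norm(diffs, axis=2), axis=1)


rng = np.random.default_rng(0)
normal_3d = rng.standard_normal((20, 33))  # hypothetical handcrafted descriptors
normal_2d = rng.standard_normal((20, 64))  # hypothetical back-projected 2D features
bank = cpmf_features(normal_3d, normal_2d)
odd = cpmf_features(-normal_3d[:1], -normal_2d[:1])
point_scores = anomaly_scores(bank, odd)
image_score = point_scores.max()  # image-level score: the most anomalous point
```

Per-point scores give a localization map (compare the pixel-level PRO metric), while the maximum gives a single image-level score (compare the image-level AU-ROC).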
In many applications, ads are displayed together with prices, so as to provide a direct comparison among similar products or services. The price-displaying feature not only influences consumers' decisions but also affects advertisers' bidding behaviors. In this paper, we study ad auctions with display prices from the perspective of mechanism design, in which advertisers are asked to submit both the costs and prices of their products. We provide a characterization of all incentive-compatible auctions with display prices, and use it to design auctions under two scenarios. In the first scenario, the display prices are assumed to be exogenously determined. For this setting, we derive the welfare-maximizing and revenue-maximizing auctions for any realization of the price profile. In the second, advertisers are allowed to strategize display prices in their own interests. We investigate two families of allocation policies within this scenario and identify the equilibrium prices accordingly. Our results reveal that the display prices do affect the design of ad auctions and that the platform can leverage such information to optimize the performance of ad delivery.
https://arxiv.org/abs/2303.13192
Detecting sets of relevant patterns from a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can actually be assessed from very different points of view. Rule-based languages like Answer Set Programming (ASP) seem well suited for specifying user-provided criteria to assess pattern utility in the form of constraints; moreover, the declarativity of ASP allows for a very easy switch between several criteria in order to analyze the dataset from different points of view. In this paper, we take steps toward extending the notion of High Utility Pattern Mining (HUPM); in particular, we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support a fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we exploit it as a building block for the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, an extensive experimental activity demonstrates the effectiveness of the proposed approach from both a quantitative and a qualitative point of view. Under consideration in Theory and Practice of Logic Programming (TPLP).
https://arxiv.org/abs/2303.13191
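For reference, the classical utility definition that HUPM frameworks generalize can be sketched in a few lines: each transaction records item quantities, each item has a unit profit, and an itemset's utility is the quantity-weighted profit of its items, summed over the transactions that contain all of them. This is plain Python with brute-force enumeration rather than the paper's ASP encoding, and the transactions and profits are made up for illustration.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit of the itemset's items, over every
    transaction that contains all of them."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_util):
    """Brute-force enumeration; fine for toy data, exponential in general."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"b": 2, "c": 1}]
unit_profit = {"a": 5, "b": 2, "c": 1}
hui = high_utility_itemsets(transactions, unit_profit, min_util=8)
```

Here the utility threshold is the only criterion; the paper's point is that, in ASP, such criteria become swappable constraints, so new classes of utility functions can be expressed without rewriting the miner.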
Systems with artificial intelligence components, so-called AI-based systems, have gained considerable attention recently. However, many organizations have issues with achieving production readiness with such systems. As a means to improve certain software quality attributes and to address frequently occurring problems, design patterns represent proven solution blueprints. While new patterns for AI-based systems are emerging, existing patterns have also been adapted to this new context. The goal of this study is to provide an overview of design patterns for AI-based systems, both new and adapted ones. We want to collect and categorize patterns, and make them accessible for researchers and practitioners. To this end, we first performed a multivocal literature review (MLR) to collect design patterns used with AI-based systems. We then integrated the created pattern collection into a web-based pattern repository to make the patterns browsable and easy to find. As a result, we selected 51 resources (35 white and 16 gray ones), from which we extracted 70 unique patterns used for AI-based systems. Among these are 34 new patterns and 36 traditional ones that have been adapted to this context. Popular pattern categories include "architecture" (25 patterns), "deployment" (16), "implementation" (9), or "security & safety" (9). While some patterns with four or more mentions already seem established, the majority of patterns have only been mentioned once or twice (51 patterns). Our results in this emerging field can be used by researchers as a foundation for follow-up studies and by practitioners to discover relevant patterns for informing the design of AI-based systems.
https://arxiv.org/abs/2303.13173
Non-additive uncertainty theories, typically possibility theory, belief functions and imprecise probabilities, share a common feature with modal logic: the duality properties between possibility and necessity measures, between belief and plausibility functions, and between upper and lower probabilities extend the duality between possibility and necessity modalities to the graded environment. It has been shown that the all-or-nothing version of possibility theory can be exactly captured by a minimal epistemic logic (MEL) that uses a very small fragment of the KD modal logic, without resorting to relational semantics. Besides, the case of belief functions has been studied independently, and a belief function logic has been obtained by extending the modal logic S5 to graded modalities using Łukasiewicz logic, albeit using relational semantics. This paper shows that a simpler belief function logic can be devised by adding Łukasiewicz logic on top of MEL. It allows for a more natural semantics in terms of Shafer basic probability assignments.
https://arxiv.org/abs/2303.13168
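The belief/plausibility duality mentioned above is easy to check numerically. Given a Shafer basic probability assignment m (masses on non-empty subsets of a frame, summing to 1), Bel(A) sums the masses of focal sets contained in A, Pl(A) sums the masses of focal sets intersecting A, and Pl(A) = 1 − Bel(Ā). The sketch below encodes only these standard definitions on a made-up three-element frame; it does not implement the paper's logic.

```python
from itertools import combinations


def belief(mass, event):
    """Bel(A): total mass of focal sets contained in A."""
    return sum(m for focal, m in mass.items() if focal <= event)


def plausibility(mass, event):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(m for focal, m in mass.items() if focal & event)


def subsets(frame):
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def duality_holds(mass, frame, tol=1e-9):
    """Check Pl(A) = 1 - Bel(complement of A) for every subset A of the frame."""
    return all(abs(plausibility(mass, a) - (1.0 - belief(mass, frame - a))) <= tol
               for a in subsets(frame))


frame = frozenset({"r", "g", "b"})
mass = {frozenset({"r"}): 0.5, frozenset({"r", "g"}): 0.3, frame: 0.2}
```

When all mass sits on singletons, Bel and Pl coincide with an ordinary probability; mass on larger sets, such as the 0.2 on the whole frame here, is what opens the gap between the lower (Bel) and upper (Pl) bounds.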
Conventional replay-based approaches to continual learning (CL) require, for each learning phase with new data, the replay of samples representing all of the previously learned knowledge in order to avoid catastrophic forgetting. Since the amount of learned knowledge grows over time in CL problems, generative replay spends an increasing amount of time just re-learning what is already known. In this proof-of-concept study, we propose a replay-based CL strategy that we term adiabatic replay (AR), which derives its efficiency from the (reasonable) assumption that each new learning phase is adiabatic, i.e., represents only a small addition to existing knowledge. Each new learning phase triggers a sampling process that selectively replays, from the body of existing knowledge, just those samples that are similar to the new data, in contrast to replaying all of it. Complete replay is not required since AR represents the data distribution by GMMs, which are capable of selectively updating their internal representation only where data statistics have changed. As long as additions are adiabatic, the number of to-be-replayed samples need not depend on the amount of previously acquired knowledge at all. We verify experimentally that AR is superior to state-of-the-art deep generative replay using VAEs.
https://arxiv.org/abs/2303.13157
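The selective sampling idea behind adiabatic replay can be sketched with a fixed Gaussian mixture. Simplifying assumptions, all hypothetical: components have equal weights and a shared isotropic variance, "similar to the new data" means the components with the highest total responsibility for the new batch, and replay samples are drawn only from those components. Note that the replay budget `n_samples` is chosen independently of how many components, that is how much prior knowledge, the mixture holds.

```python
import numpy as np


def responsibilities(X, means, var):
    """Posterior component probabilities under an equal-weight isotropic GMM."""
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logits = -0.5 * sq_dist / var
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    r = np.exp(logits)
    return r / r.sum(axis=1, keepdims=True)


def selective_replay(new_X, means, var, n_samples, top_k=1, seed=0):
    """Generate replay samples only from the components most used by new_X."""
    rng = np.random.default_rng(seed)
    usage = responsibilities(new_X, means, var).sum(axis=0)
    chosen = np.argsort(usage)[::-1][:top_k]
    comps = rng.choice(chosen, size=n_samples)
    noise = rng.standard_normal((n_samples, means.shape[1])) * np.sqrt(var)
    return means[comps] + noise, chosen


means = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])  # existing knowledge
new_X = np.array([[9.5, 10.2], [10.3, 9.8]])                 # new learning phase
replay, chosen = selective_replay(new_X, means, var=1.0, n_samples=50)
```

Only the component near the new data is replayed; the other two are left untouched, mirroring the abstract's point that a GMM can update its representation only where the data statistics change.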
For an AI solution to evolve from a trained machine learning model into a production-ready AI system, many more things need to be considered than just the performance of the machine learning model. A production-ready AI system needs to be trustworthy, i.e., of high quality. But how can this be determined in practice? For traditional software, ISO25000 and its predecessors have long been used to define and measure quality characteristics. Recently, quality models for AI systems, based on ISO25000, have been introduced. This paper applies one such quality model to a real-life case study: a deep learning platform for monitoring wildflowers. The paper presents three realistic scenarios sketching what it means to respectively use, extend and incrementally improve the deep learning platform for wildflower identification and counting. Next, it is shown how the quality model can be used as a structured dictionary to define quality requirements for data, model and software. Future work remains to extend the quality model with metrics, tools and best practices to aid AI engineering practitioners in implementing trustworthy AI systems.
https://arxiv.org/abs/2303.13151
Various recent experimental results show that large language models (LLMs) exhibit emergent abilities that are not present in small models. System performance is greatly improved after passing a certain critical threshold of scale. In this letter, we provide a simple explanation for such a phase transition phenomenon. For this, we model an LLM as a sequence-to-sequence random function. Instead of using instant generation at each step, we use a list decoder that keeps a list of candidate sequences at each step and defers the generation of the output sequence until the end. We show that there is a critical threshold such that the expected number of erroneous candidate sequences remains bounded when an LLM is below the threshold, and it grows exponentially when an LLM is above the threshold. Such a threshold is related to the basic reproduction number in a contagious disease.
https://arxiv.org/abs/2303.13112
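The threshold behavior can be illustrated with the branching-process view suggested by the analogy to the basic reproduction number. Assume, as a simplification, that each erroneous candidate sequence spawns on average `r` erroneous candidates at the next decoding step. The expected count after n steps is then r^n, so the running total stays bounded (by 1/(1 − r)) when r < 1 and grows exponentially when r > 1. How `r` depends on model scale, which is the substance of the paper's argument, is not modeled here, and `expected_erroneous` is a hypothetical helper.

```python
def expected_erroneous(r, steps, initial=1.0):
    """Expected erroneous candidates at each step of a branching model.

    Each erroneous candidate is assumed to spawn r erroneous candidates on
    average at the next decoding step, so the expectation is initial * r**n.
    """
    counts = [initial]
    for _ in range(steps):
        counts.append(counts[-1] * r)
    return counts


sub = expected_erroneous(0.5, 10)  # below the threshold: total stays bounded
sup = expected_erroneous(2.0, 10)  # above the threshold: exponential growth
```

The value r = 1 is the critical point separating the two regimes, just as a basic reproduction number of 1 separates a dying-out outbreak from an epidemic.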
In Task-Oriented Dialogue (TOD) systems, detecting and inducing new intents are two main challenges in applying the system in the real world. In this paper, we suggest a semantic multi-view model to resolve these two challenges: (1) SBERT for General Embedding (GE), (2) Multi Domain Batch (MDB) for dialogue domain knowledge, and (3) Proxy Gradient Transfer (PGT) for cluster-specialized semantics. MDB feeds diverse dialogue datasets to the model at once to tackle the multi-domain problem by learning multiple domains' knowledge. We introduce a novel method, PGT, which employs a Siamese network to fine-tune the model directly with a clustering method. Our model can learn how to cluster dialogue utterances by using PGT. Experimental results demonstrate that our multi-view model with MDB and PGT significantly improves Open Intent Induction performance compared to baseline systems.
https://arxiv.org/abs/2303.13099
Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased output variance, resulting in notably divergent outputs even when prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively curbs variance for various LLMs, providing a more uniform and dependable solution for summarizing vital medical information.
https://arxiv.org/abs/2303.13035