In recent years, various large foundation models have been proposed for image segmentation. These models are often trained on large amounts of data corresponding to general computer vision tasks. Hence, these models do not perform well on medical data. There have been some attempts in the literature to perform parameter-efficient finetuning of such foundation models for medical image segmentation. However, these approaches assume that all the parameters of the model are available for adaptation. But, in many cases, these models are released as APIs or blackboxes, with no or limited access to the model parameters and data. In addition, finetuning methods also require a significant amount of compute, which may not be available for the downstream task. At the same time, medical data cannot be shared with third-party agents for finetuning due to privacy reasons. To tackle these challenges, we pioneer a blackbox adaptation technique for prompted medical image segmentation, called BAPS. BAPS has two components: (i) an Image-Prompt decoder (IP decoder) module that generates visual prompts given an image and a prompt, and (ii) a zero-order optimization (ZOO) method, called SPSA-GC, that is used to update the IP decoder without the need for backpropagating through the foundation model. Thus, our method does not require any knowledge about the foundation model's weights or gradients. We test BAPS on four different modalities and show that our method can improve the original model's performance by around 4%.
https://arxiv.org/abs/2405.10913
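The zeroth-order optimization at the heart of BAPS can be sketched as a simultaneous-perturbation (SPSA) gradient estimate combined with a momentum-style correction. This is a minimal illustration under stated assumptions, not the paper's actual SPSA-GC implementation; all function names and hyperparameters below are illustrative.

```python
import numpy as np

def spsa_gradient(loss_fn, theta, c=0.01, rng=None):
    """One SPSA gradient estimate: perturb every parameter at once with a
    random +/-1 (Rademacher) vector and use only two loss evaluations --
    no backpropagation through the blackbox model is needed."""
    rng = rng or np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    loss_plus = loss_fn(theta + c * delta)
    loss_minus = loss_fn(theta - c * delta)
    # For +/-1 perturbations, dividing by delta equals multiplying by it.
    return (loss_plus - loss_minus) / (2.0 * c) / delta

def spsa_gc_step(theta, grad, momentum, lr=0.05, beta=0.9):
    """Momentum-smoothed SPSA update; the exact gradient-correction rule
    of SPSA-GC may differ -- this smoothing is an assumption."""
    momentum = beta * momentum + (1.0 - beta) * grad
    return theta - lr * momentum, momentum
```

Because only loss values are queried, the foundation model can remain a blackbox API while the IP decoder's parameters (`theta` here) are tuned.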
The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, also leading to considerable duplicated effort. Hence, as an example of how to overcome this issue, we built a framework for general analysis of galaxy images, based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratio of galaxy images and the imbalanced distribution of galaxy categories, we have incorporated a Human-in-the-loop (HITL) module into our large vision model, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability to all the abovementioned tasks on galaxy images in the DESI legacy imaging surveys. Specifically, for object detection, trained on 1,000 data points, our DST on top of the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%; for morphology classification, to obtain AUC ~0.9, LVM plus DST and HITL requires only 1/50 of the training set that ResNet18 does. We expect that multimodal data can be integrated similarly, which opens up possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-messenger astronomy.
https://arxiv.org/abs/2405.10890
Most existing methods often rely on complex models to predict scene depth with high accuracy, resulting in slow inference that is not conducive to deployment. To better balance precision and speed, we first designed SmallDepth based on sparsity. Second, to enhance the feature representation ability of SmallDepth during training under the condition of equal complexity during inference, we propose an equivalent transformation module (ETM). Third, to improve the ability of each layer of the fixed SmallDepth to perceive different context information, and to improve the robustness of SmallDepth to left-right flips and illumination changes, we propose pyramid loss. Fourth, to further improve the accuracy of SmallDepth, we utilize the proposed function approximation loss (APX) to transfer knowledge from the pretrained HQDecv2 (obtained by optimizing the previous HQDec to address grid artifacts in some regions) to SmallDepth. Extensive experiments demonstrate that each proposed component improves the precision of SmallDepth without changing its complexity during inference, and the developed approach achieves state-of-the-art results on KITTI at an inference speed of more than 500 frames per second and with approximately 2M parameters. The code and models will be publicly available at this https URL.
https://arxiv.org/abs/2405.10885
Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.
https://arxiv.org/abs/2405.10842
Thanks to recent explosive developments in data-driven learning methodologies, reinforcement learning (RL) has emerged as a promising solution to address the legged locomotion problem in robotics. In this manuscript, we propose a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains, based only on proprioceptive measurements in real-world deployment. Different from the conventional teacher-student architecture that trains the teacher policy via RL and transfers the knowledge to the student policy through supervised learning, our proposed architecture trains teacher and student policy networks concurrently under the reinforcement learning paradigm. To achieve this, we develop a new training scheme based on the conventional proximal policy optimization (PPO) method to accommodate the interaction between the teacher policy network and the student policy network. The effectiveness of the proposed architecture as well as the new training scheme is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and a point-foot bipedal robot, showcasing robust locomotion over challenging terrains and improved performance compared to two-stage training methods.
https://arxiv.org/abs/2405.10830
Knowledge-intensive tasks pose a significant challenge for Machine Learning (ML) techniques. Commonly adopted methods, such as Large Language Models (LLMs), often exhibit limitations when applied to such tasks. Nevertheless, there have been notable endeavours to mitigate these challenges, with a significant emphasis on augmenting LLMs through Knowledge Graphs (KGs). While KGs provide many advantages for representing knowledge, their development costs can deter extensive research and applications. Addressing this limitation, we introduce a framework for enriching embeddings of small-scale domain-specific Knowledge Graphs with well-established general-purpose KGs. Adopting our method, a modest domain-specific KG can benefit from a performance boost in downstream tasks when linked to a substantial general-purpose KG. Experimental evaluations demonstrate a notable enhancement, with up to a 44% increase observed in the Hits@10 metric. This relatively unexplored research direction can catalyze more frequent incorporation of KGs in knowledge-intensive tasks, resulting in more robust, reliable ML implementations that hallucinate less than prevalent LLM solutions. Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning
https://arxiv.org/abs/2405.10745
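The Hits@10 metric reported above is standard in KG completion and takes only a few lines to compute. The sketch below assumes a scoring-based setup where each query produces a score per candidate entity (higher = more plausible); the function names are illustrative.

```python
import numpy as np

def rank_of_target(scores, target_idx):
    """1-based rank of the correct entity among candidate scores
    (higher score = more plausible candidate)."""
    order = np.argsort(-scores)  # indices sorted by descending score
    return int(np.where(order == target_idx)[0][0]) + 1

def hits_at_k(ranks, k=10):
    """Hits@k: fraction of test queries whose correct entity
    is ranked within the top k candidates."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

A 44% relative increase in Hits@10 thus means that substantially more test triples have their correct entity surface in the model's top-10 predictions.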
Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process have relied, to varying degrees, on automatic transcripts, employed diversely for the three subtasks. In response to the acknowledged limitations associated with this reliance on transcripts, recent research has shifted towards transcription-free solutions for translation and segmentation, leaving the direct generation of timestamps as uncharted territory. To fill this gap, we introduce the first direct model capable of producing automatic subtitles, entirely eliminating any dependence on intermediate transcripts also for timestamp prediction. Experimental results, backed by manual evaluation, showcase our solution's new state-of-the-art performance across multiple language pairs and diverse conditions.
https://arxiv.org/abs/2405.10741
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks, and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive QA) and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.
https://arxiv.org/abs/2405.10725
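The knowledge distillation step used to produce the smaller INDUS models can be illustrated with a temperature-scaled KL objective. The abstract does not specify the exact recipe, so the Hinton-style formulation below is an assumption, and all names are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

Minimizing this loss pushes the small student model's output distribution toward the large teacher's, which is how distilled variants retain much of the teacher's behavior at lower latency.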
Modern diffusion MRI sequences commonly acquire a large number of volumes with diffusion sensitization gradients of differing strengths or directions. Such sequences rely on echo-planar imaging (EPI) to achieve reasonable scan duration. However, EPI is vulnerable to off-resonance effects, leading to tissue susceptibility and eddy-current induced distortions. The latter is particularly problematic because it causes misalignment between volumes, disrupting downstream modelling and analysis. The essential correction of eddy distortions is typically done post-acquisition, with image registration. However, this is non-trivial because correspondence between volumes can be severely disrupted due to volume-specific signal attenuations induced by varying directions and strengths of the applied gradients. This challenge has been successfully addressed by the popular FSL Eddy tool but at considerable computational cost. We propose an alternative approach, leveraging recent advances in image processing enabled by deep learning (DL). It consists of two convolutional neural networks: 1) An image translator to restore correspondence between images; 2) A registration model to align the translated images. Results demonstrate comparable distortion estimates to FSL Eddy, while requiring only modest training sample sizes. This work, to the best of our knowledge, is the first to tackle this problem with deep learning. Together with recently developed DL-based susceptibility correction techniques, they pave the way for real-time preprocessing of diffusion MRI, facilitating its wider uptake in the clinic.
https://arxiv.org/abs/2405.10723
Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and is too elusive to be accurately predicted. This is mostly due to the scalability and variability of the web of environmental parameters that directly/indirectly causes the onset of different categories of drought. Since the dawn of man, efforts have been made to uniquely understand the natural indicators that provide signs of likely environmental events. These indicators/signs, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in the field of agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge for a more accurate environmental forecasting system in order to incorporate diverse environmental information for a reliable drought forecast. Hence, in this research, the core objective is the development of a semantics-based data integration middleware that encompasses and integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from the domain experts is transformed into rules to be used for performing deductive inference in conjunction with sensor data for determining the onset of drought through an automated inference generation module of the middleware. The semantic middleware incorporates, inter alia, a distributed architecture that consists of a streaming data processing engine based on Apache Kafka for real-time stream processing; a rule-based reasoning module; and an ontology module for semantic representation of the knowledge bases.
https://arxiv.org/abs/2405.10713
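The deductive inference over indigenous-knowledge rules described above can be sketched as a small forward-chaining loop: rules whose conditions hold fire and add their conclusions until a fixed point. The rule and fact names below are illustrative placeholders, not the middleware's actual rule base.

```python
def infer(facts, rules):
    """Forward-chaining deductive inference: repeatedly fire any rule whose
    conditions are all satisfied, adding its conclusion, until no rule fires."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)
                changed = True
    return facts

# Illustrative rules mixing sensor-derived facts with an indigenous indicator.
RULES = [
    ({"low_rainfall", "high_temperature"}, "meteorological_drought_risk"),
    ({"meteorological_drought_risk", "early_ant_migration"}, "drought_onset_likely"),
]
```

In the middleware, the left-hand-side facts would be asserted by the Kafka stream-processing engine (from sensor readings) and by expert-encoded indigenous indicators, and the reasoning module would run such rules continuously.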
Diaspora communities are disproportionately impacted by off-the-radar misinformation and often neglected by mainstream fact-checking efforts, creating a critical need to scale up the efforts of nascent fact-checking initiatives. In this paper we present SynDy, a framework for Synthetic Dynamic Dataset Generation to leverage the capabilities of the largest frontier Large Language Models (LLMs) to train local, specialized language models. To the best of our knowledge, SynDy is the first work utilizing LLMs to create fine-grained synthetic labels for tasks of direct relevance to misinformation mitigation, namely Claim Matching, Topical Clustering, and Claim Relationship Classification. SynDy utilizes LLMs and social media queries to automatically generate distantly-supervised, topically-focused datasets with synthetic labels on these three tasks, providing essential tools to scale up human-led fact-checking at a fraction of the cost of human-annotated data. Training on SynDy's generated labels shows improvement over a standard baseline and is not significantly worse compared to training on human labels (which may be infeasible to acquire). SynDy is being integrated into Meedan's chatbot tiplines that are used by over 50 organizations, serve over 230K users annually, and automatically distribute human-written fact-checks via messaging apps such as WhatsApp. SynDy will also be integrated into our deployed Co-Insights toolkit, enabling low-resource organizations to launch tiplines for their communities. Finally, we envision SynDy enabling additional fact-checking tools such as matching new misinformation claims to high-quality explainers on common misinformation topics.
https://arxiv.org/abs/2405.10700
Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluation of toxicity awareness in several popular LLMs: it highlights the toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.
https://arxiv.org/abs/2405.10659
Federated Learning (FL) has gained attention for addressing data scarcity and privacy concerns. While parallel FL algorithms like FedAvg exhibit remarkable performance, they face challenges in scenarios with diverse network speeds and concerns about centralized control, especially in multi-institutional collaborations like the medical domain. Serial FL presents an alternative solution, circumventing these challenges by transferring model updates serially between devices in a cyclical manner. Nevertheless, it is deemed inferior to parallel FL in that (1) its performance shows undesirable fluctuations, and (2) it converges to a lower plateau, particularly when dealing with non-IID data. The observed phenomenon is attributed to catastrophic forgetting due to knowledge loss from previous sites. In this paper, to overcome fluctuation and low efficiency in the iterative learning and forgetting process, we introduce cyclical weight consolidation (CWC), a straightforward yet potent approach specifically tailored for serial FL. CWC employs a consolidation matrix to regulate local optimization. This matrix tracks the significance of each parameter on the overall federation throughout the entire training trajectory, preventing abrupt changes in significant weights. During revisitation, to maintain adaptability, old memory undergoes decay to incorporate new information. Our comprehensive evaluations demonstrate that in various non-IID settings, CWC mitigates the fluctuation behavior of the original serial FL approach and enhances the converged performance consistently and significantly. The improved performance is either comparable to or better than that of the vanilla parallel approach.
https://arxiv.org/abs/2405.10647
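The consolidation-matrix mechanism behind CWC can be sketched as a quadratic penalty anchoring significant parameters, plus a decaying importance update on revisitation. The abstract does not give the exact formulas, so this EWC-style sketch is an assumption with illustrative names.

```python
import numpy as np

def cwc_penalty(theta, theta_star, omega, lam=1.0):
    """Quadratic consolidation penalty added to the local loss: parameters
    deemed significant for the federation (large omega) are anchored to
    their previously learned values theta_star, preventing abrupt changes."""
    return lam / 2.0 * float(np.sum(omega * (theta - theta_star) ** 2))

def update_consolidation(omega, new_importance, decay=0.9):
    """On revisitation, old memory decays so new information can be
    incorporated while past significance is gradually forgotten."""
    return decay * omega + (1.0 - decay) * new_importance
```

Each site would minimize its local objective plus `cwc_penalty`, then refresh `omega` with `update_consolidation` before passing the model to the next site in the cycle.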
Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpora and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpora to continuously train the LLaMA2 model, aiming to align cross-language representations and facilitate the knowledge transfer specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks; the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
https://arxiv.org/abs/2405.10626
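The dynamic data sampler that moves ChatFlow from unsupervised pre-training to supervised fine-tuning can be sketched as a simple mixing schedule. The linear schedule and two-pool structure below are assumptions, since the abstract does not describe the sampler's exact form.

```python
import random

def supervised_fraction(step, total_steps):
    """Fraction of each batch drawn from the supervised pool: ramps linearly
    from 0.0 (pure unsupervised pre-training) to 1.0 (pure fine-tuning).
    The linear shape is an assumption, not the paper's schedule."""
    return min(1.0, max(0.0, step / total_steps))

def sample_batch(step, total_steps, pretrain_pool, sft_pool, batch_size=4, rng=None):
    """Draw a batch mixing the two pools according to the current schedule."""
    rng = rng or random.Random()
    frac = supervised_fraction(step, total_steps)
    return [rng.choice(sft_pool if rng.random() < frac else pretrain_pool)
            for _ in range(batch_size)]
```

Early in training every example comes from the raw-text pool; by the end, every example is a supervised instruction pair, so the transition between the two training regimes is gradual rather than abrupt.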
Temporal Knowledge Graph (TKG) reasoning focuses on predicting events through historical information within snapshots distributed on a timeline. Existing studies mainly concentrate on two perspectives of leveraging the history of TKGs, including capturing evolution of each recent snapshot or correlations among global historical facts. Despite the achieved significant accomplishments, these models still fall short of (1) investigating the influences of multi-granularity interactions across recent snapshots and (2) harnessing the expressive semantics of significant links accorded with queries throughout the entire history, especially events exerting a profound impact on the future. These inadequacies restrict representation ability to reflect historical dependencies and future trends thoroughly. To overcome these drawbacks, we propose an innovative TKG reasoning approach towards Historically Relevant Events Structuring (HisRES). Concretely, HisRES comprises two distinctive modules excelling in structuring historically relevant events within TKGs, including a multi-granularity evolutionary encoder that captures structural and temporal dependencies of the most recent snapshots, and a global relevance encoder that concentrates on crucial correlations among events relevant to queries from the entire history. Furthermore, HisRES incorporates a self-gating mechanism for adaptively merging multi-granularity recent and historically relevant structuring representations. Extensive experiments on four event-based benchmarks demonstrate the state-of-the-art performance of HisRES and indicate the superiority and effectiveness of structuring historical relevance for TKG reasoning.
https://arxiv.org/abs/2405.10621
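The self-gating merge described for HisRES can be sketched as a learned sigmoid gate interpolating the two representations. The concatenation-based gate and the shapes below are assumptions; the paper's exact parameterization may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_gate(h_recent, h_global, W, b):
    """Adaptive merge: a gate g in (0,1), computed from both inputs,
    interpolates between the multi-granularity recent representation
    and the global historically relevant one, per dimension.
    W has shape (2d, d) and b has shape (d,) for d-dimensional inputs."""
    g = sigmoid(np.concatenate([h_recent, h_global]) @ W + b)
    return g * h_recent + (1.0 - g) * h_global
```

With untrained (zero) weights the gate is 0.5 everywhere, giving the plain average; training lets the model shift the balance per query toward recent evolution or global history.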
Recent advances in multi-view camera-only 3D object detection either rely on an accurate reconstruction of bird's-eye-view (BEV) 3D features or on traditional 2D perspective view (PV) image features. While both have their own pros and cons, few have found a way to stitch them together in order to benefit from "the best of both worlds". To this end, we explore a duo space (i.e., BEV and PV) 3D perception framework, in conjunction with some useful duo space fusion strategies that allow effective aggregation of the two feature representations. To the best of our knowledge, our proposed method, DuoSpaceNet, is the first to leverage two distinct feature spaces and achieves the state-of-the-art 3D object detection and BEV map segmentation results on nuScenes dataset.
https://arxiv.org/abs/2405.10577
Automated driving fundamentally requires knowledge about the surrounding geometry of the scene. Modern approaches use only captured images to predict occupancy maps that represent the geometry. Training these approaches requires accurate data that may be acquired with the help of LiDAR scanners. We show that the techniques used for current benchmarks and training datasets to convert LiDAR scans into occupancy grid maps yield results of very low quality, and subsequently present a novel approach using evidence theory that yields more accurate reconstructions. We demonstrate that these are superior by a large margin, both qualitatively and quantitatively, and that we additionally obtain meaningful uncertainty estimates. When converting the occupancy maps back to depth estimates and comparing them with the raw LiDAR measurements, our method yields a MAE improvement of 30% to 52% on nuScenes and 53% on Waymo over other occupancy ground-truth data. Finally, we use the improved occupancy maps to train a state-of-the-art occupancy prediction method and demonstrate that it improves the MAE by 25% on nuScenes.
https://arxiv.org/abs/2405.10575
In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks to keep pace with the fast development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capability of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. In knowledge assessment, it consists of 38K+ multiple-choice questions with associated solution explanations. These questions serve dual purposes: answer prediction and question reasoning. In application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks such as text classification, machine translation, relation extraction, reading comprehension, and text generation. Upon CFLUE, we conduct a thorough evaluation of representative LLMs. The results reveal that only GPT-4 and GPT-4-turbo achieve an accuracy exceeding 60% in answer prediction for knowledge assessment, suggesting that there is still substantial room for improvement in current LLMs. In application assessment, although GPT-4 and GPT-4-turbo are the top two performers, their considerable advantage over lightweight LLMs is noticeably diminished. The datasets and scripts associated with CFLUE are openly accessible at this https URL.
https://arxiv.org/abs/2405.10542
This paper explores an automatic news generation and fact-checking system based on language processing, aimed at enhancing the efficiency and quality of news production while ensuring the authenticity and reliability of the news content. With the rapid development of Natural Language Processing (NLP) and deep learning technologies, automatic news generation systems are capable of extracting key information from massive data and generating well-structured, fluent news articles. Meanwhile, by integrating fact-checking technology, the system can effectively prevent the spread of false news and improve the accuracy and credibility of news. This study details the key technologies involved in automatic news generation and fact-checking, including text generation, information extraction, and the application of knowledge graphs, and validates the effectiveness of these technologies through experiments. Additionally, the paper discusses the future development directions of automatic news generation and fact-checking systems, emphasizing the importance of further integration and innovation of technologies. The results show that with continuous technological optimization and practical application, these systems will play an increasingly important role in the future news industry, providing more efficient and reliable news services.
https://arxiv.org/abs/2405.10492
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role in pursuing users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of the reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state of the art in foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 16 architectural patterns with analyses of the context, forces, and trade-offs, as outcomes of the literature review. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
https://arxiv.org/abs/2405.10467