The surge in black-box AI models has prompted the need to explain their internal mechanisms and justify their reliability, especially in high-stakes applications such as healthcare and autonomous driving. Due to the lack of a rigorous definition of explainable AI (XAI), a plethora of research on explainability, interpretability, and transparency has been developed to explain and analyze models from various perspectives. Consequently, with an exhaustive list of papers, it becomes challenging to gain a comprehensive overview of XAI research from all aspects. Considering the popularity of neural networks in AI research, we narrow our focus to a specific area of XAI research: gradient-based explanations, which can be directly applied to neural network models. In this review, we systematically explore gradient-based explanation methods to date and introduce a novel taxonomy that categorizes them into four distinct classes. We then present the essence of the technical details in chronological order, underscoring the evolution of the algorithms. Next, we introduce both human and quantitative evaluations to measure algorithm performance. More importantly, we discuss the general challenges in XAI and the specific challenges of gradient-based explanations. We hope that this survey helps researchers understand state-of-the-art progress and its corresponding disadvantages, which could spark their interest in addressing these issues in future work.
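To make the core idea concrete, here is a minimal sketch of one of the simplest gradient-based explanations, a vanilla gradient saliency map: the absolute gradient of a class score with respect to the input pixels. The model, weights, and input below are illustrative assumptions, not a specific method from the survey.

```python
# Vanilla gradient saliency: |d(class score)/d(input)|, per pixel.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()          # untrained stand-in network
x = torch.randn(1, 3, 224, 224, requires_grad=True)   # stand-in for a real image

logits = model(x)
score = logits[0, logits[0].argmax()]                 # score of the top class
score.backward()                                      # gradient w.r.t. the input

saliency = x.grad.abs().max(dim=1)[0].squeeze()       # max over color channels
print(saliency.shape)  # torch.Size([224, 224])
```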
https://arxiv.org/abs/2403.10415
The swift evolution of Large-scale Models (LMs), whether language-focused or multi-modal, has garnered extensive attention in both academia and industry. Yet despite the surge of interest in this rapidly evolving area, systematic reviews of their capabilities and potential in distinct impactful scenarios remain scarce. This paper endeavours to help bridge this gap, offering a thorough examination of the current landscape of LM usage in complex game-playing scenarios and the challenges that remain open. Here, we seek to systematically review the existing architectures of LM-based Agents (LMAs) for games and summarize their commonalities, challenges, and other insights. Furthermore, we present our perspective on promising future research avenues for the advancement of LMs in games. We hope to assist researchers in gaining a clear understanding of the field and to generate more interest in this highly impactful research direction. A corresponding resource, continuously updated, can be found in our GitHub repository.
https://arxiv.org/abs/2403.10249
Various macroeconomic and institutional factors hinder FDI inflows, including corruption, trade openness, access to finance, and political instability. Existing research mostly focuses on country-level data, with limited exploration of firm-level data, especially in developing countries. Recognizing this gap, recent calls for research emphasize the need for qualitative data analysis to delve into FDI determinants, particularly at the regional level. This paper proposes a novel methodology, based on text mining and social network analysis, to extract information from more than 167,000 online news articles and quantify regional-level (sub-national) attributes affecting FDI ownership in African companies. Our analysis extends the information on obstacles to industrial development mapped by the World Bank Enterprise Surveys. Findings suggest that regional (sub-national) structural and institutional characteristics can play an important role in determining foreign ownership.
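As a toy illustration of the text-mining and network step this kind of pipeline involves, the sketch below links regions to investment-related terms that co-occur in news text. The article snippets, region lexicon, and term list are invented placeholders; the paper's actual extraction pipeline is far richer.

```python
# Build a weighted co-occurrence graph between regions and FDI-related terms.
import itertools
import networkx as nx

articles = [
    "New factory opens in Nairobi with foreign ownership and fresh financing.",
    "Corruption probe in Lagos worries foreign investors.",
]
regions = {"nairobi", "lagos"}                    # hypothetical region lexicon
terms = {"foreign", "corruption", "financing"}    # hypothetical FDI-related terms

G = nx.Graph()
for doc in articles:
    tokens = {t.strip(".,").lower() for t in doc.split()}
    hits = (tokens & regions) | (tokens & terms)
    for u, v in itertools.combinations(sorted(hits), 2):
        weight = G.get_edge_data(u, v, {"weight": 0})["weight"]
        G.add_edge(u, v, weight=weight + 1)       # count co-occurrences

print(G.edges(data=True))
```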
https://arxiv.org/abs/2403.10239
Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution based on the importance of each instance in some sense. The simplicity and usefulness of the idea have led to many applications of importance weighting. For example, it is known that supervised learning under an assumption about the difference between the training and test distributions, called distribution shift, can guarantee statistically desirable properties through importance weighting by their density ratio. This survey summarizes the broad applications of importance weighting in machine learning and related research.
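The density-ratio example can be made concrete with a small sketch: under covariate shift, the training loss is reweighted by w(x) = p_test(x) / p_train(x). The Gaussian densities and threshold classifier below are assumptions chosen purely for illustration.

```python
# Importance-weighted empirical risk under covariate shift.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=1000)      # p_train = N(0, 1)
y_train = (x_train > 0.5).astype(float)        # toy labeling rule

# Density ratio w(x) = p_test(x) / p_train(x), with p_test = N(1, 1) by choice.
w = norm.pdf(x_train, loc=1.0) / norm.pdf(x_train, loc=0.0)

def weighted_risk(theta):
    # Importance-weighted 0-1 loss of the threshold classifier 1{x > theta}.
    preds = (x_train > theta).astype(float)
    return np.mean(w * (preds != y_train))

thetas = np.linspace(-2, 2, 81)
best = thetas[np.argmin([weighted_risk(t) for t in thetas])]
print(f"threshold minimizing importance-weighted risk: {best:.2f}")
```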
https://arxiv.org/abs/2403.10175
The standard approach to tackling computer vision problems is to train deep convolutional neural network (CNN) models using large-scale image datasets that are representative of the target task. However, in many scenarios it is challenging to obtain sufficient image data for the target task. Data augmentation is a way to mitigate this challenge. A common practice is to explicitly transform existing images in desired ways so as to create the volume and variability of training data necessary to achieve good generalization performance. In situations where data for the target domain is not accessible, a viable workaround is to synthesize training data from scratch, i.e., synthetic data augmentation. This paper presents an extensive review of synthetic data augmentation techniques. It covers data synthesis approaches based on realistic 3D graphics modeling, neural style transfer (NST), differential neural rendering, and generative artificial intelligence (AI) techniques such as generative adversarial networks (GANs) and variational autoencoders (VAEs). For each of these classes of methods, we focus on the important data generation and augmentation techniques, the general scope of application and specific use cases, as well as existing limitations and possible workarounds. Additionally, we provide a summary of common synthetic datasets for training computer vision models, highlighting their main features, application domains, and supported tasks. Finally, we discuss the effectiveness of synthetic data augmentation methods. As this is the first paper to explore synthetic data augmentation methods in great detail, we hope to equip readers with the necessary background information and in-depth knowledge of existing methods and their attendant issues.
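As a minimal sketch of the generative-model route, the snippet below draws latent codes and decodes them into extra training images. The decoder is an untrained stand-in; in practice it would be a VAE decoder or GAN generator fitted to real data, and the shapes are illustrative assumptions.

```python
# Generative synthetic augmentation: sample latents, decode, mix with real data.
import torch
import torch.nn as nn

latent_dim = 16
decoder = nn.Sequential(                  # stand-in for a trained decoder/generator
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 28 * 28), nn.Sigmoid(),
)

with torch.no_grad():
    z = torch.randn(64, latent_dim)               # sample the latent prior
    synthetic = decoder(z).view(64, 1, 28, 28)    # 64 synthetic 28x28 images

real_batch = torch.rand(64, 1, 28, 28)            # placeholder for real images
augmented = torch.cat([real_batch, synthetic])    # combined training batch
print(augmented.shape)  # torch.Size([128, 1, 28, 28])
```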
https://arxiv.org/abs/2403.10075
Electronic health records include information on patients' status and medical history, which may cover the history of diseases and disorders that could be hereditary. One important use of family history information is in precision health, where the goal is to keep the population healthy with preventative measures. Natural Language Processing (NLP) and machine learning techniques can help identify information that allows health professionals to recognize health risks before a condition develops later in life, saving lives and reducing healthcare costs. We survey the literature on techniques from the NLP field that have been developed to utilise digital health records to identify risks of familial diseases. We highlight that rule-based methods are heavily investigated and still actively used for family history extraction, although more recent efforts have gone into building neural models based on large-scale pre-trained language models. In addition to the areas where NLP has been successfully utilised, we also identify the areas where more research is needed to unlock the value of patients' records regarding data collection, task formulation, and downstream applications.
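To illustrate the rule-based style the survey reports as still widely used, here is a toy pattern that pairs a kinship term with a nearby condition mention. The lexicons and the sample note are invented for illustration; real systems use far richer rules and vocabularies.

```python
# Toy rule-based family-history extraction: kinship term near a condition term.
import re

KIN = r"(mother|father|sister|brother|grandmother|grandfather)"
COND = r"(diabetes|breast cancer|hypertension|heart disease)"
pattern = re.compile(KIN + r"[^.]{0,60}?" + COND, re.IGNORECASE)

note = "Patient reports that her mother was diagnosed with breast cancer at 54."
for m in pattern.finditer(note):
    print({"relative": m.group(1).lower(), "condition": m.group(2).lower()})
# {'relative': 'mother', 'condition': 'breast cancer'}
```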
https://arxiv.org/abs/2403.09997
Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on evaluating and improving LLMs from a causal view in the following areas: understanding and improving the LLMs' reasoning capacity, addressing fairness and safety issues in LLMs, complementing LLMs with explanations, and handling multimodality. Meanwhile, LLMs' strong reasoning capacities can in turn contribute to the field of causal inference by aiding causal relationship discovery and causal effect estimations. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and equitable artificial intelligence systems.
https://arxiv.org/abs/2403.09606
We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with rich physical and semantic properties. The second is OMNIGIBSON, a novel simulation environment that supports these activities via realistic physics simulation and rendering of rigid bodies, deformable bodies, and liquids. Our experiments indicate that the activities in BEHAVIOR-1K are long-horizon and dependent on complex manipulation skills, both of which remain a challenge for even state-of-the-art robot learning solutions. To calibrate the simulation-to-reality gap of BEHAVIOR-1K, we provide an initial study on transferring solutions learned with a mobile manipulator in a simulated apartment to its real-world counterpart. We hope that BEHAVIOR-1K's human-grounded nature, diversity, and realism make it valuable for embodied AI and robot learning research. Project website: this https URL.
https://arxiv.org/abs/2403.09227
Knowledge sharing about emerging threats is crucial in the rapidly advancing field of cybersecurity and forms the foundation of Cyber Threat Intelligence (CTI). In this context, Large Language Models are becoming increasingly significant in the field of cybersecurity, presenting a wide range of opportunities. This study surveys the performance of the ChatGPT, GPT4all, Dolly, Stanford Alpaca, Alpaca-LoRA, Falcon, and Vicuna chatbots in binary classification and Named Entity Recognition (NER) tasks performed using Open Source INTelligence (OSINT). We utilize well-established data collected in previous research from Twitter to assess the competitiveness of these chatbots when compared to specialized models trained for those tasks. In binary classification experiments, the commercial GPT-4 chatbot achieved an acceptable F1 score of 0.94, and the open-source GPT4all model achieved an F1 score of 0.90. However, concerning cybersecurity entity recognition, all evaluated chatbots have limitations and are less effective. This study demonstrates the capability of chatbots for OSINT binary classification and shows that they require further improvement in NER to effectively replace specially trained models. Our results shed light on the limitations of LLM chatbots when compared to specialized models, and can help researchers improve chatbot technology with the objective of reducing the effort required to integrate machine learning in OSINT-based CTI tools.
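For reference, the binary-classification evaluation reduces to scoring chatbot labels against gold labels with F1, as in the minimal sketch below; the labels are invented placeholders, not the study's data.

```python
# F1 evaluation of hypothetical chatbot predictions on a binary OSINT task.
from sklearn.metrics import f1_score

gold = [1, 0, 1, 1, 0, 1, 0, 1]      # 1 = cybersecurity-relevant tweet (invented)
chatbot = [1, 0, 1, 0, 0, 1, 0, 1]   # hypothetical chatbot predictions

print(f"F1: {f1_score(gold, chatbot):.2f}")  # F1: 0.89
```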
https://arxiv.org/abs/2401.15127
Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-standing dream in Software Engineering (SE): automatic code generation. Like human-written code, LLM-generated code is prone to bugs, and these bugs have not yet been thoroughly examined by the community. Given the increasing adoption of LLM-based code generation tools (e.g., GitHub Copilot) in SE activities, it is critical to understand the characteristics of bugs contained in code generated by LLMs. This paper examines a sample of 333 bugs collected from code generated using three leading LLMs (i.e., CodeGen, PanGu-Coder, and Codex) and identifies the following 10 distinctive bug patterns: Misinterpretations, Syntax Error, Silly Mistake, Prompt-biased code, Missing Corner Case, Wrong Input Type, Hallucinated Object, Wrong Attribute, Incomplete Generation, and Non-Prompted Consideration. The bug patterns are presented in the form of a taxonomy. The identified bug patterns are validated using an online survey with 34 LLM practitioners and researchers, who generally asserted the significance and prevalence of the bug patterns. Researchers and practitioners can leverage these findings to develop effective quality assurance techniques for LLM-generated code. This study sheds light on the distinctive characteristics of LLM-generated code.
https://arxiv.org/abs/2403.08937
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks, a first of its kind in the field. It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection, addressing the significant challenges posed by overfitting and limited training data in these domains. Our work categorizes data augmentation methods into two main types: data generation and data perturbation. Data generation covers techniques like graphic engine-based generation, generative model-based generation, and data recombination, while data perturbation is divided into image-level and human-level perturbations. Each method is tailored to the unique requirements of human-centric tasks, with some applicable across multiple areas. Our contributions include an extensive literature review, providing deep insights into the influence of these augmentation techniques in human-centric vision and highlighting the nuances of each method. We also discuss open issues and future directions, such as the integration of advanced generative models like Latent Diffusion Models, for creating more realistic and diverse training data. This survey not only encapsulates the current state of data augmentation in human-centric vision but also charts a course for future research, aiming to develop more robust, accurate, and efficient human-centric vision systems.
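As one concrete example of an image-level perturbation common in person ReID, the sketch below applies Random Erasing, which blanks a random rectangle in the image. The probability and area/aspect parameters are typical defaults, assumed here for illustration.

```python
# Image-level perturbation: random flip + Random Erasing on a pedestrian crop.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
])

img = torch.rand(3, 256, 128)    # placeholder pedestrian crop (C, H, W)
print(augment(img).shape)        # torch.Size([3, 256, 128])
```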
https://arxiv.org/abs/2403.08650
Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency between the different frames of a generated image sequence. Previous approaches either employ some kind of memory mechanism to maintain context throughout an auto-regressive generation of the image sequence, or model the generation of characters and their background separately to improve the rendering of characters. In contrast, we embrace a completely parallel transformer-based approach, relying exclusively on Cross-Attention with past and future captions to achieve consistency. Additionally, we propose a Character Guidance technique that focuses on the generation of characters in an implicit manner by forming a combination of text-conditional and character-conditional logits in the logit space. We also employ a caption-augmentation technique, carried out by a Large Language Model (LLM), to enhance the robustness of our approach. The combination of these methods culminates in state-of-the-art (SOTA) results across various metrics on the most prominent SV benchmark (Pororo-SV), attained with constrained resources and at lower computational complexity than prior art. The validity of our quantitative results is supported by a human survey.
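One plausible reading of "a combination of text-conditional and character-conditional logits in the logit space", in the spirit of classifier-free guidance, is sketched below; the weights and exact form are assumptions, not the paper's definitive formulation.

```python
# Hedged sketch: guidance-style mixing of conditional logits.
import torch

def guided_logits(text_logits, char_logits, uncond_logits, s_text=4.0, s_char=2.0):
    # Push predictions toward both the caption and the character condition.
    return (uncond_logits
            + s_text * (text_logits - uncond_logits)
            + s_char * (char_logits - uncond_logits))

vocab = 1024                                # hypothetical visual-token vocabulary
t, c, u = (torch.randn(1, vocab) for _ in range(3))
print(guided_logits(t, c, u).shape)         # torch.Size([1, 1024])
```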
https://arxiv.org/abs/2403.08502
Machine learning (ML) and artificial intelligence (AI) approaches are often criticized for their inherent bias and for their lack of control, accountability, and transparency. Consequently, regulatory bodies struggle to contain this technology's potential negative side effects. High-level requirements such as fairness and robustness need to be formalized into concrete specification metrics: imperfect proxies that capture isolated aspects of the underlying requirements. Given possible trade-offs between different metrics and their vulnerability to over-optimization, integrating specification metrics into system development processes is not trivial. This paper defines specification overfitting, a scenario where systems focus excessively on specified metrics to the detriment of high-level requirements and task performance. We present an extensive literature survey to categorize how researchers propose, measure, and optimize specification metrics in several AI fields (e.g., natural language processing, computer vision, reinforcement learning). Using a keyword-based search on papers from major AI conferences and journals between 2018 and mid-2023, we identify and analyze 74 papers that propose or optimize specification metrics. We find that although most papers implicitly address specification overfitting (e.g., by reporting more than one specification metric), they rarely discuss what role specification metrics should play in system development or explicitly define the scope and assumptions behind metric formulations.
https://arxiv.org/abs/2403.08425
Data augmentation is arguably the most important regularization technique commonly used to improve the generalization performance of machine learning models. It primarily involves applying appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because of the time-consuming trial-and-error procedures for manually creating and testing different candidate augmentations and their hyperparameters. Automated data augmentation methods aim to automate this process. State-of-the-art approaches typically rely on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration, and data synthesis techniques. We present an extensive discussion of techniques for realizing each of the major subtasks of the data augmentation process: search space design, hyperparameter optimization, and model evaluation. Finally, we carry out an extensive comparison and analysis of the performance of automated data augmentation techniques against state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.
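The three subtasks can be seen in a minimal AutoML-style loop: define a search space of augmentation hyperparameters, sample candidate policies, and score each by validation performance. The search space, budget, and scoring stub below are illustrative assumptions.

```python
# Random-search sketch over an augmentation-policy search space.
import random

SEARCH_SPACE = {                      # search space design
    "rotate_deg": [0, 10, 20, 30],
    "flip_prob": [0.0, 0.25, 0.5],
    "jitter": [0.0, 0.2, 0.4],
}

def evaluate(policy):
    # Model evaluation: train with the policy, return validation accuracy.
    # Replaced by a made-up score here so the sketch runs end to end.
    return random.random()

best_policy, best_score = None, -1.0
for _ in range(20):                   # hyperparameter optimization (random search)
    policy = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
    score = evaluate(policy)
    if score > best_score:
        best_policy, best_score = policy, score

print(best_policy, round(best_score, 3))
```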
https://arxiv.org/abs/2403.08352
Robots able to run, fly, and grasp have high potential to solve a wide range of tasks and to navigate complex environments. Several mechatronic designs of such robots with adaptive morphologies are emerging. However, landing on uneven surfaces, traversing rough terrain, and manipulating objects still present significant challenges. This paper introduces the design of MorphoGear, a novel rotor UAV with morphogenetic gear, and describes the robot's mechanics, electronics, and control architecture, as well as its walking behavior and an analysis of experimental results. MorphoGear is able to fly, walk on surfaces with several gaits, and grasp objects with its four compatible robotic limbs. The UAV uses robotic limbs with three degrees of freedom (DoFs) as pedipulators when walking or flying and as manipulators when performing actions in the environment. We performed a locomotion analysis of the robot's landing gear and developed three types of robot gaits. The experimental results revealed low crosstrack error for the most accurate gait (mean of 1.9 cm and max of 5.5 cm) and the ability of the drone to move with a 210 mm step length. Another gait also showed low crosstrack error (mean of 2.3 cm and max of 6.9 cm). The proposed MorphoGear system can potentially achieve a wide scope of tasks in environmental surveying, delivery, and high-altitude operations.
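For context, a cross-track error statistic like the reported mean/max can be computed as the deviation of footstep positions from the commanded straight path, as in this hedged sketch; the offsets below are invented, not measured data.

```python
# Cross-track error: lateral deviation from a commanded straight-line path.
import numpy as np

path_y = 0.0                                               # commanded line y = 0
foot_y = np.array([0.012, -0.019, 0.031, 0.008, -0.055])   # hypothetical offsets (m)

err = np.abs(foot_y - path_y)
print(f"mean: {err.mean() * 100:.1f} cm, max: {err.max() * 100:.1f} cm")
```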
https://arxiv.org/abs/2403.08340
This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs, thereby serving as a valuable resource for advancing research in this evolving area.
https://arxiv.org/abs/2403.08319
Mixed-media tutorials, which integrate videos, images, text, and diagrams to teach procedural skills, offer more browsable alternatives than timeline-based videos. However, manually creating such tutorials is tedious, and existing automated solutions are often restricted to a particular domain. While AI models hold promise, it is unclear how to effectively harness their power, given the multi-modal data involved and the vast landscape of models. We present TutoAI, a cross-domain framework for AI-assisted mixed-media tutorial creation on physical tasks. First, we distill common tutorial components by surveying existing work; then, we present an approach to identify, assemble, and evaluate AI models for component extraction; finally, we propose guidelines for designing user interfaces (UIs) that support tutorial creation based on AI-generated components. We show that TutoAI achieved higher or similar quality compared to a baseline model in preliminary user studies.
https://arxiv.org/abs/2403.08049
Strongly lensed Type Ia supernovae (LSNe Ia) are a promising probe for measuring the Hubble constant ($H_0$) directly. To use LSNe Ia for cosmography, a time-delay measurement between the multiple images, a lens-mass model, and a mass reconstruction along the line of sight are required. In this work, we present the machine learning network LSTM-FCNN, a combination of a Long Short-Term Memory network (LSTM) and a fully-connected neural network (FCNN). The LSTM-FCNN is designed to measure time delays on a sample of LSNe Ia spanning a broad range of properties, which we expect to find with the upcoming Rubin Observatory Legacy Survey of Space and Time (LSST) and for which follow-up observations are planned. With follow-up observations in the $i$ band (a cadence of one to three days with a single-epoch $5\sigma$ depth of 24.5 mag), we reach bias-free delay measurements with a precision of around 0.7 days over a large sample of LSNe Ia. The LSTM-FCNN is far more general than previous machine learning approaches such as the Random Forest (RF), where an RF has to be trained for each observational pattern separately, and yet the LSTM-FCNN outperforms the RF by a factor of roughly three. Therefore, the LSTM-FCNN is a very promising approach for achieving robust time delays in LSNe Ia, which is important for a precise and accurate constraint on $H_0$.
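A minimal sketch of an architecture matching the name LSTM-FCNN is shown below: an LSTM summarizes the multi-image light curves and a fully-connected head regresses the time delay. The feature, hidden, and sequence sizes are assumptions for illustration, not the paper's configuration.

```python
# LSTM encoder + fully-connected regression head for time-delay estimation.
import torch
import torch.nn as nn

class LSTMFCNN(nn.Module):
    def __init__(self, n_features=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, 32), nn.ReLU(),
            nn.Linear(32, 1),                 # predicted time delay (days)
        )

    def forward(self, x):                     # x: (batch, epochs, features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])

model = LSTMFCNN()
light_curves = torch.randn(8, 40, 2)          # 8 systems, 40 epochs, 2 image fluxes
print(model(light_curves).shape)              # torch.Size([8, 1])
```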
https://arxiv.org/abs/2403.08029
Acting is an important decisional function for autonomous robots. Acting relies on skills to implement and model the activities it oversees: refinement, local recovery, temporal dispatching, external asynchronous events, and command execution, all done online. While sitting between planning and the robotic platform, acting often relies on programming primitives and an interpreter that executes these skills. Following our experience in providing a formal framework to program the functional components of our robots, we propose a new language to program the acting skills. This language maps unequivocally onto a formal model, which can then be used to check properties offline, to execute the skills (or, more precisely, their formal equivalents), and to perform runtime verification. We illustrate with a real example how we can program a survey mission for a drone in this new language, prove some formal properties of the program, and directly execute the formal model on the drone to perform the mission.
https://arxiv.org/abs/2403.07770
This survey explores the adaptation of visual transformer models in Autonomous Driving, a transition inspired by their success in Natural Language Processing. Surpassing traditional Recurrent Neural Networks in tasks like sequential image processing and outperforming Convolutional Neural Networks in global context capture, as evidenced in complex scene recognition, Transformers are gaining traction in computer vision. These capabilities are crucial in Autonomous Driving for real-time, dynamic visual scene processing. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.
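Since the survey grounds its discussion in self-attention, a minimal sketch of scaled dot-product self-attention is given below; the token count and dimensions are illustrative (e.g., a sequence of image-patch embeddings), not tied to any specific model in the survey.

```python
# Scaled dot-product self-attention over a sequence of patch tokens.
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v       # attention-weighted values

d = 64
x = torch.randn(1, 196, d)                     # 196 patch tokens, dim 64
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)     # torch.Size([1, 196, 64])
```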
https://arxiv.org/abs/2403.07542