In the e-commerce domain, the accurate extraction of attribute-value pairs from product listings (e.g., Brand: Apple) is crucial for enhancing search and recommendation systems. The automation of this extraction process is challenging due to the vast diversity of product categories and their respective attributes, compounded by the lack of extensive, accurately annotated training datasets and the demand for low latency to meet the real-time needs of e-commerce platforms. To address these challenges, we introduce GenToC, a novel two-stage model for extracting attribute-value pairs from product titles. GenToC is designed to train with partially-labeled data, leveraging incomplete attribute-value pairs and obviating the need for a fully annotated dataset. Moreover, we introduce a bootstrapping method that enables GenToC to progressively refine and expand its training dataset. This enhancement substantially improves the quality of data available for training other neural network models that are typically faster but are inherently less capable than GenToC in terms of their capacity to handle partially-labeled data. By supplying an enriched dataset for training, GenToC significantly advances the performance of these alternative models, making them more suitable for real-time deployment. Our results highlight the unique capability of GenToC to learn from a limited set of labeled data and to contribute to the training of more efficient models, marking a significant leap forward in the automated extraction of attribute-value pairs from product titles. GenToC has been successfully integrated into India's largest B2B e-commerce platform, this http URL, achieving a significant increase of 21.1% in recall over the existing deployed system while maintaining a high precision of 89.5% in this challenging task.
https://arxiv.org/abs/2405.10918
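The bootstrapping idea described above — train on partial labels, then keep the model's own high-confidence extractions as additional labels — can be sketched as follows. Everything here (the `LookupExtractor` stand-in, the function names, the confidence threshold) is a hypothetical illustration, not GenToC's actual model or API:

```python
class LookupExtractor:
    """Toy stand-in for the extraction model: it memorizes which surface
    value maps to which attribute from the labels it has seen."""
    def fit(self, titles, labels):
        self.value2attr = {v.lower(): a
                           for pairs in labels.values()
                           for a, v in pairs.items()}

    def extract(self, title):
        # Return (attribute, value, confidence) triples for known values.
        return [(self.value2attr[tok], tok, 1.0)
                for tok in title.lower().split()
                if tok in self.value2attr]

def bootstrap(model, titles, partial_labels, rounds=2, threshold=0.9):
    """Bootstrapping loop: retrain on the current (partial) labels, then
    keep any new high-confidence extraction as an extra training label."""
    labels = {t: dict(p) for t, p in partial_labels.items()}
    for _ in range(rounds):
        model.fit(titles, labels)
        for title in titles:
            for attr, value, score in model.extract(title):
                if score >= threshold:
                    labels[title].setdefault(attr, value)
    return labels

titles = ["Apple iPhone Red", "Samsung Galaxy Red"]
partial = {titles[0]: {"Brand": "Apple"}, titles[1]: {"Color": "Red"}}
labels = bootstrap(LookupExtractor(), titles, partial)
```

After one round, the value-to-attribute knowledge learned from the second title densifies the labels of the first, which is the mechanism that lets the enriched dataset train faster downstream models.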
Large Language Models (LLMs) constitute a breakthrough state-of-the-art Artificial Intelligence (AI) technology which is rapidly evolving and promises to aid in medical diagnosis, either by assisting doctors or by simulating a doctor's workflow in more advanced and complex implementations. In this technical paper, we outline the Cognitive Network Evaluation Toolkit for Medical Domains (COGNET-MD), which constitutes a novel benchmark for LLM evaluation in the medical domain. Specifically, we propose a scoring framework with increased difficulty to assess the ability of LLMs in interpreting medical text. The proposed framework is accompanied by a database of Multiple Choice Quizzes (MCQs). To ensure alignment with current medical trends and enhance safety, usefulness, and applicability, these MCQs have been constructed in collaboration with several associated medical experts in various medical domains and are characterized by varying degrees of difficulty. The current (first) version of the database includes the medical domains of Psychiatry, Dentistry, Pulmonology, Dermatology and Endocrinology, but it will be continuously extended and expanded to include additional medical domains.
https://arxiv.org/abs/2405.10893
The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, leading to considerable duplicated effort. Hence, as an example of how to overcome the issue, we built a framework for general analysis of galaxy images, based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratio of galaxy images and the imbalanced distribution of galaxy categories, we have incorporated a Human-in-the-loop (HITL) module into our large vision model, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability to all the abovementioned tasks on galaxy images in the DESI legacy imaging surveys. Specifically, for object detection, trained with 1,000 data points, our DST upon the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%; for morphology classification, to obtain AUC ~0.9, LVM plus DST and HITL requires only 1/50 of the training set that ResNet18 needs. As expected, multimodal data can be integrated similarly, which opens up possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-messenger astronomy.
https://arxiv.org/abs/2405.10890
This review aims to systematically assess the current status and prospects of artificial intelligence (AI) in the rehabilitation management of patients with schizophrenia and its impact on the rehabilitation process. We selected 70 studies from 2012 to the present, focusing on the applications, technology categories, products, and data types of machine learning, deep learning, reinforcement learning, and other technologies in mental health interventions and management. The results indicate that AI can be widely used in symptom monitoring, relapse risk prediction, and rehabilitation treatment by analyzing ecological momentary assessment, behavioral, and speech data. This review further explores the potential challenges and future directions of emerging AI-based products, technologies, and analytical methods, such as social media analysis, serious games, and large language models in rehabilitation. In summary, this study systematically reviews the application status of AI in schizophrenia rehabilitation management and provides valuable insights and recommendations for future research paths.
https://arxiv.org/abs/2405.10883
Time series (TS) forecasting has become an unprecedentedly popular problem in recent years, with ubiquitous applications in both scientific and business fields. Various approaches have been introduced for time series analysis, including both statistical approaches and deep neural networks. Although neural network approaches have demonstrated a stronger ability of representation than statistical methods, they struggle to provide sufficient interpretability and can be too complicated to optimize. In this paper, we present WEITS, a frequency-aware deep learning framework that is highly interpretable and computationally efficient. Through multi-level wavelet decomposition, WEITS infuses frequency analysis into a deep learning framework in a novel way. Combined with a forward-backward residual architecture, it enjoys both high representation capability and statistical interpretability. Extensive experiments on real-world datasets have demonstrated the competitive performance of our model, along with its additional advantage of high computational efficiency. Furthermore, WEITS provides a general framework that can seamlessly integrate with state-of-the-art approaches for time series forecasting.
https://arxiv.org/abs/2405.10877
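The multi-level wavelet decomposition at the heart of WEITS can be illustrated with the simplest orthonormal wavelet (Haar). This is a generic sketch of the decomposition step — the abstract does not say which wavelet family WEITS actually uses:

```python
def haar_step(signal):
    """One Haar analysis step: split a series into a low-frequency
    approximation band and a high-frequency detail band (orthonormal)."""
    s = 0.5 ** 0.5
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def multilevel_haar(signal, levels):
    """Multi-level decomposition: recurse on the approximation band,
    collecting one detail band per frequency level."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    return approx, bands

ts = [4.0, 2.0, 6.0, 8.0, 3.0, 1.0, 7.0, 5.0]
approx, details = multilevel_haar(ts, 2)
```

Because the transform is orthonormal, the total energy of the series is preserved across the bands, which is what makes the per-frequency components statistically interpretable.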
One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often have seemingly little ideological overlap. We define the task of tailoring vaccine interventions to a Common-Ground Opinion (CGO). Tailoring responses to a CGO involves meaningfully improving the answer by relating it to an opinion or belief the reader holds. In this paper we introduce TAILOR-CGO, a dataset for evaluating how well responses are tailored to provided CGOs. We benchmark several major LLMs on this task, finding that GPT-4-Turbo performs significantly better than the others. We also build automatic evaluation metrics, including an efficient and accurate BERT model that outperforms finetuned LLMs, investigate how to successfully tailor vaccine messaging to CGOs, and provide actionable recommendations from this investigation. Code and model weights: this https URL. Dataset: this https URL
https://arxiv.org/abs/2405.10861
Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training. Federated learning (FL) has the potential to unleash the majority of the planet's data and computational resources, which are underutilized by the data-center-focused training methodology of current LLM practice. Our work presents a robust, flexible, reproducible FL approach that enables large-scale collaboration across institutions to train LLMs. This would mobilize more computational and data resources while matching or potentially exceeding centralized performance. We further show that the effectiveness of federated training scales with model size and present our approach for training a billion-scale federated LLM using limited resources. This will help data-rich actors to become the protagonists of LLM pre-training instead of leaving the stage to compute-rich actors alone.
https://arxiv.org/abs/2405.10853
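The cross-institution training pattern the abstract describes rests on federated averaging. A minimal sketch of generic FedAvg on a toy one-parameter regression (this is the textbook algorithm, not the paper's actual system, and the client data are invented) looks like this:

```python
def local_step(w, data, lr=0.1, epochs=5):
    """One client's local training: SGD on squared error for y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def fed_avg(w0, client_data, rounds=20):
    """Server loop: broadcast the global model, then average the returned
    client models weighted by each client's number of examples."""
    w = w0
    for _ in range(rounds):
        updates = [(local_step(w, d), len(d)) for d in client_data]
        total = sum(n for _, n in updates)
        w = sum(wi * n for wi, n in updates) / total
    return w

# Two clients whose data both come from y = 3x; FedAvg should recover w close to 3
# without either client's raw data ever leaving its institution.
clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
w = fed_avg(0.0, clients)
```

Only model parameters cross the institutional boundary, which is what lets FL tap data that a centralized pipeline cannot collect.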
The Shapley value (SV) is a prevalent approach for allocating credit to machine learning (ML) entities to understand black-box ML models. Enriching such interpretations with higher-order interactions is indispensable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least squares (WLS) objective, an extension of this result to the SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as the solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.
https://arxiv.org/abs/2405.10852
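For intuition, the SV that the WLS view above recovers can be computed exactly on a toy game by enumerating coalitions (tractable only for a handful of players). This sketch shows the plain SV via its marginal-contribution definition, not the paper's KernelSHAP-IQ algorithm:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: for each player, average the marginal
    contribution v(S + {i}) - v(S) over all coalitions S not containing i,
    with the standard |S|!(n-|S|-1)!/n! weights."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi[i] = total
    return phi

# Toy game: a coalition is worth 1 exactly when it contains both players 0 and 1.
v = lambda S: 1.0 if {0, 1} <= S else 0.0
phi = shapley_values([0, 1, 2], v)
```

Players 0 and 1 each receive 0.5 and player 2 receives 0, and the values sum to the grand coalition's worth (the efficiency axiom) — the same allocation the WLS objective produces.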
Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g. by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem OOP: Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
https://arxiv.org/abs/2405.10768
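One of the paper's two algorithms builds on optimal strategies of the POMDP's underlying Markov decision process. As a generic illustration of that ingredient (the two-state MDP below is hypothetical, not an example from the paper), value iteration on the fully observable model looks like this:

```python
def value_iteration(states, actions, P, R, gamma=0.95, iters=500):
    """Value iteration on the underlying (fully observable) MDP; its optimal
    values bound what any observation-limited strategy can achieve."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    for a in actions)
             for s in states}
    return V

# Tiny hypothetical example: action "b" switches state, action "a" stays put;
# staying in s1 pays 2 per step, switching out of s0 pays 1 once.
states, actions = ["s0", "s1"], ["a", "b"]
P = {"s0": {"a": {"s0": 1.0}, "b": {"s1": 1.0}},
     "s1": {"a": {"s1": 1.0}, "b": {"s0": 1.0}}}
R = {"s0": {"a": 0.0, "b": 1.0}, "s1": {"a": 2.0, "b": 0.0}}
V = value_iteration(states, actions, P, R)
```

Here the optimal policy moves to s1 and stays, giving V(s1) = 2 / (1 - 0.95) = 40 and V(s0) = 1 + 0.95 · 40 = 39; such values provide the reference against which sensor budgets can be traded off.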
Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue for the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each input feature a score proportional to its contribution to the output, have been proposed to determine the parts of the input which a model values most. Despite a considerable body of work on the evaluation of saliency methods, whether the results of various evaluation metrics agree with human cognition remains an open question. In this study, we propose a new human-based method to evaluate saliency methods in NLP by crowdsourcing. We recruited 800 crowd workers and empirically evaluated seven saliency methods on two datasets with the proposed method. We analyzed the performance of the saliency methods, compared our results with existing automated evaluation methods, and identified notable differences between the NLP and computer vision (CV) fields when using saliency methods. The instance-level data of our crowdsourced experiments and the code to reproduce the explanations are available at this https URL.
https://arxiv.org/abs/2405.10767
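As a concrete example of the kind of method being evaluated, here is leave-one-out (occlusion) saliency on a toy bag-of-words scorer. This is a generic illustration only — the abstract does not name the seven methods the study evaluates, and the weighted scorer stands in for a real model:

```python
def occlusion_saliency(tokens, score_fn):
    """Occlusion saliency: each token's score is the drop in the model's
    output when that token is removed from the input."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

# Hypothetical bag-of-words sentiment scorer standing in for a real NLP model.
weights = {"great": 2.0, "movie": 0.1, "not": -1.5}
score = lambda toks: sum(weights.get(t, 0.0) for t in toks)
sal = occlusion_saliency(["not", "a", "great", "movie"], score)
```

The saliency scores recover each token's weight ("great" scores highest, "not" negative), which is exactly the per-feature attribution a human evaluator would be asked to judge.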
In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. The discourse initially scrutinizes conventional financial risk preemptive models, such as ARMA, ARCH, and Logistic regression models, critically analyzing their real-world applications. Subsequently, the exposition elaborates on the construction process of the BP neural network model, encompassing network architecture design, activation function selection, parameter initialization, and objective function construction. Through comparative analysis, the superiority of neural network models in preempting credit risk in commercial banks is elucidated. The experimental segment selects specific bank data, validating the model's predictive accuracy and practicality. Research findings evince that this model efficaciously enhances the foresight and precision of credit risk management.
https://arxiv.org/abs/2405.10762
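The BP construction steps the abstract lists (architecture design, activation selection, parameter initialization, objective construction) can be made concrete with a deliberately tiny network trained by hand-rolled backpropagation. The two normalized features and default labels below are invented for illustration, not real bank data:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy data: two normalized balance-sheet features -> default flag.
data = [([0.9, 0.8], 1.0), ([0.1, 0.2], 0.0), ([0.8, 0.7], 1.0), ([0.2, 0.1], 0.0)]

# Architecture: 2 inputs, 3 sigmoid hidden units, 1 sigmoid output; random init.
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [0.0] * 3
W2 = [random.uniform(-1, 1) for _ in range(3)]
b2 = 0.0

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def epoch(lr=0.5):
    """One backpropagation pass over the data: squared-error objective,
    gradients pushed from the output layer back through the hidden layer."""
    global b2
    loss = 0.0
    for x, t in data:
        h, y = forward(x)
        loss += (y - t) ** 2
        dy = 2 * (y - t) * y * (1 - y)           # gradient at output pre-activation
        for j in range(3):
            dh = dy * W2[j] * h[j] * (1 - h[j])  # backprop into hidden unit j
            W2[j] -= lr * dy * h[j]
            b1[j] -= lr * dh
            for i in range(2):
                W1[j][i] -= lr * dh * x[i]
        b2 -= lr * dy
    return loss

losses = [epoch() for _ in range(200)]
```

The training loss falls over the epochs and the network ends up scoring the high-risk profiles above the low-risk ones, which is the behavior the credit-risk model relies on.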
Artificially sweetened beverages like Diet Coke are often considered healthier alternatives, but the debate over their impact on obesity persists. Previous research has predominantly relied on observational data or randomized controlled trials (RCTs), which may not accurately capture the causal relationship between Diet Coke consumption and obesity. This study uses causal inference methods, employing data from the National Health and Nutrition Examination Survey (NHANES), to examine this relationship across diverse demographics. Instead of relying on RCT data, we constructed a causal graph and applied the back-door criterion with its adjustment formula to estimate the RCT distributions. We then calculated the counterfactual quantity, the Probability of Necessity and Sufficiency (PNS), using both NHANES data and the estimated RCT data. We propose that PNS is the essential metric for assessing the impact of Diet Coke on obesity. Our results indicate that between 20% and 50% of individuals, especially those with poor dietary habits, are more likely to gain weight from Diet Coke. Conversely, in groups like young females with healthier diets, only a small proportion experience weight gain due to Diet Coke. These findings highlight the influence of individual lifestyle and potential hormonal factors on the varied effects of Diet Coke, providing a new framework for understanding its nutritional impacts on health.
https://arxiv.org/abs/2405.10746
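The back-door adjustment and the experimental-distribution bounds on PNS can be sketched on a toy discrete model. The joint table below (one binary confounder Z, e.g. dietary habit; treatment X; outcome Y) is entirely invented for illustration, and the PNS bounds shown are the standard ones derivable from the two interventional distributions alone, not the paper's tighter estimates:

```python
# Hypothetical joint distribution P(z, x, y) over binary variables.
P = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05, (0, 1, 0): 0.08, (0, 1, 1): 0.02,
    (1, 0, 0): 0.10, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.30,
}

def p(pred):
    """Probability of the event described by the predicate pred(z, x, y)."""
    return sum(v for k, v in P.items() if pred(*k))

def p_y_do_x(x):
    """Back-door adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)."""
    total = 0.0
    for z in (0, 1):
        pz = p(lambda zz, xx, yy, z=z: zz == z)
        pxz = p(lambda zz, xx, yy, z=z: zz == z and xx == x)
        pyxz = p(lambda zz, xx, yy, z=z: zz == z and xx == x and yy == 1)
        total += (pyxz / pxz) * pz
    return total

py1 = p_y_do_x(1)  # estimated P(Y=1 | do(X=1))
py0 = p_y_do_x(0)  # estimated P(Y=1 | do(X=0))
# Classical bounds on the Probability of Necessity and Sufficiency from
# the estimated experimental distributions alone:
pns_lower = max(0.0, py1 - py0)
pns_upper = min(py1, 1.0 - py0)
```

The adjustment de-confounds the observed association through Z, and the resulting interval [pns_lower, pns_upper] is the kind of individual-level causal quantity the study's 20%-to-50% finding refers to.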
Knowledge-intensive tasks pose a significant challenge for Machine Learning (ML) techniques. Commonly adopted methods, such as Large Language Models (LLMs), often exhibit limitations when applied to such tasks. Nevertheless, there have been notable endeavours to mitigate these challenges, with a significant emphasis on augmenting LLMs through Knowledge Graphs (KGs). While KGs provide many advantages for representing knowledge, their development costs can deter extensive research and applications. Addressing this limitation, we introduce a framework for enriching the embeddings of small-scale domain-specific Knowledge Graphs with well-established general-purpose KGs. Adopting our method, a modest domain-specific KG can benefit from a performance boost in downstream tasks when linked to a substantial general-purpose KG. Experimental evaluations demonstrate a notable enhancement, with up to a 44% increase observed in the Hits@10 metric. This relatively unexplored research direction can catalyze more frequent incorporation of KGs in knowledge-intensive tasks, resulting in more robust, reliable ML implementations which hallucinate less than prevalent LLM solutions. Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning
https://arxiv.org/abs/2405.10745
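The Hits@10 metric reported above is the standard link-prediction measure: the fraction of queries for which the gold entity lands in the model's top-10 ranking. A minimal sketch (the query data are fabricated placeholders, not the paper's benchmark):

```python
def hits_at_k(ranked_candidates, true_entity, k=10):
    """Hits@k for one link-prediction query: 1 if the gold entity
    appears among the top-k ranked candidates, else 0."""
    return int(true_entity in ranked_candidates[:k])

def mean_hits_at_k(queries, k=10):
    """Average Hits@k over (ranking, gold entity) query pairs."""
    return sum(hits_at_k(ranking, gold, k) for ranking, gold in queries) / len(queries)

# Hypothetical queries: each is (ranking over entity ids, gold entity id).
queries = [
    (list(range(100)), 3),   # gold ranked 4th  -> hit
    (list(range(100)), 42),  # gold ranked 43rd -> miss
]
score = mean_hits_at_k(queries, k=10)
```

A "44% increase in Hits@10" then means the enriched embeddings push the gold entity into the top ten for proportionally more queries than the unenriched baseline.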
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: this https URL.
https://arxiv.org/abs/2405.10739
AI has become pervasive in recent years, but state-of-the-art approaches predominantly neglect the need for AI systems to be contestable. Contestability is, however, advocated by AI guidelines (e.g. by the OECD) and regulation of automated decision-making (e.g. the GDPR). In this position paper we explore how contestability can be achieved computationally in and for AI. We argue that contestable AI requires dynamic (human-machine and/or machine-machine) explainability and decision-making processes, whereby machines can (i) interact with humans and/or other machines to progressively explain their outputs and/or their reasoning, as well as assess grounds for contestation provided by these humans and/or other machines, and (ii) revise their decision-making processes to redress any issues successfully raised during contestation. Given that much of the current AI landscape is tailored to static AIs, the need to accommodate contestability will require a radical rethinking that, we argue, computational argumentation is ideally suited to support.
https://arxiv.org/abs/2405.10729
Coreference resolution, critical for identifying textual expressions that refer to the same entity, faces challenges in pronoun resolution, particularly in identifying pronoun antecedents. Existing methods often treat pronoun resolution as a task separate from mention detection, potentially missing valuable information. This study proposes the first end-to-end neural network system for Persian pronoun resolution, leveraging pre-trained Transformer models like ParsBERT. Our system jointly optimizes both mention detection and antecedent linking, achieving a 3.37-point F1 improvement over the previous state-of-the-art system (which relied on rule-based and statistical methods) on the Mehr corpus. This significant improvement demonstrates the effectiveness of combining neural networks with linguistic models, potentially marking a significant advancement in Persian pronoun resolution and paving the way for further research in this under-explored area.
https://arxiv.org/abs/2405.10714
Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and is too elusive to be accurately predicted. This is mostly due to the scalability and variability of the web of environmental parameters that directly or indirectly causes the onset of different categories of drought. Since the dawn of man, efforts have been made to understand the natural indicators that provide signs of likely environmental events. These indicators and signs, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in the field of agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge for a more accurate environmental forecasting system, in order to incorporate diverse environmental information for a reliable drought forecast. Hence, the core objective of this research is the development of a semantics-based data integration middleware that encompasses and integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from the domain experts is transformed into rules, which are used for performing deductive inference in conjunction with sensor data to determine the onset of drought through an automated inference generation module of the middleware. The semantic middleware incorporates, inter alia, a distributed architecture that consists of a streaming data processing engine based on Apache Kafka for real-time stream processing; a rule-based reasoning module; and an ontology module for semantic representation of the knowledge bases.
https://arxiv.org/abs/2405.10713
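The deductive-inference step — indigenous-knowledge rules fired against incoming sensor facts — can be sketched as plain forward chaining. The two rules and the fact names below are invented placeholders, not the middleware's actual rule base or ontology terms:

```python
# Hypothetical rules: (set of premises, conclusion). The first encodes an
# indigenous indicator combined with a rainfall condition; the second derives
# that condition from a sensor threshold.
rules = [
    ({"low_rainfall", "early_bird_migration"}, "drought_onset_likely"),
    ({"soil_moisture_below_threshold"}, "low_rainfall"),
]

def infer(facts, rules):
    """Forward chaining: fire every rule whose premises all hold,
    repeating until no new fact can be derived (a fixpoint)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# A sensor stream reports low soil moisture; local experts report the
# indigenous indicator. The inference module derives the drought warning.
derived = infer({"soil_moisture_below_threshold", "early_bird_migration"}, rules)
```

In the deployed architecture this loop would be driven by facts arriving from the Kafka stream rather than a static set, but the chaining logic is the same.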
Diaspora communities are disproportionately impacted by off-the-radar misinformation and often neglected by mainstream fact-checking efforts, creating a critical need to scale up the efforts of nascent fact-checking initiatives. In this paper we present SynDy, a framework for Synthetic Dynamic Dataset Generation that leverages the capabilities of the largest frontier Large Language Models (LLMs) to train local, specialized language models. To the best of our knowledge, SynDy is the first work utilizing LLMs to create fine-grained synthetic labels for tasks of direct relevance to misinformation mitigation, namely Claim Matching, Topical Clustering, and Claim Relationship Classification. SynDy utilizes LLMs and social media queries to automatically generate distantly-supervised, topically-focused datasets with synthetic labels on these three tasks, providing essential tools to scale up human-led fact-checking at a fraction of the cost of human-annotated data. Training on SynDy's generated labels shows improvement over a standard baseline and is not significantly worse than training on human labels (which may be infeasible to acquire). SynDy is being integrated into Meedan's chatbot tiplines, which are used by over 50 organizations, serve over 230K users annually, and automatically distribute human-written fact-checks via messaging apps such as WhatsApp. SynDy will also be integrated into our deployed Co-Insights toolkit, enabling low-resource organizations to launch tiplines for their communities. Finally, we envision SynDy enabling additional fact-checking tools such as matching new misinformation claims to high-quality explainers on common misinformation topics.
https://arxiv.org/abs/2405.10700
Our study focuses on comparing the performance and resource requirements of different Long Short-Term Memory (LSTM) neural network architectures and a specialized ANN architecture for forex market prediction. We analyze the execution time of the models as well as the resources consumed, such as memory and computational power. Our aim is to demonstrate that the specialized architecture not only achieves better results in forex market prediction but also executes using fewer resources and in a shorter time frame than the LSTM architectures. This comparative analysis will provide significant insights into the suitability of these two types of architectures for time series prediction in the forex market environment.
https://arxiv.org/abs/2405.10679
With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI and capable of minute-level world-simulative abilities, can be considered a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we embark from the perspective of disassembling Sora in text-to-video generation, and conduct a comprehensive review of the literature, trying to answer the question, "From Sora What We Can See". Specifically, after basic preliminaries regarding the general algorithms are introduced, the literature is categorized along three mutually perpendicular dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Last but not least, we identify several challenges and open problems in this domain and propose potential future directions for research and development.
https://arxiv.org/abs/2405.10674