Artificially sweetened beverages like Diet Coke are often considered healthier alternatives, but the debate over their impact on obesity persists. Previous research has predominantly relied on observational data or randomized controlled trials (RCTs), which may not accurately capture the causal relationship between Diet Coke consumption and obesity. This study uses causal inference methods, employing data from the National Health and Nutrition Examination Survey (NHANES), to examine this relationship across diverse demographics. Instead of relying on RCT data, we constructed a causal graph and applied the back-door criterion with its adjustment formula to estimate the RCT distributions. We then calculated the counterfactual quantity, the Probability of Necessity and Sufficiency (PNS), using both NHANES data and the estimated RCT data. We propose that PNS is the essential metric for assessing the impact of Diet Coke on obesity. Our results indicate that between 20% and 50% of individuals, especially those with poor dietary habits, are likely to gain weight from Diet Coke. Conversely, in groups such as young females with healthier diets, only a small proportion experience weight gain due to Diet Coke. These findings highlight the influence of individual lifestyle and potential hormonal factors on the varied effects of Diet Coke, providing a new framework for understanding its nutritional impact on health.
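The back-door adjustment and the PNS bounds mentioned above are both directly computable. The following is a minimal sketch on a toy binary joint distribution with made-up numbers (not the paper's NHANES data): it derives the interventional probabilities P(Y=1 | do(X=x)) via back-door adjustment over a covariate Z, then evaluates the standard Tian–Pearl bounds on PNS.

```python
import numpy as np

# Toy joint distribution P(Z, X, Y) over binary variables, where Z is the
# back-door adjustment set. The numbers are hypothetical, for illustration only.
# p[z, x, y] = P(Z=z, X=x, Y=y); entries sum to 1.
p = np.array([[[0.20, 0.05], [0.05, 0.10]],
              [[0.05, 0.10], [0.10, 0.35]]])

p_z = p.sum(axis=(1, 2))                   # P(Z=z)
p_y1_given_xz = p[:, :, 1] / p.sum(axis=2)  # P(Y=1 | X=x, Z=z), shape (z, x)

# Back-door adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)
p_y_do_x0 = float(p_z @ p_y1_given_xz[:, 0])
p_y_do_x1 = float(p_z @ p_y1_given_xz[:, 1])

# Tight bounds on the Probability of Necessity and Sufficiency:
# max(0, P(y|do(x)) - P(y|do(x'))) <= PNS <= min(P(y|do(x)), P(y'|do(x')))
pns_lower = max(0.0, p_y_do_x1 - p_y_do_x0)
pns_upper = min(p_y_do_x1, 1.0 - p_y_do_x0)
```

With monotonicity or a combination of observational and experimental data (as in the paper), the bounds tighten further; this sketch only shows the experimental-data bounds.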
https://arxiv.org/abs/2405.10746
Diaspora communities are disproportionately impacted by off-the-radar misinformation and often neglected by mainstream fact-checking efforts, creating a critical need to scale up the efforts of nascent fact-checking initiatives. In this paper we present SynDy, a framework for Synthetic Dynamic Dataset Generation that leverages the capabilities of the largest frontier Large Language Models (LLMs) to train local, specialized language models. To the best of our knowledge, SynDy is the first work to utilize LLMs to create fine-grained synthetic labels for tasks of direct relevance to misinformation mitigation, namely Claim Matching, Topical Clustering, and Claim Relationship Classification. SynDy utilizes LLMs and social media queries to automatically generate distantly-supervised, topically-focused datasets with synthetic labels on these three tasks, providing essential tools to scale up human-led fact-checking at a fraction of the cost of human-annotated data. Training on SynDy's generated labels shows improvement over a standard baseline and is not significantly worse than training on human labels (which may be infeasible to acquire). SynDy is being integrated into Meedan's chatbot tiplines, which are used by over 50 organizations, serve over 230K users annually, and automatically distribute human-written fact-checks via messaging apps such as WhatsApp. SynDy will also be integrated into our deployed Co-Insights toolkit, enabling low-resource organizations to launch tiplines for their communities. Finally, we envision SynDy enabling additional fact-checking tools such as matching new misinformation claims to high-quality explainers on common misinformation topics.
https://arxiv.org/abs/2405.10700
Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However, we argue that while cross-modal learning is beneficial for detecting audible-visible events, in the weakly supervised scenario it negatively impacts unaligned audible or visible events by introducing irrelevant modality information. In this paper, we propose CoLeaF, a novel learning framework that optimizes the integration of cross-modal context in the embedding space such that the network explicitly learns to combine cross-modal information for audible-visible events while filtering it out for unaligned events. Additionally, as videos often involve complex class relationships, modelling them improves performance; however, doing so introduces extra computational costs into the network. Our framework is designed to leverage cross-class relationships during training without incurring additional computations at inference. Furthermore, we propose new metrics to better evaluate a method's capabilities in performing AVVP. Our extensive experiments demonstrate that CoLeaF significantly improves the state-of-the-art results by an average of 1.9% and 2.4% F-score on the LLP and UnAV-100 datasets, respectively.
https://arxiv.org/abs/2405.10690
Temporal Knowledge Graph (TKG) reasoning focuses on predicting events through historical information within snapshots distributed on a timeline. Existing studies mainly concentrate on two perspectives of leveraging the history of TKGs: capturing the evolution of each recent snapshot, or capturing correlations among global historical facts. Despite these significant accomplishments, such models still fall short in (1) investigating the influences of multi-granularity interactions across recent snapshots and (2) harnessing the expressive semantics of significant links aligned with queries throughout the entire history, especially events exerting a profound impact on the future. These inadequacies restrict the models' ability to thoroughly represent historical dependencies and future trends. To overcome these drawbacks, we propose an innovative TKG reasoning approach towards \textbf{His}torically \textbf{R}elevant \textbf{E}vents \textbf{S}tructuring ($\mathsf{HisRES}$). Concretely, $\mathsf{HisRES}$ comprises two distinctive modules excelling in structuring historically relevant events within TKGs: a multi-granularity evolutionary encoder that captures structural and temporal dependencies of the most recent snapshots, and a global relevance encoder that concentrates on crucial correlations among events relevant to queries from the entire history. Furthermore, $\mathsf{HisRES}$ incorporates a self-gating mechanism for adaptively merging multi-granularity recent and historically relevant structuring representations. Extensive experiments on four event-based benchmarks demonstrate the state-of-the-art performance of $\mathsf{HisRES}$ and indicate the superiority and effectiveness of structuring historical relevance for TKG reasoning.
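A self-gating merge of two representations is a standard construction, and a minimal sketch clarifies what "adaptively merging" means here. The dimensions, random weights, and NumPy formulation below are illustrative assumptions, not the $\mathsf{HisRES}$ implementation: a learned sigmoid gate produces per-dimension mixing coefficients, so the merged vector is a convex combination of the recent and the globally relevant embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (illustrative)
h_recent = rng.standard_normal(d)       # multi-granularity recent representation
h_global = rng.standard_normal(d)       # historically relevant representation
W = rng.standard_normal((d, 2 * d))     # gate projection (learned in practice)
b = np.zeros(d)

# Self-gate: g = sigmoid(W [h_recent; h_global] + b), applied per dimension.
gate = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([h_recent, h_global]) + b)))
h_merged = gate * h_recent + (1.0 - gate) * h_global  # convex per-dimension mix
```

Because each gate value lies in (0, 1), every coordinate of the merged vector stays between the corresponding coordinates of the two inputs, which keeps the fusion stable regardless of which encoder dominates.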
https://arxiv.org/abs/2405.10621
In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high training costs and lack of interpretability. Recently, Large Language Models (LLMs) have emerged as a promising tool for VLN due to their strong generalization capabilities. However, existing LLM-based methods face limitations in memory construction and diversity of navigation strategies. To address these challenges, we propose a suite of techniques. Firstly, we introduce a method to maintain a topological map that stores navigation history, retaining information about viewpoints, objects, and their spatial relationships. This map also serves as a global action space. Additionally, we present a Navigation Chain of Thoughts module, leveraging human navigation examples to enrich navigation strategy diversity. Finally, we establish a pipeline that integrates navigational memory and strategies with perception and action prediction modules. Experimental results on the REVERIE and R2R datasets show that our method effectively enhances the navigation ability of the LLM and improves the interpretability of navigation reasoning.
https://arxiv.org/abs/2405.10620
The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual content at the pixel level. Current RVOS methods typically use vision and language models pre-trained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language~(VL) relation modeling from scratch. Witnessing the success of Vision-Language Pre-trained (VLP) models, we propose to learn relation modeling for RVOS based on their aligned VL feature space. Nevertheless, transferring VLP models to RVOS is a deceptively challenging task due to the substantial gap between the pre-training task (image/region-level prediction) and the RVOS task (pixel-level prediction in videos). In this work, we introduce a framework named VLP-RVOS to address this transfer challenge. We first propose a temporal-aware prompt-tuning method, which not only adapts pre-trained representations for pixel-level prediction but also empowers the vision encoder to model temporal clues. We further propose to perform multi-stage VL relation modeling during and after feature extraction for comprehensive VL understanding. Besides, we customize a cube-frame attention mechanism for spatial-temporal reasoning. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms and exhibits strong generalization abilities.
https://arxiv.org/abs/2405.10610
In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction (AMI) patients demonstrates superior segmentation Dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 and 3.59/4.37, respectively). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing the diagnosis of cardiac diseases such as AMI.
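The Dice score used to compare segmentation quality above has a simple closed form, 2|A∩B| / (|A| + |B|). A minimal sketch on toy binary masks (the masks are made up, not cardiac data):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:                      # both masks empty: define as perfect
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom

# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, gt))   # 2*2 / (3+3) = 0.666...
```

A reported score such as 89.3 corresponds to this coefficient expressed as a percentage, typically averaged over cases.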
https://arxiv.org/abs/2405.10570
Large Language Models (LLMs) have transformed NLP with their remarkable In-context Learning (ICL) capabilities. Automated assistants based on LLMs are gaining popularity; however, adapting them to novel tasks is still challenging. While colossal models excel in zero-shot performance, their computational demands limit widespread use, and smaller language models struggle without context. This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. Drawing inspiration from biological neurons and the mechanistic interpretation of the Transformer architecture, we explore the potential for information sharing across tasks. We design a cross-task prompting setup with three LLMs and show that LLMs achieve significant performance improvements despite having no examples from the target task in the context. Cross-task prompting leads to a remarkable average performance boost over zero-shot prompting of 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT-3.5, and performs comparably to standard in-context learning. The effectiveness of generating pseudo-labels for in-task examples is demonstrated, and our analyses reveal a strong correlation between the effect of cross-task examples and model activation similarities in source and target input tokens. This paper offers a first-of-its-kind exploration of LLMs' ability to solve novel tasks based on contextual signals from different task examples.
https://arxiv.org/abs/2405.10548
In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks to keep pace with the fast development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capability of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. In knowledge assessment, it consists of 38K+ multiple-choice questions with associated solution explanations. These questions serve dual purposes: answer prediction and question reasoning. In application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks such as text classification, machine translation, relation extraction, reading comprehension, and text generation. Upon CFLUE, we conduct a thorough evaluation of representative LLMs. The results reveal that only GPT-4 and GPT-4-turbo achieve an accuracy exceeding 60% in answer prediction for knowledge assessment, suggesting that there is still substantial room for improvement in current LLMs. In application assessment, although GPT-4 and GPT-4-turbo are the top two performers, their considerable advantage over lightweight LLMs is noticeably diminished. The datasets and scripts associated with CFLUE are openly accessible at this https URL.
https://arxiv.org/abs/2405.10542
We introduce CNER, an ensemble of capable tools for the extraction of semantic relationships between named entities in the Spanish language. Built upon a container-based architecture, CNER integrates different Named Entity Recognition and Relation Extraction tools with a user-friendly interface that allows users to input free text or files effortlessly, facilitating streamlined analysis. Developed as a prototype version for the Natural Language Processing (NLP) Group at Universidad del Valle, CNER serves as a practical educational resource, illustrating how machine learning techniques can effectively tackle diverse NLP tasks in Spanish. Our preliminary results reveal the promising potential of CNER in advancing the understanding and development of NLP tools, particularly within Spanish-language contexts.
https://arxiv.org/abs/2405.10485
English allows for both compounds (e.g., London-made) and phrasal paraphrases (e.g., made in London). While these constructions have roughly the same truth-conditional meaning, we hypothesize that the compound allows less freedom to express the nature of the semantic relationship between the participle and the pre-participle nominal. We thus predict that the pre-participle slot is more constrained than the equivalent position in the phrasal construction. We test this prediction in a large corpus by measuring the entropy of corresponding nominal slots, conditional on the participle used. That is, we compare the entropy of $\alpha$ in compound construction slots like $\alpha$-[V]ed to the entropy of $\alpha$ in phrasal constructions like [V]ed by $\alpha$ for a given verb V. As predicted, there is significantly lower entropy in the compound construction than in the phrasal construction. We consider how these predictions follow from more general grammatical properties and processing factors.
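The conditional-entropy measurement described above is straightforward to reproduce on raw (verb, nominal) pairs. The following sketch uses invented counts, not the paper's corpus: for each participle V it computes the Shannon entropy (in bits) of the distribution over nominals filling the slot, so a more constrained slot yields lower entropy.

```python
import math
from collections import Counter

def slot_entropy(pairs):
    """For each verb V, the entropy H(alpha | V) in bits of the nominals
    alpha observed filling the slot with that verb."""
    by_verb = {}
    for verb, alpha in pairs:
        by_verb.setdefault(verb, []).append(alpha)
    entropies = {}
    for verb, alphas in by_verb.items():
        counts = Counter(alphas)
        n = sum(counts.values())
        entropies[verb] = -sum((c / n) * math.log2(c / n)
                               for c in counts.values())
    return entropies

# Hypothetical counts: compound "alpha-made" vs. phrasal "made in/by alpha"
compound = [("made", "London")] * 6 + [("made", "hand")] * 2
phrasal  = ([("made", "London")] * 2 + [("made", "hand")] * 2 +
            [("made", "Italy")] * 2 + [("made", "steel")] * 2)

h_compound = slot_entropy(compound)["made"]
h_phrasal = slot_entropy(phrasal)["made"]
```

Under the paper's hypothesis, aggregating this comparison over verbs should show `h_compound < h_phrasal`, as the toy numbers here illustrate by construction.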
https://arxiv.org/abs/2405.10457
To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiment and the cognitive pathways of the masses concerning circular products and digital technology, and to recognise the primary concerns. To achieve this, we collected data related to the CE from diverse platforms including Twitter, Reddit, and The Guardian. This comprehensive data collection spanned three distinct strata of the public: the general public, professionals, and official sources. Subsequently, we applied three topic models to the collected data. Topic modelling is a data-driven, machine learning approach to text mining, capable of automatically categorising a large number of documents into distinct semantic groups. These groups are described by topics, which aid in understanding the semantic content of documents at a high level. However, the performance of topic modelling may vary depending on different hyperparameter values. Therefore, in this study, we propose a framework for topic modelling with hyperparameter optimisation for the CE and conduct a series of systematic experiments to ensure that the topic models are set with appropriate hyperparameters and to gain insights into the correlations between the CE and public opinion based on well-established models. The results of this study indicate that concerns about sustainability and economic impact persist across all three datasets. Official sources demonstrate a higher level of engagement with the application and regulation of the CE. To the best of our knowledge, this study is pioneering in investigating various levels of public opinion concerning the CE through topic modelling with the exploration of hyperparameter optimisation.
https://arxiv.org/abs/2405.10452
We propose an approach to construct text-based time-series indices in an optimal way: typically, indices that maximize the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. We illustrate our methodology with a corpus of news articles from the Wall Street Journal by optimizing text-based indices focused on tracking the VIX index and inflation expectations. Our results highlight the superior performance of our approach compared to existing indices.
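The core optimization can be sketched in a few lines on synthetic data (the features, weights, and noise level below are invented, not the paper's WSJ corpus or method details). Because the Pearson correlation of a weighted index `X @ w` with a target is invariant to rescaling `w`, the least-squares weights on mean-centered data are one maximizer of the contemporaneous correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 200, 5                         # time steps, number of text features
X = rng.standard_normal((T, K))       # e.g. per-period topic/term frequencies
true_w = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ true_w + 0.1 * rng.standard_normal(T)   # target series, e.g. the VIX

# Center, then solve least squares: the resulting direction maximizes the
# contemporaneous Pearson correlation of the index X @ w with the target y.
Xc, yc = X - X.mean(axis=0), y - y.mean()
w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
index = X @ w
corr = np.corrcoef(index, y)[0, 1]
```

Optimizing for predictive (rather than contemporaneous) performance would instead regress a lead of the target on the features, with out-of-sample validation.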
https://arxiv.org/abs/2405.10449
The evolution of Explainable Artificial Intelligence (XAI) has emphasised the significance of meeting diverse user needs. The approaches to identifying and addressing these needs must also advance, recognising that explanation experiences are subjective, user-centred processes that interact with users towards a better understanding of AI decision-making. This paper delves into the interrelations in multi-faceted XAI and examines how different types of explanations collaboratively meet users' XAI needs. We introduce the Intent Fulfilment Framework (IFF) for creating explanation experiences. The novelty of this paper lies in recognising the importance of "follow-up" on explanations for obtaining clarity, verification and/or substitution. Moreover, the Explanation Experience Dialogue Model integrates the IFF and "Explanation Followups" to provide users with a conversational interface for exploring their explanation needs, thereby creating explanation experiences. Quantitative and qualitative findings from our comparative user study demonstrate the impact of the IFF in improving user engagement, the utility of the AI system and the overall user experience. Overall, we reinforce the principle that "one explanation does not fit all" to create explanation experiences that guide the complex interaction through conversation.
https://arxiv.org/abs/2405.10446
Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between search and template, we hypothesize that the receptive field of the input convolutional bounding box network plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module installed on the recent ODTrack outperforms the latter on three benchmarks: the GOT-10k, the UAV123 and the OTB2015.
https://arxiv.org/abs/2405.10444
We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal relations. To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs. To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component. Extensive experiments on the new dataset show that our method can serve as a strong baseline for future research on PSG-4D. In the end, we provide a real-world application example to demonstrate how we can achieve dynamic scene understanding by integrating a large language model into our PSG-4D system.
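The node-and-edge abstraction described above can be made concrete with a small data-structure sketch. The field names and types below are illustrative assumptions, not the actual PSG-4D annotation schema: nodes carry an entity's category and its per-frame 3D location, and edges carry a predicate over a frame interval.

```python
from dataclasses import dataclass, field

@dataclass
class Node4D:
    """An entity in the 4D scene, with location/status over time."""
    node_id: int
    category: str                        # e.g. "person", "cup" (illustrative)
    # frame index -> (x, y, z) centroid of the entity's panoptic mask
    trajectory: dict = field(default_factory=dict)

@dataclass
class Edge4D:
    """A temporal relation between two entities."""
    subject: int                         # node_id of the subject
    obj: int                             # node_id of the object
    predicate: str                       # e.g. "picks_up" (illustrative)
    frames: tuple                        # (start_frame, end_frame) of the relation

# A toy two-node, one-relation 4D scene graph
nodes = [Node4D(0, "person", {0: (1.0, 0.0, 0.0)}),
         Node4D(1, "cup", {0: (1.2, 0.1, 0.8)})]
edges = [Edge4D(0, 1, "picks_up", (3, 10))]
```

In the full task, the trajectories come from panoptic masks tracked along the time axis, and the edges are predicted by the relation component of PSG4DFormer.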
https://arxiv.org/abs/2405.10305
This paper investigates the dynamics of a deep neural network (DNN) learning interactions. Previous studies have discovered and mathematically proven that, given each input sample, a well-trained DNN usually encodes only a small number of interactions (non-linear relationships) between input variables in the sample. A series of theorems has been derived to prove that the DNN's inference can be considered equivalent to using these interactions as primitive patterns for inference. In this paper, we discover that the DNN learns interactions in two phases. The first phase mainly penalizes interactions of medium and high orders, and the second phase mainly learns interactions of gradually increasing orders. We can consider the two-phase phenomenon as the starting point of a DNN learning over-fitted features. Such a phenomenon has been widely observed in DNNs with various architectures trained for different tasks. Therefore, the discovery of the two-phase dynamics provides a detailed mechanism for how a DNN gradually learns different inference patterns (interactions). In particular, we have also verified the claim that high-order interactions have weaker generalization power than low-order interactions. Thus, the discovered two-phase dynamics also explains how the generalization power of a DNN changes during the training process.
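The interactions in question are commonly formalized as Harsanyi interactions, which can be computed exactly for a toy model. The sketch below is a generic illustration of that definition, not the paper's code: given a model output v(T) when only the variables in coalition T are present, the interaction effect of a set S is an inclusion-exclusion sum over its subsets, and its "order" is |S|.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s, from the empty set up to s itself."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def harsanyi_interactions(v, variables):
    """I(S) = sum over T subseteq S of (-1)^(|S|-|T|) v(T): the interaction
    effect attributed to each coalition S, given model outputs v(T)."""
    return {S: sum((-1) ** (len(S) - len(T)) * v(frozenset(T))
                   for T in subsets(S))
            for S in (frozenset(S) for S in subsets(variables))}

# Toy AND-style model: the output fires only when both inputs are present,
# so all effect should land on the order-2 interaction {0, 1}.
def v(T):
    return 1.0 if {0, 1} <= T else 0.0

I = harsanyi_interactions(v, (0, 1))
```

A defining property (and a handy sanity check) is that the interactions sum back to the full-model output, `sum(I.values()) == v(all variables)`; tracking how the mass of I shifts toward higher orders during training is what reveals the two phases.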
https://arxiv.org/abs/2405.10262
Scientific document summarization has been a challenging task due to the long structure of the input text. The long input hinders the simultaneous effective modeling of both global high-order relations between sentences and local intra-sentence relations, which is the most critical step in extractive summarization. However, existing methods mostly focus on one type of relation, neglecting the simultaneous effective modeling of both, which can lead to insufficient learning of semantic representations. In this paper, we propose HAESum, a novel approach utilizing graph neural networks to model documents locally and globally based on their hierarchical discourse structure. First, intra-sentence relations are learned using a local heterogeneous graph. Subsequently, a novel hypergraph self-attention layer is introduced to further enhance the characterization of high-order inter-sentence relations. We validate our approach on two benchmark datasets, and the experimental results demonstrate the effectiveness of HAESum and the importance of considering hierarchical structures in modeling long scientific documents. Our code will be available at \url{this https URL}
https://arxiv.org/abs/2405.10202
In autonomous driving, accurately interpreting the movements of other road users and leveraging this knowledge to forecast future trajectories is crucial. This is typically achieved through the integration of map data and tracked trajectories of various agents. Numerous methodologies combine this information into a singular embedding for each agent, which is then utilized to predict future behavior. However, these approaches have a notable drawback in that they may lose exact location information during the encoding process. Although the encoding still includes general map information, the generation of valid and consistent trajectories is not guaranteed, which can cause the predicted trajectories to stray from the actual lanes. This paper introduces a new refinement module designed to project the predicted trajectories back onto the actual map, rectifying these discrepancies and leading towards more consistent predictions. This versatile module can be readily incorporated into a wide range of architectures. Additionally, we propose a novel scene encoder that handles all relations between agents and their environment in a single unified heterogeneous graph attention network. By analyzing the attention values on the different edges in this graph, we can gain unique insights into the neural network's inner workings, leading towards a more explainable prediction.
https://arxiv.org/abs/2405.10134
Event Stream Super-Resolution (ESR) aims to address the challenge of insufficient spatial resolution in event streams, which holds great significance for the application of event cameras in complex scenarios. Previous works for ESR often process positive and negative events in a mixed paradigm. This paradigm limits their ability to effectively model the unique characteristics of each event type and to mutually refine each other by considering their correlations. In this paper, we propose a bilateral event mining and complementary network (BMCNet) to fully leverage the potential of each event type and simultaneously capture the shared information to complement each other. Specifically, we resort to a two-stream network to accomplish comprehensive mining of each type of events individually. To facilitate the exchange of information between the two streams, we propose a bilateral information exchange (BIE) module. This module is layer-wisely embedded between the two streams, enabling the effective propagation of hierarchical global information while alleviating the impact of invalid information brought by the inherent characteristics of events. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods in ESR, achieving performance improvements of over 11% on both real and synthetic datasets. Moreover, our method significantly enhances the performance of event-based downstream tasks such as object recognition and video reconstruction. Our code is available at this https URL.
https://arxiv.org/abs/2405.10037