One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often have seemingly little ideological overlap. We define the task of tailoring vaccine interventions to a Common-Ground Opinion (CGO). Tailoring responses to a CGO involves meaningfully improving the answer by relating it to an opinion or belief the reader holds. In this paper we introduce TAILOR-CGO, a dataset for evaluating how well responses are tailored to provided CGOs. We benchmark several major LLMs on this task, finding that GPT-4-Turbo performs significantly better than the others. We also build automatic evaluation metrics, including an efficient and accurate BERT model that outperforms finetuned LLMs, investigate how to successfully tailor vaccine messaging to CGOs, and provide actionable recommendations from this investigation. Code and model weights: this https URL. Dataset: this https URL
https://arxiv.org/abs/2405.10861
Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depends on one or a few words, even if the sentence is long. Our work studies this key property, dubbed word sensitivity (WS), in the prototypical setting of random features. We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. The argument critically exploits the role of the softmax in the attention layer, highlighting its benefit compared to other activations (e.g., ReLU). In contrast, the WS of standard random features is of order $1/\sqrt{n}$, $n$ being the number of words in the textual sample, and thus it decays with the length of the context. We then translate these results on word sensitivity into generalization bounds: due to their low WS, random features provably cannot learn to distinguish between two sentences that differ only in a single word; in contrast, due to their high WS, random attention features have higher generalization capabilities. We validate our theoretical results with experimental evidence over the BERT-Base word embeddings of the IMDB review dataset.
https://arxiv.org/abs/2402.02969
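The $1/\sqrt{n}$ decay of standard random features can be seen in a toy computation: under mean pooling, a perturbation of a single word is diluted by the $1/n$ average. The sketch below is an illustrative stand-in, not the paper's code; the dimensions, the ReLU feature map, and the unit perturbation are arbitrary choices made for the demo.

```python
import math
import random

random.seed(0)

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def mean_pooled_features(words, W):
    # Standard random features, mean-pooled over the n words:
    # phi(X) = (1/n) * sum_i ReLU(W x_i)
    feats = [relu(matvec(W, w)) for w in words]
    n, m = len(words), len(feats[0])
    return [sum(f[j] for f in feats) / n for j in range(m)]

def word_sensitivity(n, d=8, m=32):
    # Relative change of the feature map when a single word is perturbed.
    W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    words = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    base = mean_pooled_features(words, W)
    perturbed = [w[:] for w in words]
    perturbed[0] = [x + 1.0 for x in perturbed[0]]  # perturb one word only
    diff = [a - b for a, b in zip(mean_pooled_features(perturbed, W), base)]
    return l2(diff) / l2(base)

ws_short, ws_long = word_sensitivity(4), word_sensitivity(64)
print(ws_short, ws_long)  # sensitivity shrinks as the context grows
```

The single-word perturbation is averaged away as $n$ grows, matching the claimed decay; the paper's point is that the softmax in attention can instead concentrate weight on the perturbed word, keeping WS high.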
Active learning is designed to minimize annotation efforts by prioritizing instances that most enhance learning. However, many active learning strategies struggle with a 'cold start' problem, needing substantial initial data to be effective. This limitation often reduces their utility for pre-trained models, which already perform well in few-shot scenarios. To address this, we introduce ActiveLLM, a novel active learning approach that leverages large language models such as GPT-4, Llama 3, and Mistral Large for selecting instances. We demonstrate that ActiveLLM significantly enhances the classification performance of BERT classifiers in few-shot scenarios, outperforming both traditional active learning methods and the few-shot learning method SetFit. Additionally, ActiveLLM can be extended to non-few-shot scenarios, allowing for iterative selections. In this way, ActiveLLM can even help other active learning strategies to overcome their cold start problem. Our results suggest that ActiveLLM offers a promising solution for improving model performance across various learning setups.
https://arxiv.org/abs/2405.10808
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets namely, CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive QA) and NASA-IR (IR) to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.
https://arxiv.org/abs/2405.10725
Coreference resolution, critical for identifying textual entities referencing the same entity, faces challenges in pronoun resolution, particularly identifying pronoun antecedents. Existing methods often treat pronoun resolution as a separate task from mention detection, potentially missing valuable information. This study proposes the first end-to-end neural network system for Persian pronoun resolution, leveraging pre-trained Transformer models like ParsBERT. Our system jointly optimizes both mention detection and antecedent linking, achieving a 3.37 F1 score improvement over the previous state-of-the-art system (which relied on rule-based and statistical methods) on the Mehr corpus. This significant improvement demonstrates the effectiveness of combining neural networks with linguistic models, potentially marking a significant advancement in Persian pronoun resolution and paving the way for further research in this under-explored area.
https://arxiv.org/abs/2405.10714
The classification of statements provided by individuals during police interviews is a complex and significant task within the domain of natural language processing (NLP) and legal informatics. The lack of extensive domain-specific datasets raises challenges to the advancement of NLP methods in the field. This paper aims to address some of the present challenges by introducing a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings. Utilising the curated dataset for training and evaluation, we introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements. To enhance interpretability, we employ explainable artificial intelligence (XAI) methods to offer explainability through saliency maps, that interpret the model's decision-making process. Lastly, we present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system. Our model achieves an accuracy of 86%, and is shown to outperform a custom transformer architecture in a comparative study. This holistic approach advances the accessibility, transparency, and effectiveness of statement analysis, with promising implications for both legal practice and research.
https://arxiv.org/abs/2405.10702
The Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task in the SemEval-2024 competition aims to tackle the problem of misusing collaborative human-AI writing. Although there are a lot of existing detectors of AI content, they are often designed to give a binary answer and thus may not be suitable for the more nuanced problem of finding the boundaries between human-written and machine-generated texts, even as hybrid human-AI writing becomes more and more popular. In this paper, we address the boundary detection problem. In particular, we present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. With this pipeline, we achieve a new best MAE score according to the competition leaderboard.
https://arxiv.org/abs/2405.10629
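The kind of data augmentation used for boundary detection can be pictured as splicing a human-written prefix onto a machine-generated continuation and recording the boundary index as the supervision target. The sketch below is a hypothetical illustration (the sample sentences and the word-level cut are our own simplifications, not the paper's pipeline):

```python
import random

random.seed(0)

human = "The lake was calm that morning and the fishermen set out early".split()
machine = "The results indicate a statistically significant improvement across all metrics overall today".split()

def make_boundary_example(h_words, m_words):
    # Keep a random human-written prefix, continue with machine text,
    # and return the boundary position as the training label.
    cut = random.randint(1, len(h_words) - 1)
    mixed = h_words[:cut] + m_words[: len(h_words) - cut]
    return " ".join(mixed), cut

text, boundary = make_boundary_example(human, machine)
print(boundary, "->", text)
```

A regression model (DeBERTaV3 in the paper) is then fine-tuned to predict the boundary index, and MAE between predicted and true boundaries is the leaderboard metric.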
Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. However, this approach becomes less feasible as the number of developers of advanced AI grows, and impedes beneficial use-cases as well as harmful ones. In response, we urge a complementary approach: increasing societal adaptation to advanced AI, that is, reducing the expected negative impacts from a given level of diffusion of a given AI capability. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against and remedy potentially harmful uses of AI systems, illustrated with examples in election manipulation, cyberterrorism, and loss of control to AI decision-makers. We discuss a three-step cycle that society can implement to adapt to AI. Increasing society's ability to implement this cycle builds its resilience to advanced AI. We conclude with concrete recommendations for governments, industry, and third-parties.
https://arxiv.org/abs/2405.10295
Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.
https://arxiv.org/abs/2405.09221
Autonomous systems often encounter environments and scenarios beyond the scope of their training data, which underscores a critical challenge: the need to generalize and adapt to unseen scenarios in real time. This challenge necessitates new mathematical and algorithmic tools that enable adaptation and zero-shot transfer. To this end, we leverage the theory of function encoders, which enables zero-shot transfer by combining the flexibility of neural networks with the mathematical principles of Hilbert spaces. Using this theory, we first present a method for learning a space of dynamics spanned by a set of neural ODE basis functions. After training, the proposed approach can rapidly identify dynamics in the learned space using an efficient inner product calculation. Critically, this calculation requires no gradient calculations or retraining during the online phase. This method enables zero-shot transfer for autonomous systems at runtime and opens the door for a new class of adaptable control algorithms. We demonstrate state-of-the-art system modeling accuracy for two MuJoCo robot environments and show that the learned models can be used for more efficient MPC control of a quadrotor.
https://arxiv.org/abs/2405.08954
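The "efficient inner product calculation" can be pictured as projecting the unknown dynamics onto a learned basis. In the paper the basis functions are neural ODEs; the toy below substitutes two hand-written basis functions and recovers the coefficients by solving the normal equations — a sketch under our own simplifying assumptions, not the authors' implementation.

```python
# Sampled states and two "learned" basis functions over them.
xs = [i / 10 for i in range(-10, 11)]
g1 = [x for x in xs]            # basis 1: identity
g2 = [x * x for x in xs]        # basis 2: quadratic
f = [2 * x - x * x for x in xs]  # unknown dynamics, secretly 2*g1 - 1*g2

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Normal equations G c = b, with G the Gram matrix of the basis:
# only inner products are needed, no gradients or retraining.
G = [[dot(g1, g1), dot(g1, g2)], [dot(g2, g1), dot(g2, g2)]]
b = [dot(g1, f), dot(g2, f)]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
c1 = (b[0] * G[1][1] - b[1] * G[0][1]) / det
c2 = (G[0][0] * b[1] - G[1][0] * b[0]) / det
print(round(c1, 6), round(c2, 6))  # → 2.0 -1.0
```

Once the basis is trained, identifying a new system online reduces to this kind of linear solve over inner products, which is what enables zero-shot adaptation at runtime.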
Self-supervised learning has shown great success in Speech Recognition. However, it has been observed that finetuning all layers of the learned model leads to lower performance compared to resetting top layers. This phenomenon is attributed to the "autoencoder" behavior: top layers contain information closer to the input and are less suitable for tasks that require linguistic information, such as Speech Recognition. To better our understanding of this behavior, we propose to study the evolution of high-level information within the model during pretraining. We focus on the HuBERT model, which exhibits a less pronounced "autoencoder" behavior. By experimentally exploring various factors that may have an impact, we aim to improve the training procedure and enhance the top layers of HuBERT for high-level tasks. Furthermore, our experiments demonstrate that these improvements in the training procedure result in faster convergence and competitive performance on downstream tasks.
https://arxiv.org/abs/2405.08402
This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translation. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and to evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus has been created by combining different corpora available online with preprocessing and cleaning. In addition, a monolingual Cantonese dataset has been created through web scraping to aid synthetic parallel corpus generation. Following the data collection process, several approaches, including fine-tuning models, back-translation, and model switch, have been used. The translation quality of the models has been evaluated with multiple quality metrics, including lexicon-based metrics (SacreBLEU and hLEPOR) and embedding-space metrics (COMET and BERTScore). Based on the automatic metrics, the best model is selected and compared against the two best commercial translators using the human evaluation framework HOPES. The best model proposed in this investigation (NLLB-mBART) with model switch mechanisms reaches comparable and even better automatic evaluation scores than state-of-the-art commercial models (Bing and Baidu Translators), with a SacreBLEU score of 16.8 on our test set. Furthermore, an open-source web application has been developed to allow users to translate between Cantonese and English, with the different trained models made available so that users can compare the models from this investigation. CANTONMT is available at this https URL
https://arxiv.org/abs/2405.08172
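Lexicon-based metrics such as SacreBLEU score n-gram overlap between a hypothesis and a reference. For intuition only, here is a deliberately simplified unigram variant — not SacreBLEU's actual algorithm, which uses up to 4-grams, smoothing, and standardized tokenization:

```python
import math
from collections import Counter

def unigram_bleu(hyp, ref):
    # Clipped unigram precision times a brevity penalty.
    h, r = hyp.split(), ref.split()
    hc, rc = Counter(h), Counter(r)
    clipped = sum(min(c, rc[w]) for w, c in hc.items())
    precision = clipped / len(h)
    bp = 1.0 if len(h) > len(r) else math.exp(1 - len(r) / len(h))
    return bp * precision

score = unigram_bleu("the cat sat on the mat", "the cat is on the mat")
print(round(score, 3))  # → 0.833
```

Embedding-space metrics such as COMET and BERTScore instead compare learned representations, which is why the paper reports both families.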
Adaptive Risk Control (ARC) is an online calibration strategy based on set prediction that offers worst-case deterministic long-term risk control, as well as statistical marginal coverage guarantees. ARC adjusts the size of the prediction set by varying a single scalar threshold based on feedback from past decisions. In this work, we introduce Localized Adaptive Risk Control (L-ARC), an online calibration scheme that targets statistical localized risk guarantees ranging from conditional risk to marginal risk, while preserving the worst-case performance of ARC. L-ARC updates a threshold function within a reproducing kernel Hilbert space (RKHS), with the kernel determining the level of localization of the statistical risk guarantee. The theoretical results highlight a trade-off between localization of the statistical risk and convergence speed to the long-term risk target. Thanks to localization, L-ARC is demonstrated via experiments to produce prediction sets with risk guarantees across different data subpopulations, significantly improving the fairness of the calibrated model for tasks such as image segmentation and beam selection in wireless networks.
https://arxiv.org/abs/2405.07976
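ARC's scalar-threshold update follows the feedback pattern of adaptive conformal methods: grow the prediction set after a miss, shrink it otherwise. Below is a minimal sketch of that loop with uniform synthetic nonconformity scores; the step size, target level, and score distribution are our own choices, and L-ARC replaces the single scalar with a threshold function in an RKHS:

```python
import random

random.seed(1)

alpha, eta = 0.1, 0.05  # target risk level and step size
theta = 0.5             # scalar threshold on the nonconformity score
errors, T = 0, 5000
for t in range(T):
    score = random.random()           # stand-in nonconformity score
    miss = score > theta              # label falls outside the prediction set
    errors += miss
    # Grow the set (raise theta) after a miss, shrink it otherwise.
    theta += eta * ((1.0 if miss else 0.0) - alpha)

long_run_risk = errors / T
print(round(long_run_risk, 3))  # close to the target alpha = 0.1
```

The long-run miss rate tracks the target level regardless of the score distribution, which is the worst-case deterministic guarantee the abstract refers to.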
Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. However, many existing efforts concentrate on decoding small vocabulary sets, leaving space for the exploration of open vocabulary continuous text decoding. In this paper, we introduce a novel method, the \textbf{Brain Prompt GPT (BP-GPT)}. By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce a text-to-text baseline and align the fMRI prompt to the text prompt. By introducing the text-to-text baseline, our BP-GPT can extract a more robust brain prompt and promote the decoding of pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement of up to $4.61\%$ on METEOR and $2.43\%$ on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective.
https://arxiv.org/abs/2405.07840
Encoder models trained for the embedding of sentences or short documents have proven useful for tasks such as semantic search and topic modeling. In this paper, we present a version of the SwissBERT encoder model that we specifically fine-tuned for this purpose. SwissBERT contains language adapters for the four national languages of Switzerland -- German, French, Italian, and Romansh -- and has been pre-trained on a large number of news articles in those languages. Using contrastive learning based on a subset of these articles, we trained a fine-tuned version, which we call SentenceSwissBERT. Multilingual experiments on document retrieval and text classification in a Switzerland-specific setting show that SentenceSwissBERT surpasses the accuracy of the original SwissBERT model and of a comparable baseline. The model is openly available for research use.
https://arxiv.org/abs/2405.07513
The quantitative analysis of political ideological positions is a difficult task. In the past, various literature focused on parliamentary voting data of politicians, party manifestos and parliamentary speech to estimate political disagreement and polarization in various political systems. However, previous methods of quantitative political analysis suffered from a common challenge: the limited amount of data available for analysis. Previous methods also frequently focused on a more general analysis of politics, such as the overall polarization of the parliament or party-wide political ideological positions. In this paper, we present a method to analyze the ideological positions of individual parliamentary representatives by leveraging the latent knowledge of LLMs. The method allows us to evaluate the stance of politicians on an axis of our choice, letting us flexibly measure their stance with regard to a topic or controversy of our choice. We achieve this by using a fine-tuned BERT classifier to extract the opinion-based sentences from the speeches of representatives and projecting the average BERT embeddings for each representative on a pair of reference seeds. These reference seeds are either manually chosen representatives known to have opposing views on a particular topic, or generated sentences created using OpenAI's GPT-4 model. We created the sentences by prompting GPT-4 to generate a speech that would come from a politician defending a particular position.
https://arxiv.org/abs/2405.07320
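The projection step can be sketched in a few lines: average a representative's opinion-sentence embeddings, then project onto the unit vector running between the two reference seeds. The 3-dimensional vectors below are toy stand-ins for BERT embeddings, and centering at the seeds' midpoint is one plausible convention, not necessarily the authors':

```python
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def ideology_score(rep_emb, seed_pro, seed_con):
    # Signed position of the representative's mean embedding along
    # the axis running from the "con" seed to the "pro" seed.
    axis = unit([p - c for p, c in zip(seed_pro, seed_con)])
    mid = [(p + c) / 2 for p, c in zip(seed_pro, seed_con)]
    centered = [r - m for r, m in zip(rep_emb, mid)]
    return sum(a * x for a, x in zip(axis, centered))

pro = [1.0, 0.2, 0.0]   # toy seed embedding for one side of the topic
con = [-1.0, -0.2, 0.0]  # toy seed embedding for the opposing side

score_pro = ideology_score([0.9, 0.1, 0.3], pro, con)
score_con = ideology_score([-0.8, 0.0, 0.1], pro, con)
print(round(score_pro, 3), round(score_con, 3))
```

The sign of the score indicates which seed the representative's speech embeddings sit closer to, and its magnitude gives a graded stance along the chosen axis.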
As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English-language news articles from the BBC and the New York Times, combined with Google Trends data spanning 2018 to 2022, to explore how public interest in geoengineering fluctuates in response to news coverage of broader climate issues. Using BERT-based topic modeling, sentiment analysis, and time-series regression models, we found that positive sentiment in energy-related news serves as a good predictor of heightened public interest in geoengineering, a trend that persists over time. Our findings suggest that public engagement with geoengineering and climate action is not uniform, with some topics being more potent in shaping interest over time, such as climate news related to energy, disasters, and politics. Understanding these patterns is crucial for scientists, policymakers, and educators aiming to craft effective strategies for engaging with the public and fostering dialogue around emerging climate technologies.
https://arxiv.org/abs/2405.07010
Event relation extraction (ERE) is a critical and fundamental challenge for natural language processing. Existing work mainly focuses on directly modeling the entire document, which cannot effectively handle long-range dependencies and information redundancy. To address these issues, we propose a cluster-aware compression method for improving event relation extraction (TacoERE), which explores a compression-then-extraction paradigm. Specifically, we first introduce document clustering for modeling event dependencies. It splits the document into intra- and inter-clusters, where intra-clusters aim to enhance the relations within the same cluster, while inter-clusters attempt to model the related events at arbitrary distances. Secondly, we utilize cluster summarization to simplify and highlight important text content of clusters for mitigating information redundancy and event distance. We have conducted extensive experiments on both pre-trained language models, such as RoBERTa, and large language models, such as ChatGPT and GPT-4, on three ERE datasets, i.e., MAVEN-ERE, EventStoryLine and HiEve. Experimental results demonstrate that TacoERE is an effective method for ERE.
https://arxiv.org/abs/2405.06890
Patient hand-off and triage are two fundamental problems in health care. Often doctors must painstakingly summarize complex findings to efficiently communicate with specialists and quickly decide which patients have the most urgent cases. In pursuit of these challenges, we present (1) a model with state-of-the-art radiology report summarization performance, using (2) a novel method for augmenting medical data, and (3) an analysis of the model's limitations and radiology knowledge gain. We also provide a data processing pipeline for future models developed on the MIMIC CXR dataset. Our best performing model was a fine-tuned BERT-to-BERT encoder-decoder with 58.75/100 ROUGE-L F1, which outperformed specialized checkpoints with more sophisticated attention mechanisms. We investigate these aspects in this work.
https://arxiv.org/abs/2405.06802
In today's digital landscape, where cyber attacks have become the norm, the detection of cyber attacks and threats is critically imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model, tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. CANAL demonstrates superior performance by outperforming all other LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles. Collectively, CANAL and Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.
https://arxiv.org/abs/2405.06772