Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought" -- a sequence of tokens representing intermediate steps in the reasoning process. This innovative paradigm enables LLMs to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can boost reasoning accuracy even further. Therefore, train-time and test-time scaling combine to chart a new research frontier -- a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects aimed at building large reasoning models, and conclude with open challenges and future research directions.
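Test-time scaling often takes a simple form in practice: sample several reasoning chains and keep the majority answer (self-consistency). As a hedged illustration only -- `sample_answer` below is a hypothetical stand-in for an LLM call, not anything from the survey -- the idea fits in a few lines of Python:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Stand-in for one sampled chain-of-thought from an LLM; a real system
    would call a model with temperature > 0 and parse out the final answer."""
    return random.choice(["42", "42", "42", "41"])  # noisy but mostly right

def self_consistency(question: str, n_samples: int = 16) -> str:
    """Test-time scaling via majority vote: spend more inference-time tokens
    by sampling several reasoning chains, then keep the most common answer."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```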
https://arxiv.org/abs/2501.09686
This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities, practical applications in fields such as biology often require sample generation that maximizes specific metrics (e.g., stability, affinity in proteins, closeness to target structures). In these scenarios, diffusion models can be adapted not only to generate realistic samples but also to explicitly maximize desired measures at inference time without fine-tuning. This tutorial explores the foundational aspects of such inference-time algorithms. We review these methods from a unified perspective, demonstrating that current techniques -- such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance -- aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions serving as look-ahead predictors of terminal rewards from intermediate states. Within this framework, we present several novel algorithms not yet covered in the literature. Furthermore, we discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and (3) connections between inference-time algorithms in language models and diffusion models. The code of this tutorial on protein design is available at this https URL
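The soft-optimal framing is concrete enough to sketch. A minimal SMC-style guidance loop, assuming a toy 1-D denoiser and an invented `value` function (neither is from the tutorial's protein-design code), proposes with the pre-trained process, reweights particles by their exponentiated look-ahead value, and resamples:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    """Toy stand-in for a pre-trained denoising transition p(x_{t-1} | x_t):
    drift toward 0 plus noise, mimicking an unconditional diffusion sampler."""
    return 0.9 * x + 0.1 * rng.normal(size=x.shape)

def value(x):
    """Look-ahead value: predicted terminal reward from an intermediate state.
    Here the invented 'reward' prefers samples near +2."""
    return -((x - 2.0) ** 2)

def smc_guidance(n_particles=256, n_steps=20, temperature=1.0):
    """SMC-based guidance: propose with the pre-trained denoiser, reweight
    particles by exp(value / temperature), and resample, approximating the
    soft-optimal denoising process the tutorial describes."""
    x = rng.normal(size=n_particles)
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t)
        w = np.exp(value(x) / temperature)
        w /= w.sum()
        x = rng.choice(x, size=n_particles, p=w)  # resample by importance weight
    return x

print(smc_guidance().mean())  # pulled toward the high-reward region near 2
```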
https://arxiv.org/abs/2501.09685
In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and advance in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
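The separation NS-Gym describes can be illustrated with a plain Gymnasium wrapper; note this is a sketch of the design idea, not NS-Gym's actual interface, and the `drift` parameter is invented:

```python
import gymnasium as gym

class NonStationaryWrapper(gym.Wrapper):
    """Illustrative sketch (not NS-Gym's actual API): evolve an environment
    parameter independently of the agent, mirroring NS-Gym's separation of
    non-stationarity from the decision-making module."""
    def __init__(self, env, drift=0.0005):
        super().__init__(env)
        self.drift = drift

    def step(self, action):
        # Exogenous change: gravity slowly increases at every step.
        self.env.unwrapped.gravity += self.drift
        return self.env.step(action)

env = NonStationaryWrapper(gym.make("CartPole-v1"))
obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
```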
https://arxiv.org/abs/2501.09646
Conventional 2D human pose estimation methods typically require extensive labeled annotations, which are both labor-intensive and expensive. In contrast, semi-supervised 2D human pose estimation can alleviate these problems by leveraging a large amount of unlabeled data along with a small portion of labeled data. Existing semi-supervised 2D human pose estimation methods update the network through backpropagation, ignoring crucial historical information from the previous training process. Therefore, we propose a novel semi-supervised 2D human pose estimation method built on a newly designed Teacher-Reviewer-Student framework. Specifically, we first design our framework to mimic the way human beings repeatedly review previous knowledge to consolidate it: the teacher predicts results to guide the student's learning, and the reviewer stores important historical parameters to provide additional supervision signals. Secondly, we introduce a Multi-level Feature Learning strategy, which utilizes the outputs from different stages of the backbone to estimate heatmaps that guide network training, enriching the supervisory information while effectively capturing keypoint relationships. Finally, we design a data augmentation strategy, i.e., Keypoint-Mix, to perturb pose information by mixing different keypoints, thus enhancing the network's ability to discern keypoints. Extensive experiments on publicly available datasets demonstrate that our method achieves significant improvements over existing methods.
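The abstract does not specify how the reviewer retains history; one plausible reading, sketched below under that assumption, keeps two exponential-moving-average copies of the student -- a fast-moving teacher and a slow-moving reviewer -- so older parameters keep contributing supervision (the momentum values are illustrative):

```python
import copy
import torch
import torch.nn as nn

# Tiny heatmap head: 17 channels, one per COCO-style keypoint.
student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 17, 1))
teacher = copy.deepcopy(student)   # guides the student on unlabeled images
reviewer = copy.deepcopy(student)  # stores slower-moving historical parameters

@torch.no_grad()
def ema_update(target: nn.Module, source: nn.Module, momentum: float):
    """Exponential moving average of the source parameters; a slower momentum
    for the reviewer preserves more training history than the teacher does."""
    for pt, ps in zip(target.parameters(), source.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)

x = torch.randn(2, 3, 64, 64)                  # unlabeled batch
with torch.no_grad():
    pseudo = teacher(x)                        # teacher pseudo-heatmaps
    history = reviewer(x)                      # extra historical supervision
loss = nn.functional.mse_loss(student(x), pseudo) + \
       0.5 * nn.functional.mse_loss(student(x), history)
loss.backward()
ema_update(teacher, student, momentum=0.99)
ema_update(reviewer, student, momentum=0.999)
```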
https://arxiv.org/abs/2501.09565
This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to scientific understanding and knowledge of the multimodal application of DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy of 99.97% for COVID-19 image data and 93.73% for speech data (i.e., cough). However, the BiGRU model demonstrated superior performance in COVID-19 text classification with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.
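For readers who want to reproduce the image branch, a typical MobileNet transfer-learning setup looks like the sketch below; it uses torchvision's MobileNetV2 and a two-class head as assumptions, since the paper's exact variant and data pipeline are not given in the abstract:

```python
import torch
import torch.nn as nn
from torchvision import models

# Binary COVID / non-COVID head on a pre-trained MobileNetV2 backbone.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
for p in model.features.parameters():
    p.requires_grad = False                      # freeze the backbone
model.classifier[1] = nn.Linear(model.last_channel, 2)

x = torch.randn(4, 3, 224, 224)                  # stand-in for image batches
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()                                  # only the new head updates
```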
https://arxiv.org/abs/2501.09506
Understanding emotions accurately is essential for fields like human-computer interaction. Due to the complexity of emotions and their multi-modal nature (e.g., emotions are influenced by facial expressions and audio), researchers have turned to using multi-modal models to understand human emotions rather than single-modality approaches. However, current video multi-modal large language models (MLLMs) encounter difficulties in effectively integrating audio and identifying subtle facial micro-expressions. Furthermore, the lack of detailed emotion analysis datasets limits the development of multimodal emotion analysis. To address these issues, we introduce a self-reviewed dataset and a human-reviewed dataset, comprising 24,137 coarse-grained samples and 3,500 manually annotated samples with detailed emotion annotations, respectively. These datasets allow models to learn from diverse scenarios and better generalize to real-world applications. Moreover, in addition to the audio modeling, we propose to explicitly integrate facial encoding models into the existing advanced Video MLLM, enabling the MLLM to effectively unify audio and the subtle facial cues for emotion understanding. By aligning these features within a unified space and employing instruction tuning in our proposed datasets, our Omni-Emotion achieves state-of-the-art performance in both emotion recognition and reasoning tasks.
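The abstract's "aligning these features within a unified space" is commonly realized with learned projection layers; the sketch below illustrates that pattern under assumed feature dimensions and is not Omni-Emotion's actual code:

```python
import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Illustrative sketch (not Omni-Emotion's implementation): map audio and
    facial-encoder features into the LLM's embedding space so they can be
    interleaved with text tokens for instruction tuning."""
    def __init__(self, audio_dim=768, face_dim=512, llm_dim=4096):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, llm_dim)
        self.face_proj = nn.Linear(face_dim, llm_dim)

    def forward(self, audio_feats, face_feats, text_embeds):
        a = self.audio_proj(audio_feats)   # (B, Ta, llm_dim)
        f = self.face_proj(face_feats)     # (B, Tf, llm_dim)
        # One sequence in a unified space, ready to feed to the LLM.
        return torch.cat([a, f, text_embeds], dim=1)

proj = MultimodalProjector()
fused = proj(torch.randn(1, 50, 768), torch.randn(1, 16, 512), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 98, 4096])
```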
https://arxiv.org/abs/2501.09502
How are robots becoming smarter at interacting with their surroundings? Recent advances have reshaped how robots use tactile sensing to perceive and engage with the world. Tactile sensing is a game-changer, allowing robots to embed sensorimotor control strategies to interact with complex environments and skillfully handle heterogeneous objects. Such control frameworks plan contact-driven motions while staying responsive to sudden changes. We review the latest methods for building perception and control systems in tactile robotics while offering practical guidelines for their design and implementation. We also address key challenges to shape the future of intelligent robots.
https://arxiv.org/abs/2501.09468
While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face significant challenges in terms of the inherent risks of privacy leakage, hallucinated outputs, and value misalignment, and can be maliciously used to generate toxic content for unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collecting and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances in enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of how to enhance LLMs to better serve real-world applications.
https://arxiv.org/abs/2501.09431
Image segmentation, a key task in computer vision, has traditionally relied on convolutional neural networks (CNNs), yet these models struggle to capture complex spatial dependencies, handle objects at varying scales, and exploit contextual information, and they depend on manually crafted architectural components. This paper explores the shortcomings of CNN-based models and the shift towards transformer architectures to overcome those limitations. This work reviews state-of-the-art transformer-based segmentation models, addressing segmentation-specific challenges and their solutions. The paper discusses current challenges in transformer-based segmentation and outlines promising future trends, such as lightweight architectures and enhanced data efficiency. This survey serves as a guide for understanding the impact of transformers in advancing segmentation capabilities and overcoming the limitations of traditional models.
https://arxiv.org/abs/2501.09372
This review underscores the critical need for effective strategies to identify and support individuals with suicidal ideation, exploiting technological innovations in machine learning (ML) and deep learning (DL) to further suicide prevention efforts. The study details the application of these technologies in analyzing vast amounts of unstructured social media data to detect linguistic patterns, keywords, phrases, tones, and contextual cues associated with suicidal thoughts. It explores various ML and DL models, such as SVMs, CNNs, LSTMs, and other neural networks, and their effectiveness in interpreting complex data patterns and emotional nuances within text data. The review discusses the potential of these technologies to serve as a life-saving tool by identifying at-risk individuals through their digital traces. Furthermore, it evaluates the real-world effectiveness, limitations, and ethical considerations of employing these technologies for suicide prevention, stressing the importance of responsible development and usage. The study aims to fill critical knowledge gaps by analyzing recent studies, methodologies, tools, and techniques in this field. It highlights the importance of synthesizing current literature to inform practical tools and suicide prevention efforts, guiding innovation in reliable, ethical systems for early intervention. This research synthesis evaluates the intersection of technology and mental health, advocating for the ethical and responsible application of ML, DL, and NLP to offer life-saving potential worldwide while addressing challenges like generalizability, biases, privacy, and the need for further research to ensure these technologies do not exacerbate existing inequities and harms.
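The SVM-on-text pattern the review covers reduces to a short pipeline; the sketch below is a deliberately toy illustration with invented examples -- any real system in this domain needs curated data, clinical oversight, and careful ethical review:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy examples; real datasets are curated and ethically sourced.
texts = ["I feel hopeful about tomorrow", "I can't see any way out",
         "had a great day with friends", "everything feels pointless"]
labels = [0, 1, 0, 1]                       # 1 = flag for human review

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["no way out of this"]))  # likely [1], given shared n-grams
```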
https://arxiv.org/abs/2501.09309
With the rapid development of digital services, a large volume of personally identifiable information (PII) is stored online and is subject to cyberattacks such as identity fraud. Most recently, the use of Artificial Intelligence (AI)-enabled deepfake technologies has significantly increased the complexity of identity fraud. Fraudsters may use these technologies to create highly sophisticated counterfeit personal identification documents, photos, and videos. These advancements in the identity fraud landscape pose challenges for identity fraud detection and society at large. There is a pressing need to review and understand identity fraud detection methods, their limitations, and potential solutions. This research aims to address this important need using the well-known systematic literature review method. This paper reviews a selected set of 43 papers across four major academic literature databases. In particular, the review highlights two types of identity fraud prevention and detection methods, along with their in-depth and open challenges. The results were also consolidated into a taxonomy of AI-based identity fraud detection and prevention methods, including key insights and trends. Overall, this paper provides a foundational knowledge base for researchers and practitioners for further research and development in this important area of digital identity fraud.
https://arxiv.org/abs/2501.09239
Aspect-based sentiment analysis (ABSA) is a refined approach to sentiment analysis that aims to extract and classify sentiments based on specific aspects or features of a product, service, or entity. Unlike traditional sentiment analysis, which assigns a general sentiment score to entire reviews or texts, ABSA focuses on breaking down the text into individual components or aspects (e.g., quality, price, service) and evaluating the sentiment towards each. This allows for a more granular understanding of customer opinions, enabling businesses to pinpoint specific areas of strength and improvement. The process involves several key steps, including aspect extraction, sentiment classification, and aspect-level sentiment aggregation for a review paragraph or any other form of text the user provides. ABSA has significant applications in areas such as product reviews, social media monitoring, customer feedback analysis, and market research. By leveraging techniques from natural language processing (NLP) and machine learning, ABSA facilitates the extraction of valuable insights, enabling companies to make data-driven decisions that enhance customer satisfaction and optimize offerings. As ABSA evolves, it holds the potential to greatly improve personalized customer experiences by providing a deeper understanding of sentiment across various product aspects. In this work, we analyze the strength of LLMs for complete cross-domain aspect-based sentiment analysis, with the aim of defining a framework for certain products and reusing it in other, similar situations. We argue that this is achievable at an accuracy of 92% on the Aspect-Based Sentiment Analysis dataset of SemEval-2015 Task 12.
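One way to read "defining a framework for certain products and reusing it in other, similar situations" is a single domain-agnostic prompt that performs all three ABSA steps at once; the sketch below assumes a hypothetical `llm` completion function and is not the paper's actual method:

```python
import json

def llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion API."""
    raise NotImplementedError

def absa(review: str) -> dict:
    """Cross-domain ABSA in one zero-shot prompt: extract aspects, classify
    the sentiment of each, and return the aspect-level aggregation."""
    prompt = (
        "Extract the aspects (e.g., quality, price, service) mentioned in the "
        "review below and label each as positive, negative, or neutral. "
        'Answer as JSON like {"price": "negative"}.\n\nReview: ' + review
    )
    return json.loads(llm(prompt))

# absa("The battery life is great but the screen scratches easily.")
# -> e.g. {"battery life": "positive", "screen": "negative"}
```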
https://arxiv.org/abs/2501.08974
Sentiment analysis is one of the most crucial tasks in Natural Language Processing (NLP), involving the training of machine learning models to classify text based on the polarity of opinions. Pre-trained Language Models (PLMs) can be applied to downstream tasks through fine-tuning, eliminating the need to train the model from scratch. Specifically, PLMs have been employed for sentiment analysis, a process that involves detecting, analyzing, and extracting the polarity of text sentiments. Numerous models have been proposed to address this task, with pre-trained PhoBERT-V2 models standing out as the state-of-the-art language models for Vietnamese. The PhoBERT-V2 pre-training approach is based on RoBERTa, optimizing the BERT pre-training method for more robust performance. In this paper, we introduce a novel approach that combines PhoBERT-V2 and SentiWordNet for sentiment analysis of Vietnamese reviews. Our proposed model utilizes PhoBERT-V2 for Vietnamese, offering a robust optimization of the prominent BERT model in the context of the Vietnamese language, and leverages SentiWordNet, a lexical resource explicitly designed to support sentiment classification applications. Experimental results on the VLSP 2016 and AIVIVN 2019 datasets demonstrate that our sentiment analysis system has achieved excellent performance in comparison to other models.
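Loading PhoBERT-V2 for sentiment classification follows the standard Hugging Face pattern shown below; the label count and the omitted SentiWordNet fusion step are assumptions, as the abstract does not detail how the lexicon scores are combined with the model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# PhoBERT expects word-segmented Vietnamese input; segmentation is omitted here.
name = "vinai/phobert-base-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

inputs = tok("Sản_phẩm này rất tốt", return_tensors="pt")  # "this product is great"
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # head is untrained: fine-tune on VLSP/AIVIVN first
```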
https://arxiv.org/abs/2501.08758
Mobile robot fleets are currently used in different scenarios such as medical environments or logistics. The management of these systems poses challenges ranging from controlling the movement of each robot to allocating the tasks to be performed. The Task Allocation (TA) problem is a key topic for the proper management of mobile robot fleets, as it determines how to minimize energy consumption and the number of robots required. Solutions in this area are essential to achieving the economic and environmental sustainability of robot fleets, mainly in industrial applications such as warehouse logistics. Minimizing energy consumption frames the TA problem as an optimization problem, which has been treated in recent studies. This work focuses on the analysis of current trends in solving TA for mobile robot fleets. The main TA optimization algorithms are presented, including novel methods based on Artificial Intelligence (AI). Additionally, this work showcases the most important results extracted from simulations, including the frameworks utilized to develop them. Finally, conclusions are drawn from the analysis to identify gaps that must be addressed in the future.
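Framed as an optimization problem, the simplest TA baseline is a one-robot-one-task assignment minimizing total energy, solvable exactly with the Hungarian algorithm; the cost matrix below is invented for illustration, and the AI-based methods the survey covers address richer settings:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Invented energy cost (Wh) for robot i to execute task j.
energy = np.array([[4.2, 7.1, 3.0],
                   [5.5, 2.9, 6.4],
                   [3.8, 4.0, 5.1]])

robots, tasks = linear_sum_assignment(energy)   # Hungarian algorithm
for r, t in zip(robots, tasks):
    print(f"robot {r} -> task {t} ({energy[r, t]} Wh)")
print("total energy:", energy[robots, tasks].sum())
```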
https://arxiv.org/abs/2501.08726
Facial recognition models are increasingly employed by commercial enterprises, government agencies, and cloud service providers for identity verification, consumer services, and surveillance. These models are often trained using vast amounts of facial data processed and stored in cloud-based platforms, raising significant privacy concerns. Users' facial images may be exploited without their consent, leading to potential data breaches and misuse. This survey presents a comprehensive review of current methods aimed at preserving facial image privacy in cloud-based services. We categorize these methods into two primary approaches: image obfuscation-based protection and adversarial perturbation-based protection. We provide an in-depth analysis of both categories, offering qualitative and quantitative comparisons of their effectiveness. Additionally, we highlight unresolved challenges and propose future research directions to improve privacy preservation in cloud computing environments.
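Of the two families, obfuscation-based protection is the easier to illustrate; the minimal pixelation sketch below (the face box and block size are arbitrary) destroys identity cues in a region, whereas adversarial perturbation methods would instead add near-invisible noise crafted against a recognition model:

```python
from PIL import Image

def pixelate_face(img: Image.Image, box: tuple, block: int = 12) -> Image.Image:
    """Obfuscation-based protection: downsample a face region and paste the
    blocky result back, destroying the identity signal in that crop."""
    face = img.crop(box)                        # box = (left, upper, right, lower)
    w, h = face.size
    small = face.resize((max(1, w // block), max(1, h // block)), Image.NEAREST)
    img.paste(small.resize((w, h), Image.NEAREST), box[:2])
    return img

# img = pixelate_face(Image.open("photo.jpg"), box=(80, 40, 240, 220))
```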
https://arxiv.org/abs/2501.08665
In this position paper, we review the eclectic recent history of academic and artistic works involving computational systems for humor generation, and focus specifically on live performance. We make the case that AI comedy should be evaluated in live conditions, in front of audiences sharing either physical or online spaces, and under real-time constraints. We further suggest that improvised comedy is therefore the perfect substrate for deploying and assessing computational humor systems. Using examples of successful AI-infused shows, we demonstrate that live performance raises three sets of challenges for computational humor generation: 1) questions around robotic embodiment, anthropomorphism and competition between humans and machines, 2) questions around comedic timing and the nature of audience interaction, and 3) questions about the human interpretation of seemingly absurd AI-generated humor. We argue that these questions impact the choice of methodologies for evaluating computational humor, as any such method needs to work around the constraints of live audiences and performance spaces. These interrogations also highlight different types of collaborative relationship of human comedians towards AI tools.
https://arxiv.org/abs/2501.08474
Unlocking the potential of Large Language Models (LLMs) in data classification represents a promising frontier in natural language processing. In this work, we evaluate the performance of different LLMs in comparison with state-of-the-art deep-learning and machine-learning models in two different classification scenarios: (i) the classification of employees' working locations based on job reviews posted online (multiclass classification), and (ii) the classification of news articles as fake or not (binary classification). Our analysis encompasses a diverse range of language models differing in size, quantization, and architecture. We explore the impact of alternative prompting techniques and evaluate the models based on the weighted F1-score. Also, we examine the trade-off between performance (F1-score) and time (inference response time) for each language model to provide a more nuanced understanding of each model's practical applicability. Our work reveals significant variations in model responses based on the prompting strategies. We find that LLMs, particularly Llama3 and GPT-4, can outperform traditional methods in complex classification tasks, such as multiclass classification, though at the cost of longer inference times. In contrast, simpler ML models offer better performance-to-time trade-offs in simpler binary classification tasks.
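The paper's two evaluation axes, weighted F1 and inference time, can be measured with a small harness like the sketch below; the keyword baseline is a placeholder for any LLM or ML predictor:

```python
import time
from sklearn.metrics import f1_score

def evaluate(predict, texts, y_true):
    """Score a classifier callable on both axes used in the comparison:
    weighted F1 and wall-clock inference time."""
    start = time.perf_counter()
    y_pred = [predict(t) for t in texts]
    elapsed = time.perf_counter() - start
    return f1_score(y_true, y_pred, average="weighted"), elapsed

# Trivial keyword baseline; swap in an LLM call or a trained ML model.
f1, secs = evaluate(lambda t: "fake" if "shocking" in t else "real",
                    ["shocking cure found", "markets closed higher"],
                    ["fake", "real"])
print(f"weighted F1={f1:.2f}, inference time={secs*1000:.1f} ms")
```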
https://arxiv.org/abs/2501.08457
Reliance on anonymity has increased the popularity of social media platforms among all ages. The availability of public Wi-Fi networks has facilitated access to a vast variety of online content, including social media applications. Although anonymity and ease of access can be a convenient means of communication for their users, they make it difficult to manage these platforms and protect their vulnerable users against sexual predators. An automated identification system that can attribute predatory text to its author would make a solution more attainable. In this survey, we provide a review of the methods of pedophile attribution used on social media platforms. We examine the effect of the size of the suspect set and the length of the text on the task of attribution. Moreover, we review the most-used datasets, features, classification techniques, and performance measures for attributing sexual predators. We found that few studies have proposed tools to mitigate the risk of online sexual predators, and none of them can provide suspect attribution. Finally, we list several open research problems.
https://arxiv.org/abs/2501.08296
Artificial Intelligence (AI) is spreading quickly as new technologies and services take over modern society. Regulating AI design, development, and use is strictly necessary to avoid unethical and potentially dangerous consequences for humans. The European Union (EU) has released a new legal framework, the AI Act, to regulate AI by undertaking a risk-based approach to safeguard humans during interaction. At the same time, researchers offer a new perspective on AI systems, commonly known as Human-Centred AI (HCAI), highlighting the need for a human-centred approach to their design. In this context, Symbiotic AI (a subtype of HCAI) promises to enhance human capabilities through a deeper and continuous collaboration between human intelligence and AI. This article presents the results of a Systematic Literature Review (SLR) that aims to identify principles that characterise the design and development of Symbiotic AI systems while considering humans as the core of the process. Through content analysis, four principles emerged from the review that must be applied to create Human-Centred AI systems that can establish a symbiotic relationship with humans. In addition, current trends and challenges were defined to indicate open questions that may guide future research for the development of SAI systems that comply with the AI Act.
https://arxiv.org/abs/2501.08046
Image Super-Resolution (SR) aims to recover a high-resolution image from its low-resolution counterpart, which has been degraded by a specific process. This is achieved by enhancing detail and visual quality. Recent advancements in transformer-based methods have reshaped image super-resolution by enabling high-quality reconstructions that surpass previous deep-learning approaches such as CNN- and GAN-based models. This effectively addresses the limitations of previous methods, such as limited receptive fields, poor global-context capture, and challenges in high-frequency detail recovery. Additionally, the paper reviews recent trends and advancements in transformer-based SR models, exploring various innovative techniques and architectures that combine transformers with traditional networks to balance global and local contexts. These new methods are critically analyzed, revealing promising yet unexplored gaps and potential directions for future research. Several visualizations of models and techniques are included to foster a holistic understanding of recent trends. This work seeks to offer a structured roadmap for researchers at the forefront of deep learning, specifically exploring the impact of transformers on super-resolution techniques.
https://arxiv.org/abs/2501.07855