This paper aims to explore the evolution of image denoising in a pedagogical way. We briefly review classical methods such as Fourier analysis and wavelet bases, highlighting the challenges they faced until the emergence of neural networks, notably the U-Net, in the 2010s. The remarkable performance of these networks has been demonstrated in studies such as Kadkhodaie et al. (2024). They exhibit adaptability to various image types, including those with fixed regularity, facial images, and bedroom scenes, achieving optimal results while being biased toward geometry-adaptive harmonic bases. The introduction of score diffusion has played a crucial role in image generation. In this context, denoising becomes essential as it facilitates the estimation of probability density scores (the identity behind this is sketched after the link below). We discuss the prerequisites for genuine learning of probability densities, offering insights that extend from mathematical research to the implications of universal structures.
https://arxiv.org/abs/2404.16617
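As an aside on why denoising gives access to scores: for Gaussian noise the connection is Miyasawa's (Tweedie's) identity, stated here in our own notation as a brief sketch. If a clean image $x \sim p$ is observed as $y = x + \sigma z$ with $z \sim \mathcal{N}(0, \mathrm{Id})$, then

$$
\nabla_y \log p_\sigma(y) = \frac{\hat{x}(y) - y}{\sigma^2}, \qquad \hat{x}(y) = \mathbb{E}[x \mid y],
$$

so an MMSE denoiser $\hat{x}$ directly yields the score of the noise-smoothed density $p_\sigma$, which is exactly the quantity score-diffusion samplers require.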
Objectives: This study aims to systematically review the literature on the computational processing of the language of pain, whether generated by patients or physicians, identifying current trends and challenges. Methods: Following the PRISMA guidelines, a comprehensive literature search was conducted to select relevant studies on the computational processing of the language of pain and answer pre-defined research questions. Data extraction and synthesis were performed to categorize the selected studies according to their primary purpose and outcome, patient and pain population, textual data, computational methodology, and outcome targets. Results: Physician-generated language of pain, specifically from clinical notes, was the most commonly used data. Tasks included patient diagnosis and triaging, identification of pain mentions, treatment response prediction, biomedical entity extraction, correlation of linguistic features with clinical states, and lexico-semantic analysis of pain narratives. Only one study included prior linguistic knowledge on pain utterances in its experimental setup. Most studies targeted their outcomes at physicians, either directly as clinical tools or as indirect knowledge. The least targeted stage of clinical pain care was self-management, in which patients are most involved. The least studied dimensions of pain were the affective and sociocultural ones. Only two studies measured how physician performance on clinical tasks improved with the inclusion of the proposed algorithm. Discussion: This study found that future research should focus on analyzing patient-generated language of pain, developing patient-centered resources for self-management and patient empowerment, exploring affective and sociocultural aspects of pain, and measuring improvements in physician performance when aided by the proposed tools.
https://arxiv.org/abs/2404.16226
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.
https://arxiv.org/abs/2404.16223
This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), qualities, and resolutions. The proposed methods must process 30 FHD frames in under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.
https://arxiv.org/abs/2404.16205
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity in image size, with correspondingly increasing computational demands, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework (sketched in code after the link below), selection mechanisms, and hardware-aware design. Next, we review vision Mamba models by categorizing them into foundational models and those enhanced with techniques such as convolution, recurrence, and attention. We further delve into the widespread applications of Mamba in vision tasks, including its use as a backbone at various levels of vision processing. This encompasses general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. We specifically introduce general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, video classification) and low-level vision (e.g., image super-resolution, image restoration, visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
https://arxiv.org/abs/2404.15956
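To make the state space model framework above concrete, here is a minimal sketch of the discretized linear recurrence at the heart of SSMs, in plain NumPy. It deliberately omits what makes Mamba distinctive, namely input-dependent (selective) parameters and a hardware-aware parallel scan, and the diagonal state matrix is an illustrative assumption:

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, x):
    """Run the discretized SSM recurrence h_t = A_bar*h_{t-1} + B_bar*x_t,
    y_t = C.h_t, with a diagonal state matrix stored as a vector."""
    T, N = x.shape[0], A_bar.shape[0]
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        h = A_bar * h + B_bar * x[t]  # elementwise: diagonal state update
        y[t] = C @ h                  # linear readout
    return y

# Toy usage: a length-16 scalar signal through a 4-dimensional state.
rng = np.random.default_rng(0)
A_bar = np.exp(-rng.uniform(0.1, 1.0, size=4))  # stable decay in each state dim
B_bar, C = rng.standard_normal(4), rng.standard_normal(4)
print(ssm_scan(A_bar, B_bar, C, rng.standard_normal(16)))
```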
Systematic review (SR) is a popular research method in software engineering (SE). However, conducting an SR takes an average of 67 weeks, so automating any step of the SR process could reduce the effort associated with SRs. Our objective is to investigate whether Large Language Models (LLMs) can accelerate title-abstract screening by simplifying abstracts for human screeners, and by automating title-abstract screening. We performed an experiment where humans screened titles and abstracts for 20 papers with both original and simplified abstracts from a prior SR. The experiment with human screeners was reproduced with GPT-3.5 and GPT-4 LLMs performing the same screening tasks. We also studied whether different prompting techniques (Zero-shot (ZS), One-shot (OS), Few-shot (FS), and Few-shot with Chain-of-Thought (FS-CoT)) improve the screening performance of LLMs; a toy prompt-construction sketch follows the link below. Lastly, we studied whether redesigning the prompt used in the LLM reproduction of screening leads to improved performance. Text simplification did not increase the screeners' screening performance, but it reduced the time used in screening. Screeners' scientific literacy skills and researcher status predict screening performance. Some LLM and prompt combinations perform as well as human screeners in the screening tasks. Our results indicate that the GPT-4 LLM is better than its predecessor, GPT-3.5. Additionally, Few-shot and One-shot prompting outperform Zero-shot prompting. Using LLMs for text simplification in the screening process does not significantly improve human performance. Using LLMs to automate title-abstract screening seems promising, but current LLMs are not significantly more accurate than human screeners. To recommend the use of LLMs in the screening process of SRs, more research is needed. We recommend that future SR studies publish replication packages with screening data to enable more conclusive experimenting with LLM screening.
https://arxiv.org/abs/2404.15667
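For readers unfamiliar with the prompting variants above, the sketch below shows how zero-shot, one-shot, and few-shot prompts for title-abstract screening differ only in the number of worked examples included. The criteria text, labels, and wording are hypothetical, not the study's actual prompts:

```python
def build_prompt(title, abstract, criteria, examples=()):
    """Zero-shot when `examples` is empty; one-/few-shot otherwise."""
    parts = [f"Inclusion criteria: {criteria}"]
    for ex_title, ex_abstract, ex_label in examples:  # worked examples, if any
        parts.append(f"Title: {ex_title}\nAbstract: {ex_abstract}\nDecision: {ex_label}")
    parts.append(f"Title: {title}\nAbstract: {abstract}\nDecision (include/exclude):")
    return "\n\n".join(parts)

# One-shot example with hypothetical criteria and labels.
print(build_prompt(
    "A Study of X", "We empirically examine X ...",
    criteria="empirical studies of software testing",
    examples=[("Testing Y at Scale", "We test Y ...", "include")],
))
```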
The availability of software which can produce convincing yet synthetic media poses both threats and benefits to tertiary education globally. While other forms of synthetic media exist, this study focuses on deepfakes, which are advanced Generative AI (GenAI) fakes of real people. This conceptual paper assesses the current literature on deepfakes across multiple disciplines by conducting an initial scoping review of 182 peer-reviewed publications. The review reveals three major trends: detection methods, malicious applications, and potential benefits, although no specific studies on deepfakes in the tertiary educational context were found. Following a discussion of these trends, this study applies the findings to postulate the major risks and potential mitigation strategies of deepfake technologies in higher education, as well as potential beneficial uses to aid the teaching and learning of both deepfakes and synthetic media. This culminates in the proposal of a research agenda to build a comprehensive, cross-cultural approach to investigate deepfakes in higher education.
https://arxiv.org/abs/2404.15601
The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges. These have rendered traditional centralized energy-market paradigms insufficient due to their inability to support the dynamic and evolving nature of the network. This survey explores how multi-agent reinforcement learning (MARL) can support the decentralization and decarbonization of energy networks and mitigate the 12 associated challenges. This is achieved by specifying key computational challenges in managing energy networks, reviewing recent research progress on addressing them, and highlighting open challenges that may be addressed using MARL.
https://arxiv.org/abs/2404.15583
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of output summaries without relying on human evaluations remains an open problem. Prior work has proposed a token-overlap-based metric, Distinctiveness Score, to measure contrast, which does not take into account sensitivity to meaning-preserving lexical variations. In this work, we propose an automated evaluation metric, CASPR, to better measure contrast between a pair of summaries. Our metric is based on a simple and lightweight method that leverages the natural language inference (NLI) task: it segments reviews into single-claim sentences and carefully aggregates NLI scores between them into a summary-level score (a toy sketch of this pipeline follows the link below). We compare CASPR with Distinctiveness Score and a simple yet powerful baseline based on BERTScore. Our results on the prior dataset CoCoTRIP demonstrate that CASPR can more reliably capture the contrastiveness of summary pairs compared to the baselines.
https://arxiv.org/abs/2404.15565
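A rough sketch of the NLI-based scoring idea described above: segment each summary into single-claim sentences, score cross-summary sentence pairs with an NLI model, and aggregate to a summary-level score. The naive sentence splitter, the stand-in `nli_contradiction` function, and the mean aggregation are illustrative assumptions, not CASPR's exact design:

```python
def contrast_score(summary_a, summary_b, nli_contradiction):
    """Aggregate pairwise NLI contradiction scores into a summary-level score.
    `nli_contradiction(premise, hypothesis)` is a stand-in for a real NLI
    model returning P(contradiction) for a sentence pair."""
    claims_a = [s.strip() for s in summary_a.split(".") if s.strip()]
    claims_b = [s.strip() for s in summary_b.split(".") if s.strip()]
    scores = [nli_contradiction(a, b) for a in claims_a for b in claims_b]
    return sum(scores) / len(scores)  # summary-level contrastiveness

# Toy usage with a dummy "NLI model" that flags quiet-vs-not-quiet claims.
dummy = lambda a, b: 1.0 if ("quiet" in a) != ("quiet" in b) else 0.0
print(contrast_score("The room was quiet.", "The room was noisy.", dummy))
```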
Smart home systems are gaining popularity as homeowners strive to enhance their living and working environments while minimizing energy consumption. However, the adoption of artificial intelligence (AI)-enabled decision-making models in smart home systems faces challenges due to the complexity and black-box nature of these systems, leading to concerns about explainability, trust, transparency, accountability, and fairness. The emerging field of explainable artificial intelligence (XAI) addresses these issues by providing explanations for the models' decisions and actions. While state-of-the-art XAI methods are beneficial for AI developers and practitioners, they may not be easily understood by general users, particularly household members. This paper advocates for human-centered XAI methods, emphasizing the importance of delivering readily comprehensible explanations to enhance user satisfaction and drive the adoption of smart home systems. We review state-of-the-art XAI methods and prior studies focusing on human-centered explanations for general users in the context of smart home applications. Through experiments on two smart home application scenarios, we demonstrate that explanations generated by prominent XAI techniques might not be effective in helping users understand and make decisions. We thus argue for the necessity of a human-centric approach in representing explanations in smart home systems and highlight relevant human-computer interaction (HCI) methodologies, including user studies, prototyping, technology probes analysis, and heuristic evaluation, that can be employed to generate and present human-centered explanations to users.
https://arxiv.org/abs/2404.16074
Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition typically utilizes information from a single image of the object. However, the information conveyed by a single image may not be sufficient for accurate decision-making, particularly in complex recognition problems. The utilization of multi-view 3D representations for object recognition has thus far demonstrated the most promising results for achieving state-of-the-art performance. This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks. Specifically, we focus on deep learning-based and transformer-based techniques, as they are widely utilized and have achieved state-of-the-art performance. We provide detailed information about existing deep learning-based and transformer-based multi-view 3D object recognition models, including the most commonly used 3D datasets, camera configurations and numbers of views, view selection strategies, pre-trained CNN architectures, fusion strategies (one of which is sketched after the link below), and recognition performance on 3D classification and 3D retrieval tasks. Additionally, we examine various computer vision applications that use multi-view classification. Finally, we highlight key findings and future directions for developing multi-view 3D object recognition methods to provide readers with a comprehensive understanding of the field.
https://arxiv.org/abs/2404.15224
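As a concrete example of one common fusion strategy, the sketch below shows MVCNN-style late fusion: per-view embeddings are pooled element-wise into a single shape descriptor. The random features stand in for per-view CNN or transformer outputs; the pooling choice is the only part shown:

```python
import numpy as np

def view_pool(view_features, mode="max"):
    """Pool per-view embeddings into one descriptor.
    view_features: (num_views, feature_dim) array."""
    if mode == "max":
        return view_features.max(axis=0)  # keep strongest response per dim
    return view_features.mean(axis=0)     # average pooling alternative

rng = np.random.default_rng(0)
views = rng.standard_normal((12, 256))    # e.g., 12 rendered views of a shape
descriptor = view_pool(views)             # single shape-level descriptor
print(descriptor.shape)                   # (256,)
```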
Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion; the three schemes are sketched schematically after the link below. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, the handling of incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
https://arxiv.org/abs/2404.15022
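The three fusion schemes named in the abstract can be sketched schematically; in the toy code below, plain matrix products stand in for learned encoders and classifiers, and two random vectors stand in for two imaging modalities:

```python
import numpy as np

rng = np.random.default_rng(0)
mri, pet = rng.standard_normal(32), rng.standard_normal(32)  # two modalities
enc1, enc2 = rng.standard_normal((32, 16)), rng.standard_normal((32, 16))
enc_joint = rng.standard_normal((64, 16))
clf = rng.standard_normal((16, 3))                           # 3 classes

# Input fusion: concatenate raw inputs, then a single shared encoder.
logits_input = np.concatenate([mri, pet]) @ enc_joint @ clf

# Intermediate fusion: encode each modality, then fuse the features.
logits_mid = (mri @ enc1 + pet @ enc2) @ clf

# Output fusion: separate predictions, fused at the decision level.
logits_out = (mri @ enc1 @ clf + pet @ enc2 @ clf) / 2
print(logits_input.shape, logits_mid.shape, logits_out.shape)
```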
Hyperspectral image classification is a challenging task due to the high dimensionality and complex nature of hyperspectral data. In recent years, deep learning techniques have emerged as powerful tools for addressing these challenges. This survey provides a comprehensive overview of current trends and future prospects in hyperspectral image classification, focusing on advancements from deep learning models to the emerging use of transformers. We review the key concepts, methodologies, and state-of-the-art approaches in deep learning for hyperspectral image classification. Additionally, we discuss the potential of transformer-based models in this field and highlight the advantages and challenges associated with these approaches. Comprehensive experiments have been undertaken on three hyperspectral datasets to verify the efficacy of various conventional deep learning models and transformers. Finally, we outline future research directions and potential applications that can further enhance the accuracy and efficiency of hyperspectral image classification. The source code is available at this https URL.
https://arxiv.org/abs/2404.14955
Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, large language models (LLMs) have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.
https://arxiv.org/abs/2404.14928
With the increasingly giant scales of (causal) large language models (LLMs), inference efficiency has become one of the core concerns alongside the improved performance. In contrast to the memory footprint, the latency bottleneck seems to be of greater importance, as there can be billions of requests to an LLM (e.g., GPT-4) per day. The bottleneck is mainly due to the autoregressive nature of LLMs, where tokens can only be generated sequentially during decoding. To alleviate it, the idea of speculative execution, which originates from the field of computer architecture, has been introduced to LLM decoding in a draft-then-verify style (a simplified control-flow sketch follows the link below). Under this regime, a sequence of tokens is drafted at a fast pace using some heuristics, and the tokens are then verified in parallel by the LLM. As the costly sequential inference is parallelized, LLM decoding speed can be significantly boosted. Driven by the success of LLMs in the last couple of years, a growing literature in this direction has emerged. Yet, there is no position survey that summarizes the current landscape and draws a roadmap for the future development of this promising area. To meet this demand, we present the first survey paper that reviews and unifies the literature on speculative execution in LLMs (e.g., blockwise parallel decoding, speculative decoding, etc.) within a comprehensive framework and a systematic taxonomy. Based on the taxonomy, we present a critical review and comparative analysis of the current methods. Finally, we highlight various key challenges and future directions to further develop the area.
https://arxiv.org/abs/2404.14897
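A simplified sketch of the draft-then-verify control flow described above. `draft_next` and `target_next` are toy greedy stand-ins for a cheap draft model and the large target model; real speculative decoding scores all drafted tokens with a single parallel target pass and uses a probabilistic accept/reject rule, which this greedy variant omits:

```python
def speculative_decode(prefix, draft_next, target_next, k=4, rounds=3):
    out = list(prefix)
    for _ in range(rounds):
        draft, ctx = [], list(out)
        for _ in range(k):                    # draft k tokens cheaply, in sequence
            draft.append(draft_next(ctx))
            ctx.append(draft[-1])
        for t in draft:                       # verify (parallelizable in practice)
            if target_next(out) == t:
                out.append(t)                 # accept: draft matches target
            else:
                out.append(target_next(out))  # reject: take target's token, stop
                break
    return out

# Toy models: the draft agrees with the target except when len(ctx) % 3 == 0.
target = lambda ctx: len(ctx) % 7
draft = lambda ctx: len(ctx) % 7 if len(ctx) % 3 else 0
print(speculative_decode([1, 2], draft, target))
```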
Information Retrieval (IR) systems are crucial tools for users to access information, widely applied in scenarios like search engines, question answering, and recommendation systems. Traditional IR methods, based on similarity matching to return ranked lists of documents, have been reliable means of information acquisition, dominating the IR field for years. With the advancement of pre-trained language models, generative information retrieval (GenIR) has emerged as a novel paradigm, gaining increasing attention in recent years. Currently, research in GenIR can be categorized into two aspects: generative document retrieval (GR) and reliable response generation. GR leverages the generative model's parameters to memorize documents, enabling retrieval by directly generating relevant document identifiers without explicit indexing (a toy sketch of this identifier generation follows the link below). Reliable response generation, on the other hand, employs language models to directly generate the information users seek, breaking the limitations of traditional IR in terms of document granularity and relevance matching, offering more flexibility, efficiency, and creativity, and thus better meeting practical needs. This paper aims to systematically review the latest research progress in GenIR. We summarize advancements in GR regarding model training, document identifiers, incremental learning, downstream task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation in terms of internal knowledge memorization, external knowledge augmentation, generating responses with citations, and personal information assistants. We also review the evaluation, challenges, and future prospects of GenIR systems. This review aims to offer a comprehensive reference for researchers in the GenIR field, encouraging further development in this area.
https://arxiv.org/abs/2404.14851
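To illustrate the generative document retrieval (GR) idea of producing identifiers without an explicit index, here is a toy sketch of constrained decoding over a prefix trie of valid docids; `score_next` stands in for a seq2seq model's next-token scores, and the docids and scoring function are made up:

```python
def build_trie(docids):
    """Prefix trie over docid token sequences (characters, for simplicity)."""
    trie = {}
    for docid in docids:
        node = trie
        for tok in docid:
            node = node.setdefault(tok, {})
    return trie

def generate_docid(query, score_next, trie):
    """Greedy decoding constrained to tokens that extend some valid docid."""
    node, out = trie, []
    while node:                                  # stop at a trie leaf
        allowed = list(node.keys())              # constrained decoding step
        tok = max(allowed, key=lambda t: score_next(query, out, t))
        out.append(tok)
        node = node[tok]
    return "".join(out)

docids = ["d01", "d02", "d17"]
toy_score = lambda q, prefix, tok: -abs(ord(tok) - ord(q[0]))  # illustrative only
print(generate_docid("1", toy_score, build_trie(docids)))      # -> "d17"
```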
Voice is a natural mode of expression offered by modern computer-based systems. Qualitative perspectives on voice-based user experiences (voice UX) offer rich descriptions of complex interactions that numbers alone cannot fully represent. We conducted a systematic review of the literature on qualitative approaches to voice UX, capturing the nature of this body of work in a systematic map and offering a qualitative synthesis of findings. We highlight the benefits of qualitative methods for voice UX research, identify opportunities for increasing rigour in methods and outcomes, and distill patterns of experience across a diversity of devices and modes of qualitative praxis.
https://arxiv.org/abs/2404.14736
Prediction and optimisation are two widely used techniques that have found many applications in solving real-world problems. While prediction is concerned with estimating the unknown future values of a variable, optimisation is concerned with optimising the decision given all the available data. These methods are used together to solve sequential decision-making problems, where we often need to predict the future values of variables and then use them to determine the optimal decisions. This paradigm is known as forecast and optimise (a toy instance is sketched after the link below) and has numerous applications, e.g., forecast demand for a product and then optimise inventory, forecast energy demand and schedule generation, forecast demand for a service and schedule staff, to name a few. In this extended abstract, we review a digital twin that was developed and applied in wastewater treatment at Urban Utility to improve operational efficiency. While the current study is tailored to the case study problem, the underlying principles can be used to solve similar problems in other domains.
https://arxiv.org/abs/2404.14635
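The forecast-and-optimise pattern can be shown on a toy inventory example: forecast demand from history, then choose the order quantity via the newsvendor critical fractile. The moving-average forecaster, the Gaussian demand assumption, and the cost numbers are all illustrative and unrelated to the wastewater case study:

```python
from statistics import NormalDist

def forecast(history, window=3):
    """Naive moving-average forecast of the next period's demand."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def optimise_order(mean_demand, price=10.0, cost=6.0, sigma=5.0):
    """Newsvendor decision: order the demand quantile at the critical
    fractile (price - cost) / price, assuming Gaussian forecast errors."""
    service_level = (price - cost) / price
    return NormalDist(mean_demand, sigma).inv_cdf(service_level)

history = [42, 45, 40, 47, 44]
mu = forecast(history)                    # predict, then optimise
print(f"forecast demand = {mu:.1f}, order quantity = {optimise_order(mu):.1f}")
```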
Imaging sites around the world generate growing amounts of medical scan data with ever more versatile and affordable technology. Large-scale studies acquire MRI for tens of thousands of participants, together with metadata ranging from lifestyle questionnaires to biochemical assays, genetic analyses, and more. These large datasets encode substantial information about human health and hold considerable potential for machine learning training and analysis. This chapter examines ongoing large-scale studies and the challenge of distribution shifts between them. Transfer learning for overcoming such shifts is discussed, together with federated learning for safe access to distributed training data securely held at multiple institutions (a minimal federated-averaging sketch follows the link below). Finally, representation learning is reviewed as a methodology for encoding embeddings that express abstract relationships in multi-modal input formats.
https://arxiv.org/abs/2404.14326
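As a concrete instance of the federated learning setting mentioned in the chapter summary, below is a minimal federated averaging (FedAvg) sketch: each institution updates a model copy on its local data, and only parameter updates, never the data, are aggregated. The quadratic toy objective stands in for an actual model fit:

```python
import numpy as np

def local_update(weights, local_data, lr=0.1, steps=10):
    """Gradient steps on local data only; toy objective ||w - mean(data)||^2."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * (w - local_data.mean(axis=0))
        w -= lr * grad
    return w

def fedavg(weights, client_datasets):
    """One communication round: size-weighted average of client updates."""
    updates = [local_update(weights, d) for d in client_datasets]
    sizes = np.array([len(d) for d in client_datasets], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
clients = [rng.standard_normal((n, 4)) + i for i, n in enumerate([50, 80, 30])]
w = np.zeros(4)
for _ in range(5):                        # a few communication rounds
    w = fedavg(w, clients)
print(w)
```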
This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A notable total of 428 participants registered for the challenge, with 22 teams ultimately making valid submissions. This paper meticulously evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field.
https://arxiv.org/abs/2404.14248