Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which hinders their real-world application. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that model high-frequency information directly as a standard Gaussian distribution, our BDFlow first decouples the high-frequency information into a \textit{semantic high-frequency} component that adheres to a Boundary distribution and a \textit{non-semantic high-frequency} counterpart that adheres to a Gaussian distribution. Specifically, to capture the semantic high-frequency parts accurately, we use a Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while the non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that our BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, our BDFlow improves the PSNR by $4.4$ dB and the SSIM by $0.1$ on average over GRAIN, utilizing only 74\% of the parameters and 20\% of the computation. The code will be available at this https URL.
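As a point of reference for the reported metrics, PSNR is computed from the mean squared error between a reference and a rescaled image. The sketch below is a plain-NumPy illustration of the standard definition (the array shapes and pixel values are arbitrary, not the paper's data):

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 20.0 * np.log10(max_val / np.sqrt(mse))

# Two 8-bit "images" whose pixels differ by a constant offset of 10.
ref = np.full((32, 32), 100, dtype=np.uint8)
out = np.full((32, 32), 110, dtype=np.uint8)
print(round(psnr(ref, out), 2))  # 20*log10(255/10) = 28.13
```

A 4.4 dB PSNR gain is therefore roughly a 40% reduction in root-mean-square error, which is why it is considered a large margin.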
https://arxiv.org/abs/2405.02941
Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD framework, we propose Diffusion-based Content Consistency Loss and Content Consistency Loss to assist the overall framework's training. Finally, we conduct extensive experiments. The results reveal that our method surpasses state-of-the-art methods in both qualitative and quantitative comparisons, capable of generating more realistic motion sequences.
https://arxiv.org/abs/2405.02844
Retrieval-augmented large language models (LLMs) leverage relevant content retrieved by information retrieval systems to generate correct responses, aiming to alleviate the hallucination problem. However, existing retriever-responder methods typically append relevant documents to the prompt of LLMs to perform text generation tasks without considering the interaction of fine-grained structural semantics between the retrieved documents and the LLMs. This issue is particularly important for accurate response generation, as LLMs tend to get ``lost in the middle'' when dealing with input prompts augmented with lengthy documents. In this work, we propose a new pipeline named ``Reinforced Retriever-Reorder-Responder'' (R$^4$) to learn document orderings for retrieval-augmented LLMs, thereby further enhancing their generation abilities while the large number of LLM parameters remains frozen. The reordering learning process is divided into two steps according to the quality of the generated responses: document order adjustment and document representation enhancement. Specifically, document order adjustment aims to organize retrieved document orderings into beginning, middle, and end positions based on graph attention learning, which maximizes the reinforced reward of response quality. Document representation enhancement further refines the representations of retrieved documents for responses of poor quality via document-level gradient adversarial learning. Extensive experiments demonstrate that our proposed pipeline achieves better factual question-answering performance on knowledge-intensive tasks compared to strong baselines across various public datasets. The source codes and trained models will be released upon paper acceptance.
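The positional intuition behind the reordering can be illustrated with a toy heuristic: put the most relevant documents at the two ends of the prompt, where LLMs attend best, and the least relevant in the middle. Note this is a hand-rolled sketch of the "lost in the middle" mitigation, not the paper's learned graph-attention policy:

```python
def reorder_for_llm(docs_with_scores):
    """Place the most relevant documents at the beginning and end of the
    prompt and the least relevant in the middle, reflecting the
    'lost in the middle' positional bias of LLMs.
    A hand-rolled heuristic, not the learned R^4 reordering."""
    ranked = sorted(docs_with_scores, key=lambda x: x[1], reverse=True)
    front, back = [], []
    for i, (doc, _) in enumerate(ranked):
        # Alternate: best doc to the front, second-best to the back, etc.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

docs = [("d1", 0.9), ("d2", 0.2), ("d3", 0.7), ("d4", 0.4), ("d5", 0.8)]
print(reorder_for_llm(docs))  # ['d1', 'd3', 'd2', 'd4', 'd5'] (best docs at both ends)
```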
https://arxiv.org/abs/2405.02659
The significance of network structures in promoting group cooperation within social dilemmas has been widely recognized. Prior studies attribute this facilitation to the assortment of strategies driven by spatial interactions. Although reinforcement learning has been employed to investigate the impact of dynamic interaction on the evolution of cooperation, there remains a lack of understanding about how agents develop neighbour selection behaviours and the formation of strategic assortment within an explicit interaction structure. To address this, our study introduces a computational framework based on multi-agent reinforcement learning in the spatial Prisoner's Dilemma game. This framework allows agents to select dilemma strategies and interacting neighbours based on their long-term experiences, differing from existing research that relies on preset social norms or external incentives. By modelling each agent using two distinct Q-networks, we disentangle the coevolutionary dynamics between cooperation and interaction. The results indicate that long-term experience enables agents to develop the ability to identify non-cooperative neighbours and exhibit a preference for interaction with cooperative ones. This emergent self-organizing behaviour leads to the clustering of agents with similar strategies, thereby increasing network reciprocity and enhancing group cooperation.
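The strategy-learning half of such a framework rests on standard Q-learning over Prisoner's Dilemma payoffs. Below is a minimal tabular sketch of one update step; the paper itself uses two Q-networks per agent (one for the dilemma strategy, one for neighbour selection), and the payoff values here are the conventional textbook ones, not necessarily the paper's:

```python
# Prisoner's Dilemma payoffs for (my_action, opponent_action): C=0, D=1.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

def q_update(q, action, reward, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a stateless repeated game:
    Q(a) <- Q(a) + alpha * (r + gamma * max_a' Q(a') - Q(a))."""
    q[action] += alpha * (reward + gamma * max(q) - q[action])
    return q

q = [0.0, 0.0]                        # Q-values for cooperate (0) and defect (1)
q = q_update(q, 1, PAYOFF[(1, 0)])    # defect against a cooperator: reward 5
print(q)                              # [0.0, 0.5], i.e. 0 + 0.1 * (5 + 0.9*0 - 0)
```

In the paper's setting, a second, analogous Q-network scores candidate neighbours instead of actions, which is what lets selection behaviour and strategy coevolve.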
https://arxiv.org/abs/2405.02654
This research aims to investigate professional racing drivers' expertise to develop an understanding of their cognitive and adaptive skills and to inform new autonomy algorithms. An expert interview study was conducted with 11 professional race drivers, data analysts, and racing instructors from across prominent racing leagues. The interviews were conducted using an exploratory, non-standardized expert interview format guided by a set of prepared questions. The study investigates drivers' exploration strategies for reaching their vehicle's limits and contrasts them with the capabilities of state-of-the-art autonomous racing software stacks. Participants were questioned about the techniques and skills they have developed to quickly approach and maneuver at the vehicle limit, ultimately minimizing lap times. The analysis of the interviews was grounded in Mayring's qualitative content analysis framework, which facilitated the organization of the data into multiple categories and subcategories. Our findings provide insights into human behavior regarding reaching a vehicle's limit and minimizing lap times. From these findings, we derive new autonomy software modules that allow for more adaptive vehicle behavior. By emphasizing the distinct nuances between manual and autonomous driving techniques, the paper encourages further investigation into human drivers' strategies to maximize their vehicles' capabilities.
https://arxiv.org/abs/2405.02620
In recent years, autonomous driving has garnered escalating attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task suitable for the cost-effective perception systems of autonomous driving. Although numerous studies have demonstrated the greater advantages of 3D occupancy prediction over object-centric perception tasks, there is still a lack of a dedicated review focusing on this rapidly developing field. In this paper, we first introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. Secondly, we conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects: feature enhancement, deployment friendliness and label efficiency, and provide an in-depth analysis of the potentials and challenges of each category of methods. Finally, we present a summary of prevailing research trends and propose some inspiring future outlooks. To provide a valuable reference for researchers, a regularly updated collection of related papers, datasets, and codes is organized at this https URL.
https://arxiv.org/abs/2405.02595
Applications of large-scale mobile multi-robot systems can be beneficial over monolithic robots because of higher potential for robustness and scalability. Developing controllers for multi-robot systems is challenging because the multitude of interactions is hard to anticipate and difficult to model. Automatic design using machine learning or evolutionary robotics seem to be options to avoid that challenge, but bring the challenge of designing reward or fitness functions. Generic reward and fitness functions seem unlikely to exist and task-specific rewards often have undesired side effects. Approaches of so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity. Our approach to innate motivation is to minimize surprise, which we implement by maximizing the accuracy of the swarm robot's sensor predictions using neuroevolution. A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop. We summarize our previous simulation-based results concerning behavioral diversity, robustness, scalability, and engineered self-organization, and put them into context. In several new studies, we analyze the influence of the optimizer's hyperparameters, the scalability of evolved behaviors, and the impact of realistic robot simulations. Finally, we present results using real robots that show how the reality gap can be bridged.
https://arxiv.org/abs/2405.02579
Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for rasterization-based differentiable renderers. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy, allowing for a straightforward, effective, and performant solution. We introduce a novel concept of micro-edges, which allows us to treat the rasterized images as outcomes of a differentiable, continuous process aligned with the inherently non-differentiable, discrete-pixel rasterization. This technique eliminates the necessity for rendering approximations or other modifications to the forward pass, preserving the integrity of the rendered image, which makes it applicable to rasterized masks, depth, and normals images where filtering is prohibitive. Utilizing micro-edges simplifies gradient interpretation at discontinuities and enables handling of geometry intersections, offering an advantage over the prior art. We showcase our method in dynamic human head scene reconstruction, demonstrating effective handling of camera images and segmentation masks.
https://arxiv.org/abs/2405.02508
The emerging behaviors of swarms have fascinated scientists and gathered significant interest in the field of robotics. Traditionally, swarms are viewed as egalitarian, with robots sharing identical roles and capabilities. However, recent findings highlight the importance of hierarchy for deploying robot swarms more effectively in diverse scenarios. Despite nature's preference for hierarchies, the robotics field has clung to the egalitarian model, partly due to a lack of empirical evidence for the conditions favoring hierarchies. Our research demonstrates that while egalitarian swarms excel in environments proportionate to their collective sensing abilities, they struggle in larger or more complex settings. Hierarchical swarms, conversely, extend their sensing reach efficiently, proving successful in larger, more unstructured environments with fewer resources. We validated these concepts through simulations and physical robot experiments, using a complex radiation cleanup task. This study paves the way for developing adaptable, hierarchical swarm systems applicable in areas like planetary exploration and autonomous vehicles. Moreover, these insights could deepen our understanding of hierarchical structures in biological organisms.
https://arxiv.org/abs/2405.02417
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and examine the concerns around code quality, security, and the related issue of programmer responsibility. These are key issues for organizations deciding on the usage of automatically generated code. We discuss how advances in software engineering, such as program repair and analysis, can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs can help produce higher-assurance code, along with evidence of that assurance.
https://arxiv.org/abs/2405.02213
Motivation: Alzheimer's Disease hallmarks include amyloid-beta deposits and brain atrophy, detectable via PET and MRI scans, respectively. PET is expensive, invasive, and exposes patients to ionizing radiation. MRI is cheaper, non-invasive, and free from ionizing radiation, but is limited to measuring brain atrophy. Goal: To develop a 3D image translation model that synthesizes amyloid-beta PET images from T1-weighted MRI, exploiting the known relationship between amyloid-beta and brain atrophy. Approach: The model was trained on 616 PET/MRI pairs and validated with 264 pairs. Results: The model synthesized amyloid-beta PET images from T1-weighted MRI with a high degree of similarity, as reflected in high SSIM and PSNR metrics (SSIM > 0.95, PSNR = 28). Impact: Our model proves the feasibility of synthesizing amyloid-beta PET images from structural MRI ones, significantly enhancing accessibility for large-cohort studies and early dementia detection, while also reducing cost, invasiveness, and radiation exposure.
https://arxiv.org/abs/2405.02109
The distribution of subpopulations is an important property hidden within a dataset. Uncovering and analyzing the subpopulation distribution within datasets provides a comprehensive understanding of the datasets, standing as a powerful tool beneficial to various downstream tasks, including Dataset Subpopulation Organization, Subpopulation Shift, and Slice Discovery. Despite its importance, to our knowledge there has been no work that systematically explores the subpopulation distribution of datasets. To address this limitation and solve all the mentioned tasks in a unified way, we introduce a novel concept of subpopulation structures to represent, analyze, and utilize subpopulation distributions within datasets. To characterize the structures in an interpretable manner, we propose the Subpopulation Structure Discovery with Large Language Models (SSD-LLM) framework, which employs the world knowledge and instruction-following capabilities of Large Language Models (LLMs) to linguistically analyze informative image captions and summarize the structures. Furthermore, we propose complete workflows to address downstream tasks, named Task-specific Tuning, showcasing the application of the discovered structure to a spectrum of subpopulation-related tasks, including dataset subpopulation organization, subpopulation shift, and slice discovery.
https://arxiv.org/abs/2405.02363
Nonlinearities are crucial for capturing complex input-output relationships especially in deep neural networks. However, nonlinear functions often incur various hardware and compute overheads. Meanwhile, stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC for hardware-simplistic multivariate nonlinear function generation at high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of sampling gate coefficients for accurately approximating generic nonlinear functions. Experiments demonstrate the superiority of SMURF, requiring only 16.07% area and 14.45% power consumption of Taylor-series approximation, and merely 2.22% area of look-up table (LUT) schemes.
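The core trick of stochastic computing, trading precision for hardware simplicity, is easiest to see in the unipolar encoding, where a value in [0, 1] becomes the mean of a random bitstream and a single AND gate multiplies two such values. The simulation below illustrates plain SC, not the SMURF FSM itself; stream length and seed are arbitrary:

```python
import random

def bitstream(p, n, rng):
    """Unipolar stochastic encoding: a length-n bitstream whose mean is p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(x, y):
    """In unipolar SC, a single AND gate per bit multiplies two probabilities."""
    return [a & b for a, b in zip(x, y)]

rng = random.Random(42)
n = 100_000
sx, sy = bitstream(0.6, n, rng), bitstream(0.5, n, rng)
est = sum(sc_multiply(sx, sy)) / n
print(abs(est - 0.6 * 0.5) < 0.01)  # the stream mean approximates the product: True
```

Accuracy scales with stream length (roughly as 1/sqrt(n)), which is exactly the precision-for-hardware trade-off the abstract refers to.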
https://arxiv.org/abs/2405.02356
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules that exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
https://arxiv.org/abs/2405.01616
This research introduces an innovative AI-driven precision agriculture system, leveraging YOLOv8 for disease identification and Retrieval Augmented Generation (RAG) for context-aware diagnosis, focused on addressing the challenges of diseases affecting the coffee production sector in Karnataka. The system integrates sophisticated object detection techniques with language models to address the inherent constraints associated with Large Language Models (LLMs). Our methodology not only tackles the issue of hallucinations in LLMs, but also introduces dynamic disease identification and remediation strategies. Real-time monitoring, collaborative dataset expansion, and organizational involvement ensure the system's adaptability in diverse agricultural settings. The impact of the proposed system extends beyond automation, aiming to secure food supplies, protect livelihoods, and promote eco-friendly farming practices. By facilitating precise disease identification, the system contributes to sustainable and environmentally conscious agriculture, reducing reliance on pesticides. Looking to the future, the project envisions continuous development of RAG-integrated object detection systems, emphasizing scalability, reliability, and usability. This research strives to be a beacon for positive change in agriculture, aligning with global efforts toward sustainable and technologically enhanced food production.
https://arxiv.org/abs/2405.01310
Numerous studies have shown that existing Face Recognition Systems (FRS), including commercial ones, often exhibit biases toward certain ethnicities due to under-represented data. In this work, we explore ethnicity alteration and skin tone modification using synthetic face image generation methods to increase the diversity of datasets. We conduct a detailed analysis by first constructing a balanced face image dataset representing three ethnicities: Asian, Black, and Indian. We then make use of existing Generative Adversarial Network-based (GAN) image-to-image translation and manifold learning models to alter the ethnicity from one to another. A systematic analysis is further conducted to assess the suitability of such datasets for FRS by studying the realistic skin-tone representation using Individual Typology Angle (ITA). Further, we also analyze the quality characteristics using existing Face image quality assessment (FIQA) approaches. We then provide a holistic FRS performance analysis using four different systems. Our findings pave the way for future research works in (i) developing both specific ethnicity and general (any to any) ethnicity alteration models, (ii) expanding such approaches to create databases with diverse skin tones, (iii) creating datasets representing various ethnicities which further can help in mitigating bias while addressing privacy concerns.
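The Individual Typology Angle (ITA) referenced here has a standard definition in the colour literature, computed from the CIELAB lightness L* and the yellow-blue component b*. The sketch below assumes that standard formula; the helper name is ours, not from the paper:

```python
import math

def individual_typology_angle(L_star, b_star):
    """Individual Typology Angle from CIELAB coordinates:
    ITA = arctan((L* - 50) / b*) * 180 / pi.
    Higher ITA values correspond to lighter skin tones."""
    return math.degrees(math.atan2(L_star - 50.0, b_star))

print(round(individual_typology_angle(70.0, 20.0), 1))  # atan(20/20) = 45.0 degrees
```

Skin-tone categories in the ITA literature are then defined by thresholding this angle, which is presumably how the realism of generated tones is assessed.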
https://arxiv.org/abs/2405.01273
We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.
https://arxiv.org/abs/2405.01242
Argument structure learning~(ASL) entails predicting relations between arguments. Because it can structure a document to facilitate its understanding, it has been widely applied in many fields~(medical, commercial, and scientific domains). Despite its broad utilization, ASL remains a challenging task because it involves examining the complex relationships between the sentences in a potentially unstructured discourse. To resolve this problem, we have developed a simple yet effective approach called Dual-tower Multi-scale cOnvolution neural Network~(DMON) for the ASL task. Specifically, we organize arguments into a relationship matrix that together with the argument embeddings forms a relationship tensor and design a mechanism to capture relations with contextual arguments. Experimental results on three different-domain argument mining datasets demonstrate that our framework outperforms state-of-the-art models. The code is available at this https URL .
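One plausible reading of the relationship-tensor construction is pairwise concatenation of argument embeddings, so that a convolution over the first two axes can score every candidate argument pair jointly. The sketch below is our guess at such a layout; the dimensions and helper name are assumptions, not the paper's exact design:

```python
import numpy as np

def relationship_tensor(embeddings):
    """Build an (n, n, 2d) tensor whose (i, j) cell concatenates the
    embeddings of arguments i and j, forming a pairwise relation grid."""
    n, d = embeddings.shape
    left = np.repeat(embeddings[:, None, :], n, axis=1)   # (n, n, d): row i holds emb[i]
    right = np.repeat(embeddings[None, :, :], n, axis=0)  # (n, n, d): col j holds emb[j]
    return np.concatenate([left, right], axis=-1)         # (n, n, 2d)

emb = np.random.default_rng(0).standard_normal((5, 16))   # 5 arguments, d = 16
tensor = relationship_tensor(emb)
print(tensor.shape)  # (5, 5, 32)
```

Multi-scale convolutions over such a grid would then capture relations between an argument pair and its surrounding context pairs, matching the abstract's description.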
https://arxiv.org/abs/2405.01216
This paper presents the TartuNLP team submission to the EvaLatin 2024 shared task on emotion polarity detection for historical Latin texts. Our system relies on two distinct approaches to annotating training data for supervised learning: 1) creating heuristics-based labels by adopting the polarity lexicon provided by the organizers, and 2) generating labels with GPT-4. We employed parameter-efficient fine-tuning using the adapters framework and experimented with both monolingual and cross-lingual knowledge transfer for training language and task adapters. Our submission with the LLM-generated labels achieved the overall first place in the emotion polarity detection task. Our results indicate that LLM-based annotation yields promising results on Latin texts.
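The heuristics-based labelling route (approach 1) can be sketched as a simple lexicon lookup and vote. The toy lexicon below is a made-up stand-in, not the organizers' resource, and the tie-breaking rule is our own simplification:

```python
def heuristic_polarity(tokens, lexicon):
    """Label a text by summing lexicon polarities of its tokens
    (+1 positive, -1 negative, 0 for unknown words); a zero sum
    falls back to 'neutral'. A toy stand-in for lexicon-based labelling."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical mini-lexicon of Latin words with polarity scores.
lexicon = {"laus": 1, "gaudium": 1, "dolor": -1, "ira": -1}
print(heuristic_polarity(["magna", "laus", "et", "gaudium"], lexicon))  # positive
print(heuristic_polarity(["dolor", "et", "ira"], lexicon))              # negative
```

Labels produced this way are noisy but cheap, which is why the paper contrasts them with GPT-4-generated labels as the second annotation source.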
https://arxiv.org/abs/2405.01159
A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data generation in simulation using 3D synthetic environments and CycleGAN domain transfer. We compare this method of data generation to the popular NYUDepth V2 dataset by training a depth estimation model based on the DenseDepth structure using different training sets of real and simulated data. We evaluate the performance of the models on newly collected images and LiDAR depth data from a Husky robot to verify the generalizability of the approach and show that GAN-transformed data can serve as an effective alternative to real-world data, particularly in depth estimation.
https://arxiv.org/abs/2405.01113