We introduce a new family of minimal problems for reconstruction from multiple views. Our primary focus is a novel approach to autocalibration, a long-standing problem in computer vision. Traditional approaches to this problem, such as those based on Kruppa's equations or the modulus constraint, rely explicitly on the knowledge of multiple fundamental matrices or a projective reconstruction. In contrast, we consider a novel formulation involving constraints on image points, the unknown depths of 3D points, and a partially specified calibration matrix $K$. For $2$ and $3$ views, we present a comprehensive taxonomy of minimal autocalibration problems obtained by relaxing some of these constraints. These problems are organized into classes according to the number of views and any assumed prior knowledge of $K$. Within each class, we determine problems with the fewest -- or a relatively small number of -- solutions. From this zoo of problems, we devise three practical solvers. Experiments with synthetic and real data, and with our solvers interfaced to COLMAP, demonstrate that we achieve superior accuracy compared to state-of-the-art calibration methods. The code is available at this https URL.
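The constraint at the heart of this formulation can be written down directly. The following is a minimal numpy sketch (not the paper's solver; the calibration values are an arbitrary example) of how an image point, its unknown depth, and a calibration matrix $K$ enter the pinhole projection equation $\lambda\, x = K(RX + t)$:

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D point X; return pixel coordinates and the depth lambda."""
    p = K @ (R @ X + t)      # 3-vector equal to lambda * (u, v, 1)
    lam = p[2]               # the unknown depth of the 3D point
    return p[:2] / lam, lam

# Example partially specified K: square pixels and zero skew -- the kind of
# prior knowledge by which the problem classes in the abstract are organized.
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)    # canonical first camera

x, lam = project(K, R, t, np.array([0.1, -0.2, 2.0]))
# For this canonical pose the depth equals the point's Z coordinate.
```

Relaxing different entries of $K$ while keeping equations of this form is what generates the taxonomy of minimal problems described above.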
https://arxiv.org/abs/2405.05605
We propose the characteristic generator, a novel one-step generative model that combines the sampling efficiency of Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field through nonparametric regression and use the Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, yielding a one-step mapping that effectively pushes the prior distribution towards the target distribution. On the theoretical side, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis of simulation-free one-step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. We apply our method to both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of the neural network.
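As a concrete toy version of the pipeline described above, here is a hedged numpy sketch (not the paper's estimator; the velocity field is assumed known and chosen trivially) of Euler-discretizing a probability-flow ODE to produce a discrete characteristic from a prior sample to a target:

```python
import numpy as np

def euler_characteristic(x0, velocity, n_steps=100):
    """Euler-discretize dx/dt = velocity(x, t) on t in [0, 1]."""
    h = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for k in range(n_steps):
        t = k * h
        x = x + h * velocity(x, t)   # one Euler step along the characteristic
        path.append(x.copy())
    return np.stack(path)            # discrete approximation to the characteristic

# Toy velocity field: a straight-line path x_t = (1 - t) x0 + t x1 has
# constant characteristic velocity v = x1 - x0.
target = np.array([2.0, -1.0])
x0 = np.array([0.5, 0.5])
path = euler_characteristic(x0, lambda x, t: target - x0)

# The endpoint of the characteristic reaches the target; in the paper's
# scheme a network is then trained to map x0 to the endpoint in one step.
```

The one-step generator is precisely a network fit to the map `x0 -> path[-1]` over many such characteristics.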
https://arxiv.org/abs/2405.05512
Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death, to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from chest X-ray images, reducing the risk that sub-optimally or critically placed NGTs are missed or detected late, but gaps remain in integrating such systems into clinical practice. In this study, we present a human-centered approach to the problem and describe insights derived from contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped us understand challenges in existing workflows and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs the design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.
https://arxiv.org/abs/2405.05299
In this paper, we demonstrate the feasibility of alterfactual explanations for black-box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely used paradigm in Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches in this field communicate information about features or characteristics that are especially important for an AI's decision. Yet to fully understand a decision, not only is knowledge about relevant features needed; awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. To this end, a novel approach for explaining AI systems, called alterfactual explanations, was recently proposed at a conceptual level. It is based on showing an alternative reality in which irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black-box models based on neural networks. To this end, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives interesting insights into how alterfactual explanations can complement counterfactual explanations.
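The core idea, that irrelevant input features can change arbitrarily without affecting the decision, can be illustrated with a deliberately trivial sketch (a toy stand-in, not the paper's GAN approach):

```python
import numpy as np

def classifier(x):
    """Toy binary classifier whose decision depends on feature 0 only,
    so feature 1 is irrelevant to its output."""
    return int(x[0] > 0.5)

original = np.array([0.9, 0.1])
alterfactual = original.copy()
alterfactual[1] = -7.3        # alter the irrelevant feature drastically

# The decision is unchanged -- this "altered reality with the same
# decision" is exactly what an alterfactual explanation shows the user.
same_decision = classifier(original) == classifier(alterfactual)
```

In the paper's setting a GAN learns to produce such altered inputs for real image classifiers, where the irrelevant directions are not known in closed form.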
https://arxiv.org/abs/2405.05295
Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, greatly reducing the economic and time costs of medical diagnosis. However, due to the complexity of medical images and the similar characteristics of different tissue cells, existing methods face great challenges in preserving biological consistency. To this end, we propose the Hybrid Augmented Generative Adversarial Network (HAGAN) to maintain the authenticity of structural texture and tissue cells. HAGAN contains an Attention Mixed (AttnMix) Generator, a Hierarchical Discriminator, and a Reverse Skip Connection between Discriminator and Generator. The AttnMix consistency differentiable regularization encourages the perception of structural and textural variations between real and fake images, which improves the pathological integrity of synthetic images and the accuracy of features in local areas. The Hierarchical Discriminator introduces pixel-by-pixel discriminant feedback to the generator to enhance the saliency and discriminability of global and local details simultaneously. The Reverse Skip Connection further improves the accuracy of fine details by fusing real and synthetic distribution features. Our experimental evaluations on three datasets of different scales, i.e., COVID-CT, ACDC and BraTS2018, demonstrate that HAGAN outperforms existing methods and achieves state-of-the-art performance at both high and low resolutions.
https://arxiv.org/abs/2405.04902
Amidst the COVID-19 pandemic, with many organizations, schools, colleges, and universities transitioning to virtual platforms, students encountered difficulties in acquiring PCs such as desktops or laptops. Entry-level machines, starting around 15,000 INR, often failed to offer adequate system specifications, posing a challenge for consumers. Additionally, those reliant on laptops for work found the conventional approach cumbersome. Enter the "Portable Smart Computer," a leap into the future of computing. This innovative device boasts speed and performance comparable to traditional desktops, but in a compact, energy-efficient, and cost-effective package. It delivers a seamless desktop experience, whether one is editing documents, browsing multiple tabs, managing spreadsheets, or creating presentations. Moreover, it supports programming languages such as Python, C, and C++, as well as toolchains such as Keil and Xilinx, catering to the needs of programmers.
https://arxiv.org/abs/2405.05292
HiRISE (High-Resolution Imaging Science Experiment) is a camera onboard the Mars Reconnaissance Orbiter responsible for photographing vast areas of the Martian surface in unprecedented detail. It can capture millions of incredible close-up images in minutes. However, Mars suffers from frequent regional and local dust storms that hamper this data-collection process and pipeline, resulting in lost effort and crucial flight time. Removing these images manually requires a large amount of manpower. I filter out images obstructed by atmospheric dust automatically using a Dust Image Classifier fine-tuned from ResNet-50, achieving an accuracy of 94.05%. To further facilitate the seamless filtering of images, I design a prediction pipeline that classifies and stores these dusty patches. I also denoise partially obstructed images using an autoencoder-based denoiser and a Pix2Pix GAN, achieving SSIM indices of 0.75 and 0.99, respectively.
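For reference, the SSIM index quoted above can be computed as follows. This is a minimal global-form sketch; image libraries typically report a windowed variant, and the abstract does not specify its exact evaluation settings:

```python
import numpy as np

def ssim(x, y, data_range=1.0):
    """Global structural similarity between two equally sized images,
    using the standard constants C1 = (0.01 L)^2, C2 = (0.03 L)^2."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + 0.1 * rng.standard_normal((64, 64)), 0, 1)

perfect = ssim(clean, clean)   # identical images score 1.0
degraded = ssim(clean, noisy)  # additive noise lowers the score
```

An SSIM of 0.99, as reported for the Pix2Pix GAN, thus indicates near-perfect structural agreement with the clean reference.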
https://arxiv.org/abs/2405.04807
HiRISE (High-Resolution Imaging Science Experiment) is a camera onboard the Mars Reconnaissance Orbiter responsible for photographing vast areas of the Martian surface in unprecedented detail. It can capture millions of incredible close-up images in minutes. However, Mars suffers from frequent regional and local dust storms that hamper this data-collection process and pipeline, resulting in lost effort and crucial flight time. Removing these images manually requires a large amount of manpower. I filter out images obstructed by atmospheric dust automatically using a Dust Image Classifier fine-tuned from ResNet-50, achieving an accuracy of 94.05%. To further facilitate the seamless filtering of images, I design a prediction pipeline that classifies and stores these dusty patches. I also denoise partially obstructed images using an autoencoder-based denoiser and a Pix2Pix GAN, achieving SSIM indices of 0.75 and 0.99, respectively.
https://arxiv.org/abs/2405.04722
In this paper, we present a comprehensive survey of the metaverse, envisioned as a transformative dimension of next-generation Internet technologies. This study not only outlines the structural components of our survey but also makes a substantial scientific contribution by elucidating the foundational concepts underlying the emergence of the metaverse. We analyze its architecture by defining key characteristics and requirements, thereby illuminating the nascent reality set to revolutionize digital interactions. Our analysis emphasizes the importance of collaborative efforts in developing metaverse standards, thereby fostering a unified understanding among industry stakeholders, organizations, and regulatory bodies. We extend our scrutiny to critical technologies integral to the metaverse, including interactive experiences, communication technologies, ubiquitous computing, digital twins, artificial intelligence, and cybersecurity measures. For each technological domain, we rigorously assess current contributions, principal techniques, and representative use cases, providing a nuanced perspective on their potential impacts. Furthermore, we delve into the metaverse's diverse applications across education, healthcare, business, social interactions, industrial sectors, defense, and mission-critical operations, highlighting its extensive utility. Each application is thoroughly analyzed, demonstrating its value and addressing associated challenges. The survey concludes with an overview of persistent challenges and future directions, offering insights into essential considerations and strategies necessary to harness the full potential of the metaverse. Through this detailed investigation, our goal is to articulate the scientific contributions of this survey paper, transcending a mere structural overview to highlight the transformative implications of the metaverse.
https://arxiv.org/abs/2405.04718
Exact nearest neighbor search is a computationally intensive process, and even its simpler sibling -- vector retrieval -- can be computationally complex. This is exacerbated when retrieving vectors of high dimension $d$ relative to the number of vectors, $N$, in the database. Exact nearest neighbor retrieval has generally been acknowledged to be an $O(Nd)$ problem with no sub-linear solutions. Attention has instead shifted towards Approximate Nearest-Neighbor (ANN) retrieval techniques, many of which have sub-linear or even logarithmic time complexities. However, if our intuition from binary search problems (e.g. $d=1$ vector retrieval) carries, there ought to be a way to retrieve an organized representation of vectors without brute-forcing our way to a solution. For low dimensions (e.g. $d=2$ or $d=3$), \texttt{kd-trees} provide an $O(d\log N)$ algorithm for retrieval. Unfortunately, in practice the algorithm deteriorates rapidly to an $O(dN)$ solution at high dimensions (e.g. $d=128$). We propose a novel algorithm for logarithmic Fast Exact Retrieval for Nearest-neighbor lookup (FERN), inspired by \texttt{kd-trees}. The algorithm achieves $O(d\log N)$ look-up with 100\% recall on 10 million $d=128$ uniformly randomly generated vectors.\footnote{Code available at this https URL}
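To make the kd-tree baseline concrete, here is a compact pure-Python/numpy sketch of the classical structure that inspires FERN (this is the textbook algorithm, not FERN itself): split on coordinates round-robin during construction, then search with branch-and-bound pruning.

```python
import numpy as np

def build(points, idxs, depth=0):
    """Recursively build a kd-tree over points[idxs], cycling split axes."""
    if not idxs:
        return None
    axis = depth % points.shape[1]
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return {"i": idxs[mid], "axis": axis,
            "left": build(points, idxs[:mid], depth + 1),
            "right": build(points, idxs[mid + 1:], depth + 1)}

def nearest(node, points, q, best=None):
    """Exact nearest-neighbor lookup via branch-and-bound descent."""
    if node is None:
        return best
    p = points[node["i"]]
    d = float(np.linalg.norm(p - q))
    if best is None or d < best[1]:
        best = (node["i"], d)
    diff = q[node["axis"]] - p[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, points, q, best)
    if abs(diff) < best[1]:       # the split plane may hide a closer point
        best = nearest(far, points, q, best)
    return best

rng = np.random.default_rng(0)
pts = rng.random((500, 3))        # low dimension, where kd-trees do well
tree = build(pts, list(range(len(pts))))
query = rng.random(3)
idx, dist = nearest(tree, pts, query)
brute = int(np.argmin(np.linalg.norm(pts - query, axis=1)))  # sanity check
```

The pruning test `abs(diff) < best[1]` is what degrades at high $d$: the query hypersphere almost always crosses the split plane, so both subtrees get visited and the search approaches brute force.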
https://arxiv.org/abs/2405.04435
Artificial neural networks (ANNs) perform extraordinarily well on numerous tasks, including classification and prediction, e.g., speech processing and image classification. These new functions are based on a computational model that is free to select all necessary internal model parameters as long as it eventually delivers the functionality it is supposed to exhibit. Here, we review the connection between model parameter selection in machine learning (ML) algorithms running on ANNs and the epistemological theory of neopragmatism, focusing on the theory's utility-based and anti-representationalist aspects. To understand the consequences of an ANN's model parameter selection, we suggest using neopragmatist theories, whose implications are well studied. Incidentally, neopragmatism's notion of optimization is also based on utility considerations. This means that applying this approach elegantly reveals the inherent connections between optimization in ML, which uses a numerical method during the learning phase, and optimization in the ethical theory of consequentialism, where it occurs as a maxim of action. We suggest that these connections originate from the way relevance is calculated in ML systems. This could ultimately reveal a tendency towards specific actions in ML systems.
https://arxiv.org/abs/2405.04386
The boom in malware poses a danger to cyberspace comparable to the effect of climate change on ecosystems. Despite significant investments in cybersecurity technologies and staff training, the global community remains locked in a perpetual war with cybersecurity threats. The multiform, ever-changing face of malware continuously pushes the boundaries of the detection and mitigation approaches that cybersecurity practitioners employ. Older techniques such as signature-based detection and behavioral analysis are slow to adapt to the rapid evolution of malware. Consequently, this paper proposes using deep learning models, specifically LSTM networks and GANs, to improve malware detection accuracy and speed. Leveraging raw bytestream data and deep learning architectures, this approach provides better accuracy and performance than traditional methods. Integrating the LSTM and GAN models enables synthetic data generation, expanding the training datasets and thereby improving detection accuracy. The paper uses the VirusShare dataset, which contains more than one million unique malware samples, as the training and evaluation set for the presented models. After thorough data preparation, including tokenization and augmentation, and model training, the LSTM and GAN models outperform plain classifiers on these tasks. The resulting 98% accuracy shows that deep learning can play a decisive role in proactive cybersecurity defense. The paper also studies ensemble learning and model-fusion methods as ways to reduce bias and manage model complexity.
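The bytestream tokenization step mentioned above can be sketched as follows; the exact scheme (vocabulary, padding token, sequence length) is an assumption, since the abstract does not specify it:

```python
# Hypothetical byte-level tokenization for an LSTM malware classifier:
# each raw byte becomes a token id in [0, 255], and sequences are
# truncated or right-padded to a fixed length.

PAD = 256  # reserved id for padding, outside the byte range

def tokenize(blob: bytes, max_len: int = 16):
    """Map a raw byte blob to a fixed-length sequence of token ids."""
    ids = list(blob[:max_len])                 # one token per byte
    ids += [PAD] * (max_len - len(ids))        # right-pad short samples
    return ids

sample = b"MZ\x90\x00"   # e.g. the first bytes of a PE executable header
tokens = tokenize(sample)
```

An embedding layer would then map these 257 token ids to vectors before the LSTM consumes the sequence.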
https://arxiv.org/abs/2405.04373
We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or scribble map, into a photo-realistic face image. To do this, we combine the strengths of Generative Adversarial Networks (GANs) and diffusion models (DMs) by mapping the multi-modal features of the DM into the latent space of pre-trained GANs. We present a simple mapping and a style modulation network to link the two models and convert meaningful representations in feature maps and attention maps into latent codes. With GAN inversion, the estimated latent codes can be used to generate 2D or 3D-aware facial images. We further present a multi-step training strategy that reflects textual and structural representations in the generated image. Our proposed network produces realistic 2D, multi-view, and stylized face images that align well with the inputs. We validate our method using pre-trained 2D and 3D GANs, and our results outperform existing methods. Our project page is available at this https URL.
https://arxiv.org/abs/2405.04356
Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL depends heavily on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, data collected from an inaccurate simulator cannot be used directly in offline RL due to the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address these issues, we propose a novel approach that combines the offline dataset and the inaccurate simulation data more effectively. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator and reweight the simulated data using the discriminator. Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and the limited offline datasets, achieving better performance than state-of-the-art methods.
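The discriminator-based reweighting admits a standard density-ratio reading; the sketch below shows one common choice of weights (a textbook GAN identity, not necessarily the paper's exact weighting scheme). For an optimal discriminator $D(s) = p_{\text{data}}(s)/(p_{\text{data}}(s) + p_{\text{sim}}(s))$, the ratio $D/(1-D)$ recovers $p_{\text{data}}/p_{\text{sim}}$:

```python
import numpy as np

def importance_weights(d_scores, eps=1e-6):
    """Turn discriminator scores in (0, 1) into density-ratio weights
    w(s) = D(s) / (1 - D(s)) ~ p_data(s) / p_sim(s)."""
    d = np.clip(d_scores, eps, 1 - eps)   # keep the ratio finite
    return d / (1 - d)

# Simulated states the discriminator finds realistic get weight > 1;
# implausible ones are down-weighted toward zero.
scores = np.array([0.9, 0.5, 0.1])
weights = importance_weights(scores)
```

These weights can then scale each simulated transition's contribution to the offline RL loss, pulling the effective training distribution toward the offline dataset.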
https://arxiv.org/abs/2405.04307
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A particularly difficult task in this area is lesion segmentation, which challenges even experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework is based on the observation that different diseases often target specific organs. Our architecture includes an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The Dual-Decoder configuration uses two identical decoders, each with attention gates. This allows the model to simultaneously segment lesions and organs and to integrate their segmentation losses. To validate our approach, we performed evaluations on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the adaptability of the model by testing it without the second decoder on the segmentation of glands and nuclei. The results confirmed the superiority of our approach, especially for Covid-19 infections and the segmentation of bone metastases. In addition, the hybrid encoder showed exceptional performance in the segmentation of glands and nuclei, solidifying its role in modern medical image analysis.
https://arxiv.org/abs/2405.04169
This study explores the limitations of traditional Cybersecurity Awareness and Training (CSAT) programs and proposes an innovative solution using Generative Pre-Trained Transformers (GPT) to address these shortcomings. Traditional approaches lack personalization and adaptability to individual learning styles. To overcome these challenges, the study integrates GPT models to deliver highly tailored and dynamic cybersecurity learning experiences. Leveraging natural language processing capabilities, the proposed approach personalizes training modules based on individual trainee profiles, helping to ensure engagement and effectiveness. An experiment used a GPT model to provide a real-time, adaptive CSAT experience by generating customized training content. The findings demonstrate a significant improvement over traditional programs, addressing issues of engagement, dynamicity, and relevance. GPT-powered CSAT programs offer a scalable and effective solution to enhance cybersecurity awareness, providing personalized training content that better prepares individuals to mitigate cybersecurity risks in their specific roles within the organization.
https://arxiv.org/abs/2405.04138
With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge, organized as a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results. Initially, we enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref. Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG, OpenAlex, and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process and thereby helping researchers efficiently locate relevant resources.
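For reference, the weighted F1-score reported above averages per-class F1 values weighted by each class's support (number of true instances). A minimal sketch with hypothetical labels:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += (n / len(y_true)) * f1   # weight by class support
    return total

# Illustrative labels only (the task has 123 ORKG research-field classes).
y_true = ["cs", "cs", "bio", "math"]
y_pred = ["cs", "bio", "bio", "math"]
score = weighted_f1(y_true, y_pred)
```

Support weighting matters here because research-field classes are highly imbalanced; rare fields contribute to the score in proportion to their frequency.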
https://arxiv.org/abs/2405.04136
Segmentation of organs of interest in medical CT images is beneficial for the diagnosis of diseases. Although recent methods based on Fully Convolutional Neural Networks (F-CNNs) have shown success in many segmentation tasks, fusing features from images at different scales remains a challenge: (1) due to a lack of spatial awareness, F-CNNs share the same weights at different spatial locations; (2) F-CNNs can only obtain surrounding information through local receptive fields. To address this challenge, we propose a new segmentation framework based on attention mechanisms, named MFA-Net (Multi-Scale Feature Fusion Attention Network). The proposed framework can learn more meaningful feature maps across multiple scales, resulting in more accurate automatic segmentation. We compare our proposed MFA-Net with SOTA methods on two 2D liver CT datasets. The experimental results show that our MFA-Net produces more precise segmentation on images at different scales.
https://arxiv.org/abs/2405.04064
Recent developments in Large Language Models (LLMs) have significantly expanded their applications across various domains. However, the effectiveness of LLMs is often constrained when they operate individually in complex environments. This paper introduces a transformative approach by organizing LLMs into community-based structures aimed at enhancing their collective intelligence and problem-solving capabilities. We investigate different organizational models (hierarchical, flat, dynamic, and federated), each presenting unique benefits and challenges for collaborative AI systems. Within these structured communities, LLMs are designed to specialize in distinct cognitive tasks, employ advanced interaction mechanisms such as direct communication, voting systems, and market-based approaches, and dynamically adjust their governance structures to meet changing demands. The implementation of such communities holds substantial promise for improving problem-solving capabilities in AI, prompting an in-depth examination of their ethical considerations, management strategies, and scalability potential. This position paper seeks to lay the groundwork for future research, advocating a paradigm shift from isolated to synergistic operational frameworks in AI research and application.
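One of the interaction mechanisms named above, a voting system, can be sketched in a few lines. Agent outputs are stubbed as strings here; in a real community each answer would come from a separate model call, and the names and setup are illustrative only:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among agents and its vote share."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Stubbed outputs from four hypothetical agents in a flat community.
agent_answers = ["Paris", "Paris", "Lyon", "Paris"]
consensus, share = majority_vote(agent_answers)
```

Hierarchical or market-based variants would replace the uniform vote with weighted or bid-based aggregation, but the aggregation step has the same shape.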
https://arxiv.org/abs/2405.03825
Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating data sparsity. Data sparsity makes data curation difficult for researchers looking to answer key research questions whose required values span multiple datasets. We propose Syngand, a novel diffusion GNN model capable of generating ligand and pharmacokinetic data end-to-end. We present and demonstrate a methodology for sampling pharmacokinetic data for existing ligands using our Syngand model. We show initial promising results on the efficacy of Syngand-generated synthetic target-property data in downstream regression tasks with AqSolDB, LD50, and hERG central. Using our proposed model and methodology, researchers can easily generate synthetic ligand data to help them explore research questions that require data spanning multiple datasets.
https://arxiv.org/abs/2405.03799