Neural radiance fields (NeRF) have revolutionized the field of image-based view synthesis. However, NeRF uses straight rays and fails to deal with complicated light path changes caused by refraction and reflection. This prevents NeRF from successfully synthesizing transparent or specular objects, which are ubiquitous in real-world robotics and AR/VR applications. In this paper, we introduce the refractive-reflective field. Taking the object silhouette as input, we first utilize marching tetrahedra with a progressive encoding to reconstruct the geometry of non-Lambertian objects and then model the refraction and reflection effects of the object in a unified framework using Fresnel terms. Meanwhile, to achieve efficient and effective anti-aliasing, we propose a virtual cone supersampling technique. We benchmark our method on different shapes, backgrounds and Fresnel terms on both real-world and synthetic datasets. We also qualitatively and quantitatively benchmark the rendering results of various editing applications, including material editing, object replacement/insertion, and environment illumination estimation. Codes and data are publicly available at this https URL.
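The abstract does not spell out the Fresnel formulation, so the following is only a minimal sketch, assuming Schlick's approximation of the Fresnel reflectance is used to blend the radiance carried by the reflected and refracted rays; the `ior` parameter and the function names are illustrative, not the paper's API.

    import numpy as np

    def schlick_fresnel(cos_theta, ior):
        # Schlick approximation of Fresnel reflectance at an air/material interface.
        r0 = ((1.0 - ior) / (1.0 + ior)) ** 2
        return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5

    def shade(normal, view_dir, reflected_radiance, refracted_radiance, ior=1.5):
        # Blend the radiance returned by the reflected and refracted rays with the
        # Fresnel weight; the two radiance values would come from tracing the bent
        # rays against the radiance field.
        cos_theta = np.clip(np.dot(normal, -view_dir), 0.0, 1.0)
        f = schlick_fresnel(cos_theta, ior)
        return f * reflected_radiance + (1.0 - f) * refracted_radiance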
https://arxiv.org/abs/2309.13039
The reconstruction kernel in computed tomography (CT) generation determines the texture of the image. Consistency in reconstruction kernels is important because the underlying CT texture can impact measurements during quantitative image analysis. Harmonization (i.e., kernel conversion) minimizes differences in measurements due to inconsistent reconstruction kernels. Existing methods investigate harmonization of CT scans within a single manufacturer or across multiple manufacturers. However, these methods require paired scans of hard and soft reconstruction kernels that are spatially and anatomically aligned. Additionally, a large number of models need to be trained across different kernel pairs within manufacturers. In this study, we adopt an unpaired image translation approach to investigate harmonization between and across reconstruction kernels from different manufacturers by constructing a multipath cycle generative adversarial network (GAN). We use hard and soft reconstruction kernels from the Siemens and GE vendors from the National Lung Screening Trial dataset. We use 50 scans from each reconstruction kernel and train a multipath cycle GAN. To evaluate the effect of harmonization on the reconstruction kernels, we harmonize 50 scans each from the Siemens hard kernel, GE soft kernel and GE hard kernel to a reference Siemens soft kernel (B30f) and evaluate percent emphysema. We fit a linear model that accounts for age, smoking status, sex and vendor, and perform an analysis of variance (ANOVA) on the emphysema scores. Our approach minimizes differences in emphysema measurement and highlights the impact of age, sex, smoking status and vendor on emphysema quantification.
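Percent emphysema is commonly computed as the share of lung voxels below a fixed attenuation threshold (often -950 HU); the sketch below follows that convention and assumes a CT volume in Hounsfield units plus a binary lung mask, neither of which the abstract specifies.

    import numpy as np

    def percent_emphysema(ct_hu, lung_mask, threshold_hu=-950):
        # Fraction of lung voxels below the attenuation threshold, in percent.
        # ct_hu     : 3D array of Hounsfield units (e.g., a harmonized NLST scan)
        # lung_mask : boolean array of the same shape marking lung voxels
        lung_voxels = ct_hu[lung_mask]
        return 100.0 * np.mean(lung_voxels < threshold_hu)

Comparing this score before and after kernel conversion is one way to check whether harmonized scans agree more closely with the reference B30f kernel.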
https://arxiv.org/abs/2309.12953
One of the problems in quantitative finance that has received the most attention is the portfolio optimization problem. This problem has been approached with a variety of techniques, and quantum-computing approaches have been especially prolific in recent years. In this study, we present a system called Quantum Computing-based System for Portfolio Optimization with Future Asset Values and Automatic Universe Reduction (Q4FuturePOP), which deals with the portfolio optimization problem considering the following innovations: i) the developed tool is modeled to work with future predictions of assets, instead of historical values; and ii) Q4FuturePOP includes an automatic universe reduction module, which is conceived to intelligently reduce the complexity of the problem. We also include a brief discussion of the preliminary performance of the different modules that compose the prototypical version of Q4FuturePOP.
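The abstract does not give Q4FuturePOP's exact formulation; one common way to hand portfolio selection to a quantum (or simulated) annealer is as a QUBO over binary asset-selection variables, sketched below with predicted rather than historical returns as inputs. The penalty weight, risk-aversion coefficient, and cardinality constraint are illustrative assumptions.

    import numpy as np

    def build_portfolio_qubo(predicted_returns, covariance, budget,
                             risk_aversion=1.0, penalty=10.0):
        # Build Q so that x^T Q x encodes, up to a constant,
        #   -mu^T x + risk_aversion * x^T Sigma x + penalty * (sum(x) - budget)^2
        # for binary asset-selection variables x in {0, 1}^n.
        mu = np.asarray(predicted_returns, dtype=float)
        sigma = np.asarray(covariance, dtype=float)
        n = len(mu)
        q = risk_aversion * sigma.copy()
        q += penalty * (np.ones((n, n)) - np.eye(n))             # cross terms of the constraint
        q[np.diag_indices(n)] += penalty * (1.0 - 2.0 * budget)  # linear part of the constraint
        q[np.diag_indices(n)] -= mu                              # reward predicted returns
        return q

A sampler (quantum annealer or simulated annealing) would then minimize x^T Q x over binary vectors x.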
https://arxiv.org/abs/2309.12627
Network Architecture Search, and specifically Regularized Evolution, is a common way to refine the structure of a deep learning model. However, little is known about how models empirically evolve over time, which has design implications for caching policies, for refining the search algorithm for particular applications, and for other important use cases. In this work, we algorithmically analyze and quantitatively characterize the patterns of model evolution for a set of models from the Candle project and the NAS-Bench-201 search space. We show how the evolution of the model structure is influenced by the regularized evolution algorithm. We describe how evolutionary patterns appear in distributed settings, along with opportunities for caching and improved scheduling. Lastly, we describe the conditions that affect when particular model architectures rise and fall in popularity, based on their frequency of acting as a donor in a sliding window.
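For readers unfamiliar with the search procedure being analyzed, this is a minimal sketch of regularized (aging) evolution; `random_architecture`, `mutate`, and `evaluate` are placeholders rather than code from the Candle project or NAS-Bench-201.

    import collections
    import random

    def regularized_evolution(random_architecture, mutate, evaluate,
                              population_size=50, sample_size=10, cycles=1000):
        # Aging evolution: mutate the best of a random sample (the "donor") and
        # retire the oldest individual, so the population behaves like a FIFO queue.
        population = collections.deque()
        history = []
        while len(population) < population_size:
            arch = random_architecture()
            population.append((arch, evaluate(arch)))
            history.append(population[-1])
        for _ in range(cycles):
            sample = random.sample(list(population), sample_size)
            parent = max(sample, key=lambda item: item[1])
            child = mutate(parent[0])
            population.append((child, evaluate(child)))
            history.append(population[-1])
            population.popleft()      # regularization: drop the oldest, not the worst
        return max(history, key=lambda item: item[1])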
https://arxiv.org/abs/2309.12576
Children typically learn to identify and express emotions through sharing their stories and feelings with others, particularly their family. However, it is challenging for parents or siblings to have emotional communication with children since children are still developing their communication skills. We present ChaCha, a chatbot that encourages and guides children to share personal events and associated emotions. ChaCha combines a state machine and large language models (LLMs) to keep the dialogue on track while carrying on free-form conversations. Through an exploratory study with 20 children (aged 8-12), we examine how ChaCha prompts children to share personal events and guides them to describe associated emotions. Participants perceived ChaCha as a close friend and shared their stories on various topics, such as family trips and personal achievements. Based on the quantitative and qualitative findings, we discuss opportunities for leveraging LLMs to design child-friendly chatbots to support children in sharing their emotions.
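The coupling of a state machine with an LLM can be pictured as the current state selecting the instruction given to the model at each turn; the sketch below is only a guess at that shape, and the states, transitions, and `call_llm` callable are illustrative assumptions, not ChaCha's actual design.

    # Hypothetical dialogue states and the instruction given to the LLM in each one.
    STATE_PROMPTS = {
        "share_event": "Invite the child to share a recent personal event.",
        "label_emotion": "Help the child name the emotion linked to the event.",
        "wrap_up": "Summarize the conversation warmly and say goodbye.",
    }

    TRANSITIONS = {"share_event": "label_emotion", "label_emotion": "wrap_up"}

    def next_turn(state, history, user_message, call_llm):
        # Produce the chatbot reply for one turn and advance the state machine.
        reply = call_llm(system_prompt=STATE_PROMPTS[state],
                         history=history + [user_message])
        new_state = TRANSITIONS.get(state, state)  # a real system would gate this on the reply
        return reply, new_state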
https://arxiv.org/abs/2309.12244
In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion, which is common in in-the-wild videos. Recent human neural rendering approaches, which focus on novel view synthesis initialized by off-the-shelf human shape and pose methods, have the potential to correct the initial human shape. However, the existing methods have some drawbacks, such as erroneous handling of occlusion, sensitivity to inaccurate human segmentation, and ineffective loss computation due to the non-regularized opacity field. To address these problems, we introduce ORTexME, an occlusion-robust temporal method that utilizes temporal information from the input video to better regularize the occluded body parts. While our ORTexME is based on NeRF, to determine the reliable regions for NeRF ray sampling, we utilize our novel average texture learning approach to learn the average appearance of a person and to infer a mask based on the average texture. In addition, to guide the opacity-field updates in NeRF to suppress blur and noise, we propose the use of a human body mesh. The quantitative evaluation demonstrates that our method achieves significant improvement on the challenging multi-person 3DPW dataset, where our method achieves a 1.8 P-MPJPE error reduction. The SOTA rendering-based methods fail and enlarge the error by up to 5.6 on the same dataset.
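One simple reading of the average-texture idea is to compare each observation against the learned per-texel average appearance and keep only the regions that agree with it; the sketch below is that naive interpretation with an assumed threshold, not the paper's training procedure.

    import numpy as np

    def reliable_region_mask(observed_texels, average_texture, threshold=0.2):
        # observed_texels, average_texture : (N, 3) arrays of per-texel RGB in [0, 1].
        # Texels that stray far from the average texture are treated as occluded
        # and excluded from NeRF ray sampling.
        error = np.linalg.norm(observed_texels - average_texture, axis=-1)
        return error < threshold   # True = reliable (unoccluded)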
https://arxiv.org/abs/2309.12183
This paper addresses a low-cost measurement system, based on filtering of measurements, for stabilising a two-wheeled balancing robot. In particular, a measurement system based on a gyroscope, an accelerometer, and an encoder has been considered. The measurements have been corrected for deterministic disturbances and then filtered with Kalman, $\alpha$-$\beta$ type, and complementary filters. A quantitative assessment of the selected filters has been given. As a result, the complete structure of a measurement system has been obtained. The performance of the proposed measurement system has been validated experimentally using a dedicated research rig.
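As a concrete example of the filtering stage, a complementary filter for the tilt angle fuses the integrated gyroscope rate (trustworthy at high frequency) with the accelerometer-derived angle (trustworthy at low frequency); the gain below is an illustrative value, not the one identified in the paper.

    import math

    def complementary_filter(prev_angle, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
        # One update of a complementary filter for the pitch angle.
        # gyro_rate        : angular rate about the pitch axis [rad/s]
        # accel_x, accel_z : accelerometer components used to recover the tilt [m/s^2]
        accel_angle = math.atan2(accel_x, accel_z)     # low-frequency reference
        gyro_angle = prev_angle + gyro_rate * dt       # high-frequency integration
        return alpha * gyro_angle + (1.0 - alpha) * accel_angle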
https://arxiv.org/abs/2309.12169
With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become the main components of the stock trading market, so the topic of quantitative trading has received widespread attention. In non-strongly efficient trading markets, human emotions and expectations always dominate market trends and trading decisions. Therefore, this paper starts from the theory of emotion, taking East Money as an example, crawling user comment title data from its corresponding stock bar and performing data cleaning. Subsequently, a natural language processing model, BERT, was constructed, and the BERT model was fine-tuned using existing annotated data sets. The experimental results show that the fine-tuned model achieves varying degrees of performance improvement over the original model and the baseline model. Subsequently, based on the above model, the crawled user comment data are labeled with emotional polarity, and the obtained label information is combined with the Alpha191 model in a regression, yielding significant regression results. The regression model is then used to predict the average price change for the next five days, which serves as a signal to guide automatic trading. The experimental results show that incorporating emotional factors increased the return rate by 73.8% compared to the baseline during the trading period, and by 32.41% compared to the original Alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating emotional factors into quantitative trading, and give possible directions for further research in the future.
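A rough sketch of the final stages of such a pipeline: score comment titles with the fine-tuned BERT classifier, average the polarity into a daily feature for the regression alongside Alpha191-style factors, and turn the predicted five-day change into a long/short signal. The model path, label names, and threshold are assumptions, not the paper's artifacts.

    import numpy as np
    from transformers import pipeline

    # Hypothetical path to a BERT model fine-tuned on annotated stock-bar comments.
    sentiment = pipeline("text-classification", model="./finetuned-bert-guba")

    def daily_sentiment_score(comment_titles):
        # Average polarity of one day's comment titles, mapped to [-1, 1].
        results = sentiment(comment_titles)
        scores = [r["score"] if r["label"] == "positive" else -r["score"] for r in results]
        return float(np.mean(scores)) if scores else 0.0

    def trading_signal(predicted_five_day_change, threshold=0.0):
        # Go long if the regression predicts a rise, otherwise go short.
        return 1 if predicted_five_day_change > threshold else -1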
https://arxiv.org/abs/2309.11979
This paper explores predicting suitable prosodic features for fine-grained emotion analysis from discourse-level text. To obtain fine-grained emotional prosodic features as predictive values for our model, we extract a phoneme-level Local Prosody Embedding sequence (LPEs) and a Global Style Embedding as prosodic speech features from the speech with the help of a style transfer model. We propose a Discourse-level Multi-scale text Prosodic Model (D-MPM) that exploits multi-scale text to predict these two prosodic features. The proposed model can be used to analyze emotional prosodic features better and thus guide the speech synthesis model to synthesize more expressive speech. To quantitatively evaluate the proposed model, we contribute a new, large-scale Discourse-level Chinese Audiobook (DCA) dataset with more than 13,000 annotated utterance sequences. Experimental results on the DCA dataset show that the multi-scale text information effectively helps to predict prosodic features, and the discourse-level text improves both the overall coherence and the user experience. More interestingly, although we aim at the synthesis effect of the style transfer model, the speech synthesized by the proposed text prosodic analysis model is even better than the style transfer from the original speech on some user evaluation indicators.
https://arxiv.org/abs/2309.11849
In this paper, we introduce a new approach for high-quality multi-exposure image fusion (MEF). We show that the fusion weights of an exposure can be encoded into a 1D lookup table (LUT), which takes a pixel intensity value as input and produces a fusion weight as output. We learn one 1D LUT for each exposure, so all the pixels from different exposures can query the 1D LUT of that exposure independently for high-quality and efficient fusion. Specifically, to learn these 1D LUTs, we incorporate attention mechanisms along the frame, channel and spatial dimensions into the MEF task, which brings significant quality improvement over the state-of-the-art (SOTA). In addition, we collect a new MEF dataset consisting of 960 samples, 155 of which are manually tuned by professionals as ground truth for evaluation. Our network is trained on this dataset in an unsupervised manner. Extensive experiments demonstrate the effectiveness of all the newly proposed components, and the results show that our approach outperforms the SOTA both qualitatively and quantitatively on our dataset and on another representative dataset, SICE. Moreover, our 1D LUT approach takes less than 4 ms to process a 4K image on a PC GPU. Given its high quality, efficiency and robustness, our method has been shipped in millions of Android mobiles across multiple brands world-wide. Code is available at: this https URL.
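A minimal sketch of inference with learned per-exposure 1D LUTs, assuming each LUT maps a quantized pixel intensity to a fusion weight: weights are looked up per pixel, normalized across exposures, and used for a weighted sum. The LUT size and normalization are assumptions rather than the paper's exact settings.

    import numpy as np

    def fuse_exposures(images, luts):
        # images : list of float arrays in [0, 1], all the same shape
        # luts   : list of 1D arrays; luts[k][i] is the fusion weight of
        #          intensity level i for exposure k
        weights = []
        for img, lut in zip(images, luts):
            idx = np.clip((img * (len(lut) - 1)).astype(int), 0, len(lut) - 1)
            weights.append(lut[idx])                            # per-pixel LUT lookup
        weights = np.stack(weights)
        weights /= weights.sum(axis=0, keepdims=True) + 1e-8    # normalize across exposures
        return (weights * np.stack(images)).sum(axis=0)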
https://arxiv.org/abs/2309.11847
We propose DISC-LawLLM, an intelligent legal system utilizing large language models (LLMs) to provide a wide range of legal services. We adopt legal syllogism prompting strategies to construct supervised fine-tuning datasets in the Chinese Judicial domain and fine-tune LLMs with legal reasoning capability. We augment LLMs with a retrieval module to enhance models' ability to access and utilize external legal knowledge. A comprehensive legal benchmark, DISC-Law-Eval, is presented to evaluate intelligent legal systems from both objective and subjective dimensions. Quantitative and qualitative results on DISC-Law-Eval demonstrate the effectiveness of our system in serving various users across diverse legal scenarios. The detailed resources are available at this https URL.
https://arxiv.org/abs/2309.11325
In this paper, we address the problem of face aging: generating past or future facial images by incorporating age-related changes into the given face. Previous aging methods rely solely on human facial image datasets and are thus constrained by their inherent scale and bias. This restricts their application to a limited generatable age range and leaves them unable to handle large age gaps. We propose FADING, a novel approach to address Face Aging via DIffusion-based editiNG. We go beyond existing methods by leveraging the rich prior of large-scale language-image diffusion models. First, we specialize a pre-trained diffusion model for the task of face age editing by using an age-aware fine-tuning scheme. Next, we invert the input image to latent noise and obtain optimized null text embeddings. Finally, we perform text-guided local age editing via attention control. The quantitative and qualitative analyses demonstrate that our method outperforms existing approaches with respect to aging accuracy, attribute preservation, and aging quality.
https://arxiv.org/abs/2309.11321
The competitive nature of Cloud marketplaces, as a new concern in the delivery of services, makes pricing policies a crucial task for firms, and pricing strategies have therefore recently attracted many researchers. Since game theory can handle such competition well, this concern is addressed in the current research by designing a normal-form game between providers. A committee is considered in which providers register to improve their competition-based pricing policies. Game theory is applied to design dynamic pricing policies. The use of the committee makes the game one of complete information, in which each player is aware of every other player's payoff function. The players enhance their pricing policies to maximize their profits. The contribution of this paper is the quantitative modeling of Cloud marketplaces in the form of a game to provide novel dynamic pricing strategies; the model is validated by proving the existence and uniqueness of the Nash equilibrium of the game.
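As a toy illustration of the normal-form setup (not the paper's model), the sketch below runs best-response dynamics for two providers facing linear demand; the fixed point the iteration settles on is the Nash equilibrium of this toy game. The demand coefficients and cost are made-up values.

    def best_response_prices(a=10.0, b=2.0, c=1.0, cost=1.0, iters=100, tol=1e-9):
        # Demand for provider i: q_i = a - b*p_i + c*p_j; profit: (p_i - cost) * q_i.
        # The best response to p_j is p_i = (a + c*p_j + b*cost) / (2*b).
        p1 = p2 = cost
        for _ in range(iters):
            new_p1 = (a + c * p2 + b * cost) / (2 * b)
            new_p2 = (a + c * p1 + b * cost) / (2 * b)
            if abs(new_p1 - p1) < tol and abs(new_p2 - p2) < tol:
                break
            p1, p2 = new_p1, new_p2
        return p1, p2   # fixed point = Nash equilibrium of the toy pricing game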
https://arxiv.org/abs/2309.11316
Charts are common in the literature across different scientific fields, conveying rich information that is easily accessible to readers. Current chart-related tasks focus on either chart perception, which refers to extracting information from the visual charts, or performing reasoning given the extracted data, e.g. in a tabular form. In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks, one that is generally applicable to different downstream tasks beyond the question-answering task specifically studied in peer works. Specifically, StructChart first reformulates the chart information from the popular tabular form (specifically, linearized CSV) into the proposed Structured Triplet Representations (STR), which is more amenable to reducing the task gap between chart perception and reasoning thanks to the structured information extraction employed for charts. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate performance on the chart perception task. To enrich the dataset for training, we further explore the possibility of leveraging a Large Language Model (LLM) to enhance chart diversity in terms of both chart visual style and statistical information. Extensive experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential of a unified chart perception-reasoning paradigm to push the frontier of chart understanding.
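The reformulation from linearized CSV into triplets can be pictured as below; the (entity, attribute, value) schema is an assumption about what STR looks like for a simple two-dimensional chart table, not the paper's exact representation.

    import csv
    import io

    def csv_to_triplets(linearized_csv):
        # Header row + one row per entity -> (entity, attribute, value) triplets.
        rows = list(csv.reader(io.StringIO(linearized_csv)))
        header, body = rows[0], rows[1:]
        triplets = []
        for row in body:
            entity = row[0]
            for attribute, value in zip(header[1:], row[1:]):
                triplets.append((entity, attribute, value))
        return triplets

    # csv_to_triplets("year,sales\n2020,10\n2021,14")
    # -> [('2020', 'sales', '10'), ('2021', 'sales', '14')]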
https://arxiv.org/abs/2309.11268
In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on the overall system performance and assess different training strategies. For evaluation, we construct a novel dataset with prompts and music clips. We consider both embedding-based and music-specific metrics for quantitative evaluation, as well as a user study for qualitative evaluation. Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody. The code, dataset, and example material of this study are open to the research community.
https://arxiv.org/abs/2309.11140
Recently, large language models (LLMs), particularly GPT-4, have demonstrated significant capabilities in various planning and reasoning tasks \cite{cheng2023gpt4,bubeck2023sparks}. Motivated by these advancements, there has been a surge of interest among researchers to harness the capabilities of GPT-4 for the automated design of quantitative factors that do not overlap with existing factor libraries, with an aspiration to achieve alpha returns \cite{webpagequant}. In contrast to these works, this study aims to examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code interpreter abilities to real-world trading data analysis. Such an exploration is instrumental in discerning whether the underlying logic GPT-4 employs for trading is intrinsically reliable. Furthermore, given the acknowledged interpretative latitude inherent in most trading theories, we seek to distill more precise methodologies for deploying these theories from GPT-4's analytical process, potentially offering invaluable insights to human traders. To achieve this objective, we selected daily candlestick (K-line) data from specific periods for certain assets, such as the Shanghai Stock Index. Through meticulous prompt engineering, we guided GPT-4 to analyze the technical structures embedded within this data, based on specific theories like Elliott Wave Theory. We then subjected its analytical output to manual evaluation, assessing its interpretative depth and accuracy vis-à-vis these trading theories along multiple dimensions. The results and findings from this study could pave the way for a synergistic amalgamation of human expertise and AI-driven insights in the realm of trading.
https://arxiv.org/abs/2309.10982
This paper presents a fully unsupervised deep change detection approach for mobile robots with 3D LiDAR. In unstructured environments, it is infeasible to define a closed set of semantic classes. Instead, semantic segmentation is reformulated as binary change detection. We develop a neural network, RangeNetCD, that uses an existing point-cloud map and a live LiDAR scan to detect scene changes with respect to the map. Using a novel loss function, existing point-cloud semantic segmentation networks can be trained to perform change detection without any labels or assumptions about local semantics. We demonstrate the performance of this approach on data from challenging terrains; mean intersection over union (mIoU) scores range between 67.4% and 82.2% depending on the amount of environmental structure. This outperforms the geometric baseline used in all experiments. The neural network runs faster than 10 Hz and is integrated into a robot's autonomy stack to allow safe navigation around obstacles that intersect the planned path. In addition, a novel method for the rapid automated acquisition of per-point ground-truth labels is described. Covering changed parts of the scene with retroreflective materials and applying a threshold filter to the intensity channel of the LiDAR allows for quantitative evaluation of the change detector.
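The automated labeling idea boils down to: retroreflective surfaces return unusually high LiDAR intensity, so thresholding the intensity channel yields per-point change labels against which the detector can be scored; the threshold value below is an assumption.

    import numpy as np

    def labels_from_intensity(intensity, threshold=0.9):
        # Per-point 'changed' labels, relying on retroreflective material covering
        # the changed parts of the scene.
        return intensity > threshold

    def change_iou(predicted, ground_truth):
        # Intersection over union of predicted vs. labeled 'changed' points.
        intersection = np.logical_and(predicted, ground_truth).sum()
        union = np.logical_or(predicted, ground_truth).sum()
        return intersection / union if union > 0 else 1.0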
https://arxiv.org/abs/2309.10924
This document presents PLVS: a real-time system that leverages sparse SLAM, volumetric mapping, and 3D unsupervised incremental segmentation. PLVS stands for Points, Lines, Volumetric mapping, and Segmentation. It supports RGB-D and stereo cameras, which may optionally be equipped with IMUs. The SLAM module is keyframe-based, and extracts and tracks sparse points and line segments as features. Volumetric mapping runs in parallel with the SLAM front-end and generates a 3D reconstruction of the explored environment by fusing point clouds backprojected from keyframes. Different volumetric mapping methods are supported and integrated in PLVS. We use a novel reprojection error to bundle-adjust line segments. This error exploits available depth information to stabilize the position estimates of line segment endpoints. An incremental, geometry-based segmentation method is implemented and integrated for RGB-D cameras in the PLVS framework. We present qualitative and quantitative evaluations of the PLVS framework on some publicly available datasets. The appendix details the adopted stereo line triangulation method and provides a derivation of the Jacobians we used for line error terms. The software is available as open source.
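To make the line bundle-adjustment term concrete, the sketch below evaluates the usual point-to-line reprojection error for a segment's two 3D endpoints against the observed 2D line; the depth-based stabilization described in the paper is not reproduced here.

    import numpy as np

    def project(point_cam, fx, fy, cx, cy):
        # Pinhole projection of a 3D point in camera coordinates to pixel coordinates.
        x, y, z = point_cam
        return np.array([fx * x / z + cx, fy * y / z + cy])

    def line_reprojection_error(endpoint_a, endpoint_b, line_2d, fx, fy, cx, cy):
        # line_2d = (l0, l1, l2) is the observed 2D line in homogeneous form,
        # normalized so that l0^2 + l1^2 = 1; the error is the sum of distances
        # of the projected endpoints to that line.
        err = 0.0
        for p in (endpoint_a, endpoint_b):
            u, v = project(p, fx, fy, cx, cy)
            err += abs(line_2d[0] * u + line_2d[1] * v + line_2d[2])
        return err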
https://arxiv.org/abs/2309.10896
Diffusion models have emerged as a popular family of deep generative models (DGMs). In the literature, it has been claimed that one class of diffusion models -- denoising diffusion probabilistic models (DDPMs) -- demonstrate superior image synthesis performance as compared to generative adversarial networks (GANs). To date, these claims have been evaluated using either ensemble-based methods designed for natural images, or conventional measures of image quality such as structural similarity. However, there remains an important need to understand the extent to which DDPMs can reliably learn medical imaging domain-relevant information, which is referred to as `spatial context' in this work. To address this, a systematic assessment of the ability of DDPMs to learn spatial context relevant to medical imaging applications is reported for the first time. A key aspect of the studies is the use of stochastic context models (SCMs) to produce training data. In this way, the ability of the DDPMs to reliably reproduce spatial context can be quantitatively assessed by use of post-hoc image analyses. Error-rates in DDPM-generated ensembles are reported, and compared to those corresponding to a modern GAN. The studies reveal new and important insights regarding the capacity of DDPMs to learn spatial context. Notably, the results demonstrate that DDPMs hold significant capacity for generating contextually correct images that are `interpolated' between training samples, which may benefit data-augmentation tasks in ways that GANs cannot.
https://arxiv.org/abs/2309.10817
Natural language processing (NLP) applications such as named entity recognition (NER) for low-resource corpora do not benefit from recent advances in the development of large language models (LLMs), as there is still a need for larger annotated datasets. This research article introduces a methodology for generating translated versions of annotated datasets through crosslingual annotation projection. Leveraging a language-agnostic BERT-based approach, it is an efficient solution for enlarging low-resource corpora with little human effort and using only already available open data resources. Quantitative and qualitative evaluations are often lacking when it comes to evaluating the quality and effectiveness of semi-automatic data generation strategies. The evaluation of our crosslingual annotation projection approach showed both effectiveness and high accuracy in the resulting dataset. As a practical application of this methodology, we present the creation of the French Annotated Resource with Semantic Information for Medical Entities Detection (FRASIMED), an annotated corpus comprising 2,051 synthetic clinical cases in French. The corpus is now available for researchers and practitioners to develop and refine French natural language processing (NLP) applications in the clinical field (this https URL), making it the largest open annotated corpus with linked medical concepts in French.
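Crosslingual annotation projection can be summarized as: translate the annotated sentence, align source and target tokens, and carry each entity label across the alignment. The sketch below assumes a token-level word alignment is already available and only illustrates the projection step.

    def project_annotations(source_entities, alignment, num_target_tokens):
        # source_entities : dict {source_token_index: entity_label}
        # alignment       : list of (source_index, target_index) pairs
        # Returns one label ('O' or the projected entity label) per target token.
        target_labels = ["O"] * num_target_tokens
        for src_idx, tgt_idx in alignment:
            label = source_entities.get(src_idx)
            if label is not None:
                target_labels[tgt_idx] = label
        return target_labels

    # project_annotations({2: "DRUG"}, [(0, 0), (1, 2), (2, 1)], 3) -> ['O', 'DRUG', 'O']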
https://arxiv.org/abs/2309.10770