Analyzing laparoscopic surgery videos presents a complex and multifaceted challenge, with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a significant prerequisite in a majority of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos. Leveraging Transformer networks, our proposed architecture harnesses inter-frame dependencies to counteract the adverse effects of relevant content occlusion, motion blur, and surgical scene variation, thus significantly enhancing event recognition accuracy. Moreover, we present a frame sampling strategy designed to manage variations in surgical scenes and surgeons' skill levels, resulting in event recognition with high temporal resolution. We empirically demonstrate the superiority of our proposed methodology in event recognition compared to conventional CNN-RNN architectures through a series of extensive experiments.
https://arxiv.org/abs/2312.00593
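A minimal sketch of a hybrid CNN-Transformer clip classifier of the kind the abstract describes: per-frame CNN features feed a Transformer encoder that models inter-frame dependencies before clip-level event classification. The backbone choice, layer sizes, and four-event head are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HybridEventRecognizer(nn.Module):
    def __init__(self, num_events=4, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # per-frame 512-d features
        self.cnn = backbone
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, num_events)

    def forward(self, clip):                 # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        feats = self.temporal(feats)         # model inter-frame dependencies
        return self.head(feats.mean(dim=1))  # clip-level event logits

logits = HybridEventRecognizer()(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```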
Global horizontal irradiance (GHI) plays a vital role in estimating solar energy resources, which are used to generate sustainable green energy. In order to estimate GHI with high spatial resolution, a quantitative irradiance estimation network, named QIENet, is proposed. Specifically, the temporal and spatial characteristics of remote sensing data from the satellite Himawari-8 are extracted and fused by a recurrent neural network (RNN) and convolution operation, respectively. Not only remote sensing data, but also GHI-related time information (hour, day, and month) and geographical information (altitude, longitude, and latitude), are used as inputs of QIENet. The satellite spectral channels B07 and B11-B15 and time are recommended as model inputs for QIENet according to the spatial distributions of annual solar energy. Meanwhile, QIENet is able to capture the impact of various clouds on hourly GHI estimates. More importantly, QIENet does not overestimate ground observations and can also reduce RMSE by 27.51%/18.00%, increase R2 by 20.17%/9.42%, and increase r by 8.69%/3.54% compared with ERA5/NSRDB. Furthermore, QIENet is capable of providing a high-fidelity hourly GHI database with a spatial resolution of 0.02° × 0.02° (approximately 2 km × 2 km) for many applied energy fields.
https://arxiv.org/abs/2312.00299
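A hedged sketch of the input fusion described above: convolutions extract per-time-step spatial features from the satellite channels, an RNN aggregates them over the hourly sequence, and the time/geography scalars join at the regression head. Channel count, shapes, and layer sizes are assumptions, not QIENet's actual design.

```python
import torch
import torch.nn as nn

class GHIEstimator(nn.Module):
    def __init__(self, n_channels=6, hidden=64):    # e.g. B07, B11-B15
        super().__init__()
        self.spatial = nn.Sequential(               # per-time-step conv features
            nn.Conv2d(n_channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.temporal = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden + 6, 1)        # + hour/day/month/alt/lon/lat

    def forward(self, sat, aux):                    # sat: (B,T,C,H,W), aux: (B,6)
        b, t = sat.shape[:2]
        f = self.spatial(sat.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(f)                     # fuse over the hourly sequence
        return self.head(torch.cat([h[-1], aux], dim=1))  # hourly GHI estimate

ghi = GHIEstimator()(torch.randn(4, 8, 6, 16, 16), torch.randn(4, 6))
```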
Action recognition is a prerequisite for many applications in laparoscopic video analysis including but not limited to surgical training, operation room planning, follow-up surgery preparation, post-operative surgical assessment, and surgical outcome estimation. However, automatic action recognition in laparoscopic surgeries involves numerous challenges such as (I) cross-action and intra-action duration variation, (II) relevant content distortion due to smoke, blood accumulation, fast camera motions, organ movements, and object occlusion, and (III) surgical scene variations due to different illuminations and viewpoints. Besides, action annotations in laparoscopic surgery are limited and expensive due to requiring expert knowledge. In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework to deal with the mentioned challenges in laparoscopic surgery action recognition. Using stacked recurrent layers, our proposed network takes advantage of inter-frame dependencies to negate the negative effect of content distortion and variation in action recognition. Furthermore, our proposed frame sampling strategy effectively manages the duration variations in surgical actions to enable action recognition with high temporal resolution. Our extensive experiments confirm the superiority of our proposed method in action recognition compared to static CNNs.
https://arxiv.org/abs/2311.18666
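A plausible reading of the frame sampling strategy, sketched below: every action instance, short or long, is reduced to a fixed-length clip by sampling one frame per equal-duration segment, so duration variation does not change the network's input size. The segment-and-jitter scheme is illustrative, not necessarily the authors' exact procedure.

```python
import numpy as np

def sample_clip(n_frames: int, clip_len: int = 16, jitter: bool = True):
    """Return clip_len frame indices spread evenly over an action of n_frames."""
    bounds = np.linspace(0, n_frames, clip_len + 1)
    if jitter:  # one random frame per segment (training-time augmentation)
        idx = [np.random.randint(int(lo), max(int(lo) + 1, int(hi)))
               for lo, hi in zip(bounds[:-1], bounds[1:])]
    else:       # segment centers (deterministic inference)
        idx = [int((lo + hi) / 2) for lo, hi in zip(bounds[:-1], bounds[1:])]
    return np.clip(idx, 0, n_frames - 1)

print(sample_clip(40, jitter=False))    # short action -> dense sampling
print(sample_clip(4000, jitter=False))  # long action  -> sparse sampling
```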
This study proposes a method to automate the development of lookahead planning. The proposed method uses construction material conditions (i.e., appearances) and site space utilization to predict task completion rates. A Gated Recurrent Unit (GRU) based Recurrent Neural Network (RNN) model was trained using a segment of a construction project timeline to estimate completion rates of tasks and propose data-aware lookahead plans. The proposed method was evaluated in a sample construction project involving finishing works such as plastering, painting, and installing electrical fixtures. The results show that the proposed method can assist with developing automated lookahead plans. In doing so, this study links construction planning with actual events at the construction site. It extends the traditional scheduling techniques and integrates a broader spectrum of site spatial constraints into lookahead planning.
https://arxiv.org/abs/2311.18361
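A minimal sketch of a GRU-based completion-rate predictor in the spirit of the abstract: a sequence of per-period site observations (e.g., encoded material-appearance and space-utilization features) is mapped to a task completion rate in [0, 1]. The feature encoding and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class CompletionRateGRU(nn.Module):
    def __init__(self, n_features=8, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.out = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):          # x: (batch, periods, n_features)
        _, h = self.gru(x)
        return self.out(h[-1])     # predicted completion rate per task

rates = CompletionRateGRU()(torch.randn(5, 12, 8))  # 5 tasks, 12 look-back periods
```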
As generative text models give increasingly long answers, we tackle the problem of synthesizing long-form text in digital ink. We show that the models commonly used for this task fail to generalize to long-form data, and demonstrate how this problem can be solved by augmenting the training data, changing the model architecture, and changing the inference procedure. These methods use a contrastive learning technique and are tailored specifically for the handwriting domain. They can be applied to any encoder-decoder model that works with digital ink. We demonstrate that our method reduces the character error rate on long-form English data by half compared to a baseline RNN, and by 16% compared to the previous approach that aims at addressing the same problem. We show that all three parts of the method improve the recognizability of generated inks. In addition, we evaluate synthesized data in a human study and find that people perceive most of the generated data as real.
https://arxiv.org/abs/2311.17786
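The abstract mentions a contrastive learning technique tailored to handwriting; as a generic illustration of the underlying idea only, here is the textbook InfoNCE/NT-Xent loss, which pulls paired embeddings together and pushes mismatched pairs apart. This is not the paper's specific loss.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of N positive pairs (e.g. two views of an ink)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature     # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))     # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 64), torch.randn(8, 64))
```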
We study image segmentation using spatiotemporal dynamics in a recurrent neural network where the state of each unit is given by a complex number. We show that this network generates sophisticated spatiotemporal dynamics that can effectively divide an image into groups according to a scene's structural characteristics. Using an exact solution of the recurrent network's dynamics, we present a precise description of the mechanism underlying object segmentation in this network, providing a clear mathematical interpretation of how the network performs this task. We then demonstrate a simple algorithm for object segmentation that generalizes across inputs ranging from simple geometric objects in grayscale images to natural images. Object segmentation across all images is accomplished with one recurrent neural network that has a single, fixed set of weights. This demonstrates the expressive potential of recurrent neural networks when constructed using a mathematical approach that brings together their structure, dynamics, and computation.
https://arxiv.org/abs/2311.16943
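A heavily simplified sketch of the mechanics: each unit carries a complex-valued state, and after the recurrence runs, grouping information can be read from the phases (units with similar phase belong to the same segment). The linear complex recurrence below is a generic illustration; the paper analyzes an exact solution of its own network's dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                            # units (e.g. pixels)
W = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) * 0.1
x = rng.standard_normal(n)                       # static input image (flattened)

z = np.zeros(n, dtype=complex)
for _ in range(100):
    z = W @ z + x                                # complex-valued recurrence

phase = np.angle(z)                              # group label lives in the phase
print(np.round(phase, 2))                        # similar phases -> same segment
```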
Multivariate time series have many applications, from healthcare and meteorology to life science. Although deep learning models have shown excellent predictive performance for time series, they have been criticised for being "black-boxes" or non-interpretable. This paper proposes a novel modular neural network model for multivariate time series prediction that is interpretable by construction. A recurrent neural network learns the temporal dependencies in the data while an attention-based feature selection component selects the most relevant features and suppresses redundant features used in the learning of the temporal dependencies. A modular deep network is trained from the selected features independently to show the users how features influence outcomes, making the model interpretable. Experimental results show that this approach can outperform state-of-the-art interpretable Neural Additive Models (NAM) and variations thereof in both regression and classification of time series tasks, achieving a predictive performance that is comparable to the top non-interpretable methods for time series, LSTM and XGBoost.
https://arxiv.org/abs/2311.16834
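A hedged sketch of the construction the abstract outlines: an attention module scores the input features, the scores gate what the recurrent network sees, and the same scores serve as a per-feature importance readout. Layer sizes are illustrative, and the modular per-feature subnetworks are omitted for brevity.

```python
import torch
import torch.nn as nn

class InterpretableForecaster(nn.Module):
    def __init__(self, n_features=10, hidden=32):
        super().__init__()
        self.att = nn.Sequential(nn.Linear(n_features, n_features),
                                 nn.Softmax(dim=-1))
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, n_features)
        w = self.att(x.mean(dim=1))              # (batch, n_features) importances
        _, (h, _) = self.rnn(x * w.unsqueeze(1)) # suppress redundant features
        return self.out(h[-1]), w                # prediction + importance readout

y, importance = InterpretableForecaster()(torch.randn(4, 24, 10))
```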
Dialogue systems, including task-oriented dialogue systems (TOD) and open-domain dialogue systems (ODD), have undergone significant transformations, with language models (LMs) playing a central role. This survey delves into the historical trajectory of dialogue systems, elucidating their intricate relationship with advancements in language models by categorizing this evolution into four distinct stages, each marked by pivotal LM breakthroughs: 1) early stage: characterized by statistical LMs, resulting in rule-based or machine-learning-driven dialogue systems; 2) independent development of TOD and ODD based on neural language models (NLMs; e.g., LSTM and GRU), since NLMs lack intrinsic knowledge in their parameters; 3) fusion between different types of dialogue systems with the advent of pre-trained language models (PLMs), starting from the fusion between the four sub-tasks within TOD, and then TOD with ODD; and 4) current LLM-based dialogue systems, wherein LLMs can be used to conduct TOD and ODD seamlessly. Thus, our survey provides a chronological perspective aligned with LM breakthroughs, offering a comprehensive review of state-of-the-art research outcomes. What's more, we focus on emerging topics and discuss open challenges, providing valuable insights into future directions for LLM-based dialogue systems. Through this exploration, we pave the way for a deeper comprehension of the evolution, guiding future developments in LM-based dialogue systems.
https://arxiv.org/abs/2311.16789
Despite their dominance in modern DL and, especially, NLP domains, transformer architectures exhibit sub-optimal performance on long-range tasks compared to recent layers that are specifically designed for this purpose. In this work, drawing inspiration from key attributes of long-range layers, such as state-space layers, linear RNN layers, and global convolution layers, we demonstrate that minimal modifications to the transformer architecture can significantly enhance performance on the Long Range Arena (LRA) benchmark, thus narrowing the gap with these specialized layers. We identify that two key principles for long-range tasks are (i) incorporating an inductive bias towards smoothness, and (ii) locality. As we show, integrating these ideas into the attention mechanism improves results with a negligible amount of additional computation and without any additional trainable parameters. Our theory and experiments also shed light on the reasons for the inferior performance of transformers on long-range tasks and identify critical properties that are essential for successfully capturing long-range dependencies.
https://arxiv.org/abs/2311.16620
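A minimal instantiation of the two stated principles as attention post-processing: a band mask restricts each query to a local window (locality), and 1-D average pooling over the attention weights imposes smoothness. This is one plausible rendering of the ideas, not the paper's exact operator.

```python
import torch
import torch.nn.functional as F

def smooth_local_attention(q, k, v, window=32, smooth=3):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, T, T)
    t = scores.shape[-1]
    idx = torch.arange(t)
    band = (idx[None, :] - idx[:, None]).abs() <= window    # locality mask
    scores = scores.masked_fill(~band, float('-inf'))
    attn = F.softmax(scores, dim=-1)
    attn = F.avg_pool1d(attn.flatten(0, 1).unsqueeze(1),    # smoothness prior
                        smooth, stride=1, padding=smooth // 2).squeeze(1)
    attn = attn.view(*scores.shape)
    return attn @ v

out = smooth_local_attention(*(torch.randn(2, 128, 16) for _ in range(3)))
```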
Jamming and intrusion detection are critical in 5G research, aiming to maintain reliability, prevent user experience degradation, and avoid infrastructure failure. This paper introduces an anonymous jamming detection model for 5G based on signal parameters from the protocol stacks. The system uses supervised and unsupervised learning for real-time, high-accuracy detection of jamming, including unknown types. Supervised models reach an AUC of 0.964 to 1, compared to LSTM models with an AUC of 0.923 to 1. However, the need for data annotation limits the supervised approach. To address this, an unsupervised auto-encoder-based anomaly detection is presented with an AUC of 0.987. The approach is resistant to adversarial training samples. For transparency and domain knowledge injection, a Bayesian network-based causation analysis is introduced.
https://arxiv.org/abs/2311.17097
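A minimal sketch of the unsupervised stage: an auto-encoder trained only on benign signal-parameter vectors, with a large reconstruction error at test time flagging (possibly unknown) jamming. The feature count and the 3-sigma threshold rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_params = 12                                    # protocol-stack signal features
ae = nn.Sequential(nn.Linear(n_params, 4), nn.ReLU(), nn.Linear(4, n_params))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

benign = torch.randn(2048, n_params)             # placeholder benign traffic
for _ in range(200):                             # train to reconstruct benign data
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(benign), benign)
    loss.backward()
    opt.step()

with torch.no_grad():                            # per-sample reconstruction error
    err = ((ae(benign) - benign) ** 2).mean(dim=1)
threshold = err.mean() + 3 * err.std()           # 3-sigma rule on benign error

def is_jammed(x: torch.Tensor) -> bool:          # x: (n_params,) observed vector
    with torch.no_grad():
        return bool(((ae(x) - x) ** 2).mean() > threshold)
```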
Predicting the trajectories of pedestrians in crowd scenarios is indispensable in the self-driving and autonomous mobile robot fields, because estimating the future locations of surrounding pedestrians benefits policy decisions for collision avoidance. It is a challenging problem because humans exhibit different walking motions, and the interactions between humans and objects in the current environment, especially between humans themselves, are complex. Previous research has focused on how to model human-human interactions while neglecting their relative importance. To address this issue, we introduce a novel mechanism based on correntropy, which not only measures the relative importance of human-human interactions but also builds a personal space for each pedestrian. We further propose an Interaction Module including this data-driven mechanism that can effectively extract feature representations of dynamic human-human interactions in the scene and calculate corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, we design an interaction-aware architecture based on the Long Short-Term Memory (LSTM) network for trajectory prediction. We evaluate our model on two public datasets, and the experimental results demonstrate that it outperforms several recent state-of-the-art methods.
https://arxiv.org/abs/2311.15193
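A sketch of the correntropy-style weighting at the heart of the mechanism: a Gaussian kernel over the distance between two pedestrians yields an interaction weight that decays smoothly with separation, with the bandwidth sigma acting like a personal-space radius. The kernel is shown in isolation; the paper's full Interaction Module is learned.

```python
import numpy as np

def interaction_weight(p_i, p_j, sigma=1.2):
    """Gaussian-kernel (correntropy-style) importance of pedestrian j to i."""
    d2 = np.sum((np.asarray(p_i) - np.asarray(p_j)) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

print(interaction_weight([0, 0], [0.5, 0.5]))   # close neighbour -> high weight
print(interaction_weight([0, 0], [5.0, 5.0]))   # far pedestrian  -> ~0
```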
In the era of space exploration, the implications of space weather have become increasingly evident. Central to this is the phenomenon of coronal holes, which can significantly influence the functioning of satellites and aircraft. These coronal holes, present on the sun, are distinguished by their open magnetic field lines and comparatively cooler temperatures, leading to the emission of solar winds at heightened rates. To anticipate the effects of these coronal holes on Earth, our study harnesses computer vision to pinpoint the coronal hole regions and estimate their dimensions using imagery from the Solar Dynamics Observatory (SDO). Further, we deploy deep learning methodologies, specifically the Long Short-Term Memory (LSTM) approach, to analyze the trends in the data related to the area of the coronal holes and predict their dimensions across various solar regions over a span of seven days. By evaluating the time series data concerning the area of the coronal holes, our research seeks to uncover patterns in the behavior of coronal holes and comprehend their potential influence on space weather occurrences. This investigation marks a pivotal stride towards bolstering our capacity to anticipate and brace for space weather events that could have ramifications for Earth and its technological apparatuses.
https://arxiv.org/abs/2301.06732
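A hedged sketch of the forecasting setup: an LSTM reads a window of daily coronal-hole area values and emits the next seven days in one shot (direct multi-step forecasting). The window length and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class AreaForecaster(nn.Module):
    def __init__(self, hidden=32, horizon=7):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, series):                 # series: (batch, days, 1)
        _, (h, _) = self.lstm(series)
        return self.head(h[-1])                # (batch, 7) next-week areas

week_ahead = AreaForecaster()(torch.randn(3, 30, 1))  # 30-day history per region
```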
In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponentially decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift their memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets and language models.
https://arxiv.org/abs/2311.14495
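A sketch of the kind of reparameterization the paper motivates: rather than learning a recurrent eigenvalue directly (which optimization pushes toward the stability boundary |lambda| = 1), learn an unconstrained parameter and map it so that |lambda| < 1 holds by construction. The double-exponential map below is one common such choice, shown for illustration.

```python
import torch

nu = torch.randn(8, requires_grad=True)        # unconstrained trainable parameter
lam = torch.exp(-torch.exp(nu))                # reparameterized: 0 < lam < 1 always

x = torch.randn(64, 8)                         # an input sequence
h = torch.zeros(8)
for t in range(x.shape[0]):                    # diagonal linear SSM recurrence
    h = lam * h + x[t]                         # h_t = lam * h_{t-1} + x_t
```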
Development of robust general purpose 3D segmentation frameworks using the latest deep learning techniques is one of the active topics in various bio-medical domains. In this work, we introduce Temporal Cubic PatchGAN (TCuP-GAN), a volume-to-volume translational model that marries the concepts of a generative feature learning framework with Convolutional Long Short-Term Memory Networks (LSTMs), for the task of 3D segmentation. We demonstrate the capabilities of our TCuP-GAN on the data from four segmentation challenges (Adult Glioma, Meningioma, Pediatric Tumors, and Sub-Saharan Africa subset) featured within the 2023 Brain Tumor Segmentation (BraTS) Challenge and quantify its performance using LesionWise Dice similarity and $95\%$ Hausdorff Distance metrics. We demonstrate the successful learning of our framework to predict robust multi-class segmentation masks across all the challenges. This benchmarking work serves as a stepping stone for future efforts towards applying TCuP-GAN on other multi-class tasks such as multi-organelle segmentation in electron microscopy imaging.
https://arxiv.org/abs/2311.14148
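The recurrent building block TCuP-GAN rests on is the ConvLSTM, where the LSTM gates are computed with convolutions so the state keeps its spatial layout while the recurrence runs over the slice axis of a volume. A standard minimal cell, with illustrative sizes (not the full TCuP-GAN generator):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):               # x: (B, in_ch, H, W)
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g                      # standard LSTM cell update
        h = o * c.tanh()
        return h, c

cell = ConvLSTMCell(1, 16)
h = c = torch.zeros(2, 16, 64, 64)
for mri_slice in torch.randn(8, 2, 1, 64, 64):  # recurrence over volume slices
    h, c = cell(mri_slice, (h, c))
```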
This paper details the design and implementation of a system for predicting and interpolating object location coordinates. Our solution is based on processing inertial measurements and global positioning system (GPS) data through a Long Short-Term Memory (LSTM) neural network and polynomial regression. LSTM is a type of recurrent neural network (RNN) particularly suited for processing data sequences and avoiding the long-term dependency problem. We employed data from real-world vehicles and GPS sensors. A critical pre-processing step was developed to address varying sensor frequencies and inconsistent GPS time steps and dropouts. The LSTM-based system's performance was compared with the Kalman filter. The system was tuned to work in real-time with low latency and high precision. We tested our system on roads under various driving conditions, including acceleration, turns, deceleration, and straight paths. We tested our proposed solution's accuracy and inference time and showed that it can perform in real-time. Our LSTM-based system yielded an average error of 0.11 meters with an inference time of 2 ms. This represents a 76\% reduction in error compared to the traditional Kalman filter method, which has an average error of 0.46 meters with a similar inference time.
https://arxiv.org/abs/2311.13950
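A hedged sketch of the predictor's role: an LSTM consumes a window of fused IMU + GPS samples and regresses the next position, the task on which the paper benchmarks against a Kalman filter. The feature layout (six inertial plus two GPS channels) and sizes are assumptions.

```python
import torch
import torch.nn as nn

class PositionLSTM(nn.Module):
    def __init__(self, n_in=8, hidden=64):      # accel/gyro (6) + lat/lon (2)
        super().__init__()
        self.lstm = nn.LSTM(n_in, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, window):                  # window: (batch, steps, n_in)
        _, (h, _) = self.lstm(window)
        return self.head(h[-1])                 # next (x, y) position

next_xy = PositionLSTM()(torch.randn(1, 50, 8))
```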
Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern in requirements engineering (RE). Personal data collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of Euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in GDPR and outlined in DPAs. Requirements engineers can elicit from DPAs legal requirements for regulating the data processing activities in software systems. Checking the completeness of a DPA according to the GDPR provisions is therefore an essential prerequisite to ensure that the elicited requirements are complete. Analyzing DPAs entirely manually is time consuming and requires adequate legal expertise. In this paper, we propose an automation strategy to address the completeness checking of DPAs against GDPR. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the F2 score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, based on the pre-trained BERT and RoBERTa language models, yield F2 scores of 86.7% and 89.7%, respectively. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
https://arxiv.org/abs/2311.13881
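The reported metric is F2, the beta = 2 member of the F-beta family, which weights recall more heavily than precision; apt here, since missing an incomplete DPA is costlier than a false alarm. A minimal computation:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(tp=45, fp=5, fn=8), 3))  # illustrative counts, not the paper's
```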
Recent advances in large language models (LLMs) show potential for clinical applications, such as clinical decision support and trial recommendations. However, the GPT-4 LLM predicts an excessive number of ICD codes for medical coding tasks, leading to high recall but low precision. To tackle this challenge, we introduce LLM-codex, a two-stage approach to predicting ICD codes that first generates evidence proposals using an LLM and then employs an LSTM-based verification stage. The LSTM learns from both the LLM's high recall and the human expert's high precision, using a custom loss function. According to experiments on the MIMIC dataset, our model is the only approach that simultaneously achieves state-of-the-art results in medical coding accuracy, accuracy on rare codes, and sentence-level evidence identification to support coding decisions, without training on human-annotated evidence.
https://arxiv.org/abs/2311.13735
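A heavily hedged skeleton of the two-stage idea: stage one asks an LLM for sentence-level evidence proposals per candidate ICD code, and stage two scores each proposal with an LSTM verifier, keeping only codes whose best evidence clears a threshold. All names, the embedding step, and the threshold are placeholders, not LLM-codex's actual components.

```python
import torch
import torch.nn as nn

verifier = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
scorer = nn.Linear(64, 1)

def verify(code, proposals, threshold=0.5):
    """proposals: token-embedding tensors of shape (len, 128), one per
    LLM-cited evidence sentence for the given ICD code (placeholder names)."""
    best = 0.0
    for sent in proposals:
        _, (h, _) = verifier(sent.unsqueeze(0))        # encode the sentence
        best = max(best, torch.sigmoid(scorer(h[-1])).item())
    return best > threshold            # keep code only with strong evidence

keep = verify("I50.9", [torch.randn(12, 128), torch.randn(7, 128)])
```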
Spelling correction is a remarkable challenge in the field of natural language processing. The objective of spelling correction tasks is to recognize and rectify spelling errors automatically. The development of applications that can effectually diagnose and correct Persian spelling and grammatical errors has become more important in order to improve the quality of Persian text. Typographical error type detection in Persian is a relatively understudied area. Therefore, this paper presents a compelling approach for detecting typographical errors in Persian texts. Our work includes the presentation of a publicly available dataset called FarsTypo, which comprises 3.4 million words arranged in chronological order and tagged with their corresponding part-of-speech. These words cover a wide range of topics and linguistic styles. We develop an algorithm designed to apply Persian-specific errors to a scalable portion of these words, resulting in a parallel dataset of correct and incorrect words. By leveraging FarsTypo, we establish a strong foundation and conduct a thorough comparison of various methodologies employing different architectures. Additionally, we introduce a groundbreaking Deep Sequential Neural Network that utilizes both word and character embeddings, along with bidirectional LSTM layers, for token classification aimed at detecting typographical errors across 51 distinct classes. Our approach is contrasted with highly advanced industrial systems that, unlike this study, have been developed using a diverse range of resources. The outcomes of our final method proved to be highly competitive, achieving an accuracy of 97.62%, precision of 98.83%, recall of 98.61%, and surpassing others in terms of speed.
https://arxiv.org/abs/2305.11731
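A sketch of the described token classifier: word embeddings concatenated with a character-level summary feed a bidirectional LSTM, and every token receives one of 51 typographical-error classes. Vocabulary sizes, dimensions, and the crude character pooling are placeholders.

```python
import torch
import torch.nn as nn

class TypoTagger(nn.Module):
    def __init__(self, n_words=50000, n_chars=100, n_classes=51):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, 100)
        self.char_emb = nn.Embedding(n_chars, 25)
        self.bilstm = nn.LSTM(100 + 25, 128, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * 128, n_classes)

    def forward(self, words, chars):             # words: (B,T)  chars: (B,T,L)
        cf = self.char_emb(chars).mean(dim=2)    # crude char summary per token
        h, _ = self.bilstm(torch.cat([self.word_emb(words), cf], dim=-1))
        return self.out(h)                       # (B, T, 51) per-token logits

logits = TypoTagger()(torch.randint(0, 50000, (2, 9)),
                      torch.randint(0, 100, (2, 9, 12)))
```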
Background and objectives: Dynamic handwriting analysis, due to its non-invasive and readily accessible nature, has recently emerged as a vital adjunctive method for the early diagnosis of Parkinson's disease. In this study, we design a compact and efficient network architecture to analyse the distinctive handwriting patterns of patients' dynamic handwriting signals, thereby providing an objective identification for the Parkinson's disease diagnosis. Methods: The proposed network is based on a hybrid deep learning approach that fully leverages the advantages of both long short-term memory (LSTM) and convolutional neural networks (CNNs). Specifically, the LSTM block is adopted to extract the time-varying features, while the CNN-based block is implemented using one-dimensional convolution for low computational cost. Moreover, the hybrid model architecture is continuously refined under ablation studies for superior performance. Finally, we evaluate the generalization of the proposed method under five-fold cross-validation, which validates its efficiency and robustness. Results: The proposed network demonstrates its versatility by achieving impressive classification accuracies on both our new DraWritePD dataset ($96.2\%$) and the well-established PaHaW dataset ($90.7\%$). Moreover, the network architecture also stands out for its excellent lightweight design, occupying a mere $0.084$M of parameters, with a total of only $0.59$M floating-point operations. It also exhibits near real-time CPU inference performance, with inference times ranging from $0.106$ to $0.220$s. Conclusions: We present a series of experiments with extensive analysis, which systematically demonstrate the effectiveness and efficiency of the proposed hybrid neural network in extracting distinctive handwriting patterns for precise diagnosis of Parkinson's disease.
https://arxiv.org/abs/2311.11756
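A hedged sketch of the hybrid: an LSTM branch captures the time-varying structure of the pen signal while a cheap one-dimensional convolutional branch extracts local patterns, and their features are fused for the binary decision. The channel count (x, y, pressure, etc.) and sizes are assumptions.

```python
import torch
import torch.nn as nn

class HandwritingNet(nn.Module):
    def __init__(self, n_ch=6, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_ch, hidden, batch_first=True)
        self.conv = nn.Sequential(nn.Conv1d(n_ch, hidden, 5, padding=2), nn.ReLU(),
                                  nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, sig):                      # sig: (batch, time, n_ch)
        _, (h, _) = self.lstm(sig)               # time-varying features
        c = self.conv(sig.transpose(1, 2))       # Conv1d wants (batch, ch, time)
        return self.out(torch.cat([h[-1], c], dim=1))

logits = HandwritingNet()(torch.randn(4, 256, 6))
```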
Recovering temporally consistent 3D human body pose, shape and motion from a monocular video is a challenging task due to (self-)occlusions, poor lighting conditions, complex articulated body poses, depth ambiguity, and limited availability of annotated data. Further, doing a simple per-frame estimation is insufficient as it leads to jittery and implausible results. In this paper, we propose a novel method for temporally consistent motion estimation from a monocular video. Instead of using generic ResNet-like features, our method uses a body-aware feature representation and an independent per-frame pose and camera initialization over a temporal window, followed by a novel spatio-temporal feature aggregation using a combination of self-similarity and self-attention over the body-aware features and the per-frame initialization. Together, they yield enhanced spatio-temporal context for every frame by considering the remaining past and future frames. These features are used to predict the pose and shape parameters of the human body model, which are further refined using an LSTM. Experimental results on the publicly available benchmark data show that our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods over all key quantitative evaluation metrics, including complex scenarios like partial occlusion, complex poses and even relatively low illumination.
https://arxiv.org/abs/2311.11662
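A sketch of the self-similarity cue the aggregation builds on: cosine similarity between every frame's body-aware feature and every other frame in the temporal window, which lets each frame borrow context from past and future frames that look consistent with it. A minimal computation, not the full aggregation module:

```python
import torch
import torch.nn.functional as F

feats = torch.randn(16, 256)                     # per-frame body-aware features
normed = F.normalize(feats, dim=1)
sim = normed @ normed.t()                        # (16, 16) self-similarity matrix
context = F.softmax(sim / 0.1, dim=1) @ feats    # frames borrow from similar frames
```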