The task of anomaly detection is to separate anomalous data from normal data in a dataset. Models such as the deep convolutional autoencoder (CAE) network and the deep support vector data description (SVDD) model have been widely employed and have demonstrated significant success in detecting anomalies. However, the CAE network's tendency to over-reconstruct anomalous data can easily lead to a high false negative rate. The deep SVDD model, on the other hand, suffers from feature collapse, which reduces detection accuracy for anomalies. To address these problems, we propose the Improved AutoEncoder with LSTM module and Kullback-Leibler divergence (IAE-LSTM-KL) model in this paper. An LSTM network is added after the encoder to memorize feature representations of normal data. Meanwhile, feature collapse is mitigated by penalizing the feature input to the SVDD module via KL divergence. The efficacy of the IAE-LSTM-KL model is validated through experiments on both synthetic and real-world datasets. Experimental results show that the IAE-LSTM-KL model yields higher detection accuracy for anomalies. In addition, the IAE-LSTM-KL model demonstrates enhanced robustness to contaminated outliers in the dataset.
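As a rough illustration of how the three penalties described in this abstract might combine, here is a minimal numpy sketch of such a training objective. The weighting coefficients, the fixed SVDD center, and the Gaussian form of the KL term are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def iae_lstm_kl_loss(x, x_hat, z, center, lam_svdd=1.0, lam_kl=0.1):
    """Hypothetical combined objective in the spirit of IAE-LSTM-KL:
    CAE reconstruction error + SVDD distance to a fixed center + a KL
    penalty pushing the SVDD feature input toward a standard normal."""
    recon = np.mean((x - x_hat) ** 2)                  # reconstruction term
    svdd = np.mean(np.sum((z - center) ** 2, axis=1))  # distance to SVDD center
    mu, var = z.mean(axis=0), z.var(axis=0) + 1e-8
    kl = 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))  # KL(N(mu,var) || N(0,I))
    return recon + lam_svdd * svdd + lam_kl * kl
```

With a perfect reconstruction and features sitting exactly on the center, only the KL term contributes, which is what discourages the trivial collapsed solution.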
https://arxiv.org/abs/2404.19247
Transformers have supplanted recurrent neural networks as the dominant architecture both for natural language processing tasks and, despite criticisms of cognitive implausibility, for modelling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparably to, or better than, transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modelling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.
https://arxiv.org/abs/2404.19178
Autoregressive Recurrent Neural Networks are widely employed in time-series forecasting tasks, demonstrating effectiveness in univariate and certain multivariate scenarios. However, their inherent structure does not readily accommodate the integration of future, time-dependent covariates. A proposed solution, outlined by Salinas et al. (2019), suggests forecasting both covariates and the target variable in a multivariate framework. In this study, we conducted comprehensive tests on publicly available time-series datasets, artificially introducing covariates highly correlated with future time-step values. Our evaluation aimed to assess the performance of an LSTM network when considering these covariates and compare it against a univariate baseline. As part of this study, we introduce a novel approach using seasonal time segments in combination with an RNN architecture, which is both simple and extremely effective over long forecast horizons, with performance comparable to many state-of-the-art architectures. Our findings from the results of more than 120 models reveal that, under certain conditions, jointly training covariates with target variables can improve the overall performance of the model, but often there exists a significant performance disparity between multivariate and univariate predictions. Surprisingly, even when provided with covariates informing the network about future target values, multivariate predictions exhibited inferior performance. In essence, compelling the network to predict multiple values can prove detrimental to model performance, even in the presence of informative covariates. These results suggest that LSTM architectures may not be suitable for forecasting tasks where predicting covariates would typically be expected to enhance model accuracy.
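The univariate-versus-multivariate comparison described here can be illustrated with a small windowing helper that builds both kinds of training input: target history alone, and target plus covariate stacked as a second channel to be jointly forecast. Names and shapes are illustrative, not the authors' pipeline.

```python
import numpy as np

def make_windows(series, cov, lookback=8, horizon=2):
    """Build (input, target) pairs for the two setups compared in the text:
    univariate (target history only) vs. multivariate, where the covariate
    is a second input channel. Illustrative helper only."""
    X_uni, X_multi, y = [], [], []
    for t in range(len(series) - lookback - horizon + 1):
        X_uni.append(series[t:t + lookback])
        X_multi.append(np.stack([series[t:t + lookback],
                                 cov[t:t + lookback]], axis=-1))
        y.append(series[t + lookback:t + lookback + horizon])
    return np.array(X_uni), np.array(X_multi), np.array(y)
```

An LSTM would consume `X_uni` with one input feature or `X_multi` with two; the study's finding is that the second setup can underperform even when `cov` is informative about future targets.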
https://arxiv.org/abs/2404.18553
Speech emotion recognition (SER) has been gaining attention in recent years due to its potential applications in diverse fields and the possibilities offered by deep learning technologies. However, recent studies have shown that deep learning models can be vulnerable to adversarial attacks. In this paper, we systematically assess this problem by examining the impact of various adversarial white-box and black-box attacks on different languages and genders within the context of SER. We first propose a suitable methodology for audio data processing, feature extraction, and the CNN-LSTM architecture. The observed outcomes highlight the significant vulnerability of CNN-LSTM models to adversarial examples (AEs). In fact, all the considered adversarial attacks are able to significantly reduce the performance of the constructed models. Furthermore, when assessing the efficacy of the attacks, minor differences were noted between the languages analyzed, as well as between male and female speech. In summary, this work contributes to the understanding of the robustness of CNN-LSTM models, particularly in SER scenarios, and the impact of AEs. Interestingly, our findings serve as a baseline for a) developing more robust algorithms for SER, b) designing more effective attacks, c) investigating possible defenses, d) improving understanding of the vocal differences between different languages and genders, and e) overall, enhancing our comprehension of the SER task.
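A representative white-box attack of the kind evaluated in such studies is the fast gradient sign method (FGSM): perturb the input in the direction of the sign of the loss gradient. The abstract does not list its exact attacks, so this is only an illustrative sketch; in practice `grad` comes from backpropagation through the CNN-LSTM, whereas here it is supplied directly.

```python
import numpy as np

def fgsm(x, grad, eps=0.01):
    """One FGSM step: x_adv = x + eps * sign(dL/dx).
    `grad` stands in for the backpropagated loss gradient."""
    return x + eps * np.sign(grad)
```

The same epsilon-bounded perturbation applied to audio features (e.g. MFCCs) is typically imperceptible yet can flip the predicted emotion class.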
https://arxiv.org/abs/2404.18514
Recent work has shown the promise of applying deep learning to enhance software processing of radio frequency (RF) signals. In parallel, hardware developments with quantum RF sensors based on Rydberg atoms are breaking longstanding barriers in frequency range, resolution, and sensitivity. In this paper, we describe our implementations of quantum-ready machine learning approaches for RF signal classification. Our primary objective is latency: while deep learning offers a more powerful computational paradigm, it traditionally incurs latency overheads that hinder wider-scale deployment. Our work spans three axes. (1) A novel continuous wavelet transform (CWT) based recurrent neural network (RNN) architecture that enables flexible online classification of RF signals on the fly with reduced sampling time. (2) Low-latency inference techniques for both GPU and CPU that achieve over 100x reductions in inference time, enabling real-time operation with sub-millisecond inference. (3) Quantum-readiness validated through application of our models to physics-based simulation of Rydberg atom QRF sensors. Altogether, our work is a step towards next-generation RF sensors that use quantum technology to surpass previous physical limits, paired with latency-optimized AI/ML software suitable for real-time deployment.
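The CWT feature stage that feeds such an RNN can be sketched with a hand-rolled Morlet transform; the paper's exact wavelet, scales, and normalization are not given in the abstract, so all parameters below are assumptions.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=5.0):
    """Minimal continuous wavelet transform with a Morlet wavelet, producing
    a (n_scales, n_samples) scalogram that could be fed to an RNN one column
    (time step) at a time. Illustrative sketch only."""
    n = len(signal)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        t = np.arange(-n // 2, n - n // 2) / s
        wavelet = np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2) / np.sqrt(s)
        # np.convolve flips its second argument, so pass the reversed
        # conjugate to get cross-correlation with the wavelet.
        out[i] = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
    return np.abs(out)
```

Feeding the scalogram to the RNN column by column is what allows classification "on the fly", before the full sample window has been observed.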
https://arxiv.org/abs/2404.17962
Driving in an off-road environment is challenging for autonomous vehicles due to the complex and varied terrain. To ensure stable and efficient travel, the vehicle must consider and balance environmental factors, such as undulations, roughness, and obstacles, to generate optimal trajectories that can adapt to changing scenarios. However, traditional motion planners often utilize a fixed cost function for trajectory optimization, making it difficult to adapt to different driving strategies in challenging irregular terrains and uncommon scenarios. To address these issues, we propose an adaptive motion planner based on human-like cognition and cost evaluation for off-road driving. First, we construct a multi-layer map describing different features of off-road terrain, including terrain elevation, roughness, obstacle, and artificial potential field maps. Subsequently, we employ a CNN-LSTM network to learn the trajectories planned by human drivers in various off-road scenarios. Then, based on the human-like trajectories generated in different environments, we design a primitive-based trajectory planner that aims to mimic human trajectories and cost-weight selection, generating trajectories that are consistent with the dynamics of off-road vehicles. Finally, we compute optimal cost weights and select and extend behavioral primitives to generate highly adaptive, stable, and efficient trajectories. We validate the effectiveness of the proposed method through experiments in a desert off-road environment with complex terrain and varying road conditions. The experimental results show that the proposed human-like motion planner has excellent adaptability to different off-road conditions, demonstrating real-time operation, greater stability, and more human-like planning ability in diverse and challenging scenarios.
https://arxiv.org/abs/2404.17820
Drones are increasingly used in fields like industry, medicine, research, disaster relief, defense, and security. Technical challenges, such as navigation in GPS-denied environments, hinder further adoption. Research in visual odometry is advancing, potentially solving GPS-free navigation issues. Traditional visual odometry methods use geometry-based pipelines which, while popular, often suffer from error accumulation and high computational demands. Recent studies utilizing deep neural networks (DNNs) have shown improved performance, addressing these drawbacks. Deep visual odometry typically employs convolutional neural networks (CNNs) and sequence modeling networks like recurrent neural networks (RNNs) to interpret scenes and deduce visual odometry from video sequences. This paper presents a novel real-time monocular visual odometry model for drones, using a deep neural architecture with a self-attention module. It estimates the ego-motion of a camera on a drone, using consecutive video frames. An inference utility processes the live video feed, employing deep learning to estimate the drone's trajectory. The architecture combines a CNN for image feature extraction and a long short-term memory (LSTM) network with a multi-head attention module for video sequence modeling. Tested on two visual odometry datasets, this model converged 48% faster than a previous RNN model and showed a 22% reduction in mean translational drift and a 12% improvement in mean translational absolute trajectory error, demonstrating enhanced robustness to noise.
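The "mean translational absolute trajectory error" reported here is commonly computed as the RMSE between aligned ground-truth and estimated positions. A minimal version, omitting the usual rigid alignment step (e.g. Umeyama) and assuming both trajectories are already in the same frame:

```python
import numpy as np

def absolute_trajectory_error(gt, est):
    """Translational ATE: RMSE over per-timestep position errors.
    gt, est: (T, 3) or (T, 2) arrays of positions, pre-aligned."""
    return float(np.sqrt(np.mean(np.sum((gt - est) ** 2, axis=1))))
```

Drift, the other metric mentioned, is typically reported per unit distance traveled over sub-trajectories rather than over the whole path.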
https://arxiv.org/abs/2404.17745
In Autonomous Driving (AD), transparency and safety are paramount, as mistakes are costly. However, the neural networks used in AD systems are generally considered black boxes. As a countermeasure, methods of explainable AI (XAI) exist, such as feature relevance estimation and dimensionality reduction. Coarse-graining techniques can also help reduce dimensionality and find interpretable global patterns. A specific coarse-graining method is the renormalization group from statistical physics, which has previously been applied to Restricted Boltzmann Machines (RBMs) to interpret unsupervised learning. We refine this technique by building a transparent backbone model for convolutional variational autoencoders (VAEs) that allows mapping latent values to input features and has performance comparable to trained black-box VAEs. Moreover, we propose a custom feature-map visualization technique to analyze the internal convolutional layers in the VAE and explain internal causes of poor reconstruction that may lead to dangerous traffic scenarios in AD applications. In a second key contribution, we propose explanation and evaluation techniques for the internal dynamics and feature relevance of prediction networks. We test a long short-term memory (LSTM) network in the computer vision domain to evaluate the predictability, and in future applications potentially the safety, of prediction models. We showcase our methods by analyzing a VAE-LSTM world model that predicts pedestrian perception in an urban traffic situation.
https://arxiv.org/abs/2404.17350
Assembly Calculus (AC), proposed by Papadimitriou et al., aims to reproduce advanced cognitive functions by simulating neural activity, and several applications based on AC have been developed, including a natural language parser proposed by Mitropolsky et al. However, this parser lacks the ability to handle Kleene closures, preventing it from parsing all regular languages and rendering it weaker than finite automata (FA). In this paper, we propose a new bionic natural language parser (BNLP) based on AC that integrates two new biologically rational structures, the Recurrent Circuit and the Stack Circuit, inspired by RNNs and the short-term memory mechanism. In contrast to the original parser, the BNLP can fully handle all regular languages and Dyck languages. Therefore, leveraging the Chomsky-Schützenberger theorem, a BNLP that can parse all context-free languages can be constructed. We also formally prove that for any PDA, a Parser Automaton corresponding to the BNLP can always be formed, ensuring that the BNLP has descriptive power equal to that of a PDA and addressing the deficiencies of the original parser.
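The capability the Stack Circuit adds beyond finite automata is exactly what a classical stack-based Dyck-language recognizer demonstrates; balanced brackets cannot be checked with finite state alone. The BNLP realizes this with neural assemblies, so the following is only the classical analogue:

```python
def is_dyck(s, pairs={")": "(", "]": "["}):
    """Stack-based recognizer for the Dyck language over two bracket
    types: the textbook capability a finite automaton lacks."""
    stack = []
    for ch in s:
        if ch in pairs.values():      # opening bracket: push
            stack.append(ch)
        elif ch in pairs:             # closing bracket: must match the top
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            return False
    return not stack                  # accept only if every bracket closed
```

By the Chomsky-Schützenberger theorem, every context-free language is (up to homomorphisms) the intersection of a Dyck language with a regular language, which is why handling both suffices.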
https://arxiv.org/abs/2404.17343
The accurate detection of Mesoscale Convective Systems (MCS) is crucial for meteorological monitoring due to their potential to cause significant destruction through severe weather phenomena such as hail, thunderstorms, and heavy rainfall. However, existing methods for MCS detection mostly target single-frame detection, which considers only static characteristics and ignores the temporal evolution in the life cycle of an MCS. In this paper, we propose a novel encoder-decoder neural network for MCS detection (MCSDNet). MCSDNet has a simple architecture and is easy to extend. Unlike previous models, MCSDNet targets multi-frame detection and leverages multi-scale spatiotemporal information to detect MCS regions in remote sensing imagery (RSI). As far as we know, it is the first work to utilize multi-scale spatiotemporal information to detect MCS regions. First, we design a multi-scale spatiotemporal information module to extract multi-level semantics from different encoder levels, which enables our model to extract more detailed spatiotemporal features. Second, a Spatiotemporal Mix Unit (STMU) is introduced into MCSDNet to capture both intra-frame features and inter-frame correlations; it is a scalable module and can be replaced by other spatiotemporal modules, e.g., CNN, RNN, Transformer, and our proposed Dual Spatiotemporal Attention (DSTA). This means that future work on spatiotemporal modules can be easily integrated into our model. Finally, we present MCSRSI, the first publicly available dataset for multi-frame MCS detection based on visible-channel images from the FY-4A satellite. We also conduct several experiments on MCSRSI and find that our proposed MCSDNet achieves the best performance on the MCS detection task compared to other baseline methods.
https://arxiv.org/abs/2404.17186
The response time of a biosensor is a crucial metric in safety-critical applications such as medical diagnostics, where an earlier diagnosis can markedly improve patient outcomes. However, the speed at which a biosensor reaches its final equilibrium state can be limited by poor mass transport and long molecular diffusion times that increase the time it takes target molecules to reach the active sensing region of the biosensor. While optimization of system and sensor design can help molecules reach the sensing element faster, a simpler and complementary approach for response-time reduction that is widely applicable across all sensor platforms is to use time-series forecasting to predict the ultimate steady-state sensor response. In this work, we show that ensembles of long short-term memory (LSTM) networks can accurately predict the equilibrium biosensor response from a small quantity of initial time-dependent biosensor measurements, allowing for a significant reduction in response time, with mean and median improvement factors of 18.6 and 5.1, respectively. The ensemble of models also provides simultaneous estimation of uncertainty, which is vital for providing confidence in the predictions and the subsequent safety-related decisions made from them. This approach is demonstrated on real-time experimental data collected by exposing porous silicon biosensors to buffered protein solutions using a multi-channel fluidic cell that enables the automated measurement of 100 porous silicon biosensors in parallel. The dramatic improvement in sensor response time achieved using LSTM network ensembles and the associated uncertainty quantification opens the door to trustworthy and faster-responding biosensors, enabling more rapid medical diagnostics for improved patient outcomes and healthcare access, as well as quicker identification of toxins in food and the environment.
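The ensemble mechanism described here, where each member forecasts the equilibrium response and the spread across members provides the uncertainty estimate, can be sketched generically. The members below are plain callables standing in for trained LSTMs; the combination rule (mean and standard deviation) is a common choice, assumed rather than taken from the paper.

```python
import numpy as np

def ensemble_predict(models, x):
    """Run every ensemble member on the same early-time measurements `x`
    and return (point prediction, uncertainty) as mean and std across
    members. `models` is any list of callables."""
    preds = np.array([m(x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)
```

A downstream safety rule could then, for example, accept the forecast only when the standard deviation falls below a calibrated tolerance, and otherwise wait for more measurements.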
https://arxiv.org/abs/2404.17144
Scour around bridge piers is a critical challenge for infrastructure around the world. In the absence of analytical models, and due to the complexity of the scour process, it is difficult for current empirical methods to achieve accurate predictions. In this paper, we exploit the power of deep learning algorithms to forecast scour depth variations around bridge piers based on historical sensor monitoring data, including riverbed elevation, flow elevation, and flow velocity. We investigated the performance of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models for real-time scour forecasting using data collected from bridges in Alaska and Oregon from 2006 to 2021. The LSTM models achieved mean absolute error (MAE) ranging from 0.1 m to 0.5 m for predicting bed level variations a week in advance, showing reasonable performance. The Fully Convolutional Network (FCN) variant of the CNN outperformed other CNN configurations, showing performance comparable to the LSTMs with significantly lower computational cost. We explored various innovative random-search heuristics for hyperparameter tuning and model optimisation, which reduced computational cost compared to the grid-search method. The impact of different combinations of sensor features on scour prediction showed the significance of the historical scour time series for predicting upcoming events. Overall, this study provides a greater understanding of the potential of Deep Learning (DL) for real-time scour forecasting and early warning in bridges with diverse scour and flow characteristics, including riverine and tidal/coastal bridges.
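The baseline that the paper's random-search heuristics build on can be sketched in a few lines: sample configurations from the search space and keep the one with the lowest validation error. The space and scoring function below are hypothetical placeholders; the paper's heuristics are more elaborate than this plain version.

```python
import random

def random_search(train_eval, space, n_trials=20, seed=0):
    """Plain random search: sample `n_trials` configurations and keep the
    best. `train_eval(cfg)` returns a validation error for a configuration,
    e.g. the MAE of a scour forecast."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        err = train_eval(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

Unlike grid search, the cost is fixed by `n_trials` rather than growing multiplicatively with every added hyperparameter, which is the source of the computational savings the text reports.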
https://arxiv.org/abs/2404.16549
This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawl corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using an LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy in predicting the lexical meanings of homonyms using a hand-classified dataset of over 7,500 sentences.
https://arxiv.org/abs/2405.00710
Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations have been proposed to address these issues, using spectral networks or convolutions, and have performed well on a range of tasks. However, they still have difficulty dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), the Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, the Pile, ImageNet, Kinetics-400, and SSv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for the Mamba-360 work is available on this webpage.\url{this https URL}.
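The recurrence shared by the S4-family models surveyed here is a linear state-space update, often with a diagonal state matrix so it reduces to elementwise operations. A minimal sequential scan of the discretized form, as a sketch of the common core rather than any specific variant:

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Diagonal linear state-space recurrence:
        x_t = A * x_{t-1} + B * u_t   (A diagonal, stored as a vector)
        y_t = C . x_t
    `u` is a 1-D input sequence; returns the 1-D output sequence."""
    x = np.zeros_like(A)
    ys = []
    for u_t in u:
        x = A * x + B * u_t   # elementwise diagonal state transition
        ys.append(C @ x)      # linear readout
    return np.array(ys)
```

Because the recurrence is linear, it can equivalently be evaluated as a long convolution during training and as this O(1)-per-step scan at inference time, which is the efficiency argument behind SSMs versus $O(N^2)$ attention.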
https://arxiv.org/abs/2404.16112
Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model has a better performance on the WikiHan dataset, and the data augmentation step stabilizes the training.
https://arxiv.org/abs/2404.15690
Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. To address this temporal mismatch and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with performance similar to recurrent neural networks (RNNs) but at about 50% of the power consumption.
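The "single-step update of spiking neurons" that TAP aligns with RL decisions can be illustrated with a leaky integrate-and-fire (LIF) step: one environment step corresponds to one membrane update, and the membrane potential is the carrier of historical state. The neuron model and constants below are illustrative assumptions, and the paper's gated units are omitted.

```python
import numpy as np

def lif_step(v, inp, tau=2.0, v_th=1.0):
    """One leaky integrate-and-fire update per RL decision step.
    v: membrane potentials; inp: input currents.
    Returns updated potentials and binary spike outputs."""
    v = v + (inp - v) / tau                # leaky integration toward input
    spike = (v >= v_th).astype(float)      # fire when threshold is crossed
    v = v * (1.0 - spike)                  # hard reset after a spike
    return v, spike
```

Because `v` persists between calls, the neuron accumulates information across decision steps, which is what lets the agent handle partial observability without simulating many internal time steps per decision.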
https://arxiv.org/abs/2404.15597
Accurately forecasting traffic flows is critically important to many real applications, including public safety and intelligent transportation systems. The challenges of this problem include both the dynamic mobility patterns of people and the complex spatial-temporal correlations of urban traffic data. Meanwhile, most existing models ignore the diverse impacts of various traffic observations (e.g., vehicle speed and road occupancy) on traffic flow prediction, and different traffic observations can be considered as different channels of input features. We argue that analyzing multi-channel traffic observations might help to better address this problem. In this paper, we study the novel problem of multi-channel traffic flow prediction, and propose a deep Multi-View Channel-wise Spatio-Temporal Network (MVC-STNet) model to effectively address it. Specifically, we first construct localized and globalized spatial graphs, where a multi-view fusion module is used to effectively extract local and global spatial dependencies. Then an LSTM is used to learn the temporal correlations. To effectively model the different impacts of various traffic observations on traffic flow prediction, a channel-wise graph convolutional network is also designed. Extensive experiments are conducted on the PEMS04 and PEMS08 datasets. The results demonstrate that the proposed MVC-STNet outperforms state-of-the-art methods by a large margin.
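The channel-wise idea, giving each observation channel (speed, occupancy, etc.) its own graph-convolution weights so its distinct effect on flow can be modeled, can be sketched as follows. Shapes and the propagation rule (a single pre-normalized adjacency) are assumptions for illustration, not the exact MVC-STNet layer.

```python
import numpy as np

def channelwise_gcn(X, A, W):
    """One channel-wise graph convolution step.
    X: (C, N, F) node features per channel; A: (N, N) normalized adjacency;
    W: (C, F, F_out) a separate weight matrix per channel."""
    return np.stack([A @ X[c] @ W[c] for c in range(X.shape[0])])
```

A shared-weight GCN would force speed and occupancy through the same transformation; indexing `W` by channel is what lets each observation type influence the flow prediction differently.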
https://arxiv.org/abs/2404.15034
Recent years have witnessed significant progress in developing deep learning-based models for automated code completion. Although using source code from GitHub has been a common practice for training deep-learning-based models for code completion, it may induce legal and ethical issues such as copyright infringement. In this paper, we investigate the legal and ethical issues of current neural code completion models by answering the following question: Is my code used to train your neural code completion model? To this end, we tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks to the more challenging task of code completion. In particular, since the target code completion models perform as opaque black boxes, preventing access to their training data and parameters, we opt to train multiple shadow models to mimic their behavior. The posteriors acquired from these shadow models are subsequently employed to train a membership classifier. The membership classifier can then be effectively employed to deduce the membership status of a given code sample based on the output of a target code completion model. We comprehensively evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models (i.e., LSTM-based, CodeGPT, CodeGen, and StarCoder). Experimental results reveal that the LSTM-based and CodeGPT models suffer from the membership leakage issue, which can be easily detected by our proposed membership inference approach with accuracies of 0.842 and 0.730, respectively. Interestingly, our experiments also show that the data membership of current large language models of code, e.g., CodeGen and StarCoder, is difficult to detect, leaving ample space for further improvement. Finally, we also try to explain the findings from the perspective of model memorization.
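The shadow-model pipeline above can be sketched end to end with toy numbers: simulate the per-sample confidence (posterior) each shadow model assigns, use the known member/non-member splits of the shadows to fit a membership classifier, then query the black-box target. Everything here is a stand-in, assuming members receive systematically higher confidence (the memorization gap); CodeMI's actual classifier is trained on full posterior vectors, not the single-threshold rule used below.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for a shadow model's output: the confidence it assigns
# to a sample. Members of its training set tend to score higher.
def shadow_posterior(is_member, n):
    base = 0.75 if is_member else 0.55          # assumed memorization gap
    return np.clip(rng.normal(base, 0.1, size=n), 0.0, 1.0)

# Build a labelled dataset for the membership classifier from several
# shadow models, each with a known member / non-member split.
feats, labels = [], []
for _ in range(5):                               # 5 shadow models
    feats.append(shadow_posterior(True, 200));  labels.append(np.ones(200))
    feats.append(shadow_posterior(False, 200)); labels.append(np.zeros(200))
X = np.concatenate(feats)
y = np.concatenate(labels)

# Minimal membership classifier: the confidence threshold that best
# separates members from non-members on the shadow data.
thresholds = np.linspace(0.0, 1.0, 201)
accs = [np.mean((X >= t) == y) for t in thresholds]
t_star = thresholds[int(np.argmax(accs))]

# Inference: query the black-box target model and threshold its confidence.
target_conf = 0.8                                # posterior from the target
verdict = "member" if target_conf >= t_star else "non-member"
print(f"threshold={t_star:.2f} -> {verdict}")
```

The learned threshold lands roughly midway between the two confidence distributions; a sample the target model is unusually confident about is flagged as likely training data.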
https://arxiv.org/abs/2404.14296
Biomedical literature is a rapidly expanding field of science and technology. Classification of biomedical texts is an essential part of biomedical research, especially in biology. This work proposes a fine-tuned DistilBERT, a methodology-specific, pre-trained classification language model for mining biomedical texts. The model has proven its linguistic understanding capabilities and reduces the size of BERT by 40% while being 60% faster. The main objective of this project is to improve the model and assess its performance compared to the non-fine-tuned model. We used DistilBERT as the base model and pre-trained it on a corpus of 32,000 abstracts and full-text articles; our results were impressive and surpassed those of traditional literature classification methods that use RNNs or LSTMs. Our aim is to integrate this highly specialized model into different research industries.
https://arxiv.org/abs/2404.13779
Accurate prediction of agent motion trajectories is crucial for autonomous driving, contributing to the reduction of collision risks in human-vehicle interactions and ensuring ample response time for other traffic participants. Current research predominantly focuses on traditional deep learning methods, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These methods leverage relative distances to forecast the motion trajectories of a single class of agents. However, in complex traffic scenarios, the motion patterns of various types of traffic participants exhibit inherent randomness and uncertainty. Relying solely on relative distances may not adequately capture the nuanced interaction patterns between different classes of road users. In this paper, we propose a novel multi-class trajectory prediction method named the social force embedded mixed graph convolutional network (SFEM-GCN). SFEM-GCN comprises three graph topologies: the semantic graph (SG), position graph (PG), and velocity graph (VG). These graphs encode various social force relationships among different classes of agents in complex scenes. Specifically, SG utilizes one-hot encoding of agent-class information to guide the construction of graph adjacency matrices based on semantic information. PG and VG create adjacency matrices to capture motion interaction relationships between agents of different classes. These graph structures are then integrated into a mixed graph, where learning is conducted using a spatio-temporal graph convolutional neural network (ST-GCNN). To further enhance prediction performance, we adopt temporal convolutional networks (TCNs) to generate the predicted trajectory with fewer parameters. Experimental results on publicly available datasets demonstrate that SFEM-GCN surpasses state-of-the-art methods in terms of accuracy and robustness.
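The three graph topologies can be illustrated with a minimal sketch. The exact adjacency definitions and fusion are not given in the abstract, so the choices below (class similarity for SG, inverse pairwise distance for PG, inverse relative speed for VG, fusion by simple summation) are illustrative guesses, not the paper's formulation.

```python
import numpy as np

def build_mixed_graph(classes, pos, vel, n_classes=3, eps=1e-6):
    """Toy construction of the three graph topologies.
    SG: adjacency from one-hot agent-class similarity (semantic).
    PG: adjacency from inverse pairwise distance (position).
    VG: adjacency from inverse relative speed (velocity).
    The mixed graph is a plain weighted sum; SFEM-GCN instead learns
    over the fused graph with an ST-GCNN."""
    onehot = np.eye(n_classes)[classes]                      # (n, n_classes)
    SG = onehot @ onehot.T                                   # 1 iff same class
    dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    PG = 1.0 / (dist + eps)                                  # closer -> stronger
    dvel = np.linalg.norm(vel[:, None] - vel[None, :], axis=-1)
    VG = 1.0 / (dvel + eps)                                  # similar motion -> stronger
    np.fill_diagonal(PG, 0.0)
    np.fill_diagonal(VG, 0.0)
    mixed = SG + PG + VG                                     # toy fusion
    return SG, PG, VG, mixed

# 3 agents: two pedestrians (class 0) and a cyclist (class 1)
classes = np.array([0, 0, 1])
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
vel = np.array([[1.0, 0.0], [1.1, 0.0], [4.0, 0.0]])
SG, PG, VG, mixed = build_mixed_graph(classes, pos, vel)
print(SG)  # semantic graph: pedestrians linked, cyclist separate
```

Note how the two pedestrians are connected in SG and strongly coupled in VG (near-identical velocities), while the faster cyclist interacts with them mainly through PG; this is the kind of class-aware interaction structure relative distance alone cannot express.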
https://arxiv.org/abs/2404.13378