With the increased use of the internet and social networks for online discussion, the spread of toxic and inappropriate content on social networking sites has also increased. Several studies have been conducted in different languages, but less work has been done on inappropriate content identification for South Asian languages using deep learning techniques. In Urdu, spellings are not unique and people commonly write the same word in several different ways, while code-mixing with other languages such as English makes the text even more challenging; limited research is available on processing such language with state-of-the-art algorithms. Adding an attention layer to a deep learning model can help handle long-term dependencies and increase its efficiency. To explore the effect of the attention layer, this study proposes an attention-based bidirectional GRU hybrid model for identifying inappropriate content in Urdu Unicode text. Four baseline deep learning models (LSTM, Bi-LSTM, GRU, and TCN) are used to compare the performance of the proposed model. The results of these models were compared based on evaluation metrics, dataset size, and the impact of the word embedding layer, using pre-trained Urdu word2Vec embeddings. Our proposed model, BiGRU-A, outperformed all other baseline models, yielding 84% accuracy without the pre-trained word2Vec layer. From our experiments, we established that the attention layer improves the model's efficiency and that pre-trained word2Vec embeddings do not work well with an inappropriate content dataset.
https://arxiv.org/abs/2501.09722
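A minimal PyTorch sketch of the kind of attention-pooled bidirectional GRU classifier the abstract describes (the vocabulary size, embedding/hidden dimensions, and the additive attention layer are assumptions for illustration, not the paper's published configuration):

```python
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    """Bidirectional GRU with a simple attention pooling layer over time."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # One attention score per time step, computed from the BiGRU output.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        h, _ = self.bigru(self.embedding(token_ids))   # (batch, seq_len, 2*hidden)
        scores = torch.softmax(self.attn(h), dim=1)    # attention over time steps
        context = (scores * h).sum(dim=1)              # weighted sum -> (batch, 2*hidden)
        return self.classifier(context)

model = BiGRUAttention(vocab_size=30000)
logits = model(torch.randint(1, 30000, (4, 50)))       # dummy batch of token ids
```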
This review underscores the critical need for effective strategies to identify and support individuals with suicidal ideation, and examines how technological innovations in ML and DL can further suicide prevention efforts. The study details the application of these technologies in analyzing vast amounts of unstructured social media data to detect linguistic patterns, keywords, phrases, tones, and contextual cues associated with suicidal thoughts. It explores various ML and DL models, such as SVMs, CNNs, and LSTMs, and their effectiveness in interpreting complex data patterns and emotional nuances within text data. The review discusses the potential of these technologies to serve as a life-saving tool by identifying at-risk individuals through their digital traces. Furthermore, it evaluates the real-world effectiveness, limitations, and ethical considerations of employing these technologies for suicide prevention, stressing the importance of responsible development and usage. The study aims to fill critical knowledge gaps by analyzing recent studies, methodologies, tools, and techniques in this field, and highlights the importance of synthesizing current literature to inform practical tools and suicide prevention efforts, guiding innovation in reliable, ethical systems for early intervention. This research synthesis evaluates the intersection of technology and mental health and advocates for the ethical and responsible application of ML, DL, and NLP to offer life-saving potential worldwide, while addressing challenges such as generalizability, bias, privacy, and the need for further research to ensure these technologies do not exacerbate existing inequities and harms.
https://arxiv.org/abs/2501.09309
Aviation safety is paramount, demanding precise analysis of safety occurrences during different flight phases. This study employs Natural Language Processing (NLP) and Deep Learning models, including LSTM, CNN, Bidirectional LSTM (BLSTM), and simple Recurrent Neural Networks (sRNN), to classify flight phases in safety reports from the Australian Transport Safety Bureau (ATSB). The models exhibited high accuracy, precision, recall, and F1 scores, with LSTM achieving the highest performance of 87%, 88%, 87%, and 88%, respectively. This performance highlights their effectiveness in automating safety occurrence analysis. The integration of NLP and Deep Learning technologies promises transformative enhancements in aviation safety analysis, enabling targeted safety measures and streamlined report handling.
https://arxiv.org/abs/2501.07923
Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory inefficiencies, especially when dealing with long sequences. This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information. LMNs dynamically summarize historical context, significantly reducing the memory footprint and computational complexity of attention mechanisms from O(n²) to O(log(n)). The model employs a single-vector, targeted attention mechanism to access stored information, and the memory block construction worker (summarizer) layer operates in two modes: a parallel execution mode during training for efficient processing of hierarchical tree structures and a sequential execution mode during inference, which acts as a memory management system. It also implicitly encodes positional information, eliminating the need for explicit positional encodings. These features make LMNs a robust and scalable solution for processing long-range sequences in resource-constrained environments, offering practical improvements in efficiency and scalability. The code is publicly available under the MIT License on GitHub: this https URL.
https://arxiv.org/abs/2501.07905
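The O(log(n)) claim rests on a binary-counter style of memory: slot k holds a summary of roughly 2^k past steps, and two same-level summaries merge into the next level, so at most log2(n) slots are ever live. A toy sketch of that bookkeeping, with a hypothetical Linear+Tanh merge standing in for the paper's summarizer layer:

```python
import torch
import torch.nn as nn

class LogMemory(nn.Module):
    """Toy logarithmic memory: slot k summarizes ~2^k past vectors.
    Merging works like a binary counter, so popcount(n) slots are live
    after n writes -- never more than log2(n)+1."""
    def __init__(self, dim):
        super().__init__()
        # Hypothetical summarizer: merges two same-level summaries into one.
        self.summarize = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def write(self, slots, x):
        """slots: dict level -> tensor(dim). Insert x at level 0, carry up."""
        level, carry = 0, x
        while level in slots:                  # collision: merge and carry up
            carry = self.summarize(torch.cat([slots.pop(level), carry], dim=-1))
            level += 1
        slots[level] = carry
        return slots

mem, slots = LogMemory(dim=16), {}
for t in range(10):
    slots = mem.write(slots, torch.randn(16))
print(sorted(slots))    # [1, 3] after 10 writes, since 10 = 0b1010
```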
A significant limitation of current smartphone-based eye-tracking algorithms is their low accuracy when applied to video-type visual stimuli, as they are typically trained on static images. Also, the increasing demand for real-time interactive applications like games, VR, and AR on smartphones requires overcoming the limitations posed by resource constraints such as limited computational power, battery life, and network bandwidth. Therefore, we developed two new smartphone eye-tracking techniques for video-type visuals by combining Convolutional Neural Networks (CNN) with two different Recurrent Neural Networks (RNN), namely Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean Square Error of 0.955 cm and 1.091 cm, respectively. To address the computational constraints of smartphones, we developed an edge intelligence architecture to enhance the performance of smartphone-based eye tracking. We applied various optimisation methods like quantisation and pruning to deep learning models for better energy, CPU, and memory usage on edge devices, focusing on real-time processing. Using model quantisation, the model inference time in the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge devices.
https://arxiv.org/abs/2408.12463
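The inference-time savings quoted above come from post-training optimisation; a sketch of dynamic quantization in PyTorch applied to a stand-in CNN+LSTM gaze model (layer sizes are hypothetical, and the paper's exact optimisation recipe may differ):

```python
import torch
import torch.nn as nn

class GazeCNNLSTM(nn.Module):
    """Stand-in for the CNN+LSTM gaze model (hypothetical shapes)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # -> (batch, 256) per frame
        self.lstm = nn.LSTM(256, 64, batch_first=True)
        self.head = nn.Linear(64, 2)                 # (x, y) gaze coordinates

    def forward(self, frames):                       # (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        f = self.features(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(f)
        return self.head(out[:, -1])                 # gaze for the last frame

model = GazeCNNLSTM().eval()
# Post-training dynamic quantization of the recurrent and linear layers:
# weights are stored in int8, shrinking the model and speeding up inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8)
pred = quantized(torch.randn(1, 8, 3, 64, 64))
```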
This research provides an in-depth evaluation of various machine learning models for energy forecasting, focusing on the unique challenges of seasonal variations in student residential settings. The study assesses the performance of baseline models, such as LSTM and GRU, alongside state-of-the-art forecasting methods, including autoregressive feedforward neural networks, Transformers, and hybrid approaches. Special attention is given to predicting energy consumption amidst challenges like seasonal patterns, vacations, meteorological changes, and irregular human activities that cause sudden fluctuations in usage. The findings reveal that no single model consistently outperforms others across all seasons, emphasizing the need for season-specific model selection or tailored designs. Notably, the proposed HyperNetwork-based LSTM and MiniAutoEncXGBoost models exhibit strong adaptability to seasonal variations, effectively capturing abrupt changes in energy consumption during the summer months. This study advances the energy forecasting field by emphasizing the critical role of seasonal dynamics and model-specific behavior in achieving accurate predictions.
https://arxiv.org/abs/2501.07423
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing medical-related human activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions that are critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA, which is essential for effective patient monitoring. This study proposes a novel HMR method for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by ConvLSTM to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.00% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and a GSM module, delivering real-time alerts via Twilio's SMS service to caregivers and patients. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
https://arxiv.org/abs/2501.07039
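A minimal sketch of the ConvLSTM-plus-classification-module stage described above, with a hand-rolled ConvLSTM cell (PyTorch has no built-in one); the channel sizes are assumptions, and the EfficientNet feature extractor that would feed it is stubbed out with random feature maps:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the four LSTM gates are computed with a conv."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):                       # x: (batch, in_ch, H, W)
        i, f, o, g = self.gates(torch.cat([x, h], 1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class ActivityHead(nn.Module):
    """ConvLSTM over per-frame feature maps, then GAP -> dropout -> FC."""
    def __init__(self, in_ch=64, hid_ch=64, num_classes=120):
        super().__init__()
        self.cell = ConvLSTMCell(in_ch, hid_ch)
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(hid_ch, num_classes)

    def forward(self, feats):                         # (batch, time, C, H, W)
        b, t, _, H, W = feats.shape
        h = feats.new_zeros(b, self.cell.hid_ch, H, W)
        c = torch.zeros_like(h)
        for s in range(t):                            # roll ConvLSTM over time
            h, c = self.cell(feats[:, s], h, c)
        return self.fc(self.drop(h.mean(dim=(2, 3)))) # global average pooling

logits = ActivityHead()(torch.randn(2, 8, 64, 7, 7))  # dummy backbone features
```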
3D medical image segmentation has progressed considerably due to Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), yet these methods struggle to balance long-range dependency acquisition with computational efficiency. To address this challenge, we propose UNETVL (U-Net Vision-LSTM), a novel architecture that leverages recent advancements in temporal information processing. UNETVL incorporates Vision-LSTM (ViL) for improved scalability and memory functions, alongside an efficient Chebyshev Kolmogorov-Arnold Networks (KAN) to handle complex and long-range dependency patterns more effectively. We validated our method on the ACDC and AMOS2022 (post challenge Task 2) benchmark datasets, showing a significant improvement in mean Dice score compared to recent state-of-the-art approaches, especially over its predecessor, UNETR, with increases of 7.3% on ACDC and 15.6% on AMOS, respectively. Extensive ablation studies were conducted to demonstrate the impact of each component in UNETVL, providing a comprehensive understanding of its architecture. Our code is available at this https URL, facilitating further research and applications in this domain.
https://arxiv.org/abs/2501.07017
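A Chebyshev KAN layer of the kind UNETVL references can be sketched as learnable per-edge Chebyshev expansions over tanh-squashed inputs; this is a generic sketch of the technique, not the paper's exact block:

```python
import torch
import torch.nn as nn

class ChebyshevKANLayer(nn.Module):
    """KAN-style layer with learnable Chebyshev edge functions (a sketch).
    Each input-output edge applies sum_k c_k * T_k(tanh(x))."""
    def __init__(self, in_dim, out_dim, degree=4):
        super().__init__()
        self.degree = degree
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1) / (in_dim * (degree + 1)))

    def forward(self, x):                       # (batch, in_dim)
        x = torch.tanh(x)                       # squash into T_k's domain [-1, 1]
        T = [torch.ones_like(x), x]             # T_0 = 1, T_1 = x
        for _ in range(2, self.degree + 1):     # recurrence: T_k = 2x*T_{k-1} - T_{k-2}
            T.append(2 * x * T[-1] - T[-2])
        basis = torch.stack(T, dim=-1)          # (batch, in_dim, degree+1)
        return torch.einsum('bik,iok->bo', basis, self.coeffs)

y = ChebyshevKANLayer(32, 16)(torch.randn(8, 32))   # -> (8, 16)
```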
Load forecasting plays a crucial role in energy management, directly impacting grid stability, operational efficiency, cost reduction, and environmental sustainability. Traditional Vanilla Recurrent Neural Networks (RNNs) face issues such as vanishing and exploding gradients, whereas sophisticated RNNs such as LSTMs have shown considerable success in this domain. However, these models often struggle to accurately capture complex and sudden variations in energy consumption, and their applicability is typically limited to specific consumer types, such as offices or schools. To address these challenges, this paper proposes the Kolmogorov-Arnold Recurrent Network (KARN), a novel load forecasting approach that combines the flexibility of Kolmogorov-Arnold Networks with RNN's temporal modeling capabilities. KARN utilizes learnable temporal spline functions and edge-based activations to better model non-linear relationships in load data, making it adaptable across a diverse range of consumer types. The proposed KARN model was rigorously evaluated on a variety of real-world datasets, including student residences, detached homes, a home with electric vehicle charging, a townhouse, and industrial buildings. Across all these consumer categories, KARN consistently outperformed traditional Vanilla RNNs, while it surpassed LSTM and Gated Recurrent Units (GRUs) in six buildings. The results demonstrate KARN's superior accuracy and applicability, making it a promising tool for enhancing load forecasting in diverse energy management scenarios.
https://arxiv.org/abs/2501.06965
Pain management and severity detection are crucial for effective treatment, yet traditional self-reporting methods are subjective and may be unsuitable for non-verbal individuals (people with limited speaking skills). To address this limitation, we explore automated pain detection using facial expressions. Our study leverages deep learning techniques to improve pain assessment by analyzing facial images from the Pain Emotion Faces Database (PEMF). We propose two novel approaches: (1) a hybrid ConvNeXt model combined with Long Short-Term Memory (LSTM) blocks to analyze video frames and predict pain presence, and (2) a Spatio-Temporal Graph Convolution Network (STGCN) integrated with LSTM to process landmarks from facial images for pain detection. Our work represents the first use of the PEMF dataset for binary pain classification and demonstrates the effectiveness of these models through extensive experimentation. The results highlight the potential of combining spatial and temporal features for enhanced pain detection, offering a promising advancement in objective pain assessment methodologies.
https://arxiv.org/abs/2501.06787
The air transport system recognizes the criticality of safety, as even minor anomalies can have severe consequences. Reporting accidents and incidents plays a vital role in identifying their causes and proposing safety recommendations. However, the narratives describing pre-accident events are presented in unstructured text that is not easily understood by computer systems. Classifying and categorizing safety occurrences based on these narratives can support informed decision-making by aviation industry stakeholders. In this study, researchers applied natural language processing (NLP) and artificial intelligence (AI) models to process text narratives in order to classify the flight phases of safety occurrences. The classification performance of two deep learning models, ResNet and sRNN, was evaluated using an initial dataset of 27,000 safety occurrence reports from the NTSB. The results demonstrated good performance, with both models achieving an accuracy exceeding 68%, well above the random-guess rate of 14% for a seven-class classification problem. The models also exhibited high precision, recall, and F1 scores. The sRNN model greatly outperformed the simplified ResNet model architecture used in this study. These findings indicate that NLP and deep learning models can infer the flight phase from raw text narratives, enabling effective analysis of safety occurrences.
https://arxiv.org/abs/2501.06564
This paper presents TopoFormer, a novel hybrid deep learning architecture that integrates transformer-based encoders with convolutional long short-term memory (ConvLSTM) layers for the precise prediction of topographic beach profiles referenced to elevation datums, with a particular focus on Mean Low Water Springs (MLWS) and Mean Low Water Neaps (MLWN). Accurate topographic estimation down to MLWS is critical for coastal management, navigation safety, and environmental monitoring. Leveraging a comprehensive dataset from the Wales Coastal Monitoring Centre (WCMC), consisting of over 2000 surveys across 36 coastal survey units, TopoFormer addresses key challenges in topographic prediction, including temporal variability and data gaps in survey measurements. The architecture uniquely combines multi-head attention mechanisms and ConvLSTM layers to capture both the long-range dependencies and the localized temporal patterns inherent in beach profile data. TopoFormer's predictive performance was rigorously evaluated against state-of-the-art models, including DenseNet, 1D/2D CNNs, and LSTMs. While all models demonstrated strong performance, TopoFormer achieved the lowest mean absolute error (MAE), as low as 2 cm, and provided superior accuracy in both in-distribution (ID) and out-of-distribution (OOD) evaluations.
https://arxiv.org/abs/2501.06494
Safety is a critical aspect of the air transport system, given that even slight operational anomalies can result in serious consequences. To reduce the chances of aviation safety occurrences, accidents and incidents are reported to establish the root cause, propose safety recommendations, etc. However, analysis narratives of the pre-accident events are presented as human-understandable, raw, unstructured text that a computer system cannot understand. The ability to classify and categorise safety occurrences from their textual narratives would help aviation industry stakeholders make informed safety-critical decisions. To classify and categorise safety occurrences, we applied natural language processing (NLP) and artificial intelligence (AI) models to process the text narratives. The study aimed to answer the question: how well can the damage level caused to the aircraft in a safety occurrence be inferred from the text narrative using natural language processing? The classification performance of various deep learning models, including LSTM, BLSTM, GRU, and sRNN, and of their combinations (LSTM+GRU, BLSTM+GRU, sRNN+LSTM, sRNN+BLSTM, sRNN+GRU, sRNN+BLSTM+GRU, and sRNN+LSTM+GRU), was evaluated on a set of 27,000 safety occurrence reports from the NTSB. The results of this study indicate that all models investigated performed competitively, recording an accuracy of over 87.9%, well above the random-guess rate of 25% for a four-class classification problem. The models also recorded high precision, recall, and F1 scores, above 80%, 88%, and 85%, respectively. sRNN slightly outperformed the other single models in terms of recall (90%) and accuracy (90%), while LSTM reported slightly better precision (87%).
https://arxiv.org/abs/2501.06490
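One of the hybrid combinations above (sRNN+LSTM+GRU) is straightforward to sketch in PyTorch; the layer sizes below are hypothetical, as the paper's configuration is not reproduced here:

```python
import torch
import torch.nn as nn

class SRNNLSTMGRUClassifier(nn.Module):
    """Sketch of one hybrid combination: simple RNN -> LSTM -> GRU stacked
    over embedded narrative text, then a 4-class damage-level head."""
    def __init__(self, vocab_size, embed_dim=100, hidden=64, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.srnn = nn.RNN(embed_dim, hidden, batch_first=True)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):             # (batch, seq_len) of token ids
        x = self.embedding(token_ids)
        x, _ = self.srnn(x)                   # each layer feeds the next
        x, _ = self.lstm(x)
        x, _ = self.gru(x)
        return self.head(x[:, -1])            # last hidden state -> logits

model = SRNNLSTMGRUClassifier(vocab_size=20000)
logits = model(torch.randint(1, 20000, (8, 200)))
```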
Traditional microlensing event vetting methods require highly trained human experts, and the process is both complex and time-consuming. This reliance on manual inspection often leads to inefficiencies and constrains the ability to scale for widespread exoplanet detection, ultimately hindering discovery rates. To address the limits of traditional microlensing event vetting, we have developed LensNet, a machine learning pipeline specifically designed to distinguish legitimate microlensing events from false positives caused by instrumental artifacts, such as pixel bleed trails and diffraction spikes. Our system operates in conjunction with a preliminary algorithm that detects increasing trends in flux. These flagged instances are then passed to LensNet for further classification, allowing for timely alerts and follow-up observations. Tailored for the multi-observatory setup of the Korea Microlensing Telescope Network (KMTNet) and trained on a rich dataset of manually classified events, LensNet is optimized for early detection and warning of microlensing occurrences, enabling astronomers to organize follow-up observations promptly. The internal model of the pipeline employs a multi-branch Recurrent Neural Network (RNN) architecture that evaluates time-series flux data with contextual information, including sky background, the full width at half maximum of the target star, flux errors, PSF quality flags, and air mass for each observation. We demonstrate a classification accuracy above 87.5%, and anticipate further improvements as we expand our training set and continue to refine the algorithm.
https://arxiv.org/abs/2501.06293
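A rough sketch of the multi-branch idea: one recurrent branch over the flux time series and one over the per-observation context features listed above, fused for a real-vs-artifact score. Branch widths and the fusion head are assumptions, not LensNet's published design:

```python
import torch
import torch.nn as nn

class MultiBranchRNN(nn.Module):
    """Two-branch classifier in the spirit of LensNet: a GRU over the flux
    series, a GRU over per-observation context (sky background, FWHM,
    flux error, PSF flag, air mass), fused into one event score."""
    def __init__(self, context_dim=5, hidden=32):
        super().__init__()
        self.flux_rnn = nn.GRU(1, hidden, batch_first=True)
        self.ctx_rnn = nn.GRU(context_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, flux, context):         # (b, t, 1), (b, t, context_dim)
        _, h_flux = self.flux_rnn(flux)       # final hidden state per branch
        _, h_ctx = self.ctx_rnn(context)
        fused = torch.cat([h_flux[-1], h_ctx[-1]], dim=-1)
        return torch.sigmoid(self.head(fused)).squeeze(-1)  # P(real event)

p = MultiBranchRNN()(torch.randn(4, 100, 1), torch.randn(4, 100, 5))
```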
While attention-based architectures, such as Conformers, excel in speech enhancement, they face challenges such as scalability with respect to input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture offers linear scalability. However, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis reveals that xLSTM, and notably even LSTM, can match or outperform state-of-the-art Mamba- and Conformer-based systems across various model sizes in speech enhancement on the VoiceBank+DEMAND dataset. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, that contribute to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems on the VoiceBank+DEMAND dataset.
https://arxiv.org/abs/2501.06146
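For reference, exponential gating in the sLSTM-style cell replaces the usual sigmoid input/forget gates with exponentials, paired with a normalizer state that keeps the output well scaled; a rough sketch of the update (notation simplified from the xLSTM formulation, and the log-space stabilizer used in practice is omitted):

```latex
\begin{aligned}
i_t &= \exp(\tilde{i}_t), \qquad
f_t  = \exp(\tilde{f}_t) \ \text{(or } \sigma(\tilde{f}_t)\text{)},\\
c_t &= f_t\, c_{t-1} + i_t\, z_t, \qquad
n_t  = f_t\, n_{t-1} + i_t,\\
h_t &= o_t \odot \frac{c_t}{n_t}.
\end{aligned}
```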
Video restoration plays a pivotal role in revitalizing degraded video content by rectifying imperfections caused by various degradations introduced during capturing (sensor noise, motion blur, etc.), saving/sharing (compression, resizing, etc.), and editing. This paper introduces a novel algorithm designed for scenarios where noise is introduced during video capture, aiming to enhance the visual quality of videos by reducing unwanted noise artifacts. We propose the Latent space LSTM Video Denoiser (LLVD), an end-to-end blind denoising model. LLVD uniquely combines spatial and temporal feature extraction, employing Long Short Term Memory (LSTM) within the encoded feature domain. This integration of LSTM layers is crucial for maintaining continuity and minimizing flicker in the restored video. Moreover, processing frames in the encoded feature domain significantly reduces computations, resulting in a very lightweight architecture. LLVD's blind nature makes it versatile for real, in-the-wild denoising scenarios where prior information about noise characteristics is not available. Experiments reveal that LLVD demonstrates excellent performance for both synthetic and captured noise. Specifically, LLVD surpasses the current state of the art (SOTA) in RAW denoising by 0.3 dB, while also achieving a 59% reduction in computational complexity.
https://arxiv.org/abs/2501.05744
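A toy sketch of the latent-space idea: encode each frame to a downsampled feature map, run an LSTM across time at each latent position to enforce temporal continuity, then decode. Channel counts and the per-position sequence treatment are assumptions, not LLVD's published design:

```python
import torch
import torch.nn as nn

class LatentLSTMDenoiser(nn.Module):
    """Sketch of the LLVD idea: denoise in a downsampled latent space and
    run an LSTM across frames there to keep the output temporally smooth."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, stride=2, padding=1),
                                 nn.ReLU())
        self.lstm = nn.LSTM(ch, ch, batch_first=True)  # per latent pixel, over time
        self.dec = nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1)

    def forward(self, video):                  # (b, t, 3, H, W)
        b, t, _, H, W = video.shape
        z = self.enc(video.flatten(0, 1))      # (b*t, ch, H/2, W/2)
        _, ch, h, w = z.shape
        # Treat each latent pixel as an independent sequence over time.
        seq = z.view(b, t, ch, h * w).permute(0, 3, 1, 2).reshape(b * h * w, t, ch)
        seq, _ = self.lstm(seq)
        z = seq.reshape(b, h * w, t, ch).permute(0, 2, 3, 1).reshape(b * t, ch, h, w)
        return self.dec(z).view(b, t, 3, H, W)

clean = LatentLSTMDenoiser()(torch.randn(1, 5, 3, 32, 32))
```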
The self-attention (SA) mechanism has demonstrated superior performance across various domains, yet it suffers from substantial complexity during both training and inference. Next-generation architectures, which aim to retain the competitive performance of SA while achieving low-cost inference and efficient long-sequence training, primarily follow three approaches: linear attention, linear RNNs, and state space models. Although these approaches achieve lower complexity than SA, they all have built-in performance degradation factors, such as diminished "spikiness" and compression of historical information. In contrast to these approaches, we propose a novel element-wise attention mechanism, which uses the element-wise squared Euclidean distance, instead of the dot product, to compute similarity, and approximates the quadratic complexity term $\exp(q_{ic}k_{jc})$ with a Taylor polynomial. This design achieves remarkable efficiency: during training, the element-wise attention has a complexity of $\mathcal{O}(tLD)$, making long-sequence training both computationally and memory efficient, where $L$ is the sequence length, $D$ is the feature dimension, and $t$ is the highest order of the polynomial; during inference, it can be reformulated as a recurrent neural network, achieving an inference complexity of $\mathcal{O}(tD)$. Furthermore, the element-wise attention circumvents the performance degradation factors present in these approaches and achieves performance comparable to SA in both causal and non-causal forms.
https://arxiv.org/abs/2501.05730
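The linear-complexity claim follows from a standard factorization argument once the exponential is Taylor-expanded; a sketch of the reasoning (our reconstruction from the abstract, not the paper's exact derivation):

```latex
\exp(q_{ic}k_{jc}) \;\approx\; \sum_{m=0}^{t} \frac{(q_{ic}k_{jc})^{m}}{m!}
                   \;=\; \sum_{m=0}^{t} \frac{q_{ic}^{m}\,k_{jc}^{m}}{m!},
\qquad\text{so}\qquad
\sum_{j} \exp(q_{ic}k_{jc})\, v_{jc}
   \;\approx\; \sum_{m=0}^{t} \frac{q_{ic}^{m}}{m!} \sum_{j} k_{jc}^{m} v_{jc}.
```

Each inner sum over $j$ can be kept as a running accumulator per order $m$ and channel $c$, which is what lets the mechanism be rewritten as an RNN with $\mathcal{O}(tD)$ per-step inference cost.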
This paper presents a novel predictive maintenance framework centered on Enhanced Quantile Regression Neural Networks (EQRNNs) for anticipating system failures in industrial robotics. We address the challenge of early failure detection through a hybrid approach that combines advanced neural architectures. The system leverages dual computational stages: first, an EQRNN optimized for processing multi-sensor data streams, including vibration, thermal, and power signatures, followed by an integrated Spiking Neural Network (SNN) layer that enables microsecond-level response times. This architecture achieves a notable accuracy of 92.3% in component failure prediction with a 90-hour advance warning window. Field testing conducted at industrial scale with 50 robotic systems demonstrates significant operational improvements, yielding a 94% decrease in unexpected system failures and a 76% reduction in maintenance-related downtime. The framework's effectiveness in processing complex, multi-modal sensor data while maintaining computational efficiency validates its applicability for Industry 4.0 manufacturing environments.
https://arxiv.org/abs/2501.05087
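The quantile-regression core of an EQRNN rests on the standard pinball loss; the paper's "enhanced" variant is not specified here, so the sketch below shows only the baseline objective, with hypothetical quantile levels:

```python
import torch

def pinball_loss(pred, target, quantile):
    """Quantile (pinball) loss for quantile regression networks: penalizes
    under-prediction with weight `quantile` and over-prediction with 1-quantile."""
    err = target - pred
    return torch.mean(torch.maximum(quantile * err, (quantile - 1) * err))

# e.g. one output head per quantile of the failure-onset signal
pred = torch.randn(16)
target = torch.randn(16)
loss = sum(pinball_loss(pred, target, q) for q in (0.1, 0.5, 0.9))
```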
Self-driving cars require extensive testing, which can be costly in terms of time. To optimize this process, simple and straightforward tests should be excluded, focusing on challenging tests instead. This study addresses the test selection problem for lane-keeping systems for self-driving cars. Road segment features, such as angles and lengths, were extracted and treated as sequences, enabling classification of the test cases as "safe" or "unsafe" using a long short-term memory (LSTM) model. The proposed model is compared against machine learning-based test selectors. Results demonstrated that the LSTM-based method outperformed machine learning-based methods in accuracy and precision metrics while exhibiting comparable performance in recall and F1 scores. This work introduces a novel deep learning-based approach to the road classification problem, providing an effective solution for self-driving car test selection using a simulation environment.
https://arxiv.org/abs/2501.03881
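The setup reduces to binary sequence classification; a minimal PyTorch sketch with hypothetical per-segment features (angle, length) and dimensions:

```python
import torch
import torch.nn as nn

class RoadSafetyLSTM(nn.Module):
    """Sketch: classify a road as 'safe'/'unsafe' from its segment sequence,
    where each step carries hypothetical features such as (angle, length)."""
    def __init__(self, feat_dim=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, segments):               # (batch, num_segments, feat_dim)
        _, (h, _) = self.lstm(segments)        # final hidden state summarizes the road
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)  # P(unsafe)

p_unsafe = RoadSafetyLSTM()(torch.randn(4, 30, 2))
```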
In this paper, we present a novel synergistic framework for learning shape estimation and a shape-aware whole-body control policy for tendon-driven continuum robots. Our approach leverages the interaction between two Augmented Neural Ordinary Differential Equations (ANODEs) -- the Shape-NODE and Control-NODE -- to achieve continuous shape estimation and shape-aware control. The Shape-NODE integrates prior knowledge from Cosserat rod theory, allowing it to adapt and account for model mismatches, while the Control-NODE uses this shape information to optimize a whole-body control policy, trained in a Model Predictive Control (MPC) fashion. This unified framework effectively overcomes limitations of existing data-driven methods, such as poor shape awareness and challenges in capturing complex nonlinear dynamics. Extensive evaluations in both simulation and real-world environments demonstrate the framework's robust performance in shape estimation, trajectory tracking, and obstacle avoidance. The proposed method consistently outperforms state-of-the-art end-to-end, Neural-ODE, and Recurrent Neural Network (RNN) models, particularly in terms of tracking accuracy and generalization capabilities.
https://arxiv.org/abs/2501.03859
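An Augmented Neural ODE can be sketched compactly: pad the state with extra dimensions so learned trajectories need not cross, then integrate the learned dynamics. The fixed-step Euler integrator below keeps the example self-contained; the paper presumably uses a proper ODE solver, and nothing here reflects its Cosserat-rod prior or MPC training:

```python
import torch
import torch.nn as nn

class ANODE(nn.Module):
    """Toy Augmented Neural ODE: the state is padded with extra dimensions
    (the 'augmentation'), and dz/dt = f(z) is integrated with fixed-step Euler."""
    def __init__(self, dim, aug_dim=4, hidden=64):
        super().__init__()
        self.aug_dim = aug_dim
        self.f = nn.Sequential(nn.Linear(dim + aug_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim + aug_dim))

    def forward(self, x0, steps=20, T=1.0):
        # Augment: append zeros so trajectories have room to avoid crossing.
        z = torch.cat([x0, x0.new_zeros(x0.shape[0], self.aug_dim)], dim=-1)
        dt = T / steps
        for _ in range(steps):                 # Euler step: z <- z + dt * f(z)
            z = z + dt * self.f(z)
        return z[:, :x0.shape[1]]              # drop the augmented channels

z1 = ANODE(dim=8)(torch.randn(4, 8))
```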