The advancements in computer vision and image processing techniques have led to the emergence of new applications in domains such as visual surveillance, targeted advertising, content-based search, and human-computer interaction. Among the various computer vision techniques, face analysis in particular has gained considerable attention. Previous studies have explored facial feature processing for a variety of tasks, including age and gender classification. However, despite this work, age and gender classification of in-the-wild human faces is still far from achieving the accuracy required for real-world applications. This paper therefore attempts to bridge this gap by proposing a hybrid model that combines self-attention and BiLSTM approaches for age and gender classification. The proposed model's performance is compared with several state-of-the-art models. It improves on state-of-the-art implementations by approximately 10% for age classification and 6% for gender classification. The proposed model thus achieves superior performance and generalizes better, and can therefore be applied as a core classification component in various image processing and computer vision problems.
https://arxiv.org/abs/2403.12483
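The age and gender abstract above combines self-attention with a BiLSTM but gives no architectural details. As an illustration of the self-attention component only, the following is a minimal sketch of scaled dot-product self-attention over a sequence of feature vectors (for example, per-region CNN features). It assumes identity query, key, and value projections to stay short; a real layer would learn these projections, and the BiLSTM stage is not shown. This is not the authors' model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention over a list of d-dimensional vectors.

    Assumption for brevity: queries, keys, and values are the input vectors
    themselves (identity projections) instead of learned linear maps.
    """
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in seq:
        # Similarity of this position to every position, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out
```

Each output vector is a convex combination of the inputs, weighted toward the positions most similar to the query position.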
3D sensors, also known as RGB-D sensors, produce depth images in which each pixel encodes the distance from the camera to the objects in the scene, using principles such as structured light or time-of-flight. Advances in artificial vision have led to affordable 3D cameras capable of detecting objects in real time even when the objects are not moving, and these cameras capture more information than 2D cameras. They can identify objects of varying colors and reflectivities and are less affected by lighting changes. The prototype described here uses RGB-D sensors for bidirectional people counting in venues, supporting security and surveillance in spaces such as stadiums or airports. It determines occupancy in real time and checks it against the maximum capacity, which is crucial during emergencies. The system includes a RealSense D415 depth camera, a mini-computer that runs object detection algorithms to count people, and a 2D camera for identity verification. It also supports statistical analysis and is implemented in C++, Python, and PHP, with OpenCV used for image processing, demonstrating a comprehensive approach to monitoring venue occupancy.
https://arxiv.org/abs/2403.12310
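The people-counting abstract does not specify how entries and exits are separated. A common approach, assumed here, is to track each person's centroid and count crossings of a virtual line; the sketch below shows that logic together with the capacity check. The tracker that supplies track IDs and the `line_y` position are hypothetical inputs, not parts of the described system.

```python
from dataclasses import dataclass, field

@dataclass
class OccupancyCounter:
    line_y: float      # hypothetical virtual counting line, in image rows
    max_capacity: int  # venue limit to check occupancy against
    entries: int = 0
    exits: int = 0
    _last_y: dict = field(default_factory=dict, repr=False)  # track id -> last centroid y

    def update(self, track_id, y):
        """Record a new centroid position for one tracked person."""
        prev = self._last_y.get(track_id)
        if prev is not None:
            if prev < self.line_y <= y:
                self.entries += 1  # crossed the line moving down: entry
            elif prev >= self.line_y > y:
                self.exits += 1    # crossed the line moving up: exit
        self._last_y[track_id] = y

    @property
    def occupancy(self):
        return self.entries - self.exits

    def over_capacity(self):
        return self.occupancy > self.max_capacity
```

With a top-down depth camera, the tracked coordinate would typically be a head position found in the depth image; that choice is an assumption rather than something the abstract states.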
The robustness of unmanned aerial vehicle (UAV) tracking is crucial in many tasks, such as surveillance and robotics. Despite its importance, little attention has been paid to the performance of UAV trackers under common corruptions, owing to the lack of a dedicated platform. To address this, we propose UAV-C, a large-scale benchmark for assessing the robustness of UAV trackers under common corruptions. Specifically, UAV-C is built on two popular UAV datasets by introducing 18 common corruptions from 4 representative categories (adversarial, sensor, blur, and composite corruptions) at different severity levels. In total, UAV-C contains more than 10K sequences. To understand the robustness of existing UAV trackers against corruptions, we extensively evaluate 12 representative algorithms on UAV-C. Our study reveals several key findings: 1) current trackers are vulnerable to corruptions, indicating that more attention is needed to enhance the robustness of UAV trackers; 2) composite corruptions, in which several corruptions occur together, degrade trackers more severely; and 3) while each tracker has a unique performance profile, some trackers may be more sensitive to specific corruptions. We hope that UAV-C, together with our comprehensive analysis, will serve as a valuable resource for advancing the robustness of UAV tracking against corruptions. UAV-C will be available at this https URL.
https://arxiv.org/abs/2403.11424
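UAV-C's exact corruption definitions are not given in the abstract. The sketch below illustrates the general idea of severity-parameterized corruptions on a grayscale frame stored as a list of rows: a sensor-style salt-and-pepper noise, a blur-style mean filter, and a composite that applies both. The severity-to-parameter mappings are hypothetical choices, not the benchmark's settings.

```python
import random

def salt_and_pepper(img, severity, seed=0):
    """Sensor-style corruption: set a severity-dependent fraction of pixels to 0 or 255."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be in 1..5")
    rng = random.Random(seed)
    frac = [0.01, 0.02, 0.05, 0.1, 0.2][severity - 1]  # hypothetical levels
    out = [row[:] for row in img]  # copy so the input frame is untouched
    for r in range(len(out)):
        for c in range(len(out[0])):
            if rng.random() < frac:
                out[r][c] = rng.choice((0, 255))
    return out

def box_blur(img, severity):
    """Blur-style corruption: mean filter whose radius equals the severity level."""
    h, w = len(img), len(img[0])
    k = severity
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Average over the window, clipped at the image borders.
            vals = [img[rr][cc]
                    for rr in range(max(0, r - k), min(h, r + k + 1))
                    for cc in range(max(0, c - k), min(w, c + k + 1))]
            out[r][c] = sum(vals) // len(vals)
    return out

def composite(img, severity, seed=0):
    """Composite corruption: blur followed by sensor noise at the same level."""
    return salt_and_pepper(box_blur(img, severity), severity, seed)
```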
Swarm robots, inspired by the way insects behave collectively to achieve a common goal, have become a major research area, with applications in search and rescue, area exploration, and surveillance. In this paper, we present a swarm of robots that do not require individual extrinsic sensors to sense the environment, but instead rely on a single central camera to locate and map the swarm. The robots can be built easily from readily available components with a 3D-printed main chassis, making the system low-cost, low-maintenance, and easy to replicate. We describe Zutu's hardware and software architecture, the algorithms that map the robots to the real world, and experiments conducted with four of our robots. Finally, we discuss possible applications of our system in research, education, and industry.
https://arxiv.org/abs/2403.11252
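The Zutu abstract mentions algorithms that map robots seen by the central camera to the real world without detailing them. For a fixed camera looking at a flat floor, one standard option, assumed here, is a planar homography: a 3x3 matrix that maps pixel coordinates to floor coordinates. The sketch only applies such a matrix; estimating it (for example from four or more known floor markers, as OpenCV's `cv2.findHomography` does) is outside the sketch.

```python
def apply_homography(H, u, v):
    """Map image pixel (u, v) to floor-plane coordinates (x, y) using a 3x3 homography H.

    Assumes the floor is planar and H has already been estimated for this camera.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to get Cartesian floor coordinates.
    return x / w, y / w
```

For instance, with the hypothetical matrix `[[0.01, 0, -1], [0, 0.01, -0.5], [0, 0, 1]]` (1 cm per pixel plus an origin offset), pixel (100, 50) maps to approximately (0.0, 0.0).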
Reinforcement learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches in tasks ranging from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in partially observable multi-agent adversarial pursuit-evasion games (PEGs). These pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture themselves. We propose a hierarchical architecture in which a high-level diffusion model plans global paths that respond to environment data, while a low-level RL algorithm reasons about evasive versus global path-following behavior. By using the diffusion model to guide the RL algorithm toward more efficient exploration, our approach outperforms baselines by 51.2% and improves explainability and predictability.
https://arxiv.org/abs/2403.10794
Cloth-changing person re-identification aims to retrieve and identify specific pedestrians using cloth-irrelevant features in scenarios where people change their clothes. However, in real-world scenarios, pedestrian images captured by surveillance cameras usually contain occlusions. Occlusion reduces the discriminative cloth-irrelevant features available, which significantly degrades the performance of existing cloth-changing re-identification methods. We define cloth-changing person re-identification in occlusion scenarios as occluded cloth-changing person re-identification (Occ-CC-ReID) and, to the best of our knowledge, are the first to propose it as a new task. We constructed two occluded cloth-changing person re-identification datasets for different occlusion scenarios: Occluded-PRCC and Occluded-LTCC. The datasets can be obtained from the following link: this https URL Re-Identification.
https://arxiv.org/abs/2403.08557
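Re-identification datasets such as Occluded-PRCC and Occluded-LTCC are usually evaluated by ranking the gallery for each query and reporting rank-1 accuracy and mean average precision (mAP). The abstract does not state the protocol, so the sketch below is a generic version under that assumption, using cosine similarity between feature vectors from some hypothetical re-ID backbone. Cloth-changing protocols often also filter the gallery (for example, removing same-camera or same-clothes matches), which is omitted here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def evaluate_reid(queries, gallery):
    """queries, gallery: lists of (person_id, feature_vector).

    Returns (rank-1 accuracy, mAP) over queries that have at least one true match.
    """
    rank1_hits, ap_sum, valid = 0, 0.0, 0
    for qid, qfeat in queries:
        # Rank the gallery by decreasing similarity to the query.
        ranked = sorted(gallery, key=lambda g: cosine(qfeat, g[1]), reverse=True)
        matches = [gid == qid for gid, _ in ranked]
        if not any(matches):
            continue  # queries without a true match are skipped
        valid += 1
        rank1_hits += int(matches[0])
        hits, precisions = 0, []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / k)  # precision at each correct match
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        raise ValueError("no query has a matching gallery identity")
    return rank1_hits / valid, ap_sum / valid
```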
Cloth-Changing Person Re-Identification (CC-ReID) aims to accurately identify a target person in more realistic surveillance scenarios, in which pedestrians change their clothing. Despite great progress, the limited number of cloth-changing training samples in existing CC-ReID datasets still prevents models from adequately learning cloth-irrelevant features. In addition, without explicit supervision to keep the model focused on cloth-irrelevant regions, existing methods are still disrupted by clothing variations. To solve these issues, we propose an Identity-aware Dual-constraint Network (IDNet) for the CC-ReID task. Specifically, to help the model extract cloth-irrelevant clues, we propose Clothes Diversity Augmentation (CDA), which generates more realistic cloth-changing samples by enriching clothing colors while preserving texture. We also design a Multi-scale Constraint Block (MCB) that extracts fine-grained identity-related features and effectively transfers cloth-irrelevant knowledge. Moreover, we present a Counterfactual-guided Attention Module (CAM) that learns cloth-irrelevant features along the channel and spatial dimensions and uses counterfactual intervention to supervise the attention map so that it highlights identity-related regions. Finally, a Semantic Alignment Constraint (SAC) is designed to facilitate high-level semantic feature interaction. Comprehensive experiments on four CC-ReID datasets indicate that our method outperforms prior state-of-the-art approaches.
https://arxiv.org/abs/2403.08270
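The abstract describes CDA only as enriching clothing color while preserving texture. One simple way to realize that idea, assumed here and not taken from the paper, is to change only the hue of pixels inside a clothing mask (for example, one produced by a human-parsing model) while keeping each pixel's lightness and saturation, so shading and fabric texture survive. The sketch uses Python's standard `colorsys` module.

```python
import colorsys

def recolor_clothes(pixels, mask, new_hue):
    """Shift clothing pixels to a new hue while preserving their texture.

    pixels: rows of (r, g, b) tuples with values in 0..255.
    mask: rows of booleans, True where a pixel belongs to clothing (assumed given).
    new_hue: target hue in [0, 1), e.g. 1/3 for green.
    """
    out = []
    for prow, mrow in zip(pixels, mask):
        new_row = []
        for (r, g, b), is_cloth in zip(prow, mrow):
            if is_cloth:
                # Keep lightness and saturation; only the hue changes.
                _, light, sat = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
                r2, g2, b2 = colorsys.hls_to_rgb(new_hue, light, sat)
                new_row.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
            else:
                new_row.append((r, g, b))
        out.append(new_row)
    return out
```

Achromatic pixels (black, white, or gray) have no hue, so this particular rule leaves them unchanged; a fuller augmentation would need a separate rule for such garments.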
To support road traffic flow monitoring and public safety management, video surveillance cameras are widely deployed along urban roads. However, the information captured by each camera is siloed, which makes it difficult to use effectively. Vehicle re-identification, the task of finding a vehicle seen by one camera in the views of other cameras, makes it possible to correlate the information captured by multiple cameras. While license plate recognition plays an important role in some applications, there are scenarios in which re-identification methods based on vehicle appearance are more suitable. The main challenge is that vehicle appearance exhibits high inter-class similarity and large intra-class variation, so it is difficult to accurately distinguish different vehicles from appearance information alone. Extra information, such as spatio-temporal cues, is therefore often needed. In the bridge scenario, however, another cue is available: the relative positions of vehicles rarely change as they pass between two adjacent cameras. In this paper, we present a vehicle re-identification method based on flock similarity, which improves re-identification accuracy by using information about the vehicles adjacent to the target vehicle. When the relative positions of the vehicles remain unchanged and the flock size is appropriate, our experiments show an average relative improvement of 204% on the VeRi dataset. We then discuss how the magnitude of changes in the vehicles' relative positions between the two cameras affects the results, and present two metrics that quantify this change and establish a connection between them. Although the assumption of stable relative positions stems from the bridge scenario, it often holds in other scenarios as well, owing to driving-safety constraints and camera placement.
https://arxiv.org/abs/2403.07752
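The flock-similarity formula is not given in the abstract. The sketch below shows one plausible reading, stated as an assumption: the match score between a query vehicle and a candidate mixes the appearance similarity of the two target vehicles with the average similarity of neighbors found at the same relative positions (keys such as `"front"` or `"left"` are hypothetical). The weight `alpha` is likewise a hypothetical parameter.

```python
import math

def cosine(a, b):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def flock_score(query, candidate, alpha=0.5):
    """Hypothetical flock-based match score.

    query, candidate: dicts with 'feat' (appearance feature of the target vehicle)
    and 'neighbors' (relative-position key -> neighbor appearance feature).
    """
    target_sim = cosine(query["feat"], candidate["feat"])
    shared = query["neighbors"].keys() & candidate["neighbors"].keys()
    if not shared:
        return target_sim  # no comparable neighbors: fall back to appearance alone
    flock_sim = sum(cosine(query["neighbors"][k], candidate["neighbors"][k])
                    for k in shared) / len(shared)
    return alpha * target_sim + (1 - alpha) * flock_sim
```

When two candidates look identical, the neighbor term breaks the tie, which is exactly the high inter-class similarity case the abstract targets.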
Object counting is a challenging task with broad applications in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a threefold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop TFCounter, a novel training-free, class-agnostic object counter that is prompt-context-aware through a cascade of the essential elements of large-scale foundation models. This approach employs an iterative counting framework with a dual prompt system to recognize a wider range of objects that vary in shape, appearance, and size. It also introduces a context-aware similarity module that incorporates background context to improve accuracy in cluttered scenes. To demonstrate cross-domain generalizability, we collect a new counting dataset, BIKE-1000, comprising 1,000 exclusive images of shared bicycles from Meituan. Extensive experiments on the FSC-147, CARPK, and BIKE-1000 datasets demonstrate that TFCounter outperforms leading training-free methods and achieves results competitive with trained counterparts.
https://arxiv.org/abs/2405.02301
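TFCounter's context-aware similarity module is described only as incorporating background context. As a hedged illustration of that idea, the sketch below counts candidate regions (for example, masks proposed by a segmentation foundation model, represented here by feature vectors) only if they are more similar to the exemplar prompts than to background features. The function name, the `margin` parameter, and the decision rule are assumptions, not the paper's module.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def count_objects(candidates, exemplars, background, margin=0.0):
    """Count candidate regions that match the exemplars better than the background."""
    count = 0
    for feat in candidates:
        obj_sim = max(cosine(feat, e) for e in exemplars)  # best match to a prompt
        bg_sim = max(cosine(feat, b) for b in background)  # best match to background
        if obj_sim - bg_sim > margin:
            count += 1
    return count
```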
Human action recognition in videos is a critical task with significant implications for numerous applications, including surveillance, sports analytics, and healthcare. The challenge lies in creating models that are both accurate and efficient enough for practical use. This study conducts an in-depth analysis of various deep learning models to address this challenge. Using a subset of the UCF101 Videos dataset, we focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Two-Stream ConvNets. The results show that while CNNs effectively capture spatial features and RNNs encode temporal sequences, Two-Stream ConvNets achieve superior performance by integrating the spatial and temporal dimensions. These findings are based on accuracy, precision, recall, and F1-score. The results underscore the potential of composite models for robust human action recognition and suggest avenues for future research on optimizing these models for real-world deployment.
https://arxiv.org/abs/2403.06810
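The action-recognition study reports accuracy, precision, recall, and F1-score. The sketch below computes them from lists of true and predicted class labels; macro averaging across classes is an assumption, since the abstract does not say which averaging was used.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```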
UAVs are used to monitor changes in forest environments because they are lightweight and provide a wide variety of surveillance data. However, the data they capture does not offer the detailed scene understanding needed to assess the degree of deforestation. Deep learning algorithms must be trained on large amounts of data to produce accurate interpretations, but annotated ground-truth forest imagery is not available. To solve this problem, we introduce a new large aerial dataset for forest inspection that contains both real-world and virtual recordings of natural environments, captured under different illumination conditions and at various altitudes and recording angles, with densely annotated semantic segmentation labels and depth maps. We evaluate two multi-scale neural networks (HRNet and PointFlow) on the semantic segmentation task, studying the impact of the various acquisition conditions and the effectiveness of transfer learning from virtual to real data. Our results show that the best results are obtained by training on a dataset containing a wide variety of scenarios, rather than by splitting the data into specific categories. We also develop a framework to assess the degree of deforestation of an area.
https://arxiv.org/abs/2403.06621
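The abstract mentions a framework for assessing the degree of deforestation without defining it. A minimal sketch, assuming a segmentation map with hypothetical class IDs for standing trees and cleared ground, is the cleared fraction of the forested area. Pixel counts are only a proxy for ground area, since the dataset varies altitude and camera angle.

```python
from collections import Counter

# Hypothetical class IDs in a predicted segmentation map.
OTHER, TREE, DEFORESTED = 0, 1, 2

def deforestation_degree(label_map):
    """Fraction of forest-related pixels labeled as deforested (0.0 if none)."""
    counts = Counter(v for row in label_map for v in row)
    forest_area = counts[TREE] + counts[DEFORESTED]
    if forest_area == 0:
        return 0.0
    return counts[DEFORESTED] / forest_area
```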
Anomalous behavior detection is a challenging research area within computer vision. Progress in this area enables automated detection of dangerous behavior from surveillance camera feeds. One dangerous behavior that is often overlooked in other research is throwing in traffic flow, and detecting it is one of the unique requirements of our Smart City project for enhancing public safety. This paper proposes a deep learning solution for detecting throwing actions in surveillance videos. At present, no datasets of throwing actions are publicly available. To address this use case, we first create the novel public 'Throwing Action' dataset, consisting of 271 videos of throwing actions performed by traffic participants, such as pedestrians, bicyclists, and car drivers, and 130 normal videos without throwing actions. Second, we compare three feature extractors for our anomaly detection method on the UCF-Crime and Throwing-Action datasets: the Convolutional 3D (C3D) network, the Inflated 3D ConvNet (I3D), and the Multi-Fiber Network (MFNet). Finally, we improve the anomaly detection algorithm by using the Adam optimizer instead of Adadelta and by proposing a mean normal loss function that covers the many normal situations in traffic; both changes improve anomaly detection performance. In addition, the proposed mean normal loss lowers the false alarm rate on the combined dataset. The method reaches an area under the ROC curve (AUC) of 86.10 on the Throwing-Action dataset and 80.13 on the combined dataset.
https://arxiv.org/abs/2403.06552
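The throwing-action results are reported as area under the ROC curve (AUC), apparently on a 0 to 100 scale. The sketch below computes AUC from anomaly scores and binary labels using the equivalent pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half. It is quadratic in the number of samples and meant for illustration only.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for anomalous samples, 0 for normal ones.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    # A win is a positive ranked above a negative; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Multiplying the result by 100 gives numbers on the same scale as those in the abstract.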
Background: Immune checkpoint inhibitors (ICIs) have revolutionized cancer treatment but can result in severe immune-related adverse events (IrAEs). Monitoring IrAEs on a large scale is essential for personalized risk profiling and for supporting treatment decisions. Methods: In this study, we analyzed clinical notes from patients who received ICIs at the Tel Aviv Sourasky Medical Center. Using a natural language processing (NLP) pipeline, we systematically identified seven common or severe IrAEs. We examined corticosteroid use and treatment discontinuation rates following IrAEs, and constructed survival curves to visualize the occurrence of adverse events during treatment. Results: Our analysis encompassed 108,280 clinical notes associated with 1,635 patients who had undergone ICI therapy. The detected incidence of IrAEs was consistent with previous reports and varied substantially across different ICIs. Corticosteroid treatment varied with the specific IrAE, ranging from 17.3% for thyroiditis to 57.4% for myocarditis. Our algorithm identified IrAEs with high accuracy, achieving an area under the curve (AUC) of 0.89 at the level of individual suspected notes and, at the patient level, F1 scores of 0.87 or higher for five of the seven IrAEs examined. Conclusions: This study presents a novel, large-scale approach to monitoring IrAEs using deep neural networks. Our method provides accurate results and improves understanding of the adverse consequences experienced by ICI-treated patients. Moreover, it could be used to monitor other medications, enabling comprehensive post-marketing surveillance to identify susceptible populations and establish personalized drug safety profiles.
https://arxiv.org/abs/2403.09708
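The IrAE study reports note-level AUC and patient-level F1 scores but does not state how note predictions are rolled up to patients. A simple rule, assumed here, flags a patient for a given IrAE if any of their notes reaches a probability threshold. The note probabilities would come from the study's NLP model, which is not reproduced; the threshold of 0.5 is hypothetical.

```python
from collections import defaultdict

def patient_level_flags(note_scores, threshold=0.5):
    """Aggregate note-level probabilities into patient-level flags.

    note_scores: iterable of (patient_id, probability that the note reports the IrAE).
    A patient is flagged if their highest-scoring note reaches the threshold.
    """
    best = defaultdict(float)  # patient_id -> highest note probability seen
    for pid, prob in note_scores:
        best[pid] = max(best[pid], prob)
    return {pid: score >= threshold for pid, score in best.items()}
```

Taking the maximum favors recall: a single confident note is enough to flag a patient, which suits screening but may raise false positives for patients with many notes.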
Vision is an important metaphor in ethical and political questions of knowledge. The feminist philosopher Donna Haraway points out the "perverse" nature of an intrusive, alienating, all-seeing vision (to which we might cry out "stop looking at me!"), but also encourages us to embrace the embodied nature of sight and its promise of genuinely situated knowledge. Current machine vision technologies, such as surveillance cameras, drones (for war or recreation), and iPhone cameras, are usually construed as instances of the former rather than the latter, and for good reasons. However, without in any way attempting to diminish the real suffering these technologies have brought about in the world, I make the case for understanding computer vision technologies as material instances of embodied seeing and situated knowing. Furthermore, borrowing from Iris Murdoch's concept of moral vision, I suggest that these technologies direct our labor toward self-reflection in ethically significant ways. My approach draws on paradigms in computer vision research, phenomenology, and feminist epistemology. Ultimately, this essay argues for shifting philosophical attention from merely criticizing technologies of vision as ethically deficient toward embracing them as complex objects of methodological and epistemological importance.
https://arxiv.org/abs/2403.05805
Radio advertising remains an integral part of modern marketing strategies, with undeniable appeal and potential for targeted reach. However, the dynamic nature of radio airtime and the growing number of radio spots call for an efficient system for monitoring advertisement broadcasts. This study investigates RadIA, a novel automated radio advertisement detection technique that combines advanced speech recognition and text classification algorithms. RadIA's approach surpasses traditional methods by eliminating the need for prior knowledge of the broadcast content. This makes it possible to detect impromptu and newly introduced advertisements, providing a comprehensive solution for advertisement detection in radio broadcasting. Experimental results show that the resulting model, trained on carefully segmented and tagged text data, achieves an F1-macro score of 87.76 against a theoretical maximum of 89.33. The paper also provides insights into the choice of hyperparameters and their impact on the model's performance. The study demonstrates the system's potential for ensuring compliance with advertising broadcast contracts and for competitive surveillance. This research could fundamentally change how radio advertising is monitored and open new opportunities for marketing optimization.
https://arxiv.org/abs/2403.03538
The emergence of tethered drones represents a major advancement in unmanned aerial vehicles (UAVs), addressing key limitations of traditional drones. This article explores the potential of tethered drones, with a particular focus on their ability to overcome the battery-life constraints and data latency commonly experienced by battery-operated drones. Because they are connected to a ground station by a tether, autonomous tethered drones receive a continuous power supply and have a secure, direct data transmission link, enabling prolonged operation and real-time data transfer. These attributes significantly enhance the effectiveness and dependability of drone missions in scenarios that require extended surveillance, continuous monitoring, and immediate data processing. By examining the advances, operational benefits, and potential future developments of tethered drones, this article shows their growing significance across various sectors and their pivotal role in pushing the boundaries of current UAV capabilities. Tethered drone technology not only addresses existing obstacles but also paves the way for new innovations in the UAV industry.
https://arxiv.org/abs/2403.07922
This paper examines the evolving landscape of machine learning (ML) and its profound impact across various sectors, with a special focus on the emerging field of privacy-preserving machine learning (PPML). As ML applications become increasingly integral to industries such as telecommunications, financial technology, and surveillance, they raise significant privacy concerns that necessitate the development of PPML strategies. The paper highlights the unique challenges of safeguarding privacy within ML frameworks, which stem from the diverse capabilities of potential adversaries, including their ability to infer sensitive information from model outputs or training data. We examine the spectrum of threat models that characterize adversarial intentions, ranging from membership and attribute inference to data reconstruction. The paper emphasizes the importance of maintaining the confidentiality and integrity of training data, and outlines current research efforts that refine training data to minimize privacy-sensitive information and enhance data processing techniques to uphold privacy. Through a comprehensive analysis of privacy leakage risks and countermeasures in both centralized and collaborative learning settings, this paper aims to provide a thorough understanding of effective strategies for protecting ML training data against privacy intrusions. It explores the balance between data privacy and model utility, shedding light on privacy-preserving techniques that leverage cryptographic methods, differential privacy, and trusted execution environments. The discussion extends to the application of these techniques in sensitive domains, underscoring the critical role of PPML in ensuring the privacy and security of ML systems.
https://arxiv.org/abs/2404.16847
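Among the techniques the PPML survey covers is differential privacy. A classic building block is the Laplace mechanism: to release a numeric query with sensitivity Δ under ε-differential privacy, add noise drawn from a Laplace distribution with scale Δ/ε. The sketch below samples that noise as the difference of two exponential variables. It is a textbook illustration rather than anything from the paper, and naive floating-point sampling like this is known to leak information, so production systems should use a vetted differential privacy library.

```python
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with Laplace noise of scale sensitivity / epsilon.

    Illustrative only: naive floating-point noise is not safe for production use.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise
```

For a counting query (how many records satisfy a condition), the sensitivity is 1, so `laplace_mechanism(count, 1, 0.5)` adds noise with scale 2; a smaller ε means stronger privacy and noisier answers.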
In response to significant challenges facing the retail sector, including inefficient queue management, poor demand forecasting, and ineffective marketing, this paper introduces an approach based on state-of-the-art machine learning technologies. We aim to create an advanced smart retail analytics system (SRAS) that leverages these technologies to enhance retail efficiency and customer engagement. To improve customer tracking, we propose a new hybrid architecture that integrates several predictive models. In the first stage of this architecture, we fine-tuned YOLOv8 with a diverse set of parameters, achieving strong results across various performance metrics. Fine-tuning used actual surveillance footage from retail environments to ensure practical applicability. In the second stage, we integrated two object-tracking models, BoT-SORT and ByteTrack, with the detections produced by YOLOv8. This integration is crucial for tracing customer paths within stores, which enables accurate visitor counts and heat maps. These insights are invaluable for understanding consumer behavior and improving store operations. To optimize inventory management, we evaluated various predictive models, tuning them and comparing their performance on complex retail data patterns. The GRU model, which can capture long-range temporal dependencies in time-series data, consistently outperformed other models such as linear regression, with improvements of 2.873% in R² score and 29.31% in MAPE.
https://arxiv.org/abs/2405.00023
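The SRAS abstract above says that tracking customer paths yields visitor counts and heat maps. A minimal sketch of that post-processing step follows. It assumes the tracker output has already been flattened into hypothetical `(frame_index, track_id, x1, y1, x2, y2)` tuples; the function names, the 50-pixel grid cell, and the use of a box's bottom-centre as an approximate foot position are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Hypothetical tracker output row: (frame_index, track_id, x1, y1, x2, y2) in pixels.
# Assumption: a tracker such as ByteTrack or BoT-SORT has already assigned track_id.
Detection = tuple[int, int, float, float, float, float]


def visitor_count(tracks: list[Detection]) -> int:
    """Count distinct track IDs as distinct visitors (assumes no ID switches or re-entries)."""
    return len({row[1] for row in tracks})


def heat_map(tracks: list[Detection], frame_w: int, frame_h: int, cell: int = 50) -> np.ndarray:
    """Accumulate each box's bottom-centre point (a rough foot position) into a coarse grid."""
    rows = -(-frame_h // cell)  # ceiling division so partial cells at the edge are kept
    cols = -(-frame_w // cell)
    grid = np.zeros((rows, cols), dtype=np.int64)
    for _, _, x1, _, x2, y2 in tracks:
        cx = (x1 + x2) / 2
        r = min(max(int(y2 // cell), 0), rows - 1)  # clamp points that fall outside the frame
        c = min(max(int(cx // cell), 0), cols - 1)
        grid[r, c] += 1
    return grid


tracks = [
    (0, 1, 10, 20, 60, 120),    # visitor 1, frame 0
    (1, 1, 15, 25, 65, 125),    # visitor 1, frame 1
    (0, 2, 200, 50, 250, 180),  # visitor 2, frame 0
]
print(visitor_count(tracks))             # → 2
print(heat_map(tracks, 320, 240).sum())  # → 3
```

Counting distinct IDs overcounts whenever the tracker loses a person and assigns a new ID, so a real deployment would likely merge fragmented tracks before counting.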
The growing integration of large language models (LLMs) into social operations amplifies their influence on decisions in crucial areas such as economics, law, education, and healthcare, raising public concern about the safety and reliability of these models with respect to discrimination. However, prior discrimination-measurement frameworks assess only the average discriminatory behavior of LLMs, which often proves inadequate because it overlooks an additional source of discrimination: the variation of LLMs' predictions across diverse contexts. In this work, we present the Prejudice-Caprice Framework (PCF), which comprehensively measures discrimination in LLMs by considering both their consistently biased preferences and the variation of those preferences across diverse contexts. Specifically, we mathematically decompose the aggregated contextualized discrimination risk of LLMs into prejudice risk, originating from the LLMs' persistent prejudice, and caprice risk, stemming from their generation inconsistency. In addition, we use a data-mining approach to gather preference-detecting probes from sentence skeletons that contain no attribute indications, in order to approximate the contexts in which LLMs are applied. Although PCF was initially designed to assess discrimination in LLMs, it supports comprehensive and flexible measurement of any inductive bias, including knowledge as well as prejudice, in models of various modalities. We apply the framework to 12 common LLMs and obtain several notable findings: i) modern LLMs show significant pro-male stereotypes; ii) the discrimination LLMs exhibit correlates with several social and economic factors; iii) prejudice risk dominates the overall discrimination risk and follows a normal distribution; and iv) caprice risk contributes minimally to the overall risk but follows a fat-tailed distribution, suggesting that it is a wild risk requiring enhanced surveillance.
https://arxiv.org/abs/2402.15481
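The PCF abstract above splits an aggregated discrimination risk into a persistent part (prejudice) and an inconsistency part (caprice). A familiar identity with the same shape is the bias-variance decomposition of a mean squared deviation: for per-context preferences p_c with mean p̄ and a neutral value of 0.5, mean_c (p_c − 0.5)² = (p̄ − 0.5)² + mean_c (p_c − p̄)². The sketch below assumes that squared-error form, a neutral preference of 0.5, and hypothetical per-context preference probabilities; the paper's exact definitions of prejudice risk and caprice risk may differ.

```python
import numpy as np


def decompose_risk(preferences, neutral: float = 0.5):
    """Split the mean squared deviation from a neutral preference into two parts.

    preferences: hypothetical per-context probabilities that a model picks one attribute
    (e.g. "male") over its counterpart; neutral=0.5 means no preference.
    Returns (total_risk, prejudice_like, caprice_like); assumes a squared-error form.
    """
    p = np.asarray(preferences, dtype=float)
    total = float(np.mean((p - neutral) ** 2))    # aggregated risk over all contexts
    prejudice = float((p.mean() - neutral) ** 2)  # systematic offset shared by every context
    caprice = float(p.var())                      # spread across contexts (population variance, ddof=0)
    return total, prejudice, caprice


total, prejudice, caprice = decompose_risk([0.7, 0.6, 0.8, 0.7])
print(round(total, 4), round(prejudice, 4), round(caprice, 4))  # → 0.045 0.04 0.005
```

In this toy example the persistent offset accounts for most of the total, which mirrors, only qualitatively, the abstract's finding that prejudice risk dominates; the fat-tailed behavior of caprice risk would only show up across many models and contexts.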
AI-based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, more so since the COVID-19 pandemic, for tasks ranging from validating individuals' faces when they buy SIM cards to surveillance of citizens. Extensive biases against marginalized groups have been reported in these systems and have led to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks, but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask-based face occlusion. In this study, we audit four commercial and nine open-source FRSs on the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (14,722 images in total). These experiments simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey on the same task with 85 human participants also yields a low accuracy of 40%. Thus, adding human-in-the-loop moderation to the pipeline does not alleviate these concerns, contrary to what has frequently been hypothesized in the literature. Our large-scale study shows that developers, lawmakers, and users of such services need to rethink the design principles behind FRSs, especially for face re-identification, taking the observed biases into account.
https://arxiv.org/abs/2402.13771
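The FRS audit above compares masked and unmasked images and reports accuracy broken down by demographic group. The sketch below shows one plausible way to score such an audit. It assumes each image has already been mapped to an embedding vector by some face encoder and that a match is declared when cosine similarity reaches a fixed threshold. The 0.5 threshold, the `(group, unmasked_embedding, masked_embedding, same_person)` layout, and the function names are all hypothetical and are not the study's protocol.

```python
from collections import defaultdict

import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def audit_by_group(pairs, threshold: float = 0.5) -> dict:
    """Re-identification accuracy per demographic group.

    pairs: iterable of (group, unmasked_embedding, masked_embedding, same_person),
    a hypothetical layout; the threshold is an assumed operating point.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, unmasked, masked, same_person in pairs:
        predicted_same = cosine(unmasked, masked) >= threshold
        correct[group] += int(predicted_same == same_person)
        total[group] += 1
    return {g: correct[g] / total[g] for g in total}


pairs = [
    ("A", [1.0, 0.0], [1.0, 0.0], True),   # genuine pair, matched
    ("A", [1.0, 0.0], [0.0, 1.0], False),  # impostor pair, correctly rejected
    ("B", [1.0, 0.0], [0.0, 1.0], True),   # genuine pair missed (e.g. mask occlusion)
    ("B", [1.0, 0.0], [1.0, 0.1], True),   # genuine pair, matched
]
acc = audit_by_group(pairs)
print(acc)                                    # → {'A': 1.0, 'B': 0.5}
print(max(acc.values()) - min(acc.values()))  # → 0.5
```

Reporting accuracy per group rather than as one pooled number is what exposes disparities of the kind the study reports; a fuller audit would also separate false matches from false non-matches.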