Paper Reading AI Learner

AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization

2024-10-31 16:46:23
Amir Kazemi, Qurat ul ain Fatima, Volodymyr Kindratenko, Christopher Tessum

Abstract

Image labeling is a critical bottleneck in the development of computer vision technologies, often constraining the potential of machine learning models due to the time-intensive nature of manual annotations. This work introduces a novel approach that leverages outpainting to address the problem of annotated data scarcity by generating artificial contexts and annotations, significantly reducing manual labeling efforts. We apply this technique to a particularly acute challenge in autonomous driving, urban planning, and environmental monitoring: the lack of diverse, eye-level vehicle images in desired classes. Our dataset comprises AI-generated vehicle images obtained by detecting and cropping vehicles from manually selected seed images, which are then outpainted onto larger canvases to simulate varied real-world conditions. The outpainted images include detailed annotations, providing high-quality ground truth data. Advanced outpainting techniques and image quality assessments ensure visual fidelity and contextual relevance. Augmentation with outpainted vehicles improves overall performance metrics by up to 8% and enhances prediction of underrepresented classes by up to 20%. This approach, exemplifying outpainting as a self-annotating paradigm, presents a solution that enhances dataset versatility across multiple domains of machine learning. The code and links to datasets used in this study are available for further research and replication at this https URL.
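The "self-annotating" aspect of the pipeline can be illustrated with a minimal sketch: because the cropped vehicle is placed at a known position on the larger canvas before outpainting, the bounding-box annotation comes for free. The function name and shapes below are hypothetical, and the diffusion outpainting step (which would fill in everything outside the mask) is only indicated in comments; this is not the authors' implementation.

```python
import numpy as np

def place_and_annotate(vehicle, canvas_size, top_left):
    """Paste a cropped vehicle onto a blank canvas and return the canvas,
    an outpainting mask, and the bounding box (x_min, y_min, x_max, y_max).

    The box is exact because we choose the placement ourselves, so no
    manual labeling is needed. In the full pipeline, a diffusion
    outpainting model would then synthesize context in the unmasked region.
    """
    h, w = vehicle.shape[:2]
    H, W = canvas_size
    y, x = top_left
    assert y + h <= H and x + w <= W, "vehicle must fit inside the canvas"
    canvas = np.zeros((H, W, vehicle.shape[2]), dtype=vehicle.dtype)
    canvas[y:y + h, x:x + w] = vehicle
    # Mask marks the pixels to KEEP; its complement is the region the
    # outpainting model would fill with simulated real-world context.
    mask = np.zeros((H, W), dtype=bool)
    mask[y:y + h, x:x + w] = True
    bbox = (x, y, x + w, y + h)  # self-annotated ground truth
    return canvas, mask, bbox

# Example: a 64x48 stand-in "vehicle" crop placed on a 256x256 canvas
vehicle = np.full((48, 64, 3), 255, dtype=np.uint8)
canvas, mask, bbox = place_and_annotate(vehicle, (256, 256), (100, 120))
print(bbox)  # (120, 100, 184, 148)
```

Randomizing `top_left` and the canvas size across samples would vary the vehicle's scale and position, which is one way the outpainted dataset can simulate diverse eye-level scenes while keeping annotations exact.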


URL

https://arxiv.org/abs/2410.24116

PDF

https://arxiv.org/pdf/2410.24116.pdf

