Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is hard to achieve with conventional hardware, making the use of embedded devices such as FPGAs attractive for low-power applications. However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA context, the accuracy of SGM has been improved by More Global Matching (MGM), which also helps tackle the streaking artifacts that afflict SGM. In this paper, we propose a novel, resource-efficient method that is inspired by MGM's techniques for improving depth quality, but which can be implemented to run in real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI and Middlebury), we show that in comparison to other real-time capable stereo approaches, we can achieve a state-of-the-art balance between accuracy, power efficiency and speed, making our approach highly desirable for use in real-time systems with limited power.
https://arxiv.org/abs/1810.12988
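Both SGM and MGM rest on the same scanline cost-aggregation recurrence: along each path direction, a pixel's aggregated cost combines its matching cost with the best transition from the previous pixel, penalizing small disparity changes by P1 and larger jumps by P2. The sketch below shows this standard single-path recurrence in NumPy; it is an illustration of the underlying technique, not the paper's FPGA-oriented variant, and the penalty values and cost volume are assumed for the example.

```python
import numpy as np

def aggregate_path(cost, P1=10, P2=120):
    """Aggregate matching costs along one scanline direction (standard SGM).

    cost: (W, D) array of per-pixel matching costs C(p, d) along one row.
    Returns L: (W, D) aggregated costs following the SGM recurrence:
        L(p, d) = C(p, d) + min(L(p-1, d),
                                L(p-1, d-1) + P1, L(p-1, d+1) + P1,
                                min_k L(p-1, k) + P2) - min_k L(p-1, k)
    """
    W, D = cost.shape
    L = np.empty((W, D), dtype=np.float64)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        prev_min = prev.min()
        same = prev                             # stay at the same disparity
        up = np.r_[prev[1:], np.inf] + P1       # come from d+1 (small change)
        down = np.r_[np.inf, prev[:-1]] + P1    # come from d-1 (small change)
        jump = np.full(D, prev_min + P2)        # any larger disparity jump
        L[x] = cost[x] + np.minimum.reduce([same, up, down, jump]) - prev_min
    return L
```

MGM's key change is to make each path's recurrence depend on more than one previously visited neighbor, which suppresses the per-path streaking that plain SGM exhibits; the single-neighbor version above is the common baseline both methods share.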
Computationally efficient moving object detection and depth estimation from a stereo camera is an extremely useful tool for many computer vision applications, including robotics and autonomous driving. In this paper we show how moving objects can be densely detected while estimating disparity, using an algorithm that reduces the computational complexity and improves the accuracy of stereo matching by exploiting information from previous frames. The main idea behind this approach is that, given the ego-motion estimate and the disparity map of the previous frame, we can construct a prior that reduces the complexity of the current frame's disparity estimation and, as a by-product, detects moving objects in the scene. For each pixel we run a Kalman filter that recursively fuses the disparity prediction with reduced-search-space semi-global matching (SGM) measurements. The proposed algorithm has been implemented and optimized using a streaming single-instruction multiple-data (SIMD) instruction set and multi-threading. Furthermore, in order to estimate the process and measurement noise as reliably as possible, we conduct extensive experiments on the KITTI suite using ground truth obtained from the 3D laser range sensor. For disparity estimation, compared to the OpenCV SGM implementation, the proposed method yields improvements on the KITTI dataset sequences in terms of both speed and accuracy.
https://arxiv.org/abs/1809.07986
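The per-pixel fusion described above is a scalar Kalman filter: the disparity warped from the previous frame serves as the prediction, the reduced-space SGM value as the measurement, and a large innovation flags a potentially moving pixel. A minimal sketch of one such update step follows; the noise values and the moving-object threshold are illustrative assumptions, not the paper's calibrated parameters.

```python
def kalman_fuse(d_pred, var_pred, d_meas, var_meas, q=0.5):
    """One scalar Kalman step fusing a warped disparity prediction with a
    new SGM measurement.

    d_pred, var_pred: disparity predicted from the previous frame via
        ego-motion warping, and its variance.
    d_meas, var_meas: current SGM measurement and its variance.
    q: process noise inflating the prediction's uncertainty (the scene
        may have changed between frames).
    Returns the fused disparity and its posterior variance.
    """
    var_prior = var_pred + q                    # predict: grow uncertainty
    k = var_prior / (var_prior + var_meas)      # Kalman gain
    d_post = d_pred + k * (d_meas - d_pred)     # innovation-weighted update
    var_post = (1.0 - k) * var_prior
    return d_post, var_post
```

With equal prediction and measurement variances the update reduces to an average of the two disparities; a pixel whose innovation `d_meas - d_pred` exceeds some threshold is a candidate for belonging to a moving object.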
In order to improve usability and safety, modern unmanned aerial vehicles (UAVs) are equipped with sensors to monitor the environment, such as laser scanners and cameras. One important aspect of this monitoring process is detecting obstacles in the flight path in order to avoid collisions. Since a large number of consumer UAVs suffer from tight weight and power constraints, our work focuses on obstacle avoidance based on a lightweight stereo camera setup. We use disparity maps, which are computed from the camera images, to locate obstacles and to automatically steer the UAV around them. For disparity map computation we optimize the well-known semi-global matching (SGM) approach for deployment on an embedded FPGA. The disparity maps are then converted into simpler representations, the so-called U-/V-Maps, which are used for obstacle detection. Obstacle avoidance is based on a reactive approach that finds the shortest path around obstacles as soon as they come within a critical distance of the UAV. One of the fundamental goals of our work was the reduction of development costs by closing the gap between application development and hardware optimization. Hence, we aimed at using high-level synthesis (HLS) for porting our algorithms, which are written in C/C++, to the embedded FPGA. We evaluated our implementation of the disparity estimation on the KITTI Stereo 2015 benchmark. The integrity of the overall real-time reactive obstacle avoidance algorithm has been evaluated using hardware-in-the-loop testing in conjunction with two flight simulators.
https://arxiv.org/abs/1807.06271
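The U-/V-map representation mentioned above reduces a dense disparity image to two small histograms: the U-map counts disparity occurrences per image column (vertical obstacles appear as high-count cells), while the V-map counts them per image row (the ground plane maps to a slanted line, obstacles to near-vertical segments). A minimal sketch of this common construction, assuming integer disparities; the paper's exact binning and thresholds may differ.

```python
import numpy as np

def uv_maps(disparity, d_max):
    """Build U- and V-maps from an integer disparity image.

    disparity: (H, W) integer disparity map.
    d_max: maximum valid disparity (defines the histogram range).
    Returns:
        u_map: (d_max+1, W) -- per-column disparity histogram.
        v_map: (H, d_max+1) -- per-row disparity histogram.
    """
    h, w = disparity.shape
    u_map = np.zeros((d_max + 1, w), dtype=np.int32)
    v_map = np.zeros((h, d_max + 1), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            d = int(disparity[y, x])
            if 0 <= d <= d_max:        # skip invalid / out-of-range pixels
                u_map[d, x] += 1
                v_map[y, d] += 1
    return u_map, v_map
```

Obstacle detection then reduces to finding connected high-count regions in these two small maps rather than segmenting the full disparity image, which is what makes the representation attractive on resource-constrained hardware.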
With the technological advancements of aerial imagery and accurate 3D reconstruction of urban environments, more and more attention has been paid to the automated analysis of urban areas. In our work, we examine two important aspects that allow live analysis of building structures in city models given oblique aerial imagery, namely automatic building extraction with convolutional neural networks (CNNs) and selective real-time depth estimation from aerial imagery. We use transfer learning to train the Faster R-CNN method for real-time deep object detection, combining a large ground-based dataset for urban scene understanding with a smaller number of images from an aerial dataset. We achieve an average precision (AP) of about 80% for the task of building extraction on a selected evaluation dataset. Our evaluation covers both dataset-specific learning and transfer learning. Furthermore, we present an algorithm that allows for multi-view depth estimation from aerial imagery in real time. We adopt the semi-global matching (SGM) optimization strategy to preserve sharp edges at object boundaries. In combination with Faster R-CNN, this enables a selective reconstruction of buildings, identified as regions of interest (RoIs), from oblique aerial imagery.
https://arxiv.org/abs/1804.08302
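The selectivity described above amounts to gating the expensive multi-view depth estimation by the detector's output: depth is computed only inside the boxes Faster R-CNN returns. A minimal sketch of that gating step, with a hypothetical `roi_mask` helper and box format assumed for illustration.

```python
import numpy as np

def roi_mask(shape, rois):
    """Boolean mask marking pixels inside any detected RoI, so that the
    expensive depth estimation runs only there.

    shape: (H, W) of the image.
    rois: list of (x0, y0, x1, y1) boxes from the detector
          (half-open pixel coordinates, assumed format).
    """
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in rois:
        mask[y0:y1, x0:x1] = True  # overlapping boxes simply union
    return mask
```

The fraction `mask.mean()` directly bounds the share of pixels that ever enter the SGM pipeline, which is where the real-time gain of a selective reconstruction comes from.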