Chordal Averaging on Flag Manifolds and Its Applications

2023-03-23 17:57:28
Nathan Mankovich, Tolga Birdal

Abstract

This paper presents a new, provably convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. The flag manifold is a mathematical space consisting of flags, which are sequences of nested subspaces of a vector space that increase in dimension. The flag manifold is a superset of a wide range of known matrix groups, including Stiefel manifolds and Grassmannians, making it a general object that is useful in a wide variety of computer vision problems. To tackle the challenge of computing first-order flag statistics, we first transform the problem into one that involves auxiliary variables constrained to the Stiefel manifold. The Stiefel manifold is a space of orthogonal frames, and leveraging the numerical stability and efficiency of Stiefel-manifold optimization enables us to compute the flag-mean effectively. Through a series of experiments, we show the competence of our method in Grassmann and rotation averaging, as well as principal component analysis.
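To make the notion of chordal averaging concrete, here is a minimal sketch for the simplest special case only: the Grassmannian, where each point is a single k-dimensional subspace rather than a nested sequence. It is not the paper's flag-mean algorithm and does not use its Stiefel auxiliary variables. It relies on the standard fact that the minimizer of the summed squared chordal distances k - ||X_i^T Y||_F^2 is spanned by the top-k eigenvectors of the sum of projectors X_i X_i^T. The function names and the test data are assumptions made for illustration.

```python
import numpy as np


def chordal_grassmann_mean(bases, k):
    """Chordal mean of subspaces, each given as an n-by-k orthonormal basis.

    Illustrative Grassmannian special case, not the paper's flag algorithm.
    """
    # The left singular vectors of [X_1 ... X_p] are the eigenvectors of
    # sum_i X_i X_i^T, ordered by decreasing eigenvalue.
    stacked = np.hstack(bases)
    u, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :k]


def chordal_cost(bases, y):
    """Sum of squared chordal distances from the span of y to each subspace."""
    k = y.shape[1]
    return sum(k - np.linalg.norm(x.T @ y, "fro") ** 2 for x in bases)


def random_subspace(rng, n, k):
    """Hypothetical helper: orthonormal basis of a random k-dim subspace of R^n."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q


rng = np.random.default_rng(0)
n, k = 6, 2
bases = [random_subspace(rng, n, k) for _ in range(5)]
mean = chordal_grassmann_mean(bases, k)
mean_cost = chordal_cost(bases, mean)
```

Because the eigenvector solution is a global minimizer in this special case, `mean_cost` should not exceed the cost of any other k-dimensional subspace, including each input subspace. General flags couple several nested subspaces, which is why the paper resorts to an iterative scheme instead of a single decomposition.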

URL

https://arxiv.org/abs/2303.13501

PDF

https://arxiv.org/pdf/2303.13501.pdf

