Paper Reading AI Learner

Fashion Recommendation: Outfit Compatibility using GNN

2024-04-28 00:57:17
Samaksh Gulati

Abstract

Numerous industries have benefited from the use of machine learning, and fashion is no exception. By gaining a better understanding of what makes a good outfit, companies can provide useful product recommendations to their users. In this project, we follow two existing approaches that represent outfits as graphs and use modified versions of the Graph Neural Network (GNN) framework. Both the Node-wise Graph Neural Network (NGNN) and the Hypergraph Neural Network (HGNN) score a set of items according to their compatibility as an outfit. The data used is the Polyvore dataset, which consists of curated outfits with a product image and text description for each item in an outfit. We recreate the analysis on a subset of this data and compare the two existing models on two tasks: fill in the blank (FITB), finding an item that completes an outfit, and compatibility prediction, estimating the compatibility of different items grouped as an outfit. We replicate the results directionally and find that HGNN performs slightly better on both tasks. Beyond replicating the results of the two papers, we also use embeddings generated by a vision transformer and observe improved prediction accuracy across the board.
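To make the two evaluation tasks concrete, here is a minimal sketch of how outfit scoring can drive both compatibility prediction and FITB. It is not the paper's GNN method: instead of learned graph message passing, it uses a simple stand-in scorer (mean pairwise cosine similarity over item embeddings, which here would come from a CNN or vision transformer); the function names are illustrative, not from the paper.

```python
import numpy as np

def outfit_score(items):
    """Compatibility stand-in: mean pairwise cosine similarity of item embeddings.
    (The papers instead learn this score with NGNN/HGNN message passing.)"""
    items = np.asarray(items, dtype=float)
    normed = items / np.linalg.norm(items, axis=1, keepdims=True)
    sim = normed @ normed.T          # all pairwise cosine similarities
    n = len(items)
    # average over the off-diagonal (distinct-pair) entries only
    return (sim.sum() - n) / (n * (n - 1))

def fill_in_the_blank(partial_outfit, candidates):
    """FITB: choose the candidate whose addition maximizes the outfit score."""
    scores = [outfit_score(list(partial_outfit) + [c]) for c in candidates]
    return int(np.argmax(scores))
```

Both tasks reduce to the same scoring function: compatibility prediction thresholds the score of a full outfit, while FITB ranks candidate completions by it, which is why a single trained model can be evaluated on both.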


URL

https://arxiv.org/abs/2404.18040

PDF

https://arxiv.org/pdf/2404.18040.pdf

