VQA
2022-05-26
Mitigating barren plateaus of variational quantum eigensolvers
Xia Liu, Geng Liu, Jiaxin Huang, Xin Wang
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2022-05-26
Avoiding Barren Plateaus with Classical Deep Neural Networks
Lucas Friedrich, Jonas Maziero
arXiv_AI
arXiv_AI
Gradient_Descent
Optimization
VQA
QA
PDF
2022-05-25
Guiding Visual Question Answering with Attention Priors
Thao Minh Le, Vuong Le, Sunil Gupta, Svetha Venkatesh, Truyen Tran
arXiv_CV
arXiv_CV
Enhancement
Sparse
Knowledge
Pose
VQA
Attention
Inference
QA
PDF
2022-05-24
Rethinking Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh
arXiv_AI
arXiv_AI
Image_Caption
Classification
VQA
Text_Generation
Caption
QA
PDF
2022-05-24
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
arXiv_CV
arXiv_CV
Image_Caption
Zero-Shot
VQA
Attention
Caption
PDF
2022-05-24
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasovic
arXiv_CV
arXiv_CV
Image_Caption
Action
VQA
Attention
Text_Generation
Caption
Language_Model
QA
PDF
2022-05-23
VQA-GNN: Reasoning with Multimodal Semantic Graph for Visual Question Answering
Yanan Wang, Michihiro Yasunaga, Hongyu Ren, Shinya Wada, Jure Leskovec
arXiv_AI
arXiv_AI
Recognition
Knowledge
Knowledge_Graph
Pose
VQA
QA
PDF
2022-05-21
Making Video Quality Assessment Models Sensitive to Frame Rate Distortions
Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2022-05-19
Let's Talk! Striking Up Conversations via Conversational Visual Question Generation
Shih-Han Chan, Tsai-Lun Yang, Yun-Wei Chu, Chi-Yang Hsu, Ting-Hao Huang, Yu-Shian Chiu, Lun-Wei Ku
arXiv_AI
arXiv_AI
VQA
PDF
2022-05-18
Dialog Inpainting: Turning Documents into Dialogs
Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu
arXiv_AI
arXiv_AI
Inpainting
Pose
VQA
QA
PDF
2022-05-17
Gender and Racial Bias in Visual Question Answering Datasets
Yusuke Hirota, Yuta Nakashima, Noa Garcia
arXiv_CV
arXiv_CV
Relation
VQA
Attention
QA
PDF
2022-05-16
A Framework to Map VMAF with the Probability of Just Noticeable Difference between Video Encoding Recipes
Jingwen Zhu, Suiyi Ling, Yoann Baveye, Patrick Le Callet
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
Prediction
QA
PDF
2022-05-16
A Neuro-Symbolic ASP Pipeline for Visual Question Answering
Thomas Eiter, Nelson Higuera, Johannes Oetsch, Michael Pritz
arXiv_AI
arXiv_AI
Classification
VQA
Prediction
QA
PDF
2022-05-10
Fundamental limitations on optimization in variational quantum algorithms
Hao-Kai Zhang, Chengkai Zhu, Geng Liu, Xin Wang
arXiv_CV
arXiv_CV
Bert
Optimization
VQA
QA
PDF
2022-05-10
Learning to Answer Visual Questions from Web Videos
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
arXiv_CV
arXiv_CV
Transformer
Zero-Shot
Pose
VQA
Activity
QA
PDF
2022-05-09
Joint learning of object graph and relation graph for visual question answering
Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun
arXiv_CV
arXiv_CV
Pose
Relation
VQA
QA
PDF
2022-05-07
Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study
Liqun Lin, Zheng Wang, Jiachen He, Weiling Chen, Yiwen Xu, Tiesong Zhao
arXiv_CV
arXiv_CV
3D
Pose
Action
VQA
CNN
QA
PDF
2022-05-06
From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data
Zhenghang Yuan, Lichao Mou, Qi Wang, Xiao Xiang Zhu
arXiv_CV
arXiv_CV
Pose
Action
VQA
QA
PDF
2022-05-06
QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning
Zechen Li, Anders Søgaard
arXiv_CV
arXiv_CV
VQA
PDF
2022-05-06
An Efficient Gradient Sensitive Alternate Framework for Variational Quantum Eigensolver with Variable Ansatz
Ze-Tong Li, Fan-Xu Meng, Han Zeng, Zai-Chen Zhang, Xu-Tao Yu
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2022-05-05
What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning
Jae Hee Lee, Matthias Kerzel, Kyra Ahrens, Cornelius Weber, Stefan Wermter
arXiv_AI
arXiv_AI
3D
Relation
VQA
QA
PDF
2022-05-05
LAWS: Look Around and Warm-Start Natural Gradient Descent for Quantum Neural Networks
Zeyi Tao, Jindi Wu, Qi Xia, Qun Li
arXiv_CV
arXiv_CV
Gradient_Descent
Optimization
Review
Pose
Classification
VQA
Attention
QA
PDF
2022-05-05
Improved Quantum Computing with the Higher-order Trotter Decomposition
Xiaodong Yang, Xinfang Nie, Yunlan Ji, Tao Xin, Dawei Lu, Jun Li
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2022-05-05
Declaration-based Prompt Tuning for Visual Question Answering
Yuhang Liu, Wei Wei, Daowan Peng, Feida Zhu
arXiv_AI
arXiv_AI
Zero-Shot
Self-Supervised
Pose
VQA
Few-Shot
Language_Model
Prediction
QA
Matching
PDF
2022-05-04
qLEET: Visualizing Loss Landscapes, Expressibility, Entangling Power and Training Trajectories for Parameterized Quantum Circuits
Utkarsh Azad, Animesh Sinha
arXiv_CV
arXiv_CV
VQA
QA
PDF
2022-05-04
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Embedding
Recognition
Zero-Shot
Represenation_Learning
Classification
VQA
Attention
Caption
QA
PDF
2022-05-04
All You May Need for VQA are Image Captions
Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut
arXiv_CV
arXiv_CV
Image_Caption
Zero-Shot
Pose
VQA
Caption
QA
PDF
2022-05-02
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
AJ Piergiovanni, Wei Li, Weicheng Kuo, Mohammad Saffar, Fred Bertsch, Anelia Angelova
arXiv_CV
arXiv_CV
Image_Caption
Zero-Shot
Pose
VQA
Caption
QA
PDF
2022-04-29
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
arXiv_AI
arXiv_AI
Video_Caption
Pose
VQA
Few-Shot
Caption
Language_Model
PDF
2022-04-29
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
Wei Sun, Xiongkuo Min, Wei Lu, Guangtao Zhai
arXiv_CV
arXiv_CV
Recognition
Sparse
Pose
Action
Deep_Learning
VQA
QA
PDF
2022-04-28
GRIT: General Robust Image Task Benchmark
Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem
arXiv_CV
arXiv_CV
Segmentation
Pose
Face
Detection
VQA
Prediction
PDF
2022-04-28
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2022-04-25
Conversational Question Answering on Heterogeneous Sources
Philipp Christmann, Rishiraj Saha Roy, Gerhard Weikum
arXiv_CL
arXiv_CL
Knowledge
VQA
QA
PDF
2022-04-22
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-fu Chang, Lu Yuan
arXiv_AI
arXiv_AI
Knowledge
Pose
Action
VQA
QA
PDF
2022-04-22
Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Byoung-Tak Zhang
arXiv_AI
arXiv_AI
Transformer
Knowledge
Pose
VQA
QA
PDF
2022-04-20
Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen, Ming Jiang, Jinhui Yang, Qi Zhao
arXiv_CV
arXiv_CV
Tracking
Pose
Quantitative
VQA
Attention
PDF
2022-04-18
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
arXiv_CV
arXiv_CV
Transformer
Represenation_Learning
Self-Supervised
Pose
Classification
VQA
Image_Classification
Language_Model
PDF
2022-04-17
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang, Tai Le Quy, Eirini Ntoutsi, Kea Turner, Vasile Palade, Israat Haque, Peng Xu, Chris Brown, Wenbin Zhang
arXiv_AI
arXiv_AI
Knowledge
Pose
Relation
VQA
Attention
Inference
PDF
2022-04-15
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning
Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu
arXiv_CV
arXiv_CV
Bert
Pose
Contrastive_Learning
VQA
PDF
2022-04-05
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata
arXiv_CL
arXiv_CL
Pose
VQA
QA
PDF
2022-04-05
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille
arXiv_CV
arXiv_CV
Pose
VQA
Attention
Prediction
QA
PDF
2022-04-03
Question-Driven Graph Fusion Network For Visual Question Answering
Yuxi Qian, Yuncong Hu, Ruonan Wang, Fangxiang Feng, Xiaojie Wang
arXiv_CV
arXiv_CV
Pose
Detection
Relation
VQA
Object_Detection
Attention
Visual_Relation
QA
PDF
2022-04-02
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang, Yuxi Qian, Fangxiang Feng, Xiaojie Wang, Huixing Jiang
arXiv_CL
arXiv_CL
Pose
VQA
QA
PDF
2022-03-31
Perceptual Quality Assessment of UGC Gaming Videos
Xiangxu Yu, Zhengzhong Tu, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
Pose
VQA
CNN
Prediction
QA
PDF
2022-03-31
SimVQA: Exploring Simulated Environments for Visual Question Answering
Paola Cascante-Bonilla, Hui Wu, Letao Wang, Rogerio Feris, Vicente Ordonez
arXiv_CV
arXiv_CV
3D
Pose
VQA
QA
PDF
2022-03-30
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
arXiv_CV
arXiv_CV
Transformer
Pose
VQA
Attention
QA
PDF
2022-03-28
Visual Mechanisms Inspired Efficient Transformers for Image and Video Quality Assessment
Junyong You
arXiv_CV
arXiv_CV
Transformer
Pose
VQA
Attention
QA
PDF
2022-03-27
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu
arXiv_CV
arXiv_CV
Reconstruction
Zero-Shot
Represenation_Learning
Pose
VQA
Caption
Prediction
QA
PDF
2022-03-26
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
arXiv_CV
arXiv_CV
VQA
QA
PDF
2022-03-24
Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering
Chengyang Fang, Gangyan Zeng, Yu Zhou, Daiqing Wu, Can Ma, Dayong Hu, Weiping Wang
arXiv_CV
arXiv_CV
Transformer
Recognition
OCR
Optical_Character
Pose
VQA
Prediction
QA
PDF
2022-03-24
Subjective and Objective Analysis of Streamed Gaming Videos
Xiangxu Yu, Zhenqiang Ying, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
VQA
QA
PDF
2022-03-24
Towards Efficient and Elastic Visual Question Answering with Doubly Slimmable Transformer
Zhou Yu, Zitian Jin, Jun Yu, Mingliang Xu, Jianping Fan
arXiv_CV
arXiv_CV
Transformer
VQA
Inference
QA
PDF
2022-03-22
WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models
Sha Yuan, Zhao Shuai, Leng Jiahong, Xue Zhao, Zhao Hanyu, Tang Jie
arXiv_CL
arXiv_CL
Image_Caption
Transformer
Relation
VQA
Caption
QA
PDF
2022-03-17
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding, Jing Yu, Bang Liu, Yue Hu, Mingxin Cui, Qi Wu
arXiv_CV
arXiv_CV
Embedding
Knowledge
Pose
Action
Relation
VQA
Prediction
QA
PDF
2022-03-15
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara, Tatsuya Harada
arXiv_CL
arXiv_CL
Knowledge
VQA
PDF
2022-03-15
Can you even tell left from right? Presenting a new challenge for VQA
Sai Raam Venkatraman, Rishi Rao, S. Balasubramanian, Chandra Sekhar Vorugunti, R. Raghunatha Sarma
arXiv_AI
arXiv_AI
Pose
Quantitative
Relation
VQA
QA
PDF
2022-03-15
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
Carlos E. Jimenez, Olga Russakovsky, Karthik Narasimhan
arXiv_CL
arXiv_CL
Action
VQA
QA
PDF
2022-03-14
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
arXiv_CL
arXiv_CL
Image_Caption
Zero-Shot
Pose
VQA
Few-Shot
Caption
QA
PDF
2022-03-12
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
arXiv_AI
arXiv_AI
Image_Caption
Zero-Shot
Knowledge
Pose
VQA
Caption
Language_Model
QA
PDF
2022-03-09
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani, Tanmoy Mukherjee, Nikos Deligiannis
arXiv_CV
arXiv_CV
Image_Caption
VQA
Caption
Inference
Language_Model
Prediction
QA
PDF
2022-03-07
Barlow constrained optimization for Visual Question Answering
Abhishek Jha, Badri N. Patro, Luc Van Gool, Tinne Tuytelaars
arXiv_CV
arXiv_CV
Embedding
Optimization
Regularization
Pose
Action
Relation
VQA
QA
PDF
2022-03-06
Modeling Coreference Relations in Visual Dialog
Mingxiao Li, Marie-Francine Moens
arXiv_AI
arXiv_AI
Transformer
Unsupervised
Knowledge
Pose
Relation
VQA
PDF
2022-03-06
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering
Mingxiao Li, Marie-Francine Moens
arXiv_CV
arXiv_CV
Knowledge
Pose
Relation
VQA
Attention
QA
PDF
2022-03-02
Recent, rapid advancement in visual question answering architecture
Venkat Kodali, Daniel Berleant
arXiv_CV
arXiv_CV
Review
VQA
PDF
2022-03-01
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou, Licheng Yu, Amanpreet Singh, Mengjiao Wang, Zhou Yu, Ning Zhang
arXiv_AI
arXiv_AI
Unsupervised
Pose
VQA
QA
PDF
2022-02-28
Robust resource-efficient quantum variational ansatz through evolutionary algorithm
Yuhan Huang, Qingyu Li, Xiaokai Hou, Rebing Wu, Man-Hong Yung, Abolfazl Bayat, Xiaoting Wang
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2022-02-25
CAFQA: Clifford Ansatz For Quantum Accuracy
Gokul Subramanian Ravi, Pranav Gokhale, Yi Ding, William M. Kirby, Kaitlin N. Smith, Jonathan M. Baker, Peter J. Love, Henry Hoffmann, Kenneth R. Brown, Frederic T. Chong
arXiv_CV
arXiv_CV
Optimization
Relation
VQA
QA
PDF
2022-02-25
On Modality Bias Recognition and Reduction
Yangyang Guo, Liqiang Nie, Harry Cheng, Zhiyong Cheng, Mohan Kankanhalli, Alberto Del Bimbo
arXiv_CV
arXiv_CV
Recognition
Pose
Action_Recognition
Action
Classification
Relation
VQA
Prediction
PDF
2022-02-25
Joint Answering and Explanation for Visual Commonsense Reasoning
Zhenyang Li, Yangyang Guo, Kejie Wang, Yinwei Wei, Liqiang Nie, Mohan Kankanhalli
arXiv_CV
arXiv_CV
Knowledge
Pose
VQA
Inference
QA
PDF
2022-02-24
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski
arXiv_AI
arXiv_AI
Adversarial
Pose
VQA
QA
PDF
2022-02-21
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son Tran, Yi Xu, Sampath Chanda, Liqun Chen, Belinda Zeng, Trishul Chilimbi, Junzhou Huang
arXiv_CV
arXiv_CV
Embedding
Represenation_Learning
Knowledge
Pose
Contrastive_Learning
VQA
PDF
2022-02-21
OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics
Fernando Amodeo, Fernando Caballero, Natalia Díaz-Rodríguez, Luis Merino
arXiv_CV
arXiv_CV
Transfer_Learning
Knowledge
Pose
Ontology
Action
VQA
QA
PDF
2022-02-17
Pulse based Variational Quantum Optimal Control for hybrid quantum computing
Robert de Keijzer, Oliver Tse, Servaas Kokkelmans
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2022-02-17
Deep VQA based on a Novel Hybrid Training Methodology
Chen Feng, Fan Zhang, David R. Bull
arXiv_CV
arXiv_CV
Pose
Deep_Learning
Relation
VQA
QA
PDF
2022-02-15
Privacy Preserving Visual Question Answering
Cristian-Paul Bara, Qing Ping, Abhinav Mathur, Govind Thattai, Rohith MV, Gaurav S. Sukhatme
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2022-02-15
Delving Deeper into Cross-lingual Visual Question Answering
Chen Liu, Jonas Pfeiffer, Anna Korhonen, Ivan Vulić, Iryna Gurevych
arXiv_CV
arXiv_CV
Transformer
Embedding
Zero-Shot
Action
VQA
Few-Shot
QA
PDF
2022-02-14
An experimental study of the vision-bottleneck in VQA
Pierre Marza, Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
arXiv_CV
arXiv_CV
Embedding
Pose
Detection
VQA
Object_Detection
Attention
QA
PDF
2022-02-09
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Jiawen Zhang, Abhijit Mishra, Avinesh P.V.S, Siddharth Patwardhan, Sachin Agarwal
arXiv_CL
arXiv_CL
Unsupervised
Reinforcement_Learning
Bert
Weakly_Supervised
Knowledge
Pose
VQA
QA
PDF
2022-02-08
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta, Manish Gupta
arXiv_CV
arXiv_CV
Surveillance
Video_Indexing
Knowledge
Pose
VQA
Summarization
Activity
QA
Video_Retrieval
PDF
2022-02-07
Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
arXiv_CV
arXiv_CV
Image_Caption
Bert
Pose
Classification
VQA
Text_Generation
Caption
Image_Classification
QA
PDF
2022-02-04
Grounding Answers for Visual Questions Asked by Visually Impaired People
Chongyan Chen, Samreen Anjum, Danna Gurari
arXiv_CV
arXiv_CV
Recognition
Action
VQA
QA
PDF
2022-01-30
Compositionality as Lexical Symmetry
Ekin Akyürek, Jacob Andreas
arXiv_CV
arXiv_CV
Transformer
RNN
VQA
PDF
2022-01-28
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi
arXiv_CV
arXiv_CV
Image_Caption
Zero-Shot
Pose
VQA
Caption
QA
PDF
2022-01-27
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, Ivan Vulić
arXiv_CL
arXiv_CL
Transfer_Learning
Zero-Shot
VQA
Few-Shot
PDF
2022-01-27
Transformer Module Networks for Systematic Generalization in Visual Question Answering
Moyuru Yamada, Vanessa D'Amario, Kentaro Takemoto, Xavier Boix, Tomotake Sasaki
arXiv_CV
arXiv_CV
Transformer
Pose
VQA
QA
PDF
2022-01-26
Learning to Compose Diversified Prompts for Image Emotion Classification
Sinuo Deng, Lifang Wu, Ge Shi, Lehao Xing, Meng Jian
arXiv_CV
arXiv_CV
Pose
Emotion
Classification
VQA
Language_Model
PDF
2022-01-25
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering
Peixi Xiong, Yilin Shen, Hongxia Jin
arXiv_CV
arXiv_CV
Pose
Relation
VQA
Attention
QA
PDF
2022-01-25
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu
arXiv_CV
arXiv_CV
Pose
Action
Relation
VQA
Attention
QA
PDF
2022-01-24
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun R. Akula
arXiv_CV
arXiv_CV
Knowledge
Pose
VQA
QA
PDF
2022-01-15
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan
arXiv_CL
arXiv_CL
Embedding
Knowledge
Pose
VQA
Inference
QA
PDF
2022-01-14
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, Prem Natarajan
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Pose
VQA
GAN
Language_Model
QA
PDF
2022-01-13
Towards Automated Error Analysis: Learning to Characterize Errors
Tong Gao, Shivang Singh, Raymond J. Mooney
arXiv_AI
arXiv_AI
Bert
Pose
VQA
PDF
2022-01-11
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
Yehao Li, Jiahao Fan, Yingwei Pan, Ting Yao, Weiyao Lin, Tao Mei
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Knowledge
Action
Classification
VQA
Caption
Language_Model
Matching
PDF
2022-01-11
On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering
Ankur Sikarwar, Gabriel Kreiman
arXiv_CV
arXiv_CV
Transformer
Speech
Action
Quantitative
VQA
Attention
QA
PDF
2022-01-10
COIN: Counterfactual Image Generation for VQA Interpretation
Zeyd Boukhers, Timo Hartmann, Jan Jürjens
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
QA
PDF
2022-01-10
Generating the optimal structures for parameterized quantum circuits by a meta-trained graph variational autoencoder
Chuangtao Chen, Zhimin He, Shenggen Zheng, Yan Zhou, Haozhen Situ
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2022-01-07
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi
arXiv_CL
arXiv_CL
Video_Caption
Zero-Shot
Knowledge
Pose
VQA
Prediction
QA
PDF
2022-01-05
FAVER: Blind Quality Prediction of Variable Frame Rate Videos
Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan
arXiv_CV
arXiv_CV
Pose
VQA
Prediction
QA
PDF
2022-01-05
Quantum Chaos and Circuit Parameter Optimization
Joonho Kim, Yaron Oz, Dario Rosa
arXiv_CV
arXiv_CV
Optimization
Relation
VQA
QA
PDF
2022-01-04
Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety
Rajagopal A, Nirmala V, Arun Muthuraj Vedamanickam
arXiv_CV
arXiv_CV
Image_Caption
Enhancement
Image_Enhancement
Deep_Learning
VQA
Attention
Caption
Language_Model
PDF
2022-01-04
Quantum circuit architecture search on a superconducting processor
Kehuan Linghu, Yang Qian, Ruixia Wang, Meng-Jun Hu, Zhiyuan Li, Xuegang Li, Huikai Xu, Jingning Zhang, Teng Ma, Peng Zhao, Dong E. Liu, Min-Hsiu Hsieh, Xingyao Wu, Yuxuan Du, Dacheng Tao, Yirong Jin, Haifeng Yu
arXiv_CV
arXiv_CV
NAS
Classification
VQA
QA
PDF
2021-12-30
VisQA: Quantifying Information Visualisation Recallability via Question Answering
Yao Wang, Chuhan Jiao, Mihai Bâce, Andreas Bulling
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
QA
PDF
2021-12-27
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
Sedigheh Eslami, Gerard de Melo, Christoph Meinel
arXiv_CV
arXiv_CV
Represenation_Learning
VQA
Medical
QA
PDF
2021-12-27
Multi-Image Visual Question Answering
Harsh Raj, Janhavi Dadhania, Akhilesh Bhardwaj
arXiv_AI
arXiv_AI
Embedding
Bert
Pose
Action
VQA
Attention
PDF
2021-12-23
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha
arXiv_CV
arXiv_CV
Transformer
OCR
Pose
Scene_Text
Detection
VQA
Object_Detection
QA
PDF
2021-12-22
CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes
Xu Yan, Zhihao Yuan, Yuhao Du, Yinghong Liao, Yao Guo, Zhen Li, Shuguang Cui
arXiv_CV
arXiv_CV
Transformer
Recognition
3D
Pose
Classification
Relation
VQA
QA
PDF
2021-12-21
A Parameter Initialization Method for Variational Quantum Algorithms to Mitigate Barren Plateaus Based on Transfer Learning
Huan-Yu Liu, Tai-Ping Sun, Yu-Chun Wu, Yong-Jian Han, Guo-Ping Guo
arXiv_CV
arXiv_CV
Transfer_Learning
Knowledge
Face
VQA
QA
PDF
2021-12-20
General Greedy De-bias Learning
Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
arXiv_CV
arXiv_CV
Gradient_Descent
Regularization
Knowledge
Adversarial
Pose
Classification
Relation
VQA
Image_Classification
Prediction
PDF
2021-12-20
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoki Kawanabe
arXiv_CV
arXiv_CV
Embedding
3D
Knowledge
Pose
VQA
QA
PDF
2021-12-19
Task-Oriented Multi-User Semantic Communications
Huiqiang Xie, Zhijin Qin, Xiaoming Tao, Khaled B. Letaief
arXiv_CV
arXiv_CV
Transformer
Embedding
Image_Retrieval
Pose
Deep_Learning
VQA
QA
PDF
2021-12-16
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
arXiv_CV
arXiv_CV
Image_Caption
Pose
Action
VQA
Attention
Inference
PDF
2021-12-16
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui, Borui Wang, Qiuyuan Huang, Alex Hauptmann, Yonatan Bisk, Jianfeng Gao
arXiv_CL
arXiv_CL
Transformer
Knowledge
Pose
VQA
Prediction
QA
PDF
2021-12-15
3D Question Answering
Shuquan Ye, Dongdong Chen, Songfang Han, Jing Liao
arXiv_CV
arXiv_CV
Transformer
Point_Cloud
3D
Bert
Pose
VQA
QA
PDF
2021-12-14
Dual-Key Multimodal Backdoors for Visual Question Answering
Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha
arXiv_CV
arXiv_CV
Embedding
Optimization
Pose
Classification
Deep_Learning
Detection
VQA
Object_Detection
QA
PDF
2021-12-14
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
JianJian Cao, Xiameng Qin, Sanyuan Zhao, Jianbing Shen
arXiv_CV
arXiv_CV
Embedding
Pose
Deep_Learning
Relation
VQA
Attention
Prediction
QA
Matching
PDF
2021-12-13
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection
Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh
arXiv_CV
arXiv_CV
Embedding
Weakly_Supervised
Knowledge
Knowledge_Graph
Pose
VQA
Attention
Language_Model
QA
PDF
2021-12-13
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung, Jaemin Cho, Mohit Bansal
arXiv_AI
arXiv_AI
Image_Caption
Transfer_Learning
Knowledge
Pose
VQA
Caption
Language_Model
QA
PDF
2021-12-12
Change Detection Meets Visual Question Answering
Zhenghang Yuan, Lichao Mou, Zhitong Xiong, Xiaoxiang Zhu
arXiv_CV
arXiv_CV
Face
Detection
VQA
Prediction
QA
PDF
2021-12-10
VAQEM: A Variational Approach to Quantum Error Mitigation
Gokul Subramanian Ravi, Kaitlin N. Smith, Pranav Gokhale, Andrea Mari, Nathan Earnest, Ali Javadi-Abhari, Frederic T. Chong
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-12-10
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang
arXiv_CV
arXiv_CV
Bert
Pose
VQA
Few-Shot
Text_Generation
Language_Model
Matching
PDF
2021-12-09
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
Constantin Eichenberg, Sidney Black, Samuel Weinbach, Letitia Parcalabescu, Anette Frank
arXiv_CV
arXiv_CV
Optimization
Knowledge
VQA
Language_Model
QA
PDF
2021-12-08
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang
arXiv_CV
arXiv_CV
Transformer
VQA
Attention
Language_Model
QA
PDF
2021-12-06
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
Fangzhi Xu, Qika Lin, Jun Liu, Lingling Zhang, Tianzhe Zhao, Qi Chai, Yudai Pan
arXiv_AI
arXiv_AI
Unsupervised
Pose
VQA
Attention
Language_Model
QA
PDF
2021-11-29
EQC : Ensembled Quantum Computing for Variational Quantum Algorithms
Samuel Stein, Yufei Ding, Nathan Wiebe, Bo Peng, Karol Kowalski, Nathan Baker, James Ang, Ang Li
arXiv_CV
arXiv_CV
Pose
Face
VQA
QA
PDF
2021-11-29
Searching the Search Space of Vision Transformer
Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling
arXiv_CV
arXiv_CV
Transformer
NAS
Segmentation
Semantic_Segmentation
Recognition
Pose
Detection
VQA
Object_Detection
PDF
2021-11-29
Quantifying Fermionic Nonlinearity of Quantum Circuits
Shigeo Hakkaku, Yuichiro Tashima, Kosuke Mitarai, Wataru Mizukami, Keisuke Fujii
arXiv_CV
arXiv_CV
Pose
Action
VQA
QA
PDF
2021-11-29
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang, Ziyi Liu, Yifan Liu, Nanning Zheng
arXiv_CV
arXiv_CV
Video_Caption
Pose
VQA
QA
PDF
2021-11-26
Performance comparison of optimization methods on variational quantum algorithms
Xavier Bonet-Monroig, Hao Wang, Diederick Vermetten, Bruno Senjean, Charles Moussa, Thomas Bäck, Vedran Dunjko, Thomas E. O'Brien
arXiv_CV
arXiv_CV
Optimization
VQA
QA
PDF
2021-11-25
Scene Graph Generation with Geometric Context
Vishal Kumar, Albert Mundu, Satish Kumar Singh
arXiv_AI
arXiv_AI
Image_Caption
Recognition
Knowledge
Relation
VQA
Attention
Caption
Activity
PDF
2021-11-24
Quantum Architecture Search with Meta-learning
Zhimin He, Chuangtao Chen, Lvzhou Li, Shenggen Zheng, Haozhen Situ
arXiv_CV
arXiv_CV
NAS
Optimization
Pose
VQA
QA
PDF
2021-11-23
Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Pose
VQA
Text_Generation
Caption
Language_Model
Prediction
PDF
2021-11-22
Florence: A New Foundation Model for Computer Vision
Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
arXiv_AI
arXiv_AI
Image_Caption
Transfer_Learning
Recognition
Zero-Shot
Pose
Action_Recognition
Action
Classification
Detection
VQA
Few-Shot
Object_Detection
Caption
QA
Video_Retrieval
PDF
2021-11-22
Many Heads but One Brain: an Overview of Fusion Brain Challenge on AI Journey 2021
Daria Bakshandaeva, Denis Dimitrov, Alex Shonenkov, Mark Potanin, Vladimir Arkhipkin, Denis Karachev, Vera Davydova, Anton Voronov, Mikhail Martynov, Natalia Semenova, Mikhail Stepnov, Elena Tutubalina, Andrey Chertok, Aleksandr Petiushko
arXiv_AI
arXiv_AI
Recognition
Zero-Shot
Pose
Detection
VQA
Object_Detection
PDF
2021-11-19
Medical Visual Question Answering: A Survey
Zhihong Lin, Donghao Zhang, Qingyi Tao, Danli Shi, Gholamreza Haffari, Qi Wu, Mingguang He, Zongyuan Ge
arXiv_AI
arXiv_AI
Review
Survey
VQA
Medical
QA
PDF
2021-11-19
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Optimization
Represenation_Learning
Pose
VQA
Attention
Caption
Language_Model
Matching
PDF
2021-11-18
Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video
Li Yang, Mai Xu, Shengxi Li, Yichen Guo, Zulin Wang
arXiv_CV
arXiv_CV
Pose
Relation
VQA
Prediction
QA
PDF
2021-11-17
Achieving Human Parity on Visual Question Answering
Ming Yan, Haiyang Xu, Chenliang Li, Junfeng Tian, Bin Bi, Wei Wang, Weihua Chen, Xianzhe Xu, Fan Wang, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Luo Si
arXiv_CV
arXiv_CV
Knowledge
Action
VQA
QA
PDF
2021-11-16
Document AI: Benchmarks, Models and Applications
Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
arXiv_CL
arXiv_CL
Review
Action
Classification
Deep_Learning
VQA
Image_Classification
PDF
2021-11-16
Words of Wisdom: Representational Harms in Learning From AI Communication
Amanda Buddemeyer, Erin Walker, Malihe Alikhani
arXiv_CL
arXiv_CL
VQA
PDF
2021-11-16
Language bias in Visual Question Answering: A Survey and Taxonomy
Desen Yuan
arXiv_CV
arXiv_CV
Enhancement
Review
Survey
VQA
Attention
QA
PDF
2021-11-15
Measurement-induced entanglement phase transitions in variational quantum circuits
Roeland Wiersema, Cunlu Zhou, Juan Felipe Carrasquilla, Yong Baek Kim
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2021-11-15
Stochastic Gradient Line Bayesian Optimization: Reducing Measurement Shots in Optimizing Parameterized Quantum Circuits
Shiro Tamiya, Hayata Yamasaki
arXiv_CV
arXiv_CV
Gradient_Descent
Optimization
VQA
QA
PDF
2021-11-12
Visual Intelligence through Human Interaction
Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein
arXiv_CV
arXiv_CV
Face
Action
VQA
PDF
2021-11-11
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture
Michael Yang, Aditya Anantharaman, Zachary Kitowski, Derik Clive Robert
arXiv_CV
arXiv_CV
Transformer
Optimization
Pose
Relation
VQA
Attention
QA
PDF
2021-11-10
ICDAR 2021 Competition on Document Visual Question Answering
Rubèn Tito, Minesh Mathew, C.V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
arXiv_CV
arXiv_CV
VQA
QA
PDF
2021-11-09
Mode connectivity in the QCBM loss landscape
Kathleen E. Hamilton, Emily Lynn, Vicente Leyton-Ortega, Swarnadeep Majumder, Raphael C. Pooser
arXiv_CV
arXiv_CV
Knowledge
VQA
QA
PDF
2021-11-08
Visual Question Answering based on Formal Logic
Muralikrishnna G. Sethuraman, Ali Payani, Faramarz Fekri, J. Clayton Kerce
arXiv_AI
arXiv_AI
Transformer
Knowledge
Pose
Action
Deep_Learning
VQA
QA
PDF
2021-11-08
ORQVIZ: Visualizing High-Dimensional Landscapes in Variational Quantum Algorithms
Manuel S. Rudolph, Sukin Sim, Asad Raza, Michal Stechly, Jarrod R. McClean, Eric R. Anschuetz, Luis Serrano, Alejandro Perdomo-Ortiz
arXiv_CV
arXiv_CV
Optimization
Review
VQA
QA
PDF
2021-11-03
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Nanyun (Violet) Peng, Zicheng Liu, Michael Zeng
arXiv_CV
arXiv_CV
Transformer
Bert
VQA
Attention
Inference
QA
PDF
2021-11-03
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Wenhui Wang, Hangbo Bao, Li Dong, Furu Wei
arXiv_CV
arXiv_CV
Transformer
Pose
Classification
VQA
Attention
QA
PDF
2021-11-01
Introspective Distillation for Robust Question Answering
Yulei Niu, Hanwang Zhang
arXiv_AI
arXiv_AI
Pose
VQA
QA
PDF
2021-10-27
Subtleties in the trainability of quantum machine learning models
Supanut Thanasilp, Samson Wang, Nhat A. Nghiem, Patrick J. Coles, M. Cerezo
arXiv_CV
arXiv_CV
Embedding
VQA
QA
PDF
2021-10-27
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat, Idan Schwartz, Alexander Schwing
arXiv_CV
arXiv_CV
Pose
VQA
PDF
2021-10-25
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
arXiv_AI
arXiv_AI
Transformer
Embedding
Recognition
VQA
QA
PDF
2021-10-25
Alignment Attention by Matching Key and Query Distributions
Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou
arXiv_CL
arXiv_CL
Unsupervised
Regularization
Adversarial
Pose
VQA
Attention
Matching
PDF
2021-10-22
Quantum algorithm for Neighborhood Preserving Embedding
Shi-Jie Pan, Lin-Chun Wan, Hai-Ling Liu, Yu-Sen Wu, Su-Juan Qin, Qiao-Yan Wen, Fei Gao
arXiv_CV
arXiv_CV
Embedding
Pose
VQA
QA
PDF
2021-10-21
DAIR: Data Augmented Invariant Regularization
Tianjian Huang, Shaunak Halbe, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami
arXiv_AI
arXiv_AI
Regularization
Pose
Deep_Learning
VQA
PDF
2021-10-21
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim, Jae Won Cho, Jinsoo Choi, Yunjae Jung, In So Kweon
arXiv_AI
arXiv_AI
Pose
VQA
QA
PDF
2021-10-17
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Xinghao Ding, Yongjian Wu, Feiyue Huang, Yue Gao, Rongrong Ji
arXiv_CV
arXiv_CV
Recognition
Pose
VQA
QA
PDF
2021-10-16
BAPGAN: GAN-based Bone Age Progression of Femur and Phalange X-ray Images
Shinji Nakazawa, Changhee Han, Joe Hasei, Ryuichi Nakahara, Toshifumi Ozaki
arXiv_CV
arXiv_CV
Embedding
OCR
Knowledge
Adversarial
Pose
VQA
GAN
CNN
PDF
2021-10-16
A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Zero-Shot
Pose
VQA
Few-Shot
GAN
Caption
Inference
Language_Model
QA
PDF
2021-10-15
Guiding Visual Question Generation
Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
QA
PDF
2021-10-14
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang
arXiv_CV
arXiv_CV
Optimization
Knowledge
Adversarial
VQA
Inference
Language_Model
QA
PDF
2021-10-13
Improving Users' Mental Model with Attention-directed Counterfactual Edits
Kamran Alipour, Arijit Ray, Xiao Lin, Michael Cogswell, Jurgen P. Schulze, Yi Yao, Giedrius T. Burachas
arXiv_CV
arXiv_CV
Inpainting
Adversarial
Pose
VQA
Attention
GAN
QA
PDF
2021-10-13
MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants
Alkesh Patel, Joel Ruben Antony Moniz, Roman Nguyen, Nick Tzou, Hadas Kotek, Vincent Renkens
arXiv_CV
arXiv_CV
Transformer
Knowledge
Classification
VQA
QA
PDF
2021-10-11
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Väth, Pascal Tilli, Ngoc Thang Vu
arXiv_AI
arXiv_AI
Embedding
Pose
VQA
GAN
QA
PDF
2021-10-11
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
Heeseung Yun, Youngjae Yu, Wonsuk Yang, Kangil Lee, Gunhee Kim
arXiv_CV
arXiv_CV
Transformer
Embedding
Pose
Relation
VQA
Visual_Relation
QA
PDF
2021-10-06
Coarse-to-Fine Reasoning for Visual Question Answering
Binh X. Nguyen, Tuong Do, Huy Tran, Erman Tjiputra, Quang D. Tran, Anh Nguyen
arXiv_CV
arXiv_CV
Pose
Relation
VQA
Attention
Visual_Relation
QA
PDF
2021-10-03
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao
arXiv_AI
arXiv_AI
Pose
Relation
VQA
QA
PDF
2021-10-03
Coarse grained intermolecular interactions on quantum processors
Lewis W. Anderson, Martin Kiffner, Panagiotis Kl. Barkoutsos, Ivano Tavernelli, Jason Crain, Dieter Jaksch
arXiv_CV
arXiv_CV
Action
VQA
QA
PDF
2021-10-02
Asking questions on handwritten document collections
Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, CV Jawahar
arXiv_CV
arXiv_CV
Embedding
Recognition
OCR
Pose
VQA
QA
PDF
2021-10-01
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
arXiv_CV
arXiv_CV
Embedding
Pose
VQA
QA
PDF
2021-09-27
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood, Fabian Kögel, Philipp Müller, Dominike Thomas, Mihai Bace, Andreas Bulling
arXiv_CV
arXiv_CV
Transformer
Salient
VQA
Attention
Prediction
QA
PDF
2021-09-27
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood, Fabian Kögel, Florian Strohm, Prajit Dhar, Andreas Bulling
arXiv_CV
arXiv_CV
Relation
VQA
Attention
QA
PDF
2021-09-27
High Frame Rate Video Quality Assessment using VMAF and Entropic Differences
Pavan C Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
Pose
Action
Relation
VQA
QA
PDF
2021-09-24
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis, Sylvain Lobry, Benjamin Kellenberger, Bertrand Le Saux, Devis Tuia
arXiv_CV
arXiv_CV
Embedding
Action
VQA
QA
PDF
2021-09-21
Does Vision-and-Language Pretraining Improve Lexical Grounding?
Tian Yun, Chen Sun, Ellie Pavlick
arXiv_AI
arXiv_AI
VQA
PDF
2021-09-20
Learning Natural Language Generation from Scratch
Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin
arXiv_AI
arXiv_AI
Reinforcement_Learning
Knowledge
Action
VQA
Language_Model
PDF
2021-09-17
ChipQA: No-Reference Video Quality Prediction via Space-Time Chips
Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik
arXiv_CV
arXiv_CV
Pose
VQA
Prediction
QA
PDF
2021-09-15
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
arXiv_AI
arXiv_AI
Image_Caption
Knowledge
Knowledge_Graph
Pose
VQA
Caption
Inference
Language_Model
QA
PDF
2021-09-13
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao
arXiv_CV
arXiv_CV
Knowledge
VQA
QA
PDF
2021-09-13
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, Iryna Gurevych
arXiv_CL
arXiv_CL
Transformer
Zero-Shot
Pose
VQA
Language_Model
QA
PDF
2021-09-10
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Pose
VQA
Few-Shot
Caption
Prediction
QA
PDF
2021-09-10
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
Humair Raj Khan, Deepak Gupta, Asif Ekbal
arXiv_CL
arXiv_CL
Knowledge
Pose
Action
VQA
QA
PDF
2021-09-09
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz, Jonas Pfeiffer, Iryna Gurevych, Stefan Roth
arXiv_CV
arXiv_CV
Transformer
Detection
VQA
Object_Detection
QA
PDF
2021-09-09
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering
Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral
arXiv_CL
arXiv_CL
Knowledge
Pose
Action
Classification
VQA
QA
PDF
2021-09-06
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Manesha Samarasekara Vitharana Gamage, Lim Chern Hong
arXiv_CV
arXiv_CV
Transformer
VQA
QA
PDF
2021-09-06
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph
Zhixuan Zhang, Chi Zhang, Zhenning Niu, Le Wang, Yuehu Liu
arXiv_CV
arXiv_CV
Image_Caption
Pose
Relation
VQA
Visual_Relation
Caption
QA
PDF
2021-09-04
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
arXiv_CV
arXiv_CV
Transformer
3D
Weakly_Supervised
Action
Classification
VQA
Few-Shot
Prediction
QA
PDF
2021-09-04
A review of Quantum Neural Networks: Methods, Models, Dilemma
Renxin Zhao, Shi Wang
arXiv_AI
arXiv_AI
Review
Face
VQA
QA
PDF
2021-09-03
Accelerating Variational Quantum Algorithms Using Circuit Concurrency
Salonik Resch, Anthony Gutierrez, Joon Suk Huh, Srikant Bharadwaj, Yasuko Eckert, Gabriel Loh, Mark Oskin, Swamit Tannu
arXiv_CV
arXiv_CV
Optimization
Detection
VQA
QA
PDF
2021-09-02
Can Error Mitigation Improve Trainability of Noisy Variational Quantum Algorithms?
Samson Wang, Piotr Czarnik, Andrew Arrasmith, M. Cerezo, Lukasz Cincio, Patrick J. Coles
arXiv_CV
arXiv_CV
VQA
QA
PDF
2021-08-28
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
Gouthaman KV, Anurag Mittal
arXiv_CV
arXiv_CV
Transformer
RNN
Pose
Relation
VQA
Attention
QA
PDF
2021-08-28
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee, Thomas Scialom, Seunghyun Yoon, Franck Dernoncourt, Kyomin Jung
arXiv_CV
arXiv_CV
Image_Caption
Pose
Classification
VQA
Caption
QA
PDF
2021-08-24
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao
arXiv_CV
arXiv_CV
Image_Caption
Zero-Shot
VQA
Caption
Language_Model
QA
PDF
2021-08-24
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Embedding
Sparse
Knowledge
Pose
VQA
Attention
Caption
Inference
PDF
2021-08-23
Adaptive shot allocation for fast convergence in variational quantum algorithms
Andi Gu, Angus Lowe, Pavel A. Dub, Patrick J. Coles, Andrew Arrasmith
arXiv_CV
arXiv_CV
Gradient_Descent
Optimization
VQA
QA
PDF
2021-08-22
External Knowledge Augmented Text Visual Question Answering
Arka Ujjal Dey, Ernest Valveny, Gaurav Harit
arXiv_CV
arXiv_CV
Transformer
OCR
Knowledge
Pose
VQA
QA
PDF
2021-08-22
StarVQA: Space-Time Attention for Video Quality Assessment
Fengchuang Xing, Yuan-Gen Wang, Hanpin Wang, Leida Li, Guopu Zhu
arXiv_CV
arXiv_CV
Transformer
Embedding
Pose
VQA
Attention
QA
PDF
2021-08-20
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu, Zhen Fan, Yansen Wang, Jean Oh, Carolyn P. Rose
arXiv_CV
arXiv_CV
Recognition
OCR
Optical_Character
Pose
Scene_Text
Relation
VQA
QA
PDF
2021-08-19
Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception
Bowen Li, Weixia Zhang, Meng Tian, Guangtao Zhai, Xianpei Wang
arXiv_CV
arXiv_CV
Transfer_Learning
Recognition
Knowledge
Pose
Action_Recognition
Action
VQA
QA
PDF
2021-08-18
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei
arXiv_AI
arXiv_AI
Image_Caption
Video_Caption
Pose
Action
Deep_Learning
VQA
Caption
PDF
2021-08-16
Task-Oriented Multi-User Semantic Communications for Multimodal Data
Huiqiang Xie, Zhijin Qin, Geoffrey Ye Li
arXiv_CV
arXiv_CV
Pose
Relation
VQA
Attention
QA
PDF
2021-08-10
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi Chien, C.-C. Jay Kuo, Fabien Scalzo, Kai-Wei Chang
arXiv_AI
arXiv_AI
Transformer
Bert
Pose
VQA
Medical
Language_Model
QA
PDF
2021-08-04
Sparse Continuous Distributions and Fenchel-Young Losses
André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae
arXiv_AI
arXiv_AI
Sparse
Classification
VQA
Attention
Prediction
Matching
PDF
2021-07-27
Greedy Gradient Ensemble for Robust Visual Question Answering
Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
arXiv_CV
arXiv_CV
Pose
VQA
Attention
QA
PDF
2021-07-24
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, Nanning Zheng
arXiv_CV
arXiv_CV
Adversarial
Pose
Relation
VQA
QA
PDF
2021-07-19
Separating Skills and Concepts for Novel Visual Question Answering
Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko
arXiv_CV
arXiv_CV
Recognition
Pose
Contrastive_Learning
VQA
QA
PDF
2021-07-16
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Steven Hoi
arXiv_CV
arXiv_CV
Transformer
Represenation_Learning
Pose
Action
VQA
Attention
Inference
QA
PDF
2021-07-13
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer
arXiv_AI
arXiv_AI
Image_Caption
Zero-Shot
Pose
VQA
Caption
PDF
2021-07-13
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
Rajat Koner, Hang Li, Marcel Hildebrandt, Deepan Das, Volker Tresp, Stephan Günnemann
arXiv_CV
arXiv_CV
Reinforcement_Learning
Knowledge
Knowledge_Graph
Pose
Relation
VQA
Autonomous
QA
PDF
2021-07-12
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, Huajun Chen
arXiv_AI
arXiv_AI
Zero-Shot
Knowledge
Knowledge_Graph
Pose
Action
VQA
QA
Matching
PDF
2021-07-10
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang, Bing-Kun Bao, Changsheng Xu
arXiv_AI
arXiv_AI
Pose
Relation
VQA
Attention
QA
PDF
2021-07-07
MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
Haiwei Pan, Shuning He, Kejia Zhang, Bo Qu, Chunling Chen, Kun Shi
arXiv_CV
arXiv_CV
Pose
Classification
VQA
Attention
Medical
QA
PDF
2021-07-06
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning
arXiv_AI
arXiv_AI
Recognition
Knowledge
Classification
VQA
Recommendation
PDF
2021-07-04
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
arXiv_CV
arXiv_CV
Knowledge
Pose
Relation
VQA
Inference
PDF
2021-06-30
Variational secure cloud quantum computing
Yuta Shingu, Yuki Takeuchi, Suguru Endo, Shiro Kawabata, Shohei Watabe, Tetsuro Nikuni, Hideaki Hakoshima, Yuichiro Matsuzaki
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-06-28
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich, Felix Putze, Tanja Schultz
arXiv_CV
arXiv_CV
Knowledge
VQA
Inference
QA
PDF
2021-06-25
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S.M. Ali Eslami, Oriol Vinyals, Felix Hill
arXiv_CV
arXiv_CV
Embedding
Knowledge
VQA
Few-Shot
Caption
Language_Model
PDF
2021-06-25
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Embedding
Optimization
Knowledge
Pose
Action
Relation
VQA
Attention
Visual_Relation
QA
PDF
2021-06-25
A Picture May Be Worth a Hundred Words for Visual Question Answering
Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Pose
VQA
Language_Model
QA
PDF
2021-06-24
FOVQA: Blind Foveated Video Quality Assessment
Yize Jin, Anjul Patney, Richard Webb, Alan Bovik
arXiv_CV
arXiv_CV
VQA
Attention
QA
PDF
2021-06-24
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
arXiv_CV
arXiv_CV
Transformer
Adversarial
Pose
VQA
QA
PDF
2021-06-24
Accelerating variational quantum algorithms with multiple quantum processors
Yuxuan Du, Yang Qian, Dacheng Tao
arXiv_CV
arXiv_CV
Optimization
VQA
QA
PDF
2021-06-24
Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model
Yixuan Qiao, Hao Chen, Jun Wang, Yihao Chen, Xianbin Ye, Ziliang Li, Xianbiao Qi, Peng Gao, Guotong Xie
arXiv_CV
arXiv_CV
Represenation_Learning
Scene_Text
Face
VQA
Language_Model
Prediction
QA
PDF
2021-06-21
Normalized Gradient Descent for Variational Quantum Algorithms
Yudai Suzuki, Hiroshi Yano, Rudy Raymond, Naoki Yamamoto
arXiv_CV
arXiv_CV
Gradient_Descent
Optimization
Pose
VQA
QA
PDF
2021-06-19
VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis
Argho Sarkar, Maryam Rahnemoonfar
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-06-16
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks, Aida Nematzadeh
arXiv_CV
arXiv_CV
Transformer
Image_Retrieval
Speech
VQA
Caption
PDF
2021-06-15
Assessment of Subjective and Objective Quality of Live Streaming Sports Videos
Zaixi Shang, Joshua P. Ebenezer, Alan C. Bovik, Yongjun Wu, Hai Wei, Sriram Sethuraman
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-06-15
How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D'Amario, Tomotake Sasaki, Xavier Boix
arXiv_CV
arXiv_CV
VQA
QA
PDF
2021-06-11
NAAQA: A Neural Architecture for Acoustic Question Answering
Jerome Abdelnour, Jean Rouat, Giampiero Salvi
arXiv_CL
arXiv_CL
Pose
VQA
QA
PDF
2021-06-10
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec, Christian Wolf, Grigory Antipov, Moez Baccouche, Madiha Nadri
arXiv_CV
arXiv_CV
Bert
Regularization
Knowledge
Self-Supervised
Pose
VQA
Attention
Prediction
QA
PDF
2021-06-09
Bayesian Attention Belief Networks
Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou
arXiv_CL
arXiv_CL
Optimization
Adversarial
Pose
VQA
Attention
PDF
2021-06-08
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin, Xiang He, Jie Feng, Nasser Zalmout, Yan Liang, Li Xiong, Xin Luna Dong
arXiv_CV
arXiv_CV
Transformer
Recognition
OCR
Optical_Character
Knowledge
Knowledge_Graph
Pose
Action
VQA
PDF
2021-06-08
Check It Again: Progressive Visual Question Answering via Visual Entailment
Qingyi Si, Zheng Lin, Mingyu Zheng, Peng Fu, Weiping Wang
arXiv_CV
arXiv_CV
Pose
Action
Relation
VQA
QA
PDF
2021-06-08
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
Prediction
QA
PDF
2021-06-07
Universal noise-precision relations in variational quantum algorithms
Kosuke Ito, Wataru Mizukami, Keisuke Fujii
arXiv_CV
arXiv_CV
Pose
Relation
VQA
QA
PDF
2021-06-04
Human-Adversarial Visual Question Answering
Sasha Sheng, Amanpreet Singh, Vedanuj Goswami, Jose Alberto Lopez Magana, Wojciech Galuba, Devi Parikh, Douwe Kiela
arXiv_CV
arXiv_CV
Adversarial
VQA
QA
PDF
2021-06-04
Visual Question Rewriting for Increasing Response Rate
Jiayi Wei, Xilian Li, Yi Zhang, Xin Wang
arXiv_AI
arXiv_AI
Transformer
Emotion
VQA
PDF
2021-06-03
Grounding Complex Navigational Instructions Using Scene Graphs
Michiel de Jong, Satyapriya Krishna, Anuva Agarwal
arXiv_CV
arXiv_CV
Reinforcement_Learning
VQA
Attention
PDF
2021-06-02
Deep Learning based Full-reference and No-reference Quality Assessment Models for Compressed UGC Videos
Wei Sun, Tao Wang, Xiongkuo Min, Fuwang Yi, Guangtao Zhai
arXiv_CV
arXiv_CV
Pose
Action
Deep_Learning
VQA
CNN
QA
PDF
2021-06-01
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu
arXiv_CV
arXiv_CV
Adversarial
VQA
QA
PDF
2021-05-29
Diagnosing barren plateaus with tools from quantum optimal control
Martin Larocca, Piotr Czarnik, Kunal Sharma, Gopikrishnan Muraleedharan, Patrick J. Coles, M. Cerezo
arXiv_CV
arXiv_CV
VQA
Attention
QA
PDF
2021-05-29
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering
Zujie Liang, Haifeng Hu, Jiaying Zhu
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-05-24
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon, Hyungyung Lee, Woncheol Shin, Edward Choi
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Bert
Represenation_Learning
Pose
Classification
VQA
Attention
Medical
Caption
PDF
2021-05-24
StructuralLM: Structural Pre-training for Form Understanding
Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si
arXiv_CL
arXiv_CL
Image_Caption
Pose
Action
Classification
VQA
Image_Classification
Language_Model
PDF
2021-05-21
Variational Quantum Classifiers Through the Lens of the Hessian
Pinaki Sen, Amandeep Singh Bhatia
arXiv_CV
arXiv_CV
Gradient_Descent
Optimization
Deep_Learning
VQA
QA
PDF
2021-05-19
Multiple Meta-model Quantifying for Medical Visual Question Answering
Tuong Do, Binh X. Nguyen, Erman Tjiputra, Minh Tran, Quang D. Tran, Anh Nguyen
arXiv_CV
arXiv_CV
Transfer_Learning
Pose
VQA
Medical
QA
PDF
2021-05-18
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao, Xindi Shang, Angela Yao, Tat-Seng Chua
arXiv_AI
arXiv_AI
Video_Caption
Action
VQA
QA
PDF
2021-05-16
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
Kazuya Ueki
arXiv_CV
arXiv_CV
Image_Caption
Embedding
Zero-Shot
Image_Retrieval
Survey
VQA
Caption
QA
Matching
PDF
2021-05-15
Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention
Nihar Bendre, Kevin Desai, Peyman Najafirad
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
Attention
Autonomous
QA
PDF
2021-05-12
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh, Guan Pang, Mandy Toh, Jing Huang, Wojciech Galuba, Tal Hassner
arXiv_CV
arXiv_CV
Recognition
OCR
Optical_Character
Pose
Scene_Text
Detection
VQA
QA
PDF
2021-05-11
GANs for Medical Image Synthesis: An Empirical Study
Youssef Skandarani, Pierre-Marc Jodoin, Alain Lalande
arXiv_CV
arXiv_CV
Segmentation
Adversarial
VQA
GAN
Medical
PDF
2021-05-11
Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs
Magdalena Kaiser, Rishiraj Saha Roy, Gerhard Weikum
arXiv_CL
arXiv_CL
Reinforcement_Learning
Knowledge
Knowledge_Graph
Action
VQA
QA
PDF
2021-05-11
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
arXiv_CV
arXiv_CV
Weakly_Supervised
Pose
Detection
VQA
Object_Detection
Attention
QA
PDF
2021-05-11
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang, Yishu Miao, Lucia Specia
arXiv_CV
arXiv_CV
Pose
Relation
VQA
PDF
2021-05-05
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention
Wei Suo, Mengyang Sun, Peng Wang, Qi Wu
arXiv_CV
arXiv_CV
Transformer
VQA
Attention
PDF
2021-05-05
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto Del Bimbo
arXiv_CV
arXiv_CV
Sparse
Classification
VQA
Prediction
QA
PDF
2021-05-02
A survey on VQA: Datasets and Approaches
Yeyun Zou, Qiyu Xie
arXiv_AI
arXiv_AI
Review
Pose
Survey
VQA
QA
PDF
2021-04-30
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu
arXiv_CV
arXiv_CV
Transformer
Bert
VQA
Attention
QA
PDF
2021-04-29
Optimal training of variational quantum algorithms without barren plateaus
Tobias Haug, M.S. Kim
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2021-04-29
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao, Xingjian Zhen, Karen Hovsepian, Mingwei Shen
arXiv_AI
arXiv_AI
Pose
Quantitative
Deep_Learning
VQA
Prediction
QA
PDF
2021-04-26
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion
arXiv_CV
arXiv_CV
Transformer
Segmentation
Pose
Detection
VQA
Few-Shot
Object_Detection
Caption
QA
PDF
2021-04-26
InfographicVQA
Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C.V. Jawahar
arXiv_CV
arXiv_CV
VQA
QA
PDF
2021-04-24
RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory
Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny
arXiv_AI
arXiv_AI
Image_Caption
Transformer
Recognition
Pose
Detection
Relation
VQA
Visual_Relation
Caption
Prediction
QA
PDF
2021-04-23
Playing Lottery Tickets with Vision and Language
Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu
arXiv_CV
arXiv_CV
Transformer
Bert
Adversarial
VQA
QA
Matching
PDF
2021-04-20
GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering
Weixin Liang, Yanhao Jiang, Zixuan Liu
arXiv_CV
arXiv_CV
Pose
Relation
VQA
QA
PDF
2021-04-20
An efficient measure for the expressivity of variational quantum algorithms
Yuxuan Du, Zhuozhuo Tu, Xiao Yuan, Dacheng Tao
arXiv_CV
arXiv_CV
Quantitative
VQA
QA
PDF
2021-04-16
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur, Natalia Neverova, Chris Stauffer, Ser-Nam Lim, Douwe Kiela, Austin Reiter
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Embedding
Knowledge
Classification
VQA
Caption
Inference
QA
PDF
2021-04-16
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le, Nancy F. Chen, Steven C.H. Hoi
arXiv_AI
arXiv_AI
Pose
Action
VQA
QA
PDF
2021-04-14
Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention
Leon Bergen, Dzmitry Bahdanau, Timothy J. O'Donnell
arXiv_CL
arXiv_CL
VQA
Attention
PDF
2021-04-13
Neuro-Symbolic VQA: A review from the perspective of AGI desiderata
Ian Berlot-Attwell
arXiv_CV
arXiv_CV
Review
VQA
QA
PDF
2021-04-13
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, Chitta Baral
arXiv_CV
arXiv_CV
Pose
Action
VQA
Language_Model
QA
PDF
2021-04-13
Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
Jae Won Cho, Dong-Jin Kim, Jinsoo Choi, Yunjae Jung, In So Kweon
arXiv_CV
arXiv_CV
Knowledge
VQA
Prediction
QA
PDF
2021-04-08
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec, Theo Jaunet, Grigory Antipov, Moez Baccouche, Romain Vuillemot, Christian Wolf
arXiv_CV
arXiv_CV
Transformer
Pose
VQA
Attention
QA
PDF
2021-04-07
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette, Remi Cadene, Damien Teney, Matthieu Cord
arXiv_AI
arXiv_AI
Face
VQA
QA
PDF
2021-04-07
Multimodal Continuous Visual Attention Mechanisms
António Farinhas, André F. T. Martins, Pedro M. Q. Aguiar
arXiv_CV
arXiv_CV
Pose
VQA
Attention
QA
PDF
2021-04-05
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu
arXiv_AI
arXiv_AI
Image_Caption
Transformer
Knowledge
Pose
Detection
VQA
Object_Detection
Attention
Caption
QA
PDF
2021-04-03
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA
Yash Khare, Viraj Bagal, Minesh Mathew, Adithi Devi, U Deva Priyakumar, CV Jawahar
arXiv_CV
arXiv_CV
Transformer
Bert
Self-Supervised
Pose
VQA
Attention
Medical
Caption
Language_Model
QA
PDF
2021-04-02
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet, Corentin Kervadec, Romain Vuillemot, Grigory Antipov, Moez Baccouche, Christian Wolf
arXiv_CV
arXiv_CV
Transformer
Pose
Deep_Learning
VQA
Attention
Prediction
QA
PDF
2021-04-01
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu
arXiv_CV
arXiv_CV
Embedding
Represenation_Learning
Pose
VQA
Caption
Language_Model
Matching
PDF
2021-04-01
An Investigation of Critical Issues in Bias Mitigation Techniques
Robik Shrestha, Kushal Kafle, Christopher Kanan
arXiv_AI
arXiv_AI
Knowledge
Deep_Learning
VQA
QA
PDF
2021-03-31
Analysis on Image Set Visual Question Answering
Abhinav Khattar, Aviral Joshi, Har Simrat Singh, Pulkit Goel, Rohit Prakash Barnwal
arXiv_CV
arXiv_CV
Bert
Regularization
Adversarial
VQA
QA
PDF
2021-03-31
Classically optimal variational quantum algorithms
Jonathan Wurtz, Peter Love
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2021-03-29
Domain-robust VQA with diverse datasets and methods but no target labels
Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa
arXiv_CV
arXiv_CV
Transformer
Unsupervised
Recognition
Optimization
Classification
VQA
QA
PDF
2021-03-29
Variational quantum algorithms to estimate rank, quantum entropies, fidelity and Fisher information via purity minimization
Kok Chuan Tan, Tyler Volkoff
arXiv_CV
arXiv_CV
Pose
Action
VQA
QA
PDF
2021-03-29
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Hila Chefer, Shir Gur, Lior Wolf
arXiv_CV
arXiv_CV
Transformer
Segmentation
Pose
Detection
VQA
Object_Detection
Attention
Prediction
PDF
2021-03-28
Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning
Andrea Skolik, Sofiene Jerbi, Vedran Dunjko
arXiv_CV
arXiv_CV
Unsupervised
Reinforcement_Learning
Optimization
VQA
QA
PDF
2021-03-28
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering Tasks
Man Luo, Shailaja Keyur Sampat, Riley Tallman, Yankai Zeng, Manuha Vancha, Akarshan Sajja, Chitta Baral
arXiv_CL
arXiv_CL
Pose
VQA
Language_Model
QA
PDF
2021-03-28
Encoding strongly-correlated many-boson wavefunctions on a photonic quantum computer: application to the attractive Bose-Hubbard model
Saad Yalouz, Bruno Senjean, Vedran Dunjko
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-03-26
Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness
Arijit Ray, Michael Cogswell, Xiao Lin, Kamran Alipour, Ajay Divakaran, Yi Yao, Giedrius Burachas
arXiv_CV
arXiv_CV
Pose
VQA
Attention
QA
PDF
2021-03-26
On the hidden treasure of dialog in video question answering
Deniz Engin, Yannis Avrithis, Ngoc Q. K. Duong, François Schnitzler
arXiv_CV
arXiv_CV
Transformer
Video_Caption
Knowledge
VQA
Attention
Summarization
QA
PDF
2021-03-25
Visual Grounding Strategies for Text-Only Natural Language Processing
Damien Sileo
arXiv_CL
arXiv_CL
Bert
Image_Retrieval
Pose
Classification
VQA
Language_Model
PDF
2021-03-24
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim, Sunjae Yoon, Dahyun Kim, Chang D. Yoo
arXiv_CV
arXiv_CV
Pose
VQA
Attention
QA
PDF
2021-03-23
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
arXiv_CV
arXiv_CV
Knowledge
Pose
VQA
QA
PDF
2021-03-22
How to Design Sample and Computationally Efficient VQA Models
Karan Samel, Zelin Zhao, Binghong Chen, Kuan Wang, Robin Luo, Le Song
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-03-17
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton, Gabriel Stanovsky, Roy Schwartz, Michael Elhadad
arXiv_CV
arXiv_CV
Image_Caption
Relation
VQA
QA
PDF
2021-03-17
Scene Graphs: A Survey of Generations and Applications
Xiaojun Chang, Pengzhen Ren, Pengfei Xu, Zhihui Li, Xiaojiang Chen, Alex Hauptmann
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Survey
Detection
Relation
VQA
Attention
Visual_Relation
Caption
QA
PDF
2021-03-16
Variational Quantum Algorithms for Euclidean Discrepancy and Covariate-Balancing
Jiří Lebl, Asif Shakeel
arXiv_CV
arXiv_CV
Optimization
VQA
QA
PDF
2021-03-13
VMAF And Variants: Towards A Unified VQA
Pankaj Topiwala, Wei Dai, Jiangfeng Pian
arXiv_CV
arXiv_CV
VQA
QA
PDF
2021-03-11
Neural Predictor based Quantum Architecture Search
Shi-Xin Zhang, Chang-Yu Hsieh, Shengyu Zhang, Hong Yao
arXiv_CV
arXiv_CV
NAS
Pose
Deep_Learning
VQA
QA
PDF
2021-03-10
RL-CSDia: Representation Learning of Computer Science Diagrams
Shaowei Wang, LingLing Zhang, Xuan Luo, Yi Yang, Xin Hu, Jun Liu
arXiv_AI
arXiv_AI
Represenation_Learning
Knowledge
Pose
Classification
Relation
VQA
PDF
2021-03-09
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Aman Jain, Mayank Kothyari, Vishwajeet Kumar, Preethi Jyothi, Ganesh Ramakrishnan, Soumen Chakrabarti
arXiv_CV
arXiv_CV
OCR
Knowledge
Knowledge_Graph
Action
Quantitative
VQA
QA
PDF
2021-03-06
Contextual Dropout: An Efficient Sample-Dependent Dropout Module
Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, Xiaoning Qian, Mingyuan Zhou
arXiv_CV
arXiv_CV
Pose
Classification
VQA
Image_Classification
Prediction
QA
PDF
2021-03-04
Visual Question Answering: which investigated applications?
Silvio Barra, Carmen Bisogni, Maria De Marsico, Stefano Ricciardi
arXiv_CV
arXiv_CV
Image_Caption
Pose
Survey
VQA
Summarization
Caption
QA
PDF
2021-03-03
Weakly-Supervised Open-Retrieval Conversational Question Answering
Chen Qu, Liu Yang, Cen Chen, W. Bruce Croft, Kalpesh Krishna, Mohit Iyyer
arXiv_CL
arXiv_CL
VQA
QA
PDF
2021-03-01
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le, Nancy F. Chen, Steven C.H. Hoi
arXiv_AI
arXiv_AI
Pose
VQA
Prediction
PDF
2021-02-21
Learning Compositional Representation for Few-shot Visual Question Answering
Dalu Guo, Dacheng Tao
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
Few-Shot
GAN
QA
PDF
2021-02-18
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka
arXiv_CL
arXiv_CL
Transformer
VQA
Attention
QA
PDF
2021-02-18
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Bo Liu, Li-Ming Zhan, Li Xu, Lin Ma, Yan Yang, Xiao-Ming Wu
arXiv_AI
arXiv_AI
Knowledge
VQA
Medical
QA
PDF
2021-02-17
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
arXiv_CV
arXiv_CV
Image_Caption
Recognition
Quantitative
VQA
Caption
PDF
2021-02-17
Enhanced Magnetic Resonance Image Synthesis with Contrast-Aware Generative Adversarial Networks
Jonas Denck, Jens Guehring, Andreas Maier, Eva Rothgang
arXiv_CV
arXiv_CV
Review
Adversarial
Deep_Learning
VQA
GAN
PDF
2021-02-04
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal
arXiv_AI
arXiv_AI
Image_Caption
Pose
VQA
Text_Generation
Caption
Language_Model
PDF
2021-02-03
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Yibing Liu, Yangyang Guo, Jianhua Yin, Xuemeng Song, Weifeng Liu, Liqiang Nie
arXiv_CV
arXiv_CV
Regularization
Pose
VQA
Attention
Prediction
QA
PDF
2021-01-31
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games
Alessandro Suglia, Yonatan Bisk, Ioannis Konstas, Antonio Vergari, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-01-30
A Quantum Interpretation of Bunched Logic for Quantum Separation Logic
Li Zhou, Gilles Barthe, Justin Hsu, Mingsheng Ying, Nengkun Yu
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2021-01-27
VisualMRC: Machine Reading Comprehension on Document Images
Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
arXiv_CV
arXiv_CV
VQA
QA
PDF
2021-01-25
Unanswerable Questions about Images and Texts
Ernest Davis
arXiv_AI
arXiv_AI
VQA
QA
PDF
2021-01-22
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee
arXiv_CV
arXiv_CV
Embedding
Pose
Quantitative
Relation
VQA
Attention
Prediction
QA
PDF
2021-01-18
Training variational quantum algorithms is NP-hard -- even for logarithmically many qubits and free fermionic systems
Lennart Bittel, Martin Kliesch
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2021-01-17
Understanding in Artificial Intelligence
Stefan Maetschke, David Martinez Iraola, Pieter Barnard, Elaheh ShafieiBavani, Peter Zhong, Ying Xu, Antonio Jimeno Yepes
arXiv_AI
arXiv_AI
Review
Deep_Learning
VQA
PDF
2021-01-16
Latent Variable Models for Visual Question Answering
Zixu Wang, Yishu Miao, Lucia Specia
arXiv_AI
arXiv_AI
Pose
VQA
Attention
Caption
Inference
QA
PDF
2021-01-15
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel
arXiv_CV
arXiv_CV
Transformer
Embedding
Knowledge
Pose
Relation
VQA
Language_Model
QA
Matching
PDF
2021-01-15
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel, Ratnam Parikh, Yesha Shastri
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Review
Survey
VQA
Caption
QA
PDF
2021-01-14
Understanding the Role of Scene Graphs in Visual Question Answering
Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy, Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu
arXiv_CV
arXiv_CV
Image_Caption
Pose
Face
VQA
QA
PDF
2021-01-12
Predicting Relative Depth between Objects from Semantic Features
Stefan Cassar, Adrian Muscat, Dylan Seychell
arXiv_AI
arXiv_AI
Recognition
3D
Classification
Detection
Relation
VQA
Visual_Relation
PDF
2021-01-05
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su, Chen-Hsi Chang, Po-Wei Shen, Yu-Siang Wang, Ya-Liang Chang, Yu-Cheng Chang, Pu-Jen Cheng, Winston H. Hsu
arXiv_CV
arXiv_CV
Transfer_Learning
Pose
VQA
Caption
QA
PDF
2020-12-31
Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings
Kiran Ramnath, Mark Hasegawa-Johnson
arXiv_CL
arXiv_CL
Image_Caption
Embedding
Knowledge
Knowledge_Graph
Pose
VQA
Attention
Inference
QA
PDF
2020-12-29
Detecting Hate Speech in Multi-modal Memes
Abhishek Das, Japsimar Singh Wahi, Siyao Li
arXiv_CV
arXiv_CV
Image_Caption
Speech
Pose
Face
Classification
Detection
Sentiment
VQA
Object_Detection
Caption
Prediction
PDF
2020-12-29
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
arXiv_CL
arXiv_CL
Transformer
Action
Relation
VQA
Attention
Language_Model
QA
Matching
PDF
2020-12-21
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan
arXiv_AI
arXiv_AI
Image_Caption
Relation
VQA
Attention
QA
PDF
2020-12-21
Learning content and context with language bias for Visual Question Answering
Chao Yang, Su Feng, Dongsheng Li, Huawei Shen, Guoqing Wang, Bin Jiang
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-12-20
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
arXiv_CV
arXiv_CV
Transformer
Embedding
Unsupervised
Knowledge
Knowledge_Graph
VQA
Prediction
QA
PDF
2020-12-18
Trying Bilinear Pooling in Video-QA
Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed
arXiv_CV
arXiv_CV
Pose
Action
VQA
Attention
QA
PDF
2020-12-18
On Modality Bias in the TVQA Dataset
Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed
arXiv_AI
arXiv_AI
Embedding
Bert
Pose
VQA
QA
PDF
2020-12-17
Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
Xi Zhu, Zhendong Mao, Chunxiao Liu, Peng Zhang, Bin Wang, Yongdong Zhang
arXiv_CV
arXiv_CV
Self-Supervised
Pose
VQA
QA
PDF
2020-12-16
Variational Quantum Algorithms
M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, Patrick J. Coles
arXiv_CV
arXiv_CV
Review
Pose
VQA
QA
PDF
2020-12-14
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin
arXiv_CV
arXiv_CV
Embedding
Knowledge
Pose
Relation
VQA
QA
PDF
2020-12-13
Variational Quantum algorithm for Poisson equation
Hailing Liu, Yusen Wu, Linchun Wan, Shijie Pan, Sujuan Qin, Fei Gao, Qiaoyan Wen
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-12-09
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Recognition
OCR
Optical_Character
VQA
Attention
Caption
QA
PDF
2020-12-08
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
arXiv_CV
arXiv_CV
Image_Caption
OCR
Represenation_Learning
Pose
Scene_Text
Relation
VQA
Caption
Language_Model
Prediction
QA
Matching
PDF
2020-12-08
Study on the Assessment of the Quality of Experience of Streaming Video
Aleksandr Ivchenko, Pavel Kononyuk, Alexander Dvorkovich, Liubov Antiufrieva
arXiv_CV
arXiv_CV
Pose
Relation
VQA
QA
PDF
2020-12-08
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates, Muhammed Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret
arXiv_AI
arXiv_AI
Action
Deep_Learning
VQA
PDF
2020-12-05
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding
Maryam Rahnemoonfar, Tashnim Chowdhury, Argho Sarkar, Debvrat Varshney, Masoud Yari, Robin Murphy
arXiv_CV
arXiv_CV
Segmentation
Semantic_Segmentation
Pose
Classification
Deep_Learning
Detection
VQA
Object_Detection
Image_Classification
PDF
2020-12-04
Self-Supervised VQA: Answering Visual Questions using Images and Captions
Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
arXiv_CV
arXiv_CV
Self-Supervised
Pose
Detection
VQA
Object_Detection
Caption
QA
PDF
2020-12-04
Understanding Guided Image Captioning Performance across Domains
Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut
arXiv_CV
arXiv_CV
Image_Caption
Transformer
VQA
Caption
QA
PDF
2020-12-01
Open-Ended Multi-Modal Relational Reason for Video Question Answering
Haozheng Luo, Ruiyang Qin
arXiv_AI
arXiv_AI
Action
Relation
VQA
QA
PDF
2020-12-01
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
arXiv_CV
arXiv_CV
Transformer
Embedding
Zero-Shot
Pose
VQA
Activity
QA
PDF
2020-11-27
Point and Ask: Incorporating Pointing into Visual Question Answering
Arjun Mani, Will Hinthorn, Nobline Yoo, Olga Russakovsky
arXiv_CV
arXiv_CV
Recognition
Gesture
VQA
QA
PDF
2020-11-27
Patch-VQ: 'Patching Up' the Video Quality Problem
Zhenqiang Ying (1), Maniratnam Mandal (1), Deepti Ghadiyaram (2), Alan Bovik (1) ((1) University of Texas at Austin, (2) Facebook AI)
arXiv_CV
arXiv_CV
Review
VQA
Prediction
QA
PDF
2020-11-26
Learning from Lexical Perturbations for Consistent Visual Question Answering
Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko
arXiv_CV
arXiv_CV
Regularization
Pose
VQA
QA
PDF
2020-11-26
Transformation Driven Visual Reasoning
Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng
arXiv_AI
arXiv_AI
Pose
Relation
VQA
QA
PDF
2020-11-23
Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention
Varnith Chordia, Vijay Kumar BG
arXiv_AI
arXiv_AI
Transformer
Embedding
Classification
Relation
VQA
Attention
QA
Recommendation
PDF
2020-11-23
Siamese Tracking with Lingual Object Constraints
Maximilian Filtenborg, Efstratios Gavves, Deepak Gupta
arXiv_CV
arXiv_CV
Surveillance
Tracking
Object_Tracking
Pose
VQA
Summarization
PDF
2020-11-21
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering
Weixin Liang, Feiyang Niu, Aishwarya Reganti, Govind Thattai, Gokhan Tur
arXiv_AI
arXiv_AI
Pose
Quantitative
Relation
VQA
Prediction
QA
PDF
2020-11-19
Logically Consistent Loss for Visual Question Answering
Anh-Cat Le-Ngo, Truyen Tran, Santu Rana, Sunil Gupta, Svetha Venkatesh
arXiv_AI
arXiv_AI
Knowledge
Pose
VQA
GAN
QA
PDF
2020-11-17
Generating Natural Questions from Images for Multimodal Assistants
Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams
arXiv_AI
arXiv_AI
Knowledge
VQA
QA
PDF
2020-11-16
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha, Gurneet Arora, Navpreet Kaloty
arXiv_AI
arXiv_AI
Recognition
Video_Caption
Knowledge
Speech
Pose
Relation
VQA
Attention
Caption
Activity
QA
PDF
2020-11-11
End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks
Alice Xue
arXiv_CV
arXiv_CV
Sketch
Adversarial
Pose
VQA
GAN
PDF
2020-11-09
CapWAP: Captioning with a Purpose
Adam Fisch, Kenton Lee, Ming-Wei Chang, Jonathan H. Clark, Regina Barzilay
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Pose
VQA
Caption
Inference
QA
PDF
2020-11-09
Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training
Dingquan Li, Tingting Jiang, Ming Jiang
arXiv_CV
arXiv_CV
Pose
Face
VQA
QA
PDF
2020-11-07
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
Christopher Clark, Mark Yatskar, Luke Zettlemoyer
arXiv_CV
arXiv_CV
Recognition
Pose
Relation
VQA
PDF
2020-11-06
Disentangling 3D Prototypical Networks For Few-Shot Concept Learning
Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam W Harley, Katerina Fragkiadaki
arXiv_CV
arXiv_CV
3D
Self-Supervised
Pose
Classification
Detection
Relation
VQA
Few-Shot
Object_Detection
PDF
2020-11-04
An Improved Attention for Visual Question Answering
Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini
arXiv_CV
arXiv_CV
Pose
Relation
VQA
Attention
QA
PDF
2020-11-02
Reasoning Over History: Context Aware Visual Dialog
Muhammad A. Shah, Shikib Mehri, Tejas Srinivasan
arXiv_CL
arXiv_CL
Pose
VQA
Attention
QA
PDF
2020-10-30
Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View
Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian
arXiv_CV
arXiv_CV
Recognition
Sparse
Pose
Face
Classification
VQA
Face_Recognition
Image_Classification
Prediction
QA
PDF
2020-10-28
Leveraging Visual Question Answering to Improve Text-to-Image Synthesis
Stanislav Frolov, Shailza Jolly, Jörn Hees, Andreas Dengel
arXiv_CV
arXiv_CV
Image_Generation
Pose
Face
VQA
QA
PDF
2020-10-27
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
arXiv_CV
arXiv_CV
Transformer
Bert
Knowledge
Pose
VQA
QA
PDF
2020-10-26
ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction
Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
VQA
Prediction
QA
PDF
2020-10-24
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zan-Xia Jin, Heran Wu, Chun Yang, Fang Zhou, Jingyan Qin, Lei Xiao, Xu-Cheng Yin
arXiv_AI
arXiv_AI
Recognition
OCR
Optical_Character
Pose
Relation
VQA
QA
Matching
PDF
2020-10-24
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua, Sai Srinivas Kancheti, Vineeth N Balasubramanian
arXiv_AI
arXiv_AI
Pose
Quantitative
Classification
VQA
QA
PDF
2020-10-21
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan
arXiv_CV
arXiv_CV
Regularization
Pose
Classification
VQA
QA
PDF
2020-10-20
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju
arXiv_AI
arXiv_AI
Pose
VQA
QA
PDF
2020-10-18
Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
arXiv_AI
arXiv_AI
Pose
Relation
VQA
QA
PDF
2020-10-17
Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering
Hantao Huang, Tao Han, Wei Han, Deep Yap, Cheng-Ming Chiang
arXiv_CV
arXiv_CV
Image_Caption
Bert
Pose
Relation
VQA
Attention
QA
PDF
2020-10-16
Measurement Error Mitigation for Variational Quantum Algorithms
George S. Barron, Christopher J. Wood
arXiv_CV
arXiv_CV
Relation
VQA
QA
PDF
2020-10-16
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen, Weiping Wang, Li Liu, Michael S. Lew
arXiv_CV
arXiv_CV
Image_Caption
Embedding
Review
Adversarial
Survey
Deep_Learning
VQA
Caption
PDF
2020-10-15
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Recognition
VQA
Attention
Language_Model
PDF
2020-10-13
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Jack Hessel, Lillian Lee
arXiv_CV
arXiv_CV
Text_Classification
Pose
Action
Classification
VQA
Prediction
PDF
2020-10-13
Contrast and Classify: Alternate Training for Robust VQA
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
arXiv_CV
arXiv_CV
Adversarial
Pose
Contrastive_Learning
Classification
VQA
QA
PDF
2020-10-10
Interpretable Neural Computation for Real-World Compositional Visual Question Answering
Ruixue Tang, Chao Ma
arXiv_CV
arXiv_CV
VQA
Attention
QA
PDF
2020-10-08
Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset
Zhanwen Chen, Shiyao Li, Roxanne Rashedi, Xiaoman Zi, Morgan Elrod-Erickson, Bryan Hollis, Angela Maliakal, Xinyu Shen, Simeng Zhao, Maithilee Kunda
arXiv_CV
arXiv_CV
VQA
QA
PDF
2020-10-06
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han, Hantao Huang, Tao Han
arXiv_CV
arXiv_CV
Recognition
OCR
Optical_Character
Pose
VQA
Prediction
QA
PDF
2020-10-06
Pathological Visual Question Answering
Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
arXiv_CV
arXiv_CV
Knowledge
Self-Supervised
Pose
VQA
Medical
QA
PDF
2020-10-05
Attention Guided Semantic Relationship Parsing for Visual Question Answering
Moshiur Farazi, Salman Khan, Nick Barnes
arXiv_AI
arXiv_AI
Pose
Action
Relation
VQA
Attention
QA
PDF
2020-10-02
CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns
Leonardo Anjoletto Ferreira, Douglas De Rizzo Meneghetti, Paulo Eduardo Santos
arXiv_CV
arXiv_CV
Image_Caption
Pose
Deep_Learning
Detection
VQA
Object_Detection
Caption
PDF
2020-10-01
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
Jose Manuel Gomez-Perez, Raul Ortega
arXiv_AI
arXiv_AI
Transformer
Relation
VQA
Attention
Language_Model
QA
PDF
2020-09-30
Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network
Yuxuan Wu, Hideki Nakayama
arXiv_AI
arXiv_AI
Reinforcement_Learning
Pose
VQA
QA
PDF
2020-09-29
Spatial Attention as an Interface for Image Captioning Models
Philipp Sadler
arXiv_CV
arXiv_CV
Image_Caption
Face
Action
Deep_Learning
VQA
Attention
Caption
PDF
2020-09-27
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering
Deepak Gupta, Swati Suman, Asif Ekbal
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
Medical
Prediction
QA
PDF
2020-09-23
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
arXiv_AI
arXiv_AI
Image_Caption
Transformer
Bert
VQA
Caption
Language_Model
PDF
2020-09-23
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
Tuong Do, Binh X. Nguyen, Huy Tran, Erman Tjiputra, Quang D. Tran, Thanh-Toan Do
arXiv_CV
arXiv_CV
Knowledge
Pose
Action
VQA
QA
PDF
2020-09-21
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
Doyup Lee, Yeongjae Cheon, Wook-Shin Han
arXiv_CV
arXiv_CV
Regularization
Pose
Detection
VQA
Attention
QA
PDF
2020-09-18
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
arXiv_CV
arXiv_CV
Knowledge
Pose
Relation
VQA
QA
PDF
2020-09-17
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi, Aftab Alam, Muhammad Numan Khan, Jawad Khan, Young-Koo Lee
arXiv_AI
arXiv_AI
Image_Caption
Bert
Knowledge
Pose
Survey
Classification
VQA
Attention
Caption
QA
PDF
2020-09-17
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA
Seonhoon Kim, Seohyeong Jeong, Eunbyul Kim, Inho Kang, Nojun Kwak
arXiv_CL
arXiv_CL
Represenation_Learning
Self-Supervised
Pose
Contrastive_Learning
VQA
Attention
QA
PDF
2020-09-14
From pulses to circuits and back again: A quantum optimal control perspective on variational quantum algorithms
Alicia B. Magann, Christian Arenz, Matthew D. Grace, Tak-San Ho, Robert L. Kosut, Jarrod R. McClean, Herschel A. Rabitz, Mohan Sarovar
arXiv_CV
arXiv_CV
Optimization
Pose
Action
VQA
QA
PDF
2020-09-07
Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
Wei Zhou, Zhibo Chen
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
CNN
QA
PDF
2020-09-03
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li, Hanyin Wang, Yuan Luo
arXiv_CV
arXiv_CV
Embedding
Bert
Represenation_Learning
RNN
Classification
VQA
Attention
Medical
Language_Model
PDF
2020-08-31
Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering
Jing Yu, Zihao Zhu, Yujing Wang, Weifeng Zhang, Yue Hu, Jianlong Tan
arXiv_AI
arXiv_AI
Knowledge
Knowledge_Graph
Pose
VQA
QA
PDF
2020-08-28
A Dataset and Baselines for Visual Question Answering on Art
Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui Chu, Yuta Nakashima, Teruko Mitamura
arXiv_CV
arXiv_CV
Knowledge
VQA
QA
PDF
2020-08-27
Visual Question Answering on Image Sets
Ankan Bansal, Yuting Zhang, Rama Chellappa
arXiv_CV
arXiv_CV
Relation
VQA
QA
PDF
2020-08-22
Data augmentation techniques for the Video Question Answering task
Alex Falcon, Oswald Lanz, Giuseppe Serra
arXiv_CV
arXiv_CV
Pose
Action
VQA
QA
PDF
2020-08-20
Document Visual Question Answering Challenge 2020
Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C.V. Jawahar
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
GAN
PDF
2020-08-18
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
Gouthaman KV, Athira Nambiar, Kancheti Sai Srinivas, Anurag Mittal
arXiv_CV
arXiv_CV
Image_Caption
Pose
Detection
Relation
VQA
Object_Detection
Attention
Caption
Language_Model
QA
PDF
2020-08-16
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang, Tan Jiang, Tan Wang, Kun Kuang, Zhou Zhao, Jianke Zhu, Jin Yu, Hongxia Yang, Fei Wu
arXiv_CV
arXiv_CV
Bert
Zero-Shot
Image_Retrieval
Pose
Quantitative
Relation
VQA
PDF
2020-08-15
Graph Edit Distance Reward: Learning to Edit Scene Graph
Lichang Chen, Guosheng Lin, Shijie Wang, Qingyao Wu
arXiv_CV
arXiv_CV
Image_Retrieval
Pose
VQA
QA
Matching
PDF
2020-08-08
Assisting Scene Graph Generation with Self-Supervision
Sandeep Inuganti, Vineeth N Balasubramanian
arXiv_CV
arXiv_CV
Image_Caption
Pose
Action
Detection
Relation
VQA
Object_Detection
Caption
PDF
2020-08-03
Quantum-optimal-control-inspired ansatz for variational quantum algorithms
Alexandre Choquette, Agustin Di Paolo, Panagiotis Kl. Barkoutsos, David Sénéchal, Ivano Tavernelli, Alexandre Blais
arXiv_CV
arXiv_CV
VQA
QA
PDF
2020-07-31
No-Reference Video Quality Assessment Using Space-Time Chips
Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Alan C. Bovik
arXiv_CV
arXiv_CV
Pose
Relation
VQA
QA
PDF
2020-07-28
Noise-Induced Barren Plateaus in Variational Quantum Algorithms
Samson Wang, Enrico Fontana, M. Cerezo, Kunal Sharma, Akira Sone, Lukasz Cincio, Patrick J. Coles
arXiv_CV
arXiv_CV
VQA
QA
PDF
2020-07-27
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon
arXiv_AI
arXiv_AI
Pose
Action
Quantitative
Relation
VQA
Attention
QA
PDF
2020-07-26
Contrastive Visual-Linguistic Pretraining
Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su
arXiv_CV
arXiv_CV
Bert
Represenation_Learning
Self-Supervised
Pose
Contrastive_Learning
Classification
VQA
QA
PDF
2020-07-24
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra
arXiv_AI
arXiv_AI
VQA
QA
PDF
2020-07-23
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
arXiv_CV
arXiv_CV
Transformer
OCR
Pose
Relation
VQA
Attention
QA
PDF
2020-07-19
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang
arXiv_CV
arXiv_CV
Adversarial
VQA
QA
PDF
2020-07-17
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan, Daqing Liu, Meng Wang, Zheng-Jun Zha
arXiv_CV
arXiv_CV
Video_Caption
Pose
VQA
Caption
PDF
2020-07-17
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
Noa Garcia, Yuta Nakashima
arXiv_CV
arXiv_CV
Transformer
Unsupervised
Weakly_Supervised
Knowledge
Action
VQA
QA
PDF
2020-07-13
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
Gouthaman KV, Anurag Mittal
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-07-12
Applying recent advances in Visual Question Answering to Record Linkage
Marko Smilevski
arXiv_AI
arXiv_AI
Pose
Deep_Learning
Detection
VQA
Attention
CNN
Matching
PDF
2020-07-10
Image Captioning with Compositional Neural Module Networks
Junjiao Tian, Jean Oh
arXiv_CV
arXiv_CV
Image_Caption
Pose
VQA
Caption
PDF
2020-07-08
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
Attention
QA
PDF
2020-07-04
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim, Minuk Ma, Trung Pham, Kyungsu Kim, Chang D. Yoo
arXiv_CV
arXiv_CV
Pose
VQA
Attention
Prediction
QA
PDF
2020-07-03
Eliminating Catastrophic Interference with Biased Competition
Amelia Elizabeth Pollard, Jonathan L. Shapiro
arXiv_CV
arXiv_CV
Pose
Action
VQA
Attention
QA
PDF
2020-07-03
Visual Question Answering as a Multi-Task Problem
Amelia Elizabeth Pollard, Jonathan L. Shapiro
arXiv_AI
arXiv_AI
VQA
QA
PDF
2020-07-02
Scene Graph Reasoning for Visual Question Answering
Marcel Hildebrandt, Hang Li, Rajat Koner, Volker Tresp, Stephan Günnemann
arXiv_CV
arXiv_CV
Pose
Relation
VQA
Autonomous
QA
PDF
2020-07-02
The Impact of Explanations on AI Competency Prediction in VQA
Kamran Alipour, Arijit Ray, Xiao Lin, Jurgen P. Schulze, Yi Yao, Giedrius T. Burachas
arXiv_AI
arXiv_AI
Bert
Pose
Relation
VQA
Language_Model
Prediction
QA
PDF
2020-07-01
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew, Dimosthenis Karatzas, R. Manmatha, C.V. Jawahar
arXiv_CV
arXiv_CV
VQA
QA
PDF
2020-06-30
Ontology-guided Semantic Composition for Zero-Shot Learning
Jiaoyan Chen, Freddy Lecue, Yuxia Geng, Jeff Z. Pan, Huajun Chen
arXiv_AI
arXiv_AI
Embedding
Zero-Shot
Pose
Ontology
Classification
Relation
VQA
Image_Classification
PDF
2020-06-28
Improving VQA and its Explanations by Comparing Competing Explanations
Jialin Wu, Liyan Chen, Raymond J. Mooney
arXiv_CV
arXiv_CV
Knowledge
VQA
QA
PDF
2020-06-26
Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Pose
VQA
Attention
Summarization
Caption
Matching
PDF
2020-06-25
Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering
Chiranjib Sur
arXiv_CV
arXiv_CV
Transformer
Enhancement
VQA
Attention
QA
PDF
2020-06-20
Neuro-Symbolic Visual Reasoning: Disentangling 'Visual' from 'Reasoning'
Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
arXiv_AI
arXiv_AI
Represenation_Learning
Pose
VQA
QA
PDF
2020-06-16
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu
arXiv_AI
arXiv_AI
Knowledge
Pose
VQA
CNN
QA
PDF
2020-06-15
ORD: Object Relationship Discovery for Visual Dialogue Generation
Ziwei Wang, Zi Huang, Yadan Luo, Huimin Lu
arXiv_CV
arXiv_CV
Image_Caption
Embedding
Pose
Action
Relation
VQA
Attention
Visual_Relation
Caption
CNN
PDF
2020-06-12
Sparse and Continuous Attention Mechanisms
André F. T. Martins, Marcos Treviso, António Farinhas, Vlad Niculae, Mário A. T. Figueiredo, Pedro M. Q. Aguiar
arXiv_CV
arXiv_CV
Text_Classification
Sparse
Classification
VQA
Attention
PDF
2020-06-12
Video Understanding as Machine Translation
Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani
arXiv_CV
arXiv_CV
Video_Caption
Speech
Self-Supervised
Pose
Classification
VQA
Caption
QA
Video_Classification
PDF
2020-06-11
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning
Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu
arXiv_AI
arXiv_AI
Recognition
Reinforcement_Learning
Sparse
Pose
VQA
PDF
2020-06-11
Exploring Weaknesses of VQA Models through Attribution Driven Insights
Shaunak Halbe
arXiv_CV
arXiv_CV
Adversarial
VQA
Prediction
QA
PDF
2020-06-11
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu
arXiv_CV
arXiv_CV
Embedding
Represenation_Learning
Regularization
Adversarial
Pose
VQA
PDF
2020-06-10
Estimating semantic structure for the VQA answer space
Corentin Kervadec (imagine), Grigory Antipov, Moez Baccouche, Christian Wolf (imagine)
arXiv_CV
arXiv_CV
Pose
Classification
VQA
QA
PDF
2020-06-09
Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To?
Corentin Kervadec (LIRIS), Grigory Antipov, Moez Baccouche (imagine), Christian Wolf (imagine)
arXiv_CV
arXiv_CV
VQA
GAN
Prediction
QA
PDF
2020-06-08
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen
arXiv_CV
arXiv_CV
Knowledge
Pose
VQA
Inference
QA
PDF
2020-06-01
Multimodal grid features and cell pointers for Scene Text Visual Question Answering
Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Dimosthenis Karatzas
arXiv_CV
arXiv_CV
Pose
Scene_Text
VQA
Attention
QA
PDF
2020-06-01
Structured Multimodal Attentions for TextVQA
Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton van den Hengel, Qi Wu
arXiv_CV
arXiv_CV
Recognition
OCR
Optical_Character
Pose
Relation
VQA
Attention
QA
PDF
2020-05-29
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
Deep_Learning
VQA
Prediction
QA
PDF
2020-05-22
Open-Retrieval Conversational Question Answering
Chen Qu, Liu Yang, Cen Chen, Minghui Qiu, W. Bruce Croft, Mohit Iyyer
arXiv_CL
arXiv_CL
Transformer
Regularization
VQA
QA
PDF
2020-05-21
Fluent Response Generation for Conversational Question Answering
Ashutosh Baheti, Alan Ritter, Kevin Small
arXiv_CL
arXiv_CL
Bert
Pose
Action
VQA
QA
PDF
2020-05-19
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel
arXiv_CV
arXiv_CV
Knowledge
VQA
QA
PDF
2020-05-18
User-generated Video Quality Assessment: A Subjective and Objective Study
Yang Li, Shengbin Meng, Xinfeng Zhang, Shiqi Wang, Yue Wang, Siwei Ma
arXiv_CV
arXiv_CV
Optimization
Pose
VQA
QA
PDF
2020-05-16
Visual Relationship Detection using Scene Graphs: A Survey
Aniket Agarwal, Ayush Mangal, Vipul
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Recognition
Image_Retrieval
Survey
Deep_Learning
Detection
Relation
VQA
Visual_Relation
Caption
PDF
2020-05-15
C3VQG: Category Consistent Cyclic Visual Question Generation
Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, Rajiv Ratn Shah
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
Inference
Prediction
PDF
2020-05-13
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim, Zineng Tang, Mohit Bansal
arXiv_CV
arXiv_CV
Image_Caption
Salient
Pose
Action
Classification
Relation
VQA
Attention
Caption
QA
Matching
PDF
2020-05-12
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng, Quan Guo, Parisa Kordjamshidi
arXiv_CL
arXiv_CL
Pose
Relation
VQA
QA
PDF
2020-05-09
Character Matters: Video Story Understanding with Character-Aware Relations
Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo
arXiv_CV
arXiv_CV
Transformer
Pose
Face
Relation
VQA
QA
Matching
PDF
2020-05-07
COBRA: Contrastive Bi-Modal Representation Algorithm
Vishaal Udandarao, Abhishek Maiti, Deepak Srivatsav, Suryatej Reddy Vyalla, Yifang Yin, Rajiv Ratn Shah
arXiv_CV
arXiv_CV
Image_Caption
Embedding
Relation
VQA
Caption
PDF
2020-05-04
Visual Question Answering with Prior Class Semantics
Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel
arXiv_CV
arXiv_CV
Embedding
Knowledge
Relation
VQA
Prediction
QA
PDF
2020-05-01
Diverse Visuo-Lingustic Question Answering Challenge
Shailaja Sampat, Yezhou Yang, Chitta Baral
arXiv_AI
arXiv_AI
Deep_Learning
VQA
Inference
QA
PDF
2020-04-30
Dynamic Language Binding in Relational Visual Reasoning
Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
arXiv_CV
arXiv_CV
Knowledge
Pose
Relation
VQA
Attention
QA
PDF
2020-04-29
Pragmatic Issue-Sensitive Image Captioning
Allen Nie, Reuben Cohn-Gordon, Christopher Potts
arXiv_CV
arXiv_CV
Image_Caption
Speech
Pose
VQA
Caption
PDF
2020-04-27
A Novel Attention-based Aggregation Function to Combine Vision and Language
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
arXiv_CV
arXiv_CV
Image_Caption
Pose
Classification
VQA
Attention
Caption
QA
Matching
PDF
2020-04-25
Deep Multimodal Neural Architecture Search
Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, Dacheng Tao, Qi Tian
arXiv_CV
arXiv_CV
NAS
VQA
Matching
PDF
2020-04-24
Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen
arXiv_CV
arXiv_CV
Pose
VQA
CNN
Inference
QA
PDF
2020-04-23
Visual Question Answering Using Semantic Information from Image Descriptions
Tasmia Tasrin, Md Sultan Al Nahian, Brent Harrison
arXiv_AI
arXiv_AI
Image_Caption
Knowledge
Pose
VQA
Attention
QA
PDF
2020-04-20
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Damien Teney, Ehsan Abbasnedjad, Anton van den Hengel
arXiv_CV
arXiv_CV
Pose
Classification
Deep_Learning
Relation
Sentiment
VQA
Image_Classification
Inference
PDF
2020-04-19
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh, Vedanuj Goswami, Devi Parikh
arXiv_CV
arXiv_CV
Pose
VQA
Attention
Caption
QA
PDF
2020-04-17
Knowledge-Based Visual Question Answering in Videos
Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Pose
VQA
QA
PDF
2020-04-12
A negative case analysis of visual grounding methods for VQA
Robik Shrestha, Kushal Kafle, Christopher Kanan
arXiv_AI
arXiv_AI
Regularization
Pose
Relation
VQA
Attention
QA
PDF
2020-04-12
Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions
Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shun'ichi Satoh
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-04-12
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos
Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin
arXiv_CV
arXiv_CV
Pose
Action
VQA
QA
PDF
2020-04-10
Rephrasing visual questions by specifying the entropy of the answer distribution
Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shun'ichi Satoh
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-04-08
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing
Goonmeet Bajaj, Bortik Bandyopadhyay, Daniel Schmidt, Pranav Maneriker, Christopher Myers, Srinivasan Parthasarathy
arXiv_AI
arXiv_AI
Knowledge
VQA
QA
PDF
2020-04-06
Predicting the Quality of Compressed Videos with Pre-Existing Distortions
Xiangxu Yu, Neil Birkbeck, Yilin Wang, Christos G. Bampis, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
VQA
QA
PDF
2020-04-04
Generating Rationales in Visual Question Answering
Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, Garrison W. Cottrell
arXiv_AI
arXiv_AI
Bert
Quantitative
VQA
QA
PDF
2020-04-04
Evaluating Multimodal Representations on Visual Semantic Textual Similarity
Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre
arXiv_CL
arXiv_CL
Image_Caption
VQA
Caption
Inference
PDF
2020-04-02
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu
arXiv_CV
arXiv_CV
Transformer
Embedding
Bert
Pose
VQA
Language_Model
QA
Matching
PDF
2020-03-31
X-Linear Attention Networks for Image Captioning
Yingwei Pan, Ting Yao, Yehao Li, Tao Mei
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Recognition
Action
VQA
Attention
Caption
PDF
2020-03-31
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen
arXiv_CV
arXiv_CV
Embedding
Pose
Scene_Text
VQA
QA
PDF
2020-03-27
Assessing Image Quality Issues for Real-World Problems
Tai-Yin Chiu, Yinan Zhao, Danna Gurari
arXiv_CV
arXiv_CV
Image_Caption
VQA
Caption
PDF
2020-03-26
P $\approx$ NP, at least in Visual Question Answering
Shailza Jolly, Sebastian Palacio, Joachim Folz, Federico Raue, Jorn Hees, Andreas Dengel
arXiv_CV
arXiv_CV
Embedding
VQA
QA
PDF
2020-03-23
Linguistically Driven Graph Capsule Network for Visual Question Reasoning
Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin
arXiv_CV
arXiv_CV
Embedding
Pose
VQA
CNN
Inference
QA
PDF
2020-03-22
Visual Question Answering for Cultural Heritage
Pietro Bongini, Federico Becattini, Andrew D. Bagdanov, Alberto Del Bimbo
arXiv_CV
arXiv_CV
Knowledge
Pose
Action
VQA
QA
PDF
2020-03-19
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo, Jing Liu, Xinxin Zhu, Peng Yao, Shichen Lu, Hanqing Lu
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Video_Caption
Pose
Relation
VQA
Attention
Caption
PDF
2020-03-16
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry, Diego Marcos, Jesse Murray, Devis Tuia
arXiv_CV
arXiv_CV
RNN
Knowledge
Pose
Classification
Detection
Relation
VQA
CNN
QA
PDF
2020-03-16
Towards Ground Truth Evaluation of Visual Explanations
Ahmed Osman, Leila Arras, Wojciech Samek
arXiv_AI
arXiv_AI
3D
Salient
Pose
Relation
VQA
PDF
2020-03-14
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang
arXiv_CV
arXiv_CV
Pose
Relation
VQA
QA
PDF
2020-03-07
PathVQA: 30000+ Questions for Medical Visual Question Answering
Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
arXiv_AI
arXiv_AI
Knowledge
VQA
Medical
Caption
QA
PDF
2020-03-03
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Ming Zhou
arXiv_CV
arXiv_CV
Image_Caption
Bert
Image_Retrieval
Pose
VQA
Denoising
Caption
Language_Model
QA
PDF
2020-03-02
A Question-Centric Model for Visual Question Answering in Medical Imaging
Minh H. Vu, Tommy Löfstedt, Tufve Nyholm, Raphael Sznitman
arXiv_CV
arXiv_CV
Pose
Deep_Learning
VQA
Medical
Prediction
PDF
2020-03-01
A Study on Multimodal and Interactive Explanations for Visual Question Answering
Kamran Alipour, Jurgen P. Schulze, Yi Yao, Avi Ziskind, Giedrius Burachas
arXiv_AI
arXiv_AI
VQA
Attention
Prediction
QA
PDF
2020-02-27
Visual Commonsense R-CNN
Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
arXiv_CV
arXiv_CV
Image_Caption
Unsupervised
Represenation_Learning
Knowledge
VQA
Caption
CNN
Language_Model
Prediction
QA
PDF
2020-02-27
Unbiased Scene Graph Generation from Biased Training
Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang
arXiv_CV
arXiv_CV
Pose
VQA
Inference
Prediction
QA
PDF
2020-02-27
Unshuffling Data for Improved Generalization
Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
arXiv_CV
arXiv_CV
Unsupervised
Knowledge
Relation
VQA
QA
PDF
2020-02-25
BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
Thomas Scialom, Patrick Bordes, Paul-Alexis Dray, Jacopo Staiano, Patrick Gallinari
arXiv_CV
arXiv_CV
Bert
Zero-Shot
Pose
Action
VQA
Text_Generation
Language_Model
PDF
2020-02-25
A Comparative Evaluation of Temporal Pooling Methods for Blind Video Quality Assessment
Zhengzhong Tu, Chia-Ju Chen, Li-Heng Chen, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
arXiv_CV
arXiv_CV
Pose
VQA
Attention
Prediction
QA
PDF
2020-02-24
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei Wang
arXiv_CV
arXiv_CV
Relation
VQA
QA
PDF
2020-02-19
VQA-LOL: Visual Question Answering under the Lens of Logic
Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
arXiv_CV
arXiv_CV
Embedding
Pose
VQA
QA
PDF
2020-02-17
CQ-VQA: Visual Question Answering on Categorized Questions
Aakansha Mishra, Ashish Anand, Prithwijit Guha
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-02-13
Sparse and Structured Visual Attention
Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins
arXiv_CV
arXiv_CV
Image_Caption
Transformer
Sparse
RNN
Pose
VQA
Attention
Caption
QA
PDF
2020-02-13
EndoL2H: Deep Super-Resolution for Capsule Endoscopy
Yasin Almalioglu, Abdulkadir Gokce, Kagan Incetan, Muhammed Ali Simsek, Kivanc Ararat, Richard J. Chen, Nichalos J. Durr, Faisal Mahmood, Mehmet Turan
arXiv_CV
arXiv_CV
Super_Resolution
Adversarial
Pose
Quantitative
Detection
VQA
Attention
GAN
PDF
2020-02-12
Component Analysis for Visual Question Answering Architectures
Camila Kolling, Jônatas Wehrmann, Rodrigo C. Barros
arXiv_AI
arXiv_AI
Represenation_Learning
VQA
Attention
QA
PDF
2020-02-04
Cost-Function-Dependent Barren Plateaus in Shallow Quantum Neural Networks
M. Cerezo, Akira Sone, Tyler Volkoff, Lukasz Cincio, Patrick J. Coles
arXiv_CV
arXiv_CV
Pose
VQA
QA
PDF
2020-01-31
Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach
Mehrdad Alizadeh, Barbara Di Eugenio
arXiv_CV
arXiv_CV
RNN
Pose
Classification
VQA
QA
PDF
2020-01-23
Uncertainty based Class Activation Maps for Visual Question Answering
Badri N. Patro, Mayank Lunayach, Vinay P. Namboodiri
arXiv_CV
arXiv_CV
Pose
Deep_Learning
Relation
VQA
Attention
PDF
2020-01-23
Deep Bayesian Network for Visual Question Generation
Badri N. Patro, Vinod K. Kurmi, Sandeep Kumar, Vinay P. Namboodiri
arXiv_CV
arXiv_CV
Pose
Quantitative
VQA
Caption
PDF
2020-01-23
Robust Explanations for Visual Question Answering
Badri N. Patro, Shivansh Pate, Vinay P. Namboodiri
arXiv_CV
arXiv_CV