Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval

2021-01-02 11:52:20

Omar Khattab, Christopher Potts, Matei Zaharia

arXiv_CL

Abstract

Multi-hop reasoning (i.e., reasoning across two or more documents) at scale is a key step toward NLP models that can exhibit broad world knowledge by leveraging large collections of documents. We propose Baleen, a system that improves the robustness and scalability of multi-hop reasoning over current approaches. Baleen introduces a per-hop condensed retrieval pipeline to mitigate the size of the search space, a focused late interaction retriever (FliBERT) that can model complex multi-hop queries, and a weak supervision strategy, latent hop ordering, to learn from limited signal about which documents to retrieve for a query. We evaluate Baleen on the new many-hop claim verification dataset HoVer, establishing state-of-the-art performance.
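
The abstract does not spell out how per-hop condensed retrieval works, so the following is only a toy sketch of the general idea: at each hop, retrieve passages, condense them to a few short facts, and append those facts (rather than the full passages) to the query for the next hop. Everything here is an assumption made for illustration: the function names, the stopword list, the tiny corpus, and above all the lexical-overlap scoring, which stands in for Baleen's learned retriever and condenser and is not the authors' method.

```python
from typing import List, Set, Tuple

# Hypothetical stopword list, used only to keep the toy scorer meaningful.
STOPWORDS = {"the", "of", "was", "in", "is", "a"}


def tokens(text: str) -> Set[str]:
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return {w for w in words if w not in STOPWORDS}


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared content words."""
    return len(tokens(query) & tokens(text))


def retrieve(query: str, corpus: List[str], k: int, exclude: List[str]) -> List[str]:
    """Return the k highest-scoring passages not retrieved in earlier hops."""
    candidates = [doc for doc in corpus if doc not in exclude]
    ranked = sorted(candidates, key=lambda doc: overlap(query, doc), reverse=True)
    return ranked[:k]


def condense(query: str, passages: List[str]) -> List[str]:
    """Keep only the single most query-relevant sentence of each passage."""
    facts = []
    for passage in passages:
        sentences = [s.strip() for s in passage.split(".") if s.strip()]
        if sentences:
            facts.append(max(sentences, key=lambda s: overlap(query, s)))
    return facts


def multi_hop(claim: str, corpus: List[str], hops: int = 2, k: int = 1) -> Tuple[List[str], str]:
    """Retrieve, condense, and re-query once per hop."""
    query = claim
    retrieved: List[str] = []
    for _ in range(hops):
        passages = retrieve(query, corpus, k, retrieved)
        retrieved.extend(passages)
        # The next hop's query grows by a few short facts, not whole passages.
        query = query + " " + " ".join(condense(query, passages))
    return retrieved, query


# Made-up example data (not from HoVer).
CLAIM = "The director of Film Alpha was born in Paris"
CORPUS = [
    "Film Alpha was released in 1990. The director of Film Alpha was Jane Roe.",
    "Paris is the capital of France. Paris hosts many museums.",
    "Jane Roe is a filmmaker. Jane Roe was born in Lyon.",
]
```

Calling `multi_hop(CLAIM, CORPUS)` on this toy data retrieves the film passage first; the condensed fact naming the director then steers the second hop toward the passage about her, which the original claim alone does not favor over the passage about Paris. Because only one sentence per passage is carried forward, the query grows slowly as hops are added.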

URL

https://arxiv.org/abs/2101.00436

PDF

https://arxiv.org/pdf/2101.00436.pdf