Showing 1 - 10 of 269 for search: '"Hall, Keith A."'
We develop and evaluate multilingual scientific document similarity measurement models in this work. Such models can be used to find related works in different languages, which can help multilingual researchers find and explore papers more efficient
External link:
http://arxiv.org/abs/2309.10539
We present Hybrid Infused Reranking for Passages Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neural models a
External link:
http://arxiv.org/abs/2212.10528
Authors:
Dai, Zhuyun, Zhao, Vincent Y., Ma, Ji, Luan, Yi, Ni, Jianmo, Lu, Jing, Bakalov, Anton, Guu, Kelvin, Hall, Keith B., Chang, Ming-Wei
Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize f
External link:
http://arxiv.org/abs/2209.11755
This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions that all rely on language models.
External link:
http://arxiv.org/abs/2201.06469
We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space. If IR systems are weak, these metrics undersample or completely filter out the deeper documents that need improvement. If IR system
External link:
http://arxiv.org/abs/2201.01745
Authors:
Ni, Jianmo, Qu, Chen, Lu, Jing, Dai, Zhuyun, Ábrego, Gustavo Hernández, Ma, Ji, Zhao, Vincent Y., Luan, Yi, Hall, Keith B., Chang, Ming-Wei, Yang, Yinfei
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product between a qu
External link:
http://arxiv.org/abs/2112.07899
Authors:
Ni, Jianmo, Ábrego, Gustavo Hernández, Constant, Noah, Ma, Ji, Hall, Keith B., Cer, Daniel, Yang, Yinfei
We provide the first exploration of sentence embeddings from text-to-text transformers (T5). Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence
External link:
http://arxiv.org/abs/2108.08877
In this paper, we report the results of our participation in the TREC-COVID challenge. To meet the challenge of building a search engine for a rapidly evolving biomedical collection, we propose a simple yet effective weighted hierarchical rank fusion a
External link:
http://arxiv.org/abs/2010.00200
Authors:
Roark, Brian, Wolf-Sonkin, Lawrence, Kirov, Christo, Mielke, Sabrina J., Johny, Cibu, Demirsahin, Isin, Hall, Keith
This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3
External link:
http://arxiv.org/abs/2007.01176
A major obstacle to the widespread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to
External link:
http://arxiv.org/abs/2004.14503