Showing 1 - 10 of 20 for the search '"Bonifacio, Luiz"'
Author:
Thakur, Nandan, Bonifacio, Luiz, Fröbe, Maik, Bondarenko, Alexander, Kamalloo, Ehsan, Potthast, Martin, Hagen, Matthias, Lin, Jimmy
The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark -- a combination of different IR evaluation datasets. Interestingly, previous studies found that particularly on the BEIR subset Touché 2020, an argumen…
External link:
http://arxiv.org/abs/2407.07790
Author:
Thakur, Nandan, Bonifacio, Luiz, Zhang, Xinyu, Ogundepo, Odunayo, Kamalloo, Ehsan, Alfonso-Hermelo, David, Li, Xiaoguang, Liu, Qun, Chen, Boxing, Rezagholizadeh, Mehdi, Lin, Jimmy
Retrieval-Augmented Generation (RAG) grounds Large Language Model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations. However, prior work lacks a comprehensive evaluation of different language families, making it c…
External link:
http://arxiv.org/abs/2312.11361
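The grounding mechanism this abstract describes can be illustrated with a minimal sketch: retrieve supporting text first, then prepend it to the prompt so the model answers from evidence rather than memory. The corpus and word-overlap retriever below are toy stand-ins invented for illustration, not the paper's actual setup:

```python
# Minimal RAG sketch: ground the prompt in retrieved evidence.
# The corpus and word-overlap scoring are toy stand-ins for illustration.

CORPUS = [
    "The Amazon river flows through Brazil, Peru and Colombia.",
    "Mount Everest is the highest mountain above sea level.",
]

def retrieve(question: str, corpus: list) -> str:
    """Toy retriever: pick the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(corpus, key=lambda d: len(q_words & set(d.lower().split())))

def build_grounded_prompt(question: str) -> str:
    """Prepend retrieved context so the LLM answers from evidence, not memory."""
    context = retrieve(question, CORPUS)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

prompt = build_grounded_prompt("Which countries does the Amazon flow through?")
```

A real pipeline would replace the toy retriever with a dense or lexical index and feed the assembled prompt to an LLM; the sketch only shows the grounding step.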
Author:
Abonizio, Hugo, Bonifacio, Luiz, Jeronymo, Vitor, Lotufo, Roberto, Zavrel, Jakub, Nogueira, Rodrigo
Recent work has explored Large Language Models (LLMs) to overcome the lack of training data for Information Retrieval (IR) tasks. The generalization abilities of these models have enabled the creation of synthetic in-domain data by providing instruct…
External link:
http://arxiv.org/abs/2307.04601
Author:
Jeronymo, Vitor, Bonifacio, Luiz, Abonizio, Hugo, Fadaee, Marzieh, Lotufo, Roberto, Zavrel, Jakub, Nogueira, Rodrigo
Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents. These synthetic query-document pairs can then be…
External link:
http://arxiv.org/abs/2301.01820
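The InPars recipe the abstract summarizes amounts to building a few-shot prompt that induces an LLM to write a relevant query for a given document. A minimal sketch of the prompt construction (the few-shot examples are hypothetical, and the LLM call itself is left out):

```python
# InPars-style prompt construction sketch. The few-shot examples below are
# hypothetical; the LLM completion step is deliberately omitted.

FEW_SHOT = [
    ("The Eiffel Tower is 330 metres tall and stands in Paris.",
     "how tall is the eiffel tower"),
    ("Photosynthesis converts light energy into chemical energy.",
     "what does photosynthesis produce"),
]

def build_prompt(document: str) -> str:
    """Assemble a few-shot prompt that induces an LLM to write a relevant query."""
    parts = [f"Document: {doc}\nRelevant query: {q}\n" for doc, q in FEW_SHOT]
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)

prompt = build_prompt("BM25 is a bag-of-words ranking function used by search engines.")
# The LLM's completion becomes the synthetic query; the resulting
# (synthetic query, document) pairs can then be used to train a retriever.
```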
Author:
Rosa, Guilherme, Bonifacio, Luiz, Jeronymo, Vitor, Abonizio, Hugo, Fadaee, Marzieh, Lotufo, Roberto, Nogueira, Rodrigo
Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines. In this work we study the generalization ability of these two types of architectures on a wide range of parameter count on both in-domain and out-of-domain s…
External link:
http://arxiv.org/abs/2212.06121
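The architectural contrast this entry studies can be sketched in a few lines: a bi-encoder scores independently encoded vectors, while a cross-encoder encodes the query-document pair jointly. The encoder below is a deterministic pseudo-random stand-in for illustration only; real systems use transformer encoders:

```python
# Toy contrast between bi-encoder and cross-encoder scoring. The "encoder"
# is a deterministic pseudo-random stand-in, not a real model.
import random

DIM = 4

def encode(text: str) -> list:
    """Stand-in encoder: a fixed pseudo-random vector per input string."""
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

DOC_CACHE = {}  # bi-encoder document vectors can be embedded once, offline

def bi_encoder_score(query: str, doc: str) -> float:
    # Query and document are encoded independently, so query-time cost is one
    # encode plus a cheap dot product against precomputed document vectors.
    if doc not in DOC_CACHE:
        DOC_CACHE[doc] = encode(doc)
    q = encode(query)
    return sum(a * b for a, b in zip(q, DOC_CACHE[doc]))

def cross_encoder_score(query: str, doc: str) -> float:
    # The pair is encoded jointly: typically more accurate, but every
    # candidate document needs a fresh forward pass at query time.
    return sum(encode(query + " [SEP] " + doc))
```

The caching line is the crux of the trade-off the paper examines: bi-encoders scale to large corpora because document vectors are precomputed, while cross-encoders pay a per-pair cost for their higher accuracy.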
Author:
Almeida, Thales Sales, Laitz, Thiago, Seródio, João, Bonifacio, Luiz Henrique, Lotufo, Roberto, Nogueira, Rodrigo
Published in:
DESIRES 2022 - 3rd International Conference on Design of Experimental Search and Information REtrieval Systems, 30-31 August 2022, San Jose, CA, USA
The widespread availability of search APIs (both free and commercial) brings the promise of increased coverage and quality of search results for metasearch engines, while decreasing the maintenance costs of the crawling and indexing infrastructures.
External link:
http://arxiv.org/abs/2210.14837
Author:
Rosa, Guilherme Moraes, Bonifacio, Luiz, Jeronymo, Vitor, Abonizio, Hugo, Fadaee, Marzieh, Lotufo, Roberto, Nogueira, Rodrigo
Recent work has shown that small distilled language models are strong competitors to models that are orders of magnitude larger and slower in a wide range of information retrieval tasks. This has made distilled and dense models, due to latency constr…
External link:
http://arxiv.org/abs/2206.02873
Author:
Rosa, Guilherme Moraes, Bonifacio, Luiz, Jeronymo, Vitor, Abonizio, Hugo, Lotufo, Roberto, Nogueira, Rodrigo
Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios. In this work, we experiment with zero-shot models in the legal case entailment task of the COLIEE…
External link:
http://arxiv.org/abs/2205.15172
The information retrieval community has recently witnessed a revolution due to large pretrained transformer models. Another key ingredient for this revolution was the MS MARCO dataset, whose scale and diversity have enabled zero-shot transfer learning…
External link:
http://arxiv.org/abs/2202.05144
Author:
Bonifacio, Luiz, Jeronymo, Vitor, Abonizio, Hugo Queiroz, Campiotti, Israel, Fadaee, Marzieh, Lotufo, Roberto, Nogueira, Rodrigo
The MS MARCO ranking dataset has been widely used for training deep learning models for IR tasks, achieving considerable effectiveness in diverse zero-shot scenarios. However, this type of resource is scarce in languages other than English. In this w…
External link:
http://arxiv.org/abs/2108.13897