Search results

Report

2D Matryoshka Training for Information Retrieval

Authors: Wang, Shuai; Zhuang, Shengyao; Koopman, Bevan; Zuccon, Guido

2D Matryoshka Training is an advanced embedding representation training approach designed to train an encoder model simultaneously across various layer-dimension setups. This method has demonstrated higher effectiveness in Semantic Text Similarity (S

External link: http://arxiv.org/abs/2411.17299

Report

Starbucks: Improved Training for 2D Matryoshka Embeddings

Authors: Zhuang, Shengyao; Wang, Shuai; Koopman, Bevan; Zuccon, Guido

Effective approaches that can scale embedding model depth (i.e. layers) and embedding size allow for the creation of models that are highly scalable across different computational resources and task requirements. While the recently proposed 2D Matryo

External link: http://arxiv.org/abs/2410.13230

Report

Does Vec2Text Pose a New Corpus Poisoning Threat?

Authors: Zhuang, Shengyao; Koopman, Bevan; Zuccon, Guido

The emergence of Vec2Text -- a method for text embedding inversion -- has raised serious privacy concerns for dense retrieval systems which use text embeddings. This threat comes from the ability for an attacker with access to embeddings to reconstru

External link: http://arxiv.org/abs/2410.06628

Report

Source-Free Domain-Invariant Performance Prediction

Authors: Khramtsova, Ekaterina; Baktashmotlagh, Mahsa; Zuccon, Guido; Wang, Xi; Salzmann, Mathieu

Accurately estimating model performance poses a significant challenge, particularly in scenarios where the source and target domains follow different data distributions. Most existing performance prediction methods heavily rely on the source data in

External link: http://arxiv.org/abs/2408.02209

Report

Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection

Authors: Khramtsova, Ekaterina; Leelanupab, Teerapong; Zhuang, Shengyao; Baktashmotlagh, Mahsa; Zuccon, Guido

In this demo we present a web-based application for selecting an effective pre-trained dense retriever to use on a private collection. Our system, DenseQuest, provides unsupervised selection and ranking capabilities to predict the best dense retrieve

External link: http://arxiv.org/abs/2407.06685

Report

Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation

Authors: Mao, Xinyu; Zhuang, Shengyao; Koopman, Bevan; Zuccon, Guido

The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completio

External link: http://arxiv.org/abs/2407.00635

Report

An Investigation of Prompt Variations for Zero-shot LLM-based Rankers

Authors: Sun, Shuoqi; Zhuang, Shengyao; Wang, Shuai; Zuccon, Guido

We provide a systematic understanding of the impact of specific components and wordings used in prompts on the effectiveness of rankers based on zero-shot Large Language Models (LLMs). Several zero-shot ranking methods based on LLMs have recently bee

External link: http://arxiv.org/abs/2406.14117

Report

A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking

Authors: Schlatt, Ferdinand; Fröbe, Maik; Scells, Harrisen; Zhuang, Shengyao; Koopman, Bevan; Zuccon, Guido; Stein, Benno; Potthast, Martin; Hagen, Matthias

Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, the distilled models usually do not reach their teacher LLM's effectiveness. To investiga

External link: http://arxiv.org/abs/2405.07920

Report

PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval

Authors: Zhuang, Shengyao; Ma, Xueguang; Koopman, Bevan; Lin, Jimmy; Zuccon, Guido

Utilizing large language models (LLMs) for zero-shot document ranking is done in one of two ways: (1) prompt-based re-ranking methods, which require no further training but are only feasible for re-ranking a handful of candidate documents due to comp

External link: http://arxiv.org/abs/2404.18424

Report

Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders

Authors: Schlatt, Ferdinand; Fröbe, Maik; Scells, Harrisen; Zhuang, Shengyao; Koopman, Bevan; Zuccon, Guido; Stein, Benno; Potthast, Martin; Hagen, Matthias

Existing cross-encoder re-rankers can be categorized as pointwise, pairwise, or listwise models. Pair- and listwise models allow passage interactions, which usually makes them more effective than pointwise models but also less efficient and less robu

External link: http://arxiv.org/abs/2404.06912
