Výsledky vyhledávání - "Jiao, Binxing"

Report

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

Autor: Peng, Wenjun, Yi, Jingwei, Wu, Fangzhao, Wu, Shangxi, Zhu, Bin, Lyu, Lingjuan, Jiao, Binxing, Xu, Tong, Sun, Guangzhong, Xie, Xing

Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NL

Externí odkaz: http://arxiv.org/abs/2305.10036

Zobrazit plný text záznamu

Report

Inference with Reference: Lossless Acceleration of Large Language Models

Autor: Yang, Nan, Ge, Tao, Wang, Liang, Jiao, Binxing, Jiang, Daxin, Yang, Linjun, Majumder, Rangan, Wei, Furu

We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references. LLMA is motivated by the observation that there are abundant identical text spans between the decoding result by an LLM and the reference

Externí odkaz: http://arxiv.org/abs/2304.04487

Zobrazit plný text záznamu

Report

Adam: Dense Retrieval Distillation with Adaptive Dark Examples

Autor: Tao, Chongyang, Liu, Chang, Shen, Tao, Xu, Can, Geng, Xiubo, Jiao, Binxing, Jiang, Daxin

To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting where a query is paire

Externí odkaz: http://arxiv.org/abs/2212.10192

Zobrazit plný text záznamu

Report

Text Embeddings by Weakly-Supervised Contrastive Pre-training

Autor: Wang, Liang, Yang, Nan, Huang, Xiaolong, Jiao, Binxing, Yang, Linjun, Jiang, Daxin, Majumder, Rangan, Wei, Furu

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPair

Externí odkaz: http://arxiv.org/abs/2212.03533

Zobrazit plný text záznamu

Report

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Autor: Zhu, Qiushi, Zhou, Long, Zhang, Ziqiang, Liu, Shujie, Jiao, Binxing, Zhang, Jie, Dai, Lirong, Jiang, Daxin, Li, Jinyu, Wei, Furu

Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate different modal in

Externí odkaz: http://arxiv.org/abs/2211.11275

Zobrazit plný text záznamu

Report

Effective and Efficient Query-aware Snippet Extraction for Web Search

Autor: Yi, Jingwei, Wu, Fangzhao, Wu, Chuhan, Huang, Xiaolong, Jiao, Binxing, Sun, Guangzhong, Xie, Xing

Query-aware webpage snippet extraction is widely used in search engines to help users better understand the content of the returned webpages before clicking. Although important, it is very rarely studied. In this paper, we propose an effective query-

Externí odkaz: http://arxiv.org/abs/2210.08809

Zobrazit plný text záznamu

Report

LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

Autor: Shen, Tao, Geng, Xiubo, Tao, Chongyang, Xu, Can, Huang, Xiaolong, Jiao, Binxing, Yang, Linjun, Jiang, Daxin

In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency. Despite it deeply exploiting the lexicon-representing capability of

Externí odkaz: http://arxiv.org/abs/2208.14754

Zobrazit plný text záznamu

Report

LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Autor: Zhang, Kai, Tao, Chongyang, Shen, Tao, Xu, Can, Geng, Xiubo, Jiao, Binxing, Jiang, Daxin

Retrieval models based on dense representations in semantic space have become an indispensable branch for first-stage retrieval. These retrievers benefit from surging advances in representation learning towards compressive global sequence-level embed

Externí odkaz: http://arxiv.org/abs/2208.13661

Zobrazit plný text záznamu

Report

SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

Autor: Wang, Liang, Yang, Nan, Huang, Xiaolong, Jiao, Binxing, Yang, Linjun, Jiang, Daxin, Majumder, Rangan, Wei, Furu

In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a simple yet effective pre-training method for dense passage retrieval. It employs a simple bottleneck architecture that learns to compress the passage informatio

Externí odkaz: http://arxiv.org/abs/2207.02578

Zobrazit plný text záznamu

Report

Towards Robust Ranker for Text Retrieval

Autor: Zhou, Yucheng, Shen, Tao, Geng, Xiubo, Tao, Chongyang, Xu, Can, Long, Guodong, Jiao, Binxing, Jiang, Daxin

A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives or/and serving as an auxiliary module for a retriever. In this work, we first identify two majo

Externí odkaz: http://arxiv.org/abs/2206.08063

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání