Showing 1 - 10 of 365 for search: '"Clergerie, A."'
Published in:
NAACL2024 - 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Jun 2024, Mexico City, Mexico
In this work, we introduce a comprehensive error typology specifically designed for evaluating two distinct tasks in machine-generated patent texts: claims-to-abstract generation, and the generation of the next claim given previous ones. We have also
External link:
http://arxiv.org/abs/2406.06589
Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of smaller count
External link:
http://arxiv.org/abs/2404.07647
Language models have long been shown to embed geographical information in their hidden representations. This line of work has recently been revisited by extending this result to Large Language Models (LLMs). In this paper, we propose to fill the gap
External link:
http://arxiv.org/abs/2402.19406
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on Transformers. In NLP, it takes the form of anisotropy, a singular property of hidden representations which makes them unex
External link:
http://arxiv.org/abs/2401.12143
Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and instead focuse
External link:
http://arxiv.org/abs/2309.08351
CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data
Clinical data in hospitals are increasingly accessible for research through clinical data warehouses. However, these documents are unstructured and it is therefore necessary to extract information from medical reports to conduct clinical studies. Tran
External link:
http://arxiv.org/abs/2306.15550
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on Transformers. In NLP, it takes the form of anisotropy, a singular property of hidden representations which makes them unex
External link:
http://arxiv.org/abs/2306.07656
Static subword tokenization algorithms have been an essential component of recent works on language modeling. However, their static nature results in important flaws that degrade the models' downstream performance and robustness. In this work, we pro
External link:
http://arxiv.org/abs/2212.07284
Authors:
Scialom, Thomas, Martin, Louis, Staiano, Jacopo, de la Clergerie, Éric Villemonte, Sagot, Benoît
Automatic evaluation remains an open research question in Natural Language Generation. In the context of Sentence Simplification, this is particularly challenging: the task by nature requires replacing complex words with simpler ones that share the
External link:
http://arxiv.org/abs/2104.07560
Authors:
Song, Fuqi, de la Clergerie, Éric
In contract analysis and contract automation, a knowledge base (KB) of legal entities is fundamental for performing tasks such as contract verification, contract generation and contract analytics. However, such a KB does not always exist nor can be pr
External link:
http://arxiv.org/abs/2012.01942