Showing 1 - 10 of 254 for search: '"AGIRRE, ENEKO"'
Author:
Sainz, Oscar, García-Ferrero, Iker, Jacovi, Alon, Campos, Jon Ander, Elazar, Yanai, Agirre, Eneko, Goldberg, Yoav, Chen, Wei-Lin, Chim, Jenny, Choshen, Leshem, D'Amico-Wong, Luca, Dell, Melissa, Fan, Run-Ze, Golchin, Shahriar, Li, Yucheng, Liu, Pengfei, Pahwa, Bhavish, Prabhu, Ameya, Sharma, Suryansh, Silcock, Emily, Solonko, Kateryna, Stap, David, Surdeanu, Mihai, Tseng, Yu-Min, Udandarao, Vishaal, Wang, Zengzhi, Xu, Ruijie, Yang, Jinglin
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora…
External link:
http://arxiv.org/abs/2407.21530
Author:
Agirre, Eneko
Published in:
Computational Linguistics, Vol 46, Iss 1, Pp 245-248 (2020)
External link:
https://doaj.org/article/b93ffc30d9fc4d1c998c4561a09fd5b0
Existing Vision-Language Compositionality (VLC) benchmarks like SugarCrepe are formulated as image-to-text retrieval problems, where, given an image, the models need to select between the correct textual description and a synthetic hard negative text…
External link:
http://arxiv.org/abs/2406.09952
Cross-lingual transfer learning is widely used in Event Extraction for low-resource languages and involves a Multilingual Language Model that is trained in a source language and applied to the target language. This paper studies whether the typologic…
External link:
http://arxiv.org/abs/2404.06392
Author:
Etxaniz, Julen, Sainz, Oscar, Perez, Naiara, Aldabe, Itziar, Rigau, German, Agirre, Eneko, Ormazabal, Aitor, Artetxe, Mikel, Soroa, Aitor
Published in:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14952--14972. 2024
We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque corpus comprising 4.3M documents and 4.2B tokens. Addressing the scarci…
External link:
http://arxiv.org/abs/2403.20266
This paper shows that text-only Language Models (LM) can learn to ground spatial relations like "left of" or "below" if they are provided with explicit location information of objects and they are properly trained to leverage those locations. We perf…
External link:
http://arxiv.org/abs/2403.13666
Author:
Salaberria, Ander, Azkune, Gorka, de Lacalle, Oier Lopez, Soroa, Aitor, Agirre, Eneko, Keller, Frank
Existing work has observed that current text-to-image systems do not accurately reflect explicit spatial relations between objects such as 'left of' or 'below'. We hypothesize that this is because explicit spatial relations rarely appear in the image…
External link:
http://arxiv.org/abs/2403.00587
Table-to-text generation involves generating appropriate textual descriptions given structured tabular data. It has attracted increasing attention in recent years thanks to the popularity of neural network models and the availability of large-scale d…
External link:
http://arxiv.org/abs/2311.09808
Author:
Sainz, Oscar, Campos, Jon Ander, García-Ferrero, Iker, Etxaniz, Julen, de Lacalle, Oier Lopez, Agirre, Eneko
In this position paper, we argue that the classical evaluation of Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test…
External link:
http://arxiv.org/abs/2310.18018
Author:
Alonso, Iñigo, Agirre, Eneko
Published in:
Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869
Table-to-text systems generate natural language statements from structured data like tables. While end-to-end techniques suffer from low factual correctness (fidelity), a previous study reported gains when using manual logical forms (LF) that represe…
External link:
http://arxiv.org/abs/2310.17279