Showing 1-10 of 2,435 for search: '"Marta, R."'
Authors:
Bell, Samuel J., Meglioli, Mariano Coria, Richards, Megan, Sánchez, Eduardo, Ropers, Christophe, Wang, Skyler, Williams, Adina, Sagun, Levent, Costa-jussà, Marta R.
Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biase…
External link:
http://arxiv.org/abs/2411.08135
Several algorithms implemented by language models have recently been successfully reverse-engineered. However, these findings have been concentrated on specific tasks and models, leaving it unclear how universal circuits are across different setting…
External link:
http://arxiv.org/abs/2410.06496
Direct speech-to-text translation systems face an important drawback: data scarcity. A common solution consists of pretraining the encoder on automatic speech recognition, which reduces the efficiency of the training process. In this study, we comp…
External link:
http://arxiv.org/abs/2409.18044
Authors:
Sánchez, Eduardo, Alastruey, Belen, Ropers, Christophe, Stenetorp, Pontus, Artetxe, Mikel, Costa-jussà, Marta R.
We propose a new benchmark to measure a language model's linguistic reasoning skills without relying on pre-existing language-specific knowledge. The test covers 894 questions grouped in 160 problems across 75 (mostly) extremely low-resource language…
External link:
http://arxiv.org/abs/2409.12126
Authors:
Tan, Xiaoqing Ellen, Hansanti, Prangthip, Wood, Carleigh, Yu, Bokai, Ropers, Christophe, Costa-jussà, Marta R.
In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases as existing models are becoming increasingly multilingual. To address this, we present the initial eight languages fro…
External link:
http://arxiv.org/abs/2407.00486
The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction…
External link:
http://arxiv.org/abs/2405.00208
Data scarcity and the modality gap between speech and text are two major obstacles for end-to-end Speech Translation (ST) systems, hindering their performance. Prior work has attempted to mitigate these challenges by leveraging ext…
External link:
http://arxiv.org/abs/2402.10422
Authors:
Nguyen, Tu Anh, Muller, Benjamin, Yu, Bokai, Costa-jussà, Marta R., Elbayad, Maha, Popuri, Sravya, Ropers, Christophe, Duquenne, Paul-Ambroise, Algayres, Robin, Mavlyutov, Ruslan, Gat, Itai, Williamson, Mary, Synnaeve, Gabriel, Pino, Juan, Sagot, Benoit, Dupoux, Emmanuel
We introduce Spirit LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a 7B pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Sp…
External link:
http://arxiv.org/abs/2402.05755
Authors:
Ropers, Christophe, Dale, David, Hansanti, Prangthip, Gonzalez, Gabriel Mejia, Evtimov, Ivan, Wong, Corinne, Touret, Christophe, Pereyra, Kristina, Kim, Seohyun Sonia, Ferrer, Cristian Canton, Andrews, Pierre, Costa-jussà, Marta R.
Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results an…
External link:
http://arxiv.org/abs/2401.16247
Authors:
Costa-jussà, Marta R., Meglioli, Mariano Coria, Andrews, Pierre, Dale, David, Hansanti, Prangthip, Kalbassi, Elahe, Mourachko, Alex, Ropers, Christophe, Wood, Carleigh
Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English. To address these limitations and lay the groundwork for truly multilingual audio-base…
External link:
http://arxiv.org/abs/2401.05060