Showing 1 - 10 of 30 for search: '"Tomasello, Paden"'
Author:
Communication, Seamless, Barrault, Loïc, Chung, Yu-An, Meglioli, Mariano Coria, Dale, David, Dong, Ning, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Ellis, Brian, Elsahar, Hady, Haaheim, Justin, Hoffman, John, Hwang, Min-Jae, Inaguma, Hirofumi, Klaiber, Christopher, Kulikov, Ilia, Li, Pengwei, Licht, Daniel, Maillard, Jean, Mavlyutov, Ruslan, Rakotoarison, Alice, Sadagopan, Kaushik Ram, Ramakrishnan, Abinesh, Tran, Tuan, Wenzek, Guillaume, Yang, Yilin, Ye, Ethan, Evtimov, Ivan, Fernandez, Pierre, Gao, Cynthia, Hansanti, Prangthip, Kalbassi, Elahe, Kallet, Amanda, Kozhevnikov, Artyom, Gonzalez, Gabriel Mejia, Roman, Robin San, Touret, Christophe, Wong, Corinne, Wood, Carleigh, Yu, Bokai, Andrews, Pierre, Balioglu, Can, Chen, Peng-Jen, Costa-jussà, Marta R., Elbayad, Maha, Gong, Hongyu, Guzmán, Francisco, Heffernan, Kevin, Jain, Somya, Kao, Justine, Lee, Ann, Ma, Xutai, Mourachko, Alex, Peloquin, Benjamin, Pino, Juan, Popuri, Sravya, Ropers, Christophe, Saleem, Safiyyah, Schwenk, Holger, Sun, Anna, Tomasello, Paden, Wang, Changhan, Wang, Jeff, Wang, Skyler, Williamson, Mary
Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive
External link:
http://arxiv.org/abs/2312.05187
We introduce the Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically-stable and unbiased monotonic alignment estimation. In addition, we present improved training and inference strategies
External link:
http://arxiv.org/abs/2312.04515
Author:
Communication, Seamless, Barrault, Loïc, Chung, Yu-An, Meglioli, Mariano Cora, Dale, David, Dong, Ning, Duquenne, Paul-Ambroise, Elsahar, Hady, Gong, Hongyu, Heffernan, Kevin, Hoffman, John, Klaiber, Christopher, Li, Pengwei, Licht, Daniel, Maillard, Jean, Rakotoarison, Alice, Sadagopan, Kaushik Ram, Wenzek, Guillaume, Ye, Ethan, Akula, Bapi, Chen, Peng-Jen, Hachem, Naji El, Ellis, Brian, Gonzalez, Gabriel Mejia, Haaheim, Justin, Hansanti, Prangthip, Howes, Russ, Huang, Bernie, Hwang, Min-Jae, Inaguma, Hirofumi, Jain, Somya, Kalbassi, Elahe, Kallet, Amanda, Kulikov, Ilia, Lam, Janice, Li, Daniel, Ma, Xutai, Mavlyutov, Ruslan, Peloquin, Benjamin, Ramadan, Mohamed, Ramakrishnan, Abinesh, Sun, Anna, Tran, Kevin, Tran, Tuan, Tufanov, Igor, Vogeti, Vish, Wood, Carleigh, Yang, Yilin, Yu, Bokai, Andrews, Pierre, Balioglu, Can, Costa-jussà, Marta R., Celebi, Onur, Elbayad, Maha, Gao, Cynthia, Guzmán, Francisco, Kao, Justine, Lee, Ann, Mourachko, Alexandre, Pino, Juan, Popuri, Sravya, Ropers, Christophe, Saleem, Safiyyah, Schwenk, Holger, Tomasello, Paden, Wang, Changhan, Wang, Jeff, Wang, Skyler
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-
External link:
http://arxiv.org/abs/2308.11596
Author:
Pratap, Vineel, Tjandra, Andros, Shi, Bowen, Tomasello, Paden, Babu, Arun, Kundu, Sayani, Elkahky, Ali, Ni, Zhaoheng, Vyas, Apoorv, Fazel-Zarandi, Maryam, Baevski, Alexei, Adi, Yossi, Zhang, Xiaohui, Hsu, Wei-Ning, Conneau, Alexis, Auli, Michael
Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages which is a small fraction of the over 7,000
External link:
http://arxiv.org/abs/2305.13516
Author:
Tang, Yun, Sun, Anna Y., Inaguma, Hirofumi, Chen, Xinyue, Dong, Ning, Ma, Xutai, Tomasello, Paden D., Pino, Juan
Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks for speech-to-text tasks. They are designed for different purposes and each has its own benefits and drawbacks for speech-to-text tasks. In order to leverage strength
External link:
http://arxiv.org/abs/2305.03101
With the development of hardware for machine learning, newer models often come at the cost of both increased sizes and computational complexity. In effort to improve the efficiency for these models, we apply and investigate recent quantization techni
External link:
http://arxiv.org/abs/2301.00652
Author:
Diwan, Anuj, Yeh, Ching-Feng, Hsu, Wei-Ning, Tomasello, Paden, Choi, Eunsol, Harwath, David, Mohamed, Abdelrahman
Automatic speech recognition research focuses on training and evaluating on static datasets. Yet, as speech models are increasingly deployed on personal devices, such models encounter user-specific distributional shifts. To simulate this real-world s
External link:
http://arxiv.org/abs/2212.01393
Author:
Chen, Peng-Jen, Tran, Kevin, Yang, Yilin, Du, Jingfei, Kao, Justine, Chung, Yu-An, Tomasello, Paden, Duquenne, Paul-Ambroise, Schwenk, Holger, Gong, Hongyu, Inaguma, Hirofumi, Popuri, Sravya, Wang, Changhan, Pino, Juan, Hsu, Wei-Ning, Lee, Ann
We study speech-to-speech translation (S2ST) that translates speech from one language into another language and focuses on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study,
External link:
http://arxiv.org/abs/2211.06474
Author:
Tomasello, Paden, Shrivastava, Akshat, Lazar, Daniel, Hsu, Po-Chun, Le, Duc, Sagar, Adithya, Elkahky, Ali, Copet, Jade, Hsu, Wei-Ning, Adi, Yossi, Algayres, Robin, Nguyen, Tu Ahn, Dupoux, Emmanuel, Zettlemoyer, Luke, Mohamed, Abdelrahman
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation an
External link:
http://arxiv.org/abs/2207.10643
Author:
Le, Duc, Shrivastava, Akshat, Tomasello, Paden, Kim, Suyoun, Livshits, Aleksandr, Kalinli, Ozlem, Seltzer, Michael L.
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NL
External link:
http://arxiv.org/abs/2204.01893