Showing 1 - 10 of 21 results for search: '"van Esch, Daan"'
Author:
Bharadwaj, Shikhar, Ma, Min, Vashishth, Shikhar, Bapna, Ankur, Ganapathy, Sriram, Axelrod, Vera, Dalmia, Siddharth, Han, Wei, Zhang, Yu, van Esch, Daan, Ritchie, Sandy, Talukdar, Partha, Riesa, Jason
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single m…
External link:
http://arxiv.org/abs/2309.10567
Author:
Ritchie, Sandy, Cheng, You-Chi, Chen, Mingqing, Mathews, Rajiv, van Esch, Daan, Li, Bo, Sim, Khe Chai
Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to…
External link:
http://arxiv.org/abs/2208.03067
Author:
Aksënova, Alëna, Chen, Zhehuai, Chiu, Chung-Cheng, van Esch, Daan, Golik, Pavel, Han, Wei, King, Levi, Ramabhadran, Bhuvana, Rosenberg, Andrew, Schwartz, Suzan, Wang, Gary
Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. Therefore, ASR systems must work for everybody independently of the way they speak. To accomplish this go…
External link:
http://arxiv.org/abs/2205.08014
Author:
Bapna, Ankur, Caswell, Isaac, Kreutzer, Julia, Firat, Orhan, van Esch, Daan, Siddhant, Aditya, Niu, Mengmeng, Baljekar, Pallavi, Garcia, Xavier, Macherey, Wolfgang, Breiner, Theresa, Axelrod, Vera, Riesa, Jason, Cao, Yuan, Chen, Mia Xu, Macherey, Klaus, Krikun, Maxim, Wang, Pidong, Gutkin, Alexander, Shah, Apurva, Huang, Yanping, Chen, Zhifeng, Wu, Yonghui, Hughes, Macduff
In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1…
External link:
http://arxiv.org/abs/2205.03983
Author:
Conneau, Alexis, Bapna, Ankur, Zhang, Yu, Ma, Min, von Platen, Patrick, Lozhkov, Anton, Cherry, Colin, Jia, Ye, Rivera, Clara, Kale, Mihir, Van Esch, Daan, Axelrod, Vera, Khanuja, Simran, Clark, Jonathan H., Firat, Orhan, Auli, Michael, Ruder, Sebastian, Riesa, Jason, Johnson, Melvin
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 langua…
External link:
http://arxiv.org/abs/2203.10752
This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions that all rely on language models.
External link:
http://arxiv.org/abs/2201.06469
Author:
Kreutzer, Julia, Caswell, Isaac, Wang, Lisa, Wahab, Ahsan, van Esch, Daan, Ulzii-Orshikh, Nasanbayar, Tapo, Allahsera, Subramani, Nishant, Sokolov, Artem, Sikasote, Claytone, Setyawan, Monang, Sarin, Supheakmungkol, Samb, Sokhar, Sagot, Benoît, Rivera, Clara, Rios, Annette, Papadimitriou, Isabel, Osei, Salomey, Suarez, Pedro Ortiz, Orife, Iroro, Ogueji, Kelechi, Rubungo, Andre Niyongabo, Nguyen, Toan Q., Müller, Mathias, Müller, André, Muhammad, Shamsuddeen Hassan, Muhammad, Nanda, Mnyakeni, Ayanda, Mirzakhalov, Jamshidbek, Matangira, Tapiwanashe, Leong, Colin, Lawson, Nze, Kudugunta, Sneha, Jernite, Yacine, Jenny, Mathias, Firat, Orhan, Dossou, Bonaventure F. P., Dlamini, Sakhile, de Silva, Nisansa, Ballı, Sakine Çabuk, Biderman, Stella, Battisti, Alessia, Baruwa, Ahmed, Bapna, Ankur, Baljekar, Pallavi, Azime, Israel Abebe, Awokoya, Ayodele, Ataman, Duygu, Ahia, Orevaoghene, Ahia, Oghenefego, Agrawal, Sweta, Adeyemi, Mofetoluwa
Published in:
Transactions of the Association for Computational Linguistics (2022) 10: 50-72
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205…
External link:
http://arxiv.org/abs/2103.12028
Pronunciation modeling is a key task for building speech technology in new languages, and while solid grapheme-to-phoneme (G2P) mapping systems exist, language coverage can stand to be improved. The information needed to build G2P models for many mor…
External link:
http://arxiv.org/abs/2101.11575
Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context. LangID is larg…
External link:
http://arxiv.org/abs/2010.14571
Author:
van Esch, Daan, Sarbar, Elnaz, Lucassen, Tamar, O'Brien, Jeremy, Breiner, Theresa, Prasad, Manasa, Crew, Evan, Nguyen, Chieu, Beaufays, Françoise
This technical report describes our deep internationalization program for Gboard, the Google Keyboard. Today, Gboard supports 900+ language varieties across 70+ writing systems, and this report describes how and why we have been adding support for hu…
External link:
http://arxiv.org/abs/1912.01218