Showing 1 - 9 of 9 for search: '"Chrzanowski, Mike"'
Training stability of large language models (LLMs) is an important research topic. Reproducing training instabilities can be costly, so we use a small language model with 830M parameters and experiment with higher learning rates to force models to…
External link:
http://arxiv.org/abs/2410.16682
Author:
Adams, Virginia, Subramanian, Sandeep, Chrzanowski, Mike, Hrinchuk, Oleksii, Kuchaiev, Oleksii
General translation models often still struggle to generate accurate translations in specialized domains. To guide machine translation practitioners and characterize the effectiveness of domain adaptation methods under different data availability…
External link:
http://arxiv.org/abs/2206.01137
This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model, replacing the attention mechanism with an explicit duration predictor. This improves robustness significantly as measured by unaligned duration ratio and word…
External link:
http://arxiv.org/abs/2010.04301
In this paper we propose to augment a modern neural-network architecture with an attention model inspired by human perception. Specifically, we adversarially train and analyze a neural model incorporating a human-inspired, visual attention component…
External link:
http://arxiv.org/abs/1912.02184
Inspired by recent work in attention models for image captioning and question answering, we present a soft attention model for the reinforcement learning domain. This model uses a soft, top-down attention mechanism to create a bottleneck in the agent…
External link:
http://arxiv.org/abs/1906.02500
Author:
Yogatama, Dani, d'Autume, Cyprien de Masson, Connor, Jerome, Kocisky, Tomas, Chrzanowski, Mike, Kong, Lingpeng, Lazaridou, Angeliki, Ling, Wang, Yu, Lei, Dyer, Chris, Blunsom, Phil
We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art…
External link:
http://arxiv.org/abs/1901.11373
Author:
Santoro, Adam, Faulkner, Ryan, Raposo, David, Rae, Jack, Chrzanowski, Mike, Weber, Theophane, Wierstra, Daan, Vinyals, Oriol, Pascanu, Razvan, Lillicrap, Timothy
Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. It is unclear, however, whether they also have an ability to perform complex relational reasoning with the information they remember…
External link:
http://arxiv.org/abs/1806.01822
Author:
Arik, Sercan O., Chrzanowski, Mike, Coates, Adam, Diamos, Gregory, Gibiansky, Andrew, Kang, Yongguo, Li, Xian, Miller, John, Ng, Andrew, Raiman, Jonathan, Sengupta, Shubho, Shoeybi, Mohammad
We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks…
External link:
http://arxiv.org/abs/1702.07825
Author:
Amodei, Dario, Anubhai, Rishita, Battenberg, Eric, Case, Carl, Casper, Jared, Catanzaro, Bryan, Chen, Jingdong, Chrzanowski, Mike, Coates, Adam, Diamos, Greg, Elsen, Erich, Engel, Jesse, Fan, Linxi, Fougner, Christopher, Han, Tony, Hannun, Awni, Jun, Billy, LeGresley, Patrick, Lin, Libby, Narang, Sharan, Ng, Andrew, Ozair, Sherjil, Prenger, Ryan, Raiman, Jonathan, Satheesh, Sanjeev, Seetapun, David, Sengupta, Shubho, Wang, Yi, Wang, Zhiqian, Wang, Chong, Xiao, Bo, Yogatama, Dani, Zhan, Jun, Zhu, Zhenyao
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end…
External link:
http://arxiv.org/abs/1512.02595