Showing 1 - 10 of 2,349 results for search: '"Yu, LiLi"'
Author:
Shi, Weijia, Han, Xiaochuang, Zhou, Chunting, Liang, Weixin, Lin, Xi Victoria, Zettlemoyer, Luke, Yu, Lili
We present LlamaFusion, a framework for empowering pretrained text-only large language models (LLMs) with multimodal generative capabilities, enabling them to understand and generate both text and images in arbitrary sequences. LlamaFusion leverages …
External link:
http://arxiv.org/abs/2412.15188
Author:
Pagnoni, Artidoro, Pasunuru, Ram, Rodriguez, Pedro, Nguyen, John, Muller, Benjamin, Li, Margaret, Zhou, Chunting, Yu, Lili, Weston, Jason, Zettlemoyer, Luke, Ghosh, Gargi, Lewis, Mike, Holtzman, Ari, Iyer, Srinivasan
We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes in … (see the sketch after the link below).
External link:
http://arxiv.org/abs/2412.09871
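The BLT entry above describes grouping raw bytes into patches rather than relying on a fixed tokenizer. As a hedged illustration of the entropy-based patching idea, the Python sketch below starts a new patch wherever a small next-byte model is uncertain; the bigram count model and the threshold value are assumptions for illustration, not the paper's learned entropy model or hyperparameters.

```python
# A minimal sketch of entropy-based byte patching: group raw bytes into
# variable-length patches, starting a new patch wherever a small byte-level
# model is "surprised" (next-byte entropy above a threshold). A bigram count
# model over the input stands in for a learned entropy model.
import math
from collections import Counter, defaultdict

def byte_entropy_patches(data: bytes, threshold: float = 2.0):
    # Estimate P(next byte | current byte) from bigram counts in the data itself.
    bigrams = defaultdict(Counter)
    for a, b in zip(data, data[1:]):
        bigrams[a][b] += 1

    def entropy(prev: int) -> float:
        counts = bigrams[prev]
        total = sum(counts.values()) or 1
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    patches, current = [], [data[0]]
    for prev, byte in zip(data, data[1:]):
        if entropy(prev) > threshold:      # high uncertainty -> start a new patch
            patches.append(bytes(current))
            current = []
        current.append(byte)
    patches.append(bytes(current))
    return patches

print(byte_entropy_patches(b"the cat sat on the mat, the cat sat."))
```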
Author:
Liang, Weixin, Yu, Lili, Luo, Liang, Iyer, Srinivasan, Dong, Ning, Zhou, Chunting, Ghosh, Gargi, Lewis, Mike, Yih, Wen-tau, Zettlemoyer, Luke, Lin, Xi Victoria
The development of large language models (LLMs) has expanded to multi-modal systems capable of processing text, images, and speech within a unified framework. Training these models demands significantly larger datasets and computational resources …
External link:
http://arxiv.org/abs/2411.04996
Reconstructing transmission networks is essential for identifying key factors like superspreaders and high-risk locations, which are critical for developing effective pandemic prevention strategies. In this study, we developed a Bayesian framework …
External link:
http://arxiv.org/abs/2409.05245
Author:
Zhou, Chunting, Yu, Lili, Babu, Arun, Tirumala, Kushal, Yasunaga, Michihiro, Shamis, Leonid, Kahn, Jacob, Ma, Xuezhe, Zettlemoyer, Luke, Levy, Omer
We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over mixed-modality … (a sketch of the combined objective follows the link below).
External link:
http://arxiv.org/abs/2408.11039
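The Transfusion entry above is explicit about the training objective: next-token prediction on text plus a diffusion loss on continuous (image) data, within a single transformer. The PyTorch sketch below shows one plausible way to combine the two losses; the shapes, the loss weight, and the plain MSE noise-prediction term are illustrative assumptions rather than the paper's exact recipe.

```python
# A minimal sketch of a Transfusion-style combined objective: cross-entropy
# (next-token prediction) on text positions plus a diffusion MSE loss on
# continuous image-latent positions, both computed from one model's outputs.
import torch
import torch.nn.functional as F

def transfusion_loss(text_logits, text_targets, noise_pred, noise_true, lam=5.0):
    """text_logits: (N_text, vocab) predictions at text positions
    text_targets: (N_text,) next-token ids
    noise_pred/noise_true: (N_img, d) predicted vs. injected noise on image latents
    lam: relative weight of the diffusion term (an assumed hyperparameter)."""
    lm_loss = F.cross_entropy(text_logits, text_targets)   # next-token prediction
    diff_loss = F.mse_loss(noise_pred, noise_true)          # denoising (diffusion) objective
    return lm_loss + lam * diff_loss

# Toy usage with random tensors, just to show the shapes involved.
logits = torch.randn(8, 32000)
targets = torch.randint(0, 32000, (8,))
eps_hat, eps = torch.randn(16, 64), torch.randn(16, 64)
print(transfusion_loss(logits, targets, eps_hat, eps))
```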
Author:
Ma, Xuezhe, Yang, Xiaomeng, Xiong, Wenhan, Chen, Beidi, Yu, Lili, Zhang, Hao, May, Jonathan, Zettlemoyer, Luke, Levy, Omer, Zhou, Chunting
The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers … (a generic linear-attention sketch follows the link below).
External link:
http://arxiv.org/abs/2404.08801
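The abstract above contrasts quadratic softmax attention with sub-quadratic alternatives such as linear attention. The NumPy sketch below is a generic illustration of that contrast, not the architecture proposed in the linked paper: softmax attention materializes an n-by-n score matrix, while the kernelized variant only forms d-by-d quantities. The elu(x)+1 feature map and the non-causal formulation are simplifying assumptions.

```python
# Quadratic softmax attention vs. a linear-attention reformulation (generic
# illustration of why the latter scales as O(n) in sequence length).
import numpy as np

def softmax_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n, n): quadratic in sequence length
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # elu(x)+1 feature map
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                                    # (d, d): no n-by-n matrix is formed
    z = Kf.sum(0)                                    # (d,) normalizer
    return (Qf @ kv) / (Qf @ z)[:, None]

n, d = 6, 4
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```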
In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless …
External link:
http://arxiv.org/abs/2309.15564
Author:
Yu, Lili, Shi, Bowen, Pasunuru, Ramakanth, Muller, Benjamin, Golovneva, Olga, Wang, Tianlu, Babu, Arun, Tang, Binh, Karrer, Brian, Sheynin, Shelly, Ross, Candace, Polyak, Adam, Howes, Russell, Sharma, Vasu, Xu, Puxin, Tamoyan, Hovhannes, Ashual, Oron, Singer, Uriel, Li, Shang-Wen, Zhang, Susan, James, Richard, Ghosh, Gargi, Taigman, Yaniv, Fazel-Zarandi, Maryam, Celikyilmaz, Asli, Zettlemoyer, Luke, Aghajanyan, Armen
We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows … (a sketch of the infilling format follows the link below).
External link:
http://arxiv.org/abs/2309.02591
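The CM3Leon entry above notes that the model builds on the CM3 multi-modal architecture, whose training format lets a decoder-only model infill. The sketch below illustrates the causally masked formatting idea in the simplest text-only terms; the sentinel strings and the single-span choice are illustrative assumptions, not the model's actual special tokens.

```python
# A minimal sketch of a causally masked "infilling" training format: a span is
# cut out of the sequence, replaced by a mask sentinel, and appended at the
# end, so a left-to-right decoder learns to fill in the middle.
import random

MASK, EOS = "<mask>", "<eos>"

def causal_mask_format(tokens, rng=random.Random(0)):
    i = rng.randrange(len(tokens))
    j = rng.randrange(i + 1, len(tokens) + 1)
    prefix, span, suffix = tokens[:i], tokens[i:j], tokens[j:]
    # The model is trained left-to-right on: prefix <mask> suffix <mask> span <eos>
    return prefix + [MASK] + suffix + [MASK] + span + [EOS]

print(causal_mask_format("an image of a cat sitting on a mat".split()))
```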
Author:
Zhou, Chunting, Liu, Pengfei, Xu, Puxin, Iyer, Srini, Sun, Jiao, Mao, Yuning, Ma, Xuezhe, Efrat, Avia, Yu, Ping, Yu, Lili, Zhang, Susan, Ghosh, Gargi, Lewis, Mike, Zettlemoyer, Luke, Levy, Omer
Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large-scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences … (a sketch of the instruction-tuning loss follows the link below).
External link:
http://arxiv.org/abs/2305.11206
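Stage (2) in the abstract above refers to instruction tuning. As a hedged, generic illustration of what that stage usually computes (not the linked paper's setup), the sketch below fine-tunes a decoder-only LM on prompt-response pairs with the loss restricted to response tokens; the ignore-index convention and shapes are assumptions.

```python
# A minimal sketch of supervised instruction tuning: next-token cross-entropy
# on prompt+response sequences, with prompt positions masked out so only the
# response is scored.
import torch
import torch.nn.functional as F

def instruction_tuning_loss(logits, input_ids, prompt_len):
    """logits: (T, vocab) from a decoder-only LM run on prompt+response tokens.
    input_ids: (T,) the same tokens; prompt_len: number of prompt tokens."""
    targets = input_ids[1:].clone()            # position t predicts token t+1
    targets[: prompt_len - 1] = -100           # mask the prompt tokens
    return F.cross_entropy(logits[:-1], targets, ignore_index=-100)

# Toy usage with random numbers, just to show the shapes.
T, vocab, prompt_len = 12, 100, 5
loss = instruction_tuning_loss(torch.randn(T, vocab), torch.randint(0, vocab, (T,)), prompt_len)
print(loss)
```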
Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We propose Megabyte, a multi-scale decoder architecture that enables end-to-end … (see the sketch after the link below).
External link:
http://arxiv.org/abs/2305.07185
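The Megabyte entry above describes a multi-scale decoder. The PyTorch sketch below illustrates the general two-level idea under simplifying assumptions: bytes are grouped into fixed-size patches, a global model runs once per patch, and a local model predicts the bytes within each patch. GRUs stand in for transformer blocks, and the strict causal shift between patches is omitted for brevity, so this is a shape-level sketch rather than the paper's architecture.

```python
# A minimal two-scale (patch-based) decoder sketch: a global model over patch
# vectors and a small local model over the bytes inside each patch.
import torch
import torch.nn as nn

class TwoScaleDecoder(nn.Module):
    def __init__(self, patch_size=8, d_model=128, vocab=256):
        super().__init__()
        self.patch_size = patch_size
        self.byte_emb = nn.Embedding(vocab, d_model)
        self.global_rnn = nn.GRU(d_model * patch_size, d_model, batch_first=True)
        self.local_rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, bytes_in):                       # (B, T), T divisible by patch_size
        B, T = bytes_in.shape
        P = self.patch_size
        x = self.byte_emb(bytes_in)                    # (B, T, d)
        patches = x.view(B, T // P, P * x.size(-1))    # one vector per patch
        g, _ = self.global_rnn(patches)                # global context, one step per patch
        # Local model: condition each patch's bytes on its global state. A fully
        # causal version would use the *previous* patch's summary; skipped here.
        local_in = x.view(B * (T // P), P, -1) + g.reshape(B * (T // P), 1, -1)
        l, _ = self.local_rnn(local_in)
        return self.head(l).view(B, T, -1)             # per-byte logits

logits = TwoScaleDecoder()(torch.randint(0, 256, (2, 32)))
print(logits.shape)  # torch.Size([2, 32, 256])
```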