Showing 1 - 10
of 3,571
for search: '"Karlinsky A"'
Author:
Shabtay, Nimrod, Polo, Felipe Maia, Doveh, Sivan, Lin, Wei, Mirza, M. Jehanzeb, Chosen, Leshem, Yurochkin, Mikhail, Sun, Yuekai, Arbelle, Assaf, Karlinsky, Leonid, Giryes, Raja
The large-scale training of multi-modal models on data scraped from the web has shown outstanding utility in infusing these models with the required world knowledge to perform effectively on multiple downstream tasks. However, one downside of scraping…
External link:
http://arxiv.org/abs/2410.10783
Author:
Mirza, M. Jehanzeb, Zhao, Mengjie, Mao, Zhuoyuan, Doveh, Sivan, Lin, Wei, Gavrikov, Paul, Dorkenwald, Michael, Yang, Shiqi, Jha, Saurav, Wakaki, Hiromi, Mitsufuji, Yuki, Possegger, Horst, Feris, Rogerio, Karlinsky, Leonid, Glass, James
In this work, we propose a novel method (GLOV) enabling Large Language Models (LLMs) to act as implicit Optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. Our GLOV meta-prompts an LLM with the downstream task description…
External link:
http://arxiv.org/abs/2410.06154
Author:
Stallone, Matt, Saxena, Vaibhav, Karlinsky, Leonid, McGinn, Bridget, Bula, Tim, Mishra, Mayank, Soria, Adriana Meza, Zhang, Gaoyuan, Prasad, Aditya, Shen, Yikang, Surendran, Saptha, Guttula, Shanmukha, Patel, Hima, Selvam, Parameswaran, Dang, Xuan-Hong, Koyfman, Yan, Sood, Atin, Feris, Rogerio, Desai, Nirmit, Cox, David D., Puri, Ruchir, Panda, Rameswar
This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of a lightweight continual pretraining…
External link:
http://arxiv.org/abs/2407.13739
Author:
Bhati, Saurabhchand, Gong, Yuan, Karlinsky, Leonid, Kuehne, Hilde, Feris, Rogerio, Glass, James
State-space models (SSMs) have emerged as an alternative to Transformers for audio modeling due to their high computational efficiency with long inputs. While recent efforts on Audio SSMs have reported encouraging results, two main limitations remain…
External link:
http://arxiv.org/abs/2407.04082
Author:
Huang, Brandon, Mitra, Chancharik, Arbelle, Assaf, Karlinsky, Leonid, Darrell, Trevor, Herzig, Roei
The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial…
External link:
http://arxiv.org/abs/2406.15334
Recently, Large Language Models (LLMs) have attained impressive performance on math and reasoning benchmarks. However, they still often struggle with logic problems and puzzles that are relatively easy for humans. To further investigate this, we introduce…
External link:
http://arxiv.org/abs/2406.12172
Author:
Kang, Junmo, Karlinsky, Leonid, Luo, Hongyin, Wang, Zhen, Hansen, Jacob, Glass, James, Cox, David, Panda, Rameswar, Feris, Rogerio, Ritter, Alan
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert…
External link:
http://arxiv.org/abs/2406.12034
Author:
Rouditchenko, Andrew, Gong, Yuan, Thomas, Samuel, Karlinsky, Leonid, Kuehne, Hilde, Feris, Rogerio, Glass, James
Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models…
External link:
http://arxiv.org/abs/2406.10082
Author:
Lin, Wei, Mirza, Muhammad Jehanzeb, Doveh, Sivan, Feris, Rogerio, Giryes, Raja, Hochreiter, Sepp, Karlinsky, Leonid
Comparing two images in terms of Commonalities and Differences (CaD) is a fundamental human capability that forms the basis of advanced visual reasoning and interpretation. It is essential for the generation of detailed and contextually relevant descriptions…
External link:
http://arxiv.org/abs/2406.09240
Author:
Huang, Irene, Lin, Wei, Mirza, M. Jehanzeb, Hansen, Jacob A., Doveh, Sivan, Butoi, Victor Ion, Herzig, Roei, Arbelle, Assaf, Kuehne, Hilde, Darrell, Trevor, Gan, Chuang, Oliva, Aude, Feris, Rogerio, Karlinsky, Leonid
Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency…
External link:
http://arxiv.org/abs/2406.08164