Showing 1 - 10 of 510 for search: '"savarese, silvio"'
Author:
Panagopoulou, Artemis, Zhou, Honglu, Savarese, Silvio, Xiong, Caiming, Callison-Burch, Chris, Yatskar, Mark, Niebles, Juan Carlos
Programming based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs 33% of the
External link:
http://arxiv.org/abs/2412.08859
Author:
Zhang, Jieyu, Xue, Le, Song, Linxin, Wang, Jun, Huang, Weikai, Shu, Manli, Yan, An, Ma, Zixian, Niebles, Juan Carlos, Savarese, Silvio, Xiong, Caiming, Chen, Zeyuan, Krishna, Ranjay, Xu, Ran
With the rise of multimodal applications, instruction data has become critical for training multimodal language models capable of understanding complex image-based queries. Existing practices rely on powerful but costly large language models (LLMs) o
External link:
http://arxiv.org/abs/2412.07012
Author:
Ma, Zixian, Zhang, Jianguo, Liu, Zhiwei, Zhang, Jieyu, Tan, Juntao, Shu, Manli, Niebles, Juan Carlos, Heinecke, Shelby, Wang, Huan, Xiong, Caiming, Krishna, Ranjay, Savarese, Silvio
While open-source multi-modal language models perform well on simple question answering tasks, they often fail on complex questions that require multiple capabilities, such as fine-grained recognition, visual grounding, and reasoning, and that demand
External link:
http://arxiv.org/abs/2412.05479
Author:
Liu, Ye, Meng, Rui, Joty, Shafiq, Savarese, Silvio, Xiong, Caiming, Zhou, Yingbo, Yavuz, Semih
Despite the success of text retrieval in many NLP tasks, code retrieval remains a largely underexplored area. Most text retrieval systems are tailored for natural language queries, often neglecting the specific challenges of retrieving code. This gap
External link:
http://arxiv.org/abs/2411.12644
Author:
Peng, Yun, Gotmare, Akhilesh Deepak, Lyu, Michael, Xiong, Caiming, Savarese, Silvio, Sahoo, Doyen
Large Language Models (LLMs) are widely adopted for assisting in software development tasks, yet their performance evaluations have narrowly focused on the functional correctness of generated code. Human programmers, however, require LLM-generated co
External link:
http://arxiv.org/abs/2412.03578
Author:
Awadalla, Anas, Xue, Le, Shu, Manli, Yan, An, Wang, Jun, Purushwalkam, Senthil, Shen, Sheng, Lee, Hannah, Lo, Oscar, Park, Jae Sung, Guha, Etash, Savarese, Silvio, Schmidt, Ludwig, Choi, Yejin, Xiong, Caiming, Xu, Ran
We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text. KALE augments synthetic dense image captions with web-scale alt-text to generate factually
External link:
http://arxiv.org/abs/2411.07461
Pre-trained on massive amounts of code and text data, large language models (LLMs) have demonstrated remarkable achievements in performing code generation tasks. With additional execution-based feedback, these models can act as agents with capabiliti
External link:
http://arxiv.org/abs/2411.04329
Author:
Chen, Haolin, Feng, Yihao, Liu, Zuxin, Yao, Weiran, Prabhakar, Akshara, Heinecke, Shelby, Ho, Ricky, Mui, Phil, Savarese, Silvio, Xiong, Caiming, Wang, Huan
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps. While prompt-based methods like Chain-of-Thought (CoT) can improve LLM reasoning at inference time, optimizing
External link:
http://arxiv.org/abs/2411.04282
Author:
Huang, Kung-Hsiang, Prabhakar, Akshara, Dhawan, Sidharth, Mao, Yixin, Wang, Huan, Savarese, Silvio, Xiong, Caiming, Laban, Philippe, Wu, Chien-Sheng
Customer Relationship Management (CRM) systems are vital for modern enterprises, providing a foundation for managing customer interactions and data. Integrating AI agents into CRM systems can automate routine processes and enhance personalized servic
External link:
http://arxiv.org/abs/2411.02305
Author:
Ginart, Antonio A., Kodali, Naveen, Lee, Jason, Xiong, Caiming, Savarese, Silvio, Emmons, John
While frontier large language models (LLMs) are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to passage of time. This synchronous design forces user queries and tool-use to occur sequentially,
External link:
http://arxiv.org/abs/2410.21620