Showing 1 - 10 of 131 results for search: "Yao Benjamin"
Author:
Zhu, Xinliang, Huang, Michael, Ding, Han, Yang, Jinyu, Chen, Kelvin, Zhou, Tao, Neiman, Tal, Xie, Ouye, Tran, Son, Yao, Benjamin, Gray, Doug, Bindal, Anuj, Dhua, Arnab
Published in:
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024
Image to image matching has been well studied in the computer vision community. Previous studies mainly focus on training a deep metric learning model matching visual patterns between the query image and gallery images. In this study, we show that pu
External link:
http://arxiv.org/abs/2412.13364
Author:
Chen, Changyou, Ding, Han, Sisman, Bunyamin, Xu, Yi, Xie, Ouye, Yao, Benjamin Z., Tran, Son Dinh, Zeng, Belinda
Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-mod
External link:
http://arxiv.org/abs/2407.17571
Author:
Swetha, Sirnam, Yang, Jinyu, Neiman, Tal, Rizve, Mamshad Nayeem, Tran, Son, Yao, Benjamin, Chilimbi, Trishul, Shah, Mubarak
Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). The prevailing trend in this field involve
External link:
http://arxiv.org/abs/2407.13851
Author:
Gupta, Rohit, Rizve, Mamshad Nayeem, Unnikrishnan, Jayakrishnan, Tawari, Ashish, Tran, Son, Shah, Mubarak, Yao, Benjamin, Chilimbi, Trishul
Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocab
External link:
http://arxiv.org/abs/2407.09073
Author:
Rizve, Mamshad Nayeem, Fei, Fan, Unnikrishnan, Jayakrishnan, Tran, Son, Yao, Benjamin Z., Zeng, Belinda, Shah, Mubarak, Chilimbi, Trishul
In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and
External link:
http://arxiv.org/abs/2403.14870
Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concern
External link:
http://arxiv.org/abs/2306.08126
In recent years, Pre-trained Language Models (PLMs) have shown their superiority by pre-training on unstructured text corpus and then fine-tuning on downstream tasks. On entity-rich textual resources like Wikipedia, Knowledge-Enhanced PLMs (KEPLMs) i
External link:
http://arxiv.org/abs/2305.01810
Author:
Park, Dookun, Yuan, Hao, Kim, Dongmin, Zhang, Yinglei, Spyros, Matsoukas, Kim, Young-Bum, Sarikaya, Ruhi, Guo, Edward, Ling, Yuan, Quinn, Kevin, Hung, Pham, Yao, Benjamin, Lee, Sungjin
Measuring user satisfaction level is a challenging task, and a critical component in developing large-scale conversational agent systems serving the needs of real users. A widely used approach to tackle this is to collect human annotation data and u
External link:
http://arxiv.org/abs/2006.07113
Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to opti
External link:
http://arxiv.org/abs/1910.03723
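The soft-label idea described in the abstract above can be illustrated with the widely used temperature-scaled distillation loss: teacher and student logits are both softened with a temperature T, and the student is penalized by the KL divergence from the teacher's distribution. This is a minimal sketch of that generic formulation, not the listed paper's method (its abstract is truncated here); the function names `softmax` and `distillation_loss` and the default temperature of 2.0 are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; a higher temperature yields softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # assumed value, chosen only for illustration
) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    The teacher's softened probabilities act as the soft labels the student mimics.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0  # terms with zero teacher probability contribute nothing
    )
    # The T^2 factor is the conventional scaling that keeps gradient
    # magnitudes comparable across temperatures.
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]
    student = [2.5, 1.2, 0.1]
    print(distillation_loss(student, teacher))  # positive: distributions differ
    print(distillation_loss(teacher, teacher))  # → 0.0 for identical logits
```

In practice this soft-label term is usually combined with an ordinary cross-entropy loss on the ground-truth labels; the weighting between the two is a tunable choice.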
Academic article